Monitoring & Observability

Set up health checks, structured logging, and OpenTelemetry tracing for your thinnestAI deployment.

Running thinnestAI in production requires visibility into what's happening. This guide covers health checks, logging, and distributed tracing with OpenTelemetry.

Health Checks

The backend exposes a health endpoint that verifies all critical services are operational.

Health Endpoint

curl https://your-api-url/api/health

Healthy Response (200 OK):

{
  "status": "healthy",
  "version": "1.0.0",
  "database": "connected",
  "redis": "connected",
  "uptime": "3d 14h 22m"
}

Unhealthy Response (503 Service Unavailable):

{
  "status": "unhealthy",
  "database": "disconnected",
  "redis": "connected",
  "error": "Cannot connect to PostgreSQL"
}

What's Checked

| Component  | Check            | Failure Impact                     |
|------------|------------------|------------------------------------|
| PostgreSQL | Connection query | API returns errors                 |
| Redis      | Ping command     | Caching and rate limiting disabled |
| Workers    | Queue connectivity | Background tasks stop processing |
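The health payload shown above can be inspected programmatically. This is a hypothetical helper (not part of thinnestAI) that flags any component reported as disconnected:

```python
def unhealthy_components(payload: dict) -> list[str]:
    """Return the names of fields reported as 'disconnected'."""
    return [key for key, value in payload.items() if value == "disconnected"]

def is_healthy(payload: dict) -> bool:
    """Healthy means status is 'healthy' and no component is disconnected."""
    return payload.get("status") == "healthy" and not unhealthy_components(payload)

# Example: the unhealthy response shown above
payload = {
    "status": "unhealthy",
    "database": "disconnected",
    "redis": "connected",
    "error": "Cannot connect to PostgreSQL",
}
print(is_healthy(payload), unhealthy_components(payload))  # → False ['database']
```

The same check is what an external monitor would run against the response body.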

Using Health Checks

Docker Compose

services:
  backend:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/api/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

GCP Cloud Run

Cloud Run performs health checks automatically. For containers that start slowly, enable startup CPU boost and keep the CPU allocated between requests:

gcloud run deploy thinnestai-backend \
  --startup-cpu-boost \
  --no-cpu-throttling

External Monitoring

Point your uptime monitoring service (UptimeRobot, Pingdom, etc.) at:

https://your-api-url/api/health

Set alerts for:

  • Response status != 200
  • Response time > 5 seconds
  • Any "disconnected" values in the response body

Logging

thinnestAI uses structured logging with correlation IDs for tracing requests across services.

Log Format

In production, logs are output as JSON for easy parsing by log aggregators:

{
  "timestamp": "2026-03-05T14:30:00.123Z",
  "level": "INFO",
  "logger": "api.chat",
  "message": "Chat request processed",
  "correlation_id": "req_abc123",
  "user_id": "user_456",
  "agent_id": "agent_789",
  "duration_ms": 1250,
  "tokens_used": 173
}

In development, logs use a human-readable format:

2026-03-05 14:30:00 INFO [api.chat] Chat request processed (req_abc123) - 1250ms
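A formatter along these lines (a minimal sketch using Python's standard logging module, not thinnestAI's actual implementation) can produce the production JSON shape shown above:

```python
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each log record as single-line JSON, merging known extra fields."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.now(timezone.utc)
                .isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via logger.info(..., extra={...}) land on the record
        for field in ("correlation_id", "user_id", "agent_id", "duration_ms", "tokens_used"):
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry)

logger = logging.getLogger("api.chat")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("Chat request processed", extra={"correlation_id": "req_abc123", "duration_ms": 1250})
```

Log aggregators then index the JSON fields directly, so `correlation_id` and `user_id` become queryable.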

Log Levels

| Level   | When to Use                          | Environment            |
|---------|--------------------------------------|------------------------|
| DEBUG   | Detailed debugging info              | Development only       |
| INFO    | Normal operations                    | Default for production |
| WARNING | Something unexpected but not critical | Always enabled        |
| ERROR   | Something failed                     | Always enabled         |

Set the log level via environment variable:

LOG_LEVEL=INFO

Correlation IDs

Every API request is assigned a unique correlation ID (correlation_id). This ID is:

  • Included in all log entries for that request.
  • Returned in the response header X-Correlation-ID.
  • Passed to background workers for end-to-end tracing.
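The mechanics can be sketched with a contextvar that request middleware sets and a logging filter that copies it onto every record. The names below are hypothetical; thinnestAI's internals may differ:

```python
import logging
import uuid
from contextvars import ContextVar

# Set once per request by middleware; read by the logging filter everywhere else
correlation_id_var: ContextVar[str] = ContextVar("correlation_id", default="-")

def new_correlation_id() -> str:
    """Assign a fresh correlation ID for an incoming request."""
    cid = f"req_{uuid.uuid4().hex[:6]}"
    correlation_id_var.set(cid)
    return cid

class CorrelationIdFilter(logging.Filter):
    """Attach the current correlation ID to every log record."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id_var.get()
        return True
```

Middleware would call `new_correlation_id()` at request start and echo the value back in the `X-Correlation-ID` response header.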

To trace a specific request through the system:

# Find all logs for a specific request
# GCP Cloud Logging
gcloud logging read 'jsonPayload.correlation_id="req_abc123"' \
  --project YOUR_PROJECT \
  --limit 50

# Docker
docker-compose logs backend | grep "req_abc123"

Viewing Logs

GCP Cloud Logging

# Recent logs
gcloud run services logs read thinnestai-backend \
  --region us-central1 \
  --limit 100

# Stream live logs
gcloud run services logs tail thinnestai-backend \
  --region us-central1

# Filter by severity
gcloud logging read 'resource.type="cloud_run_revision" AND severity>=ERROR' \
  --project YOUR_PROJECT

# Filter by user
gcloud logging read 'jsonPayload.user_id="user_456"' \
  --project YOUR_PROJECT

Docker

# All logs
docker-compose logs -f backend

# Filter errors
docker-compose logs backend 2>&1 | grep ERROR

# Last 100 lines
docker-compose logs --tail 100 backend

Log Aggregation

For production deployments, send logs to a centralized platform:

| Platform          | Integration                              |
|-------------------|------------------------------------------|
| GCP Cloud Logging | Automatic with Cloud Run                 |
| Datadog           | Set DD_API_KEY and use the Datadog agent |
| Grafana Loki      | Configure a Loki logging driver          |
| ELK Stack         | Use Filebeat or Logstash to ship logs    |

OpenTelemetry

thinnestAI supports OpenTelemetry for distributed tracing, metrics, and enhanced observability.

Enabling OpenTelemetry

Set these environment variables:

OTEL_ENABLED=true
OTEL_SERVICE_NAME=thinnestai-backend
OTEL_EXPORTER_OTLP_ENDPOINT=http://your-collector:4318
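On startup, the backend might read these variables along these lines. This is a sketch; the variable names match the list above, but the defaults are assumptions:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class OtelConfig:
    enabled: bool
    service_name: str
    endpoint: str

def load_otel_config(env=os.environ) -> OtelConfig:
    """Parse OTEL_* settings; tracing stays off unless explicitly enabled."""
    return OtelConfig(
        enabled=env.get("OTEL_ENABLED", "false").lower() == "true",
        service_name=env.get("OTEL_SERVICE_NAME", "thinnestai-backend"),
        endpoint=env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318"),
    )
```

Keeping tracing opt-in means a missing or misspelled variable fails safe rather than emitting traffic to a nonexistent collector.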

What's Traced

| Operation        | Trace Includes                        |
|------------------|---------------------------------------|
| API Requests     | Method, path, status code, duration   |
| Database Queries | Query type, table, duration           |
| LLM Calls        | Model, provider, token counts, latency |
| Agent Execution  | Agent ID, tools used, steps taken     |
| Background Jobs  | Job type, queue, processing time      |
| Redis Operations | Command, key pattern, duration        |

Trace Example

A single chat request generates a trace like:

[Trace: req_abc123]
└── POST /api/chat (1250ms)
    ├── auth.validate_token (5ms)
    ├── db.get_agent (12ms)
    ├── db.get_session (8ms)
    ├── knowledge.search (45ms)
    │   └── pgvector.similarity_search (38ms)
    ├── llm.chat_completion (1150ms)
    │   └── openai.gpt-4o-mini (1140ms)
    ├── db.save_message (15ms)
    └── billing.record_usage (10ms)

Collector Setup

Using Jaeger (Development)

# Add to docker-compose.yml
services:
  jaeger:
    image: jaegertracing/all-in-one:1.53
    ports:
      - "16686:16686"  # Jaeger UI
      - "4318:4318"    # OTLP HTTP
    environment:
      COLLECTOR_OTLP_ENABLED: true

Then set:

OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318

Access the Jaeger UI at http://localhost:16686.

Using Grafana Tempo (Production)

OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://your-tempo-instance:4318

Using GCP Cloud Trace

For GCP deployments on Cloud Run, traces can be sent to Cloud Trace. Grant the service account the appropriate permissions (the roles/cloudtrace.agent role) and configure the exporter:

OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=https://monitoring.googleapis.com

Custom Spans

If you extend thinnestAI with custom code, add tracing to your functions:

from monitoring.otel import get_tracer

tracer = get_tracer(__name__)

async def my_custom_function():
    with tracer.start_as_current_span("my_custom_operation") as span:
        span.set_attribute("custom.key", "value")
        # Your code here
        result = await do_something()
        span.set_attribute("custom.result_count", len(result))
        return result

Key Metrics to Monitor

Application Metrics

| Metric                | What to Watch               | Alert Threshold     |
|-----------------------|-----------------------------|---------------------|
| Request latency (p95) | Response time degradation   | > 5 seconds         |
| Error rate            | Percentage of 5xx responses | > 1%                |
| Active sessions       | Concurrent users            | Varies by capacity  |
| Token usage           | Daily/hourly consumption    | Budget thresholds   |
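The latency and error-rate thresholds above can be computed from raw request records roughly like this (illustrative helper functions, not part of thinnestAI):

```python
import math

def p95_latency(durations_ms: list[float]) -> float:
    """95th-percentile latency via the nearest-rank method (non-empty input)."""
    ordered = sorted(durations_ms)
    rank = math.ceil(0.95 * len(ordered)) - 1
    return ordered[rank]

def error_rate(status_codes: list[int]) -> float:
    """Fraction of responses that were 5xx."""
    if not status_codes:
        return 0.0
    return sum(1 for s in status_codes if s >= 500) / len(status_codes)

# Alert when p95 > 5000 ms or the 5xx rate exceeds 1%
requests = [120, 250, 300, 4800, 90, 6100] * 10
codes = [200] * 98 + [500, 503]
print(p95_latency(requests) > 5000, error_rate(codes) > 0.01)  # → True True
```

In practice these aggregations run in your metrics backend (Cloud Monitoring, Prometheus), but the arithmetic behind the alert rules is exactly this.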

Infrastructure Metrics

| Metric               | What to Watch        | Alert Threshold       |
|----------------------|----------------------|-----------------------|
| CPU utilization      | Sustained high usage | > 80% for 5 minutes   |
| Memory usage         | Memory leaks         | > 85%                 |
| Database connections | Pool exhaustion      | > 80% of pool size    |
| Redis memory         | Cache pressure       | > 75% of allocated    |
| Disk usage           | Storage filling up   | > 80%                 |

Business Metrics

| Metric                 | Description                          |
|------------------------|--------------------------------------|
| Messages per hour      | Chat volume trend                    |
| Voice minutes per day  | Voice usage trend                    |
| Agent response quality | Track user feedback                  |
| Campaign delivery rate | Successful deliveries vs. failures   |

Dashboard Setup

Grafana

If using Grafana, import these dashboards:

  1. API Overview — Request rates, latencies, error rates.
  2. Agent Performance — Per-agent metrics, model usage, token costs.
  3. Infrastructure — Database, Redis, CPU, memory.

GCP Cloud Monitoring

Create custom dashboards in the GCP Console:

  1. Go to Monitoring > Dashboards.
  2. Click Create Dashboard.
  3. Add widgets for Cloud Run metrics (request count, latency, memory).
  4. Add Cloud SQL metrics (connections, CPU, storage).
  5. Add Memorystore metrics (memory usage, connections).

Alerting

Critical Alerts (Page Someone)

  • Health endpoint returns non-200 for > 2 minutes.
  • Error rate exceeds 5% for > 5 minutes.
  • Database connection failures.
  • Redis unreachable.

Warning Alerts (Notify)

  • p95 latency exceeds 3 seconds.
  • CPU usage above 80% for 10 minutes.
  • Database connection pool above 70%.
  • Daily spending exceeds threshold.

Setting Up Alerts (GCP)

# Create an alert policy for high error rate
gcloud alpha monitoring policies create \
  --display-name="High Error Rate" \
  --condition-display-name="Error rate > 5%" \
  --condition-filter='resource.type="cloud_run_revision" AND metric.type="run.googleapis.com/request_count" AND metric.labels.response_code_class="5xx"' \
  --condition-threshold-value=0.05 \
  --condition-threshold-duration=300s \
  --notification-channels=YOUR_CHANNEL_ID
