Monitoring & Observability

Set up health checks, structured logging, and OpenTelemetry tracing for your thinnestAI deployment.

Running thinnestAI in production requires visibility into what's happening. This guide covers health checks, logging, and distributed tracing with OpenTelemetry.

Health Checks

The backend exposes a health endpoint that verifies all critical services are operational.

Health Endpoint

curl https://your-api-url/api/health

Healthy Response (200 OK):

{
  "status": "healthy",
  "version": "1.0.0",
  "database": "connected",
  "redis": "connected",
  "uptime": "3d 14h 22m"
}

Unhealthy Response (503 Service Unavailable):

{
  "status": "unhealthy",
  "database": "disconnected",
  "redis": "connected",
  "error": "Cannot connect to PostgreSQL"
}

What's Checked

| Component  | Check            | Failure Impact                     |
|------------|------------------|------------------------------------|
| PostgreSQL | Connection query | API returns errors                 |
| Redis      | Ping command     | Caching and rate limiting disabled |
| Workers    | Queue connectivity | Background tasks stop processing |
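The health payload shown above can be inspected programmatically. This is a hypothetical helper (not part of thinnestAI) that flags any component reported as disconnected:

```python
def unhealthy_components(payload: dict) -> list[str]:
    """Return the names of fields reported as 'disconnected'."""
    return [key for key, value in payload.items() if value == "disconnected"]

def is_healthy(payload: dict) -> bool:
    """Healthy means status is 'healthy' and no component is disconnected."""
    return payload.get("status") == "healthy" and not unhealthy_components(payload)

# Example: the unhealthy response shown above
payload = {
    "status": "unhealthy",
    "database": "disconnected",
    "redis": "connected",
    "error": "Cannot connect to PostgreSQL",
}
print(is_healthy(payload), unhealthy_components(payload))  # → False ['database']
```

The same check is what an external monitor would run against the response body.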

Using Health Checks

Docker Compose

services:
  backend:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/api/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

GCP Cloud Run

Cloud Run performs health checks automatically. For containers that start slowly, enable startup CPU boost and keep the CPU allocated between requests:

gcloud run deploy thinnestai-backend \
  --startup-cpu-boost \
  --no-cpu-throttling

External Monitoring

Point your uptime monitoring service (UptimeRobot, Pingdom, etc.) at:

https://your-api-url/api/health

Set alerts for:

  • Response status != 200
  • Response time > 5 seconds
  • Any "disconnected" values in the response body

Logging

thinnestAI uses structured logging with correlation IDs for tracing requests across services.

Log Format

In production, logs are output as JSON for easy parsing by log aggregators:

{
  "timestamp": "2026-03-05T14:30:00.123Z",
  "level": "INFO",
  "logger": "api.chat",
  "message": "Chat request processed",
  "correlation_id": "req_abc123",
  "user_id": "user_456",
  "agent_id": "agent_789",
  "duration_ms": 1250,
  "tokens_used": 173
}

In development, logs use a human-readable format:

2026-03-05 14:30:00 INFO [api.chat] Chat request processed (req_abc123) - 1250ms
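A formatter along these lines (a minimal sketch using Python's standard logging module, not thinnestAI's actual implementation) can produce the production JSON shape shown above:

```python
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each log record as single-line JSON, merging known extra fields."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.now(timezone.utc)
                .isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via logger.info(..., extra={...}) land on the record
        for field in ("correlation_id", "user_id", "agent_id", "duration_ms", "tokens_used"):
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry)

logger = logging.getLogger("api.chat")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("Chat request processed", extra={"correlation_id": "req_abc123", "duration_ms": 1250})
```

Log aggregators then index the JSON fields directly, so `correlation_id` and `user_id` become queryable.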

Log Levels

| Level   | When to Use                          | Environment            |
|---------|--------------------------------------|------------------------|
| DEBUG   | Detailed debugging info              | Development only       |
| INFO    | Normal operations                    | Default for production |
| WARNING | Something unexpected but not critical | Always enabled        |
| ERROR   | Something failed                     | Always enabled         |

Set the log level via environment variable:

LOG_LEVEL=INFO

Correlation IDs

Every API request is assigned a unique correlation ID (correlation_id). This ID is:

  • Included in all log entries for that request.
  • Returned in the response header X-Correlation-ID.
  • Passed to background workers for end-to-end tracing.
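The mechanics can be sketched with a contextvar that request middleware sets and a logging filter that copies it onto every record. The names below are hypothetical; thinnestAI's internals may differ:

```python
import logging
import uuid
from contextvars import ContextVar

# Set once per request by middleware; read by the logging filter everywhere else
correlation_id_var: ContextVar[str] = ContextVar("correlation_id", default="-")

def new_correlation_id() -> str:
    """Assign a fresh correlation ID for an incoming request."""
    cid = f"req_{uuid.uuid4().hex[:6]}"
    correlation_id_var.set(cid)
    return cid

class CorrelationIdFilter(logging.Filter):
    """Attach the current correlation ID to every log record."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id_var.get()
        return True
```

Middleware would call `new_correlation_id()` at request start and echo the value back in the `X-Correlation-ID` response header.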

To trace a specific request through the system:

# Find all logs for a specific request
# GCP Cloud Logging
gcloud logging read 'jsonPayload.correlation_id="req_abc123"' \
  --project YOUR_PROJECT \
  --limit 50

# Docker
docker-compose logs backend | grep "req_abc123"

Viewing Logs

GCP Cloud Logging

# Recent logs
gcloud run services logs read thinnestai-backend \
  --region us-central1 \
  --limit 100

# Stream live logs
gcloud run services logs tail thinnestai-backend \
  --region us-central1

# Filter by severity
gcloud logging read 'resource.type="cloud_run_revision" AND severity>=ERROR' \
  --project YOUR_PROJECT

# Filter by user
gcloud logging read 'jsonPayload.user_id="user_456"' \
  --project YOUR_PROJECT

Docker

# All logs
docker-compose logs -f backend

# Filter errors
docker-compose logs backend 2>&1 | grep ERROR

# Last 100 lines
docker-compose logs --tail 100 backend

Log Aggregation

For production deployments, send logs to a centralized platform:

| Platform          | Integration                              |
|-------------------|------------------------------------------|
| GCP Cloud Logging | Automatic with Cloud Run                 |
| Datadog           | Set DD_API_KEY and use the Datadog agent |
| Grafana Loki      | Configure a Loki logging driver          |
| ELK Stack         | Use Filebeat or Logstash to ship logs    |

OpenTelemetry

thinnestAI supports OpenTelemetry for distributed tracing, metrics, and enhanced observability.

Enabling OpenTelemetry

Set these environment variables:

OTEL_ENABLED=true
OTEL_SERVICE_NAME=thinnestai-backend
OTEL_EXPORTER_OTLP_ENDPOINT=http://your-collector:4318
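On startup, the backend might read these variables along these lines. This is a sketch; the variable names match the list above, but the defaults are assumptions:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class OtelConfig:
    enabled: bool
    service_name: str
    endpoint: str

def load_otel_config(env=os.environ) -> OtelConfig:
    """Parse OTEL_* settings; tracing stays off unless explicitly enabled."""
    return OtelConfig(
        enabled=env.get("OTEL_ENABLED", "false").lower() == "true",
        service_name=env.get("OTEL_SERVICE_NAME", "thinnestai-backend"),
        endpoint=env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318"),
    )
```

Keeping tracing opt-in means a missing or misspelled variable fails safe rather than emitting traffic to a nonexistent collector.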

What's Traced

| Operation        | Trace Includes                        |
|------------------|---------------------------------------|
| API Requests     | Method, path, status code, duration   |
| Database Queries | Query type, table, duration           |
| LLM Calls        | Model, provider, token counts, latency |
| Agent Execution  | Agent ID, tools used, steps taken     |
| Background Jobs  | Job type, queue, processing time      |
| Redis Operations | Command, key pattern, duration        |

Trace Example

A single chat request generates a trace like:

[Trace: req_abc123]
└── POST /api/chat (1250ms)
    ├── auth.validate_token (5ms)
    ├── db.get_agent (12ms)
    ├── db.get_session (8ms)
    ├── knowledge.search (45ms)
    │   └── pgvector.similarity_search (38ms)
    ├── llm.chat_completion (1150ms)
    │   └── openai.gpt-4o-mini (1140ms)
    ├── db.save_message (15ms)
    └── billing.record_usage (10ms)

Collector Setup

Using Jaeger (Development)

# Add to docker-compose.yml
services:
  jaeger:
    image: jaegertracing/all-in-one:1.53
    ports:
      - "16686:16686"  # Jaeger UI
      - "4318:4318"    # OTLP HTTP
    environment:
      COLLECTOR_OTLP_ENABLED: true

Then set:

OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318

Access the Jaeger UI at http://localhost:16686.

Using Grafana Tempo (Production)

OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://your-tempo-instance:4318

Using GCP Cloud Trace

For GCP deployments on Cloud Run, traces can be sent to Cloud Trace. Grant the service account the appropriate permissions (the roles/cloudtrace.agent role) and configure the exporter:

OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=https://monitoring.googleapis.com

Custom Spans

If you extend thinnestAI with custom code, add tracing to your functions:

from monitoring.otel import get_tracer

tracer = get_tracer(__name__)

async def my_custom_function():
    with tracer.start_as_current_span("my_custom_operation") as span:
        span.set_attribute("custom.key", "value")
        # Your code here
        result = await do_something()
        span.set_attribute("custom.result_count", len(result))
        return result

Key Metrics to Monitor

Application Metrics

| Metric                | What to Watch               | Alert Threshold     |
|-----------------------|-----------------------------|---------------------|
| Request latency (p95) | Response time degradation   | > 5 seconds         |
| Error rate            | Percentage of 5xx responses | > 1%                |
| Active sessions       | Concurrent users            | Varies by capacity  |
| Token usage           | Daily/hourly consumption    | Budget thresholds   |
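The latency and error-rate thresholds above can be computed from raw request records roughly like this (illustrative helper functions, not part of thinnestAI):

```python
import math

def p95_latency(durations_ms: list[float]) -> float:
    """95th-percentile latency via the nearest-rank method (non-empty input)."""
    ordered = sorted(durations_ms)
    rank = math.ceil(0.95 * len(ordered)) - 1
    return ordered[rank]

def error_rate(status_codes: list[int]) -> float:
    """Fraction of responses that were 5xx."""
    if not status_codes:
        return 0.0
    return sum(1 for s in status_codes if s >= 500) / len(status_codes)

# Alert when p95 > 5000 ms or the 5xx rate exceeds 1%
requests = [120, 250, 300, 4800, 90, 6100] * 10
codes = [200] * 98 + [500, 503]
print(p95_latency(requests) > 5000, error_rate(codes) > 0.01)  # → True True
```

In practice these aggregations run in your metrics backend (Cloud Monitoring, Prometheus), but the arithmetic behind the alert rules is exactly this.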

Infrastructure Metrics

| Metric               | What to Watch        | Alert Threshold       |
|----------------------|----------------------|-----------------------|
| CPU utilization      | Sustained high usage | > 80% for 5 minutes   |
| Memory usage         | Memory leaks         | > 85%                 |
| Database connections | Pool exhaustion      | > 80% of pool size    |
| Redis memory         | Cache pressure       | > 75% of allocated    |
| Disk usage           | Storage filling up   | > 80%                 |

Business Metrics

| Metric                 | Description                          |
|------------------------|--------------------------------------|
| Messages per hour      | Chat volume trend                    |
| Voice minutes per day  | Voice usage trend                    |
| Agent response quality | Track user feedback                  |
| Campaign delivery rate | Successful deliveries vs. failures   |

Dashboard Setup

Grafana

If using Grafana, import these dashboards:

  1. API Overview — Request rates, latencies, error rates.
  2. Agent Performance — Per-agent metrics, model usage, token costs.
  3. Infrastructure — Database, Redis, CPU, memory.

GCP Cloud Monitoring

Create custom dashboards in the GCP Console:

  1. Go to Monitoring > Dashboards.
  2. Click Create Dashboard.
  3. Add widgets for Cloud Run metrics (request count, latency, memory).
  4. Add Cloud SQL metrics (connections, CPU, storage).
  5. Add Memorystore metrics (memory usage, connections).

Alerting

Critical Alerts (Page Someone)

  • Health endpoint returns non-200 for > 2 minutes.
  • Error rate exceeds 5% for > 5 minutes.
  • Database connection failures.
  • Redis unreachable.

Warning Alerts (Notify)

  • p95 latency exceeds 3 seconds.
  • CPU usage above 80% for 10 minutes.
  • Database connection pool above 70%.
  • Daily spending exceeds threshold.

Setting Up Alerts (GCP)

# Create an alert policy for high error rate
gcloud alpha monitoring policies create \
  --display-name="High Error Rate" \
  --condition-display-name="Error rate > 5%" \
  --condition-filter='resource.type="cloud_run_revision" AND metric.type="run.googleapis.com/request_count" AND metric.labels.response_code_class="5xx"' \
  --condition-threshold-value=0.05 \
  --condition-threshold-duration=300s \
  --notification-channels=YOUR_CHANNEL_ID
