# Monitoring & Observability
Set up health checks, structured logging, and OpenTelemetry tracing for your thinnestAI deployment.
Running thinnestAI in production requires visibility into what's happening. This guide covers health checks, logging, and distributed tracing with OpenTelemetry.
## Health Checks
The backend exposes a health endpoint that verifies all critical services are operational.
### Health Endpoint
```bash
curl https://your-api-url/api/health
```

Healthy Response (200 OK):

```json
{
  "status": "healthy",
  "version": "1.0.0",
  "database": "connected",
  "redis": "connected",
  "uptime": "3d 14h 22m"
}
```

Unhealthy Response (503 Service Unavailable):
```json
{
  "status": "unhealthy",
  "database": "disconnected",
  "redis": "connected",
  "error": "Cannot connect to PostgreSQL"
}
```

### What's Checked
| Component | Check | Failure Impact |
|---|---|---|
| PostgreSQL | Connection query | API returns errors |
| Redis | Ping command | Caching and rate limiting disabled |
| Workers | Queue connectivity | Background tasks stop processing |
### Using Health Checks
#### Docker Compose
```yaml
services:
  backend:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/api/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
```

#### GCP Cloud Run
Cloud Run performs health checks automatically. For services with slower startup, enable startup CPU boost:
```bash
gcloud run deploy thinnestai-backend \
  --startup-cpu-boost \
  --cpu-throttling
```

#### External Monitoring
Point your uptime monitoring service (UptimeRobot, Pingdom, etc.) at:
```
https://your-api-url/api/health
```

Set alerts for:
- Response status != 200
- Response time > 5 seconds
- Any `"disconnected"` values in the response body
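The alert rules above can also be evaluated by a small custom poller. A minimal stdlib sketch, assuming the endpoint and JSON shape shown earlier (function names are illustrative, not part of thinnestAI):

```python
import json
import urllib.error
import urllib.request

def health_alerts(status_code: int, body: dict) -> list[str]:
    """Turn a health response into a list of alert strings (empty = healthy)."""
    alerts = []
    if status_code != 200:
        alerts.append(f"HTTP {status_code}")
    # Flag any component the backend reports as disconnected
    alerts.extend(f"{key} disconnected" for key, value in body.items()
                  if value == "disconnected")
    return alerts

def check_health(url: str, timeout: float = 5.0) -> list[str]:
    """Fetch the health endpoint and evaluate it; 503 responses carry a JSON body too."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return health_alerts(resp.status, json.load(resp))
    except urllib.error.HTTPError as err:  # e.g. 503 Service Unavailable
        return health_alerts(err.code, json.load(err))
    except OSError as exc:                 # DNS failure, connection refused, timeout
        return [f"request failed: {exc}"]
```

Run `check_health("https://your-api-url/api/health")` from cron or a sidecar and page on any non-empty result.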
## Logging
thinnestAI uses structured logging with correlation IDs for tracing requests across services.
### Log Format
In production, logs are output as JSON for easy parsing by log aggregators:
```json
{
  "timestamp": "2026-03-05T14:30:00.123Z",
  "level": "INFO",
  "logger": "api.chat",
  "message": "Chat request processed",
  "correlation_id": "req_abc123",
  "user_id": "user_456",
  "agent_id": "agent_789",
  "duration_ms": 1250,
  "tokens_used": 173
}
```

In development, logs use a human-readable format:

```
2026-03-05 14:30:00 INFO [api.chat] Chat request processed (req_abc123) - 1250ms
```

### Log Levels
| Level | When to Use | When Enabled |
|---|---|---|
| `DEBUG` | Detailed debugging info | Development only |
| `INFO` | Normal operations | Default for production |
| `WARNING` | Something unexpected but not critical | Always enabled |
| `ERROR` | Something failed | Always enabled |
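The JSON format shown under Log Format can be reproduced with Python's stdlib `logging` module. A minimal illustrative formatter (field names follow the example above; thinnestAI's actual implementation may differ):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render log records as single-line JSON, including selected extra fields."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via logging's `extra=` kwarg are set as record attributes
        for field in ("correlation_id", "user_id", "duration_ms"):
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("api.chat")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("Chat request processed",
            extra={"correlation_id": "req_abc123", "duration_ms": 1250})
```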
Set the log level via an environment variable:

```bash
LOG_LEVEL=INFO
```

### Correlation IDs
Every API request is assigned a unique correlation ID (`correlation_id`). This ID is:

- Included in all log entries for that request.
- Returned in the response header `X-Correlation-ID`.
- Passed to background workers for end-to-end tracing.
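Propagation like this is commonly implemented with `contextvars`, so the ID is available to every log call without being passed explicitly. A sketch of the pattern (names are illustrative, not thinnestAI's internals):

```python
import uuid
from contextvars import ContextVar

# Holds the correlation ID for the current request context
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")

def new_correlation_id() -> str:
    """Assign a fresh ID at the start of a request; middleware would call this."""
    cid = "req_" + uuid.uuid4().hex[:12]
    correlation_id.set(cid)
    return cid

def log_line(message: str) -> str:
    """Any log call can read the current ID implicitly."""
    return f"{message} ({correlation_id.get()})"
```

In an ASGI app this runs in middleware, which also echoes the ID in the `X-Correlation-ID` response header and attaches it to enqueued background jobs.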
To trace a specific request through the system:
```bash
# Find all logs for a specific request

# GCP Cloud Logging
gcloud logging read 'jsonPayload.correlation_id="req_abc123"' \
  --project YOUR_PROJECT \
  --limit 50

# Docker
docker-compose logs backend | grep "req_abc123"
```

### Viewing Logs

#### GCP Cloud Logging
```bash
# Recent logs
gcloud run services logs read thinnestai-backend \
  --region us-central1 \
  --limit 100

# Stream live logs
gcloud run services logs tail thinnestai-backend \
  --region us-central1

# Filter by severity
gcloud logging read 'resource.type="cloud_run_revision" AND severity>=ERROR' \
  --project YOUR_PROJECT

# Filter by user
gcloud logging read 'jsonPayload.user_id="user_456"' \
  --project YOUR_PROJECT
```

#### Docker
```bash
# All logs
docker-compose logs -f backend

# Filter errors
docker-compose logs backend 2>&1 | grep ERROR

# Last 100 lines
docker-compose logs --tail 100 backend
```

### Log Aggregation
For production deployments, send logs to a centralized platform:
| Platform | Integration |
|---|---|
| GCP Cloud Logging | Automatic with Cloud Run |
| Datadog | Set DD_API_KEY and use the Datadog agent |
| Grafana Loki | Configure a Loki logging driver |
| ELK Stack | Use Filebeat or Logstash to ship logs |
## OpenTelemetry
thinnestAI supports OpenTelemetry for distributed tracing, metrics, and enhanced observability.
### Enabling OpenTelemetry
Set these environment variables:

```bash
OTEL_ENABLED=true
OTEL_SERVICE_NAME=thinnestai-backend
OTEL_EXPORTER_OTLP_ENDPOINT=http://your-collector:4318
```

### What's Traced
| Operation | Trace Includes |
|---|---|
| API Requests | Method, path, status code, duration |
| Database Queries | Query type, table, duration |
| LLM Calls | Model, provider, token counts, latency |
| Agent Execution | Agent ID, tools used, steps taken |
| Background Jobs | Job type, queue, processing time |
| Redis Operations | Command, key pattern, duration |
### Trace Example
A single chat request generates a trace like:

```
[Trace: req_abc123]
├── POST /api/chat (1250ms)
│   ├── auth.validate_token (5ms)
│   ├── db.get_agent (12ms)
│   ├── db.get_session (8ms)
│   ├── knowledge.search (45ms)
│   │   └── pgvector.similarity_search (38ms)
│   ├── llm.chat_completion (1150ms)
│   │   └── openai.gpt-4o-mini (1140ms)
│   ├── db.save_message (15ms)
│   └── billing.record_usage (10ms)
```

### Collector Setup
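Conceptually, each node in that tree is just a span timed inside its parent's context. A stdlib-only sketch of the nesting (illustrative only; the real instrumentation is OpenTelemetry, whose collector setup follows):

```python
import time
from contextlib import contextmanager

_stack = []  # names of currently open spans
spans = []   # (depth, name, duration_ms), recorded when each span closes

@contextmanager
def span(name: str):
    """Time a block of code and record it with its nesting depth."""
    _stack.append(name)
    start = time.perf_counter()
    try:
        yield
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        spans.append((len(_stack) - 1, name, duration_ms))
        _stack.pop()

# Child spans close before their parent, so the parent's duration
# includes everything nested inside it.
with span("POST /api/chat"):
    with span("db.get_agent"):
        time.sleep(0.01)
    with span("llm.chat_completion"):
        time.sleep(0.02)
```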
#### Using Jaeger (Development)
```yaml
# Add to docker-compose.yml
services:
  jaeger:
    image: jaegertracing/all-in-one:1.53
    ports:
      - "16686:16686"  # Jaeger UI
      - "4318:4318"    # OTLP HTTP
    environment:
      COLLECTOR_OTLP_ENABLED: true
```

Then set:

```bash
OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318
```

Access the Jaeger UI at http://localhost:16686.
#### Using Grafana Tempo (Production)
```bash
OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://your-tempo-instance:4318
```

#### Using GCP Cloud Trace
For GCP deployments, traces are automatically sent to Cloud Trace when running on Cloud Run with the appropriate service account permissions:

```bash
OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=https://monitoring.googleapis.com
```

### Custom Spans
If you extend thinnestAI with custom code, add tracing to your functions:

```python
from monitoring.otel import get_tracer

tracer = get_tracer(__name__)

async def my_custom_function():
    with tracer.start_as_current_span("my_custom_operation") as span:
        span.set_attribute("custom.key", "value")
        # Your code here
        result = await do_something()
        span.set_attribute("custom.result_count", len(result))
        return result
```

## Key Metrics to Monitor
### Application Metrics
| Metric | What to Watch | Alert Threshold |
|---|---|---|
| Request latency (p95) | Response time degradation | > 5 seconds |
| Error rate | Percentage of 5xx responses | > 1% |
| Active sessions | Concurrent users | Varies by capacity |
| Token usage | Daily/hourly consumption | Budget thresholds |
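Both latency percentiles and error rate can be derived from raw request records. A small sketch using the nearest-rank percentile method, assuming you have per-request durations and status codes (most monitoring platforms compute these for you):

```python
def p95(durations_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of request durations."""
    ordered = sorted(durations_ms)
    # Rank of the value at or below which 95% of samples fall
    rank = max(0, int(len(ordered) * 0.95 + 0.5) - 1)
    return ordered[rank]

def error_rate(status_codes: list[int]) -> float:
    """Fraction of responses that were 5xx."""
    if not status_codes:
        return 0.0
    return sum(1 for code in status_codes if code >= 500) / len(status_codes)
```

Evaluate these over a sliding window (e.g. the last 5 minutes) and compare against the thresholds in the table above.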
### Infrastructure Metrics
| Metric | What to Watch | Alert Threshold |
|---|---|---|
| CPU utilization | Sustained high usage | > 80% for 5 minutes |
| Memory usage | Memory leaks | > 85% |
| Database connections | Pool exhaustion | > 80% of pool size |
| Redis memory | Cache pressure | > 75% of allocated |
| Disk usage | Storage filling up | > 80% |
### Business Metrics
| Metric | Description |
|---|---|
| Messages per hour | Chat volume trend |
| Voice minutes per day | Voice usage trend |
| Agent response quality | Track user feedback |
| Campaign delivery rate | Successful deliveries vs. failures |
## Dashboard Setup
### Grafana
If using Grafana, import these dashboards:
- API Overview — Request rates, latencies, error rates.
- Agent Performance — Per-agent metrics, model usage, token costs.
- Infrastructure — Database, Redis, CPU, memory.
### GCP Cloud Monitoring
Create custom dashboards in the GCP Console:
1. Go to Monitoring > Dashboards.
2. Click Create Dashboard.
3. Add widgets for Cloud Run metrics (request count, latency, memory).
4. Add Cloud SQL metrics (connections, CPU, storage).
5. Add Memorystore metrics (memory usage, connections).
## Alerting
### Critical Alerts (Page Someone)
- Health endpoint returns non-200 for > 2 minutes.
- Error rate exceeds 5% for > 5 minutes.
- Database connection failures.
- Redis unreachable.
### Warning Alerts (Notify)
- p95 latency exceeds 3 seconds.
- CPU usage above 80% for 10 minutes.
- Database connection pool above 70%.
- Daily spending exceeds threshold.
### Setting Up Alerts (GCP)
```bash
# Create an alert policy for high error rate
gcloud alpha monitoring policies create \
  --display-name="High Error Rate" \
  --condition-display-name="Error rate > 5%" \
  --condition-filter='resource.type="cloud_run_revision" AND metric.type="run.googleapis.com/request_count" AND metric.labels.response_code_class="5xx"' \
  --condition-threshold-value=0.05 \
  --condition-threshold-duration=300s \
  --notification-channels=YOUR_CHANNEL_ID
```

## Next Steps
- Environment Variables — Configure logging and tracing variables.
- GCP Deployment — Production deployment with built-in monitoring.
- Docker Deployment — Local deployment with health checks.