Operations & Observability
Health checks, error tracking, and graceful shutdown for production deployments.
osmTalk exposes operational endpoints and integrations for monitoring, alerting, and safe deploys.
Health Check Endpoints
| Endpoint | What it checks | Use for |
|---|---|---|
| GET /api/health | API up + Database + Redis | Uptime monitors, load balancers, Docker healthcheck |
| GET /api/health/deep | API + DB + Redis + Bot + Voice transport + MinIO | On-call dashboards, deploy gates |
| GET /health (bot, port 8080) | Bot process alive | Docker healthcheck for speak_up-bot-1 |
| GET /health/deep (bot) | Bot + env vars + active call count | Detailed bot status |
The deep health check returns 200 when every critical dependency is ok and 503 when any critical dependency is unhealthy. It is safe to poll every 30 s.
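A monitor can use the status code as the verdict and the checks map to name the dependency that broke. The following is a minimal Python sketch. It assumes the body has the shape shown in the curl example that follows (a checks object mapping each dependency to "ok" or some other value). summarize_deep_health is a hypothetical helper, and treating every non-"ok" value as a failure is an assumption.

```python
import json
from typing import List, Tuple


def summarize_deep_health(status_code: int, body: str) -> Tuple[bool, List[str]]:
    """Interpret a GET /api/health/deep response as (healthy, failing_checks).

    The status code is authoritative (200 = all critical deps ok, 503 = not).
    Any check whose value is not the string "ok" is listed as failing; the
    exact non-ok values are an assumption, not documented behavior.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = {}
    checks = payload.get("checks") if isinstance(payload, dict) else None
    if not isinstance(checks, dict):
        checks = {}  # unparseable body: rely on the status code alone
    failing = sorted(name for name, state in checks.items() if state != "ok")
    return status_code == 200, failing
```

A poller running every 30 s can page only when healthy is False and attach the failing list to the alert.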
```bash
curl https://api.osmtalk.com/api/health/deep
# → { "status": "healthy", "checks": { "api":"ok", "database":"ok", "redis":"ok",
#      "bot":"ok", "voice_transport":"ok", "minio":"ok" }, "uptime": 12345 }
```
Error Tracking (Sentry)
Both the API and the bot support Sentry out of the box. Set SENTRY_DSN and restart — no code changes needed.
Environment variables
| Var | Applies to | Default | Description |
|---|---|---|---|
| SENTRY_DSN | Bot + API | (empty → disabled) | Your Sentry project DSN |
| APP_ENV | Bot + API | production | Environment tag (production/staging/dev) |
| GIT_SHA | Bot + API | dev | Release identifier; helps match errors to commits |
| SENTRY_TRACES_SAMPLE_RATE | Bot + API | 0.1 | Fraction of requests to record as performance spans |
| SENTRY_PROFILES_SAMPLE_RATE | Bot only | 0.0 | Fraction of traces to also profile (CPU-heavy) |
Sentry is initialized with send_default_pii=False so user phone numbers and transcripts are not sent by default.
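The table maps directly onto keyword arguments of the Python SDK's sentry_sdk.init(): dsn, environment, release, traces_sample_rate, profiles_sample_rate and send_default_pii. The sketch below builds those arguments for the bot side, assuming the defaults listed above. sentry_options_from_env is a hypothetical helper, not osmTalk's actual initialization code.

```python
import os
from typing import Mapping, Optional


def sentry_options_from_env(env: Mapping[str, str] = os.environ, *,
                            is_bot: bool = False) -> Optional[dict]:
    """Build sentry_sdk.init() kwargs from the documented env vars.

    Returns None when SENTRY_DSN is empty, matching "empty -> disabled".
    """
    dsn = env.get("SENTRY_DSN", "").strip()
    if not dsn:
        return None
    options = {
        "dsn": dsn,
        "environment": env.get("APP_ENV", "production"),
        "release": env.get("GIT_SHA", "dev"),
        "traces_sample_rate": float(env.get("SENTRY_TRACES_SAMPLE_RATE", "0.1")),
        # Keep phone numbers and transcripts out of events.
        "send_default_pii": False,
    }
    if is_bot:
        # Profiling is bot-only and CPU-heavy, so it defaults to off.
        options["profiles_sample_rate"] = float(
            env.get("SENTRY_PROFILES_SAMPLE_RATE", "0.0"))
    return options


# Usage (needs the sentry-sdk package installed):
#   opts = sentry_options_from_env(is_bot=True)
#   if opts:
#       import sentry_sdk
#       sentry_sdk.init(**opts)
```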
What gets reported
- Unhandled exceptions in API routes
- Unhandled exceptions in bot run_bot() and pipeline processors
- Performance traces (10% sample rate by default)
Verify
```bash
docker logs speak_up-bot-1 2>&1 | grep -i sentry
# → Sentry initialized for bot
```
Graceful Shutdown
When the bot process receives SIGTERM (e.g., during a docker compose down or rolling deploy), it:
- Logs SIGTERM: winding down N active call(s), grace=20s
- For each active call, plays a TTS message: "I need to end this call for a system update. Please call back shortly. Thank you!"
- Waits up to SHUTDOWN_GRACE_SECS (default 20) for in-flight TTS to finish
- Queues EndFrame on each pipeline, so transcripts, metrics, and recordings are saved to the API
- Exits cleanly
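This keeps deploys from silently dropping calls: each caller hears the goodbye, and the call's data is saved before the process exits. If your TTS is slow, raise the grace period, for example SHUTDOWN_GRACE_SECS=30.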
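The sequence maps naturally onto asyncio: wait for the signal, start every goodbye concurrently, bound the wait with a timeout, then end each call. The sketch below assumes an asyncio-based bot. DemoCall, drain_calls and main are hypothetical names, and DemoCall.end() only stands in for queuing EndFrame.

```python
import asyncio
import os
import signal

GOODBYE = ("I need to end this call for a system update. "
           "Please call back shortly. Thank you!")


class DemoCall:
    """Hypothetical stand-in for one live call (a real bot wraps its pipeline)."""

    def __init__(self, call_id: str, tts_seconds: float) -> None:
        self.call_id = call_id
        self.tts_seconds = tts_seconds
        self.ended = False

    async def say(self, text: str) -> None:
        await asyncio.sleep(self.tts_seconds)  # pretend TTS playback time

    async def end(self) -> None:
        self.ended = True  # stand-in for queuing EndFrame and persisting data


async def drain_calls(calls: list, grace_secs: float) -> int:
    """Say goodbye on every call, wait at most grace_secs, then end them all.

    Returns how many goodbyes finished within the grace period.
    """
    print(f"SIGTERM: winding down {len(calls)} active call(s), grace={grace_secs:g}s")
    done: set = set()
    pending: set = set()
    tasks = [asyncio.create_task(call.say(GOODBYE)) for call in calls]
    if tasks:  # asyncio.wait() rejects an empty collection
        done, pending = await asyncio.wait(tasks, timeout=grace_secs)
    for task in pending:
        task.cancel()  # TTS that overran the grace period is cut off
    await asyncio.gather(*pending, return_exceptions=True)
    for call in calls:
        await call.end()
    if not pending:
        print("SIGTERM: all calls drained cleanly")
    return len(done)


async def main(calls: list) -> None:
    """Block until SIGTERM, then drain (loop.add_signal_handler is Unix-only)."""
    grace = float(os.environ.get("SHUTDOWN_GRACE_SECS", "20"))
    stop = asyncio.Event()
    asyncio.get_running_loop().add_signal_handler(signal.SIGTERM, stop.set)
    await stop.wait()
    await drain_calls(calls, grace)
```

Every call is ended even when its goodbye overran, so data is persisted regardless of TTS speed.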
Test it locally
```bash
docker kill --signal=SIGTERM speak_up-bot-1
docker logs speak_up-bot-1 2>&1 | tail -20
# Expected output:
# SIGTERM: winding down 1 active call(s), grace=20s
# SIGTERM: all calls drained cleanly
```
Call Metrics & Cost Tracking
Every completed call writes to the calls table:
Latency metrics (metrics JSONB column)
```json
{
  "turns": 8,
  "interruptions": 1,
  "avgLatency": 1.42,
  "avgLlmTtfb": 0.89,
  "avgTtsTtfb": 0.21,
  "avgSttTtfb": 0.32,
  "totalPromptTokens": 1240,
  "totalCompletionTokens": 380,
  "totalTtsCharacters": 2104,
  "events": [ /* per-frame timeline */ ]
}
```
Shown in the UI on the Call Details page → Metrics tab.
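The summary fields follow a simple pattern: averages per turn and totals per call. The sketch below shows that rollup. It assumes avg* values are arithmetic means over turns rounded to two decimals and total* values are plain sums, since the page does not spell out the aggregation. Turn and summarize_turns are hypothetical names, and the events timeline is omitted.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import List


@dataclass
class Turn:
    """Hypothetical per-turn measurements; latencies in seconds."""
    latency: float
    llm_ttfb: float
    tts_ttfb: float
    stt_ttfb: float
    prompt_tokens: int
    completion_tokens: int
    tts_characters: int


def _avg(values: List[float]) -> float:
    # fmean raises on an empty list, so a call with no turns reports 0.0.
    return round(fmean(values), 2) if values else 0.0


def summarize_turns(turns: List[Turn], interruptions: int) -> dict:
    """Roll per-turn numbers into the keys used by the metrics JSONB column."""
    return {
        "turns": len(turns),
        "interruptions": interruptions,
        "avgLatency": _avg([t.latency for t in turns]),
        "avgLlmTtfb": _avg([t.llm_ttfb for t in turns]),
        "avgTtsTtfb": _avg([t.tts_ttfb for t in turns]),
        "avgSttTtfb": _avg([t.stt_ttfb for t in turns]),
        "totalPromptTokens": sum(t.prompt_tokens for t in turns),
        "totalCompletionTokens": sum(t.completion_tokens for t in turns),
        "totalTtsCharacters": sum(t.tts_characters for t in turns),
    }
```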
Cost breakdown (cost_* columns)
| Column | Unit | Description |
|---|---|---|
| cost_llm | INR | LLM tokens × model rate (input + output) |
| cost_stt | INR | STT seconds × provider rate |
| cost_tts | INR | TTS characters × provider rate |
| cost_sip | INR | SIP per-minute rate × call duration (phone only) |
| cost_total | INR | Sum of all components |
Rates are defined in packages/config/src/rates.ts. The call's cost is deducted from the organization's credits after the call completes.
Shown in the UI on the Call Details page → Cost Breakdown card.
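Each component is usage multiplied by a rate, and the total is their sum. The sketch below shows that arithmetic with made-up rates. It assumes LLM and TTS are priced per 1,000 units and SIP is billed on exact (not rounded-up) minutes. Rates and cost_breakdown are hypothetical names, and none of the numbers are real prices.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Rates:
    """Hypothetical INR prices; the real values live in packages/config/src/rates.ts."""
    llm_input_per_1k_tokens: float
    llm_output_per_1k_tokens: float
    stt_per_second: float
    tts_per_1k_chars: float
    sip_per_minute: float


def cost_breakdown(rates: Rates, prompt_tokens: int, completion_tokens: int,
                   stt_seconds: float, tts_characters: int,
                   duration_seconds: float, is_phone: bool) -> Dict[str, float]:
    """Compute the cost_* columns for one call (all values in INR)."""
    cost_llm = (prompt_tokens * rates.llm_input_per_1k_tokens
                + completion_tokens * rates.llm_output_per_1k_tokens) / 1000
    cost_stt = stt_seconds * rates.stt_per_second
    cost_tts = tts_characters * rates.tts_per_1k_chars / 1000
    # SIP applies to phone calls only; billed on exact minutes in this sketch.
    cost_sip = duration_seconds / 60 * rates.sip_per_minute if is_phone else 0.0
    return {
        "cost_llm": cost_llm,
        "cost_stt": cost_stt,
        "cost_tts": cost_tts,
        "cost_sip": cost_sip,
        "cost_total": cost_llm + cost_stt + cost_tts + cost_sip,
    }
```

For example, take the token and character counts from the metrics sample above with Rates(0.05, 0.15, 0.01, 1.0, 0.5), 120 s of STT, and a 3-minute phone call. That gives a cost_llm of about 0.119 and a cost_total of about 4.923.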
Call Recording
When enableCallRecording is set on an agent:
- A stereo WAV (user = left channel, bot = right channel) is uploaded to MinIO on call completion
- calls.recordingUrl is populated with the signed MinIO URL
- An audio player appears on the Call Details page
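One way to build the user-left / bot-right file from two mono tracks uses only the standard library. The sketch below assumes both tracks are 16-bit PCM at the same sample rate. stereo_wav is a hypothetical helper, and the MinIO upload is omitted.

```python
import array
import io
import wave


def stereo_wav(user_pcm: bytes, bot_pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Interleave two mono 16-bit PCM tracks into a stereo WAV (user=left, bot=right).

    The shorter track is padded with silence. Samples are only moved, never
    interpreted, so the input byte order is preserved in the output.
    Raises ValueError if either input has an odd number of bytes.
    """
    left = array.array("h", user_pcm)
    right = array.array("h", bot_pcm)
    n = max(len(left), len(right))
    left.extend([0] * (n - len(left)))
    right.extend([0] * (n - len(right)))
    frames = array.array("h", [0]) * (2 * n)
    frames[0::2] = left   # even slots: left channel (user)
    frames[1::2] = right  # odd slots: right channel (bot)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(frames.tobytes())
    return buf.getvalue()
```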
Configure MinIO via env:
```bash
MINIO_ENDPOINT=http://minio:9000
MINIO_PUBLIC_ENDPOINT=https://storage.osmtalk.com
MINIO_ACCESS_KEY=...
MINIO_SECRET_KEY=...
MINIO_BUCKET=osmtalk-recordings
```
Set a lifecycle rule on the bucket (90-day retention recommended), where myminio is the mc alias you configured for your MinIO server:
```bash
mc ilm add myminio/osmtalk-recordings --expiry-days 90
```
Concurrent-Call Capacity Gate
Voice calls are gated at the API before a bot is spawned, so callers see a clean HTTP 429 with a friendly message instead of silence when the platform is at capacity.
Why this exists
Each active voice call opens one TTS streaming socket to your TTS provider. On Deepgram Pay-as-you-go (PAYG), the limit is 45 concurrent TTS streams. Exceeding it causes Deepgram to reject the next socket, and that caller hears nothing for the rest of the call. The gate refuses calls before the socket is opened.
Deepgram PAYG allows 150 concurrent STT streams, so TTS is the binding bottleneck and the gate is sized against the TTS limit.
Configuration
| Var | Default | Description |
|---|---|---|
| MAX_CONCURRENT_CALLS | 43 | Global hard cap. Default = Deepgram PAYG TTS limit (45) − 2 buffer. |
| MAX_CONCURRENT_CALLS_PER_ORG | 15 | Per-tenant cap so one workspace can't starve every other tenant. |
To raise or lower the limits:
```bash
# /opt/osmtalk/speak_up/.env
MAX_CONCURRENT_CALLS=40
MAX_CONCURRENT_CALLS_PER_ORG=12
```
Then recreate the API container:
```bash
docker compose -f docker-compose.prod.yml up -d --force-recreate api
```
How acquire/release works
The gate is atomic: it uses a Redis Lua script to INCR-and-check both counters in one round-trip, so 10 simultaneous starts can't all "see 42" and all pass. Each successful acquire is matched by exactly one release, called from:
- POST /api/calls/:id/complete (normal call end via the bot)
- POST /api/calls/:id/end (dashboard "Hang up")
- Every failure path that flips status='failed' (bot spawn errors, SIP errors, WhatsApp errors)
- POST /api/jobs/sweep-stale-calls (cleanup for crashes)
If Redis is unavailable, the gate falls back to a non-atomic database count. This still blocks at the limit, but it reopens the race window that the Lua script closes. Every fallback logs a "Redis capacity Lua failed" warning, so you notice the degraded mode.
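The following in-process Python model shows why atomicity matters. It is not the Redis Lua script: a threading.Lock plays the part of Redis running the script as one step, and checking the global limit before the per-org limit is an assumption. CapacityGate and its methods are hypothetical names. Dropping the lock reintroduces exactly the race window the database fallback has.

```python
import threading
from typing import Dict, Optional


class CapacityGate:
    """In-process model of the gate's acquire/release contract.

    One lock stands in for Redis executing the Lua script atomically: the global
    and per-org counters are checked and incremented together, or not at all.
    """

    def __init__(self, global_limit: int = 43, org_limit: int = 15) -> None:
        self.global_limit = global_limit
        self.org_limit = org_limit
        self.global_count = 0
        self.org_counts: Dict[str, int] = {}
        self._lock = threading.Lock()

    def acquire(self, org_id: str) -> Optional[str]:
        """Take a slot. Returns None on success, else the full scope ("global" or "org")."""
        with self._lock:
            if self.global_count >= self.global_limit:
                return "global"
            if self.org_counts.get(org_id, 0) >= self.org_limit:
                return "org"
            self.global_count += 1
            self.org_counts[org_id] = self.org_counts.get(org_id, 0) + 1
            return None

    def release(self, org_id: str) -> None:
        """Give a slot back; a release for an org holding nothing is ignored."""
        with self._lock:
            if self.org_counts.get(org_id, 0) > 0:
                self.org_counts[org_id] -= 1
                self.global_count -= 1
```

With 100 threads competing for the default 43 slots, exactly 43 acquires succeed. Without the lock, several threads could read the same count and overshoot.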
The counter only includes rows whose channel opens a TTS streaming socket — web, phone, whatsapp_call. Chat sessions (channel='chat') and WhatsApp text (channel='whatsapp_message') can stay active indefinitely as resumable conversations and do not consume voice slots.
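The channel filter is the only thing separating a voice slot from a long-lived text session. The sketch below shows that count. It assumes call rows expose status, channel and org_id fields (hypothetical names). A ground-truth count like this is also what a reconciler would reset the Redis counters from.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple

# Channels that hold a TTS streaming socket while active.
VOICE_CHANNELS = frozenset({"web", "phone", "whatsapp_call"})


def count_voice_slots(rows: Iterable[Mapping[str, str]]) -> Tuple[int, Dict[str, int]]:
    """Return (global_count, per_org_counts) for active voice calls.

    Active 'chat' and 'whatsapp_message' rows are skipped: they never open a
    TTS socket, so they must not consume capacity.
    """
    per_org = Counter(
        row["org_id"]
        for row in rows
        if row["status"] == "active" and row["channel"] in VOICE_CHANNELS
    )
    return sum(per_org.values()), dict(per_org)
```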
Behavior at the cap
When either limit is reached, every voice-spawn route returns:
```http
HTTP 429 Too Many Requests
{
  "error": "All voice agents are currently busy. Please try again in a moment.",
  "code": "high_volume",
  "scope": "global",
  "global": 43,
  "org": 7,
  "globalLimit": 43,
  "orgLimit": 15
}
```
scope tells you which limit triggered ("global" or "org"), which is useful for dashboards. global and org are the current counts measured against globalLimit and orgLimit. The widget and dashboard already surface body.error to the user, so no extra UI work is needed.
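If you call the voice-spawn routes from your own client instead of the bundled widget, the code field separates a capacity rejection from any other 429 you might get, for example from a generic rate limiter if you run one. The sketch below relies only on the body shape shown above. capacity_rejection is a hypothetical helper.

```python
import json
from typing import Optional, Tuple


def capacity_rejection(status_code: int, body: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (message, scope) for a capacity-gate 429, or None for anything else."""
    if status_code != 429:
        return None
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return None
    # code == "high_volume" identifies the gate; other 429 bodies are left alone.
    if not isinstance(data, dict) or data.get("code") != "high_volume":
        return None
    # error is the user-facing text; scope ("global" or "org") is for dashboards.
    return data.get("error", ""), data.get("scope")
```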
When to upgrade your provider plan
If you regularly see "Capacity gate triggered" warnings in the API logs, you have outgrown Deepgram PAYG. Either:
- Move to Deepgram Growth (typically 200+ concurrent TTS), or
- Configure a fallback TTS provider (ElevenLabs Flash or Groq Orpheus) to absorb overflow.
Background Jobs (In-Process Scheduler)
Two maintenance jobs run automatically inside the API container, so there is no crontab, external cron, or extra service to set up. They start when the API boots and stop cleanly on graceful shutdown.
| Job | What it does | Default cadence |
|---|---|---|
| sweep-stale-calls | Flips calls.status='active' rows older than STALE_CALL_HOURS (default 2h) to failed and releases their capacity slot. Recovers from bot crashes / OOM / lost network. | Every 1 hour |
| reconcile-capacity | Resets Redis voice-capacity counters from DB ground truth. Corrects drift after Redis restarts or missed releases. | Every 30 minutes |
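The sweep's contract is small enough to sketch: find stale active rows, mark them failed, and release each slot exactly once. The sketch below works on in-memory records. It assumes each record has hypothetical status and started_at fields and that age is measured from started_at. The return value mirrors the endpoint's documented response keys. Swept rows are no longer active, so a second run finds nothing, which is what makes the job safe to re-run.

```python
from datetime import datetime, timedelta
from typing import Callable, List


def sweep_stale_calls(calls: List[dict], now: datetime, stale_hours: float = 2,
                      release: Callable[[dict], None] = lambda call: None) -> dict:
    """Flip long-running 'active' calls to 'failed' and release their slots.

    calls: mutable records with hypothetical "status" and "started_at"
    (timezone-aware datetime) keys. release runs once per swept call.
    """
    cutoff = now - timedelta(hours=stale_hours)
    swept = 0
    for call in calls:
        if call["status"] == "active" and call["started_at"] < cutoff:
            call["status"] = "failed"
            release(call)  # one release per acquire
            swept += 1
    return {"sweptCount": swept, "cutoff": cutoff.isoformat(), "thresholdHours": stale_hours}
```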
Each successful run logs:
```
INFO: Internal cron: job completed — job: sweep-stale-calls, ms: 12, result: { sweptCount: 0, ... }
```
Failures log "Internal cron: job failed". Wire those into your alerting if you want to be paged on persistent failures.
At boot, the scheduler adds 0–30 s of random jitter, so timers don't all fire on the same tick if you scale to multiple API replicas. Overlap protection skips a run if the previous run is still in progress.
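Startup jitter and overlap protection each take only a few lines. The asyncio sketch below shows one way to get the behavior just described. It is not the API's actual scheduler. PeriodicJob, run_once and run_forever are hypothetical names, and the failure message format after the documented prefix is illustrative.

```python
import asyncio
import random
from typing import Awaitable, Callable


class PeriodicJob:
    """Runs an async job on a fixed interval, with startup jitter and no overlapping runs."""

    def __init__(self, name: str, job: Callable[[], Awaitable[object]],
                 interval_secs: float, max_jitter_secs: float = 30.0) -> None:
        self.name = name
        self.job = job
        self.interval_secs = interval_secs
        self.max_jitter_secs = max_jitter_secs
        self.running = False
        self.completed = 0
        self.failures = 0
        self.skipped = 0
        self._tasks: set = set()  # strong refs so fire-and-forget runs aren't GC'd

    async def run_once(self) -> bool:
        """Run the job unless a previous run is still going; returns False when skipped."""
        # No await between check and set, so this is atomic on a single event loop.
        if self.running:
            self.skipped += 1
            return False
        self.running = True
        try:
            await self.job()
            self.completed += 1
        except Exception as exc:  # keep the scheduler alive and surface the failure
            self.failures += 1
            print(f"Internal cron: job failed (job={self.name}, error={exc!r})")
        finally:
            self.running = False
        return True

    async def run_forever(self) -> None:
        # Random 0..max_jitter_secs delay so replicas booted together don't tick in lockstep.
        await asyncio.sleep(random.uniform(0, self.max_jitter_secs))
        while True:
            # Don't await the run: ticks stay on schedule and run_once skips overlaps.
            task = asyncio.create_task(self.run_once())
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)
            await asyncio.sleep(self.interval_secs)
```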
Tunable env vars
| Var | Default | Description |
|---|---|---|
| INTERNAL_CRON_ENABLED | 1 | Set to 0 to disable the in-process scheduler (e.g. if you prefer your own external cron). |
| STALE_CALL_HOURS | 2 | Threshold past which an active call is considered crashed. |
| SWEEP_INTERVAL_SECS | 3600 | How often the sweep runs (floor 60s). |
| RECONCILE_INTERVAL_SECS | 1800 | How often the reconciler runs (floor 60s). |
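The sketch below reads these variables with their defaults and the 60 s floor. It assumes that only the literal value 0 disables the scheduler and that a malformed number falls back to its default rather than failing boot. cron_settings is a hypothetical helper.

```python
import os
from typing import Mapping

MIN_INTERVAL_SECS = 60  # documented floor for both job intervals


def cron_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the scheduler env vars, applying defaults and the 60 s interval floor."""
    def number(name: str, default: float) -> float:
        try:
            return float(env.get(name, default))
        except ValueError:
            return float(default)  # malformed value: keep the default

    return {
        "enabled": env.get("INTERNAL_CRON_ENABLED", "1") != "0",
        "stale_call_hours": number("STALE_CALL_HOURS", 2),
        "sweep_interval_secs": max(MIN_INTERVAL_SECS, number("SWEEP_INTERVAL_SECS", 3600)),
        "reconcile_interval_secs": max(MIN_INTERVAL_SECS, number("RECONCILE_INTERVAL_SECS", 1800)),
    }
```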
Apply changes:
```bash
# /opt/osmtalk/speak_up/.env
SWEEP_INTERVAL_SECS=600
```
Then recreate the API container:
```bash
docker compose -f docker-compose.prod.yml up -d --force-recreate api
```
Manual / external trigger
Both jobs are idempotent and have HTTP endpoints, so you can also drive them from an external scheduler (Kubernetes CronJob, GitHub Actions, cron-job.org, etc.) — disable the in-process scheduler with INTERNAL_CRON_ENABLED=0 if you go this route.
```bash
curl -s -X POST https://api.osmtalk.com/api/jobs/sweep-stale-calls \
  -H "x-internal-secret: $INTERNAL_API_SECRET"
# → { "sweptCount": 0, "cutoff": "...", "thresholdHours": 2 }

curl -s -X POST https://api.osmtalk.com/api/jobs/reconcile-capacity \
  -H "x-internal-secret: $INTERNAL_API_SECRET"
# → { "globalCount": 7, "orgs": 3 }
```
Verify after deploy
After bringing up the API container, check the logs for the startup banner:
```bash
docker compose -f docker-compose.prod.yml logs api | grep "Internal cron"
# → Internal cron started — jobs: [...]
```
Then, within 60 s of boot, you should see the first run:
```bash
docker compose -f docker-compose.prod.yml logs api | grep "Internal cron: job completed"
```
If you see the banner but no completed runs after about 5 minutes, something is wrong. Check the "Internal cron: job failed" lines for the cause.
Log Output
Both services emit structured logs:
- API → pino JSON (stdout)
- Bot → loguru (stderr)
Redirect to your log aggregator (Grafana Loki, Datadog, etc.):
```bash
docker logs -f speak_up-api-1 2>&1 | vector --config vector.toml
```
This assumes vector.toml defines a stdin source.
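If your aggregator cannot alert on a message directly, the API's JSON lines are easy to filter yourself. The sketch below picks out the scheduler failures described earlier. It relies only on pino's default record fields (a numeric level and a msg string) and skips anything that is not JSON, such as stderr output merged in by 2>&1. cron_failures is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator

# pino's default numeric levels.
PINO_LEVELS = {10: "trace", 20: "debug", 30: "info", 40: "warn", 50: "error", 60: "fatal"}


def cron_failures(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed pino records whose msg starts with "Internal cron: job failed"."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not JSON, e.g. stderr output merged in by 2>&1
        if not isinstance(record, dict):
            continue
        if str(record.get("msg", "")).startswith("Internal cron: job failed"):
            # Add a readable level name next to pino's numeric level.
            record["levelName"] = PINO_LEVELS.get(record.get("level"), "unknown")
            yield record
```

To use it on live output, pass sys.stdin as lines and pipe docker logs -f speak_up-api-1 into the script.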