Operations & Observability

Health checks, error tracking, and graceful shutdown for production deployments.

osmTalk exposes operational endpoints and integrations for monitoring, alerting, and safe deploys.

Health Check Endpoints

Endpoint	What it checks	Use for
`GET /api/health`	API up + Database + Redis	Uptime monitors, load balancers, Docker healthcheck
`GET /api/health/deep`	API + DB + Redis + Bot + Voice transport + MinIO	On-call dashboards, deploy gates
`GET /health` (bot, port 8080)	Bot process alive	Docker healthcheck for `speak_up-bot-1`
`GET /health/deep` (bot)	Bot + env vars + active call count	Detailed bot status

The deep health check returns 200 when every critical dependency is ok and 503 when any critical dependency is unhealthy. It is safe to poll every 30 seconds.

curl https://api.osmtalk.com/api/health/deep
# → { "status": "healthy", "checks": { "api":"ok", "database":"ok", "redis":"ok",
#     "bot":"ok", "voice_transport":"ok", "minio":"ok" }, "uptime": 12345 }
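A monitor can turn that response into an alert message. The sketch below is a minimal example that trusts the HTTP status for the overall verdict and lists any check that is not "ok". The function name `summarize_deep_health` is hypothetical, and the value an unhealthy check carries is an assumption (the example above only shows "ok").

```python
import json


def summarize_deep_health(status_code: int, body: str) -> tuple[bool, list[str]]:
    """Return (healthy, failing_checks) for a /api/health/deep response.

    Assumption: any check whose value is not "ok" counts as failing.
    """
    payload = json.loads(body)
    checks = payload.get("checks", {})
    failing = sorted(name for name, state in checks.items() if state != "ok")
    # The endpoint decides what is critical: 200 means healthy, 503 does not.
    healthy = status_code == 200
    return healthy, failing


if __name__ == "__main__":
    sample = '{"status": "healthy", "checks": {"api": "ok", "redis": "ok"}, "uptime": 12345}'
    print(summarize_deep_health(200, sample))
```

In a deploy gate, you would fetch the body with your HTTP client of choice and fail the deploy when `healthy` is false, printing `failing` for the on-call engineer.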

Error Tracking (Sentry)

Both the API and the bot support Sentry out of the box. Set SENTRY_DSN and restart — no code changes needed.

Environment variables

Var	Applies to	Default	Description
`SENTRY_DSN`	Bot + API	(empty → disabled)	Your Sentry project DSN
`APP_ENV`	Bot + API	`production`	Environment tag (production/staging/dev)
`GIT_SHA`	Bot + API	`dev`	Release identifier — helps match errors to commits
`SENTRY_TRACES_SAMPLE_RATE`	Bot + API	`0.1`	Fraction of requests to record as performance spans
`SENTRY_PROFILES_SAMPLE_RATE`	Bot only	`0.0`	Fraction of traces to also profile (CPU-heavy)

Sentry is initialized with send_default_pii=False so user phone numbers and transcripts are not sent by default.
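For orientation, the table above maps naturally onto `sentry_sdk.init` keyword arguments. The following is a sketch of how such a mapping could look, not the actual osmTalk initialization code; the helper name `sentry_options` is hypothetical.

```python
from typing import Optional


def sentry_options(env: dict, is_bot: bool) -> Optional[dict]:
    """Build sentry_sdk.init() keyword arguments from the documented env vars.

    Returns None when SENTRY_DSN is empty, which leaves Sentry disabled.
    """
    dsn = env.get("SENTRY_DSN", "")
    if not dsn:
        return None
    options = {
        "dsn": dsn,
        "environment": env.get("APP_ENV", "production"),
        "release": env.get("GIT_SHA", "dev"),
        "traces_sample_rate": float(env.get("SENTRY_TRACES_SAMPLE_RATE", "0.1")),
        # Keeps phone numbers and transcripts out of events by default.
        "send_default_pii": False,
    }
    if is_bot:
        # Profiling applies to the bot only, per the table above.
        options["profiles_sample_rate"] = float(env.get("SENTRY_PROFILES_SAMPLE_RATE", "0.0"))
    return options


# Usage (requires the sentry-sdk package):
#   import os, sentry_sdk
#   opts = sentry_options(dict(os.environ), is_bot=True)
#   if opts is not None:
#       sentry_sdk.init(**opts)
```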

What gets reported

Unhandled exceptions in API routes
Unhandled exceptions in bot run_bot() and pipeline processors
Performance traces (10% sample rate by default)

Verify

docker logs speak_up-bot-1 2>&1 | grep -i sentry
# → Sentry initialized for bot

Graceful Shutdown

When the bot process receives SIGTERM (e.g., during a docker compose down or rolling deploy), it:

Logs "SIGTERM: winding down N active call(s), grace=20s"
For each active call, plays a TTS message: "I need to end this call for a system update. Please call back shortly. Thank you!"
Waits up to SHUTDOWN_GRACE_SECS (default 20) for in-flight TTS to finish
Queues EndFrame on each pipeline — saves transcripts, metrics, and recordings to the API
Exits cleanly

This prevents dropped calls during deploys. Set SHUTDOWN_GRACE_SECS=30 if your TTS is slow.
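The grace-period step can be illustrated with plain asyncio. This is a simplified sketch, not the bot's real handler: `drain_calls` and `fake_tts` are hypothetical names, each coroutine stands in for one call's farewell message, and the EndFrame and upload steps are not modeled.

```python
import asyncio


async def drain_calls(farewells: list, grace_secs: float) -> tuple[int, int]:
    """Wait up to grace_secs for farewell coroutines to finish.

    Returns (finished, cut_off): how many completed in time and how many
    were cancelled because the grace period elapsed.
    """
    if not farewells:
        return 0, 0
    tasks = [asyncio.ensure_future(f) for f in farewells]
    done, pending = await asyncio.wait(tasks, timeout=grace_secs)
    for task in pending:
        task.cancel()  # grace period elapsed; stop waiting on slow TTS
    await asyncio.gather(*pending, return_exceptions=True)
    return len(done), len(pending)


async def fake_tts(seconds: float) -> None:
    """Stand-in for playing a TTS message that takes `seconds` to finish."""
    await asyncio.sleep(seconds)


async def demo() -> tuple[int, int]:
    # One fast call finishes; one slow call exceeds the grace period.
    return await drain_calls([fake_tts(0.01), fake_tts(5)], grace_secs=0.1)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A larger grace value trades slower deploys for fewer farewell messages cut off mid-sentence, which is why a slow TTS provider calls for a higher `SHUTDOWN_GRACE_SECS`.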

Test it locally

docker kill --signal=SIGTERM speak_up-bot-1
docker logs speak_up-bot-1 2>&1 | tail -20
# Expected output:
# SIGTERM: winding down 1 active call(s), grace=20s
# SIGTERM: all calls drained cleanly

Call Metrics & Cost Tracking

Every completed call writes to the calls table:

Latency metrics (`metrics` JSONB column)

{
  "turns": 8,
  "interruptions": 1,
  "avgLatency": 1.42,
  "avgLlmTtfb": 0.89,
  "avgTtsTtfb": 0.21,
  "avgSttTtfb": 0.32,
  "totalPromptTokens": 1240,
  "totalCompletionTokens": 380,
  "totalTtsCharacters": 2104,
  "events": [ /* per-frame timeline */ ]
}

Shown in the UI on the Call Details page → Metrics tab.

Cost breakdown (`cost_*` columns)

Column	Unit	Description
`cost_llm`	INR	LLM tokens × model rate (input + output)
`cost_stt`	INR	STT seconds × provider rate
`cost_tts`	INR	TTS characters × provider rate
`cost_sip`	INR	SIP per-minute rate × call duration in minutes (phone only)
`cost_total`	INR	Sum of all components

Rates are defined in packages/config/src/rates.ts. The total cost is deducted from the org's credits after the call completes.
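As a worked example of the formulas in the table, the sketch below computes each component. The `Rates` fields, their units (per 1,000 tokens, per second, per character, per minute), and the numbers used are all hypothetical; the real values live in rates.ts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical rate card in INR; real rates are defined in rates.ts."""
    llm_input_per_1k: float
    llm_output_per_1k: float
    stt_per_second: float
    tts_per_char: float
    sip_per_minute: float


def call_cost(rates: Rates, prompt_tokens: int, completion_tokens: int,
              stt_seconds: float, tts_chars: int, sip_minutes: float = 0.0) -> dict:
    """Compute the cost_* columns; sip_minutes stays 0 for non-phone calls."""
    cost_llm = (prompt_tokens / 1000 * rates.llm_input_per_1k
                + completion_tokens / 1000 * rates.llm_output_per_1k)
    cost_stt = stt_seconds * rates.stt_per_second
    cost_tts = tts_chars * rates.tts_per_char
    cost_sip = sip_minutes * rates.sip_per_minute
    total = cost_llm + cost_stt + cost_tts + cost_sip
    return {
        "cost_llm": round(cost_llm, 4),
        "cost_stt": round(cost_stt, 4),
        "cost_tts": round(cost_tts, 4),
        "cost_sip": round(cost_sip, 4),
        "cost_total": round(total, 4),
    }


EXAMPLE_RATES = Rates(llm_input_per_1k=0.5, llm_output_per_1k=1.0,
                      stt_per_second=0.01, tts_per_char=0.001, sip_per_minute=0.5)

if __name__ == "__main__":
    # Token and character counts taken from the metrics example above.
    print(call_cost(EXAMPLE_RATES, 1240, 380, stt_seconds=100, tts_chars=2104, sip_minutes=2))
```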

Shown in the UI on the Call Details page → Cost Breakdown card.

Call Recording

When enableCallRecording is set on an agent:

Stereo WAV (user=left, bot=right) is uploaded to MinIO on call completion
calls.recordingUrl is populated with the signed MinIO URL
Audio player appears on the Call Details page
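To make the channel layout concrete, here is a small sketch that interleaves two mono tracks into a stereo WAV with the user on the left and the bot on the right. It is illustrative only: the function name `stereo_wav`, the 16-bit sample format, and the 16 kHz sample rate are assumptions, not details of the bot's recorder.

```python
import io
import struct
import wave


def stereo_wav(user: list[int], bot: list[int], sample_rate: int = 16000) -> bytes:
    """Interleave two mono 16-bit tracks: user on the left, bot on the right.

    The shorter track is padded with silence so both channels align.
    """
    length = max(len(user), len(bot))
    left = user + [0] * (length - len(user))
    right = bot + [0] * (length - len(bot))
    frames = bytearray()
    for l_sample, r_sample in zip(left, right):
        frames += struct.pack("<hh", l_sample, r_sample)  # one stereo frame
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(2)
        writer.setsampwidth(2)  # 16-bit samples
        writer.setframerate(sample_rate)
        writer.writeframes(bytes(frames))
    return buffer.getvalue()
```

Keeping speakers on separate channels lets a reviewer mute one side, or run per-speaker analysis, without diarization.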

Configure MinIO via env:

MINIO_ENDPOINT=http://minio:9000
MINIO_PUBLIC_ENDPOINT=https://storage.osmtalk.com
MINIO_ACCESS_KEY=...
MINIO_SECRET_KEY=...
MINIO_BUCKET=osmtalk-recordings

Set a lifecycle rule on the bucket (90-day retention recommended):

mc ilm add myminio/osmtalk-recordings --expiry-days 90

Concurrent-Call Capacity Gate

Voice calls are gated at the API before a bot is spawned, so callers see a clean HTTP 429 with a friendly message instead of silence when the platform is at capacity.

Each active voice call opens one TTS streaming socket to your TTS provider. Deepgram Pay-as-you-go (PAYG) allows 45 concurrent TTS streams; beyond that, Deepgram rejects the next socket and the caller hears nothing for the rest of the call. The gate refuses calls before the socket is opened.

Deepgram PAYG allows 150 concurrent STT streams, so TTS is the binding bottleneck and the gate is sized against TTS.

Configuration

Var	Default	Description
`MAX_CONCURRENT_CALLS`	`43`	Global hard cap. Default = Deepgram PAYG TTS limit (45) − 2 buffer.
`MAX_CONCURRENT_CALLS_PER_ORG`	`15`	Per-tenant cap so one workspace can't starve every other tenant.

To raise/lower:

# /opt/osmtalk/speak_up/.env
MAX_CONCURRENT_CALLS=40
MAX_CONCURRENT_CALLS_PER_ORG=12

Then recreate the API container:

docker compose -f docker-compose.prod.yml up -d --force-recreate api

How acquire/release works

The gate is atomic — it uses a Redis Lua script to INCR-and-check both counters in one round-trip, so 10 simultaneous starts can't all "see 42" and all pass. Each successful acquire is matched by exactly one release, called from:

POST /api/calls/:id/complete (normal call end via the bot)
POST /api/calls/:id/end (dashboard "Hang up")
Every failure-path that flips status='failed' (bot spawn errors, SIP errors, WhatsApp errors)
POST /api/jobs/sweep-stale-calls (cleanup for crashes)

If Redis is unavailable, the gate falls back to a non-atomic DB count. It still blocks at the limit, but the race window that the Lua script closes is open again. Each fallback logs a "Redis capacity Lua failed" warning so you notice the degraded mode.
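The value of the atomic check is easiest to see in a simulation. The sketch below is an in-memory stand-in, not the Redis implementation: a `threading.Lock` plays the role of the Lua script, the class name `CapacityGate` is hypothetical, and checking the global limit before the org limit is an assumption.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


class CapacityGate:
    """In-memory stand-in for the gate; the lock makes check-and-increment atomic."""

    def __init__(self, global_limit: int = 43, org_limit: int = 15) -> None:
        self.global_limit = global_limit
        self.org_limit = org_limit
        self.global_count = 0
        self.org_counts: dict = {}
        self._lock = threading.Lock()

    def acquire(self, org_id: str) -> Optional[str]:
        """Return None on success, or the scope ("global" or "org") that refused."""
        with self._lock:
            if self.global_count >= self.global_limit:
                return "global"
            if self.org_counts.get(org_id, 0) >= self.org_limit:
                return "org"
            self.global_count += 1
            self.org_counts[org_id] = self.org_counts.get(org_id, 0) + 1
            return None

    def release(self, org_id: str) -> None:
        """Give back one slot; never lets a counter drop below zero."""
        with self._lock:
            if self.org_counts.get(org_id, 0) > 0:
                self.org_counts[org_id] -= 1
                self.global_count -= 1


def simulate_burst(starts: int = 10) -> list:
    """Fire `starts` simultaneous acquires while only one global slot is left."""
    gate = CapacityGate()
    gate.global_count = 42  # one slot below the default cap of 43
    with ThreadPoolExecutor(max_workers=starts) as pool:
        return list(pool.map(gate.acquire, [f"org-{i}" for i in range(starts)]))
```

With the check and the increment under one lock, exactly one of the ten starts in `simulate_burst` succeeds. Splitting them into a separate read and write is what would let several callers observe 42 and all pass.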

The counter only includes rows whose channel opens a TTS streaming socket — web, phone, whatsapp_call. Chat sessions (channel='chat') and WhatsApp text (channel='whatsapp_message') can stay active indefinitely as resumable conversations and do not consume voice slots.

Behavior at the cap

When either limit is reached, every voice-spawn route returns:

HTTP 429 Too Many Requests
{
  "error": "All voice agents are currently busy. Please try again in a moment.",
  "code": "high_volume",
  "scope": "global",
  "global": 43,
  "org": 7,
  "globalLimit": 43,
  "orgLimit": 15
}

scope tells you which limit triggered ("global" or "org") — useful for dashboards. The widget and dashboard already surface body.error to the user — no extra UI work needed.
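If you call the voice-spawn routes from your own client, the response body can be reduced to the fields a dashboard needs. A minimal sketch, assuming the body shape shown above; the helper name `capacity_rejection` is hypothetical.

```python
import json
from typing import Optional


def capacity_rejection(status_code: int, body: str) -> Optional[dict]:
    """Extract the limit that triggered a capacity 429, or None if not applicable."""
    if status_code != 429:
        return None
    payload = json.loads(body)
    if payload.get("code") != "high_volume":
        return None  # some other 429, e.g. ordinary rate limiting
    scope = payload.get("scope")
    if scope == "org":
        used, limit = payload["org"], payload["orgLimit"]
    else:
        used, limit = payload["global"], payload["globalLimit"]
    return {"scope": scope, "used": used, "limit": limit, "message": payload["error"]}
```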

When to upgrade your provider plan

If you regularly see "Capacity gate triggered" warnings in the API logs, you have outgrown Deepgram PAYG. Either:

Move to Deepgram Growth (typically 200+ concurrent TTS), or
Configure a fallback TTS provider (ElevenLabs Flash or Groq Orpheus) to absorb overflow.

Background Jobs (In-Process Scheduler)

Two maintenance jobs run automatically inside the API container, so you don't need crontab, an external scheduler, or an extra service. They start on API boot and stop cleanly on graceful shutdown.

Job	What it does	Default cadence
`sweep-stale-calls`	Flips `calls.status='active'` rows older than `STALE_CALL_HOURS` (default 2h) to `failed` and releases their capacity slot. Recovers from bot crashes / OOM / lost network.	Every 1 hour
`reconcile-capacity`	Resets Redis voice-capacity counters from DB ground truth. Corrects drift after Redis restarts or missed releases.	Every 30 minutes

Each successful run logs:

INFO: Internal cron: job completed — job: sweep-stale-calls, ms: 12, result: { sweptCount: 0, ... }

Failures log "Internal cron: job failed". Wire those lines into your alerting if you want pages on persistent failures.

Boot adds 0–30 s of jitter so timers don't all fire on the same tick if you scale to multiple API replicas. Overlap protection skips a run if the previous one is still in progress.
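Boot jitter and overlap protection can be sketched in a few lines of asyncio. This illustrates the two ideas only; the class name `IntervalJob` is hypothetical and the real scheduler runs inside the Node API, not in Python.

```python
import asyncio
import random


class IntervalJob:
    """Runs an async function on a timer, skipping ticks that would overlap."""

    def __init__(self, name: str, func, interval_secs: float) -> None:
        self.name = name
        self.func = func
        self.interval_secs = interval_secs
        self.running = False
        self.skipped = 0
        self._last_tick = None

    async def run_once(self) -> bool:
        """Run the job; return False if the previous run is still in progress."""
        if self.running:
            self.skipped += 1
            return False
        self.running = True
        try:
            await self.func()
        finally:
            self.running = False
        return True

    async def run_forever(self, max_jitter_secs: float = 30.0) -> None:
        # Random start delay so replicas booted together do not tick together.
        await asyncio.sleep(random.uniform(0, max_jitter_secs))
        while True:
            # Fire without awaiting so a slow run cannot delay the timer;
            # run_once() itself drops the tick if a run is still going.
            self._last_tick = asyncio.create_task(self.run_once())
            await asyncio.sleep(self.interval_secs)


async def demo_overlap() -> tuple:
    async def slow_job() -> None:
        await asyncio.sleep(0.05)

    job = IntervalJob("sweep-stale-calls", slow_job, interval_secs=3600)
    first = asyncio.create_task(job.run_once())
    await asyncio.sleep(0.01)  # the first run is now in progress
    second_ran = await job.run_once()  # overlaps, so it is skipped
    first_ran = await first
    return first_ran, second_ran, job.skipped
```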

Tunable env vars

Var	Default	Description
`INTERNAL_CRON_ENABLED`	`1`	Set to `0` to disable the in-process scheduler (e.g. if you prefer your own external cron).
`STALE_CALL_HOURS`	`2`	Threshold past which an `active` call is considered crashed.
`SWEEP_INTERVAL_SECS`	`3600`	How often the sweep runs (floor 60s).
`RECONCILE_INTERVAL_SECS`	`1800`	How often the reconciler runs (floor 60s).
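The 60-second floor means a smaller value is raised to 60 and never runs the job more often. A sketch of that clamping follows; the helper name `interval_from_env` is hypothetical, and falling back to the default on an unparseable value is an assumption about behavior the table does not specify.

```python
def interval_from_env(env: dict, name: str, default_secs: int, floor_secs: int = 60) -> int:
    """Read an interval in seconds, applying the default and the floor."""
    try:
        value = int(env.get(name, ""))
    except ValueError:
        value = default_secs  # unset or unparseable: assumed to use the default
    return max(floor_secs, value)
```

For example, `SWEEP_INTERVAL_SECS=10` would yield an effective interval of 60 seconds.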

Apply changes:

# /opt/osmtalk/speak_up/.env
SWEEP_INTERVAL_SECS=600

docker compose -f docker-compose.prod.yml up -d --force-recreate api

Manual / external trigger

Both jobs are idempotent and have HTTP endpoints, so you can also drive them from an external scheduler (Kubernetes CronJob, GitHub Actions, cron-job.org, etc.) — disable the in-process scheduler with INTERNAL_CRON_ENABLED=0 if you go this route.

curl -s -X POST https://api.osmtalk.com/api/jobs/sweep-stale-calls \
  -H "x-internal-secret: $INTERNAL_API_SECRET"
# → { "sweptCount": 0, "cutoff": "...", "thresholdHours": 2 }

curl -s -X POST https://api.osmtalk.com/api/jobs/reconcile-capacity \
  -H "x-internal-secret: $INTERNAL_API_SECRET"
# → { "globalCount": 7, "orgs": 3 }
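The same calls can be made from a script, for instance inside a scheduled task. The sketch below uses only the Python standard library and mirrors the curl commands above; the function names `build_job_request` and `run_job` are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.osmtalk.com"


def build_job_request(job: str, secret: str, base: str = API_BASE) -> urllib.request.Request:
    """Build the POST request for a maintenance job such as "sweep-stale-calls"."""
    return urllib.request.Request(
        f"{base}/api/jobs/{job}",
        method="POST",
        headers={"x-internal-secret": secret},
    )


def run_job(job: str, secret: str, timeout: float = 30.0) -> dict:
    """Send the request and return the parsed JSON result."""
    with urllib.request.urlopen(build_job_request(job, secret), timeout=timeout) as resp:
        return json.load(resp)


# Usage, reading the secret from the environment:
#   import os
#   print(run_job("reconcile-capacity", os.environ["INTERNAL_API_SECRET"]))
```

Because both jobs are idempotent, a retry after a timeout is harmless.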

Verify after deploy

After bringing up the API container, check the logs for the startup banner:

docker compose -f docker-compose.prod.yml logs api | grep "Internal cron"
# → Internal cron started — jobs: [...]

Then within 60 s of boot you should see the first run:

docker compose -f docker-compose.prod.yml logs api | grep "Internal cron: job completed"

If you see the banner but no completed runs after about 5 minutes, something is wrong; check the "Internal cron: job failed" lines for the cause.

Log Output

Both services emit structured logs:

API → pino JSON (stdout)
Bot → loguru (stderr)

Redirect to your log aggregator (Grafana Loki, Datadog, etc.):

docker logs -f speak_up-api-1 2>&1 | vector --config vector.toml