
Operations & Observability

Health checks, error tracking, and graceful shutdown for production deployments.

osmTalk exposes operational endpoints and integrations for monitoring, alerting, and safe deploys.

Health Check Endpoints

Endpoint | What it checks | Use for
GET /api/health | API up + Database + Redis | Uptime monitors, load balancers, Docker healthcheck
GET /api/health/deep | API + DB + Redis + Bot + Voice transport + MinIO | On-call dashboards, deploy gates
GET /health (bot, port 8080) | Bot process alive | Docker healthcheck for speak_up-bot-1
GET /health/deep (bot) | Bot + env vars + active call count | Detailed bot status

GET /api/health/deep returns 200 when every critical dependency is ok and 503 when any critical dependency is unhealthy. It is safe to poll every 30 s.

curl https://api.osmtalk.com/api/health/deep
# → { "status": "healthy", "checks": { "api":"ok", "database":"ok", "redis":"ok",
#     "bot":"ok", "voice_transport":"ok", "minio":"ok" }, "uptime": 12345 }
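For a deploy gate, the response can be reduced to a pass/fail decision. The following is a minimal Python sketch, assuming the response shape shown above and assuming the 503 body is also JSON; the helper names failing_checks and deep_health_ok are hypothetical and not part of osmTalk.

```python
import json
import urllib.error
import urllib.request

DEEP_HEALTH_URL = "https://api.osmtalk.com/api/health/deep"


def failing_checks(body: dict) -> list[str]:
    """Return the names of checks whose value is not "ok"."""
    checks = body.get("checks", {})
    return sorted(name for name, state in checks.items() if state != "ok")


def deep_health_ok(url: str = DEEP_HEALTH_URL, timeout: float = 5.0) -> bool:
    """True only when the endpoint answers 200 and no check is failing."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = json.load(resp)
            return resp.status == 200 and not failing_checks(body)
    except urllib.error.HTTPError as err:
        # A 503 lands here; the body is assumed (not confirmed) to be JSON.
        try:
            print("unhealthy:", failing_checks(json.load(err)))
        except ValueError:
            print("unhealthy: HTTP", err.code)
        return False
    except (OSError, ValueError):
        # Network failure, timeout, or a non-JSON 200 body.
        return False
```

A deploy script could call deep_health_ok() in a loop and abort the rollout if it keeps returning False.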

Error Tracking (Sentry)

Both the API and the bot support Sentry out of the box. Set SENTRY_DSN and restart — no code changes needed.

Environment variables

Var | Applies to | Default | Description
SENTRY_DSN | Bot + API | (empty → disabled) | Your Sentry project DSN
APP_ENV | Bot + API | production | Environment tag (production/staging/dev)
GIT_SHA | Bot + API | dev | Release identifier; helps match errors to commits
SENTRY_TRACES_SAMPLE_RATE | Bot + API | 0.1 | Fraction of requests to record as performance spans
SENTRY_PROFILES_SAMPLE_RATE | Bot only | 0.0 | Fraction of traces to also profile (CPU-heavy)

Sentry is initialized with send_default_pii=False so user phone numbers and transcripts are not sent by default.
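The variables above map directly onto keyword arguments of sentry_sdk.init(). The sketch below shows one way the bot side could wire them up; it is an illustration under that assumption, not the actual osmTalk startup code, and the function names sentry_options and init_sentry are hypothetical.

```python
import os
from typing import Mapping, Optional


def sentry_options(env: Mapping[str, str]) -> Optional[dict]:
    """Build sentry_sdk.init() keyword arguments, or None when disabled."""
    dsn = env.get("SENTRY_DSN", "").strip()
    if not dsn:
        return None  # empty DSN means Sentry stays off
    return {
        "dsn": dsn,
        "environment": env.get("APP_ENV", "production"),
        "release": env.get("GIT_SHA", "dev"),
        "traces_sample_rate": float(env.get("SENTRY_TRACES_SAMPLE_RATE", "0.1")),
        "profiles_sample_rate": float(env.get("SENTRY_PROFILES_SAMPLE_RATE", "0.0")),
        "send_default_pii": False,  # keep phone numbers and transcripts out
    }


def init_sentry() -> bool:
    """Initialize Sentry from the process environment; True if enabled."""
    options = sentry_options(os.environ)
    if options is None:
        return False
    import sentry_sdk  # third-party: pip install sentry-sdk

    sentry_sdk.init(**options)
    return True
```

Keeping the option building separate from the init call makes the defaults easy to check without a real DSN.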

What gets reported

  • Unhandled exceptions in API routes
  • Unhandled exceptions in bot run_bot() and pipeline processors
  • Performance traces (10% sample rate by default)

Verify

docker logs speak_up-bot-1 2>&1 | grep -i sentry
# → Sentry initialized for bot

Graceful Shutdown

When the bot process receives SIGTERM (e.g., during a docker compose down or rolling deploy), it:

  1. Logs SIGTERM: winding down N active call(s), grace=20s
  2. For each active call, plays a TTS message:

    "I need to end this call for a system update. Please call back shortly. Thank you!"

  3. Waits up to SHUTDOWN_GRACE_SECS (default 20) for in-flight TTS to finish
  4. Queues an EndFrame on each pipeline, which saves transcripts, metrics, and recordings to the API
  5. Exits cleanly

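This prevents dropped calls during deploys. Set SHUTDOWN_GRACE_SECS=30 if your TTS is slow. Docker sends SIGKILL once its own stop timeout expires (10 s by default), so make sure the container's stop timeout (stop_grace_period in Compose) is longer than SHUTDOWN_GRACE_SECS.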
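The drain sequence above can be sketched with asyncio. This is a simplified model, not the bot's actual code: FakeCall and its say()/end() methods are hypothetical stand-ins for a live call pipeline, and end() only marks the place where a real implementation would queue the EndFrame.

```python
import asyncio
import os
import signal

GRACE_SECS = float(os.environ.get("SHUTDOWN_GRACE_SECS", "20"))
GOODBYE = ("I need to end this call for a system update. "
           "Please call back shortly. Thank you!")


class FakeCall:
    """Stand-in for a live call pipeline (hypothetical interface)."""

    def __init__(self, tts_secs: float):
        self.tts_secs = tts_secs
        self.spoken = None
        self.ended = False

    async def say(self, text: str) -> None:
        await asyncio.sleep(self.tts_secs)  # simulate TTS playback time
        self.spoken = text

    async def end(self) -> None:
        self.ended = True  # real code would queue an EndFrame here


async def drain(calls, grace_secs: float = GRACE_SECS) -> bool:
    """Announce shutdown, wait up to grace_secs, then end every call."""
    print(f"SIGTERM: winding down {len(calls)} active call(s), grace={grace_secs:g}s")
    clean = True
    if calls:
        tasks = [asyncio.ensure_future(c.say(GOODBYE)) for c in calls]
        _, pending = await asyncio.wait(tasks, timeout=grace_secs)
        clean = not pending
        for task in pending:
            task.cancel()  # grace expired: stop waiting for slow TTS
    for call in calls:
        await call.end()
    return clean


async def main(calls) -> None:
    stop = asyncio.Event()
    # add_signal_handler is available on Unix event loops only
    asyncio.get_running_loop().add_signal_handler(signal.SIGTERM, stop.set)
    await stop.wait()
    await drain(calls)
```

drain() returns False when the grace period ran out before every goodbye message finished, which a caller could use to decide which summary line to log.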

Test it locally

docker kill --signal=SIGTERM speak_up-bot-1
docker logs speak_up-bot-1 2>&1 | tail -20
# Expected output:
# SIGTERM: winding down 1 active call(s), grace=20s
# SIGTERM: all calls drained cleanly

Call Metrics & Cost Tracking

Every completed call writes to the calls table:

Latency metrics (metrics JSONB column)

{
  "turns": 8,
  "interruptions": 1,
  "avgLatency": 1.42,
  "avgLlmTtfb": 0.89,
  "avgTtsTtfb": 0.21,
  "avgSttTtfb": 0.32,
  "totalPromptTokens": 1240,
  "totalCompletionTokens": 380,
  "totalTtsCharacters": 2104,
  "events": [ /* per-frame timeline */ ]
}

Shown in the UI on the Call Details page → Metrics tab.

Cost breakdown (cost_* columns)

Column | Unit | Description
cost_llm | INR | LLM tokens × model rate (input + output)
cost_stt | INR | STT seconds × provider rate
cost_tts | INR | TTS characters × provider rate
cost_sip | INR | SIP per-minute × call duration (phone only)
cost_total | INR | Sum of all components

Rates are defined in packages/config/src/rates.ts. The cost is deducted from org credits after call completion.

Shown in the UI on the Call Details page → Cost Breakdown card.
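To make the arithmetic behind the cost_* columns concrete, here is a small Python sketch. Every rate in it is a made-up placeholder (the real values live in rates.ts), and the per-1,000-token LLM pricing unit and the rounding to four decimals are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Placeholder INR rates, NOT the real values from rates.ts."""
    llm_input_per_1k: float = 0.10   # per 1,000 prompt tokens (assumed unit)
    llm_output_per_1k: float = 0.30  # per 1,000 completion tokens (assumed unit)
    stt_per_sec: float = 0.01
    tts_per_char: float = 0.001
    sip_per_min: float = 0.50


def cost_breakdown(prompt_tokens, completion_tokens, stt_secs,
                   tts_chars, duration_secs, is_phone, rates=Rates()):
    """Return the cost_* components and their sum, all in INR."""
    parts = {
        "cost_llm": (prompt_tokens / 1000 * rates.llm_input_per_1k
                     + completion_tokens / 1000 * rates.llm_output_per_1k),
        "cost_stt": stt_secs * rates.stt_per_sec,
        "cost_tts": tts_chars * rates.tts_per_char,
        # SIP is charged for phone calls only
        "cost_sip": duration_secs / 60 * rates.sip_per_min if is_phone else 0.0,
    }
    parts = {name: round(value, 4) for name, value in parts.items()}
    parts["cost_total"] = round(sum(parts.values()), 4)
    return parts
```

With the token and character counts from the metrics example above, 120 s of STT, and a 3-minute phone call, these placeholder rates give cost_llm = 0.238 and cost_total = 5.042.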

Call Recording

When enableCallRecording is set on an agent:

  • Stereo WAV (user=left, bot=right) is uploaded to MinIO on call completion
  • calls.recordingUrl is populated with the signed MinIO URL
  • Audio player appears on the Call Details page

Configure MinIO via env:

MINIO_ENDPOINT=http://minio:9000
MINIO_PUBLIC_ENDPOINT=https://storage.osmtalk.com
MINIO_ACCESS_KEY=...
MINIO_SECRET_KEY=...
MINIO_BUCKET=osmtalk-recordings

Set a lifecycle rule on the bucket (90-day retention recommended):

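mc ilm add myminio/osmtalk-recordings --expiry-days 90
# On recent mc releases, where "ilm add" has been replaced, the equivalent is:
# mc ilm rule add myminio/osmtalk-recordings --expire-days 90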

Concurrent-Call Capacity Gate

Voice calls are gated at the API before a bot is spawned, so callers see a clean HTTP 429 with a friendly message instead of silence when the platform is at capacity.

Why this exists

Each active voice call opens one TTS streaming socket to your TTS provider. Deepgram Pay-as-you-go allows 45 concurrent TTS streams. Exceeding that limit causes Deepgram to reject the next socket, and the caller hears nothing for the rest of the call. The gate refuses calls before the socket is opened.

STT streaming on Deepgram PAYG is 150 concurrent — TTS is the binding bottleneck, so the gate sizes against TTS.

Configuration

Var | Default | Description
MAX_CONCURRENT_CALLS | 43 | Global hard cap. Default = Deepgram PAYG TTS limit (45) − 2 buffer.
MAX_CONCURRENT_CALLS_PER_ORG | 15 | Per-tenant cap so one workspace can't starve every other tenant.

To raise/lower:

# /opt/osmtalk/speak_up/.env
MAX_CONCURRENT_CALLS=40
MAX_CONCURRENT_CALLS_PER_ORG=12

Then recreate the API container:

docker compose -f docker-compose.prod.yml up -d --force-recreate api

How acquire/release works

The gate is atomic — it uses a Redis Lua script to INCR-and-check both counters in one round-trip, so 10 simultaneous starts can't all "see 42" and all pass. Each successful acquire is matched by exactly one release, called from:

  • POST /api/calls/:id/complete (normal call end via the bot)
  • POST /api/calls/:id/end (dashboard "Hang up")
  • Every failure-path that flips status='failed' (bot spawn errors, SIP errors, WhatsApp errors)
  • POST /api/jobs/sweep-stale-calls (cleanup for crashes)

If Redis is unavailable, the gate falls back to a non-atomic DB count. It still blocks at the limit, but the race window that the Lua script closes is open again. A "Redis capacity Lua failed" warning is logged on every fallback so you notice the degraded mode.

The counter only includes rows whose channel opens a TTS streaming socket — web, phone, whatsapp_call. Chat sessions (channel='chat') and WhatsApp text (channel='whatsapp_message') can stay active indefinitely as resumable conversations and do not consume voice slots.
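The acquire/release semantics can be illustrated as follows. This is a sketch, not the production script: the Lua text, its key layout, and its return shape are hypothetical, and it checks both counters before incrementing rather than INCR-then-check, which is equivalent because Redis runs a script atomically. The Python class is an in-memory model of the same behavior, with a lock standing in for that atomicity.

```python
import threading

# Lua that could be run with EVAL. KEYS = [global counter, org counter],
# ARGV = [global limit, org limit]. Key layout is hypothetical.
ACQUIRE_LUA = """
local g = tonumber(redis.call('GET', KEYS[1]) or '0')
local o = tonumber(redis.call('GET', KEYS[2]) or '0')
if g >= tonumber(ARGV[1]) then return {0, 'global'} end
if o >= tonumber(ARGV[2]) then return {0, 'org'} end
redis.call('INCR', KEYS[1])
redis.call('INCR', KEYS[2])
return {1, 'ok'}
"""


class CapacityGate:
    """In-memory model of the gate; the lock plays the role of
    Redis's atomic script execution."""

    def __init__(self, global_limit=43, org_limit=15):
        self.global_limit = global_limit
        self.org_limit = org_limit
        self.global_count = 0
        self.org_counts = {}
        self._lock = threading.Lock()

    def acquire(self, org_id):
        """Return (True, "ok") or (False, scope that blocked the call)."""
        with self._lock:
            org = self.org_counts.get(org_id, 0)
            if self.global_count >= self.global_limit:
                return False, "global"
            if org >= self.org_limit:
                return False, "org"
            self.global_count += 1
            self.org_counts[org_id] = org + 1
            return True, "ok"

    def release(self, org_id):
        """Give back one slot; a release with nothing held is a no-op."""
        with self._lock:
            if self.org_counts.get(org_id, 0) > 0:
                self.org_counts[org_id] -= 1
                self.global_count -= 1
```

Because the check and the increment happen under one lock (or inside one script), two simultaneous starts can never both take the last slot.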

Behavior at the cap

When either limit is reached, every voice-spawn route returns:

HTTP 429 Too Many Requests
{
  "error": "All voice agents are currently busy. Please try again in a moment.",
  "code": "high_volume",
  "scope": "global",
  "global": 43,
  "org": 7,
  "globalLimit": 43,
  "orgLimit": 15
}

scope tells you which limit triggered ("global" or "org") — useful for dashboards. The widget and dashboard already surface body.error to the user — no extra UI work needed.
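A custom client that calls a voice-spawn route directly may want to recognize this response and retry later. The sketch below parses the body shown above; the helper names and the backoff policy (exponential with full jitter) are assumptions, not something the platform prescribes.

```python
import json
import random


def describe_capacity_error(status: int, body_text: str):
    """Return (message, scope) for a capacity 429, else None."""
    if status != 429:
        return None
    try:
        body = json.loads(body_text)
    except ValueError:
        return None
    if not isinstance(body, dict) or body.get("code") != "high_volume":
        return None  # some other kind of 429
    return body.get("error", "Busy"), body.get("scope", "unknown")


def retry_delay(attempt: int, base: float = 2.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter; attempt starts at 0."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

The scope value can be attached to a metric label so dashboards can separate global saturation from a single busy tenant.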

When to upgrade your provider plan

If you regularly see "Capacity gate triggered" warnings in the API logs, you've outgrown Deepgram PAYG. Either:

  • Move to Deepgram Growth (typically 200+ concurrent TTS), or
  • Configure a fallback TTS provider (ElevenLabs Flash or Groq Orpheus) to absorb overflow.

Background Jobs (In-Process Scheduler)

Two maintenance jobs run automatically inside the API container — no external cron, no extra service. They start on API boot and stop cleanly on graceful shutdown. You don't need to set up crontab or any scheduler.

Job | What it does | Default cadence
sweep-stale-calls | Flips calls.status='active' rows older than STALE_CALL_HOURS (default 2h) to failed and releases their capacity slot. Recovers from bot crashes / OOM / lost network. | Every 1 hour
reconcile-capacity | Resets Redis voice-capacity counters from DB ground truth. Corrects drift after Redis restarts or missed releases. | Every 30 minutes
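The sweep rule reduces to a cutoff comparison. The sketch below models it over in-memory rows; the started_at field name is hypothetical, the result keys mirror the sweep endpoint's JSON response, and the capacity-slot release is left out for brevity.

```python
from datetime import datetime, timedelta, timezone


def sweep_stale(calls, now: datetime, threshold_hours: float = 2):
    """Mark active calls older than the cutoff as failed."""
    cutoff = now - timedelta(hours=threshold_hours)
    swept = 0
    for call in calls:
        if call["status"] == "active" and call["started_at"] < cutoff:
            call["status"] = "failed"  # real job also releases the slot
            swept += 1
    return {
        "sweptCount": swept,
        "cutoff": cutoff.isoformat(),
        "thresholdHours": threshold_hours,
    }


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    rows = [{"status": "active", "started_at": now - timedelta(hours=3)}]
    print(sweep_stale(rows, now))
```

Running the function a second time on the same rows sweeps nothing, since rows already flipped to failed no longer match.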

Each successful run logs:

INFO: Internal cron: job completed — job: sweep-stale-calls, ms: 12, result: { sweptCount: 0, ... }

Failed runs log "Internal cron: job failed". Wire those lines into your alerting if you want pages on persistent failures.

Boot adds 0–30 s of jitter so timers don't all fire on the same tick if you scale to multiple API replicas. Overlap protection skips a run if the previous one is still in progress.
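Startup jitter and overlap protection can be sketched in a few lines of asyncio. This is an illustrative model only (the API's scheduler is not shown in this document); the class name IntervalJob and its attributes are hypothetical.

```python
import asyncio
import random


class IntervalJob:
    """Runs an async job on a fixed interval, skipping overlapping runs."""

    def __init__(self, name, func, interval_secs, max_jitter_secs=30.0):
        self.name = name
        self.func = func
        self.interval_secs = interval_secs
        self.max_jitter_secs = max_jitter_secs
        self.running = False
        self.skipped = 0
        self._tasks = set()

    async def run_once(self):
        if self.running:
            self.skipped += 1  # previous run still in progress
            return
        self.running = True
        try:
            await self.func()
        except Exception as exc:
            print(f"Internal cron: job failed: {self.name}: {exc!r}")
        finally:
            self.running = False

    async def loop(self):
        # Random start offset so replicas do not fire on the same tick
        await asyncio.sleep(random.uniform(0, self.max_jitter_secs))
        while True:
            # Run in the background so a slow job cannot delay the next tick
            task = asyncio.ensure_future(self.run_once())
            self._tasks.add(task)  # keep a reference until the task is done
            task.add_done_callback(self._tasks.discard)
            await asyncio.sleep(self.interval_secs)
```

Note that the running flag only protects against overlap inside one process; with several API replicas, the jobs' idempotence is what keeps concurrent runs harmless.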

Tunable env vars

Var | Default | Description
INTERNAL_CRON_ENABLED | 1 | Set to 0 to disable the in-process scheduler (e.g. if you prefer your own external cron).
STALE_CALL_HOURS | 2 | Threshold past which an active call is considered crashed.
SWEEP_INTERVAL_SECS | 3600 | How often the sweep runs (floor 60s).
RECONCILE_INTERVAL_SECS | 1800 | How often the reconciler runs (floor 60s).

Apply changes:

# /opt/osmtalk/speak_up/.env
SWEEP_INTERVAL_SECS=600

# Then recreate the API container (shell command, not part of .env)
docker compose -f docker-compose.prod.yml up -d --force-recreate api

Manual / external trigger

Both jobs are idempotent and have HTTP endpoints, so you can also drive them from an external scheduler (Kubernetes CronJob, GitHub Actions, cron-job.org, etc.) — disable the in-process scheduler with INTERNAL_CRON_ENABLED=0 if you go this route.

curl -s -X POST https://api.osmtalk.com/api/jobs/sweep-stale-calls \
  -H "x-internal-secret: $INTERNAL_API_SECRET"
# → { "sweptCount": 0, "cutoff": "...", "thresholdHours": 2 }

curl -s -X POST https://api.osmtalk.com/api/jobs/reconcile-capacity \
  -H "x-internal-secret: $INTERNAL_API_SECRET"
# → { "globalCount": 7, "orgs": 3 }
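The same calls can be made from a Python-based scheduler using only the standard library. This is a sketch under the assumption that the endpoints behave as in the curl examples above; the function names build_job_request and trigger_job are hypothetical.

```python
import json
import os
import urllib.request

API_BASE = "https://api.osmtalk.com"
JOBS = ("sweep-stale-calls", "reconcile-capacity")


def build_job_request(job: str, secret: str, base: str = API_BASE):
    """Build (but do not send) the POST for a maintenance job."""
    if job not in JOBS:
        raise ValueError(f"unknown job: {job}")
    return urllib.request.Request(
        f"{base}/api/jobs/{job}",
        method="POST",
        headers={"x-internal-secret": secret},
    )


def trigger_job(job: str, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON result."""
    req = build_job_request(job, os.environ["INTERNAL_API_SECRET"])
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

urlopen raises urllib.error.HTTPError on a non-2xx response, so an external scheduler can treat any exception from trigger_job as a failed run.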

Verify after deploy

After bringing up the API container, check the logs for the startup banner:

docker compose -f docker-compose.prod.yml logs api | grep "Internal cron"
# → Internal cron started — jobs: [...]

Then within 60 s of boot you should see the first run:

docker compose -f docker-compose.prod.yml logs api | grep "Internal cron: job completed"

If you see the banner but no completed runs after ~5 min, something is wrong. Look for "Internal cron: job failed" lines in the API logs to find the cause.

Log Output

Both services emit structured logs:

  • API → pino JSON (stdout)
  • Bot → loguru (stderr)

Redirect to your log aggregator (Grafana Loki, Datadog, etc.):

docker logs -f speak_up-api-1 2>&1 | vector --config vector.toml