osmTalk Docs

Changelog

Version history and release notes for osmTalk.

v0.7.0 — Honest Call Outcomes (May 2026)

Bulk-call accuracy fixes

  • Calls with no audio are now marked status: "failed", not falsely completed. The bot now tracks whether real audio flowed in either direction and the API routes silent calls through a new /api/calls/:id/fail endpoint that skips billing.
  • The 5-minute idle timeout now reports endReason: "idle_timeout" instead of silently flipping to completed. The same applies to the provider error circuit breaker (endReason: "provider_circuit_open") and bot-startup timeouts (endReason: "bot_startup_failed").
  • New calls.failureReason column + machine-readable enum of 10 reasons (no_audio_output, no_audio_either_direction, idle_timeout, provider_circuit_open, sip_no_answer, sip_rejected, bot_startup_failed, caller_hung_up_silently, stale_sweep, unknown). Full per-reason guide at Failure Reasons.
  • Campaign workers automatically retry retryable failures: failed calls with retryable reasons are now retried up to your retryPolicy.maxAttempts instead of being treated as terminal "no answer" outcomes.
  • Refund path — calls marked failed via this system have billingStatus: "free", so they don't burn credits.
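
To make the retry rule above concrete, here is a minimal TypeScript sketch. The ten reason strings come from the list above. Everything else is hypothetical: the FailedCall shape, the shouldRetry helper, and in particular the ASSUMED_RETRYABLE set, since these notes do not say which reasons count as retryable (the SDK's isRetryableFailure() is the authoritative source).

```typescript
// The ten machine-readable failure reasons listed in the v0.7.0 notes.
const FAILURE_REASONS = [
  "no_audio_output",
  "no_audio_either_direction",
  "idle_timeout",
  "provider_circuit_open",
  "sip_no_answer",
  "sip_rejected",
  "bot_startup_failed",
  "caller_hung_up_silently",
  "stale_sweep",
  "unknown",
] as const;

type FailureReason = (typeof FAILURE_REASONS)[number];

// ASSUMPTION: illustrative subset only. The changelog does not state which
// reasons are retryable; consult isRetryableFailure() in the SDK instead.
const ASSUMED_RETRYABLE: ReadonlySet<FailureReason> = new Set<FailureReason>([
  "no_audio_output",
  "provider_circuit_open",
  "bot_startup_failed",
]);

// Hypothetical shape: just the fields this sketch needs.
interface FailedCall {
  failureReason: FailureReason;
  attempts: number; // attempts made so far, including the failed one
}

// Retry only when the reason is retryable and attempts remain.
function shouldRetry(call: FailedCall, maxAttempts: number): boolean {
  return ASSUMED_RETRYABLE.has(call.failureReason) && call.attempts < maxAttempts;
}
```

With maxAttempts set to 3, a call that failed once with provider_circuit_open would be retried under these assumptions, while a call that has already used all three attempts would not.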

Dashboard

  • Failure-reason banner on every failed call's detail page — title, plain-English cause, "who's to blame" pill (platform / caller / environment / carrier), and "what to try". Same copy as the SDK's describeFailureReason() and the docs failure-reasons page.

Bot

  • Storage health probe at bot startup — recordings silently failing because MinIO is unreachable now surface as a loud startup-time error and a 503 on /health/deep. Previously this was only visible via per-call "Recording URL save FAILED" warnings, which are easy to miss.
  • Audio-flow tracking in MetricsCollector — observes BotStartedSpeakingFrame and TranscriptionFrame to set bot_audio_flowed / user_speech_flowed and end_reason properties that the cleanup path posts to /complete.

SDK

  • @osmapi/osmtalk-sdk@0.5.0 ships CallRecord.failureReason plus three helpers: describeFailureReason(), isRetryableFailure(), callConnected(). New filters on calls.list({ campaignId, failureReason }).

Why

A bulk-call test produced 85 calls, of which 81% were incorrectly marked completed despite zero audio flowing: phantom calls caused by a TTS connect storm under concurrency. The platform was charging for silence, and the campaign engine wasn't retrying. This release makes the call status honest, the retries automatic, and the dashboard self-explaining when something fails.

v0.6.0 — Voice Agent Platform Hardening (April 2026)

Voice Pipeline Upgrade to v1.0

  • osmTalk Voice Pipeline v1.0 stable — upgraded from v0.0.102
  • Sarvam SDK 0.1.26 — fixes outdated-SDK integration bugs
  • WebSocket retry storm fix — bot no longer floods providers with requests after an auth/credit failure (caps at 3 rapid failures → non-fatal ErrorFrame instead of infinite reconnect)
  • MCPClient lifecycle — now uses the required await mcp.start() before register_tools(llm)
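
The retry-storm fix above can be pictured as a small guard that counts rapid failures. This is a sketch of the general technique in TypeScript, not the bot's actual code: the RapidFailureGuard class, its method names, and the 10-second window are assumptions. Only the cap of 3 rapid failures comes from the notes.

```typescript
// Tracks recent connection failures and signals when to stop reconnecting.
class RapidFailureGuard {
  private failureTimes: number[] = [];
  private readonly maxRapidFailures: number;
  private readonly windowMs: number;

  // ASSUMPTION: the 10 s window is illustrative; the changelog only gives the cap of 3.
  constructor(maxRapidFailures: number = 3, windowMs: number = 10_000) {
    this.maxRapidFailures = maxRapidFailures;
    this.windowMs = windowMs;
  }

  // Record a failed connect attempt at time nowMs.
  // Returns true once the cap is reached, meaning the caller should stop
  // reconnecting and surface a non-fatal error instead.
  recordFailure(nowMs: number): boolean {
    // Drop failures that fell outside the sliding window.
    this.failureTimes = this.failureTimes.filter((t) => nowMs - t < this.windowMs);
    this.failureTimes.push(nowMs);
    return this.failureTimes.length >= this.maxRapidFailures;
  }

  // A successful connect clears the history.
  recordSuccess(): void {
    this.failureTimes = [];
  }
}
```

Failures spaced further apart than the window never trip the guard, so ordinary transient drops still reconnect normally.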

Tool Calling Improvements

  • Smart cancel_on_interruption defaults — HTTP POST/PUT/DELETE tools default to False (won't half-finish transactions); GET tools default to True
  • Per-tool cancelOnInterruption override in tool config
  • Per-tool timeoutSecs — set tight timeouts on fast APIs, loose on slow ones
  • Group parallel tools — LLM tool calls now run in parallel (default ON)
  • Streaming intermediate results — handlers can emit progress via result_callback(msg, is_final=False)
  • Function-args interrupt crash fix — no more JSONDecodeError when user interrupts mid-tool
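
The method-based defaults and per-tool overrides described above could resolve roughly as follows. This is a hedged TypeScript sketch: the ToolConfig shape and resolveToolOptions helper are hypothetical, and the fallback timeout of 10 seconds is an assumption (the notes do not give a default). The cancelOnInterruption and timeoutSecs keys are the ones named in the notes.

```typescript
type HttpMethod = "GET" | "POST" | "PUT" | "DELETE";

// Hypothetical shape of a tool's config; only the two optional keys
// (cancelOnInterruption, timeoutSecs) are named in the release notes.
interface ToolConfig {
  method: HttpMethod;
  cancelOnInterruption?: boolean;
  timeoutSecs?: number;
}

interface ResolvedToolOptions {
  cancelOnInterruption: boolean;
  timeoutSecs: number;
}

// ASSUMPTION: the changelog does not state a default timeout.
const ASSUMED_DEFAULT_TIMEOUT_SECS = 10;

function resolveToolOptions(tool: ToolConfig): ResolvedToolOptions {
  // Reads (GET) are safe to cancel mid-flight; writes (POST/PUT/DELETE)
  // default to running to completion so a transaction is not left half-done.
  const methodDefault = tool.method === "GET";
  return {
    cancelOnInterruption: tool.cancelOnInterruption ?? methodDefault,
    timeoutSecs: tool.timeoutSecs ?? ASSUMED_DEFAULT_TIMEOUT_SECS,
  };
}
```

An explicit per-tool value always wins, so a POST tool that is safe to abandon can still opt in to cancellation.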

Voice Experience

  • VAD audio_idle_timeout — fixes VAD getting stuck in SPEAKING state (default 1.5s, override via audioIdleTimeoutSecs setting)
  • ElevenLabs 48kHz native audio — crisper voice, no upsampling artifacts (override via ttsSampleRate setting)
  • Warm handoff summary: transfer_to_human now sends the last 10 conversation turns to the human agent's API

New LLM Provider: OpenAI Responses API

  • Opt-in via openai-responses provider on agent
  • WebSocket-based incremental context for lower latency on long calls
  • Falls back to Chat Completions for HTTP chat (WhatsApp/Web Chat), since the Responses API provider is voice-only

DTMF Keypad Support

  • Phone callers — DTMF digits from PSTN are captured and fed to the LLM context as user input (e.g., "Press 1 for English, 2 for Hindi")
  • Web widget keypad UI — users can tap a 3×4 keypad that sends digits over the data channel
  • Per-agent toggle via enableDtmf setting (auto-enabled for phone agents)

Tool Status UI

  • Live tool-execution cards in the voice widget — users see "Looking that up…" spinner while HTTP tools run
  • Auto-dismiss after completion (3s fade)
  • Error cards show red X and keep showing until the call ends

Observability

  • Sentry integration (both bot + API) — set SENTRY_DSN env var to enable
  • /api/health/deep endpoint — probes Database, Redis, Bot, voice transport, and MinIO
  • Bot /health/deep endpoint — reports active call count, Sentry status, version
  • Latency metrics dashboard (already existed) — per-call TTFB breakdown across STT/LLM/TTS

Graceful Shutdown

  • Bot SIGTERM handler — active calls hear "I need to end this call for a system update, please call back" before disconnecting
  • 20-second grace period (configurable via SHUTDOWN_GRACE_SECS) for in-flight TTS to finish
  • Prevents dropped calls during deploys
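
As an illustration of the grace-period behaviour above, the TypeScript sketch below reads SHUTDOWN_GRACE_SECS and waits for in-flight work up to that limit. It is not the bot's implementation: both function names are hypothetical, and falling back to 20 seconds on a missing or invalid value is an assumption.

```typescript
const DEFAULT_GRACE_SECS = 20; // default stated in the release notes

// Read the grace period from an env-like map.
// ASSUMPTION: missing, empty, non-numeric, or negative values fall back to the default.
function shutdownGraceMs(env: Record<string, string | undefined>): number {
  const raw = env["SHUTDOWN_GRACE_SECS"];
  const secs = raw === undefined || raw.trim() === "" ? NaN : Number(raw);
  const valid = Number.isFinite(secs) && secs >= 0;
  return (valid ? secs : DEFAULT_GRACE_SECS) * 1000;
}

// Wait for in-flight work (for example, TTS playback of the goodbye message)
// to settle, but never longer than graceMs.
function drainWithGrace(
  inFlight: Promise<unknown>[],
  graceMs: number,
): Promise<"drained" | "timed_out"> {
  return new Promise<"drained" | "timed_out">((resolve) => {
    const timer = setTimeout(() => resolve("timed_out"), graceMs);
    // Treat rejected tasks as finished so one failure cannot block shutdown.
    const settled = inFlight.map((p) => p.catch(() => undefined));
    Promise.all(settled).then(() => {
      clearTimeout(timer);
      resolve("drained");
    });
  });
}
```

A SIGTERM handler would call drainWithGrace(activeCallTasks, shutdownGraceMs(process.env)) and exit once it resolves, where activeCallTasks stands in for whatever the process tracks per call.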

System Prompt Fixes

  • Web chat now sends dynamic context (agent name, date/time, format rules, language rules) — previously only raw prompt
  • Same behavior across all 4 channels (web call, phone call, web chat, WhatsApp chat)

Security & Cleanup

  • Removed deprecated models from catalog: llama-4-maverick, anthropic/claude-sonnet-4-5-20250929 (on OpenRouter), saaras:v2.5, scribe_v2_realtime
  • Validated all remaining STT/TTS/LLM models against live provider APIs (35 total verified)

v0.5.0 — Shared Credits & Billing (April 2026)

Billing & Payments

  • Usage-based credits — Pay per call, no subscriptions
  • Shared credits with osmAPI — Single balance across both platforms
  • Razorpay integration — Top up via card, UPI, or netbanking
  • Billing dashboard — Balance, transactions, usage with INR/USD toggle
  • Pre-call balance check — Calls rejected if credits below ₹1
  • Cost breakdown per call — See LLM, STT, TTS, and SIP costs individually
  • Chat billing — LLM token costs deducted for chat messages
  • Inbound call billing — SIP charges for incoming phone calls
  • WhatsApp call billing — SIP charges for WhatsApp voice calls
  • Minimum charge — ₹1 minimum per call
  • Phone number setup fee — ₹50 one-time on purchase
  • Monthly phone rental — Automatic billing via background job
  • Low balance alerts — Email when credits drop below ₹10
  • Phone deactivation — Numbers auto-deactivated when rental unpaid
  • Data retention — Free 30-day storage, cleanup after expiry
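
The per-call arithmetic implied by the list above (four cost components, a ₹1 minimum charge, and a ₹1 pre-call balance check) can be sketched in TypeScript as follows. Amounts are held as integer paise to avoid floating-point error; that representation, the type, and the function names are assumptions for illustration, and the platform itself is stated to use Decimal.js.

```typescript
// Cost components per call, in integer paise (1 rupee = 100 paise).
interface CostBreakdownPaise {
  llm: number;
  stt: number;
  tts: number;
  sip: number;
}

const MIN_CHARGE_PAISE = 100;  // ₹1 minimum per call
const MIN_BALANCE_PAISE = 100; // calls are rejected when credits are below ₹1

// Total charge for a call: the sum of the components, floored at the minimum.
function callChargePaise(costs: CostBreakdownPaise): number {
  const total = costs.llm + costs.stt + costs.tts + costs.sip;
  return Math.max(total, MIN_CHARGE_PAISE);
}

// Pre-call balance check.
function canStartCall(balancePaise: number): boolean {
  return balancePaise >= MIN_BALANCE_PAISE;
}
```

For example, components of 10 + 5 + 5 + 20 paise total 40 paise, so the ₹1 minimum applies; components of 120 + 30 + 40 + 60 paise are charged as ₹2.50.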

Auth

  • Shared auth with osmAPI — Single login across both platforms
  • Login/register redirects to osmAPI (email, Google, GitHub)
  • Cross-domain token-based authentication
  • Redis session cache (5-minute TTL)
  • Auth guard on all dashboard pages

Enterprise Features

  • Decimal.js precision — No floating-point billing errors
  • Atomic transactions — All-or-nothing credit deduction
  • Rate limiting — Per IP, user, and API key
  • Structured logging — Pino JSON logs for production
  • Background jobs — Phone billing + data retention cron endpoints
  • Idempotent webhooks — Duplicate payment protection
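
Idempotent webhook handling, as mentioned above, usually means remembering which payment IDs have already been credited. The TypeScript sketch below shows the idea with an in-memory set. The TopUpLedger class and PaymentEvent shape are hypothetical; a production system would more likely rely on a unique database constraint inside the same transaction as the credit.

```typescript
// Hypothetical minimal payment event.
interface PaymentEvent {
  paymentId: string;
  amountPaise: number;
}

class TopUpLedger {
  private processedPaymentIds = new Set<string>();
  private balancePaise = 0;

  // Credit the balance at most once per paymentId, however many times
  // the payment provider redelivers the same webhook.
  applyWebhook(event: PaymentEvent): "credited" | "duplicate" {
    if (this.processedPaymentIds.has(event.paymentId)) {
      return "duplicate";
    }
    this.processedPaymentIds.add(event.paymentId);
    this.balancePaise += event.amountPaise;
    return "credited";
  }

  balance(): number {
    return this.balancePaise;
  }
}
```

Delivering the same event twice leaves the balance unchanged after the first delivery.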

Infrastructure

  • Dual database (osmTalk + StartFlow shared DB on Neon)
  • Docker PostgreSQL removed (migrated to Neon cloud)
  • Multi-organization support with org switcher
  • Projects table for grouping agents

Docs

  • Billing documentation (overview, credits, pricing, top-up)
  • API reference for all endpoints
  • Phone number guides updated
  • Changelog added

v0.4.0 — MCP Server & Outbound Calls (March 2026)

New Features

  • MCP Server — Make phone calls from Claude Desktop (npm)
  • openingMessage — Agent speaks a pre-written message instantly when call connects (no LLM delay)
  • callerName — Agent identifies who it's calling on behalf of
  • Call history via MCP — View calls, transcripts, and analytics from Claude
  • Dashboard analytics — Total calls, success rate, top agents

Improvements

  • Widget outbound call validation (instruction length, E.164 phone format)
  • MCP tools: list_calls, get_dashboard, get_call_result
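
For reference, the E.164 check mentioned above can be expressed as a single pattern: a leading "+", a non-zero first digit, and at most 15 digits in total. The TypeScript helper below is a sketch of that rule; the function name is hypothetical, and the widget's actual validation may differ (for instance, by requiring a longer minimum length).

```typescript
// E.164: "+" followed by up to 15 digits, the first of which is not 0.
const E164_PATTERN = /^\+[1-9]\d{1,14}$/;

function isE164(phone: string): boolean {
  return E164_PATTERN.test(phone);
}
```

Under this pattern "+919876543210" passes, while "9876543210" (no plus sign) and "+0123" (leading zero) are rejected.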

v0.3.0 — WhatsApp & Call Transfer (February 2026)

New Features

  • WhatsApp integration — Connect WhatsApp Business numbers to voice agents
  • WhatsApp calling — Inbound and outbound calls via WhatsApp
  • Call transfer — Transfer active calls to a human agent or another number
  • Context summarization — Automatic conversation summarization for long calls

Improvements

  • ElevenLabs STT/TTS support (Scribe v2, Flash v2.5, Turbo v2.5)
  • Smart turn detection for natural conversations
  • Voicemail detection for outbound calls
  • End-call detection (agent hangs up when user says goodbye)

v0.2.0 — Phone Numbers & SIP (January 2026)

New Features

  • Phone number provisioning — Buy Indian phone numbers and assign to agents
  • Inbound calls — Assign agents to phone numbers for automatic answering
  • Outbound calls — Dial any number with a voice agent
  • SIP integration — osmTalk SIP gateway for PSTN connectivity
  • Call recordings — Automatic recording with MinIO storage
  • Widget embed — Embeddable voice/chat widget for websites

Improvements

  • Background sounds (office, nature, cafe)
  • Advanced VAD settings (confidence, start/stop times)
  • Multiple TTS voices per provider

v0.1.0 — Initial Release (December 2025)

Features

  • Voice agent creation with custom system prompts
  • Multi-provider support: OpenAI, Groq, Anthropic, Deepgram, Sarvam
  • Real-time voice calls via osmTalk's WebRTC transport
  • Chat interface (text-based conversations)
  • Call transcripts and metrics
  • Team management with role-based access
  • Agent tools (HTTP tools, client tools, MCP servers)
  • Dashboard with call analytics