Troubleshooting
Flaky models, noisy audio, "sometimes it works sometimes not" — and how to fix each one.
If your agent works on one call and fails on the next, the root cause is almost always one of these six. Work through them in order; the cheap checks come first.
1. Check the provider health badge
The dashboard polls every provider every 60 seconds and renders a yellow or red strip at the top when something's degraded. If you see "Provider issue detected", that's your answer.
You can also hit the health endpoint directly:
curl https://api.osmtalk.com/api/health/models
Returns:
{
"openai": { "status": "ok", "latencyMs": 312 },
"anthropic": { "status": "ok", "latencyMs": 451 },
"groq": { "status": "down", "latencyMs": null, "error": "401 invalid api key" },
"deepgram": { "status": "ok", "latencyMs": 87 },
"elevenlabs": { "status": "ok", "latencyMs": 220 },
"sarvam": { "status": "ok", "latencyMs": 195 }
}
status values:
| Value | Meaning | Action |
|---|---|---|
| ok | Provider responded in under 3s | None |
| degraded | Responded but over 3s | Switch to a backup model for this call |
| down | Provider errored | Switch model. File a ticket if persistent. |
| unconfigured | API key not set on the platform | Contact support |
The endpoint is cached for 60s server-side so checking it from a hot loop is fine.
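To act on the status values programmatically, a minimal sketch might look like the following. It assumes the response shape shown above; `pick_provider` and the preference order are hypothetical helpers for illustration, not part of any osmTalk SDK.

```python
import json
import urllib.request
from typing import Dict, List, Optional

HEALTH_URL = "https://api.osmtalk.com/api/health/models"


def fetch_health(url: str = HEALTH_URL, timeout: float = 5.0) -> Dict[str, dict]:
    """Fetch the provider health map. Makes a network call, so it is not run here."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def pick_provider(health: Dict[str, dict], preferred: List[str]) -> Optional[str]:
    """Return the first preferred provider reporting "ok", else the first
    "degraded" one, else None ("down" and "unconfigured" are never chosen)."""
    for wanted in ("ok", "degraded"):
        for name in preferred:
            if health.get(name, {}).get("status") == wanted:
                return name
    return None


# Offline example using a trimmed copy of the sample response above.
sample = {
    "openai": {"status": "ok", "latencyMs": 312},
    "groq": {"status": "down", "latencyMs": None, "error": "401 invalid api key"},
}
print(pick_provider(sample, ["groq", "openai"]))  # prints: openai
```

In a live setup you would pass `fetch_health()` instead of `sample`; because the endpoint is cached server-side, calling it before each call is cheap.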
2. "Sometimes works, sometimes not" on a specific model
The single biggest cause in 2026 is Groq Orpheus first-use ToS. When you call Orpheus for the first time on a new key, Groq requires a one-time terms-of-service acceptance in their console. Until you accept, the call fails ~50% of the time with no clear error.
Fix: log into console.groq.com, open any Orpheus model page, and click "Accept terms".
Other common culprits:
- OpenAI rate limits: Tier 1 keys cap at 500 RPM. Voice agents burst hard during peak hours. Upgrade your tier or switch to groq/llama-3.3-70b-versatile, which has a higher RPM limit.
- ElevenLabs concurrency: Free / Starter plans cap concurrent TTS streams at 2 / 5. Calls fail with "no available concurrency". Upgrade to Creator+.
- Sarvam region: Sarvam endpoints have a soft preference for the Mumbai region. Calls from us-east see 400–700 ms of added latency. If your agent doesn't need Indian languages, use Deepgram instead.
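If you call models from your own code, the "switch to a backup model" advice can be sketched as an ordered fallback. This is only an illustration: `ProviderError`, `fake_call`, and the `openai/primary` model id are hypothetical stand-ins, and a real client would raise its own error types for rate-limit or concurrency failures.

```python
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Hypothetical error standing in for a rate-limit or concurrency failure."""


def call_with_fallback(models: Sequence[str], call: Callable[[str], str]) -> str:
    """Try each model in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return call(model)
        except ProviderError as exc:
            last_error = exc  # remember why this model failed, then try the next
    raise RuntimeError(f"all models failed: {last_error}")


def fake_call(model: str) -> str:
    # Stand-in for a real request: the primary model is "rate limited".
    if model == "openai/primary":
        raise ProviderError("429 rate limit")
    return f"reply from {model}"


result = call_with_fallback(["openai/primary", "groq/llama-3.3-70b-versatile"], fake_call)
print(result)  # prints: reply from groq/llama-3.3-70b-versatile
```

Only the provider-specific error is caught, so programming errors still surface immediately instead of silently triggering a fallback.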
3. Noisy audio — calls from cafes, traffic, call centers
osmTalk supports server-side noise suppression that runs BEFORE the audio reaches VAD and STT.
Enable in: Agent → Advanced → Turn Detection → Noise Suppression
| Mode | Cost | Quality | When to use |
|---|---|---|---|
| Off | Free | — | Quiet rooms only |
| RNNoise | Free (CPU only) | Good | Default for noisy environments |
RNNoise removes ~80% of constant ambient noise (HVAC, traffic, crowd murmur). For non-stationary noise (cafe babble, typing, doors, music, human voices in the background), we offer Krisp as a managed add-on — contact us to enable it on your account.
Audio sounds metallic / cuts in and out after enabling noise suppression
Aggressive noise suppression can over-process when the room is already quiet. Switch to Off for indoor / studio deployments.
4. "The bot doesn't hear me" / "It keeps interrupting"
This is a VAD / turn-detection issue, not a model issue. See the VAD & Turn Detection guide — channel-aware defaults solve 90% of cases.
Quick triage:
| Symptom | Likely fix |
|---|---|
| Bot interrupts mid-sentence | Increase vadStopSecs and smartTurnStopSecs |
| Bot doesn't respond | Decrease vadConfidence and vadMinVolume |
| Long silence before bot replies | Decrease smartTurnStopSecs |
| Hangs forever on a noisy line | Lower audioIdleTimeoutSecs to 5s |
5. Markdown showing as # in the system prompt
If you used "Generate with AI" before May 2026, the LLM sometimes inserted markdown (## Personality, **bold**) into the generated prompt, which then appeared as literal # and * characters in the editor.
Fixed in v0.5.2 — the generator now requests plain text and the response is sanitized server-side. Existing prompts can be cleaned up manually in the editor or re-generated.
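If you have many older prompts to clean, a small script can do the bulk of the work. The sketch below is not the platform's server-side sanitizer; it is a deliberately narrow, hypothetical helper that only strips heading markers and paired asterisks, so review the output before saving (a line such as "2 * 3 * 4" would lose its asterisks).

```python
import re


def strip_markdown(prompt: str) -> str:
    """Remove leading heading markers and paired emphasis asterisks, keeping the text."""
    cleaned_lines = []
    for line in prompt.splitlines():
        # "## Personality" -> "Personality"
        line = re.sub(r"^\s{0,3}#{1,6}\s+", "", line)
        # "**bold**" or "*italic*" -> "bold" / "italic"
        line = re.sub(r"\*{1,3}([^*\n]+)\*{1,3}", r"\1", line)
        cleaned_lines.append(line)
    return "\n".join(cleaned_lines)


cleaned = strip_markdown("## Personality\nBe **warm** and concise.")
print(cleaned)
```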
6. Per-minute cost varies between calls
This is expected, not a bug.
Voice-AI cost is not flat per minute — it depends on:
- LLM input tokens (your system prompt + each user turn's context)
- LLM output tokens (how much the bot says)
- TTS characters (how many spoken characters the bot generates)
- STT minutes (call duration)
- SIP minutes (for phone calls only)
The preset picker on agent creation shows a range (low / typical / high) computed from the same rate tables we bill with. A short FAQ call lands at the bottom of the range; a verbose sales call with RAG context lands at the top.
The exact per-call cost is broken down in Calls → [Call] → Cost after each call. You only pay for what you used.
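To see why two calls of the same length can cost different amounts, here is a rough sketch of how the components listed above add up. Every rate below is a made-up placeholder, not osmTalk's rate table, and `Rates` and `call_cost` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    # Placeholder rates in USD. These are NOT real prices.
    llm_in_per_1k_tokens: float = 0.001
    llm_out_per_1k_tokens: float = 0.002
    tts_per_1k_chars: float = 0.01
    stt_per_minute: float = 0.005
    sip_per_minute: float = 0.01


def call_cost(rates: Rates, in_tokens: int, out_tokens: int,
              tts_chars: int, minutes: float, is_phone_call: bool) -> float:
    """Sum the five cost components for one call."""
    cost = in_tokens / 1000 * rates.llm_in_per_1k_tokens
    cost += out_tokens / 1000 * rates.llm_out_per_1k_tokens
    cost += tts_chars / 1000 * rates.tts_per_1k_chars
    cost += minutes * rates.stt_per_minute
    if is_phone_call:  # SIP minutes apply to phone calls only
        cost += minutes * rates.sip_per_minute
    return cost


rates = Rates()
# Two calls of identical duration: a short FAQ call vs. a verbose call with RAG context.
faq = call_cost(rates, in_tokens=2000, out_tokens=300, tts_chars=1200, minutes=2, is_phone_call=True)
sales = call_cost(rates, in_tokens=12000, out_tokens=900, tts_chars=3600, minutes=2, is_phone_call=True)
print(f"FAQ call: ${faq / 2:.4f}/min, verbose call: ${sales / 2:.4f}/min")
```

Both calls last two minutes, yet the verbose one costs more per minute because tokens and TTS characters scale with how much is said, not with duration.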
Still stuck?
- Logs: each call has detailed traces under Calls → [Call] → Logs
- Replay: every call can be re-run in the Simulate tab without spending real money
- Support: include the call ID — that's enough for us to reproduce everything
Operator notes (self-hosted)
If you're running the bot yourself:
- Set BOT_INTERNAL_SECRET on the bot process to the same value as the API's INTERNAL_API_SECRET. The bot rejects any /start*, /whatsapp-answer, or /whatsapp-accepted request without a matching X-Internal-Secret header.
- The bot refuses to start without BOT_INTERNAL_SECRET when APP_ENV is not development or test. This is intentional fail-closed behavior: the bot can spawn outbound calls billed to your telephony account, so anonymous access is never safe in production.
- Tighten BOT_CORS_ORIGINS (comma-separated list) to only your API host(s). The default ships dev-friendly localhost values; replace them before going to production.
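The behavior described in these notes can be sketched as follows. This is an illustration of the fail-closed pattern, not the bot's actual source: the function names are hypothetical, and treating an unset APP_ENV as production is an assumption of this sketch.

```python
import hmac
import os
from typing import List, Mapping


def load_internal_secret(env: Mapping[str, str] = os.environ) -> str:
    """Return BOT_INTERNAL_SECRET, refusing to start without it outside dev/test."""
    secret = env.get("BOT_INTERNAL_SECRET", "")
    app_env = env.get("APP_ENV", "production")  # assumption: unset means production
    if not secret and app_env not in ("development", "test"):
        raise RuntimeError(
            "BOT_INTERNAL_SECRET is required when APP_ENV is not development or test"
        )
    return secret


def is_authorized(headers: Mapping[str, str], secret: str) -> bool:
    """Compare the X-Internal-Secret header to the secret in constant time.
    Choice made in this sketch: with no secret configured, nothing is authorized."""
    supplied = headers.get("X-Internal-Secret", "")
    return bool(secret) and hmac.compare_digest(supplied.encode(), secret.encode())


def parse_cors_origins(value: str) -> List[str]:
    """Split a comma-separated BOT_CORS_ORIGINS value into a clean list."""
    return [origin.strip() for origin in value.split(",") if origin.strip()]


print(is_authorized({"X-Internal-Secret": "s3cret"}, "s3cret"))  # prints: True
print(parse_cors_origins("https://api.example.com, https://admin.example.com"))
```

Using `hmac.compare_digest` rather than `==` avoids leaking how many leading characters of a guessed secret were correct through response timing.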