Troubleshooting

Flaky models, noisy audio, "sometimes it works sometimes not" — and how to fix each one.

If your agent works on one call and fails on the next, the root cause is almost always one of the six below. Work through them in order; the cheap checks come first.

1. Check the provider health badge

The dashboard polls every provider every 60 seconds and renders a yellow or red strip at the top when something's degraded. If you see "Provider issue detected", that's your answer.

You can also hit the health endpoint directly:

curl https://api.osmtalk.com/api/health/models

Returns:

{
  "openai":     { "status": "ok",   "latencyMs": 312 },
  "anthropic":  { "status": "ok",   "latencyMs": 451 },
  "groq":       { "status": "down", "latencyMs": null, "error": "401 invalid api key" },
  "deepgram":   { "status": "ok",   "latencyMs": 87  },
  "elevenlabs": { "status": "ok",   "latencyMs": 220 },
  "sarvam":     { "status": "ok",   "latencyMs": 195 }
}

status values:

| Value | Meaning | Action |
|---|---|---|
| ok | Provider responded in under 3s | None |
| degraded | Responded but over 3s | Switch to a backup model for this call |
| down | Provider errored | Switch model. File a ticket if persistent. |
| unconfigured | API key not set on the platform | Contact support |

The endpoint is cached for 60s server-side so checking it from a hot loop is fine.
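
If you want to act on the health response from a script, the status table above can be turned into a small lookup. The following is a minimal sketch, not an official client: the function names are made up for illustration, and it assumes the endpoint needs no authentication header, which the example curl call suggests but does not guarantee.

```python
import json
from urllib.request import urlopen

HEALTH_URL = "https://api.osmtalk.com/api/health/models"

# Actions copied from the status table; "ok" needs no action.
ACTIONS = {
    "ok": None,
    "degraded": "switch to a backup model for this call",
    "down": "switch model; file a ticket if persistent",
    "unconfigured": "contact support",
}


def fetch_health(url: str = HEALTH_URL, timeout: float = 5.0) -> dict:
    """Fetch the health document (assumption: no auth header is required)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def triage(health: dict) -> dict:
    """Return {provider: action} for every provider whose status is not "ok"."""
    problems = {}
    for provider, info in health.items():
        status = info.get("status")
        action = ACTIONS.get(status, f"unknown status {status!r}; check the dashboard")
        if action is not None:
            # Append the provider's error text when the response includes one.
            note = f" ({info['error']})" if info.get("error") else ""
            problems[provider] = action + note
    return problems
```

Calling `triage(fetch_health())` against the sample response above would flag only `groq`, with the "401 invalid api key" error attached.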

2. "Sometimes works, sometimes not" on a specific model

The single biggest cause in 2026 is Groq Orpheus first-use ToS. When you call Orpheus for the first time on a new key, Groq requires a one-time terms-of-service acceptance in their console. Until you accept, the call fails ~50% of the time with no clear error.

Fix: log into console.groq.com, open any Orpheus model page, and click "Accept terms".

Other common culprits:

  • OpenAI rate limits: Tier 1 keys cap at 500 RPM, and voice agents burst hard during peak hours. Upgrade your tier or switch to groq/llama-3.3-70b-versatile, which has a higher RPM limit.
  • ElevenLabs concurrency: Free and Starter plans cap concurrent TTS streams at 2 and 5, respectively. Calls over the cap fail with "no available concurrency". Upgrade to Creator+.
  • Sarvam region: Sarvam endpoints have a soft preference for the Mumbai region. Calls from us-east see 400–700 ms of added latency. If your agent doesn't need Indian languages, use Deepgram instead.
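
If you call a TTS provider from your own code, a client-side cap can make bursts wait for a free slot instead of failing against a concurrency limit. The sketch below illustrates the idea with `asyncio.Semaphore`. `StreamLimiter` and `fake_tts` are hypothetical names, and the limiter only covers a single process, whereas the provider's cap applies to the whole account.

```python
import asyncio


class StreamLimiter:
    """Caps in-flight TTS streams so extra requests wait instead of failing."""

    def __init__(self, max_streams: int):
        self._sem = asyncio.Semaphore(max_streams)
        self.in_flight = 0
        self.peak = 0  # highest concurrency observed, handy for debugging

    async def run(self, make_stream):
        # make_stream: a zero-argument coroutine function (hypothetical TTS call).
        async with self._sem:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await make_stream()
            finally:
                self.in_flight -= 1


async def demo(cap: int = 2, calls: int = 6) -> int:
    """Fire `calls` fake TTS requests through a limiter and return peak concurrency."""
    limiter = StreamLimiter(cap)

    async def fake_tts():
        await asyncio.sleep(0.01)  # stands in for a real TTS request
        return "audio"

    await asyncio.gather(*(limiter.run(fake_tts) for _ in range(calls)))
    return limiter.peak
```

Running `asyncio.run(demo(cap=2))` never lets more than two fake streams overlap. Set `cap` to your plan's limit, or lower if several processes share the same key.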

3. Noisy audio — calls from cafes, traffic, call centers

osmTalk supports server-side noise suppression that runs BEFORE the audio reaches VAD and STT.

Enable in: Agent → Advanced → Turn Detection → Noise Suppression

| Mode | Cost | Quality | When to use |
|---|---|---|---|
| Off | Free | | Quiet rooms only |
| RNNoise | Free (CPU only) | Good | Default for noisy environments |

RNNoise removes ~80% of constant ambient noise (HVAC, traffic, crowd murmur). For non-stationary noise (cafe babble, typing, doors, music, human voices in the background), we offer Krisp as a managed add-on — contact us to enable it on your account.

Audio sounds metallic / cuts in and out after enabling noise suppression

Aggressive noise suppression can over-process when the room is already quiet. Switch to Off for indoor / studio deployments.

4. "The bot doesn't hear me" / "It keeps interrupting"

This is a VAD / turn-detection issue, not a model issue. See the VAD & Turn Detection guide — channel-aware defaults solve 90% of cases.

Quick triage:

| Symptom | Likely fix |
|---|---|
| Bot interrupts mid-sentence | Increase vadStopSecs and smartTurnStopSecs |
| Bot doesn't respond | Decrease vadConfidence and vadMinVolume |
| Long silence before bot replies | Decrease smartTurnStopSecs |
| Hangs forever on a noisy line | Lower audioIdleTimeoutSecs to 5s |
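
The triage table can also be read as a set of directional nudges to the agent's settings. The sketch below encodes those directions. The parameter names come from the table, but the symptom keys, the 20% step size, and the starting values in the example are assumptions for illustration, not platform defaults.

```python
# Direction of change per symptom, from the triage table: +1 increase, -1 decrease.
FIXES = {
    "interrupts_mid_sentence": {"vadStopSecs": +1, "smartTurnStopSecs": +1},
    "does_not_respond": {"vadConfidence": -1, "vadMinVolume": -1},
    "long_silence_before_reply": {"smartTurnStopSecs": -1},
}


def suggest(config: dict, symptom: str, step: float = 0.2) -> dict:
    """Return a copy of `config` nudged in the direction the table recommends.

    `step` is a hypothetical relative adjustment (0.2 means 20%).
    """
    updated = dict(config)
    if symptom == "hangs_on_noisy_line":
        # The table gives an absolute target here: at most 5 seconds.
        updated["audioIdleTimeoutSecs"] = min(config.get("audioIdleTimeoutSecs", 5.0), 5.0)
        return updated
    for key, direction in FIXES[symptom].items():
        updated[key] = round(config[key] * (1 + direction * step), 3)
    return updated


# Example values are made up; read your agent's real settings before adjusting.
example = {
    "vadStopSecs": 0.5,
    "smartTurnStopSecs": 1.0,
    "vadConfidence": 0.7,
    "vadMinVolume": 0.6,
    "audioIdleTimeoutSecs": 30.0,
}
```

Change one symptom at a time and re-test with a real call, since the two stop-time parameters also affect how long the bot waits before replying.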

5. Markdown showing as # in the system prompt

If you used "Generate with AI" before May 2026, the LLM sometimes inserted markdown (## Personality, **bold**) into the generated prompt, which then appeared as literal # and * characters in the editor.

Fixed in v0.5.2 — the generator now requests plain text and the response is sanitized server-side. Existing prompts can be cleaned up manually in the editor or re-generated.
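
If you have many older prompts to clean, a small script can strip the two patterns mentioned above. This is a rough sketch for local cleanup, not the server-side sanitizer, whose implementation is not documented here. It handles only heading markers and paired `**bold**` markers, and leaves everything else untouched.

```python
import re


def strip_markdown(prompt: str) -> str:
    """Remove leading heading markers and paired ** bold markers, line by line."""
    cleaned = []
    for line in prompt.splitlines():
        # "## Personality" becomes "Personality"
        line = re.sub(r"^\s*#{1,6}\s+", "", line)
        # "**bold**" becomes "bold"
        line = re.sub(r"\*\*(.+?)\*\*", r"\1", line)
        cleaned.append(line)
    return "\n".join(cleaned)
```

Review the result before saving: a prompt that intentionally contains # or ** (for example, instructions about formatting) would be altered too.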

6. Per-minute cost varies between calls

This is expected, not a bug.

Voice-AI cost is not flat per minute — it depends on:

  • LLM input tokens (your system prompt + each user turn's context)
  • LLM output tokens (how much the bot says)
  • TTS characters (how many spoken characters the bot generates)
  • STT minutes (call duration)
  • SIP minutes (for phone calls only)

The preset picker on agent creation shows a range (low / typical / high) computed from the same rate tables we bill with. A short FAQ call lands at the bottom of the range; a verbose sales call with RAG context lands at the top.

The exact per-call cost is broken down in Calls → [Call] → Cost after each call. You only pay for what you used.
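
To see why the per-minute figure moves, it helps to add up the components for two different calls. The sketch below does that arithmetic. Every rate in it is a made-up placeholder, not an osmTalk price, and the `CallUsage` fields are hypothetical names for the five cost drivers listed above.

```python
from dataclasses import dataclass

# Hypothetical USD rates for illustration only; real rates come from the
# platform's rate tables and differ per model and provider.
RATES = {
    "llm_input_per_1k_tokens": 0.001,
    "llm_output_per_1k_tokens": 0.002,
    "tts_per_1k_chars": 0.01,
    "stt_per_minute": 0.005,
    "sip_per_minute": 0.01,
}


@dataclass
class CallUsage:
    llm_input_tokens: int
    llm_output_tokens: int
    tts_chars: int
    minutes: float
    phone_call: bool = True  # SIP minutes apply to phone calls only


def call_cost(u: CallUsage, rates: dict = RATES) -> float:
    """Sum the five cost components for one call."""
    cost = u.llm_input_tokens / 1000 * rates["llm_input_per_1k_tokens"]
    cost += u.llm_output_tokens / 1000 * rates["llm_output_per_1k_tokens"]
    cost += u.tts_chars / 1000 * rates["tts_per_1k_chars"]
    cost += u.minutes * rates["stt_per_minute"]
    if u.phone_call:
        cost += u.minutes * rates["sip_per_minute"]
    return cost


def cost_per_minute(u: CallUsage, rates: dict = RATES) -> float:
    return call_cost(u, rates) / u.minutes


# A short FAQ call versus a verbose sales call with a large context.
faq = CallUsage(llm_input_tokens=2_000, llm_output_tokens=500, tts_chars=800, minutes=2.0)
sales = CallUsage(llm_input_tokens=40_000, llm_output_tokens=6_000, tts_chars=9_000, minutes=10.0)
```

With these placeholder rates, the sales call costs more per minute than the FAQ call even though the STT and SIP components scale identically with duration; the difference comes entirely from tokens and TTS characters.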

Still stuck?

  • Logs: each call has detailed traces under Calls → [Call] → Logs
  • Replay: every call can be re-run in the Simulate tab without spending real money
  • Support: include the call ID — that's enough for us to reproduce everything

Operator notes (self-hosted)

If you're running the bot yourself:

  • Set BOT_INTERNAL_SECRET on the bot process to the same value as the API's INTERNAL_API_SECRET. The bot rejects any /start*, /whatsapp-answer, or /whatsapp-accepted request without a matching X-Internal-Secret header.
  • The bot refuses to start without BOT_INTERNAL_SECRET when APP_ENV is not development or test. This is intentional fail-closed behavior — the bot can spawn outbound calls billed to your telephony account, so anonymous access is never safe in production.
  • Tighten BOT_CORS_ORIGINS (comma-separated list) to only your API host(s). The default ships dev-friendly localhost values; replace them before going to production.
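
For operators who want to reason about the secret check, the fail-closed startup rule and the header comparison could look roughly like the sketch below. It is an illustration under assumptions, not the bot's actual code: the function names are hypothetical, and only the environment variable names and the X-Internal-Secret header come from the notes above.

```python
import hmac
from typing import Mapping, Optional

DEV_ENVS = {"development", "test"}


def load_secret(env: Mapping[str, str]) -> Optional[str]:
    """Fail closed: outside development/test, a missing secret aborts startup."""
    secret = env.get("BOT_INTERNAL_SECRET") or None
    if secret is None and env.get("APP_ENV") not in DEV_ENVS:
        raise RuntimeError(
            "BOT_INTERNAL_SECRET must be set when APP_ENV is not development or test"
        )
    return secret


def is_authorized(headers: Mapping[str, str], secret: str) -> bool:
    """Compare the X-Internal-Secret header to the secret in constant time.

    Note: a plain dict lookup is case-sensitive; real HTTP frameworks
    usually expose case-insensitive header mappings.
    """
    supplied = headers.get("X-Internal-Secret", "")
    return hmac.compare_digest(supplied.encode("utf-8"), secret.encode("utf-8"))
```

Using `hmac.compare_digest` rather than `==` avoids leaking how many leading characters of a guessed secret were correct through response timing.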