osmTalk Docs

Picking the Voice

How to make your agent sound the way you want — accent, language, and speed.

Your agent uses three AI services to have a conversation. You pick each one. Think of it like assembling a team:

| Service | What it does | Like a... |
| --- | --- | --- |
| STT (Speech-to-Text) | Hears the caller and writes down what they said | Stenographer |
| LLM (Language Model) | Thinks about what to say back | Brain |
| TTS (Text-to-Speech) | Speaks the reply out loud | Voice actor |
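To make the "team" concrete, here is a minimal sketch of one conversational turn passing through the three services in order. The functions `stt()`, `llm()`, `tts()` and `handle_turn()` are hypothetical stand-ins, not osmTalk APIs; in a real agent each one would call the provider you picked (Deepgram or Sarvam, OpenAI/Anthropic/Groq, and a TTS provider).

```python
# Hypothetical stubs: each function stands in for a real provider call.

def stt(audio: bytes) -> str:
    """Stenographer: turn caller audio into text (stubbed: the 'audio' is already text)."""
    return audio.decode("utf-8")

def llm(transcript: str, history: list[str]) -> str:
    """Brain: decide what to say back (stubbed with a canned rule)."""
    history.append(transcript)
    if "hours" in transcript.lower():
        return "We're open 9 to 5, Monday to Friday."
    return "Could you tell me a bit more?"

def tts(reply: str) -> bytes:
    """Voice actor: turn the reply into audio (stubbed: encodes the text)."""
    return reply.encode("utf-8")

def handle_turn(caller_audio: bytes, history: list[str]) -> bytes:
    """One turn: hear, think, speak."""
    text = stt(caller_audio)
    reply = llm(text, history)
    return tts(reply)

history: list[str] = []
audio_out = handle_turn(b"What are your hours?", history)
print(audio_out.decode("utf-8"))  # We're open 9 to 5, Monday to Friday.
```

Because the three stages run one after another on every turn, the caller's wait is roughly the sum of all three latencies, which is why speed shows up as a trade-off in each of the sections below.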

Each one has a few choices with different trade-offs.

Quick recommendations

| Your situation | STT | LLM | TTS |
| --- | --- | --- | --- |
| English customers, want it cheap | Deepgram | OpenAI gpt-5.4-mini | Deepgram Helena |
| English customers, want top quality | Deepgram | Anthropic claude-sonnet-4-6 | ElevenLabs Sarah |
| Hindi / Tamil / regional language | Sarvam saaras:v3 | OpenAI gpt-5.4-mini | ElevenLabs Multilingual |
| Mixed Hindi-English ("Hinglish") | Deepgram nova-3 (multi) | OpenAI gpt-5.4-mini | ElevenLabs Multilingual |
| Highest speed, lowest cost | Groq Whisper Turbo | Groq llama-3.3-70b | Groq Orpheus |

If you're unsure, use the first row.


The ears: Speech-to-Text (STT)

This is what hears the caller and converts speech to text. It happens 20-30 times per minute.

| Provider | Best for | Price (per minute) |
| --- | --- | --- |
| Deepgram nova-3-general | English (any accent) | $0.0077 |
| Deepgram nova-3-medical | Medical conversations | $0.0145 |
| Deepgram nova-2-phonecall | Bad-quality phone audio | $0.0058 |
| Sarvam saaras:v3 | Hindi, Tamil, Telugu, Kannada, etc. | $0.0083 |
| Sarvam saarika:v2.5 | Indian languages (older) | $0.0083 |
| Groq Whisper Turbo | Cheapest option, lower accuracy | $0.0006 |
| ElevenLabs Scribe v2 | High accuracy batch | $0.0083 |

English: Use Deepgram. It's faster and more accurate than the other options here.

Indian languages: Use Sarvam. Deepgram does NOT support Hindi/Tamil/etc.

Hinglish (code-switching): Set the language to multi and use Deepgram nova-3.
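The three rules of thumb above can be written as a small lookup. This is a sketch only: the function name `choose_stt` and the use of ISO 639-1 codes ("en", "hi", ...) plus "multi" are assumptions for illustration, not necessarily the exact values osmTalk's Language setting stores. The provider and model names come from the table.

```python
# Assumed language codes: ISO 639-1 plus "multi" for code-switching.
INDIAN_LANGUAGES = {"hi", "ta", "te", "kn"}  # Hindi, Tamil, Telugu, Kannada; extend as needed

def choose_stt(language: str) -> dict:
    """Pick an STT provider/model following the rules of thumb above."""
    if language == "multi":  # Hinglish / code-switching
        return {"provider": "deepgram", "model": "nova-3", "language": "multi"}
    if language in INDIAN_LANGUAGES:  # per the guidance above, use Sarvam here
        return {"provider": "sarvam", "model": "saaras:v3", "language": language}
    if language == "en":
        return {"provider": "deepgram", "model": "nova-3-general", "language": "en"}
    raise ValueError(f"no recommendation for language {language!r}")

print(choose_stt("ta"))
# {'provider': 'sarvam', 'model': 'saaras:v3', 'language': 'ta'}
```

Raising an error for unlisted languages (rather than silently falling back to English) makes an unsupported choice obvious instead of producing a bot that mishears every caller.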

The brain: Language Model (LLM)

This is what decides what the bot says. It's by far the most important choice for quality.

| Model | Speed | Cost per 1M input tokens | When to pick |
| --- | --- | --- | --- |
| OpenAI gpt-5.4-nano | ⚡⚡⚡ | $0.20 | Simple FAQs, light dialog |
| OpenAI gpt-5.4-mini ⭐ | ⚡⚡⚡ | $0.40 | Default, most use cases |
| OpenAI gpt-5.4 | ⚡⚡ | $2.50 | Complex reasoning, agentic tasks |
| Anthropic claude-haiku-4-5 | ⚡⚡⚡ | $1.00 | Multilingual, formal tone |
| Anthropic claude-sonnet-4-6 | ⚡⚡ | $3.00 | Balanced quality and speed |
| Anthropic claude-opus-4-7 |  | $5.00 | Premium quality for long, complex calls |
| Groq llama-3.3-70b | ⚡⚡⚡⚡ | $0.59 | When you need very low latency |
| Groq gpt-oss-120b | ⚡⚡⚡⚡ | $0.15 | Cheap and open-source |
| Groq qwen3-32b | ⚡⚡⚡⚡ | $0.29 | Multi-language |

⭐ = default. Don't change unless you have a reason.
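To compare the LLM prices in terms of a whole call, here is a rough estimator. It treats the listed prices as USD per million input tokens, which is how these providers quote them. The token counts (a 1,500-token system prompt, about 60 new tokens per turn, 20 turns) and the assumption that every turn re-sends the prompt plus the full conversation so far are illustrative guesses, not osmTalk measurements; output tokens are ignored.

```python
def call_input_cost(price_per_million: float, turns: int,
                    system_prompt_tokens: int = 1_500,
                    tokens_per_turn: int = 60) -> float:
    """Rough input-token cost (USD) of one call; output tokens are ignored.

    Assumes each turn re-sends the system prompt plus the whole
    conversation so far, which grows by `tokens_per_turn` every turn.
    """
    total_tokens = 0
    for turn in range(1, turns + 1):
        total_tokens += system_prompt_tokens + turn * tokens_per_turn
    return total_tokens * price_per_million / 1_000_000

for model, price in [("gpt-5.4-mini", 0.40), ("claude-sonnet-4-6", 3.00)]:
    print(f"{model}: ${call_input_cost(price, turns=20):.4f} per 20-turn call")
# gpt-5.4-mini: $0.0170 per 20-turn call
# claude-sonnet-4-6: $0.1278 per 20-turn call
```

Since the system prompt is re-sent on every turn, a long prompt multiplies cost across the call; trimming it can matter as much as choosing a cheaper model.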

The voice: Text-to-Speech (TTS)

This is the voice the caller hears. Each TTS provider has multiple voices.

Deepgram (Aura-2) — best balance

Fast, natural-sounding English. 14 voices are included in the price; a selection is listed below.

Female:

  • Helena ⭐ — Warm, professional (default)
  • Asteria — Confident, articulate
  • Luna — Friendly, casual
  • Athena — Authoritative
  • Aurora — Bright, energetic
  • Iris — Gentle, soothing

Male:

  • Orpheus — Smooth, deep
  • Apollo — Professional
  • Zeus — Commanding
  • Hermes — Friendly
  • Atlas — Strong, mature

Price: $30 per 1M characters (about ₹2.50 per minute of speech).

ElevenLabs — best quality, especially for non-English

Models:

  • eleven_flash_v2_5 ⭐ — Best for voice agents, ~75ms latency
  • eleven_turbo_v2_5 — Higher quality, ~250ms latency
  • eleven_multilingual_v2 — 29 languages, highest quality, slower

Recommended voices:

  • Sarah (EXAVITQu4vr4xnSDxMaL) ⭐ — Mature female, English
  • Roger (CwhRBWXzGAHq8TQ4Fs17) — Casual male, English
  • George — Warm British storyteller
  • Daniel — Steady British broadcaster

For Hindi/Tamil/etc., use eleven_multilingual_v2 with any voice; every voice can speak all 29 languages naturally.

Price: $50 per 1M characters for Flash, more for Turbo/Multilingual.

Groq (Orpheus) — cheapest

Six English-only voices: autumn, diana, hannah, austin, daniel, troy.

Price: $22 per 1M characters (about $0.40 of audio per dollar of TTS).

Setup note: The first time you use Orpheus, you must accept Groq's terms at console.groq.com/playground?model=canopylabs/orpheus-v1-english. You only need to do this once per Groq organization.
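TTS is priced per character, so a quick conversion helps when comparing it with the per-minute STT prices. This sketch assumes about 150 spoken words per minute and about 6 characters per word including spaces (roughly 900 characters per minute); both numbers are assumptions, not osmTalk figures. The example prices are the Groq Orpheus and ElevenLabs Flash rates listed above.

```python
# Assumptions (not from osmTalk): ~150 words/min, ~6 chars/word incl. spaces.
WORDS_PER_MINUTE = 150
CHARS_PER_WORD = 6

def tts_cost_per_minute(price_per_million_chars: float,
                        words_per_minute: int = WORDS_PER_MINUTE,
                        chars_per_word: int = CHARS_PER_WORD) -> float:
    """USD per minute of synthesized speech, given a per-1M-character price."""
    chars_per_minute = words_per_minute * chars_per_word
    return chars_per_minute * price_per_million_chars / 1_000_000

for name, price in [("Groq Orpheus", 22), ("ElevenLabs Flash", 50)]:
    print(f"{name}: ${tts_cost_per_minute(price):.4f} per minute of speech")
# Groq Orpheus: $0.0198 per minute of speech
# ElevenLabs Flash: $0.0450 per minute of speech
```

Only the agent's own speech is synthesized, so a one-minute call usually costs less than one "minute of speech" in TTS.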

Voice speed and tuning (ElevenLabs only)

In Configure → Voice → Advanced, you can adjust:

| Setting | Range | Default | What it does |
| --- | --- | --- | --- |
| Speed | 0.7 – 1.2 | 1.0 | How fast the bot talks |
| Stability | 0 – 1 | 0.7 | Higher = more consistent. Lower = more emotional range. |
| Similarity boost | 0 – 1 | 0.75 | Higher = closer to the original voice |
| Style | 0 – 1 | 0 | Adds expressiveness; more emotive, but slower |
| Speaker boost | on/off | off | Improves clarity (slight latency hit) |
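For reference, the settings above line up with the `voice_settings` fields in ElevenLabs' public text-to-speech API (`stability`, `similarity_boost`, `style`, `use_speaker_boost`, `speed`), alongside a `model_id`. How osmTalk forwards them is an assumption here; the sketch simply builds such a request body locally, with the ranges and defaults from the table, and picks a model following the guidance above. The helper names are hypothetical, and no network call is made.

```python
# Ranges and defaults copied from the table above.
RANGES = {
    "speed": (0.7, 1.2),
    "stability": (0.0, 1.0),
    "similarity_boost": (0.0, 1.0),
    "style": (0.0, 1.0),
}
DEFAULTS = {"speed": 1.0, "stability": 0.7, "similarity_boost": 0.75,
            "style": 0.0, "use_speaker_boost": False}

def build_voice_settings(**overrides) -> dict:
    """Merge overrides into the defaults and reject out-of-range values."""
    settings = {**DEFAULTS, **overrides}
    unknown = set(settings) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown setting(s): {sorted(unknown)}")
    for key, (low, high) in RANGES.items():
        if not low <= settings[key] <= high:
            raise ValueError(f"{key}={settings[key]} is outside {low}-{high}")
    return settings

def choose_model(language: str) -> str:
    """Flash for English; Multilingual for Hindi/Tamil/etc., as recommended above."""
    return "eleven_flash_v2_5" if language == "en" else "eleven_multilingual_v2"

request_body = {
    "text": "Thanks for calling!",
    "model_id": choose_model("en"),
    "voice_settings": build_voice_settings(speed=1.1),
}
print(request_body["voice_settings"]["speed"])  # 1.1
```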

Most users only ever adjust Speed.

Background sound (optional)

You can play a quiet ambient sound during calls so the agent feels more human:

| Sound | When it helps |
| --- | --- |
| None | Default (most calls) |
| Office | "Sales agent calling from an office" |
| Cafe | "Friend casually chatting" |
| Rain | Calming, late-night support |
| White noise | Hide your real environment |
| Nature | Outdoor / wellness brands |
| Keyboard | "Tech support typing while I talk" |

The volume slider ranges from 0 to 100. The default of 40 is a good choice: audible but not distracting.
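Conceptually, a background sound is mixed under the agent's voice at a reduced level. The sketch below shows one way that could work on 16-bit PCM samples. The linear mapping from the 0-100 slider to a gain and the `MAX_AMBIENT_GAIN` cap are hypothetical choices for illustration; osmTalk's actual mixer is not described here.

```python
MAX_AMBIENT_GAIN = 0.3  # hypothetical: at volume 100 the ambience plays at 30% amplitude

def mix_ambient(voice: list[int], ambient: list[int], volume: int = 40) -> list[int]:
    """Mix 16-bit PCM voice samples with a looping ambient clip."""
    if not 0 <= volume <= 100:
        raise ValueError("volume must be between 0 and 100")
    if not ambient:
        return list(voice)
    gain = volume / 100 * MAX_AMBIENT_GAIN  # hypothetical linear slider-to-gain mapping
    mixed = []
    for i, v in enumerate(voice):
        a = ambient[i % len(ambient)]  # loop the ambient clip under the voice
        sample = round(v + a * gain)
        mixed.append(max(-32768, min(32767, sample)))  # clip to the int16 range
    return mixed

print(mix_ambient([1000, -1000, 32000], [10000], volume=40))  # [2200, 200, 32767]
```

Capping the ambient gain well below 1.0 keeps even the loudest slider setting under the voice, and the final clip prevents overflow when a loud voice sample and ambient sample coincide.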

Pronunciation tweaks (advanced)

If your brand name keeps getting mispronounced ("AIVF" said as "ay-vif" instead of "ay-eye-vee-eff"), add a pronunciation entry:

Configure → Advanced → Pronunciation Dictionary

[
  { "word": "AIVF", "pronunciation": "ay-eye-vee-eff" },
  { "word": "osmTalk", "pronunciation": "awsm-talk" }
]

Full guide: Pronunciation & Keyword Boost.
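One simple way a dictionary like the one above can be applied is whole-word text substitution before the reply is sent to TTS. This is a sketch under that assumption; osmTalk or the TTS provider may instead use phoneme tags, and `apply_pronunciations` is a hypothetical helper. It reads the same JSON shape shown above.

```python
import json
import re

PRONUNCIATIONS_JSON = """
[
  { "word": "AIVF", "pronunciation": "ay-eye-vee-eff" },
  { "word": "osmTalk", "pronunciation": "awsm-talk" }
]
"""

def apply_pronunciations(text: str, entries: list[dict]) -> str:
    """Replace whole-word matches of each entry's word with its pronunciation."""
    lookup = {e["word"]: e["pronunciation"] for e in entries}
    if not lookup:
        return text
    # One combined pattern, longest words first, so overlapping entries prefer
    # the longer match and replaced text is never re-scanned.
    words = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b")
    return pattern.sub(lambda m: lookup[m.group(1)], text)

entries = json.loads(PRONUNCIATIONS_JSON)
print(apply_pronunciations("Welcome to AIVF, powered by osmTalk.", entries))
# Welcome to ay-eye-vee-eff, powered by awsm-talk.
```

The `\b` word boundaries keep "AIVF" from being rewritten inside a longer token such as "AIVFX".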

Per-agent provider keys (advanced)

If you want one agent to use YOUR Anthropic key and another to use the platform's, go to:

Settings → Provider Keys

Set a key globally there. Override per-agent in the agent's Voice tab if needed. Per-agent keys take priority over global keys.
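The precedence rule above (per-agent key first, then the global key) can be sketched as a small lookup. Treating "no key found" as "use the platform's key" is an assumption for illustration, and `resolve_key` and the example key strings are hypothetical.

```python
from typing import Optional

def resolve_key(provider: str, agent_keys: dict[str, str],
                global_keys: dict[str, str]) -> Optional[str]:
    """Per-agent key wins; else the global key; None means 'use the platform's key'."""
    if provider in agent_keys:
        return agent_keys[provider]
    return global_keys.get(provider)

global_keys = {"anthropic": "sk-global-example"}
agent_a = {"anthropic": "sk-agent-a-example"}
agent_b: dict[str, str] = {}

print(resolve_key("anthropic", agent_a, global_keys))  # sk-agent-a-example
print(resolve_key("anthropic", agent_b, global_keys))  # sk-global-example
print(resolve_key("openai", agent_b, global_keys))     # None
```

So to have one agent use your Anthropic key while another uses the platform's, you would set the key only on that one agent and leave the global Anthropic key empty.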

Common questions

"My bot sounds robotic." Try ElevenLabs Sarah. It's the most natural English voice. For Hindi, try eleven_multilingual_v2.

"My Hindi callers complain the bot doesn't understand them." Switch STT to Sarvam saaras:v3. Deepgram does not support Hindi.

"It's too slow." Switch LLM to Groq llama-3.3-70b (4× faster than OpenAI). Quality slightly lower but usually fine.

"It's expensive." Switch to gpt-5.4-mini + Deepgram Helena (the defaults). You're probably overspending on a flagship model you don't need.

"It speaks in the wrong language." Either your Language setting (Configure → Voice → Language) is wrong, or your system prompt says "respond in English". Check both.