Picking the Voice

Your agent uses three AI services to have a conversation. You pick each one. Think of it like assembling a team:

Service	What it does	Like a...
STT (Speech-to-Text)	Hears the caller and writes down what they said	Stenographer
LLM (Language Model)	Thinks about what to say back	Brain
TTS (Text-to-Speech)	Speaks the reply out loud	Voice actor

Each one has a few choices with different trade-offs.

Quick recommendations

Your situation	STT	LLM	TTS
English customers, want it cheap	Deepgram	OpenAI gpt-5.4-mini	Deepgram Helena
English customers, want top quality	Deepgram	Anthropic claude-sonnet-4-6	ElevenLabs Sarah
Hindi / Tamil / regional language	Sarvam saaras:v3	OpenAI gpt-5.4-mini	ElevenLabs Multilingual
Mixed Hindi-English ("Hinglish")	Deepgram nova-3 (multi)	OpenAI gpt-5.4-mini	ElevenLabs Multilingual
Highest speed, lowest cost	Groq Whisper Turbo	Groq llama-3.3-70b	Groq Orpheus

If unsure, take row 1.
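
If you manage agents from a script, the table above can be kept as a small lookup. This is only a sketch: the preset names (`english_cheap`, `fastest`, and so on) and the `pick_preset` helper are made up for illustration and are not part of the product; the provider and model strings are copied from the table.

```python
# Hypothetical preset names; the stack values are copied from the table above.
PRESETS = {
    "english_cheap": {"stt": "Deepgram", "llm": "OpenAI gpt-5.4-mini", "tts": "Deepgram Helena"},
    "english_premium": {"stt": "Deepgram", "llm": "Anthropic claude-sonnet-4-6", "tts": "ElevenLabs Sarah"},
    "regional": {"stt": "Sarvam saaras:v3", "llm": "OpenAI gpt-5.4-mini", "tts": "ElevenLabs Multilingual"},
    "hinglish": {"stt": "Deepgram nova-3 (multi)", "llm": "OpenAI gpt-5.4-mini", "tts": "ElevenLabs Multilingual"},
    "fastest": {"stt": "Groq Whisper Turbo", "llm": "Groq llama-3.3-70b", "tts": "Groq Orpheus"},
}


def pick_preset(name=None):
    """Return the stack for a situation, falling back to row 1 when unsure."""
    return PRESETS.get(name, PRESETS["english_cheap"])


if __name__ == "__main__":
    print(pick_preset())           # row 1, the safe default
    print(pick_preset("fastest"))  # the all-Groq row
```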

The ears: Speech-to-Text (STT)

This is what hears the caller and converts speech to text. Transcription runs roughly 20-30 times per minute of conversation.

Provider	Best for	Price (per minute)
Deepgram nova-3-general ⭐	English (any accent)	$0.0077
Deepgram nova-3-medical	Medical conversations	$0.0145
Deepgram nova-2-phonecall	Bad-quality phone audio	$0.0058
Sarvam saaras:v3	Hindi, Tamil, Telugu, Kannada, etc.	$0.0083
Sarvam saarika:v2.5	Indian languages (older)	$0.0083
Groq Whisper Turbo	Cheapest option, lower accuracy	$0.0006
ElevenLabs Scribe v2	High accuracy batch	$0.0083

English: Use Deepgram. It's faster and more accurate than the others.
Indian languages: Use Sarvam. Deepgram does NOT support Hindi/Tamil/etc.
Hinglish (code-switching): Set language to multi and use Deepgram nova-3.
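
The three rules above can be written as one small decision function. Treat it as a sketch under assumptions: the two-letter language codes (`hi`, `ta`, `te`, `kn`), the returned dictionary shape, and the `choose_stt` name are illustrative and may not match what the platform expects; only the `multi` value and the model names come from this page.

```python
# Assumed ISO 639-1 codes for the Indian languages named in the table.
INDIAN_LANGUAGES = {"hi", "ta", "te", "kn"}


def choose_stt(language: str) -> dict:
    """Pick an STT provider and model from the caller's language."""
    lang = language.strip().lower()
    if lang == "multi":
        # Hinglish / code-switching: Deepgram nova-3 with language set to multi.
        return {"provider": "deepgram", "model": "nova-3", "language": "multi"}
    if lang in INDIAN_LANGUAGES:
        # Hindi, Tamil, Telugu, Kannada: Sarvam.
        return {"provider": "sarvam", "model": "saaras:v3", "language": lang}
    # English goes to Deepgram; treating every other code the same way
    # is an assumption of this sketch.
    return {"provider": "deepgram", "model": "nova-3-general", "language": lang}


if __name__ == "__main__":
    for code in ("en", "hi", "multi"):
        print(code, "->", choose_stt(code))
```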

The brain: Language Model (LLM)

This is what decides what the bot says. It's by far the most important choice for quality.

Model	Speed	Cost per 1M input tokens	When to pick
OpenAI gpt-5.4-nano	⚡⚡⚡	$0.20	Simple FAQs, light dialog
OpenAI gpt-5.4-mini ⭐	⚡⚡⚡	$0.40	Default — most use cases
OpenAI gpt-5.4	⚡⚡	$2.50	Complex reasoning, agentic tasks
Anthropic claude-haiku-4-5	⚡⚡⚡	$1.00	Multilingual, formal tone
Anthropic claude-sonnet-4-6	⚡⚡	$3.00	Balanced quality + speed
Anthropic claude-opus-4-7	⚡	$5.00	Premium quality — long, complex calls
Groq llama-3.3-70b	⚡⚡⚡⚡	$0.59	When you need very low latency
Groq gpt-oss-120b	⚡⚡⚡⚡	$0.15	Cheap + open-source
Groq qwen3-32b	⚡⚡⚡⚡	$0.29	Multi-language

⭐ = default. Don't change unless you have a reason.
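
To see what a model switch does to your bill, multiply the listed price by the input tokens you send. The sketch below copies a few prices from the table and assumes one pricing unit is 1,000,000 tokens, which is how these providers usually quote prices; `TOKENS_PER_PRICE_UNIT` and the function name are illustrative, so adjust the unit if your billing page says otherwise.

```python
# USD per pricing unit of input tokens, copied from the table above.
LLM_INPUT_PRICE = {
    "gpt-5.4-nano": 0.20,
    "gpt-5.4-mini": 0.40,
    "claude-sonnet-4-6": 3.00,
    "llama-3.3-70b": 0.59,
}

# Assumption: one pricing unit is 1,000,000 tokens.
TOKENS_PER_PRICE_UNIT = 1_000_000


def llm_input_cost_usd(model: str, input_tokens: int) -> float:
    """Estimate the input-token cost of a call for one model."""
    return LLM_INPUT_PRICE[model] * input_tokens / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    tokens = 30_000  # e.g. 20 turns with a 1,500-token prompt each
    for model in LLM_INPUT_PRICE:
        print(f"{model}: ${llm_input_cost_usd(model, tokens):.4f}")
```

Whatever the unit, the ratio between rows holds: for the same prompt, claude-sonnet-4-6 costs 7.5 times what gpt-5.4-mini does on input tokens.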

The voice: Text-to-Speech (TTS)

This is the voice the caller hears. Each TTS provider has multiple voices.

Deepgram (Aura-2) — best balance

Fast, natural-sounding English. 14 voices included in the price.

Female:

Helena ⭐ — Warm, professional (default)
Asteria — Confident, articulate
Luna — Friendly, casual
Athena — Authoritative
Aurora — Bright, energetic
Iris — Gentle, soothing

Male:

Orpheus — Smooth, deep
Apollo — Professional
Zeus — Commanding
Hermes — Friendly
Atlas — Strong, mature

Price: $15 per 1M characters (about ₹2.50 per minute of speech).

ElevenLabs — best quality, especially for non-English

Models:

eleven_flash_v2_5 ⭐ — Best for voice agents, ~75ms latency
eleven_turbo_v2_5 — Higher quality, ~250ms latency
eleven_multilingual_v2 — 29 languages, highest quality, slower

Recommended voices:

Sarah (EXAVITQu4vr4xnSDxMaL) ⭐ — Mature female, English
Roger (CwhRBWXzGAHq8TQ4Fs17) — Casual male, English
George — Warm British storyteller
Daniel — Steady British broadcaster

For Hindi/Tamil/etc., use eleven_multilingual_v2 with any voice. Every voice handles all 29 supported languages naturally.

Price: $50 per 1M characters for Flash, more for Turbo/Multilingual.

Groq (Orpheus) — cheapest

Six English-only voices: autumn, diana, hannah, austin, daniel, troy.

Price: $22 per 1M characters (about $0.40 of audio per dollar of TTS).

Setup note: First-time Orpheus use requires accepting Groq's terms once at console.groq.com/playground?model=canopylabs/orpheus-v1-english. One-time, per Groq org.
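
The per-character prices quoted in the three provider sections can be compared with a little arithmetic. This is a rough sketch: the dictionary keys and the `tts_cost_usd` helper are illustrative names, and the ElevenLabs figure is the Flash price only.

```python
# USD per 1M characters, as quoted in the provider sections above.
TTS_PRICE_PER_MILLION_CHARS = {
    "deepgram-aura-2": 15.0,
    "elevenlabs-flash": 50.0,
    "groq-orpheus": 22.0,
}


def tts_cost_usd(provider: str, characters: int) -> float:
    """Estimate the cost of speaking a given number of characters."""
    return TTS_PRICE_PER_MILLION_CHARS[provider] * characters / 1_000_000


if __name__ == "__main__":
    reply_chars = 2_000  # total characters the bot speaks in one call
    for provider in TTS_PRICE_PER_MILLION_CHARS:
        print(f"{provider}: ${tts_cost_usd(provider, reply_chars):.4f}")
```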

Voice speed and tuning (ElevenLabs only)

In Configure → Voice → Advanced, you can adjust:

Setting	Range	Default	What it does
Speed	0.7 – 1.2	1.0	How fast the bot talks
Stability	0 – 1	0.7	Higher = more consistent. Lower = more emotional range.
Similarity boost	0 – 1	0.75	Tries to sound exactly like the original voice
Style	0 – 1	0	Adds expressive style. Slow but emotive.
Speaker boost	on/off	off	Improves clarity (slight latency hit)

90% of users only ever touch Speed.
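
If you set these values from a script rather than the UI, it helps to check them against the ranges in the table before sending them anywhere. The class below is a minimal sketch: the field names mirror the table's labels but are assumptions, not the platform's actual API fields.

```python
from dataclasses import dataclass


def _check_range(name: str, value: float, low: float, high: float) -> None:
    """Raise ValueError when a setting falls outside its documented range."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


@dataclass
class VoiceTuning:
    # Defaults and ranges are taken from the table above.
    speed: float = 1.0
    stability: float = 0.7
    similarity_boost: float = 0.75
    style: float = 0.0
    speaker_boost: bool = False

    def __post_init__(self) -> None:
        _check_range("speed", self.speed, 0.7, 1.2)
        for name in ("stability", "similarity_boost", "style"):
            _check_range(name, getattr(self, name), 0.0, 1.0)


if __name__ == "__main__":
    print(VoiceTuning(speed=1.1))  # slightly faster, everything else default
```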

Background sound (optional)

You can play a quiet ambient sound during calls so the agent feels more human:

Sound	When it helps
None ⭐	Default — most calls
Office	"Sales agent calling from an office"
Cafe	"Friend casually chatting"
Rain	Calming, late-night support
White noise	Hide your real environment
Nature	Outdoor / wellness brands
Keyboard	"Tech support typing while I talk"

The volume slider goes from 0 to 100. 40 is the recommended default: audible but not distracting.

Pronunciation tweaks (advanced)

If your brand name keeps getting mispronounced ("AIVF" said as "ay-vif" instead of "ay-eye-vee-eff"), add a pronunciation entry:

Configure → Advanced → Pronunciation Dictionary

[
  { "word": "AIVF", "pronunciation": "ay-eye-vee-eff" },
  { "word": "osmTalk", "pronunciation": "awsm-talk" }
]

Full guide: Pronunciation & Keyword Boost.
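
To get a feel for what a dictionary like the one above does, here is one way such entries could be applied to text before it reaches the TTS engine. It is an illustration only: the platform may handle pronunciation differently (for example by passing hints to the TTS provider), and `apply_pronunciations` is a hypothetical helper.

```python
import re

# Same entries as the dictionary shown above.
ENTRIES = [
    {"word": "AIVF", "pronunciation": "ay-eye-vee-eff"},
    {"word": "osmTalk", "pronunciation": "awsm-talk"},
]


def apply_pronunciations(text: str, entries: list) -> str:
    """Replace each whole-word, case-sensitive match with its spoken form."""
    for entry in entries:
        pattern = r"\b" + re.escape(entry["word"]) + r"\b"
        spoken = entry["pronunciation"]
        # A function replacement keeps the spoken form literal (no escape handling).
        text = re.sub(pattern, lambda _match, s=spoken: s, text)
    return text


if __name__ == "__main__":
    print(apply_pronunciations("Welcome to AIVF, powered by osmTalk.", ENTRIES))
```

Matching on word boundaries keeps longer words that merely contain the brand name from being rewritten.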

Per-agent provider keys (advanced)

If you want one agent to use YOUR Anthropic key and another to use the platform's, go to:

Settings → Provider Keys

Set a key globally there. Override per-agent in the agent's Voice tab if needed. Per-agent keys take priority over global keys.
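
The priority order can be pictured as a simple lookup chain. The sketch below assumes keys are held in plain dictionaries and that an agent with no key of its own ends up on the platform's key; the function and argument names are hypothetical.

```python
from typing import Optional


def resolve_key(
    provider: str,
    agent_keys: dict,
    global_keys: dict,
    platform_key: Optional[str] = None,
) -> Optional[str]:
    """Per-agent key first, then the global key, then the platform's key."""
    return agent_keys.get(provider) or global_keys.get(provider) or platform_key


if __name__ == "__main__":
    global_keys = {"anthropic": "sk-global-example"}
    # Agent A overrides the global key; agent B inherits it.
    print(resolve_key("anthropic", {"anthropic": "sk-agent-example"}, global_keys))
    print(resolve_key("anthropic", {}, global_keys))
```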

Common questions

"My bot sounds robotic." Try ElevenLabs Sarah. It's the most natural English voice. For Hindi, try eleven_multilingual_v2.

"My Hindi callers complain the bot doesn't understand them." Switch STT to Sarvam saaras:v3. Deepgram does not support Hindi.

"It's too slow." Switch LLM to Groq llama-3.3-70b (4× faster than OpenAI). Quality slightly lower but usually fine.

"It's expensive." Switch to gpt-5.4-mini + Deepgram Helena (the defaults). You're probably overspending on a flagship model you don't need.

"It speaks in the wrong language." Your Language setting (Configure → Voice → Language) is wrong. Or your system prompt says "respond in English" — check both.
