Pronunciation & Keyword Boost

Make your agent say tricky words correctly and recognize jargon, product names, and proper nouns more accurately.

Voice agents stumble on out-of-vocabulary terms in two places:

TTS speaks brand names, technical terms, or non-English words incorrectly ("AIVF" → "ay-vif" instead of "ay-eye-vee-eff").
STT mis-transcribes the same uncommon words when the caller speaks them.

osmTalk fixes both with per-agent dictionaries — no provider-side uploads required.

Pronunciation dictionary

A list of { word, pronunciation } entries injected into the agent's system prompt. The LLM writes the phonetic spelling when it generates text, so the TTS service speaks the intended sound.

When to use

Brand names / product names that aren't pronounced phonetically (Plivo → plee-vo, osmTalk → awsm-talk)
Acronyms that should be spoken letter-by-letter (AI → ay-eye, KYC → kay-why-see)
Foreign words mixed into the dialogue
Transliterations from Hindi, Tamil, and other languages that should be pronounced as a native speaker would say them, not Anglicized

Configure

In the agent config, under Advanced → Pronunciation Dictionary:

Field	Description
Word	The literal text the user might type/speak
Pronunciation	Phonetic spelling the bot should use (free-form, e.g. `"ay-eye-vee-eff"`)
Alphabet	`ipa` (International Phonetic Alphabet) or `cmu-arpabet` — informational only; we currently use plain phonetic spellings since they work across providers

API example:

{
  "pronunciationDictionary": [
    { "word": "AIVF", "pronunciation": "ay-eye-vee-eff", "alphabet": "ipa" },
    { "word": "Plivo", "pronunciation": "plee-vo" },
    { "word": "Maharashtra", "pronunciation": "muh-haa-rashtra" }
  ]
}

When the LLM responds, it emits the phonetic spelling and the TTS service speaks it correctly. The caller hears the right sound, while the transcript still records the original word (the LLM is instructed not to re-translate).
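As a rough illustration of the injection step, the sketch below renders dictionary entries into an instruction block that could be appended to a system prompt. The function name `build_pronunciation_prompt` and the instruction wording are hypothetical; the exact prompt text osmTalk uses is not documented here.

```python
def build_pronunciation_prompt(entries: list[dict]) -> str:
    """Render dictionary entries as an instruction block for a system prompt.

    Assumption: the wording below is illustrative only, not osmTalk's actual prompt.
    """
    if not entries:
        return ""
    lines = ["When you say any of the following words, write the phonetic spelling instead:"]
    for entry in entries:
        # The optional "alphabet" field is informational, so it is not rendered.
        lines.append(f'- "{entry["word"]}" -> "{entry["pronunciation"]}"')
    return "\n".join(lines)


entries = [
    {"word": "AIVF", "pronunciation": "ay-eye-vee-eff", "alphabet": "ipa"},
    {"word": "Plivo", "pronunciation": "plee-vo"},
]
print(build_pronunciation_prompt(entries))
```

Because every entry becomes a line of prompt text, the size of the dictionary directly affects prompt length.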

Limits

Limit	Value
Max entries	200
Max word length	64 chars
Max pronunciation length	128 chars
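
A client-side pre-check against these limits might look like the following sketch. The function name is hypothetical, and it assumes "chars" means Unicode code points as counted by Python's `len`; the server may count differently.

```python
MAX_ENTRIES = 200
MAX_WORD_LEN = 64
MAX_PRONUNCIATION_LEN = 128


def check_pronunciation_dictionary(entries: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the dictionary fits the limits."""
    problems = []
    if len(entries) > MAX_ENTRIES:
        problems.append(f"{len(entries)} entries exceeds the maximum of {MAX_ENTRIES}")
    for i, entry in enumerate(entries):
        word = entry.get("word", "")
        pronunciation = entry.get("pronunciation", "")
        if not word or len(word) > MAX_WORD_LEN:
            problems.append(f"entry {i}: word must be 1-{MAX_WORD_LEN} chars")
        if not pronunciation or len(pronunciation) > MAX_PRONUNCIATION_LEN:
            problems.append(f"entry {i}: pronunciation must be 1-{MAX_PRONUNCIATION_LEN} chars")
    return problems


print(check_pronunciation_dictionary([{"word": "Plivo", "pronunciation": "plee-vo"}]))
```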

Boosted keywords (STT)

A list of { keyword, weight } entries passed to the speech recognizer. The recognizer biases toward those terms, which improves recall for proper nouns, product names, and jargon that fall outside its general vocabulary.

Supported providers

Deepgram Nova-2 / Nova-3 — full native support (keyword + weight 0–10)
Sarvam — partial (general LM only; ignores weight)
ElevenLabs / Groq Whisper — currently ignored (keep the dictionary; we'll wire it as their SDKs add support)
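
The support matrix above can be pictured as a small dispatch step. This is only a sketch: the provider identifiers (`"deepgram"`, `"sarvam"`) and the `term:weight` string form are assumptions made for illustration, not osmTalk's actual internals.

```python
DEFAULT_WEIGHT = 1.5  # documented default when an entry omits "weight"


def keywords_for_provider(provider: str, keywords: list[dict]) -> list[str]:
    """Shape boosted keywords according to what each STT provider can use."""
    if provider == "deepgram":
        # Full support: keyword plus weight (assumed "term:weight" form).
        return [f'{k["keyword"]}:{k.get("weight", DEFAULT_WEIGHT)}' for k in keywords]
    if provider == "sarvam":
        # Partial support: the weight is ignored, so only the terms are passed.
        return [k["keyword"] for k in keywords]
    # Other providers currently ignore the dictionary.
    return []


boosted = [{"keyword": "osmTalk", "weight": 2}, {"keyword": "Aadhaar"}]
for name in ("deepgram", "sarvam", "groq"):
    print(name, keywords_for_provider(name, boosted))
```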

Configure

In the agent config, under Advanced → Boosted Keywords:

Field	Description
Keyword	The exact term that should be recognized more reliably
Weight	Relative bias (default `1.5`, max `10`). Higher = more aggressive.

API example:

{
  "boostedKeywords": [
    { "keyword": "osmTalk", "weight": 2 },
    { "keyword": "Razorpay", "weight": 1.5 },
    { "keyword": "Aadhaar" },
    { "keyword": "Plivo", "weight": 1.8 }
  ]
}

Limits

Limit	Value
Max entries	100
Max keyword length	64 chars
Weight range	0 – 10
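
The defaults and limits can be combined into one normalization pass, sketched below. The function name is hypothetical, and rejecting out-of-range weights (rather than clamping them) is an assumption; how the API actually responds is not stated here.

```python
DEFAULT_WEIGHT = 1.5
MAX_KEYWORDS = 100
MAX_KEYWORD_LEN = 64
MIN_WEIGHT, MAX_WEIGHT = 0.0, 10.0


def normalize_boosted_keywords(entries: list[dict]) -> list[dict]:
    """Fill in the default weight and raise ValueError on any limit violation."""
    if len(entries) > MAX_KEYWORDS:
        raise ValueError(f"at most {MAX_KEYWORDS} keywords are allowed")
    normalized = []
    for entry in entries:
        keyword = entry.get("keyword", "")
        if not keyword or len(keyword) > MAX_KEYWORD_LEN:
            raise ValueError(f"keyword must be 1-{MAX_KEYWORD_LEN} chars: {keyword!r}")
        weight = float(entry.get("weight", DEFAULT_WEIGHT))
        if not MIN_WEIGHT <= weight <= MAX_WEIGHT:
            raise ValueError(f"weight for {keyword!r} must be within 0-10")
        normalized.append({"keyword": keyword, "weight": weight})
    return normalized


print(normalize_boosted_keywords([{"keyword": "Aadhaar"}, {"keyword": "Plivo", "weight": 1.8}]))
```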

When to add a keyword

A specific term keeps appearing wrong in transcripts even though the audio was clear
The keyword is rare or easily confused with a similar-sounding term (AIVF vs IVF: both valid, but with very different meanings)
Brand spellings that the recognizer would normalize (osmTalk becoming osm talk)

Ambient background sound

Available via the existing backgroundSound agent setting. Choose from the built-in tracks (office, cafe, light rain, white noise) and tune backgroundVolume (0–100). The mixer is server-side for phone calls and client-side for the web widget, so it doesn't interfere with VAD.

See Voice Settings for the full list.

Tips

Don't over-stuff the pronunciation dictionary. Each entry uses tokens in the system prompt. Stick to terms the LLM actually gets wrong.
Try the pronunciation in the widget before going live. A wrong pronunciation can sound worse than the original.
Boosted keywords are case-insensitive on Deepgram. "osmtalk" and "osmTalk" boost the same recognition slot.
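
Since case variants boost the same slot on Deepgram, duplicates only consume entries against the keyword limit. One way to collapse them before saving is sketched below; the function name is hypothetical, and keeping the highest weight among duplicates is a design choice for this example, not documented osmTalk behavior.

```python
def dedupe_keywords(entries: list[dict], default_weight: float = 1.5) -> list[dict]:
    """Collapse case-insensitive duplicates, keeping the highest-weighted spelling."""
    best: dict[str, dict] = {}
    for entry in entries:
        key = entry["keyword"].casefold()  # case-insensitive comparison key
        weight = entry.get("weight", default_weight)
        if key not in best or weight > best[key]["weight"]:
            best[key] = {"keyword": entry["keyword"], "weight": weight}
    # Dicts preserve insertion order, so first-seen order is kept.
    return list(best.values())


print(dedupe_keywords([
    {"keyword": "osmtalk", "weight": 1},
    {"keyword": "osmTalk", "weight": 2},
    {"keyword": "Plivo"},
]))
```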
