Pronunciation & Keyword Boost
Make your agent say tricky words correctly and recognize jargon, product names, and proper nouns more accurately.
Voice agents stumble on out-of-vocabulary terms in two places:
- TTS speaks brand names, technical terms, or non-English words incorrectly ("AIVF" → "ay-vif" instead of "ay-eye-vee-eff").
- STT mis-transcribes the same uncommon words when the caller speaks them.
osmTalk fixes both with per-agent dictionaries — no provider-side uploads required.
Pronunciation dictionary
A list of { word, pronunciation } entries injected into the agent's system prompt. The LLM uses the phonetic spelling when generating text, so the TTS service then reads the correct sound.
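As a rough illustration of the injection step described above, the sketch below turns dictionary entries into a block of system-prompt text. The function name `render_pronunciation_prompt` and the instruction wording are assumptions made for this example; the text osmTalk actually injects is not documented here.

```python
from typing import TypedDict


class PronunciationEntry(TypedDict):
    word: str
    pronunciation: str


def render_pronunciation_prompt(entries: list[PronunciationEntry]) -> str:
    """Render dictionary entries as system-prompt instructions.

    Hypothetical helper: the real prompt wording is an assumption.
    """
    if not entries:
        return ""
    lines = ["When you say any of these words, write the spelling on the right instead:"]
    for entry in entries:
        lines.append(f'- "{entry["word"]}" -> "{entry["pronunciation"]}"')
    return "\n".join(lines)


entries: list[PronunciationEntry] = [
    {"word": "AIVF", "pronunciation": "ay-eye-vee-eff"},
    {"word": "Plivo", "pronunciation": "plee-vo"},
]
print(render_pronunciation_prompt(entries))
```

Because every entry adds a line to the prompt, the size of the dictionary translates directly into prompt tokens.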
When to use
- Brand names / product names that aren't pronounced phonetically (Plivo → plee-vo, osmTalk → awsm-talk)
- Acronyms that should be spoken letter-by-letter (AI → ay-eye, KYC → kay-why-see)
- Foreign words mixed into the dialogue
- Hindi/Tamil/etc. transliterations that should be pronounced the way a native speaker would say them, not Anglicized
Configure
In the agent config, under Advanced → Pronunciation Dictionary:
| Field | Description |
|---|---|
| Word | The literal text the user might type/speak |
| Pronunciation | Phonetic spelling the bot should use (free-form, e.g. "ay-eye-vee-eff") |
| Alphabet | ipa (International Phonetic Alphabet) or cmu-arpabet — informational only; we currently use plain phonetic spellings since they work across providers |
API example:
{
  "pronunciationDictionary": [
    { "word": "AIVF", "pronunciation": "ay-eye-vee-eff", "alphabet": "ipa" },
    { "word": "Plivo", "pronunciation": "plee-vo" },
    { "word": "Maharashtra", "pronunciation": "muh-haa-rashtra" }
  ]
}
When the LLM responds, it'll emit the phonetic spelling and the TTS service speaks it correctly. Caller hears the right sound; transcript still records the original word (LLM is instructed not to re-translate).
Limits
| Limit | Value |
|---|---|
| Max entries | 200 |
| Max word length | 64 chars |
| Max pronunciation length | 128 chars |
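A client-side check against these limits can catch oversized payloads before they reach the API. The following is a minimal sketch: the function name is hypothetical, and the rule that `word` and `pronunciation` must be non-empty is an assumption that the table above does not state.

```python
MAX_ENTRIES = 200
MAX_WORD_LEN = 64
MAX_PRONUNCIATION_LEN = 128


def validate_pronunciation_dictionary(entries: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the payload fits the limits."""
    problems = []
    if len(entries) > MAX_ENTRIES:
        problems.append(f"too many entries: {len(entries)} > {MAX_ENTRIES}")
    for i, entry in enumerate(entries):
        word = entry.get("word", "")
        pronunciation = entry.get("pronunciation", "")
        # Non-empty requirement is an assumption, not a documented rule.
        if not word or len(word) > MAX_WORD_LEN:
            problems.append(f"entry {i}: word must be 1-{MAX_WORD_LEN} chars")
        if not pronunciation or len(pronunciation) > MAX_PRONUNCIATION_LEN:
            problems.append(
                f"entry {i}: pronunciation must be 1-{MAX_PRONUNCIATION_LEN} chars"
            )
    return problems


print(validate_pronunciation_dictionary(
    [{"word": "Plivo", "pronunciation": "plee-vo"}]
))
```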
Boosted keywords (STT)
A list of { keyword, weight } entries passed to the speech recognizer. The recognizer biases toward those terms — significantly increasing recall for proper nouns, product names, and jargon that fall outside its general vocabulary.
Supported providers
- Deepgram Nova-2 / Nova-3 — full native support (keyword + weight 0–10)
- Sarvam — partial (general LM only; ignores weight)
- ElevenLabs / Groq Whisper — currently ignored (keep the dictionary; we'll wire it as their SDKs add support)
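The support levels above could be pictured as a per-provider mapping like the sketch below. Deepgram's public API accepts a `keywords` parameter with `term:intensifier` values, but whether osmTalk forwards keywords in exactly this form is an assumption. The provider identifiers and the function name are hypothetical, and dropping the weight for Sarvam is only one reading of "ignores weight".

```python
DEFAULT_WEIGHT = 1.5


def keywords_for_provider(provider: str, entries: list[dict]) -> list[str]:
    """Translate boosted keywords into the form a given STT provider might accept."""
    if provider == "deepgram":
        # Keyword plus weight, in Deepgram's "term:intensifier" style.
        return [f'{e["keyword"]}:{e.get("weight", DEFAULT_WEIGHT)}' for e in entries]
    if provider == "sarvam":
        # Partial support: the terms are passed, the weights are dropped.
        return [e["keyword"] for e in entries]
    # Providers without support: the dictionary is kept but nothing is sent.
    return []


boosted = [{"keyword": "osmTalk", "weight": 2}, {"keyword": "Aadhaar"}]
print(keywords_for_provider("deepgram", boosted))
print(keywords_for_provider("sarvam", boosted))
print(keywords_for_provider("elevenlabs", boosted))
```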
Configure
In the agent config, under Advanced → Boosted Keywords:
| Field | Description |
|---|---|
| Keyword | The exact term that should be recognized more reliably |
| Weight | Relative bias (default 1.5, max 10). Higher = more aggressive. |
API example:
{
  "boostedKeywords": [
    { "keyword": "osmTalk", "weight": 2 },
    { "keyword": "Razorpay", "weight": 1.5 },
    { "keyword": "Aadhaar" },
    { "keyword": "Plivo", "weight": 1.8 }
  ]
}
Limits
| Limit | Value |
|---|---|
| Max entries | 100 |
| Max keyword length | 64 chars |
| Weight range | 0 – 10 |
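The default weight and the limits above can be applied in one normalization pass before sending the payload. This is a sketch under assumptions: the function name is hypothetical, and rejecting out-of-range weights with an error (rather than clamping them) is a choice made for the example, not documented behavior.

```python
DEFAULT_WEIGHT = 1.5
MAX_KEYWORDS = 100
MAX_KEYWORD_LEN = 64
MIN_WEIGHT, MAX_WEIGHT = 0.0, 10.0


def normalize_boosted_keywords(entries: list[dict]) -> list[dict]:
    """Fill in the default weight and enforce the documented limits."""
    if len(entries) > MAX_KEYWORDS:
        raise ValueError(f"too many keywords: {len(entries)} > {MAX_KEYWORDS}")
    normalized = []
    for entry in entries:
        keyword = entry["keyword"]
        if not keyword or len(keyword) > MAX_KEYWORD_LEN:
            raise ValueError(f"keyword must be 1-{MAX_KEYWORD_LEN} chars: {keyword!r}")
        weight = float(entry.get("weight", DEFAULT_WEIGHT))
        if not MIN_WEIGHT <= weight <= MAX_WEIGHT:
            raise ValueError(f"weight out of range for {keyword!r}: {weight}")
        normalized.append({"keyword": keyword, "weight": weight})
    return normalized


print(normalize_boosted_keywords(
    [{"keyword": "osmTalk", "weight": 2}, {"keyword": "Aadhaar"}]
))
```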
When to add a keyword
- A specific term keeps appearing wrong in transcripts even though the audio was clear
- The keyword is rare or has homophones (AIVF vs IVF: both valid, but very different meaning)
- Brand spellings that the recognizer would normalize (osmTalk becoming osm talk)
Ambient background sound
Available via the existing backgroundSound agent setting. Choose from the built-in tracks (office, cafe, light rain, white noise) and tune backgroundVolume (0–100). The mixer is server-side for phone calls and client-side for the web widget, so it doesn't interfere with VAD.
See Voice Settings for the full list.
Tips
- Don't over-stuff the pronunciation dictionary. Each entry uses tokens in the system prompt. Stick to terms the LLM actually gets wrong.
- Try the pronunciation in the widget before going live — the wrong pronunciation can sound worse than the original.
- Boosted keywords are case-insensitive on Deepgram. "osmtalk" and "osmTalk" boost the same recognition slot.
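Since case variants boost the same term on Deepgram, duplicates that differ only in case use up entries without adding anything. One way to collapse them before upload is sketched below. The helper name is hypothetical, and keeping the highest weight among the variants is a design choice for this example, not documented osmTalk behavior.

```python
DEFAULT_WEIGHT = 1.5


def dedupe_keywords(entries: list[dict]) -> list[dict]:
    """Collapse keywords that differ only by case, keeping the highest weight."""
    best: dict[str, dict] = {}
    for entry in entries:
        key = entry["keyword"].casefold()
        weight = entry.get("weight", DEFAULT_WEIGHT)
        if key not in best or weight > best[key]["weight"]:
            best[key] = {"keyword": entry["keyword"], "weight": weight}
    # Dicts preserve insertion order, so the first-seen order is kept.
    return list(best.values())


print(dedupe_keywords([
    {"keyword": "osmtalk"},
    {"keyword": "osmTalk", "weight": 2},
    {"keyword": "Plivo", "weight": 1.8},
]))
```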