Pronunciation & Keyword Boost

Make your agent say tricky words correctly and recognize jargon, product names, and proper nouns more accurately.

Voice agents stumble on out-of-vocabulary terms in two places:

  • TTS speaks brand names, technical terms, or non-English words incorrectly ("AIVF" → "ay-vif" instead of "ay-eye-vee-eff").
  • STT mis-transcribes the same uncommon words when the caller speaks them.

osmTalk fixes both with per-agent dictionaries — no provider-side uploads required.

Pronunciation dictionary

A list of { word, pronunciation } entries injected into the agent's system prompt. The LLM uses the phonetic spelling when generating text, so the TTS service then reads the correct sound.

When to use

  • Brand names / product names that aren't pronounced phonetically (Plivo → plee-vo, osmTalk → awsm-talk)
  • Acronyms that should be spoken letter-by-letter (AI → ay-eye, KYC → kay-why-see)
  • Foreign words mixed into the dialogue
  • Transliterated Hindi, Tamil, and other Indian-language words that should be pronounced as a native speaker would, not Anglicized

Configure

In the agent config, under Advanced → Pronunciation Dictionary:

| Field | Description |
| --- | --- |
| Word | The literal text the user might type/speak |
| Pronunciation | Phonetic spelling the bot should use (free-form, e.g. "ay-eye-vee-eff") |
| Alphabet | ipa (International Phonetic Alphabet) or cmu-arpabet. Informational only; we currently use plain phonetic spellings since they work across providers |

API example:

{
  "pronunciationDictionary": [
    { "word": "AIVF", "pronunciation": "ay-eye-vee-eff", "alphabet": "ipa" },
    { "word": "Plivo", "pronunciation": "plee-vo" },
    { "word": "Maharashtra", "pronunciation": "muh-haa-rashtra" }
  ]
}

When the LLM responds, it emits the phonetic spelling, which the TTS service then speaks correctly. The caller hears the right sound, while the transcript still records the original word (the LLM is instructed not to re-translate).
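
The page does not show the exact text osmTalk injects into the system prompt, so the following is only a sketch of how dictionary entries might be rendered into a prompt section. The function name `build_pronunciation_prompt` and the instruction wording are hypothetical.

```python
def build_pronunciation_prompt(entries):
    """Render dictionary entries as a system-prompt section.

    The wording below is an assumption for illustration, not osmTalk's actual prompt.
    """
    if not entries:
        return ""
    lines = ["When you say any of these words, write the phonetic spelling instead:"]
    for entry in entries:
        # The optional "alphabet" field is informational only, so it is not rendered.
        lines.append(f'- "{entry["word"]}" -> "{entry["pronunciation"]}"')
    return "\n".join(lines)


entries = [
    {"word": "AIVF", "pronunciation": "ay-eye-vee-eff", "alphabet": "ipa"},
    {"word": "Plivo", "pronunciation": "plee-vo"},
]
prompt = build_pronunciation_prompt(entries)
print(prompt)
```

Each entry adds one line to the prompt, which is why a long dictionary increases token usage on every turn.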

Limits

| Limit | Value |
| --- | --- |
| Max entries | 200 |
| Max word length | 64 chars |
| Max pronunciation length | 128 chars |
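
To catch oversized payloads before sending them, a client-side check along these lines may help. It is a minimal sketch that only encodes the three limits above; the function name is hypothetical, and the server may apply additional validation not described here.

```python
PRON_MAX_ENTRIES = 200
PRON_MAX_WORD_CHARS = 64
PRON_MAX_PRONUNCIATION_CHARS = 128


def check_pronunciation_dictionary(entries):
    """Return a list of problems; an empty list means the entries fit the documented limits."""
    problems = []
    if len(entries) > PRON_MAX_ENTRIES:
        problems.append(f"{len(entries)} entries exceeds the limit of {PRON_MAX_ENTRIES}")
    for index, entry in enumerate(entries):
        word = entry.get("word", "")
        pronunciation = entry.get("pronunciation", "")
        if len(word) > PRON_MAX_WORD_CHARS:
            problems.append(f"entry {index}: word is longer than {PRON_MAX_WORD_CHARS} chars")
        if len(pronunciation) > PRON_MAX_PRONUNCIATION_CHARS:
            problems.append(
                f"entry {index}: pronunciation is longer than {PRON_MAX_PRONUNCIATION_CHARS} chars"
            )
    return problems


ok = check_pronunciation_dictionary([{"word": "Plivo", "pronunciation": "plee-vo"}])
too_long = check_pronunciation_dictionary([{"word": "x" * 65, "pronunciation": "ex"}])
print(ok, too_long)
```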

Boosted keywords (STT)

A list of { keyword, weight } entries passed to the speech recognizer. The recognizer biases toward those terms, which improves recall for proper nouns, product names, and jargon that fall outside its general vocabulary.

Supported providers

  • Deepgram Nova-2 / Nova-3 — full native support (keyword + weight 0–10)
  • Sarvam — partial (general LM only; ignores weight)
  • ElevenLabs / Groq Whisper — currently ignored (keep the dictionary; we'll wire it as their SDKs add support)
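
The support levels above can be pictured as a small dispatch step. This sketch is illustrative only: the provider keys, the `keywords_for_provider` name, and the choice to treat unknown providers as "ignored" are assumptions, and it does not show the actual request sent to any provider.

```python
# Support levels taken from the list above; the provider keys are hypothetical identifiers.
PROVIDER_SUPPORT = {
    "deepgram": "full",        # keyword and weight are both used
    "sarvam": "partial",       # keyword is used, weight is ignored
    "elevenlabs": "ignored",   # dictionary is kept but not applied
    "groq-whisper": "ignored",
}


def keywords_for_provider(provider, boosted_keywords, default_weight=1.5):
    """Return (keyword, weight) pairs as the given provider would effectively see them."""
    support = PROVIDER_SUPPORT.get(provider, "ignored")  # assumption: unknown means ignored
    if support == "ignored":
        return []
    pairs = []
    for item in boosted_keywords:
        if support == "full":
            pairs.append((item["keyword"], item.get("weight", default_weight)))
        else:
            pairs.append((item["keyword"], None))  # weight is dropped for partial support
    return pairs


sample = [{"keyword": "osmTalk", "weight": 2}, {"keyword": "Aadhaar"}]
print(keywords_for_provider("deepgram", sample))
print(keywords_for_provider("sarvam", sample))
print(keywords_for_provider("elevenlabs", sample))
```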

Configure

In the agent config, under Advanced → Boosted Keywords:

| Field | Description |
| --- | --- |
| Keyword | The exact term that should be recognized more reliably |
| Weight | Relative bias (default 1.5, max 10). Higher = more aggressive. |

API example:

{
  "boostedKeywords": [
    { "keyword": "osmTalk", "weight": 2 },
    { "keyword": "Razorpay", "weight": 1.5 },
    { "keyword": "Aadhaar" },
    { "keyword": "Plivo", "weight": 1.8 }
  ]
}

Limits

| Limit | Value |
| --- | --- |
| Max entries | 100 |
| Max keyword length | 64 chars |
| Weight range | 0 – 10 |
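
As a rough illustration of these limits together with the default weight of 1.5, a client could normalize its list before saving the agent config. The function name is hypothetical, and raising `ValueError` on a violation is a design choice of this sketch, not documented osmTalk behavior.

```python
KEYWORD_MAX_ENTRIES = 100
KEYWORD_MAX_CHARS = 64
WEIGHT_MIN, WEIGHT_MAX = 0, 10
DEFAULT_WEIGHT = 1.5


def normalize_boosted_keywords(items):
    """Fill in the default weight and reject entries outside the documented limits."""
    if len(items) > KEYWORD_MAX_ENTRIES:
        raise ValueError(f"{len(items)} entries exceeds the limit of {KEYWORD_MAX_ENTRIES}")
    normalized = []
    for item in items:
        keyword = item["keyword"]
        weight = item.get("weight", DEFAULT_WEIGHT)
        if len(keyword) > KEYWORD_MAX_CHARS:
            raise ValueError(f"keyword {keyword!r} is longer than {KEYWORD_MAX_CHARS} chars")
        if not WEIGHT_MIN <= weight <= WEIGHT_MAX:
            raise ValueError(f"weight {weight} for {keyword!r} is outside 0 to 10")
        normalized.append({"keyword": keyword, "weight": weight})
    return normalized


normalized = normalize_boosted_keywords(
    [{"keyword": "osmTalk", "weight": 2}, {"keyword": "Aadhaar"}]
)
print(normalized)
```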

When to add a keyword

  • A specific term keeps appearing wrong in transcripts even though the audio was clear
  • The keyword is rare or easily confused with a similar-sounding term (AIVF vs IVF: both valid, but with very different meanings)
  • Brand spellings that the recognizer would normalize (osmTalk becoming osm talk)

Ambient background sound

Available via the existing backgroundSound agent setting. Choose from the built-in tracks (office, cafe, light rain, white noise) and tune backgroundVolume (0–100). The mixer is server-side for phone calls and client-side for the web widget, so it doesn't interfere with VAD.

See Voice Settings for the full list.

Tips

  • Don't over-stuff the pronunciation dictionary. Each entry uses tokens in the system prompt. Stick to terms the LLM actually gets wrong.
  • Try the pronunciation in the widget before going live — the wrong pronunciation can sound worse than the original.
  • Boosted keywords are case-insensitive on Deepgram. "osmtalk" and "osmTalk" boost the same recognition slot.
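
Because case variants boost the same term on Deepgram, duplicates that differ only by case use up entries without adding anything. The sketch below merges them; keeping the first spelling and the highest weight is an assumption of this example, as is the `dedupe_keywords` name.

```python
def dedupe_keywords(items, default_weight=1.5):
    """Merge entries that differ only by case, keeping the first spelling and the highest weight."""
    merged = {}
    for item in items:
        key = item["keyword"].casefold()  # case-insensitive comparison key
        weight = item.get("weight", default_weight)
        if key not in merged:
            merged[key] = {"keyword": item["keyword"], "weight": weight}
        else:
            merged[key]["weight"] = max(merged[key]["weight"], weight)
    return list(merged.values())


deduped = dedupe_keywords(
    [
        {"keyword": "osmTalk", "weight": 2},
        {"keyword": "osmtalk"},
        {"keyword": "Plivo", "weight": 1.8},
    ]
)
print(deduped)
```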