Advanced Settings
Every knob in the agent config — what it does, default, when to change.
This page is the complete reference for the Advanced Settings section on each agent. Every toggle and slider in the dashboard is here, with a one-line "when would I change this?" so you can scan for the right knob.
For a conceptual walk-through (how STT/LLM/TTS/VAD fit together), see the Agent Config Guide in the examples repo.
Where to find these settings in the dashboard
app.osmtalk.com
└─ Agents
   └─ click your agent
      └─ Config tab
         ├─ Identity (name + greeting)
         ├─ Instructions (system prompt)
         ├─ Voice & STT (provider/model/voice)
         ├─ Tools (HTTP + client + MCP + transfer + DTMF routes)
         ├─ Knowledge (pronunciation + boosted keywords + post-call analysis)
         └─ Advanced ◀── all the knobs below
            ├─ Model & Reasoning
            ├─ Voice Activity Detection (VAD)
            ├─ Smart Turn
            ├─ Audio Quality (noise suppression, STT extras)
            ├─ TTS Quality (ElevenLabs-specific)
            ├─ Conversation Flow (interruptions, mute, idle, end-call)
            ├─ Recording & Data
            └─ DTMF / Keypad

Every section is collapsible. Defaults are sane — only open what you need.
1. Model & Reasoning
Controls the LLM that generates the agent's responses.
| Setting | Key | Range | Default | What it does |
|---|---|---|---|---|
| Temperature | temperature | 0–2 | 0.7 | Creativity vs determinism. Lower (0.2) for strict scripts (compliance, IVR). Higher (1.0+) for warm/playful agents. |
| Max Tokens | maxTokens | 1–8192 | 1024 | Hard cap per response. Voice agents usually do best at 200–500: the bot responds faster and rambles less. |
| Frequency Penalty | frequencyPenalty | -2 to 2 | 0 | Penalizes word repetition. Bump to 0.3 if the agent keeps saying the same phrase. |
| Presence Penalty | presencePenalty | -2 to 2 | 0 | Encourages topic diversity. Usually leave at 0 for voice. |
| Enable Reasoning | enableReasoning | toggle | false | Turn on for o1/o3/o4/gpt-5* models — they reason before responding. Slower (300ms-3s extra) but better on complex tasks. |
| Reasoning Effort | reasoningEffort | low / medium / high / xhigh | medium | Only when Enable Reasoning is on. Higher = slower, more correct, more tokens spent. |
| Max Context Tokens | maxContextTokens | (auto) | varies | Conversation-history token budget. If long calls cause TTFB to grow, lower this. |
| Enable Context Summarization | enableContextSummarization | toggle | true | Auto-summarizes old turns when context grows. Keeps long calls responsive. |
| Max Unsummarized Messages | maxUnsummarizedMessages | 5-50 | 20 | How many recent turns are kept verbatim before older ones get summarized into a single rolling summary. |
GPT-5 / o-series auto-routing — these models work via OpenAI's /v1/responses endpoint, not the legacy /v1/chat/completions. The bot detects them by name (gpt-5*, o1*, o3*, o4*) and routes automatically — you just pick the model in the Voice & STT tab.
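The name-based routing rule above can be sketched as follows. This is an illustrative reconstruction of the rule as stated on this page, not osmTalk's actual code; the function name routeEndpoint is hypothetical, and the real matcher may cover additional model families.

```typescript
// Hypothetical sketch of the routing rule: model names starting with
// gpt-5, o1, o3 or o4 go to the Responses API; everything else uses
// Chat Completions. Prefix list taken from this page only.
type OpenAIEndpoint = "/v1/responses" | "/v1/chat/completions";

const RESPONSES_API_PREFIXES = ["gpt-5", "o1", "o3", "o4"] as const;

function routeEndpoint(model: string): OpenAIEndpoint {
  const name = model.trim().toLowerCase();
  return RESPONSES_API_PREFIXES.some((prefix) => name.startsWith(prefix))
    ? "/v1/responses"
    : "/v1/chat/completions";
}

console.log(routeEndpoint("gpt-5-mini")); // → /v1/responses
console.log(routeEndpoint("gpt-4o"));     // → /v1/chat/completions
```

Because matching is by prefix, dated or size variants (for example o4-mini) route the same way as their base model.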
2. Voice Activity Detection (VAD)
VAD answers the question "is the user speaking right now?". The bot uses Silero VAD for this.
| Setting | Key | Range | Default (web) | Default (phone) | What it does |
|---|---|---|---|---|---|
| VAD Confidence | vadConfidence | 0–1 | 0.70 | 0.75 | How confident before declaring "speech detected". Higher = stricter (rejects TV/babble but might miss quiet talkers). |
| VAD Start Duration | vadStartSecs | 0.05–1.0s | 0.20 | 0.20 | How long speech must persist before VAD says "user is talking". Lower = more responsive but reacts to coughs. |
| VAD Stop Duration | vadStopSecs | 0.1–1.5s | 0.20 | 0.20 | How long silence before VAD says "user stopped". The most impactful knob — raise to 0.4–0.6 if the bot interrupts mid-sentence. |
| VAD Min Volume | vadMinVolume | 0–1 | 0.60 | 0.70 | Minimum audio volume to consider as speech. Higher rejects whispers/background; lower captures quiet speakers. |
Defaults are channel-aware: the bot uses 0.75 confidence and 0.70 minimum volume for phone audio (noisier) and 0.70 and 0.60 for web audio (cleaner). Override them only when the defaults misbehave in your specific environment.
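A minimal sketch of how channel-aware defaults and per-agent overrides might combine, assuming an explicit override simply replaces the channel default for that key. The values come from the table above; the type and function names (VadSettings, resolveVad) are hypothetical and not part of the osmTalk SDK.

```typescript
// Hypothetical resolver: start from the channel's defaults, then let any
// explicitly set override win. Values mirror the VAD table on this page.
type Channel = "web" | "phone";

interface VadSettings {
  vadConfidence: number; // 0–1
  vadStartSecs: number;  // seconds of speech before "user is talking"
  vadStopSecs: number;   // seconds of silence before "user stopped"
  vadMinVolume: number;  // 0–1
}

const VAD_DEFAULTS: Record<Channel, VadSettings> = {
  web:   { vadConfidence: 0.70, vadStartSecs: 0.2, vadStopSecs: 0.2, vadMinVolume: 0.60 },
  phone: { vadConfidence: 0.75, vadStartSecs: 0.2, vadStopSecs: 0.2, vadMinVolume: 0.70 },
};

function resolveVad(channel: Channel, overrides: Partial<VadSettings> = {}): VadSettings {
  const base = VAD_DEFAULTS[channel];
  // `??` keeps a deliberate override of 0 and only falls back on undefined.
  return {
    vadConfidence: overrides.vadConfidence ?? base.vadConfidence,
    vadStartSecs: overrides.vadStartSecs ?? base.vadStartSecs,
    vadStopSecs: overrides.vadStopSecs ?? base.vadStopSecs,
    vadMinVolume: overrides.vadMinVolume ?? base.vadMinVolume,
  };
}

// Phone agent whose bot interrupts mid-sentence: raise only vadStopSecs.
const phone = resolveVad("phone", { vadStopSecs: 0.5 });
console.log(phone.vadConfidence, phone.vadStopSecs); // → 0.75 0.5
```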
Tuning recipes — see VAD & Turn Detection troubleshooting for the symptom→fix table (bot interrupts user / bot waits forever / bot reacts to TV / bot ignores soft speakers).
3. Smart Turn
Silero VAD answers "is there speech?". Smart Turn answers "is the user DONE talking?" — a much harder question.
Without Smart Turn, the bot only uses vadStopSecs as the end-of-turn signal. Users naturally pause mid-sentence (filler words, thinking), and short pauses get mistaken for end-of-turn. Smart Turn looks at the last few seconds of audio + transcript and classifies the utterance as COMPLETE (respond now) or INCOMPLETE (keep listening).
| Setting | Key | Range | Default | What it does |
|---|---|---|---|---|
| Smart Turn Stop | smartTurnStopSecs | 0.5–3.0s | 0.8 web / 1.2 phone | Max wait time after VAD-stop before forcing end-of-turn regardless of Smart Turn's verdict. Raise for slow speakers; lower for snappy bots. |
| Smart Turn Pre-Speech | smartTurnPreSpeechMs | 100-1000 ms | 300 | How much audio context BEFORE the user starts talking is fed to the classifier. Don't touch unless turn-taking feels off. |
| Filter Incomplete Turns | filterIncompleteTurns | toggle | true | When Smart Turn flags a turn as INCOMPLETE, drop the partial transcript instead of sending it to the LLM. This prevents the bot from "answering" half-sentences. We recommend keeping it on. |
The bot's prompt is automatically extended with the Smart Turn instruction set — your agent prompt does NOT need to know about ✓/○/◐ indicators.
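The end-of-turn decision described in this section can be sketched as a small function. This is a simplified model under stated assumptions: the classifier returns only "COMPLETE" or "INCOMPLETE", and silenceSecs is the time elapsed since VAD reported the user stopped. The names decideTurn, Verdict and TurnAction are hypothetical; the real pipeline also handles transcripts and filterIncompleteTurns, which this sketch leaves out.

```typescript
// Hypothetical model of the Smart Turn decision: respond on a COMPLETE
// verdict, otherwise wait until smartTurnStopSecs forces end-of-turn.
type Verdict = "COMPLETE" | "INCOMPLETE";
type TurnAction = "respond" | "keep_listening";

function decideTurn(verdict: Verdict, silenceSecs: number, smartTurnStopSecs: number): TurnAction {
  if (verdict === "COMPLETE") return "respond";           // user is done: answer now
  if (silenceSecs >= smartTurnStopSecs) return "respond"; // max wait reached: force end-of-turn
  return "keep_listening";                                // mid-sentence pause: keep waiting
}

// Using the phone default smartTurnStopSecs = 1.2
console.log(decideTurn("INCOMPLETE", 0.5, 1.2)); // → keep_listening
console.log(decideTurn("INCOMPLETE", 1.3, 1.2)); // → respond
console.log(decideTurn("COMPLETE", 0.2, 1.2));   // → respond
```

Raising smartTurnStopSecs widens the window in which an INCOMPLETE verdict keeps the bot listening, which is why it helps slow speakers.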
4. Audio Quality
| Setting | Key | Options | Default | What it does |
|---|---|---|---|---|
| Noise Suppression | denoisingMode | Off / RNNoise | Off | Strip background noise (HVAC, road, fan) before VAD/STT. RNNoise is free, ~5 ms latency, no setup. For louder environments (cafe babble, human-voice background) contact us about Krisp. |
| STT Smart Format | sttSmartFormat | toggle | true | Adds punctuation, capitalization, formats numbers/dates in the transcript (Deepgram only). Leave on. |
| STT Filler Words | sttFillerWords | toggle | false | Transcribe "um"/"uh"/"like" (Deepgram only). Enable only if your LLM needs filler signals (rare). |
| STT Word Timestamps | sttWordTimestamps | toggle | false | Include per-word timestamps in the transcript (useful for diarization or waveform UIs). Slightly increases payload size. |
| STT Latency (TTFS) | sttLatency | 0–3s | 0 (auto) | Override Deepgram's TTFS p99 budget. Don't touch unless support guides you. |
Pronunciation Dictionary + Boosted Keywords live in the Knowledge tab, not Advanced. See Pronunciation & Boost — they're the biggest accuracy lever when the agent keeps mis-hearing your brand name.
5. TTS Quality (ElevenLabs only)
These only apply when TTS Provider = ElevenLabs in the Voice & STT tab. Deepgram/Groq/OpenAI TTS use their own internal quality presets.
| Setting | Key | Range | Default | What it does |
|---|---|---|---|---|
| TTS Speed (Pace) | ttsSpeed (also ttsPace) | 0.7–1.2 | 1.0 | Playback rate. 1.1 reads slightly faster; 0.95 reads slower, which can help elderly callers. |
| TTS Stability | ttsStability | 0–1 | 0.9 | Higher = more consistent voice across turns. Lower = more expression and emotion. For sales/empathy agents try 0.6. |
| TTS Similarity Boost | ttsSimilarityBoost | 0–1 | 0.75 | How closely to match the source voice. Higher = more accurate but can sound stiff. |
| TTS Style | ttsStyle | 0–1 | 0 | Style exaggeration. Push to 0.3–0.5 for character voices; 0 is the safe default. |
| TTS Speaker Boost | ttsSpeakerBoost | toggle | false | Enhances similarity at the cost of slight latency. Niche; usually leave off. |
| TTS Text Normalization | ttsTextNormalization | auto / on / off | auto | Spells out numbers, dates, currencies. Turn off if you have a custom pronunciation dictionary that handles those. |
| TTS Enable SSML | ttsEnableSsml | toggle | false | Allow <break time="500ms"/>, <emphasis>, etc. in LLM output. Useful when you want to insert pauses for dramatic effect. |
| TTS Sample Rate | ttsSampleRate | 16000 / 24000 / 48000 | 48000 | Output sample rate (Hz). 48 kHz matches osmTalk's native media-layer rate, so no resampling is needed. Drop to 16 kHz only if bandwidth is constrained. |
For non-ElevenLabs TTS providers, these knobs are hidden — they don't have analogous params.
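If you build these settings in your own tooling, a hypothetical client-side helper like the one below can normalize them before saving. The ranges and defaults come from the table above and treating ttsPace as an alias of ttsSpeed follows the table; clamping out-of-range values (rather than rejecting them) and letting ttsSpeed win when both are set are assumptions made for this sketch, not documented dashboard behavior.

```typescript
// Hypothetical normalizer for the ElevenLabs TTS knobs on this page.
interface ElevenLabsTtsSettings {
  ttsSpeed?: number;
  ttsPace?: number; // alias for ttsSpeed
  ttsStability?: number;
  ttsSimilarityBoost?: number;
  ttsStyle?: number;
}

type NormalizedTts = Required<Omit<ElevenLabsTtsSettings, "ttsPace">>;

const clamp = (x: number, lo: number, hi: number): number => Math.min(hi, Math.max(lo, x));

function normalizeTts(s: ElevenLabsTtsSettings): NormalizedTts {
  return {
    // Prefer ttsSpeed, fall back to the ttsPace alias, then the 1.0 default.
    ttsSpeed: clamp(s.ttsSpeed ?? s.ttsPace ?? 1.0, 0.7, 1.2),
    ttsStability: clamp(s.ttsStability ?? 0.9, 0, 1),
    ttsSimilarityBoost: clamp(s.ttsSimilarityBoost ?? 0.75, 0, 1),
    ttsStyle: clamp(s.ttsStyle ?? 0, 0, 1),
  };
}

// An empathy-focused agent: lower stability, with an out-of-range pace.
const tts = normalizeTts({ ttsPace: 1.5, ttsStability: 0.6 });
console.log(tts.ttsSpeed); // → 1.2
```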
6. Conversation Flow
| Setting | Key | Default | What it does |
|---|---|---|---|
| Allow Interruptions | allowInterruptions | true | User can talk over the bot. When detected, bot stops mid-TTS and listens. Set false for IVR-style flows where you don't want callers cutting in. |
| Mute During Welcome | muteDuringWelcome | true | User mic is muted during the opening greeting. Prevents people from talking over your "Hi, this is Maya from..." intro. |
| Mute During Function Calls | muteDuringFunctionCalls | true | User mic is muted while the bot is calling an HTTP tool. Avoids "I'm processing... wait, what did you say?" |
| Enable End-Call Detection | enableEndCallDetection | true | Bot can autonomously hang up when the conversation has clearly ended ("Thanks, bye!"). Saves 10-30s of paid call time per call vs. waiting for the user to disconnect. |
| End-Call Trigger | endCallTrigger | "when the user says goodbye…" | What the LLM looks for to decide the call is over. Customize for compliance scripts (e.g. "only end after confirming customer ID has been verified"). |
| End-Call Message | endCallMessage | "Thank you for calling. Goodbye!" | What the bot says right before hanging up. Empty string = silent hangup. |
Idle detection
If a caller goes silent — fell asleep on speakerphone, walked away, etc. — the bot can nudge them and eventually hang up.
| Setting | Key | Default | What it does |
|---|---|---|---|
| Enable User Idle Detection | enableUserIdleDetection | true | Master switch for nudging on silence. |
| User Idle Timeout | userIdleTimeoutSecs | 10s | Seconds of silence before the bot says "Are you still there?" |
| User Idle Max Retries | userIdleMaxRetries | 3 | How many escalating nudges before the bot ends the call. Sequence: "Are you still there?" → "Hello?" → final attempt → hangup. |
| Audio Idle Timeout | audioIdleTimeoutSecs | 1.5s | Hard pipeline timeout if the audio stream goes completely dark (transport failure, not just user-silent). Lower than user-idle because it indicates a real connection problem. |
| Cancel on Idle Timeout | cancelOnIdleTimeout | true | Whether the pipeline cancels cleanly when audio-idle fires. Usually leave on. |
| Idle Timeout (Pipeline) | idleTimeoutSecs | 300s | Pipeline-level safety net — force end after this much total silence. |
| Max Call Duration | maxCallDurationSecs | 0 (unlimited) | Hard cap on call length in seconds. Useful for outbound campaigns where you want to budget per-call cost. |
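The escalating nudge sequence can be sketched as a tiny state machine. Assumptions: each nudge fires after another userIdleTimeoutSecs of silence, the retry counter resets whenever the caller speaks (not shown), and the wording of the third, final nudge is a placeholder because this page does not specify it. The names onUserIdle and IdleAction are hypothetical.

```typescript
// Hypothetical idle handler: returns the next nudge, or a hangup once
// userIdleMaxRetries nudges have gone unanswered.
type IdleAction = { kind: "nudge"; message: string } | { kind: "hangup" };

const NUDGES = [
  "Are you still there?",
  "Hello?",
  "I'll end the call now if I don't hear from you.", // placeholder wording
];

function onUserIdle(retriesSoFar: number, userIdleMaxRetries = 3): IdleAction {
  if (retriesSoFar >= userIdleMaxRetries) return { kind: "hangup" };
  // Reuse the last scripted message if maxRetries exceeds the script length.
  const message = NUDGES[Math.min(retriesSoFar, NUDGES.length - 1)];
  return { kind: "nudge", message };
}

// Simulate a caller who never speaks again, with default settings.
for (let retries = 0; ; retries++) {
  const action = onUserIdle(retries);
  console.log(retries, action.kind === "nudge" ? action.message : "hangup");
  if (action.kind === "hangup") break;
}
```

With the 10 s default timeout and 3 retries, a silent caller would hear three nudges and be disconnected roughly 40 s after they stopped talking (assuming nudge playback time is negligible).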
7. Recording & Data
| Setting | Key | Default | What it does |
|---|---|---|---|
| Record Calls | enableCallRecording | false | Save call audio (mixed mono WAV) to your storage. The recording URL appears on the call detail page. Free for recordings kept ≤90 days; a small fee applies after that. |
| Enable Multi-Channel Recording | enableMultiChannelRecording | false | Also writes a .stereo.wav file with the user on the left channel and the bot on the right, alongside the mixed mono file. Useful for diarization or stereo analytics. |
| Save Transcripts | enableTranscriptSaving | true | Store per-turn transcripts in the database. Required for the call detail page to show what was said. |
| Save Metrics | enableMetricsSaving | true | Track per-component latency (STT/LLM/TTS), token counts, and cost breakdown. Required for the dashboard's analytics charts. |
| Post-Call Analysis | postCallAnalysis | (no default) | Declarative schema that runs an LLM analyzer after the call ends to extract structured fields (sentiment, disposition, custom JSON). Configure in the Knowledge tab. See Post-Call Analysis. |
Recordings are stored at storage.osmtalk.com (presigned URLs, 7-day TTL by default). Increase retention in Settings → Storage.
8. DTMF / Keypad
Enable DTMF (enableDtmf) — false by default. Turn on to let callers press phone keypad digits as input. Comprehensive guide: DTMF Keypad Input.
Three modes once enabled:
- Prompt-driven — the LLM sees [The caller pressed keypad digit: 1] in its context and branches on that. Best for menus with conversational fallback.
- Direct routes — map specific digits to phone numbers or other agents in your workspace. The bot transfers or swaps immediately, bypassing the LLM: ~0.5 s latency vs ~2–4 s via the prompt.
- Mixed — mapped digits go direct; unmapped digits fall through to the LLM. Common for "1 = Sales (direct), 2 = Support (direct), anything else → LLM tries to help."
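Mixed mode can be sketched as a lookup with an LLM fallback. The context-message format is the one quoted above; the route shape, the handleDigit name, the agent ID, and the phone number are hypothetical examples, not osmTalk's configuration schema.

```typescript
// Hypothetical mixed-mode DTMF handler: mapped digits bypass the LLM,
// unmapped digits are injected into the LLM context instead.
type DtmfRoute =
  | { kind: "transfer_phone"; phoneNumber: string }
  | { kind: "swap_agent"; agentId: string };

type DtmfResult =
  | { handledBy: "direct"; route: DtmfRoute }
  | { handledBy: "llm"; contextMessage: string };

function handleDigit(digit: string, routes: Record<string, DtmfRoute>): DtmfResult {
  const route = routes[digit];
  if (route) return { handledBy: "direct", route }; // fast path, no LLM round-trip
  return { handledBy: "llm", contextMessage: `[The caller pressed keypad digit: ${digit}]` };
}

const routes: Record<string, DtmfRoute> = {
  "1": { kind: "transfer_phone", phoneNumber: "+15555550100" }, // Sales (fictional number)
  "2": { kind: "swap_agent", agentId: "support-agent" },        // Support (hypothetical ID)
};

console.log(handleDigit("1", routes).handledBy); // → direct
const fallback = handleDigit("9", routes);
if (fallback.handledBy === "llm") console.log(fallback.contextMessage); // → [The caller pressed keypad digit: 9]
```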
9. Transfer
Configure Call Transfer in the Tools tab → Transfer Settings. Two flavors:
- Transfer to Human — single destination phone number. When the LLM calls the transfer_to_human tool, the bot reads a hold message, captures a warm-handoff summary, and SIP-bridges the call.
- Transfer to Agents — multiple AI agents the LLM can swap into (e.g. "if user asks about billing, hand off to Billing Bot"). The same call continues with a new system prompt and, optionally, a new voice.
Full guide: Call Transfer.
10. Voicemail Detection (Outbound Phone Calls)
For outbound campaigns where you don't want to burn 60 seconds talking to an answering machine.
| Setting | Key | Default | What it does |
|---|---|---|---|
| Enable Voicemail Detection | enableVoicemailDetection | false | Master switch. Only applies to outbound phone calls (channel = phone). |
| Voicemail Response Delay | voicemailResponseDelay | 2.0s | Wait before classifying the first audio as voicemail. Too short = false positives on slow "hello?" answers; too long = eats into the voicemail recording window. |
| Voicemail Message | voicemailMessage | "" | Optional message to leave when voicemail is detected. Leave empty for silent hangup. Spoken at max TTS speed to fit in the typical 30s voicemail window. |
Requires an OpenAI API key in provider keys — the classifier uses gpt-4o-mini (separate from your agent's LLM choice). Full guide: Voicemail Detection.
Reset to defaults
Each section has a small Reset link in its header. Resets only that section's knobs back to the recommended channel-aware defaults — doesn't touch settings in other sections.
There's also a global Reset all Advanced Settings at the bottom of the page — useful when you've tuned an agent into a weird state and want to start over without recreating it.
Per-call overrides (via SDK)
Any setting on this page can be overridden for a single call without touching the agent itself:
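```typescript
// Assumes `client` is an initialized osmTalk SDK client and that
// agentId, phoneNumberId, and destination are already defined.
await client.calls.outbound({
  agentId,
  phoneNumberId,
  destination,
  assistantOverride: {
    welcomeMessage: "Special opening for THIS call only",
    llmModel: "gpt-4o", // override the LLM model for this call only
    settings: {
      ttsSpeed: 1.1,    // slightly faster TTS for this call
      vadStopSecs: 0.4, // wait longer before ending a turn (e.g. slow or elderly speakers)
    },
  },
});
```

Useful for A/B testing voice and model variations without creating multiple agents. See SDK examples and Outbound Calls.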
See also
- Voice & STT setup — picking provider + model + voice (Deepgram, Sarvam, ElevenLabs, etc.)
- VAD & Turn Detection deep dive — symptom-to-fix for "bot keeps interrupting" / "bot waits too long"
- DTMF Keypad Input — phone keypad menus + multi-digit PIN entry + direct routes
- Call Transfer — handoff to humans or other agents
- Voicemail Detection — auto-detect answering machines on outbound
- Post-Call Analysis — extract structured outcomes after the call
- Pronunciation & Boost — make STT and TTS get your brand name right
- Agent Config full reference — every knob explained at the bot-pipeline level, plus tuning recipes (fastest agent / most natural / strict compliance / Indic multilingual / noisy line / elderly callers)