Advanced Settings

Every knob in the agent config — what it does, default, when to change.

This page is the complete reference for the Advanced Settings section on each agent. Every toggle and slider in the dashboard is here, with a one-line "when would I change this?" so you can scan for the right knob.

For a conceptual walk-through (how STT/LLM/TTS/VAD fit together), see the Agent Config Guide in the examples repo.

Where to find these settings in the dashboard

app.osmtalk.com
   └─ Agents
       └─ click your agent
            └─ Config tab
                ├─ Identity         (name + greeting)
                ├─ Instructions     (system prompt)
                ├─ Voice & STT      (provider/model/voice)
                ├─ Tools            (HTTP + client + MCP + transfer + DTMF routes)
                ├─ Knowledge        (pronunciation + boosted keywords + post-call analysis)
                └─ Advanced ◀── all the knobs below
                     ├─ Model & Reasoning
                     ├─ Voice Activity Detection (VAD)
                     ├─ Smart Turn
                     ├─ Audio Quality (noise suppression, STT extras)
                     ├─ TTS Quality (ElevenLabs-specific)
                     ├─ Conversation Flow (interruptions, mute, idle, end-call)
                     ├─ Recording & Data
                     └─ DTMF / Keypad

Every section is collapsible. Defaults are sane — only open what you need.


1. Model & Reasoning

Controls the LLM that generates the agent's responses.

| Setting | Key | Range | Default | What it does |
| --- | --- | --- | --- | --- |
| Temperature | temperature | 0–2 | 0.7 | Creativity vs. determinism. Lower (0.2) for strict scripts (compliance, IVR). Higher (1.0+) for warm/playful agents. |
| Max Tokens | maxTokens | 1–8192 | 1024 | Hard cap per response. Voice agents do best at 200–500: the bot is faster and rambles less. |
| Frequency Penalty | frequencyPenalty | -2 to 2 | 0 | Penalizes word repetition. Bump to 0.3 if the agent keeps saying the same phrase. |
| Presence Penalty | presencePenalty | -2 to 2 | 0 | Encourages topic diversity. Usually leave at 0 for voice. |
| Enable Reasoning | enableReasoning | toggle | false | Turn on for o1/o3/o4/gpt-5* models, which reason before responding. Slower (300 ms–3 s extra) but better on complex tasks. |
| Reasoning Effort | reasoningEffort | low / medium / high / xhigh | medium | Only applies when Enable Reasoning is on. Higher = slower, more correct, more tokens spent. |
| Max Context Tokens | maxContextTokens | (auto) | varies | Conversation-history token budget. If long calls cause TTFB to grow, lower this. |
| Enable Context Summarization | enableContextSummarization | toggle | true | Auto-summarizes old turns when context grows. Keeps long calls responsive. |
| Max Unsummarized Messages | maxUnsummarizedMessages | 5–50 | 20 | How many recent turns are kept verbatim before older ones get summarized into a single rolling summary. |
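
The numeric ranges in this table are easy to check before saving a config. The following is a minimal sketch, not osmTalk's own validation: the ModelSettings type, the RANGES map, and validateModelSettings are hypothetical names introduced for illustration, and only the keys and ranges come from the table.

```typescript
// Hypothetical helper: only the keys and ranges are taken from the table above.
type ModelSettings = {
  temperature?: number;
  maxTokens?: number;
  frequencyPenalty?: number;
  presencePenalty?: number;
  maxUnsummarizedMessages?: number;
};

const RANGES: Record<keyof ModelSettings, [number, number]> = {
  temperature: [0, 2],
  maxTokens: [1, 8192],
  frequencyPenalty: [-2, 2],
  presencePenalty: [-2, 2],
  maxUnsummarizedMessages: [5, 50],
};

// Returns one message per out-of-range value; an empty array means all set values are in range.
function validateModelSettings(settings: ModelSettings): string[] {
  const errors: string[] = [];
  for (const key of Object.keys(RANGES) as (keyof ModelSettings)[]) {
    const value = settings[key];
    if (value === undefined) continue; // unset keys keep the agent default
    const [min, max] = RANGES[key];
    if (value < min || value > max) {
      errors.push(`${key}=${value} is outside ${min}..${max}`);
    }
  }
  return errors;
}

// A strict, script-like profile using the values suggested in the table.
const strictScript: ModelSettings = {
  temperature: 0.2,
  maxTokens: 300,
  frequencyPenalty: 0.3,
};
```

Calling validateModelSettings(strictScript) returns an empty array, while a value such as temperature: 2.5 produces one error message.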

GPT-5 / o-series auto-routing — these models work via OpenAI's /v1/responses endpoint, not the legacy /v1/chat/completions. The bot detects them by name (gpt-5*, o1*, o3*, o4*) and routes automatically — you just pick the model in the Voice & STT tab.
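
Name-based routing of this kind can be pictured as a prefix check. This is an illustrative sketch only: the bot's real matching logic is not shown on this page, and usesResponsesEndpoint and endpointFor are hypothetical names. The prefixes and the two endpoint paths are the ones named above.

```typescript
// Prefixes taken from the paragraph above (gpt-5*, o1*, o3*, o4*).
const RESPONSES_API_PREFIXES = ["gpt-5", "o1", "o3", "o4"];

// Hypothetical check: true when the model name starts with one of the listed prefixes.
function usesResponsesEndpoint(model: string): boolean {
  const name = model.trim().toLowerCase();
  return RESPONSES_API_PREFIXES.some((prefix) => name.startsWith(prefix));
}

// Maps a model name to the endpoint path it would be routed to under this assumption.
function endpointFor(model: string): string {
  return usesResponsesEndpoint(model) ? "/v1/responses" : "/v1/chat/completions";
}
```

Note that gpt-4o does not match: it starts with gpt-4, not o4, so it stays on /v1/chat/completions.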


2. Voice Activity Detection (VAD)

VAD answers the question "is the user speaking right now?". The bot uses Silero VAD for this.

| Setting | Key | Range | Default (web) | Default (phone) | What it does |
| --- | --- | --- | --- | --- | --- |
| VAD Confidence | vadConfidence | 0–1 | 0.70 | 0.75 | How confident the detector must be before declaring "speech detected". Higher = stricter (rejects TV/babble but might miss quiet talkers). |
| VAD Start Duration | vadStartSecs | 0.05–1.0s | 0.20 | 0.20 | How long speech must persist before VAD says "user is talking". Lower = more responsive but reacts to coughs. |
| VAD Stop Duration | vadStopSecs | 0.1–1.5s | 0.20 | 0.20 | How long silence must last before VAD says "user stopped". The most impactful knob: raise to 0.4–0.6 if the bot interrupts mid-sentence. |
| VAD Min Volume | vadMinVolume | 0–1 | 0.60 | 0.70 | Minimum audio volume to consider as speech. Higher rejects whispers/background; lower captures quiet speakers. |

Defaults are channel-aware — the bot picks 0.75 confidence + 0.7 min_volume for phone audio (noisier) and 0.70 + 0.60 for web (cleaner). You only need to override when the defaults misbehave for your specific environment.
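
As a rough picture of how channel-aware defaults and overrides combine, the sketch below merges explicit overrides on top of the per-channel values from the table. The names Channel, VadSettings, VAD_DEFAULTS, and resolveVad are hypothetical; only the setting keys and default numbers come from this page.

```typescript
type Channel = "web" | "phone";

type VadSettings = {
  vadConfidence: number;
  vadStartSecs: number;
  vadStopSecs: number;
  vadMinVolume: number;
};

// Default values copied from the table above.
const VAD_DEFAULTS: Record<Channel, VadSettings> = {
  web: { vadConfidence: 0.7, vadStartSecs: 0.2, vadStopSecs: 0.2, vadMinVolume: 0.6 },
  phone: { vadConfidence: 0.75, vadStartSecs: 0.2, vadStopSecs: 0.2, vadMinVolume: 0.7 },
};

// Hypothetical resolver: start from the channel defaults, then apply only the keys you override.
function resolveVad(channel: Channel, overrides: Partial<VadSettings> = {}): VadSettings {
  return { ...VAD_DEFAULTS[channel], ...overrides };
}

// Example: a phone agent that was cutting callers off mid-sentence.
const patientPhoneVad = resolveVad("phone", { vadStopSecs: 0.5 });
```

Here patientPhoneVad keeps the phone defaults for confidence and volume and changes only vadStopSecs.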

Tuning recipes — see VAD & Turn Detection troubleshooting for the symptom→fix table (bot interrupts user / bot waits forever / bot reacts to TV / bot ignores soft speakers).


3. Smart Turn

Silero VAD answers "is there speech?". Smart Turn answers "is the user DONE talking?" — a much harder question.

Without Smart Turn, the bot only uses vadStopSecs as the end-of-turn signal. Users naturally pause mid-sentence (filler words, thinking), and short pauses get mistaken for end-of-turn. Smart Turn looks at the last few seconds of audio + transcript and classifies the utterance as COMPLETE (respond now) or INCOMPLETE (keep listening).

| Setting | Key | Range | Default | What it does |
| --- | --- | --- | --- | --- |
| Smart Turn Stop | smartTurnStopSecs | 0.5–3.0s | 0.8 web / 1.2 phone | Max wait time after VAD-stop before forcing end-of-turn regardless of Smart Turn's verdict. Raise for slow speakers; lower for snappy bots. |
| Smart Turn Pre-Speech | smartTurnPreSpeechMs | 100–1000 ms | 300 | How much audio context BEFORE the user starts talking is fed to the classifier. Don't touch unless turn-taking feels off. |
| Filter Incomplete Turns | filterIncompleteTurns | toggle | true | When Smart Turn flags a turn as INCOMPLETE, drop the partial transcript instead of sending it to the LLM. Prevents the bot from "answering" half-sentences. Recommend keeping on. |
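
The interaction between the classifier verdict and smartTurnStopSecs can be sketched as a small decision function. This is a simplified model based only on the descriptions above, not the bot's actual implementation; Verdict, TurnAction, and decideTurn are hypothetical names.

```typescript
type Verdict = "COMPLETE" | "INCOMPLETE";
type TurnAction = "respond" | "keep_listening";

// Hypothetical model of the end-of-turn decision made after VAD reports that speech stopped.
function decideTurn(
  verdict: Verdict,
  secsSinceVadStop: number,
  smartTurnStopSecs: number,
): TurnAction {
  // A COMPLETE verdict ends the turn immediately.
  if (verdict === "COMPLETE") return "respond";
  // INCOMPLETE: keep listening, but force end-of-turn once smartTurnStopSecs has elapsed.
  return secsSinceVadStop >= smartTurnStopSecs ? "respond" : "keep_listening";
}
```

With the web default of 0.8 s, an INCOMPLETE verdict 0.3 s after VAD-stop keeps the bot listening, while the same verdict at 0.8 s or later forces a response.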

The bot's prompt is automatically extended with the Smart Turn instruction set, so your agent prompt does NOT need to know about the turn-completion indicators.


4. Audio Quality

| Setting | Key | Options | Default | What it does |
| --- | --- | --- | --- | --- |
| Noise Suppression | denoisingMode | Off / RNNoise | Off | Strip background noise (HVAC, road, fan) before VAD/STT. RNNoise is free, ~5 ms latency, no setup. For louder environments (cafe babble, human-voice background) contact us about Krisp. |
| STT Smart Format | sttSmartFormat | toggle | true | Adds punctuation, capitalization, formats numbers/dates in the transcript (Deepgram only). Leave on. |
| STT Filler Words | sttFillerWords | toggle | false | Transcribe "um"/"uh"/"like" (Deepgram only). Enable only if your LLM needs filler signals (rare). |
| STT Word Timestamps | sttWordTimestamps | toggle | false | Include per-word timestamps in the transcript (useful for diarization / waveform UIs). Adds slight payload size. |
| STT Latency (TTFS) | sttLatency | 0–3s | 0 (auto) | Override Deepgram's TTFS p99 budget. Don't touch unless support guides you. |

Pronunciation Dictionary + Boosted Keywords live in the Knowledge tab, not Advanced. See Pronunciation & Boost — they're the biggest accuracy lever when the agent keeps mis-hearing your brand name.


5. TTS Quality (ElevenLabs only)

These only apply when TTS Provider = ElevenLabs in the Voice & STT tab. Deepgram/Groq/OpenAI TTS use their own internal quality presets.

| Setting | Key | Range | Default | What it does |
| --- | --- | --- | --- | --- |
| TTS Speed (Pace) | ttsSpeed (also ttsPace) | 0.7–1.2 | 1.0 | Playback rate. 1.1 reads slightly faster; 0.95 slower for elderly callers. |
| TTS Stability | ttsStability | 0–1 | 0.9 | Higher = more consistent voice across turns. Lower = more expression and emotion. For sales/empathy agents try 0.6. |
| TTS Similarity Boost | ttsSimilarityBoost | 0–1 | 0.75 | How closely to match the source voice. Higher = more accurate but can sound stiff. |
| TTS Style | ttsStyle | 0–1 | 0 | Style exaggeration. Push to 0.3–0.5 for character voices. 0 is a safe default. |
| TTS Speaker Boost | ttsSpeakerBoost | toggle | false | Enhances similarity at the cost of slight latency. Niche; usually leave off. |
| TTS Text Normalization | ttsTextNormalization | auto / on / off | auto | Spells out numbers, dates, currencies. Turn off if you have a custom pronunciation dictionary that handles those. |
| TTS Enable SSML | ttsEnableSsml | toggle | false | Allow <break time="500ms"/>, <emphasis>, etc. in LLM output. Useful when you want to insert pauses for dramatic effect. |
| TTS Sample Rate | ttsSampleRate | 16000 / 24000 / 48000 | 48000 | Audio quality (Hz). 48 kHz matches osmTalk's media-layer native rate, so no resampling is needed. Drop to 16 kHz only if bandwidth is constrained. |

For non-ElevenLabs TTS providers, these knobs are hidden — they don't have analogous params.


6. Conversation Flow

| Setting | Key | Default | What it does |
| --- | --- | --- | --- |
| Allow Interruptions | allowInterruptions | true | User can talk over the bot. When detected, bot stops mid-TTS and listens. Set false for IVR-style flows where you don't want callers cutting in. |
| Mute During Welcome | muteDuringWelcome | true | User mic is muted during the opening greeting. Prevents people from talking over your "Hi, this is Maya from..." intro. |
| Mute During Function Calls | muteDuringFunctionCalls | true | User mic is muted while the bot is calling an HTTP tool. Avoids "I'm processing... wait, what did you say?" |
| Enable End-Call Detection | enableEndCallDetection | true | Bot can autonomously hang up when the conversation has clearly ended ("Thanks, bye!"). Saves 10–30s of paid call time per call vs. waiting for the user to disconnect. |
| End-Call Trigger | endCallTrigger | "when the user says goodbye…" | What the LLM looks for to decide the call is over. Customize for compliance scripts (e.g. "only end after confirming customer ID has been verified"). |
| End-Call Message | endCallMessage | "Thank you for calling. Goodbye!" | What the bot says right before hanging up. Empty string = silent hangup. |

Idle detection

If a caller goes silent — fell asleep on speakerphone, walked away, etc. — the bot can nudge them and eventually hang up.

| Setting | Key | Default | What it does |
| --- | --- | --- | --- |
| Enable User Idle Detection | enableUserIdleDetection | true | Master switch for nudging on silence. |
| User Idle Timeout | userIdleTimeoutSecs | 10s | Seconds of silence before the bot says "Are you still there?" |
| User Idle Max Retries | userIdleMaxRetries | 3 | How many escalating nudges before the bot ends the call. Sequence: "Are you still there?" → "Hello?" → final attempt → hangup. |
| Audio Idle Timeout | audioIdleTimeoutSecs | 1.5s | Hard pipeline timeout if the audio stream goes completely dark (transport failure, not just user-silent). Lower than user-idle because it indicates a real connection problem. |
| Cancel on Idle Timeout | cancelOnIdleTimeout | true | Whether the pipeline cancels cleanly when audio-idle fires. Usually leave on. |
| Idle Timeout (Pipeline) | idleTimeoutSecs | 300s | Pipeline-level safety net: force end after this much total silence. |
| Max Call Duration | maxCallDurationSecs | 0 (unlimited) | Hard cap on call length in seconds. Useful for outbound campaigns where you want to budget per-call cost. |
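
The nudge-then-hangup escalation can be modeled as a counter over consecutive idle timeouts. The sketch below is an interpretation of the table, not the bot's code: onUserIdleTimeout and worstCaseSecsToHangup are hypothetical, and it assumes that each step waits one full userIdleTimeoutSecs and that the hangup fires on the timeout after the final nudge.

```typescript
type IdleAction = { kind: "nudge"; attempt: number } | { kind: "hangup" };

// Hypothetical: decide what to do each time the user-idle timer fires.
// timeoutsSoFar counts idle timeouts that already fired since the user last spoke.
function onUserIdleTimeout(timeoutsSoFar: number, userIdleMaxRetries: number): IdleAction {
  const attempt = timeoutsSoFar + 1;
  return attempt <= userIdleMaxRetries ? { kind: "nudge", attempt } : { kind: "hangup" };
}

// Rough silence budget before hangup, ignoring the time the nudges themselves take to speak.
function worstCaseSecsToHangup(userIdleTimeoutSecs: number, userIdleMaxRetries: number): number {
  return userIdleTimeoutSecs * (userIdleMaxRetries + 1);
}
```

Under these assumptions the defaults (10 s, 3 retries) give three nudges and roughly 40 s of silence before the call ends, which is useful when estimating the cost of abandoned calls.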

7. Recording & Data

| Setting | Key | Default | What it does |
| --- | --- | --- | --- |
| Record Calls | enableCallRecording | false | Save call audio (mixed mono WAV) to your storage. Recording URL appears on the call detail page. Free if you keep recordings ≤90 days; small fee after. |
| Enable Multi-Channel Recording | enableMultiChannelRecording | false | Separate user-L / bot-R channels in a .stereo.wav file alongside the mixed mono. Useful for diarization or stereo analytics. |
| Save Transcripts | enableTranscriptSaving | true | Store per-turn transcripts in the database. Required for the call detail page to show what was said. |
| Save Metrics | enableMetricsSaving | true | Track per-component latency (STT/LLM/TTS), token counts, and cost breakdown. Required for the dashboard's analytics charts. |
| Post-Call Analysis | postCallAnalysis | (no default) | Declarative schema that runs an LLM analyzer after the call ends to extract structured fields (sentiment, disposition, custom JSON). Configure in the Knowledge tab. See Post-Call Analysis. |

Recordings are stored at storage.osmtalk.com (presigned URLs, 7-day TTL by default). Increase retention in Settings → Storage.


8. DTMF / Keypad

Enable DTMF (enableDtmf) — false by default. Turn on to let callers press phone keypad digits as input. Comprehensive guide: DTMF Keypad Input.

Three modes once enabled:

  1. Prompt-driven — the LLM sees [The caller pressed keypad digit: 1] in its context and branches on that. Best for menus with conversational fallback.
  2. Direct routes — map specific digits to phone numbers or other agents in your workspace. The bot transfers/swaps immediately, bypassing the LLM. ~0.5s latency vs ~2-4s via prompt.
  3. Mixed — mapped digits go direct, unmapped fall through to the LLM. Common for "1 = Sales (direct), 2 = Support (direct), anything else → LLM tries to help."
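
Mixed mode amounts to a lookup with a fallback. The following sketch illustrates that idea under stated assumptions: the DirectRoute and DtmfResult types, the handleDigit function, the phone number, and the agent ID are all hypothetical; only the bracketed context string format is taken from the description of prompt-driven mode.

```typescript
type DirectRoute =
  | { kind: "phone"; number: string }
  | { kind: "agent"; agentId: string };

type DtmfResult =
  | { kind: "direct"; route: DirectRoute } // bypasses the LLM
  | { kind: "llm"; contextMessage: string }; // falls through to the prompt

// Hypothetical dispatcher: mapped digits go direct, unmapped digits are handed to the LLM.
function handleDigit(digit: string, routes: Map<string, DirectRoute>): DtmfResult {
  const route = routes.get(digit);
  if (route !== undefined) return { kind: "direct", route };
  return { kind: "llm", contextMessage: `[The caller pressed keypad digit: ${digit}]` };
}

// Example mapping: 1 = Sales line, 2 = Support agent; the values are placeholders.
const exampleRoutes = new Map<string, DirectRoute>([
  ["1", { kind: "phone", number: "+15550100" }],
  ["2", { kind: "agent", agentId: "agent_support" }],
]);
```

Pressing 1 or 2 resolves to a direct route, while any other digit yields the context message that the LLM would see.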

9. Transfer

Configure Call Transfer in the Tools tab → Transfer Settings. Two flavors:

  • Transfer to Human — single destination phone number. When the LLM calls the transfer_to_human tool, the bot reads a hold message, captures a warm-handoff summary, and SIP-bridges the call.
  • Transfer to Agents — multiple AI agents the LLM can swap into (e.g. "if user asks about billing, hand off to Billing Bot"). Same call continues with a new system prompt + optionally a new voice.

Full guide: Call Transfer.


10. Voicemail Detection (Outbound Phone Calls)

For outbound campaigns where you don't want to burn 60 seconds talking to an answering machine.

| Setting | Key | Default | What it does |
| --- | --- | --- | --- |
| Enable Voicemail Detection | enableVoicemailDetection | false | Master switch. Only applies to outbound phone calls (channel = phone). |
| Voicemail Response Delay | voicemailResponseDelay | 2.0s | Wait before classifying the first audio as voicemail. Too short = false positives on slow "hello?" answers; too long = wastes voicemail time. |
| Voicemail Message | voicemailMessage | "" | Optional message to leave when voicemail is detected. Leave empty for silent hangup. Spoken at max TTS speed to fit in the typical 30s voicemail window. |

Requires an OpenAI API key in provider keys — the classifier uses gpt-4o-mini (separate from your agent's LLM choice). Full guide: Voicemail Detection.


Reset to defaults

Each section has a small Reset link in its header. Resets only that section's knobs back to the recommended channel-aware defaults — doesn't touch settings in other sections.

There's also a global Reset all Advanced Settings at the bottom of the page — useful when you've tuned an agent into a weird state and want to start over without recreating it.


Per-call overrides (via SDK)

Any setting on this page can be overridden for a single call without touching the agent itself:

await client.calls.outbound({
  agentId,
  phoneNumberId,
  destination,
  assistantOverride: {
    welcomeMessage: "Special opening for THIS call only",
    llmModel: "gpt-4o",                      // override model
    settings: {
      ttsSpeed: 1.1,                         // faster TTS for this call
      vadStopSecs: 0.4,                      // looser turn-taking for elderly callers
    },
  },
});

Useful for A/B testing voice/model variations without creating multiple agents. See SDK examples and Outbound Calls.
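
For A/B tests it helps if the same destination always lands in the same variant. The sketch below shows one way to do that with a simple string hash; it is an assumption-laden illustration, not part of the SDK. AssistantOverride here is a hypothetical local type covering only fields shown in the example above, and VARIANTS and pickVariant are made-up names.

```typescript
// Hypothetical local type: only fields that appear in the SDK example above.
type AssistantOverride = {
  llmModel?: string;
  settings?: { ttsSpeed?: number; vadStopSecs?: number };
};

type VariantName = "A" | "B";

// Two variations of the same agent; the values are illustrative.
const VARIANTS: Record<VariantName, AssistantOverride> = {
  A: { settings: { ttsSpeed: 1.0 } },
  B: { settings: { ttsSpeed: 1.1 } },
};

// Deterministic assignment: the same destination always maps to the same variant.
function pickVariant(destination: string): VariantName {
  let hash = 0;
  for (let i = 0; i < destination.length; i++) {
    // Simple 32-bit rolling hash; ">>> 0" keeps the value an unsigned 32-bit integer.
    hash = (hash * 31 + destination.charCodeAt(i)) >>> 0;
  }
  return hash % 2 === 0 ? "A" : "B";
}
```

The chosen override can then be passed as assistantOverride: VARIANTS[pickVariant(destination)] in the outbound call, and the variant name stored alongside the call so results can be compared later.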


See also