DTMF Keypad Input

Let callers press keypad digits to navigate menus (press 1 for English, 2 for Hindi, etc.).

DTMF (dual-tone multi-frequency) is the tone a phone sends when a key is pressed. osmTalk captures these digits and feeds them to your agent's LLM so it can react to phone-menu-style input.

Works on both:

  • Phone calls (SIP) — digits captured from the phone line automatically
  • Web widget — a keypad UI appears during voice calls

Example 1: Language Selection Menu

Goal: the caller presses 1 to continue in English or 2 to switch to Hindi.

Step 1 — Enable DTMF

In agent settings → Advanced Settings → Keypad (DTMF) → turn Enable Keypad Input on. (Off by default; opt in per agent.)

Step 2 — Update your system prompt

You are Priya, a support agent for BlueBank.

When the call starts, greet the caller and ask them to choose a language:
  "For English, press 1. Hindi ke liye, 2 dabayein."

The caller's keypad press will appear in the conversation as:
  [The caller pressed keypad digit: 1]

Rules:
- If you see "keypad digit: 1" → continue in English
- If you see "keypad digit: 2" → switch to Hindi (हिन्दी) for the rest of the call
- If you see any other digit → say "Sorry, that option isn't available. Please press 1 for English or 2 for Hindi."

Once the caller has selected a language, DO NOT mention the menu again. Proceed to ask how you can help.

Step 3 — What the caller experiences

Bot: "Hi, you've reached BlueBank. For English press 1, Hindi ke liye 2 dabayein."
[caller taps 2 on their phone]
Bot: "ठीक है, मैं हिंदी में बात करूँगी। मैं आपकी कैसे मदद कर सकती हूँ?"

Example 2: Account Verification via Keypad

Goal: caller enters their 4-digit PIN using the keypad instead of speaking it (more secure, avoids mic-echo).

System prompt

When the caller wants to check their account balance, ask them:
  "Please enter your 4-digit PIN using the keypad."

The bot batches rapid keypad presses and delivers them as a single message:
  [The caller pressed keypad digits: 7420]

Extract the PIN and call verify_pin(pin="7420"). Never ask the caller to
speak their PIN aloud.

How batching works

The bot waits 1.2 seconds after the last keypress before sending digits to the LLM. So when a caller types a 4-digit PIN:

  • Press 7... (wait)
  • Press 4... (wait)
  • Press 2... (wait)
  • Press 0... (wait 1.2s — no more digits)
  • Bot sees: [The caller pressed keypad digits: 7420] → one LLM turn

A single isolated press (like a menu selection) arrives as: [The caller pressed keypad digit: 5]
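
The debounce described above can be sketched in a few lines. This is a minimal illustration, not osmTalk's actual implementation: the `DigitBatcher` class and its `press`/`poll` methods are hypothetical names, and timestamps are passed in explicitly so the behavior is easy to follow without real timers.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DEBOUNCE_SECONDS = 1.2  # debounce window stated in the text above


@dataclass
class DigitBatcher:
    """Hypothetical helper that groups presses into one LLM message."""

    debounce: float = DEBOUNCE_SECONDS
    _buffer: List[str] = field(default_factory=list, init=False)
    _last_press: Optional[float] = field(default=None, init=False)

    def press(self, digit: str, now: float) -> None:
        # Every press extends the batch and restarts the debounce window.
        self._buffer.append(digit)
        self._last_press = now

    def poll(self, now: float) -> Optional[str]:
        """Return the LLM message once the window has elapsed, else None."""
        if not self._buffer or self._last_press is None:
            return None
        if now - self._last_press < self.debounce:
            return None  # caller may still be typing
        digits = "".join(self._buffer)
        self._buffer.clear()
        self._last_press = None
        noun = "digit" if len(digits) == 1 else "digits"
        return f"[The caller pressed keypad {noun}: {digits}]"
```

With presses at 0.0s, 0.5s, 1.0s and 1.5s, polling at 2.0s returns nothing because the window is still open, while polling at 3.0s returns the single batched message.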

What the caller hears

Caller: "Can you tell me my balance?"
Bot:    "Sure, please enter your 4-digit PIN on the keypad."
[caller taps 7, 4, 2, 0]
Bot:    "Thanks, verifying... Your balance is ₹12,450."

Example 3: Customer Support Tree

Goal: classic IVR-style tree for a 3-department business.

System prompt

When the call connects, play this menu:
  "Welcome to TechCo. Press 1 for Sales, 2 for Technical Support, 3 for Billing, or stay on the line to speak to any agent."

Wait for a keypad press OR wait 8 seconds for a voice response.

Keypad responses:
- "keypad digit: 1" → "Connecting you to Sales." then call transfer_to_human with department="sales"
- "keypad digit: 2" → "Connecting you to Technical Support." then call transfer_to_human with department="support"
- "keypad digit: 3" → Proceed to billing flow (ask "What's your account number?")

Voice responses:
- Any keyword matching sales/buy/pricing → treat as if they pressed 1
- Any keyword matching broken/not working/help → treat as if they pressed 2
- Any keyword matching invoice/bill/refund → treat as if they pressed 3
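
In osmTalk the LLM applies these voice rules from the prompt itself, so no code is required. Purely to make the intended mapping concrete, here is a rough sketch of the same rules as a lookup. The `KEYWORDS` table and `digit_for_utterance` function are hypothetical, and plain substring matching is a simplification of what an LLM would do.

```python
from typing import Optional

# Keyword lists copied from the voice-response rules in the prompt above.
KEYWORDS = {
    "1": ("sales", "buy", "pricing"),
    "2": ("broken", "not working", "help"),
    "3": ("invoice", "bill", "refund"),
}


def digit_for_utterance(text: str) -> Optional[str]:
    """Return the menu digit whose keywords appear in the utterance, or None."""
    lowered = text.lower()
    for digit, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return digit
    return None  # no keyword matched: let the conversation continue normally
```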

Web Widget Keypad

Users on the website-embedded widget can tap keypad digits too. During a voice call, a # icon appears next to the mute button:

[🎤 Mute]  [# Keypad]

Tapping opens a 3×4 grid (1 2 3 / 4 5 6 / 7 8 9 / * 0 #). Each tap sends the digit to the bot via osmTalk's real-time data channel — same LLM prompt format as phone DTMF.

Works great for:

  • Testing your menu flow before taking real calls
  • Users on laptops / mobile web who want silent/private input
  • Accessibility — users who prefer not to speak

Troubleshooting

| Problem | Likely cause | Fix |
|---|---|---|
| Bot doesn't react to keypad presses | enableDtmf turned off | Enable it in agent settings |
| DTMF works on web but not phone | SIP provider not forwarding RFC 2833 | Contact your SIP provider (Plivo supports this by default) |
| Bot says "Invalid option" every time | System prompt doesn't handle the [The caller pressed keypad digit: X] pattern | Update the system prompt with the routing rules shown above |
| Caller hears menu twice | System prompt missing a "DO NOT repeat menu once selected" rule | Add that rule to stop the menu replaying |

Direct Keypad Routes (Phone or AI Agent)

Skip the LLM entirely for specific keypad digits. Configure up to 10 routes per agent, each mapping one or more digits (0–9, *, #) to either:

  • Phone — a real phone number (SIP transfer; caller leaves the bot)
  • Agent — another AI agent in your workspace (mid-call swap; same call, new system prompt + optional voice)

When the caller presses a mapped digit, the bot announces and routes immediately — no LLM in the loop.

How it works

  1. Caller dials your number → hears greeting → hears menu ("Press 1 for Sales, 2 for Support")
  2. Caller presses 1
  3. Bot detects the mapped route BEFORE the LLM sees the press
  4. Bot speaks sayBeforeTransfer (e.g. "Connecting you to Sales. Please hold.")
  5. Bot initiates SIP transfer to the mapped number
  6. Caller connects to the human agent; bot exits the call

Latency: ~0.5s between digit press and transfer start (vs ~2-4s via LLM tool).
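
The routing decision in step 3 can be pictured as a lookup that runs before the LLM is involved. The sketch below is an assumption about the general shape only: `Route` and `handle_press` are hypothetical names, and the real backend is not shown in this document.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Route:
    """Hypothetical in-memory form of one configured keypad route."""

    label: str
    destination_type: str  # "phone" or "agent"
    destination: str       # phone number or agent id
    say_before_transfer: str = "Connecting you now. Please hold."


def handle_press(digit: str, routes: Dict[str, Route]) -> Tuple[str, Union[Route, str]]:
    """Decide whether a press is routed directly or passed to the LLM."""
    route = routes.get(digit)
    if route is not None:
        # Mapped digit: announce and route without an LLM turn.
        return ("route", route)
    # Unmapped digit: falls through to the LLM in the documented format.
    return ("llm", f"[The caller pressed keypad digit: {digit}]")
```

A caller pressing a mapped digit gets the `("route", ...)` branch, where the bot would speak `say_before_transfer` and start the transfer; any other digit yields the message the LLM normally sees.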

Example — mixed phone + agent IVR

In agent config → Advanced → Keypad (DTMF) → Direct Transfer Routes:

| Key | Label | Type | Destination |
|---|---|---|---|
| 1 | Sales | Phone | +919876543210 |
| 2 | Tech Support | Agent | Tech Support Bot (workspace agent) |
| 3 | Billing | Phone | +919876543212 |
| 0 | Operator | Phone | +919876543200 |

When the caller presses 2, the same call continues but the bot's system prompt is swapped to your Tech Support Bot configuration — voice can also swap if voiceOverride is set on the route. Conversation history is preserved.

System prompt:

When the call connects, greet the caller and read this menu:
  "Welcome to TechCo. Press 1 for Sales, 2 for Technical Support,
   3 for Billing, or 0 for operator. You can also just tell me how
   I can help."

If the caller speaks their request (doesn't press keys), handle it
conversationally as usual. Mapped keypad digits are handled automatically
so you don't need special logic for them.

That's it. No tool calls, no prompt engineering — presses work automatically.

Multi-digit routes

Routes support 1-3 character sequences. Examples:

  • 1 → main reception
  • 01 → reception after hours
  • 911 → emergency hotline
  • *9 → voicemail

When the buffer could still match a longer route, the bot waits for more input. For example, with a 911 route configured, pressing 9 starts the 1.2s debounce; if the caller follows with 1, 1 before the debounce expires, the buffer matches 911 and the route fires. If the caller stops at 9, the bot flushes 9 to the LLM once the debounce expires.
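
One way to picture the "wait while a longer route is still possible" rule is a prefix check against the configured route keys. The `decide` function below is a hypothetical sketch. In particular, the source does not say what happens when a buffer is both an exact match and a prefix of a longer route, so firing the exact match once the debounce expires is an assumption.

```python
from typing import Iterable


def decide(buffer: str, route_keys: Iterable[str], timed_out: bool) -> str:
    """Return "wait", "route", or "llm" for the current digit buffer."""
    keys = set(route_keys)
    # Could more presses still turn this buffer into a longer route?
    could_grow = any(k != buffer and k.startswith(buffer) for k in keys)
    if could_grow and not timed_out:
        return "wait"
    if buffer in keys:
        return "route"  # exact match and nothing longer pending
    return "llm"        # no route matched: flush digits to the LLM


keys = ["1", "01", "911", "*9"]  # the example routes listed above
```

With these keys, a buffer of 9 waits, 91 waits, and 911 routes; a 9 whose debounce has expired goes to the LLM.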

Routes vs system-prompt menus

Both approaches work. Pick the right one:

| Use routes when | Use prompt menu when |
|---|---|
| Direct dial to a specific number | Branching logic needed ("if after 5pm, say X") |
| Fastest possible transfer | Caller needs to enter additional info first |
| No conversation state matters | Routing depends on context (recent topic, etc.) |
| Same phone destination every time | Phone destination computed at runtime |

You can mix — mapped digits go direct, unmapped digits fall through to the LLM.

Fallbacks

| Scenario | Behavior |
|---|---|
| Web widget press of a Phone route | Bot sends an info message ("On phone this would dial X") but doesn't actually transfer (no SIP on web). Used mainly for testing. |
| Web widget press of an Agent route | Works on web; agent swap doesn't need SIP. |
| Phone route: destination offline / busy | SIP returns failure → bot apologizes and continues with LLM |
| Agent route: target agent missing or invalid | Bot says "Sorry, I couldn't connect to that agent" and continues with the original agent |
| User presses unmapped digit | Goes to LLM as [The caller pressed keypad digit: X] |

Configuration Reference

| Setting | Default | Description |
|---|---|---|
| enableDtmf | false | Capture keypad digits; opt in per agent |
| transferSettings.dtmfRoutes | [] | Array of route objects, max 10 |
| dtmfRoutes[].digit | | 1–3 characters from 0-9 * # |
| dtmfRoutes[].label | | Human-readable name (e.g. "Sales") |
| dtmfRoutes[].destinationType | "phone" | "phone" (SIP transfer) or "agent" (mid-call AI swap) |
| dtmfRoutes[].destinationNumber | | Required when type is "phone"; E.164 format (+919876543210) |
| dtmfRoutes[].targetAgentId | | Required when type is "agent"; id of the workspace agent to swap to |
| dtmfRoutes[].voiceOverride | | Optional TTS voice id to switch to on agent swap |
| dtmfRoutes[].sayBeforeTransfer | "Connecting you now. Please hold." | TTS message played before transfer/swap |
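
To show how these settings might fit together, here is an illustrative config written as a Python dict, plus a small checker for the documented constraints. Several things are assumptions: the exact nesting of `enableDtmf` next to `transferSettings`, the agent id `agent_tech_support`, and the `validate_routes` helper itself, which is not part of osmTalk.

```python
import re
from typing import Any, Dict, List

MAX_ROUTES = 10                                # "max 10" from the table
DIGIT_RE = re.compile(r"[0-9*#]{1,3}")         # 1-3 characters from 0-9 * #
E164_RE = re.compile(r"\+[1-9][0-9]{1,14}")    # E.164: "+" then up to 15 digits

agent_config: Dict[str, Any] = {
    "enableDtmf": True,
    "transferSettings": {
        "dtmfRoutes": [
            {"digit": "1", "label": "Sales", "destinationType": "phone",
             "destinationNumber": "+919876543210",
             "sayBeforeTransfer": "Connecting you to Sales. Please hold."},
            {"digit": "2", "label": "Tech Support", "destinationType": "agent",
             "targetAgentId": "agent_tech_support"},  # hypothetical id
        ]
    },
}


def validate_routes(routes: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means the routes look valid."""
    errors: List[str] = []
    if len(routes) > MAX_ROUTES:
        errors.append(f"too many routes: {len(routes)} > {MAX_ROUTES}")
    for i, route in enumerate(routes):
        if not DIGIT_RE.fullmatch(str(route.get("digit", ""))):
            errors.append(f"route {i}: digit must be 1-3 characters from 0-9 * #")
        dest_type = route.get("destinationType", "phone")  # documented default
        if dest_type == "phone":
            if not E164_RE.fullmatch(str(route.get("destinationNumber", ""))):
                errors.append(f"route {i}: destinationNumber must be E.164")
        elif dest_type == "agent":
            if not route.get("targetAgentId"):
                errors.append(f"route {i}: targetAgentId is required")
        else:
            errors.append(f"route {i}: unknown destinationType {dest_type!r}")
    return errors
```

Calling `validate_routes(agent_config["transferSettings"]["dtmfRoutes"])` returns an empty list for the config above.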

The bot delivers digits to the LLM in two formats depending on how the caller is pressing:

| Scenario | LLM sees |
|---|---|
| Single isolated press (e.g., menu choice) | [The caller pressed keypad digit: 5] |
| Rapid multi-digit entry (e.g., PIN) | [The caller pressed keypad digits: 7420] |

Each digit in the message is one of 0-9, *, or #.
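
If a tool handler or test harness needs to read these messages back out of a transcript, one regular expression can cover both formats. This is a sketch under the assumption that the wording is exactly as shown in the table; `format_dtmf_message` and `parse_dtmf_message` are hypothetical helper names.

```python
import re
from typing import Optional

# Matches both the singular ("digit") and plural ("digits") message formats.
DTMF_MESSAGE = re.compile(r"\[The caller pressed keypad digits?: ([0-9*#]+)\]")


def format_dtmf_message(digits: str) -> str:
    """Build the message in the documented singular/plural form."""
    noun = "digit" if len(digits) == 1 else "digits"
    return f"[The caller pressed keypad {noun}: {digits}]"


def parse_dtmf_message(text: str) -> Optional[str]:
    """Return the pressed digits found in the text, or None if absent."""
    match = DTMF_MESSAGE.search(text)
    return match.group(1) if match else None
```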

Safety & Rate Limits (Built-In)

Every DTMF input — whether from SIP phone or web widget — passes through a hardened validator:

| Limit | Value | Why |
|---|---|---|
| Valid digits | 0-9 * # only | Rejects injected strings like <script>alert(1)</script> |
| Min interval between presses | 50ms | Prevents floods (1000 digits/second abuse) |
| Max digits per batch | 20 | Caps PIN-length; larger entries silently truncated |
| Debounce window | 1.2s | Groups rapid PIN entries into one LLM turn |
| Input truncation | First char only from web | Defends against client sending "1234" as one digit |

These are enforced by the backend DTMFProcessor. Invalid input is silently dropped and logged at DEBUG level — attackers get no feedback.
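
The limits in the table can be approximated with a short validator. The sketch below is not the actual DTMFProcessor, whose code is not shown here. `DtmfValidator` is a hypothetical class, and measuring the 50ms interval from the last accepted press is an assumption.

```python
from typing import List, Optional

VALID_DIGITS = set("0123456789*#")
MIN_INTERVAL_SECONDS = 0.050  # 50ms minimum between presses
MAX_DIGITS_PER_BATCH = 20


class DtmfValidator:
    """Hypothetical validator applying the documented limits to each press."""

    def __init__(self) -> None:
        self.batch: List[str] = []
        self._last_accepted: Optional[float] = None

    def accept(self, raw: str, now: float, from_web: bool = False) -> bool:
        """Return True if the press was added to the batch, False if dropped."""
        if from_web:
            raw = raw[:1]  # web input: keep the first character only
        if len(raw) != 1 or raw not in VALID_DIGITS:
            return False   # invalid input is dropped silently
        if (self._last_accepted is not None
                and now - self._last_accepted < MIN_INTERVAL_SECONDS):
            return False   # arrived too soon after the previous press
        self._last_accepted = now
        if len(self.batch) >= MAX_DIGITS_PER_BATCH:
            return False   # batch is full: extra digits are truncated
        self.batch.append(raw)
        return True
```

In this sketch every rejection simply returns False, mirroring the "no feedback" behavior described above; logging is left out for brevity.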