Interruption handling determines how your agent responds when users speak while the agent is talking. Iqra AI provides multiple strategies—from simple voice activity detection to LLM-based decision making—giving you full control over the conversation dynamics.

Why interruptions matter

Humans naturally interrupt each other in conversation:
  • Barge-in - “Actually, I need to—”
  • Backchannel - “Uh-huh”, “mm-hmm”, “I see”
  • Clarification - “Wait, what was that last part?”
  • Correction - “No, that’s not my address”
Your agent needs to distinguish between:
  • Noise (ignore)
  • Backchannels (acknowledge but keep talking)
  • Real interruptions (stop and listen)
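As an illustration, this three-way decision could be sketched as follows. The keyword list and duration threshold here are hypothetical, purely for illustration; the sections below describe the actual configurable mechanisms.

```python
# Toy three-way classifier; keyword heuristics are illustrative only.
BACKCHANNELS = {"uh-huh", "mm-hmm", "okay", "i see", "right", "yeah"}

def classify_speech(transcript: str, speech_ms: int) -> str:
    """Classify user audio as noise, backchannel, or real interruption."""
    text = transcript.strip().lower().rstrip(".!?,")
    if speech_ms < 150 or not text:
        return "noise"        # too short, or nothing transcribed: ignore
    if text in BACKCHANNELS:
        return "backchannel"  # acknowledge but keep talking
    return "interruption"     # stop and listen

print(classify_speech("Uh-huh", 400))      # backchannel
print(classify_speech("Wait, stop", 600))  # interruption
```

A production system combines several of the signals below (VAD, STT, verification) rather than a single keyword check.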

Configuration overview

Interruption settings live in the agent configuration:
{
  "Interruptions": {
    "UseTurnByTurnMode": false,
    "IncludeInterruptedSpeechInTurnByTurnMode": null,
    "TurnEnd": { /* When to stop listening */ },
    "PauseTrigger": { /* When to pause speaking */ },
    "Verification": { /* Verify if interruption is real */ }
  }
}

Turn-by-turn mode

The simplest approach: strict turn-taking with no interruptions allowed.
UseTurnByTurnMode
boolean
default:"false"
Enable strict turn-taking
  • true - Agent speaks, waits for silence, then listens
  • false - Users can interrupt mid-speech (barge-in enabled)
IncludeInterruptedSpeechInTurnByTurnMode
boolean
default:"null"
When agent is interrupted, include what it was saying in context
  • true - AI knows what was cut off
  • false - AI only sees what was actually spoken
  • null - Use system default
When to use:
  • Formal interactions (legal disclosures, compliance scripts)
  • Noisy environments where false interruptions are common
  • Simple IVR-style menus
Example:
Agent: "Your account balance is $1,250. Your last transaction was..."
[User tries to speak - ignored]
Agent: "...a debit of $45 on March 3rd. Do you have any questions?"
[Now user can speak]

Turn end detection

Determines when the user has finished speaking so the agent can respond.

VAD (Voice Activity Detection)

Type: VAD

Uses signal processing to detect speech vs. silence.
VadSpeechDurationMS
integer
default:"150"
Minimum milliseconds of speech to register as “user started talking”
VadSilenceDurationMS
integer
default:"300"
Milliseconds of silence to register as “user finished talking”
Configuration example:
{
  "Type": "VAD",
  "VadSpeechDurationMS": 150,
  "VadSilenceDurationMS": 300
}
Pros:
  • Fastest response time (no API calls)
  • Deterministic and predictable
  • Works offline
Cons:
  • May cut off slow speakers
  • Can’t distinguish between pause and completion
  • Sensitive to noise
Increase VadSilenceDurationMS to 500-700ms for elderly users or non-native speakers who pause mid-sentence.
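To illustrate how the two thresholds interact, here is a minimal sketch of VAD turn detection over per-frame speech flags. The 20ms frame size and the loop structure are assumptions for illustration, not the platform's implementation.

```python
def detect_turn(frames, frame_ms=20, speech_min_ms=150, silence_min_ms=300):
    """Return True once the user both started talking (>= speech_min_ms of
    speech) and then fell silent for >= silence_min_ms."""
    speech_ms = silence_ms = 0
    talking = False
    for is_speech in frames:
        if is_speech:
            speech_ms += frame_ms
            silence_ms = 0
            if speech_ms >= speech_min_ms:
                talking = True   # "user started talking"
        else:
            silence_ms += frame_ms
            if talking and silence_ms >= silence_min_ms:
                return True      # "user finished talking"
    return False

# 200ms of speech followed by 300ms of silence ends the turn:
print(detect_turn([True] * 10 + [False] * 15))  # True
```

Note that a 100ms burst of noise never reaches `speech_min_ms`, so it can never end a turn, which is exactly why the speech-duration threshold exists.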

STT (Speech-to-Text)

Type: STT

Uses your STT provider's endpointing logic.
Configuration example:
{
  "Type": "STT"
}
Pros:
  • More accurate than VAD
  • Provider-optimized algorithms
  • Language-aware
Cons:
  • Slightly slower than VAD
  • Depends on provider quality
  • Requires network round-trip
When to use: Default for most conversational agents.

ML (Machine Learning)

Type: ML

Uses a specialized ML model trained to predict turn completion.
MLTurnEndVADMinimumSpeechDurationMS
integer
default:"150"
Minimum speech duration before ML model activates
MLTurnEndVADMinimumSilenceDurationMS
integer
default:"300"
Minimum silence before ML model evaluates
MlTurnEndFallbackMs
integer
default:"2000"
Maximum wait time before forcing turn end
Configuration example:
{
  "Type": "ML",
  "MLTurnEndVADMinimumSpeechDurationMS": 150,
  "MLTurnEndVADMinimumSilenceDurationMS": 300,
  "MlTurnEndFallbackMs": 2000
}
Pros:
  • Best at distinguishing pauses from completion
  • Adapts to speaking patterns
  • Reduces false triggers
Cons:
  • Adds latency (model inference time)
  • Requires ML infrastructure
  • May need tuning per language
When to use: Complex conversations where users speak in long, multi-clause sentences.
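A sketch of how the three parameters might combine; the exact semantics are assumed from the descriptions above, not taken from the platform's implementation.

```python
def should_end_turn(speech_ms, silence_ms, ml_predicts_done,
                    speech_min_ms=150, silence_min_ms=300, fallback_ms=2000):
    """Combine the VAD gates, the ML prediction, and the fallback timeout."""
    if speech_ms < speech_min_ms or silence_ms < silence_min_ms:
        return False          # VAD gate: model not consulted yet
    if silence_ms >= fallback_ms:
        return True           # MlTurnEndFallbackMs: stop waiting on the model
    return ml_predicts_done   # model's turn-completion prediction

print(should_end_turn(500, 400, ml_predicts_done=False))  # False: keep waiting
print(should_end_turn(500, 2500, ml_predicts_done=False)) # True: fallback fired
```

The fallback guarantees the agent never waits indefinitely even if the model keeps predicting "not done".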

AI (LLM-based)

Type: AI

Uses an LLM to analyze if the user's statement is complete.
UseAgentLLM
boolean
default:"null"
  • true - Use the agent’s configured LLM
  • false - Use dedicated LLM (specify in LLMIntegration)
LLMIntegration
object
Custom LLM configuration (if UseAgentLLM: false)
Configuration example:
{
  "Type": "AI",
  "UseAgentLLM": true
}
Pros:
  • Semantic understanding of completion
  • Best for complex, multi-turn exchanges
  • Context-aware decisions
Cons:
  • Highest latency (LLM API call)
  • Non-deterministic
  • Higher cost
When to use: High-value conversations where perfect turn-taking is critical (therapy bots, executive assistants).
AI turn end detection adds 200-500ms latency. Only use when semantic accuracy justifies the delay.
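The idea can be sketched like this: `llm` is any callable that takes a prompt and returns text. Both the prompt wording and the `stub_llm` helper are illustrative assumptions, not the platform's actual prompt or API.

```python
def ai_turn_end(transcript: str, llm) -> bool:
    """Ask an LLM whether the user's statement is complete."""
    prompt = (
        "The user in a voice conversation just said:\n"
        f'"{transcript}"\n'
        "Is this a complete statement the agent should respond to? "
        "Answer YES or NO."
    )
    return llm(prompt).strip().upper().startswith("YES")

def stub_llm(prompt: str) -> str:
    # Hypothetical stand-in: treats a trailing conjunction as "not done".
    said = prompt.split('"')[1].rstrip(".")
    return "NO" if said.endswith((" and", " but", " so")) else "YES"

print(ai_turn_end("I want to change my address and", stub_llm))  # False
print(ai_turn_end("My address is 5 Main Street.", stub_llm))     # True
```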

Pause trigger

Determines when to pause the agent’s speech if the user starts talking (barge-in detection).
PauseTrigger.Enabled
boolean
default:"null"
Enable pause trigger (null = disabled)
PauseTrigger.Type
enum
  • VAD - Voice activity detection
  • STT - Speech-to-text based

VAD pause trigger

VadDurationMS
integer
Milliseconds of speech detected to trigger pause
Configuration example:
{
  "PauseTrigger": {
    "Type": "VAD",
    "VadDurationMS": 300
  }
}
Behavior:
Agent: "Your account balance is $1,250 and your last trans—"
[User speaks for 300ms]
Agent: [pauses immediately]

STT pause trigger

WordCount
integer
Number of words transcribed to trigger pause
Configuration example:
{
  "PauseTrigger": {
    "Type": "STT",
    "WordCount": 2
  }
}
Behavior:
Agent: "Your account balance is $1,250 and your last trans—"
User: "Wait, stop" [2 words detected]
Agent: [pauses]
Comparison:
| Type | Latency | Accuracy | Use Case |
| --- | --- | --- | --- |
| VAD | 300ms | Moderate | Fast-paced conversations |
| STT | 500-800ms | High | Avoid false positives from noise |
Use STT pause trigger with WordCount: 2 to ignore backchannels like “uh-huh” while catching real interruptions.
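The word-count rule itself is simple; a sketch, assuming whitespace tokenization of the streaming transcript:

```python
def should_pause(partial_transcript: str, word_count: int = 2) -> bool:
    """Pause once the streaming transcript reaches `word_count` words."""
    return len(partial_transcript.split()) >= word_count

print(should_pause("Uh-huh"))      # False: one word, likely a backchannel
print(should_pause("Wait, stop"))  # True: two words, pause the agent
```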

Interruption verification

After pausing, verify if the interruption was intentional or just noise/backchanneling.
Verification.Enabled
boolean
default:"false"
Enable LLM-based verification
Verification.UseAgentLLM
boolean
default:"true"
  • true - Use agent’s LLM
  • false - Use dedicated LLM (specify in LLMIntegration)
Verification.LLMIntegration
object
Custom LLM configuration (if UseAgentLLM: false)
Configuration example:
{
  "Verification": {
    "Enabled": true,
    "UseAgentLLM": true
  }
}
Behavior:
Agent: "Your balance is $1,250 and your last—"
User: "Uh-huh" [pause triggered]

LLM analyzes: Is this a real interruption or backchannel?

Decision: Backchannel

Agent: [resumes] "—transaction was a debit of $45."
vs.
Agent: "Your balance is $1,250 and your last—"
User: "Wait, that's wrong!" [pause triggered]

LLM analyzes: Is this a real interruption or backchannel?

Decision: Real interruption

Agent: [stops completely] "I'm sorry, what was wrong?"
Prompting: The LLM receives:
Agent was saying: "Your balance is $1,250 and your last transaction..."
User said: "Uh-huh"

Is this:
A) A backchannel acknowledgment (agent should continue)
B) A real interruption (agent should stop and respond)
Verification adds ~300ms latency but dramatically improves conversation naturalness by preventing false interruptions.
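Putting the prompt and the decision together, a sketch of the verification flow; `stub_verifier` stands in for the real LLM call, and the prompt wording mirrors the example above but may differ from the platform's actual prompt.

```python
def verify_interruption(agent_speech: str, user_speech: str, llm) -> str:
    """Return "resume" for a backchannel, "stop" for a real interruption."""
    prompt = (
        f'Agent was saying: "{agent_speech}"\n'
        f'User said: "{user_speech}"\n\n'
        "Is this:\n"
        "A) A backchannel acknowledgment (agent should continue)\n"
        "B) A real interruption (agent should stop and respond)\n"
        "Answer with A or B."
    )
    answer = llm(prompt).strip().upper()
    return "resume" if answer.startswith("A") else "stop"

def stub_verifier(prompt: str) -> str:
    # Hypothetical stand-in for the real LLM call.
    return "A" if 'User said: "Uh-huh"' in prompt else "B"

print(verify_interruption("Your balance is $1,250...", "Uh-huh", stub_verifier))
```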

Configuration strategies

Strategy 1: Fast and simple

Use case: High-volume IVR, simple transactions
{
  "UseTurnByTurnMode": true,
  "TurnEnd": {
    "Type": "VAD",
    "VadSpeechDurationMS": 150,
    "VadSilenceDurationMS": 300
  }
}
Characteristics:
  • No barge-in
  • Fastest response time
  • Deterministic behavior

Strategy 2: Natural conversations

Use case: Customer service, general assistants
{
  "UseTurnByTurnMode": false,
  "TurnEnd": {
    "Type": "STT"
  },
  "PauseTrigger": {
    "Type": "STT",
    "WordCount": 2
  },
  "Verification": {
    "Enabled": true,
    "UseAgentLLM": true
  }
}
Characteristics:
  • Barge-in enabled
  • Distinguishes backchannels from interruptions
  • Balanced latency and accuracy

Strategy 3: Maximum accuracy

Use case: Therapy, coaching, high-stakes consultations
{
  "UseTurnByTurnMode": false,
  "TurnEnd": {
    "Type": "AI",
    "UseAgentLLM": true
  },
  "PauseTrigger": {
    "Type": "STT",
    "WordCount": 3
  },
  "Verification": {
    "Enabled": true,
    "UseAgentLLM": false,
    "LLMIntegration": {
      "provider": "anthropic",
      "model": "claude-3-opus"
    }
  }
}
Characteristics:
  • Semantic understanding at all stages
  • Highest accuracy
  • Higher latency and cost (justified for high-value use cases)

Strategy 4: Noisy environments

Use case: Call centers, outdoor applications
{
  "UseTurnByTurnMode": false,
  "TurnEnd": {
    "Type": "VAD",
    "VadSpeechDurationMS": 200,
    "VadSilenceDurationMS": 500
  },
  "PauseTrigger": {
    "Type": "STT",
    "WordCount": 4
  },
  "Verification": {
    "Enabled": true,
    "UseAgentLLM": true
  }
}
Characteristics:
  • Higher thresholds to avoid false positives
  • STT + verification reduce noise interruptions
  • Slightly slower but more reliable

Testing interruptions

1. Test backchannels

While agent is speaking, say short acknowledgments:
  • “Okay”
  • “Mm-hmm”
  • “I see”
Agent should continue (if verification enabled).
2. Test real interruptions

While agent is speaking, say:
  • “Wait, stop”
  • “That’s wrong”
  • “I have a question”
Agent should stop and respond.
3. Test slow speakers

Pause mid-sentence for 1-2 seconds. Agent should wait (not cut you off).
4. Test noisy environment

Play background noise or music. Agent should not treat noise as speech.
5. Test turn-by-turn

Try interrupting in turn-by-turn mode. Agent should ignore interruptions until finished.

Best practices

Match culture and context

  • Western cultures - More interruptions expected, enable barge-in
  • Eastern cultures - More respectful turn-taking, consider turn-by-turn mode
  • Formal contexts - Stricter turn-taking
  • Casual contexts - More flexible interruptions

Tune for audience

  • Young adults - Fast VAD thresholds (200ms silence)
  • Elderly users - Slow VAD thresholds (500-700ms silence)
  • Non-native speakers - STT or ML turn detection (better at handling pauses)

Provide feedback

When paused, give audio cues:
{
  "AI Response": "[pause tone] Yes, how can I help?"
}
Or use the agent’s personality:
Agent: "Sorry to interrupt—you were saying?"

Monitor false positives

Track metrics:
  • Interruptions per conversation
  • Average interruption latency
  • Backchannel vs. real interruption ratio
Adjust thresholds based on data.
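A minimal sketch of computing these metrics from logged verification verdicts; the event format here is an assumption, not a platform API.

```python
def interruption_metrics(events):
    """Summarize verification outcomes logged for a conversation.

    `events` is a list of verdict strings such as "backchannel" or
    "interruption" (hypothetical log format).
    """
    total = len(events)
    backchannels = events.count("backchannel")
    return {
        "interruptions": events.count("interruption"),
        "backchannel_ratio": backchannels / total if total else 0.0,
    }

print(interruption_metrics(["backchannel", "interruption", "backchannel"]))
```

A high backchannel ratio with verification disabled usually means the agent is being paused too often; raising `WordCount` or enabling verification is the typical fix.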

Use dedicated LLMs for verification

For high-traffic agents, use a faster/cheaper model for verification:
{
  "Verification": {
    "Enabled": true,
    "UseAgentLLM": false,
    "LLMIntegration": {
      "provider": "openai",
      "model": "gpt-3.5-turbo"  // Fast and cheap
    }
  }
}
Reserve your primary LLM (e.g., GPT-4) for conversation generation.

Latency breakdown

| Configuration | Pause Detection | Turn End Detection | Verification | Total |
| --- | --- | --- | --- | --- |
| VAD only | ~100ms | ~100ms | - | ~200ms |
| VAD + STT | ~100ms | ~300ms | - | ~400ms |
| STT + Verification | ~300ms | ~300ms | ~300ms | ~900ms |
| ML + Verification | ~200ms | ~400ms | ~300ms | ~900ms |
| AI (full LLM) | ~400ms | ~500ms | ~300ms | ~1200ms |
Latencies above 500ms are noticeable to users. Only use AI/ML strategies when accuracy justifies the delay.

Next steps

  • Agent configuration - Complete agent settings reference
  • Visual IDE - Build conversation scripts
  • Integrations - Configure LLM and STT providers