Barge-In Configuration
Overview
Barge-In enables users to naturally interrupt the Voice Agent while it is speaking, creating a more human-like conversational experience. Instead of requiring users to wait for the assistant to finish its response, the agent can detect when the user starts speaking and determine whether the speech is intended as an interruption.
The Barge-In configuration controls how interruptions are detected, validated, and handled. These settings help balance responsiveness with accuracy by reducing false interruptions caused by background noise, microphone echo, brief acknowledgements, or accidental speech.
The Voice Agent supports two interruption modes:
- Hard Barge-In – Immediately stops assistant playback when a valid interruption is detected or when predefined interruption phrases are recognized.
- Soft Barge-In – Temporarily pauses assistant playback while the system evaluates whether the user genuinely intends to interrupt. Playback resumes automatically if the interruption is not confirmed.
The configuration allows administrators to control:
- How quickly user speech is recognized as an interruption.
- How much speech evidence is required before an interruption is confirmed.
- How long the system waits before resuming playback during soft interruptions.
- Protection mechanisms that prevent interruptions caused by echo or assistant audio leakage.
- Recognition of common interruption phrases such as "stop" or "wait".
- Filtering of acknowledgement phrases such as "yeah" or "okay" that should not interrupt the conversation.
Proper tuning of these settings keeps the Voice Agent responsive to genuine user interruptions while minimizing unintended interruptions that degrade the conversation experience.
Assistant Protection
These settings help prevent accidental interruptions caused by audio leakage, echo, or double-talk.
| Option | Value | Description |
|---|---|---|
| Assistant Playback Grace Period | 1000 ms | Time immediately after the assistant starts speaking during which brief echo or audio leakage from the assistant is ignored to prevent accidental interruption detection. |
| Assistant Post Interrupt Backoff | 0 ms | Additional delay before assistant speech resumes after a confirmed interruption. Useful for reducing double-talk. A value of 0 disables the delay. |
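
To illustrate how the two protection settings might be applied, here is a minimal sketch. The class and function names (`AssistantProtection`, `in_grace_period`, `earliest_resume_ms`) are hypothetical and not part of the product, and the exact boundary behavior (whether a detection at exactly 1000 ms is ignored) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class AssistantProtection:
    # Defaults mirror the table above; the field names are hypothetical.
    playback_grace_period_ms: int = 1000
    post_interrupt_backoff_ms: int = 0  # 0 disables the extra delay


def in_grace_period(cfg, playback_started_ms, now_ms):
    """Return True while detected audio should be ignored as likely echo."""
    return (now_ms - playback_started_ms) < cfg.playback_grace_period_ms


def earliest_resume_ms(cfg, interrupt_confirmed_ms):
    """Earliest time assistant speech may resume after a confirmed interruption."""
    return interrupt_confirmed_ms + cfg.post_interrupt_backoff_ms
```

With the default values, audio detected 400 ms after playback starts would be ignored, and a confirmed interruption adds no extra delay before the assistant may speak again.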
Speech Detection
These settings determine how user speech is detected and classified as a potential interruption.
| Option | Value | Description |
|---|---|---|
| Server Energy VAD Debounce | 100 ms | Minimum interval between server-side Voice Activity Detection (VAD) interrupt evaluations. Limits redundant processing and event flooding. |
| Barge-In Speech Start Duration | 0.45 seconds | Minimum continuous speech duration required before user speech is considered a valid interruption of the assistant. |
| Barge-In Speech Stop Duration | 0.2 seconds | Minimum duration of silence required before the interruption is considered complete. |
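
The start and stop durations can be read as a small state machine over per-frame VAD results. The sketch below is one possible interpretation, not the product's implementation: the class name `BargeInSpeechDetector`, the `feed` method, and the assumption that any silent frame resets the "continuous speech" timer are all hypothetical.

```python
class BargeInSpeechDetector:
    """Tracks continuous speech and silence runs from per-frame VAD flags."""

    def __init__(self, start_duration_s=0.45, stop_duration_s=0.2):
        self.start_duration_s = start_duration_s
        self.stop_duration_s = stop_duration_s
        self.active = False         # True once a barge-in candidate has started
        self._speech_since = None   # start time of the current speech run
        self._silence_since = None  # start time of the current silence run

    def feed(self, t, is_speech):
        """Process one VAD frame at time t (seconds).

        Returns "start", "stop", or None.
        """
        if is_speech:
            self._silence_since = None
            if self._speech_since is None:
                self._speech_since = t
            if not self.active and t - self._speech_since >= self.start_duration_s:
                self.active = True
                return "start"
        else:
            # Assumption: a single silent frame breaks "continuous" speech.
            self._speech_since = None
            if self._silence_since is None:
                self._silence_since = t
            if self.active and t - self._silence_since >= self.stop_duration_s:
                self.active = False
                return "stop"
        return None
```

Under this reading, a short burst of speech that ends before 0.45 seconds never produces a "start" event, which is how brief noises would be filtered out.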
Client Controls
These settings regulate client-side speech events and buffering during interruption detection.
| Option | Value | Description |
|---|---|---|
| Client User Speaking Debounce | 100 ms | Minimum interval between user-speaking notifications received from the client browser to prevent excessive start/stop events. |
| Soft Hold Client Buffer Maximum | 2000 ms | Maximum amount of client audio buffered while evaluating a potential soft barge-in. |
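
Both client controls follow common patterns: a minimum-interval debounce and a bounded buffer. The sketch below shows one way they could work. The names `Debouncer` and `SoftHoldBuffer` are hypothetical, and dropping the oldest audio when the buffer is full is an assumption; the source does not say what happens when the 2000 ms limit is reached.

```python
from collections import deque


class Debouncer:
    """Accepts an event only if min_interval_ms has passed since the last accepted one."""

    def __init__(self, min_interval_ms=100):
        self.min_interval_ms = min_interval_ms
        self._last_accepted_ms = None

    def accept(self, now_ms):
        last = self._last_accepted_ms
        if last is not None and now_ms - last < self.min_interval_ms:
            return False  # too soon after the previous accepted event
        self._last_accepted_ms = now_ms
        return True


class SoftHoldBuffer:
    """Holds client audio during a soft barge-in probe, capped at max_ms."""

    def __init__(self, max_ms=2000):
        self.max_ms = max_ms
        self.buffered_ms = 0
        self._chunks = deque()

    def push(self, chunk, duration_ms):
        self._chunks.append((chunk, duration_ms))
        self.buffered_ms += duration_ms
        # Assumption: the oldest audio is discarded once the cap is exceeded.
        while self.buffered_ms > self.max_ms and self._chunks:
            _, dropped_ms = self._chunks.popleft()
            self.buffered_ms -= dropped_ms
```

The same debounce pattern would apply to the server-side VAD debounce, which uses the same 100 ms default.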
Soft Barge-In Behavior
Soft barge-in allows the system to temporarily pause assistant playback while determining whether the user intends to interrupt.
| Option | Value | Description |
|---|---|---|
| Soft Barge-In Probe Timeout | 600 ms | Maximum time allowed for a soft barge-in probe before a decision is made to resume or interrupt the assistant. |
| Soft Barge-In Post Quiet Tail | 2000 ms | Time the system waits after VAD detects silence during a soft barge-in probe before resuming assistant speech. A value of 0 resumes immediately after silence is detected. |
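
The two soft barge-in timers can be sketched as separate helpers. This is a hedged illustration only: the function names `probe_outcome` and `resume_at_ms` are hypothetical, and the source does not specify how the probe timeout and the quiet tail interact, so the sketch deliberately keeps them independent. Resuming when the probe times out without confirmation follows from the statement that playback resumes if the interruption is not confirmed.

```python
def probe_outcome(confirmed, elapsed_ms, probe_timeout_ms=600):
    """Decide what a soft barge-in probe should do at elapsed_ms since it began."""
    if confirmed:
        return "interrupt"  # the user genuinely intends to interrupt
    if elapsed_ms >= probe_timeout_ms:
        return "resume"     # not confirmed in time, so playback resumes
    return "pending"        # keep playback paused and keep evaluating


def resume_at_ms(silence_detected_ms, post_quiet_tail_ms=2000):
    """Time at which assistant speech resumes after VAD reports silence."""
    return silence_detected_ms + post_quiet_tail_ms
```

Setting `post_quiet_tail_ms` to 0 makes `resume_at_ms` return the moment silence was detected, matching the "resumes immediately" behavior described in the table.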
Interruption Confirmation
These settings define the minimum speech recognition evidence required before a detected interruption is confirmed.
| Option | Value | Description |
|---|---|---|
| Barge-In Confirm Minimum Words | 1 | Minimum number of recognized words required to confirm an interruption. |
| Barge-In Confirm Minimum Characters | 0 | Minimum number of recognized characters required to confirm an interruption. A value of 0 disables this threshold. |
| Barge-In Confirm Minimum Partials | 1 | Minimum number of partial speech recognition results required before confirming an interruption. |
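
A minimal sketch of how the three thresholds might combine is shown below. The function name `is_interruption_confirmed` is hypothetical. It assumes that all thresholds must be met together, that words are whitespace-separated, and that characters are counted on the trimmed transcript including spaces; the source does not confirm any of these details.

```python
def is_interruption_confirmed(transcript, partial_count,
                              min_words=1, min_chars=0, min_partials=1):
    """Return True if the recognition evidence meets every configured threshold."""
    text = transcript.strip()
    if len(text.split()) < min_words:
        return False
    # A value of 0 disables the character threshold.
    if min_chars > 0 and len(text) < min_chars:
        return False
    return partial_count >= min_partials
```

With the defaults, a single recognized word delivered in one partial result is enough to confirm an interruption, while an empty transcript never is.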
Interruption Phrases
These phrases trigger an immediate interruption when detected while the assistant is speaking.
| Option | Value | Description |
|---|---|---|
| Barge-In Interruption Phrases | stop, wait, hold on, hang on, excuse me, never mind, cancel that, slow down | When any of these phrases is detected while the assistant is speaking, an immediate hard interruption is triggered. Matching is case-insensitive. |
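
Case-insensitive phrase matching could look like the sketch below. The function name `contains_interruption_phrase` is hypothetical, and matching on whole words (so that "unstoppable" does not match "stop") and ignoring punctuation are assumptions; the source only states that matching is case-insensitive.

```python
import re

INTERRUPTION_PHRASES = [
    "stop", "wait", "hold on", "hang on",
    "excuse me", "never mind", "cancel that", "slow down",
]


def _normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def contains_interruption_phrase(transcript, phrases=INTERRUPTION_PHRASES):
    """Return True if any phrase occurs in the transcript as whole words."""
    padded = f" {_normalize(transcript)} "
    return any(f" {_normalize(phrase)} " in padded for phrase in phrases)
```

For example, `contains_interruption_phrase("Hold on, please")` returns `True`, while `contains_interruption_phrase("unstoppable")` returns `False`.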
Acknowledgement Phrases
These phrases are treated as passive acknowledgements and do not interrupt the conversation flow.
| Option | Value | Description |
|---|---|---|
| Barge-In Acknowledgement Phrases | uh huh, uh-huh, mm hmm, mm-hmm, mhm, yeah, yep, okay, i see, sure, got it, alright | Short acknowledgement phrases that are ignored while the assistant is speaking or processing. These do not generate chat messages or language model requests. |
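
Acknowledgement filtering can be sketched as a whole-utterance comparison. The function name `is_acknowledgement` is hypothetical, and requiring the entire utterance to equal a listed phrase (after lowercasing and removing punctuation and hyphens) is an assumption, chosen so that "okay, but wait" is not discarded as a mere acknowledgement.

```python
import re

ACKNOWLEDGEMENT_PHRASES = [
    "uh huh", "uh-huh", "mm hmm", "mm-hmm", "mhm", "yeah",
    "yep", "okay", "i see", "sure", "got it", "alright",
]


def _normalize(text):
    """Lowercase, treat hyphens and punctuation as separators, collapse spaces."""
    return " ".join(re.findall(r"[a-z']+", text.lower()))


_ACK_SET = {_normalize(phrase) for phrase in ACKNOWLEDGEMENT_PHRASES}


def is_acknowledgement(transcript):
    """Return True if the whole utterance is a passive acknowledgement."""
    return _normalize(transcript) in _ACK_SET
```

An utterance that passes this check would be dropped without generating a chat message or a language model request, as described in the table.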
Notes
- Hard Barge-In immediately stops assistant playback and gives control to the user.
- Soft Barge-In temporarily pauses assistant playback while the system determines whether the user genuinely intends to interrupt.
- Voice Activity Detection (VAD) is used to distinguish actual speech from background noise and brief audio artifacts.
- Phrase-based interruption provides a fast path for common commands such as "stop" or "wait" without requiring full interruption confirmation logic.