Audio Ambience
Two perceived-latency hacks borrowed from production voice agents:
- Background sound — a low-volume office-chatter loop running for
the entire call. Masks silence, makes the agent feel “present in a
room” rather than floating in a vacuum.
- Thinking sound — a short keyboard-typing clip that plays while
the agent is in its “thinking” state. Masks LLM time-to-first-token
and the gaps that open up during tool calls / agent handoffs.
Both are built on LiveKit’s BackgroundAudioPlayer. The agent’s TTS or
Gemini Live audio plays on top; the ambience is mixed at low volume so
it doesn’t compete with the spoken reply.
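To make "mixed at low volume" concrete, here is a minimal sketch of mixing an ambience clip under speech on 16-bit PCM samples. It illustrates the general technique only and is not LiveKit's mixer; the function name `mix_pcm16` and the `ambience_gain` parameter are assumptions made for this example.

```python
from array import array


def mix_pcm16(speech: array, ambience: array, ambience_gain: float = 0.3) -> array:
    """Mix ambience under speech, sample by sample (hypothetical helper).

    Speech passes through at full level; ambience is scaled by
    ambience_gain. The sum is clamped to the signed 16-bit range so that
    loud passages clip instead of wrapping around.
    """
    if not 0.0 <= ambience_gain <= 1.0:
        raise ValueError("ambience_gain must be between 0.0 and 1.0")

    length = max(len(speech), len(ambience))
    mixed = array("h")
    for i in range(length):
        s = speech[i] if i < len(speech) else 0      # silence once speech ends
        a = ambience[i] if i < len(ambience) else 0  # silence once ambience ends
        total = s + round(a * ambience_gain)
        mixed.append(max(-32768, min(32767, total)))
    return mixed
```

Because the ambience is attenuated before the sum, the spoken reply keeps its full level while the background stays underneath it.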
Where to find it
In Agent Studio → Voice Configuration → Advanced → Call Control,
toggle the Audio Ambience card on. Click the gear icon to open the
configuration modal — every setting lives there.
Configuration
Background Sound
| Option | Behaviour |
|---|---|
| Off | No background sound. |
| Office Chatter (built-in) | Subtle call-centre chatter loop. Plays continuously throughout the call. |
| Custom | Upload your own .mp3 or .wav (up to 25 MB). |
A volume slider (0–100%, default 30%) sits below the picker.
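As a rough illustration of how a slot's settings could be represented, the sketch below models one slot (background or thinking) and converts the slider percentage into a linear gain. The `SlotConfig` class, its field names, and the 0.0 to 1.0 gain scale are assumptions for this example, not the platform's actual agent config schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SlotConfig:
    """Hypothetical shape for one ambience slot."""

    mode: str = "builtin"             # "off", "builtin" or "custom"
    volume_pct: int = 30              # slider value, 0-100
    custom_url: Optional[str] = None  # only meaningful when mode == "custom"

    def gain(self) -> float:
        """Map the slider percentage to a linear gain between 0.0 and 1.0."""
        if self.mode == "off":
            return 0.0
        clamped = max(0, min(100, self.volume_pct))
        return clamped / 100


# Documented defaults: 30% for the background slot, 50% for the thinking slot.
background = SlotConfig(mode="builtin", volume_pct=30)
thinking = SlotConfig(mode="builtin", volume_pct=50)
```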
Thinking Sound
| Option | Behaviour |
|---|---|
| Off | No thinking sound. |
| Keyboard Typing (built-in) | Plays while the agent is in its “thinking” state — i.e. waiting for the LLM. |
| Custom | Upload your own .mp3 or .wav (up to 25 MB). |
A volume slider (0–100%, default 50%) sits below the picker.
The thinking sound is auto-triggered by LiveKit’s session events; you
don’t have to wire anything to make it play during tool calls or
handoffs.
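You don't need to write any of this yourself, but the triggering logic can be pictured as a small gate driven by state changes. The sketch below is an assumption-laden illustration: `ThinkingSoundGate` and `on_state_change` are hypothetical names, the state strings simply mirror the wording on this page, and none of it is LiveKit's API.

```python
from dataclasses import dataclass, field


@dataclass
class ThinkingSoundGate:
    """Hypothetical gate: start the clip on entering "thinking", stop on leaving."""

    playing: bool = False
    log: list = field(default_factory=list)  # records "start" / "stop" for inspection

    def on_state_change(self, old_state: str, new_state: str) -> None:
        if new_state == "thinking" and not self.playing:
            self.playing = True
            self.log.append("start")  # a real player would begin the typing clip here
        elif new_state != "thinking" and self.playing:
            self.playing = False
            self.log.append("stop")   # a real player would stop the clip here


gate = ThinkingSoundGate()
for old, new in [
    ("listening", "thinking"),  # user finished speaking, LLM starts working
    ("thinking", "thinking"),   # still waiting, e.g. during a tool call
    ("thinking", "speaking"),   # first audio of the reply is ready
    ("speaking", "listening"),
]:
    gate.on_state_change(old, new)
```

Repeated "thinking" events do not restart the clip, which is why one continuous typing sound can cover an LLM wait followed by a tool call.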
Custom audio uploads
Click Custom for either slot to enable the upload zone. Drag a
.mp3 or .wav in (or click to pick from disk). The file is uploaded
to the same audio asset endpoint used by the rest of the platform and
the URL is stored on the agent config.
You can preview the uploaded clip with the play button before saving.
Use the trash icon to remove the uploaded file.
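If you prepare clips in a script before uploading, a pre-flight check along these lines can catch files the upload zone would reject. This is a sketch under stated assumptions: `validate_clip` is a hypothetical helper, and "25 MB" is read here as 25 × 1024 × 1024 bytes, which the page does not specify.

```python
from pathlib import PurePath

ALLOWED_SUFFIXES = {".mp3", ".wav"}
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" interpreted as 25 MiB


def validate_clip(filename: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the clip looks uploadable."""
    problems = []
    suffix = PurePath(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported file type: {suffix or '(none)'}")
    if size_bytes <= 0:
        problems.append("file is empty")
    elif size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file is larger than 25 MB")
    return problems
```

For example, `validate_clip("rain.WAV", 2_000_000)` returns an empty list, while an `.ogg` file or an oversized `.mp3` each produce one problem message.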
Audio Ambience is best tested on a real phone call. In a browser test
with laptop speakers and mic and no echo cancellation, the agent can
pick up its own ambience and treat it as user speech. Use headphones,
or test from a phone.
How it interacts with the audio path
- The ambience and thinking clips are published on the agent’s
outbound audio track. They don’t go through the noise-cancellation
path (NC only processes inbound caller audio).
- The mixer waits ~3 seconds after the session starts before the
background streams attach, so the greeting TTS has time to drive
the audio path. Without that delay, LiveKit’s audio mixer drops the
background streams on first-frame timeout.
- When the call ends, the player is closed cleanly and any
temp-downloaded custom audio files are deleted.
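The delayed attach and the end-of-call cleanup described above can be sketched with plain `asyncio`. This is an illustration of the sequencing only, not the platform's implementation: `run_ambience`, `attach_background`, and `call_finished` are hypothetical names, and the 3-second default simply echoes the approximate delay mentioned in the list.

```python
import asyncio
import os


async def run_ambience(attach_background, call_finished, temp_files, attach_delay=3.0):
    """Attach background audio after a delay, then clean up when the call ends.

    attach_background: async callable that starts the ambience (hypothetical).
    call_finished:     asyncio.Event set when the call is over.
    temp_files:        paths of downloaded custom clips to delete afterwards.
    """
    try:
        try:
            # If the call ends during the initial delay, never attach at all.
            await asyncio.wait_for(call_finished.wait(), timeout=attach_delay)
        except asyncio.TimeoutError:
            await attach_background()
            await call_finished.wait()
    finally:
        # Runs on normal completion, on errors, and on cancellation.
        for path in temp_files:
            try:
                os.remove(path)
            except FileNotFoundError:
                pass
```

Putting the deletion in `finally` means temp files are removed even if the call is torn down abruptly.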
Cost
Audio Ambience is free — there’s no per-minute charge.
Quick start
- Open your agent in Agent Studio → Voice Configuration.
- Go to Advanced → Call Control.
- Toggle Audio Ambience on. Both slots default to the built-in clips
(office chatter and keyboard typing).
- Click the gear icon if you want to upload your own clips, change
volumes, or pick “Off” for either slot.
- Save and test on a real phone (or with headphones).
Next Steps