Voice Agents
Build intelligent voice agents that handle phone calls, web calls, and complex conversations with natural-sounding AI voices.
Voice Agents
Voice agents are AI-powered phone agents that can answer calls, make outbound calls, and hold natural conversations with your customers. They understand speech in real time, process intent, and respond with human-like voices — all without human intervention.
Why Voice Agents?
Traditional IVR systems frustrate callers with rigid menus and limited options. Voice agents replace that experience with fluid, intelligent conversation:
- 24/7 availability — Never miss a call, regardless of time zone or volume.
- Instant response — No hold times. Your agent picks up immediately.
- Natural conversation — Callers speak naturally instead of pressing buttons.
- Scalable — Handle 1 call or 1,000 simultaneous calls with the same quality.
- Cost-effective — A fraction of the cost of human call centers.
How It Works
When a call reaches your thinnestAI voice agent, here is what happens:
Caller dials your number
↓
Phone provider (Twilio / Vobiz) routes the call
↓
thinnestAI voice engine picks up
↓
Caller speaks → Speech-to-Text (Vega / Deepgram / Sarvam / Google / more)
↓
AI agent processes the transcript, decides on a response
↓
Response → Text-to-Speech (Aero / Sarvam / Cartesia / ElevenLabs / Deepgram / more)
↓
Caller hears the response in real timeThe entire loop — listen, think, respond — happens in milliseconds, creating a seamless conversational experience.
Supported Features
| Feature | Description |
|---|---|
| Inbound calls | Answer incoming calls on your phone numbers automatically |
| Outbound calls | Trigger calls to customers via API or campaigns |
| Web calls | Browser-based voice sessions using WebRTC |
| Call recording | Record and store every call for review and compliance |
| Call transcription | Full text transcripts of every conversation |
| Call transfers | Hand off calls to human agents or other departments |
| IVR navigation | Navigate existing IVR systems during outbound calls |
| Voicemail detection | Detect answering machines and respond appropriately |
| Campaign calling | Batch outbound calls with scheduling and throttling |
Voice Providers
thinnestAI supports 8 TTS providers and 6 STT providers so you can choose the voice and transcription engine that best fits your use case:
Text-to-Speech (TTS):
- Cartesia (recommended) — Sub-100ms latency with emotion control
- Sarvam — India-first, 10+ Indian languages with code-switching
- Deepgram — Fast, affordable, great for high volume
- ElevenLabs — Ultra-realistic voices with voice cloning
- Rime — Consistent, reliable voice quality
- Inworld — Character-driven, persona-consistent voices
- Google Cloud TTS — 220+ voices across 40+ languages
- Azure Speech — Enterprise-grade with custom voice training
Speech-to-Text (STT):
- Deepgram (default) — Fastest and most accurate for English
- Sarvam — Best accuracy for Indian languages
- Google Speech — 125+ languages
- Azure Speech — Enterprise, custom models
- Cartesia — Low latency with word timestamps
- AssemblyAI — Highest accuracy with speaker diarization
You can mix and match providers — for example, use Sarvam for STT and Aero for TTS. Switch providers at any time without changing your agent logic. See Voice Configuration for setup details.
Architecture Overview
The thinnestAI voice engine handles all the complexity of real-time audio:
┌──────────────────────────────────────────────────────────┐
│ Your Application │
│ (Dashboard / API / Embed Widget) │
└────────────────────────┬─────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────┐
│ thinnestAI Voice Engine │
│ │
│ ┌─────────────┐ ┌──────────┐ ┌───────────────────┐ │
│ │ Speech-to- │ │ AI Agent │ │ Text-to-Speech │ │
│ │ Text (STT) │→ │ (LLM) │→ │ (TTS) │ │
│ └─────────────┘ └──────────┘ └───────────────────┘ │
│ │
│ ┌─────────────┐ ┌──────────────────────────────────┐ │
│ │ Call │ │ Recording & Transcription │ │
│ │ Management │ │ Storage │ │
│ └─────────────┘ └──────────────────────────────────┘ │
└────────────────────────┬─────────────────────────────────┘
│
┌───────────────┼───────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Twilio │ │ Vobiz │ │ WebRTC │
│ (PSTN) │ │ (SIP) │ │ (Browser) │
└──────────────┘ └──────────────┘ └──────────────┘The voice engine manages audio streams, handles interruptions, processes speech in real time, and coordinates with your AI agent to generate responses. You interact with it through the dashboard or API — no need to manage audio infrastructure yourself.
Getting Started
Step 1: Create a Voice Agent
In the thinnestAI dashboard, click Create Agent and select Voice Agent as the type.
Configure the basics:
- Name — Give your agent a descriptive name (e.g., "Customer Support Agent")
- System prompt — Define your agent's personality, knowledge, and behavior
- Voice — Choose a TTS provider and voice
- LLM — Select the language model (GPT-4, Claude, Gemini, etc.)
Step 2: Connect a Phone Number
Go to Settings → Phone Numbers and either:
- Purchase a new number through Twilio
- Connect an existing Twilio or Vobiz number
Assign the number to your voice agent.
Step 3: Test Your Agent
Use the Web Call button in the dashboard to have a live conversation with your agent. Speak naturally and verify that responses are accurate and appropriately timed.
Step 4: Go Live
Once you are satisfied with testing, your agent is ready to handle real calls. Monitor performance in the Call Logs section of the dashboard.
Next Steps
- Inbound Calls — Set up your agent to answer incoming calls
- Outbound Calls — Trigger calls to customers via API
- Web Calls — Embed voice calls in your website
- Voice Configuration — Fine-tune voice settings
- Call Recording — Enable recording and transcription
- Call Transfers — Route calls to human agents
- SIP Integration — Connect your SIP infrastructure
- Voice Tools — End call, DTMF, SMS mid-call, voicemail drop, and more