Voice Agents

Voice agents are AI-powered phone agents that can answer calls, make outbound calls, and hold natural conversations with your customers. They understand speech in real time, process intent, and respond with human-like voices — all without human intervention.

Why Voice Agents?

Traditional IVR systems frustrate callers with rigid menus and limited options. Voice agents replace that experience with fluid, intelligent conversation:

24/7 availability — Never miss a call, regardless of time zone or volume.
Instant response — No hold times. Your agent picks up immediately.
Natural conversation — Callers speak naturally instead of pressing buttons.
Scalable — Handle 1 call or 1,000 simultaneous calls with the same quality.
Cost-effective — A fraction of the cost of human call centers.

How It Works

When a call reaches your thinnestAI voice agent, here is what happens:

  Caller dials your number
        ↓
  Phone provider (Twilio / Vobiz) routes the call
        ↓
  thinnestAI voice engine picks up
        ↓
  Caller speaks → Speech-to-Text (Deepgram / Sarvam / Google / more)
        ↓
  AI agent processes the transcript, decides on a response
        ↓
  Response → Text-to-Speech (Sarvam / Cartesia / ElevenLabs / Deepgram / more)
        ↓
  Caller hears the response in real time

The entire loop (listen, think, respond) runs on every conversational turn with latency measured in milliseconds, so the exchange feels like a natural conversation rather than a series of prompts.
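As a rough illustration of the listen, think, respond loop described above, the sketch below wires three stages together in Python. It is not the thinnestAI engine: the names `VoiceLoop`, `Turn`, and the `fake_*` stages are hypothetical, and a real engine streams audio continuously rather than handling one finished utterance at a time.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stage signatures. Each stage is a plain function here so the
# control flow is easy to follow; real providers work on audio streams.
SpeechToText = Callable[[bytes], str]
Agent = Callable[[List[str], str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class Turn:
    transcript: str
    reply: str
    audio: bytes


@dataclass
class VoiceLoop:
    stt: SpeechToText
    agent: Agent
    tts: TextToSpeech
    history: List[str] = field(default_factory=list)

    def handle_utterance(self, caller_audio: bytes) -> Turn:
        transcript = self.stt(caller_audio)           # listen
        reply = self.agent(self.history, transcript)  # think
        audio = self.tts(reply)                       # respond
        # Keep both sides of the exchange so later turns have context.
        self.history += [f"caller: {transcript}", f"agent: {reply}"]
        return Turn(transcript, reply, audio)


# Stand-in stages so the sketch runs without any provider account.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_agent(history: List[str], transcript: str) -> str:
    return f"You said: {transcript}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")
```

Because each stage is passed in as a function, swapping one provider for another only changes what is handed to `VoiceLoop`, not the loop itself.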

Supported Features

Feature	Description
Inbound calls	Answer incoming calls on your phone numbers automatically
Outbound calls	Trigger calls to customers via API or campaigns
Web calls	Browser-based voice sessions using WebRTC
Call recording	Record and store every call for review and compliance
Call transcription	Full text transcripts of every conversation
Call transfers	Hand off calls to human agents or other departments
IVR navigation	Navigate existing IVR systems during outbound calls
Voicemail detection	Detect answering machines and respond appropriately
Campaign calling	Batch outbound calls with scheduling and throttling
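
To make the throttling idea behind campaign calling concrete, here is a minimal Python sketch that caps how many calls run at once. It covers throttling only, not scheduling, and it is an assumption about how such a limit could work rather than a description of the platform. `run_campaign` and `place_call` are hypothetical names; `place_call` stands in for whatever actually starts a call.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical: a coroutine that dials one number and returns an outcome.
PlaceCall = Callable[[str], Awaitable[str]]


async def run_campaign(numbers: List[str], place_call: PlaceCall,
                       max_concurrent: int = 2) -> Dict[str, str]:
    """Call every number, never running more than max_concurrent at once."""
    gate = asyncio.Semaphore(max_concurrent)
    results: Dict[str, str] = {}

    async def dial(number: str) -> None:
        async with gate:  # wait here while the limit is reached
            try:
                results[number] = await place_call(number)
            except Exception as exc:  # record the failure and keep going
                results[number] = f"failed: {exc}"

    await asyncio.gather(*(dial(n) for n in numbers))
    return results
```

One failed call is recorded in the results instead of stopping the whole batch, which is usually the behavior you want for a long contact list.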

Voice Providers

thinnestAI supports 8 TTS providers and 6 STT providers so you can choose the voice and transcription engine that best fits your use case.

Text-to-Speech (TTS):

Cartesia (recommended) — Sub-100ms latency with emotion control
Sarvam — India-first, 10+ Indian languages with code-switching
Deepgram — Fast, affordable, great for high volume
ElevenLabs — Ultra-realistic voices with voice cloning
Rime — Consistent, reliable voice quality
Inworld — Character-driven, persona-consistent voices
Google Cloud TTS — 220+ voices across 40+ languages
Azure Speech — Enterprise-grade with custom voice training

Speech-to-Text (STT):

Deepgram (default) — Fastest and most accurate for English
Sarvam — Best accuracy for Indian languages
Google Speech — 125+ languages
Azure Speech — Enterprise, custom models
Cartesia — Low latency with word timestamps
AssemblyAI — Highest accuracy with speaker diarization

You can mix and match providers — for example, use Sarvam for STT and ElevenLabs for TTS. Switch providers at any time without changing your agent logic. See Voice Configuration for setup details.
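
The mix-and-match idea can be sketched as a small, validated settings object. The provider names come from the two lists above. The lowercase identifiers and the `VoicePipeline` class are assumptions made for illustration and are not the platform's actual configuration keys.

```python
from dataclasses import dataclass

# Provider names copied from the lists above. The lowercase spellings are an
# assumption, not the platform's real configuration values.
TTS_PROVIDERS = {"cartesia", "sarvam", "deepgram", "elevenlabs",
                 "rime", "inworld", "google", "azure"}
STT_PROVIDERS = {"deepgram", "sarvam", "google", "azure",
                 "cartesia", "assemblyai"}


@dataclass(frozen=True)
class VoicePipeline:
    stt: str = "deepgram"   # listed above as the default STT provider
    tts: str = "cartesia"   # listed above as the recommended TTS provider

    def __post_init__(self) -> None:
        # STT and TTS are checked independently, so any pairing is allowed.
        if self.stt not in STT_PROVIDERS:
            raise ValueError(f"unknown STT provider: {self.stt}")
        if self.tts not in TTS_PROVIDERS:
            raise ValueError(f"unknown TTS provider: {self.tts}")
```

For example, `VoicePipeline(stt="sarvam", tts="elevenlabs")` matches the pairing mentioned above, while `VoicePipeline(tts="assemblyai")` raises a `ValueError` because AssemblyAI appears only in the STT list.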

Architecture Overview

The thinnestAI voice engine handles all the complexity of real-time audio:

┌──────────────────────────────────────────────────────────┐
│                   Your Application                        │
│         (Dashboard / API / Embed Widget)                  │
└────────────────────────┬─────────────────────────────────┘
                         │
                         ▼
┌──────────────────────────────────────────────────────────┐
│              thinnestAI Voice Engine                       │
│                                                           │
│   ┌─────────────┐  ┌──────────┐  ┌───────────────────┐  │
│   │ Speech-to-  │  │ AI Agent │  │ Text-to-Speech    │  │
│   │ Text (STT)  │→ │ (LLM)   │→ │ (TTS)             │  │
│   └─────────────┘  └──────────┘  └───────────────────┘  │
│                                                           │
│   ┌─────────────┐  ┌──────────────────────────────────┐  │
│   │ Call        │  │ Recording & Transcription         │  │
│   │ Management  │  │ Storage                           │  │
│   └─────────────┘  └──────────────────────────────────┘  │
└────────────────────────┬─────────────────────────────────┘
                         │
         ┌───────────────┼───────────────┐
         ▼               ▼               ▼
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   Twilio     │  │    Vobiz     │  │   WebRTC     │
│   (PSTN)     │  │    (SIP)     │  │   (Browser)  │
└──────────────┘  └──────────────┘  └──────────────┘

The voice engine manages audio streams, handles interruptions, processes speech in real time, and coordinates with your AI agent to generate responses. You interact with it through the dashboard or API — no need to manage audio infrastructure yourself.
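
Interruption handling is the least obvious item in that list, so here is a simplified Python sketch of the idea: when the caller starts talking while the agent is still speaking, the remaining reply audio is dropped and the agent goes back to listening. `TurnManager` and its methods are hypothetical names, and the engine's real logic is not documented here.

```python
from enum import Enum, auto
from typing import List, Optional


class State(Enum):
    LISTENING = auto()
    SPEAKING = auto()


class TurnManager:
    """Tracks who has the floor and cancels playback when interrupted."""

    def __init__(self) -> None:
        self.state = State.LISTENING
        self.pending_audio: List[bytes] = []

    def start_reply(self, chunks: List[bytes]) -> None:
        # Queue the synthesized reply; an empty reply leaves us listening.
        self.pending_audio = list(chunks)
        self.state = State.SPEAKING if self.pending_audio else State.LISTENING

    def next_chunk(self) -> Optional[bytes]:
        # Hand out one chunk at a time so playback can stop between chunks.
        if self.state is State.SPEAKING and self.pending_audio:
            chunk = self.pending_audio.pop(0)
            if not self.pending_audio:
                self.state = State.LISTENING
            return chunk
        return None

    def caller_started_speaking(self) -> bool:
        """Drop any unplayed audio; return True if a reply was cut short."""
        interrupted = self.state is State.SPEAKING
        self.pending_audio.clear()
        self.state = State.LISTENING
        return interrupted
```

Sending audio in small chunks is what makes the interruption responsive: the shorter each chunk, the sooner the agent stops talking after the caller speaks.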

Getting Started

Step 1: Create a Voice Agent

In the thinnestAI dashboard, click Create Agent and select Voice Agent as the type. Configure the basics:

Name — Give your agent a descriptive name (e.g., “Customer Support Agent”)
System prompt — Define your agent’s personality, knowledge, and behavior
Voice — Choose a TTS provider and voice
LLM — Select the language model (GPT-4, Claude, Gemini, etc.)
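
If you keep agent definitions alongside your own code, the four settings above can be captured in a small record and checked before you open the dashboard. This is only a sketch: `VoiceAgentDraft` and its field names are hypothetical and mirror the dashboard labels, not a documented API schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VoiceAgentDraft:
    # Field names are illustrative; they mirror the four dashboard settings.
    name: str = ""
    system_prompt: str = ""
    voice: str = ""   # for example, a TTS provider plus a voice name
    llm: str = ""

    def missing_fields(self) -> List[str]:
        """Return the settings that are still blank, in declaration order."""
        return [key for key, value in vars(self).items() if not value.strip()]
```

A draft created with only `name` and `llm` filled in reports `system_prompt` and `voice` as missing, which makes it a quick checklist before Step 2.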

Step 2: Connect a Phone Number

Go to Settings → Phone Numbers and either:

Purchase a new number through Twilio
Connect an existing Twilio or Vobiz number

Assign the number to your voice agent.

Step 3: Test Your Agent

Use the Web Call button in the dashboard to have a live conversation with your agent. Speak naturally and verify that responses are accurate and appropriately timed.

Step 4: Go Live

Once you are satisfied with testing, your agent is ready to handle real calls. Monitor performance in the Call Logs section of the dashboard.

Next Steps

Inbound Calls — Set up your agent to answer incoming calls
Outbound Calls — Trigger calls to customers via API
Web Calls — Embed voice calls in your website
Voice Configuration — Fine-tune voice settings
Call Recording — Enable recording and transcription
Call Transfers — Route calls to human agents
SIP Integration — Connect your SIP infrastructure
Voice Tools — End call, DTMF, SMS mid-call, voicemail drop, and more
