Voice Agents

Build intelligent voice agents that handle phone calls, web calls, and complex conversations with natural-sounding AI voices.


Voice agents are AI-powered phone agents that can answer calls, make outbound calls, and hold natural conversations with your customers. They understand speech in real time, process intent, and respond with human-like voices — all without human intervention.

Why Voice Agents?

Traditional IVR systems frustrate callers with rigid menus and limited options. Voice agents replace that experience with fluid, intelligent conversation:

  • 24/7 availability — Never miss a call, regardless of time zone or volume.
  • Instant response — No hold times. Your agent picks up immediately.
  • Natural conversation — Callers speak naturally instead of pressing buttons.
  • Scalable — Handle 1 call or 1,000 simultaneous calls with the same quality.
  • Cost-effective — A fraction of the cost of human call centers.

How It Works

When a call reaches your thinnestAI voice agent, here is what happens:

Caller dials your number
        ↓
Phone provider (Twilio / Vobiz) routes the call
        ↓
thinnestAI voice engine picks up
        ↓
Caller speaks → Speech-to-Text (Vega / Deepgram / Sarvam / Google / more)
        ↓
AI agent processes the transcript, decides on a response
        ↓
Response → Text-to-Speech (Aero / Sarvam / Cartesia / ElevenLabs / Deepgram / more)
        ↓
Caller hears the response in real time

The entire loop — listen, think, respond — happens in milliseconds, creating a seamless conversational experience.
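
The listen, think, respond loop can be pictured as three pluggable stages. The following is a minimal sketch rather than thinnestAI's implementation: `transcribe`, `decide`, and `synthesize` are hypothetical stand-ins for the STT provider, the AI agent, and the TTS provider, and audio is modeled as plain bytes so the example runs without any provider account.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceLoop:
    """One conversational turn: listen -> think -> respond.

    All three callables are hypothetical stand-ins, not thinnestAI APIs.
    """
    transcribe: Callable[[bytes], str]   # STT: caller audio -> transcript
    decide: Callable[[str], str]         # agent: transcript -> reply text
    synthesize: Callable[[str], bytes]   # TTS: reply text -> audio

    def handle_turn(self, caller_audio: bytes) -> bytes:
        transcript = self.transcribe(caller_audio)
        reply_text = self.decide(transcript)
        return self.synthesize(reply_text)


# Toy stages: "audio" is just UTF-8 text in this sketch.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_agent(transcript: str) -> str:
    if "hours" in transcript.lower():
        return "We are open 9 to 5."
    return "Could you repeat that?"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


loop = VoiceLoop(transcribe=fake_stt, decide=fake_agent, synthesize=fake_tts)
reply_audio = loop.handle_turn(b"What are your hours?")
print(reply_audio.decode("utf-8"))
```

Because each stage is just a callable, swapping one provider for another changes a single argument and leaves `handle_turn` untouched.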

Supported Features

| Feature | Description |
| --- | --- |
| Inbound calls | Answer incoming calls on your phone numbers automatically |
| Outbound calls | Trigger calls to customers via API or campaigns |
| Web calls | Browser-based voice sessions using WebRTC |
| Call recording | Record and store every call for review and compliance |
| Call transcription | Full text transcripts of every conversation |
| Call transfers | Hand off calls to human agents or other departments |
| IVR navigation | Navigate existing IVR systems during outbound calls |
| Voicemail detection | Detect answering machines and respond appropriately |
| Campaign calling | Batch outbound calls with scheduling and throttling |
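
Throttling in campaign calling comes down to capping how many calls are in flight at once. The sketch below illustrates that idea with an `asyncio.Semaphore`. It does not use the thinnestAI API: `place_call` is a hypothetical stand-in that only simulates a call, and the phone numbers are placeholders.

```python
import asyncio


async def place_call(number: str) -> str:
    """Hypothetical stand-in for an outbound call; it only simulates work."""
    await asyncio.sleep(0.01)
    return f"completed:{number}"


async def run_campaign(numbers: list[str], max_concurrent: int) -> tuple[list[str], int]:
    """Call every number with at most max_concurrent calls active at once.

    Returns the results in input order plus the peak concurrency observed.
    """
    semaphore = asyncio.Semaphore(max_concurrent)
    active = 0
    peak = 0

    async def throttled(number: str) -> str:
        nonlocal active, peak
        async with semaphore:          # wait here if the cap is reached
            active += 1
            peak = max(peak, active)
            try:
                return await place_call(number)
            finally:
                active -= 1

    results = await asyncio.gather(*(throttled(n) for n in numbers))
    return list(results), peak


numbers = ["+10000000001", "+10000000002", "+10000000003", "+10000000004", "+10000000005"]
results, peak = asyncio.run(run_campaign(numbers, max_concurrent=2))
print(results)
print("peak concurrent calls:", peak)
```

Scheduling (for example, only calling during business hours) would sit in front of this loop as a filter on when `run_campaign` is started.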

Voice Providers

thinnestAI supports 8 TTS providers and 6 STT providers so you can choose the voice and transcription engine that best fits your use case:

Text-to-Speech (TTS):

  • Cartesia (recommended) — Sub-100ms latency with emotion control
  • Sarvam — India-first, 10+ Indian languages with code-switching
  • Deepgram — Fast, affordable, great for high volume
  • ElevenLabs — Ultra-realistic voices with voice cloning
  • Rime — Consistent, reliable voice quality
  • Inworld — Character-driven, persona-consistent voices
  • Google Cloud TTS — 220+ voices across 40+ languages
  • Azure Speech — Enterprise-grade with custom voice training

Speech-to-Text (STT):

  • Deepgram (default) — Fastest and most accurate for English
  • Sarvam — Best accuracy for Indian languages
  • Google Speech — 125+ languages
  • Azure Speech — Enterprise, custom models
  • Cartesia — Low latency with word timestamps
  • AssemblyAI — Highest accuracy with speaker diarization

You can mix and match providers — for example, use Sarvam for STT and Aero for TTS. Switch providers at any time without changing your agent logic. See Voice Configuration for setup details.
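
One way to picture provider swapping is a voice configuration that is kept separate from the agent logic. The class names, field names, and lowercase provider identifiers below are assumptions made for illustration; the real settings are described in Voice Configuration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VoiceConfig:
    """Hypothetical voice settings; field names are illustrative only."""
    stt_provider: str
    tts_provider: str


@dataclass(frozen=True)
class AgentDefinition:
    """Hypothetical agent definition: the prompt carries the agent logic."""
    system_prompt: str
    voice: VoiceConfig


def with_providers(agent: AgentDefinition, stt: str, tts: str) -> AgentDefinition:
    """Return a copy of the agent that differs only in its voice providers."""
    return replace(agent, voice=VoiceConfig(stt_provider=stt, tts_provider=tts))


support_agent = AgentDefinition(
    system_prompt="You are a friendly support agent.",
    voice=VoiceConfig(stt_provider="deepgram", tts_provider="cartesia"),
)

# Mix and match: a different STT and TTS provider, same prompt.
regional_agent = with_providers(support_agent, stt="sarvam", tts="elevenlabs")
print(regional_agent.voice)
print(regional_agent.system_prompt == support_agent.system_prompt)
```

Keeping the configuration immutable means a provider change always produces a new definition, so the original agent stays available for comparison or rollback.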

Architecture Overview

The thinnestAI voice engine handles all the complexity of real-time audio:

┌──────────────────────────────────────────────────────────┐
│                   Your Application                        │
│         (Dashboard / API / Embed Widget)                  │
└────────────────────────┬─────────────────────────────────┘
                         │
                         ▼
┌──────────────────────────────────────────────────────────┐
│              thinnestAI Voice Engine                       │
│                                                           │
│   ┌─────────────┐  ┌──────────┐  ┌───────────────────┐  │
│   │ Speech-to-  │  │ AI Agent │  │ Text-to-Speech    │  │
│   │ Text (STT)  │→ │ (LLM)    │→ │ (TTS)             │  │
│   └─────────────┘  └──────────┘  └───────────────────┘  │
│                                                           │
│   ┌─────────────┐  ┌──────────────────────────────────┐  │
│   │ Call        │  │ Recording & Transcription         │  │
│   │ Management  │  │ Storage                           │  │
│   └─────────────┘  └──────────────────────────────────┘  │
└────────────────────────┬─────────────────────────────────┘
                         │
         ┌───────────────┼───────────────┐
         ▼               ▼               ▼
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   Twilio     │  │    Vobiz     │  │   WebRTC     │
│   (PSTN)     │  │    (SIP)     │  │   (Browser)  │
└──────────────┘  └──────────────┘  └──────────────┘

The voice engine manages audio streams, handles interruptions, processes speech in real time, and coordinates with your AI agent to generate responses. You interact with it through the dashboard or API — no need to manage audio infrastructure yourself.
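
Interruption handling (often called barge-in) generally means stopping playback as soon as the caller starts talking. The following is a rough asyncio sketch of that idea, not the engine's actual code: `play_response` and `caller_starts_speaking` are hypothetical stand-ins, and all timings are simulated with sleeps.

```python
import asyncio


async def play_response(chunks: list[str], played: list[str]) -> None:
    """Hypothetical TTS playback: emits one audio chunk every 20 ms."""
    for chunk in chunks:
        await asyncio.sleep(0.02)
        played.append(chunk)


async def caller_starts_speaking(after_seconds: float) -> None:
    """Hypothetical voice-activity signal: returns once the caller talks."""
    await asyncio.sleep(after_seconds)


async def speak_with_barge_in(
    chunks: list[str], interrupt_after: float
) -> tuple[list[str], bool]:
    """Play chunks until done or until the caller interrupts.

    Returns the chunks actually played and whether playback was cut short.
    """
    played: list[str] = []
    playback = asyncio.create_task(play_response(chunks, played))
    speech = asyncio.create_task(caller_starts_speaking(interrupt_after))

    done, _ = await asyncio.wait(
        {playback, speech}, return_when=asyncio.FIRST_COMPLETED
    )
    interrupted = playback not in done

    # Cancel whichever task is still running, then let the cancellation settle.
    for task in (playback, speech):
        if not task.done():
            task.cancel()
    await asyncio.gather(playback, speech, return_exceptions=True)
    return played, interrupted


chunks = ["Your", "order", "ships", "on", "Monday"]
played, interrupted = asyncio.run(speak_with_barge_in(chunks, interrupt_after=0.05))
print(played, interrupted)
```

In a real call, the agent would then transcribe the caller's new speech and respond to it instead of finishing the interrupted sentence.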

Getting Started

Step 1: Create a Voice Agent

In the thinnestAI dashboard, click Create Agent and select Voice Agent as the type.

Configure the basics:

  • Name — Give your agent a descriptive name (e.g., "Customer Support Agent")
  • System prompt — Define your agent's personality, knowledge, and behavior
  • Voice — Choose a TTS provider and voice
  • LLM — Select the language model (GPT-4, Claude, Gemini, etc.)

Step 2: Connect a Phone Number

Go to Settings → Phone Numbers and either:

  • Purchase a new number through Twilio
  • Connect an existing Twilio or Vobiz number

Assign the number to your voice agent.

Step 3: Test Your Agent

Use the Web Call button in the dashboard to have a live conversation with your agent. Speak naturally and verify that responses are accurate and appropriately timed.

Step 4: Go Live

Once you are satisfied with testing, your agent is ready to handle real calls. Monitor performance in the Call Logs section of the dashboard.

Next Steps
