Request Body
Core
Agent display name
LLM model identifier. Use provider/model format or just model name (see LLM Models)
System prompt that defines the agent’s behavior, personality, and constraints
Internal description (not shown to end users)
Agent type: simple, graph, workflow
Enable voice/phone call capabilities
Greeting message the agent speaks when a voice call starts
LLM temperature (0.0–2.0). Lower = more deterministic, higher = more creative
Maximum tokens in the LLM response
Allow the agent to end voice calls
Maximum voice call duration in seconds. 0 = unlimited
Noise cancellation: none, bvc, krisp
LLM Models
The model field accepts a string in provider/model format, or just the model name for auto-detection.
Subscription Tiers: Trial (default) — limited to gpt-4o-mini and Sarvam models. PAYG (after first top-up) — all models, OCR, and multimodal unlocked. Enterprise — all features including branding removal, teams, observability, and priority support.
Provider auto-detection rules:
| Model prefix | Detected provider |
|---|---|
gpt-*, o1-*, o3-* | openai |
claude-* | anthropic |
gemini* | google |
sarvam* | sarvam |
llama-*, mixtral-* | groq |

To set the provider explicitly, use the provider/model format, for example "openai/gpt-4o" or "anthropic/claude-3-5-sonnet-20241022".
OpenAI
| Model ID | Min. Tier | Notes |
|---|---|---|
gpt-4o-mini | Trial | Fast, cost-effective for simple tasks |
gpt-4o | PAYG | Most capable, recommended for production |
gpt-4-turbo | PAYG | High capability, large context |
gpt-3.5-turbo | PAYG | Legacy, fastest |
o3 | PAYG | Reasoning model |
o3-mini | PAYG | Lightweight reasoning |
o1 | PAYG | Advanced reasoning |
o1-mini | PAYG | Lightweight reasoning |
Anthropic
| Model ID | Min. Tier | Notes |
|---|---|---|
claude-3-5-haiku-20241022 | PAYG | Fast, affordable |
claude-3-5-sonnet-20241022 | PAYG | Best for complex tasks |
claude-3-opus-20240229 | PAYG | Highest capability |
Google
| Model ID | Min. Tier | Notes |
|---|---|---|
gemini-2.0-flash-exp | PAYG | Fast, experimental |
gemini-1.5-flash | PAYG | Lightweight |
gemini-1.5-pro | PAYG | High capability |
Groq (Ultra-Fast Inference)
Groq delivers the fastest LLM inference: 200-400 ms time to first token (TTFT) for voice agents. Recommended for low-latency voice calls.
| Model ID | Min. Tier | Notes |
|---|---|---|
llama-3.3-70b-versatile | Trial | Best for voice agents, tool calling supported |
llama-3.1-8b-instant | Trial | Fastest, lightweight |
llama-3.1-70b-versatile | PAYG | 70B with tool calling |
qwen/qwen3-32b | PAYG | Qwen 3, strong multilingual |
meta-llama/llama-4-scout-17b-16e-instruct | PAYG | Llama 4 Scout |
deepseek-r1-distill-llama-70b | PAYG | DeepSeek reasoning |
gemma2-9b-it | PAYG | Google Gemma 2 |
mixtral-8x7b-32768 | PAYG | 32K context MoE |
Sarvam AI (Indian LLM)
All Sarvam models are available on Trial tier (no top-up needed).
| Model ID | Min. Tier | Notes |
|---|---|---|
sarvam-m | Trial | General purpose |
sarvam-30b | Trial | 30B parameters |
sarvam-30b-16k | Trial | 30B with 16K context |
sarvam-105b | Trial | 105B parameters |
sarvam-105b-32k | Trial | 105B with 32K context |
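The prefix rules and Min. Tier columns in this section can be mirrored client-side as a pre-flight check. The sketch below is an illustration only: `detect_provider`, `is_available`, and the lookup tables are hypothetical names, `MIN_TIER` is a small excerpt of the model tables, and ranking Enterprise above PAYG is inferred from the tier descriptions. The server's actual resolution logic is not documented on this page.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Provider names that appear in the auto-detection table.
KNOWN_PROVIDERS = {"openai", "anthropic", "google", "sarvam", "groq"}

# Prefix patterns copied from the auto-detection table, checked in order.
PREFIX_RULES = [
    ("gpt-*", "openai"), ("o1-*", "openai"), ("o3-*", "openai"),
    ("claude-*", "anthropic"),
    ("gemini*", "google"),
    ("sarvam*", "sarvam"),
    ("llama-*", "groq"), ("mixtral-*", "groq"),
]

# Assumption: tiers are ordered Trial < PAYG < Enterprise.
TIER_RANK = {"Trial": 0, "PAYG": 1, "Enterprise": 2}

# Excerpt of the Min. Tier columns, not the full catalogue.
MIN_TIER = {
    "gpt-4o-mini": "Trial",
    "gpt-4o": "PAYG",
    "claude-3-5-sonnet-20241022": "PAYG",
    "gemini-1.5-pro": "PAYG",
    "llama-3.3-70b-versatile": "Trial",
    "sarvam-m": "Trial",
}


def split_model(model: str) -> tuple:
    """Split "provider/model" into (provider, model); provider is None if absent.

    Only a known provider name counts as an explicit prefix, because some
    model IDs (for example "qwen/qwen3-32b") contain a slash themselves.
    """
    head, sep, rest = model.partition("/")
    if sep and head in KNOWN_PROVIDERS:
        return head, rest
    return None, model


def detect_provider(model: str) -> Optional[str]:
    """Return the explicit or prefix-detected provider, or None if no rule matches."""
    provider, name = split_model(model)
    if provider is not None:
        return provider
    for pattern, candidate in PREFIX_RULES:
        if fnmatchcase(name, pattern):
            return candidate
    return None


def is_available(model: str, tier: str) -> bool:
    """Check a model against the excerpted minimum-tier table."""
    _, name = split_model(model)
    required = MIN_TIER.get(name)
    if required is None:
        return False  # not in the excerpt; nothing can be said locally
    return TIER_RANK[tier] >= TIER_RANK[required]
```

A check like this can catch an obviously unavailable model before the request is sent, but the API response remains the source of truth.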
Transcriber (STT)
Speech-to-text configuration. Only used when voiceEnabled is true.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
provider | string | No | deepgram | STT provider |
model | string | No | Provider default | STT model ID |
language | string | No | en | Language code (en, hi, es, fr, etc.) |
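The sketch below shows one way a client might assemble the STT object from the fields in the table, applying the documented defaults. The function name `build_transcriber` is hypothetical, and this page does not show the request-body key that holds the object, so only the inner object is built.

```python
from typing import Optional


def build_transcriber(provider: str = "deepgram",
                      model: Optional[str] = None,
                      language: str = "en") -> dict:
    """Build the STT object; omit `model` so the provider default applies."""
    config = {"provider": provider, "language": language}
    if model is not None:
        config["model"] = model
    return config


# Defaults only: Deepgram, English, provider-default model.
default_stt = build_transcriber()

# Hindi transcription with a Sarvam model from the tables in this section.
hindi_stt = build_transcriber(provider="sarvam", model="saarika:v2.5", language="hi")
```

Leaving `model` out rather than sending an empty string keeps the "Provider default" behaviour from the table.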
Deepgram (default)
Default provider. Best accuracy and lowest latency for voice agents.
| Model | Description |
|---|---|
nova-2-conversationalai | Best for voice agents and phone calls (default) |
nova-2-phonecall | Optimized for phone audio |
nova-2-medical | Medical terminology |
nova-2 | Base Nova 2 model |
nova-3 | Latest generation, highest accuracy |
nova-3-multilingual | Multi-language support |
nova-3-medical | Medical + Nova 3 |
flux-general-en | Recommended for voice agents — Deepgram v2 streaming API, ~200ms latency, native turn detection. Picking this auto-sets turn_detection_mode: "stt_endpointing". |
flux-general-multi | Same as flux-general-en, multilingual variant |
OpenAI
| Model | Description |
|---|---|
whisper-1 | Whisper model |
gpt-4o-transcribe | GPT-4o powered transcription |
gpt-4o-mini-transcribe | GPT-4o Mini transcription |
ElevenLabs
| Model | Description |
|---|---|
scribe-v2-realtime | Real-time transcription |
Cartesia
| Model | Description |
|---|---|
ink-whisper | Fast transcription |
Sarvam (Indian Languages)
Supports Hindi, Tamil, Telugu, Kannada, Malayalam, Bengali, Marathi, Gujarati, and more.
| Model | Description |
|---|---|
saarika:v2.5 | Latest Sarvam STT, 10+ Indian languages |
saarika:v2 | Sarvam STT v2 |
AssemblyAI
| Model | Description | Pricing |
|---|---|---|
u3-rt-pro | Universal-3 Pro Streaming — most accurate for voice agents | $0.45/hr |
universal-streaming | Universal Streaming — fastest English transcription | $0.15/hr |
universal-streaming-multilingual | Universal Streaming Multilingual | $0.15/hr |
whisper-streaming | Whisper Streaming — open-source Whisper on AssemblyAI infra | $0.30/hr |
universal-3-pro | Universal-3 Pro (file/batch only) | $0.21/hr |
universal-2 | Universal-2 (file/batch only) | $0.15/hr |
Free on Trial tier — AssemblyAI STT is included free during trial (no STT charges).
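As a rough illustration of the hourly rates in the AssemblyAI table, the sketch below estimates STT cost for a given call length. It assumes linear pro-rating by the minute, which this page does not confirm, and it ignores the Trial-tier waiver. `estimate_stt_cost` is a hypothetical helper.

```python
# Hourly rates in USD, copied from the AssemblyAI pricing column.
ASSEMBLYAI_RATES_PER_HOUR = {
    "u3-rt-pro": 0.45,
    "universal-streaming": 0.15,
    "universal-streaming-multilingual": 0.15,
    "whisper-streaming": 0.30,
    "universal-3-pro": 0.21,
    "universal-2": 0.15,
}


def estimate_stt_cost(model: str, minutes: float) -> float:
    """Estimate STT cost in USD, assuming billing scales linearly with time."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    rate = ASSEMBLYAI_RATES_PER_HOUR[model]  # KeyError for unlisted models
    return rate * minutes / 60


# Ninety minutes of streaming on two different models.
pro_cost = estimate_stt_cost("u3-rt-pro", 90)
fast_cost = estimate_stt_cost("universal-streaming", 90)
```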
Voice (TTS)
Text-to-speech configuration. Only used when voiceEnabled is true.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
provider | string | No | deepgram | TTS provider |
voiceId | string | No | asteria | Voice identifier |
model | string | No | Provider default | TTS model (if provider has multiple) |
speed | float | No | 1.0 | Speech speed multiplier |
Deepgram Aura (default)
Default provider. Low-latency, high-quality voices.
Models: aura-2 (default), aura
Featured Voices (Aura-2):
| Voice ID | Gender | Accent |
|---|---|---|
aura-2-thalia-en | Female | US English |
aura-2-andromeda-en | Female | US English |
aura-2-helena-en | Female | US English |
aura-2-athena-en | Female | US English |
aura-2-aurora-en | Female | US English |
aura-2-apollo-en | Male | US English |
aura-2-arcas-en | Male | US English |
aura-2-atlas-en | Male | US English |
aura-2-aries-en | Male | US English |

Multilingual: Append the language code: aura-2-thalia-es (Spanish), aura-2-thalia-de (German), aura-2-thalia-fr (French), aura-2-thalia-nl (Dutch), aura-2-thalia-it (Italian), aura-2-thalia-ja (Japanese).
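The Aura-2 voice IDs listed here follow the pattern aura-2-<voice>-<language>. A minimal sketch of building them is shown below. The helper names are hypothetical, and only the thalia variants named in this section are treated as known; whether other voices have non-English variants is not stated here.

```python
# Language codes this section lists for the thalia voice.
THALIA_LANGUAGES = {"en", "es", "de", "fr", "nl", "it", "ja"}


def aura2_voice_id(voice: str, language: str = "en") -> str:
    """Compose an Aura-2 voice ID from a voice name and a language code."""
    return f"aura-2-{voice}-{language}"


def thalia_voice_id(language: str) -> str:
    """Return the thalia voice ID, restricted to the documented languages."""
    if language not in THALIA_LANGUAGES:
        raise ValueError(f"thalia is not documented for language {language!r}")
    return aura2_voice_id("thalia", language)


spanish_voice = thalia_voice_id("es")
default_male_voice = aura2_voice_id("apollo")
```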
Cartesia
High-quality, multi-language voices with low latency.
Models: sonic-3, sonic-2, sonic-turbo, sonic
50+ voices available, fetched dynamically. Use the voice ID from Cartesia’s library.
ElevenLabs
Premium voice cloning and 100+ voices.
Models: eleven_flash_v2_5, eleven_flash_v2, eleven_turbo_v2_5, eleven_turbo_v2, eleven_multilingual_v2
Use the voice ID from your ElevenLabs account (premade or cloned voices).
OpenAI
Models: tts-1, tts-1-hd
| Voice ID | Gender |
|---|---|
alloy | Neutral |
echo | Male |
fable | Male |
onyx | Male |
nova | Female |
shimmer | Female |
Sarvam (Hindi / Indian)
Model: bulbul:v2
| Voice ID | Gender |
|---|---|
anushka | Female |
manisha | Female |
vidya | Female |
arya | Male |
abhilash | Male |
karun | Male |
hitesh | Male |
Rime
Models: arcana-v3, mist-v2
Inworld
Models: inworld-tts-1.5-max, inworld-tts-1.5-mini, inworld-tts-1-max, inworld-tts-1
Tools
Array of tool identifiers to attach to the agent.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
tools | string[] | No | [] | List of tool type identifiers |
Search
| Tool ID | Description |
|---|---|
duckduckgo | DuckDuckGo web search |
tavily | Tavily AI search |
exa | Exa neural search |
serpapi | SerpAPI Google results |
serper | Serper.dev search |
wikipedia | Wikipedia lookup |
arxiv | arXiv paper search |
hackernews | Hacker News search |
pubmed | PubMed medical literature |
Web Scraping
| Tool ID | Description |
|---|---|
website | Basic URL scraping |
firecrawl | Firecrawl web scraper |
crawl4ai | Crawl4AI scraper |
jinareader | Jina Reader API |
spider | Spider web crawler |
scrapling | Scrapling scraper |
newspaper | Article extraction |
trafilatura | Content extraction |
Communication
| Tool ID | Description |
|---|---|
email | Send emails |
gmail | Gmail integration |
slack | Slack messaging |
discord | Discord messaging |
telegram | Telegram bot |
twilio | Twilio SMS/voice |
whatsapp | WhatsApp messaging |
sms | SMS sending |
x | X (Twitter) |
zoom | Zoom meetings |
reddit | Reddit |
sendgrid | SendGrid email |
resend | Resend email |
Databases
| Tool ID | Description |
|---|---|
postgres | PostgreSQL queries |
mysql | MySQL queries |
mongodb | MongoDB operations |
duckdb | DuckDB analytics |
bigquery | Google BigQuery |
sql | Generic SQL |
neo4j | Neo4j graph DB |
supabase | Supabase |
firebase | Firebase |
airtable | Airtable |
redshift | AWS Redshift |
Productivity
| Tool ID | Description |
|---|---|
github | GitHub repos/issues/PRs |
gitlab | GitLab integration |
jira | Jira issues |
notion | Notion pages |
linear | Linear issues |
trello | Trello boards |
confluence | Confluence docs |
asana | Asana tasks |
monday | Monday.com |
todoist | Todoist tasks |
clickup | ClickUp tasks |
calcom | Cal.com scheduling |
Google
| Tool ID | Description |
|---|---|
googlecalendar | Google Calendar events |
googlesheets | Google Sheets read/write |
googlemaps | Google Maps lookup |
googledrive | Google Drive files |
CRM & Sales
| Tool ID | Description |
|---|---|
hubspot | HubSpot CRM |
salesforce | Salesforce CRM |
activecampaign | ActiveCampaign |
apollo | Apollo.io prospecting |
linkedin | LinkedIn |
intercom | Intercom support |
mailchimp | Mailchimp campaigns |
shopify | Shopify e-commerce |
zendesk | Zendesk support |
AI & Media
| Tool ID | Description |
|---|---|
dalle | DALL·E image generation |
elevenlabs | ElevenLabs TTS |
replicate | Replicate AI models |
fal | Fal.ai models |
lumalabs | Luma Labs video |
giphy | Giphy GIFs |
unsplash | Unsplash photos |
youtube | YouTube data |
spotify | Spotify data |
Cloud & DevOps
| Tool ID | Description |
|---|---|
awslambda | AWS Lambda functions |
awsses | AWS SES email |
s3 | AWS S3 storage |
dropbox | Dropbox files |
onedrive | OneDrive files |
sentry | Sentry error tracking |
datadog | Datadog monitoring |
msteams | Microsoft Teams |
Finance
| Tool ID | Description |
|---|---|
yfinance | Yahoo Finance data |
openbb | OpenBB financial data |
stripe | Stripe payments |
Utility
| Tool ID | Description |
|---|---|
calculator | Math calculations |
python | Execute Python code |
shell | Execute shell commands |
file | File operations |
custom_api | Custom HTTP API calls |
openweather | Weather data |
mem0 | Memory storage |
mcp | Model Context Protocol |
get_user | Get caller/user info |
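Since tools is a plain array of string IDs, a client can validate and de-duplicate the list before sending it. The sketch below assumes a local allow-list (`KNOWN_TOOL_IDS`, a hypothetical name holding a small excerpt of the tables above); how the API itself treats unknown or repeated IDs is not described on this page.

```python
# Excerpt of tool IDs from the tables above; not the full catalogue.
KNOWN_TOOL_IDS = {
    "duckduckgo", "tavily", "wikipedia", "gmail", "slack", "postgres",
    "github", "googlecalendar", "calculator", "custom_api", "get_user",
}


def build_tools(tool_ids: list) -> list:
    """Return the tool IDs without duplicates, rejecting IDs outside the excerpt."""
    unknown = sorted(set(tool_ids) - KNOWN_TOOL_IDS)
    if unknown:
        raise ValueError(f"unknown tool IDs: {unknown}")
    # dict.fromkeys keeps first-seen order while dropping repeats.
    return list(dict.fromkeys(tool_ids))


# Fragment of a request body using the documented "tools" field.
payload_fragment = {"tools": build_tools(["duckduckgo", "calculator", "duckduckgo"])}
```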
Knowledge Base
Attach knowledge sources by passing their IDs. Knowledge sources must be created first via the Knowledge API.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
sourceIds | integer[] | No | [] | IDs of knowledge sources to attach |
Knowledge Source Types
Sources are created via the Knowledge API and can be of these types:
| Type | Description |
|---|---|
url | Web page — crawled and indexed |
text | Raw text content |
file | Document upload (PDF, DOCX, TXT) |
csv | CSV data |
json | JSON data |
markdown | Markdown documents |
youtube | YouTube video transcript |
excel | Excel spreadsheets |
github_repo | GitHub repository |
azure_blob | Azure Blob Storage |
sharepoint | SharePoint documents |
ocr | OCR-processed documents |
Voice Configuration
These sections configure voice call behavior. Only used when voiceEnabled is true.
Interruption Config
Controls how the agent handles user interruptions during voice calls.
| Field | Type | Default | Description |
|---|---|---|---|
enabled | boolean | true | Allow users to interrupt the agent mid-speech |
threshold | float | 0.5 | Sensitivity threshold (0.0–1.0). Lower = more sensitive |
minSilenceDuration | float | 0.3 | Seconds of silence before detecting end of speech |
minSpeechDuration | float | 0.1 | Minimum speech duration to trigger interruption |
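A minimal sketch of building the interruption object with the documented defaults and the stated 0.0 to 1.0 threshold range is shown below. The function name is hypothetical, and the non-negative check on the two durations is an assumption rather than a documented rule.

```python
def build_interruption_config(enabled: bool = True,
                              threshold: float = 0.5,
                              min_silence_duration: float = 0.3,
                              min_speech_duration: float = 0.1) -> dict:
    """Build the interruption object using the field names from the table."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    # Assumption: durations are in seconds and cannot be negative.
    if min_silence_duration < 0 or min_speech_duration < 0:
        raise ValueError("durations must be non-negative")
    return {
        "enabled": enabled,
        "threshold": threshold,
        "minSilenceDuration": min_silence_duration,
        "minSpeechDuration": min_speech_duration,
    }


# A lower threshold makes the agent more sensitive to interruptions.
sensitive = build_interruption_config(threshold=0.3)
```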
Recording Config
Configure call recording for voice agents.
| Field | Type | Default | Description |
|---|---|---|---|
enabled | boolean | false | Enable call recording |
downloadEnabled | boolean | true | Allow recording downloads via API |
DTMF Config
Dual-tone multi-frequency (keypad) detection for IVR systems.
| Field | Type | Default | Description |
|---|---|---|---|
enabled | boolean | false | Enable DTMF keypad detection |
ivrDetection | boolean | false | Detect IVR system prompts |
menuEnabled | boolean | false | Enable DTMF menu navigation |
Silence Config
Control agent behavior during silence and unresponsive callers.
| Field | Type | Default | Description |
|---|---|---|---|
unresponsiveTimeoutSeconds | integer | 30 | Seconds before first silence prompt |
unresponsiveFinalSeconds | integer | 15 | Seconds before ending the call |
fillersEnabled | boolean | false | Use filler phrases during processing |
fillerPhrases | string[] | [] | Custom filler phrases |
Multilingual Config
Enable multi-language support in voice calls.
| Field | Type | Default | Description |
|---|---|---|---|
enabled | boolean | false | Enable multilingual detection |
primaryLanguage | string | en | Primary language code |
Call Summary Config
Automatically generate a summary after each voice call ends.
| Field | Type | Default | Description |
|---|---|---|---|
enabled | boolean | false | Enable post-call summaries |
prompt | string | "" | Custom summarization prompt |
Webhook Config
Receive real-time event notifications for voice calls.
| Field | Type | Default | Description |
|---|---|---|---|
url | string | "" | Webhook endpoint URL |
events | string[] | [] | Events: call_started, call_ended, message_received, error_occurred |
retryCount | integer | 3 | Number of retry attempts on failure |
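The webhook object can be checked locally against the four event names listed in the table. The sketch below is illustrative: `build_webhook_config` is a hypothetical helper, the URL is a placeholder, and the shape of the payload delivered to the endpoint is not covered here.

```python
# Event names listed in the webhook table.
ALLOWED_EVENTS = ("call_started", "call_ended", "message_received", "error_occurred")


def build_webhook_config(url: str, events: list, retry_count: int = 3) -> dict:
    """Build the webhook object, rejecting event names outside the table."""
    unknown = [event for event in events if event not in ALLOWED_EVENTS]
    if unknown:
        raise ValueError(f"unsupported events: {unknown}")
    if retry_count < 0:
        raise ValueError("retry_count must be non-negative")
    return {"url": url, "events": list(events), "retryCount": retry_count}


# Placeholder endpoint; replace with a URL you control.
webhook = build_webhook_config(
    "https://example.com/hooks/calls",
    ["call_started", "call_ended"],
)
```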
Graph Data
Workflow configuration for graph or workflow agent types. Used with the visual flow editor.
Graph Data Structure
| Type | Description |
|---|---|
trigger | Entry point — starts the workflow |
agent | LLM agent node |
team | Multi-agent team node |
condition | Conditional branching |
tool | Tool execution |
llm | Direct LLM call |
http-request | External API call |
python | Python code execution |
variable | Set/get variables |
memory | Memory operations |
delay | Wait/pause |
webhook | Webhook trigger |
approval | Human approval gate |
loop | Loop iteration |
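The node types in the table can be used to sanity-check a graph before it is submitted. The sketch below assumes each node is a dict with "id" and "type" keys. That shape is hypothetical, since the graph data structure itself is not shown on this page; only the set of type names comes from the table.

```python
# Node types listed in the table above.
NODE_TYPES = {
    "trigger", "agent", "team", "condition", "tool", "llm", "http-request",
    "python", "variable", "memory", "delay", "webhook", "approval", "loop",
}


def unknown_node_types(nodes: list) -> list:
    """Return the sorted node types that do not appear in the table."""
    return sorted({node["type"] for node in nodes} - NODE_TYPES)


# Hypothetical node list; "email" is not a documented node type.
nodes = [
    {"id": "n1", "type": "trigger"},
    {"id": "n2", "type": "agent"},
    {"id": "n3", "type": "email"},
]
problems = unknown_node_types(nodes)
```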
Full Example — Voice Agent with Tools
Full Example — Simple Chat Agent
Response 201
Returns the full agent object with nested configuration (see Get Agent for complete response schema).
Voice configuration
For voice agents, pass a voice object with the cascaded TTS picker,
Speech-to-Speech settings (Gemini Live or OpenAI Realtime), noise
cancellation, and audio ambience. See the dedicated
Voice Config Reference for every
field, default, and accepted values.
Errors
| Code | Description |
|---|---|
401 | Missing or invalid API key |
403 | Agent limit reached for your plan |
422 | Invalid request body |
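A client can map the documented status codes to readable messages when handling the create-agent response. The sketch below uses only the codes listed on this page (201 and the three errors). `describe_status` is a hypothetical helper, and the format of the error response body is not assumed.

```python
# Status codes and meanings taken from this page.
STATUS_MEANINGS = {
    201: "Agent created",
    401: "Missing or invalid API key",
    403: "Agent limit reached for your plan",
    422: "Invalid request body",
}


def describe_status(status: int) -> str:
    """Return the documented meaning of a status code, or a generic fallback."""
    return STATUS_MEANINGS.get(status, f"Unexpected status {status}")


def is_retryable(status: int) -> bool:
    """None of the documented errors is fixed by resending the same request."""
    return status not in STATUS_MEANINGS and status >= 500
```

Treating 5xx responses as retryable is a common convention, not something this page specifies.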

