Models & Providers
Supported AI models, how to choose the right one, and configuration options.
thinnestAI offers ultra-fast AI models free on the platform, plus managed OpenAI models and BYOK support for all major providers.
Supported Models
Sarvam AI (Default)
Sarvam AI models are free on thinnestAI and are the default for all new agents. Built from scratch for Indian languages, they deliver state-of-the-art multilingual performance across 11 Indic languages plus English.
| Model | Parameters | Best For | Speed | Intelligence | Context |
|---|---|---|---|---|---|
| Sarvam 105B | 105B | Complex reasoning, enterprise tasks | Medium | Highest | 128K |
| Sarvam 30B | 30B | Conversational agents, real-time use | Fast | High | 128K |
| Sarvam M | 24B | Legacy (migrate to 30B/105B) | Fast | Good | 32K |
Key Benefits
- Free on thinnestAI — Zero token cost for all Sarvam models. No usage charges.
- 11 Indic Languages + English — Native support for Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Odia, Punjabi, and Assamese.
- Native Script, Romanized & Code-Mixed Input — Handles "Hinglish", Romanized Hindi, Devanagari, and mixed-language input naturally.
- Tool Calling — Sarvam 30B and 105B support function/tool calling for agentic workflows.
- Hybrid Reasoning — Built-in think/non-think modes. The model can reason step-by-step when needed and respond directly for simple queries.
- Wikipedia Grounding — Optional RAG-based fact-checking against Wikipedia for factual accuracy.
- OpenAI-Compatible API — Drop-in replacement using OpenAI SDK format. No code changes needed.
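As an illustration of the OpenAI-compatible format, a chat completion request might look like the sketch below. The endpoint path and model ID are illustrative assumptions; substitute the values shown in your dashboard.

```bash
# Sketch: calling a Sarvam model through an OpenAI-compatible
# chat completions endpoint. The base URL and model ID below are
# placeholders, not confirmed values.
curl -X POST https://api.thinnest.ai/v1/chat/completions \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sarvam-30b",
    "messages": [
      {"role": "user", "content": "Namaste! Aaj mausam kaisa hai?"}
    ]
  }'
```

Because the request body matches the OpenAI format, existing OpenAI SDK clients should work by pointing their base URL at the compatible endpoint.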
When to Use Each Model
- Sarvam 105B — Best for complex reasoning, multi-step agentic tasks, enterprise-grade accuracy, and code generation. Powers Sarvam's Indus AI assistant.
- Sarvam 30B (recommended default) — Best balance of speed and quality for real-time conversational agents, voice bots, and customer support. Powers Sarvam's Samvaad platform.
- Sarvam M — Legacy 24B model being phased out. Migrate to 30B for better performance at similar speed.
Controllable Parameters
Sarvam models support these additional parameters beyond standard temperature/max_tokens:
| Parameter | Description |
|---|---|
| reasoning_effort | Controls depth of model reasoning (low/medium/high) |
| wiki_grounding | Enables Wikipedia-based fact-checking for factual queries |
| presence_penalty | Encourages introducing new topics (0.0 to 2.0) |
| frequency_penalty | Reduces word/phrase repetition (0.0 to 2.0) |
| seed | Enables reproducible outputs for testing |
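Several of these can be combined in one agent update. The sketch below assumes the parameters are accepted as top-level fields on the agent PATCH endpoint; verify the exact field names in the API reference.

```bash
# Sketch: setting Sarvam-specific parameters alongside the model.
# Field placement is an assumption; check the API reference.
curl -X PATCH https://api.thinnest.ai/agents/agent_abc123 \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sarvam-30b",
    "reasoning_effort": "medium",
    "wiki_grounding": true,
    "frequency_penalty": 0.5,
    "seed": 42
  }'
```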
Example: Using Sarvam via API
```bash
curl -X PATCH https://api.thinnest.ai/agents/agent_abc123 \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sarvam-30b"
  }'
```

Sarvam Speech Models
Sarvam also provides speech models used in thinnestAI voice agents:
| Model | Type | Languages | Description |
|---|---|---|---|
| Saaras V3 | Speech-to-Text | 22 Indian languages | Low-latency streaming STT with code-mixed support |
| Bulbul V3 | Text-to-Speech | 11 Indian languages | Natural, expressive voices for production use |
These are automatically available when configuring voice agents with Sarvam as the speech provider.
ThinnestAI Stack (Default — Free)
Every new agent uses the ThinnestAI Stack by default — a curated AI model optimized for ultra-fast voice and chat with reliable tool calling. No configuration needed.
- Speed: ~200ms time-to-first-token — fastest available for voice agents
- Tool calling: 90% accuracy (on par with GPT-4o-mini)
- Cost: Zero token charges. Free during trial (50 voice mins, 200 chat messages). After trial, platform fee only (₹1.50/min voice, ₹0.50/msg chat)
- Prompt caching: Automatic 50% discount on cached tokens (system prompts, tool definitions)
- Fallback: Built-in resilience with automatic model fallback
How it works
Toggle ThinnestAI Stack ON (default) in the Agent Studio Behavior panel. The platform automatically selects the best model for your agent. Toggle OFF to choose a specific model from Sarvam, OpenAI, or BYOK providers.
When to use ThinnestAI Stack
- Voice agents (fastest TTFT for natural conversations)
- Chat agents with tool calling (database queries, API calls, etc.)
- FAQ and customer support bots
- Any agent where speed matters
When to switch to a specific model
- You need Sarvam for Indian language-specific quality
- You want GPT-4o for maximum reasoning quality (BYOK or managed)
- You need Claude for instruction following (BYOK)
OpenAI (Managed)
Available on the PAYG plan. Token costs carry a 20% markup, or add a BYOK key for zero token cost.
| Model | Best For | Speed | Intelligence | Context |
|---|---|---|---|---|
| GPT-4o | Multimodal tasks, complex reasoning | Medium | High | 128K |
| GPT-4o Mini | Cost-effective general use | Fast | Good | 128K |
BYOK Providers
Add your own API key in Settings > Models BYOK to unlock any of these providers. Zero token cost from us — you pay the provider directly.
| Provider | Models | Best For |
|---|---|---|
| Anthropic | Claude 4, Claude 3.5 Sonnet, Haiku | Instruction following, voice agents |
| Google | Gemini 2.5, 2.0 Flash, 1.5 Pro | Long context (1M tokens), fast responses |
| Mistral | Mistral Large, Codestral | Multilingual, code generation |
| DeepSeek | DeepSeek V3, R1 | Cost-effective reasoning |
| Cohere | Command R+, Command R | RAG-optimized |
| xAI | Grok-2, Grok-3 | Real-time knowledge |
| Perplexity | Sonar Large/Small | Built-in web search |
Browse and configure models in the dashboard under Agent Settings > Behavior > Model.
Choosing the Right Model
For Indian Language Agents
- Sarvam 30B (free) — Best choice. Fast, strong multilingual support, handles code-mixed input.
- Sarvam 105B (free) — Maximum accuracy for complex Indian language tasks.
- GPT-OSS 20B (free) — Good multilingual, ultra-fast inference.
For Voice Agents
Voice agents need fast response times. Long pauses feel unnatural in phone conversations.
- ThinnestAI Stack (free, default) — ~200ms TTFT, 90% tool calling accuracy, automatic fallback.
- Sarvam 30B (free) — Best for Indian language voice bots with Saaras STT + Bulbul TTS.
- Claude Sonnet (BYOK) — Best for English voice agents requiring nuanced responses.
For Chat Agents
Chat is more forgiving of response time, so you can prioritize intelligence:
- ThinnestAI Stack (free, default) — Fast and reliable for most chat use cases.
- Sarvam 105B (free) — Best free option for complex, multilingual conversations.
- GPT-4o (BYOK or managed) — Great balance of speed and capability.
- Claude Sonnet (BYOK) — Excellent instruction following.
For High-Volume / Cost-Sensitive
When running campaigns or handling thousands of calls:
- ThinnestAI Stack (free) — Zero cost, ultra-fast, handles most use cases.
- Sarvam 30B (free) — Zero cost, best for Indian market.
For Complex Reasoning
When your agent needs to make decisions, analyze data, or handle edge cases:
- Sarvam 105B (free) — State-of-the-art for Indian languages.
- GPT-4o (BYOK or managed) — Best reasoning quality.
- Claude Opus (BYOK) — Strongest reasoning for English.
Model Configuration
Setting the Model
In the dashboard, select a model from the dropdown in your agent's settings. Via the API:
```bash
curl -X PATCH https://api.thinnest.ai/agents/agent_abc123 \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet"
  }'
```

Temperature
Controls the randomness of your agent's responses.
| Value | Behavior | Use Case |
|---|---|---|
| 0.0 | Deterministic, consistent | FAQ bots, data lookup |
| 0.3 | Mostly consistent with slight variation | Customer support |
| 0.7 | Balanced (default) | General conversation |
| 1.0 | Creative, varied | Creative writing, brainstorming |
For voice agents, we recommend 0.3 to 0.5 — you want consistent, reliable responses.
```bash
curl -X PATCH https://api.thinnest.ai/agents/agent_abc123 \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "temperature": 0.3
  }'
```

Max Tokens
Limits the length of each response. For voice agents, shorter is usually better — long responses sound unnatural when spoken aloud.
Recommended values:
- Voice agents: 150-300 tokens
- Chat agents: 500-1000 tokens
- Document generation: 2000-4000 tokens
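For instance, capping a voice agent's responses at roughly two sentences could look like this (a sketch, using the same agent PATCH endpoint shown earlier):

```bash
# Keep voice responses short so they sound natural when spoken.
curl -X PATCH https://api.thinnest.ai/agents/agent_abc123 \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "max_tokens": 200
  }'
```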
Model Pricing
thinnestAI uses a token-based billing system. You purchase tokens and they are consumed based on model usage. Different models have different token costs.
| Model Tier | Approximate Cost | Examples |
|---|---|---|
| Free | Zero cost | Sarvam 30B, Sarvam 105B, Sarvam M |
| Economy | Lowest cost per token | Haiku, GPT-4o Mini, Gemini Flash |
| Standard | Moderate cost | Sonnet, GPT-4o |
| Premium | Higher cost, highest capability | Opus, GPT-4 Turbo |
All Sarvam models are completely free on thinnestAI — no token charges, no usage limits on paid plans. This makes them ideal for startups and high-volume use cases targeting the Indian market.
Check the Billing page in your dashboard for current pricing per model. Phone call minutes include additional telephony costs.
Bring Your Own Key (BYOK)
Use your own API keys for any LLM, STT, or TTS provider. When you add a BYOK key, you pay the provider directly — thinnest.ai charges only the platform fee (₹1.50/min voice, ₹0.50/msg chat).
How It Works
- Go to Settings → API Keys → Models BYOK tab
- Click + Add Key next to a provider (e.g., Anthropic, Google Gemini)
- Paste your API key and save
- The provider's models immediately appear in the Model dropdown with a green BYOK badge
- Select any BYOK-enabled model for your agent — token cost from us is ₹0
Model Dropdown Behavior
| Provider Status | Dropdown Appearance |
|---|---|
| ThinnestAI Stack (default) | Toggle ON — no model selection needed |
| Managed (Sarvam, OpenAI) | Toggle OFF — select from dropdown |
| BYOK key added | Enabled with green BYOK badge next to provider name |
| No key configured | Grayed out with "BYOK Required" label. Hover shows tooltip: "Add your API key in Settings → Models BYOK" |
Supported BYOK Providers
LLM: OpenAI, Anthropic, Google Gemini, Mistral, DeepSeek, Cohere, xAI, Perplexity, Together AI
STT: Deepgram, AssemblyAI, Cartesia, Sarvam, ElevenLabs, Google Cloud
TTS: Cartesia, ElevenLabs, Deepgram, Sarvam, Rime, Inworld, Google Cloud
BYOK Billing
| Scenario | Token Cost | Platform Fee |
|---|---|---|
| ThinnestAI Stack (free, default) | ₹0 (free) | ₹1.50/min voice, ₹0.50/msg chat |
| Using managed OpenAI (no BYOK key) | Provider cost + 20% markup | ₹1.50/min voice, ₹0.50/msg chat |
| Using BYOK key for any provider | ₹0 (you pay provider directly) | ₹1.50/min voice, ₹0.50/msg chat |
GST (18%) is collected at wallet top-up — not charged again on the platform fee.
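As a worked example of the table above, the sketch below estimates the monthly platform fee for a BYOK voice agent. Token cost from thinnestAI is zero; provider-side API charges are billed separately by the provider.

```shell
# Platform-fee estimate for a BYOK voice agent.
# Token cost from thinnestAI is 0; only the per-minute fee applies.
minutes=1000          # voice minutes in the month
fee_per_min=1.50      # platform fee in INR per voice minute

awk -v m="$minutes" -v f="$fee_per_min" \
  'BEGIN { printf "Platform fee: Rs %.2f\n", m * f }'
# prints: Platform fee: Rs 1500.00
```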
BYOK + Trial Free Usage
BYOK users get the full trial free allowance (50 voice minutes + 200 chat messages). During trial:
| Trial Status | BYOK Cost |
|---|---|
| Trial minutes/messages remaining | ₹0.00 total (token cost = ₹0, platform fee = ₹0) |
| Trial exhausted | ₹0 token cost + platform fee (₹1.50/min voice, ₹0.50/msg chat) |
This means BYOK users can use any provider (OpenAI, Anthropic, etc.) for free during the trial period — you just pay the provider directly for their API usage.
BYOK Key Priority
Your key always takes priority. Even for providers we manage (Deepgram, Cartesia), if you add your own key, we use yours instead of ours. This applies to LLM, STT, and TTS uniformly.
Security
- All keys are encrypted at rest using Fernet symmetric encryption
- Keys are decrypted only at call time and never logged
- Only masked keys (last 4 characters) are shown in the dashboard
- Keys are fetched once at agent load time and cached — zero per-request latency overhead
API
Manage BYOK keys programmatically via the BYOK API.
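The sketch below shows what adding a key programmatically might look like. The /byok/keys path and request fields are assumptions, not confirmed endpoints; consult the BYOK API reference for the actual schema.

```bash
# Hypothetical request: the path and body shape are illustrative.
curl -X POST https://api.thinnest.ai/byok/keys \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "anthropic",
    "api_key": "sk-ant-..."
  }'
```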
Switching Models
You can change your agent's model at any time. The change takes effect on the next conversation — active calls are not interrupted.
Things to keep in mind when switching models:
- Different models may interpret your instructions differently. Test after switching.
- Response speed will change, which affects voice call quality.
- Token costs will change based on the new model's pricing tier.