Models & Providers
Supported AI models, how to choose the right one, and configuration options.
thinnestAI offers ultra-fast AI models free on the platform, plus managed OpenAI models and BYOK support for all major providers.
Supported Models
Sarvam AI (Default)
Sarvam AI models are free on thinnestAI and are the default for all new agents. Built from scratch for Indian languages, they deliver state-of-the-art multilingual performance across 11 Indic languages plus English.
| Model | Parameters | Best For | Speed | Intelligence | Context |
|---|---|---|---|---|---|
| Sarvam 105B | 105B | Complex reasoning, enterprise tasks | Medium | Highest | 128K |
| Sarvam 30B | 30B | Conversational agents, real-time use | Fast | High | 128K |
| Sarvam M | 24B | Legacy (migrate to 30B/105B) | Fast | Good | 32K |
Key Benefits
- Free on thinnestAI — Zero token cost for all Sarvam models. No usage charges.
- 11 Indic Languages + English — Native support for Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Odia, Punjabi, and Assamese.
- Native Script, Romanized & Code-Mixed Input — Handles "Hinglish", Romanized Hindi, Devanagari, and mixed-language input naturally.
- Tool Calling — Sarvam 30B and 105B support function/tool calling for agentic workflows.
- Hybrid Reasoning — Built-in think/non-think modes. The model can reason step-by-step when needed and respond directly for simple queries.
- Wikipedia Grounding — Optional RAG-based fact-checking against Wikipedia for factual accuracy.
- OpenAI-Compatible API — Drop-in replacement using OpenAI SDK format. No code changes needed.
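As an illustration of the OpenAI-compatible format, a chat completion request might look like the sketch below. The endpoint path and model ID are illustrative assumptions; substitute the values shown in your dashboard.

```bash
# Sketch: calling a Sarvam model through an OpenAI-compatible
# chat completions endpoint. The base URL and model ID below are
# placeholders, not confirmed values.
curl -X POST https://api.thinnest.ai/v1/chat/completions \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sarvam-30b",
    "messages": [
      {"role": "user", "content": "Namaste! Aaj mausam kaisa hai?"}
    ]
  }'
```

Because the request body matches the OpenAI format, existing OpenAI SDK clients should work by pointing their base URL at the compatible endpoint.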
When to Use Each Model
- Sarvam 105B — Best for complex reasoning, multi-step agentic tasks, enterprise-grade accuracy, and code generation. Powers Sarvam's Indus AI assistant.
- Sarvam 30B (recommended default) — Best balance of speed and quality for real-time conversational agents, voice bots, and customer support. Powers Sarvam's Samvaad platform.
- Sarvam M — Legacy 24B model being phased out. Migrate to 30B for better performance at similar speed.
Controllable Parameters
Sarvam models support these additional parameters beyond standard temperature/max_tokens:
| Parameter | Description |
|---|---|
| reasoning_effort | Controls depth of model reasoning (low/medium/high) |
| wiki_grounding | Enables Wikipedia-based fact-checking for factual queries |
| presence_penalty | Encourages introducing new topics (0.0 to 2.0) |
| frequency_penalty | Reduces word/phrase repetition (0.0 to 2.0) |
| seed | Enables reproducible outputs for testing |
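Several of these can be combined in one agent update. The sketch below assumes the parameters are accepted as top-level fields on the agent PATCH endpoint; verify the exact field names in the API reference.

```bash
# Sketch: setting Sarvam-specific parameters alongside the model.
# Field placement is an assumption; check the API reference.
curl -X PATCH https://api.thinnest.ai/agents/agent_abc123 \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sarvam-30b",
    "reasoning_effort": "medium",
    "wiki_grounding": true,
    "frequency_penalty": 0.5,
    "seed": 42
  }'
```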
Example: Using Sarvam via API
```bash
curl -X PATCH https://api.thinnest.ai/agents/agent_abc123 \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sarvam-30b"
  }'
```

Sarvam Speech Models
Sarvam also provides speech models used in thinnestAI voice agents:
| Model | Type | Languages | Description |
|---|---|---|---|
| Saaras V3 | Speech-to-Text | 22 Indian languages | Low-latency streaming STT with code-mixed support |
| Bulbul V3 | Text-to-Speech | 11 Indian languages | Natural, expressive voices for production use |
These are automatically available when configuring voice agents with Sarvam as the speech provider.
ThinnestAI Stack (Default — Free)
Every new agent uses the ThinnestAI Stack by default — a curated AI model optimized for ultra-fast voice and chat with reliable tool calling. No configuration needed.
- Speed: ~200ms time-to-first-token — fastest available for voice agents
- Tool calling: 90% accuracy (on par with GPT-4o-mini)
- Cost: Zero token charges. Free during trial (50 voice mins, 200 chat messages). After trial, platform fee only (₹1.50/min voice, ₹0.50/msg chat)
- Prompt caching: Automatic 50% discount on cached tokens (system prompts, tool definitions)
- Fallback: Built-in resilience with automatic model fallback
How it works
Toggle ThinnestAI Stack ON (default) in the Agent Studio Behavior panel. The platform automatically selects the best model for your agent. Toggle OFF to choose a specific model from Sarvam, OpenAI, or BYOK providers.
When to use ThinnestAI Stack
- Voice agents (fastest TTFT for natural conversations)
- Chat agents with tool calling (database queries, API calls, etc.)
- FAQ and customer support bots
- Any agent where speed matters
When to switch to a specific model
- You need Sarvam for Indian language-specific quality
- You want GPT-4o for maximum reasoning quality (BYOK or managed)
- You need Claude for instruction following (BYOK)
OpenAI (Managed)
Available on the PAYG plan. Token costs carry a 20% markup, or add a BYOK key for zero token cost.
| Model | Best For | Speed | Intelligence | Context |
|---|---|---|---|---|
| GPT-4o | Multimodal tasks, complex reasoning | Medium | High | 128K |
| GPT-4o Mini | Cost-effective general use | Fast | Good | 128K |
BYOK Providers
Add your own API key in Settings > Models BYOK to unlock any of these providers. Zero token cost from us — you pay the provider directly.
| Provider | Models | Best For |
|---|---|---|
| Anthropic | Claude 4, Claude 3.5 Sonnet, Haiku | Instruction following, voice agents |
| Google | Gemini 2.5, 2.0 Flash, 1.5 Pro | Long context (1M tokens), fast responses |
| Mistral | Mistral Large, Codestral | Multilingual, code generation |
| DeepSeek | DeepSeek V3, R1 | Cost-effective reasoning |
| Cohere | Command R+, Command R | RAG-optimized |
| xAI | Grok-2, Grok-3 | Real-time knowledge |
| Perplexity | Sonar Large/Small | Built-in web search |
Browse and configure models in the dashboard under Agent Settings > Behavior > Model.
Choosing the Right Model
For Indian Language Agents
- Sarvam 30B (free) — Best choice. Fast, strong multilingual support, handles code-mixed input.
- Sarvam 105B (free) — Maximum accuracy for complex Indian language tasks.
- GPT-OSS 20B (free) — Good multilingual, ultra-fast inference.
For Voice Agents
Voice agents need fast response times. Long pauses feel unnatural in phone conversations.
- ThinnestAI Stack (free, default) — ~200ms TTFT, 90% tool calling accuracy, automatic fallback.
- Sarvam 30B (free) — Best for Indian language voice bots with Saaras STT + Bulbul TTS.
- Claude Sonnet (BYOK) — Best for English voice agents requiring nuanced responses.
For Chat Agents
Chat is more forgiving of response time, so you can prioritize intelligence:
- ThinnestAI Stack (free, default) — Fast and reliable for most chat use cases.
- Sarvam 105B (free) — Best free option for complex, multilingual conversations.
- GPT-4o (BYOK or managed) — Great balance of speed and capability.
- Claude Sonnet (BYOK) — Excellent instruction following.
For High-Volume / Cost-Sensitive
When running campaigns or handling thousands of calls:
- ThinnestAI Stack (free) — Zero cost, ultra-fast, handles most use cases.
- Sarvam 30B (free) — Zero cost, best for Indian market.
For Complex Reasoning
When your agent needs to make decisions, analyze data, or handle edge cases:
- Sarvam 105B (free) — State-of-the-art for Indian languages.
- GPT-4o (BYOK or managed) — Best reasoning quality.
- Claude Opus (BYOK) — Strongest reasoning for English.
Model Configuration
Setting the Model
In the dashboard, select a model from the dropdown in your agent's settings. Via the API:
```bash
curl -X PATCH https://api.thinnest.ai/agents/agent_abc123 \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet"
  }'
```

Temperature
Controls the randomness of your agent's responses.
| Value | Behavior | Use Case |
|---|---|---|
| 0.0 | Deterministic, consistent | FAQ bots, data lookup |
| 0.3 | Mostly consistent with slight variation | Customer support |
| 0.7 | Balanced (default) | General conversation |
| 1.0 | Creative, varied | Creative writing, brainstorming |
For voice agents, we recommend 0.3 to 0.5 — you want consistent, reliable responses.
```bash
curl -X PATCH https://api.thinnest.ai/agents/agent_abc123 \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "temperature": 0.3
  }'
```

Max Tokens
Limits the length of each response. For voice agents, shorter is usually better — long responses sound unnatural when spoken aloud.
Recommended values:
- Voice agents: 150-300 tokens
- Chat agents: 500-1000 tokens
- Document generation: 2000-4000 tokens
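For instance, capping a voice agent's responses at roughly two sentences could look like this (a sketch, using the same agent PATCH endpoint shown earlier):

```bash
# Keep voice responses short so they sound natural when spoken.
curl -X PATCH https://api.thinnest.ai/agents/agent_abc123 \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "max_tokens": 200
  }'
```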
Model Pricing
thinnestAI uses a token-based billing system. You purchase tokens and they are consumed based on model usage. Different models have different token costs.
| Model Tier | Approximate Cost | Examples |
|---|---|---|
| Free | Zero cost | Sarvam 30B, Sarvam 105B, Sarvam M |
| Economy | Lowest cost per token | Haiku, GPT-4o Mini, Gemini Flash |
| Standard | Moderate cost | Sonnet, GPT-4o |
| Premium | Higher cost, highest capability | Opus, GPT-4 Turbo |
All Sarvam models are completely free on thinnestAI — no token charges, no usage limits on paid plans. This makes them ideal for startups and high-volume use cases targeting the Indian market.
Check the Billing page in your dashboard for current pricing per model. Phone call minutes include additional telephony costs.
Bring Your Own Key (BYOK)
Use your own API keys for any LLM, STT, or TTS provider. When you add a BYOK key, you pay the provider directly — thinnest.ai charges only the platform fee (₹1.50/min voice, ₹0.50/msg chat).
How It Works
- Go to Settings → API Keys → Models BYOK tab
- Click + Add Key next to a provider (e.g., Anthropic, Google Gemini)
- Paste your API key and save
- The provider's models immediately appear in the Model dropdown with a green BYOK badge
- Select any BYOK-enabled model for your agent — token cost from us is ₹0
Model Dropdown Behavior
| Provider Status | Dropdown Appearance |
|---|---|
| ThinnestAI Stack (default) | Toggle ON — no model selection needed |
| Managed (Sarvam, OpenAI) | Toggle OFF — select from dropdown |
| BYOK key added | Enabled with green BYOK badge next to provider name |
| No key configured | Grayed out with "BYOK Required" label. Hover shows tooltip: "Add your API key in Settings → Models BYOK" |
Supported BYOK Providers
LLM: OpenAI, Anthropic, Google Gemini, Mistral, DeepSeek, Cohere, xAI, Perplexity, Together AI
STT: Deepgram, AssemblyAI, Cartesia, Sarvam, ElevenLabs, Google Cloud
TTS: Cartesia, ElevenLabs, Deepgram, Sarvam, Rime, Inworld, Google Cloud
BYOK Billing
| Scenario | Token Cost | Platform Fee |
|---|---|---|
| ThinnestAI Stack (free, default) | ₹0 (free) | ₹1.50/min voice, ₹0.50/msg chat |
| Using managed OpenAI (no BYOK key) | Provider cost + 20% markup | ₹1.50/min voice, ₹0.50/msg chat |
| Using BYOK key for any provider | ₹0 (you pay provider directly) | ₹1.50/min voice, ₹0.50/msg chat |
GST (18%) is collected at wallet top-up — not charged again on the platform fee.
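As a worked example of the table above, the sketch below estimates the monthly platform fee for a BYOK voice agent. Token cost from thinnestAI is zero; provider-side API charges are billed separately by the provider.

```shell
# Platform-fee estimate for a BYOK voice agent.
# Token cost from thinnestAI is 0; only the per-minute fee applies.
minutes=1000          # voice minutes in the month
fee_per_min=1.50      # platform fee in INR per voice minute

awk -v m="$minutes" -v f="$fee_per_min" \
  'BEGIN { printf "Platform fee: Rs %.2f\n", m * f }'
# prints: Platform fee: Rs 1500.00
```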
BYOK + Trial Free Usage
BYOK users get the full trial free allowance (50 voice minutes + 200 chat messages). During trial:
| Trial Status | BYOK Cost |
|---|---|
| Trial minutes/messages remaining | ₹0.00 total (token cost = ₹0, platform fee = ₹0) |
| Trial exhausted | ₹0 token cost + platform fee (₹1.50/min voice, ₹0.50/msg chat) |
This means BYOK users can use any provider (OpenAI, Anthropic, etc.) for free during the trial period — you just pay the provider directly for their API usage.
BYOK Key Priority
Your key always takes priority. Even for providers we manage (Deepgram, Cartesia), if you add your own key, we use yours instead of ours. This applies to LLM, STT, and TTS uniformly.
Security
- All keys are encrypted at rest using Fernet symmetric encryption
- Keys are decrypted only at call time and never logged
- Only masked keys (last 4 characters) are shown in the dashboard
- Keys are fetched once at agent load time and cached — zero per-request latency overhead
API
Manage BYOK keys programmatically via the BYOK API.
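The sketch below shows what adding a key programmatically might look like. The /byok/keys path and request fields are assumptions, not confirmed endpoints; consult the BYOK API reference for the actual schema.

```bash
# Hypothetical request: the path and body shape are illustrative.
curl -X POST https://api.thinnest.ai/byok/keys \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "anthropic",
    "api_key": "sk-ant-..."
  }'
```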
Switching Models
You can change your agent's model at any time. The change takes effect on the next conversation — active calls are not interrupted.
Things to keep in mind when switching models:
- Different models may interpret your instructions differently. Test after switching.
- Response speed will change, which affects voice call quality.
- Token costs will change based on the new model's pricing tier.