Models & Providers

Supported AI models, how to choose the right one, and configuration options.

thinnestAI offers ultra-fast AI models on the platform's free tier, plus managed OpenAI models and BYOK support for all major providers.

Supported Models

Sarvam AI (Default)

Sarvam AI models are free on thinnestAI and are the default for all new agents. Built from scratch for Indian languages, they deliver state-of-the-art multilingual performance across 11 Indic languages plus English.

| Model | Parameters | Best For | Speed | Intelligence | Context |
|---|---|---|---|---|---|
| Sarvam 105B | 105B | Complex reasoning, enterprise tasks | Medium | Highest | 128K |
| Sarvam 30B | 30B | Conversational agents, real-time use | Fast | High | 128K |
| Sarvam M | 24B | Legacy (migrate to 30B/105B) | Fast | Good | 32K |

Key Benefits

  • Free on thinnestAI — Zero token cost for all Sarvam models. No usage charges.
  • 11 Indic Languages + English — Native support for Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Odia, Punjabi, and Assamese.
  • Native Script, Romanized & Code-Mixed Input — Handles "Hinglish", Romanized Hindi, Devanagari, and mixed-language input naturally.
  • Tool Calling — Sarvam 30B and 105B support function/tool calling for agentic workflows.
  • Hybrid Reasoning — Built-in think/non-think modes. The model can reason step-by-step when needed and respond directly for simple queries.
  • Wikipedia Grounding — Optional RAG-based fact-checking against Wikipedia for factual accuracy.
  • OpenAI-Compatible API — Drop-in replacement using OpenAI SDK format. No code changes needed.

When to Use Each Model

  • Sarvam 105B — Best for complex reasoning, multi-step agentic tasks, enterprise-grade accuracy, and code generation. Powers Sarvam's Indus AI assistant.
  • Sarvam 30B (recommended default) — Best balance of speed and quality for real-time conversational agents, voice bots, and customer support. Powers Sarvam's Samvaad platform.
  • Sarvam M — Legacy 24B model being phased out. Migrate to 30B for better performance at similar speed.

Controllable Parameters

Sarvam models support these additional parameters beyond standard temperature/max_tokens:

| Parameter | Description |
|---|---|
| reasoning_effort | Controls depth of model reasoning (low/medium/high) |
| wiki_grounding | Enables Wikipedia-based fact-checking for factual queries |
| presence_penalty | Encourages introducing new topics (0.0 to 2.0) |
| frequency_penalty | Reduces word/phrase repetition (0.0 to 2.0) |
| seed | Enables reproducible outputs for testing |
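These parameters ride along with standard OpenAI-style fields. A minimal sketch of assembling such a request body in Python; the field names follow the table above, but treat the exact wire format as an assumption rather than a confirmed schema:

```python
def build_sarvam_request(messages, model="sarvam-30b",
                         temperature=0.7, reasoning_effort="medium",
                         wiki_grounding=False, seed=None):
    """Assemble an OpenAI-style chat payload with Sarvam's extra knobs.

    Parameter names mirror the table above; this is illustrative,
    not a confirmed request schema.
    """
    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "reasoning_effort": reasoning_effort,  # low / medium / high
    }
    if wiki_grounding:
        payload["wiki_grounding"] = True  # Wikipedia-based fact-checking
    if seed is not None:
        payload["seed"] = seed            # reproducible outputs for testing
    return payload

payload = build_sarvam_request(
    [{"role": "user", "content": "Namaste! Aaj ka mausam kaisa hai?"}],
    reasoning_effort="low", wiki_grounding=True, seed=42,
)
```

The payload can then be POSTed with any OpenAI-compatible client, per the "OpenAI-Compatible API" note above.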

Example: Using Sarvam via API

curl -X PATCH https://api.thinnest.ai/agents/agent_abc123 \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sarvam-30b"
  }'

Sarvam Speech Models

Sarvam also provides speech models used in thinnestAI voice agents:

| Model | Type | Languages | Description |
|---|---|---|---|
| Saaras V3 | Speech-to-Text | 22 Indian languages | Low-latency streaming STT with code-mixed support |
| Bulbul V3 | Text-to-Speech | 11 Indian languages | Natural, expressive voices for production use |

These are automatically available when configuring voice agents with Sarvam as the speech provider.


ThinnestAI Stack (Default — Free)

Every new agent uses the ThinnestAI Stack by default — a curated AI model optimized for ultra-fast voice and chat with reliable tool calling. No configuration needed.

  • Speed: ~200ms time-to-first-token — fastest available for voice agents
  • Tool calling: 90% accuracy (on par with GPT-4o-mini)
  • Cost: Zero token charges. Free during trial (50 voice mins, 200 chat messages). After trial, platform fee only (₹1.50/min voice, ₹0.50/msg chat)
  • Prompt caching: Automatic 50% discount on cached tokens (system prompts, tool definitions)
  • Fallback: Built-in resilience with automatic model fallback
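As a rough illustration of the billing model above (zero token cost, platform fee only), a hedged cost sketch using the rates quoted in this section:

```python
VOICE_FEE_PER_MIN = 1.50  # ₹ per voice minute (platform fee, post-trial)
CHAT_FEE_PER_MSG = 0.50   # ₹ per chat message (platform fee, post-trial)

def stack_platform_fee(voice_minutes=0.0, chat_messages=0):
    """Estimate the platform fee for the ThinnestAI Stack.

    Token cost is zero on the Stack, so the only charge is the
    per-minute / per-message platform fee quoted above.
    """
    return voice_minutes * VOICE_FEE_PER_MIN + chat_messages * CHAT_FEE_PER_MSG

# e.g. 100 voice minutes + 200 chat messages after the trial
fee = stack_platform_fee(voice_minutes=100, chat_messages=200)  # ₹250.0
```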

How it works

Toggle ThinnestAI Stack ON (default) in the Agent Studio Behavior panel. The platform automatically selects the best model for your agent. Toggle OFF to choose a specific model from Sarvam, OpenAI, or BYOK providers.

When to use ThinnestAI Stack

  • Voice agents (fastest TTFT for natural conversations)
  • Chat agents with tool calling (database queries, API calls, etc.)
  • FAQ and customer support bots
  • Any agent where speed matters

When to switch to a specific model

  • You need Sarvam for Indian language-specific quality
  • You want GPT-4o for maximum reasoning quality (BYOK or managed)
  • You need Claude for instruction following (BYOK)

OpenAI (Managed)

Available on the PAYG plan. Tokens are billed at provider cost plus a 20% markup; add a BYOK key instead for zero token cost from us.

| Model | Best For | Speed | Intelligence | Context |
|---|---|---|---|---|
| GPT-4o | Multimodal tasks, complex reasoning | Medium | High | 128K |
| GPT-4o Mini | Cost-effective general use | Fast | Good | 128K |

BYOK Providers

Add your own API key in Settings > Models BYOK to unlock any of these providers. Zero token cost from us — you pay the provider directly.

| Provider | Models | Best For |
|---|---|---|
| Anthropic | Claude 4, Claude 3.5 Sonnet, Haiku | Instruction following, voice agents |
| Google | Gemini 2.5, 2.0 Flash, 1.5 Pro | Long context (1M tokens), fast responses |
| Mistral | Mistral Large, Codestral | Multilingual, code generation |
| DeepSeek | DeepSeek V3, R1 | Cost-effective reasoning |
| Cohere | Command R+, Command R | RAG-optimized |
| xAI | Grok-2, Grok-3 | Real-time knowledge |
| Perplexity | Sonar Large/Small | Built-in web search |

Browse and configure models in the dashboard under Agent Settings > Behavior > Model.

Choosing the Right Model

For Indian Language Agents

  1. Sarvam 30B (free) — Best choice. Fast, strong multilingual support, handles code-mixed input.
  2. Sarvam 105B (free) — Maximum accuracy for complex Indian language tasks.
  3. GPT-OSS 20B (free) — Good multilingual, ultra-fast inference.

For Voice Agents

Voice agents need fast response times. Long pauses feel unnatural in phone conversations.

  1. ThinnestAI Stack (free, default) — ~200ms TTFT, 90% tool calling accuracy, automatic fallback.
  2. Sarvam 30B (free) — Best for Indian language voice bots with Saaras STT + Bulbul TTS.
  3. Claude Sonnet (BYOK) — Best for English voice agents requiring nuanced responses.

For Chat Agents

Chat is more forgiving of response time, so you can prioritize intelligence:

  1. ThinnestAI Stack (free, default) — Fast and reliable for most chat use cases.
  2. Sarvam 105B (free) — Best free option for complex, multilingual conversations.
  3. GPT-4o (BYOK or managed) — Great balance of speed and capability.
  4. Claude Sonnet (BYOK) — Excellent instruction following.

For High-Volume / Cost-Sensitive

When running campaigns or handling thousands of calls:

  1. ThinnestAI Stack (free) — Zero cost, ultra-fast, handles most use cases.
  2. Sarvam 30B (free) — Zero cost, best for Indian market.

For Complex Reasoning

When your agent needs to make decisions, analyze data, or handle edge cases:

  1. Sarvam 105B (free) — State-of-the-art for Indian languages.
  2. GPT-4o (BYOK or managed) — Best reasoning quality.
  3. Claude Opus (BYOK) — Strongest reasoning for English.
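The recommendations above can be collapsed into a small lookup; a simplification sketched in Python (the use-case keys are illustrative names, and the mapping mirrors the lists in this section, not an official API):

```python
# First-choice models per use case, following the lists above.
RECOMMENDED = {
    "indian_language": ["Sarvam 30B", "Sarvam 105B"],
    "voice": ["ThinnestAI Stack", "Sarvam 30B", "Claude Sonnet (BYOK)"],
    "chat": ["ThinnestAI Stack", "Sarvam 105B", "GPT-4o", "Claude Sonnet (BYOK)"],
    "high_volume": ["ThinnestAI Stack", "Sarvam 30B"],
    "complex_reasoning": ["Sarvam 105B", "GPT-4o", "Claude Opus (BYOK)"],
}

def pick_model(use_case):
    """Return the first-choice model for a use case, or None if unknown."""
    choices = RECOMMENDED.get(use_case)
    return choices[0] if choices else None
```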

Model Configuration

Setting the Model

In the dashboard, select a model from the dropdown in your agent's settings. Via the API:

curl -X PATCH https://api.thinnest.ai/agents/agent_abc123 \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet"
  }'

Temperature

Controls the randomness of your agent's responses.

| Value | Behavior | Use Case |
|---|---|---|
| 0.0 | Deterministic, consistent | FAQ bots, data lookup |
| 0.3 | Mostly consistent with slight variation | Customer support |
| 0.7 | Balanced (default) | General conversation |
| 1.0 | Creative, varied | Creative writing, brainstorming |

For voice agents, we recommend 0.3 to 0.5 — you want consistent, reliable responses.

curl -X PATCH https://api.thinnest.ai/agents/agent_abc123 \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "temperature": 0.3
  }'

Max Tokens

Limits the length of each response. For voice agents, shorter is usually better — long responses sound unnatural when spoken aloud.

Recommended values:

  • Voice agents: 150-300 tokens
  • Chat agents: 500-1000 tokens
  • Document generation: 2000-4000 tokens
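Combining the temperature and max-token guidance on this page into one defaults helper; a sketch where the numbers come from the recommendations above and the agent-type names are illustrative:

```python
# (temperature, max_tokens) defaults per agent type,
# per the recommendations on this page.
DEFAULTS = {
    "voice": {"temperature": 0.4, "max_tokens": 300},      # temp 0.3-0.5, 150-300 tokens
    "chat": {"temperature": 0.7, "max_tokens": 1000},      # 500-1000 tokens
    "document": {"temperature": 0.7, "max_tokens": 4000},  # 2000-4000 tokens
}

def generation_defaults(agent_type):
    """Return recommended generation settings, falling back to chat defaults."""
    return DEFAULTS.get(agent_type, DEFAULTS["chat"])
```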

Model Pricing

thinnestAI uses a token-based billing system. You purchase tokens and they are consumed based on model usage. Different models have different token costs.

| Model Tier | Approximate Cost | Examples |
|---|---|---|
| Free | Zero cost | Sarvam 30B, Sarvam 105B, Sarvam M |
| Economy | Lowest cost per token | Haiku, GPT-4o Mini, Gemini Flash |
| Standard | Moderate cost | Sonnet, GPT-4o |
| Premium | Higher cost, highest capability | Opus, GPT-4 Turbo |

All Sarvam models are completely free on thinnestAI — no token charges, no usage limits on paid plans. This makes them ideal for startups and high-volume use cases targeting the Indian market.

Check the Billing page in your dashboard for current pricing per model. Phone call minutes include additional telephony costs.

Bring Your Own Key (BYOK)

Use your own API keys for any LLM, STT, or TTS provider. When you add a BYOK key, you pay the provider directly — thinnest.ai charges only the platform fee (₹1.50/min voice, ₹0.50/msg chat).

How It Works

  1. Go to Settings → API Keys → Models BYOK tab
  2. Click + Add Key next to a provider (e.g., Anthropic, Google Gemini)
  3. Paste your API key and save
  4. The provider's models immediately appear in the Model dropdown with a green BYOK badge
  5. Select any BYOK-enabled model for your agent — token cost from us is ₹0

Model Dropdown Behavior

| Provider Status | Dropdown Appearance |
|---|---|
| ThinnestAI Stack (default) | Toggle ON — no model selection needed |
| Managed (Sarvam, OpenAI) | Toggle OFF — select from dropdown |
| BYOK key added | Enabled with green BYOK badge next to provider name |
| No key configured | Grayed out with "BYOK Required" label. Hover shows tooltip: "Add your API key in Settings → Models BYOK" |
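The dropdown rules above amount to a short decision chain; a sketch (state names and the managed-provider set are illustrative, not product identifiers):

```python
def dropdown_state(stack_enabled, provider, byok_keys,
                   managed=frozenset({"sarvam", "openai"})):
    """Return how a provider's models appear in the Model dropdown."""
    if stack_enabled:
        return "hidden"        # Stack toggle ON: no model selection needed
    if provider in byok_keys:
        return "enabled-byok"  # green BYOK badge
    if provider in managed:
        return "enabled"       # managed provider, selectable
    return "disabled"          # grayed out with "BYOK Required"
```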

Supported BYOK Providers

LLM: OpenAI, Anthropic, Google Gemini, Mistral, DeepSeek, Cohere, xAI, Perplexity, Together AI

STT: Deepgram, AssemblyAI, Cartesia, Sarvam, ElevenLabs, Google Cloud

TTS: Cartesia, ElevenLabs, Deepgram, Sarvam, Rime, Inworld, Google Cloud

BYOK Billing

| Scenario | Token Cost | Platform Fee |
|---|---|---|
| ThinnestAI Stack (free, default) | ₹0 (free) | ₹1.50/min voice, ₹0.50/msg chat |
| Using managed OpenAI (no BYOK key) | Provider cost + 20% markup | ₹1.50/min voice, ₹0.50/msg chat |
| Using BYOK key for any provider | ₹0 (you pay provider directly) | ₹1.50/min voice, ₹0.50/msg chat |

GST (18%) is collected at wallet topup — not charged again on platform fee.

BYOK + Trial Free Usage

BYOK users get the full trial free allowance (50 voice minutes + 200 chat messages). During trial:

| Trial Status | BYOK Cost |
|---|---|
| Trial minutes/messages remaining | ₹0.00 total (token cost = ₹0, platform fee = ₹0) |
| Trial exhausted | ₹0 token cost + platform fee (₹1.50/min voice, ₹0.50/msg chat) |

This means BYOK users can use any provider (OpenAI, Anthropic, etc.) for free during the trial period — you just pay the provider directly for their API usage.
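The trial-aware BYOK billing described above can be sketched as a small estimator (rates are the ones quoted in this section; provider-side API charges are out of scope):

```python
def byok_charge(units, unit="voice_min", trial_remaining=0):
    """Estimate thinnestAI-side charges for a BYOK agent.

    Token cost is always ₹0 with BYOK; units covered by the trial
    allowance (in the same unit) also waive the platform fee.
    Provider API usage is paid to the provider directly.
    """
    fee_per_unit = 1.50 if unit == "voice_min" else 0.50
    billable = max(0, units - trial_remaining)
    return billable * fee_per_unit

# 60 voice minutes with 50 trial minutes left: only 10 are billable.
charge = byok_charge(60, unit="voice_min", trial_remaining=50)  # ₹15.0
```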

BYOK Key Priority

Your key always takes priority. Even for providers we manage (Deepgram, Cartesia), if you add your own key, we use yours instead of ours. This applies to LLM, STT, and TTS uniformly.
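The priority rule amounts to a one-step resolution; a sketch (key values are dummies):

```python
def resolve_key(user_keys, platform_keys, provider):
    """Pick the API key for a provider: the user's BYOK key
    always wins over the platform-managed key."""
    return user_keys.get(provider) or platform_keys.get(provider)

# User's Deepgram key overrides the platform-managed one.
key = resolve_key(
    user_keys={"deepgram": "dg_user_123"},
    platform_keys={"deepgram": "dg_platform_999", "cartesia": "ca_platform_1"},
    provider="deepgram",
)
```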

Security

  • All keys are encrypted at rest using Fernet symmetric encryption
  • Keys are decrypted only at call time and never logged
  • Only masked keys (last 4 characters) are shown in the dashboard
  • Keys are fetched once at agent load time and cached — zero per-request latency overhead
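The dashboard masking above (only the last 4 characters visible) can be sketched with the standard library alone; note that real key storage uses Fernet encryption, which this display-only snippet does not reproduce:

```python
def mask_key(api_key, visible=4):
    """Mask an API key for display, keeping only the last few characters."""
    if len(api_key) <= visible:
        return "*" * len(api_key)
    return "*" * (len(api_key) - visible) + api_key[-visible:]

masked = mask_key("sk-abc123def456")  # ends in "f456", rest asterisks
```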

API

Manage BYOK keys programmatically via the BYOK API.

Switching Models

You can change your agent's model at any time. The change takes effect on the next conversation — active calls are not interrupted.

Things to keep in mind when switching models:

  • Different models may interpret your instructions differently. Test after switching.
  • Response speed will change, which affects voice call quality.
  • Token costs will change based on the new model's pricing tier.
