Documentation Index

Fetch the complete documentation index at: https://docs.thinnest.ai/llms.txt

Use this file to discover all available pages before exploring further.

Troubleshooting & Optimization

When a voice call doesn’t feel right — too slow, the agent invents details, replies in the wrong language — the cause is almost always one of a handful of knobs in the Voice tab, the system prompt, or the knowledge base. This page collects the patterns we see most often and the exact fix for each.

The agent feels slow even when latency stats look good

Symptom. You hear a 3–4 second silence after you finish speaking, but the call’s latency log shows e2e under 1 second.

Why. Two clocks are running. The e2e number in the dashboard starts ticking the moment the system decides you stopped speaking. Before that, the agent is still listening for a continuation — that wait is invisible in the per-turn stats but very audible on the call.

Fix. In Voice → Detection, drop Max Endpointing Delay to between 0.3s and 0.5s. Default presets ship at 1.5–2.0s to be patient with mid-sentence pauses; if your callers finish sentences cleanly, that wait is pure dead air. The slider now allows values as low as 0.2s.
| Latency preset | Max Endpointing Delay | When to use |
| --- | --- | --- |
| Aggressive | 1.0s | Fast-talking callers, quiet environments |
| Balanced (default) | 1.5s | General-purpose |
| High Accuracy | 2.0s | Callers who pause to think mid-sentence |
| Custom | 0.2s–5.0s | Tune to your traffic |
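The two-clocks effect is easy to quantify. A minimal sketch of the arithmetic (the numbers are illustrative, not measured platform values):

```python
# The dashboard's e2e clock only starts once endpointing decides the caller
# is done. The caller, however, hears the endpointing wait PLUS e2e.

def perceived_silence(max_endpointing_delay_s: float, e2e_s: float) -> float:
    """Seconds of silence the caller actually hears after they stop talking."""
    return max_endpointing_delay_s + e2e_s

# Balanced preset: 1.5s endpointing + 0.8s e2e = 2.3s of audible silence,
# even though the dashboard reports "under 1 second".
default = perceived_silence(1.5, 0.8)

# Tuned: 0.3s endpointing + the same 0.8s e2e = 1.1s.
tuned = perceived_silence(0.3, 0.8)

print(f"default: {default:.1f}s, tuned: {tuned:.1f}s")
```

This is why dropping Max Endpointing Delay is the first lever to pull: it shrinks the only component the latency log never shows.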

The agent invents prices, plan names, or numbers

Symptom. Caller asks about pricing or a specific feature. The agent confidently states a number, tier name, or detail that doesn’t exist in your business.

Why. When the agent’s knowledge base doesn’t return the exact fact the caller asked about, the underlying model fills the gap with plausible-sounding content. Even when the KB returns related material, if the specific number is missing, the model can interpolate.

Fix. Three things, in order of effectiveness:
  1. Put critical facts directly in the system prompt as a verbatim fallback. Pricing, hours, contact info, top product names — anything the caller might ask about by name. The model uses prompt content as ground truth.
    PRICING (cite verbatim, do not paraphrase):
    - Starter: $99/month, 5 users
    - Team: $299/month, 25 users
    - Enterprise: custom, contact sales
    
  2. Add an anti-fabrication rule to your prompt:
    Never invent prices, plan names, or specific numbers. If the answer
    isn't in your system prompt or the knowledge base context, say:
    "Let me share the exact details — I'll send you a link" or offer
    a callback. Do not guess.
    
  3. Verify the KB actually has the answer. Go to Knowledge → click the source → use the test query box at the top. If your query doesn’t surface the right chunk, the KB is the problem, not the model.

The agent gives wrong answers for Hindi or regional-language queries

Symptom. Caller asks in Hindi (or Marathi, Tamil, etc.) — the agent answers in the right language, but with content unrelated to what they asked.

Why. Most knowledge bases are built from English-language source documents. When the caller asks in a regional language, the search system can’t find the right chunk because the words don’t match the indexed text.

Fix. Three options, depending on your traffic mix:
  • If most callers use Hinglish or code-mix English keywords (“मुझे pricing के बारे में बताइए”, i.e. “tell me about pricing”), no change needed — English keywords route correctly.
  • If callers speak pure regional language, add a system-prompt instruction: “Before searching the knowledge base, translate the caller’s question into English.” This works because the model performs the translation internally before tool use.
  • For high-traffic regional-language agents, ingest translated KB content alongside English. The Knowledge tab supports ingest from URLs, PDFs, and direct text — translate your key pages once and upload both versions.
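For teams fronting the knowledge base with their own retrieval layer, the translate-then-search pattern from the second bullet looks roughly like this. Everything here is hypothetical: `translate_to_english` and `kb_search` are placeholders for whatever translation and retrieval calls your stack provides, not thinnest.ai APIs.

```python
# Hypothetical translate-then-search wrapper around an English-indexed KB.
# The search only matches English keywords, so a regional-language query
# is translated before retrieval.

def retrieve(query: str, translate_to_english, kb_search) -> list[str]:
    """Translate the caller's query to English, then search the KB."""
    english_query = translate_to_english(query)
    return kb_search(english_query)

# Toy stand-ins to demonstrate the flow:
def fake_translate(q: str) -> str:
    # Pretend MT: a Hindi pricing question becomes its English equivalent.
    return "tell me about pricing" if "कीमत" in q else q

def fake_search(q: str) -> list[str]:
    # Pretend KB: indexed in English, so only English keywords match.
    return ["Starter: $99/month"] if "pricing" in q else []

# Pure-Hindi query ("tell me about the price") now hits the right chunk:
print(retrieve("कीमत के बारे में बताइए", fake_translate, fake_search))
```

The system-prompt instruction above achieves the same effect without any custom code, because the model performs the translation step internally before calling the search tool.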

Why is the first response of a call slower than subsequent ones?

Symptom. The first time the caller asks a knowledge-base question, the agent takes 1–2 seconds longer than every following turn.

Why. Several internal caches need to warm up on the first request — embedding model, knowledge retrieval, model context. Once warmed, subsequent turns reuse the cached state.

Fix. This is expected behavior — you can’t eliminate it entirely, but you can minimize it:
  • Enable Knowledge Prefetch (Voice → Detection → Preemption Strategy → RAG Prefetch). The agent fires a speculative knowledge lookup while the caller is still speaking, so by the time they finish their sentence, the answer is already in flight.
  • Use the balanced or aggressive latency presets, which enable prefetch by default.
  • Keep your system prompt stable across calls — variable substitutions placed late in the prompt cache better than ones placed early (see the system-prompt optimization section below).
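Conceptually, RAG Prefetch overlaps retrieval with the caller’s remaining speech. A sketch of the idea, not the platform’s actual implementation — `kb_lookup` is a placeholder for the retrieval call, and the sleeps stand in for real speech and network time:

```python
import asyncio

async def kb_lookup(query: str) -> str:
    # Placeholder retrieval with a pretend 0.4s cost.
    await asyncio.sleep(0.4)
    return f"chunks for {query!r}"

async def turn_with_prefetch(partial: str, final: str) -> str:
    # Fire a speculative lookup on the partial transcript while the
    # caller is still speaking.
    prefetch = asyncio.create_task(kb_lookup(partial))
    await asyncio.sleep(0.4)          # caller finishes their sentence
    if partial in final:
        # Speculation held up: the lookup is already done, zero added wait.
        return await prefetch
    # Speculation missed: discard it and pay for a fresh lookup.
    prefetch.cancel()
    return await kb_lookup(final)

print(asyncio.run(turn_with_prefetch("pricing", "pricing for my team")))
```

When the partial transcript predicts the final question (the common case), the retrieval cost hides entirely inside the time the caller spends finishing their sentence.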

My agent makes too many separate replies before answering

Symptom. Caller asks a simple question. Before the actual answer arrives, the agent says “Let me check…” or makes a tool call that wasn’t needed.

Why. Most often this is the agent running a tool — like get_user — on every turn because the prompt encourages it. Each tool call adds a model round-trip plus the tool’s own latency.

Fix. In the Tools tab, review which tools are attached:
  • Disable tools the agent doesn’t need on every turn. get_user, calendar lookups, and CRM fetches are best invoked only when the caller asks something that genuinely requires them.
  • Tighten your prompt: instead of “use get_user to personalize replies”, say “only call get_user when the caller asks about their account, balance, or order”.
  • For widget agents where you already have the caller’s name via URL parameters, you usually don’t need get_user at all — the name is injected as a {{name}} variable.

How do I optimize my system prompt for caching and speed?

The model providers we use (Groq, OpenAI) automatically cache the unchanged prefix of your prompt across calls. Anything you can keep identical between calls runs faster. Best practices:
  • Place variable substitutions like {{name}} near the end of your system prompt, not the top. A variable at the top breaks the cache for every call (since the value changes per caller); a variable at the bottom keeps most of the prompt cacheable.
  • Avoid time-of-day variables like {{current_time}} in the body of the prompt unless your behavior genuinely needs them. They change every call and prevent any cache reuse.
  • Don’t restate the same instructions in multiple places. Each duplicate makes the prompt longer without adding signal.
  • Keep the tools list stable across agents of the same persona. The tool definitions are part of the cacheable prefix, so adding or removing a tool invalidates the cache.
Before / after:

Variable at top (cache miss every call):

    You are Priya, support agent.
    You are talking to {{name}}.

    Role: Help users with...
    (2,500 chars of static rules below)

Variable at bottom (cache hits across all calls):

    You are Priya, support agent.

    Role: Help users with...
    (2,500 chars of static rules)

    Caller: {{name}}.
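The difference can be checked mechanically: build both layouts for two different callers and measure the byte-identical leading span, which is all that prefix caching can reuse. A sketch using the example prompt above (the rule size is illustrative):

```python
import os.path

# Stand-in for ~2,500 chars of static rules; only its stability matters here.
STATIC_RULES = "Role: Help users with...\n" + "x" * 2500

def prompt_var_at_top(name: str) -> str:
    # Variable near the top: everything after it differs per caller.
    return f"You are Priya, support agent.\nYou are talking to {name}.\n\n{STATIC_RULES}"

def prompt_var_at_bottom(name: str) -> str:
    # Variable at the bottom: the long static prefix is identical per caller.
    return f"You are Priya, support agent.\n\n{STATIC_RULES}\n\nCaller: {name}."

def shared_prefix_len(build) -> int:
    # Prefix caching reuses only the byte-identical leading span across calls.
    return len(os.path.commonprefix([build("Asha"), build("Rahul")]))

print("var at top:   ", shared_prefix_len(prompt_var_at_top), "cacheable chars")
print("var at bottom:", shared_prefix_len(prompt_var_at_bottom), "cacheable chars")
```

With the variable at the top, the shared prefix ends a few dozen characters in; with it at the bottom, effectively the entire prompt is reusable across callers.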

How do I check what knowledge the agent actually retrieved for a question?

Two ways:
  • During development: Use the Test Voice Call button on your agent’s page. Each turn’s retrieved knowledge passages are surfaced in the call log alongside the agent’s reply.
  • After the call: Open Analytics → Call Recordings, click any call, then the Turn Details panel. Each turn shows the user’s transcript, the chunks the agent retrieved, the agent’s reply, and the per-component latency breakdown.
If the retrieved chunks don’t actually contain the answer to the caller’s question, the knowledge base is the bottleneck — re-ingest the source, check for stale URLs, or split long documents into smaller focused pages.

Quick decision tree

| Problem | First thing to try |
| --- | --- |
| 3–4s pause before agent speaks | Drop Max Endpointing Delay in the Detection tab |
| Agent invents details | Put facts in the system prompt; add an anti-fabrication rule |
| Wrong answers in Hindi/regional languages | Add “translate to English before searching the KB” to the prompt |
| First reply slow | Enable RAG Prefetch (Detection → Preemption) |
| Too many “let me check” delays | Remove unused tools from the Tools tab |
| All replies slow | Check the Latency Preset; switch to Balanced or Aggressive |
| Hallucinating after KB changes | Re-test the failing query in Knowledge → Test |

Still stuck?

If you’ve tried the relevant fix above and the issue persists, capture:
  1. The agent’s public ID (e.g., ag_xxxxxxxx)
  2. The specific call ID from Analytics
  3. A 1-2 sentence description of expected vs actual behavior
Send those to support — call-level recordings + per-turn diagnostics let us identify the root cause without re-creating the issue.