Troubleshooting & Optimization
When a voice call doesn’t feel right — too slow, the agent invents details, replies in the wrong language — the cause is almost always one of a handful of knobs in the Voice tab, the system prompt, or the knowledge base. This page collects the patterns we see most often and the exact fix for each.

The agent feels slow even when latency stats look good
Symptom. You hear a 3–4 second silence after you finish speaking, but the call’s latency log shows e2e under 1 second.

Why. Two clocks are running. The e2e number in the dashboard starts ticking the moment the system decides you stopped speaking. Before that, the agent is still listening for a continuation — that wait is invisible in the per-turn stats but very audible on the call.

Fix. In Voice → Detection, drop Max Endpointing Delay to between 0.3s and 0.5s. Default presets ship at 1.5–2.0s to be patient with mid-sentence pauses; if your callers finish sentences cleanly, that wait is pure dead air. The slider now allows values as low as 0.2s.

| Latency preset | Max Endpointing Delay | When to use |
|---|---|---|
| Aggressive | 1.0s | Fast-talking callers, ambient quiet |
| Balanced (default) | 1.5s | General-purpose |
| High Accuracy | 2.0s | Callers who pause to think mid-sentence |
| Custom | 0.2s–5.0s | Tune to your traffic |
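The pause a caller hears is the endpointing wait plus the measured e2e latency, which is why the dashboard number can look fine while the call feels slow. A back-of-envelope sketch (the figures are illustrative, not measurements):

```python
# Perceived silence = endpointing wait + measured e2e latency.
# The dashboard latency log only reports the second term.
def perceived_pause(endpointing_delay_s: float, e2e_latency_s: float) -> float:
    """Time from the caller's last word to the agent's first word."""
    return endpointing_delay_s + e2e_latency_s

default = perceived_pause(endpointing_delay_s=1.5, e2e_latency_s=0.9)  # ~2.4s
tuned = perceived_pause(endpointing_delay_s=0.4, e2e_latency_s=0.9)    # ~1.3s
print(f"default preset: {default:.1f}s, tuned: {tuned:.1f}s")
```

With a sub-second e2e, dropping the endpointing delay from 1.5s to 0.4s roughly halves the silence the caller actually experiences.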
The agent invents prices, plan names, or numbers
Symptom. Caller asks about pricing or a specific feature. The agent confidently states a number, tier name, or detail that doesn’t exist in your business.

Why. When the agent’s knowledge base doesn’t return the exact fact the caller asked about, the underlying model fills the gap with plausible-sounding content. Even when the KB returns related material, if the specific number is missing, the model can interpolate.

Fix. Three things, in order of effectiveness:

1. Put critical facts directly in the system prompt as a verbatim fallback. Pricing, hours, contact info, top product names — anything the caller might ask about by name. The model uses prompt content as ground truth.
2. Add an anti-fabrication rule to your prompt that tells the agent to admit it doesn’t know rather than guess.
3. Verify the KB actually has the answer. Go to Knowledge → click the source → use the test query box at the top. If your query doesn’t surface the right chunk, the KB is the problem, not the model.
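For the anti-fabrication rule, wording along these lines works well (this example is illustrative; adapt it to your agent’s voice):

```
Never state a price, plan name, number, or policy unless it appears
verbatim in your knowledge base results or in this prompt. If the fact
is missing, say you are not sure and offer to follow up. Do not guess.
```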
The agent gives wrong answers for Hindi or regional-language queries
Symptom. Caller asks in Hindi (or Marathi, Tamil, etc.) — the agent answers in the right language, but with content unrelated to what they asked.

Why. Most knowledge bases are built from English-language source documents. When the caller asks in a regional language, the search system can’t find the right chunk because the words don’t match the indexed text.

Fix. Three options, depending on your traffic mix:

- If most callers use Hinglish or code-mix English keywords (“मुझे pricing के बारे में बताइए” — “tell me about pricing”), no change needed — English keywords route correctly.
- If callers speak pure regional language, add a system-prompt instruction: “Before searching the knowledge base, translate the caller’s question into English.” This works because the model performs the translation internally before tool use.
- For high-traffic regional-language agents, ingest translated KB content alongside English. The Knowledge tab supports ingest from URLs, PDFs, and direct text — translate your key pages once and upload both versions.
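To see why the mismatch happens, here is a toy lexical retriever. Real KB search is more sophisticated, but the failure mode is the same: a pure-Hindi query shares no tokens with English chunks, while a code-mix query survives on its English keywords.

```python
# Toy retriever: score a chunk by the number of words it shares with the query.
def overlap(query: str, chunk: str) -> int:
    return len(set(query.lower().split()) & set(chunk.lower().split()))

chunks = [
    "pricing starts at 499 per month on the basic plan",
    "support hours are 9am to 6pm monday to friday",
]

english = "tell me about pricing"
hinglish = "मुझे pricing के बारे में बताइए"   # code-mix: the English keyword survives
pure_hindi = "मुझे कीमत के बारे में बताइए"    # no tokens in common with the index

print([overlap(english, c) for c in chunks])     # pricing chunk scores highest
print([overlap(hinglish, c) for c in chunks])    # still matches via "pricing"
print([overlap(pure_hindi, c) for c in chunks])  # all zeros: nothing retrievable
```

The translate-before-search prompt instruction works precisely because it converts the third case into the first before the KB lookup runs.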
Why is the first response of a call slower than subsequent ones?
Symptom. The first time the caller asks a knowledge-base question, the agent takes 1–2 seconds longer than every following turn.

Why. Several internal caches need to warm up on the first request — embedding model, knowledge retrieval, model context. Once warmed, subsequent turns reuse the cached state.

Fix. This is expected behavior — you can’t eliminate it entirely, but you can minimize it:

- Enable Knowledge Prefetch (Voice → Detection → Preemption Strategy → RAG Prefetch). The agent fires a speculative knowledge lookup while the caller is still speaking, so by the time they finish their sentence, the answer is already in flight.
- Use the Balanced or Aggressive latency presets, which enable prefetch by default.
- Keep your system prompt stable across calls — variable substitutions placed late in the prompt cache better than ones placed early (see System prompt tuning below).
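The prefetch idea can be sketched with plain Python concurrency: the knowledge lookup overlaps the caller’s remaining speech instead of running after it. Timings and function names here are illustrative, not the platform’s internals.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def kb_lookup(query: str) -> str:
    """Stand-in for the knowledge retrieval call (~200 ms here)."""
    time.sleep(0.2)
    return f"chunks for {query!r}"

def serial_turn() -> str:
    time.sleep(0.3)              # caller is still speaking
    return kb_lookup("pricing")  # lookup starts only after speech ends: ~0.5s

def prefetched_turn() -> str:
    with ThreadPoolExecutor(max_workers=1) as pool:
        fut = pool.submit(kb_lookup, "pricing")  # speculative lookup fires early
        time.sleep(0.3)                          # caller finishes speaking
        return fut.result()                      # already done: ~0.3s total
```

The speculative lookup wastes a little work when the caller’s final words change the question, which is why it is a preemption strategy you opt into rather than unconditional behavior.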
My agent makes too many separate replies before answering
Symptom. Caller asks a simple question. Before the actual answer arrives, the agent says “Let me check…” or makes a tool call that wasn’t needed.

Why. Most often this is the agent running a tool — like `get_user` — on every turn because the prompt encourages it. Each tool call adds a model round-trip plus the tool’s own latency.
Fix. In the Tools tab, review which tools are attached:
- Disable tools the agent doesn’t need on every turn. `get_user`, calendar lookups, and CRM fetches are best invoked only when the caller asks something that genuinely requires them.
- Tighten your prompt: instead of “use `get_user` to personalize replies”, say “only call `get_user` when the caller asks about their account, balance, or order”.
- For widget agents where you already have the caller’s name via URL parameters, you usually don’t need `get_user` at all — the name is injected as a `{{name}}` variable.
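The “only call the tool when the topic requires it” rule can be pictured as a cheap relevance gate in front of the tool call. In practice the model itself makes this decision, steered by your prompt wording; the keyword check below is just a sketch of the behavior you are asking for.

```python
# Illustrative gate: generic questions skip the get_user round-trip entirely.
ACCOUNT_TOPICS = {"account", "balance", "order", "subscription", "invoice"}

def needs_get_user(utterance: str) -> bool:
    """True only when the caller's question touches an account-specific topic."""
    words = set(utterance.lower().split())
    return bool(words & ACCOUNT_TOPICS)

print(needs_get_user("what are your opening hours"))  # False: answer directly
print(needs_get_user("what's my order status"))       # True: fetch the user first
```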
How do I optimize my system prompt for caching and speed?
The model providers we use (Groq, OpenAI) automatically cache the unchanged prefix of your prompt across calls. Anything you can keep identical between calls runs faster. Best practices:

- Place variable substitutions like `{{name}}` near the end of your system prompt, not the top. A variable at the top breaks the cache for every call (since the value changes per caller); a variable at the bottom keeps most of the prompt cacheable.
- Avoid time-of-day variables like `{{current_time}}` in the body of the prompt unless your behavior genuinely needs them. They change every call and prevent any cache reuse.
- Don’t restate the same instructions in multiple places. Each duplicate makes the prompt longer without adding signal.
- Keep the tools list stable across agents of the same persona. The tool definitions are part of the cacheable prefix.
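The effect of variable placement can be demonstrated directly: the cacheable portion is the prefix that stays identical across calls. This toy measure is character-level (real providers cache in token blocks), but the asymmetry is the same.

```python
def shared_prefix(a: str, b: str) -> int:
    """Length of the common prefix of two prompts, character by character."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

INSTRUCTIONS = "You are a support agent for Acme. " * 20  # long, stable part

def top(name: str) -> str:     # variable substituted at the top of the prompt
    return f"Caller name: {name}. " + INSTRUCTIONS

def bottom(name: str) -> str:  # variable substituted at the bottom
    return INSTRUCTIONS + f"Caller name: {name}."

# Name at the top: the prefix diverges almost immediately between callers.
print(shared_prefix(top("Asha"), top("Ben")))
# Name at the bottom: everything before the name is reusable across calls.
print(shared_prefix(bottom("Asha"), bottom("Ben")))
```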
How do I check what knowledge the agent actually retrieved for a question?
Two ways:

- During development: use the Test Voice Call button on your agent’s page. Each turn’s retrieved knowledge passages are surfaced in the call log alongside the agent’s reply.
- After the call: Open Analytics → Call Recordings, click any call, then the Turn Details panel. Each turn shows the user’s transcript, the chunks the agent retrieved, the agent’s reply, and the per-component latency breakdown.
Quick decision tree
| Problem | First thing to try |
|---|---|
| 3-4s pause before agent speaks | Drop Max Endpointing Delay in Detection tab |
| Agent invents details | Put facts in system prompt; add anti-fabrication rule |
| Wrong answers in Hindi/regional languages | Add “translate to English before searching KB” to prompt |
| First reply slow | Enable RAG Prefetch (Detection → Preemption) |
| Too many “let me check” delays | Remove unused tools from the Tools tab |
| All replies slow | Check Latency Preset; switch to Balanced or Aggressive |
| Hallucinating after KB changes | Re-test the failing query in Knowledge → Test |
Still stuck?
If you’ve tried the relevant fix above and the issue persists, capture:

- The agent’s public ID (e.g., `ag_xxxxxxxx`)
- The specific call ID from Analytics
- A 1-2 sentence description of expected vs actual behavior

