# Troubleshooting & Optimization

> Common voice agent issues — slow responses, wrong answers, knowledge-base misses — and the settings that fix them.

When a voice call doesn't feel right — too slow, the agent invents details, replies in the wrong language — the cause is almost always one of a handful of knobs in the Voice tab, the system prompt, or the knowledge base. This page collects the patterns we see most often and the exact fix for each.

## The agent feels slow even when latency stats look good

**Symptom.** You hear a 3–4 second silence after you finish speaking, but the call's latency log shows e2e under 1 second.

**Why.** Two clocks are running. The e2e number in the dashboard starts ticking the moment the system *decides* you stopped speaking. Before that, the agent is still listening for a continuation — that wait is invisible in the per-turn stats but very audible on the call.

**Fix.** In **Voice → Detection**, lower **Max Endpointing Delay** to **0.3–0.5s**. The built-in presets use 1.0–2.0s so the agent stays patient through mid-sentence pauses. If your callers finish their sentences cleanly, that wait is pure dead air. The slider accepts values as low as 0.2s.

| Latency preset     | Max Endpointing Delay | When to use                             |
| ------------------ | --------------------- | --------------------------------------- |
| Aggressive         | 1.0s                  | Fast-talking callers, ambient quiet     |
| Balanced (default) | 1.5s                  | General-purpose                         |
| High Accuracy      | 2.0s                  | Callers who pause to think mid-sentence |
| Custom             | 0.2s–5.0s             | Tune to your traffic                    |
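A back-of-the-envelope model makes the two clocks concrete. It assumes the silence a caller hears is simply the endpointing wait plus the dashboard's e2e number, and it ignores network and audio buffering, so real calls will run somewhat longer. The preset delays come from the table above; the 0.8s e2e value and the 0.4s custom setting are illustrative assumptions.

```python
# Rough model: perceived silence = endpointing wait + dashboard e2e latency.
# Assumption: network and audio buffering are ignored, so real calls are a bit slower.
PRESETS_S = {
    "Aggressive": 1.0,
    "Balanced (default)": 1.5,
    "High Accuracy": 2.0,
    "Custom (tuned)": 0.4,  # hypothetical value inside the recommended 0.3-0.5s range
}

E2E_S = 0.8  # illustrative e2e value, "under 1 second" as in the symptom above


def perceived_silence(endpointing_delay_s: float, e2e_s: float) -> float:
    """Seconds of silence the caller hears after they stop speaking."""
    return round(endpointing_delay_s + e2e_s, 2)


for name, delay in PRESETS_S.items():
    print(f"{name:<20} {perceived_silence(delay, E2E_S):.1f}s")
```

Only the endpointing term changes between rows, which is why lowering that one setting moves perceived latency far more than any per-turn optimization.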

## The agent invents prices, plan names, or numbers

**Symptom.** Caller asks about pricing or a specific feature. The agent confidently states a number, tier name, or detail that doesn't exist in your business.

**Why.** When the agent's knowledge base doesn't return the exact fact the caller asked about, the underlying model fills the gap with plausible-sounding content. Even when the KB returns related material, if the *specific* number is missing, the model can interpolate.

**Fix.** Three things, in order of effectiveness:

1. **Put critical facts directly in the system prompt** as a verbatim fallback. Pricing, hours, contact info, top product names — anything the caller might ask about by name. The model uses prompt content as ground truth.

   ```
   PRICING (cite verbatim, do not paraphrase):
   - Starter: $99/month, 5 users
   - Team: $299/month, 25 users
   - Enterprise: custom, contact sales
   ```

2. **Add an anti-fabrication rule** to your prompt:

   ```
   Never invent prices, plan names, or specific numbers. If the answer
   isn't in your system prompt or the knowledge base context, say:
   "Let me share the exact details — I'll send you a link" or offer
   a callback. Do not guess.
   ```

3. **Verify the KB actually has the answer.** Go to **Knowledge** → click the source → use the test query box at the top. If your query doesn't surface the right chunk, the KB is the problem, not the model.
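If you review transcripts in bulk, a quick offline check can flag replies that quote a dollar amount missing from your verbatim pricing block. This is a minimal sketch, not a platform feature: `PRICING_BLOCK` mirrors the example prompt above, and `unverified_prices` is a hypothetical helper you would run over transcripts you have copied out yourself.

```python
import re

# Mirrors the verbatim pricing block from the system-prompt example above.
PRICING_BLOCK = """
PRICING (cite verbatim, do not paraphrase):
- Starter: $99/month, 5 users
- Team: $299/month, 25 users
- Enterprise: custom, contact sales
"""

# Matches amounts like $99, $1,299 or $49.50 (a trailing comma or period is not captured).
PRICE_RE = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")


def unverified_prices(reply: str, ground_truth: str = PRICING_BLOCK) -> list[str]:
    """Return dollar amounts in `reply` that never appear in `ground_truth`."""
    known = set(PRICE_RE.findall(ground_truth))
    return [price for price in PRICE_RE.findall(reply) if price not in known]


print(unverified_prices("The Team plan is $299/month."))      # []
print(unverified_prices("Our Pro plan is just $149/month."))  # ['$149']
```

A non-empty result points at a fabricated number; pair it with step 3 to see whether the KB ever contained that figure.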

## The agent gives wrong answers for Hindi or regional-language queries

**Symptom.** Caller asks in Hindi (or Marathi, Tamil, etc.) — the agent answers in the right language, but with content unrelated to what they asked.

**Why.** Most knowledge bases are built from English-language source documents. When the caller asks in a regional language, the search system can't find the right chunk because the words don't match the indexed text.

**Fix.** Three options, depending on your traffic mix:

* **If most callers use Hinglish or code-mix English keywords** ("मुझे pricing के बारे में बताइए"), no change needed — English keywords route correctly.
* **If callers speak pure regional language**, add a system-prompt instruction: *"Before searching the knowledge base, translate the caller's question into English."* This works because the model performs the translation internally before tool use.
* **For high-traffic regional-language agents**, ingest translated KB content alongside English. The Knowledge tab supports ingest from URLs, PDFs, and direct text — translate your key pages once and upload both versions.
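The mismatch is easiest to see with a toy keyword retriever. Real knowledge-base search is more sophisticated than word overlap, so treat this only as an illustration of why shared English keywords (or an English translation) help. The chunks and the scoring function are hypothetical.

```python
from typing import Optional

# Hypothetical English-only knowledge base.
CHUNKS = {
    "pricing": "pricing starter plan costs 99 per month",
    "hours": "support hours are 9am to 6pm monday to friday",
}


def best_chunk(query: str) -> Optional[str]:
    """Toy retriever: pick the chunk sharing the most words with the query."""
    words = set(query.lower().split())
    scores = {cid: len(words & set(text.split())) for cid, text in CHUNKS.items()}
    cid, score = max(scores.items(), key=lambda kv: kv[1])
    return cid if score > 0 else None  # None means no chunk matched at all


print(best_chunk("मुझे pricing के बारे में बताइए"))  # 'pricing': the English keyword survives
print(best_chunk("मुझे कीमत के बारे में बताइए"))      # None: no words in common
print(best_chunk("tell me about the pricing"))        # 'pricing': translated before searching
```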

## Why is the first response of a call slower than subsequent ones?

**Symptom.** The first time the caller asks a knowledge-base question, the agent takes 1–2 seconds longer than every following turn.

**Why.** Several internal caches need to warm up on the first request — embedding model, knowledge retrieval, model context. Once warmed, subsequent turns reuse the cached state.

**Fix.** This is expected behavior — you can't eliminate it entirely, but you can minimize it:

* Enable **Knowledge Prefetch** (Voice → Detection → Preemption Strategy → `RAG Prefetch`). The agent fires a speculative knowledge lookup while the caller is *still speaking*, so by the time they finish their sentence, the answer is already in flight.
* Use the **balanced** or **aggressive** latency presets, which enable prefetch by default.
* Keep your system prompt stable across calls. Prompts that place variable substitutions late cache better than prompts that place them early (see *How do I optimize my system prompt for caching and speed?* below).
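The idea behind prefetch can be sketched with `asyncio`: start the lookup on the partial transcript while audio is still arriving, then await the already-running task once the caller stops. This is a simplified illustration under assumed timings, not the platform's implementation; `search_kb` is a hypothetical stand-in with a fixed 0.3s delay.

```python
import asyncio
import time

KB_LATENCY_S = 0.3   # assumed knowledge-base lookup time
SPEECH_TAIL_S = 0.4  # assumed time the caller keeps talking after the partial transcript


async def search_kb(query: str) -> str:
    """Hypothetical stand-in for a knowledge-base lookup."""
    await asyncio.sleep(KB_LATENCY_S)
    return f"chunks for: {query}"


async def caller_finishes_speaking() -> None:
    """Simulates the rest of the caller's sentence."""
    await asyncio.sleep(SPEECH_TAIL_S)


async def without_prefetch() -> float:
    """Seconds spent waiting on retrieval after the caller stops speaking."""
    await caller_finishes_speaking()
    start = time.perf_counter()
    await search_kb("what are your support hours")
    return time.perf_counter() - start


async def with_prefetch() -> float:
    """Same measurement, but the lookup starts on the partial transcript."""
    # Speculative lookup fired while the caller is still talking. A real system
    # would also check that the final transcript still matches; this sketch does not.
    lookup = asyncio.create_task(search_kb("what are your support"))
    await caller_finishes_speaking()
    start = time.perf_counter()
    await lookup  # already finished here, since 0.3s < 0.4s
    return time.perf_counter() - start


print(f"without prefetch: {asyncio.run(without_prefetch()):.2f}s of retrieval wait")
print(f"with prefetch:    {asyncio.run(with_prefetch()):.2f}s of retrieval wait")
```

The saving is bounded by how long the caller keeps talking after the partial transcript: if retrieval takes longer than that tail, prefetch hides only part of it.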

## My agent makes too many separate replies before answering

**Symptom.** Caller asks a simple question. Before the actual answer arrives, the agent says "Let me check…" or makes a tool call that wasn't needed.

**Why.** Most often this is the agent running a tool — like `get_user` — on every turn because the prompt encourages it. Each tool call adds a model round-trip plus the tool's own latency.

**Fix.** In the **Tools** tab, review which tools are attached:

* Disable tools the agent doesn't need *on every turn*. `get_user`, calendar lookups, and CRM fetches are best invoked only when the caller asks something that genuinely requires them.
* Tighten your prompt: instead of "use `get_user` to personalize replies", say "only call `get_user` when the caller asks about their account, balance, or order".
* For widget agents where you already have the caller's name via URL parameters, you usually don't need `get_user` at all — the name is injected as a `{{name}}` variable.
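The tightened prompt rule is effectively an intent gate. If it helps to reason about it, or to pre-filter turns in your own tooling, the same rule reads like this as code. This is a hypothetical sketch: `ACCOUNT_TOPICS` and `should_call_get_user` are illustrative names, and the sketch does not describe how the platform itself decides when to call tools.

```python
# Hypothetical topics that genuinely need account data (mirrors the prompt rule above).
ACCOUNT_TOPICS = ("account", "balance", "order")


def should_call_get_user(utterance: str) -> bool:
    """True only when the caller asks about something that needs get_user.

    Naive substring match: good enough to illustrate the rule, but it would
    also fire on words like "border".
    """
    text = utterance.lower()
    return any(topic in text for topic in ACCOUNT_TOPICS)


print(should_call_get_user("What are your opening hours?"))  # False: answer directly
print(should_call_get_user("Where is my order?"))            # True: tool call is justified
```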

## How do I optimize my system prompt for caching and speed?

The model providers we use (Groq, OpenAI) automatically cache the unchanged prefix of your prompt across calls. The longer the opening stretch of your prompt stays identical between calls, the more of each request can be served from cache, and the faster it runs.

**Best practices:**

* **Place variable substitutions like `{{name}}` near the end** of your system prompt, not the top. A variable at the top breaks the cache for every call (since the value changes per caller); a variable at the bottom keeps most of the prompt cacheable.
* **Avoid time-of-day variables** like `{{current_time}}` in the body of the prompt unless your behavior genuinely needs them. They change every call and prevent cache reuse for everything that follows them.
* **Don't restate the same instructions** in multiple places. Each duplicate makes the prompt longer without adding signal.
* **Keep the tool list stable** across agents that share a persona. Tool definitions are part of the cacheable prefix.

**Before / after:**

❌ **Variable at top (cache miss every call):**

```
You are Priya, support agent.
You are talking to {{name}}.

Role: Help users with...
(2,500 chars of static rules below)
```

✅ **Variable at bottom (cache hits across all calls):**

```
You are Priya, support agent.

Role: Help users with...
(2,500 chars of static rules)

Caller: {{name}}.
```
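You can check the effect of variable placement by rendering the before/after templates for two different callers and measuring how many leading characters the prompts share, since that shared prefix is what a prefix cache can reuse. A sketch under stated assumptions: simple `{{var}}` substitution, a repeated placeholder line standing in for the static rules, and characters as a rough proxy (providers cache in tokens and usually only above a minimum prompt length).

```python
import os.path
import re

# Stand-in for roughly 2,500 chars of static rules (placeholder text).
STATIC_RULES = "Role: Help users with...\n" + "(static rule text)\n" * 130

TOP = "You are Priya, support agent.\nYou are talking to {{name}}.\n\n" + STATIC_RULES
BOTTOM = "You are Priya, support agent.\n\n" + STATIC_RULES + "\nCaller: {{name}}.\n"


def render(template: str, variables: dict[str, str]) -> str:
    """Fill {{var}} placeholders; unknown placeholders are left as-is."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: variables.get(m.group(1), m.group(0)), template)


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two rendered prompts have in common."""
    return len(os.path.commonprefix([a, b]))


for label, template in (("variable at top", TOP), ("variable at bottom", BOTTOM)):
    p1 = render(template, {"name": "Asha"})
    p2 = render(template, {"name": "Ravi"})
    print(f"{label}: {shared_prefix_len(p1, p2)} of {len(p1)} chars shared")
```

With the variable at the top, the prompts diverge within the first line; at the bottom, they share everything except the caller's name.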

## How do I check what knowledge the agent actually retrieved for a question?

Two ways:

* **During development**: Use the **Test Voice Call** button on your agent's page. Each turn's retrieved knowledge passages are surfaced in the call log alongside the agent's reply.
* **After the call**: Open **Analytics → Call Recordings**, click any call, then the **Turn Details** panel. Each turn shows the user's transcript, the chunks the agent retrieved, the agent's reply, and the per-component latency breakdown.

If the retrieved chunks don't actually contain the answer to the caller's question, the knowledge base is the bottleneck — re-ingest the source, check for stale URLs, or split long documents into smaller focused pages.
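When you audit many turns, the same check can be scripted. The sketch assumes you have copied each turn's transcript and retrieved chunks out of **Turn Details** into plain Python data; the `Turn` type and `kb_misses` helper are hypothetical, and no platform export format is implied.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    transcript: str
    retrieved_chunks: list[str]


def kb_misses(turns: list[Turn], expected_fact: dict[str, str]) -> list[str]:
    """Transcripts whose retrieved chunks never mention the expected fact."""
    misses = []
    for turn in turns:
        fact = expected_fact.get(turn.transcript)
        # Case-insensitive substring check against every retrieved chunk.
        if fact and not any(fact.lower() in chunk.lower() for chunk in turn.retrieved_chunks):
            misses.append(turn.transcript)
    return misses


turns = [
    Turn("What are your hours?", ["Support hours: 9am-6pm, Monday to Friday."]),
    Turn("How much is the Team plan?", ["Starter: $99/month, 5 users"]),
]
expected = {"What are your hours?": "9am-6pm", "How much is the Team plan?": "$299"}
print(kb_misses(turns, expected))  # ['How much is the Team plan?']
```

Every transcript in the result is a knowledge-base problem rather than a model problem, and a candidate for re-ingesting or splitting the source.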

## Quick decision tree

| Problem                                   | First thing to try                                       |
| ----------------------------------------- | -------------------------------------------------------- |
| 3-4s pause before agent speaks            | Drop Max Endpointing Delay in Detection tab              |
| Agent invents details                     | Put facts in system prompt; add anti-fabrication rule    |
| Wrong answers in Hindi/regional languages | Add "translate to English before searching KB" to prompt |
| First reply slow                          | Enable RAG Prefetch (Detection → Preemption)             |
| Too many "let me check" delays            | Remove unused tools from the Tools tab                   |
| All replies slow                          | Check Latency Preset; switch to Balanced or Aggressive   |
| Hallucinating after KB changes            | Re-test the failing query in Knowledge → Test            |

## Still stuck?

If you've tried the relevant fix above and the issue persists, capture:

1. The agent's public ID (e.g., `ag_xxxxxxxx`)
2. The specific call ID from Analytics
3. A 1-2 sentence description of expected vs actual behavior

Send those to support. Call-level recordings and per-turn diagnostics let us identify the root cause without re-creating the issue.
