Transcribe Raw PCM

POST /api/stt/transcribe/raw

Transcribe raw PCM audio bytes with minimal latency. Designed for real-time voice agent pipelines.


Overview

The /raw endpoint is optimized for voice call pipelines where audio is already available as raw PCM bytes. It skips WAV encoding/decoding for the lowest possible latency.

Use this endpoint when:

Building real-time voice agents (LiveKit, Twilio)
Audio is already in raw PCM format
Every millisecond of latency matters

Use /transcribe instead when:

Uploading audio files (WAV, MP3, FLAC)
You need diarization, sentiment, or timestamps
Working with encoded audio formats

Request

Send raw PCM bytes as the request body with query parameters.

Parameter	Type	Required	Default	Description
`language`	string	No	`hi`	Language code.
`model`	string	No	`vega`	STT engine: `vega` or `dhara`.
`sample_rate`	int	No	`48000`	Audio sample rate in Hz (e.g. 16000, 44100, 48000). Telephony audio (8 kHz) is auto-detected.
`punctuate`	string	No	`false`	Add punctuation.
`kenlm`	string	No	`true`	Use KenLM accuracy boost.
`enhance`	string	No	`true`	Automatic gain control: normalizes volume for speakerphone/far-field audio.
`denoise`	string	No	`false`	DeepFilterNet3 noise suppression for noisy environments.
`keywords`	string	No	—	Comma-separated vocabulary boosting terms.
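
As a rough illustration of how the parameters in the table map onto a request URL, the sketch below builds the query string with the standard library. The helper name `build_raw_url` is hypothetical, and sending flags as the lowercase strings `true`/`false` is an assumption based on the string-typed defaults shown above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.thinnest.ai/api/stt/transcribe/raw"


def build_raw_url(language="hi", model="vega", sample_rate=48000,
                  punctuate=False, kenlm=True, enhance=True,
                  denoise=False, keywords=None):
    """Return the endpoint URL with the query parameters from the table."""
    params = {
        "language": language,
        "model": model,
        "sample_rate": int(sample_rate),
        # Assumption: boolean-like flags are sent as "true" / "false".
        "punctuate": str(punctuate).lower(),
        "kenlm": str(kenlm).lower(),
        "enhance": str(enhance).lower(),
        "denoise": str(denoise).lower(),
    }
    if keywords:
        # The table describes keywords as a comma-separated string.
        params["keywords"] = ",".join(keywords)
    return f"{BASE_URL}?{urlencode(params)}"


url = build_raw_url(sample_rate=16000, punctuate=True,
                    keywords=["LiveKit", "Twilio"])
print(url)
```

The resulting URL can be passed directly to any HTTP client, with the PCM bytes as the request body.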

Audio format: 16-bit signed little-endian PCM, mono.
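
If your pipeline produces floating-point samples rather than PCM, something like the following sketch can pack them into the expected format. The function names are illustrative and not part of the API; the sketch assumes mono samples in the range -1.0 to 1.0.

```python
import math
import struct


def floats_to_pcm16le(samples):
    """Pack floats in [-1.0, 1.0] as 16-bit signed little-endian PCM."""
    # Clip first so out-of-range samples do not overflow the int16 range.
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    # "<" selects little-endian, "h" is a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm_duration_seconds(pcm_bytes, sample_rate):
    """Mono 16-bit PCM uses 2 bytes per sample."""
    return len(pcm_bytes) / (2 * sample_rate)


sample_rate = 16000
# One second of a 440 Hz tone at half amplitude, as stand-in audio.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]
pcm_bytes = floats_to_pcm16le(tone)
print(len(pcm_bytes), pcm_duration_seconds(pcm_bytes, sample_rate))
```

The byte length is a quick sanity check on the `sample_rate` you send: one second of mono 16-bit audio at 16000 Hz is 32000 bytes.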

Response

{
  "text": "हैलो आपका नाम क्या है",
  "language": "hi",
  "duration_seconds": 2.1,
  "processing_ms": 98.3,
  "model": "vega",
  "provider": "thinnestai"
}
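
One way to use these fields is to compare processing time against audio duration. The sketch below parses the sample response above and computes a real-time factor; it assumes `processing_ms` is the server-side processing time for the request, which this page does not state explicitly, and `summarize` is a hypothetical helper.

```python
import json

# The sample response from this page, used as stand-in data.
sample = """
{
  "text": "हैलो आपका नाम क्या है",
  "language": "hi",
  "duration_seconds": 2.1,
  "processing_ms": 98.3,
  "model": "vega",
  "provider": "thinnestai"
}
"""


def summarize(payload):
    """Return (text, real_time_factor) from a JSON response string."""
    data = json.loads(payload)
    audio_ms = data["duration_seconds"] * 1000
    # Real-time factor below 1.0 means processing is faster than playback.
    rtf = data["processing_ms"] / audio_ms if audio_ms else None
    return data["text"], rtf


text, rtf = summarize(sample)
print(text, round(rtf, 3))
```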

Example

# Transcribe 16kHz raw PCM
curl -X POST "https://api.thinnest.ai/api/stt/transcribe/raw?language=hi&sample_rate=16000&model=vega" \
  -H "Authorization: Bearer thns_sk_your_key_here" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @audio.raw

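import httpx

# Read the raw 16-bit little-endian mono PCM bytes from disk.
with open("audio.raw", "rb") as f:
    pcm_bytes = f.read()

resp = httpx.post(
    "https://api.thinnest.ai/api/stt/transcribe/raw",
    params={"language": "hi", "model": "vega", "sample_rate": 16000},
    headers={
        "Authorization": "Bearer thns_sk_your_key_here",
        # Match the cURL example: the body is opaque binary data.
        "Content-Type": "application/octet-stream",
    },
    content=pcm_bytes,
)
resp.raise_for_status()  # Fail loudly on 4xx/5xx instead of parsing an error body.
print(resp.json()["text"])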

Latency Comparison

Endpoint	Vega Latency	Notes
`/transcribe` (WAV)	~370ms	Includes WAV decode overhead
`/transcribe/raw` (PCM)	~370ms	Skips WAV decode, optimal for pipelines

The latency difference is small (~3-5ms), which is why both rows round to ~370ms: inference dominates the total. The real benefit is avoiding WAV construction on the client side.
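
To make the client-side step concrete, here is a sketch of the WAV wrapping that `/transcribe` would need for audio you already hold as PCM, using Python's standard `wave` module. The helper name is illustrative; with `/transcribe/raw` this step is skipped and `pcm_bytes` is sent as-is.

```python
import io
import wave


def wrap_pcm_as_wav(pcm_bytes, sample_rate):
    """Wrap mono 16-bit PCM in a WAV container, held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return buf.getvalue()


# One second of silence at 16 kHz as stand-in audio.
pcm_bytes = b"\x00\x00" * 16000
wav_bytes = wrap_pcm_as_wav(pcm_bytes, 16000)

# The WAV payload is the same samples plus a small header.
print(len(wav_bytes) - len(pcm_bytes))
```

The header itself is tiny; what the raw endpoint saves is the extra buffer copy and container handling on every audio chunk in a streaming pipeline.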
