Speech-to-Text
Transcribe Raw PCM
Ultra-fast raw PCM transcription for voice call pipelines.
POST /api/stt/transcribe/raw
Transcribe raw PCM audio bytes with minimal latency. Designed for real-time voice agent pipelines.
Overview
The /raw endpoint is optimized for voice call pipelines where audio is already available as raw PCM bytes. It skips WAV encoding/decoding for the lowest possible latency.
Use this endpoint when:
- Building real-time voice agents (LiveKit, Twilio)
- Audio is already in raw PCM format
- Every millisecond of latency matters
Use /transcribe instead when:
- Uploading audio files (WAV, MP3, FLAC)
- You need diarization, sentiment, or timestamps
- Working with encoded audio formats
Request
Send raw PCM bytes as the request body with query parameters.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| language | string | No | hi | Language code. |
| model | string | No | vega | STT engine: vega or dhara. |
| sample_rate | int | No | 48000 | Audio sample rate in Hz (e.g. 16000, 44100, 48000). Auto-detects telephony (8 kHz). |
| punctuate | string | No | false | Add punctuation. |
| kenlm | string | No | true | Use KenLM accuracy boost. |
| enhance | string | No | true | Auto Gain Control: normalize volume for speakerphone/far-field. |
| denoise | string | No | false | DeepFilterNet3 noise suppression for noisy environments. |
| keywords | string | No | (none) | Comma-separated vocabulary boosting terms. |
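The query parameters in the table can be assembled programmatically. The following is a minimal sketch using only the standard library. The helper name `build_raw_url` is hypothetical, and it assumes boolean options are sent as the lowercase strings `true` and `false`, which the table's string types and defaults suggest but do not confirm.

```python
from urllib.parse import urlencode

# Base URL taken from the example request on this page.
BASE_URL = "https://api.thinnest.ai/api/stt/transcribe/raw"


def build_raw_url(language="hi", model="vega", sample_rate=48000, keywords=None, **flags):
    """Hypothetical helper: build the request URL from the documented query parameters."""
    params = {"language": language, "model": model, "sample_rate": sample_rate}
    # Assumption: punctuate, kenlm, enhance and denoise are passed as "true"/"false".
    for name, value in flags.items():
        params[name] = "true" if value else "false"
    if keywords:
        # Documented as a comma-separated list of boosting terms.
        params["keywords"] = ",".join(keywords)
    return f"{BASE_URL}?{urlencode(params)}"


url = build_raw_url(sample_rate=16000, punctuate=True, keywords=["thinnest", "LiveKit"])
print(url)
```

Options that are not passed are left out of the URL, so the server-side defaults from the table apply.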
Audio format: 16-bit signed little-endian PCM, mono.
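To illustrate the audio format, here is a small sketch that produces 16-bit signed little-endian mono PCM from floating-point samples and checks the duration implied by the byte count. The helper names `float_to_pcm16` and `pcm16_duration_seconds` are hypothetical, and the test tone is only illustrative input.

```python
import math
import struct


def float_to_pcm16(samples):
    """Hypothetical helper: convert floats in [-1.0, 1.0] to 16-bit signed little-endian PCM."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    # "<" selects little-endian byte order, "h" is a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm16_duration_seconds(pcm_bytes, sample_rate):
    """Mono 16-bit audio uses 2 bytes per sample."""
    return len(pcm_bytes) / (2 * sample_rate)


sample_rate = 16000
# One second of a 440 Hz tone at half amplitude.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
pcm_bytes = float_to_pcm16(tone)
print(len(pcm_bytes), pcm16_duration_seconds(pcm_bytes, sample_rate))
```

The same arithmetic gives a quick sanity check on the `sample_rate` parameter: if the computed duration does not match the length of the recording, the bytes were probably captured at a different rate or with more than one channel.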
Response
    {
      "text": "हैलो आपका नाम क्या है",
      "language": "hi",
      "duration_seconds": 2.1,
      "processing_ms": 98.3,
      "model": "vega",
      "provider": "thinnestai"
    }
Example
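A response like the one shown under Response can be read into a small typed structure. This is a sketch, not part of any official SDK: the `RawTranscript` class and its `real_time_factor` property are hypothetical, and the code assumes the response contains exactly the six fields shown (extra fields would need to be filtered out first).

```python
import json
from dataclasses import dataclass


@dataclass
class RawTranscript:
    """Hypothetical container mirroring the documented response fields."""
    text: str
    language: str
    duration_seconds: float
    processing_ms: float
    model: str
    provider: str

    @property
    def real_time_factor(self):
        """Processing time divided by audio duration; below 1.0 is faster than real time."""
        return (self.processing_ms / 1000.0) / self.duration_seconds


# Sample body copied from the Response section.
body = (
    '{"text": "हैलो आपका नाम क्या है", "language": "hi", "duration_seconds": 2.1, '
    '"processing_ms": 98.3, "model": "vega", "provider": "thinnestai"}'
)
result = RawTranscript(**json.loads(body))
print(result.text, round(result.real_time_factor, 3))
```

For the sample response, 98.3 ms of processing for 2.1 s of audio gives a real-time factor of about 0.047.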
    # Transcribe 16kHz raw PCM
    curl -X POST "https://api.thinnest.ai/api/stt/transcribe/raw?language=hi&sample_rate=16000&model=vega" \
      -H "Authorization: Bearer thns_sk_your_key_here" \
      -H "Content-Type: application/octet-stream" \
      --data-binary @audio.raw

    # Same request in Python with httpx
    import httpx

    # Read the raw 16-bit little-endian mono PCM bytes from disk
    with open("audio.raw", "rb") as f:
        pcm_bytes = f.read()

    # Send the bytes as the request body; options go in the query string
    resp = httpx.post(
        "https://api.thinnest.ai/api/stt/transcribe/raw",
        params={"language": "hi", "model": "vega", "sample_rate": 16000},
        headers={
            "Authorization": "Bearer thns_sk_your_key_here",
            "Content-Type": "application/octet-stream",
        },
        content=pcm_bytes,
    )
    resp.raise_for_status()  # raise on HTTP error responses
    print(resp.json()["text"])
Latency Comparison
| Endpoint | Vega Latency | Notes |
|---|---|---|
| /transcribe (WAV) | ~370ms | Includes WAV decode overhead |
| /transcribe/raw (PCM) | ~370ms | Skips WAV decode, optimal for pipelines |
The latency difference is small (~3-5ms) because inference time dominates, which is why both rows round to ~370ms. The real benefit is avoiding WAV construction on the client side.
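To make the client-side saving concrete, the sketch below shows the WAV wrapping step that a client would otherwise perform before uploading, using Python's standard `wave` module. The helper name `wrap_pcm_as_wav` is hypothetical, and the code only illustrates the extra work and bytes involved; it does not reproduce the latency figures in the table.

```python
import io
import wave


def wrap_pcm_as_wav(pcm_bytes, sample_rate):
    """Hypothetical helper: wrap mono 16-bit PCM in a WAV container for a file-style upload."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return buf.getvalue()


pcm_bytes = b"\x00\x00" * 16000  # one second of silence at 16 kHz
wav_bytes = wrap_pcm_as_wav(pcm_bytes, 16000)
print(len(wav_bytes) - len(pcm_bytes))  # size of the WAV header in bytes
```

With the raw endpoint, `pcm_bytes` can be sent as the request body directly, so the copy into a WAV buffer and the 44-byte header are not needed.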