Speech-to-Text

Transcribe Raw PCM

Ultra-fast raw PCM transcription for voice call pipelines.

POST /api/stt/transcribe/raw

Transcribe raw PCM audio bytes with minimal latency. Designed for real-time voice agent pipelines.



Overview

The /raw endpoint is optimized for voice call pipelines where audio is already available as raw PCM bytes. It skips WAV encoding/decoding for the lowest possible latency.

Use this endpoint when:

  • Building real-time voice agents (LiveKit, Twilio)
  • Audio is already in raw PCM format
  • Every millisecond of latency matters

Use /transcribe instead when:

  • Uploading audio files (WAV, MP3, FLAC)
  • You need diarization, sentiment, or timestamps
  • Working with encoded audio formats

Request

Send raw PCM bytes as the request body with query parameters.

| Parameter   | Type   | Required | Default | Description |
|-------------|--------|----------|---------|-------------|
| language    | string | No       | hi      | Language code. |
| model       | string | No       | vega    | STT engine: vega or dhara. |
| sample_rate | int    | No       | 48000   | Audio sample rate in Hz (e.g. 16000, 44100, 48000). Auto-detects telephony (8 kHz). |
| punctuate   | string | No       | false   | Add punctuation. |
| kenlm       | string | No       | true    | Use KenLM accuracy boost. |
| enhance     | string | No       | true    | Auto Gain Control: normalize volume for speakerphone/far-field. |
| denoise     | string | No       | false   | DeepFilterNet3 noise suppression for noisy environments. |
| keywords    | string | No       |         | Comma-separated vocabulary boosting terms. |
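
As a rough illustration of how these query parameters combine into a request URL, the sketch below assembles them with the standard library. The helper name `build_raw_stt_url` is hypothetical, and it assumes boolean options are sent as the literal strings "true" and "false", which the string type and defaults above suggest but do not confirm.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.thinnest.ai/api/stt/transcribe/raw"


def build_raw_stt_url(language="hi", model="vega", sample_rate=48000,
                      punctuate=False, kenlm=True, enhance=True,
                      denoise=False, keywords=None):
    """Hypothetical helper: assemble the query string for the /raw endpoint."""
    if model not in ("vega", "dhara"):
        raise ValueError(f"unknown model: {model!r}")

    def flag(value):
        # Assumption: boolean options travel as "true" / "false" strings.
        return "true" if value else "false"

    params = {
        "language": language,
        "model": model,
        "sample_rate": int(sample_rate),
        "punctuate": flag(punctuate),
        "kenlm": flag(kenlm),
        "enhance": flag(enhance),
        "denoise": flag(denoise),
    }
    if keywords:
        # Vocabulary boosting terms are sent as one comma-separated value.
        params["keywords"] = ",".join(keywords)
    return f"{BASE_URL}?{urlencode(params)}"


url = build_raw_stt_url(sample_rate=8000, punctuate=True,
                        keywords=["refund", "invoice"])
print(url)
```

Defaults in the helper mirror the documented defaults, so calling it with no arguments should produce the same behavior as sending no query parameters at all.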

Audio format: 16-bit signed little-endian PCM, mono.
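
For pipelines that hold audio as floating-point samples, the following sketch shows one way to produce bytes in this format (16-bit signed little-endian, mono) using only the standard library. The function names are illustrative, not part of the API, and the float range of [-1.0, 1.0] is an assumption about the caller's audio source.

```python
import math
import struct


def float_to_pcm16le(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit signed little-endian PCM bytes."""
    # Scale to the int16 range and clip anything that overshoots.
    ints = [max(-32768, min(32767, int(round(s * 32767)))) for s in samples]
    # "<" forces little-endian byte order; "h" is a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm_duration_seconds(pcm_bytes, sample_rate):
    """Duration of mono 16-bit PCM: 2 bytes per sample, 1 channel."""
    return len(pcm_bytes) / (2 * sample_rate)


sample_rate = 16000
# One second of a 440 Hz test tone at half amplitude.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]
pcm = float_to_pcm16le(tone)
print(len(pcm), pcm_duration_seconds(pcm, sample_rate))
```

The resulting `pcm` bytes can be sent as the request body as-is; the `sample_rate` query parameter should match the rate used to generate or capture the samples.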


Response

{
  "text": "हैलो आपका नाम क्या है",
  "language": "hi",
  "duration_seconds": 2.1,
  "processing_ms": 98.3,
  "model": "vega",
  "provider": "thinnestai"
}
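
A minimal sketch of consuming this response on the client, using the sample body above. The `RawTranscript` class and its `real_time_factor` property are hypothetical conveniences, not fields returned by the API; the factor is simply processing time divided by audio duration.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class RawTranscript:
    text: str
    language: str
    duration_seconds: float
    processing_ms: float
    model: str
    provider: str

    @property
    def real_time_factor(self):
        # Derived metric (not returned by the API): values below 1.0 mean
        # the audio was transcribed faster than real time.
        return (self.processing_ms / 1000.0) / self.duration_seconds


def parse_transcript(body):
    """Parse a JSON response body, ignoring any keys not modeled here."""
    data = json.loads(body)
    return RawTranscript(**{f.name: data[f.name] for f in fields(RawTranscript)})


sample = """{
  "text": "हैलो आपका नाम क्या है",
  "language": "hi",
  "duration_seconds": 2.1,
  "processing_ms": 98.3,
  "model": "vega",
  "provider": "thinnestai"
}"""

result = parse_transcript(sample)
print(result.text, round(result.real_time_factor, 3))
```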

Example

# Transcribe 16kHz raw PCM
curl -X POST "https://api.thinnest.ai/api/stt/transcribe/raw?language=hi&sample_rate=16000&model=vega" \
  -H "Authorization: Bearer thns_sk_your_key_here" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @audio.raw
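
# Same request in Python (requires the httpx package)
import httpx

# Read the raw 16-bit little-endian mono PCM bytes
with open("audio.raw", "rb") as f:
    pcm_bytes = f.read()

resp = httpx.post(
    "https://api.thinnest.ai/api/stt/transcribe/raw",
    params={"language": "hi", "model": "vega", "sample_rate": 16000},
    headers={
        "Authorization": "Bearer thns_sk_your_key_here",
        "Content-Type": "application/octet-stream",
    },
    content=pcm_bytes,
)
resp.raise_for_status()  # surface HTTP errors before parsing the JSON body
print(resp.json()["text"])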

Latency Comparison

| Endpoint              | Vega Latency | Notes |
|-----------------------|--------------|-------|
| /transcribe (WAV)     | ~370ms       | Includes WAV decode overhead |
| /transcribe/raw (PCM) | ~370ms       | Skips WAV decode, optimal for pipelines |

The two endpoints differ by only about 3-5ms because model inference dominates the total latency. The real benefit of /raw is that the client does not have to construct a WAV container before sending audio.
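
To make the client-side cost concrete, the sketch below shows the WAV wrapping step that a caller of /transcribe would need and that /raw avoids. It uses Python's standard `wave` module; the function names are illustrative, and the one second of silence is just placeholder input.

```python
import io
import wave


def wrap_pcm_as_wav(pcm_bytes, sample_rate):
    """Wrap mono 16-bit PCM in a WAV container (the step /raw lets you skip)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return buf.getvalue()


def unwrap_wav_to_pcm(wav_bytes):
    """Recover the raw PCM frames and sample rate from WAV bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.readframes(wav.getnframes()), wav.getframerate()


pcm = b"\x00\x00" * 16000  # one second of 16 kHz silence
wav_bytes = wrap_pcm_as_wav(pcm, 16000)
print(len(wav_bytes) - len(pcm))  # size of the header added by the container
```

The container adds only a small header, so the savings come from skipping the extra buffer copy and encode step per audio chunk rather than from sending fewer bytes. The `unwrap_wav_to_pcm` direction is useful when existing WAV files need to be fed to /raw.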
