Streaming Transcription

WebSocket Endpoint

wss://api.thinnest.ai/api/stt/transcribe/stream

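Streams audio in real time over a WebSocket. Voice activity detection (VAD) drives the results: interim transcripts arrive every 500 ms, and a final transcript follows each detected pause.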

Protocol

1. Connect and Configure

Send a JSON config message after connecting:

{
  "language": "hi",
  "sample_rate": 16000,
  "interim_results": true,
  "utterance_end_ms": 800,
  "keywords": "ThinnestAI,HDFC,Mumbai",
  "token": "thns_sk_your_key_here"
}

Field	Type	Default	Description
`language`	string	`hi`	Language code (for example, `hi` for Hindi).
`sample_rate`	int	`16000`	Audio sample rate in Hz. Must match the audio you send.
`interim_results`	boolean	`true`	Send partial results every 500ms.
`utterance_end_ms`	int	`800`	Silence duration (ms) to trigger final result.
`keywords`	string	—	Comma-separated vocabulary boosting terms.
`token`	string	—	API key for authentication.
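The config message can be assembled with a small helper. This is a minimal sketch: `build_config` is a hypothetical name, not part of any SDK, and the defaults simply mirror the table above. It accepts keywords as a list and joins them into the comma-separated string the config expects.

```python
import json

def build_config(token, language="hi", sample_rate=16000,
                 interim_results=True, utterance_end_ms=800, keywords=None):
    """Return the JSON config message as a string (hypothetical helper)."""
    if sample_rate <= 0 or utterance_end_ms <= 0:
        raise ValueError("sample_rate and utterance_end_ms must be positive")
    config = {
        "language": language,
        "sample_rate": sample_rate,
        "interim_results": interim_results,
        "utterance_end_ms": utterance_end_ms,
        "token": token,
    }
    if keywords:
        # The documented field is one comma-separated string, not a JSON array.
        config["keywords"] = ",".join(k.strip() for k in keywords)
    return json.dumps(config, ensure_ascii=False)
```

The returned string can be sent as the first text frame after the connection opens, for example `await ws.send(build_config("thns_sk_your_key_here", keywords=["HDFC", "Mumbai"]))`.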

2. Server Confirms

{ "status": "ready", "model": "vega" }
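Before sending audio, a client may want to verify this reply. The sketch below uses a hypothetical helper, `check_ready`. The shape of error replies is not documented on this page, so it treats anything other than `"status": "ready"` as a failure and surfaces the raw payload.

```python
import json

def check_ready(raw_message):
    """Parse the server's first reply and return the model name.

    Raises RuntimeError unless the reply reports status "ready".
    """
    reply = json.loads(raw_message)
    if reply.get("status") != "ready":
        # Error replies are undocumented here, so include the whole payload.
        raise RuntimeError(f"Unexpected handshake reply: {reply}")
    return reply.get("model")
```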

3. Send Audio

Send binary frames of 16-bit mono PCM audio at the configured sample rate.
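The size of each binary frame follows from the audio format: 16-bit mono PCM uses 2 bytes per sample, so 100 ms at 16 kHz is 1600 samples, or 3200 bytes. The helpers below are an illustrative sketch (the names are hypothetical). They compute the chunk size, split raw PCM into chunks, and read PCM frames from a WAV file using Python's standard `wave` module after checking that the file is 16-bit mono at the expected rate.

```python
import wave

BYTES_PER_SAMPLE = 2  # 16-bit PCM

def chunk_size_bytes(sample_rate, chunk_ms=100):
    """Bytes in one mono 16-bit PCM chunk of chunk_ms milliseconds."""
    samples = sample_rate * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE  # always a whole number of samples

def pcm_chunks(pcm, sample_rate, chunk_ms=100):
    """Yield consecutive chunks of raw PCM bytes; the last may be shorter."""
    size = chunk_size_bytes(sample_rate, chunk_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]

def read_wav_pcm(path, expected_rate=16000):
    """Return raw frames from a WAV file, verifying 16-bit mono PCM."""
    with wave.open(path, "rb") as wav:
        fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
        if fmt != (1, BYTES_PER_SAMPLE, expected_rate):
            raise ValueError(f"expected 16-bit mono at {expected_rate} Hz, got {fmt}")
        return wav.readframes(wav.getnframes())
```

Each chunk yielded by `pcm_chunks` can be passed directly to the WebSocket's `send` as a binary frame.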

4. Receive Transcriptions

{
  "text": "नमस्ते मेरा",
  "is_final": false,
  "model": "vega"
}

Field	Type	Description
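`text`	string	Transcribed text (partial or final).
`is_final`	boolean	`true` = utterance complete, with the KenLM language model applied. `false` = interim result that may still change.
`model`	string	Model that produced the result (`vega` in the examples on this page).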
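One way to turn this message stream into a running transcript is sketched below. `TranscriptBuffer` is a hypothetical class. It assumes each interim message carries the full partial text of the current utterance, so a newer interim replaces the older one, and that a final message closes the utterance. Neither behavior is spelled out on this page, so adjust the logic if the server behaves differently.

```python
import json

class TranscriptBuffer:
    """Accumulate final utterances and track the latest interim text."""

    def __init__(self):
        self.finals = []   # completed utterances, in arrival order
        self.interim = ""  # latest partial text for the open utterance

    def feed(self, raw_message):
        msg = json.loads(raw_message)
        if "text" not in msg:
            return  # e.g. the {"status": "ready"} handshake reply
        if msg.get("is_final"):
            self.finals.append(msg["text"])
            self.interim = ""
        else:
            # Assumption: interim text replaces the previous interim.
            self.interim = msg["text"]

    def display(self):
        """Finals followed by the current interim, joined with spaces."""
        parts = self.finals + ([self.interim] if self.interim else [])
        return " ".join(parts)
```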

5. Stop

Send a JSON message to end the session:

{ "action": "stop" }

Example (Python)

import asyncio
import websockets
import json

async def stream_audio():
    uri = "wss://api.thinnest.ai/api/stt/transcribe/stream"
    async with websockets.connect(uri) as ws:
        # Configure
        await ws.send(json.dumps({
            "language": "hi",
            "sample_rate": 16000,
            "interim_results": True,
            "token": "thns_sk_your_key_here"
        }))
        ready = json.loads(await ws.recv())
        print(f"Server ready: {ready}")

        # Stream audio chunks
        with open("audio.raw", "rb") as f:
            # 3200 bytes = 1600 samples x 2 bytes = 100 ms of 16-bit mono PCM at 16 kHz
            while chunk := f.read(3200):
                await ws.send(chunk)
                await asyncio.sleep(0.1)

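        # Stop, then read buffered results until the first final one.
        # Results for earlier audio may already be queued, so a recording
        # with several utterances needs a reader that runs while sending.
        await ws.send(json.dumps({"action": "stop"}))
        while True:
            msg = json.loads(await ws.recv())
            is_final = msg.get("is_final", False)
            print(f"{'FINAL' if is_final else 'interim'}: {msg.get('text', '')}")
            if is_final:
                break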

asyncio.run(stream_audio())
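The example above reads results only after all audio has been sent. For recordings with several utterances, reading while sending keeps every final result. The following is a sketch under stated assumptions, not a reference client. `stream_session` is a hypothetical function that takes an already configured connection (config sent, ready reply received) exposing async `send` and `recv`, as a `websockets` connection does. How the server behaves after `stop` is not documented here, so the sketch waits a fixed `drain_seconds` for trailing results and treats any read error as end of stream.

```python
import asyncio
import json

async def stream_session(ws, chunks, chunk_seconds=0.1, drain_seconds=2.0):
    """Send audio while reading results concurrently; return final texts."""
    finals = []

    async def reader():
        try:
            while True:
                msg = json.loads(await ws.recv())
                if msg.get("is_final"):
                    finals.append(msg.get("text", ""))
        except asyncio.CancelledError:
            raise
        except Exception:
            # Assumption: a closed connection or read error ends the stream.
            return

    reader_task = asyncio.create_task(reader())

    for chunk in chunks:
        await ws.send(chunk)
        await asyncio.sleep(chunk_seconds)  # pace upload at roughly real time
    await ws.send(json.dumps({"action": "stop"}))

    # Assumption: allow drain_seconds for results that arrive after "stop".
    await asyncio.wait({reader_task}, timeout=drain_seconds)
    reader_task.cancel()
    await asyncio.gather(reader_task, return_exceptions=True)
    return finals
```

Inside `async with websockets.connect(uri) as ws:`, after the config exchange, `finals = await stream_session(ws, chunks)` returns the final text of each utterance in order.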

Latency

Metric	Value
Interim results	Every 500ms
Final result	After 800 ms of silence (default `utterance_end_ms`)
KenLM accuracy boost	Applied on final results only
