Speech-to-Text

Streaming Transcription

Real-time WebSocket streaming with interim results.

WebSocket Endpoint

wss://api.thinnest.ai/api/stt/transcribe/stream

Real-time audio streaming with interim results every 500 ms, optimized using voice activity detection (VAD).


Protocol

1. Connect and Configure

Send a JSON config message after connecting:

{
  "language": "hi",
  "sample_rate": 16000,
  "interim_results": true,
  "utterance_end_ms": 800,
  "keywords": "ThinnestAI,HDFC,Mumbai",
  "token": "thns_sk_your_key_here"
}

| Field | Type | Default | Description |
|---|---|---|---|
| language | string | hi | Language code (hi = Hindi). |
| sample_rate | int | 16000 | Audio sample rate in Hz. |
| interim_results | boolean | true | Send partial results every 500 ms. |
| utterance_end_ms | int | 800 | Duration of silence (ms) that triggers a final result. |
| keywords | string | none | Comma-separated terms for vocabulary boosting. |
| token | string | none | API key for authentication. |
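Because keywords is a single comma-separated string rather than a JSON array, it can be convenient to build the config from a Python list. The helper below is a minimal sketch: build_config is a hypothetical name, its defaults simply copy the table, and the validation rules (positive sample rate and silence threshold, no commas inside a term) are assumptions rather than documented server behavior.

```python
import json
from typing import Iterable, Optional


def build_config(
    token: str,
    language: str = "hi",
    sample_rate: int = 16000,
    interim_results: bool = True,
    utterance_end_ms: int = 800,
    keywords: Optional[Iterable[str]] = None,
) -> str:
    """Serialize the config message sent right after connecting.

    Hypothetical helper: defaults copy the table above, and `keywords`
    is accepted as a list and joined into one comma-separated string.
    """
    if sample_rate <= 0 or utterance_end_ms <= 0:
        raise ValueError("sample_rate and utterance_end_ms must be positive")
    config = {
        "language": language,
        "sample_rate": sample_rate,
        "interim_results": interim_results,
        "utterance_end_ms": utterance_end_ms,
        "token": token,
    }
    terms = [k.strip() for k in (keywords or [])]
    if terms:
        # Assumed rule: a comma inside a term would split it into two terms.
        if any(not t or "," in t for t in terms):
            raise ValueError("keyword terms must be non-empty and comma-free")
        config["keywords"] = ",".join(terms)
    return json.dumps(config)


# Equivalent to the config example above (defaults fill in the other fields).
message = build_config("thns_sk_your_key_here", keywords=["ThinnestAI", "HDFC", "Mumbai"])
```

The keywords field is omitted entirely when no terms are given, since the table lists no default for it.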

2. Server Confirms

{ "status": "ready", "model": "vega" }
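This page does not describe what the server sends when configuration or authentication fails, so it is worth checking status before streaming any audio. A minimal sketch, assuming that any first message other than a status of "ready" means the handshake failed; parse_ready and SessionNotReady are hypothetical names.

```python
import json


class SessionNotReady(Exception):
    """Raised when the first server message is not a ready confirmation."""


def parse_ready(raw: str) -> str:
    """Return the model name from the server's ready message.

    Assumes the confirmation looks like {"status": "ready", "model": "..."};
    anything else is treated as a failed handshake.
    """
    msg = json.loads(raw)
    if not isinstance(msg, dict) or msg.get("status") != "ready":
        raise SessionNotReady(f"unexpected handshake message: {msg}")
    return msg.get("model", "unknown")
```

In a client, call parse_ready(await ws.recv()) right after sending the config, and close the connection if it raises.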

3. Send Audio

Send binary frames of 16-bit mono PCM audio at the configured sample rate.
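Recordings often arrive as WAV files rather than raw PCM. The sketch below uses Python's standard wave module to check that a file is already 16-bit mono at the expected rate and then yields its sample data in fixed-duration chunks (at 16 kHz, 100 ms is 1,600 samples, or 3,200 bytes). pcm_chunks is a hypothetical helper; it does not resample or downmix, and it assumes the server expects little-endian samples, which is how WAV stores 16-bit PCM (this page only says "16-bit mono PCM").

```python
import wave
from typing import Iterator


def pcm_chunks(wav_file, chunk_ms: int = 100, expected_rate: int = 16000) -> Iterator[bytes]:
    """Yield raw PCM bytes from a WAV file, chunk_ms milliseconds at a time.

    wav_file may be a path or a binary file object. The last chunk may be
    shorter than chunk_ms if the audio does not divide evenly.
    """
    with wave.open(wav_file, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if w.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if w.getframerate() != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz, got {w.getframerate()} Hz")
        # One frame = one 16-bit sample for mono audio, i.e. 2 bytes.
        frames_per_chunk = expected_rate * chunk_ms // 1000
        while True:
            data = w.readframes(frames_per_chunk)
            if not data:
                break
            yield data
```

Inside the streaming loop, this replaces reading audio.raw directly: for chunk in pcm_chunks("audio.wav"): await ws.send(chunk).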

4. Receive Transcriptions

{
  "text": "नमस्ते मेरा",
  "is_final": false,
  "model": "vega"
}
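
| Field | Type | Description |
|---|---|---|
| text | string | Transcribed text (partial or final). |
| is_final | boolean | true: the utterance is complete and the KenLM language model has been applied. false: interim result. |
| model | string | Model that produced the transcription (e.g. vega). |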
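A client showing live captions typically keeps two pieces of state: the final results received so far, and the latest interim result for the utterance in progress. The sketch below assumes each interim message carries the full hypothesis for the current utterance (as the partial "नमस्ते मेरा" example suggests) rather than only new words, so each interim replaces the previous one and a final result commits it. Transcript is a hypothetical class, not part of the API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transcript:
    """Merge interim and final messages into one running transcript."""

    finals: List[str] = field(default_factory=list)
    interim: str = ""

    def update(self, msg: dict) -> str:
        """Apply one server message and return the text to display."""
        text = msg.get("text", "").strip()
        if msg.get("is_final"):
            # The utterance is complete: commit it and clear the partial.
            if text:
                self.finals.append(text)
            self.interim = ""
        else:
            # Assumed: each interim is a revised guess for the whole utterance.
            self.interim = text
        return self.display()

    def display(self) -> str:
        parts = self.finals + ([self.interim] if self.interim else [])
        return " ".join(parts)
```

Keeping finals and the interim separate means a revised interim never overwrites text the server has already finalized.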

5. Stop

Send a JSON message to end the session:

{ "action": "stop" }

Example (Python)

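import asyncio
import json

import websockets

URI = "wss://api.thinnest.ai/api/stt/transcribe/stream"
SAMPLE_RATE = 16000
CHUNK_BYTES = SAMPLE_RATE * 2 // 10  # 100 ms of 16-bit mono PCM = 3200 bytes


async def receive_results(ws):
    # Print every transcription as it arrives; ends when the server closes the socket
    async for raw in ws:
        msg = json.loads(raw)
        if "text" not in msg:
            print(f"Server message: {msg}")
            continue
        label = "FINAL" if msg.get("is_final") else "interim"
        print(f"{label}: {msg['text']}")


async def stream_audio():
    async with websockets.connect(URI) as ws:
        # Configure the session
        await ws.send(json.dumps({
            "language": "hi",
            "sample_rate": SAMPLE_RATE,
            "interim_results": True,
            "token": "thns_sk_your_key_here",
        }))
        ready = json.loads(await ws.recv())
        print(f"Server ready: {ready}")

        # Read results concurrently so interim transcripts are shown while
        # audio is still being sent, instead of queuing up unread
        receiver = asyncio.create_task(receive_results(ws))

        # Stream raw PCM in 100 ms chunks, paced at real time
        with open("audio.raw", "rb") as f:
            while chunk := f.read(CHUNK_BYTES):
                await ws.send(chunk)
                await asyncio.sleep(0.1)

        # Stop, then allow time for the last final result to arrive.
        # The 5 s timeout is a client-side choice, not part of the API.
        await ws.send(json.dumps({"action": "stop"}))
        try:
            await asyncio.wait_for(receiver, timeout=5)
        except asyncio.TimeoutError:
            pass  # server kept the socket open; leaving the block closes it


asyncio.run(stream_audio())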

Latency

| Metric | Value |
|---|---|
| Interim results | Every 500 ms (when interim_results is true) |
| Final result | After 800 ms of silence (default utterance_end_ms) |
| KenLM accuracy boost | Applied to final results only |
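The figures above are nominal; observed timing also includes network delay. One way to check the interim cadence from the client side is to timestamp each message as it arrives. LatencyProbe below is a hypothetical sketch that assumes interim results are only sent while an utterance is in progress, so it resets at every final result to avoid counting the silence between utterances as an interim gap.

```python
import time
from statistics import mean
from typing import Callable, List, Optional


class LatencyProbe:
    """Measure the spacing between consecutive interim results."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._last_interim: Optional[float] = None
        self.interim_gaps_ms: List[float] = []

    def record(self, msg: dict) -> None:
        now = self._clock()
        if msg.get("is_final"):
            # A final result closes the utterance; the next interim starts fresh.
            self._last_interim = None
            return
        if self._last_interim is not None:
            self.interim_gaps_ms.append((now - self._last_interim) * 1000)
        self._last_interim = now

    def mean_interim_gap_ms(self) -> Optional[float]:
        return mean(self.interim_gaps_ms) if self.interim_gaps_ms else None
```

Call probe.record(msg) for each transcription message in the receive loop; a mean gap well above 500 ms points to network or pacing delays on the client side.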
