Speech-to-Text
Streaming Transcription
Real-time WebSocket streaming with interim results.
WebSocket Endpoint
```
wss://api.thinnest.ai/api/stt/transcribe/stream
```

Real-time audio streaming with VAD-optimized interim results every 500 ms.
Protocol
1. Connect and Configure
Send a JSON config message after connecting:
```json
{
  "language": "hi",
  "sample_rate": 16000,
  "interim_results": true,
  "utterance_end_ms": 800,
  "keywords": "ThinnestAI,HDFC,Mumbai",
  "token": "thns_sk_your_key_here"
}
```

| Field | Type | Default | Description |
|---|---|---|---|
| `language` | string | `hi` | Language code. |
| `sample_rate` | int | `16000` | Audio sample rate in Hz. |
| `interim_results` | boolean | `true` | Send partial results every 500 ms. |
| `utterance_end_ms` | int | `800` | Silence duration (ms) that triggers a final result. |
| `keywords` | string | — | Comma-separated vocabulary-boosting terms. |
| `token` | string | — | API key for authentication. |
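
As a rough illustration of the table above, the config message could be assembled with a small helper. `build_config` is a hypothetical name, not part of any ThinnestAI SDK; the defaults are copied from the table, and rejecting an empty token on the client side is an assumption rather than documented server behavior.

```python
import json


def build_config(token, language="hi", sample_rate=16000,
                 interim_results=True, utterance_end_ms=800, keywords=None):
    """Return the JSON string sent as the first message after connecting.

    Hypothetical helper: defaults mirror the field table in this section.
    """
    if not token:
        # Assumption: fail early on the client instead of waiting for the server.
        raise ValueError("token is required")
    config = {
        "language": language,
        "sample_rate": sample_rate,
        "interim_results": interim_results,
        "utterance_end_ms": utterance_end_ms,
        "token": token,
    }
    if keywords:
        # The documented format is one comma-separated string, not a JSON list.
        config["keywords"] = ",".join(keywords)
    return json.dumps(config)
```

Calling `build_config("thns_sk_your_key_here", keywords=["ThinnestAI", "HDFC", "Mumbai"])` yields a message with the same fields as the sample above.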
2. Server Confirms
```json
{ "status": "ready", "model": "vega" }
```

3. Send Audio
Send binary frames of 16-bit mono PCM audio at the configured sample rate.
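
The size of each binary frame follows from the audio format: 16-bit mono PCM uses 2 bytes per sample, so a frame covering a given duration is `sample_rate * 2 * duration_ms / 1000` bytes. The sketch below shows that arithmetic. The helper names are illustrative, and the 100 ms frame duration is an assumption borrowed from the Python example on this page, not a stated API requirement.

```python
BYTES_PER_SAMPLE = 2  # 16-bit PCM
CHANNELS = 1          # mono


def chunk_size_bytes(sample_rate, chunk_ms):
    """Bytes needed to hold chunk_ms milliseconds of 16-bit mono PCM."""
    return sample_rate * BYTES_PER_SAMPLE * CHANNELS * chunk_ms // 1000


def iter_chunks(pcm, sample_rate=16000, chunk_ms=100):
    """Yield consecutive frames of raw PCM bytes; the last one may be shorter."""
    size = chunk_size_bytes(sample_rate, chunk_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

At 16 kHz, `chunk_size_bytes(16000, 100)` gives 3200 bytes per 100 ms frame; at 8 kHz the same duration is 1600 bytes.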
4. Receive Transcriptions
```json
{
  "text": "नमस्ते मेरा",
  "is_final": false,
  "model": "vega"
}
```

| Field | Type | Description |
|---|---|---|
| `text` | string | Transcribed text (partial or final). |
| `is_final` | boolean | `true` = utterance complete (KenLM applied). `false` = interim result. |
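
One way a client might consume these messages is to keep finished utterances separate from the current partial one. The sketch below assumes that each interim result replaces the previous interim text for the same utterance, which this page does not state explicitly. `TranscriptBuffer` is a hypothetical name.

```python
import json


class TranscriptBuffer:
    """Collects final utterances and tracks the latest interim text."""

    def __init__(self):
        self.finals = []   # completed utterances, in arrival order
        self.interim = ""  # most recent partial text, if any

    def feed(self, raw):
        """Process one JSON text message received from the server."""
        msg = json.loads(raw)
        if "text" not in msg:
            return  # e.g. the {"status": "ready"} confirmation
        if msg.get("is_final"):
            self.finals.append(msg["text"])
            self.interim = ""
        else:
            # Assumption: a new interim supersedes the previous one.
            self.interim = msg["text"]

    def display(self):
        """Text to show the user: all finals plus the current partial."""
        parts = self.finals + ([self.interim] if self.interim else [])
        return " ".join(parts)
```

A UI would typically call `feed()` for every incoming message and re-render `display()` afterwards, so partial text is overwritten once the final result for that utterance arrives.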
5. Stop
Send a JSON message to end the session:
```json
{ "action": "stop" }
```

Example (Python)
```python
import asyncio
import json

import websockets


async def stream_audio():
    uri = "wss://api.thinnest.ai/api/stt/transcribe/stream"
    async with websockets.connect(uri) as ws:
        # Configure
        await ws.send(json.dumps({
            "language": "hi",
            "sample_rate": 16000,
            "interim_results": True,
            "token": "thns_sk_your_key_here"
        }))
        ready = json.loads(await ws.recv())
        print(f"Server ready: {ready}")

        # Stream audio chunks
        with open("audio.raw", "rb") as f:
            # 3200 bytes = 100 ms of 16-bit mono PCM at 16 kHz
            while chunk := f.read(3200):
                await ws.send(chunk)
                await asyncio.sleep(0.1)  # pace the upload at real-time speed

        # Stop and get final result
        await ws.send(json.dumps({"action": "stop"}))
        while True:
            msg = json.loads(await ws.recv())
            print(f"{'FINAL' if msg['is_final'] else 'interim'}: {msg['text']}")
            if msg["is_final"]:
                break


asyncio.run(stream_audio())
```

Latency
| Metric | Value |
|---|---|
| Interim results | Every 500ms |
| Final result (after silence) | 800ms silence threshold |
| KenLM accuracy boost | Applied on final results only |
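
To see interim results at the cadence listed above, a client has to read from the socket while it is still sending audio; a client that only reads after the stop message sees the results late. The following is a minimal sketch of that pattern, not an official client. `stream` is a hypothetical helper, `ws` is assumed to be any connection object with async `send` and `recv` methods (such as one returned by `websockets.connect`), and the loop assumes the server emits one last final result after it receives the stop message.

```python
import asyncio
import json


async def stream(ws, chunks, on_result, chunk_seconds=0.1):
    """Send audio frames and handle transcription messages concurrently."""
    stop_sent = asyncio.Event()

    async def sender():
        for chunk in chunks:
            await ws.send(chunk)
            await asyncio.sleep(chunk_seconds)  # pace like a live microphone
        stop_sent.set()
        await ws.send(json.dumps({"action": "stop"}))

    async def receiver():
        while True:
            msg = json.loads(await ws.recv())
            if "text" not in msg:
                continue  # skip non-transcription messages
            on_result(msg)
            # Assumption: the first final result after "stop" ends the session.
            if msg.get("is_final") and stop_sent.is_set():
                return

    await asyncio.gather(sender(), receiver())
```

With this structure, `on_result` is invoked as each message arrives, so the delay a user perceives is governed by the server-side figures in the table rather than by the length of the upload.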