OCR Knowledge Source
Extract text from images and scanned documents using Sarvam Vision OCR, with support for 23 languages (all 22 official Indian languages plus English).
Turn images and scanned documents into searchable knowledge. Upload a scanned PDF, photograph of a form, or image with text — thinnestAI extracts the content via OCR and adds it to your agent's knowledge base.
Supported Providers
| Provider | Best For | Languages | API Key | Cost |
|---|---|---|---|---|
| Sarvam Vision (default) | Indian languages, complex layouts | 23 (all 22 official Indian languages + English) | Platform-managed | Included |
| Google Vision | Enterprise, multilingual | 50+ | User-provided | Per-request |
| AWS Textract | English forms and tables | English-focused | User-provided | Per-page |
Sarvam Vision — Indian Language Specialist
Sarvam Vision is purpose-built for Indian documents. It achieves 84.3% accuracy on the olmOCR benchmark — outperforming Google Gemini and GPT on Indian language text.
Sarvam is the default provider and requires no setup from users — the API key is managed by the platform.
Supported Indian languages: Hindi, Tamil, Telugu, Bengali, Kannada, Malayalam, Marathi, Gujarati, Punjabi, Odia, Assamese, Urdu, Sanskrit, Nepali, Sindhi, Kashmiri, Dogri, Manipuri, Santali, Konkani, Maithili, Bodo.
Supported File Types
| Type | Extensions |
|---|---|
| Images | PNG, JPG, JPEG, TIFF, BMP, WebP, GIF |
| Documents | PDF (scanned/image-based) |
Maximum file size: 10 MB. PDFs are automatically split into pages (up to 50 by default).
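A pre-upload check can catch unsupported files before any request is made. The sketch below is a hypothetical client-side helper: the name `check_ocr_upload` and its messages are not part of thinnestAI. It only encodes the extensions and the 10 MB limit listed above, and it assumes that 10 MB means 10 * 1024 * 1024 bytes.

```python
from pathlib import Path
from typing import List

# Extensions and size limit taken from the limits listed above.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".webp", ".gif", ".pdf"}
MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" is interpreted as 10 MiB


def check_ocr_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_FILE_BYTES}")
    return problems
```

The check looks only at the file name and size. It cannot tell whether a PDF is scanned or digital, so that decision still rests with the uploader.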
Setup
From the Dashboard
- Go to your agent's Knowledge tab.
- Click Add Source and select OCR / Scanned Docs.
- Upload your image or PDF.
- Select languages for the document.
- Click Process — the extracted text is added to the knowledge base.
Via the API
    # Using Sarvam Vision (default; no API key needed from the user)
    curl -X POST https://api.thinnest.ai/api/knowledge/add-ocr \
      -H "Authorization: Bearer $THINNESTAI_API_KEY" \
      -F "file=@scanned_document.pdf" \
      -F 'config={"languages": ["en"]}'

    # Sarvam Vision for Hindi + English documents
    curl -X POST https://api.thinnest.ai/api/knowledge/add-ocr \
      -H "Authorization: Bearer $THINNESTAI_API_KEY" \
      -F "file=@hindi_form.jpg" \
      -F 'config={
        "name": "Hindi Application Form",
        "languages": ["hi", "en"]
      }'

    # Using Google Vision (requires your own API key)
    curl -X POST https://api.thinnest.ai/api/knowledge/add-ocr \
      -H "Authorization: Bearer $THINNESTAI_API_KEY" \
      -F "file=@receipt.png" \
      -F 'config={
        "provider": "google_vision",
        "api_key": "your-google-api-key",
        "languages": ["en"]
      }'

Configuration
| Field | Type | Default | Description |
|---|---|---|---|
| provider | string | "sarvam" | OCR provider to use |
| api_key | string | "" | API key (only needed for Google Vision / AWS Textract) |
| languages | string[] | ["en"] | Language codes for OCR (e.g., ["hi", "en", "ta"]) |
| max_pages | integer | 50 | Maximum pages to process for PDFs |
| name | string | filename | Display name for the knowledge source |
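The fields in the table above can be assembled programmatically before they are sent as the `config` form field. The following is a minimal sketch, not an official client: `build_ocr_config` is a hypothetical helper, and both the choice to omit fields left at their documented defaults and the `max_pages` check are assumptions about sensible client behavior.

```python
import json
from typing import List, Optional


def build_ocr_config(
    languages: Optional[List[str]] = None,
    provider: str = "sarvam",
    api_key: str = "",
    max_pages: int = 50,
    name: Optional[str] = None,
) -> str:
    """Serialize OCR options into a JSON string for the `config` form field."""
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")

    # "languages" is always sent; it falls back to the documented default.
    config = {"languages": languages or ["en"]}

    # Assumption: fields left at their documented defaults can be omitted.
    if provider != "sarvam":
        config["provider"] = provider
    if api_key:
        config["api_key"] = api_key
    if max_pages != 50:
        config["max_pages"] = max_pages
    if name is not None:
        config["name"] = name
    return json.dumps(config, ensure_ascii=False)
```

For example, `build_ocr_config(["hi", "en"], name="Hindi Application Form")` yields a string with the same content as the `config` value in the Hindi curl example.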
Language Codes
Use ISO 639-1 codes for the languages parameter:
| Code | Language | Code | Language |
|---|---|---|---|
| en | English | hi | Hindi |
| ta | Tamil | te | Telugu |
| bn | Bengali | kn | Kannada |
| ml | Malayalam | mr | Marathi |
| gu | Gujarati | pa | Punjabi |
| or | Odia | as | Assamese |
| ur | Urdu | sa | Sanskrit |
| ne | Nepali | sd | Sindhi |
Sarvam supports all 22 official Indian languages; the table above lists codes for only some of them.
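When language names come from a user interface, they can be mapped to codes with a small lookup. The sketch below copies only the codes shown in the table above; `LANGUAGE_CODES` and `codes_for` are hypothetical names, and languages without a listed code are deliberately left out rather than guessed.

```python
# Codes copied from the table above; languages without a listed code are omitted.
LANGUAGE_CODES = {
    "English": "en", "Hindi": "hi", "Tamil": "ta", "Telugu": "te",
    "Bengali": "bn", "Kannada": "kn", "Malayalam": "ml", "Marathi": "mr",
    "Gujarati": "gu", "Punjabi": "pa", "Odia": "or", "Assamese": "as",
    "Urdu": "ur", "Sanskrit": "sa", "Nepali": "ne", "Sindhi": "sd",
}


def codes_for(language_names):
    """Map language names to codes, de-duplicated and in input order."""
    codes = []
    for name in language_names:
        key = name.strip().title()  # tolerate case and surrounding spaces
        if key not in LANGUAGE_CODES:
            raise KeyError(f"no code listed for {name!r}")
        code = LANGUAGE_CODES[key]
        if code not in codes:
            codes.append(code)
    return codes
```

Raising on an unknown name, instead of silently dropping it, keeps a mixed-language document from being processed with an incomplete language list.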
Response
    {
      "status": "success",
      "source_id": 42,
      "chunks": 8,
      "pages": 3,
      "avg_confidence": 0.873,
      "provider": "sarvam",
      "result": {
        "chunks": 8,
        "token_count": 2450,
        "char_count": 9800,
        "size": "9.6 KB",
        "pages": 3,
        "avg_confidence": 0.873,
        "provider": "sarvam"
      },
      "embedding_tokens": 2450,
      "tokens_deducted": 0.0002,
      "balance_remaining": 4.99
    }

Provider Setup Guides
Sarvam Vision (Default)
No setup needed — the Sarvam API key is managed by the platform. Just upload your file and select languages.
Google Cloud Vision
- Enable the Vision API in your Google Cloud Console.
- Create an API key or service account.
- Pass the API key in the api_key config field.
AWS Textract
- Set up AWS credentials (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables).
- Or pass credentials as JSON in api_key:

    {
      "access_key": "AKIA...",
      "secret_key": "...",
      "region": "us-east-1"
    }

Pricing
OCR processing cost is absorbed by the platform for the default Sarvam provider. You are only billed for the embedding tokens used to index the extracted text (same as any other knowledge source).
For Google Vision and AWS Textract, API costs are billed directly by the respective cloud providers using your own API keys.
Best Practices
- Choose the right provider for your language: For Indian languages, Sarvam Vision significantly outperforms general-purpose OCR.
- Use high-resolution scans: 300 DPI produces the best results. Low-quality images reduce accuracy.
- Specify all relevant languages: If a document contains multiple languages (e.g., Hindi + English), pass both codes.
- Check confidence scores: The avg_confidence in the response indicates OCR quality. Below 0.6 suggests the image quality may be too low.
- For PDFs, use max_pages: Large PDFs can be expensive with cloud providers. Set a reasonable limit.
- Combine with text sources: Use OCR for scanned documents and regular file upload for digital PDFs. Digital PDFs already have extractable text and don't need OCR.
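The confidence guidance above can be applied automatically once a response arrives. This is a minimal sketch that assumes the response body has the shape shown in the Response section; `summarize_ocr_response` is a hypothetical helper, and the 0.6 threshold is the one suggested in the list above.

```python
import json

LOW_CONFIDENCE_THRESHOLD = 0.6  # threshold suggested in the best practices above


def summarize_ocr_response(body: str) -> dict:
    """Pull out the fields most useful for a quick quality check."""
    data = json.loads(body)
    confidence = data.get("avg_confidence")
    return {
        "ok": data.get("status") == "success",
        "source_id": data.get("source_id"),
        "pages": data.get("pages"),
        "chunks": data.get("chunks"),
        # Flag only when a score is present and falls below the threshold.
        "low_confidence": confidence is not None and confidence < LOW_CONFIDENCE_THRESHOLD,
    }
```

A caller might re-scan the document at a higher resolution whenever `low_confidence` is true, rather than indexing text that is likely to contain recognition errors.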
Next Steps
- Knowledge Sources — Browse all available knowledge source types
- Supported Formats — File types and size limits
- Agent Learning — Teach your agent from feedback and corrections