OCR Knowledge Source

Turn images and scanned documents into searchable knowledge. Upload a scanned PDF, photograph of a form, or image with text — thinnestAI extracts the content via OCR and adds it to your agent’s knowledge base.

Supported Providers

Provider	Best For	Languages	API Key	Cost
Sarvam Vision (default)	Indian languages, complex layouts	23 (all 22 official Indian languages + English)	Platform-managed	Included
Google Vision	Enterprise, multilingual	50+	User-provided	Per-request
AWS Textract	English forms and tables	English-focused	User-provided	Per-page

Sarvam Vision — Indian Language Specialist

Sarvam Vision is purpose-built for Indian documents. It achieves 84.3% accuracy on the olmOCR benchmark, outperforming Google Gemini and GPT on Indian language text. Sarvam is the default provider and requires no setup from users because the platform manages the API key.

Supported Indian languages: Hindi, Tamil, Telugu, Bengali, Kannada, Malayalam, Marathi, Gujarati, Punjabi, Odia, Assamese, Urdu, Sanskrit, Nepali, Sindhi, Kashmiri, Dogri, Manipuri, Santali, Konkani, Maithili, Bodo.

Supported File Types

Type	Extensions
Images	PNG, JPG, JPEG, TIFF, BMP, WebP, GIF
Documents	PDF (scanned/image-based)

Maximum file size: 10 MB. PDFs are automatically split into pages, and up to 50 pages are processed by default (configurable with the max_pages field).
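Checking a file locally before uploading can save a failed request. The following is a minimal sketch built only from the limits listed above; the helper name `check_upload` is hypothetical, and it assumes that 10 MB means 10 × 1024 × 1024 bytes, which the platform may count differently.

```python
from pathlib import Path

# Extensions and size limit copied from the tables above.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".webp", ".gif", ".pdf"}
MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_FILE_BYTES}")
    return problems
```

The check only looks at the extension and size. It cannot tell whether a PDF is scanned or digital.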

Setup

From the Dashboard

Go to your agent’s Knowledge tab.
Click Add Source and select OCR / Scanned Docs.
Upload your image or PDF.
Select languages for the document.
Click Process — the extracted text is added to the knowledge base.

Via the API

# Using Sarvam Vision (default — no API key needed from user)
curl -X POST https://api.thinnest.ai/api/knowledge/add-ocr \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -F "file=@scanned_document.pdf" \
  -F 'config={"languages": ["en"]}'

# Sarvam Vision for Hindi + English documents
curl -X POST https://api.thinnest.ai/api/knowledge/add-ocr \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -F "file=@hindi_form.jpg" \
  -F 'config={
    "name": "Hindi Application Form",
    "languages": ["hi", "en"]
  }'

# Using Google Vision (requires your own API key)
curl -X POST https://api.thinnest.ai/api/knowledge/add-ocr \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -F "file=@receipt.png" \
  -F 'config={
    "provider": "google_vision",
    "api_key": "your-google-api-key",
    "languages": ["en"]
  }'
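The same request can be made from Python. The sketch below uses only the standard library and mirrors the curl examples: a `file` part plus a `config` part containing JSON. The function names `encode_multipart` and `add_ocr_source` are illustrative, not part of any thinnestAI SDK, and the per-file content type is a guess based on the file extension.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

API_URL = "https://api.thinnest.ai/api/knowledge/add-ocr"  # endpoint from the curl examples


def encode_multipart(config: dict, filename: str, file_bytes: bytes) -> tuple[bytes, str]:
    """Build a multipart/form-data body with a `config` field and a `file` part.

    Returns the encoded body and the matching Content-Type header value.
    Assumes `filename` contains no double quotes or line breaks.
    """
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="config"\r\n'
        "\r\n"
        f"{json.dumps(config)}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + file_bytes + tail, f"multipart/form-data; boundary={boundary}"


def add_ocr_source(path: str, config: dict, api_key: str) -> dict:
    """POST a file and its OCR config, then return the decoded JSON response."""
    with open(path, "rb") as fh:
        body, content_type = encode_multipart(config, os.path.basename(path), fh.read())
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call equivalent to the Hindi example would be `add_ocr_source("hindi_form.jpg", {"name": "Hindi Application Form", "languages": ["hi", "en"]}, os.environ["THINNESTAI_API_KEY"])`.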

Configuration

Field	Type	Default	Description
`provider`	string	`"sarvam"`	OCR provider to use, for example `"sarvam"` or `"google_vision"`
`api_key`	string	`""`	API key (only needed for Google Vision / AWS Textract)
`languages`	string[]	`["en"]`	Language codes for OCR (e.g., `["hi", "en", "ta"]`)
`max_pages`	integer	`50`	Maximum pages to process for PDFs
`name`	string	filename	Display name for the knowledge source
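One way to keep these fields and their defaults in one place is a small dataclass that serializes to the JSON string sent in the `config` form field. This is a sketch: the class name `OcrConfig` is hypothetical, and the two validation rules are client-side sanity checks rather than documented server behavior.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OcrConfig:
    """Mirror of the configuration table, using the documented defaults."""

    provider: str = "sarvam"
    api_key: str = ""
    languages: List[str] = field(default_factory=lambda: ["en"])
    max_pages: int = 50
    name: Optional[str] = None  # None leaves the display name to default to the filename

    def __post_init__(self) -> None:
        # Assumed sanity checks; the server may apply different rules.
        if not self.languages:
            raise ValueError("languages must contain at least one code")
        if self.max_pages < 1:
            raise ValueError("max_pages must be at least 1")

    def to_form_value(self) -> str:
        """Serialize to the JSON string passed as the `config` form field."""
        payload = {
            "provider": self.provider,
            "languages": self.languages,
            "max_pages": self.max_pages,
        }
        # Optional fields are omitted when empty.
        if self.api_key:
            payload["api_key"] = self.api_key
        if self.name:
            payload["name"] = self.name
        return json.dumps(payload)
```

For example, `OcrConfig(languages=["hi", "en"], name="Hindi Application Form").to_form_value()` produces a config that relies on the default provider and page limit.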

Language Codes

Use ISO 639-1 codes for the languages parameter:

Code	Language	Code	Language
`en`	English	`hi`	Hindi
`ta`	Tamil	`te`	Telugu
`bn`	Bengali	`kn`	Kannada
`ml`	Malayalam	`mr`	Marathi
`gu`	Gujarati	`pa`	Punjabi
`or`	Odia	`as`	Assamese
`ur`	Urdu	`sa`	Sanskrit
`ne`	Nepali	`sd`	Sindhi

Sarvam supports all 22 official Indian languages. The table above lists codes for 15 of them plus English.
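If your application stores language names rather than codes, a lookup table avoids typos in the `languages` field. The sketch below contains only the codes listed in the table above; the helper name `to_language_codes` is illustrative, and languages without a listed code raise an error rather than guessing one.

```python
# Codes copied from the language table; nothing beyond the table is included.
LANGUAGE_CODES = {
    "english": "en", "hindi": "hi",
    "tamil": "ta", "telugu": "te",
    "bengali": "bn", "kannada": "kn",
    "malayalam": "ml", "marathi": "mr",
    "gujarati": "gu", "punjabi": "pa",
    "odia": "or", "assamese": "as",
    "urdu": "ur", "sanskrit": "sa",
    "nepali": "ne", "sindhi": "sd",
}


def to_language_codes(names: list[str]) -> list[str]:
    """Map language names to codes, keeping order and dropping duplicates."""
    codes: list[str] = []
    for name in names:
        key = name.strip().lower()
        if key not in LANGUAGE_CODES:
            raise KeyError(f"no code listed for {name!r}")
        code = LANGUAGE_CODES[key]
        if code not in codes:
            codes.append(code)
    return codes
```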

Response

{
  "status": "success",
  "source_id": 42,
  "chunks": 8,
  "pages": 3,
  "avg_confidence": 0.873,
  "provider": "sarvam",
  "result": {
    "chunks": 8,
    "token_count": 2450,
    "char_count": 9800,
    "size": "9.6 KB",
    "pages": 3,
    "avg_confidence": 0.873,
    "provider": "sarvam"
  },
  "embedding_tokens": 2450,
  "tokens_deducted": 0.0002,
  "balance_remaining": 4.99
}
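A client will usually want a few of these fields in typed form. The following sketch parses the response shown above into a dataclass; the names `OcrResult` and `parse_ocr_response` are hypothetical, and because the shape of an error response is not documented here, any status other than "success" is simply treated as a failure.

```python
import json
from dataclasses import dataclass


@dataclass
class OcrResult:
    source_id: int
    chunks: int
    pages: int
    avg_confidence: float
    provider: str
    embedding_tokens: int


def parse_ocr_response(raw: str) -> OcrResult:
    """Parse the JSON response body and keep the top-level summary fields."""
    data = json.loads(raw)
    if data.get("status") != "success":
        # Assumption: failures are signalled through the status field.
        raise RuntimeError(f"OCR request did not succeed: {data.get('status')!r}")
    return OcrResult(
        source_id=data["source_id"],
        chunks=data["chunks"],
        pages=data["pages"],
        avg_confidence=data["avg_confidence"],
        provider=data["provider"],
        embedding_tokens=data["embedding_tokens"],
    )
```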

Provider Setup Guides

Sarvam Vision (Default)

No setup needed — the Sarvam API key is managed by the platform. Just upload your file and select languages.

Google Cloud Vision

Enable the Vision API in your Google Cloud Console.
Create an API key or service account.
Pass the API key in the api_key config field.

AWS Textract

Set up AWS credentials through the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
Alternatively, pass the credentials as JSON in the api_key config field:

{
  "access_key": "AKIA...",
  "secret_key": "...",
  "region": "us-east-1"
}
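Because the configuration table types `api_key` as a string, one plausible reading is that the credentials object is serialized to a JSON string before it is placed in the config. The sketch below follows that assumption. The helper name `textract_config` is hypothetical, and the provider identifier for AWS Textract is not stated in this page, so it is taken as a parameter instead of being guessed.

```python
import json


def textract_config(
    provider_id: str,
    access_key: str,
    secret_key: str,
    region: str,
    languages: tuple[str, ...] = ("en",),
) -> dict:
    """Build an OCR config whose api_key holds AWS credentials as a JSON string.

    provider_id must be the Textract provider value your platform expects;
    it is not documented in this page.
    """
    credentials = {
        "access_key": access_key,
        "secret_key": secret_key,
        "region": region,
    }
    return {
        "provider": provider_id,
        "api_key": json.dumps(credentials),  # assumption: serialized, not nested
        "languages": list(languages),
    }
```

In practice, read the key and secret from environment variables or a secrets manager and avoid hard-coding them.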

Pricing

OCR processing cost is absorbed by the platform for the default Sarvam provider. You are only billed for the embedding tokens used to index the extracted text (same as any other knowledge source). For Google Vision and AWS Textract, API costs are billed directly by the respective cloud providers using your own API keys.

Best Practices

Choose the right provider for your language — For Indian languages, Sarvam Vision significantly outperforms general-purpose OCR.
Use high-resolution scans — 300 DPI produces the best results. Low-quality images reduce accuracy.
Specify all relevant languages — If a document contains multiple languages (e.g., Hindi + English), pass both codes.
Check confidence scores — The avg_confidence in the response indicates OCR quality. Below 0.6 suggests the image quality may be too low.
For PDFs, use max_pages — Large PDFs can be expensive with cloud providers. Set a reasonable limit.
Combine with text sources — Use OCR for scanned documents and regular file upload for digital PDFs. Digital PDFs already have extractable text and don’t need OCR.
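The confidence check can be automated. The sketch below flags responses whose avg_confidence falls below the 0.6 guideline mentioned above; the function name `review_ocr_quality` and the returned labels are illustrative only.

```python
LOW_CONFIDENCE_THRESHOLD = 0.6  # guideline value from the best practices above


def review_ocr_quality(response: dict, threshold: float = LOW_CONFIDENCE_THRESHOLD) -> str:
    """Classify an OCR response as 'ok', 'rescan', or 'unknown'."""
    confidence = response.get("avg_confidence")
    if confidence is None:
        return "unknown"  # no score reported, so nothing can be concluded
    return "rescan" if confidence < threshold else "ok"
```

A "rescan" result is a prompt to try a higher-resolution scan, not proof that the extracted text is wrong.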

Next Steps

Knowledge Sources — Browse all available knowledge source types
Supported Formats — File types and size limits
Agent Learning — Teach your agent from feedback and corrections
