OCR Knowledge Source

Extract text from images and scanned documents using Sarvam Vision OCR, with support for 23 languages (all 22 official Indian languages plus English).

Turn images and scanned documents into searchable knowledge. Upload a scanned PDF, a photograph of a form, or an image containing text, and thinnestAI extracts the content via OCR and adds it to your agent's knowledge base.

Supported Providers

| Provider | Best For | Languages | API Key | Cost |
| --- | --- | --- | --- | --- |
| Sarvam Vision (default) | Indian languages, complex layouts | 23 (all 22 official Indian languages + English) | Platform-managed | Included |
| Google Vision | Enterprise, multilingual | 50+ | User-provided | Per-request |
| AWS Textract | English forms and tables | English-focused | User-provided | Per-page |

Sarvam Vision — Indian Language Specialist

Sarvam Vision is purpose-built for Indian documents. It achieves 84.3% accuracy on the olmOCR benchmark — outperforming Google Gemini and GPT on Indian language text.

Sarvam is the default provider and requires no setup from users — the API key is managed by the platform.

Supported Indian languages: Hindi, Tamil, Telugu, Bengali, Kannada, Malayalam, Marathi, Gujarati, Punjabi, Odia, Assamese, Urdu, Sanskrit, Nepali, Sindhi, Kashmiri, Dogri, Manipuri, Santali, Konkani, Maithili, Bodo.

Supported File Types

| Type | Extensions |
| --- | --- |
| Images | PNG, JPG, JPEG, TIFF, BMP, WebP, GIF |
| Documents | PDF (scanned/image-based) |

Maximum file size: 10 MB. PDFs are automatically split into pages (up to 50 by default).
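The limits above can be checked locally before uploading. The following is a minimal sketch, not part of the thinnestAI API: the helper name `check_upload` is hypothetical, and it assumes "10 MB" means 10 × 1024 × 1024 bytes and that only the extensions listed in the table are accepted.

```python
from pathlib import Path

# Limit and extensions taken from the "Supported File Types" section.
MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" is interpreted as 10 MiB
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".webp", ".gif", ".pdf"}


def check_upload(file_name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = Path(file_name).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes; the limit is {MAX_FILE_BYTES}")
    return problems
```

In practice you might call `check_upload(path.name, path.stat().st_size)` for each file and skip the upload when the returned list is not empty.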

Setup

From the Dashboard

  1. Go to your agent's Knowledge tab.
  2. Click Add Source and select OCR / Scanned Docs.
  3. Upload your image or PDF.
  4. Select languages for the document.
  5. Click Process — the extracted text is added to the knowledge base.

Via the API

# Using Sarvam Vision (default; no API key needed from the user)
curl -X POST https://api.thinnest.ai/api/knowledge/add-ocr \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -F "file=@scanned_document.pdf" \
  -F 'config={"languages": ["en"]}'

# Sarvam Vision for Hindi + English documents
curl -X POST https://api.thinnest.ai/api/knowledge/add-ocr \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -F "file=@hindi_form.jpg" \
  -F 'config={
    "name": "Hindi Application Form",
    "languages": ["hi", "en"]
  }'

# Using Google Vision (requires your own API key)
curl -X POST https://api.thinnest.ai/api/knowledge/add-ocr \
  -H "Authorization: Bearer $THINNESTAI_API_KEY" \
  -F "file=@receipt.png" \
  -F 'config={
    "provider": "google_vision",
    "api_key": "your-google-api-key",
    "languages": ["en"]
  }'
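
The same request can be assembled from Python. This is a sketch using only the standard library, under the assumption that the endpoint accepts a standard multipart/form-data body with a `file` part and a `config` part, as the curl `-F` flags imply. The function name `build_ocr_request` is hypothetical; it builds the request without sending it.

```python
import json
import mimetypes
import urllib.request
import uuid

# Endpoint copied from the curl examples above.
API_URL = "https://api.thinnest.ai/api/knowledge/add-ocr"


def build_ocr_request(file_name: str, file_bytes: bytes, config: dict,
                      api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multipart POST with `file` and `config` fields."""
    boundary = uuid.uuid4().hex
    mime_type = mimetypes.guess_type(file_name)[0] or "application/octet-stream"

    # Part 1: the uploaded file.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f"Content-Type: {mime_type}\r\n\r\n"
    ).encode("utf-8")
    # Part 2: the JSON configuration, sent as a plain form field.
    config_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="config"\r\n\r\n'
        f"{json.dumps(config)}\r\n"
    ).encode("utf-8")
    closing = f"--{boundary}--\r\n".encode("utf-8")

    body = file_header + file_bytes + b"\r\n" + config_part + closing
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the result to `urllib.request.urlopen(request)` and decode the response body with `json.load`. Read the API key from an environment variable such as `THINNESTAI_API_KEY` rather than hard-coding it.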

Configuration

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| provider | string | "sarvam" | OCR provider to use |
| api_key | string | "" | API key (only needed for Google Vision / AWS Textract) |
| languages | string[] | ["en"] | Language codes for OCR (e.g., ["hi", "en", "ta"]) |
| max_pages | integer | 50 | Maximum pages to process for PDFs |
| name | string | filename | Display name for the knowledge source |
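
As an illustration of how these fields combine, here is a small sketch that merges caller overrides with the documented defaults and serializes the result for the `config` form field. The helper `build_config` and its validation rules are assumptions made for this example; the API may accept or reject values differently.

```python
import json

# Defaults copied from the configuration table. `name` defaults to the
# filename, so it is left out unless the caller sets it explicitly.
DEFAULTS = {"provider": "sarvam", "api_key": "", "languages": ["en"], "max_pages": 50}
KNOWN_FIELDS = set(DEFAULTS) | {"name"}


def build_config(**overrides) -> str:
    """Return the JSON string for the `config` form field."""
    unknown = set(overrides) - KNOWN_FIELDS
    if unknown:
        # Catch typos such as `langauges` before they reach the API.
        raise ValueError(f"unknown config fields: {sorted(unknown)}")
    config = {**DEFAULTS, **overrides}
    if not isinstance(config["max_pages"], int) or config["max_pages"] < 1:
        raise ValueError("max_pages must be a positive integer")
    if not config["languages"]:
        raise ValueError("languages must contain at least one code")
    return json.dumps(config)
```

For example, `build_config(languages=["hi", "en"], name="Hindi Application Form")` yields a config that keeps the default provider and page limit.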

Language Codes

Use ISO 639-1 codes for the languages parameter:

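| Code | Language | Code | Language |
| --- | --- | --- | --- |
| en | English | hi | Hindi |
| ta | Tamil | te | Telugu |
| bn | Bengali | kn | Kannada |
| ml | Malayalam | mr | Marathi |
| gu | Gujarati | pa | Punjabi |
| or | Odia | as | Assamese |
| ur | Urdu | sa | Sanskrit |
| ne | Nepali | sd | Sindhi |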

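Sarvam supports all 22 official Indian languages. The table above lists codes for 15 of them plus English; codes for Kashmiri, Dogri, Manipuri, Santali, Konkani, Maithili, and Bodo are not listed here.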
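
A quick way to catch typos in the languages parameter is to compare the codes against the table. The sketch below hard-codes only the codes documented on this page; the helper `undocumented_codes` is hypothetical, and a code it flags is not necessarily rejected by the API.

```python
# Codes copied from the table above. Other codes may be accepted by the API
# but are not documented on this page.
LANGUAGE_CODES = {
    "en": "English", "hi": "Hindi", "ta": "Tamil", "te": "Telugu",
    "bn": "Bengali", "kn": "Kannada", "ml": "Malayalam", "mr": "Marathi",
    "gu": "Gujarati", "pa": "Punjabi", "or": "Odia", "as": "Assamese",
    "ur": "Urdu", "sa": "Sanskrit", "ne": "Nepali", "sd": "Sindhi",
}


def undocumented_codes(languages: list[str]) -> list[str]:
    """Return the codes that do not appear in the table, in input order."""
    return [code for code in languages if code.lower() not in LANGUAGE_CODES]
```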

Response

{
  "status": "success",
  "source_id": 42,
  "chunks": 8,
  "pages": 3,
  "avg_confidence": 0.873,
  "provider": "sarvam",
  "result": {
    "chunks": 8,
    "token_count": 2450,
    "char_count": 9800,
    "size": "9.6 KB",
    "pages": 3,
    "avg_confidence": 0.873,
    "provider": "sarvam"
  },
  "embedding_tokens": 2450,
  "tokens_deducted": 0.0002,
  "balance_remaining": 4.99
}
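
The fields most callers need can be pulled out of this response with a few lines of Python. This is a sketch based only on the sample above: the `OcrResult` class and `parse_ocr_response` function are hypothetical, and because the shape of an error response is not documented here, the code simply assumes that any status other than "success" indicates a failure.

```python
import json
from dataclasses import dataclass


@dataclass
class OcrResult:
    source_id: int
    provider: str
    pages: int
    chunks: int
    avg_confidence: float
    embedding_tokens: int


def parse_ocr_response(raw: str) -> OcrResult:
    """Parse the JSON body; raise ValueError if the status is not "success"."""
    data = json.loads(raw)
    if data.get("status") != "success":
        raise ValueError(f"OCR request did not succeed: {data.get('status')!r}")
    # Read the top-level copies of the fields shown in the sample response.
    return OcrResult(
        source_id=data["source_id"],
        provider=data["provider"],
        pages=data["pages"],
        chunks=data["chunks"],
        avg_confidence=data["avg_confidence"],
        embedding_tokens=data["embedding_tokens"],
    )
```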

Provider Setup Guides

Sarvam Vision (Default)

No setup needed — the Sarvam API key is managed by the platform. Just upload your file and select languages.

Google Cloud Vision

  1. Enable the Vision API in your Google Cloud Console.
  2. Create an API key or service account.
  3. Pass the API key in the api_key config field.

AWS Textract

  1. Set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
  2. Alternatively, pass the credentials as JSON in the api_key config field:
{
  "access_key": "AKIA...",
  "secret_key": "...",
  "region": "us-east-1"
}
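
Since api_key is documented as a string field, one plausible reading is that the credentials object is passed as a JSON-encoded string. The sketch below follows that assumption; the helper `textract_api_key` is hypothetical, and the provider identifier for AWS Textract is not shown on this page, so it is left out.

```python
import json
import os


def textract_api_key(env=os.environ, region: str = "us-east-1") -> str:
    """Serialize AWS credentials into the JSON shape shown above.

    Assumption: the `api_key` config field expects this object as a
    JSON-encoded string.
    """
    credentials = {
        "access_key": env["AWS_ACCESS_KEY_ID"],
        "secret_key": env["AWS_SECRET_ACCESS_KEY"],
        "region": region,
    }
    return json.dumps(credentials)
```

Reading the values from the environment keeps secrets out of source files; a missing variable raises `KeyError` before any request is made.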

Pricing

OCR processing cost is absorbed by the platform for the default Sarvam provider. You are only billed for the embedding tokens used to index the extracted text (same as any other knowledge source).

For Google Vision and AWS Textract, API costs are billed directly by the respective cloud providers using your own API keys.

Best Practices

  • Choose the right provider for your language — For Indian languages, Sarvam Vision significantly outperforms general-purpose OCR.
  • Use high-resolution scans — 300 DPI produces the best results. Low-quality images reduce accuracy.
  • Specify all relevant languages — If a document contains multiple languages (e.g., Hindi + English), pass both codes.
  • Check confidence scores — The avg_confidence in the response indicates OCR quality. Below 0.6 suggests the image quality may be too low.
  • For PDFs, use max_pages — Large PDFs can be expensive with cloud providers. Set a reasonable limit.
  • Combine with text sources — Use OCR for scanned documents and regular file upload for digital PDFs. Digital PDFs already have extractable text and don't need OCR.
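
The confidence check in the list above is easy to automate when processing many files. This sketch assumes you have collected the parsed JSON response for each file in a dictionary keyed by file name; the helper `flag_low_confidence` is hypothetical, and a response without an avg_confidence field is treated as low confidence.

```python
LOW_CONFIDENCE = 0.6  # threshold suggested in the best practices above


def flag_low_confidence(responses: dict[str, dict]) -> list[str]:
    """Return the names of files whose avg_confidence is below the threshold, sorted."""
    return sorted(
        name
        for name, response in responses.items()
        # A missing score defaults to 0.0, so the file is flagged for review.
        if response.get("avg_confidence", 0.0) < LOW_CONFIDENCE
    )
```

Files returned by this function are candidates for rescanning at a higher resolution before they are relied on in the knowledge base.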
