Run OCR
This guide deploys LightOnOCR-2-1B on Dedicated Inference and uses it to pull text out of images: invoices, receipts, forms, and scanned documents.
LightOnOCR-2-1B is a small multilingual vision-language model built for OCR. It covers 11 languages and handles tables, multi-column layouts, and math notation.
Prerequisites
- Exoscale CLI (exo) installed and configured
- an API key with compute and ai access
- enough GPU quota in your organization
Set these variables for the examples:
export ZONE="at-vie-2"
export MODEL="lightonai/LightOnOCR-2-1B"
Step 1. Create the Model
Register the model in your zone:
exo dedicated-inference model create "$MODEL" -z "$ZONE"
Wait until it shows created:
exo dedicated-inference model list -z "$ZONE"
Step 2. Deploy the Model
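Model registration above and the deployment in this step both take time to settle, so the status commands usually need re-running. A minimal sketch of a polling helper follows. The name `wait_until` is hypothetical, and matching on a word such as `created` assumes that string appears in the `exo` output, which this guide does not show in full.

```shell
# wait_until PATTERN TRIES DELAY CMD...
# Re-runs CMD until its output contains PATTERN, or gives up after TRIES attempts.
# Hypothetical helper: adjust PATTERN to whatever status text your exo version prints.
wait_until() {
  local pattern=$1 tries=$2 delay=$3
  shift 3
  local i=0
  while [ "$i" -lt "$tries" ]; do
    if "$@" 2>/dev/null | grep -q -- "$pattern"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (assumes the list output contains the word "created"):
# wait_until created 60 10 exo dedicated-inference model list -z "$ZONE"
```

The helper returns a non-zero status on timeout, so it can gate the next command with `&&`.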
LightOnOCR-2-1B is a 1B-parameter model, about 1.9 GiB, so one A5000 GPU is plenty.
exo dedicated-inference deployment create ocr \
--model-name "$MODEL" \
--gpu-type gpua5000 \
--gpu-count 1 \
--replicas 1 \
--inference-engine-version 0.15.1 \
-z "$ZONE"
Wait until the deployment is ready, usually 2 to 3 minutes:
exo dedicated-inference deployment show ocr -z "$ZONE"
Step 3. Get the Endpoint and Key
exo dedicated-inference deployment show ocr -z "$ZONE"
exo dedicated-inference deployment reveal-api-key ocr -z "$ZONE"
Export both for the next steps:
export ENDPOINT_URL="https://<your-deployment-id>.inference.at-vie-2.exoscale-cloud.com/v1"
export API_KEY="<your-api-key>"
Step 4. Run OCR on an Image
The model uses the OpenAI-compatible chat completions API. Pass the image as a base64 data URI. You don’t need a text prompt. The model extracts all text from the image on its own.
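Since PNG and JPEG are both accepted, the MIME type in the data URI should match the file. A small sketch of a helper that builds the URI from a file name follows. The name `data_uri` is hypothetical, and choosing the type from the file extension is an assumption made for brevity.

```shell
# data_uri FILE: print a base64 data URI for a PNG or JPEG file.
# Hypothetical helper; the MIME type is guessed from the extension only.
data_uri() {
  local file=$1 mime
  case "$file" in
    *.png|*.PNG) mime="image/png" ;;
    *.jpg|*.jpeg|*.JPG|*.JPEG) mime="image/jpeg" ;;
    *) echo "unsupported image type: $file" >&2; return 1 ;;
  esac
  # Reading from stdin works with both GNU and BSD/macOS base64;
  # tr removes the line wrapping that GNU base64 adds by default.
  printf 'data:%s;base64,%s\n' "$mime" "$(base64 < "$file" | tr -d '\n')"
}

# Example: data_uri document.png
```

The printed string can be used as the value of the `url` field in the `image_url` part of the request.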
Encode the image and build the request file:
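# Read via stdin and strip newlines: GNU base64 wraps its output, which would break the JSON string
BASE64_IMG=$(base64 < document.png | tr -d '\n')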
cat > request.json << EOF
{
"model": "lightonai/LightOnOCR-2-1B",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": { "url": "data:image/png;base64,${BASE64_IMG}" }
}
]
}
],
"max_tokens": 4096,
"temperature": 0.2,
"top_p": 0.9
}
EOF
Pass the body as a file with -d @request.json. Base64 image data inline tends to break shell escaping.
curl -X POST "$ENDPOINT_URL/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d @request.json
For an invoice, the model returns the text with tables as HTML:
{
"choices": [
{
"message": {
"content": "INVOICE #12345\nDate: 2026-03-26\n\n<table>\n <tr><th>Item</th><th>Qty</th><th>Price</th></tr>\n <tr><td>Widget A</td><td>2</td><td>19.99</td></tr>\n</table>\n\nTotal: 272.69"
}
}
]
}
Get Better Results
Image quality drives accuracy more than anything else:
- Render PDFs at 200 DPI.
- Aim for about 1540 pixels on the longest side.
- Keep the original aspect ratio so text geometry stays intact.
- PNG and JPEG both work. Use JPEG to keep large documents smaller.
Supported languages: English, French, German, Spanish, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, and Korean.
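The sizing advice above comes down to simple arithmetic: scale both sides by the same factor so the longer one lands on 1540 pixels. A sketch in plain shell arithmetic follows. The name `fit_longest_side` is hypothetical, and the A4 page size in the example is an approximation.

```shell
# fit_longest_side WIDTH HEIGHT [TARGET]
# Prints "W H" scaled so the longer side equals TARGET (default 1540),
# keeping the aspect ratio. The shorter side is rounded to the nearest pixel.
fit_longest_side() {
  local w=$1 h=$2 target=${3:-1540}
  if [ "$w" -ge "$h" ]; then
    echo "$target $(( (h * target + w / 2) / w ))"
  else
    echo "$(( (w * target + h / 2) / h )) $target"
  fi
}

# An A4 page rendered at 200 DPI is roughly 1654 x 2339 pixels:
fit_longest_side 1654 2339   # prints "1089 1540"
```

Note that this scales smaller images up as well as larger ones down. If ImageMagick and poppler-utils are installed (neither is required by this guide), `pdftoppm -r 200 -png scan.pdf page` renders PDF pages at 200 DPI, and `magick page.png -resize 1540x1540 out.png` fits an image inside a 1540-pixel box while keeping its aspect ratio.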
Clean Up
Scale to zero to stop GPU billing but keep the endpoint and key:
exo dedicated-inference deployment scale ocr 0 -z "$ZONE"
Delete the deployment when you’re done with the endpoint:
exo dedicated-inference deployment delete ocr -z "$ZONE"
Then remove the model files from Object Storage if you no longer need them:
exo dedicated-inference model delete <model-id> -z "$ZONE"