Run OCR

This guide deploys LightOnOCR-2-1B on Dedicated Inference and uses it to pull text out of images: invoices, receipts, forms, and scanned documents.

LightOnOCR-2-1B is a small multilingual vision-language model built for OCR. It covers 11 languages and handles tables, multi-column layouts, and math notation.

Prerequisites

Set these variables for the examples:

export ZONE="at-vie-2"
export MODEL="lightonai/LightOnOCR-2-1B"

Step 1. Create the Model

Register the model in your zone:

exo dedicated-inference model create "$MODEL" -z "$ZONE"

Wait until the model's status shows created:

exo dedicated-inference model list -z "$ZONE"
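
Rather than re-running the list command by hand, the wait can be scripted. The helper below is a minimal sketch, not part of the exo CLI: wait_for is a hypothetical function name, and it assumes the list output contains the word "created" once the model is ready. If other models in the zone are already created, the plain string match would succeed too early, so narrow the pattern in that case.

```shell
# Poll a command until its output contains a given string, or give up.
# Usage: wait_for PATTERN MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
  local pattern="$1" tries="$2" delay="$3"
  shift 3
  local i=1
  while [ "$i" -le "$tries" ]; do
    # Succeed as soon as the command's output mentions the pattern.
    if "$@" 2>/dev/null | grep -q -- "$pattern"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out waiting for '$pattern'" >&2
  return 1
}

# Example (assumption: the status column reads "created"):
# wait_for created 60 10 exo dedicated-inference model list -z "$ZONE"
```

The same helper could poll the deployment show command in Step 2, with whatever status string that command prints for a ready deployment.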

Step 2. Deploy the Model

LightOnOCR-2-1B is a 1B-parameter model of about 1.9 GiB, so a single A5000 GPU (24 GB of memory) is more than enough.

exo dedicated-inference deployment create ocr \
  --model-name "$MODEL" \
  --gpu-type gpua5000 \
  --gpu-count 1 \
  --replicas 1 \
  --inference-engine-version 0.15.1 \
  -z "$ZONE"

Wait until the deployment is ready, usually 2 to 3 minutes:

exo dedicated-inference deployment show ocr -z "$ZONE"

Step 3. Get the Endpoint and Key

exo dedicated-inference deployment show ocr -z "$ZONE"
exo dedicated-inference deployment reveal-api-key ocr -z "$ZONE"

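The first command prints the deployment's endpoint URL and the second reveals its API key. Export both for the next steps, replacing the placeholders with your own values: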

export ENDPOINT_URL="https://<your-deployment-id>.inference.at-vie-2.exoscale-cloud.com/v1"
export API_KEY="<your-api-key>"
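
A quick sanity check can catch a forgotten placeholder before any request is sent. This is a small sketch under assumptions: check_env is a hypothetical helper, and the /v1 suffix check simply mirrors the example URL above.

```shell
# Return non-zero if ENDPOINT_URL or API_KEY is unset or still a placeholder.
check_env() {
  local status=0
  case "${ENDPOINT_URL:-}" in
    ""|*"<"*) echo "ENDPOINT_URL is unset or still contains a <placeholder>" >&2; status=1 ;;
    */v1) ;;  # matches the shape of the example URL
    *) echo "ENDPOINT_URL is expected to end in /v1" >&2; status=1 ;;
  esac
  case "${API_KEY:-}" in
    ""|*"<"*) echo "API_KEY is unset or still contains a <placeholder>" >&2; status=1 ;;
  esac
  return "$status"
}

# Example:
# check_env && echo "environment looks usable"
```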

Step 4. Run OCR on an Image

The deployment exposes an OpenAI-compatible chat completions API. Pass the image as a base64 data URI. No text prompt is needed: the model extracts all text from the image on its own.

Encode the image and build the request file:

# Strip newlines: GNU base64 wraps its output, which would break the JSON string.
BASE64_IMG=$(base64 < document.png | tr -d '\n')

cat > request.json << EOF
{
  "model": "lightonai/LightOnOCR-2-1B",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": { "url": "data:image/png;base64,${BASE64_IMG}" }
        }
      ]
    }
  ],
  "max_tokens": 4096,
  "temperature": 0.2,
  "top_p": 0.9
}
EOF
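
The request above hard-codes image/png in the data URI. Since JPEG input works too, the media type could be derived from the file extension. The function below is a sketch of that idea: build_request is a hypothetical helper name, and it emits the same request body as above on stdout.

```shell
# Print a chat-completions request body for one image on stdout.
# Usage: build_request IMAGE_PATH [MODEL_NAME]
build_request() {
  local image="$1" model="${2:-lightonai/LightOnOCR-2-1B}" mime b64
  # Pick the data-URI media type from the file extension.
  case "$image" in
    *.png|*.PNG) mime="image/png" ;;
    *.jpg|*.jpeg|*.JPG|*.JPEG) mime="image/jpeg" ;;
    *) echo "unsupported image type: $image" >&2; return 1 ;;
  esac
  # Remove line breaks so the base64 text fits in a single JSON string.
  b64=$(base64 < "$image" | tr -d '\n') || return 1
  cat << EOF
{
  "model": "$model",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": { "url": "data:$mime;base64,$b64" }
        }
      ]
    }
  ],
  "max_tokens": 4096,
  "temperature": 0.2,
  "top_p": 0.9
}
EOF
}

# Example:
# build_request scan.jpg > request.json
```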

Pass the body as a file with -d @request.json. Inline base64 image data is awkward to quote in the shell and can exceed the command-line length limit.

curl -X POST "$ENDPOINT_URL/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d @request.json

For an invoice, the model returns the text with tables as HTML. The response below is abbreviated to the relevant field:

{
  "choices": [
    {
      "message": {
        "content": "INVOICE #12345\nDate: 2026-03-26\n\n<table>\n  <tr><th>Item</th><th>Qty</th><th>Price</th></tr>\n  <tr><td>Widget A</td><td>2</td><td>19.99</td></tr>\n</table>\n\nTotal: 272.69"
      }
    }
  ]
}
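
To work with the recognized text rather than the JSON envelope, the content field can be pulled out of a saved response. This sketch assumes jq is installed, which the guide does not otherwise require, and that the curl output was saved to a file, for example with -o response.json. extract_text is a hypothetical helper name.

```shell
# Print the recognized text from a saved chat-completions response.
# Usage: extract_text RESPONSE_JSON
extract_text() {
  # -r prints the raw string, so "\n" escapes become real line breaks.
  # -e makes jq exit non-zero when the field is missing or null,
  # which is what an error response would produce.
  jq -er '.choices[0].message.content' "$1"
}

# Example:
# extract_text response.json > invoice.txt
```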

Get Better Results

Image quality drives accuracy more than anything else:

  • Render PDFs at 200 DPI.
  • Aim for about 1540 pixels on the longest side.
  • Keep the original aspect ratio so text geometry stays intact.
  • PNG and JPEG both work. Use JPEG to keep large documents smaller.
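
The points above could be applied with common command-line tools. The sketch below assumes pdftoppm (from Poppler) and ImageMagick 7 are installed; neither is required by the guide, and the function names are hypothetical.

```shell
# Render every page of a PDF to PNG at 200 DPI.
# Usage: render_pdf INPUT_PDF OUTPUT_PREFIX
# pdftoppm writes one numbered file per page, e.g. OUTPUT_PREFIX-1.png.
render_pdf() {
  pdftoppm -r 200 -png "$1" "$2"
}

# Shrink an image so its longest side is at most 1540 pixels.
# Usage: resize_image INPUT OUTPUT
# The trailing ">" only shrinks larger images, never enlarges smaller ones,
# and ImageMagick keeps the aspect ratio by default.
# On ImageMagick 6 the command is "convert" instead of "magick".
resize_image() {
  magick "$1" -resize '1540x1540>' "$2"
}

# Example:
# render_pdf invoice.pdf page
# resize_image page-1.png document.png
```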

Supported languages: English, French, German, Spanish, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, and Korean.

Clean Up

Scale to zero to stop GPU billing but keep the endpoint and key:

exo dedicated-inference deployment scale ocr 0 -z "$ZONE"

Delete the deployment when you’re done with the endpoint:

exo dedicated-inference deployment delete ocr -z "$ZONE"

Then delete the model to remove its files from Object Storage if you no longer need them. The model ID should appear in the output of the model list command from Step 1:

exo dedicated-inference model delete <model-id> -z "$ZONE"
