Run OCR
This guide shows how to deploy the LightOnOCR-2-1B model on Exoscale Dedicated Inference and use it to extract text from images such as invoices, receipts, forms, and scanned documents.
LightOnOCR-2-1B is a lightweight multilingual vision-language model purpose-built for OCR. It supports 11 languages and handles tables, multi-column layouts, and math notation.
Prerequisites
- Exoscale CLI (exo) installed and configured
- An API key with compute and AI service access
- Sufficient GPU quota in your Exoscale organization
Step 1: Create the Model
Download and register the LightOnOCR model in your zone:
exo dedicated-inference model create lightonai/LightOnOCR-2-1B -z at-vie-2

Monitor creation progress:

exo dedicated-inference model list -z at-vie-2

Wait until the status shows created before proceeding.
Step 2: Deploy the Model
Create a deployment on a single A5000 GPU. LightOnOCR-2-1B is a 1B-parameter model (roughly 1.9 GiB) and fits comfortably on a single GPU.
Important: Use inference engine version 0.15.1. Later versions of vLLM have a regression that prevents multimodal image requests from reaching the model.
exo dedicated-inference deployment create ocr \
--model-name lightonai/LightOnOCR-2-1B \
--gpu-type gpua5000 \
--gpu-count 1 \
--replicas 1 \
--inference-engine-version 0.15.1 \
  -z at-vie-2

Monitor the deployment:

exo dedicated-inference deployment show ocr -z at-vie-2

Wait until the status is ready (typically 2-3 minutes).
Step 3: Get the Endpoint URL and API Key
Read the endpoint URL from the deployment details, then reveal the API key:

exo dedicated-inference deployment show ocr -z at-vie-2
exo dedicated-inference deployment reveal-api-key ocr -z at-vie-2

Export both for use in subsequent commands:
export ENDPOINT_URL="https://<your-deployment-id>.inference.at-vie-2.exoscale-cloud.com/v1"
export API_KEY="<your-api-key>"

Step 4: Run OCR on an Image
LightOnOCR uses the OpenAI-compatible chat completions API. Pass the image as a base64-encoded data URI. The model does not require a text prompt; it automatically extracts all text from the provided image.
Prepare the Request
Encode the image and build a JSON request file:
BASE64_IMG=$(base64 -i document.png)   # macOS syntax; on Linux use: base64 -w0 document.png
cat > request.json << EOF
{
"model": "lightonai/LightOnOCR-2-1B",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "data:image/png;base64,${BASE64_IMG}"
}
}
]
}
],
"max_tokens": 4096,
"temperature": 0.2,
"top_p": 0.9
}
EOF

Note: The request body must be passed as a file (-d @request.json) rather than inline, because base64 image data can interfere with shell escaping.
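The same request body can be assembled in Python using only the standard library. This is a sketch of the payload shape shown above: build_payload and extract_text are illustrative helper names (not part of any SDK), and the dummy bytes in the demo stand in for a real image file.

```python
import base64
import json


def build_payload(image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build an OpenAI-compatible chat payload with a base64 data URI."""
    data_uri = "data:%s;base64,%s" % (
        mime,
        base64.b64encode(image_bytes).decode("ascii"),
    )
    return {
        "model": "lightonai/LightOnOCR-2-1B",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": data_uri}},
                ],
            }
        ],
        "max_tokens": 4096,
        "temperature": 0.2,
        "top_p": 0.9,
    }


def extract_text(response: dict) -> str:
    """Pull the OCR text out of a chat completions response."""
    return response["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Dummy bytes stand in for the contents of document.png.
    payload = build_payload(b"\x89PNG dummy bytes")
    print(json.dumps(payload)[:80])
```

Writing the payload to a file with json.dump sidesteps the shell-escaping issue entirely, since no base64 data ever passes through the shell.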
Send the Request
curl -X POST "$ENDPOINT_URL/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
  -d @request.json

Example Response
For an invoice image, the model returns structured text with tables formatted as HTML:
{
"choices": [
{
"message": {
"content": "INVOICE #12345\nDate: 2026-03-26\n\n<table>\n <tr>\n <th>Item</th>\n <th>Qty</th>\n <th>Price</th>\n </tr>\n <tr>\n <td>Widget A</td>\n <td>2</td>\n <td>19.99</td>\n </tr>\n</table>\n\nTotal: 272.69"
}
}
]
}

Image Recommendations
For best OCR accuracy:
- Resolution: Render PDFs at 200 DPI
- Size: Target the longest dimension at around 1540 pixels
- Aspect ratio: Maintain the original aspect ratio to preserve text geometry
- Format: PNG or JPEG both work; use JPEG for smaller payloads on large documents
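The sizing advice above can be expressed as a small helper that computes target dimensions before encoding. This is a sketch: target_size is an illustrative name, and the 1540-pixel default comes from the recommendation above.

```python
def target_size(width: int, height: int, longest: int = 1540) -> tuple:
    """Scale (width, height) so the longest side is `longest` pixels,
    preserving the aspect ratio. Images already smaller than the target
    are left unchanged, since upscaling adds no detail."""
    scale = longest / max(width, height)
    if scale >= 1.0:
        return (width, height)
    return (round(width * scale), round(height * scale))
```

For example, an A4 page rendered at 200 DPI is about 1654x2339 pixels; target_size(1654, 2339) scales it to roughly 1089x1540 while keeping the page geometry intact.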
Supported Languages
LightOnOCR-2-1B supports: English, French, German, Spanish, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, and Korean.
Clean Up
Scale to zero to stop GPU billing while preserving the endpoint:
exo dedicated-inference deployment scale ocr 0 -z at-vie-2

Or delete the deployment entirely:

exo dedicated-inference deployment delete ocr -z at-vie-2

To also remove the model from Object Storage:
exo dedicated-inference model delete <model-id> -z at-vie-2