Skip to content

CTRL K

Blog Changelog Documentation Learning Contact ↗ Portal

CTRL K

Blog
Changelog
Documentation
Learning
Contact ↗

Compute
Concrete AI
- Dedicated Inference
Storage
- Block Storage
- Object Storage
Networking
DBaaS
Security
- Key Management Service (KMS)
- IAM
- Audit Trails ↗
Support

Dedicated Inference

How-To

How-To

Open in ChatGPT
Open in Claude

Find practical guides for deploying models, scaling inference endpoints, troubleshooting issues and optimizing performance or costs.

Use CLI Commands

Manage Dedicated Inference models and deployments with the Exoscale CLI.

Import Gated Models

Import gated or private Hugging Face models with a read token.

Trusted model providers

Understand trusted model providers, remote code execution, and the process for requesting additional providers.

Monitor and Troubleshoot

Check deployment state, inspect logs, and diagnose common Dedicated Inference issues.

Optimize Deployment Costs

Control Dedicated Inference costs with right-sized GPUs, replica scaling, scale-to-zero, and cleanup.

Optimize Performance

Improve inference latency and throughput with context tuning, quantization, KV-cache optimization, and speculative decoding.

Deploy LightOnOCR on Dedicated Inference and extract text from images with an OpenAI-compatible endpoint.

Last updated on June 15, 2026

Sovereign by design, reliable by discipline.

Products

Compute
GPU
Kubernetes (SKS)
Object Storage
Databases

Documentation

Platform
Product docs
API & references
Tutorials
Changelog

Company

Pricing
Blog
Support
Status
Contact

AI & developers

Every docs page is available as plain Markdown — just add index.md to its URL.

llms.txt
Console

© 2026 Exoscale is a registered trademark of Akenes SA - Reg/VAT ID CHE-423.524.322 // Privacy // Terms & Conditions