How-To
Find practical guides for deploying models, scaling inference endpoints, troubleshooting issues, and optimizing performance and costs.
Manage Dedicated Inference models and deployments with the Exoscale CLI.
Import gated or private Hugging Face models with a read token.
Understand trusted model providers, remote code execution, and the process for requesting additional providers.
Check deployment state, inspect logs, and diagnose common Dedicated Inference issues.
Control Dedicated Inference costs with right-sized GPUs, replica scaling, scale-to-zero, and cleanup.
Improve inference latency and throughput with context tuning, quantization, KV-cache optimization, and speculative decoding.
Deploy LightOnOCR on Dedicated Inference and extract text from images with an OpenAI-compatible endpoint.