How-To
Step-by-step guides for configuring the service, using specific features, completing common tasks, and troubleshooting.
Manage Dedicated Inference models and deployments with exo CLI commands to create, scale, inspect, and delete resources.
Deploy Hugging Face gated models with Dedicated Inference by accepting licenses, using access tokens, and creating the model.
Understand trusted model providers, remote code execution, and the process for requesting additional providers.
Monitor Dedicated Inference deployments, inspect logs, and fix common issues with health checks, errors, and performance tips.
Reduce Dedicated Inference spending with right-sized GPUs, autoscaling, scale-to-zero, and model lifecycle best practices.
Scale Dedicated Inference deployments up, down, or to zero to match traffic, control costs, and preserve endpoints.
Improve inference latency and throughput with context tuning, quantization, KV-cache optimization, and speculative decoding.
Deploy LightOnOCR on Dedicated Inference and extract text from images with an OpenAI-compatible OCR API endpoint.