Dedicated Inference
Deploy your first AI model as a managed inference endpoint on Exoscale. Create a model, launch a deployment on dedicated GPUs, and start serving requests via an OpenAI-compatible API.
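Once a deployment is live, it can be called like any OpenAI-compatible endpoint. Below is a minimal sketch, using only the Python standard library, of how such a chat-completion request is assembled; the base URL, API key, and model name are placeholders — substitute the values shown for your deployment in the Exoscale portal.

```python
import json
from urllib import request

# Hypothetical values -- replace with your deployment's endpoint and key.
BASE_URL = "https://my-deployment.example.exoscale.com/v1"
API_KEY = "EXO-api-key"

def build_chat_request(model: str, prompt: str) -> request.Request:
    """Assemble an OpenAI-compatible chat completion request (not sent here)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("my-model", "Hello!")
print(req.full_url)
```

Because the request shape follows the OpenAI API, existing OpenAI SDKs can also be pointed at the deployment by overriding their base URL.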
Exoscale Dedicated Inference provides sovereign AI infrastructure and managed inference services to run, deploy, and scale AI models on dedicated GPUs without operational complexity.
Step-by-step guides for common Dedicated Inference tasks including deploying gated models, monitoring deployments, optimizing costs, and configuring advanced features.
Service boundaries: quotas, limits, and guaranteed service levels (SLA) for this product, including the key constraints you need to plan for and operate reliably.
API and CLI reference documentation with comprehensive guides for performing all operations and commands.
Concise, informational guides showing how to integrate with external tools and address specific scenarios.