Service Boundaries
The Exoscale SLA guarantees 99.95% uptime for its services, with transparent incident reporting and compensation policies.
See the Exoscale locations where Dedicated Inference is available.
Dedicated Inference operates with the following constraints:
- Safetensors model file format: Model weights must be in the `safetensors` format. GGUF and other formats are not supported.
- Customer-managed sizing: Choosing a GPU type and count depends on the model and the use case, so it is up to you to size your inference deployments.
- GPU count immutability: The `--gpu-count` parameter cannot be changed after deployment. To use a different GPU count, create a new deployment.
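Because only `safetensors` weights are accepted, it can be useful to sanity-check a file before creating a deployment. The sketch below is not Exoscale tooling; it is a minimal heuristic based on the published safetensors layout (an 8-byte little-endian length prefix followed by a JSON header of that length):

```python
import json
import struct

def looks_like_safetensors(data: bytes) -> bool:
    """Heuristically check that a byte buffer starts like a safetensors file:
    an 8-byte little-endian header length, then a JSON object of that length."""
    if len(data) < 8:
        return False
    (header_len,) = struct.unpack("<Q", data[:8])
    if header_len == 0 or 8 + header_len > len(data):
        return False
    try:
        header = json.loads(data[8:8 + header_len])
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    # The safetensors header is always a JSON object mapping tensor
    # names (and optionally "__metadata__") to their descriptions.
    return isinstance(header, dict)
```

For example, a GGUF file (which begins with the magic bytes `GGUF`) fails this check, while any well-formed safetensors file passes it.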
Note
Integration with the web portal is pending. During the preview phase, Dedicated Inference is available via the CLI and API SDKs only.