Service Boundaries

Dedicated Inference operates with the following constraints:

Safetensors Model File Format
Model weights must be in the safetensors format. GGUF and other formats are not supported.
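A safetensors file begins with an 8-byte little-endian header length followed by a JSON header, so you can sanity-check a weights file locally before uploading it. The sketch below is not part of the Dedicated Inference tooling; it is a minimal check based on the published safetensors layout.

```python
import json
import struct

def looks_like_safetensors(path):
    """Heuristic check that a file follows the safetensors layout:
    an 8-byte little-endian header length, then a JSON header."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return False
        (header_len,) = struct.unpack("<Q", prefix)
        header = f.read(header_len)
    try:
        meta = json.loads(header)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    # A valid safetensors header is a JSON object mapping tensor names
    # (plus optional "__metadata__") to dtype/shape/offset entries.
    return isinstance(meta, dict)
```

A GGUF file, for example, starts with the magic bytes `GGUF` rather than a header length, so this check rejects it.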
Customer-Managed Sizing
The right GPU type and count depend on your model and use case, so it is up to you to size your inference deployments.
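As a starting point for sizing, you can lower-bound the GPU memory a deployment needs from the parameter count and weight precision. The helper below is an illustrative back-of-the-envelope sketch, not a vendor sizing tool; the 1.2 overhead factor for activations and KV-cache headroom is an assumption.

```python
def estimate_min_vram_gb(num_params_b, bytes_per_param=2, overhead=1.2):
    """Rough lower bound on GPU memory (GB) needed to serve a model.

    num_params_b: parameter count in billions (e.g. 70 for a 70B model).
    bytes_per_param: 2 for fp16/bf16 weights, 1 for 8-bit quantized.
    overhead: headroom multiplier for activations and KV cache
    (the 1.2 default is an illustrative assumption, not a vendor figure).
    """
    return num_params_b * bytes_per_param * overhead

# A 70B model in bf16 needs roughly 168 GB for weights plus headroom,
# so it will not fit on a single 80 GB GPU.
print(estimate_min_vram_gb(70))  # → 168.0
```

Divide the estimate by the per-GPU memory of your chosen GPU type to get a candidate GPU count, then validate with real traffic before settling on a deployment size.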
GPU Count Immutability
The --gpu-count parameter cannot be changed after deployment. To use a different GPU count, create a new deployment.

Note

Web portal integration is pending. During the preview phase, Dedicated Inference is available via the CLI and API SDKs only.