# Service Boundaries

{{< auto-cards >}}

{{< cards cols="2" >}}
  {{< card link="https://www.exoscale.com/pricing/#inference"
  icon="currency-euro"
  title="Pricing ↗︎" >}}
{{< /cards >}}

Dedicated Inference operates with the following constraints:

Safetensors model file format
: Model weights must be in the `safetensors` format. GGUF and other formats are not supported.

Customer-managed sizing
: The appropriate GPU type and count depend on your model and your use case; it is up to you to size your inference deployments.

GPU count immutability
: The `--gpu-count` parameter cannot be changed after deployment. To use a different GPU count, create a new deployment.
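Because only `safetensors` weights are accepted, it can save a failed deployment to check a model directory locally before uploading. The following is a minimal client-side sketch, not part of the Exoscale tooling; the set of rejected file extensions is an illustrative assumption:

```python
from pathlib import Path

# Extensions assumed here to indicate non-safetensors weight formats
# (GGUF, raw PyTorch checkpoints); adjust to your own conventions.
UNSUPPORTED_SUFFIXES = {".gguf", ".bin", ".pt", ".pth"}

def find_unsupported_weights(model_dir: str) -> list[str]:
    """Return names of weight files in model_dir that are not safetensors."""
    return sorted(
        f.name
        for f in Path(model_dir).rglob("*")
        if f.is_file() and f.suffix in UNSUPPORTED_SUFFIXES
    )
```

If the returned list is non-empty, convert those weights to `safetensors` before deploying.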

> [!NOTE]
> Integration on the web portal is pending. During the preview phase,
> Dedicated Inference is available via CLI and API SDKs only.

