Limits and Quotas

Limits

Limits depend on the GPU type you choose for a deployment. Some GPU categories carry additional compliance requirements, including signing the GPU End-User Certificate.

Quotas

UsageQuota
quotas= GPU Quotas

Checking for GPU capacity and authorizations

On top of the above:

  • You must sign the GPU End-User Certificate before you are authorized to deploy models on RTX Pro 6000 GPUs.
  • Dedicated Inference may be capacity-constrained on certain offerings.

You can use the exo dedicated-inference deployment instance-type single command to see which GPU families you are authorized to deploy in each zone:

┼───────────────┼────────────┼──────────┼
│    FAMILY     │ AUTHORIZED │   ZONE   │
┼───────────────┼────────────┼──────────┼
│ gpu3080ti     │ true       │ at-vie-2 │
│ gpua5000      │ true       │ at-vie-2 │
│ gpurtx6000pro │ false      │ ch-dk-2  │
│ gpu3          │ true       │ de-fra-1 │
│ gpurtx6000pro │ true       │ de-fra-1 │
│ gpurtx6000pro │ true       │ hr-zag-1 │
┼───────────────┼────────────┼──────────┼
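
If you want to consume this listing in a script, the table can be parsed as plain text. The following is a minimal sketch, assuming the output keeps the three-column layout shown above; the helper names parse_instance_types and authorized_zones are hypothetical and not part of the exo CLI.

```python
# Sample output copied from the table above.
SAMPLE = """\
┼───────────────┼────────────┼──────────┼
│    FAMILY     │ AUTHORIZED │   ZONE   │
┼───────────────┼────────────┼──────────┼
│ gpu3080ti     │ true       │ at-vie-2 │
│ gpua5000      │ true       │ at-vie-2 │
│ gpurtx6000pro │ false      │ ch-dk-2  │
│ gpu3          │ true       │ de-fra-1 │
│ gpurtx6000pro │ true       │ de-fra-1 │
│ gpurtx6000pro │ true       │ hr-zag-1 │
┼───────────────┼────────────┼──────────┼
"""


def parse_instance_types(text):
    """Turn the box-drawn table into a list of row dictionaries."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("│"):
            continue  # skip border lines and anything that is not a row
        cells = [cell.strip() for cell in line.strip("│").split("│")]
        if len(cells) != 3 or cells[0] == "FAMILY":
            continue  # skip the header row and unexpected layouts
        family, authorized, zone = cells
        rows.append(
            {"family": family, "authorized": authorized == "true", "zone": zone}
        )
    return rows


def authorized_zones(rows, family):
    """Return the sorted zones where the given GPU family is authorized."""
    return sorted(
        row["zone"] for row in rows if row["family"] == family and row["authorized"]
    )


if __name__ == "__main__":
    rows = parse_instance_types(SAMPLE)
    print(authorized_zones(rows, "gpurtx6000pro"))
```

With the sample above, gpurtx6000pro is reported as authorized in de-fra-1 and hr-zag-1 but not in ch-dk-2, where the AUTHORIZED column is false.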

Additional Constraints

Safetensors Model File Format
Model weights must be in the safetensors format. GGUF and other formats are not supported.
Customer-managed sizing
The right GPU type and count depend on the model and the use case. As such, it is up to you to size your inference deployments.
GPU Count Immutability
The --gpu-count parameter cannot be changed after deployment. To use a different GPU count, create a new deployment.
Additional Runtime Dependencies
Models that require additional Python packages, custom decoding logic, custom logits processors, or other runtime dependencies beyond the standard inference runtime are not currently supported. As a result, some models may remain unsupported even if their provider is approved for trust-remote-code. See Trusted Model Providers for details on remote code execution and the provider review process.
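
For the safetensors and sizing constraints above, a local pre-flight check can help before you deploy. The sketch below relies only on the public safetensors file layout (an 8-byte little-endian header length followed by a JSON header that lists each tensor's data_offsets). The function names are hypothetical, and passing this check does not guarantee that the platform accepts the model.

```python
import json
import os
import struct


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file, or raise ValueError."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            raise ValueError("file is too short to be safetensors")
        (header_len,) = struct.unpack("<Q", prefix)
        # Reject implausible lengths (e.g. a GGUF file) before reading.
        if header_len > size - 8:
            raise ValueError("header length exceeds file size")
        raw = f.read(header_len)
    try:
        header = json.loads(raw)
    except ValueError as exc:  # covers JSON and Unicode decoding errors
        raise ValueError("header is not valid JSON") from exc
    if not isinstance(header, dict):
        raise ValueError("header is not a JSON object")
    return header


def weight_bytes(header):
    """Sum the byte size of all tensors listed in the header."""
    total = 0
    for name, info in header.items():
        if name == "__metadata__":
            continue  # optional free-form metadata, not a tensor
        begin, end = info["data_offsets"]
        total += end - begin
    return total
```

Summing weight_bytes over all shards of a model gives the memory needed for the weights alone. Treat it as a lower bound when choosing a GPU type and count, since the inference runtime also needs memory for the KV cache and activations.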

Availability

GPU availability varies by zone and GPU type. See GPU availability by zone for the current GPU-by-zone matrix.

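| Zone     | Country     | City      | Availability |
|----------|-------------|-----------|--------------|
| at-vie-1 | Austria     | Vienna    |              |
| at-vie-2 | Austria     | Vienna    |              |
| bg-sof-1 | Bulgaria    | Sofia     |              |
| ch-dk-2  | Switzerland | Zurich    |              |
| ch-gva-2 | Switzerland | Geneva    |              |
| de-fra-1 | Germany     | Frankfurt |              |
| de-muc-1 | Germany     | Munich    |              |
| hr-zag-1 | Croatia     | Zagreb    |              |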