Limits and Quotas
Limits
Limits depend on the GPU type you use for your deployment. Some GPU categories are subject to additional compliance requirements, including the GPU End-User Certificate.
Quotas
| Usage | Quota |
|---|---|
| Quotas | Equal to your GPU quotas |
Checking for GPU capacity and authorizations
On top of the above:
- You need to sign the GPU End-User Certificate to be authorized to deploy models on RTX Pro 6000 GPUs
- Dedicated Inference may be capacity-constrained on certain offerings
To see which GPU families you are authorized to deploy in each zone, use the exo dedicated-inference deployment instance-type single command:
┼───────────────┼────────────┼──────────┼
│ FAMILY │ AUTHORIZED │ ZONE │
┼───────────────┼────────────┼──────────┼
│ gpu3080ti │ true │ at-vie-2 │
│ gpua5000 │ true │ at-vie-2 │
│ gpurtx6000pro │ false │ ch-dk-2 │
│ gpu3 │ true │ de-fra-1 │
│ gpurtx6000pro │ true │ de-fra-1 │
│ gpurtx6000pro │ true │ hr-zag-1 │
┼───────────────┼────────────┼──────────┼
Additional Constraints
- Safetensors Model File Format
  - Model weights must be in the safetensors format. GGUF and other formats are not supported.
- Customer-managed sizing
  - Picking a GPU type and count is model-dependent and use-case dependent. As such, it is up to you to size your inference deployments.
- GPU Count Immutability
  - The --gpu-count parameter cannot be changed after deployment. To use a different GPU count, create a new deployment.
- Additional Runtime Dependencies
  - Models requiring additional Python packages, custom decoding logic, custom logits processors, or other runtime dependencies beyond the standard inference runtime are not currently supported. As a result, some models may remain unsupported even if their provider is approved for trust-remote-code. Check Trusted Model Providers for information about trusted model providers, remote code execution, and the provider review process.
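Since sizing is left to you, a back-of-envelope estimate can serve as a starting point. The sketch below is not an official sizing method: it only computes the memory taken by the model weights (parameter count times bytes per parameter) and applies an assumed overhead factor for activations and the KV cache. The function name, the default overhead of 1.2, and the example GPU memory sizes are all hypothetical; real requirements depend on context length, batch size, and the inference runtime, so validate against an actual deployment.

```python
import math

GIB = 2**30  # bytes in one GiB


def estimate_min_gpu_count(
    num_params: float,
    bytes_per_param: float,
    gpu_memory_gib: float,
    overhead: float = 1.2,  # assumed headroom factor, not an official figure
) -> int:
    """Rough lower bound on the number of GPUs needed to hold a model.

    Weights memory = num_params * bytes_per_param. The overhead factor is an
    assumption standing in for activations and KV cache, which vary widely.
    """
    weights_bytes = num_params * bytes_per_param
    needed_gib = weights_bytes * overhead / GIB
    return max(1, math.ceil(needed_gib / gpu_memory_gib))


# Hypothetical examples: the GPU memory sizes are illustrative inputs only.
small = estimate_min_gpu_count(7e9, 2, 24)    # 7B parameters, 16-bit weights
large = estimate_min_gpu_count(70e9, 2, 48)   # 70B parameters, 16-bit weights
print(small, large)
```

Because the GPU count cannot be changed after deployment, it is worth running an estimate like this before creating the deployment and leaving some margin.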
Availability
GPU availability varies by zone and GPU type. See GPU availability by zone for the current GPU-by-zone matrix.
| Zone | Country | City | Availability |
|---|---|---|---|
| at-vie-1 | Austria | Vienna | |
| at-vie-2 | Austria | Vienna | |
| bg-sof-1 | Bulgaria | Sofia | |
| ch-dk-2 | Switzerland | Zurich | |
| ch-gva-2 | Switzerland | Geneva | |
| de-fra-1 | Germany | Frankfurt | |
| de-muc-1 | Germany | Munich | |
| hr-zag-1 | Croatia | Zagreb | |
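To check programmatically which GPU families you are authorized to deploy in a given zone, the table printed by the instance-type command (shown under "Checking for GPU capacity and authorizations") can be filtered with a small script. The following is a minimal sketch that assumes the output keeps the three-column FAMILY / AUTHORIZED / ZONE layout from the sample on this page; the function names are hypothetical, and the sample text is copied from this page rather than fetched from the CLI.

```python
SAMPLE_OUTPUT = """\
┼───────────────┼────────────┼──────────┼
│ FAMILY │ AUTHORIZED │ ZONE │
┼───────────────┼────────────┼──────────┼
│ gpu3080ti │ true │ at-vie-2 │
│ gpua5000 │ true │ at-vie-2 │
│ gpurtx6000pro │ false │ ch-dk-2 │
│ gpu3 │ true │ de-fra-1 │
│ gpurtx6000pro │ true │ de-fra-1 │
│ gpurtx6000pro │ true │ hr-zag-1 │
┼───────────────┼────────────┼──────────┼
"""


def parse_instance_types(output: str) -> list[dict]:
    """Parse the box-drawn table into a list of row dictionaries."""
    rows = []
    for line in output.splitlines():
        if "│" not in line:
            continue  # border lines contain no column separators
        cells = [c.strip() for c in line.split("│") if c.strip()]
        if len(cells) != 3 or cells[0] == "FAMILY":
            continue  # skip the header row and anything unexpected
        family, authorized, zone = cells
        rows.append(
            {"family": family, "authorized": authorized == "true", "zone": zone}
        )
    return rows


def authorized_families(rows: list[dict], zone: str) -> list[str]:
    """Return the sorted GPU families marked authorized in the given zone."""
    return sorted(r["family"] for r in rows if r["zone"] == zone and r["authorized"])


rows = parse_instance_types(SAMPLE_OUTPUT)
print(authorized_families(rows, "de-fra-1"))
print(authorized_families(rows, "ch-dk-2"))
```

Note that an authorized family may still be capacity-constrained in a zone, so a result from this filter does not guarantee that a deployment will succeed.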