# Limits and Quotas

## Limits

Limits depend on the GPU type you choose for a deployment. Some GPU categories carry additional compliance requirements, including signing the [GPU End-User Certificate](https://portal.exoscale.com/organization/legal/terms).




## Quotas

| Usage                                                  | Quota                                  |
|--------------------------------------------------------|----------------------------------------|
| GPUs used by Dedicated Inference deployments           | Same as your organization's GPU quotas |

### Checking for GPU capacity and authorizations

In addition to the quotas above:
- You must sign the GPU [End User Certificate](https://www.exoscale.com/end-user-certificate-gpu/) before you can deploy models on RTX Pro 6000 GPUs.
- Dedicated Inference may be capacity-constrained on certain offerings.

Run the `exo dedicated-inference deployment instance-type` command to see which GPU families you are authorized to deploy in each zone:

```
┼───────────────┼────────────┼──────────┼
│    FAMILY     │ AUTHORIZED │   ZONE   │
┼───────────────┼────────────┼──────────┼
│ gpu3080ti     │ true       │ at-vie-2 │
│ gpua5000      │ true       │ at-vie-2 │
│ gpurtx6000pro │ false      │ ch-dk-2  │
│ gpu3          │ true       │ de-fra-1 │
│ gpurtx6000pro │ true       │ de-fra-1 │
│ gpurtx6000pro │ true       │ hr-zag-1 │
┼───────────────┼────────────┼──────────┼
```
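
If you want to act on this output in a script, the table can be parsed with a few lines of Python. The sketch below is illustrative only: it assumes the output keeps the box-drawing layout shown above, and the helper names `parse_instance_types` and `authorized_zones` are hypothetical, not part of the `exo` CLI.

```python
# Sample output copied from the table above; in practice you would capture
# the command's stdout instead (assumption: the layout stays the same).
SAMPLE_OUTPUT = """
┼───────────────┼────────────┼──────────┼
│    FAMILY     │ AUTHORIZED │   ZONE   │
┼───────────────┼────────────┼──────────┼
│ gpu3080ti     │ true       │ at-vie-2 │
│ gpua5000      │ true       │ at-vie-2 │
│ gpurtx6000pro │ false      │ ch-dk-2  │
│ gpu3          │ true       │ de-fra-1 │
│ gpurtx6000pro │ true       │ de-fra-1 │
│ gpurtx6000pro │ true       │ hr-zag-1 │
┼───────────────┼────────────┼──────────┼
"""


def parse_instance_types(output: str) -> list[dict]:
    """Turn the box-drawn table into a list of row dictionaries."""
    rows = []
    header = None
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("│"):
            continue  # skip border lines and blank lines
        cells = [cell.strip() for cell in line.strip("│").split("│")]
        if header is None:
            # The first data-like line holds the column names.
            header = [cell.lower() for cell in cells]
            continue
        row = dict(zip(header, cells))
        row["authorized"] = row["authorized"] == "true"
        rows.append(row)
    return rows


def authorized_zones(rows: list[dict], family: str) -> list[str]:
    """Zones where the given GPU family is marked as authorized."""
    return sorted(
        row["zone"]
        for row in rows
        if row["family"] == family and row["authorized"]
    )


rows = parse_instance_types(SAMPLE_OUTPUT)
print(authorized_zones(rows, "gpurtx6000pro"))
```

With the sample above, `gpurtx6000pro` is reported for `de-fra-1` and `hr-zag-1` but not for `ch-dk-2`, where `AUTHORIZED` is `false`.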

## Additional Constraints

Safetensors Model File Format
: Model weights must be in the `safetensors` format. GGUF and other formats are not supported.

Customer-managed sizing
: The right GPU type and count depend on your model and your use case, so you are responsible for sizing your inference deployments.

GPU Count Immutability
: The `--gpu-count` parameter cannot be changed after deployment. To use a different GPU count, create a new deployment.

Additional Runtime Dependencies
: Models requiring additional Python packages, custom decoding logic, custom logits processors, or other runtime dependencies beyond the standard inference runtime are not currently supported. As a result, some models may remain unsupported even if their provider is approved for `trust-remote-code`. Check [Trusted Model Providers](/product/concrete-ai/dedicated-inference/how-to/trusted-models) for information about trusted model providers, remote code execution, and the provider review process.
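
As a starting point for the sizing mentioned under "Customer-managed sizing", a common rule of thumb is that the weights alone need roughly the parameter count multiplied by the bytes per parameter. The sketch below applies that rule. Everything in it is an assumption for illustration: the function names are hypothetical, the 20% headroom for runtime overhead is a placeholder, and the VRAM figures in the usage lines are example values, not a statement about any specific offering. Real requirements also depend on context length, batch size, and the GPU counts your model architecture supports.

```python
import math


def weights_gib(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory for the model weights alone, in GiB.

    bytes_per_param=2.0 corresponds to 16-bit weights (assumption).
    """
    return params_billion * 1e9 * bytes_per_param / 2**30


def min_gpu_count(
    params_billion: float,
    vram_gib_per_gpu: float,
    bytes_per_param: float = 2.0,
    headroom: float = 0.2,
) -> int:
    """Smallest GPU count whose combined VRAM fits the weights plus headroom.

    headroom is a placeholder fraction for cache and runtime overhead;
    tune it for your own workload.
    """
    needed = weights_gib(params_billion, bytes_per_param) * (1 + headroom)
    return math.ceil(needed / vram_gib_per_gpu)


# Example values only: an 8B and a 70B model on GPUs with 24 or 96 GiB of VRAM.
print(round(weights_gib(8), 2))
print(min_gpu_count(8, vram_gib_per_gpu=24))
print(min_gpu_count(70, vram_gib_per_gpu=96))
```

Because the GPU count cannot be changed after deployment, it is worth running an estimate like this before creating the deployment and treating the result as a lower bound.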


## Availability
GPU availability varies by zone and GPU type. See [GPU availability by zone](https://www.exoscale.com/gpu/#comparison-gpu) for the current GPU-by-zone matrix.

| Zone                                     | Country      | City      | Availability         |
| :---                                     | :---         | :---      | :---:                |
| {{< icon flag-at-4x3 >}} __`at-vie-1`__  | Austria      | Vienna    | {{< icon "ban" >}}   |
| {{< icon flag-at-4x3 >}} __`at-vie-2`__  | Austria      | Vienna    | {{< icon "check" >}} |
| {{< icon flag-bg-4x3 >}} __`bg-sof-1`__  | Bulgaria     | Sofia     | {{< icon "ban" >}}   |
| {{< icon flag-ch-4x3 >}} __`ch-dk-2`__   | Switzerland  | Zurich    | {{< icon "check" >}} |
| {{< icon flag-ch-4x3 >}} __`ch-gva-2`__  | Switzerland  | Geneva    | {{< icon "ban" >}}   |
| {{< icon flag-de-4x3 >}} __`de-fra-1`__  | Germany      | Frankfurt | {{< icon "check" >}} |
| {{< icon flag-de-4x3 >}} __`de-muc-1`__  | Germany      | Munich    | {{< icon "ban" >}}   |
| {{< icon flag-hr-4x3 >}} __`hr-zag-1`__  | Croatia      | Zagreb    | {{< icon "check" >}} |






