SLA Dedicated Inference
This product-specific SLA defines the Service Availability Target for Exoscale Dedicated Inference. It applies to each Client using the Service. Capitalized terms used but not defined herein have the meanings set forth in the Exoscale Terms and Conditions or the EUSA, whichever applies.
SLA Definitions
| Metric | Target | Definition |
|---|---|---|
| Dedicated Inference Endpoint Availability | >= 99.95% Monthly Uptime Percentage | The percentage of eligible requests to a Dedicated Inference endpoint that are served without an Exoscale-attributable server-side failure, calculated per Client and per Dedicated Inference endpoint during a calendar month. Availability is calculated as (total eligible requests - failed eligible requests) / total eligible requests * 100. Failed eligible requests are requests to the Client’s Dedicated Inference endpoint that return an HTTP 5xx response attributable to the Dedicated Inference service. |
Only eligible requests are included in the Monthly Uptime Percentage. The applicable exclusions are listed in the Exclusions section below.
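As an illustration only, the availability formula from the table above can be sketched in Python. The function name and the handling of a month with no eligible requests are assumptions made for this example; they are not defined by this SLA.

```python
def monthly_uptime_percentage(total_eligible: int, failed_eligible: int) -> float:
    """Apply the formula from the SLA table:
    (total eligible requests - failed eligible requests) / total eligible requests * 100.
    """
    if total_eligible < 0 or failed_eligible < 0:
        raise ValueError("request counts must be non-negative")
    if failed_eligible > total_eligible:
        raise ValueError("failed requests cannot exceed total eligible requests")
    if total_eligible == 0:
        # Assumption: the SLA does not say how a month without eligible
        # requests is treated; this sketch reports it as fully available.
        return 100.0
    return (total_eligible - failed_eligible) / total_eligible * 100


# Example: 1,000,000 eligible requests, 400 of which failed with an
# attributable HTTP 5xx response, for one Client and one endpoint in one month.
uptime = monthly_uptime_percentage(1_000_000, 400)
print(f"{uptime:.2f}%")  # 99.96%
```

The counts are taken per Client and per Dedicated Inference endpoint for a calendar month, as stated in the definition.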
Product Specifications
For detailed product features, technical specifications, and tutorials, please refer to the Exoscale documentation.
Client Responsibilities
While Exoscale is responsible for operating and maintaining the Dedicated Inference endpoint, Clients remain responsible for the following:
- Selection, deployment, and configuration of models and inference workloads.
- Management of input and output data, including data classification, retention, and deletion.
- Access control to inference endpoints, including API keys, credentials, and network policies.
- Validation of inference results and model behavior.
- Backup and lifecycle management of models and related artifacts.
- Compliance of datasets, prompts, outputs, and workloads with applicable laws and regulations.
- Cost management, usage monitoring, and scaling configuration.
The Service Level Agreement applies solely to Dedicated Inference Endpoint Availability and does not cover model performance, inference accuracy, or Client workload behavior. For a complete explanation of the shared responsibility framework between Exoscale and Clients, please see the Exoscale Shared Responsibility Model documentation.
Exclusions
No compensation shall be granted to the Client if the failure to comply with the Dedicated Inference Service Level Objective is due to any of the following reasons:
- Client-side configurations, including model deployment parameters, scaling settings, network rules, access policies, authentication settings, authorization settings, or quota limitations.
- Model behavior, inference quality, output correctness, inference latency, throughput, or data loss.
- Invalid, malformed, unauthorized, forbidden, quota-related, or rate-limited requests.
- Requests that are abusive, artificially generated to affect SLA measurement, part of denial-of-service activity, or above documented usage limits.

For the complete list of SLA exclusions, please refer to the General Exoscale Terms and Conditions.
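The request-level exclusions above can be approximated when reviewing access logs. The following is a rough sketch, not Exoscale's measurement method: the mapping of exclusions to HTTP status codes (400, 401, 403, 429) is an assumption, and a status code alone cannot establish whether a 5xx response is attributable to the Dedicated Inference service or whether a request was abusive.

```python
from collections import Counter

# Assumption: status codes a client might treat as excluded requests
# (invalid or malformed, unauthorized, forbidden, quota-related or rate-limited).
ASSUMED_EXCLUDED_STATUSES = {400, 401, 403, 429}


def classify_request(status: int) -> str:
    """Label one response as 'excluded', 'failed', or 'served'."""
    if status in ASSUMED_EXCLUDED_STATUSES:
        return "excluded"
    if 500 <= status <= 599:
        # Counts as failed only if attributable to the service; attribution
        # cannot be decided from the status code and needs separate review.
        return "failed"
    return "served"


def summarize(statuses: list[int]) -> Counter:
    """Count responses per label for a batch of logged status codes."""
    return Counter(classify_request(s) for s in statuses)


counts = summarize([200, 200, 429, 503, 401, 200])
eligible = counts["served"] + counts["failed"]
print(counts["failed"], "failed of", eligible, "eligible")  # 1 failed of 4 eligible
```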
Retribution
Service Credits apply only to Dedicated Inference Endpoint Availability as defined above and do not apply to model performance, inference latency, throughput, or Client workload behavior. The standard retribution and SLA credit schemes apply to Dedicated Inference as described in the General Exoscale Terms and Conditions or the EUSA, whichever applies. Service Credits are the sole remedy if the SLA is not met. If the Monthly Uptime Percentage for a Dedicated Inference endpoint falls below the committed Service Level Objective, the Client is eligible for Service Credits as follows:
| Monthly Uptime Percentage | Service Credit Percentage of Monthly Service Fees for Affected Resources |
|---|---|
| below 99.95% down to 98.3% | 50% |
| below 98.3% | 100% |
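The credit tiers in the table can be expressed as a simple lookup. This is a sketch under stated assumptions: the function name is hypothetical, and the boundary handling (exactly 99.95% meets the target and earns no credit, exactly 98.3% falls in the 50% tier) is one reading of the table rather than a confirmed rule.

```python
def service_credit_percentage(monthly_uptime: float) -> int:
    """Return the Service Credit percentage for a Monthly Uptime Percentage."""
    if not 0.0 <= monthly_uptime <= 100.0:
        raise ValueError("uptime must be between 0 and 100")
    if monthly_uptime >= 99.95:
        return 0    # target met, no credit
    if monthly_uptime >= 98.3:
        return 50   # below target, down to 98.3% (assumed inclusive)
    return 100      # below 98.3%


for uptime in (99.97, 99.5, 98.3, 97.0):
    print(uptime, "->", service_credit_percentage(uptime), "%")
```

The percentage applies to the Monthly Service Fees for the affected resources, per the table header.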
Claim Service Credit
To claim a Service Credit, open a support ticket in the Exoscale Portal within 10 business days of the incident.