SLA Dedicated Inference

This product-specific SLA defines the Service Availability Target for Exoscale Dedicated Inference. It applies to each Client using the Service. Capitalized terms used but not defined herein shall have the meanings set forth in the Exoscale Terms and Conditions or the EUSA, whichever applies.

SLA Definitions

Metric	Target	Definition
Dedicated Inference Endpoint Availability	>= 99.95% Monthly Uptime Percentage	The percentage of eligible requests to a Dedicated Inference endpoint that are served without an Exoscale-attributable server-side failure, calculated per Client and per Dedicated Inference endpoint during a calendar month. Availability is calculated as `(total eligible requests - failed eligible requests) / total eligible requests * 100`. Failed eligible requests are requests to the Client’s Dedicated Inference endpoint that return an HTTP `5xx` response attributable to the Dedicated Inference service.

Only eligible requests are included in the Monthly Uptime Percentage. The applicable exclusions are listed in the Exclusions section below.
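The availability formula above can be sketched in a few lines of Python. This is an illustrative sketch only, not Exoscale's measurement implementation: the function name `monthly_uptime_percentage` and its parameters are hypothetical, and returning `None` for a month with no eligible requests is an assumption, since the SLA does not state how that case is handled.

```python
from typing import Optional


def monthly_uptime_percentage(total_eligible: int, failed_eligible: int) -> Optional[float]:
    """Apply the SLA formula for one Client and one endpoint over a calendar month.

    total_eligible:  eligible requests sent to the endpoint during the month
    failed_eligible: eligible requests that returned an HTTP 5xx response
                     attributable to the Dedicated Inference service
    """
    if total_eligible < 0 or failed_eligible < 0 or failed_eligible > total_eligible:
        raise ValueError("counts must satisfy 0 <= failed_eligible <= total_eligible")
    if total_eligible == 0:
        # Assumption: the SLA does not define availability when there are
        # no eligible requests, so this sketch reports it as undefined.
        return None
    return (total_eligible - failed_eligible) / total_eligible * 100


# Example: 1,000,000 eligible requests, 400 of which failed with a 5xx.
uptime = monthly_uptime_percentage(1_000_000, 400)
print(f"{uptime:.2f}%")  # prints "99.96%"
```

In this example the result is above the 99.95% target, so the month would meet the SLA.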

Product Specifications

For detailed product features, technical specifications, and tutorials, please refer to the Exoscale documentation.

Client Responsibilities

While Exoscale is responsible for operating and maintaining the Dedicated Inference endpoint, Clients remain responsible for the following:

Selection, deployment, and configuration of models and inference workloads.
Management of input and output data, including data classification, retention, and deletion.
Access control to inference endpoints, including API keys, credentials, and network policies.
Validation of inference results and model behavior.
Backup and lifecycle management of models and related artifacts.
Compliance of datasets, prompts, outputs, and workloads with applicable laws and regulations.
Cost management, usage monitoring, and scaling configuration.

The Service Level Agreement applies solely to Dedicated Inference Endpoint Availability and does not cover model performance, inference accuracy, or Client workload behavior. For a complete explanation of the shared responsibility framework between Exoscale and Clients, please see the Exoscale Shared Responsibility Model documentation.

Exclusions

No compensation shall be granted to the Client if the failure to comply with the Dedicated Inference Service Level Objective is due to any of the following reasons:

Client-side configurations, including model deployment parameters, scaling settings, network rules, access policies, authentication settings, authorization settings, or quota limitations.
Model behavior, inference quality, output correctness, inference latency, throughput, or data loss.
Invalid, malformed, unauthorized, forbidden, quota-related, or rate-limited requests.
Requests that are abusive, artificially generated to affect SLA measurement, part of denial-of-service activity, or above documented usage limits.

For the complete list of SLA exclusions, please refer to the General Exoscale Terms and Conditions.

Retribution

Service credits apply only to Dedicated Inference Endpoint Availability as defined above and do not apply to model performance, inference latency, throughput, or Client workload behavior. The standard retribution and SLA credit schemes described in the General Exoscale Terms and Conditions or the EUSA, whichever applies, apply to Dedicated Inference. Service Credits shall be the sole remedy if the SLA is not met. If the Monthly Uptime Percentage for a Dedicated Inference endpoint falls below the committed Service Level Objective, the Client will be eligible for service credits as follows:

Monthly Uptime Percentage	Service Credit Percentage of Monthly Service Fees for Affected Resources
below 99.95% down to 98.3%	50%
below 98.3%	100%
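The credit tiers in the table above can be expressed as a small lookup. This is a hedged sketch, not an official calculator: the function name `service_credit_percentage` is hypothetical, and the boundary handling is an assumption drawn from the table and the `>= 99.95%` target (exactly 99.95% meets the target and earns no credit, and exactly 98.3% falls in the 50% tier).

```python
def service_credit_percentage(monthly_uptime: float) -> int:
    """Return the service credit, as a percentage of the Monthly Service Fees
    for the affected resources, for a given Monthly Uptime Percentage."""
    if not 0.0 <= monthly_uptime <= 100.0:
        raise ValueError("monthly_uptime must be between 0 and 100")
    if monthly_uptime >= 99.95:
        # Target met: no credit is due.
        return 0
    if monthly_uptime >= 98.3:
        # Assumption: 98.3% itself belongs to the 50% tier,
        # because the 100% tier is described as "below 98.3%".
        return 50
    return 100


for uptime in (99.96, 99.0, 98.3, 97.5):
    print(uptime, service_credit_percentage(uptime))
```

Actual credits are granted only through the claim process described below and remain subject to the exclusions in this SLA.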

Claim Service Credit

To request a service credit, refer to the Service Unavailability Credit section of the Exoscale Terms and Conditions or the EUSA, whichever applies.

Last updated on July 2, 2026