Dedicated Inference
Dedicated Inference lets you run Large Language Models (LLMs) on Exoscale GPU infrastructure.
scale-deployment
[BETA] Scale Deployment
POST /ai/deployment/{id}/scale
Path parameters
- id (in: path, required)
Request body
application/json
- replicas (integer, required): Number of replicas (>= 0)
Responses
200 (application/json):
- id (string): Operation ID
- reason (string): Operation failure reason
- reference (object): Related resource reference - schema details
- message (string): Operation message
- state (string): Operation status
SDK reference for scale-deployment: golang | Python | Java
CLI: exo api scale-deployment
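Below is a minimal sketch of this call using Python's requests library. The zone endpoint, the deployment ID, and the Authorization header are placeholders; real Exoscale API v2 requests must carry a signed (EXO2-HMAC-SHA256) Authorization header, which is elided here.

```python
import requests

API = "https://api-ch-gva-2.exoscale.com/v2"       # assumed zone endpoint
AUTH = {"Authorization": "EXO2-HMAC-SHA256 ..."}   # placeholder: requests must be signed
deployment_id = "7c5b7a2d-..."                     # hypothetical deployment ID

# Scale the deployment to 3 replicas (any integer >= 0 is accepted).
resp = requests.post(
    f"{API}/ai/deployment/{deployment_id}/scale",
    json={"replicas": 3},
    headers=AUTH,
)
op = resp.json()
print(op["id"], op["state"])  # operation ID and status, per the response schema above
```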
create-deployment
[BETA] Create Deployment
POST /ai/deployment
Deploy a model on an inference server.
Request body
application/json
- model (object) - schema details
- name (string): Deployment name
- gpu-type (string, required): GPU type family (e.g., gpua5000, gpu3080ti)
- gpu-count (integer, required): Number of GPUs (1-8)
- replicas (integer, required): Number of replicas (>= 1)
- inference-engine-parameters (array[string]): Optional extra inference engine server CLI args
Responses
200 (application/json):
- id (string): Operation ID
- reason (string): Operation failure reason
- reference (object): Related resource reference - schema details
- message (string): Operation message
- state (string): Operation status
400 (application/json):
- error (string): Error description
SDK reference for create-deployment: golang | Python | Java
CLI: exo api create-deployment
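A hedged sketch of a create request, under the same placeholder endpoint and auth assumptions as above. The shape of the model reference object and the extra engine argument are assumptions; consult the schema details link and get-inference-engine-help for the authoritative lists.

```python
import requests

API = "https://api-ch-gva-2.exoscale.com/v2"       # assumed zone endpoint
AUTH = {"Authorization": "EXO2-HMAC-SHA256 ..."}   # placeholder: requests must be signed

payload = {
    "model": {"id": "a1b2c3d4-..."},   # assumed shape of the model reference (see schema details)
    "name": "gpt-oss-demo",            # hypothetical deployment name
    "gpu-type": "gpua5000",            # GPU family, from the examples above
    "gpu-count": 2,                    # 1-8
    "replicas": 1,                     # must be >= 1 at creation time
    # hypothetical extra arg; see get-inference-engine-help for the allowed parameters
    "inference-engine-parameters": ["--max-model-len=8192"],
}
resp = requests.post(f"{API}/ai/deployment", json=payload, headers=AUTH)
resp.raise_for_status()    # a 400 carries an "error" description instead of an operation
print(resp.json()["id"])   # operation ID
```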
get-deployment
[BETA] Get Deployment
GET /ai/deployment/{id}
Path parameters
- id (in: path, required)
Responses
200 (application/json):
- gpu-count (integer): Number of GPUs
- updated-at (string): Update time
- deployment-url (string): Deployment URL (nullable)
- service-level (string): Service level
- name (string): Deployment name
- status-details (string): Deployment status details
- gpu-type (string): GPU type family
- status (string): Deployment status
- id (string): Deployment ID
- replicas (integer): Number of replicas (>= 0)
- created-at (string): Creation time
- inference-engine-parameters (array[string]): Optional extra inference engine server CLI args
- model (object) - schema details
404 (application/json):
- error (string): Error description
SDK reference for get-deployment: golang | Python | Java
CLI: exo api get-deployment
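Fetching a deployment and reading a few of the documented fields, under the same placeholder endpoint and auth assumptions:

```python
import requests

API = "https://api-ch-gva-2.exoscale.com/v2"       # assumed zone endpoint
AUTH = {"Authorization": "EXO2-HMAC-SHA256 ..."}   # placeholder: requests must be signed

resp = requests.get(f"{API}/ai/deployment/7c5b7a2d-...", headers=AUTH)  # hypothetical ID
if resp.status_code == 404:
    raise SystemExit(resp.json()["error"])
d = resp.json()
# deployment-url is nullable: it may be absent until the deployment is reachable
print(d["name"], d["status"], d.get("deployment-url"))
```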
delete-deployment
[BETA] Delete Deployment
DELETE /ai/deployment/{id}
Path parameters
- id (in: path, required)
Responses
200 (application/json):
- id (string): Operation ID
- reason (string): Operation failure reason
- reference (object): Related resource reference - schema details
- message (string): Operation message
- state (string): Operation status
SDK reference for delete-deployment: golang | Python | Java
CLI: exo api delete-deployment
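Deletion returns the usual operation object; a minimal sketch with the same placeholders as above:

```python
import requests

API = "https://api-ch-gva-2.exoscale.com/v2"       # assumed zone endpoint
AUTH = {"Authorization": "EXO2-HMAC-SHA256 ..."}   # placeholder: requests must be signed

resp = requests.delete(f"{API}/ai/deployment/7c5b7a2d-...", headers=AUTH)  # hypothetical ID
op = resp.json()
print(op["state"], op.get("reason"))  # reason is only meaningful when the operation fails
```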
create-model
[BETA] Create Model
POST /ai/model
Model files are downloaded from Hugging Face. The name must be the exact name of the model on Hugging Face (e.g., openai/gpt-oss-120b or ggml-org/gpt-oss-120b-GGUF). If the model is gated by a license, you must provide a Hugging Face access token for an account that has accepted the license agreement.
Request body
application/json
- huggingface-token (string): Hugging Face access token
- name (string): Model name
Responses
200 (application/json):
- id (string): Operation ID
- reason (string): Operation failure reason
- reference (object): Related resource reference - schema details
- message (string): Operation message
- state (string): Operation status
SDK reference for create-model: golang | Python | Java
CLI: exo api create-model
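A sketch of registering a model, with a hypothetical Hugging Face token; endpoint and auth remain placeholders, as above:

```python
import requests

API = "https://api-ch-gva-2.exoscale.com/v2"       # assumed zone endpoint
AUTH = {"Authorization": "EXO2-HMAC-SHA256 ..."}   # placeholder: requests must be signed

payload = {
    "name": "ggml-org/gpt-oss-120b-GGUF",  # exact Hugging Face model name, as required above
    "huggingface-token": "hf_...",         # only needed for license-gated models
}
resp = requests.post(f"{API}/ai/model", json=payload, headers=AUTH)
print(resp.json()["id"])  # operation ID; track the model itself via get-model
```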
get-model
[BETA] Get Model
GET /ai/model/{id}
Path parameters
- id (in: path, required)
Responses
200 (application/json):
- id (string): Model ID
- name (string): Model name
- status (string): Model status
- model-size (integer): Model size (nullable)
- created-at (string): Creation time
- updated-at (string): Update time
404 (application/json):
- error (string): Error description
SDK reference for get-model: golang | Python | Java
CLI: exo api get-model
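Since the response is a plain snapshot, a common pattern is to poll it while the model files download. The "downloading" status value below is an assumption; the schema only types status as a string.

```python
import time
import requests

API = "https://api-ch-gva-2.exoscale.com/v2"       # assumed zone endpoint
AUTH = {"Authorization": "EXO2-HMAC-SHA256 ..."}   # placeholder: requests must be signed

while True:
    m = requests.get(f"{API}/ai/model/a1b2c3d4-...", headers=AUTH).json()  # hypothetical ID
    print(m["status"], m.get("model-size"))  # model-size is nullable while downloading
    if m["status"] != "downloading":          # assumed status value
        break
    time.sleep(30)
```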
delete-model
[BETA] Delete Model
DELETE /ai/model/{id}
Path parameters
- id (in: path, required)
Responses
200 (application/json):
- id (string): Operation ID
- reason (string): Operation failure reason
- reference (object): Related resource reference - schema details
- message (string): Operation message
- state (string): Operation status
412 (application/json):
- deployments (array[string]): Deployments still using the model
SDK reference for delete-model: golang | Python | Java
CLI: exo api delete-model
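A sketch that also handles the 412 case, with the same placeholders as above:

```python
import requests

API = "https://api-ch-gva-2.exoscale.com/v2"       # assumed zone endpoint
AUTH = {"Authorization": "EXO2-HMAC-SHA256 ..."}   # placeholder: requests must be signed

resp = requests.delete(f"{API}/ai/model/a1b2c3d4-...", headers=AUTH)  # hypothetical ID
if resp.status_code == 412:
    # the model is still referenced; delete these deployments first
    print("in use by:", resp.json()["deployments"])
else:
    print(resp.json()["state"])
```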
Other Operations
reveal-deployment-api-key
[BETA] Reveal Deployment API Key
GET /ai/deployment/{id}/api-key
Path parameters
- id (in: path, required)
Responses
200 (application/json):
- api-key (string)
SDK reference for reveal-deployment-api-key: golang | Python | Java
CLI: exo api reveal-deployment-api-key
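A sketch of retrieving the key. How the key is then used is an assumption: since deployments are served by vLLM (see get-deployment-logs), the deployment-url returned by get-deployment presumably exposes an OpenAI-compatible HTTP API that accepts the key as a bearer token.

```python
import requests

API = "https://api-ch-gva-2.exoscale.com/v2"       # assumed zone endpoint
AUTH = {"Authorization": "EXO2-HMAC-SHA256 ..."}   # placeholder: requests must be signed

key = requests.get(
    f"{API}/ai/deployment/7c5b7a2d-.../api-key", headers=AUTH  # hypothetical deployment ID
).json()["api-key"]

# Assumption: vLLM-style OpenAI-compatible endpoint behind the deployment-url.
deployment_url = "https://..."  # the (nullable) deployment-url from get-deployment
r = requests.post(
    f"{deployment_url}/v1/chat/completions",
    headers={"Authorization": f"Bearer {key}"},
    json={"model": "openai/gpt-oss-120b",  # hypothetical deployed model
          "messages": [{"role": "user", "content": "Hello"}]},
)
print(r.json())
```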
get-deployment-logs
[BETA] Get Deployment Logs
GET /ai/deployment/{id}/logs
Return logs for the vLLM deployment (deploy/
Path parameters
- id (in: path, required)
Responses
200 (application/json):
- logs (array[object]): List of log entries - schema details
400 (application/json):
- error (string): Error description
404 (application/json):
- error (string): Error description
500 (application/json):
- error (string): Error description
SDK reference for get-deployment-logs: golang | Python | Java
CLI: exo api get-deployment-logs
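Fetching and printing the log entries, with the same placeholders as above; the per-entry fields sit behind the schema details link and are not assumed here:

```python
import requests

API = "https://api-ch-gva-2.exoscale.com/v2"       # assumed zone endpoint
AUTH = {"Authorization": "EXO2-HMAC-SHA256 ..."}   # placeholder: requests must be signed

resp = requests.get(f"{API}/ai/deployment/7c5b7a2d-.../logs", headers=AUTH)  # hypothetical ID
resp.raise_for_status()  # 400, 404, and 500 all carry an "error" description
for entry in resp.json()["logs"]:
    print(entry)  # raw log entry object; see the schema details for its fields
```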
list-deployments
[BETA] List Deployments
GET /ai/deployment
Responses
200 (application/json):
- deployments (array[object]) - schema details
SDK reference for list-deployments: golang | Python | Java
CLI: exo api list-deployments
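A short sketch under the same placeholder assumptions:

```python
import requests

API = "https://api-ch-gva-2.exoscale.com/v2"       # assumed zone endpoint
AUTH = {"Authorization": "EXO2-HMAC-SHA256 ..."}   # placeholder: requests must be signed

for d in requests.get(f"{API}/ai/deployment", headers=AUTH).json()["deployments"]:
    print(d.get("id"), d.get("name"), d.get("status"))  # fields per the get-deployment schema
```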
get-inference-engine-help
[BETA] Get inference-engine Help
GET /ai/help/inference-engine-parameters
Get the list of allowed inference engine parameters with their descriptions, types, allowed values, and defaults.
Responses
200 (application/json):
- parameters (array[object]) - schema details
500 (application/json):
- error (string): Error description
SDK reference for get-inference-engine-help: golang | Python | Java
CLI: exo api get-inference-engine-help
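Useful to check before passing inference-engine-parameters to create-deployment; same placeholders as above:

```python
import requests

API = "https://api-ch-gva-2.exoscale.com/v2"       # assumed zone endpoint
AUTH = {"Authorization": "EXO2-HMAC-SHA256 ..."}   # placeholder: requests must be signed

resp = requests.get(f"{API}/ai/help/inference-engine-parameters", headers=AUTH)
for p in resp.json()["parameters"]:
    print(p)  # each entry describes one allowed parameter (type, allowed values, default)
```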
list-models
[BETA] List Models
GET /ai/model
Responses
200 (application/json):
- models (array[object]) - schema details
SDK reference for list-models: golang | Python | Java
CLI: exo api list-models
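And the matching sketch for models, with the same placeholders:

```python
import requests

API = "https://api-ch-gva-2.exoscale.com/v2"       # assumed zone endpoint
AUTH = {"Authorization": "EXO2-HMAC-SHA256 ..."}   # placeholder: requests must be signed

for m in requests.get(f"{API}/ai/model", headers=AUTH).json()["models"]:
    print(m.get("id"), m.get("name"), m.get("status"))  # fields per the get-model schema
```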