Deployment
Deployments are loaded model instances ready for inference.
scale-deployment
[BETA] Scale Deployment
POST /ai/deployment/{id}/scale
Scale Deployment
Path parameters
| Name | In | Description |
|---|---|---|
| id | path | |
Request body
Content-Type: application/json
| Property | Type | Required | Description |
|---|---|---|---|
| replicas | integer | yes | Number of replicas (>=0) |
Example
{
"replicas": 0
}
Responses
Status: 200
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| id | string | Operation ID |
| message | string | Operation message |
| reason | string | Operation failure reason. Allowed values: incorrect, unknown, unavailable, forbidden, busy, fault, partial, not-found, interrupted, unsupported, conflict. |
| reference | Reference | Related resource reference |
| state | string | Operation status. Allowed values: failure, pending, success, timeout. |
Example output
{
"id": "string",
"message": "string",
"reason": "incorrect",
"reference": {
"command": "string",
"id": "string",
"link": "string"
},
"state": "failure"
}
SDK reference for scale-deployment: golang | Python | Java
CLI: exo api scale-deployment
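As a rough illustration of the scale request described above, the following sketch builds (but does not send) the HTTP request with Python's standard library. `BASE_URL` and the helper name `build_scale_request` are hypothetical, and authentication is left out entirely; only the method, path, and the `replicas` constraint are taken from this reference.

```python
import json
import urllib.request

# Hypothetical placeholder; substitute the real API endpoint you use.
BASE_URL = "https://api.example.invalid/v2"


def build_scale_request(deployment_id: str, replicas: int) -> urllib.request.Request:
    """Build a POST /ai/deployment/{id}/scale request without sending it."""
    # The reference requires an integer replica count of at least 0.
    if isinstance(replicas, bool) or not isinstance(replicas, int) or replicas < 0:
        raise ValueError("replicas must be an integer >= 0")
    body = json.dumps({"replicas": replicas}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/ai/deployment/{deployment_id}/scale",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = build_scale_request("my-deployment-id", 0)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Validating `replicas` before building the request keeps an obviously invalid value from reaching the API at all.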
create-deployment
[BETA] Create Deployment
POST /ai/deployment
Deploy a model on an inference server
Request body
Content-Type: application/json
| Property | Type | Required | Description |
|---|---|---|---|
| gpu-count | integer | yes | Number of GPUs (1-8) |
| gpu-type | string | yes | GPU type family (e.g., gpua5000, gpu3080ti) |
| model | Model Ref | yes | |
| name | string | yes | Deployment name |
| replicas | integer | yes | Number of replicas (>=1) |
| inference-engine-parameters | array[string] | no | Optional extra inference engine server CLI args |
| inference-engine-version | string | no | Allowed values: 0.12.0, 0.15.1. |
Example
{
"gpu-count": 1,
"gpu-type": "string",
"inference-engine-parameters": [
"string"
],
"inference-engine-version": "0.12.0",
"model": {
"id": "string",
"name": "string"
},
"name": "string",
"replicas": 1
}
Responses
Status: 200
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| id | string | Operation ID |
| message | string | Operation message |
| reason | string | Operation failure reason. Allowed values: incorrect, unknown, unavailable, forbidden, busy, fault, partial, not-found, interrupted, unsupported, conflict. |
| reference | Reference | Related resource reference |
| state | string | Operation status. Allowed values: failure, pending, success, timeout. |
Example output
{
"id": "string",
"message": "string",
"reason": "incorrect",
"reference": {
"command": "string",
"id": "string",
"link": "string"
},
"state": "failure"
}
Status: 400
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| error | string | Error description |
Example output
{
"error": "string"
}
Status: 412
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| error | string | Error description |
Example output
{
"error": "string"
}
SDK reference for create-deployment: golang | Python | Java
CLI: exo api create-deployment
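The constraints in the create-deployment request table can be checked client-side before sending anything. The sketch below is one possible way to do that; the helper name `validate_create_payload` and the sample model name are hypothetical, while the required fields, numeric ranges, and allowed engine versions are the ones listed in the table.

```python
ALLOWED_ENGINE_VERSIONS = {"0.12.0", "0.15.1"}
REQUIRED_FIELDS = ("gpu-count", "gpu-type", "model", "name", "replicas")


def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_create_payload(payload: dict) -> list:
    """Return a list of problems found in a create-deployment body."""
    problems = [f"missing required field: {f}" for f in REQUIRED_FIELDS if f not in payload]
    if "gpu-count" in payload:
        gpu_count = payload["gpu-count"]
        if not _is_int(gpu_count) or not 1 <= gpu_count <= 8:
            problems.append("gpu-count must be an integer between 1 and 8")
    if "replicas" in payload:
        replicas = payload["replicas"]
        if not _is_int(replicas) or replicas < 1:
            problems.append("replicas must be an integer >= 1")
    version = payload.get("inference-engine-version")
    if version is not None and version not in ALLOWED_ENGINE_VERSIONS:
        problems.append(f"unsupported inference-engine-version: {version}")
    return problems


good = {
    "gpu-count": 1,
    "gpu-type": "gpua5000",
    "model": {"name": "example-org/example-model"},  # hypothetical model name
    "name": "demo",
    "replicas": 1,
}
bad = dict(good, **{"gpu-count": 0, "replicas": 0})

print(validate_create_payload(good))
print(validate_create_payload(bad))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass; the server remains the final authority on what it accepts.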
get-deployment
[BETA] Get Deployment
GET /ai/deployment/{id}
Get Deployment details
Path parameters
| Name | In | Description |
|---|---|---|
| id | path | |
Responses
Status: 200
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| created-at | string | Creation time |
| deployment-url | string | Deployment URL (nullable) |
| gpu-count | integer | Number of GPUs |
| gpu-type | string | GPU type family |
| id | string | Deployment ID |
| inference-engine-parameters | array[string] | Optional extra inference engine server CLI args |
| inference-engine-version | string | Allowed values: 0.12.0, 0.15.1. |
| model | Model Ref | |
| name | string | Deployment name |
| replicas | integer | Number of replicas (>=0) |
| service-level | string | Service level |
| state | string | Deployment state. Allowed values: ready, creating, error, deploying. |
| state-details | string | Deployment state details |
| updated-at | string | Update time |
Example output
{
"created-at": "2024-01-01T12:00:00Z",
"deployment-url": "string",
"gpu-count": 0,
"gpu-type": "string",
"id": "string",
"inference-engine-parameters": [
"string"
],
"inference-engine-version": "0.12.0",
"model": {
"id": "string",
"name": "string"
},
"name": "string",
"replicas": 0,
"service-level": "string",
"state": "ready",
"state-details": "string",
"updated-at": "2024-01-01T12:00:00Z"
}
Status: 404
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| error | string | Error description |
Example output
{
"error": "string"
}
SDK reference for get-deployment: golang | Python | Java
CLI: exo api get-deployment
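Because `deployment-url` is documented as nullable, a client may want to check both `state` and the URL before using a deployment. The sketch below shows one way to do that on an already-decoded response; the helper name `endpoint_if_ready` and the sample URL are hypothetical, and treating `ready` plus a non-null URL as "usable" is an assumption rather than something this reference states.

```python
def endpoint_if_ready(deployment: dict):
    """Return the deployment URL when the deployment looks usable, else None."""
    # Assumption: a deployment is usable once state is "ready" and a URL is set.
    if deployment.get("state") == "ready" and deployment.get("deployment-url"):
        return deployment["deployment-url"]
    return None


ready = {
    "id": "abc",
    "state": "ready",
    "deployment-url": "https://deployment.example.invalid",  # hypothetical URL
}
creating = {"id": "abc", "state": "creating", "deployment-url": None}

print(endpoint_if_ready(ready))
print(endpoint_if_ready(creating))
```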
update-deployment
PATCH /ai/deployment/{id}
Update AI deployment
Path parameters
| Name | In | Description |
|---|---|---|
| id | path | |
Request body
Content-Type: application/json
| Property | Type | Required | Description |
|---|---|---|---|
| inference-engine-parameters | array[string] | no | Optional extra inference engine server CLI args |
| inference-engine-version | string | no | Allowed values: 0.12.0, 0.15.1. |
| name | string | no | Deployment name |
Example
{
"inference-engine-parameters": [
"string"
],
"inference-engine-version": "0.12.0",
"name": "string"
}
Responses
Status: 200
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| id | string | Operation ID |
| message | string | Operation message |
| reason | string | Operation failure reason. Allowed values: incorrect, unknown, unavailable, forbidden, busy, fault, partial, not-found, interrupted, unsupported, conflict. |
| reference | Reference | Related resource reference |
| state | string | Operation status. Allowed values: failure, pending, success, timeout. |
Example output
{
"id": "string",
"message": "string",
"reason": "incorrect",
"reference": {
"command": "string",
"id": "string",
"link": "string"
},
"state": "failure"
}
SDK reference for update-deployment: golang | Python | Java
CLI: exo api update-deployment
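Since every field in the update body is optional, a client can send only the properties it wants to change. The sketch below builds such a body; the helper name `build_update_body` and the sample engine argument are hypothetical, and the idea that omitted fields are left unchanged follows common PATCH semantics rather than anything stated in this reference.

```python
import json


def build_update_body(name=None, engine_version=None, engine_parameters=None) -> bytes:
    """Serialize only the fields that were explicitly provided."""
    body = {}
    if name is not None:
        body["name"] = name
    if engine_version is not None:
        body["inference-engine-version"] = engine_version
    if engine_parameters is not None:
        body["inference-engine-parameters"] = list(engine_parameters)
    if not body:
        # Design choice of this sketch: refuse to build an empty update.
        raise ValueError("at least one field must be provided")
    return json.dumps(body, sort_keys=True).encode("utf-8")


print(build_update_body(name="renamed-deployment").decode("utf-8"))
# The engine argument below is purely illustrative.
print(build_update_body(engine_version="0.15.1",
                        engine_parameters=["--example-flag=1"]).decode("utf-8"))
```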
delete-deployment
[BETA] Delete Deployment
DELETE /ai/deployment/{id}
Delete Deployment
Path parameters
| Name | In | Description |
|---|---|---|
| id | path | |
Responses
Status: 200
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| id | string | Operation ID |
| message | string | Operation message |
| reason | string | Operation failure reason. Allowed values: incorrect, unknown, unavailable, forbidden, busy, fault, partial, not-found, interrupted, unsupported, conflict. |
| reference | Reference | Related resource reference |
| state | string | Operation status. Allowed values: failure, pending, success, timeout. |
Example output
{
"id": "string",
"message": "string",
"reason": "incorrect",
"reference": {
"command": "string",
"id": "string",
"link": "string"
},
"state": "failure"
}
SDK reference for delete-deployment: golang | Python | Java
CLI: exo api delete-deployment
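The scale, create, update, and delete calls all return the same operation object, so its `state` and `reason` fields can be interpreted in one place. The following sketch shows one possible way to do so; the names `OperationFailed` and `operation_finished` are hypothetical, and treating `timeout` as a failure is a choice made for this example.

```python
FAILURE_STATES = {"failure", "timeout"}


class OperationFailed(Exception):
    """Raised when an operation reports failure or timeout."""


def operation_finished(operation: dict) -> bool:
    """Return True on success, False while pending, raise otherwise."""
    state = operation.get("state")
    if state == "success":
        return True
    if state == "pending":
        return False
    if state in FAILURE_STATES:
        reason = operation.get("reason", "unknown")
        message = operation.get("message", "")
        raise OperationFailed(f"{state} ({reason}): {message}")
    raise ValueError(f"unexpected operation state: {state!r}")


print(operation_finished({"id": "op-1", "state": "success"}))
print(operation_finished({"id": "op-2", "state": "pending"}))
```

A caller could invoke `operation_finished` in a polling loop, stopping when it returns `True` and surfacing `OperationFailed` to the user otherwise.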
Other Operations
list-ai-instance-types
List Instance Types
GET /ai/instance-type
List available instance types with authorization status based on GPU availability
Responses
Status: 200
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| instance-types | array of Instance type with authorization status | |
Example output
{
"instance-types": [
{
"authorized": true,
"family": "string"
}
]
}
SDK reference for list-ai-instance-types: golang | Python | Java
CLI: exo api list-ai-instance-types
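A typical use of this response is to find which GPU families the caller may actually deploy on. The sketch below filters a decoded response accordingly; the helper name `authorized_families` is hypothetical, and the sample family names are the ones given as examples in the create-deployment table.

```python
def authorized_families(response: dict) -> list:
    """Return the sorted family names whose 'authorized' flag is true."""
    return sorted(
        item["family"]
        for item in response.get("instance-types", [])
        if item.get("authorized")
    )


sample = {
    "instance-types": [
        {"authorized": True, "family": "gpua5000"},
        {"authorized": False, "family": "gpu3080ti"},
    ]
}
print(authorized_families(sample))
```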
reveal-deployment-api-key
[BETA] Reveal Deployment API Key
GET /ai/deployment/{id}/api-key
Get Deployment API Key
Path parameters
| Name | In | Description |
|---|---|---|
| id | path | |
Responses
Status: 200
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| api-key | string | |
Example output
{
"api-key": "string"
}
SDK reference for reveal-deployment-api-key: golang | Python | Java
CLI: exo api reveal-deployment-api-key
get-deployment-logs
[BETA] Get Deployment Logs
GET /ai/deployment/{id}/logs
Return logs for the vLLM deployment (deploy/
Path parameters
| Name | In | Description |
|---|---|---|
| id | path | |
Query parameters
| Name | In | Required | Description |
|---|---|---|---|
| stream | query | no | |
| tail | query | no | |
Responses
Status: 200
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| logs | array of log entry objects | List of log entries |
Example output
{
"logs": [
{
"message": "string",
"node": "string",
"time": "string"
}
]
}
Status: 400
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| error | string | Error description |
Example output
{
"error": "string"
}
Status: 404
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| error | string | Error description |
Example output
{
"error": "string"
}
Status: 500
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| error | string | Error description |
Example output
{
"error": "string"
}
SDK reference for get-deployment-logs: golang | Python | Java
CLI: exo api get-deployment-logs
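The optional `tail` and `stream` query parameters can be appended to the logs path with `urllib.parse.urlencode`, and the returned entries can be flattened into printable lines. This reference does not document the parameter types, so the sketch below assumes `tail` is a line count and `stream` is a boolean serialized as `true`/`false`; the helper names `logs_path` and `format_logs` are hypothetical.

```python
from urllib.parse import urlencode


def logs_path(deployment_id: str, tail=None, stream=None) -> str:
    """Build the request path, adding only the query parameters provided."""
    params = {}
    if tail is not None:
        params["tail"] = str(tail)  # assumption: number of trailing lines
    if stream is not None:
        params["stream"] = "true" if stream else "false"  # assumption: boolean
    path = f"/ai/deployment/{deployment_id}/logs"
    query = urlencode(params)
    return f"{path}?{query}" if query else path


def format_logs(response: dict) -> list:
    """Render each log entry as 'time [node] message'."""
    return [
        f'{entry["time"]} [{entry["node"]}] {entry["message"]}'
        for entry in response.get("logs", [])
    ]


print(logs_path("abc", tail=100, stream=False))
sample = {"logs": [{"message": "server started", "node": "node-1", "time": "2024-01-01T12:00:00Z"}]}
for line in format_logs(sample):
    print(line)
```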
list-deployments
[BETA] List Deployments
GET /ai/deployment
List Deployments
Responses
Status: 200
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| deployments | array of AI deployment | |
Example output
{
"deployments": [
{
"created-at": "2024-01-01T12:00:00Z",
"deployment-url": "string",
"gpu-count": 0,
"gpu-type": "string",
"id": "string",
"model": {
"id": "string",
"name": "string"
},
"name": "string",
"replicas": 0,
"service-level": "string",
"state": "ready",
"updated-at": "2024-01-01T12:00:00Z"
}
]
}
SDK reference for list-deployments: golang | Python | Java
CLI: exo api list-deployments
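One straightforward use of the list response is to group deployments by their `state`, for instance to spot ones stuck in `error`. The sketch below does this on a decoded response; the helper name `names_by_state` and the sample deployment names are hypothetical.

```python
from collections import defaultdict


def names_by_state(response: dict) -> dict:
    """Map each deployment state to the names of deployments in that state."""
    grouped = defaultdict(list)
    for deployment in response.get("deployments", []):
        grouped[deployment["state"]].append(deployment["name"])
    return dict(grouped)


sample = {
    "deployments": [
        {"id": "1", "name": "chat", "state": "ready"},
        {"id": "2", "name": "embed", "state": "deploying"},
        {"id": "3", "name": "search", "state": "ready"},
    ]
}
print(names_by_state(sample))
```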
get-inference-engine-help
[BETA] Get inference-engine Help
GET /ai/help/inference-engine-parameters
Get list of allowed inference engine parameters with their descriptions and allowed values
Query parameters
| Name | In | Required | Description |
|---|---|---|---|
| version | query | no | |
Responses
Status: 200
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| parameters | array of inference-engine parameter definition | |
Example output
{
"parameters": [
{
"allowed-values": [
"string"
],
"default": "string",
"description": "string",
"flags": [
"string"
],
"name": "string",
"section": "string",
"type": "string"
}
]
}
SDK reference for get-inference-engine-help: golang | Python | Java
CLI: exo api get-inference-engine-help
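The help response could be used to pre-check `inference-engine-parameters` before creating or updating a deployment. The sketch below indexes the returned definitions by flag and reports arguments whose flag is not listed. It rests on two assumptions not confirmed by this reference: that each entry in `flags` is a CLI flag spelling, and that arguments are passed as `flag` or `flag=value` strings. The helper names and sample flags are hypothetical.

```python
def index_by_flag(response: dict) -> dict:
    """Map every flag spelling to its parameter definition."""
    index = {}
    for parameter in response.get("parameters", []):
        for flag in parameter.get("flags", []):
            index[flag] = parameter
    return index


def unknown_args(args: list, index: dict) -> list:
    """Return the arguments whose flag part (before '=') is not in the index."""
    return [arg for arg in args if arg.split("=", 1)[0] not in index]


sample = {
    "parameters": [
        {
            "name": "example-option",  # hypothetical parameter definition
            "flags": ["--example-option", "-e"],
            "type": "string",
            "description": "Illustrative only",
        }
    ]
}
index = index_by_flag(sample)
print(unknown_args(["--example-option=4096", "--not-listed"], index))
```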