Deployment
Deployments are loaded model instances ready for inference.
scale-deployment
[BETA] Scale Deployment
POST /ai/deployment/{id}/scale
Path parameters
| Name | In | Description |
|---|---|---|
| id | path | Deployment ID |
Request body
Content-Type: application/json
| Property | Type | Required | Description |
|---|---|---|---|
| replicas | integer | yes | Number of replicas (>=0) |
Example
{
"replicas": 0
}
Responses
Status: 200
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| id | string | Operation ID |
| message | string | Operation message |
| reason | string | Operation failure reason. Allowed values: incorrect, unknown, unavailable, forbidden, busy, fault, partial, not-found, interrupted, unsupported, conflict. |
| reference | object | Related resource reference |
| state | string | Operation status. Allowed values: failure, pending, success, timeout. |
Example output
{
"id": "string",
"message": "string",
"reason": "incorrect",
"reference": {
"command": "string",
"id": "string",
"link": "string"
},
"state": "failure"
}
SDK reference for scale-deployment: golang | Python | Java
CLI: exo api scale-deployment
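The request described above can be sketched as a small helper that validates the replica count and assembles the call without sending it. The helper name `build_scale_request` and the deployment ID are hypothetical, and the base URL and authentication are deliberately left out because this section does not describe them.

```python
import json


def build_scale_request(deployment_id: str, replicas: int) -> dict:
    """Describe a scale-deployment call without sending it."""
    # The request body table requires an integer replica count >= 0.
    if isinstance(replicas, bool) or not isinstance(replicas, int):
        raise TypeError("replicas must be an integer")
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    if not deployment_id:
        raise ValueError("deployment_id must not be empty")
    return {
        "method": "POST",
        "path": f"/ai/deployment/{deployment_id}/scale",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"replicas": replicas}),
    }


# "my-deployment-id" is a placeholder, not a real deployment.
request = build_scale_request("my-deployment-id", 0)
print(request["path"])
print(request["body"])
```

Keeping the validation client-side is a design choice for this sketch only; the API remains the authority on which values it accepts.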
create-deployment
[BETA] Create Deployment
POST /ai/deployment
Deploy a model on an inference server.
Request body
Content-Type: application/json
| Property | Type | Required | Description |
|---|---|---|---|
| gpu-count | integer | yes | Number of GPUs (1-8) |
| gpu-type | string | yes | GPU type family (e.g., gpua5000, gpu3080ti) |
| replicas | integer | yes | Number of replicas (>=1) |
| inference-engine-parameters | array[string] | no | Optional extra inference engine server CLI args |
| model | object | no | Model to deploy (id, name) |
| name | string | no | Deployment name |
Example
{
"gpu-count": 1,
"gpu-type": "string",
"inference-engine-parameters": [
"string"
],
"model": {
"id": "string",
"name": "string"
},
"name": "string",
"replicas": 1
}
Responses
Status: 200
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| id | string | Operation ID |
| message | string | Operation message |
| reason | string | Operation failure reason. Allowed values: incorrect, unknown, unavailable, forbidden, busy, fault, partial, not-found, interrupted, unsupported, conflict. |
| reference | object | Related resource reference |
| state | string | Operation status. Allowed values: failure, pending, success, timeout. |
Example output
{
"id": "string",
"message": "string",
"reason": "incorrect",
"reference": {
"command": "string",
"id": "string",
"link": "string"
},
"state": "failure"
}
Status: 400
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| error | string | Error description |
Example output
{
"error": "string"
}
SDK reference for create-deployment: golang | Python | Java
CLI: exo api create-deployment
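As an illustration of the create-deployment request body, the sketch below enforces the documented constraints (gpu-count between 1 and 8, replicas at least 1) and leaves optional properties out when they are not supplied. The function name `build_create_body`, the model name, and the deployment name are hypothetical; only `gpua5000` comes from the table above.

```python
import json
from typing import Optional, Sequence


def build_create_body(
    gpu_type: str,
    gpu_count: int,
    replicas: int,
    model: Optional[dict] = None,
    name: Optional[str] = None,
    engine_args: Optional[Sequence[str]] = None,
) -> dict:
    """Assemble a create-deployment JSON body from the documented properties."""
    if not gpu_type:
        raise ValueError("gpu-type is required")
    if not 1 <= gpu_count <= 8:
        raise ValueError("gpu-count must be between 1 and 8")
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    body = {"gpu-count": gpu_count, "gpu-type": gpu_type, "replicas": replicas}
    # Optional properties are only included when the caller provides them.
    if model is not None:
        body["model"] = dict(model)
    if name is not None:
        body["name"] = name
    if engine_args:
        body["inference-engine-parameters"] = list(engine_args)
    return body


# Placeholder values for illustration only.
body = build_create_body(
    "gpua5000",
    gpu_count=1,
    replicas=1,
    model={"name": "example-model"},
    name="example-deployment",
)
print(json.dumps(body, indent=2, sort_keys=True))
```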
get-deployment
[BETA] Get Deployment
GET /ai/deployment/{id}
Path parameters
| Name | In | Description |
|---|---|---|
| id | path | Deployment ID |
Responses
Status: 200
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| created-at | string | Creation time |
| deployment-url | string | Deployment URL (nullable) |
| gpu-count | integer | Number of GPUs |
| gpu-type | string | GPU type family |
| id | string | Deployment ID |
| inference-engine-parameters | array[string] | Optional extra inference engine server CLI args |
| model | object | Deployed model (id, name) |
| name | string | Deployment name |
| replicas | integer | Number of replicas (>=0) |
| service-level | string | Service level |
| status | string | Deployment status. Allowed values: ready, creating, error, deploying. |
| status-details | string | Deployment status details |
| updated-at | string | Update time |
Example output
{
"created-at": "2024-01-01T12:00:00Z",
"deployment-url": "string",
"gpu-count": 0,
"gpu-type": "string",
"id": "string",
"inference-engine-parameters": [
"string"
],
"model": {
"id": "string",
"name": "string"
},
"name": "string",
"replicas": 0,
"service-level": "string",
"status": "ready",
"status-details": "string",
"updated-at": "2024-01-01T12:00:00Z"
}
Status: 404
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| error | string | Error description |
Example output
{
"error": "string"
}
SDK reference for get-deployment: golang | Python | Java
CLI: exo api get-deployment
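Because the status property moves through values such as creating and deploying before reaching ready or error, a client will typically poll get-deployment. The sketch below assumes that ready and error are the only final statuses, which this section does not state explicitly. The `fetch` callable stands in for whatever performs the GET request, and `wait_for_deployment` is a hypothetical name.

```python
import time
from typing import Callable

# Assumption: "creating" and "deploying" are transitional, the rest are final.
FINAL_STATUSES = {"ready", "error"}


def wait_for_deployment(
    fetch: Callable[[], dict],
    interval: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the deployment reports a final status."""
    for attempt in range(max_attempts):
        deployment = fetch()
        if deployment.get("status") in FINAL_STATUSES:
            return deployment
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError("deployment did not reach a final status in time")


# Canned responses replace real HTTP calls in this example.
responses = iter([
    {"status": "creating", "deployment-url": None},
    {"status": "deploying", "deployment-url": None},
    {"status": "ready", "deployment-url": "https://example.invalid/v1"},
])
final = wait_for_deployment(lambda: next(responses), sleep=lambda _: None)
print(final["status"], final["deployment-url"])
```

Since deployment-url is nullable, callers should check it for null even after the status becomes ready.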
delete-deployment
[BETA] Delete Deployment
DELETE /ai/deployment/{id}
Path parameters
| Name | In | Description |
|---|---|---|
| id | path | Deployment ID |
Responses
Status: 200
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| id | string | Operation ID |
| message | string | Operation message |
| reason | string | Operation failure reason. Allowed values: incorrect, unknown, unavailable, forbidden, busy, fault, partial, not-found, interrupted, unsupported, conflict. |
| reference | object | Related resource reference |
| state | string | Operation status. Allowed values: failure, pending, success, timeout. |
Example output
{
"id": "string",
"message": "string",
"reason": "incorrect",
"reference": {
"command": "string",
"id": "string",
"link": "string"
},
"state": "failure"
}
SDK reference for delete-deployment: golang | Python | Java
CLI: exo api delete-deployment
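Scale, create, and delete all return the same operation object, so one helper can interpret its state and reason properties. The sketch below is one possible mapping, not a prescribed client behavior: it treats pending as "not finished yet" and raises on failure or timeout. The name `check_operation` and the operation ID are hypothetical.

```python
def check_operation(payload: dict) -> bool:
    """Return True on success, False while pending, raise on failure or timeout."""
    state = payload.get("state")
    if state == "success":
        return True
    if state == "pending":
        return False
    if state in ("failure", "timeout"):
        # "reason" carries one of the documented failure reasons when present.
        reason = payload.get("reason") or "unknown"
        message = payload.get("message") or ""
        raise RuntimeError(
            f"operation {payload.get('id')} ended in {state} ({reason}): {message}"
        )
    raise ValueError(f"unexpected operation state: {state!r}")


print(check_operation({"id": "op-1", "state": "pending"}))
print(check_operation({"id": "op-1", "state": "success"}))
```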
Other Operations
reveal-deployment-api-key
[BETA] Reveal Deployment API Key
GET /ai/deployment/{id}/api-key
Path parameters
| Name | In | Description |
|---|---|---|
| id | path | Deployment ID |
Responses
Status: 200
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| api-key | string | Deployment API key |
Example output
{
"api-key": "string"
}
SDK reference for reveal-deployment-api-key: golang | Python | Java
CLI: exo api reveal-deployment-api-key
get-deployment-logs
[BETA] Get Deployment Logs
GET /ai/deployment/{id}/logs
Return logs for the vLLM deployment (deploy/
Path parameters
| Name | In | Description |
|---|---|---|
| id | path | Deployment ID |
Query parameters
| Name | In | Required | Description |
|---|---|---|---|
| tail | query | no | |
Responses
Status: 200
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| logs | array[object] | List of log entries |
Example output
{
"logs": [
{
"message": "string",
"node": "string",
"time": "string"
}
]
}
Status: 404
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| error | string | Error description |
Example output
{
"error": "string"
}
Status: 500
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| error | string | Error description |
Example output
{
"error": "string"
}
SDK reference for get-deployment-logs: golang | Python | Java
CLI: exo api get-deployment-logs
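The logs response can be flattened into printable lines with a few lines of code. In the sketch below, `logs_path` and `format_logs` are hypothetical helpers, and the `tail` value is assumed to be a count of trailing log lines; this section lists the parameter but does not describe it. The sample entry is invented for illustration.

```python
from typing import List, Optional
from urllib.parse import urlencode


def logs_path(deployment_id: str, tail: Optional[int] = None) -> str:
    """Build the request path, adding the optional tail query parameter."""
    path = f"/ai/deployment/{deployment_id}/logs"
    if tail is not None:
        # Assumption: tail is an integer number of lines.
        path += "?" + urlencode({"tail": tail})
    return path


def format_logs(payload: dict) -> List[str]:
    """Turn each log entry (time, node, message) into one text line."""
    return [
        f"{entry.get('time', '')} [{entry.get('node', '')}] {entry.get('message', '')}"
        for entry in payload.get("logs", [])
    ]


sample = {
    "logs": [
        {"message": "server started", "node": "node-1", "time": "2024-01-01T12:00:00Z"}
    ]
}
print(logs_path("my-deployment-id", tail=100))
for line in format_logs(sample):
    print(line)
```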
list-deployments
[BETA] List Deployments
GET /ai/deployment
Responses
Status: 200
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| deployments | array[object] | List of deployments |
Example output
{
"deployments": [
{
"created-at": "2024-01-01T12:00:00Z",
"deployment-url": "string",
"gpu-count": 0,
"gpu-type": "string",
"id": "string",
"model": {
"id": "string",
"name": "string"
},
"name": "string",
"replicas": 0,
"service-level": "string",
"status": "ready",
"updated-at": "2024-01-01T12:00:00Z"
}
]
}
SDK reference for list-deployments: golang | Python | Java
CLI: exo api list-deployments
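A list-deployments response is often summarized by status. The sketch below groups deployment names under their status value; `names_by_status` is a hypothetical helper and the sample payload is invented, using only properties shown in the example output above.

```python
from collections import defaultdict
from typing import Dict, List


def names_by_status(payload: dict) -> Dict[str, List[str]]:
    """Group deployment names (falling back to IDs) by their status."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for deployment in payload.get("deployments", []):
        label = deployment.get("name") or deployment.get("id")
        grouped[deployment.get("status", "unknown")].append(label)
    return dict(grouped)


sample = {
    "deployments": [
        {"id": "a1", "name": "chat", "status": "ready"},
        {"id": "b2", "name": "embed", "status": "deploying"},
        {"id": "c3", "name": "", "status": "ready"},
    ]
}
print(names_by_status(sample))
```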
get-inference-engine-help
[BETA] Get inference-engine Help
GET /ai/help/inference-engine-parameters
Get list of allowed inference engine parameters with their descriptions, types, allowed values, and defaults.
Responses
Status: 200
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| parameters | array[object] | List of allowed inference engine parameters |
Example output
{
"parameters": [
{
"allowed-values": [
"string"
],
"default": "string",
"description": "string",
"flags": [
"string"
],
"name": "string",
"section": "string",
"type": "string"
}
]
}
Status: 500
Content-Type: application/json
| Property | Type | Description |
|---|---|---|
| error | string | Error description |
Example output
{
"error": "string"
}
SDK reference for get-inference-engine-help: golang | Python | Java
CLI: exo api get-inference-engine-help
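One use of this response is to check a list of inference-engine-parameters before sending create-deployment. The sketch below collects every entry of the flags property and reports arguments that do not match. It assumes that flags start with a dash, that values are passed either as separate items or as `--flag=value`, and that the flag names shown are placeholders; none of this is confirmed by this section.

```python
from typing import List, Sequence, Set


def allowed_flags(help_payload: dict) -> Set[str]:
    """Collect every flag listed across all parameters."""
    return {
        flag
        for parameter in help_payload.get("parameters", [])
        for flag in parameter.get("flags", [])
    }


def unknown_flags(args: Sequence[str], help_payload: dict) -> List[str]:
    """Return flags in args that the help response does not list."""
    allowed = allowed_flags(help_payload)
    unknown = []
    for arg in args:
        if not arg.startswith("-"):
            continue  # treated as a value for the preceding flag
        # Note: a negative number passed as a separate value would also land here.
        flag = arg.split("=", 1)[0]
        if flag not in allowed:
            unknown.append(flag)
    return unknown


# Invented help payload and arguments for illustration.
help_payload = {"parameters": [{"name": "example", "flags": ["--example-flag"]}]}
print(unknown_flags(["--example-flag", "4096", "--not-listed=1"], help_payload))
```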