Deployment

Deployments are loaded model instances ready for inference.

scale-deployment

[BETA] Scale Deployment

POST /ai/deployment/{id}/scale

Set the number of replicas for a deployment.

Path parameters

Name	In	Description
`id`	`path`	Deployment ID

Request body

Content-Type: application/json

Property	Type	Required	Description
`replicas`	integer	yes	Number of replicas (>=0)

Example

{
  "replicas": 0
}
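The scale request can be sketched in Python as below. This is a minimal sketch that only prepares the method, path, and JSON body while enforcing the documented `replicas >= 0` constraint. It does not authenticate or send anything, and `build_scale_request` is a hypothetical helper, not part of the Exoscale SDKs.

```python
import json
from urllib.parse import quote


def build_scale_request(deployment_id: str, replicas: int) -> tuple[str, str, str]:
    """Return (method, path, JSON body) for POST /ai/deployment/{id}/scale.

    Hypothetical helper: it only prepares the request. Authentication,
    the API host, and sending the request are left to the caller.
    """
    # Documented constraint: replicas >= 0. bool is rejected explicitly
    # because it is a subclass of int in Python.
    if isinstance(replicas, bool) or not isinstance(replicas, int) or replicas < 0:
        raise ValueError("replicas must be an integer >= 0")
    # Percent-encode the ID so it stays a single path segment.
    path = f"/ai/deployment/{quote(deployment_id, safe='')}/scale"
    return "POST", path, json.dumps({"replicas": replicas})


method, path, body = build_scale_request("my-deployment-id", 0)
print(method, path, body)  # POST /ai/deployment/my-deployment-id/scale {"replicas": 0}
```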

Responses

Status: 200 (OK)

Content-Type: application/json

Property	Type	Description
`id`	string	Operation ID
`message`	string	Operation message
`reason`	string	Operation failure reason. Allowed values: `incorrect`, `unknown`, `unavailable`, `forbidden`, `busy`, `fault`, `partial`, `not-found`, `interrupted`, `unsupported`, `conflict`.
`reference`	Reference	Related resource reference
`state`	string	Operation status. Allowed values: `failure`, `pending`, `success`, `timeout`.

Example output

{
  "id": "string",
  "message": "string",
  "reason": "incorrect",
  "reference": {
    "command": "string",
    "id": "string",
    "link": "string"
  },
  "state": "failure"
}
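Because the returned operation can still be `pending`, a client usually has to wait for it to reach `success`, `failure`, or `timeout`. This section does not document how to look up an operation by ID, so the sketch below takes a caller-supplied `fetch_operation` function; that function, `wait_for_operation`, and the polling interval are all assumptions for illustration.

```python
import time
from typing import Callable

# Values of `state` after which the operation will not change any more.
TERMINAL_STATES = {"success", "failure", "timeout"}


def wait_for_operation(
    fetch_operation: Callable[[str], dict],
    operation_id: str,
    interval: float = 2.0,
    max_attempts: int = 30,
) -> dict:
    """Poll until the operation leaves `pending`; raise unless it succeeded.

    `fetch_operation` is a hypothetical caller-supplied function that returns
    the operation object (id, state, reason, message, reference) for an ID.
    """
    for attempt in range(max_attempts):
        op = fetch_operation(operation_id)
        state = op.get("state")
        if state in TERMINAL_STATES:
            if state != "success":
                raise RuntimeError(
                    f"operation {operation_id} ended in {state}: "
                    f"{op.get('reason')} {op.get('message')}"
                )
            return op
        if attempt < max_attempts - 1:
            time.sleep(interval)  # still pending, wait before the next poll
    raise TimeoutError(f"operation {operation_id} still pending after {max_attempts} polls")


# Fake fetcher for illustration: pending once, then success.
responses = iter([
    {"id": "op-1", "state": "pending"},
    {"id": "op-1", "state": "success", "reference": {"id": "dep-1"}},
])
op = wait_for_operation(lambda _id: next(responses), "op-1", interval=0)
print(op["reference"]["id"])  # dep-1
```

On success, `reference` points at the affected resource, which is a natural input to get-deployment.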

SDK reference for scale-deployment: golang | Python | Java

CLI: exo api scale-deployment

create-deployment

[BETA] Create Deployment

POST /ai/deployment

Deploy a model on an inference server.

Request body

Content-Type: application/json

Property	Type	Required	Description
`gpu-count`	integer	yes	Number of GPUs (1-8)
`gpu-type`	string	yes	GPU type family (e.g., `gpua5000`, `gpu3080ti`)
`model`	Model Ref	yes	Reference to the model to deploy
`name`	string	yes	Deployment name
`replicas`	integer	yes	Number of replicas (>=1)
`inference-engine-parameters`	array[string]	no	Extra command-line arguments passed to the inference engine server
`inference-engine-version`	string	no	Inference engine version. Allowed values: `0.12.0`, `0.15.1`.

Example

{
  "gpu-count": 1,
  "gpu-type": "gpua5000",
  "inference-engine-parameters": [
    "string"
  ],
  "inference-engine-version": "0.12.0",
  "model": {
    "id": "string",
    "name": "string"
  },
  "name": "string",
  "replicas": 1
}
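The documented constraints (1 to 8 GPUs, at least one replica, two allowed engine versions) can be checked client-side before sending. The sketch below assumes `model` is passed through unchanged as a Model Ref object; `build_create_body` is a hypothetical helper, and the server remains the authority on validation.

```python
import json
from typing import Optional, Sequence

ENGINE_VERSIONS = {"0.12.0", "0.15.1"}  # allowed values listed in the table above


def build_create_body(
    name: str,
    model: dict,
    gpu_type: str,
    gpu_count: int,
    replicas: int,
    engine_version: Optional[str] = None,
    engine_parameters: Optional[Sequence[str]] = None,
) -> str:
    """Validate the documented constraints and return the JSON request body."""
    if not 1 <= gpu_count <= 8:
        raise ValueError("gpu-count must be between 1 and 8")
    if replicas < 1:
        raise ValueError("replicas must be >= 1 at creation time")
    if engine_version is not None and engine_version not in ENGINE_VERSIONS:
        raise ValueError(f"unsupported inference-engine-version: {engine_version}")
    body = {
        "name": name,
        "model": model,  # passed through as a Model Ref object
        "gpu-type": gpu_type,
        "gpu-count": gpu_count,
        "replicas": replicas,
    }
    # Optional fields are only included when the caller provides them.
    if engine_version is not None:
        body["inference-engine-version"] = engine_version
    if engine_parameters:
        body["inference-engine-parameters"] = list(engine_parameters)
    return json.dumps(body)


# "MODEL_ID" is a placeholder, not a real model.
print(build_create_body("demo", {"id": "MODEL_ID"}, "gpua5000", 1, 1, "0.15.1"))
```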

Responses

Status: 200 (OK)

Content-Type: application/json

Property	Type	Description
`id`	string	Operation ID
`message`	string	Operation message
`reason`	string	Operation failure reason. Allowed values: `incorrect`, `unknown`, `unavailable`, `forbidden`, `busy`, `fault`, `partial`, `not-found`, `interrupted`, `unsupported`, `conflict`.
`reference`	Reference	Related resource reference
`state`	string	Operation status. Allowed values: `failure`, `pending`, `success`, `timeout`.

Example output

{
  "id": "string",
  "message": "string",
  "reason": "incorrect",
  "reference": {
    "command": "string",
    "id": "string",
    "link": "string"
  },
  "state": "failure"
}

Status: 400 (Bad Request)

Content-Type: application/json

Property	Type	Description
`error`	string	Error description

Example output

{
  "error": "string"
}

Status: 412 (Precondition Failed)

Content-Type: application/json

Property	Type	Description
`error`	string	Error description

Example output

{
  "error": "string"
}
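The 400 and 412 responses share one shape, a single `error` string (as do the 404 and 500 responses documented for other operations in this section), so one helper can turn any non-200 answer into an exception. This is a sketch; `DeploymentAPIError` and `parse_response` are hypothetical names, and the conditions that trigger a 412 are not described here.

```python
import json


class DeploymentAPIError(Exception):
    """Raised for non-200 responses that carry an {"error": ...} body."""

    def __init__(self, status: int, message: str):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def parse_response(status: int, raw_body: str) -> dict:
    """Return the decoded body on 200; raise DeploymentAPIError otherwise."""
    body = json.loads(raw_body) if raw_body else {}
    if status == 200:
        return body
    # Error bodies documented in this section are {"error": string}.
    raise DeploymentAPIError(status, body.get("error", "unknown error"))


try:
    parse_response(412, '{"error": "string"}')
except DeploymentAPIError as exc:
    print(exc)  # HTTP 412: string
```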

SDK reference for create-deployment: golang | Python | Java

CLI: exo api create-deployment

get-deployment

[BETA] Get Deployment

GET /ai/deployment/{id}

Get the details of a deployment.

Path parameters

Name	In	Description
`id`	`path`	Deployment ID

Responses

Status: 200 (OK)

Content-Type: application/json

Property	Type	Description
`created-at`	string	Creation time
`deployment-url`	string	Deployment URL (nullable)
`gpu-count`	integer	Number of GPUs
`gpu-type`	string	GPU type family
`id`	string	Deployment ID
`inference-engine-parameters`	array[string]	Extra command-line arguments passed to the inference engine server
`inference-engine-version`	string	Inference engine version. Allowed values: `0.12.0`, `0.15.1`.
`model`	Model Ref	Reference to the deployed model
`name`	string	Deployment name
`replicas`	integer	Number of replicas (>=0)
`service-level`	string	Service level
`state`	string	Deployment state. Allowed values: `ready`, `creating`, `error`, `deploying`.
`state-details`	string	Deployment state details
`updated-at`	string	Update time

Example output

{
  "created-at": "2024-01-01T12:00:00Z",
  "deployment-url": "string",
  "gpu-count": 1,
  "gpu-type": "string",
  "id": "string",
  "inference-engine-parameters": [
    "string"
  ],
  "inference-engine-version": "0.12.0",
  "model": {
    "id": "string",
    "name": "string"
  },
  "name": "string",
  "replicas": 0,
  "service-level": "string",
  "state": "ready",
  "state-details": "string",
  "updated-at": "2024-01-01T12:00:00Z"
}

Status: 404 (Not Found)

Content-Type: application/json

Property	Type	Description
`error`	string	Error description

Example output

{
  "error": "string"
}
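After creating a deployment, a client may want to wait until it can serve requests. The sketch below assumes that `state == "ready"` together with a non-null `deployment-url` (the URL is documented as nullable) means the endpoint is usable, and surfaces `state-details` when the state is `error`. `ready_endpoint` is a hypothetical helper.

```python
from typing import Optional


def ready_endpoint(deployment: dict) -> Optional[str]:
    """Return the deployment URL if it can serve requests, else None.

    Raises RuntimeError when the deployment is in the `error` state.
    """
    state = deployment.get("state")
    if state == "ready" and deployment.get("deployment-url"):
        return deployment["deployment-url"]
    if state == "error":
        raise RuntimeError(
            f"deployment {deployment.get('id')} failed: {deployment.get('state-details')}"
        )
    return None  # `creating`, `deploying`, or ready without a URL yet


print(ready_endpoint({"state": "deploying", "deployment-url": None}))  # None
```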

SDK reference for get-deployment: golang | Python | Java

CLI: exo api get-deployment

update-deployment

PATCH /ai/deployment/{id}

Update the name or inference engine settings of a deployment.

Path parameters

Name	In	Description
`id`	`path`	Deployment ID

Request body

Content-Type: application/json

Property	Type	Required	Description
`inference-engine-parameters`	array[string]	no	Extra command-line arguments passed to the inference engine server
`inference-engine-version`	string	no	Inference engine version. Allowed values: `0.12.0`, `0.15.1`.
`name`	string	no	Deployment name

Example

{
  "inference-engine-parameters": [
    "string"
  ],
  "inference-engine-version": "0.12.0",
  "name": "string"
}
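Since every field of the update body is optional, a client can send only the fields it wants to change. The sketch below builds such a body; it assumes the usual PATCH convention that omitted fields are left unchanged, which this reference does not state explicitly. `build_update_body` is a hypothetical helper.

```python
import json
from typing import Optional, Sequence

ENGINE_VERSIONS = {"0.12.0", "0.15.1"}  # allowed values listed in the table above


def build_update_body(
    name: Optional[str] = None,
    engine_version: Optional[str] = None,
    engine_parameters: Optional[Sequence[str]] = None,
) -> str:
    """Build a PATCH body containing only the fields that are not None."""
    body = {}
    if name is not None:
        body["name"] = name
    if engine_version is not None:
        if engine_version not in ENGINE_VERSIONS:
            raise ValueError(f"unsupported inference-engine-version: {engine_version}")
        body["inference-engine-version"] = engine_version
    if engine_parameters is not None:
        body["inference-engine-parameters"] = list(engine_parameters)
    if not body:
        raise ValueError("nothing to update")
    return json.dumps(body)


print(build_update_body(name="renamed"))  # {"name": "renamed"}
```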

Responses

Status: 200 (OK)

Content-Type: application/json

Property	Type	Description
`id`	string	Operation ID
`message`	string	Operation message
`reason`	string	Operation failure reason. Allowed values: `incorrect`, `unknown`, `unavailable`, `forbidden`, `busy`, `fault`, `partial`, `not-found`, `interrupted`, `unsupported`, `conflict`.
`reference`	Reference	Related resource reference
`state`	string	Operation status. Allowed values: `failure`, `pending`, `success`, `timeout`.

Example output

{
  "id": "string",
  "message": "string",
  "reason": "incorrect",
  "reference": {
    "command": "string",
    "id": "string",
    "link": "string"
  },
  "state": "failure"
}

SDK reference for update-deployment: golang | Python | Java

CLI: exo api update-deployment

delete-deployment

[BETA] Delete Deployment

DELETE /ai/deployment/{id}

Delete a deployment.

Path parameters

Name	In	Description
`id`	`path`	Deployment ID

Responses

Status: 200 (OK)

Content-Type: application/json

Property	Type	Description
`id`	string	Operation ID
`message`	string	Operation message
`reason`	string	Operation failure reason. Allowed values: `incorrect`, `unknown`, `unavailable`, `forbidden`, `busy`, `fault`, `partial`, `not-found`, `interrupted`, `unsupported`, `conflict`.
`reference`	Reference	Related resource reference
`state`	string	Operation status. Allowed values: `failure`, `pending`, `success`, `timeout`.

Example output

{
  "id": "string",
  "message": "string",
  "reason": "incorrect",
  "reference": {
    "command": "string",
    "id": "string",
    "link": "string"
  },
  "state": "failure"
}

SDK reference for delete-deployment: golang | Python | Java

CLI: exo api delete-deployment

Other Operations

list-ai-instance-types

List Instance Types

GET /ai/instance-type

List the available instance types, each with an authorization status based on GPU availability.

Responses

Status: 200 (OK)

Content-Type: application/json

Property	Type	Description
`instance-types`	array of Instance type with authorization status	Instance types, each with a `family` and an `authorized` flag

Example output

{
  "instance-types": [
    {
      "authorized": true,
      "family": "string"
    }
  ]
}
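A client can use this response to pick a GPU family it is allowed to deploy on. The sketch below keeps only entries with `authorized: true`; it assumes, without confirmation from this reference, that `family` values are the same strings accepted by the `gpu-type` field of create-deployment. `authorized_families` is a hypothetical helper.

```python
def authorized_families(response: dict) -> list[str]:
    """Return the `family` of every instance type marked `authorized: true`."""
    return [
        entry["family"]
        for entry in response.get("instance-types", [])
        if entry.get("authorized")
    ]


sample = {"instance-types": [
    {"family": "gpua5000", "authorized": True},
    {"family": "gpu3080ti", "authorized": False},
]}
print(authorized_families(sample))  # ['gpua5000']
```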

SDK reference for list-ai-instance-types: golang | Python | Java

CLI: exo api list-ai-instance-types

reveal-deployment-api-key

[BETA] Reveal Deployment API Key

GET /ai/deployment/{id}/api-key

Reveal the API key of a deployment.

Path parameters

Name	In	Description
`id`	`path`	Deployment ID

Responses

Status: 200 (OK)

Content-Type: application/json

Property	Type	Description
`api-key`	string	Deployment API key

Example output

{
  "api-key": "string"
}
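This reference does not say how the revealed key is presented to the deployment. The sketch below assumes the `deployment-url` serves vLLM's OpenAI-compatible API, which expects `Authorization: Bearer <key>` when the server is started with `--api-key`; both the header format and the `/v1/models` path are assumptions here. The helper only prepares the request and never sends it.

```python
import urllib.request


def build_inference_request(
    deployment_url: str, api_key: str, path: str = "/v1/models"
) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GET against a deployment.

    Assumption: the endpoint accepts the key as a bearer token, as vLLM's
    OpenAI-compatible server does. Treat the key as a secret.
    """
    url = deployment_url.rstrip("/") + path
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}, method="GET"
    )


req = build_inference_request("https://example.invalid/", "API_KEY")
print(req.full_url)  # https://example.invalid/v1/models
```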

SDK reference for reveal-deployment-api-key: golang | Python | Java

CLI: exo api reveal-deployment-api-key

get-deployment-logs

[BETA] Get Deployment Logs

GET /ai/deployment/{id}/logs

Return logs for the vLLM deployment (deploy/–deployment-vllm). Set the optional query parameter `stream=true` to request streaming; streaming may not be supported.

Path parameters

Name	In	Description
`id`	`path`	Deployment ID

Query parameters

Name	In	Required	Description
`stream`	`query`	no	Set to `true` to request streaming (may not be supported)
`tail`	`query`	no

Responses

Status: 200 (OK)

Content-Type: application/json

Property	Type	Description
`logs`	array of log entry	List of log entries, each with `time`, `node`, and `message`

Example output

{
  "logs": [
    {
      "message": "string",
      "node": "string",
      "time": "string"
    }
  ]
}

Status: 400 (Bad Request)

Content-Type: application/json

Property	Type	Description
`error`	string	Error description

Example output

{
  "error": "string"
}

Status: 404 (Not Found)

Content-Type: application/json

Property	Type	Description
`error`	string	Error description

Example output

{
  "error": "string"
}

Status: 500 (Internal Server Error)

Content-Type: application/json

Property	Type	Description
`error`	string	Error description

Example output

{
  "error": "string"
}
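The logs request takes two optional query parameters, and each returned entry carries `time`, `node`, and `message`. The sketch below builds the path and renders entries one per line. The meaning of `tail` is not documented here; treating it as a count of recent entries is an assumption, as are the helper names `logs_path` and `format_logs`.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def logs_path(deployment_id: str, tail: Optional[int] = None, stream: bool = False) -> str:
    """Build the path and query string for GET /ai/deployment/{id}/logs."""
    params = {}
    if tail is not None:
        # Assumption: `tail` limits the response to the most recent entries.
        params["tail"] = tail
    if stream:
        # The endpoint notes that streaming may not be supported.
        params["stream"] = "true"
    path = f"/ai/deployment/{quote(deployment_id, safe='')}/logs"
    return f"{path}?{urlencode(params)}" if params else path


def format_logs(response: dict) -> str:
    """Render each log entry as "time node message", one entry per line."""
    return "\n".join(
        f"{e.get('time', '')} {e.get('node', '')} {e.get('message', '')}"
        for e in response.get("logs", [])
    )


print(logs_path("my-deployment-id", tail=100))  # /ai/deployment/my-deployment-id/logs?tail=100
```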

SDK reference for get-deployment-logs: golang | Python | Java

CLI: exo api get-deployment-logs

list-deployments

[BETA] List Deployments

GET /ai/deployment

List deployments, with a summary of each one.

Responses

Status: 200 (OK)

Content-Type: application/json

Property	Type	Description
`deployments`	array of AI deployment	List of deployments

Example output

{
  "deployments": [
    {
      "created-at": "2024-01-01T12:00:00Z",
      "deployment-url": "string",
      "gpu-count": 1,
      "gpu-type": "string",
      "id": "string",
      "model": {
        "id": "string",
        "name": "string"
      },
      "name": "string",
      "replicas": 0,
      "service-level": "string",
      "state": "ready",
      "updated-at": "2024-01-01T12:00:00Z"
    }
  ]
}
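In the example output, list entries carry fewer fields than get-deployment returns (no `inference-engine-*` or `state-details`), so a common pattern is to scan the list and then fetch details for a specific entry. The sketch below finds a deployment by name and reports which deployments are not yet `ready`; both helpers are hypothetical.

```python
from typing import Optional


def find_deployment(response: dict, name: str) -> Optional[dict]:
    """Return the first deployment whose `name` matches, or None."""
    return next(
        (d for d in response.get("deployments", []) if d.get("name") == name),
        None,
    )


def names_not_ready(response: dict) -> list[str]:
    """Names (or IDs, if unnamed) of deployments whose state is not `ready`."""
    return [
        d.get("name") or d.get("id", "?")
        for d in response.get("deployments", [])
        if d.get("state") != "ready"
    ]


sample = {"deployments": [
    {"name": "chat", "state": "ready"},
    {"name": "embed", "state": "deploying"},
]}
print(names_not_ready(sample))  # ['embed']
```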

SDK reference for list-deployments: golang | Python | Java

CLI: exo api list-deployments

get-inference-engine-help

[BETA] Get Inference Engine Help

GET /ai/help/inference-engine-parameters

Get the list of allowed inference engine parameters, with their descriptions and allowed values.

Query parameters

Name	In	Required	Description
`version`	`query`	no	Inference engine version

Responses

Status: 200 (OK)

Content-Type: application/json

Property	Type	Description
`parameters`	array of inference-engine parameter definition	List of parameter definitions

Example output

{
  "parameters": [
    {
      "allowed-values": [
        "string"
      ],
      "default": "string",
      "description": "string",
      "flags": [
        "string"
      ],
      "name": "string",
      "section": "string",
      "type": "string"
    }
  ]
}
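This response can be used to catch typos in `inference-engine-parameters` before creating or updating a deployment. The sketch below assumes that each definition's `flags` list holds literal CLI spellings such as `--max-model-len` (a real vLLM flag), and that each argument string is either a flag, a `--flag=value` pair, or a bare value following a flag. Neither format is confirmed by this reference, and `unknown_flags` is a hypothetical helper.

```python
def unknown_flags(help_response: dict, engine_parameters: list[str]) -> list[str]:
    """Return flags in engine_parameters that the help response does not list."""
    known = {
        flag
        for definition in help_response.get("parameters", [])
        for flag in definition.get("flags", [])
    }
    unknown = []
    for arg in engine_parameters:
        if not arg.startswith("-"):
            continue  # treated as the value of the preceding flag
        # Note: a negative numeric value such as "-1" would be misread as a flag.
        flag = arg.split("=", 1)[0]  # strip an inline "=value"
        if flag not in known:
            unknown.append(flag)
    return unknown


help_response = {"parameters": [{"name": "max-model-len", "flags": ["--max-model-len"]}]}
print(unknown_flags(help_response, ["--max-model-len=4096", "--not-a-flag"]))  # ['--not-a-flag']
```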

SDK reference for get-inference-engine-help: golang | Python | Java

CLI: exo api get-inference-engine-help

Last updated on March 9, 2026