Deployment

Deployments are loaded model instances ready for inference.

scale-deployment

[BETA] Scale Deployment

POST /ai/deployment/{id}/scale

Path parameters

| Name | In | Description |
| --- | --- | --- |
| id | path | |

Request body

Content-Type: application/json

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| replicas | integer | yes | Number of replicas (>=0) |

Example
{
  "replicas": 0
}

Responses

Status: 200

Content-Type: application/json

| Property | Type | Description |
| --- | --- | --- |
| id | string | Operation ID |
| message | string | Operation message |
| reason | string | Operation failure reason. Allowed values: incorrect, unknown, unavailable, forbidden, busy, fault, partial, not-found, interrupted, unsupported, conflict. |
| reference | object | Related resource reference |
| state | string | Operation status. Allowed values: failure, pending, success, timeout. |

Example output
{
  "id": "string",
  "message": "string",
  "reason": "incorrect",
  "reference": {
    "command": "string",
    "id": "string",
    "link": "string"
  },
  "state": "failure"
}

SDK reference for scale-deployment: golang | Python | Java

CLI: exo api scale-deployment
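
The request above can be sketched in Python as follows. This is a minimal illustration, not an official client: `BASE_URL` is a placeholder, `build_scale_request` is a hypothetical helper name, and request signing or authentication is left out entirely because this page does not describe it. The function only builds the request object and never sends it.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder base URL; substitute the real API endpoint.
BASE_URL = "https://api.example.invalid/v2"


def build_scale_request(deployment_id: str, replicas: int,
                        base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request that sets the replica count."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(replicas, bool) or not isinstance(replicas, int) or replicas < 0:
        raise ValueError("replicas must be an integer >= 0")
    safe_id = urllib.parse.quote(deployment_id, safe="")
    body = json.dumps({"replicas": replicas}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/ai/deployment/{safe_id}/scale",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Scale a (hypothetical) deployment down to zero replicas.
req = build_scale_request("my-deployment-id", 0)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Validating `replicas` before building the request mirrors the documented `>=0` constraint, so an invalid value fails locally instead of on the server.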

create-deployment

[BETA] Create Deployment

POST /ai/deployment

Deploys a model on an inference server.

Request body

Content-Type: application/json

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| gpu-count | integer | yes | Number of GPUs (1-8) |
| gpu-type | string | yes | GPU type family (e.g., gpua5000, gpu3080ti) |
| replicas | integer | yes | Number of replicas (>=1) |
| inference-engine-parameters | array[string] | no | Optional extra inference engine server CLI args |
| model | object | no | Model object with id and name fields |
| name | string | no | Deployment name |

Example
{
  "gpu-count": 1,
  "gpu-type": "string",
  "inference-engine-parameters": [
    "string"
  ],
  "model": {
    "id": "string",
    "name": "string"
  },
  "name": "string",
  "replicas": 1
}

Responses

Status: 200

Content-Type: application/json

| Property | Type | Description |
| --- | --- | --- |
| id | string | Operation ID |
| message | string | Operation message |
| reason | string | Operation failure reason. Allowed values: incorrect, unknown, unavailable, forbidden, busy, fault, partial, not-found, interrupted, unsupported, conflict. |
| reference | object | Related resource reference |
| state | string | Operation status. Allowed values: failure, pending, success, timeout. |

Example output
{
  "id": "string",
  "message": "string",
  "reason": "incorrect",
  "reference": {
    "command": "string",
    "id": "string",
    "link": "string"
  },
  "state": "failure"
}

Status: 400

Content-Type: application/json

| Property | Type | Description |
| --- | --- | --- |
| error | string | Error description |

Example output
{
  "error": "string"
}

SDK reference for create-deployment: golang | Python | Java

CLI: exo api create-deployment
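
As an illustration of the request body constraints, the sketch below assembles a create-deployment payload and rejects values outside the documented ranges before anything is sent. `build_create_payload` is a hypothetical helper. Which of the `model` fields (`id`, `name`, or both) the server expects is not stated on this page, so the helper simply forwards whichever is given. The model name and the engine argument in the example are made-up placeholders.

```python
import json
from typing import Optional, Sequence


def build_create_payload(gpu_type: str, gpu_count: int, replicas: int,
                         model_id: Optional[str] = None,
                         model_name: Optional[str] = None,
                         name: Optional[str] = None,
                         engine_args: Sequence[str] = ()) -> dict:
    """Return a JSON-serializable create-deployment body."""
    if not 1 <= gpu_count <= 8:
        raise ValueError("gpu-count must be between 1 and 8")
    if replicas < 1:
        raise ValueError("replicas must be >= 1")

    payload = {"gpu-type": gpu_type, "gpu-count": gpu_count, "replicas": replicas}

    # Only include optional properties that were actually provided.
    model = {}
    if model_id is not None:
        model["id"] = model_id
    if model_name is not None:
        model["name"] = model_name
    if model:
        payload["model"] = model
    if name is not None:
        payload["name"] = name
    if engine_args:
        payload["inference-engine-parameters"] = list(engine_args)
    return payload


payload = build_create_payload(
    gpu_type="gpua5000",
    gpu_count=1,
    replicas=1,
    model_name="example-org/example-model",   # placeholder model name
    name="demo-deployment",
    engine_args=["--example-flag=value"],      # placeholder engine argument
)
print(json.dumps(payload, indent=2))
```

Keeping the hyphenated property names as plain dictionary keys avoids a translation layer between Python identifiers and the JSON body.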

get-deployment

[BETA] Get Deployment

GET /ai/deployment/{id}

Path parameters

| Name | In | Description |
| --- | --- | --- |
| id | path | |

Responses

Status: 200

Content-Type: application/json

| Property | Type | Description |
| --- | --- | --- |
| created-at | string | Creation time |
| deployment-url | string | Deployment URL (nullable) |
| gpu-count | integer | Number of GPUs |
| gpu-type | string | GPU type family |
| id | string | Deployment ID |
| inference-engine-parameters | array[string] | Optional extra inference engine server CLI args |
| model | object | Model object with id and name fields |
| name | string | Deployment name |
| replicas | integer | Number of replicas (>=0) |
| service-level | string | Service level |
| status | string | Deployment status. Allowed values: ready, creating, error, deploying. |
| status-details | string | Deployment status details |
| updated-at | string | Update time |

Example output
{
  "created-at": "2024-01-01T12:00:00Z",
  "deployment-url": "string",
  "gpu-count": 0,
  "gpu-type": "string",
  "id": "string",
  "inference-engine-parameters": [
    "string"
  ],
  "model": {
    "id": "string",
    "name": "string"
  },
  "name": "string",
  "replicas": 0,
  "service-level": "string",
  "status": "ready",
  "status-details": "string",
  "updated-at": "2024-01-01T12:00:00Z"
}

Status: 404

Content-Type: application/json

| Property | Type | Description |
| --- | --- | --- |
| error | string | Error description |

Example output
{
  "error": "string"
}

SDK reference for get-deployment: golang | Python | Java

CLI: exo api get-deployment
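
Because `status` moves through `creating` and `deploying` before reaching `ready` or `error`, a client will often poll this endpoint. The sketch below shows one way to do that. `wait_until_ready` is a hypothetical helper, and `fetch` stands for any callable that performs the GET request and returns the decoded JSON body; the timeout and polling interval are arbitrary assumptions. The example drives it with canned responses instead of real HTTP calls.

```python
import time
from typing import Callable


def wait_until_ready(fetch: Callable[[], dict],
                     timeout_s: float = 600.0,
                     interval_s: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll fetch() until the deployment status is 'ready'.

    Raises RuntimeError on status 'error' and TimeoutError when the
    deadline passes while the deployment is still creating or deploying.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        deployment = fetch()
        status = deployment.get("status")
        if status == "ready":
            return deployment
        if status == "error":
            details = deployment.get("status-details")
            raise RuntimeError(f"deployment failed: {details}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"deployment still {status!r} after {timeout_s}s")
        sleep(interval_s)


# Demo with canned responses; a real fetch would call GET /ai/deployment/{id}.
_canned = iter([
    {"status": "creating"},
    {"status": "deploying"},
    {"status": "ready", "deployment-url": "https://deployment.example.invalid"},
])
result = wait_until_ready(lambda: next(_canned), sleep=lambda _s: None)
print(result["deployment-url"])
```

Injecting `fetch` and `sleep` keeps the loop independent of any particular HTTP client and makes it testable without waiting.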

delete-deployment

[BETA] Delete Deployment

DELETE /ai/deployment/{id}

Path parameters

| Name | In | Description |
| --- | --- | --- |
| id | path | |

Responses

Status: 200

Content-Type: application/json

| Property | Type | Description |
| --- | --- | --- |
| id | string | Operation ID |
| message | string | Operation message |
| reason | string | Operation failure reason. Allowed values: incorrect, unknown, unavailable, forbidden, busy, fault, partial, not-found, interrupted, unsupported, conflict. |
| reference | object | Related resource reference |
| state | string | Operation status. Allowed values: failure, pending, success, timeout. |

Example output
{
  "id": "string",
  "message": "string",
  "reason": "incorrect",
  "reference": {
    "command": "string",
    "id": "string",
    "link": "string"
  },
  "state": "failure"
}

SDK reference for delete-deployment: golang | Python | Java

CLI: exo api delete-deployment
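
The scale, create, and delete endpoints all return the same operation object, so a client can interpret it in one place. The following sketch maps the four documented `state` values to an outcome. `resolve_operation` and `OperationError` are hypothetical names, and treating `timeout` the same way as `failure` is a design assumption rather than documented behavior.

```python
from typing import Optional


class OperationError(Exception):
    """Raised when an operation ends in the failure or timeout state."""


def resolve_operation(op: dict) -> Optional[dict]:
    """Return the reference of a successful operation, or None while pending."""
    state = op.get("state")
    if state == "success":
        return op.get("reference", {})
    if state == "pending":
        return None
    if state in ("failure", "timeout"):
        reason = op.get("reason", "unknown")
        message = op.get("message", "")
        raise OperationError(f"operation {op.get('id')} {state}: {reason}: {message}")
    raise ValueError(f"unexpected operation state: {state!r}")


# Example with a response shaped like the documented output.
sample = {
    "id": "op-123",
    "message": "deployment removed",
    "reference": {"command": "delete-deployment", "id": "dep-456", "link": "/ai/deployment/dep-456"},
    "state": "success",
}
print(resolve_operation(sample))
```

Returning `None` for `pending` lets the caller decide whether to poll again, while failures surface as exceptions that carry the documented `reason` value.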


Other Operations

reveal-deployment-api-key

[BETA] Reveal Deployment API Key

GET /ai/deployment/{id}/api-key

Path parameters

| Name | In | Description |
| --- | --- | --- |
| id | path | |

Responses

Status: 200

Content-Type: application/json

| Property | Type | Description |
| --- | --- | --- |
| api-key | string | |

Example output
{
  "api-key": "string"
}

SDK reference for reveal-deployment-api-key: golang | Python | Java

CLI: exo api reveal-deployment-api-key

get-deployment-logs

[BETA] Get Deployment Logs

GET /ai/deployment/{id}/logs

Returns logs for the vLLM deployment (deploy/–deployment-vllm). Pass ?stream=true to request streaming; streaming may not be supported.

Path parameters

| Name | In | Description |
| --- | --- | --- |
| id | path | |

Query parameters

| Name | In | Required | Description |
| --- | --- | --- | --- |
| tail | query | no | |

Responses

Status: 200

Content-Type: application/json

| Property | Type | Description |
| --- | --- | --- |
| logs | array[object] | List of log entries |

Example output
{
  "logs": [
    {
      "message": "string",
      "node": "string",
      "time": "string"
    }
  ]
}

Status: 404

Content-Type: application/json

| Property | Type | Description |
| --- | --- | --- |
| error | string | Error description |

Example output
{
  "error": "string"
}

Status: 500

Content-Type: application/json

| Property | Type | Description |
| --- | --- | --- |
| error | string | Error description |

Example output
{
  "error": "string"
}

SDK reference for get-deployment-logs: golang | Python | Java

CLI: exo api get-deployment-logs
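
A small sketch of how the query string and the response might be handled. `logs_path` and `format_logs` are hypothetical helpers. The page does not document the type of `tail`, so treating it as a number of lines is an assumption, as is the one-line output format.

```python
from typing import List, Optional
from urllib.parse import quote, urlencode


def logs_path(deployment_id: str, tail: Optional[int] = None,
              stream: bool = False) -> str:
    """Build the request path, adding only the query parameters that are set."""
    path = f"/ai/deployment/{quote(deployment_id, safe='')}/logs"
    query = {}
    if tail is not None:
        query["tail"] = tail          # assumption: number of trailing lines
    if stream:
        query["stream"] = "true"      # streaming may not be supported
    return f"{path}?{urlencode(query)}" if query else path


def format_logs(response: dict) -> List[str]:
    """Render each log entry as 'time [node] message'."""
    return [
        f"{entry.get('time', '')} [{entry.get('node', '')}] {entry.get('message', '')}"
        for entry in response.get("logs", [])
    ]


print(logs_path("my-deployment-id", tail=100))
sample = {"logs": [{"message": "server started", "node": "node-1", "time": "2024-01-01T12:00:00Z"}]}
for line in format_logs(sample):
    print(line)
```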

list-deployments

[BETA] List Deployments

GET /ai/deployment

Responses

Status: 200

Content-Type: application/json

| Property | Type | Description |
| --- | --- | --- |
| deployments | array[object] | List of deployments |

Example output
{
  "deployments": [
    {
      "created-at": "2024-01-01T12:00:00Z",
      "deployment-url": "string",
      "gpu-count": 0,
      "gpu-type": "string",
      "id": "string",
      "model": {
        "id": "string",
        "name": "string"
      },
      "name": "string",
      "replicas": 0,
      "service-level": "string",
      "status": "ready",
      "updated-at": "2024-01-01T12:00:00Z"
    }
  ]
}

SDK reference for list-deployments: golang | Python | Java

CLI: exo api list-deployments
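
One common use of the list response is finding the deployments that can serve traffic. The sketch below is a hypothetical helper (`ready_endpoints`) that keeps entries whose `status` is `ready` and whose `deployment-url` is set, keyed by deployment `id`. Skipping entries without a URL follows from `deployment-url` being documented as nullable on get-deployment; assuming the same holds here is a guess.

```python
from typing import Dict


def ready_endpoints(response: dict) -> Dict[str, str]:
    """Map deployment id to URL for every deployment that is ready and has a URL."""
    return {
        d["id"]: d["deployment-url"]
        for d in response.get("deployments", [])
        if d.get("status") == "ready" and d.get("deployment-url")
    }


sample = {
    "deployments": [
        {"id": "dep-1", "name": "a", "status": "ready",
         "deployment-url": "https://a.example.invalid"},
        {"id": "dep-2", "name": "b", "status": "deploying", "deployment-url": None},
    ]
}
print(ready_endpoints(sample))
```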

get-inference-engine-help

[BETA] Get inference-engine Help

GET /ai/help/inference-engine-parameters

Returns the list of allowed inference engine parameters with their descriptions, types, allowed values, and defaults.

Responses

Status: 200

Content-Type: application/json

| Property | Type | Description |
| --- | --- | --- |
| parameters | array[object] | List of allowed inference engine parameters |

Example output
{
  "parameters": [
    {
      "allowed-values": [
        "string"
      ],
      "default": "string",
      "description": "string",
      "flags": [
        "string"
      ],
      "name": "string",
      "section": "string",
      "type": "string"
    }
  ]
}

Status: 500

Content-Type: application/json

| Property | Type | Description |
| --- | --- | --- |
| error | string | Error description |

Example output
{
  "error": "string"
}

SDK reference for get-inference-engine-help: golang | Python | Java

CLI: exo api get-inference-engine-help
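
The help response could be used to check `inference-engine-parameters` before creating a deployment. The sketch below assumes that each entry's `flags` array holds the CLI spellings of a parameter (for example a long and a short form) and that values are passed either as `--flag=value` or as a separate following argument. Neither detail is confirmed on this page, and `unknown_flags` is a hypothetical helper.

```python
from typing import Iterable, List


def unknown_flags(help_response: dict, args: Iterable[str]) -> List[str]:
    """Return the flags in args that the help response does not list."""
    allowed = {
        flag
        for param in help_response.get("parameters", [])
        for flag in param.get("flags", [])
    }
    unknown = []
    for arg in args:
        if not arg.startswith("-"):
            continue  # assumed to be the value of the preceding flag
        flag = arg.split("=", 1)[0]
        if flag not in allowed:
            unknown.append(flag)
    return unknown


# Made-up help response and arguments, shaped like the documented output.
sample_help = {
    "parameters": [
        {"name": "alpha", "flags": ["--alpha", "-a"], "type": "string"},
    ]
}
print(unknown_flags(sample_help, ["--alpha=1", "-a", "3", "--beta"]))
```

Checking locally gives faster feedback than waiting for the create request to be rejected, although the server remains the authority on what is accepted.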
