exo dedicated-inference deployment create
Description
This command creates an AI deployment on dedicated inference servers.
```
exo dedicated-inference deployment create [NAME] [flags]
```
Options
| Option | Description |
|---|---|
| --gpu-count | Number of GPUs (1-8) |
| --gpu-type | GPU type family (e.g., gpua5000, gpu3080ti) |
| --help, -h | help for create |
| --inference-engine-parameter-help | Show inference engine parameters help |
| --inference-engine-params | Space-separated inference engine server CLI arguments (e.g., "--gpu-memory-usage=0.8 --max-tokens=4096") |
| --inference-engine-version | Inference engine version |
| --model-id | Model ID (UUID) |
| --model-name | Model name (as created) |
| --replicas | Number of replicas (>=1) |
| --zone, -z | Deployment zone |
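As an illustration, a deployment might be created as follows. All values here (the deployment name, model name, and zone) are placeholders, not taken from this page; substitute your own.

```shell
# Hypothetical example: create a deployment named "my-llm" with 2 GPUs
# of the gpua5000 family and a single replica. Placeholder values only.
exo dedicated-inference deployment create my-llm \
    --gpu-type gpua5000 \
    --gpu-count 2 \
    --replicas 1 \
    --model-name my-model \
    --zone ch-gva-2
```

The model can be referenced either by `--model-name` (as created) or by `--model-id` (UUID); only one is needed.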
Options inherited from parent commands
| Option | Description |
|---|---|
| --config, -C | Specify an alternate config file [env EXOSCALE_CONFIG] |
| --output-format, -O | Output format (table|json|text), see "exo output --help" for more information |
| --output-template | Template to use if output format is "text" |
| --quiet, -Q | Quiet mode (disable non-essential command output) |
| --use-account, -A | Account to use in config file [env EXOSCALE_ACCOUNT] |
Related Commands
- deployment - Manage AI deployments