Monitor and Troubleshoot Deployments

This guide covers how to monitor deployments, interpret logs, and resolve common issues.

Monitoring Deployment Status

List All Deployments

Get an overview of all deployments in a zone:

exo dedicated-inference deployment list -z at-vie-2

Output includes:

  • Deployment name
  • Model name
  • GPU type and count
  • Replica count
  • Current status

Check Specific Deployment

View detailed information about a single deployment:

exo dedicated-inference deployment show my-app -z at-vie-2

Key information:

  • Deployment URL (endpoint for inference requests)
  • Status (deploying, ready, scaling)
  • GPU configuration (type, count per instance)
  • Replica count (current and target)
  • Creation and update timestamps
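
Once the deployment reports a URL, you can sanity-check the endpoint directly. A minimal sketch, where $DEPLOYMENT_URL and $EXO_API_KEY are placeholders for the URL from the show output and your API key; since the endpoint follows the OpenAI API format, it should list served models under /v1/models:

# Expect a JSON list of served models if the deployment is ready
curl -s "$DEPLOYMENT_URL/v1/models" -H "Authorization: Bearer $EXO_API_KEY"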

Understanding Deployment States

  • deploying: resources are being provisioned and the model is loading. Typically 3-5 minutes (longer for large models).
  • ready: the deployment is running and accepting requests. This is the stable state.
  • scaling: replicas are being added or removed. Typically 3-5 minutes per replica.
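
To follow a deployment through these states, poll the show command. A simple approach using the standard watch utility (not a dedicated CLI feature):

# Refresh the status every 30 seconds until it reads "ready"
watch -n 30 exo dedicated-inference deployment show my-app -z at-vie-2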

Viewing Deployment Logs

Logs provide detailed information about deployment health, model loading, and inference requests.

Access Logs

exo dedicated-inference deployment logs my-app -z at-vie-2

What Logs Show

  • Model loading: track progress as the model downloads from Object Storage and loads into GPU memory.
  • GPU memory usage: identify whether your model fits within available GPU memory.
  • Inference requests: monitor incoming requests and response times.
  • Errors and warnings: diagnose failures, misconfigurations, or resource constraints.
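
Because the logs are a plain text stream, standard shell tools work for quick triage (exact message wording may vary):

# Surface only warnings and errors
exo dedicated-inference deployment logs my-app -z at-vie-2 | grep -iE "error|warn"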

Common Issues

For any issue, check the logs first:

exo dedicated-inference deployment logs my-app -z at-vie-2

# Show last 100 lines
exo dedicated-inference deployment logs my-app -z at-vie-2 --tail 100

Deployment stuck in “deploying”

  • Out of memory / CUDA out of memory: the model is too large for the GPU. Increase --gpu-count or use a larger GPU type (requires a new deployment).
  • Failed to download model: the model name is incorrect or there is a connectivity issue. Verify the model name, then delete and recreate the model.
  • Quota exceeded: insufficient GPU quota. Request an increase via the Portal (Organization → Quotas).
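
The logs usually name the cause directly. For example, to confirm the out-of-memory case (the grep pattern is illustrative; match it to what your logs actually print):

# Check recent log lines for GPU memory errors
exo dedicated-inference deployment logs my-app -z at-vie-2 --tail 100 | grep -i "out of memory"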

Inference requests failing

  • 401 Unauthorized: invalid API key. Run reveal-api-key and update your client.
  • 404 Not Found: wrong endpoint URL. Ensure the URL includes the /v1 path.
  • 400 Bad Request: malformed request body. Check that the request follows the OpenAI API format.
  • 500 errors: deployment not ready. Wait for the ready status.
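
A known-good request is the quickest way to rule out client-side problems. A minimal sketch of an OpenAI-format chat completion, with $DEPLOYMENT_URL, $EXO_API_KEY, and the model name as placeholders for your own values:

curl -s "$DEPLOYMENT_URL/v1/chat/completions" \
  -H "Authorization: Bearer $EXO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "Hello"}]}'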

Slow performance

  • High concurrency: add replicas to handle more parallel requests:
    exo dedicated-inference deployment scale my-app 3 -z at-vie-2
  • Large contexts or high parallelism: increase --gpu-count per instance (requires a new deployment) to allow a larger KV cache
  • See the Optimize Performance guide
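
To put numbers on "slow", measure end-to-end latency before and after scaling. A sketch using curl's built-in timing (same placeholder variables as above):

# Print total request time in seconds
curl -s -o /dev/null -w "%{time_total}\n" "$DEPLOYMENT_URL/v1/chat/completions" \
  -H "Authorization: Bearer $EXO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "Hello"}]}'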

Model creation failed

  • Format error: the model is not in safetensors format. Use a model published in safetensors format.
  • Access denied: gated model with a missing access token. See the Deploy Gated Model guide.
  • Not found: wrong model ID. Verify the ID on Hugging Face.

To recover, delete the failed model and recreate it with the correct parameters.
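
Before recreating, confirm the ID exists on Hugging Face. A sketch against the public Hugging Face API, with $MODEL_ID as a placeholder for the full ID (for example org/model-name):

# Returns JSON metadata when the ID is valid, an error message otherwise
curl -s "https://huggingface.co/api/models/$MODEL_ID"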

High costs

Scale to zero when idle:

exo dedicated-inference deployment scale my-app 0 -z at-vie-2

See Optimize Costs guide.
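
If idle periods are predictable, the scale command can be driven from a scheduler. A sketch using plain cron entries (times and names are examples; assumes the exo CLI is configured for non-interactive use):

# Scale to zero at 20:00, back to one replica at 07:00
0 20 * * * exo dedicated-inference deployment scale my-app 0 -z at-vie-2
0 7 * * *  exo dedicated-inference deployment scale my-app 1 -z at-vie-2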

Getting Help

Before contacting support, collect diagnostic info:

exo dedicated-inference deployment show my-app -z at-vie-2 > deployment-info.txt
exo dedicated-inference deployment logs my-app -z at-vie-2 > deployment-logs.txt
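
Both files can be attached to a ticket as-is; to send a single attachment, bundle them first (standard tar, not a CLI feature):

tar czf my-app-diagnostics.tar.gz deployment-info.txt deployment-logs.txt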

Open a support ticket via the Exoscale Portal.
