# Monitor and Troubleshoot Deployments
This guide covers how to monitor deployments, interpret logs, and resolve common issues.
## Monitoring Deployment Status

### List All Deployments
Get an overview of all deployments in a zone:
```
exo dedicated-inference deployment list -z at-vie-2
```

Output includes:
- Deployment name
- Model name
- GPU type and count
- Replica count
- Current status
### Check Specific Deployment
View detailed information about a single deployment:
```
exo dedicated-inference deployment show my-app -z at-vie-2
```

Key information:
- Deployment URL (endpoint for inference requests)
- Status (`deploying`, `ready`, `scaling`)
- GPU configuration (type, count per instance)
- Replica count (current and target)
- Creation and update timestamps
### Understanding Deployment States
| Status | Description | Expected Duration |
|---|---|---|
| `deploying` | Resources are being provisioned and the model is loading | 3-5 minutes (longer for large models) |
| `ready` | Deployment is running and accepting requests | Stable state |
| `scaling` | Replicas are being added or removed | 3-5 minutes per replica |
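If a script needs to wait for a deployment to reach `ready`, one option is to poll `deployment show` in a loop. This is a minimal sketch, assuming the status keyword appears in the plain-text output of `show`; adjust the match to the actual output format:

```
# Poll the deployment status every 30 seconds until it reports "ready".
# Assumes the status keyword appears in the plain-text output of `show`.
while ! exo dedicated-inference deployment show my-app -z at-vie-2 | grep -q ready; do
  echo "Not ready yet, waiting..."
  sleep 30
done
echo "Deployment is ready."
```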
## Viewing Deployment Logs
Logs provide detailed information about deployment health, model loading, and inference requests.
### Access Logs
```
exo dedicated-inference deployment logs my-app -z at-vie-2
```

### What Logs Show
- **Model Loading**: Track progress as the model downloads from Object Storage and loads into GPU memory.
- **GPU Memory Usage**: Identify if your model fits within available GPU memory.
- **Inference Requests**: Monitor incoming requests and response times.
- **Errors and Warnings**: Diagnose failures, misconfigurations, or resource constraints.
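When scanning long logs, you can pipe the output through standard shell tools to surface problems faster. A minimal sketch, assuming the logs are printed to stdout:

```
# Show only error and warning lines from the deployment logs
exo dedicated-inference deployment logs my-app -z at-vie-2 | grep -iE "error|warn"
```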
## Common Issues
Check logs first for any issue:
```
exo dedicated-inference deployment logs my-app -z at-vie-2

# Show last 100 lines
exo dedicated-inference deployment logs my-app -z at-vie-2 --tail 100
```

### Deployment stuck in `deploying`
| Error | Cause | Solution |
|---|---|---|
| Out of memory / CUDA out of memory | Model too large for GPU | Increase `--gpu-count` or use a larger GPU (requires a new deployment) |
| Failed to download model | Model name incorrect or connectivity issue | Verify model name, delete and recreate model |
| Quota exceeded | Insufficient GPU quota | Request increase via Portal (Organization → Quotas) |
### Inference requests failing
| Error | Cause | Solution |
|---|---|---|
| 401 Unauthorized | Invalid API key | Run `reveal-api-key` and update client |
| 404 Not Found | Wrong endpoint URL | Ensure URL includes `/v1` path |
| 400 Bad Request | Malformed request body | Check OpenAI API format |
| 500 errors | Deployment not ready | Wait for `ready` status |
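To rule out client-side issues, it can help to send a minimal request directly with curl. This is a sketch of an OpenAI-compatible chat completions call; `DEPLOYMENT_URL`, `API_KEY`, and the model name are placeholders, so substitute the values reported by `deployment show` and `reveal-api-key`:

```
# Minimal test request against the deployment endpoint.
# Placeholders: set DEPLOYMENT_URL and API_KEY, and use your deployed model's name.
curl "$DEPLOYMENT_URL/v1/chat/completions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

A successful completion confirms the endpoint, key, and request format; a 401 or 404 points back to the table above.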
### Slow performance
- High concurrency: Add replicas to handle more parallel requests:

  ```
  exo dedicated-inference deployment scale my-app 3 -z at-vie-2
  ```

- Large contexts or high parallelism: Increase `--gpu-count` per instance for a larger KV-cache (requires a new deployment).
- See the Optimize Performance guide.
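Before adding replicas or GPUs, it can be worth measuring how long a single request actually takes from your client. A rough sketch using curl's timing output; `DEPLOYMENT_URL`, `API_KEY`, and the model name are placeholders:

```
# Time a single inference request end to end (placeholder URL, key, and model name)
curl -o /dev/null -s -w "total: %{time_total}s\n" \
  "$DEPLOYMENT_URL/v1/chat/completions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "Hello"}]}'
```

If a single isolated request is already slow, adding replicas will not help; that points toward more GPU memory per instance instead.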
### Model creation failed
| Error | Cause | Solution |
|---|---|---|
| Format error | Model is not in safetensors format | Use a model in safetensors format |
| Access denied | Gated model, missing token | See Deploy Gated Model guide |
| Not found | Wrong model ID | Verify ID on Hugging Face |
To recover, delete the failed model and recreate it with the correct parameters.
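For illustration only, the recovery could look like the sketch below; the `model delete` and `model create` subcommand names and parameters are assumptions, so confirm the actual syntax with `exo dedicated-inference --help`:

```
# Assumed subcommand names and parameters -- verify with `exo dedicated-inference --help`
exo dedicated-inference model delete my-model -z at-vie-2
exo dedicated-inference model create my-model <corrected parameters> -z at-vie-2
```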
### High costs
Scale to zero when idle:
```
exo dedicated-inference deployment scale my-app 0 -z at-vie-2
```

See the Optimize Costs guide.
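If idle periods are predictable, the scale commands can also be scheduled, for example with cron on a machine where the exo CLI is configured. The schedule and replica counts below are purely illustrative:

```
# Illustrative crontab entries: scale to zero at 20:00, back to one replica at 07:00
0 20 * * * exo dedicated-inference deployment scale my-app 0 -z at-vie-2
0 7  * * * exo dedicated-inference deployment scale my-app 1 -z at-vie-2
```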
## Getting Help
Before contacting support, collect diagnostic info:
```
exo dedicated-inference deployment show my-app -z at-vie-2 > deployment-info.txt
exo dedicated-inference deployment logs my-app -z at-vie-2 > deployment-logs.txt
```

Open a support ticket via the Exoscale Portal.
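Optionally, bundle the collected files into a single archive to attach to the ticket:

```
# Package the diagnostic files for the support ticket
tar czf deployment-diagnostics.tar.gz deployment-info.txt deployment-logs.txt
```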
## Next Steps