# Autoscale SKS Node Pools

One of the main benefits of running your applications on Kubernetes is that the cluster can scale automatically with the current workload, without manual intervention.

In Kubernetes, one can scale:

* the pods themselves vertically by raising and lowering resource requests using
  the Vertical Pod Autoscaler
* the pods horizontally by raising and lowering the number of pods in a
  deployment using the Horizontal Pod Autoscaler
* the number of nodes by resizing the nodepool based on either node utilization
  or pod deployment requirements using the cluster Autoscaler

This guide covers the last option, the cluster Autoscaler.


## Prerequisites

To follow this guide, you need:

- An Exoscale SKS cluster on the Pro plan
- Access to your cluster via `kubectl`
- Basic Linux knowledge

If you do not have access to an SKS cluster, follow the [Quick Start Guide]({{< ref "/product/compute/instances/quick-start/">}}).

## Deploying the Cluster Autoscaler

In the [Kubernetes Autoscaler](https://github.com/kubernetes/autoscaler/) repository, you can find the Exoscale provider in the [./cluster-autoscaler/cloudprovider/exoscale](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/exoscale) folder.

As described in the README for that repository, you need to create a secret with an appropriate API key as well as the zone of your cluster:
```bash
export EXOSCALE_API_KEY="EXOxxxxxxxxxxxxxxxxxxxxxxxx"
export EXOSCALE_API_SECRET="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
export EXOSCALE_ZONE="ch-gva-2"

kubectl -n kube-system create secret generic exoscale-api-credentials \
   --from-literal=api-key="$EXOSCALE_API_KEY" \
   --from-literal=api-secret="$EXOSCALE_API_SECRET" \
   --from-literal=api-zone="$EXOSCALE_ZONE"
```
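
If one of the variables is unset, the command above can end up storing a blank value in the secret. A minimal guard, meant to run before `kubectl create secret`, could look like the following sketch. The function name `check_exoscale_env` is made up for this example and is not part of any Exoscale tooling.

```bash
# Hypothetical helper: fail early if a credential variable is unset or empty.
check_exoscale_env() {
  for name in EXOSCALE_API_KEY EXOSCALE_API_SECRET EXOSCALE_ZONE; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing or empty: $name" >&2
      return 1
    fi
  done
  return 0
}
```

Running `check_exoscale_env` first and continuing only when it succeeds keeps a half-filled secret out of the cluster.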

This API key **MUST** be authorized to perform the following API operations:

```bash
evict-sks-nodepool-members
get-instance
get-instance-pool
get-quota
list-sks-clusters
scale-sks-nodepool
```


Next, download the [deployment
manifest](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/exoscale/examples/cluster-autoscaler.yaml). Make
sure to modify the deployment `image` to match the Kubernetes version your
cluster is currently running.
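
The cluster Autoscaler project recommends using the release line that matches your Kubernetes minor version. As a rough sketch, the `major.minor` part can be extracted from a version string such as the `v1.29.6` that `kubectl get nodes` reports. The helper name `k8s_minor` is hypothetical.

```bash
# Hypothetical helper: print "major.minor" for a version like "v1.29.6".
k8s_minor() {
  version="${1#v}"        # drop the leading "v", if any
  major="${version%%.*}"  # text before the first dot
  rest="${version#*.}"    # text after the first dot
  minor="${rest%%.*}"     # text before the next dot
  printf '%s.%s\n' "$major" "$minor"
}

k8s_minor v1.29.6   # prints 1.29
```

For a cluster reporting `v1.29.6`, this points to a `1.29.x` cluster Autoscaler image. Check the autoscaler releases for the tags that actually exist.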

Then deploy the Autoscaler:
```bash
kubectl apply -f cluster-autoscaler.yaml
```

> [!NOTE]
> When testing, adjust the commented timeouts towards the end of the file to
> see scale-down happening within a minute or so, instead of the slower
> behavior you might want in a production deployment.


### Multiple nodepools

If your cluster has multiple nodepools, you might want to tell the cluster
Autoscaler which nodepool should be scaled up or down. Otherwise, a random
nodepool will be scaled.

Create a `ConfigMap` with your nodepool's **Instance Pool**
ID (**not** the SKS nodepool ID itself):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    50:
      - 00719c6e-1d06-4053-afea-8926c3431ef7
```

Then add the following argument to the autoscaler deployment:

```yaml
    - --expander=priority
```

The cluster Autoscaler will then target this prioritized nodepool instead.
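
To double-check that the flag actually reached the running deployment, its manifest can be searched for the argument. The sketch below assumes the deployment is named `cluster-autoscaler` in the `kube-system` namespace, as in the log command used later in this guide. `has_priority_expander` is a hypothetical helper that reads a manifest on standard input.

```bash
# Hypothetical helper: succeed if the manifest on stdin sets --expander=priority.
has_priority_expander() {
  grep -qF -e '--expander=priority'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl -n kube-system get deployment cluster-autoscaler -o yaml \
#     | has_priority_expander && echo "priority expander enabled"
```

According to the upstream priority expander documentation, higher numbers in the `priorities` map take precedence, and each list entry is treated as a regular expression matched against the node group identifier.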


## Putting it to the Test

To test that everything is working as it should, we deploy a `Deployment` that is
very busy, but still fits in our current setup of two nodes:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      run: stress
  template:
    metadata:
      labels:
        run: stress
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: run
                operator: In
                values:
                - stress
            topologyKey: kubernetes.io/hostname
      containers:
      - image: nixery.dev/stress:latest
        name: stress
        command:
        - stress
        - --cpu
        - "1"
        resources:
          limits:
            cpu: 300m
            memory: 30Mi
          requests:
            cpu: 150m
            memory: 15Mi
```

Note the `podAntiAffinity` rule, which ensures that no two of these pods share a
node. Such rules are a common real-world requirement, but a node simply filling
up with pods would result in similar behavior.

After this is deployed, each pod consumes CPU up to its limit, which the
Kubernetes metrics confirm:

```bash 
kubectl top pods
NAME                     CPU(cores)   MEMORY(bytes)
stress-d77bdc8db-22dwb   292m         0Mi
stress-d77bdc8db-pd4xn   291m         0Mi
```

We now use the Horizontal Pod Autoscaler to react to these metrics
and increase this deployment's replicas, up to a maximum of 11:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: stress
  namespace: default
spec:
  minReplicas: 1
  maxReplicas: 11
  metrics:
  - resource:
      name: cpu
      target:
        averageUtilization: 50
        type: Utilization
    type: Resource
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: stress
```
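
The HPA derives its target from the documented formula `desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization)`, where CPU utilization is measured against the pod's *request*, not its limit. The following back-of-the-envelope sketch applies that formula to the numbers above (about 292m used per pod, 150m requested, 50% target). It ignores the HPA's tolerance and rate limiting, and `hpa_desired` is a hypothetical helper, not a Kubernetes tool.

```bash
# Hypothetical helper: approximate the HPA replica calculation with integer math.
# Arguments: current replicas, usage per pod (millicores),
#            request per pod (millicores), target utilization (%), maxReplicas.
hpa_desired() {
  current=$1; usage_m=$2; request_m=$3; target_pct=$4; max=$5
  num=$(( current * usage_m * 100 ))
  den=$(( request_m * target_pct ))
  desired=$(( (num + den - 1) / den ))   # integer ceiling of num / den
  if [ "$desired" -gt "$max" ]; then
    desired=$max                         # never exceed maxReplicas
  fi
  echo "$desired"
}

hpa_desired 2 292 150 50 11   # prints 8
hpa_desired 8 292 150 50 11   # prints 11 (capped by maxReplicas)
```

Because each pod stays pinned at its limit, utilization remains near 195% of the request regardless of the replica count, so the HPA should eventually scale all the way to `maxReplicas`.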

We can see new pods being spawned and, after a short time, new nodes being added
(shown here with `kubectl get nodes`), as the pods cannot be scheduled onto an
existing node:

```bash
NAME                  STATUS                     ROLES    AGE    VERSION
pool-0f4cd-fquea      Ready                      <none>   3d6h   v1.29.6
pool-0f4cd-llrsh      NotReady                   <none>   16s    v1.29.6
pool-0f4cd-vhend      NotReady                   <none>   15s    v1.29.6
pool-0f4cd-ygwhj      Ready                      <none>   3d6h   v1.29.6
pool-0f4cd-zknko      NotReady                   <none>   15s    v1.29.6
[...]
```
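
To follow the scale-up without reading the table by eye, the `STATUS` column can be tallied. This is a minimal sketch, assuming the default column layout of `kubectl get nodes --no-headers` (name first, status second). `count_ready_nodes` is a hypothetical helper.

```bash
# Hypothetical helper: count lines whose second column is exactly "Ready".
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get nodes --no-headers | count_ready_nodes
```

Nodes reported as `Ready,SchedulingDisabled` are deliberately not counted, since they cannot accept new pods.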

You can look into the cluster Autoscaler logs to see it taking action:

```bash
kubectl -n kube-system logs deployment/cluster-autoscaler
```

Output:

```bash
[...]
I0324 19:50:12.521062       1 scale_up.go:675] Scale-up: setting group 0f4cd2ad-2825-4ae7-aaa2-1fd0f8e0af19 size to 5
I0324 19:50:12.530709       1 log.go:32] exoscale-provider: scaling SKS Nodepool afe7aa21-b4f6-409b-a70b-6fc31d6fada1 to size 5
[...]
```

When the new nodes are ready, the pods will start up on them. When you remove
the test deployment, superfluous nodes will be removed after a grace period.


## Tips and Tricks

* You can add `--v=4` to the list of arguments to get more information
  in the logs about why the cluster Autoscaler makes a certain decision.
* Certain pods can prevent the Autoscaler from removing a node. 
  See the [CA FAQ](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node)
  for more information.
* You can also [annotate certain nodes](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-can-i-prevent-cluster-autoscaler-from-scaling-down-a-particular-node), so they will not be touched by the CA.
* Longhorn users: note that the cluster Autoscaler is not fully supported by Longhorn. See the related issue in the [Longhorn repository](https://github.com/longhorn/longhorn/issues/2203).
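
The node annotation referred to in the list above is `cluster-autoscaler.kubernetes.io/scale-down-disabled=true`, as documented in the CA FAQ. A short sketch of setting and removing it follows. The wrapper names `protect_node` and `unprotect_node` are hypothetical, and the node name in the usage line is taken from the sample output earlier in this guide.

```bash
# Hypothetical wrapper: mark a node so the cluster Autoscaler skips it on scale-down.
protect_node() {
  kubectl annotate node "$1" cluster-autoscaler.kubernetes.io/scale-down-disabled=true
}

# Hypothetical wrapper: remove the annotation again (a trailing "-" deletes it).
unprotect_node() {
  kubectl annotate node "$1" cluster-autoscaler.kubernetes.io/scale-down-disabled-
}

# Usage against a live cluster (requires kubectl access):
#   protect_node pool-0f4cd-fquea
```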


> [!TIP]
> Are you interested in Kubernetes storage topics like Volumes, Persistent
> Volumes and Longhorn? Take a look at the __free__ [Certified Container
> Engineer Course][1] in our online academy.

[1]: https://academy.exoscale.com/courses/course-v1:Exoscale+CCE300+R2025/about