Use Horizontal Node Autoscaling

MetaKube Kubernetes clusters support horizontal node autoscaling out of the box. This tutorial shows how to configure and activate it in your cluster. Node autoscaling also works well together with horizontal pod autoscaling.

The node autoscaler supports scaling down to a single node; scaling down to zero nodes is currently not supported.

Clusters running Kubernetes 1.17 or lower can only scale up. Scaling down requires Kubernetes 1.18 or higher.
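
You can check which version your cluster's control plane is running before you start; look at the Server Version in the output:

$ kubectl version --short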

Prerequisites
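
This tutorial assumes you have a running MetaKube cluster and a kubectl that is configured with the cluster's downloaded kubeconfig (these assumptions follow from the commands used throughout the tutorial). A quick way to verify that kubectl can reach your cluster:

$ kubectl cluster-info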

Deploy an application

For easy cleanup, we create a new namespace for this tutorial:

$ kubectl create namespace hna-tutorial
namespace/hna-tutorial created

For this tutorial we will deploy an NGINX Hello World application to simulate a workload. To install the app, run:

$ kubectl run hello-app --image=nginxdemos/hello --port=80 --namespace hna-tutorial --requests="cpu=500m,memory=700Mi"
deployment.apps/hello-app created
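
Note: on newer kubectl releases, kubectl run creates a plain Pod rather than a Deployment and generator flags such as --requests have been removed. In that case, a roughly equivalent way to create the same Deployment is:

# Create the Deployment, then set the resource requests on its container
$ kubectl create deployment hello-app --image=nginxdemos/hello --port=80 --namespace hna-tutorial
$ kubectl set resources deployment hello-app --namespace hna-tutorial --requests=cpu=500m,memory=700Mi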

Check that the pod of the new application was created successfully and is running:

$ kubectl get pods --namespace hna-tutorial
NAME                           READY     STATUS    RESTARTS   AGE
hello-app-6f488fcdfc-n44wq     1/1       Running   0          9s

Note: we defined fairly high CPU and memory requests for our pod. This ensures that, when we scale the deployment up, we quickly reach the point where the scheduler can no longer place new pods due to insufficient resources.
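
You can read the requests back from the Deployment spec to verify them:

$ kubectl get deployment hello-app --namespace hna-tutorial --output jsonpath="{.spec.template.spec.containers[0].resources.requests}"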

Now let's scale the deployment manually. In a production scenario this can be done automatically using the horizontal pod autoscaler (a sketch follows below):

$ kubectl scale deployment/hello-app --replicas 15 --namespace hna-tutorial
deployment.extensions/hello-app scaled
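
In a production setup, this scaling step would typically be handled by a HorizontalPodAutoscaler instead. A minimal sketch, assuming the metrics server is available in the cluster and using an illustrative 80% CPU utilization target, could look like this:

cat <<EOF | kubectl apply -f -
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hello-app
  namespace: hna-tutorial
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello-app
  minReplicas: 1
  maxReplicas: 15
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
EOF

For this tutorial we stick to manual scaling so that the effect on the nodes is easy to follow.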

When you list all pods, you will notice several of them stuck in the "Pending" state:

$ kubectl get pods --namespace hna-tutorial
NAME                         READY   STATUS    RESTARTS   AGE
hello-app-6f488fcdfc-6pj74   1/1     Running   0          13s
hello-app-6f488fcdfc-9bb8h   1/1     Running   0          13s
hello-app-6f488fcdfc-9bn8l   1/1     Running   0          13s
hello-app-6f488fcdfc-n44wq   1/1     Running   0          3m47s
hello-app-6f488fcdfc-dbhq9   0/1     Pending   0          13s
hello-app-6f488fcdfc-dvnsd   0/1     Pending   0          13s
hello-app-6f488fcdfc-g2m8n   0/1     Pending   0          13s
hello-app-6f488fcdfc-hm8kp   0/1     Pending   0          13s
hello-app-6f488fcdfc-kfx95   0/1     Pending   0          13s
hello-app-6f488fcdfc-m2n8x   0/1     Pending   0          13s
hello-app-6f488fcdfc-mcnbc   0/1     Pending   0          13s
hello-app-6f488fcdfc-q5fmn   0/1     Pending   0          13s
hello-app-6f488fcdfc-spk6p   0/1     Pending   0          13s
hello-app-6f488fcdfc-xr6r8   0/1     Pending   0          13s
hello-app-6f488fcdfc-zx8st   0/1     Pending   0          13s
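
If you only want to list the pods that could not be scheduled, you can filter by phase:

$ kubectl get pods --namespace hna-tutorial --field-selector=status.phase=Pending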

Taking a closer look at one of the pending pods, you can see that the scheduling failed because of insufficient memory and CPU resources:

$ kubectl describe pod hello-app-6f488fcdfc-m2n8x --namespace hna-tutorial
Name:               hello-app-6f488fcdfc-m2n8x
Namespace:          hna-tutorial
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             pod-template-hash=6f488fcdfc
                    run=hello-app
Annotations:        <none>
Status:             Pending
IP:
Controlled By:      ReplicaSet/hello-app-6f488fcdfc
Containers:
  hello-app:
    Image:      nginxdemos/hello
    Port:       80/TCP
    Host Port:  0/TCP
    Requests:
      cpu:        500m
      memory:     700Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-6d8qc (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  default-token-6d8qc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-6d8qc
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                 From                Message
  ----     ------            ----                ----                -------
  Warning  FailedScheduling  36s (x25 over 81s)  default-scheduler   0/3 nodes are available: 1 Insufficient memory, 3 Insufficient cpu.
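
To see how much of the existing nodes' capacity is already reserved by resource requests, you can inspect the allocated resources per node:

$ kubectl describe nodes | grep -A 5 "Allocated resources"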

This means we need to increase the capacity of our cluster by adding more nodes. You could do this manually through the MetaKube dashboard (see Manage number of worker nodes) or via the CLI (see Managing worker nodes over CLI). In this tutorial, however, we will let the cluster autoscaler handle it automatically.

To do so, we have to create a scalable MachineDeployment in our cluster. You can either do this via the MetaKube dashboard by creating a new NodeDeployment and activating autoscaling:

[Screenshot: Configure Horizontal Node Autoscaler]

or with the CLI and kubectl:

# Choose the public key you want to deploy on the node
SSH_PUBLIC_KEY=$(cat ~/.ssh/id_rsa.pub)

# Getting the cluster name like this only works if you did not rename the
# context in the downloaded kubeconfig. If you did, choose the original name of
# the cluster (10 character alphanumeric string).
CLUSTER_NAME=$(kubectl config current-context)

# Set up the flavor for the node
FLAVOR="m1.small"

# Set up the correct region and availability zone
REGION="dbl"
AVAILABILITY_ZONE="dbl1"

OPERATING_SYSTEM="ubuntu"
IMAGE_NAME="Ubuntu Bionic 18.04 (2021-03-20)"
FLOATING_IP_POOL="ext-net"
K8S_VERSION="1.20.4"

cat <<EOF | kubectl apply -f -
apiVersion: cluster.k8s.io/v1alpha1
kind: MachineDeployment
metadata:
  annotations:
    cluster.k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
    cluster.k8s.io/cluster-api-autoscaler-node-group-max-size: "15"
  name: scalable-machine-deployment
  namespace: kube-system
spec:
  replicas: 1
  minReadySeconds: 0
  selector:
    matchLabels:
      deployment: scalable-machine-deployment
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  paused: false
  template:
    metadata:
      labels:
        deployment: scalable-machine-deployment
    spec:
      providerSpec:
        value:
          cloudProvider: openstack
          cloudProviderSpec:
            availabilityZone: ${AVAILABILITY_ZONE}
            domainName: ""
            flavor: ${FLAVOR}
            floatingIpPool: ${FLOATING_IP_POOL}
            identityEndpoint: "https://api.${REGION}.cloud.syseleven.net:5000/v3"
            image: "${IMAGE_NAME}"
            network: metakube-${CLUSTER_NAME}
            password: ""
            region: ${REGION}
            securityGroups:
            - metakube-${CLUSTER_NAME}
            tenantName: ""
            tokenId: ""
            username: ""
          operatingSystem: ${OPERATING_SYSTEM}
          operatingSystemSpec:
            distUpgradeOnBoot: false
          sshPublicKeys:
          - "${SSH_PUBLIC_KEY}"
      versions:
        kubelet: "${K8S_VERSION}"
EOF

Note: on clusters running Kubernetes 1.17 and below, the autoscaler labels are named cluster-autoscaler/max-size and cluster-autoscaler/min-size respectively.

This will create a MachineDeployment that initially contains a single machine and can be scaled up by the cluster autoscaler to a maximum of 15 machines, provided your project has enough quota. For every machine, a VM is automatically created in OpenStack, provisioned, and joined to the Kubernetes cluster as a node. For more details on MachineDeployments see Cluster Management API.
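
You can read back the autoscaler bounds that were set on the MachineDeployment from its annotations:

$ kubectl get machinedeployment --namespace kube-system scalable-machine-deployment --output jsonpath="{.metadata.annotations}"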

If you list all available machines, you can see that a few new machines have been automatically created by the autoscaler:

$ kubectl get machines --namespace kube-system
NAME                                           AGE
machine-metakube-fhgbvx65xg-7flj7              8d
machine-metakube-fhgbvx65xg-hmgd4              8d
machine-metakube-fhgbvx65xg-q287t              8d
scalable-machine-deployment-5c4cbbc47b-62wsd   4m
scalable-machine-deployment-5c4cbbc47b-zwrts   4m

The replica count of the MachineDeployment has been updated as well:

$ kubectl get machinedeployment --namespace kube-system scalable-machine-deployment --output jsonpath="{.spec.replicas}"
2
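
To see how many of these machines have already joined the cluster as ready nodes, you can query the status, whose fields mirror those of a regular Deployment (assuming the cluster.k8s.io/v1alpha1 API used above):

$ kubectl get machinedeployment --namespace kube-system scalable-machine-deployment --output jsonpath="{.status.readyReplicas}"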

After a few minutes, once the VMs are started and provisioned, new nodes will appear in the cluster as well:

$ kubectl get nodes
NAME                                           STATUS     ROLES    AGE     VERSION
metakube-fhgbvx65xg-7flj7                      Ready      <none>   8d      v1.20.4
metakube-fhgbvx65xg-hmgd4                      Ready      <none>   8d      v1.20.4
metakube-fhgbvx65xg-q287t                      Ready      <none>   8d      v1.20.4
scalable-machine-deployment-5c4cbbc47b-62wsd   Ready      <none>   2m58s   v1.20.4
scalable-machine-deployment-5c4cbbc47b-zwrts   Ready      <none>   2m30s   v1.20.4

Once enough nodes have been added, the previously pending pods are scheduled and start running:

$ kubectl get pods --namespace hna-tutorial
NAME                         READY   STATUS    RESTARTS   AGE
hello-app-6f488fcdfc-6pj74   1/1     Running   0          13s
hello-app-6f488fcdfc-9bb8h   1/1     Running   0          13s
hello-app-6f488fcdfc-9bn8l   1/1     Running   0          13s
hello-app-6f488fcdfc-n44wq   1/1     Running   0          3m47s
hello-app-6f488fcdfc-dbhq9   1/1     Running   0          13s
hello-app-6f488fcdfc-dvnsd   1/1     Running   0          13s
hello-app-6f488fcdfc-g2m8n   1/1     Running   0          13s
hello-app-6f488fcdfc-hm8kp   1/1     Running   0          13s
hello-app-6f488fcdfc-kfx95   1/1     Running   0          13s
hello-app-6f488fcdfc-m2n8x   1/1     Running   0          13s
hello-app-6f488fcdfc-mcnbc   1/1     Running   0          13s
hello-app-6f488fcdfc-q5fmn   1/1     Running   0          13s
hello-app-6f488fcdfc-spk6p   1/1     Running   0          13s
hello-app-6f488fcdfc-xr6r8   1/1     Running   0          13s
hello-app-6f488fcdfc-zx8st   1/1     Running   0          13s

This also works in the opposite direction. Let's scale the Hello World deployment down to a single pod:

$ kubectl scale deployment/hello-app --replicas 1 --namespace hna-tutorial
deployment.extensions/hello-app scaled

Now keep inspecting the number of pods, machines and nodes as described above. The number of pods will quickly drop to 1, and after a couple of minutes you will notice that underutilized nodes are drained and removed from the cluster.
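
One convenient way to keep an eye on all three at once is a simple watch loop (a sketch that assumes the watch utility is installed on your machine):

$ watch -n 10 "kubectl get pods --namespace hna-tutorial; kubectl get machines --namespace kube-system; kubectl get nodes"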

Note: scaling down is only available on Kubernetes 1.18 or higher. On older Kubernetes versions, the number of nodes has to be reduced manually.

Clean up

Delete the MachineDeployment to remove the created machines and VMs:

$ kubectl delete machinedeployment --namespace kube-system scalable-machine-deployment
machinedeployment.cluster.k8s.io "scalable-machine-deployment" deleted

Delete the namespace:

$ kubectl delete namespace hna-tutorial
namespace "hna-tutorial" deleted