Autoscaling

The load on your workloads may fluctuate over time.
With horizontal node autoscaling, instead of running a fixed number of nodes, the autoscaler automatically scales the MachineDeployment to an appropriate number of replicas.

This is especially effective in combination with Pod autoscalers, such as the HorizontalPodAutoscaler built into Kubernetes.
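
For example, one way to pair the two autoscalers is to create a HorizontalPodAutoscaler imperatively with kubectl autoscale (the $namespace and $deployment values and the thresholds below are placeholders). When it adds Pod replicas that no longer fit onto the existing Nodes, the node autoscaler adds Nodes:

kubectl -n $namespace autoscale deployment $deployment --cpu-percent=70 --min=2 --max=20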

Enable autoscaling

You can enable autoscaling for a MachineDeployment using different clients.

Terraform

When using the MetaKube Terraform provider, you configure autoscaling by specifying the min_replicas and max_replicas fields:

resource "metakube_node_deployment" "nodes" {
  spec {
    min_replicas = 0
    max_replicas = 5
  }
}

UI

Check the "Use horizontal node autoscaling" box when creating or editing a MachineDeployment to enable autoscaling.

[Screenshot: "Configure Horizontal Node Autoscaler" dialog in the MetaKube UI]

Kubectl

Just add the following annotations to the MachineDeployment:

metadata:
  annotations:
    cluster.k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
    cluster.k8s.io/cluster-api-autoscaler-node-group-max-size: "15"
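
Assuming $name holds the MachineDeployment's name (as in the commands further below), one way to set or update these annotations is kubectl annotate:

kubectl -n kube-system annotate machinedeployment $name \
  cluster.k8s.io/cluster-api-autoscaler-node-group-min-size="1" \
  cluster.k8s.io/cluster-api-autoscaler-node-group-max-size="15" \
  --overwrite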

Behavior

Scaling up

If autoscaling is enabled, the autoscaler will scale up the MachineDeployment if all the following conditions are met:

  • The current number of replicas is lower than the maximum number of replicas

    1. Get current replica count

      kubectl -n kube-system get machinedeployment $name -o jsonpath='{.spec.replicas}'
    2. Get configured max replica count

      kubectl -n kube-system get machinedeployment $name -o jsonpath='{.metadata.annotations.cluster\.k8s\.io/cluster-api-autoscaler-node-group-max-size}'
  • Pods cannot be scheduled and remain in the Pending state due to limited resources

    Get Pods in Pending state:

    kubectl -n $namespace get pod --field-selector spec.nodeName==""
  • The Nodes managed by the MachineDeployment allow the Pods to be scheduled

    Inspect the MachineDeployment's taints and labels and check whether the Pod's tolerations and node selectors match them.

  • Adding a new Node will create enough free capacity to accommodate the Pods

    Inspect the Pod's containers' requests:

    kubectl -n $namespace get pod $pod -o jsonpath='{..requests}'

    Their sum must fit within a new Node's allocatable resources (see the commands after this list).
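
Two commands that can help when checking the last three conditions (here $pod is a Pending Pod and $node is an existing Node of the MachineDeployment; both are placeholders):

# The FailedScheduling events on a Pending Pod state why it cannot be scheduled
# (insufficient CPU/memory, untolerated taints, unmatched node selectors, ...).
kubectl -n $namespace describe pod $pod

# A Node's allocatable resources, to compare against the Pod's requests.
kubectl get node $node -o jsonpath='{.status.allocatable}'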

The autoscaler calculates the number of required Nodes to schedule all Pending Pods and will update the replica count of the MachineDeployment accordingly.

Scaling down

If autoscaling is enabled, the autoscaler will scale down the MachineDeployment if all the following conditions are met:

  • The current number of replicas is higher than the specified minimum number of replicas

    1. Get current replica count

      kubectl -n kube-system get machinedeployment $name -o jsonpath='{.spec.replicas}'
    2. Get configured min replica count

      kubectl -n kube-system get machinedeployment $name -o jsonpath='{.metadata.annotations.cluster\.k8s\.io/cluster-api-autoscaler-node-group-min-size}'
  • The last scale-up happened more than 2 minutes ago (see --scale-down-delay-after-add below)

  • The Nodes are utilized below the 50% threshold

    The autoscaler calculates utilization from the resource requests of the Pods running on a Node relative to the Node's allocatable capacity, not from actual usage (kubectl describe node shows the summed requests per Node). For a quick overview of actual Node usage:

    kubectl top no
  • All Pods can be scheduled on fewer nodes

    Pods may be blocked from eviction, for example by a PodDisruptionBudget (see the check after this list).
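
To see whether PodDisruptionBudgets currently block evictions (an ALLOWED DISRUPTIONS value of 0 means the Pods selected by that budget cannot be evicted):

kubectl -n $namespace get poddisruptionbudgets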

The autoscaler will simulate moving existing Pods to other Nodes and calculate candidates for removal.
It will remove underutilized Nodes one at a time.
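
To observe scaling decisions taking effect, you can watch Nodes join and leave the cluster (alongside the replica-count commands above):

kubectl get nodes -w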

Scaling up from zero

MetaKube supports scaling MachineDeployments down to zero and up from zero.
You may also use taints or rely on node labels on such MachineDeployments; the autoscaler takes them into account when deciding whether a scale-up from zero would make the Pending Pods schedulable, even though no Node exists yet to inspect.

Configuration

MetaKube runs the cluster autoscaler with the generic Cluster API provider plugin.
The autoscaler's version always matches the cluster's Kubernetes minor version.

We use the following additional configuration flags:

--scan-interval=1m                        # how often the cluster is re-evaluated for scaling
--scale-down-delay-after-add=2m           # how long after a scale-up before scale-down is considered again
--scale-down-unneeded-time=2m             # how long a Node must be unneeded before it may be removed
--scale-down-unready-time=2m              # how long an unready Node must be unneeded before it may be removed
--skip-nodes-with-local-storage=false     # Nodes running Pods with local storage may still be removed
--enforce-node-group-min-size=true        # scale up to the configured minimum size if the group is below it

We currently provide no way to change this configuration.
If you encounter issues or have special requirements, please contact us.

Autoscaling is not suitable for workloads that use host local storage.
Because of the --skip-nodes-with-local-storage=false flag, Nodes running Pods that use local storage, e.g. hostPath or emptyDir volumes, may still be considered candidates for removal when scaling down.
This is a deliberate decision: in our experience, the alternative very often leads to false positives and unnecessarily blocks scale-down.
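
If individual Pods must never be evicted by the autoscaler, the upstream cluster autoscaler honors the cluster-autoscaler.kubernetes.io/safe-to-evict annotation. Setting it on the Pod template of the owning workload is the usual approach so it survives Pod restarts, but for a quick test it can also be applied directly (the $namespace and $pod values are placeholders):

kubectl -n $namespace annotate pod $pod cluster-autoscaler.kubernetes.io/safe-to-evict="false"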

Troubleshooting

If your MachineDeployment isn't scaling up or down, carefully examine the conditions required for scaling up or down respectively.

If you still can't find a reason why the MachineDeployment isn't scaled up or down, please contact our support.
Please include the output of the above steps in your inquiry.
