Kubernetes offers Horizontal Pod Autoscaling, and MetaKube clusters automatically come with the necessary metrics-server and configuration out of the box. This tutorial walks you through an example of how to use it. For more general information, see the Kubernetes documentation on the Horizontal Pod Autoscaler.
This tutorial requires a MetaKube cluster and access to it via kubectl.
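If you want to verify that the metrics-server is actually serving metrics before you start, you can query the metrics API directly. This is an optional sanity check and assumes the metrics API is registered under its usual name, v1beta1.metrics.k8s.io:
$ kubectl get apiservice v1beta1.metrics.k8s.io
$ kubectl top nodes
The first command should show the API service as available, and the second should print the current CPU and memory usage per node.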
For easy cleanup we create a dedicated namespace for our tutorial:
$ kubectl create namespace hpa-tutorial
namespace/hpa-tutorial created
For our tutorial we will deploy a simple NGINX Hello World app. Run:
$ cat << 'EOF' | kubectl create -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: hello-app
  name: hello-app
  namespace: hpa-tutorial
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello-app
  template:
    metadata:
      labels:
        app: hello-app
    spec:
      containers:
      - name: hello
        image: nginxdemos/hello
        resources:
          requests:
            cpu: 100m
            memory: 256Mi
        ports:
        - containerPort: 80
EOF
deployment.apps/hello-app created
Check that the pod of the new application was created successfully and is running:
$ kubectl --namespace hpa-tutorial get pods --watch
NAME READY STATUS RESTARTS AGE
hello-app-5c7477d7b7-n44wq 1/1 Running 0 9s
Once the pod is running, we expose the deployment with a load balancer so that we can reach it from outside the cluster:
$ kubectl --namespace hpa-tutorial expose deployment hello-app --name hello-app-svc --port 80 --target-port 80 --type LoadBalancer
service/hello-app-svc exposed
Check that the service received an external IP address; this can take a few seconds:
$ kubectl --namespace hpa-tutorial get service hello-app-svc --watch
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hello-app-svc LoadBalancer 10.10.10.102 195.192.xxx.xxx 80:31750/TCP 37s
Check that you can successfully access the service:
$ curl http://195.192.xxx.xxx/ -I
HTTP/1.1 200 OK
Server: nginx/1.13.8
Date: Mon, 27 Aug 2018 11:53:46 GMT
Content-Type: text/html
Connection: keep-alive
Expires: Mon, 27 Aug 2018 11:53:45 GMT
Cache-Control: no-cache
To configure the autoscaler, run:
$ kubectl --namespace hpa-tutorial autoscale deployment hello-app --min 1 --max 6 --cpu-percent 5
horizontalpodautoscaler.autoscaling/hello-app autoscaled
In real life you would not use 5% CPU as a target, but such a low value makes it easy to see the effect of the autoscaler in a tutorial setting. The target is measured as a percentage of the CPU request we set in the deployment (100m), which is why the resources.requests section is required for the autoscaler to work. Have a look at the official Kubernetes documentation for more information about possible target settings. You can check that the autoscaler was created with:
$ kubectl --namespace hpa-tutorial get horizontalpodautoscaler hello-app
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
hello-app Deployment/hello-app <unknown>/5% 1 6 2 37s
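If you prefer a declarative setup, the same autoscaler can also be created from a manifest instead of the kubectl autoscale command. A minimal sketch using the autoscaling/v1 API that is equivalent to the command above:
$ cat << 'EOF' | kubectl apply -f -
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hello-app
  namespace: hpa-tutorial
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello-app
  minReplicas: 1
  maxReplicas: 6
  targetCPUUtilizationPercentage: 5
EOF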
To simulate load we can use a tool like ab (Apache Bench) to create traffic to our service:
$ ab -c 100 -n 1000 http://195.192.xxx.xxx/
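If you do not have ab installed locally, a temporary pod inside the cluster can generate similar traffic. A rough sketch, using an arbitrary pod name (load-generator) and the in-cluster service name instead of the external IP:
$ kubectl --namespace hpa-tutorial run load-generator --rm -it --image=busybox --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://hello-app-svc > /dev/null; done"
Stop it with Ctrl+C when you are done; the pod is removed automatically because of --rm.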
After running the above command a few times (the autoscaler needs enough metrics data before it decides to scale up), you should see new pods being created:
$ kubectl --namespace hpa-tutorial get pods --watch
NAME READY STATUS RESTARTS AGE
hello-app-5d6bcff5dd-frdnl 1/1 Running 0 4s
hello-app-5d6bcff5dd-bjkcj 0/1 Pending 0 0s
hello-app-5d6bcff5dd-bjkcj 0/1 Pending 0 0s
hello-app-5d6bcff5dd-948xr 0/1 Pending 0 0s
hello-app-5d6bcff5dd-mdh9n 0/1 Pending 0 0s
hello-app-5d6bcff5dd-948xr 0/1 Pending 0 1s
hello-app-5d6bcff5dd-mdh9n 0/1 Pending 0 1s
hello-app-5d6bcff5dd-bjkcj 0/1 ContainerCreating 0 1s
hello-app-5d6bcff5dd-948xr 0/1 ContainerCreating 0 1s
hello-app-5d6bcff5dd-mdh9n 0/1 ContainerCreating 0 1s
hello-app-5d6bcff5dd-bjkcj 0/1 ContainerCreating 0 3s
hello-app-5d6bcff5dd-mdh9n 0/1 ContainerCreating 0 3s
hello-app-5d6bcff5dd-948xr 0/1 ContainerCreating 0 3s
hello-app-5d6bcff5dd-bjkcj 1/1 Running 0 4s
hello-app-5d6bcff5dd-mdh9n 1/1 Running 0 7s
hello-app-5d6bcff5dd-948xr 1/1 Running 0 8s
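You can also watch the autoscaler itself while the load test is running. Once the load stops, the reported utilization drops again and the deployment is scaled back down after the downscale delay, which is roughly five minutes by default:
$ kubectl --namespace hpa-tutorial get horizontalpodautoscaler hello-app --watch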
For more information on the decisions the autoscaler made, you can check its events:
$ kubectl --namespace hpa-tutorial describe horizontalpodautoscaler hello-app
Name: hello-app
Namespace: hpa-tutorial
Labels: <none>
Annotations: <none>
CreationTimestamp: Mon, 27 Aug 2018 14:04:46 +0200
Reference: Deployment/hello-app
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): 0% (0) / 5%
Min replicas: 1
Max replicas: 6
Deployment pods: 4 current / 4 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale False BackoffBoth the time since the previous scale is still within both the downscale and upscale forbidden windows
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited True TooFewReplicas the desired replica count is more than the maximum replica count
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedGetResourceMetric 2m (x2 over 3m) horizontal-pod-autoscaler unable to get metrics for resource cpu: no metrics returned from resource metrics API
Warning FailedComputeMetricsReplicas 2m (x2 over 3m) horizontal-pod-autoscaler failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
Normal SuccessfulRescale 2m horizontal-pod-autoscaler New size: 4; reason: cpu resource utilization (percentage of request) above target
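If you are only interested in the autoscaler's events rather than the full describe output, you can also filter the namespace events by the involved object:
$ kubectl --namespace hpa-tutorial get events --field-selector involvedObject.kind=HorizontalPodAutoscaler,involvedObject.name=hello-app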
Finally, clean up by deleting the namespace:
$ kubectl delete namespace hpa-tutorial
namespace "hpa-tutorial" deleted