9.8. Horizontal Pod Autoscaler (HPA)
The Horizontal Pod Autoscaler (HPA) in Kubernetes is a feature that automatically scales the number of pods in a deployment, replica set, or stateful set based on observed CPU utilization, memory usage, or custom metrics.
HPA continuously monitors the resource usage of pods and adjusts the number of replicas to maintain a desired performance level. The scaling process is based on metrics collected from the Kubernetes Metrics Server or external monitoring systems like Prometheus.
For more details, see also the Kubernetes documentation on Horizontal Pod Autoscaling .
Task 9.8.1: Create a Deployment, Service and the HPA
Let’s try this out, first, we create a new Deployment with the file deploy-hpa.yaml
.
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: hpa-demo-deployment
labels:
run: hpa-demo-deployment
spec:
selector:
matchLabels:
run: hpa-demo-deployment
replicas: 1
template:
metadata:
labels:
run: hpa-demo-deployment
spec:
containers:
- name: hpa-demo-deployment
image: k8s.gcr.io/hpa-example
ports:
- containerPort: 80
resources:
limits:
cpu: 500m
requests:
cpu: 200m
And a service in svc-hpa.yaml
to connect to our pods:
---
apiVersion: v1
kind: Service
metadata:
name: hpa-demo-deployment
labels:
run: hpa-demo-deployment
spec:
ports:
- port: 80
selector:
run: hpa-demo-deployment
And finally for the HPA to do its job, we also have to deploy the HPA object in hpa.yaml
:
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: hpa-demo-deployment
labels:
run: hpa-demo-deployment
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: hpa-demo-deployment
minReplicas: 1
maxReplicas: 10
targetCPUUtilizationPercentage: 50
Apply all those files with:
cat *hpa.yaml | kubectl apply -f -
Task 9.8.2: Trigger the HPA
To see our HPA in action, lets generate some traffic on our hpa-demo-deployment in a seperate terminal: We use a simple while loop with a wget
call to our hpa-demo-deployment
service:
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never --namespace <namespace> -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://hpa-demo-deployment; done"
Now lets watch how the HPA increases the replica count of our Deployment:
watch kubectl get deploy,pod,hpa -l run=hpa-demo-deployment --namespace <namespace>
At beginn, you just have one Pod:
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/hpa-demo-deployment 1/1 1 1 42h
NAME READY STATUS RESTARTS AGE
pod/hpa-demo-deployment-9cc6d54b5-kprvn 1/1 Running 0 42h
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
horizontalpodautoscaler.autoscaling/hpa-demo-deployment Deployment/hpa-demo-deployment cpu: 0%/50% 1 10 1 42h
after a while, you notice that the CPU utilization value on the HPA is increasing:
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/hpa-demo-deployment 1/1 1 1 42h
NAME READY STATUS RESTARTS AGE
pod/hpa-demo-deployment-9cc6d54b5-kprvn 1/1 Running 0 42h
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
horizontalpodautoscaler.autoscaling/hpa-demo-deployment Deployment/hpa-demo-deployment cpu: 6%/50% 1 10 1 42h
And then you see that new Pods are being scheduled in our Namespace, because the current CPU utilization is at around 250% (and therefore over its target of 50%). The HPA will now scale your Deployment until it reaches again the target 50% CPU utilization or when MAXPODS is reached:
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/hpa-demo-deployment 6/6 6 6 42h
NAME READY STATUS RESTARTS AGE
pod/hpa-demo-deployment-9cc6d54b5-7fnkf 1/1 Running 0 11s
pod/hpa-demo-deployment-9cc6d54b5-k5tdg 1/1 Running 0 26s
pod/hpa-demo-deployment-9cc6d54b5-kgngq 1/1 Running 0 26s
pod/hpa-demo-deployment-9cc6d54b5-kprvn 1/1 Running 0 42h
pod/hpa-demo-deployment-9cc6d54b5-t9xhq 1/1 Running 0 26s
pod/hpa-demo-deployment-9cc6d54b5-vt9zg 1/1 Running 0 11s
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
horizontalpodautoscaler.autoscaling/hpa-demo-deployment Deployment/hpa-demo-deployment cpu: 249%/50% 1 10 4 42h
Once the CPU utilization reaches around 50% again, no more new Pods will be created and the replica count remains on that level:
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/hpa-demo-deployment 6/6 6 6 42h
NAME READY STATUS RESTARTS AGE
pod/hpa-demo-deployment-9cc6d54b5-7fnkf 1/1 Running 0 39s
pod/hpa-demo-deployment-9cc6d54b5-k5tdg 1/1 Running 0 54s
pod/hpa-demo-deployment-9cc6d54b5-kgngq 1/1 Running 0 54s
pod/hpa-demo-deployment-9cc6d54b5-kprvn 1/1 Running 0 42h
pod/hpa-demo-deployment-9cc6d54b5-t9xhq 1/1 Running 0 54s
pod/hpa-demo-deployment-9cc6d54b5-vt9zg 1/1 Running 0 39s
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
horizontalpodautoscaler.autoscaling/hpa-demo-deployment Deployment/hpa-demo-deployment cpu: 46%/50% 1 10 6 42h
Stop the load-generator
by closing the terminal. You will see that the deployment scales back to 1 replica.