5. Scaling

In this lab, we are going to show you how to scale applications on Kubernetes. Furthermore, we show you how Kubernetes makes sure that the requested number of Pods is up and running, and how an application can tell the platform that it is ready to receive requests.

Task 5.1: Scale the example application

Create a new Deployment in your Namespace. So again, let’s define the Deployment using YAML in a file deployment_example-web-app.yaml with the following content:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: example-web-app
  name: example-web-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-web-app
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: example-web-app
    spec:
      containers:
        - image: quay.io/acend/example-web-python:latest
          name: example-web-app
          resources:
            limits:
              cpu: 100m
              memory: 128Mi
            requests:
              cpu: 50m
              memory: 128Mi

and then apply with:

kubectl apply -f deployment_example-web-app.yaml --namespace <namespace>

If we want to scale our example application, we have to tell the Deployment that we want to have three running replicas instead of one. Let’s have a closer look at the existing ReplicaSet:

kubectl get replicasets --namespace <namespace>

Which will give you an output similar to this:

NAME                            DESIRED   CURRENT   READY   AGE
example-web-app-86d9d584f8      1         1         1       110s

Or for even more details:

kubectl get replicaset <replicaset> -o yaml --namespace <namespace>

The ReplicaSet shows how many instances of a Pod are desired, current and ready.
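
In the YAML output, these numbers appear in the spec and status sections. The relevant fields look roughly like this (a sketch, your values will differ):

spec:
  replicas: 1
status:
  availableReplicas: 1
  readyReplicas: 1
  replicas: 1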

Now we scale our application to three replicas:

kubectl scale deployment example-web-app --replicas=3 --namespace <namespace>

Check the number of desired, current and ready replicas:

kubectl get replicasets --namespace <namespace>
NAME                            DESIRED   CURRENT   READY   AGE
example-web-app-86d9d584f8      3         3         3       4m33s

Look at how many Pods there are:

kubectl get pods --namespace <namespace>

Which gives you an output similar to this:

NAME                                  READY   STATUS    RESTARTS   AGE
example-web-app-86d9d584f8-7vjcj      1/1     Running   0          5m2s
example-web-app-86d9d584f8-hbvlv      1/1     Running   0          31s
example-web-app-86d9d584f8-qg499      1/1     Running   0          31s

As we changed the number of replicas with the kubectl scale deployment command, the example-web-app Deployment now differs from your local deployment_example-web-app.yaml file. Update the value of replicas to 3 in your local deployment_example-web-app.yaml file so that it matches the current state:

[...]
metadata:
  labels:
    app: example-web-app
  name: example-web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-web-app
[...]
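
If you want to double-check that your local file and the live object are now in sync, kubectl diff can help; empty output means there is no difference:

kubectl diff -f deployment_example-web-app.yaml --namespace <namespace>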

Check for uninterruptible Deployments

Now we create a new Service of the type ClusterIP. Create a new file svc-example-app.yaml with the following content:

---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: example-web-app
  name: example-web-app
spec:
  ports:
  - port: 5000
    protocol: TCP
    targetPort: 5000
  selector:
    app: example-web-app
  type: ClusterIP

and apply the file with:

kubectl apply -f svc-example-app.yaml --namespace <namespace>
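
A quick check that the Service has been created and received a cluster-internal IP:

kubectl get service example-web-app --namespace <namespace>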

Then we add the Ingress to access our application. Create a new file ing-example-web-app.yaml with the following content:

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-web-app
spec:
  rules:
    - host: example-web-app-<namespace>.<appdomain>
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: 
                name: example-web-app
                port: 
                  number: 5000
  tls:
    - hosts:
      - example-web-app-<namespace>.<appdomain>

Apply this Ingress definition using, e.g.:

kubectl apply -f ing-example-web-app.yaml --namespace <namespace>
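
To verify the Ingress and see the configured host, you can run, e.g.:

kubectl get ingress example-web-app --namespace <namespace>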

Let’s look at our Service. We should see all three corresponding Endpoints:

kubectl describe service example-web-app --namespace <namespace>
Name:                     example-web-app
Namespace:                acend-scale
Labels:                   app=example-web-app
Annotations:              <none>
Selector:                 app=example-web-app
Type:                     ClusterIP
IP:                       10.39.245.205
Port:                     <unset>  5000/TCP
TargetPort:               5000/TCP
Endpoints:                10.36.0.10:5000,10.36.0.11:5000,10.36.0.9:5000
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason                Age   From                Message
  ----    ------                ----  ----                -------

Scaling of Pods is fast as Kubernetes simply creates new containers.

You can check the availability of your Service in your browser while you scale the number of replicas up and down: https://example-web-app-<namespace>.<appdomain>.

Now, execute the corresponding loop command for your operating system in another console.

Linux:

URL=$(kubectl get ingress example-web-app -o go-template="{{ (index .spec.rules 0).host }}" --namespace <namespace>)
while true; do sleep 1; curl -s https://${URL}/pod/; date "+ TIME: %H:%M:%S,%3N"; done

Windows PowerShell:

while(1) {
  Start-Sleep -s 1
  Invoke-RestMethod https://<URL>/pod/
  Get-Date -Uformat "+ TIME: %H:%M:%S,%3N"
}

Scale from 3 replicas to 1. The output shows which Pods are still alive and responding to requests:

example-web-app-86d9d584f8-7vjcj TIME: 17:33:07,289
example-web-app-86d9d584f8-7vjcj TIME: 17:33:08,357
example-web-app-86d9d584f8-hbvlv TIME: 17:33:09,423
example-web-app-86d9d584f8-7vjcj TIME: 17:33:10,494
example-web-app-86d9d584f8-qg499 TIME: 17:33:11,559
example-web-app-86d9d584f8-hbvlv TIME: 17:33:12,629
example-web-app-86d9d584f8-qg499 TIME: 17:33:13,695
example-web-app-86d9d584f8-hbvlv TIME: 17:33:14,771
example-web-app-86d9d584f8-hbvlv TIME: 17:33:15,840
example-web-app-86d9d584f8-7vjcj TIME: 17:33:16,912
example-web-app-86d9d584f8-7vjcj TIME: 17:33:17,980
example-web-app-86d9d584f8-7vjcj TIME: 17:33:19,051
example-web-app-86d9d584f8-7vjcj TIME: 17:33:20,119
example-web-app-86d9d584f8-7vjcj TIME: 17:33:21,182
example-web-app-86d9d584f8-7vjcj TIME: 17:33:22,248
example-web-app-86d9d584f8-7vjcj TIME: 17:33:23,313
example-web-app-86d9d584f8-7vjcj TIME: 17:33:24,377
example-web-app-86d9d584f8-7vjcj TIME: 17:33:25,445
example-web-app-86d9d584f8-7vjcj TIME: 17:33:26,513

The requests get distributed amongst the three Pods. As soon as you scale down to one Pod, there should be only one remaining Pod that responds.
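
The scale-down itself works just like the scale-up before, e.g.:

kubectl scale deployment example-web-app --replicas=1 --namespace <namespace>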

Let’s make another test: What happens if we trigger a new rollout of the Deployment while our request generator is still running?

kubectl rollout restart deployment example-web-app --namespace <namespace>

During a short period we won’t get a response:

example-web-app-86d9d584f8-7vjcj TIME: 17:37:24,121
example-web-app-86d9d584f8-7vjcj TIME: 17:37:25,189
example-web-app-86d9d584f8-7vjcj TIME: 17:37:26,262
example-web-app-86d9d584f8-7vjcj TIME: 17:37:27,328
example-web-app-86d9d584f8-7vjcj TIME: 17:37:28,395
example-web-app-86d9d584f8-7vjcj TIME: 17:37:29,459
example-web-app-86d9d584f8-7vjcj TIME: 17:37:30,531
example-web-app-86d9d584f8-7vjcj TIME: 17:37:31,596
example-web-app-86d9d584f8-7vjcj TIME: 17:37:32,662
# no answer
example-web-app-f4c5dd8fc-4nx2t TIME: 17:37:33,729
example-web-app-f4c5dd8fc-4nx2t TIME: 17:37:34,794
example-web-app-f4c5dd8fc-4nx2t TIME: 17:37:35,862
example-web-app-f4c5dd8fc-4nx2t TIME: 17:37:36,929
example-web-app-f4c5dd8fc-4nx2t TIME: 17:37:37,995
example-web-app-f4c5dd8fc-4nx2t TIME: 17:37:39,060
example-web-app-f4c5dd8fc-4nx2t TIME: 17:37:40,118
example-web-app-f4c5dd8fc-4nx2t TIME: 17:37:41,187

In our example, we use a very lightweight Pod. If we had used a more heavyweight Pod that needed a longer time to respond to requests, we would of course see a larger gap. An example of this would be a Java application with a startup time of 30 seconds:

example-spring-boot-2-73aln TIME: 16:48:25,251
example-spring-boot-2-73aln TIME: 16:48:26,305
example-spring-boot-2-73aln TIME: 16:48:27,400
example-spring-boot-2-73aln TIME: 16:48:28,463
example-spring-boot-2-73aln TIME: 16:48:29,507
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
 TIME: 16:48:33,562
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
 TIME: 16:48:34,601
 ...
example-spring-boot-3-tjdkj TIME: 16:49:20,114
example-spring-boot-3-tjdkj TIME: 16:49:21,181
example-spring-boot-3-tjdkj TIME: 16:49:22,231

It is even possible that the Service has no Endpoints at all for a short time, in which case the routing layer responds with status code 503, as can be seen in the example output above.

In the following section we are going to look at how a Service can be configured to be highly available.

Uninterruptible Deployments

The rolling update strategy makes it possible to deploy Pods without interruption: the new version of the application gets deployed and started, and as soon as the application reports that it is ready, Kubernetes forwards requests to the new Pod instead of the old one, which is then terminated.

Additionally, container health checks help Kubernetes to precisely determine what state the application is in.

Basically, there are two different kinds of checks that can be implemented:

  • Liveness probes are used to find out if an application is still running
  • Readiness probes tell us if the application is ready to receive requests (which is especially relevant for the above-mentioned rolling updates)

These probes can be implemented as HTTP checks, container execution checks (the execution of a command or script inside a container) or TCP socket checks.
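
As a sketch of what such a check looks like in a Pod template (port 5000 is taken from our example application; the timing values are just assumptions), a TCP socket liveness probe could be defined as:

livenessProbe:
  tcpSocket:
    port: 5000
  initialDelaySeconds: 10
  periodSeconds: 10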

In our example, we want the application to tell Kubernetes that it is ready for requests with an appropriate readiness probe.

Our example application exposes a health check endpoint at /health: http://<node-ip>:<node-port>/health
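
If you want to look at this endpoint yourself, one way (a sketch, assuming the application listens on port 5000 as in our Deployment) is to port-forward to the Deployment and curl it:

kubectl port-forward deployment/example-web-app 5000:5000 --namespace <namespace>
# in a second terminal
curl http://localhost:5000/health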

Task 5.2: Availability during deployment

In our deployment configuration inside the rolling update strategy section, we define that our application always has to be available during an update: maxUnavailable: 0

Now insert the readiness probe at .spec.template.spec.containers above the resources line in your local deployment_example-web-app.yaml file:


...
containers:
  - image: quay.io/acend/example-web-python:latest
    imagePullPolicy: Always
    name: example-web-app
    # start to copy here
    readinessProbe:
      httpGet:
        path: /health
        port: 5000
        scheme: HTTP
      initialDelaySeconds: 10
      timeoutSeconds: 1
    # stop to copy here
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 50m
        memory: 128Mi
...

Apply the file with:

kubectl apply -f deployment_example-web-app.yaml --namespace <namespace>
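
The change to the Pod template triggers a new rollout; you can wait for it to finish with, e.g.:

kubectl rollout status deployment example-web-app --namespace <namespace>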

We are now going to verify that a redeployment of the application does not lead to an interruption.

Set up the loop again to periodically check the application’s response (you don’t have to set the $URL variable again if it is still defined):

Linux:

URL=$(kubectl get ingress example-web-app -o go-template="{{ (index .spec.rules 0).host }}" --namespace <namespace>)
while true; do sleep 1; curl -s https://${URL}/pod/; date "+ TIME: %H:%M:%S,%3N"; done

Windows PowerShell:

while(1) {
  Start-Sleep -s 1
  Invoke-RestMethod https://<URL>/pod/
  Get-Date -Uformat "+ TIME: %H:%M:%S,%3N"
}

Restart your Deployment with:

kubectl rollout restart deployment example-web-app --namespace <namespace>
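
While the rollout is running, the loop should keep getting responses without a gap. The new Pods only receive traffic once their readiness probe succeeds, which you can observe with:

kubectl get pods -w --namespace <namespace>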

Self-healing

Via the ReplicaSet we told Kubernetes how many replicas we want. So what happens if we simply delete a Pod?

Using kubectl get pods, look for a running Pod (status Running) that you can bear to kill.

Show all Pods and watch for changes:

kubectl get pods -w --namespace <namespace>

Now delete a Pod (in another terminal) with the following command:

kubectl delete pod <pod> --namespace <namespace>

Observe how Kubernetes instantly creates a new Pod in order to fulfill the desired number of running instances.
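
You can also confirm that the desired number of replicas is met again by checking the ReplicaSet, e.g.:

kubectl get replicasets --namespace <namespace>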