Kubernetes — Achieving High Availability Through Health Probes and Readiness Gates
Kubernetes, or K8s, is a powerful tool for automating the deployment and management of containers. Due to the incredible scalability, velocity, and ease of management that it brings to your cloud deployments, it’s not hard to see why businesses are adopting it at a rapid pace.
Apart from the host of features it offers for your deployments, you can configure it to work as a high-availability tool for your applications. This blog post looks at two features, health probes and readiness gates, that you can use to keep your application available with minimal downtime.
Container Health Checks Through Probes
Probes are periodic health checks that you can perform to determine if the running containers are in a healthy state. The two most commonly used types are the liveness probe and the readiness probe.
Liveness Probe
In Kubernetes, a pod is the smallest deployable unit of computing that you can create and manage. A liveness probe determines whether a container in a pod is still alive. If it is, Kubernetes takes no action. If the probe fails, the kubelet restarts the container according to the pod's restart policy. For example, if one of your microservices stops functioning (perhaps due to a bug or a deadlock), this probe can bring it back by triggering a restart.
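As a minimal sketch (the pod name, image, and file path are placeholders chosen only for illustration), a liveness probe can simply run a command inside the container and treat a non-zero exit code as a failure:

apiVersion: v1
kind: Pod
metadata:
  name: liveness-demo
spec:
  containers:
    - name: app
      image: busybox
      args:
        - /bin/sh
        - -c
        - "touch /tmp/healthy; sleep 3600"   # create a marker file, then keep the container running
      livenessProbe:
        exec:
          command:                            # probe succeeds only while /tmp/healthy exists
            - cat
            - /tmp/healthy
        initialDelaySeconds: 5
        periodSeconds: 5

If /tmp/healthy is ever removed, the probe starts failing and the kubelet restarts the container.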
Readiness Probe
The readiness probe determines whether the pod is ready to receive traffic. If not, Kubernetes will not send any traffic to the pod until it is ready. This probe is performed throughout the lifecycle of the pod. If the pod needs to be taken out of rotation temporarily, for example for scheduled maintenance or to run background tasks, it can be configured to fail its readiness probe, as shown in the sketch below.
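One simple way to do this is an exec-based readiness probe that checks for a marker file, so the application (or an operator) can take the pod out of rotation just by deleting the file. This is only a sketch, and the file path is an arbitrary placeholder:

readinessProbe:
  exec:
    command:            # ready only while the marker file exists
      - cat
      - /tmp/ready
  periodSeconds: 5
  failureThreshold: 1

Deleting /tmp/ready before maintenance makes the probe fail, so the pod stops receiving traffic until the file is recreated; unlike a failed liveness probe, the container is not restarted.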
These probes are performed on containers in a pod. Let's take a look at an example to understand this better. Say you have a pod with two containers: one running an nginx application and one running a GraphQL application. In this case, nginx will have its own liveness and readiness configuration, and GraphQL will have its own.
For your nginx app, the config could be as follows:
lifecycle:
  preStop:
    exec:
      command:
        - sh
        - -c
        - "sleep 30 && /usr/local/openresty/bin/openresty -s quit"
ports:
  - name: http
    containerPort: 80
    protocol: TCP
livenessProbe:
  tcpSocket:
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 1
  failureThreshold: 1
readinessProbe:
  httpGet:
    path: /health
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 1
  failureThreshold: 1
We have tweaked the default values of some parameters. Let’s look at the meaning of some of these:
tcpSocket:
port: 80
If the TCP socket on port 80 is open, the livenessProbe is considered successful.
initialDelaySeconds: 5
The probe executes 5 seconds after the container starts.
periodSeconds: 1
The probe executes every second.
failureThreshold: 1
If the probe fails even once, the container is marked as unhealthy (liveness) or not ready (readiness).
httpGet:
path: /health
port: 80
For the readinessProbe, an HTTP GET request is sent to /health on port 80 of the pod's IP.
For GraphQL, you can use:
lifecycle:
  preStop:
    exec:
      command:
        - sh
        - -c
        - "sleep 20 && kill 1"
The preStop hook above is a container lifecycle hook that is called immediately before the container is terminated. It works in conjunction with a parameter called terminationGracePeriodSeconds.
NOTE: The terminationGracePeriodSeconds parameter applies to the pod as a whole. The grace period starts counting as soon as the pod begins terminating, and the preStop hook runs within it, so set the value high enough for the hook to complete before Kubernetes force-kills the containers.
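As a sketch, if the preStop hook sleeps for 20 seconds, the grace period should comfortably exceed that; the value of 60 below is illustrative, not prescriptive:

spec:
  terminationGracePeriodSeconds: 60   # must be longer than the preStop hook's runtime
  containers:
    - name: graphql
      lifecycle:
        preStop:
          exec:
            command:
              - sh
              - -c
              - "sleep 20 && kill 1"

If the grace period were shorter than the sleep, Kubernetes would force-kill the container before the hook finished, defeating the purpose of the graceful shutdown.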
To add some context: in most applications, the probe target is an HTTP endpoint. If the endpoint returns a status code from 200 to 399, the probe is successful. Anything else is considered a failure.
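If your health endpoint expects specific request headers, httpGet probes also let you set custom headers and a timeout; the header and timeout values below are just an example:

readinessProbe:
  httpGet:
    path: /health
    port: 80
    httpHeaders:          # optional custom headers sent with the probe request
      - name: Accept
        value: application/json
  timeoutSeconds: 2       # fail the probe if there is no response within 2 seconds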
Readiness Gates to Handle Unexpected Issues
The above probes help determine whether your pods are healthy and can receive traffic. However, the infrastructure responsible for delivering traffic to your pod may itself not be ready, for example because network policies or load balancer registration take longer than expected. Let's look at an example to understand how this might happen.
Suppose you have a GraphQL application running on two pods, and you want to restart the deployment. Load balancers expose these pods to the outside world, so the pods need to be registered in the load balancer's target group. When you execute kubectl rollout restart deploy/graphql, the new pods go through their startup cycle until they reach the Running state. As soon as that happens, Kubernetes starts terminating the old pods, irrespective of whether the new pods have been registered with the load balancer and can actually send and receive traffic.
An effective strategy to bridge this gap is to use pod readiness gates. A readiness gate adds an extra condition to a pod's readiness: the pod is only considered ready once the load balancer reports that it is registered in the target group and healthy. This feedback keeps the rollout from proceeding before the new pods can actually receive traffic.
For the example above, you can use Helm to install and configure the GraphQL application, defining the readiness gate in values.yaml.
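Assuming your chart lives in ./chart and you name the release graphql (both are placeholders), deploying or updating it could look like this:

helm upgrade --install graphql ./chart -f values.yaml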
Pod Configuration for readinessGates:
To implement this, add a readiness gate to your pod spec as follows:

readinessGates:
  - conditionType: target-health.alb.ingress.k8s.aws/<ingress name>_<service name>_<service port>
podAnnotations: {}

podSecurityContext: {}
  # fsGroup: 2000

securityContext: {}
  # capabilities:
  #   drop:
  #     - ALL
  # readOnlyRootFilesystem: true
  # runAsNonRoot: true

region: ""

readinessgates: target-health.alb.ingress.k8s.aws/cs-ingress_dev8-graphql_80

command:
  args1: "node"
  args2: "server.js"

ports:
  # node containerport
  containerport: 9000

env1:
  name: NODE_ENV
  value: "development"

service:
  type: NodePort
  port: 80
  targetport: 80
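Inside the chart, the deployment template then injects this value into the pod spec. A hypothetical excerpt (the template path and surrounding structure are assumed, not taken from the chart above):

# templates/deployment.yaml (hypothetical excerpt, pod template spec)
    spec:
      readinessGates:
        - conditionType: {{ .Values.readinessgates }}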
Once done, verify the status of readinessGates with the following command:
kubectl get pod -o wide
Using health probes on containers improves your application's reliability, while readiness gates on pods ensure that new pods are genuinely ready before they accept any traffic. Having both configured goes a long way toward keeping your app available without downtime.