Kubernetes — Achieving High Availability Through Health Probes and Readiness Gates

Published: 2022-01-15 16:26

Kubernetes, or K8s, is a powerful tool for automating the deployment and management of containers. Due to the incredible scalability, velocity, and ease of management that it brings to your cloud deployments, it’s not hard to see why businesses are adopting it at a rapid pace.

Apart from the host of features it offers for your deployments, you can configure it to keep your applications highly available. This blog post looks at a couple of features that help keep your application up and serving traffic.

Container Health Checks Through Probes

Probes are periodic health checks that Kubernetes performs to determine whether the running containers are in a healthy state. Two types of probes are covered here: a liveness probe and a readiness probe.

Liveness Probe

In Kubernetes, a pod is the smallest deployable unit of computing that you can create and manage. A liveness probe determines whether a container in the pod is alive. If it is alive, Kubernetes takes no action. If it is dead, the kubelet restarts the container according to the pod's restart policy, which defaults to Always. For example, if you are running microservices and one of them stops functioning (perhaps due to a bug), this probe can bring it back to life by restarting it.

Readiness Probe

The readiness probe determines whether the pod is ready to receive traffic. If it is not, Kubernetes sends no traffic to the pod until it becomes ready. This probe runs throughout the lifecycle of the pod, not just at startup. If the pod needs to be taken out of service for some reason, such as scheduled maintenance or a background task, the application can respond to the readiness probe with a failure status until it is ready again.

These probes are performed on the containers in a pod. Let’s take a look at an example to understand this better. Say you have a pod with one container for an nginx application and another for a GraphQL application. In this case, nginx has its own liveness and readiness configuration, and GraphQL has its own.

For your nginx app, the config could be as follows:

          lifecycle:
            preStop:
              exec:
                command:
                  - sh
                  - -c
                  - "sleep 30 && /usr/local/openresty/bin/openresty -s quit"
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
          livenessProbe:
            tcpSocket:
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 1
            failureThreshold: 1
          readinessProbe:
            httpGet:
              path: /health
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 1
            failureThreshold: 1

We have tweaked the default values of some parameters. Let’s look at the meaning of some of these:

tcpSocket:

port: 80

The livenessProbe succeeds if a TCP connection can be opened on port 80.

initialDelaySeconds: 5

The probe executes 5 seconds after the container starts.

periodSeconds: 1

The probe runs every second.

failureThreshold: 1

If the probe fails once, the container is marked as unhealthy (liveness) or not ready (readiness).


httpGet:

path: /health

port: 80

For the readinessProbe, the kubelet sends an HTTP GET request to /health on port 80 of the pod IP.
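To show where these settings sit, here is a minimal sketch of a complete pod manifest for the nginx container. The pod name and the image are assumptions made for illustration, and the sketch assumes that nginx is configured to serve /health.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-probe-demo                 # hypothetical name
spec:
  containers:
    - name: nginx
      image: openresty/openresty:alpine  # assumed image, not from the original post
      ports:
        - name: http
          containerPort: 80
          protocol: TCP
      livenessProbe:
        tcpSocket:
          port: 80                       # succeeds if the port accepts a TCP connection
        initialDelaySeconds: 5           # wait 5s after container start
        periodSeconds: 1                 # probe every second
        failureThreshold: 1              # one failure triggers a restart
      readinessProbe:
        httpGet:
          path: /health                  # assumes nginx serves this path
          port: 80
        initialDelaySeconds: 5
        periodSeconds: 1
        failureThreshold: 1              # one failure removes the pod from service endpoints
```

With periodSeconds and failureThreshold both set to 1, a failure is acted on after roughly periodSeconds × failureThreshold = 1 second. That reacts quickly, but a single slow response is enough to trigger a restart, so higher values may suit workloads with occasional latency spikes.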

For GraphQL, you can use a preStop hook, set under the container's lifecycle field:

          preStop:
            exec:
              command:
                - sh
                - -c
                - "sleep 20 && kill 1"

The preStop hook above is a container hook called immediately before terminating a container. It works in conjunction with a parameter called terminationGracePeriodSeconds.

NOTE: The terminationGracePeriodSeconds parameter applies to the whole pod. Once this period elapses, any containers still running are killed, and the time spent in the preStop hook counts against it. Set the value high enough for the container to finish its preStop hook.
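As a sketch of how the two settings interact, the manifest below pairs the 30-second preStop sleep from the nginx example with a longer grace period. The pod name, the image, and the value of 45 seconds are assumptions chosen for illustration. The default grace period is 30 seconds, which would leave no time for the quit command to finish.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-graceful-demo              # hypothetical name
spec:
  # Assumed value: must exceed the 30s sleep in the preStop hook,
  # because the hook's run time counts against the grace period.
  terminationGracePeriodSeconds: 45
  containers:
    - name: nginx
      image: openresty/openresty:alpine  # assumed image, not from the original post
      lifecycle:
        preStop:
          exec:
            command:
              - sh
              - -c
              - "sleep 30 && /usr/local/openresty/bin/openresty -s quit"
```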

To add some context, in most applications, a “probe” is an HTTP endpoint. If the endpoint returns a status code from 200 to 399, the probe is successful. Anything else is considered a failure.
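For instance, an HTTP readiness probe for the GraphQL container might look like the fragment below. This is a sketch only: the /healthz path is a hypothetical endpoint that the application would have to implement, port 9000 is taken from the container port used later in this post, and the timing values are illustrative.

```yaml
readinessProbe:
  httpGet:
    path: /healthz        # hypothetical endpoint; must return 200-399 when ready
    port: 9000            # container port of the Node.js server
    scheme: HTTP
  periodSeconds: 5        # illustrative: probe every 5 seconds
  timeoutSeconds: 2       # illustrative: no response within 2s counts as a failure
  failureThreshold: 3     # illustrative: three consecutive failures mark the container not ready
```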

Readiness Gates to Handle Unexpected Issues

The above probes help determine if your pods are healthy and can receive traffic. However, the infrastructure responsible for delivering traffic to your pod may not be ready yet, for reasons such as network policies or load balancers taking longer than expected to register new targets. Let’s look at an example to understand how this might happen.

Suppose you have a GraphQL application running on two pods, and you want to restart the deployment. Load balancers expose these pods to the outside world, so the pods need to be registered in the load balancer target group. When you run kubectl rollout restart deploy/graphql, new pods are created and move through their startup phases until they are running and ready. As soon as that happens, Kubernetes starts terminating the old pods, irrespective of whether the new pods are registered with the load balancer and able to receive traffic.

An effective strategy to bridge such a gap is to use Pod Readiness Gates. A readiness gate is an extra condition in the pod's status that must be true, in addition to the readiness probes passing, before the pod is considered ready. Here, the load balancer integration sets that condition once the new pods are registered in the target group and healthy, so the rollout waits for it before terminating the old pods.

You can use Helm to install and configure the GraphQL application for the above example, defining readinessGates in values.yaml.

Pod Configuration for readinessGates:

To implement the above, add a readiness gate to your pod with a conditionType of the following form:

conditionType: target-health.alb.ingress.k8s.aws/<ingress name>_<service name>_<service port>

podAnnotations: {}

podSecurityContext: {}
  # fsGroup: 2000

securityContext: {}
  # capabilities:
  #   drop:
  #   - ALL
  # readOnlyRootFilesystem: true
  # runAsNonRoot: true
region: ""

  args1: "node"
  args2: "server.js"
ports: #node containerport
  containerport: 9000

  name: NODE_ENV
  value: "development"

  type: NodePort
  port: 80
  targetport: 80
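Once rendered, the readiness gate ends up in the pod spec. The following is a minimal sketch of what that could look like. The pod name, label, image, ingress name (graphql-ingress), and service name (graphql-svc) are all hypothetical placeholders, so substitute the names from your own ingress and service.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graphql-demo                     # hypothetical name
  labels:
    app: graphql                         # hypothetical label
spec:
  readinessGates:
    # Format: target-health.alb.ingress.k8s.aws/<ingress name>_<service name>_<service port>
    - conditionType: target-health.alb.ingress.k8s.aws/graphql-ingress_graphql-svc_80
  containers:
    - name: graphql
      image: example.com/graphql:latest  # placeholder image
      command: ["node", "server.js"]
      env:
        - name: NODE_ENV
          value: "development"
      ports:
        - containerPort: 9000
```

The pod is reported as ready only when its containers pass their readiness probes and the condition named in conditionType has been set to True in the pod's status.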

Once done, verify the status of readinessGates with the following command. The READINESS GATES column in the output shows how many of the gates have passed:

kubectl get pod -o wide

Health probes on containers improve the application’s reliability, while readiness gates on pods ensure that a pod is reachable through the load balancer before it is counted as ready. With both configured, your app can stay available through restarts and rollouts.
