GitHub - andreas-schroeder/kafka-health-check: Health Check for Kafka Brokers.

Tags: | Published: 2020-01-11 22:36 | Author:
Source: https://github.com

Kafka Health Check

Health checker for Kafka brokers and clusters that operates by checking whether:

  • a message inserted in a dedicated health check topic becomes available for consumers,
  • the broker can stay in the ISR of a replication check topic,
  • the broker is in the in-sync replica set for all partitions it replicates,
  • under-replicated partitions exist,
  • out-of-sync replicas exist,
  • offline partitions exist, and
  • the metadata of the cluster and the ZooKeeper metadata are consistent with each other.

Status

Release version is 0.1.0

Compiled binaries are available for Linux, macOS, and FreeBSD.

Use Cases

Submit a pull request to have your use case listed here!

Self-healing cluster

At AutoScout24, to reduce operational workload, we use kafka-health-check to automatically restart broker nodes when they become unhealthy.

In-place rolling updates

At AutoScout24, we perform regular in-place rolling updates to keep the OS of our clusters running on AWS up to date. As we run immutable servers, we terminate each broker and replace it with a fresh EC2 instance (keeping the previous broker id). So as not to jeopardize cluster stability when terminating brokers, we verify that the cluster is healthy before taking a broker offline. Similarly, we wait for the broker that comes back online to fully catch up before proceeding with the next broker. To achieve this, we use the cluster health information provided by kafka-health-check.
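The gate described above (wait for a healthy cluster before touching the next broker) could be sketched as follows. This is a minimal sketch and not AutoScout24's actual tooling: the function names, the retry count, and the delay are assumptions, and only the `/cluster` endpoint and the `green` status are taken from this README.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_cluster_status(base_url):
    """Return the "status" field of GET <base_url>/cluster, or None on failure."""
    try:
        with urllib.request.urlopen(base_url + "/cluster", timeout=5) as resp:
            return json.load(resp).get("status")
    except urllib.error.HTTPError as err:
        # A red cluster is reported with HTTP 500; try to read its JSON body.
        try:
            return json.load(err).get("status")
        except ValueError:
            return None
    except (OSError, ValueError):
        # Connection problems or a body that is not valid JSON.
        return None


def wait_until_green(fetch, attempts=30, delay=10.0, sleep=time.sleep):
    """Poll fetch() until it returns "green"; give up after `attempts` polls."""
    for attempt in range(attempts):
        if fetch() == "green":
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A rolling update script might call `wait_until_green(lambda: fetch_cluster_status("http://<broker-host>:8000"))` before terminating each broker and abort the rollout if it returns `False`.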

Usage

    Usage of kafka-health-check:
      -broker-host string
            ip address or hostname of broker host (default "localhost")
      -broker-id uint
            id of the Kafka broker to health check
      -broker-port uint
            Kafka broker port (default 9092)
      -check-interval duration
            how frequently to perform health checks (default 10s)
      -no-topic-creation
            disable automatic topic creation and deletion
      -replication-failures-count uint
            number of replication failures before broker is reported unhealthy (default 5)
      -replication-topic string
            name of the topic to use for replication checks - use one per cluster, defaults to broker-replication-check
      -server-port uint
            port to open for http health status queries (default 8000)
      -topic string
            name of the topic to use - use one per broker, defaults to broker-<id>-health-check
      -zookeeper string
            ZooKeeper connect string (e.g. node1:2181,node2:2181,.../chroot)
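As an illustration of how these flags fit together, the sketch below assembles an argument list for one broker. The helper names `build_command` and `default_topic` are hypothetical, and the sketch assumes the flags are passed in `-name value` form; the flag names and defaults themselves are the ones listed above.

```python
def default_topic(broker_id):
    """Health check topic name used when -topic is not given."""
    return f"broker-{broker_id}-health-check"


def build_command(broker_id, zookeeper, broker_host="localhost",
                  broker_port=9092, server_port=8000,
                  check_interval="10s", no_topic_creation=False):
    """Build an argv list for kafka-health-check (hypothetical helper)."""
    argv = [
        "kafka-health-check",
        "-broker-id", str(broker_id),
        "-broker-host", broker_host,
        "-broker-port", str(broker_port),
        "-server-port", str(server_port),
        "-check-interval", check_interval,
        "-zookeeper", zookeeper,
    ]
    if no_topic_creation:
        # Boolean flag: present means topic creation and deletion are disabled.
        argv.append("-no-topic-creation")
    return argv
```

The resulting list can be handed to `subprocess.run` or joined into a service definition; one health check process is started per broker.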

Broker Health

Broker health can be queried at /:

    $ curl -s <broker-host>:8000/
    {
        "broker": 1,
        "status": "sync"
    }

Return codes and status values are:

  • 200 with sync for a healthy broker that is fully in sync with all leaders.
  • 200 with imok for a healthy broker that replays messages of its health check topic, but is not fully in sync.
  • 500 with nook for an unhealthy broker that fails to replay messages in its health check topic within 200 milliseconds, or that fails to stay in the ISR of the replication check topic for more checks than replication-failures-count (default 5).
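A client can combine the HTTP code and the status value into a single label. The sketch below is one possible mapping; the function name and the label strings are assumptions, while the codes and the `sync`, `imok`, and `nook` values are the ones listed above.

```python
def classify_broker(http_code, status):
    """Map an HTTP code and "status" value to a coarse health label."""
    if http_code == 200 and status == "sync":
        return "healthy, fully in sync"
    if http_code == 200 and status == "imok":
        return "healthy, catching up"
    # 500 with "nook", and any unexpected combination, counts as unhealthy.
    return "unhealthy"
```

Treating unexpected combinations as unhealthy is a deliberately conservative choice for automation such as restart decisions.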

The returned JSON contains details about the replicas the broker is lagging behind:

    $ curl -s <broker-host>:8000/
    {
        "broker": 3,
        "status": "imok",
        "out-of-sync": [
            {
                "topic": "mytopic",
                "partition": 0
            }
        ],
        "replication-failures": 1
    }
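To act on this response, a script can pull the lagging partitions out of the `out-of-sync` list. The following is a minimal sketch: the function name and the `topic-partition` label format are assumptions, and the sample document is the response shown above.

```python
import json


def lagging_partitions(body):
    """Return "topic-partition" labels for every entry under "out-of-sync"."""
    return [
        f'{entry["topic"]}-{entry["partition"]}'
        for entry in body.get("out-of-sync", [])
    ]


sample = json.loads("""
{
    "broker": 3,
    "status": "imok",
    "out-of-sync": [{"topic": "mytopic", "partition": 0}],
    "replication-failures": 1
}
""")

print(lagging_partitions(sample))  # prints ['mytopic-0']
```

A response without an `out-of-sync` field, such as the one for a fully in-sync broker, yields an empty list.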

Cluster Health

Cluster health can be queried at /cluster:

    $ curl -s <broker-host>:8000/cluster
    {
        "status": "green"
    }

Return codes and status values are:

  • 200 with green if all replicas of all partitions of all topics are in sync and metadata is consistent.
  • 200 with yellow if one or more partitions are under-replicated and metadata is consistent.
  • 500 with red if one or more partitions are offline or metadata is inconsistent.
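For monitoring systems that expect process exit codes, the three states could be mapped as sketched below. The 0/1/2 (OK/warning/critical) convention and the function name are assumptions introduced for illustration; kafka-health-check itself only provides the HTTP codes and status values listed above.

```python
EXIT_CODES = {"green": 0, "yellow": 1, "red": 2}


def cluster_exit_code(http_code, status):
    """Translate a /cluster response into an OK/warning/critical exit code."""
    if http_code == 200 and status in ("green", "yellow"):
        return EXIT_CODES[status]
    # HTTP 500 with "red", or anything unexpected, is treated as critical.
    return EXIT_CODES["red"]
```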

The returned JSON contains details about metadata status and partition replication:

    $ curl -s <broker-host>:8000/cluster
    {
        "status": "yellow",
        "topics": [
            {
                "topic": "mytopic",
                "status": "yellow",
                "partitions": {
                    "1": {
                        "status": "yellow",
                        "OSR": [
                            3
                        ]
                    },
                    "2": {
                        "status": "yellow",
                        "OSR": [
                            3
                        ]
                    }
                }
            }
        ]
    }

The additional fields and their structures are:

  • topics for topic replication status: [{"topic":"mytopic","status":"yellow","partitions":{"2":{"status":"yellow","OSR":[3]}}}] In this data, OSR means out-of-sync replicas and lists all brokers that are not in the ISR.
  • metadata for inconsistencies between ZooKeeper and Kafka metadata: [{"broker":3,"status":"red","problem":"Missing in ZooKeeper"}]
  • zookeeper for problems with the ZooKeeper connection or data; contains a single string: "Fetching brokers failed: ..."
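The three fields can be flattened into human-readable lines, for example for an alert message. This is a minimal sketch that assumes the structures shown above; the function name and the output format are hypothetical.

```python
def summarize_cluster(body):
    """Flatten the topics, metadata, and zookeeper fields into text lines."""
    lines = []
    for topic in body.get("topics", []):
        partitions = topic.get("partitions", {})
        # Partition ids are JSON object keys (strings); sort them numerically.
        for pid in sorted(partitions, key=int):
            info = partitions[pid]
            lines.append(
                f'{topic["topic"]}/{pid}: {info.get("status")}, '
                f'out-of-sync replicas {info.get("OSR", [])}'
            )
    for entry in body.get("metadata", []):
        lines.append(f'broker {entry["broker"]}: {entry["problem"]}')
    if "zookeeper" in body:
        lines.append(f'zookeeper: {body["zookeeper"]}')
    return lines
```

For the yellow example response above, this yields one line per under-replicated partition of `mytopic`, each naming broker 3 as the out-of-sync replica.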

Supported Kafka Versions

Tested with the following Kafka versions:

  • 2.0.0
  • 1.1.1
  • 1.1.0
  • 1.0.0
  • 0.11.0.2
  • 0.11.0.1
  • 0.11.0.0
  • 0.10.2.1
  • 0.10.2.0
  • 0.10.1.1
  • 0.10.1.0
  • 0.10.0.1
  • 0.10.0.0
  • 0.9.0.1
  • 0.9.0.0

Kafka 0.8 is not supported.

See the compatibility spec for the full list of executed compatibility checks. To execute the compatibility checks, run make compatibility. Running the checks requires Docker.

Building

Run make deps to restore the dependencies using govendor, then run make to build.

Prerequisites

Notable Details on Health Check Behavior

  • When first started, the check tries to find the Kafka broker to check in the cluster metadata. Then, it tries to find the health check topic, and creates it if missing by communicating directly with ZooKeeper (configuration: 10 seconds message lifetime, one single partition assigned to the broker to check). This behavior can be disabled by using -no-topic-creation.
  • The check also creates one replication check topic for the whole cluster. This topic is expanded to all brokers that are checked.
  • When shutting down, the check deletes the health check topic partition by communicating directly with ZooKeeper. It also shrinks the partition assignment of the replication check topic, and deletes it when stopping the last health check process. This behavior can be disabled by using -no-topic-creation.
  • The check will try to create the health check and replication check topics only on its first connection after startup. If a topic disappears later while the check is running, the check will not try to re-create it.
  • If the broker health check fails, the cluster health will be set to red.
  • For each check pass, the Kafka cluster metadata is fetched from ZooKeeper, i.e. the full data on brokers and topic partitions with replicas.
