Apache Kafka Monitoring

Screenshot: all brokers heatmap

RTView's Solution Package for Apache Kafka provides a complete Kafka monitoring solution with pre-built dashboards for monitoring Kafka brokers, producers, consumers, topics, and Kafka Zookeepers. With over 30 pre-defined Kafka alerts and over 15 pre-built monitoring dashboards, users can deploy quickly without the time, skill and expense necessary to build their own dashboards from scratch using open-source tools.

Monitor Kafka Brokers

Brokers are the heart of Apache Kafka and manage critical functions such as partitions, reads and writes, updating replicas with new data and flushing expired data. RTView monitors dozens of broker metrics and provides early warning of key conditions such as under-replicated partitions and slow purge rates. Metrics include:

  • Broker State
  • Partitions (count, offline, and under replicated)
  • Controllers
  • Purgatory (fetch, heartbeat, produce, rebalance, topic)
  • Metered Metrics (mean rate, 1-min, 5-min and 15-min rate)
  • Histogrammed Metrics (min, mean, max, std dev and percentile)
  • Timer Metrics (min, max, mean, std dev, 1-min, 5-min, 15-min rate, and by percentile)
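
To make the partition counters above concrete: a partition is commonly counted as under-replicated when its in-sync replica (ISR) set is smaller than its assigned replica set, and as offline when it has no active leader. The following Python sketch classifies a hand-written snapshot under those two assumptions. The `PartitionState` type, its field names, and the sample data are hypothetical; they are not RTView or Kafka APIs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PartitionState:
    """Hypothetical snapshot of one partition (not a Kafka or RTView type)."""
    topic: str
    partition: int
    leader: Optional[int]   # broker id of the current leader, None if there is none
    replicas: List[int]     # broker ids assigned to this partition
    isr: List[int]          # broker ids currently in sync with the leader


def summarize(partitions):
    """Return total, offline, and under-replicated partition counts."""
    offline = sum(1 for p in partitions if p.leader is None)
    under_replicated = sum(1 for p in partitions if len(p.isr) < len(p.replicas))
    return {
        "count": len(partitions),
        "offline": offline,
        "under_replicated": under_replicated,
    }


# Made-up sample data: one healthy, one under-replicated, one offline partition.
snapshot = [
    PartitionState("orders", 0, leader=1, replicas=[1, 2, 3], isr=[1, 2, 3]),
    PartitionState("orders", 1, leader=2, replicas=[2, 3, 1], isr=[2]),
    PartitionState("orders", 2, leader=None, replicas=[3, 1, 2], isr=[]),
]
summary = summarize(snapshot)
```

In this sketch an offline partition with an empty ISR is also counted as under-replicated, so the two counters can overlap.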

Monitor Kafka Topics

Monitor the performance of all Kafka topics and drill down for detailed information on topic performance on individual brokers. Metrics include:

  • Messages (in per second)
  • Bytes (in/out/rejected per second)
  • Produce Requests (total/failed per second)
  • Fetch Requests (total/failed per second)
  • Trend data (count, mean rate, 1-min/5-min/15-min avg)
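
Trend values such as 1-minute, 5-minute and 15-minute rates are typically exponentially weighted moving averages, which is the approach taken by the metrics library that Kafka brokers use for their metered rates. The sketch below illustrates the idea only; the 5-second tick interval, the `MovingRate` class, and the traffic pattern are assumptions for illustration and are not taken from RTView.

```python
import math

TICK_SECONDS = 5.0  # assumed interval between rate updates


class MovingRate:
    """Exponentially weighted moving average of events per second (illustrative)."""

    def __init__(self, window_minutes):
        # Smoothing factor for the chosen window; longer windows react more slowly.
        self.alpha = 1.0 - math.exp(-TICK_SECONDS / (60.0 * window_minutes))
        self.rate = None
        self.pending = 0

    def mark(self, n=1):
        """Record n events seen since the last tick."""
        self.pending += n

    def tick(self):
        """Fold the events of the last interval into the moving rate."""
        instant = self.pending / TICK_SECONDS
        self.pending = 0
        if self.rate is None:
            self.rate = instant          # first sample seeds the average
        else:
            self.rate += self.alpha * (instant - self.rate)
        return self.rate


one_min, fifteen_min = MovingRate(1), MovingRate(15)
for meter in (one_min, fifteen_min):
    meter.mark(50)        # 50 messages in one tick = 10 messages per second
    meter.tick()
    for _ in range(12):   # then one minute with no traffic at all
        meter.tick()
```

After the quiet minute the 1-minute rate has decayed well below the 15-minute rate, which is why comparing the three windows helps distinguish a short burst from a sustained change in topic traffic.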

Monitor Kafka Producers

Monitoring the health of Kafka producers can provide an early warning indicator of overall application health. It is important to identify any unusual trends as soon as possible so that you can react quickly to intercept growing problems. Metrics include:

  • Producer Count and Status
  • Batch Size (min, max)
  • Buffer (available bytes, exhausted rate, total bytes, pool wait ratio)
  • Compression Rate
  • Connection (count, close rate, creation rate)
  • Incoming Byte Rate & Outgoing Byte Rate
  • IO (ratio, time ns avg, wait ratio, wait time ms avg)
  • Produce Throttle Time (avg, max)
  • Record (error rate, queue time avg/max, retry rate, send rate, size avg/max, per request avg)
  • Requests (latency avg/max, rate, size avg/max, # in flight)
  • Response Rate & Select Rate
  • Waiting Threads

Monitor Kafka Consumers

The performance of consumers can also provide an early warning indication that something is wrong upstream from Kafka. Finding these problems early can be the key to getting back on track quickly. Metrics include:

  • Bytes Consumed Rate
  • Fetch (latency avg/max, rate, size avg/max, throttle time avg/max)
  • Records (consumed rate, lag max, per request avg)
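
As an illustration of the lag metric listed above: consumer lag for a partition is generally the distance between the newest offset in the log and the consumer's current position, and "lag max" is the largest such value across partitions. The Python sketch below assumes both sets of offsets have already been collected; the function name, dictionary layout, and sample numbers are hypothetical.

```python
def partition_lags(log_end_offsets, consumer_positions):
    """Per-partition lag: newest offset in the log minus the consumer's position."""
    lags = {}
    for topic_partition, end_offset in log_end_offsets.items():
        position = consumer_positions.get(topic_partition)
        if position is None:
            continue  # no known position yet; lag is left undefined in this sketch
        lags[topic_partition] = max(end_offset - position, 0)
    return lags


# Made-up offsets keyed by (topic, partition).
end_offsets = {("orders", 0): 1500, ("orders", 1): 980, ("orders", 2): 40}
positions = {("orders", 0): 1480, ("orders", 1): 410}

lags = partition_lags(end_offsets, positions)
lag_max = max(lags.values(), default=0)
```

A lag that keeps growing on one partition while the others stay flat usually points at a slow or stuck consumer rather than at a cluster-wide problem.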

Monitor Kafka Zookeepers

Kafka uses Zookeeper to coordinate the Kafka brokers, including controller election, cluster membership, topic configuration, quotas, ACLs and consumer membership. RTView monitors critical Zookeeper metrics including:

  • Latency (min/max/avg)
  • Connections & Outstanding Requests
  • Nodes & Watches
  • Packets received & sent (delta & rate)
  • Client Connections Per Host
  • Session Timeout (min, max)
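
Several of the values above are also exposed by Zookeeper's `mntr` four-letter command, which returns tab-separated key/value lines (newer Zookeeper releases require the command to be whitelisted). The Python sketch below parses such text and derives a packet delta and rate from two samples. It is not how RTView collects data; the sample output and its numbers are made up, and the helper names are hypothetical.

```python
# Hypothetical mntr output; real servers report more keys than shown here.
SAMPLE_MNTR = "\n".join([
    "zk_version\t3.4.14",
    "zk_avg_latency\t1",
    "zk_max_latency\t12",
    "zk_min_latency\t0",
    "zk_packets_received\t4200",
    "zk_packets_sent\t4180",
    "zk_num_alive_connections\t7",
    "zk_outstanding_requests\t0",
    "zk_znode_count\t312",
    "zk_watch_count\t95",
])


def parse_mntr(text):
    """Turn tab-separated mntr lines into a dict, converting integers where possible."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if not sep:
            continue  # skip lines that are not key<TAB>value
        value = value.strip()
        stats[key.strip()] = int(value) if value.lstrip("-").isdigit() else value
    return stats


def packet_delta_and_rate(prev, curr, seconds, key="zk_packets_received"):
    """Delta and per-second rate of a counter between two samples."""
    delta = curr[key] - prev[key]
    return delta, delta / seconds


first = parse_mntr(SAMPLE_MNTR)
# Simulate a second sample taken 30 seconds later.
second = dict(first, zk_packets_received=first["zk_packets_received"] + 300)
delta, rate = packet_delta_and_rate(first, second, seconds=30)
```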

Kafka Alerts - Out of the Box

RTView provides centralized alert management with more than 50 pre-defined alerts and pre-configured thresholds out of the box. Manage alerts directly in RTView or export them to ServiceNow and other ticket management apps.

  • Kafka Broker Alerts (11)
  • Kafka Cluster Alerts (2)
  • Kafka Consumer Alerts (6)
  • Kafka Producer Alerts (7)
  • Kafka Zookeeper Alerts (6)
  • Tomcat Alerts (4)
  • JVM Alerts (6)
  • Mule Alerts (11)
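
For readers unfamiliar with threshold-based alerting, the general pattern is to compare each metric value against a warning level and a higher alarm level. The Python sketch below shows that pattern only; the alert names and threshold values are hypothetical and are not RTView's pre-defined alerts or defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Two-level threshold for one metric (illustrative, not an RTView type)."""
    name: str
    warning: float
    alarm: float


def evaluate(threshold, value):
    """Return the severity for a metric value: OK, WARNING, or ALARM."""
    if value >= threshold.alarm:
        return "ALARM"
    if value >= threshold.warning:
        return "WARNING"
    return "OK"


# Hypothetical alert definitions; names and limits are made up for this example.
THRESHOLDS = [
    Threshold("UnderReplicatedPartitionsHigh", warning=1, alarm=5),
    Threshold("ConsumerLagMaxHigh", warning=1000, alarm=10000),
]

# Hypothetical current metric values.
current = {"UnderReplicatedPartitionsHigh": 2, "ConsumerLagMaxHigh": 250}

severities = {t.name: evaluate(t, current[t.name]) for t in THRESHOLDS}
```

Keeping thresholds as data rather than code makes it straightforward to tune them per environment without touching the evaluation logic.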