goRainbow is a plug-in for Burrow, a lag monitoring service for Apache Kafka.
goRainbow gives Kafka users more visibility into:
- Producer status: for each producer, how many records are produced to each partition.
- Consumer status: for each consumer, which partition(s) it hosts and how many records it consumes per minute.
- Lag: total lag across the whole consumer group, plus partition-level lag.
- Traffic statistics: data traffic counters (totalMessage, validMessage, metricsSent, exceptionCount).
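The relationship between partition-level lag and group-level lag can be sketched as a simple sum. This is a minimal illustration, not goRainbow's code: the `PartitionLag` type and `TotalLag` function are hypothetical names, and it assumes the group total is the sum of the per-partition values.

```go
package main

import "fmt"

// PartitionLag is a hypothetical record for one partition of a consumer group.
type PartitionLag struct {
	Topic     string
	Partition int
	Lag       int64
}

// TotalLag adds up partition-level lag to get the lag of the whole group.
func TotalLag(parts []PartitionLag) int64 {
	var total int64
	for _, p := range parts {
		total += p.Lag
	}
	return total
}

func main() {
	parts := []PartitionLag{
		{Topic: "orders", Partition: 0, Lag: 12},
		{Topic: "orders", Partition: 1, Lag: 0},
		{Topic: "orders", Partition: 2, Lag: 30},
	}
	fmt.Println("total lag:", TotalLag(parts))
}
```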
goRainbow includes 3 main parts:
- URL maintainer: maintains the available URLs and creates a new handler thread for each new URL.
- URL handler: translates data into the required form for the producer.
- Kafka producer: sends data to Kafka (speed-racer), which forwards the metrics to Wavefront.
- Each consumer handler is responsible for one consumer and has one specific URL (burrow/{cluster}/{consumer}) from which it pulls the consumer info from Burrow. The alive consumers maintainer checks Burrow periodically for new consumers and starts a new consumer handler for each one it finds. A consumer handler deregisters itself from the alive consumers maintainer when its consumer is no longer valid.
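The register and deregister cycle described above can be sketched roughly as follows. This is an illustration under assumptions, not goRainbow's actual implementation: `Maintainer`, `Sync`, `IsAlive`, `Wait`, and the `poll` callback are hypothetical names, and the HTTP call to Burrow is replaced by the callback so the example runs without a Burrow instance.

```go
package main

import (
	"fmt"
	"sync"
)

// Maintainer tracks which consumers currently have a running handler.
// All names in this sketch are hypothetical.
type Maintainer struct {
	alive sync.Map // consumer name -> struct{}
	wg    sync.WaitGroup
}

// Sync takes the consumer list most recently read from Burrow and starts a
// handler goroutine for every consumer that does not have one yet.
// It returns the consumers for which a handler was started.
func (m *Maintainer) Sync(consumers []string, poll func(string) bool) []string {
	var started []string
	for _, c := range consumers {
		if _, loaded := m.alive.LoadOrStore(c, struct{}{}); loaded {
			continue // a handler already owns this consumer
		}
		started = append(started, c)
		m.wg.Add(1)
		go m.handle(c, poll)
	}
	return started
}

// handle stands in for a consumer handler. A real handler would pull
// burrow/{cluster}/{consumer} inside poll; poll returns false once the
// consumer is no longer valid, and the handler then deregisters itself.
func (m *Maintainer) handle(consumer string, poll func(string) bool) {
	defer m.wg.Done()
	defer m.alive.Delete(consumer) // runs before wg.Done
	for poll(consumer) {
	}
}

// IsAlive reports whether a handler is registered for the consumer.
func (m *Maintainer) IsAlive(consumer string) bool {
	_, ok := m.alive.Load(consumer)
	return ok
}

// Wait blocks until every handler has exited.
func (m *Maintainer) Wait() { m.wg.Wait() }

func main() {
	m := &Maintainer{}
	release := make(chan struct{})
	// poll blocks until release is closed, then reports the consumer as gone.
	poll := func(string) bool { <-release; return false }

	fmt.Println("started:", m.Sync([]string{"group-a"}, poll))
	fmt.Println("started:", m.Sync([]string{"group-a", "group-b"}, poll))
	close(release)
	m.Wait()
	fmt.Println("group-a alive:", m.IsAlive("group-a"))
}
```

In a real service, `Sync` would be driven by a ticker that periodically fetches the consumer list from Burrow. `LoadOrStore` keeps the check and the registration atomic, so two overlapping syncs cannot start two handlers for the same consumer.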
It's similar to the Consumer structure.
See the Burrow Dockerfile for how to use goRainbow. Burrow Inspection is my write-up of how I understand the Burrow code.
- The sample Dockerfile integrates goRainbow into Burrow.
- The main program is main.go. It opens a health-check port at localhost:7099.
- health-check: localhost:7099/health-check
  - returns 200 if the service is available
  - returns 503 if the service is unavailable
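A health-check endpoint with this behavior could look roughly like the sketch below. It is not taken from main.go: the `available` flag, `statusFor`, `healthCheck`, and `newMux` are hypothetical names, and how goRainbow actually decides that it is unavailable is not shown here. To keep the example self-terminating, `main` exercises the handler with an in-memory recorder instead of opening port 7099.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// available is a hypothetical flag that the rest of the service would flip
// when it detects that it can no longer do its job.
var available atomic.Bool

// statusFor maps service availability to the status codes listed above.
func statusFor(up bool) int {
	if up {
		return http.StatusOK // 200
	}
	return http.StatusServiceUnavailable // 503
}

// healthCheck answers /health-check with 200 or 503.
func healthCheck(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(statusFor(available.Load()))
}

// newMux wires the handler. A real service would serve it with
// http.ListenAndServe("localhost:7099", newMux()).
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/health-check", healthCheck)
	return mux
}

func main() {
	mux := newMux()
	for _, up := range []bool{true, false} {
		available.Store(up)
		rec := httptest.NewRecorder()
		mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/health-check", nil))
		fmt.Printf("available=%v -> HTTP %d\n", up, rec.Code)
	}
}
```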
goRainbow also provides a Burrow push model, in which goRainbow receives Burrow's lag messages through the Burrow notifier. It works, but the goRainbow pull model gives better precision.
See the rainbow-push-model branch for details.
- Avoid blocking operations in the main pipeline.
- Refined nested sync map to avoid blocking in the URL maintainer.
- Leave heavy workloads to goroutines.
- Twin-state machine to guarantee that metrics start and end with 0.
- Health-check: provides a health-check HTTP service so that AWS can automatically restart Burrow-goRainbow when the service is unavailable.
- Dynamic metric sending:
  - It sends partition metrics when lag exists. It also guarantees that every metric series starts from 0 and ends with 0, which displays better in Wavefront.
  - It sends metrics every 30s when they change and every 60s when they are unchanged.
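One way to read the sending rules above is as a small per-partition state machine with an idle state and an active state. The sketch below is a simplified interpretation, not goRainbow's twin-state machine: `PartitionSender`, `Observe`, and `Point` are hypothetical names, timestamps are plain Unix seconds, and stamping the leading zero one second before the first lag value is an assumption made only so the two points do not share a timestamp.

```go
package main

import "fmt"

// Point is one metric sample: a Unix timestamp in seconds and a lag value.
type Point struct {
	At    int64
	Value int64
}

const (
	changedInterval   = 30 // seconds between sends while the value changes
	unchangedInterval = 60 // seconds between sends while the value is stable
)

// PartitionSender decides what to send for a single partition.
type PartitionSender struct {
	open     bool  // true while a lag series is in progress
	last     int64 // last value sent
	lastSent int64 // when it was sent, in Unix seconds
}

// Observe takes the current lag and returns the points to send, if any.
func (s *PartitionSender) Observe(lag, now int64) []Point {
	switch {
	case !s.open && lag == 0:
		return nil // no lag, nothing to report
	case !s.open:
		// Lag appeared: lead with a zero so the series starts from 0.
		s.open, s.last, s.lastSent = true, lag, now
		return []Point{{At: now - 1, Value: 0}, {At: now, Value: lag}}
	case lag == 0:
		// Lag cleared: send the closing zero right away (an assumption).
		s.open, s.last, s.lastSent = false, 0, now
		return []Point{{At: now, Value: 0}}
	}
	interval := int64(unchangedInterval)
	if lag != s.last {
		interval = changedInterval
	}
	if now-s.lastSent < interval {
		return nil // too soon to send again
	}
	s.last, s.lastSent = lag, now
	return []Point{{At: now, Value: lag}}
}

func main() {
	s := &PartitionSender{}
	lags := []int64{0, 5, 5, 5, 8, 0, 0}
	for i, lag := range lags {
		now := int64(i) * 30 // one observation every 30 seconds
		fmt.Printf("t=%ds lag=%d send=%v\n", now, lag, s.Observe(lag, now))
	}
}
```

Keeping the decision in a pure method with explicit timestamps makes the timing rules easy to test without real clocks or sleeps.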
A big thanks to porter-rainbow, which gave me the basic idea for goRainbow's design.
port-rainbow is mainly based on socket connections, while goRainbow works more like a RESTful service.