[Integrations]: Kafka support for Observability #774

Open

nidhisinghai opened this issue Jun 1, 2022 · 4 comments
Labels: enhancement (New feature or request)

@nidhisinghai (Contributor) commented Jun 1, 2022

Hello,
This enhancement request covers the following features:

  1. Install and configure JDK, Zookeeper, and Kafka
  2. Configure Kafka to generate the required level of logs
  3. Configure FluentD to receive Kafka logs and forward them to OpenSearch
  4. Analyze the various metrics available from Kafka for visualization
  5. Create visualizations in Observability.

@abasatwar @spattnaik

@zishanfazilkhan

Here's a brief overview of the Kafka exploration done so far:

Prerequisites

  1. Kafka requires a JDK. We used openjdk-11 as the base JDK environment.

  2. Kafka also requires a Zookeeper setup. Although it is possible to run Kafka without Zookeeper, many forums suggest using Kafka along with Zookeeper, so we went ahead and set up Kafka with Zookeeper.
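
For reference, the standalone setup follows the standard Kafka quickstart (a sketch; the commands assume you are in the root of the extracted Kafka distribution):

    # Start Zookeeper first, then the Kafka broker
    bin/zookeeper-server-start.sh config/zookeeper.properties
    bin/kafka-server-start.sh config/server.properties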

Kafka Logs

  1. After setting up a standalone Kafka server, we explored the logs from various components of Kafka.

  2. Kafka produces metrics for the server itself, Zookeeper, and the Producer and Consumer threads.

  3. Kafka, however, doesn't provide any stats/metrics in the logs. We tried capturing the logs at the default, debug, and trace levels.

  4. Nothing concrete was present in the logs that could have been modelled as analytics, so we stopped further analysis of the logs.
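
For anyone reproducing the log-level experiments above, the broker's verbosity is controlled through its log4j configuration (a sketch, assuming the stock config/log4j.properties shipped with Kafka):

    # config/log4j.properties -- raise the broker's root log level
    # from the default INFO to DEBUG (or TRACE)
    log4j.rootLogger=DEBUG, stdout, kafkaAppender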

Kafka Statistics

  1. We researched the Kafka statistics that need to be monitored. These are well documented on the project site as well as other sites:

    https://kafka.apache.org/documentation/#monitoring

    https://www.datadoghq.com/blog/collecting-kafka-performance-metrics/

  2. We analyzed further and found various ways of collecting these statistics from Kafka.

  3. These primarily fall into two categories:

    • With the help of agents (e.g. DataDog, Lenses, Confluent, etc.)

      These are mostly licensed tools, where an agent is configured locally and communicates with a remote server that renders the data or forwards it further along the data pipeline. They are usually a combination of a tool that gathers/reads the statistics via JMX (plugins) and other services (Graphite, Grafana, DataDog, etc.) that process the data and render it visually.

    • With the help of tools/plugins (e.g. JConsole, jmxtrans, Burrow, Prometheus, etc.)

      The second category consists mostly of open-source tools/plugins/libraries that fetch the various metrics/stats exposed via JMX from Kafka. They give you the flexibility to model the statistics in your own way.

JConsole:

  1. JConsole is a monitoring utility shipped with the JDK.

  2. It provides a simple UI to connect to a JVM instance and display various statistics.

Advantage:
  1. No configuration required; UI-based operation.
Disadvantage:
  1. No programmatic interface (APIs) that could be used to fetch metrics/statistics.

  2. Manual operation.

  3. Metrics/statistics can be saved, but that is also a manual operation.

We explored the various options available in JConsole but discarded it, since it doesn't provide a programmatic interface.
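
As a side note, JConsole needs the broker's JMX port exposed before it can attach; Kafka's startup scripts honor the JMX_PORT environment variable (a sketch; the port number matches the jmxtrans configuration later in this thread):

    # Start the broker with remote JMX enabled so JConsole/jmxtrans can connect
    JMX_PORT=9004 bin/kafka-server-start.sh config/server.properties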

JmxTrans:

https://github.com/jmxtrans/jmxtrans/wiki

  1. JmxTrans is an open-source tool that collects JVM metrics exposed via JMX.

  2. It is available as a simple zip package downloadable from GitHub.

  3. It can be configured via simple YAML files to query various statistics.

  4. We have explored this option and have been able to query Kafka and Zookeeper statistics and write them to a file.

  5. The file can then be provided to FluentD and published as an index inside OpenSearch.

  6. This paves the way for visualizations to be created around the various metrics.

Advantage:

  1. Simple YAML file configuration for querying statistics/metrics.

  2. Open-source software with reliable maintenance.

Disadvantage:

  1. N/A

Burrow:

Yet to be explored.

Prometheus:

Yet to be explored.

@zishanfazilkhan

Statistics via JMXTrans:

The following section describes how we accessed metrics using JMXTrans, as well as which metrics are available to query.

Metrics Availability:

As mentioned in the previous post, the metrics to be monitored for Kafka/Zookeeper are well documented.
They can be found at:

https://kafka.apache.org/documentation/#monitoring

JConsole lists all the relevant MBeans in its GUI. We configured JConsole and obtained the names of the MBeans to query from there. The snapshots below show this:

[Screenshots: JConsole_Kafka_1, JConsole_Kafka_2, JConsole_Kafka_3]

Note that for this analysis we took a sample of the available MBeans and tried querying them
(the MBeans we queried are described in the configuration section below).
We assume the same process would also apply to the rest of the MBeans.

JMXTrans Configuration:

Once we had the MBeans to be queried, we created config files in JMXTrans to query the relevant JVM instance for metrics. These are simple JSON/YAML files containing elements/nodes that query MBeans inside a JVM instance.
The one we created is as follows:

    servers:
     # Kafka broker JVM (exposed via JMX on port 9004)
     - port: 9004
       host: admin1-Veriton-M200-H81
       alias: srv
       queries:
        # Broker-wide message throughput MBean
        - obj: kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec
          resultAlias: BrTopicMets
          attr:
           - Count
           - RateUnit
           - MeanRate
          outputWriters:
           # KeyOutWriter appends each sampled value as a line in a rolling file
           - "@Class": com.googlecode.jmxtrans.model.output.KeyOutWriter
             outputFile: "/opt/OpenSearch/Kafka/jmxtrans-master/KeyOut_Kafka.txt"
             maxLogFileSize: 10MB
             maxLogBackupFiles: 200
             debug: true
             typeNames:
             - name
     # Zookeeper JVM (exposed via JMX on port 9003)
     - port: 9003
       host: admin1-Veriton-M200-H81
       alias: srv
       queries:
        # Standalone Zookeeper server connection/latency MBean
        - obj: org.apache.ZooKeeperService:name0=StandaloneServer_port2181
          resultAlias: ZooMets
          attr:
           - NumAliveConnections
           - MinRequestLatency
           - MaxRequestLatency
          outputWriters:
           - "@Class": com.googlecode.jmxtrans.model.output.KeyOutWriter
             outputFile: "/opt/OpenSearch/Kafka/jmxtrans-master/KeyOut_Zoo.txt"
             maxLogFileSize: 10MB
             maxLogBackupFiles: 200
             debug: true
             typeNames:
             - name

The config also allows you to redirect the metrics output to a number of writers. Some of these write the metrics to files, while others can forward them to different applications, for example Graphite. Some can also stream the metrics over UDP.

We gathered the metrics and stored them in a simple file using the KeyOutWriter, as shown in the config file above.
Any specific writer can be explored if required.

Integration with FluentD & OpenSearch

The file created via JMXTrans was integrated into OpenSearch using FluentD.
We read the file in FluentD and redirected its content to OpenSearch.
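
A minimal FluentD sketch of this wiring, assuming the fluent-plugin-opensearch output plugin is installed (the file path, tag, index name, and OpenSearch host are illustrative):

    <source>
      # Tail the KeyOutWriter file produced by JMXTrans
      @type tail
      path /opt/OpenSearch/Kafka/jmxtrans-master/KeyOut_Kafka.txt
      pos_file /var/log/fluentd/keyout_kafka.pos
      tag kafka.metrics
      <parse>
        @type none
      </parse>
    </source>

    <match kafka.metrics>
      # Publish each line as a document in an OpenSearch index
      @type opensearch
      host localhost
      port 9200
      index_name kafka-metrics
    </match>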

We have stopped the analysis of Kafka/Zookeeper at this juncture and are awaiting further instructions
on whether to explore the rest of the options mentioned here or to use any other tools/mechanisms.

@zishanfazilkhan

Prometheus

We are evaluating Prometheus for collecting statistics, as it supports integration with a lot of applications.
We have tried the following with Prometheus so far:

Output from Configured Endpoint:

FluentD is not able to read the file containing the default output from the endpoints.
A dedicated parser would need to be written to extract the metrics from the endpoint and store them in a file that FluentD could then read.
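
For context, the default endpoint output is Prometheus's plain-text exposition format, which is neither JSON nor a line format FluentD's stock parsers handle (illustrative sample):

    # HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
    # TYPE process_cpu_seconds_total counter
    process_cpu_seconds_total 12.34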

Prom2json:

Since the file created from the default endpoint's output was not readable by FluentD,
we tried a specific formatter to translate the output to JSON, which is supported as a valid input for FluentD.

Prom2json reads the default Prometheus endpoint and translates the output to JSON.
However, this JSON again had a few nested elements, on which FluentD parsing failed.

HTTP API:

Prometheus provides REST APIs to query data and get the result as JSON.
However, in the samples tried so far, the resulting JSON is not being parsed by FluentD.
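
The HTTP API's JSON is nested: each sample carries a label map plus a [timestamp, value] pair. One workaround we could try is a small script that flattens the response into newline-delimited JSON before handing it to FluentD. A hypothetical sketch (the host, metric name, and output path are illustrative):

    import json

    import requests  # third-party package

    # Query the Prometheus HTTP API (host and metric name are illustrative)
    resp = requests.get(
        "http://localhost:9090/api/v1/query",
        params={"query": "kafka_server_brokertopicmetrics_messagesinpersec_count"},
    )
    resp.raise_for_status()

    # Flatten each sample into a single-level record that FluentD can
    # ingest as one JSON object per line (assumes an instant-vector result)
    with open("prom_metrics.json", "w") as out:
        for sample in resp.json()["data"]["result"]:
            record = dict(sample["metric"])  # label map, e.g. {"job": "kafka"}
            record["timestamp"], record["value"] = sample["value"]
            out.write(json.dumps(record) + "\n")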

Remote Write API:

This could be used to store data from Prometheus in a remote time-series database such as InfluxDB.
We have not tried this option; however, the question of getting the data into OpenSearch remains unanswered in this approach too.

Kindly let us know if:

  1. you are aware of a tool/plugin/connector that could query and output data from Prometheus
  2. there is any other mechanism via which we could query and output data from Prometheus
  3. we could get help from FluentD developers to write a plugin for querying Prometheus data

@nidhisinghai nidhisinghai changed the title [FEATURE] Kafka support for Observability [Integrations]: Kafka support for Observability Aug 23, 2022
@louzadod

Prometheus is an entire ecosystem of well-written and stable tools. In my opinion, the best approach is to reuse it as much as possible.
For Kafka, the best approach is to use the JMX Exporter and write the data to Prometheus.

What I see as a good integration point between Prometheus and OpenSearch is using OpenSearch as a remote storage back end.

Observability can then take advantage of a central point for logs, tracing, and metrics: data correlation.
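
For illustration, the Prometheus JMX Exporter is typically attached to the broker as a Java agent via KAFKA_OPTS (a sketch; the jar path, port, and rules file are illustrative):

    # Expose broker metrics in Prometheus format on port 7071
    KAFKA_OPTS="-javaagent:/opt/jmx_prometheus_javaagent.jar=7071:kafka-rules.yml" \
      bin/kafka-server-start.sh config/server.properties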
