[Integrations]: Kafka support for Observability #774

Open

nidhisinghai opened this issue Jun 1, 2022 · 4 comments
Labels: enhancement (New feature or request)

@nidhisinghai (Contributor) commented Jun 1, 2022

Hello,
This enhancement request covers the following features:

  1. Install and configure JDK, Zookeeper, and Kafka
  2. Configure Kafka to generate the required level of logs
  3. Configure FluentD to receive Kafka logs and forward them to OpenSearch
  4. Analyze the various metrics available from Kafka for visualization
  5. Create visualizations in Observability.

@abasatwar @spattnaik

@zishanfazilkhan

Here's a brief overview of the Kafka exploration done so far:

Prerequisites

  1. Kafka requires a JDK. We used openjdk-11 as the base JDK environment.

  2. Kafka also requires a Zookeeper setup. Although it is possible to run Kafka without Zookeeper, many forums suggest using Kafka along with Zookeeper, so we went ahead and set up Kafka with Zookeeper.
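
For reference, the standalone setup follows the standard Kafka quickstart (a sketch; the commands assume you are in the root of the extracted Kafka distribution):

    # Start Zookeeper first, then the Kafka broker
    bin/zookeeper-server-start.sh config/zookeeper.properties
    bin/kafka-server-start.sh config/server.properties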

Kafka Logs

  1. After setting up a standalone Kafka server, we explored the logs from various components of Kafka.

  2. Kafka produces metrics for the server itself, Zookeeper, and the Producer and Consumer threads.

  3. Kafka, however, doesn't provide any stats/metrics in the logs. We tried capturing the logs at the default, debug, and trace levels.

  4. Nothing concrete was present in the logs that could have been modelled as analytics, so we stopped further analysis of the logs.
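
For anyone reproducing the log-level experiments above, the broker's verbosity is controlled through its log4j configuration (a sketch, assuming the stock config/log4j.properties shipped with Kafka):

    # config/log4j.properties -- raise the broker's root log level
    # from the default INFO to DEBUG (or TRACE)
    log4j.rootLogger=DEBUG, stdout, kafkaAppender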

Kafka Statistics

  1. We researched the Kafka statistics that need to be monitored. These are well documented on the project site as well as other sites:

    https://kafka.apache.org/documentation/#monitoring

    https://www.datadoghq.com/blog/collecting-kafka-performance-metrics/

  2. We analyzed further and found various ways of collecting these statistics from Kafka.

  3. These primarily fall into two categories:

    • With the help of agents (e.g. DataDog, Lenses, Confluent, etc.)

      These are mostly licensed tools, where an agent is configured locally and communicates with a remote server that renders the data or forwards it further along the data pipeline. They are usually a combination of a tool that gathers/reads the statistics via JMX (plugins) and other services (Graphite, Grafana, DataDog, etc.) that process the data and render it visually.

    • With the help of tools/plugins (e.g. JConsole, jmxtrans, Burrow, Prometheus, etc.)

      The second category consists mostly of open-source tools/plugins/libraries that fetch the various metrics/stats exposed via JMX from Kafka. They give you the flexibility to model the statistics in your own way.

JConsole:

  1. JConsole is a monitoring utility shipped with the JDK.

  2. It provides a simple UI to connect to a JVM instance and display various statistics.

Advantage:
  1. No configuration required; UI-based operation.
Disadvantage:
  1. No programmatic interface (APIs) that could be used to fetch metrics/statistics.

  2. Manual operation.

  3. Metrics/statistics can be saved, but that is also a manual operation.

We explored the various options available in JConsole but discarded it, since it doesn't provide a programmatic interface.
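
As a side note, JConsole needs the broker's JMX port exposed before it can attach; Kafka's startup scripts honor the JMX_PORT environment variable (a sketch; the port number matches the jmxtrans configuration later in this thread):

    # Start the broker with remote JMX enabled so JConsole/jmxtrans can connect
    JMX_PORT=9004 bin/kafka-server-start.sh config/server.properties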

JmxTrans:

https://github.com/jmxtrans/jmxtrans/wiki

  1. JmxTrans is an open-source tool that collects JVM metrics exposed via JMX.

  2. It is available as a simple zip package downloadable from GitHub.

  3. It can be configured via simple YAML files to query various statistics.

  4. We have explored this option and have been able to query Kafka and Zookeeper statistics and write them to a file.

  5. The file can then be provided to FluentD and published as an index inside OpenSearch.

  6. This paves the way for visualizations to be created around the various metrics.

Advantage:

  1. Simple YAML file configuration for querying statistics/metrics.

  2. Open-source software with reliable maintenance.

Disadvantage:

  1. N/A

Burrow:

Yet to be explored.

Prometheus:

Yet to be explored.

@zishanfazilkhan

Statistics via JMXTrans:

The following section describes how we accessed metrics using JMXTrans, as well as which metrics are available to query.

Metrics Availability:

As mentioned in the previous post, the metrics to be monitored for Kafka/Zookeeper are well documented.
They can be found at:

https://kafka.apache.org/documentation/#monitoring

JConsole lists all the relevant MBeans in its GUI. We configured JConsole and obtained the names of the MBeans to query from there. The snapshots below show this:

[Screenshots: JConsole_Kafka_1, JConsole_Kafka_2, JConsole_Kafka_3]

Note that for this analysis we took a sample of the available MBeans and tried querying them
(the MBeans we queried are described in the configuration section below).
We assume the same process would also apply to the rest of the MBeans.

JMXTrans Configuration:

Once we had the MBeans to be queried, we created config files in JMXTrans to query the relevant JVM instance for metrics. These are simple JSON/YAML files containing elements/nodes that query MBeans inside a JVM instance.
The one we created is as follows:

    servers:
     # Kafka broker JVM (exposed via JMX on port 9004)
     - port: 9004
       host: admin1-Veriton-M200-H81
       alias: srv
       queries:
        # Broker-wide message throughput MBean
        - obj: kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec
          resultAlias: BrTopicMets
          attr:
           - Count
           - RateUnit
           - MeanRate
          outputWriters:
           # KeyOutWriter appends each sampled value as a line in a rolling file
           - "@Class": com.googlecode.jmxtrans.model.output.KeyOutWriter
             outputFile: "/opt/OpenSearch/Kafka/jmxtrans-master/KeyOut_Kafka.txt"
             maxLogFileSize: 10MB
             maxLogBackupFiles: 200
             debug: true
             typeNames:
             - name
     # Zookeeper JVM (exposed via JMX on port 9003)
     - port: 9003
       host: admin1-Veriton-M200-H81
       alias: srv
       queries:
        # Standalone Zookeeper server connection/latency MBean
        - obj: org.apache.ZooKeeperService:name0=StandaloneServer_port2181
          resultAlias: ZooMets
          attr:
           - NumAliveConnections
           - MinRequestLatency
           - MaxRequestLatency
          outputWriters:
           - "@Class": com.googlecode.jmxtrans.model.output.KeyOutWriter
             outputFile: "/opt/OpenSearch/Kafka/jmxtrans-master/KeyOut_Zoo.txt"
             maxLogFileSize: 10MB
             maxLogBackupFiles: 200
             debug: true
             typeNames:
             - name

The config also allows you to redirect the metrics output to a number of writers. Some of these write the metrics to files, while others can forward them to different applications, for example Graphite. Some can also stream the metrics over UDP.

We gathered the metrics and stored them in a simple file using the KeyOutWriter, as shown in the config file above.
Any specific writer can be explored if required.

Integration with FluentD & OpenSearch

The file created via JMXTrans was integrated into OpenSearch using FluentD.
We read the file in FluentD and redirected its content to OpenSearch.
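
A minimal FluentD sketch of this wiring, assuming the fluent-plugin-opensearch output plugin is installed (the file path, tag, index name, and OpenSearch host are illustrative):

    <source>
      # Tail the KeyOutWriter file produced by JMXTrans
      @type tail
      path /opt/OpenSearch/Kafka/jmxtrans-master/KeyOut_Kafka.txt
      pos_file /var/log/fluentd/keyout_kafka.pos
      tag kafka.metrics
      <parse>
        @type none
      </parse>
    </source>

    <match kafka.metrics>
      # Publish each line as a document in an OpenSearch index
      @type opensearch
      host localhost
      port 9200
      index_name kafka-metrics
    </match>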

We have stopped the analysis of Kafka/Zookeeper at this juncture and are awaiting further instructions
on whether to explore the rest of the options mentioned here or to use any other tools/mechanisms.

@zishanfazilkhan

Prometheus

We are evaluating Prometheus for collecting statistics, as it supports integration with a lot of applications.
We have tried the following with Prometheus so far:

Output from Configured Endpoint:

FluentD is not able to read the file containing the default output from the endpoints.
A dedicated parser would need to be written to extract the metrics from the endpoint and store them in a file that FluentD could then read.
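
For context, the default endpoint output is Prometheus's plain-text exposition format, which is neither JSON nor a line format FluentD's stock parsers handle (illustrative sample):

    # HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
    # TYPE process_cpu_seconds_total counter
    process_cpu_seconds_total 12.34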

Prom2json:

Since the file created from the default endpoint's output was not readable by FluentD,
we tried a specific formatter to translate the output to JSON, which is supported as a valid input for FluentD.

Prom2json reads the default Prometheus endpoint and translates the output to JSON.
However, this JSON again had a few nested elements, on which FluentD parsing failed.

HTTP API:

Prometheus provides REST APIs to query data and get the result as JSON.
However, in the samples tried so far, the resulting JSON is not being parsed by FluentD.
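
The HTTP API's JSON is nested: each sample carries a label map plus a [timestamp, value] pair. One workaround we could try is a small script that flattens the response into newline-delimited JSON before handing it to FluentD. A hypothetical sketch (the host, metric name, and output path are illustrative):

    import json

    import requests  # third-party package

    # Query the Prometheus HTTP API (host and metric name are illustrative)
    resp = requests.get(
        "http://localhost:9090/api/v1/query",
        params={"query": "kafka_server_brokertopicmetrics_messagesinpersec_count"},
    )
    resp.raise_for_status()

    # Flatten each sample into a single-level record that FluentD can
    # ingest as one JSON object per line (assumes an instant-vector result)
    with open("prom_metrics.json", "w") as out:
        for sample in resp.json()["data"]["result"]:
            record = dict(sample["metric"])  # label map, e.g. {"job": "kafka"}
            record["timestamp"], record["value"] = sample["value"]
            out.write(json.dumps(record) + "\n")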

Remote Write API:

This could be used to store data from Prometheus in a remote time-series database such as InfluxDB.
We have not tried this option; however, the question of getting the data into OpenSearch remains unanswered in this approach too.

Kindly let us know if:

  1. you are aware of a tool/plugin/connector that could query and output data from Prometheus
  2. there is any other mechanism via which we could query and output data from Prometheus
  3. we could get help from FluentD developers to write a plugin for querying Prometheus data

@nidhisinghai nidhisinghai changed the title [FEATURE] Kafka support for Observability [Integrations]: Kafka support for Observability Aug 23, 2022
@louzadod

Prometheus is an entire ecosystem of well-written and stable tools. In my opinion, the best approach is to reuse it as much as possible.
For Kafka, the best approach is to use the JMX Exporter and write the data to Prometheus.

What I see as a good integration point between Prometheus and OpenSearch is using OpenSearch as a remote storage back end.

Observability can then take advantage of a central point for logs, tracing, and metrics: data correlation.
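
For illustration, the Prometheus JMX Exporter is typically attached to the broker as a Java agent via KAFKA_OPTS (a sketch; the jar path, port, and rules file are illustrative):

    # Expose broker metrics in Prometheus format on port 7071
    KAFKA_OPTS="-javaagent:/opt/jmx_prometheus_javaagent.jar=7071:kafka-rules.yml" \
      bin/kafka-server-start.sh config/server.properties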
