-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redesign the Jolokia input plugin (as Jolokia2) #2278
Conversation
cc @phemmer @mainvoid @saiello @subhachandrachandra could you take a look at this? |
I don't understand #2242 exactly. But I am making a Jolokia "gatherer" that implements this input plugin and is usable by other inputs. That way, the Cassandra input plugin could be a thin veneer on top of the Jolokia gatherer. I think this satisfies #2242. If I was implementing #2242 after finishing this PR, I would consider doing these two things together:
The final config, to get all the JVM and Cassandra measurements from a Jolokia agent running on Cassandra:
|
#2242 is quite simple. Replace existing Jolokia plugin with Cassandra plugin. Both hook into jolokia and grab data out of it. However, the Cassandra plugin get the data out in a simpler and smarter way than the jolokia plugin. It eases how one configures it, compared to the jolokia plugin, as well as apparently this PR. I'm not sure why re-inventing the wheel is a necessity, but hey. I'd say, if you want to replace the jolokia plugin, make it better than the Cassandra one if possible. I see little need to have a shim layer on top of a plugin in order to grab specific metrics. All of them are presented as it is. A side effect of the merge is also, one code base to maintain. Only reason the Cassandra plugin doesn't grab other metrics out is because it has been coded to only grab java.lang and cassandra specific metrics. Remove that and it'll do what the Jolokia plugin does, and more.. Like name_drop, field_pass, etc. etc. which doesn't exist in the Jolokia plugin today. It's already there for the taking. Also, replacing jolokia with Cassandra only needs three things. 1/ Removing 10 lines of code. 2/ Renaming it to something suitable. 3/ Profit. It has also already proven itself in production environments. |
Cassandra input plugin does not do proxying. It does not do bulk HTTP. It does not handle TLS. It does not produce measurements other than the ones it knows about: It does take key-property names and turn them into tags, which is something we want with the Improvements made to Jolokia gathering could knock on to any plugin interacting with Jolokia agents. |
e63d114
to
ed52c88
Compare
5a51530
to
d01034f
Compare
plugins/inputs/jolokia2/jolokia.go
Outdated
|
||
func (jc *Jolokia) SampleConfig() string { | ||
return fmt.Sprintf(` | ||
# %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need to put the Description in your SampleConfig, telegraf will add that automatically
plugins/inputs/jolokia2/README.md
Outdated
[[inputs.jolokia2.metric]] | ||
name = "jvm_memory_pool" | ||
mbean = "java.lang:name=*,type=MemoryPool" | ||
paths = ["Usage", "PeakUsage, "CollectionUsage"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you know why github is interpreting this as invalid toml?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's subtle, but there's a missing quote after PeakUsage
.
plugins/inputs/jolokia2/README.md
Outdated
url = "service:jmx:rmi:///jndi/rmi://targethost:9999/jmxrmi" | ||
#username = "" | ||
#password = "" | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put sample plain-text metrics here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, curious if the problem of collapsing metrics with long paths is fixed.
ATM we're using Cassandra jolokia plugin and it seems that plugin collapses LastGcInfo
for all heap spaces into one. For instance:
metrics = [
"/java.lang:type=GarbageCollector,name=G1%20Young%20Generation/LastGcInfo/memoryUsageAfterGc/G1%20Eden%20Space",
"/java.lang:type=GarbageCollector,name=G1%20Young%20Generation/LastGcInfo/memoryUsageAfterGc/G1%20Survivor%20Space",
"/java.lang:type=GarbageCollector,name=G1%20Young%20Generation/LastGcInfo/memoryUsageAfterGc/G1%20Old%20Gen",
"/java.lang:type=GarbageCollector,name=G1%20Young%20Generation/LastGcInfo/memoryUsageAfterGc/Metaspace",
"/java.lang:type=GarbageCollector,name=G1%20Young%20Generation/LastGcInfo/memoryUsageAfterGc/Compressed%20Class%20Space",
"/java.lang:type=GarbageCollector,name=G1%20Young%20Generation/LastGcInfo/memoryUsageAfterGc/Code%20Cache"
]
result in only following 6:
"javaGarbageCollector,mname=G1\\ Young\\ Generation,cassandra_host=test LastGcInfo_used=0,LastGcInfo_init=4513071104,LastGcInfo_committed=4160749568,LastGcInfo_max=-1 1487259680000000000\n"
"javaGarbageCollector,mname=G1\\ Young\\ Generation,cassandra_host=test LastGcInfo_max=-1,LastGcInfo_used=352321536,LastGcInfo_init=0,LastGcInfo_committed=352321536 1487259680000000000\n"
"javaGarbageCollector,mname=G1\\ Young\\ Generation,cassandra_host=test LastGcInfo_init=12666798080,LastGcInfo_committed=12666798080,LastGcInfo_max=17179869184,LastGcInfo_used=4759843152 1487259680000000000\n"
"javaGarbageCollector,mname=G1\\ Young\\ Generation,cassandra_host=test LastGcInfo_used=39882464,LastGcInfo_init=0,LastGcInfo_committed=40714240,LastGcInfo_max=-1 1487259680000000000\n"
"javaGarbageCollector,mname=G1\\ Young\\ Generation,cassandra_host=test LastGcInfo_init=0,LastGcInfo_committed=4587520,LastGcInfo_max=1073741824,LastGcInfo_used=4326520 1487259680000000000\n"
"javaGarbageCollector,mname=G1\\ Young\\ Generation,cassandra_host=test LastGcInfo_init=2555904,LastGcInfo_committed=44236800,LastGcInfo_max=251658240,LastGcInfo_used=41900864 1487259680000000000\n"
whereas they should (perhaps) have path in their names providing metric names sort of like (or in any way where path
"javaGarbageCollector,mname=G1\\ Young\\ Generation,cassandra_host=test memoryUsageAfterGc_Metaspace_LastGcInfo_init=2555904,memoryUsageAfterGc_Metaspace_LastGcInfo_committed=44236800,memoryUsageAfterGc_Metaspace_LastGcInfo_max=251658240,memoryUsageAfterGc_Metaspace_LastGcInfo_used=41900864 1487259680000000000\n"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we are collapsing paths, and coalescing series as much as we can so we don't produce nonsensical measurements.
For example, if you were to set it up so that Jolokia returned this:
foo,gc="G1 Garbage Collector" x=1
foo,gc="G1 Garbage Collector" y=2
These would get compacted into: foo,gc="G1 Garbage Collector" x=1,x=2
That happens at this point in the code is for: https://github.com/dylanmei/telegraf/blob/c9a251e7f219c326c30e48140bcffcc4e302f1d8/plugins/inputs/jolokia2/gatherer.go#L138
LastGcInfo in particular has been tricky, since Young Gen returns an object and Old Gen always returns null. I'm super confused by the design of this output!
With this updated plugin, you can create a very finely tuned "young gen" measurement for yourself, or what we've been doing is punting on it entirely and deriving just what we need from CollectionTime/CollectionCount:
[[inputs.jolokia2.metric]]
name = "java_garbage_collector"
mbean = "java.lang:name=G1*,type=GarbageCollector"
paths = ["CollectionTime", "CollectionCount"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dylanmei still not sure I get how it is possible to configure plugin for getting stats for utilization of different heap spaces (Eden, SurvivorSpace and OldGen etc which are only in paths which as I understand are collapsed completely). Perhaps I am missing something really-really basic 😆
@dylanmei can you explain some of the changes that you made to config.go? Not saying they're wrong but I'd like to know why they are included in this PR. |
@sparrc regarding config.go: This plugin is very complex. It is my feeling that if I can simply leverage the TOML parsing and plugin creation in config.go that I can create more readable test fixtures, and the test fixtures can enhance the documentation. This can lead to an fixture file like this: https://github.com/dylanmei/telegraf/blob/jolokia2/plugins/inputs/jolokia2/testdata/field_substitutions.toml I realize I may be off on a tangent here. Writing tests this way has helped me immensely as I was progressing. I may decompose these fixtures files back into code now that the changes are settling down. But I think being able to pass a plain string as well as a file to config.go is a handy thing. |
b98ecaa
to
c9a251e
Compare
I've begun running this "in production" for a Kafka cluster. The conf is largely based on this project: https://github.com/dylanmei/compose-confluent and uses a publicly available Docker image In addition to iterating on the tests,
|
good to know @dylanmei, have any plans for implementing the proxy support? I'm not sure we can merge it until then since the current jolokia plugin does support jolokia proxies. |
Yes, I do plan to put the proxy support back into place. As noted in the README, we'll express targets with the full url (i.e, Meaning, the config will just channel what's shown in the Jolokia docs: url, password, user.
|
@danielnelson the changes to Related to naming, I want to call out that there are two plugins here. The original We can
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two agent and proxy configs are quite similar, what were the issues you ran into with the toml config? If it has to do with associating the proxy url to the connection url, perhaps we just need the urls to be in the same table:
[[inputs.jolokia2]]
# default_tag_prefix = ""
# default_field_prefix = ""
# default_field_separator = "."
response_timeout = "5s"
## Add agents to query
[[inputs.jolokia2.servers]]
url = "http://localhost:8080/jolokia"
username = ""
password = ""
proxy_url = "service:jmx:rmi:///jndi/rmi://targethost:9999/jmxrmi"
proxy_username = ""
proxy_password = ""
## Optional SSL config
# ssl_ca = "/var/private/ca.pem"
# ssl_cert = "/var/private/client.pem"
# ssl_key = "/var/private/client-key.pem"
# insecure_skip_verify = false
## Add metrics to read
[[inputs.jolokia2.metric]]
name = "java_runtime"
mbean = "java.lang:type=Runtime"
paths = ["Uptime"]
Two plugins is fine as well. I'll try to finish a proper review this week, but here are a couple minor things I noticed:
plugins/inputs/jolokia2/jolokia.go
Outdated
# default_field_separator = "." | ||
|
||
## Add agents to query | ||
agents = ["http://localhost:8080/jolokia"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be urls
plugins/inputs/jolokia2/jolokia.go
Outdated
## Add targets to query | ||
# default_target_username = "" | ||
# default_target_password = "" | ||
[[inputs.jolokia_proxy.target]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can there be more than one target per url?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the proxy can make requests to multiple targets, and each of those targets could potentially have different user/pass. This would be in a world where, for example, a distributed application's JMX endpoints are completely inaccessible except to the proxy. The proxy can have its own user/pass, and the individual targets can have their own user/pass. It's quite difficult to express all these possibilities in a single table.
plugins/inputs/jolokia2/README.md
Outdated
[[inputs.jolokia2.metric]] | ||
name = "kafka_log" | ||
mbean = "kafka.log:name=*,partition=*,topic=*,type=Log" | ||
field_name = "$1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see a spot where the $1 syntax was explained. Do these correspond to the *
in the mbean
options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll explain this more clearly.
plugins/inputs/jolokia2/jolokia.go
Outdated
}) | ||
} | ||
|
||
func (jc *JolokiaProxy) Gather(acc telegraf.Accumulator) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method receiver is sometimes named jc
and sometimes jp
, lets always use the same variable name.
plugins/inputs/jolokia2/jolokia.go
Outdated
|
||
for _, config := range jc.Metrics { | ||
metrics = append(metrics, NewMetric(config, | ||
jc.DefaultFieldPrefix, jc.DefaultFieldSeparator, jc.DefaultTagPrefix)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done every gather, but only needs to be done once. Lets put it into an initialize function and only run it once. Should be done also in Jolokia struct.
plugins/inputs/jolokia2/jolokia.go
Outdated
} | ||
|
||
gatherer := NewGatherer(acc, metrics) | ||
proxy, err := jc.buildProxy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Building the proxy and agent is also done every gather when it could be done once. This will create a new transport each Gather which has the potential to cause problems. Also needs change in Jolokia struct
plugins/inputs/jolokia2/jolokia.go
Outdated
return err | ||
} | ||
|
||
err = gatherer.Gather(agent) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be ran in a goroutine to avoid earlier urls blocking later ones. It is not a problem in the JolokiaProxy since there is only one server, unless there are multiple targets.
You could get around this by saying there can only be one url per plugin, but the flip side of that is if you have many servers with the same configuration it would be a lot of config file to duplicate.
@danielnelson I believe I've addressed your awesome feedback, the big ones were: less mysterious documentation and also a groroutine around fetching agent metrics from multiple agents. |
Thanks @dylanmei, this will be in 1.5.0. |
Sorry if this is the incorrect place for this - I was getting started with the jolokia2_agent in telegraf 1.5 to monitor kafka 1.0.0. I'm having trouble parsing the metrics I'm seeing reported when using the example configuration. For example for per topic metrics:
What I'm noticing in InfluxDB are fields for |
The declaration If I'm not clear what you think the problem is -- I need more info. Can you create an issue and show the line-protocol output? |
I should have actually included what is being sent to influxdb, with stdout: Here are two different items, both with the same field name, but being reported with the same tags in Telegraf:
You can see one is I would expect to either see a tag ( |
I see a problem here, which is that your fields are starting with just |
The output from the jolokia server seems correct: curl http://localhost:8778/jolokia/read/kafka.server:name=*,topic=*,type=BrokerTopicMetrics {
"request": {
"mbean": "kafka.server:name=*,topic=*,type=BrokerTopicMetrics",
"type": "read"
},
"value": {
"kafka.server:name=TotalFetchRequestsPerSec,topic=prod-topic,type=BrokerTopicMetrics": {
"RateUnit": "SECONDS",
"OneMinuteRate": 2.964393875e-314,
"EventType": "requests",
"Count": 63,
"FifteenMinuteRate": 4.174759510437766e-37,
"FiveMinuteRate": 1.4223410020833352e-106,
"MeanRate": 0.000760766356235835
},
"kafka.server:name=FailedFetchRequestsPerSec,topic=prod-topic,type=BrokerTopicMetrics": {
"RateUnit": "SECONDS",
"OneMinuteRate": 0,
"EventType": "requests",
"Count": 0,
"FifteenMinuteRate": 0,
"FiveMinuteRate": 0,
"MeanRate": 0
},
"kafka.server:name=MessagesInPerSec,topic=prod-topic,type=BrokerTopicMetrics": {
"RateUnit": "SECONDS",
"OneMinuteRate": 1.1654669777738194,
"EventType": "messages",
"Count": 96684,
"FifteenMinuteRate": 1.1598221110942293,
"FiveMinuteRate": 1.1526658491904729,
"MeanRate": 1.1675227624956321
},
"kafka.server:name=BytesInPerSec,topic=prod-topic,type=BrokerTopicMetrics": {
"RateUnit": "SECONDS",
"OneMinuteRate": 24247.62725889159,
"EventType": "bytes",
"Count": 1717869132,
"FifteenMinuteRate": 23861.129588429572,
"FiveMinuteRate": 23573.155350780762,
"MeanRate": 20744.397382482304
},
"kafka.server:name=BytesRejectedPerSec,topic=prod-topic,type=BrokerTopicMetrics": {
"RateUnit": "SECONDS",
"OneMinuteRate": 0,
"EventType": "bytes",
"Count": 0,
"FifteenMinuteRate": 0,
"FiveMinuteRate": 0,
"MeanRate": 0
},
"kafka.server:name=BytesOutPerSec,topic=prod-topic,type=BrokerTopicMetrics": {
"RateUnit": "SECONDS",
"OneMinuteRate": 2.964393875e-314,
"EventType": "bytes",
"Count": 12445845,
"FifteenMinuteRate": 8.418860443606704e-32,
"FiveMinuteRate": 3.295931974580797e-101,
"MeanRate": 150.2917479255084
},
"kafka.server:name=ProduceMessageConversionsPerSec,topic=prod-topic,type=BrokerTopicMetrics": {
"RateUnit": "SECONDS",
"OneMinuteRate": 1.1654669777738194,
"EventType": "requests",
"Count": 96684,
"FifteenMinuteRate": 1.1598221110942293,
"FiveMinuteRate": 1.1526658491904729,
"MeanRate": 1.1675227688593328
},
"kafka.server:name=FetchMessageConversionsPerSec,topic=prod-topic,type=BrokerTopicMetrics": {
"RateUnit": "SECONDS",
"OneMinuteRate": 2.964393875e-314,
"EventType": "requests",
"Count": 652,
"FifteenMinuteRate": 4.0730516115215564e-36,
"FiveMinuteRate": 1.5654916456104026e-105,
"MeanRate": 0.007873328005193672
},
"kafka.server:name=FailedProduceRequestsPerSec,topic=prod-topic,type=BrokerTopicMetrics": {
"RateUnit": "SECONDS",
"OneMinuteRate": 0,
"EventType": "requests",
"Count": 0,
"FifteenMinuteRate": 0,
"FiveMinuteRate": 0,
"MeanRate": 0
},
"kafka.server:name=TotalProduceRequestsPerSec,topic=prod-topic,type=BrokerTopicMetrics": {
"RateUnit": "SECONDS",
"OneMinuteRate": 1.1406883382817614,
"EventType": "requests",
"Count": 91950,
"FifteenMinuteRate": 1.1162715461060453,
"FiveMinuteRate": 1.111917792093389,
"MeanRate": 1.1103566089201538
}
},
"timestamp": 1513455392,
"status": 200
} Do you think this is a problem with the plugin or with my environment? |
@nemosupremo Let's take this to issue #3597 |
Required for all PRs:
This PR is an attempt to deprecate the initial Jolokia input plugin. The shortcomings of the initial Jolokia plugin include:
I'll gladly add other target "shortcomings" should they come up and be in scope.
Taking inspiration from the
win_perf_counters
input, the core change here is to tie measurements to mbeans, rather than folding all mbean requests to a single "jolokia" measurement.Feedback is welcome.
I'm available for discussion on the gophers #influxdb channel as @dylanmei.
Checkout
quay.io/nordstrom/telegraf:latest
to test-drive this.