-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: pool detection and metrics gathering for ZFS >= 2.1.x #10099
Conversation
Did the lint rules change recently? It seems to be flagged a lot of things in the ZFS plugin README that I didn't touch. |
Hey @aaronjwood, thanks for looking into this. From your changes it looks like the PR is breaking backward compatibility, i.e. you cannot use a zfs < 2.1.x with the PR (e.g. by changing the proc-file path). Is my understanding correct? And if so, can we somehow test for the present zfs version to be compatible with both formats? |
Hi, it should not. I fall back to checking the old kstat path if the new one doesn't exist. There is also a fallback for how the actual file contents are parsed. I've also added some tests that write out a kstat with the new format/filename to exercise the new code paths. Do you see an issue with it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aaronjwood I now see your approach and it's fine IMO. Sorry for having missed it in the first pass...
Given that the format changed I suspect/fear that this might happen again in the future. So I
wonder if we should be prepared and approach this in a slightly more general way. How about moving the probing to a probeVersion
function returning a format identifier (e.g. 1, 2, 3...). Based on this version we have different gather..._vX()
functions for X in {1,2,3,...}
that are tailored to the format of this version. This way we would have a clear structure and avoid complex if/else flow handlings dependent on versions. If there is common code, we can factor that into more general functions parameterized by the version-specific functions. What do you think?
No worries. What you suggest sounds good to me. I'll try and find some time in the coming weeks to change this. Things are a bit tight with the holidays coming up :) |
090ab3c
to
dd60a24
Compare
995a6c5
to
1c77f17
Compare
FWIW I'll try out the changes I pushed up just now over the weekend. I don't have the ability to put a new binary on my server at the moment :) |
Yep, still looking good on my server. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an amazing update @aaronjwood! Some small leftovers, but we are almost good to go. I really love the new structure.
Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>
Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>
Should be all set now :) Thanks for the reviews. Looking forward to having this upstreamed! |
📦 Looks like new artifacts were built from this PR. Expand this list to get them here ! 🐯Artifact URLs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thanks for the nice work @aaronjwood!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this pr!
(cherry picked from commit 666bfe3)
* origin/master: (133 commits) chore: restart service if it is already running and upgraded via RPM (influxdata#9970) feat: update etc/telegraf.conf and etc/telegraf_windows.conf (influxdata#10237) fix: Handle duplicate registration of protocol-buffer files gracefully. (influxdata#10188) fix(http_listener_v2): fix panic on close (influxdata#10132) feat: add Vault input plugin (influxdata#10198) feat: support aws managed service for prometheus (influxdata#10202) fix: Make telegraf compile on Windows with golang 1.16.2 (influxdata#10246) Update changelog feat: Modbus add per-request tags (influxdata#10231) fix: Implement NaN and inf handling for elasticsearch output (influxdata#10196) feat: add nomad input plugin (influxdata#10106) fix: Print loaded plugins and deprecations for once and test (influxdata#10205) fix: eliminate MIB dependency for ifname processor (influxdata#10214) feat: Optimize locking for SNMP MIBs loading. (influxdata#10206) feat: Add SMART plugin concurrency configuration option, nvme-cli v1.14+ support and lint fixes. (influxdata#10150) feat: update configs (influxdata#10236) fix(inputs/kube_inventory): set TLS server name config properly (influxdata#9975) fix: Sudden close of Telegraf caused by OPC UA input plugin (influxdata#10230) fix: bump github.com/eclipse/paho.mqtt.golang from 1.3.0 to 1.3.5 (influxdata#9913) fix: json_v2 parser timestamp setting (influxdata#10221) fix: ensure graylog spec fields not prefixed with '_' (influxdata#10209) docs: remove duplicate links in CONTRIBUTING.md (influxdata#10218) fix: pool detection and metrics gathering for ZFS >= 2.1.x (influxdata#10099) fix: parallelism fix for ifname processor (influxdata#10007) chore: Forbids "log" package only for aggregators, inputs, outputs, parsers and processors (influxdata#10191) docs: address documentation gap when running telegraf in k8s (influxdata#10215) feat: update etc/telegraf.conf and etc/telegraf_windows.conf (influxdata#10211) fix: mqtt topic extracting no longer requires all three fields (influxdata#10208) fix: windows service - graceful shutdown of telegraf (influxdata#9616) feat: update etc/telegraf.conf and etc/telegraf_windows.conf (influxdata#10201) feat: Modbus support multiple slaves (gateway feature) (influxdata#9279) fix: Revert unintented corruption of the Makefile from influxdata#10200. (influxdata#10203) chore: remove triggering update-config bot in CI (influxdata#10195) Update changelog feat: Implement deprecation infrastructure (influxdata#10200) fix: extra lock on init for safety (influxdata#10199) fix: resolve influxdata#10027 (influxdata#10112) fix: register bigquery to output plugins influxdata#10177 (influxdata#10178) fix: sysstat use unique temp file vs hard-coded (influxdata#10165) refactor: snmp to use gosmi (influxdata#9518) ...
Required for all PRs:
partially resolves #8862
I moved to ZFS 2.1.1 the other day and noticed my pool metrics broke. After a bit of investigation I found out that things have changed significantly under the hood. I have this diff deployed to one of my servers and can show it working: