Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: pool detection and metrics gathering for ZFS >= 2.1.x #10099

Merged
merged 3 commits into from
Dec 6, 2021

Conversation

aaronjwood
Copy link
Contributor

@aaronjwood aaronjwood commented Nov 12, 2021

Required for all PRs:

partially resolves #8862

I moved to ZFS 2.1.1 the other day and noticed my pool metrics broke. After a bit of investigation I found out that things have changed significantly under the hood. I have this diff deployed to one of my servers and can show it working:
Screenshot from 2021-11-12 12-44-09

@telegraf-tiger telegraf-tiger bot added the fix pr to fix corresponding bug label Nov 12, 2021
@aaronjwood
Copy link
Contributor Author

Did the lint rules change recently? It seems to be flagged a lot of things in the ZFS plugin README that I didn't touch.

@aaronjwood aaronjwood changed the title Fix pool detection and metrics gathering for ZFS >= 2.1.x fix (inputs/zfs): pool detection and metrics gathering for ZFS >= 2.1.x Nov 12, 2021
@srebhan
Copy link
Member

srebhan commented Nov 15, 2021

Hey @aaronjwood, thanks for looking into this. From your changes it looks like the PR is breaking backward compatibility, i.e. you cannot use a zfs < 2.1.x with the PR (e.g. by changing the proc-file path). Is my understanding correct? And if so, can we somehow test for the present zfs version to be compatible with both formats?

@aaronjwood
Copy link
Contributor Author

Hey @aaronjwood, thanks for looking into this. From your changes it looks like the PR is breaking backward compatibility, i.e. you cannot use a zfs < 2.1.x with the PR (e.g. by changing the proc-file path). Is my understanding correct? And if so, can we somehow test for the present zfs version to be compatible with both formats?

Hi, it should not. I fall back to checking the old kstat path if the new one doesn't exist. There is also a fallback for how the actual file contents are parsed. I've also added some tests that write out a kstat with the new format/filename to exercise the new code paths.

Do you see an issue with it?

Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaronjwood I now see your approach and it's fine IMO. Sorry for having missed it in the first pass...

Given that the format changed I suspect/fear that this might happen again in the future. So I
wonder if we should be prepared and approach this in a slightly more general way. How about moving the probing to a probeVersion function returning a format identifier (e.g. 1, 2, 3...). Based on this version we have different gather..._vX() functions for X in {1,2,3,...} that are tailored to the format of this version. This way we would have a clear structure and avoid complex if/else flow handlings dependent on versions. If there is common code, we can factor that into more general functions parameterized by the version-specific functions. What do you think?

plugins/inputs/zfs/zfs_linux.go Outdated Show resolved Hide resolved
plugins/inputs/zfs/zfs_linux.go Outdated Show resolved Hide resolved
plugins/inputs/zfs/zfs_linux.go Outdated Show resolved Hide resolved
@aaronjwood
Copy link
Contributor Author

@aaronjwood I now see your approach and it's fine IMO. Sorry for having missed it in the first pass...

Given that the format changed I suspect/fear that this might happen again in the future. So I
wonder if we should be prepared and approach this in a slightly more general way. How about moving the probing to a probeVersion function returning a format identifier (e.g. 1, 2, 3...). Based on this version we have different gather..._vX() functions for X in {1,2,3,...} that are tailored to the format of this version. This way we would have a clear structure and avoid complex if/else flow handlings dependent on versions. If there is common code, we can factor that into more general functions parameterized by the version-specific functions. What do you think?

No worries. What you suggest sounds good to me. I'll try and find some time in the coming weeks to change this. Things are a bit tight with the holidays coming up :)

@srebhan srebhan self-assigned this Nov 17, 2021
@aaronjwood aaronjwood changed the title fix (inputs/zfs): pool detection and metrics gathering for ZFS >= 2.1.x fix: pool detection and metrics gathering for ZFS >= 2.1.x Nov 27, 2021
@aaronjwood
Copy link
Contributor Author

FWIW I'll try out the changes I pushed up just now over the weekend. I don't have the ability to put a new binary on my server at the moment :)

@aaronjwood
Copy link
Contributor Author

Yep, still looking good on my server.

Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an amazing update @aaronjwood! Some small leftovers, but we are almost good to go. I really love the new structure.

plugins/inputs/zfs/zfs_linux.go Outdated Show resolved Hide resolved
plugins/inputs/zfs/zfs_linux.go Outdated Show resolved Hide resolved
plugins/inputs/zfs/zfs_linux.go Show resolved Hide resolved
aaronjwood and others added 2 commits November 30, 2021 10:03
Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>
Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>
@aaronjwood
Copy link
Contributor Author

Should be all set now :) Thanks for the reviews. Looking forward to having this upstreamed!

@telegraf-tiger
Copy link
Contributor

Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks for the nice work @aaronjwood!

@srebhan srebhan added plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. labels Dec 1, 2021
Copy link
Contributor

@MyaLongmire MyaLongmire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this pr!

@MyaLongmire MyaLongmire merged commit 666bfe3 into influxdata:master Dec 6, 2021
MyaLongmire pushed a commit that referenced this pull request Dec 8, 2021
powersj pushed a commit to powersj/telegraf that referenced this pull request Jan 21, 2022
phemmer added a commit to phemmer/telegraf that referenced this pull request Feb 18, 2022
* origin/master: (133 commits)
  chore: restart service if it is already running and upgraded via RPM (influxdata#9970)
  feat: update etc/telegraf.conf and etc/telegraf_windows.conf (influxdata#10237)
  fix: Handle duplicate registration of protocol-buffer files gracefully. (influxdata#10188)
  fix(http_listener_v2): fix panic on close (influxdata#10132)
  feat: add Vault input plugin (influxdata#10198)
  feat: support aws managed service for prometheus (influxdata#10202)
  fix: Make telegraf compile on Windows with golang 1.16.2 (influxdata#10246)
  Update changelog
  feat: Modbus add per-request tags (influxdata#10231)
  fix: Implement NaN and inf handling for elasticsearch output (influxdata#10196)
  feat: add nomad input plugin (influxdata#10106)
  fix: Print loaded plugins and deprecations for once and test (influxdata#10205)
  fix: eliminate MIB dependency for ifname processor (influxdata#10214)
  feat: Optimize locking for SNMP MIBs loading. (influxdata#10206)
  feat: Add SMART plugin concurrency configuration option, nvme-cli v1.14+ support and lint fixes. (influxdata#10150)
  feat: update configs (influxdata#10236)
  fix(inputs/kube_inventory): set TLS server name config properly (influxdata#9975)
  fix: Sudden close of Telegraf caused by OPC UA input plugin (influxdata#10230)
  fix: bump github.com/eclipse/paho.mqtt.golang from 1.3.0 to 1.3.5 (influxdata#9913)
  fix: json_v2 parser timestamp setting (influxdata#10221)
  fix: ensure graylog spec fields not prefixed with '_' (influxdata#10209)
  docs: remove duplicate links in CONTRIBUTING.md (influxdata#10218)
  fix: pool detection and metrics gathering for ZFS >= 2.1.x (influxdata#10099)
  fix: parallelism fix for ifname processor (influxdata#10007)
  chore: Forbids "log" package only for aggregators, inputs, outputs, parsers and processors (influxdata#10191)
  docs: address documentation gap when running telegraf in k8s (influxdata#10215)
  feat: update etc/telegraf.conf and etc/telegraf_windows.conf (influxdata#10211)
  fix: mqtt topic extracting no longer requires all three fields (influxdata#10208)
  fix: windows service - graceful shutdown of telegraf (influxdata#9616)
  feat: update etc/telegraf.conf and etc/telegraf_windows.conf (influxdata#10201)
  feat: Modbus support multiple slaves (gateway feature) (influxdata#9279)
  fix: Revert unintented corruption of the Makefile from influxdata#10200. (influxdata#10203)
  chore: remove triggering update-config bot in CI (influxdata#10195)
  Update changelog
  feat: Implement deprecation infrastructure (influxdata#10200)
  fix: extra lock on init for safety (influxdata#10199)
  fix: resolve influxdata#10027 (influxdata#10112)
  fix: register bigquery to output plugins influxdata#10177 (influxdata#10178)
  fix: sysstat use unique temp file vs hard-coded (influxdata#10165)
  refactor: snmp to use gosmi (influxdata#9518)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix pr to fix corresponding bug plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Updated ZFS measurements for OpenZFS 2.0
3 participants