feat(inputs.systemd_units): Allow to query unloaded/disabled units #14814
Output with this build looks so good!
Output for inactive (dead) and disabled service "telegraf" (using wildcards for search pattern) and a running active enabled service "crond":
|
My tests so far: Monitored at least 2 services at the same time, using wildcards in the pattern. Started, removed, disabled and created new services (e.g. smartd). Killed a service too. The Telegraf log was clean (normal logging mode - no debug on). systemctl --version Only issue/anomaly:
Metrics: I hope to find time for testing @ services tomorrow. |
Tested again, now with normal "static" services (services without an [Install] section) and multi-instance services. The Telegraf log was clean (normal logging mode - no debug on). systemctl --version telegraf --version Like yesterday, and uncritical: a loaded but inactive service produces a mem_current value equal to the maximum 64-bit signed integer. Maybe the changelog and the systemd_units README.md should mention that pattern search now relies on the unit file name. When we have a service "lvm2-pvscan@252:2.service":
The old Telegraf v1.29.4 produces results with following config
systemd_units,active=active,host=localhost.localdomain,load=loaded,name=lvm2-pvscan@252:2.service,sub=exited load_code=0i,active_code=0i,sub_code=4i 1708028550000000000
Now Telegraf v1.30.0 does not produce any output with the following config
OR
Users now have to use:
OR
|
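To illustrate the report above, here is a minimal sketch of why a pattern naming a concrete instance can stop matching once patterns are compared against unit file names instead of loaded unit names. It is not the plugin's code: `matchUnitFiles` is a hypothetical helper, `path.Match` merely stands in for whatever glob matching systemd or the plugin applies, and the file list and patterns are made up for the example (the assumption being that the instance `lvm2-pvscan@252:2.service` is backed by a template file `lvm2-pvscan@.service`).

```go
package main

import (
	"fmt"
	"path"
)

// matchUnitFiles is a hypothetical helper: it returns every entry of files
// that matches at least one of the glob patterns.
func matchUnitFiles(patterns, files []string) ([]string, error) {
	var matched []string
	for _, f := range files {
		for _, p := range patterns {
			ok, err := path.Match(p, f)
			if err != nil {
				return nil, fmt.Errorf("invalid pattern %q: %w", p, err)
			}
			if ok {
				matched = append(matched, f)
				break // one matching pattern is enough for this file
			}
		}
	}
	return matched, nil
}

func main() {
	// Assumed unit file names; an instance unit has no file of its own,
	// only the template it was instantiated from.
	files := []string{"crond.service", "lvm2-pvscan@.service", "telegraf.service"}

	// Illustrative patterns only, not the configs from the comment above.
	patterns := []string{"lvm2-pvscan@252:2.service", "lvm2-pvscan@*.service", "tele*"}
	for _, p := range patterns {
		m, err := matchUnitFiles([]string{p}, files)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %v\n", p, m)
	}
}
```

In this sketch the exact instance name matches no file, while the wildcard pattern matches the template file, which is consistent with the behaviour change described above.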
@1tft thanks for the finding. I will try to fix the changed behavior... |
Sadly, I found a serious issue with the new release: for a static service (a service without an [Install] section), Telegraf produces 2 lines of metrics
After starting testservice.service, Telegraf produces 2 lines again and does not recognize that the service has been started. systemctl daemon-reload was no solution either. I also reproduced the issue when removing the [Install] section from an existing standard service like smartd. Then it was "static" and Telegraf produced two false (identical) metrics for every interval. My testservice:
|
Thank you very much @1tft for the thorough testing! That's much appreciated and invaluable! Hope I got all the corner cases right this time... |
Thank you for the new build! Regarding mem_current: for static and not running services, no mem_current is printed out.
But for other inactive services mem_current is still printed out:
|
@1tft thanks for testing this again! Hmmm I think all of those are a lie if I look at the number... ;-) The issue is that for the disabled units, systemd provides those numbers on request, while we can't get those for the "static" ones. Do you want me to set those fields to zero? I mean in InfluxDB you would get a @powersj any preferences? |
Regarding mem_current:
So I think: if it can't be returned unsigned, it should be -1 in case of a non-running service, or be redacted. For clarity, I would say:
|
@knollet not sure what you mean... We currently return what we get from DBUS without any modification. Regarding redacting/removing fields, I'm not sure if we should do this... |
Actually you don't. I don't know if you programmed anything or if the Go dbus library is buggy, but you changed numbers. Do a You return for a non-running service You are converting it to a signed int for some reason and losing half the number doing it. If you convert 2^64-1, a special value that should indicate the value is actually missing, to a signed int, though, I would return either unsigned int max or -1, which are both the same bitwise: 64 one-bits. |
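The arithmetic in this comment can be checked with a small Go sketch. It shows the two ways a 64-bit "all bits set" value can end up as a signed integer: a plain type conversion keeps the bit pattern and yields -1, while a saturating conversion yields the signed maximum. Both helper names are hypothetical, and which of the two (if either) happened in Telegraf is not established by this thread.

```go
package main

import (
	"fmt"
	"math"
)

// reinterpretAsInt64 keeps all 64 bits and only changes how they are read.
func reinterpretAsInt64(v uint64) int64 {
	return int64(v)
}

// clampToInt64 saturates values that do not fit into a signed 64-bit integer.
// This is one assumed way the value 9223372036854775807 could arise.
func clampToInt64(v uint64) int64 {
	if v > math.MaxInt64 {
		return math.MaxInt64
	}
	return int64(v)
}

func main() {
	unset := uint64(math.MaxUint64) // 2^64-1, all 64 bits set

	fmt.Println(unset)                     // 18446744073709551615
	fmt.Println(reinterpretAsInt64(unset)) // -1
	fmt.Println(clampToInt64(unset))       // 9223372036854775807
}
```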
@knollet are you sure you enabled |
@1tft it may well be that this is the culprit. |
@knollet Setting @srebhan So we are OK with everything regarding mem_current handling (current behaviour, 0 or suppressing the complete field). I think 0 would be better than this very high value 9223372036854775807 / 18446744073709551615, if only for diagram scaling reasons. For InfluxDB it's not relevant, because 9223372036854775807 is max... I think Readme.md should mention this special situation; then everybody can decide to build a custom processor for this special value and we do not have to invest any more time in this edge case. |
One more tweak... I now set the "invalid"/"unset" memory fields to zero. I also noticed that before (with the systemd command) the memory values were provided as signed-integers which is plain wrong as the dbus type is unsigned integer. So I fixed this as the feature was not yet released... |
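A minimal sketch of the tweak described here, assuming that the "invalid"/"unset" marker is the maximum unsigned 64-bit value discussed earlier in the thread. `memoryOrZero` is a hypothetical name for illustration and not necessarily how the plugin implements it.

```go
package main

import (
	"fmt"
	"math"
)

// memoryOrZero is a hypothetical helper: it maps the assumed "unset" marker
// (all 64 bits set) to zero and passes every other value through unchanged.
// The field stays unsigned, in line with the unsigned dbus type.
func memoryOrZero(v uint64) uint64 {
	if v == math.MaxUint64 {
		return 0
	}
	return v
}

func main() {
	fmt.Println(memoryOrZero(math.MaxUint64)) // unset marker becomes 0
	fmt.Println(memoryOrZero(1048576))        // regular values are kept
}
```

Reporting zero keeps dashboards readable, at the cost of no longer distinguishing "no data" from "zero bytes used".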
@1tft could you please give this another (hopefully last) round of testing!? |
Co-authored-by: Joshua Powers <powersj@fastmail.com>
Summary
Unloaded or disabled units are listed neither with `systemctl list-units` nor with `systemctl show`, as `systemctl` only works on loaded units (see systemd/systemd#5063), especially with wildcards!

The only way to make this work is to determine the unit files matching the patterns and then use the exact unit name to infer further information, as this will enforce loading the unit by systemd. Unfortunately, this is not particularly easy using the `systemctl` command; therefore, this PR basically rewrites the `inputs.systemd_units` plugin to be based on DBUS communication.

Important note: This PR changes the tag names `uf_state` and `uf_preset` to `state` and `preset` to match the output of `systemctl`. It also replaces the `subcommand` option with a boolean `details` config option to simplify code. Furthermore, the PR corrects the type of the `mem_*` and `swap_*` fields to `uint64` instead of `int`, as this is the native underlying type of those fields.

Checklist
Related issues
resolves #14763