Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(inputs.chrony): Use DGRAM for the unix socket #15552

Merged
merged 3 commits into from
Aug 9, 2024

Conversation

Frankkkkk
Copy link
Contributor

Summary

We must use the DGRAM type when connecting to the chrony socket. On top of that, the socket must be duplicated in order to allow concurrent calls to it.

Checklist

  • No AI generated code was used in this PR

Related issues

resolves #15549

@telegraf-tiger telegraf-tiger bot added fix pr to fix corresponding bug plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins labels Jun 21, 2024
@Frankkkkk Frankkkkk force-pushed the chrony-unix branch 2 times, most recently from 794fb7e to 5ef9848 Compare June 23, 2024 22:16
Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution @Frankkkkk! I do have some questions and comments in the code. One additional request is to somewhere add a comment with a reference to a document stating that the socket has to be unixgram! Looking forward to the next round!

plugins/inputs/chrony/chrony.go Outdated Show resolved Hide resolved
plugins/inputs/chrony/chrony.go Outdated Show resolved Hide resolved
plugins/inputs/chrony/chrony.go Outdated Show resolved Hide resolved
plugins/inputs/chrony/chrony.go Outdated Show resolved Hide resolved
plugins/inputs/chrony/chrony.go Outdated Show resolved Hide resolved
@srebhan srebhan changed the title fix(inputs.chrony): use DGRAM for the unix socket fix(inputs.chrony): Use DGRAM for the unix socket Jun 24, 2024
@srebhan srebhan self-assigned this Jun 24, 2024
@Frankkkkk Frankkkkk force-pushed the chrony-unix branch 3 times, most recently from f220411 to 32c8763 Compare June 24, 2024 10:44
Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more questions...

plugins/inputs/chrony/README.md Show resolved Hide resolved
plugins/inputs/chrony/chrony.go Outdated Show resolved Hide resolved
plugins/inputs/chrony/chrony.go Outdated Show resolved Hide resolved
We must use the DGRAM type when connecting to the chrony socket. On top
of that, the socket must be duplicated in order to allow concurrent
calls to it.

Fixes influxdata#15549

Signed-off-by: Frank Villaro-Dixon <frank@villaro-dixon.eu>
Frankkkkk added a commit to Frankkkkk/telegraf that referenced this pull request Jun 26, 2024
Chrony ≥ 4.4 introduces a new ServerStats response with u64 numbers and
more fields.

Fixes influxdata#15553

NB: The plugin still doesn't work due to influxdata#15549. To test this commit,
please make sure you use influxdata#15552.

Signed-off-by: Frank Villaro-Dixon <frank@villaro-dixon.eu>
Frankkkkk added a commit to Frankkkkk/telegraf that referenced this pull request Jun 27, 2024
Chrony ≥ 4.4 introduces a new ServerStats response with u64 numbers and
more fields.

Fixes influxdata#15553

NB: The plugin still doesn't work due to influxdata#15549. To test this commit,
please make sure you use influxdata#15552.

Signed-off-by: Frank Villaro-Dixon <frank@villaro-dixon.eu>
Signed-off-by: Frank Villaro-Dixon <frank.villarodixon@merkle.com>
@Frankkkkk
Copy link
Contributor Author

@powersj @srebhan Can we recap what is needed for this MR/issue to be closed ?

Do we agree that we still need unix sockets in order to access the serverstats metrics (which are quite helpful IMHO)?

If so, then:

  • Is the dialUnix function okay with you? We still need to remove the sock file at the end. Should I create an intermediate struct ? What do you recommend ?

Thanks!

@srebhan
Copy link
Member

srebhan commented Jun 27, 2024

@Frankkkkk I would say

  1. Do not add "unixgram" as a scheme but fix the existing "unix" one with the dialUnix function as is.
  2. Close and remove the local socket file in the Stop() function of the plugin. For this you probably need to store the filename in the plugin struct I guess.
  3. Clarify if the file permissions really have to be 0666 or if we can find a solution with 0660 or even 0600. Fix the linter warning accordingly.

Regarding item 3: I don't think we can accept 0666 as this opens the socket to everyone posing a security issue. I would rather want to see either a user-specified permission and/or chgrp'ing the socket to the chrony group (probably also a user setting). @powersj what do you think?

@Frankkkkk
Copy link
Contributor Author

Frankkkkk commented Jun 27, 2024

Regarding the unixgram vs unix, I don't think it will work with unix as it's basically 2 different protocols. Trying it it says:

2024-06-27T21:44:06Z E! [agent] Starting input inputs.chrony: dialing "unix:///run/chrony/chronyd.sock" failed: dial unix /run/chrony/chrony-telegraf-e04231d6-396a-452b-b1de-78681a384010.sock->/run/chrony/chronyd.sock: connect: protocol wrong type for socket

Regarding the 666: Without it it doesn't work. I don't know why. Maybe we should check that before proceeding further..

Looking at chronyc, it basically does the same, that's interesting:

bind(3, {sa_family=AF_UNIX, sun_path="/var/run/chrony/chronyc.160046.sock"}, 110) = 0
chmod("/var/run/chrony/chronyc.160046.sock", 0666) = 0
connect(3, {sa_family=AF_UNIX, sun_path="/var/run/chrony/chronyd.sock"}, 110) = 0

Pinging @mlichvar, one of the fathers of chrony: can you help us here? Thanks!

@mlichvar
Copy link

chronyd normally runs under a non-root user, so the client's socket needs to have permissions or ownership set in such a way that chronyd will be able to write to it (to send the response). Access to the socket is still protected by the permissions/ownership of the /var/run/chrony directory.

@srebhan
Copy link
Member

srebhan commented Jun 28, 2024

@Frankkkkk sorry for the confusion I cause! :-( What I meant with

unixgram vs unix,

is to use the code you do have now but using a dgram socket if the user specifies a unix:///my/socket instead of making the user specifying unixgram:///my/socket...

@mlichvar thanks for chiming in! Would it work if we set the group of the socket to chrony's group and set 0660? I guess it would and would improve security, wouldn't it?

@mlichvar
Copy link

mlichvar commented Jul 1, 2024

/run/chrony should be accessible only to the chrony group, so I don't think changing group on the socket would do anything. To me it seems like an unnecessary complication.

@Frankkkkk
Copy link
Contributor Author

Frankkkkk commented Jul 15, 2024 via email

@srebhan
Copy link
Member

srebhan commented Jul 17, 2024

@Frankkkkk of course the telegraf-user (the one executing the telegraf) need to be in the correct group... :-)

@powersj
Copy link
Contributor

powersj commented Jul 19, 2024

Where 0666 works and 0660 or 0600 doesn't, can someone confirm the ownership and permissions on the /run/chrony directory, what user telegraf is running as and what group membership that user has?

Confirmed that 0660 hangs:

root@j1:~# stat /run/chrony
  File: /run/chrony
  Size: 120       	Blocks: 0          IO Block: 4096   directory
Device: 4ah/74d	Inode: 342         Links: 2
Access: (0750/drwxr-x---)  Uid: (  106/ _chrony)   Gid: (  113/ _chrony)
Access: 2024-07-19 15:15:09.536385535 +0000
Modify: 2024-07-19 15:13:19.722551918 +0000
Change: 2024-07-19 15:13:19.722551918 +0000
 Birth: 2024-07-19 15:09:35.908770107 +0000
root@j1:~# whoami
root
root@j1:~# groups
root
root@j1:~# ./telegraf-660 --config config.toml --once
2024-07-19T15:13:19Z I! Loading config: config.toml
2024-07-19T15:13:19Z I! Starting Telegraf 1.32.0-2fcfc3eb brought to you by InfluxData the makers of InfluxDB
2024-07-19T15:13:19Z I! Available plugins: 234 inputs, 9 aggregators, 32 processors, 26 parsers, 60 outputs, 6 secret-stores
2024-07-19T15:13:19Z I! Loaded inputs: chrony
2024-07-19T15:13:19Z I! Loaded aggregators: 
2024-07-19T15:13:19Z I! Loaded processors: 
2024-07-19T15:13:19Z I! Loaded secretstores: 
2024-07-19T15:13:19Z I! Loaded outputs: file
2024-07-19T15:13:19Z I! Tags enabled: 
2024-07-19T15:13:19Z D! [agent] Initializing plugins
2024-07-19T15:13:19Z D! [agent] Connecting outputs
2024-07-19T15:13:19Z D! [agent] Attempting connection to [outputs.file]
2024-07-19T15:13:19Z D! [agent] Successfully connected to outputs.file
2024-07-19T15:13:19Z D! [agent] Starting service inputs
2024-07-19T15:13:19Z D! [inputs.chrony]  Connected to "unixgram:///var/run/chrony/chronyd.sock"...
2024-07-19T15:13:29Z D! [outputs.file]  Buffer fullness: 0 / 10000 metrics
2024-07-19T15:13:39Z D! [outputs.file]  Buffer fullness: 0 / 10000 metrics
2024-07-19T15:13:49Z D! [outputs.file]  Buffer fullness: 0 / 10000 metrics
2024-07-19T15:13:59Z D! [outputs.file]  Buffer fullness: 0 / 10000 metrics
2024-07-19T15:14:09Z D! [outputs.file]  Buffer fullness: 0 / 10000 metrics
2024-07-19T15:14:19Z D! [outputs.file]  Buffer fullness: 0 / 10000 metrics
2024-07-19T15:14:29Z D! [outputs.file]  Buffer fullness: 0 / 10000 metrics
^C2024-07-19T15:14:39Z D! [outputs.file]  Buffer fullness: 0 / 10000 metrics
^C^C^C^C^C2024-07-19T15:14:49Z D! [outputs.file]  Buffer fullness: 0 / 10000 metrics
Killed
root@j1:~# 

@powersj
Copy link
Contributor

powersj commented Jul 24, 2024

Having the socket with 0660 and ensuring it matches the _chonry group's GID works.

Is the way forward then to expose new config options for the group (default: _chrony) and permissions (default: 0660) and only try creating this socket if the user is trying to collect serverstats metrics?

[[inputs.chrony]]
    server = "unixgram:///run/chrony/chronyd.sock"
    metrics = ["serverstats"]
[[outputs.file]]
diff --git plugins/inputs/chrony/chrony.go plugins/inputs/chrony/chrony.go
index 784f8e929..dfddd1b06 100644
--- plugins/inputs/chrony/chrony.go
+++ plugins/inputs/chrony/chrony.go
@@ -9,6 +9,7 @@ import (
        "net"
        "net/url"
        "os"
+       "os/user"
        "path"
        "strconv"
        "syscall"
@@ -55,7 +56,19 @@ func dialUnix(address string) (*net.UnixConn, error) {
                return nil, err
        }
 
-       if err := os.Chmod(local, 0666); err != nil {
+       if err := os.Chmod(local, 0660); err != nil {
+               return nil, err
+       }
+
+       group, err := user.LookupGroup("_chrony")
+       if err != nil {
+               return nil, err
+       }
+       gid, err := strconv.Atoi(group.Gid)
+       if err != nil {
+               return nil, err
+       }
+       if err := os.Chown(local, os.Getuid(), gid); err != nil {
                return nil, err
        }
root@j1:~# ./telegraf --config config.toml --once
2024-07-24T19:59:07Z I! Loading config: config.toml
2024-07-24T19:59:07Z I! Starting Telegraf 1.32.0-2fcfc3eb brought to you by InfluxData the makers of InfluxDB
2024-07-24T19:59:07Z I! Available plugins: 234 inputs, 9 aggregators, 32 processors, 26 parsers, 60 outputs, 6 secret-stores
2024-07-24T19:59:07Z I! Loaded inputs: chrony
2024-07-24T19:59:07Z I! Loaded aggregators: 
2024-07-24T19:59:07Z I! Loaded processors: 
2024-07-24T19:59:07Z I! Loaded secretstores: 
2024-07-24T19:59:07Z I! Loaded outputs: file
2024-07-24T19:59:07Z I! Tags enabled: host=j1
2024-07-24T19:59:07Z I! [agent] Hang on, flushing any cached metrics before shutdown
chrony_serverstats,host=j1,source=/run/chrony/chronyd.sock ntp_drops=0i,ntp_timestamps=0i,cmd_hits=5i,cmd_drops=0i,log_drops=0i,ntp_hits=0i,ntp_auth_hits=0i,ntp_interleaved_hits=0i,ntp_span_seconds=0i,nke_hits=0i,nke_drops=0i 1721851147000000000
2024-07-24T19:59:07Z I! [agent] Stopping running outputs

@srebhan srebhan removed their assignment Jul 31, 2024
* Introduce two config options for the chrony group and permissions
of the socket. Use these to set the permissions and group ownership of
the temporary socket.
* Clean up the temporary socket on close

TODO: do we need to keep "unix" as a supported scheme
TODO: do we need to make the local socket opt-in or only when requesting
  server stats to avoid regressions
@powersj
Copy link
Contributor

powersj commented Aug 1, 2024

@srebhan,

I've pushed a commit, 46fc4df, which includes a few TODO I'd like your feedback on.

@srebhan srebhan assigned srebhan and unassigned powersj Aug 7, 2024
Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one small comment about the README section naming... Other than that the PR looks good.

plugins/inputs/chrony/README.md Outdated Show resolved Hide resolved
Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>
@telegraf-tiger
Copy link
Contributor

telegraf-tiger bot commented Aug 7, 2024

Download PR build artifacts for linux_amd64.tar.gz, darwin_arm64.tar.gz, and windows_amd64.zip.
Downloads for additional architectures and packages are available below.

🥳 This pull request decreases the Telegraf binary size by -7.56 % for linux amd64 (new size: 243.7 MB, nightly size 263.6 MB)

📦 Click here to get additional PR build artifacts

Artifact URLs

DEB RPM TAR GZ ZIP
amd64.deb aarch64.rpm darwin_amd64.tar.gz windows_amd64.zip
arm64.deb armel.rpm darwin_arm64.tar.gz windows_arm64.zip
armel.deb armv6hl.rpm freebsd_amd64.tar.gz windows_i386.zip
armhf.deb i386.rpm freebsd_armv7.tar.gz
i386.deb ppc64le.rpm freebsd_i386.tar.gz
mips.deb riscv64.rpm linux_amd64.tar.gz
mipsel.deb s390x.rpm linux_arm64.tar.gz
ppc64el.deb x86_64.rpm linux_armel.tar.gz
riscv64.deb linux_armhf.tar.gz
s390x.deb linux_i386.tar.gz
linux_mips.tar.gz
linux_mipsel.tar.gz
linux_ppc64le.tar.gz
linux_riscv64.tar.gz
linux_s390x.tar.gz

@powersj powersj added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Aug 7, 2024
Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect thanks @Frankkkkk and @powersj! And thanks @mlichvar for your great input!

@srebhan srebhan removed their assignment Aug 7, 2024
@DStrand1 DStrand1 merged commit 9df4800 into influxdata:master Aug 9, 2024
27 checks passed
@github-actions github-actions bot added this to the v1.31.3 milestone Aug 9, 2024
powersj pushed a commit that referenced this pull request Aug 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix pr to fix corresponding bug plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[chrony] unix socket connection is broken
6 participants