Add support for tracefs #963

Closed
5h4k4r opened this issue May 23, 2024 · 10 comments · Fixed by siderolabs/talos#8815

Comments

@5h4k4r

5h4k4r commented May 23, 2024

Talos OS doesn’t allow tracefs probes. Only PMU probes are allowed, where maxactive cannot be set and stays at its default value.
The default value of maxactive is too low to support high throughput, which causes data loss during capture.

@shkarface

tracefs not being available renders some of the best eBPF observability tools useless, including @groundcover-com

@shkarface

@smira any chance to look into this?

@smira
Member

smira commented May 27, 2024

Do you have any proposal on the required changes to the kernel configuration?

@shkarface

shkarface commented May 27, 2024

I do not have any knowledge of the kernel configuration. We ran into this problem when installing eBPF-based monitoring solutions, including @groundcover-com. The exact message from their engineers:

Technical background:

  • To provide traces of plaintext information, groundcover’s eBPF agent, Flora, attaches eBPF kernel entry and return probes, using tracefs as the attach method. Since a lot of data is handled in the kernel at once, we use a probe argument called maxactive to allow multiple return probes to operate at once. Otherwise we would miss a lot of the data, because the default value of this argument is too low. The technical details of why maxactive is needed are explained in cilium/ebpf#755 (“add maxactive for kretprobe”).
  • The issue of missing return probes, and consequently missing data, when maxactive cannot be set is being discussed in many eBPF-based repositories. Some examples are attached. For this reason, eBPF-based network tools tend to use tracefs attachment, where maxactive can be provided.
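
For readers unfamiliar with tracefs attachment, here is a minimal sketch of what a return probe definition with maxactive looks like. It assumes the `kprobe_events` syntax from the kernel's kprobe-based event tracing documentation, where the number right after `r` is maxactive. The function name, group, event name, and probed symbol are hypothetical and chosen only for illustration; this is not how Flora or cilium/ebpf build the line internally.

```python
def kretprobe_event_line(symbol: str, maxactive: int, group: str, event: str) -> str:
    """Build a kretprobe definition in the tracefs kprobe_events syntax.

    The helper and its argument names are illustrative assumptions.
    """
    if maxactive < 0:
        raise ValueError("maxactive must be non-negative")
    # 'r' selects a return probe; the number right after it is maxactive.
    return f"r{maxactive}:{group}/{event} {symbol}"


if __name__ == "__main__":
    # Appending this line to /sys/kernel/tracing/kprobe_events (as root, with
    # tracefs mounted) would register the probe. Here it is only printed.
    print(kretprobe_event_line("tcp_sendmsg", 100, "demo", "tcp_sendmsg_ret"))
```

Without a mounted tracefs there is no `kprobe_events` file to write this line to, which is why the mount matters for setting maxactive.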

The issue

  • Talos OS doesn’t allow tracefs probes. Only PMU probes are allowed, where maxactive cannot be provided and is set to its default value.
  • As specified above, the default value of maxactive is too low to support high throughput - causing data loss in data capture.
  • This is a problem with all known eBPF solutions - and currently has no workaround.
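
To give a rough sense of why the default is considered too low, the sketch below computes the default maxactive as described in the kernel's kprobes documentation: NR_CPUS on a non-preemptible kernel, and max(10, 2*NR_CPUS) when CONFIG_PREEMPT is enabled. This is an assumption based on that documentation; the exact behavior can differ between kernel versions, and the function name is hypothetical.

```python
def default_maxactive(nr_cpus: int, preemptible: bool) -> int:
    """Default kretprobe maxactive, per the kernel kprobes documentation.

    Assumption: the documented rule applies to the kernel in question.
    """
    if nr_cpus < 1:
        raise ValueError("nr_cpus must be at least 1")
    # Preemptible kernels reserve more instances, since a probed function
    # can be preempted while its return probe instance is still in use.
    return max(10, 2 * nr_cpus) if preemptible else nr_cpus


if __name__ == "__main__":
    for cpus in (2, 8, 64):
        print(cpus, default_maxactive(cpus, preemptible=True))
```

Under this rule an 8-CPU preemptible kernel tracks at most 16 concurrent instances of a probed function, and any further returns that arrive while all instances are busy are missed.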

Work being done

  • This issue was brought to the attention of Talos’s maintainers in April 2022 (talos#5318, “Allow configuring additional host-wide mountpoints”), who have not responded since. We hope that this will be resolved in the future.
  • As long as Talos doesn’t support tracefs probes, we can expect a lot (up to 99%) of plaintext traces to be missing on this platform.

@smira
Member

smira commented May 27, 2024

I still don't understand what exactly is missing? Mount for tracefs?

@orishuss

Yes. As discussed in siderolabs/talos#5318, /sys/kernel/tracing is not mounted.
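
One way to confirm whether tracefs is mounted is to look for a `tracefs` filesystem entry in `/proc/mounts`. The sketch below parses mount table text and returns the tracefs mount points. The function name and the sample text are hypothetical, and it assumes the usual whitespace-separated `/proc/mounts` layout (device, mount point, filesystem type, options).

```python
def tracefs_mount_points(mounts_text: str) -> list:
    """Return the mount points of every tracefs entry in /proc/mounts text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass.
        if len(fields) >= 3 and fields[2] == "tracefs":
            points.append(fields[1])
    return points


# Illustrative sample, not captured from a real Talos node.
SAMPLE = """\
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
tracefs /sys/kernel/tracing tracefs rw,nosuid,nodev,noexec,relatime 0 0
"""

if __name__ == "__main__":
    print(tracefs_mount_points(SAMPLE))
```

On a live system the same function could be fed the contents of `/proc/mounts`; an empty result would match the situation described here.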

@smira
Member

smira commented May 27, 2024

Why is this issue opened in pkgs then? I don't quite understand.

@orishuss

I believe that this issue and siderolabs/talos#5318 are almost the same, and this one may have been filed in the wrong repository.
The point is that when /sys/kernel/tracing is not mounted, tracefs eBPF probes cannot be attached on the OS, which in turn causes many tools that depend on such probes to not operate correctly.
@5h4k4r please correct me if any interpretation I made is not accurate.

smira added a commit to smira/talos that referenced this issue May 28, 2024
Fixes siderolabs/pkgs#963

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit da7f276)
@shkarface

@smira thank you for the fix. Do you think it might be included in an upcoming release?

@smira
Member

smira commented May 28, 2024

It will be in Talos v1.7.3

smira added a commit to smira/talos that referenced this issue May 29, 2024
Fixes siderolabs/pkgs#963

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit da7f276)