Skip to content

Conversation

@Molter73
Copy link
Collaborator

@Molter73 Molter73 commented Oct 22, 2025

Description

In the context of ROX-30439, I started testing fact on some RHCOS versions and found some issues that are addressed by this PR, namely:

  • The relocation of kernfs_node for the renamed parent field was wrong, causing cgroups to not be captured correctly.
  • Some missing null checks that are not necessary on newer kernels.
  • path_unlink LSM hook is not allowed to call bpf_d_path on older kernels.

This last point is the most involved part of the PR, I couldn't find a precise way to check if this call is allowed on the running kernel, so instead I created a small checks BPF object that can try to load a small probe and check if the load was successful, then use a constant loaded at runtime to exclude the call when needed.

Checklist

  • Investigated and inspected CI test results
  • Updated documentation accordingly

Automated testing

  • Added unit tests
  • Added integration tests
  • Added regression tests

If any of these don't apply, please comment below.

Testing Performed

Manually run fact on RHCOS 4.15 and 4.18 VMs.
#131 adds bootstrap tests to multiple VMs for CI checking.

@Molter73 Molter73 force-pushed the mauro/rhcos-compat branch 2 times, most recently from 6e2c746 to 4577812 Compare October 27, 2025 15:16
@Molter73 Molter73 force-pushed the mauro/ROX-30836/hotreload-paths branch from 35111ae to 04c7e8a Compare October 29, 2025 11:33
@Molter73 Molter73 force-pushed the mauro/ROX-30836/hotreload-paths branch from 04c7e8a to a306241 Compare October 30, 2025 09:55
@Molter73 Molter73 force-pushed the mauro/ROX-30836/hotreload-paths branch from a306241 to e219451 Compare October 30, 2025 10:46
Base automatically changed from mauro/ROX-30836/hotreload-paths to main October 30, 2025 11:05
@Molter73 Molter73 force-pushed the mauro/rhcos-compat branch 2 times, most recently from ea10e32 to fe72605 Compare October 30, 2025 11:39
@Molter73 Molter73 marked this pull request as ready for review October 30, 2025 11:42
@Molter73 Molter73 requested a review from Stringy October 31, 2025 10:39
}

__always_inline static struct bound_path_t* path_read(struct path* path) {
__always_inline static struct bound_path_t* _path_read(struct path* path, bool use_bpf_d_path) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm starting to have a second though on how much this is needed. We did not plan originally to have path_unlink in the first milestone, even more so backward compatible implementation for older kernels. At the same time the implementation complexity seems to be quite significant because of that. How about keep it tight, and record lack of path_unlink support for older versions as a known limitation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did not plan originally to have path_unlink in the first milestone

I know, but at this point, the distinction between m1 and m2 is pretty blurry + we are running out of time and we need to get things done, that includes implementing at least a basic set of file operations IMO.

even more so backward compatible implementation for older kernels

The changes are needed for fact to be compatible with RHCOS 4.15 and 4.18 (RHEL 9), so it's not like I'm backporting to ancient kernels, this is very much versions being used in production right now.

At the same time the implementation complexity seems to be quite significant because of that.

I wouldn't consider loading a small probe to check whether we can use bpf_d_path in path_unlink hooks and setting a global variable to be too complex. If you are concerned about the d_path implementation, that was already implemented, I just moved it to its own file because I was getting circular includes and moving it out to its own header was the most straightforward solution. This implementation is also needed to get the exe_path from the process, so it's not like we can just get rid of it.

How about keep it tight, and record lack of path_unlink support for older versions as a known limitation?

Problem with this is we don't just not have path_unlink support, the probe is rejected by the verifier, so we would still need to at least have the checks in place to know when we should attempt to load path_unlink and when to skip it.

// clang-format on

SEC("lsm/path_unlink")
int BPF_PROG(check_path_unlink_supports_bpf_d_path, struct path* dir, struct dentry* dentry) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is another option. From what I understand we know the exact commit (which on bpf-next resolves to v6.7-rc3), where an LSM hook on path_unlink became capable of calling bpf_d_path. How about just doing a condition on the kernel version?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered this, but RHEL 9 uses 5.14.0 for all its kernels, we would need to figure out when/if the change was backported and have a separate check for that, I figured probing was more reliable than that.
(Also, I suck at navigating the kernel mailing list and I couldn't find the exact commit that made this change, thanks for doing that!).

* Reimplementation of the kernel d_path function.
*
* We should attempt to use bpf_d_path when possible, but you can't on
* values that have been read using the bpf_probe_* helpers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any clarification on why? I assume because bpf_d_path requires a path structure with BTF information, and bpf_probe_* helpers do not provide it, is that so?

Copy link
Collaborator Author

@Molter73 Molter73 Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I try to use bpf_d_path to read the path to the executable path of task like this:

    BPF_CORE_READ_INTO(&path, task, mm, exe_file, f_path);
    bpf_d_path(&path, p->lineage[i].exe_path, PATH_MAX);

I get the following verifier error:

657: (85) call bpf_probe_read_kernel#113      ; R0=scalar() fp-16=mmmmmmmm fp-24=mmmmmmmm refs=4
; bpf_d_path(&path, p->lineage[i].exe_path, PATH_MAX); @ process.h:95
658: (bf) r2 = r9                     ; R2_w=ringbuf_mem(ref_obj_id=4,sz=28736) R9=ringbuf_mem(ref_obj_id=4,sz=28736) refs=4
659: (07) r2 += 12336                 ; R2_w=ringbuf_mem(ref_obj_id=4,off=12336,sz=28736) refs=4
660: (bf) r1 = r6                     ; R1_w=fp-24 R6=fp-24 refs=4
661: (b4) w3 = 4096                   ; R3_w=4096 refs=4
662: (85) call bpf_d_path#147
R1 type=fp expected=ptr_, trusted_ptr_, rcu_ptr_
verification time 4729 usec
stack depth 128
processed 1731 insns (limit 1000000) max_states_per_insn 4 total_states 68 peak_states 68 mark_read 64

Searching for the error I found this answer on SO: https://stackoverflow.com/a/79348591

So it seems that you can't use variables in the stack with bpf_d_path, but these are very much needed when accessing things with bpf_probe_read_*, so we need to use our own implementation of d_path, which sucks.

On newer kernels we can probably use bpf_path_d_path which may help... Except aya doesn't support KFuncs yet, so 🤷🏻

return *res != 0;

// The NULL check is simply here to satisfy some verifiers, the result
// will never actually be NULL.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it never be NULL? From what I see the corresponding lookup function can return NULL in some cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of, this is not the trie map, this is a regular map with a single entry that determines whether the trie map has anything and if it should be used for filtering.

Because the map only has one value and it is allocated when the probe is first loaded, and because we are always asking for that one entry (by using int zero = 0; as the key), the lookup will always return that one element.

This is further validated by the fact this code ran without the null check on newer kernels that are smarter about this kind of thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, and suggest now to add few more lines of commentaries about this around the filter_by_prefix_map.

}

bound_path->len = bpf_d_path(path, bound_path->path, PATH_MAX);
bound_path->len = use_bpf_d_path ? bpf_d_path(path, bound_path->path, PATH_MAX) : d_path(path, bound_path->path, PATH_MAX);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL that bpf_d_path is already legacy :) But for a good reason, looks like there is a possibility of concurrent modification of the path structure, leading to use-after-free -- I assume the same is correct for using copied d_path function as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the copied d_path function is probably worse actually, but it is not like we have much choice ATM. As mentioned in another comment, the newer bpf_path_d_path is only available on pretty new kernels and aya doesn't support KFuncs yet.

Copy link
Contributor

@erthalion erthalion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good. Few commentaries below.

@Molter73 Molter73 requested a review from erthalion October 31, 2025 15:49
@Molter73 Molter73 mentioned this pull request Nov 13, 2025
5 tasks
@JoukoVirtanen
Copy link

I have deployed fact on a GKE cluster using the image built from this PR and it runs fine. I have also done the same using an image from master and it results in a verifier issue.

In the context of ROX-30439, I started testing fact on some RHCOS
versions and found some issues that are addressed by this PR, namely:

* The relocation of kernfs_node for the renamed parent field was wrong,
  causing cgroups to not be captured correctly.
* Some missing null checks that are not necessary on newer kernels.
* path_unlink LSM hook is not allowed to call bpf_d_path on older
  kernels.

This last point is the most involved part of the PR, I couldn't find a
precise way to check if this call is allowed on the running kernel, so
instead I created a small checks BPF object that can try to load a small
probe and check if the load was successful, then use a constant loaded
at runtime to exclude the call when needed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants