-
Notifications
You must be signed in to change notification settings - Fork 1
fix(fact-ebpf): improve kernel compatibility #125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
6e2c746 to
4577812
Compare
35111ae to
04c7e8a
Compare
4577812 to
cef027c
Compare
04c7e8a to
a306241
Compare
cef027c to
5e3dc58
Compare
a306241 to
e219451
Compare
5e3dc58 to
c74f34f
Compare
ea10e32 to
fe72605
Compare
| } | ||
|
|
||
| __always_inline static struct bound_path_t* path_read(struct path* path) { | ||
| __always_inline static struct bound_path_t* _path_read(struct path* path, bool use_bpf_d_path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm starting to have a second though on how much this is needed. We did not plan originally to have path_unlink in the first milestone, even more so backward compatible implementation for older kernels. At the same time the implementation complexity seems to be quite significant because of that. How about keep it tight, and record lack of path_unlink support for older versions as a known limitation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did not plan originally to have path_unlink in the first milestone
I know, but at this point, the distinction between m1 and m2 is pretty blurry + we are running out of time and we need to get things done, that includes implementing at least a basic set of file operations IMO.
even more so backward compatible implementation for older kernels
The changes are needed for fact to be compatible with RHCOS 4.15 and 4.18 (RHEL 9), so it's not like I'm backporting to ancient kernels, this is very much versions being used in production right now.
At the same time the implementation complexity seems to be quite significant because of that.
I wouldn't consider loading a small probe to check whether we can use bpf_d_path in path_unlink hooks and setting a global variable to be too complex. If you are concerned about the d_path implementation, that was already implemented, I just moved it to its own file because I was getting circular includes and moving it out to its own header was the most straightforward solution. This implementation is also needed to get the exe_path from the process, so it's not like we can just get rid of it.
How about keep it tight, and record lack of path_unlink support for older versions as a known limitation?
Problem with this is we don't just not have path_unlink support, the probe is rejected by the verifier, so we would still need to at least have the checks in place to know when we should attempt to load path_unlink and when to skip it.
| // clang-format on | ||
|
|
||
| SEC("lsm/path_unlink") | ||
| int BPF_PROG(check_path_unlink_supports_bpf_d_path, struct path* dir, struct dentry* dentry) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is another option. From what I understand we know the exact commit (which on bpf-next resolves to v6.7-rc3), where an LSM hook on path_unlink became capable of calling bpf_d_path. How about just doing a condition on the kernel version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered this, but RHEL 9 uses 5.14.0 for all its kernels, we would need to figure out when/if the change was backported and have a separate check for that, I figured probing was more reliable than that.
(Also, I suck at navigating the kernel mailing list and I couldn't find the exact commit that made this change, thanks for doing that!).
| * Reimplementation of the kernel d_path function. | ||
| * | ||
| * We should attempt to use bpf_d_path when possible, but you can't on | ||
| * values that have been read using the bpf_probe_* helpers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any clarification on why? I assume because bpf_d_path requires a path structure with BTF information, and bpf_probe_* helpers do not provide it, is that so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I try to use bpf_d_path to read the path to the executable path of task like this:
BPF_CORE_READ_INTO(&path, task, mm, exe_file, f_path);
bpf_d_path(&path, p->lineage[i].exe_path, PATH_MAX);I get the following verifier error:
657: (85) call bpf_probe_read_kernel#113 ; R0=scalar() fp-16=mmmmmmmm fp-24=mmmmmmmm refs=4
; bpf_d_path(&path, p->lineage[i].exe_path, PATH_MAX); @ process.h:95
658: (bf) r2 = r9 ; R2_w=ringbuf_mem(ref_obj_id=4,sz=28736) R9=ringbuf_mem(ref_obj_id=4,sz=28736) refs=4
659: (07) r2 += 12336 ; R2_w=ringbuf_mem(ref_obj_id=4,off=12336,sz=28736) refs=4
660: (bf) r1 = r6 ; R1_w=fp-24 R6=fp-24 refs=4
661: (b4) w3 = 4096 ; R3_w=4096 refs=4
662: (85) call bpf_d_path#147
R1 type=fp expected=ptr_, trusted_ptr_, rcu_ptr_
verification time 4729 usec
stack depth 128
processed 1731 insns (limit 1000000) max_states_per_insn 4 total_states 68 peak_states 68 mark_read 64
Searching for the error I found this answer on SO: https://stackoverflow.com/a/79348591
So it seems that you can't use variables in the stack with bpf_d_path, but these are very much needed when accessing things with bpf_probe_read_*, so we need to use our own implementation of d_path, which sucks.
On newer kernels we can probably use bpf_path_d_path which may help... Except aya doesn't support KFuncs yet, so 🤷🏻
| return *res != 0; | ||
|
|
||
| // The NULL check is simply here to satisfy some verifiers, the result | ||
| // will never actually be NULL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would it never be NULL? From what I see the corresponding lookup function can return NULL in some cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of, this is not the trie map, this is a regular map with a single entry that determines whether the trie map has anything and if it should be used for filtering.
Because the map only has one value and it is allocated when the probe is first loaded, and because we are always asking for that one entry (by using int zero = 0; as the key), the lookup will always return that one element.
This is further validated by the fact this code ran without the null check on newer kernels that are smarter about this kind of thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, and suggest now to add few more lines of commentaries about this around the filter_by_prefix_map.
| } | ||
|
|
||
| bound_path->len = bpf_d_path(path, bound_path->path, PATH_MAX); | ||
| bound_path->len = use_bpf_d_path ? bpf_d_path(path, bound_path->path, PATH_MAX) : d_path(path, bound_path->path, PATH_MAX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL that bpf_d_path is already legacy :) But for a good reason, looks like there is a possibility of concurrent modification of the path structure, leading to use-after-free -- I assume the same is correct for using copied d_path function as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the copied d_path function is probably worse actually, but it is not like we have much choice ATM. As mentioned in another comment, the newer bpf_path_d_path is only available on pretty new kernels and aya doesn't support KFuncs yet.
erthalion
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good. Few commentaries below.
|
I have deployed fact on a GKE cluster using the image built from this PR and it runs fine. I have also done the same using an image from master and it results in a verifier issue. |
In the context of ROX-30439, I started testing fact on some RHCOS versions and found some issues that are addressed by this PR, namely: * The relocation of kernfs_node for the renamed parent field was wrong, causing cgroups to not be captured correctly. * Some missing null checks that are not necessary on newer kernels. * path_unlink LSM hook is not allowed to call bpf_d_path on older kernels. This last point is the most involved part of the PR, I couldn't find a precise way to check if this call is allowed on the running kernel, so instead I created a small checks BPF object that can try to load a small probe and check if the load was successful, then use a constant loaded at runtime to exclude the call when needed.
9eccab6 to
b801e91
Compare
Description
In the context of ROX-30439, I started testing fact on some RHCOS versions and found some issues that are addressed by this PR, namely:
This last point is the most involved part of the PR, I couldn't find a precise way to check if this call is allowed on the running kernel, so instead I created a small checks BPF object that can try to load a small probe and check if the load was successful, then use a constant loaded at runtime to exclude the call when needed.
Checklist
Automated testing
If any of these don't apply, please comment below.
Testing Performed
Manually run fact on RHCOS 4.15 and 4.18 VMs.
#131 adds bootstrap tests to multiple VMs for CI checking.