RFE: support "maximum kernel version" #11
+1 |
@nmav to be clear, this RFE is for adding information to the internal syscall tables about when the syscall was first introduced to the Linux kernel, not for adding logic to determine if the current running kernel supports a given syscall. However, if you are trying to block a syscall, you can do so with libseccomp regardless of whether it is supported on a particular arch/ABI and kernel version; libseccomp will do the right thing for you. |
This RFE is almost five years old, and outside of a single discussion with @cgwalters I haven't seen or heard of much other interest in such a feature. With plenty of other open issues, most with higher priority, it is not clear when we would work on this, or even if such a thing would be a useful addition. @cgwalters and @drakenclimber what do you think of this issue in 2020? I'm tempted to close this as WONTFIX, but I would like to get some comments and feedback before we take that step. |
Honestly I think this is a really cool idea. Several of my in-house customers are using allowlists because of this exact reason. If they were to use a denylist and a new syscall is added to the kernel, then that syscall would be another avenue of attack. Let's leave it open for a bit longer. I'll ask around within Oracle and see if any customers are interested enough in this feature for me to pick it up. But @cgwalters (or anyone else for that matter) is totally welcome to own it if they have the time and interest :). |
Okay, as long as there is interest, I've got no problem in keeping this one open. |
I do still think it'd be useful! |
It looks like issue #286 is the concrete issue to help drive this work forward ... even if it has been almost five years ;) I think the first step towards this is to add a new field to the syscalls.csv file that indicates when the syscall was first introduced. That is going to be a good chunk of work as we currently have ~469 syscalls defined (!). However, we could amortize this work for the existing syscalls with an "undefined" value that we would treat simply as the syscall being created at the dawn of time. Of course all new additions to the syscalls.csv table would need to be added with the kernel version. Some more quick thoughts:
... where
enum kernel_version {
KV_UNDEF = 0,
KV_1_0,
KV_1_1,
KV_1_3,
KV_2_0,
...
KV_5_8,
_KV_MAX,
}; |
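To make the "undefined means the dawn of time" idea concrete, here is a minimal sketch, assuming each table row simply carries one of these enum values. The abbreviated enum, `struct syscall_entry`, `example_table`, and both helpers are hypothetical names for illustration, not libseccomp code. The versions shown for getrandom (3.17) and openat2 (5.6) are the upstream introductions; treat the rows as examples rather than table data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Abbreviated, hypothetical version list; a real one would name every release. */
enum kernel_version {
	KV_UNDEF = 0,	/* no data yet: treated as "always existed" */
	KV_3_10,
	KV_3_17,
	KV_5_6,
	_KV_MAX,
};

/* Hypothetical table row: syscall name plus the release that introduced it. */
struct syscall_entry {
	const char *name;
	enum kernel_version introduced;
};

/* Illustrative rows only, not taken from syscalls.csv. */
const struct syscall_entry example_table[] = {
	{ "read",      KV_UNDEF },
	{ "getrandom", KV_3_17 },
	{ "openat2",   KV_5_6 },
};
const size_t example_table_len = sizeof(example_table) / sizeof(example_table[0]);

/* True if the syscall was already present in kernel version "max". */
bool syscall_in_version(const struct syscall_entry *e, enum kernel_version max)
{
	/* KV_UNDEF is 0, so it compares as older than every real version. */
	return e->introduced <= max;
}

/* Look up a row by name; returns NULL if the name is not in the table. */
const struct syscall_entry *find_entry(const char *name)
{
	size_t i;

	for (i = 0; i < example_table_len; i++) {
		if (strcmp(example_table[i].name, name) == 0)
			return &example_table[i];
	}
	return NULL;
}
```

Because `KV_UNDEF` sorts below every real release, existing rows can stay unannotated without special-casing them in the comparison.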
Is kernel version the right thing to track? Is it guaranteed that newer syscalls are not backported to e.g. stable kernel branches with a lower version number? |
Red Hat will backport all kinds of things to their kernels, so no. |
If Red Hat backports a syscall to an older kernel version, they can also patch their version of libseccomp to match. To be fair, this might matter more for certain use-cases, but as an approach to fixing #286 I think it's fairly workable. The other problem is that I'm not sure there's any better approach -- syscalls can be added in non-consecutive order (for instance |
Yes, exactly. The upstream libseccomp project has no control over the various enterprise Linux distributions and if those distributions decide to deviate from the upstream projects (either the Linux Kernel or libseccomp) they are on their own for support. While we will do our best to help, we can't sacrifice the upstream project in favor of these enterprise distributions with their own support and engineering staff. |
As a point of reference, the |
I can do the syscall spelunking to figure out a version number for each syscall -- the only question is whether we should have the version number be per-architecture since I'm pretty sure certain syscalls were added to different architectures in different releases. |
They most definitely were, and still are, as far as I can see. While it is going to be slightly annoying, and will definitely explode the CSV, tracking the syscall's first appearance for each arch/ABI is probably the right thing to do. Any help you can provide on this @cyphar would be greatly appreciated. |
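A rough sketch of what per-arch/ABI tracking could look like, assuming one "first seen" value per arch column. Everything here is hypothetical: the `EX_` names, the packed version encoding, and especially the data, which is made up and does not describe any real syscall.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical arch/ABI index; libseccomp has its own arch definitions. */
enum example_arch { EX_ARCH_X86_64 = 0, EX_ARCH_AARCH64, EX_ARCH_S390X, EX_ARCH_COUNT };

/* Pack major.minor into one integer so versions compare numerically. */
#define EX_KVER(major, minor)	(((unsigned int)(major) << 16) | (unsigned int)(minor))
#define EX_KVER_UNDEF		0u		/* version unknown: treat as "always present" */
#define EX_KVER_NONE		0xffffffffu	/* syscall does not exist on this arch/ABI */

/* One row per syscall, one version column per arch/ABI. */
struct arch_syscall_row {
	const char *name;
	unsigned int first_seen[EX_ARCH_COUNT];
};

/* Made-up data, for illustration only. */
const struct arch_syscall_row example_rows[] = {
	{ "example_call", { EX_KVER(4, 3), EX_KVER(4, 7), EX_KVER_NONE } },
	{ "old_call",     { EX_KVER_UNDEF, EX_KVER_UNDEF, EX_KVER_UNDEF } },
};

/* Returns 1 if "name" exists on "arch" in kernel "kver", 0 if not, -1 if unknown. */
int syscall_available(const char *name, enum example_arch arch, unsigned int kver)
{
	size_t i;
	size_t n = sizeof(example_rows) / sizeof(example_rows[0]);

	for (i = 0; i < n; i++) {
		if (strcmp(example_rows[i].name, name) == 0) {
			unsigned int first = example_rows[i].first_seen[arch];

			if (first == EX_KVER_NONE)
				return 0;
			return first <= kver;
		}
	}
	return -1;
}
```

The cost is visible in the row layout: every arch/ABI adds a column, which is the CSV growth mentioned above.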
it was reported by clang with the option -fsanitize=memory: Uninitialized bytes in MemcmpInterceptorCommon at offset 0 inside [0x7070000002a0, 56) ==3791089==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x482a2c in memcmp (fuzzer+0x482a2c) seccomp#1 0x7fed2f120ebb in _hsh_add src/libseccomp/src/gen_bpf.c:598:9 seccomp#2 0x7fed2f121715 in _gen_bpf_action_hsh src/libseccomp/src/gen_bpf.c:796:6 seccomp#3 0x7fed2f121a53 in _gen_bpf_node src/libseccomp/src/gen_bpf.c:831:11 seccomp#4 0x7fed2f121a53 in _gen_bpf_chain.isra.0 src/libseccomp/src/gen_bpf.c:1072:13 seccomp#5 0x7fed2f121f16 in _gen_bpf_chain_lvl_res src/libseccomp/src/gen_bpf.c:977:12 seccomp#6 0x7fed2f121c74 in _gen_bpf_chain.isra.0 src/libseccomp/src/gen_bpf.c:1124:12 seccomp#7 0x7fed2f12253c in _gen_bpf_syscall src/libseccomp/src/gen_bpf.c:1520:10 seccomp#8 0x7fed2f12253c in _gen_bpf_syscalls src/libseccomp/src/gen_bpf.c:1615:18 seccomp#9 0x7fed2f12253c in _gen_bpf_arch src/libseccomp/src/gen_bpf.c:1683:7 seccomp#10 0x7fed2f12253c in _gen_bpf_build_bpf src/libseccomp/src/gen_bpf.c:2056:11 seccomp#11 0x7fed2f12253c in gen_bpf_generate src/libseccomp/src/gen_bpf.c:2321:7 seccomp#12 0x7fed2f11f41c in seccomp_export_bpf src/libseccomp/src/api.c:724:7 Uninitialized value was created by a heap allocation #0 0x4547ef in realloc (fuzzer+0x4547ef) seccomp#1 0x7fed2f121244 in _blk_resize src/libseccomp/src/gen_bpf.c:362:8 seccomp#2 0x7fed2f121244 in _blk_append src/libseccomp/src/gen_bpf.c:394:6 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
As an FYI, I'm starting to look closer at our work queue for v2.6.0 and this jumped out as one of the larger items, so I spent some time on it this afternoon while avoiding other work :) I've got "arch-syscall-validate" to the point where it creates CSV files with minimum kernel versions for each syscall/ABI pair. It also imports an existing "syscalls.csv" file to obtain any existing version information, so we don't have to have a separate file with version information lying around. If that sounds confusing, it will make more sense when I submit the PR. Speaking of the PR, I need to update the rest of the code/tooling to understand the expanded CSV format; once that is done I'll submit the PR for review/merge. This initial effort will be pretty hollow (no actual version information, and nothing to make use of it), but it will allow us to start collecting the version information in our syscall table and pave the way for additional work using the syscall version information. |
Ironically I recently started looking at it as well :) [1]. I have everything working except for the BPF creation. Feel free to use/discard any of my work. Or if you want, we could have a quick chat to decide what to keep/throw away.
With all of that said, I would love to see what you've put together, @pcmoore. I think what I've outlined above will work, but it may not be the optimal way to do it. [1] https://github.com/drakenclimber/libseccomp/blob/wip/issue11 |
But conceptually that's not actually different than upgrading the kernel to a newer version. |
I actually started on this issue yesterday because I knew you were thinking about this topic and figured I could jump start it by getting some of the basic infrastructure in place; it wasn't my intention to duplicate efforts ... oh well, the best laid plans of mice and men ;) Regardless, I've probably only got a couple more hours of work before my basic PR is ready, so I'll go ahead and submit that so you can take a look; at that point we can merge it, or drop it as a "lessons learned" sort of thing. I don't get too attached to any code I write, so feel free to reject the PR in favor of what you've got.
I personally think it would be good to see all of the syscall information in one table/CSV. Yes, it is going to start getting a bit big, but I believe fairly strongly that having all of our syscall information in one file/database is going to be a better choice in the long run (easier updates, fewer worries about synchronizing the tables, etc.). The creation/updating and automation of maintaining this file/database is a slightly different topic, but one change I've made to the "arch-syscall-validate" script is that when it is asked to generate a new syscall CSV table it optionally loads an existing CSV table and pulls the kernel version from that. This allows us to preserve the kernel version information in our CSV file and only worry about adding new entries by hand after the new CSV is generated. I think it's important to remember that the addition of new syscalls is a relatively rare event and optimizing the process for that is not something I would worry too much about. Similarly, initially populating the syscall table with kernel versions is a one-time event that I don't think we need to worry about making repeatable; if we can hack together something to initially populate the values - correctly! - I think that's okay.
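The "load the existing CSV and pull the kernel version from it" step is essentially a merge by syscall name. Here is a minimal sketch of that merge in C, assuming the rows are already parsed into memory. This is not how the script itself is written; `struct csv_row`, `carry_over_versions()`, and the use of 0 for "no version recorded" are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one CSV row; kver 0 means "no version recorded". */
struct csv_row {
	const char *name;
	unsigned int kver;
};

/*
 * Copy versions from the old table into the freshly generated one,
 * matching rows by syscall name. Returns how many fresh rows still
 * have no version, i.e. the ones left to fill in by hand.
 */
size_t carry_over_versions(struct csv_row *fresh, size_t n_fresh,
			   const struct csv_row *old, size_t n_old)
{
	size_t i, j, missing = 0;

	for (i = 0; i < n_fresh; i++) {
		for (j = 0; j < n_old; j++) {
			if (strcmp(fresh[i].name, old[j].name) == 0) {
				fresh[i].kver = old[j].kver;
				break;
			}
		}
		if (fresh[i].kver == 0)
			missing++;
	}
	return missing;
}
```

The quadratic lookup is deliberate: with a few hundred syscalls and a regeneration that happens rarely, there is little reason to optimize it.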
Validating the syscall table information used to be a lot more important when it was hand created; now that the table is generated from the kernel source itself, the validation step isn't really necessary. The fact that our generation script has "validate" in the name is really just vestigial naming and not something to worry too much about IMO. Even with the kernel versions being added to the table I'm not sure validation will provide much benefit, although using this script to initially generate and add the kernel versions to the syscall table would be nice.
I'm not sure how I feel about pre-computing the valid syscall ranges for each version/ABI. I understand the performance advantage, but that feels very wrong to me; I think I'd rather see the library calculate that as needed right now.
For whatever reason, I've always thought of this more as a proper rule API instead of a filter attribute. I know filter attributes are seductive in the sense that they are easy and malleable to fit a wide range of uses, but in my mind restricting the filter to a specific set of syscalls available in a given kernel version seems much more like a filter rule than an attribute. Like I said earlier, this is getting way ahead of what I was attempting to do with my initial little syscall table infrastructure PR, but I guess I was thinking of something along these lines for the API:
enum scmp_kver {
__SCMP_KV_NULL = 0,
SCMP_KV_UNDEF = 1,
...
SCMP_KV_5_17,
__SCMP_KV_MAX,
};
#define SCMP_KVLE(V) SCMP_CMP64(100, SCMP_CMP_LE, (V), 0)
int seccomp_rule_add_kver(scmp_filter_ctx ctx, uint32_t action, scmp_arg_cmp cmp);
... with an example usage being:
We could also just scrap the
I'm happy to help put together the db/BPF code when we get to that point, I've tossed it around in my head a bit over the past few years and I think I have some ideas on how to make it work, but we'll have to see how well those ideas translate into proper code ;)
I really should be able to get the PR out this afternoon, but if something comes up I'll post it over the weekend; we can talk a bit more about it then. Although like I said earlier, it's a far cry from being a complete solution, it is really just intended to help pave the way for a lot of the stuff you're working on. |
I was going to say the exact same thing. :)
I'm excited to see what you come up with. I think we could solve this in a variety of ways and it may take a few iterations. Thanks so much for the help! |
This ended up taking a bit more time than I thought today, but my initial infrastructure PR is up at #381. |
Hi all! I'm just curious to know if someone is already working on finishing this. |
A year ago I put together a prototype that was most of the way there [1]. I believe it works (or nearly so), but it needs a lot of work to clean it up, make the commits sensible, add tests, etc. Unfortunately I've since been pulled onto other issues, and I'm not sure when I'll get back to it. I am open to others picking up the task - either by continuing my work or starting from scratch. [1] https://github.com/drakenclimber/libseccomp/tree/wip/issue11 |
I haven't had an opportunity to do any further work on this, so if you are interested in working on this please let us know! |
As system calls are added to the kernel, I feel there is not enough discussion by default of the wide variety of applications that will suddenly gain access to a new attack surface.
The canonical example here is perf_event_open(), the source of numerous CVEs. While perf is awesome, my (e.g.) web server should not (by default) be able to use it.
It's possible to use seccomp today to blacklist syscalls. Whitelists can get very difficult to manage.
One thing that might be useful is a filter for any system calls newer than a particular kernel version, say 3.10. That way, each new system call would have to be verified for use in e.g. containers before it's added. Upgrading the kernel wouldn't suddenly expose containers to new attack surface.
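As a sketch of the decision such a filter would make, assuming we know the release that introduced each syscall: compare that release against the cap and allow only what is at or below it. `struct kver`, `kver_cmp()`, and `allowed_under_cap()` are hypothetical names, not part of libseccomp. Comparing field by field matters because 3.10 is newer than 3.9, which a decimal or string comparison would get wrong.

```c
#include <stdbool.h>

/* Hypothetical major.minor pair, compared field by field. */
struct kver {
	unsigned int major;
	unsigned int minor;
};

/* Returns a negative, zero, or positive value, in the style of strcmp(). */
int kver_cmp(struct kver a, struct kver b)
{
	if (a.major != b.major)
		return a.major < b.major ? -1 : 1;
	if (a.minor != b.minor)
		return a.minor < b.minor ? -1 : 1;
	return 0;
}

/* Policy sketch: allow a syscall only if it was introduced at or before the cap. */
bool allowed_under_cap(struct kver introduced, struct kver cap)
{
	return kver_cmp(introduced, cap) <= 0;
}
```

With a cap of 3.10, a syscall introduced in 3.17 (getrandom, for example) would stay blocked after a kernel upgrade until someone reviews it and raises the cap or adds an explicit rule.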
In a discussion with @pcmoore he indicated this could be another annotation in the struct in e.g. arch-x86-syscalls.c.