-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFE: allow adding an exact syscall number for an architecture #259
Comments
Hi @Xyene, thanks for the RFE and/or BUG. I'm thinking this may be both, but given the subject line I think it's reasonable to keep it as a RFE. Yes, you are correct in that libseccomp always interprets syscalls in the context of the native ABI. This is partly due to historical reasons where people wanted to be able to use the However, I agree that we should be doing a better job about making this behavior clear in the documentation. @Xyene would you be willing to put together a patch/PR for the manpages? If not, that's fine but please let us know so @drakenclimber or I can work on it. It would be good to get the doc fixes in for the v2.5.0 release. Beyond that, I suppose we could create a new API that would take raw syscall numbers and pass them through to the filter untranslated; although I haven't given that too much thought yet. Since this hits at the issue's subject line, let's leave this open for now as a way of discussing and tracking this. |
Agreed, I don't think changing something so fundamental (but subtle) is a good idea :)
Will do so soon.
I think this could be an opportunity to kill two birds with one stone. As far as I can tell, there's currently no way to specify a disjoint set of syscalls per architecture. In the x86-on-x64 case, regardless of how the architectures are added to the context, at least some syscall rules will be "inherited" by the other. There's no way I see to have an x86-only ruleset and an x64-only one in the same context. Since there are plenty of bits still left in an // Allow syscall #2 on x86
seccomp_rule_add_exact(ctx, SCMP_ACT_ALLOW, SCMP_SYS_ARCH(2, SCMP_ARCH_X86), 0);
// Allow read syscall on x86, same resolution mechanism as SCMP_SYS
seccomp_rule_add_exact(ctx, SCMP_ACT_ALLOW, SCMP_SYS_ARCH(read, SCMP_ARCH_X86), 0); |
Create two filters, one for x86 and one for x86_64, and add whatever rules you want to each, independent of each other and when you are done you merge them back together with a call to |
Refs seccomp#259. Signed-off-by: Tudor Brindus <me@tbrindus.ca>
Refs seccomp#259. Signed-off-by: Tudor Brindus <me@tbrindus.ca>
Refs seccomp#259. Signed-off-by: Tudor Brindus <me@tbrindus.ca>
libseccomp performs a translation step when adding a raw syscall value to a multi-architecture filter. For instance, when adding __NR_open (syscall value 2 on x86-64) to a filter containing x86 and x86-64 where the native ABI is x86-64, the x86 BPF branch will use the value 5 (__NR_open on x86). This commit adds explicit documentation for the translation step. Refs seccomp#259. Signed-off-by: Tudor Brindus <me@tbrindus.ca>
libseccomp performs a translation step when adding a raw syscall value to a multi-architecture filter. For instance, when adding __NR_open (syscall value 2 on x86-64) to a filter containing x86 and x86-64 where the native ABI is x86-64, the x86 BPF branch will use the value 5 (__NR_open on x86). This commit adds explicit documentation for the translation step. Refs #259. Signed-off-by: Tudor Brindus <me@tbrindus.ca> Acked-by: Tom Hromatka <tom.hromatka@oracle.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
Hi,
I have a usage of libseccomp that involves a 64-bit process setting up a filter for and then launching a 32-bit process, with
SCMP_ACT_TRACE
and usingptrace
for syscall emulation.The code already has to maintain a syscall name to per-arch ID (similar to libseccomp's syscall.csv in spirit), since it falls back to pure
ptrace
ifseccomp
is not available. This means that in order to use libseccomp's handySCMP_SYS
helpers, we'd need to maintain another mapping to use only on the seccomp path. Possible, but icky. We already have the raw mappings; might as well use them.Reading the docs, I would have expected the following code to set up a filter allowing
fork
for x86, and causing traps for all other syscalls (most error-checking elided for brevity):Compiling and running this example produces:
It looks like libseccomp is taking the syscall 2, seeing that it is
open
on x86-64, then mapping that to x86 syscall 5 in the filter. At the same time, the x64 arch has been removed from the seccomp context, and only x86 remains.This may or may not be a helpful thing to do (personally, I find it confusing, but that's not a good reason to change something :), but at the very least I couldn't find anything in the documentation describing this behavior. The closest I found was in
man 3 seccomp_rule_add
(apologies if I've missed something obvious):I think the wording, in particularly that of "...to specify the syscall value directly..." doesn't indicate that the syscall number will be translated in any way. My expectation, at least, was that if I call
seccomp_rule_add
with syscall number N, then syscall number N appears in all branches of architectures currently in the context.This made it an excellent foot-gun — instead of allowing a rather harmless
fork
(syscall 2 on x86), the filter instead allowsopen
ing anything (syscall 5 on x86).If this is in fact the intended behavior here (and reading through the code makes it pretty clear it is), I'd be happy to submit a patch clarifying that bit of the documentation. At the same time, having a way to add raw syscall numbers, without this remapping, would be very useful in cases where
SCMP_ACT_TRACE
is being used. Maintaining a mapping to libseccomp pseudosyscall numbers in such cases seems less than ideal. If this is something you'd consider having upstream, I can also send in a patch for review.The text was updated successfully, but these errors were encountered: