-
Notifications
You must be signed in to change notification settings - Fork 8.3k
Use memory management hardware to handle faults, measure user strings #8863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use memory management hardware to handle faults, measure user strings #8863
Conversation
Codecov Report
@@ Coverage Diff @@
## master #8863 +/- ##
==========================================
- Coverage 52.61% 52.25% -0.36%
==========================================
Files 201 196 -5
Lines 25249 24802 -447
Branches 5276 5155 -121
==========================================
- Hits 13285 12961 -324
+ Misses 9838 9762 -76
+ Partials 2126 2079 -47
Continue to review full report at Codecov.
|
fec737e to
cc8cea1
Compare
lpereira
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the approach, but I would have to take a look with care every place where the fixup pointer is set to ensure that this can't be abused as a generic indirect jump gadget.
kernel/userspace.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check for overflow here with __builtin_add_overflow().
kernel/userspace.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check for overflow here as well. If possible, write the longhand if here.
arch/arc/core/fault.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure that the fault handling in all of the userspace-supported platforms are not nested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixup only gets set for a very brief period of time inside z_arch_user_string_nlen assembly code. don't see at the moment how this could be problematic.
kernel/userspace.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just use retval < 0 as an error indication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to keep this the same as strnlen which returns a size_t.
Is there some advantage to this I am missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only simpler use and easier error dispatching (but this is my personal opinion).
ssize_t size = z_user_string_nlen(src, maxlen);
if (size < 0) {
return -EFAULT;
}
kernel/include/syscall_handler.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should mention that this function must be called with interrupts disabled, due to interaction with fault handler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe this is actually necessary.
The fixup pointer is stored on a per-thread basis.
We can context switch out unless I'm missing something. This function should only be called from thread context though, which I can certainly document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about situation when z_user_string_nlen() is being interrupted by a higher priority interrupt which triggers a fault? By my understanding, we will get a false positive here - instead of crashing kernel, we will assume that fault was caused by user string checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see now.
Will address in next iteration of this PR, I have an idea on how to do this without requiring interrupt locking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New approach taken, interrupt locking should no longer be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it looks better.
cc8cea1 to
9cd8d8c
Compare
|
Updated. The thread-specific fixup member is gone. Instead we build up a table of instruction pointer ranges with associated fixup handlers and use that instead. The fixup is only jumped to if the fault occurs within a very narrow range of instructions and nowhere else. This also has the advantage of requiring less overhead for the success case. Test cases augmented. Just need some more testing on ARC, having some trouble with the nsim target. |
9cd8d8c to
69a2fef
Compare
arch/arm/core/fault.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand, this code block will make _Fault() to by-pass the standard fault handling process handling. So we need to manually perform some actions that are now incorporated into the relative fault handlers (e.g. clear fault status register sticky bits, reset the fault address register etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by-pass the standard fault handling process handling
Where does this take place? I cannot find it. All _Fault seems to do is call various code to dump out information and then call SysFatalErrorHandler() which then returns. I did not encounter any issues testing this on hardware either FWIW.
Edit: The documentation for _Fault states:
* This routine is called when fatal error conditions are detected by hardware
* and is responsible only for reporting the error. Once reported, it then
* invokes the user provided routine _SysFatalErrorHandler() which is
* responsible for implementing the error handling policy.
I think I'm doing this in the right spot as this is a non-fatal exception and nothing should be printed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In line 678 we seem to return from _Fault(). So, for example, if this was a BusFault, we are not going to clear the Sticky bits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fault handlers do not only dump information, but also clear fault status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am suggesting to refactor and clean up this code in Fault.c,so fault dumping and fault status clearing are somewhat separated....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this is a bad situation because in this scenario I really don't want anything printed.
Not only that, but this code doesn't even run if CONFIG_FAULT_DUMP isn't set. And I have to propagate back to the main _Fault() somehow that we need to return without calling SysFatalErrorHandler.
Do you have any advice other than rewriting fault.c for ARM?
arch/arm/core/fault.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The structure declarations, below, are generic, i.e. not specific to ARM, so I wonder if they could be placed in some more general location and included by the arch-specific fault.c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same declarations are present for ARC, AFAICS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved most of this to a header.
kernel/include/kernel_internal.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
measure*
arch/arm/core/fault.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand, only memory protection faults could fall into this category of errors. So, MemMg for ARM and BusFaults for NXP, so I wonder if we could move the code inside the respective handlers.
This comment is related to my other comment below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would require significantly restructuring the code in fault.c.
We don't really have handlers in here, just a bunch of stuff which prints out information to the console. There's no leeway for policy decisions to be made in the way it is structured now. All the "handlers" like _MpuFault are called by FAULT_DUMP, which only calls _FaultDump() if CONFIG_FAULT_DUMP > 0 and all the return values are ignored. It's a mess.
Can we file a bug on this?
69a2fef to
69ea7f4
Compare
|
Moved some declarations to a header. Fixed some typos. Declared the exception handler arrays as const to put them in ROM. Filed #8950 to figure out what to do about the ARM fault handling code buried in the fault dumping infrastructure, this still needs to be addressed. |
Yes, I need absolute control over the state of general purpose registers and the stack contents when I jump to the fixup handler, so this has to be done in assembly. This is not simple. Linux does this too with rather byzantine macros: https://github.com/torvalds/linux/blob/master/Documentation/x86/exception-tables.txt
Not why I'm doing this. |
I love the new scheme -- from a cursory glance I couldn't find anything wrong with this, race condition wise. There's some room for improvement (such as the exception table location being generic), and fixing the platform-specific code as already pointed out here, but I like the cleverness of this version.
|
Generally OK for ARC, my concerns are
|
Can you be more specific with your concerns?
Linux does essentially the same thing: https://github.com/torvalds/linux/blob/master/Documentation/x86/exception-tables.txt
How is this not scalable? |
1b39a62 to
c1b5060
Compare
|
@ioannisg can you re-visit your review |
arch/arm/core/fault.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with changing MPU to Memory, if this function is to be used in a more generic way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andrewboie , in the amended version of this PR, this function is also used in ARMv6-M, so it needs to be implemented outside the #ifdef ARMV7-M guard.
ioannisg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some requests for changes. But also: note that we still do not handle this properly for ARMv6-M /ARMv8_M_Base, where HardFault is the only fault exception.
I wonder whether we could, for now, support the whole thing only for ARMv7-M/ARMv8-M_Main
arch/arm/core/fault.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to put this inside #ifdef NXP_MPU, as this is only relevant for NXP SOCs that generate BusFaults for memory access violations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't true.
For example, in the provided test case, user mode passes a nonsense string address of 0xFFFFFFF0. This generates a bus fault.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see this possibility now, thanks for pointing it out, @andrewboie
arch/arm/core/fault.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The immediate return here is a bit problematic, because we need to clear the sticky bits, see line 386-371. I see two "good " options:
- Copy here the code for sticky bits clearing (note that it is only relevant for ARMv8-M), or
- Create a local variable
reasonto hold the_NANO_ERR_RECOVERABLE, and jump withgototo the sticky bit clearing and thenreturn reason.
arch/arm/core/fault.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you implement my second suggestion, here you could return reason.
arch/arm/core/fault.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good
arch/arm/core/fault.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andrewboie, @agross-linaro , this is a comment related to adding support for recoverable memory-protection faults for ARMv6-M/ARMv8-M Baseline: is it as easy as just calling the MemoryFaultIsRecoverable(.) function in line 501 (i.e. for ARMv6 only) and do similar things as @andrewboie does in Bus and MPU fault?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will also need to move the print-out of "*** HARD FAULT ***" below the call of _MemoryFaultIsRecoverable, to avoid any fault dumping in this case...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at this point I've let go of my desire to not have any printouts for recoverable faults, it just makes the code excessively complicated and it's too easy to mess this up.
These errors should be rare anyway and will always be due to programming mistakes.
c1b5060 to
592cd74
Compare
arch/arm/core/fault.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good
ioannisg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to approve this, however, I think we should not abandon the ambition of preventing fault dumping in case of errors that are supposed to be recoverable. We can merge this PR for 1.13, and do the refinement under a future task, IMO.
ioannisg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few comments, after checking the sanity check errors.
arch/arm/core/fault.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andrewboie , in the amended version of this PR, this function is also used in ARMv6-M, so it needs to be implemented outside the #ifdef ARMV7-M guard.
arch/arm/core/fault.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also be outside the #ifdef ARMv7-M guard?
Defines some common macros and data structures for declaring text section ranges with fixup handler addresses if an exception occurs. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
This is used to measure the length of potentially unsafe strings. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Uses fixup infrastructure to safely abort if we get a page fault while measuring a string passed in from user mode. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Uses fixup infrastructure to safely abort if we get an MPU fault when examining a string passed in from user mode. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Safely measure the length of a potentially dubious string. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
We now have functions for handling all the details of copying data to/from user mode, including C strings and copying data into resource pool allocations. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Test that we can define our own system calls in application code and that fault handling works properly. Additional tests for base system call infrastructure, outside of specific system calls, go here. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
592cd74 to
860811b
Compare
AdithyaBaglody
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, except for the error codes
|
|
||
| static int user_copy(void *dst, void *src, size_t size, bool to_user) | ||
| { | ||
| int ret = EFAULT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be -EFAULT?
Some forthcoming system calls have user mode pass in strings which could be potentially garbage and generate an exception.
Trap memory management hardware exceptions during these operations and gracefully return an error instead.