Skip to content

Conversation

@andrewboie
Copy link
Contributor

@andrewboie andrewboie commented Jul 11, 2018

Some forthcoming system calls have user mode pass in strings which could be potentially garbage and generate an exception.

Trap memory management hardware exceptions during these operations and gracefully return an error instead.

@codecov-io
Copy link

codecov-io commented Jul 11, 2018

Codecov Report

Merging #8863 into master will decrease coverage by 0.35%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8863      +/-   ##
==========================================
- Coverage   52.61%   52.25%   -0.36%     
==========================================
  Files         201      196       -5     
  Lines       25249    24802     -447     
  Branches     5276     5155     -121     
==========================================
- Hits        13285    12961     -324     
+ Misses       9838     9762      -76     
+ Partials     2126     2079      -47
Impacted Files Coverage Δ
include/logging/log_core.h 0% <0%> (-80%) ⬇️
include/entropy.h 33.33% <0%> (-33.34%) ⬇️
kernel/stack.c 67.3% <0%> (-25%) ⬇️
kernel/idle.c 72.72% <0%> (-16.17%) ⬇️
kernel/msg_q.c 77.77% <0%> (-15.91%) ⬇️
kernel/init.c 57.97% <0%> (-15.8%) ⬇️
drivers/entropy/fake_entropy_native_posix.c 70.58% <0%> (-11.77%) ⬇️
kernel/queue.c 84.95% <0%> (-8.85%) ⬇️
kernel/thread.c 81.04% <0%> (-6.9%) ⬇️
subsys/net/lib/sockets/getaddrinfo.c 40% <0%> (-5.57%) ⬇️
... and 56 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 15516ed...860811b. Read the comment docs.

@andrewboie andrewboie force-pushed the syscall-extensions branch from fec737e to cc8cea1 Compare July 11, 2018 21:54
@andrewboie andrewboie changed the title [DNM] use memory management hardware to handle faults, measure user strings [WIP] use memory management hardware to handle faults, measure user strings Jul 11, 2018
Copy link
Member

@lpereira lpereira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the approach, but I would have to take a look with care every place where the fixup pointer is set to ensure that this can't be abused as a generic indirect jump gadget.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check for overflow here with __builtin_add_overflow().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check for overflow here as well. If possible, write the longhand if here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure that the fault handling in all of the userspace-supported platforms are not nested?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixup only gets set for a very brief period of time inside z_arch_user_string_nlen assembly code. don't see at the moment how this could be problematic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just use retval < 0 as an error indication?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to keep this the same as strnlen which returns a size_t.
Is there some advantage to this I am missing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only simpler use and easier error dispatching (but this is my personal opinion).

ssize_t size = z_user_string_nlen(src, maxlen);
if (size < 0) {
      return -EFAULT;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should mention that this function must be called with interrupts disabled, due to interaction with fault handler.

Copy link
Contributor Author

@andrewboie andrewboie Jul 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this is actually necessary.
The fixup pointer is stored on a per-thread basis.
We can context switch out unless I'm missing something. This function should only be called from thread context though, which I can certainly document.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about situation when z_user_string_nlen() is being interrupted by a higher priority interrupt which triggers a fault? By my understanding, we will get a false positive here - instead of crashing kernel, we will assume that fault was caused by user string checking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see now.
Will address in next iteration of this PR, I have an idea on how to do this without requiring interrupt locking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New approach taken, interrupt locking should no longer be necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it looks better.

@andrewboie andrewboie force-pushed the syscall-extensions branch from cc8cea1 to 9cd8d8c Compare July 15, 2018 19:54
@andrewboie
Copy link
Contributor Author

andrewboie commented Jul 15, 2018

Updated. The thread-specific fixup member is gone. Instead we build up a table of instruction pointer ranges with associated fixup handlers and use that instead. The fixup is only jumped to if the fault occurs within a very narrow range of instructions and nowhere else. This also has the advantage of requiring less overhead for the success case.

Test cases augmented.

Just need some more testing on ARC, having some trouble with the nsim target.

@andrewboie andrewboie force-pushed the syscall-extensions branch from 9cd8d8c to 69a2fef Compare July 15, 2018 20:43
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand, this code block will make _Fault() to by-pass the standard fault handling process handling. So we need to manually perform some actions that are now incorporated into the relative fault handlers (e.g. clear fault status register sticky bits, reset the fault address register etc.)

Copy link
Contributor Author

@andrewboie andrewboie Jul 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by-pass the standard fault handling process handling

Where does this take place? I cannot find it. All _Fault seems to do is call various code to dump out information and then call SysFatalErrorHandler() which then returns. I did not encounter any issues testing this on hardware either FWIW.

Edit: The documentation for _Fault states:

* This routine is called when fatal error conditions are detected by hardware
 * and is responsible only for reporting the error. Once reported, it then
 * invokes the user provided routine _SysFatalErrorHandler() which is
 * responsible for implementing the error handling policy.

I think I'm doing this in the right spot as this is a non-fatal exception and nothing should be printed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In line 678 we seem to return from _Fault(). So, for example, if this was a BusFault, we are not going to clear the Sticky bits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fault handlers do not only dump information, but also clear fault status.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am suggesting to refactor and clean up this code in Fault.c,so fault dumping and fault status clearing are somewhat separated....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this is a bad situation because in this scenario I really don't want anything printed.
Not only that, but this code doesn't even run if CONFIG_FAULT_DUMP isn't set. And I have to propagate back to the main _Fault() somehow that we need to return without calling SysFatalErrorHandler.
Do you have any advice other than rewriting fault.c for ARM?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structure declarations, below, are generic, i.e. not specific to ARM, so I wonder if they could be placed in some more general location and included by the arch-specific fault.c

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same declarations are present for ARC, AFAICS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved most of this to a header.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

measure*

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, only memory protection faults could fall into this category of errors. So, MemMg for ARM and BusFaults for NXP, so I wonder if we could move the code inside the respective handlers.

This comment is related to my other comment below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would require significantly restructuring the code in fault.c.

We don't really have handlers in here, just a bunch of stuff which prints out information to the console. There's no leeway for policy decisions to be made in the way it is structured now. All the "handlers" like _MpuFault are called by FAULT_DUMP, which only calls _FaultDump() if CONFIG_FAULT_DUMP > 0 and all the return values are ignored. It's a mess.

Can we file a bug on this?

@andrewboie
Copy link
Contributor Author

andrewboie commented Jul 16, 2018

Moved some declarations to a header. Fixed some typos. Declared the exception handler arrays as const to put them in ROM.

Filed #8950 to figure out what to do about the ARM fault handling code buried in the fault dumping infrastructure, this still needs to be addressed.

@andrewboie
Copy link
Contributor Author

andrewboie commented Jul 16, 2018

Do you really need the arch assembly to implement the whole strlen loop? Seems like a mostly-needless burden on the arch developer. You could stuff the "fault_start" and "fault_end" symbols into simple C code with inline assembly and then just do the loop with whatever.

Yes, I need absolute control over the state of general purpose registers and the stack contents when I jump to the fixup handler, so this has to be done in assembly. This is not simple.

Linux does this too with rather byzantine macros: https://github.com/torvalds/linux/blob/master/Documentation/x86/exception-tables.txt

If there's concern that it be super-optimized

Not why I'm doing this.

@lpereira lpereira dismissed their stale review July 16, 2018 19:12

I love the new scheme -- from a cursory glance I couldn't find anything wrong with this, race condition wise. There's some room for improvement (such as the exception table location being generic), and fixing the platform-specific code as already pointed out here, but I like the cleverness of this version.

@andrewboie andrewboie requested a review from dcpleung July 16, 2018 20:21
@vonhust
Copy link

vonhust commented Jul 18, 2018

Generally OK for ARC, my concerns are

  1. avoid the abuse of this mechnism to weak the reliability and security of system

  2. scalability. strnlen is one case, i hope this mechnism can cover other possible cases to avoid repeat of work

@andrewboie
Copy link
Contributor Author

Generally OK for ARC, my concerns are

Can you be more specific with your concerns?

avoid the abuse of this mechnism to weak the reliability and security of system

Linux does essentially the same thing: https://github.com/torvalds/linux/blob/master/Documentation/x86/exception-tables.txt

scalability. strnlen is one case, i hope this mechnism can cover other possible cases to avoid repeat of work

How is this not scalable?
Just add more items to the exception array, as long as their implementations have start/end/fixup addresses this will work.

@andrewboie
Copy link
Contributor Author

@ioannisg can you re-visit your review

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with changing MPU to Memory, if this function is to be used in a more generic way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewboie , in the amended version of this PR, this function is also used in ARMv6-M, so it needs to be implemented outside the #ifdef ARMV7-M guard.

Copy link
Member

@ioannisg ioannisg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some requests for changes. But also: note that we still do not handle this properly for ARMv6-M /ARMv8_M_Base, where HardFault is the only fault exception.

I wonder whether we could, for now, support the whole thing only for ARMv7-M/ARMv8-M_Main

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to put this inside #ifdef NXP_MPU, as this is only relevant for NXP SOCs that generate BusFaults for memory access violations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't true.
For example, in the provided test case, user mode passes a nonsense string address of 0xFFFFFFF0. This generates a bus fault.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see this possibility now, thanks for pointing it out, @andrewboie

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The immediate return here is a bit problematic, because we need to clear the sticky bits, see line 386-371. I see two "good " options:

  1. Copy here the code for sticky bits clearing (note that it is only relevant for ARMv8-M), or
  2. Create a local variable reason to hold the _NANO_ERR_RECOVERABLE, and jump with goto to the sticky bit clearing and then return reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you implement my second suggestion, here you could return reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewboie, @agross-linaro , this is a comment related to adding support for recoverable memory-protection faults for ARMv6-M/ARMv8-M Baseline: is it as easy as just calling the MemoryFaultIsRecoverable(.) function in line 501 (i.e. for ARMv6 only) and do similar things as @andrewboie does in Bus and MPU fault?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will also need to move the print-out of "*** HARD FAULT ***" below the call of _MemoryFaultIsRecoverable, to avoid any fault dumping in this case...

Copy link
Contributor Author

@andrewboie andrewboie Jul 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at this point I've let go of my desire to not have any printouts for recoverable faults, it just makes the code excessively complicated and it's too easy to mess this up.

These errors should be rare anyway and will always be due to programming mistakes.

@andrewboie andrewboie force-pushed the syscall-extensions branch from c1b5060 to 592cd74 Compare July 26, 2018 17:59
@andrewboie andrewboie changed the title [WIP] use memory management hardware to handle faults, measure user strings Use memory management hardware to handle faults, measure user strings Jul 26, 2018
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good

Copy link
Member

@ioannisg ioannisg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to approve this, however, I think we should not abandon the ambition of preventing fault dumping in case of errors that are supposed to be recoverable. We can merge this PR for 1.13, and do the refinement under a future task, IMO.

Copy link
Member

@ioannisg ioannisg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few comments, after checking the sanity check errors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewboie , in the amended version of this PR, this function is also used in ARMv6-M, so it needs to be implemented outside the #ifdef ARMV7-M guard.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also be outside the #ifdef ARMv7-M guard?

Andrew Boie added 7 commits July 30, 2018 11:38
Defines some common macros and data structures for declaring
text section ranges with fixup handler addresses if an
exception occurs.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
This is used to measure the length of potentially unsafe
strings.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Uses fixup infrastructure to safely abort if we get a page
fault while measuring a string passed in from user mode.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Uses fixup infrastructure to safely abort if we get an MPU
fault when examining a string passed in from user mode.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Safely measure the length of a potentially dubious string.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
We now have functions for handling all the details of copying
data to/from user mode, including C strings and copying data
into resource pool allocations.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Test that we can define our own system calls in application code
and that fault handling works properly.

Additional tests for base system call infrastructure, outside of
specific system calls, go here.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Copy link
Contributor

@AdithyaBaglody AdithyaBaglody left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, except for the error codes


static int user_copy(void *dst, void *src, size_t size, bool to_user)
{
int ret = EFAULT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be -EFAULT?

@andrewboie andrewboie merged commit cef0748 into zephyrproject-rtos:master Jul 31, 2018
@andrewboie andrewboie deleted the syscall-extensions branch April 2, 2019 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants