Skip to content

Conversation

@vonhust
Copy link

@vonhust vonhust commented Jun 7, 2018

This PR tries to fix
#8092
commit c058be6
commit ca651b3
#8032
commit 5d58f0f
#6515
commit f8509cc

Wayne Ren added 5 commits June 7, 2018 16:09
* use a separate stack for exception handling, this
will gurantee the exception handling always work, not
affected by some speical cases, e.g., stack check exception
or mem. protection exception at the exception entry.

* this commit can fix the second case of zephyrproject-rtos#8092

* note: the thread switch is still possible in exception
handling, e.g, caused by thread_abort. But the switched out
thread cannot be recovered, as the thread context is not
setup.

Signed-off-by: Wayne Ren <wei.ren@synopsys.com>
Even CONFIG_HW_STACK_PROTECTION is no here,
it is still yes as it will be re-selected by
CONFIG_TEST_HW_STACK_PROTECTION in tests/Kconfig.

So the correct setting here is:

CONFIG_TEST_HW_STACK_PROTECTION=n

This fixes zephyrproject-rtos#8092 (case1)

Signed-off-by: Wayne Ren <wei.ren@synopsys.com>
Whether it should hang the system it not decided finally.
But remove it here to let some tests pass.

Fixes zephyrproject-rtos#8032

Signed-off-by: Wayne Ren <wei.ren@synopsys.com>
base.user_options is already set in k_thread_
user_mode_enter

Signed-off-by: Wayne Ren <wei.ren@synopsys.com>
This commit improves the reset of arc:
* make the processor in the correct status
* clear interrupt related regs

this may improve or fix zephyrproject-rtos#6515

Signed-off-by: Wayne Ren <wei.ren@synopsys.com>
@codecov-io
Copy link

Codecov Report

Merging #8246 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #8246   +/-   ##
=======================================
  Coverage   64.53%   64.53%           
=======================================
  Files         420      420           
  Lines       40109    40109           
  Branches     6762     6762           
=======================================
  Hits        25884    25884           
  Misses      11108    11108           
  Partials     3117     3117

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fd561dd...f8509cc. Read the comment docs.

Copy link
Member

@nashif nashif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirm both high priority bugs are fixed with this.

@MaureenHelm MaureenHelm added this to the v1.12.0 milestone Jun 7, 2018
Copy link
Contributor

@andrewboie andrewboie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally looks good but I don't understand why we don't handle exceptions on the IRQ stack, all other arches do that

SECTION_VAR(BSS, saved_value)
.word 0

#define EXCEPTION_STACK_SIZE 256
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of doing this, why not use the interrupt stack for exceptions like all our other arches?

@ruuddw
Copy link
Member

ruuddw commented Jun 7, 2018

@andrewboie

instead of doing this, why not use the interrupt stack for exceptions like all our other arches?

The interrupt stack was used in the original code, but only after initial exception context was saved to the current stack. The problem being fixed here is the following (#8092)

  1. the test code creates a stack overflow; sp gets out of bounds because of a large array on the stack, and then a printf call is done
  2. exception handler is triggered (hardware stack checking), and creates an exception stack frame … using the incorrect sp
  3. _kernel struct gets corrupted due to this, and later in the exception handler the switch to _interrupt_stack (fetched from the _kernel struct) fails since it was overwritten.

I don't think the interrupt stack can be trusted in case of stack overflow exceptions. Hence this solution with a fail safe dedicated exception stack.

@andrewboie
Copy link
Contributor

I still don't think we should do this. That's a lot of space to reserve for what should not happen in practice.

_kernel struct gets corrupted due to this, and later in the exception handler the switch to _interrupt_stack (fetched from the _kernel struct) fails since it was overwritten.

This is the real problem. There's no need that I can see to have to store the pointer to the IRQ stack in _kernel. It gets initialized once to K_THREAD_STACK_BUFFER(_interrupt_stack) + CONFIG_ISR_STACK_SIZE and is never touched again. It's essentially a read-only value. It doesn't need to be in RAM.

Instead I'd like to see if we can remove the irq_stack members from _kernel (for both UP and SMP) and use ROM for it:

  • initialize a const read-only array at buiild time which contains this information and stored in ROM/text region and have the exception handler read that
  • or if that turns out to be impossible, just have the fatal exception handler use the address of the _interrupt_stack symbol, add CONFIG_ISR_STACK_SIZE to it, and set the stack pointer to that.

Can you drop the exception stack patch from this PR? I have time to work on what I have described above.

Copy link
Contributor

@andrewboie andrewboie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened #8271 to track putting the irq stack pointers in ROM

@ruuddw
Copy link
Member

ruuddw commented Jun 7, 2018

Yes, while analysing the stack issues I wondered why the interrupt stack was obtained from the kernel struct since it is constant. But, that is not the issue. We support nested (different priority) interrupts, and the interrupts can trigger an exception as well. Setting sp to the initial interrupt stack value as the first action in the exception handler interferes with that. It would probably be ok if exceptions don't have to be recoverable, that would greatly simplify things.

@andrewboie
Copy link
Contributor

andrewboie commented Jun 7, 2018

We support nested (different priority) interrupts, and the interrupts can trigger an exception as well. Setting sp to the initial interrupt stack value as the first action in the exception handler interferes with that.

sp only gets set to _kernel interrupt stack if we are not already on the interrupt stack (nested count = 0)
If we are already in an IRQ or exception, sp should not be modified. This is true for all arches AFAIK, if ARC is unconditionally setting sp on exception entry then that is a bug.

Nested count is also something that can get corrupted now that I think of it though....might need to replace nested checks with instead a check in the asm code to determine if SP is already within the bounds of the irq stack and if so don't change it.

It would probably be ok if exceptions don't have to be recoverable, that would greatly simplify things.

I think any exception, if triggered in supervisor mode, that ends up in _NanoFatalErrorHandler can be treated as non-recoverable and we might consider just unconditionally setting SP for this case.

@ruuddw
Copy link
Member

ruuddw commented Jun 7, 2018

Exceptions (on ARC) can't be nested, an exception during exception will trigger a non-recoverable double fault. So unconditional switching to the dedicate exception stack is ok. The nesting is required for the interrupt stack.
The alternative to the exception stack could indeed be a check if the sp is in range of the _kernel interrupt stack and switch to it if not. That would result in a safe sp for the exception handling. Need to give that a bit more thought...

@nashif
Copy link
Member

nashif commented Jun 8, 2018

We need a resolution here, this is blocking the 1.12 release.

@andrewboie
Copy link
Contributor

andrewboie commented Jun 8, 2018

Exceptions (on ARC) can't be nested, an exception during exception will trigger a non-recoverable double fault. So unconditional switching to the dedicate exception stack is ok. The nesting is required for the interrupt stack.

The alternative to the exception stack could indeed be a check if the sp is in range of the _kernel interrupt stack and switch to it if not.

Not sure that would help. The kernel irq stack pointer is corrupted and could point anywhere.

Perhaps the best thing to do is to either:

  1. Adjust linker scripts such that we move _kernel, currently in .bss, so that is not located immediately before the .noinit section (which is where stacks live)
  2. Or have the exception code simply take the constant address of _interrupt_stack (since it is a symbol known at build time), add CONFIG_IRQ_STACK_SIZE to it in the ASM code, and set SP from there, instead of looking it up in _kernel.

Having 256 bytes reserved for a fatal exception stack does not sit well with me, if we go that route it will never get fixed and users will lose that RAM forever. We do have a double-fault stack for x86 but it's only 8 bytes and takes advantage of ancient i386 hardware task switching.

@andrewboie
Copy link
Contributor

We need a resolution here, this is blocking the 1.12 release.

I don't want to block the release, but if we take the current approach of an exception stack I feel it's mandatory to work on this some more in 1.13 so we don't lose that memory.

@MaureenHelm MaureenHelm dismissed andrewboie’s stale review June 8, 2018 21:36

agreed to accept this for 1.12 and rework in 1.13

@MaureenHelm
Copy link
Member

Fixes #8092
Fixes #8032
Fixes #6515

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants