-
Notifications
You must be signed in to change notification settings - Fork 8.3k
latest Arc fixes #8246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
latest Arc fixes #8246
Conversation
* use a separate stack for exception handling, this will gurantee the exception handling always work, not affected by some speical cases, e.g., stack check exception or mem. protection exception at the exception entry. * this commit can fix the second case of zephyrproject-rtos#8092 * note: the thread switch is still possible in exception handling, e.g, caused by thread_abort. But the switched out thread cannot be recovered, as the thread context is not setup. Signed-off-by: Wayne Ren <wei.ren@synopsys.com>
Even CONFIG_HW_STACK_PROTECTION is no here, it is still yes as it will be re-selected by CONFIG_TEST_HW_STACK_PROTECTION in tests/Kconfig. So the correct setting here is: CONFIG_TEST_HW_STACK_PROTECTION=n This fixes zephyrproject-rtos#8092 (case1) Signed-off-by: Wayne Ren <wei.ren@synopsys.com>
Whether it should hang the system it not decided finally. But remove it here to let some tests pass. Fixes zephyrproject-rtos#8032 Signed-off-by: Wayne Ren <wei.ren@synopsys.com>
base.user_options is already set in k_thread_ user_mode_enter Signed-off-by: Wayne Ren <wei.ren@synopsys.com>
This commit improves the reset of arc: * make the processor in the correct status * clear interrupt related regs this may improve or fix zephyrproject-rtos#6515 Signed-off-by: Wayne Ren <wei.ren@synopsys.com>
Codecov Report
@@ Coverage Diff @@
## master #8246 +/- ##
=======================================
Coverage 64.53% 64.53%
=======================================
Files 420 420
Lines 40109 40109
Branches 6762 6762
=======================================
Hits 25884 25884
Misses 11108 11108
Partials 3117 3117Continue to review full report at Codecov.
|
nashif
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confirm both high priority bugs are fixed with this.
andrewboie
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally looks good but I don't understand why we don't handle exceptions on the IRQ stack, all other arches do that
| SECTION_VAR(BSS, saved_value) | ||
| .word 0 | ||
|
|
||
| #define EXCEPTION_STACK_SIZE 256 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of doing this, why not use the interrupt stack for exceptions like all our other arches?
The interrupt stack was used in the original code, but only after initial exception context was saved to the current stack. The problem being fixed here is the following (#8092)
I don't think the interrupt stack can be trusted in case of stack overflow exceptions. Hence this solution with a fail safe dedicated exception stack. |
|
I still don't think we should do this. That's a lot of space to reserve for what should not happen in practice.
This is the real problem. There's no need that I can see to have to store the pointer to the IRQ stack in _kernel. It gets initialized once to K_THREAD_STACK_BUFFER(_interrupt_stack) + CONFIG_ISR_STACK_SIZE and is never touched again. It's essentially a read-only value. It doesn't need to be in RAM. Instead I'd like to see if we can remove the irq_stack members from _kernel (for both UP and SMP) and use ROM for it:
Can you drop the exception stack patch from this PR? I have time to work on what I have described above. |
andrewboie
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened #8271 to track putting the irq stack pointers in ROM
|
Yes, while analysing the stack issues I wondered why the interrupt stack was obtained from the kernel struct since it is constant. But, that is not the issue. We support nested (different priority) interrupts, and the interrupts can trigger an exception as well. Setting sp to the initial interrupt stack value as the first action in the exception handler interferes with that. It would probably be ok if exceptions don't have to be recoverable, that would greatly simplify things. |
sp only gets set to _kernel interrupt stack if we are not already on the interrupt stack (nested count = 0) Nested count is also something that can get corrupted now that I think of it though....might need to replace nested checks with instead a check in the asm code to determine if SP is already within the bounds of the irq stack and if so don't change it.
I think any exception, if triggered in supervisor mode, that ends up in _NanoFatalErrorHandler can be treated as non-recoverable and we might consider just unconditionally setting SP for this case. |
|
Exceptions (on ARC) can't be nested, an exception during exception will trigger a non-recoverable double fault. So unconditional switching to the dedicate exception stack is ok. The nesting is required for the interrupt stack. |
|
We need a resolution here, this is blocking the 1.12 release. |
Not sure that would help. The kernel irq stack pointer is corrupted and could point anywhere. Perhaps the best thing to do is to either:
Having 256 bytes reserved for a fatal exception stack does not sit well with me, if we go that route it will never get fixed and users will lose that RAM forever. We do have a double-fault stack for x86 but it's only 8 bytes and takes advantage of ancient i386 hardware task switching. |
I don't want to block the release, but if we take the current approach of an exception stack I feel it's mandatory to work on this some more in 1.13 so we don't lose that memory. |
agreed to accept this for 1.12 and rework in 1.13
This PR tries to fix
#8092
commit c058be6
commit ca651b3
#8032
commit 5d58f0f
#6515
commit f8509cc