-
Notifications
You must be signed in to change notification settings - Fork 2.1k
tests: add test for context switch from ISR #16284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Looks good, can you add a README to the test? Something like your PR description would be enough IMO. |
All my boards passed the test, I'll ACK once there is a small README, please trigger the CI afterward @maribu |
ping @maribu |
Sorry for the delay. Sadly, there is indeed a bug in the AVR context switch implementation, as @nandojve pointed out, that this test doesn't trigger. I'll try to spice up the test first. @nandojve found a way to do so, but it depends on a platform being able to trigger IRQs on GPIOs by configuring them as output and controlling them. While this works fine on AVR, I don't really want to have this test relying on this. @nandojve: I think this was actually the approach used for context switching some time ago for AVR, but it resulting in a GPIO pin and a precious external interrupt being used. It was actually quite smart, as the context pushed to stack would be identical regardless whether it was triggered from within an ISR, or synchron by a "soft IRQ". Reverting to this implementation would solve the issue again. I have some ideas on how to fake a soft IRQ without relying on real hardware. This would solve the issue and use avrlibc for saving and restoring the context, getting rid of a lot of inline assembly in RIOT. It would also ease adding new AVR boards that are already supported by the avrlibc. |
I'll rebase on |
@maribu , I would like remind that if there is a possibility to use nested ISR I would say IRQ0 should be the answer. Why? when user enable IRQ any high priority can preempt current one. If context switch is on IRQ0 ( highest priority ) no one can preempt it and the bug is solved for that scenario too. |
98d2f6e
to
f6d119f
Compare
OK, I basically rewrote the test. This test is no able to confirm the issue for ATxmega board. However, I still fail to trigger the stack overflow on an Arduino Mega2560. Since I had to use a hardware timer directly to trigger the issue reliably (well, for my |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions. |
still relevant I guess |
8d0ecb3
to
e15b929
Compare
Looks like on the samr21-xpro the test actually found an issue |
Tested successfully with an This is what I get in hterm, I did not press anything.
Edit: On an
|
Maybe relevant: I rebased the PR before testing (otherwise I couldn't have tested with the Xiao). Testing was also positive on |
The new tests triggers context switching from ISR with a high load with enabled stack tests. If the context switching code does not disable IRQs until a context switch is fully completed, an IRQ might be served while not all the saved context is consumed from the stack. Normally, this is not an issue, as the context switch is just resumed after the ISR completes. However, under high load data on the stack can accumulate and eventually overflow. This tests puts the board under high IRQ load to see if such stack overflows can happen.
e15b929
to
40f905a
Compare
Yeah, with the PR being that old, it really makes sense to rebase it. I just did so now to make use of the "new" test folder structure and the
Since this is more about the implementation of the context switching, I would expect all MCUs of a given architecture to either fail or all to pass, unless there are bugs in the test or in the This is why I'm more than a bit puzzled about the test failure on the
It just seems to be slow:
|
Whoopsie:
|
Interesting: Prefixing |
I'm not sure what changed, but now the test passes on the
Perhaps it has to do with |
|
||
|
||
def testfunc(child): | ||
child.expect("Testing 500 context switches triggered from ISR") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
child.expect("Testing 500 context switches triggered from ISR") | |
child.expect(r"Testing 500+ context switches triggered from ISR") |
otherwise it passes on adafruit-feather-nrf52840-sense
, but also resets to the bootloader.
Testing Context Switches from ISR under Load | ||
============================================ | ||
|
||
This tests launches two threads, t1 and t2, which each try to lock its own mutex for 500 times. A |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tests launches two threads, t1 and t2, which each try to lock its own mutex for 500 times. A | |
This tests launches two threads, t1 and t2, which each try to lock its own mutex for 500 or 5000 times, depending on the MCU. A |
I will sadly have to change the test here again. I'm able to cause a stack overflow when slightly changing the timing for pretty much any Cortex-M board. I would like to have this test fail reliably on every Cortex-M board, as long as the issue is there. But the fact that this test can trigger the stack overflow if it hits the right timing in the context switch shows that the underlying strategy is working. I'll probably have to sweep with the timing gently over a wide range of timeouts and for each do enough repetitions to cause a stack overflow, if the stack would grow due to context switching. But that does sound like a pretty expensive test then. |
Maybe some more evidence: I've noticed that with With diff --git a/tests/core/isr_context_switch/main.c b/tests/core/isr_context_switch/main.c
index 13399c0a06..e522a85dc0 100644
--- a/tests/core/isr_context_switch/main.c
+++ b/tests/core/isr_context_switch/main.c
@@ -81,6 +81,7 @@ static void *t1_impl(void *unused)
for (unsigned repetitions = 0; repetitions < TEST_REPETITIONS; repetitions++) {
mutex_lock(&sig_t1);
}
+ puts("t1 FINISHED");
return NULL;
}
@@ -91,6 +92,7 @@ static void *t2_impl(void *unused)
mutex_lock(&sig_t2);
}
mutex_unlock(&sig_main);
+ puts("t2 FINISHED");
return NULL;
} I now get no stacksize information any longer (and no extra output) with 2025-04-10 14:14:35,418 # START
2025-04-10 14:14:35,423 # main(): This is RIOT! (Version: 2025.04-devel-392-geb1d7-HEAD)
2025-04-10 14:14:35,427 # Testing 5000 context switches triggered from ISR
2025-04-10 14:14:35,431 # INFO: timer running at 16000000 Hz.
2025-04-10 14:14:35,436 # Timeout at start: 125, incrementing timeout in steps of: 31
2025-04-10 14:14:38,130 # TEST PASSED
2025-04-10 14:14:38,135 # { "threads": [{ "name": "main", "stack_size": 1024, "stack_used": 396 }]}
2025-04-10 14:14:38,138 # t2*** RIOT kernel panic:
2025-04-10 14:14:38,139 # MEM MANAGE HANDLER
2025-04-10 14:14:38,139 #
2025-04-10 14:14:38,140 # *** halted.
2025-04-10 14:14:38,141 #
2025-04-10 14:14:38,142 # Inside isr -12 |
Contribution description
The new tests triggers context switching from ISR with a high load
with enabled stack tests. If the context switching code does not disable
IRQs until a context switch is fully completed, an IRQ might be served
while not all the saved context is consumed from the stack. Normally,
this is not an issue, as the context switch is just resumed after the
ISR completes. However, under high load data on the stack can accumulate
and eventually overflow.
This tests puts the board under high IRQ load to see if such stack
overflows can happen.
Testing procedure
Issues/PRs references
Came up in #16211