Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core/debug: printf what asked to print #20168

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

kfessel
Copy link
Contributor

@kfessel kfessel commented Dec 12, 2023

Contribution description

removes the unhelpful stack-size-test from debug-print

Testing procedure

Issues/PRs references

this is much more sane than #20166

@github-actions github-actions bot added the Area: core Area: RIOT kernel. Handle PRs marked with this with care! label Dec 12, 2023
@kfessel kfessel requested a review from maribu December 12, 2023 08:30
@benpicco benpicco added Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Dec 12, 2023
@riot-ci
Copy link

riot-ci commented Dec 12, 2023

Murdock results

✔️ PASSED

b39fbe2 core/debug: printf what asked to print

Success Failures Total Runtime
8082 0 8082 10m:54s

Artifacts

@@ -44,17 +44,7 @@ extern "C" {
*/
#ifdef DEVELHELP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this whole ifdef case can go

@OlegHahm
Copy link
Member

Can I haz an explanation why this check is no longer needed?

@maribu
Copy link
Member

maribu commented Dec 12, 2023

With a switch to picolibc we wouldn't get stack overflows on printf() that easily.

With newlib, this can still happen. We do have the MPU based stack overflow and the heuristic in place in addition, so that stack overflows should at least get caught. And an "the debug info you wanted to see cannot be shown until you increase stack size" and an "stack overflown" both have the same result: Bump the stack size and flash again. So there isn't much lost even when using newlib.

@OlegHahm
Copy link
Member

With a switch to picolibc we wouldn't get stack overflows on printf() that easily.

Not that easily but still possible, right? And as far as I understood newlib is still the default or am I wrong?

With newlib, this can still happen. We do have the MPU based stack overflow and the heuristic in place in addition, so that stack overflows should at least get caught.

On how many platforms do we have MPU support? And what is the "heuristic" one?

@maribu
Copy link
Member

maribu commented Dec 12, 2023

Most Cortex M boards have an MPU, but for some we don't use it to due bugs, if I recall correctly.

Some RISC-V MCUs also have an MPU (they call it differently, though). I'm not sure if we use that to guard the stack, though. @teufelchen should know.

The heuristic is the THREAD_CREATE_STACKTEST that fills the stack with canary values. (The address of a memory location is used as canary for that location.) The heuristic can fail if despite the stack overflow the canary is still there (hitting the value by chance is 1 to 2^32-1 on 32 bit systems). The heuristic in practice works pretty well, but it is enavled by default only with develhelp, if I recall correctly.

@OlegHahm
Copy link
Member

As far as I recall these MPUs (or at least some of them) only allow to protect a limited number of memory segments, i.e., cannot be used for arbitrary stack protection.

Regarding STACKTEST: that doesn't provide any stack protection but only shows stack usage. This feature already was in place when this macro was introduced but didn't help that much.

I totally agree that this macro hack is ugly as fuck but was introduced for a reason and I'm not convinced that reason has vanished.

@maribu
Copy link
Member

maribu commented Dec 13, 2023

@OlegHahm:

https://github.com/RIOT-OS/RIOT/blob/master/core/sched.c#L126-L139

@kaspar030
Copy link
Contributor

The heuristic can fail if despite the stack overflow the canary is still there (hitting the value by chance is 1 to 2^32-1 on 32 bit systems). The heuristic in practice works pretty well, but it is enavled by default only with develhelp, if I recall correctly.

Actually the chance of overflowing the stack without hitting the test is much higher, as sth might just allocate a huge buffer on stack, just modifying the SP, but not writing to it. the test only covers changing the exact canary value, not anything below.

@kaspar030
Copy link
Contributor

As far as I recall these MPUs (or at least some of them) only allow to protect a limited number of memory segments, i.e., cannot be used for arbitrary stack protection.

What we do here is to also switch segments when task switching.
I'm not sure though whether we have a segment for the ISR stack.

@maribu
Copy link
Member

maribu commented Dec 13, 2023

Actually the chance of overflowing the stack without hitting the test is much higher, as sth might just allocate a huge buffer on stack, just modifying the SP, but not writing to it.

That is pretty like one of the vectors an attacker would try; it would also work when "jumping over" the MPU guard space.

The Linux kernel uses a similar mechanism to detect stack overflows, I think it has at least two unmapped pages after the stack ("after" here in terms of the direction the stack grows) in the virtual address, causing segfaults on most stack overflows. But jumping of the guard space is possible there as well. I think GCC can be told to touch at least one byte in every page it allocates on the stack to prevent this attack vector, but it requires binaries to be compiled with that magic. We might want to enable that as well (but with MPU guard area size instead of page size)?

@kaspar030
Copy link
Contributor

I also experimented with enlarging the canary value to a proper redzone (bringing non-mpu checks closer to what the mpu does), but never got to PR that. this is the branch.

@maribu
Copy link
Member

maribu commented Dec 13, 2023

I like the idea, but I don't like zero being the magic number there. I think e.g. GCC when asked to write at least one byte to every page of stack allocation during the allocation will write zero to it. The stacktest approach would also make it more difficult to correctly guess the correct magic value, as it depends on memory layout. (But then again, the memory layout is reproducible and static within the same firmware.)

Maybe we could also just check one word on context switch, and have the idle thread check the red zones of all stacks before going to sleep? Or is the context switch overhead not as bad as I assume?

@kaspar030
Copy link
Contributor

I like the idea, but I don't like zero being the magic number there.

I guess re-using the stack test approach makes sense.

Or is the context switch overhead not as bad as I assume?

TBH, I don't remember. I worked on that at the side like, three years ago. I'd assume the overhead to be prohibitive for production, though it might not be that bad. (assuming a word wise memory compare takes a cycle, in theory we're looking at only like, +10-20 cycles per context switch, which would be <5-10% overhead.)

@OlegHahm
Copy link
Member

Regarding this PR:
the reason for this check was that enabling the debug macro may easily result in a stack overflow because of newlib's high stack usage for printf. In that case I would assume that the chances are actually not that bad that the scheduler stack test in place would catch these cases. Hence, the check may be obsolete indeed.

So, how about removing this check and see if we notice that people start to run into this type of problem on a regular basis, we can still revert the change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: core Area: RIOT kernel. Handle PRs marked with this with care! CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants