Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

riscv: ISR stack is too small for ENABLE_DEBUG in core files #16395

Open
nmeum opened this issue Apr 26, 2021 · 0 comments
Open

riscv: ISR stack is too small for ENABLE_DEBUG in core files #16395

nmeum opened this issue Apr 26, 2021 · 0 comments
Assignees
Labels
Area: cpu Area: CPU/MCU ports Platform: RISC-V Platform: This PR/issue effects RISC-V-based platforms Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Comments

@nmeum
Copy link
Member

nmeum commented Apr 26, 2021

Description

I wanted to debug the RIOT thread creation code on the SiFive HiFive1, for this purpose I activated ENABLE_DEBUG in core/thread.c:

RIOT/core/thread.c

Lines 34 to 35 in 1dde0f4

#define ENABLE_DEBUG 0
#include "debug.h"

Interestingly, some functions from thread.c are not executed on any thread stack but instead use the ISR/Exception stack as defined by _sp in the riscv_common ldscript:

.stack ORIGIN(ram) + LENGTH(ram) - __stack_size :
{
PROVIDE( _eheap = . );
. = __stack_size;
PROVIDE( _sp = . );
} >ram AT>ram :ram

For example, consider that kernel_init, which calls thread_create from start.S, is executed with this stack. Unfortunately, the default stack size is quite small with 256 bytes:

__stack_size = DEFINED(__stack_size) ? __stack_size : 256;

This seems to be too small for using the debugging macros from debug.h which on RISC-V would normally require an additional 256 bytes of stack space:

#ifndef THREAD_EXTRA_STACKSIZE_PRINTF
#define THREAD_EXTRA_STACKSIZE_PRINTF (256)
#endif

Unfortunately, the DEBUG_EXTRA_STACKSIZE macro, which is often used to increase stack space if ENABLE_DEBUG is activated can't really be used here as the size of the region in the ldscript would need to be increased in this case. Since the ISR stack is also not a normal RIOT thread, the following sanity check from debug.h does also not work correctly:

if ((thread_get_active() == NULL) || \
(thread_get_active()->stack_size >= \
THREAD_EXTRA_STACKSIZE_PRINTF)) { \
printf(__VA_ARGS__); \
} \
else { \
puts("Cannot debug, stack too small. Consider using DEBUG_PUTS()."); \
} \

As such, I encountered several stack overflows when using ENABLE_DEBUG in core/thread.c on the HiFive1. This may lead to weird behavior during debugging and it would be nice if this could somehow be detected to avoid this pitfall during debugging. Also note that the heuristic employed by the scheduler (SCHED_TEST_STACK) to detect stack overflows does not work for the same reason. Maybe the sanity check in debug.h can be improved by adding an edge case for code running on the ISR stack? Furthermore, it would also be neat to improve the SCHED_TEST_STACK to also cover the ISR stack somehow.

Steps to reproduce the issue

This issue should be reproducible using the following GDB script:

set height 0
break thread_create
continue
break _vfprintf_r
continue
while ($sp > 0x80003f00)
	p/x $sp
	stepi
end
printf "stack overflow at pc: %x\n", $pc

This script can, for instance, be used with the hello-world application. Assuming ENABLE_DEBUG has been enabled in core/thread.c, the application can be compiled as follows:

$ BOARD=hifive1 make -C examples/hello-world/

Afterwards, extract the lower bound of the ISR stack (the ISR stack grows downwards towards _eheap) and add it to the while condition of the gdb script. The lower bound of the ISR stack should match the address of the _eheap symbol which can be extracted as follows:

$ riscv32-unknown-elf-nm examples/hello-world/bin/hifive1/hello-world.elf  | grep _eheap
80003f00 B _eheap

Furthermore, the script requires debug symbols in newlib to find the internal _vfprintf_r symbol. Executing the GDB script results in the following output on my system:

Reading symbols from examples/hello-world/bin/hifive1/hello-world.elf...
_start () at /home/nmeum/src/RIOT/cpu/riscv_common/start.S:17
17	    la gp, __global_pointer$
Breakpoint 1 at 0x204002c6: file /home/nmeum/src/RIOT/core/thread.c, line 196.

Breakpoint 1, thread_create (stack=stack@entry=0x80000090 <idle_stack> "", stacksize=stacksize@entry=256, 
    priority=priority@entry=15 '\017', flags=flags@entry=12, function=function@entry=0x204000de <idle_thread>, 
    arg=arg@entry=0x0, name=name@entry=0x20401c98 "idle") at /home/nmeum/src/RIOT/core/thread.c:196
196	    if (priority >= SCHED_PRIO_LEVELS) {
Breakpoint 2 at 0x204014fc: file /opt/riscv-gnu-toolchain/riscv-newlib/newlib/libc/stdio/nano-vfprintf.c, line 487.

Breakpoint 2, _vfprintf_r (data=0x80000000 <impure_data>, fp=0x800008b4, 
    fmt0=fmt0@entry=0x20401df5 "Created thread %s. PID: %hi. Priority: %u.\n", ap=ap@entry=0x80003fa4)
    at /opt/riscv-gnu-toolchain/riscv-newlib/newlib/libc/stdio/nano-vfprintf.c:487
487	/opt/riscv-gnu-toolchain/riscv-newlib/newlib/libc/stdio/nano-vfprintf.c: No such file or directory.
$1 = 0x80003f70
0x204014fe	487	in /opt/riscv-gnu-toolchain/riscv-newlib/newlib/libc/stdio/nano-vfprintf.c
stack overflow at pc: 204014fe

Though this is a bit hacky, so the issue might not be as easily reproducible using the script. However, I think the outlined problem is hopefully also fairly obvious from the description above.

@fjmolinas fjmolinas added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Area: cpu Area: CPU/MCU ports labels Apr 27, 2021
@aabadie aabadie added the Platform: RISC-V Platform: This PR/issue effects RISC-V-based platforms label May 20, 2021
@MrKevinWeiss MrKevinWeiss added this to the Release 2021.07 milestone Jun 22, 2021
@MrKevinWeiss MrKevinWeiss removed this from the Release 2021.07 milestone Jul 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: cpu Area: CPU/MCU ports Platform: RISC-V Platform: This PR/issue effects RISC-V-based platforms Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

No branches or pull requests

6 participants