Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CONFIG_THREAD_LOCAL_STORAGE (selected by PICOLIBC) sends sparse into an infinite loop #63417

Closed
marc-hb opened this issue Oct 2, 2023 · 12 comments
Assignees
Labels
area: Build System bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug

Comments

@marc-hb
Copy link
Collaborator

marc-hb commented Oct 2, 2023

Describe the bug

This is admittedly a bug in the sparse tool itself but it's triggered by very specific Zephyr conditions.

EDIT: the infinite loop has been confirmed and fixed in sparse. Other Zephyr issues mentioned below remain, tracked in other bugs if needed.

To Reproduce
Steps to reproduce the behavior:

west build  samples/hello_world -p -b rpi_4b -- -DZEPHYR_SCA_VARIANT=sparse
 => hangs forever, infinite loop in sparse tls.c

Crl-C
ninja -C build -j1
 Building C object zephyr/arch/arch/arm64/core/CMakeFiles/arch__arm64__core.dir/tls.c.obj
 => hangs forever

west build  samples/hello_world -p -b hifive1 -- -DZEPHYR_SCA_VARIANT=sparse
 => same infinite loop

Same infinite loop with intel_ish_5_4_1, arduino_due and others. No infinite loop with intel_adsp_cavs25 (which does not have THREAD_LOCAL_STORAGE)

Workaround 1
EDIT: a bit more complicated now, see below.

--- a/lib/libc/Kconfig
+++ b/lib/libc/Kconfig
@@ -72,7 +72,7 @@ config MINIMAL_LIBC
 config PICOLIBC
        bool "Picolibc library"
        select COMMON_LIBC_ABORT
-       select THREAD_LOCAL_STORAGE if ARCH_HAS_THREAD_LOCAL_STORAGE && TOOLCHAIN_SUPPORTS_THREAD_LOCAL_STORAGE
+#      select THREAD_LOCAL_STORAGE if ARCH_HAS_THREAD_LOCAL_STORAGE && TOOLCHAIN_SUPPORTS_THREAD_LOCAL_STORAGE
        select LIBC_ERRNO if THREAD_LOCAL_STORAGE
        imply COMMON_LIBC_MALLOC
        depends on !NATIVE_APPLICATION

Disabling tls.c still avoids the issue but it requires using the separate picolibc module as added to zephyr/west.yml by d0c75f3. Then the following command avoids the infinite loop:

west build  samples/hello_world  -b rpi_4b -- \
  -DCONFIG_PICOLIBC_USE_MODULE=y  -DZEPHYR_SCA_VARIANT=sparse -DCONFIG_THREAD_LOCAL_STORAGE=n

Workaround 2

west build  samples/hello_world -p -b rpi_4b -- -DZEPHYR_SCA_VARIANT=sparse -DCONFIG_MINIMAL_LIBC=y

CONFIG_MINIMAL_LIBC=y seems to also turn off THREAD_LOCAL_STORAGE

Expected behavior

No infinite loop.

Impact

Impossible to use sparse static analyzer on these platforms.

Harder to debug other sparse issues like #63003

Logs and console output
See above

Environment (please complete the following information):

Linux
Zephyr SDK 0.16.1
Zephyr commit ea4a463
marc-hb/sparse@3848a76ba49f

Additional context

@marc-hb marc-hb added bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug area: Build System labels Oct 2, 2023
@marc-hb marc-hb changed the title CONFIG_THREAD_LOCAL_STORAGE (selected by PICOLIBC) is not compatible with sparse CONFIG_THREAD_LOCAL_STORAGE (selected by PICOLIBC) sends sparse into an infinite loop Oct 23, 2023
Copy link

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@github-actions github-actions bot added the Stale label Dec 23, 2023
@marc-hb
Copy link
Collaborator Author

marc-hb commented Dec 26, 2023

This is a bug in the sparse tool.

@marc-hb marc-hb closed this as completed Dec 26, 2023
@marc-hb
Copy link
Collaborator Author

marc-hb commented Dec 27, 2023

Reproduced again with most recent sparse commit adceff0ab6e3, HAVE_LLVM=no and
and zephyr commit 2312897

"Reported" at https://marc.info/?l=linux-sparse&m=170366399601223&w=2 (from https://sparse.docs.kernel.org/)

The infinite loop is located in delete_pseudo_user_list_entry() which tries to remove an entry never found in an extremely large (millions) and probably cyclic "pseudo_user" list:

==69099==    at 0x13F6AA: delete_pseudo_user_list_entry (simplify.c:258)
==69099==    by 0x13F7D4: rem_usage (simplify.c:276)
==69099==    by 0x13F832: remove_usage (simplify.c:284)
==69099==    by 0x1431CD: switch_pseudo (simplify.c:1704)
==69099==    by 0x14367D: simplify_associative_binop (simplify.c:1826)
==69099==    by 0x1468D7: simplify_instruction (simplify.c:2752)
==69099==    by 0x12999B: clean_up_insns (optimize.c:42)
==69099==    by 0x129B82: optimize (optimize.c:98)
==69099==    by 0x12938C: linearize_fn (linearize.c:2577)
==69099==    by 0x1293FF: linearize_symbol (linearize.c:2593)
==69099==    by 0x119F96: check_symbols (sparse.c:306)
==69099==    by 0x11A0EB: main (sparse.c:332)

valgrind does not find anything suspicious.

I git bisected the infinite loop to this sparse commit from @lucvoo

commit 3ad464ff247aa6c5039c9567960787d578c99462
Author: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Date:   Fri Jun 22 17:25:39 2018 +0200

    cast: optimize away casts to/from pointers
    
    Now that all casts to or from a pointer are between a pointer
    and a pointer-sized unsigned integer, from an optimization
    PoV, they are all no-ops.
    
    So, optimize them away at simplification time.
    
    Note: casts between pointers (OP_PTRCAST) should also be
          optimized away but the original type is used for a
          number a things (for example in check_access()) and
          can't be optimized away so simply (yet).
    
    Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>

 simplify.c                    |  5 +++--
 sparse-llvm.c                 |  4 ++++
 validation/optim/cast-kinds.c | 24 ++++++++----------------
 validation/optim/cast-nop.c   | 18 ++++++++++++++++++
 4 files changed, 33 insertions(+), 18 deletions(-)
 create mode 100644 validation/optim/cast-nop.c

Interestingly, BUILD_ASSERT(sizeof(intptr_t) == sizeof(long)); fails in kernel.h just before the infinite loop.
That would stop a regular compiler but it does not stop sparse.

The trigger is in the very short zephyr/arch/arm64/core/tls.c file. Commenting out the calls to z_tls_data_size() and z_tls_copy(stack_ptr) in that file makes the infinite loop go away.

@keith-packard is there a way to disable tls.c while still using the Zephyr SDK, picolibc and generally keeping everything else the same? My one-line Kconfig hack above has no effect anymore and short of disabling picolibc entirely, everything else I tried in kconfig kept tls.c active.

--- a/cgcc
+++ b/cgcc
@@ -106,6 +106,8 @@ if ($do_check) {
     chomp($multiarch_dir);  # possibly remove '\n' from compiler
     $check .= " -multiarch-dir " . $multiarch_dir if $multiarch_dir;
 
+    $check = "valgrind  $check";
+
     print "$check\n" if $verbose;
     if ($do_compile) {
        system ($check) == 0 or exit 1;

@keith-packard
Copy link
Collaborator

@keith-packard is there a way to disable tls.c while still using the Zephyr SDK, picolibc and generally keeping everything else the same? My one-line Kconfig hack above has no effect anymore and short of disabling picolibc entirely, everything else I tried in kconfig kept tls.c active.

Picolibc (as built in the SDK) uses thread local storage internally for a few variables, including 'errno'. If you don't use any of those variables, you can disable TLS. I'm not sure that's entirely satisfactory here though; presumably you want to use sparse with lots of different configurations.

If you build picolibc as a module, then you can disable thread local storage. The only limitation here is that you cannot use the C++ standard library as that uses the C library ABI, including TLS variables exposed by libc like errno.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Dec 28, 2023

The Picolibc module can be enabled by selecting CONFIG_PICOLIBC_USE_MODULE in the application configuration file.

https://docs.zephyrproject.org/latest/develop/languages/c/picolibc.html

Thank you! I'm typically not using zephyr/west.yml and of course it didn't have the picolibc module :-( I switched back to zephyr/west.yml for the purpose of this issue.

After a few other, time-consuming detours to #67035 and #67036, the following workaround runs fine and avoids this infinite loop! I'll edit the description at the top and add it.

west build  samples/hello_world  -b rpi_4b -- \
  -DCONFIG_PICOLIBC_USE_MODULE=y  -DZEPHYR_SCA_VARIANT=sparse -DCONFIG_THREAD_LOCAL_STORAGE=n

-DCONFIG_THREAD_LOCAL_STORAGE=y (the default) reproduces the infinite loop again.

I'm not sure that's entirely satisfactory here though; presumably you want to use sparse with lots of different configurations.

I'm trying to keep the workaround(s) for this issue as small and as least intrusive as possible for two reasons:

  • keeping the difference as small as possible between infinite loop versus not keeps narrowing this issue and pointing precisely at what exactly triggers the infinite loop.
  • from a more "selfish" perspective, keeping that same difference small should hopefully give us more accurate sparse coverage (assuming we add the picolibc module in the future)

@marc-hb
Copy link
Collaborator Author

marc-hb commented Dec 28, 2023

Interestingly, BUILD_ASSERT(sizeof(intptr_t) == sizeof(long)); fails in kernel.h just before the infinite loop.
That would stop a regular compiler but it does not stop sparse.

Actually, isn't that size inconsistency a serious problem in the Zephyr build system? A sparse-specific issue for sure but a Zephyr problem nevertheless. Re-opening!

@marc-hb marc-hb reopened this Dec 28, 2023
@github-actions github-actions bot removed the Stale label Dec 28, 2023
@marc-hb
Copy link
Collaborator Author

marc-hb commented Dec 28, 2023

Easy reproduction with the attached tls.i file:

sparse sparse-infinite-loop-tls.i

.txt
  => infinite loop

sparse-infinite-loop.i.txt

@lucvoo
Copy link

lucvoo commented Dec 28, 2023

Easy reproduction with the attached tls.i file:

Thanks.
It should be fixed on mainline Sparse now.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Dec 28, 2023

sparse fix confirmed! Thanks @lucvoo. @keith-packard now the same command fails differently:

west build  samples/hello_world -o=-k0 -b rpi_4b  --  -DZEPHYR_SCA_VARIANT=sparse
zephyr/lib/libc/picolibc/libc-hooks.c: note: in included file:
zephyr/include/zephyr/posix/sys/stat.h:40:11: error: unable to open 'sys/_timespec.h'

EDIT: this is older #63003

Interestingly, -DCONFIG_PICOLIBC_USE_MODULE=y prints a lot of warnings including hundreds of Trying to use reserved word '_Complex' as identifier but it succeeds:

west build  samples/hello_world  -b rpi_4b -- \
  -DCONFIG_PICOLIBC_USE_MODULE=y  -DZEPHYR_SCA_VARIANT=sparse -DCONFIG_THREAD_LOCAL_STORAGE=y

< wall of warnings >

echo $?
0

I will open new bugs if needed, this one has become too long.

@marc-hb marc-hb closed this as completed Dec 28, 2023
marc-hb pushed a commit to thesofproject/sparse that referenced this issue Dec 28, 2023
The infinite loop is triggered by some fairly simple code on Zephyr and is
caused by some exchange of pseudos done without checking the canonical order.

Fix this by adding the check for the canonical order.
Also use can_move_to() instead of the simpler one_use() to check the dominance
of the moved pseudos.

Link: zephyrproject-rtos/zephyr#63417
Link: https://lore.kernel.org/linux-sparse/AD16C022-C5F3-4DA2-A1A0-775E4C95A7A1@intel.com/
Reported-by: Marc Herbert <marc.herbert@intel.com>
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
lucvoo added a commit to lucvoo/sparse that referenced this issue Jan 7, 2024
The infinite loop is triggered by some fairly simple code on Zephyr and is
caused by some exchange of pseudos done without checking the canonical order.

Fix this by adding the check for the canonical order.
Also use can_move_to() instead of the simpler one_use() to check the dominance
of the moved pseudos.

Link: zephyrproject-rtos/zephyr#63417
Link: https://lore.kernel.org/linux-sparse/AD16C022-C5F3-4DA2-A1A0-775E4C95A7A1@intel.com/
Reported-by: Marc Herbert <marc.herbert@intel.com>
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
@marc-hb
Copy link
Collaborator Author

marc-hb commented Jan 23, 2024

Thank you! I'm typically not using zephyr/west.yml and of course it didn't have the picolibc module :-( I switched back to zephyr/west.yml for the purpose of this issue.

I just got bit by that again... -DCONFIG_PICOLIBC_USE_MODULE=y is SILENTLY ignored when the picolibc module/git repo is not there. @keith-packard would there be a way to catch and report this instead of silently falling back on the SDK? This is not user friendly :-)

@marc-hb
Copy link
Collaborator Author

marc-hb commented Jan 23, 2024

My bad:

warning: PICOLIBC_USE_MODULE (defined at lib/libc/picolibc/Kconfig:6) was assigned the value 'y' but
got the value 'n'. Check these unsatisfied dependencies: ZEPHYR_PICOLIBC_MODULE (=n). See
http://docs.zephyrproject.org/latest/kconfig.html#CONFIG_PICOLIBC_USE_MODULE and/or look up
PICOLIBC_USE_MODULE in the menuconfig/guiconfig interface. The Application Development Primer,
Setting Configuration Values, and Kconfig - Tips and Best Practices sections of the manual might be
helpful too.

Of course this is not visible when buried under 400 lines of sparse warnings....

So maybe what's needed is some sort of -Wconfig-error option...

@keith-packard
Copy link
Collaborator

Thank you! I'm typically not using zephyr/west.yml and of course it didn't have the picolibc module :-( I switched back to zephyr/west.yml for the purpose of this issue.

I just got bit by that again... -DCONFIG_PICOLIBC_USE_MODULE=y is SILENTLY ignored when the picolibc module/git repo is not there. @keith-packard would there be a way to catch and report this instead of silently falling back on the SDK? This is not user friendly :-)

I haven't any idea how to do this; presumably some Kconfig magic?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Build System bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug
Projects
None yet
Development

No branches or pull requests

4 participants