Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add internal tests for WASI threads #1963

Merged
merged 7 commits into from
Mar 9, 2023

Conversation

eloparco
Copy link
Contributor

@eloparco eloparco commented Feb 15, 2023

Add internal tests for WASI threads. These tests are run in addition to the ones in the proposal https://github.com/WebAssembly/wasi-threads/tree/main/test/testsuite.

The purpose is to test additional and more complex scenarios.

@eloparco eloparco force-pushed the eloparco/test-wasi-threads branch 2 times, most recently from afa80d0 to 99b8f88 Compare February 18, 2023 21:38
@eloparco eloparco changed the base branch from dev/wasi_threads to main February 18, 2023 21:39
@eloparco eloparco force-pushed the eloparco/test-wasi-threads branch from 99b8f88 to bfeeb62 Compare February 18, 2023 22:32
@eloparco eloparco mentioned this pull request Feb 19, 2023
@eloparco eloparco force-pushed the eloparco/test-wasi-threads branch 7 times, most recently from baa0c58 to 878efac Compare February 23, 2023 02:15
@eloparco eloparco changed the title Add tests for WASI threads Add internal tests for WASI threads Feb 23, 2023
@eloparco eloparco force-pushed the eloparco/test-wasi-threads branch 4 times, most recently from 3f38ec1 to 69242fc Compare February 24, 2023 01:41
@eloparco eloparco marked this pull request as ready for review February 24, 2023 02:00
@eloparco eloparco force-pushed the eloparco/test-wasi-threads branch from 6127a8f to 986db41 Compare February 26, 2023 17:34
@eloparco eloparco force-pushed the eloparco/test-wasi-threads branch 2 times, most recently from c4b1654 to 120c146 Compare February 27, 2023 13:52
@eloparco eloparco force-pushed the eloparco/test-wasi-threads branch from 83a8f2c to ccce28b Compare March 3, 2023 01:50
@wenyongh
Copy link
Contributor

wenyongh commented Mar 4, 2023

CI reported error:
https://github.com/bytecodealliance/wasm-micro-runtime/actions/runs/4324003969/jobs/7548386927#step:13:232

Test nonmain_proc_exit_sleep failed
  [exit_code] 33 == 250
STDOUT:

STDERR:
double free or corruption (fasttop)

Any idea about that?

@eloparco eloparco force-pushed the eloparco/test-wasi-threads branch from 6e50016 to 584da41 Compare March 5, 2023 23:37
@eloparco
Copy link
Contributor Author

eloparco commented Mar 6, 2023

Any idea about that?

I rebased the PR on top of main. The problem seems to be caused by the exec_env being freed by the thread after finishing but still used by the main thread for the join operation. Apart from that, there are some edge cases that still cause problems for the tests in this PR.
Instead, after #1995, WASi tests from the proposal run fine without race conditions.

For the tests in this PR, I mainly see two kind of issues:

The investigation is still on.

@wenyongh
Copy link
Contributor

wenyongh commented Mar 6, 2023

Any idea about that?

I rebased the PR on top of main. The problem seems to be caused by the exec_env being freed by the thread after finishing but still used by the main thread for the join operation. Apart from that, there are some edge cases that still cause problems for the tests in this PR. Instead, after #1995, WASi tests from the proposal run fine without race conditions.

Yes, seems that the "double free or corruption (fasttop)" error had been reported by CI when running nonmain_proc_exit_sleep and nonmain_trap_wait, and I found it ever occurred when running thread_termination.wasm in samples/wasi-threads. But I manually tested them again and cannot reproduce the issues, instead, I found that nonmain_proc_exit_wait sometimes got hang.

For the tests in this PR, I mainly see two kind of issues:

The investigation is still on.

OK, seems the first issue is caused by runtime and the second is caused by wasi-libc itself.

@wenyongh
Copy link
Contributor

wenyongh commented Mar 7, 2023

Maybe you can try whether it fixes some issues found?

It didn't fix the data race that I was looking at, I still see it between these two calls

if (!clusters_have_exec_env(exec_env) || exec_env->thread_is_detached) {

In addition, from time to time, I still see (even before your PR) problems related to unlocking an unlocked mutex, usage of invalid mutexes and heap-use-after-free.

If you want to try to reproduce, add the thread sanitizer here

set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -fsanitize=thread")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fsanitize=thread")

and then run the tests like

/wasm-micro-runtime/samples/wasi-threads/build $ ./iwasm -v=5 ../../../core/iwasm/libraries/lib-wasi-threads/test/nonmain_trap_wait.wasm; echo $?

Usually I have I have a script to run all the tests in sequence and I may have to run it multiple times to get useful warnings from the sanitizer.

I won't have much time in the next couple of weeks, so feel free to take a look if you want.

OK, thanks, I will try looking into it.

@wenyongh
Copy link
Contributor

wenyongh commented Mar 7, 2023

@eloparco I found some multi-threading related issues and uploaded PR #2013 to fix them, I think it should be able to fix some of the issues you reported, could you help review it and have a try?
BTW: The PR will lead to compile error in Windows when thread manager is enabled, I will fix it later.

@eloparco
Copy link
Contributor Author

eloparco commented Mar 7, 2023

@wenyongh Great! I just tried the changes, and they seem to fix the remaining race conditions I was seeing.
The only one that I spot a couple of times is with

return wasi_ctx->exit_code;


But that should be fairly easy to fix.

I tried many runs in classic interpreter mode, I still need to check with AOT.

@wenyongh
Copy link
Contributor

wenyongh commented Mar 8, 2023

@wenyongh Great! I just tried the changes, and they seem to fix the remaining race conditions I was seeing. The only one that I spot a couple of times is with

return wasi_ctx->exit_code;

But that should be fairly easy to fix.
I tried many runs in classic interpreter mode, I still need to check with AOT.

OK, I have merged that PR, maybe you can rebase this PR with main branch, and if CIs run OK, I will merge it.

- name: build wasi-libc (needed for wasi-threads)
if: matrix.test_option == '$WASI_TEST_OPTIONS'
run: |
mkdir wasi-libc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we upgrade to wasi-sdk 20, so we don't need to build libc from git?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'm addressing that in a separate PR

Copy link
Contributor Author

@eloparco eloparco Mar 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think I should update it everywhere in the CI or only for the parts where WASI threads are used?
eloparco@4b8c49e

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had better update here only since we claimed in the document that we support the wasi-sdk 19.0+, we had better use it by default in the CI files.
https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/doc/build_wasm_app.md

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it here eloparco@006534c. It works on MacOS but not on Ubuntu 20.04, as we observed in the past (because of the glibc version installed there).

https://github.com/eloparco/wasm-micro-runtime/actions/runs/4379797818/jobs/7666263974
https://github.com/eloparco/wasm-micro-runtime/actions/runs/4379796629/jobs/7666151109

We can use wasi-sdk-20 pre-release in case of Ubuntu 22.04 and keep using wasi-libc for wasi-sdk-19 on Ubuntu 20.04, making sure that in case of wasi-sdk-19 we specify WASI_SYSROOT. What do you think?

Maybe I should create a PR anyway and we can discuss there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good, at least we don't need to build wasi-libc in Ubuntu 22.04. And we can update CI again when the wasi-sdk-20 is actually released. Not sure whether the glibc version issue on Ubuntu 20.04 will be resolved in wasi-sdk-20.0? Or should we file an issue in wasi-sdk community, or upload a PR to help resolve the issue?

@g0djan
Copy link
Contributor

g0djan commented Mar 8, 2023

#2016

Fixed a data race when deleting wait_info in wasm_shared_memory, spotted with main_proc_exit_wait.c from this PR

@eloparco eloparco force-pushed the eloparco/test-wasi-threads branch from 95e9f1e to 7aafc8d Compare March 8, 2023 18:14
@eloparco
Copy link
Contributor Author

eloparco commented Mar 8, 2023

OK, I have merged that PR, maybe you can rebase this PR with main branch, and if CIs run OK, I will merge it.

Yes, let's do that so it's easier to use the tests without applying patches or cherry-picking. They seem to succeed now (I tried many runs) and the data races seem gone (apart from the ones related to load/store operations).
I tried only in classic interpreter mode for now, I'll have to check in AOT mode.

@wenyongh wenyongh merged commit 128c0ea into bytecodealliance:main Mar 9, 2023
@wenyongh
Copy link
Contributor

wenyongh commented Mar 9, 2023

OK, I have merged that PR, maybe you can rebase this PR with main branch, and if CIs run OK, I will merge it.

Yes, let's do that so it's easier to use the tests without applying patches or cherry-picking. They seem to succeed now (I tried many runs) and the data races seem gone (apart from the ones related to load/store operations). I tried only in classic interpreter mode for now, I'll have to check in AOT mode.

OK, I merged it.
BTW, @loganek added a comment to suggest to use the wasi-sdk-20.0 instead in CI, please submit another PR to do it if you want.

@wenyongh
Copy link
Contributor

wenyongh commented Mar 9, 2023

CI run failure:

./test_wamr.sh -m x86_32 -P $WASI_TEST_OPTIONS -t classic-interp
...
Test nonmain_proc_exit_sleep failed
  [exit_code] 33 == 0
STDOUT:

STDERR:

wenyongh added a commit to wenyongh/wasm-micro-runtime that referenced this pull request Mar 9, 2023
@eloparco
Copy link
Contributor Author

CI run failure:

Locally, in classic interpreter mode, I tried multiple runs of test_wamr.sh and it was always succeeding. I tried multiple runs of nonmain_proc_exit_sleep with the thread sanitizer but I couldn't find anything (apart from the usual race conditions in load/store operations).

I didn't try with x86_32 though since I don't have the necessary packages installed, do you think the problem could be x86_32-specific?

@wenyongh
Copy link
Contributor

Locally, in classic interpreter mode, I tried multiple runs of test_wamr.sh and it was always succeeding. I tried multiple runs of nonmain_proc_exit_sleep with the thread sanitizer but I couldn't find anything (apart from the usual race conditions in load/store operations).

I didn't try with x86_32 though since I don't have the necessary packages installed, do you think the problem could be x86_32-specific?

I am not sure, I tested it many times with both iwasm x86_64 and x86_32, but also cannot reproduce the issue.

But for update_shared_data_and_alloc_heap.wasm in core/iwasm/libraries/lib-wasi-threads/test, seems it is only related to x86_32 when calling malloc function:

Failed to spawn a new thread
Exception: out of bounds memory access

I didn't reproduce it with iwasm x86_64, but it easily occurs with iwasm x86_32.

@eloparco
Copy link
Contributor Author

But for update_shared_data_and_alloc_heap.wasm in core/iwasm/libraries/lib-wasi-threads/test, seems it is only related to x86_32 when calling malloc function:

Ok, that may be caused by the fact that the malloc implementation in wasi-libc is not thread-safe iirc and no lock is being used in the test.

@wenyongh
Copy link
Contributor

OK, could we add lock to malloc for this case? And not sure whether there will be a thread-safe malloc in future version of wasi-libc?

@eloparco
Copy link
Contributor Author

OK, could we add lock to malloc for this case? And not sure whether there will be a thread-safe malloc in future version of wasi-libc?

Done #2022

wenyongh pushed a commit that referenced this pull request Mar 13, 2023
Fix a data race for test main_proc_exit_wait.c from #1963.
And fix atomic_wait logic that was wrong before:
- a thread 1 started executing wasm instruction wasm_atomic_wait
  but hasn't reached waiting on condition variable
- a main thread calls proc_exit and notifies all the threads that reached
  waiting on condition variable
Which leads to thread 1 hang on waiting on condition variable after that

Now it's atomically checked whether proc_exit was already called.
victoryang00 pushed a commit to victoryang00/wamr-aot-gc-checkpoint-restore that referenced this pull request May 27, 2024
Add internal tests for WASI threads. These tests are run in addition to
the ones in the proposal:
https://github.com/WebAssembly/wasi-threads/tree/main/test/testsuite.

The purpose is to test additional and more complex scenarios.
victoryang00 pushed a commit to victoryang00/wamr-aot-gc-checkpoint-restore that referenced this pull request May 27, 2024
…odealliance#2016)

Fix a data race for test main_proc_exit_wait.c from bytecodealliance#1963.
And fix atomic_wait logic that was wrong before:
- a thread 1 started executing wasm instruction wasm_atomic_wait
  but hasn't reached waiting on condition variable
- a main thread calls proc_exit and notifies all the threads that reached
  waiting on condition variable
Which leads to thread 1 hang on waiting on condition variable after that

Now it's atomically checked whether proc_exit was already called.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants