-
Notifications
You must be signed in to change notification settings - Fork 667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Copy callstack API #4033
base: main
Are you sure you want to change the base?
Copy callstack API #4033
Conversation
@loganek addressed all your comments and rebased to fix the checks. The last failing check is CI issue and there's another PR that will fix it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, if possible, consider adding tests.
Should be possible yeah, I can try
It's possible to bring this logic to WAMR, but if we are to do this I would like to have wasm_iterate_callstack API available first and then open a separate PR for dumping callstack on signal interruption
This doesn't go well with async-signal-safety, that's why it's not part of it. But it seems you'd like to have a general approach to inspect stack in a safe manner rather than thread-safety itself |
I started refactoring that, So possible change comes down to replacing this loop for wasm_interp But it doesn't make code simpler imo, it just adds more boilerplate internally. Do you see it useful/meaningful to do so @lum1n0us ? |
@lum1n0us I addressed security concerns and made API read-only by reducing it to a copy API instead of using user defined callback. Also I hid the feature behind a feature flag to avoid accidental use of a non thread safe API Could you please take a look at it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few questions regarding the design of the APIs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a few comments.
addressed all |
Addressed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last round, keep fighting 💪
@lum1n0us addressed last comments |
New WAMR public API to copy runtime call stack frames.
CAUTION: this APIs is not thread safe, that's why it's hidden behind feature flag for now. If you need to call it from another thread ensure the passed exec_env is suspended.
Our use case
Sometimes WAMR runtime gets stuck in production and we have no data where in the code compiled to WASM it happens. We currently only track such situations in a separate native thread. To increase visibility into the problem we developed internal solution that requires presence of this API in WAMR. If a separate thread finds that the WASM VM thread has stuck, it interrupts it with a user defined signal and calls this API to collect callstack. The main complexity is maintaining async-signal-safety and avoiding segfaults. For that we're maintaining atomic copies of
exec_env
,exec_env->module_inst
,exec_env->module_inst->module
. Those copies are always set to NULL before the referenced memory is freed. Before a call to this API those copies are always checked for validity. In our use case scenario we guarantee ourselves only absence of crashes but we realize that the frame data that we collect might be invalidated due to a signal interruption. However it's highly unlikely and is not a concern for us.Have we tried existing WAMR APIs for our usecase?
Yes, we've tried suggested by maintainers
wasm_cluster_suspend_thread
andwasm_runtime_terminate
.wasm_cluster_suspend_thread
doesn't suit us either. Even if it did we'd still need API to iterate over stackframes.