-
Notifications
You must be signed in to change notification settings - Fork 15
Wrap ffi calls with catch_unwind. #1083
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9b2b792 to
762337a
Compare
BenchmarksComparisonBenchmark execution time: 2025-06-26 11:20:17 Comparing candidate commit 39b17b6 in PR branch Found 0 performance improvements and 1 performance regressions! Performance is the same for 51 metrics, 2 unstable metrics. scenario:tags/replace_trace_tags
CandidateCandidate benchmark detailsGroup 1
Group 2
Group 3
Group 4
Group 5
Group 6
Group 7
Group 8
Group 9
Group 10
Group 11
Group 12
Group 13
BaselineOmitted due to size. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1083 +/- ##
==========================================
- Coverage 71.21% 71.20% -0.02%
==========================================
Files 340 340
Lines 51654 51692 +38
==========================================
+ Hits 36785 36805 +20
- Misses 14869 14887 +18
🚀 New features to boost your workflow:
|
Artifact Size Benchmark Reportaarch64-alpine-linux-musl
aarch64-unknown-linux-gnu
libdatadog-x64-windows
libdatadog-x86-windows
x86_64-alpine-linux-musl
x86_64-unknown-linux-gnu
|
a4e42c9 to
67006bc
Compare
| Self::Serde => write!(f, "Serialization/Deserialization error"), | ||
| Self::TimedOut => write!(f, "Operation timed out"), | ||
| #[cfg(feature = "catch_panic")] | ||
| Self::Panic => write!(f, "Operation panicked"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be returning more information here? Something that would be helpful when these errors wind up in telemetry logs?
| } | ||
|
|
||
| #[cfg(feature = "catch_panic")] | ||
| macro_rules! catch_panic { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-blocking: should this eventually be a proc macro? So we can just do:
#[catch_unwind]
fn foo() {
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
ekump
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. A couple of ideas that we can talk more about for follow-up but this is a good start.
|
Hey nice, that we are going forward that way. I have one question: do we plan to have this for all the ffi APIs ? |
At the moment we're unsure about the performance impact of having this integrated so we thought it might be a good idea test it in some of our current integrations and if it does not bring any overhead probably turning it in a proc macro so it can be used across different crates. |
67006bc to
ea6f479
Compare
ea6f479 to
39b17b6
Compare
|
/merge |
|
View all feedbacks in Devflow UI.
This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals.
The expected merge time in
|
## Summary of changes Bump libdatadog from 19.0.1 to 19.1.0. ## Reason for change New datadog version integrates new features: API changes: - ddog_trace_exporter_config_set_connection_timeout: this is aimed to solve span duplications in the test enviroment due to the current timeout is very short. - ddog_trace_exporter_config_set_rates_payload_version: this will avoid caching the agent response which was used to check if there were changes in the sample rates. Improvements: - Prevent panics from unwinding in the host language so we can avoid undefined behavior ([#1083](DataDog/libdatadog#1083)). ## Other details Libdd release [page](https://github.com/DataDog/libdatadog/releases/tag/v19.1.0).
…h_void_ffi_result` **What does this PR do?** This PR updates the `wrap_with_ffi_result` and `wrap_with_void_ffi_result` macros to catch any panics that happen inside them, returning them as errors. The error handling is made in such a way (see `handle_panic_error` for details) that it should be able to report back an error even if we fail to do any allocations. Important note: Because only the macros have been changed, and ffi APIs that don't use the macros are of course not affected and can still trigger panics. If we like this approach, I'll follow-up with a separate PR to update other APIs to use the new macros. **Motivation:** In <https://docs.google.com/document/d/1weMu9P03KKhPQ-gh9BMqRrEzpa1BnnY0LaSRGJbfc7A/edit?usp=sharing> (Datadog-only link, sorry!) we saw `ddog_prof_Exporter_send` crashing due to what can be summed up as `ddog_prof_Exporter_send` (report a profile) -> hyper-util tries to do dns resolution in a separate thread pool -> tokio failed to create a new thread -> panic and we tear down the app because we can't report a profile This is not good at all, and this PR solves this inspired by earlier work in #815 and #1083. **Additional Notes:** While I don't predict that will happen very often, callers that want to opt-out of the catch unwind behavior can still use the `..._no_catch` variants of the macros. **How to test the change?** This change includes test coverage. I've also separately tried to sprinkle a few `panic!` calls manually and tested that it works as expected.
…h_void_ffi_result` **What does this PR do?** This PR updates the `wrap_with_ffi_result` and `wrap_with_void_ffi_result` macros to catch any panics that happen inside them, returning them as errors. The error handling is made in such a way (see `handle_panic_error` for details) that it should be able to report back an error even if we fail to do any allocations. Important note: Because only the macros have been changed, and ffi APIs that don't use the macros are of course not affected and can still trigger panics. If we like this approach, I'll follow-up with a separate PR to update other APIs to use the new macros. **Motivation:** In <https://docs.google.com/document/d/1weMu9P03KKhPQ-gh9BMqRrEzpa1BnnY0LaSRGJbfc7A/edit?usp=sharing> (Datadog-only link, sorry!) we saw `ddog_prof_Exporter_send` crashing due to what can be summed up as `ddog_prof_Exporter_send` (report a profile) -> hyper-util tries to do dns resolution in a separate thread pool -> tokio failed to create a new thread -> panic and we tear down the app because we can't report a profile This is not good at all, and this PR solves this inspired by earlier work in #815 and #1083. **Additional Notes:** While I don't predict that will happen very often, callers that want to opt-out of the catch unwind behavior can still use the `..._no_catch` variants of the macros. **How to test the change?** This change includes test coverage. I've also separately tried to sprinkle a few `panic!` calls manually and tested that it works as expected.
…h_void_ffi_result` **What does this PR do?** This PR updates the `wrap_with_ffi_result` and `wrap_with_void_ffi_result` macros to catch any panics that happen inside them, returning them as errors. The error handling is made in such a way (see `handle_panic_error` for details) that it should be able to report back an error even if we fail to do any allocations. Important note: Because only the macros have been changed, and ffi APIs that don't use the macros are of course not affected and can still trigger panics. If we like this approach, I'll follow-up with a separate PR to update other APIs to use the new macros. **Motivation:** In <https://docs.google.com/document/d/1weMu9P03KKhPQ-gh9BMqRrEzpa1BnnY0LaSRGJbfc7A/edit?usp=sharing> (Datadog-only link, sorry!) we saw `ddog_prof_Exporter_send` crashing due to what can be summed up as `ddog_prof_Exporter_send` (report a profile) -> hyper-util tries to do dns resolution in a separate thread pool -> tokio failed to create a new thread -> panic and we tear down the app because we can't report a profile This is not good at all, and this PR solves this inspired by earlier work in #815 and #1083. **Additional Notes:** While I don't predict that will happen very often, callers that want to opt-out of the catch unwind behavior can still use the `..._no_catch` variants of the macros. **How to test the change?** This change includes test coverage. I've also separately tried to sprinkle a few `panic!` calls manually and tested that it works as expected.
…h_void_ffi_result` **What does this PR do?** This PR updates the `wrap_with_ffi_result` and `wrap_with_void_ffi_result` macros to catch any panics that happen inside them, returning them as errors. The error handling is made in such a way (see `handle_panic_error` for details) that it should be able to report back an error even if we fail to do any allocations. Important note: Because only the macros have been changed, and ffi APIs that don't use the macros are of course not affected and can still trigger panics. If we like this approach, I'll follow-up with a separate PR to update other APIs to use the new macros. **Motivation:** In <https://docs.google.com/document/d/1weMu9P03KKhPQ-gh9BMqRrEzpa1BnnY0LaSRGJbfc7A/edit?usp=sharing> (Datadog-only link, sorry!) we saw `ddog_prof_Exporter_send` crashing due to what can be summed up as `ddog_prof_Exporter_send` (report a profile) -> hyper-util tries to do dns resolution in a separate thread pool -> tokio failed to create a new thread -> panic and we tear down the app because we can't report a profile This is not good at all, and this PR solves this inspired by earlier work in #815 and #1083. **Additional Notes:** While I don't predict that will happen very often, callers that want to opt-out of the catch unwind behavior can still use the `..._no_catch` variants of the macros. The return type change in `ddog_crasht_CrashInfoBuilder_build` does change the tag enum entries, which unfortunately is a breaking change. Ideas on how to work around this? This makes the following enum entries change: * `DDOG_CRASHT_CRASH_INFO_NEW_RESULT_OK` => `DDOG_CRASHT_RESULT_HANDLE_CRASH_INFO_OK_HANDLE_CRASH_INFO` * `DDOG_CRASHT_CRASH_INFO_NEW_RESULT_ERR` => `DDOG_CRASHT_RESULT_HANDLE_CRASH_INFO_ERR_HANDLE_CRASH_INFO` **How to test the change?** This change includes test coverage. I've also separately tried to sprinkle a few `panic!` calls manually and tested that it works as expected.
…h_void_ffi_result` (#1334) [PROF-12853] Catch panics inside `wrap_with_ffi_result` and `wrap_with_void_ffi_result` **What does this PR do?** This PR updates the `wrap_with_ffi_result` and `wrap_with_void_ffi_result` macros to catch any panics that happen inside them, returning them as errors. The error handling is made in such a way (see `handle_panic_error` for details) that it should be able to report back an error even if we fail to do any allocations. Important note: Because only the macros have been changed, and ffi APIs that don't use the macros are of course not affected and can still trigger panics. If we like this approach, I'll follow-up with a separate PR to update other APIs to use the new macros. **Motivation:** In <https://docs.google.com/document/d/1weMu9P03KKhPQ-gh9BMqRrEzpa1BnnY0LaSRGJbfc7A/edit?usp=sharing> (Datadog-only link, sorry!) we saw `ddog_prof_Exporter_send` crashing due to what can be summed up as `ddog_prof_Exporter_send` (report a profile) -> hyper-util tries to do dns resolution in a separate thread pool -> tokio failed to create a new thread -> panic and we tear down the app because we can't report a profile This is not good at all, and this PR solves this inspired by earlier work in #815 and #1083. **Additional Notes:** While I don't predict that will happen very often, callers that want to opt-out of the catch unwind behavior can still use the `..._no_catch` variants of the macros. The return type change in `ddog_crasht_CrashInfoBuilder_build` does change the tag enum entries, which unfortunately is a breaking change. Ideas on how to work around this? This makes the following enum entries change: * `DDOG_CRASHT_CRASH_INFO_NEW_RESULT_OK` => `DDOG_CRASHT_RESULT_HANDLE_CRASH_INFO_OK_HANDLE_CRASH_INFO` * `DDOG_CRASHT_CRASH_INFO_NEW_RESULT_ERR` => `DDOG_CRASHT_RESULT_HANDLE_CRASH_INFO_ERR_HANDLE_CRASH_INFO` **How to test the change?** This change includes test coverage. I've also separately tried to sprinkle a few `panic!` calls manually and tested that it works as expected. Improve documentation around empty vec not allocating Merge branch 'main' into ivoanjo/crash-handling-experiments Fix off-by-one (including terminator in length) I suspect in practice, since this is a static string, it doesn't make a difference but let's fix it still. Remove leftover comment Ooops! Clarify that failed allocation is the only expected source of an empty error Linting fixes Co-authored-by: taegyunkim <taegyun.kim@datadoghq.com> Co-authored-by: ivo.anjo <ivo.anjo@datadoghq.com>
What does this PR do?
Prevent panics from unwindind in the host language so we can avoid UB.
Motivation
Currently there is no guarantee that the trace exporter methods are panic free so there is the possibility that unwinding in the host language can cause UB.
Additional Notes
The wrapper is feature gated by "catch_unwind" feature and it's enabled by default. The aim of the feature is having a mechanism to disable the feature if the performance penalty is higher than expected.