Skip to content

Commit 5409e70

Browse files
ivoanjotaegyunkim
andauthored
[PROF-12853] Catch panics inside wrap_with_ffi_result and wrap_with_void_ffi_result (#1334)
[PROF-12853] Catch panics inside `wrap_with_ffi_result` and `wrap_with_void_ffi_result` **What does this PR do?** This PR updates the `wrap_with_ffi_result` and `wrap_with_void_ffi_result` macros to catch any panics that happen inside them, returning them as errors. The error handling is made in such a way (see `handle_panic_error` for details) that it should be able to report back an error even if we fail to do any allocations. Important note: Because only the macros have been changed, and ffi APIs that don't use the macros are of course not affected and can still trigger panics. If we like this approach, I'll follow-up with a separate PR to update other APIs to use the new macros. **Motivation:** In <https://docs.google.com/document/d/1weMu9P03KKhPQ-gh9BMqRrEzpa1BnnY0LaSRGJbfc7A/edit?usp=sharing> (Datadog-only link, sorry!) we saw `ddog_prof_Exporter_send` crashing due to what can be summed up as `ddog_prof_Exporter_send` (report a profile) -> hyper-util tries to do dns resolution in a separate thread pool -> tokio failed to create a new thread -> panic and we tear down the app because we can't report a profile This is not good at all, and this PR solves this inspired by earlier work in #815 and #1083. **Additional Notes:** While I don't predict that will happen very often, callers that want to opt-out of the catch unwind behavior can still use the `..._no_catch` variants of the macros. The return type change in `ddog_crasht_CrashInfoBuilder_build` does change the tag enum entries, which unfortunately is a breaking change. Ideas on how to work around this? This makes the following enum entries change: * `DDOG_CRASHT_CRASH_INFO_NEW_RESULT_OK` => `DDOG_CRASHT_RESULT_HANDLE_CRASH_INFO_OK_HANDLE_CRASH_INFO` * `DDOG_CRASHT_CRASH_INFO_NEW_RESULT_ERR` => `DDOG_CRASHT_RESULT_HANDLE_CRASH_INFO_ERR_HANDLE_CRASH_INFO` **How to test the change?** This change includes test coverage. I've also separately tried to sprinkle a few `panic!` calls manually and tested that it works as expected. Improve documentation around empty vec not allocating Merge branch 'main' into ivoanjo/crash-handling-experiments Fix off-by-one (including terminator in length) I suspect in practice, since this is a static string, it doesn't make a difference but let's fix it still. Remove leftover comment Ooops! Clarify that failed allocation is the only expected source of an empty error Linting fixes Co-authored-by: taegyunkim <taegyun.kim@datadoghq.com> Co-authored-by: ivo.anjo <ivo.anjo@datadoghq.com>
1 parent 5079362 commit 5409e70

File tree

9 files changed

+197
-19
lines changed

9 files changed

+197
-19
lines changed

Cargo.lock

Lines changed: 8 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

examples/ffi/crashinfo.cpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -66,7 +66,7 @@ CHECK_RESULT(ddog_Vec_Tag_PushResult, DDOG_VEC_TAG_PUSH_RESULT_OK)
6666

6767
EXTRACT_RESULT(CrashInfoBuilder,
6868
DDOG_CRASHT_CRASH_INFO_BUILDER_NEW_RESULT_OK)
69-
EXTRACT_RESULT(CrashInfo, DDOG_CRASHT_CRASH_INFO_NEW_RESULT_OK)
69+
EXTRACT_RESULT(CrashInfo, DDOG_CRASHT_RESULT_HANDLE_CRASH_INFO_OK_HANDLE_CRASH_INFO)
7070
EXTRACT_RESULT(StackTrace, DDOG_CRASHT_STACK_TRACE_NEW_RESULT_OK)
7171
EXTRACT_RESULT(StackFrame, DDOG_CRASHT_STACK_FRAME_NEW_RESULT_OK)
7272

libdd-common-ffi/Cargo.toml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -29,3 +29,5 @@ serde = "1.0"
2929

3030
[dev-dependencies]
3131
bolero = "0.13"
32+
assert_no_alloc = "1.1.2"
33+
function_name = "0.3.0"

libdd-common-ffi/src/error.rs

Lines changed: 27 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,22 @@ use crate::slice::{AsBytes, CharSlice};
55
use crate::vec::Vec;
66
use std::fmt::{Debug, Display, Formatter};
77

8+
/// You probably don't want to use this directly. This constant is used by `handle_panic_error` to
9+
/// signal that something went wrong, but avoid needing any allocations to represent it.
10+
///
11+
/// Note: This vec is 0-sized and thus does not allocate any memory (or need to be dropped).
12+
pub(crate) const CANNOT_ALLOCATE_ERROR: Error = Error {
13+
message: Vec::new(),
14+
};
15+
16+
// This error message is used as a placeholder for errors without message -- corresponding to an
17+
// error where we couldn't even _allocate_ the message (or some other even weirder error).
18+
const CANNOT_ALLOCATE: &std::ffi::CStr =
19+
c"libdatadog failed: (panic) Cannot allocate error message";
20+
const CANNOT_ALLOCATE_CHAR_SLICE: CharSlice = unsafe {
21+
crate::Slice::from_raw_parts(CANNOT_ALLOCATE.as_ptr(), CANNOT_ALLOCATE.to_bytes().len())
22+
};
23+
824
/// Please treat this as opaque; do not reach into it, and especially don't
925
/// write into it! The most relevant APIs are:
1026
/// * `ddog_Error_message`, to get the message as a slice.
@@ -104,7 +120,17 @@ pub unsafe extern "C" fn ddog_Error_drop(error: Option<&mut Error>) {
104120
pub unsafe extern "C" fn ddog_Error_message(error: Option<&Error>) -> CharSlice<'_> {
105121
match error {
106122
None => CharSlice::empty(),
107-
Some(err) => CharSlice::from(err.as_ref()),
123+
// When the error is empty (CANNOT_ALLOCATE_ERROR) we assume we failed to allocate an actual
124+
// error and return this placeholder message instead.
125+
// In particular this means we'll use the CANNOT_ALLOCATE_CHAR_SLICE error message for
126+
// **every** empty error, and no other kinds of errors are expected to be empty.
127+
Some(err) => {
128+
if *err == CANNOT_ALLOCATE_ERROR {
129+
CANNOT_ALLOCATE_CHAR_SLICE
130+
} else {
131+
CharSlice::from(err.as_ref())
132+
}
133+
}
108134
}
109135
}
110136

libdd-common-ffi/src/result.rs

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -34,6 +34,12 @@ impl From<anyhow::Result<()>> for VoidResult {
3434
}
3535
}
3636

37+
impl From<Error> for VoidResult {
38+
fn from(value: Error) -> Self {
39+
Self::Err(value)
40+
}
41+
}
42+
3743
/// A generic result type for when an operation may fail,
3844
/// or may return <T> in case of success.
3945
#[repr(C)]
@@ -68,3 +74,9 @@ impl<T> From<anyhow::Result<T>> for Result<T> {
6874
}
6975
}
7076
}
77+
78+
impl<T> From<Error> for Result<T> {
79+
fn from(value: Error) -> Self {
80+
Self::Err(value)
81+
}
82+
}

libdd-common-ffi/src/string.rs

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -104,3 +104,9 @@ impl From<String> for StringWrapperResult {
104104
Self::Ok(value.into())
105105
}
106106
}
107+
108+
impl From<Error> for StringWrapperResult {
109+
fn from(value: Error) -> Self {
110+
Self::Err(value)
111+
}
112+
}

libdd-common-ffi/src/utils.rs

Lines changed: 126 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,30 @@
11
// Copyright 2024-Present Datadog, Inc. https://www.datadoghq.com/
22
// SPDX-License-Identifier: Apache-2.0
33

4+
use std::panic::{catch_unwind, AssertUnwindSafe};
5+
46
/// Wraps a C-FFI function in standard form
57
/// Expects the function to return a result type that implements into and to be decorated with
68
/// #[named].
79
#[macro_export]
810
macro_rules! wrap_with_ffi_result {
11+
($body:block) => {{
12+
use std::panic::{catch_unwind, AssertUnwindSafe};
13+
14+
catch_unwind(AssertUnwindSafe(|| {
15+
$crate::wrap_with_ffi_result_no_catch!({ $body })
16+
}))
17+
.map_or_else(
18+
|e| $crate::utils::handle_panic_error(e, function_name!()).into(),
19+
|result| result,
20+
)
21+
}};
22+
}
23+
24+
/// Wraps a C-FFI function in standard form (no catch variant).
25+
/// Same as `wrap_with_ffi_result` but does not try to catch panics.
26+
#[macro_export]
27+
macro_rules! wrap_with_ffi_result_no_catch {
928
($body:block) => {{
1029
use anyhow::Context;
1130
(|| $body)()
@@ -18,6 +37,23 @@ macro_rules! wrap_with_ffi_result {
1837
/// Expects the function to return a VoidResult and to be decorated with #[named].
1938
#[macro_export]
2039
macro_rules! wrap_with_void_ffi_result {
40+
($body:block) => {{
41+
use std::panic::{catch_unwind, AssertUnwindSafe};
42+
43+
catch_unwind(AssertUnwindSafe(|| {
44+
$crate::wrap_with_void_ffi_result_no_catch!({ $body })
45+
}))
46+
.map_or_else(
47+
|e| $crate::utils::handle_panic_error(e, function_name!()).into(),
48+
|result| result,
49+
)
50+
}};
51+
}
52+
53+
/// Wraps a C-FFI function in standard form (no catch variant).
54+
/// Same as `wrap_with_void_ffi_result` but does not try to catch panics.
55+
#[macro_export]
56+
macro_rules! wrap_with_void_ffi_result_no_catch {
2157
($body:block) => {{
2258
use anyhow::Context;
2359
(|| {
@@ -38,3 +74,93 @@ impl ToHexStr for usize {
3874
format!("0x{self:X}")
3975
}
4076
}
77+
78+
/// You probably don't want to use this directly. This is used by `wrap_with_*_ffi_result` macros to
79+
/// turn a panic error into an actual nice error. Because the original panic may have been caused by
80+
/// being unable to allocate, this helper handles failures to allocate as well, turning them into a
81+
/// fallback error.
82+
pub fn handle_panic_error(
83+
error: Box<dyn std::any::Any + Send + 'static>,
84+
function_name: &str,
85+
) -> crate::Error {
86+
catch_unwind(AssertUnwindSafe(|| {
87+
// This pattern of String vs &str comes from
88+
// https://doc.rust-lang.org/std/panic/struct.PanicHookInfo.html#method.payload
89+
if let Some(s) = error.downcast_ref::<String>() {
90+
anyhow::anyhow!("{} failed: (panic) {}", function_name, s)
91+
} else if let Some(s) = error.downcast_ref::<&str>() {
92+
anyhow::anyhow!("{} failed: (panic) {}", function_name, s)
93+
} else {
94+
anyhow::anyhow!(
95+
"{} failed: (panic) Unable to retrieve panic context",
96+
function_name
97+
)
98+
}
99+
.into()
100+
}))
101+
.unwrap_or(crate::error::CANNOT_ALLOCATE_ERROR)
102+
}
103+
104+
#[cfg(test)]
105+
mod tests {
106+
use assert_no_alloc::{assert_no_alloc, AllocDisabler};
107+
use function_name::named;
108+
109+
#[cfg(debug_assertions)] // required when disable_release is set (default)
110+
#[global_allocator]
111+
static ALLOCATOR: AllocDisabler = AllocDisabler;
112+
113+
#[test]
114+
fn test_handle_panic_error_fallback_does_not_allocate() {
115+
let mut error_result_buffer: [std::os::raw::c_char; 100] = [0; 100];
116+
117+
assert_no_alloc(|| {
118+
// Simulate fallback code path of handle_panic_error + ddog_Error_message
119+
let fallback_error = crate::error::CANNOT_ALLOCATE_ERROR;
120+
let error_message = unsafe { crate::ddog_Error_message(Some(&fallback_error)) };
121+
122+
// Stash error message so we can assert on it
123+
let n = error_message.len().min(error_result_buffer.len());
124+
error_result_buffer[..n].copy_from_slice(&error_message[..n]);
125+
});
126+
127+
unsafe {
128+
let c_str = std::ffi::CStr::from_ptr(error_result_buffer.as_ptr());
129+
assert_eq!(
130+
c_str.to_str().unwrap(),
131+
"libdatadog failed: (panic) Cannot allocate error message"
132+
);
133+
};
134+
}
135+
136+
#[test]
137+
#[named]
138+
#[allow(clippy::redundant_closure_call)]
139+
fn test_wrap_with_ffi_result_turns_panic_into_error() {
140+
// Save the current panic handler and replace it with a no-op so that Rust doesn't print
141+
// anything inside `wrap_with_ffi_result`...
142+
let original_hook = std::panic::take_hook();
143+
std::panic::set_hook(Box::new(|_| {}));
144+
145+
let result: crate::Result<()> = wrap_with_ffi_result!({
146+
panic!("this is a test panic message");
147+
#[allow(unreachable_code)]
148+
anyhow::Ok(())
149+
});
150+
151+
// ...restore original behavior
152+
std::panic::set_hook(original_hook);
153+
154+
assert_eq!(result.unwrap_err().to_string(), "test_wrap_with_ffi_result_turns_panic_into_error failed: (panic) this is a test panic message");
155+
}
156+
157+
#[test]
158+
#[named]
159+
fn test_wrap_with_ffi_result_does_not_modify_other_kinds_of_errors() {
160+
let result: crate::result::VoidResult = wrap_with_void_ffi_result!({
161+
Err(anyhow::anyhow!("this is a test error message"))?;
162+
});
163+
164+
assert_eq!(result.unwrap_err().to_string(), "test_wrap_with_ffi_result_does_not_modify_other_kinds_of_errors failed: this is a test error message");
165+
}
166+
}

libdd-common-ffi/src/vec.rs

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -53,6 +53,16 @@ impl<T: Eq> Eq for Vec<T> {}
5353

5454
impl<T> Drop for Vec<T> {
5555
fn drop(&mut self) {
56+
// A Rust Vec of size 0 [has no allocated memory](https://doc.rust-lang.org/std/vec/struct.Vec.html#guarantees):
57+
// "In particular, if you construct a Vec with capacity 0 via Vec::new, vec![],
58+
// Vec::with_capacity(0), or by calling shrink_to_fit on an empty Vec, it will not allocate
59+
// memory." And as per https://doc.rust-lang.org/nomicon/vec/vec-dealloc.html:
60+
// "We must not call alloc::dealloc when self.cap == 0, as in this case we haven't actually
61+
// allocated any memory."
62+
if self.capacity == 0 {
63+
return;
64+
}
65+
5666
let vec =
5767
unsafe { alloc::vec::Vec::from_raw_parts(self.ptr as *mut T, self.len, self.capacity) };
5868
drop(vec)
@@ -106,6 +116,7 @@ impl<T> Vec<T> {
106116
unsafe { Slice::from_raw_parts(self.ptr, self.len) }
107117
}
108118

119+
/// Note: Like the regular rust `Vec`, this doesn't allocate memory when capacity is zero.
109120
pub const fn new() -> Self {
110121
Vec {
111122
ptr: NonNull::dangling().as_ptr(),

libdd-crashtracker-ffi/src/crash_info/builder.rs

Lines changed: 4 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -42,31 +42,18 @@ pub unsafe extern "C" fn ddog_crasht_CrashInfoBuilder_drop(builder: *mut Handle<
4242
}
4343
}
4444

45-
#[allow(dead_code)]
46-
#[repr(C)]
47-
pub enum CrashInfoNewResult {
48-
Ok(Handle<CrashInfo>),
49-
Err(Error),
50-
}
45+
// Name the type so it's prettier for the consumer
46+
pub type CrashInfoNewResult = libdd_common_ffi::Result<Handle<CrashInfo>>;
5147

5248
/// # Safety
5349
/// The `builder` can be null, but if non-null it must point to a Builder made by this module,
5450
/// which has not previously been dropped.
5551
#[no_mangle]
5652
#[must_use]
57-
pub unsafe extern "C" fn ddog_crasht_CrashInfoBuilder_build(
58-
builder: *mut Handle<CrashInfoBuilder>,
59-
) -> CrashInfoNewResult {
60-
match ddog_crasht_crash_info_builder_build_impl(builder) {
61-
Ok(crash_info) => CrashInfoNewResult::Ok(crash_info),
62-
Err(err) => CrashInfoNewResult::Err(err.into()),
63-
}
64-
}
65-
6653
#[named]
67-
unsafe fn ddog_crasht_crash_info_builder_build_impl(
54+
pub unsafe extern "C" fn ddog_crasht_CrashInfoBuilder_build(
6855
mut builder: *mut Handle<CrashInfoBuilder>,
69-
) -> anyhow::Result<Handle<CrashInfo>> {
56+
) -> CrashInfoNewResult {
7057
wrap_with_ffi_result!({ anyhow::Ok(builder.take()?.build()?.into()) })
7158
}
7259

0 commit comments

Comments
 (0)