Implement crash-tracking handler #282

Merged
merged 87 commits into from
Jan 19, 2024
4952924
Create telemetry module in profiling
danielsn Oct 5, 2023
7822344
starting on crashtracker
danielsn Oct 6, 2023
4134ce7
Initial implementation of crashtracking, and a reciever binary
danielsn Oct 12, 2023
0df5b96
Chain to old signal handler
danielsn Oct 12, 2023
0767b6b
cleanup the code a bit after testing on Python
danielsn Oct 16, 2023
9c5a131
print more info from stack trace
danielsn Oct 17, 2023
9e5cec7
crashtracker tries to upload
danielsn Oct 17, 2023
90dd74d
Looks like we only need part of the url
danielsn Oct 17, 2023
8925acb
experimenting with more info in the crashtracker
danielsn Oct 18, 2023
447ef5f
Emit valid json
danielsn Oct 18, 2023
9345b69
Experiment to print /proc/pid/stack
danielsn Oct 19, 2023
8186a0b
split into a couple files
danielsn Oct 19, 2023
0e369fd
emit a bunch more metadata, and wire up the ffi to enable setting it …
danielsn Oct 19, 2023
75045c5
fix _
danielsn Oct 20, 2023
8537f59
collect and emit counters for profiler operations
danielsn Oct 20, 2023
2d91d9f
TODO comments
danielsn Oct 23, 2023
79c728e
Reorganized files
danielsn Oct 24, 2023
567ef3d
Put global state inside an informative enum, and use std::mem::replac…
danielsn Oct 24, 2023
660d08d
Add import for emit_proc_self_maps
sanchda Oct 27, 2023
09dd581
Fix daemonization of sidecar closing stderr properly (#271)
bwoebi Oct 23, 2023
755ef2b
Bump rustix from 0.37.23 to 0.37.26
dependabot[bot] Oct 24, 2023
6087032
Update LICENSE-3rdparty.yml
ivoanjo Oct 25, 2023
1cbe078
Bump webpki from 0.22.1 to 0.22.4 (#273)
dependabot[bot] Oct 25, 2023
8ad467b
Fix clippy lint in tarpc (#276)
paullegranddc Oct 25, 2023
3848dd5
Only emit if there was a crash
danielsn Oct 27, 2023
8e331e2
Tags
danielsn Oct 27, 2023
cf054ae
Add API for on fork
danielsn Oct 27, 2023
1171f44
Actually we do handle two different counters
danielsn Oct 27, 2023
9231f3c
Don't count NotProfiling as profiling
danielsn Oct 27, 2023
30307c2
Merge branch 'main' into dsn/catch-segv
danielsn Nov 20, 2023
599c276
reset counters on fork, and handle sigbus
danielsn Nov 20, 2023
2df8975
Emit some other useful files
danielsn Nov 21, 2023
e019549
Use a proper CrashInfo structure
danielsn Nov 21, 2023
aa39ef4
reasonable code for recieving reports now
danielsn Nov 21, 2023
0534007
use braces so we can decrypt the json
danielsn Nov 21, 2023
54e0b8c
test passes (well, it crashes, but generate teh right report first)
danielsn Nov 22, 2023
d2e80c7
Allow choice of endpoint and/or file output
danielsn Nov 22, 2023
a767dbb
Option is not FFI safe, take it out for a minute
danielsn Nov 22, 2023
a7273e7
better comments
danielsn Nov 22, 2023
b7b3b2d
Better error handling, and no extra quotes in filenames
danielsn Nov 22, 2023
65d83d7
collect a timestamp
danielsn Nov 22, 2023
27a0199
Remove hacky hardcoded values in the api
danielsn Nov 27, 2023
52097d9
create_alt_stack option
danielsn Nov 27, 2023
0afdcc7
handle duplicate names by ignoring the first one
danielsn Nov 27, 2023
9652e6a
Handle duplicate stacktrace names by printing an array with all of them.
danielsn Nov 27, 2023
89c54aa
miri ignore the expected to fail test
danielsn Nov 27, 2023
b3548b7
removed unneeded update
danielsn Nov 27, 2023
6cdce9a
fix miri
danielsn Nov 27, 2023
060262e
Update licence file
danielsn Nov 27, 2023
19785f3
fmt
danielsn Nov 27, 2023
619119e
Friendly name for signaltype
danielsn Nov 27, 2023
7327165
Fix reciever to receiver
danielsn Nov 28, 2023
0674fd2
Better comments
danielsn Nov 28, 2023
19ea672
remove experiments file
danielsn Nov 28, 2023
123bf41
More comments and fix typos
danielsn Nov 29, 2023
5cbd78d
More comments
danielsn Nov 29, 2023
4826dd2
Resolved names are optional
danielsn Nov 30, 2023
1b91f3d
API should just take Config and Metadata
danielsn Dec 15, 2023
c225909
receiver output can either go to dev/null or a file
danielsn Dec 15, 2023
a2511ed
Testing out-of-process symbol resolution
danielsn Dec 15, 2023
b63d236
remove panic, add notes
danielsn Dec 15, 2023
3b2798d
Notes on how to run on docker
danielsn Dec 20, 2023
3481ce7
Move the reciever code to libdatadog, and then use a tiny c function …
danielsn Jan 12, 2024
eb03b3e
makes a 33KB reciever
danielsn Jan 12, 2024
3507b87
Build the crashtracker-receiver when building the ffi
danielsn Jan 12, 2024
57df14d
Handle relative paths when building libdd
danielsn Jan 16, 2024
6256480
rename receiver
danielsn Jan 16, 2024
97f3195
Make API more self documenting
danielsn Jan 17, 2024
9df1bbd
Merge branch 'main' into dsn/crash-handler-api
danielsn Jan 17, 2024
7059475
make the test manual
danielsn Jan 17, 2024
a11725f
forward tags
danielsn Jan 17, 2024
0ecab54
Cargo.lock pulled back to work with 1.69
danielsn Jan 18, 2024
99dfd6d
Licence 3rd party
danielsn Jan 18, 2024
2486394
Test commented out
danielsn Jan 18, 2024
83b44e5
downgrading home
danielsn Jan 18, 2024
ba51dcc
Using ignored doc tests doesn't work in ci
danielsn Jan 18, 2024
ae2cd8b
Add extra fields to crashinfo requested by .net
danielsn Jan 18, 2024
510e7cf
Remove blazesym for now, will add back in subsequent PR
danielsn Jan 18, 2024
90f6602
Crashtracker only builds on unix for now
danielsn Jan 18, 2024
0894ac8
make the entire crashtracker guarded by unix
danielsn Jan 18, 2024
471fae9
The receiver is pure C now
danielsn Jan 18, 2024
4f6388d
licence
danielsn Jan 18, 2024
6ce79a0
Only build the crashtracker on unix
danielsn Jan 18, 2024
de858ce
use apline version in gitlab
danielsn Jan 18, 2024
bbd1b9e
update the dockerfile
danielsn Jan 19, 2024
ba18031
Back to main for the dd-build branch
danielsn Jan 19, 2024
3ba2f51
README
danielsn Jan 19, 2024
368 changes: 186 additions & 182 deletions Cargo.lock

Large diffs are not rendered by default.

209 changes: 118 additions & 91 deletions LICENSE-3rdparty.yml

Large diffs are not rendered by default.

22 changes: 22 additions & 0 deletions build-profiling-ffi.sh
@@ -4,6 +4,11 @@
# under the Apache License Version 2.0. This product includes software developed
# at Datadog (https://www.datadoghq.com/). Copyright 2021-Present Datadog, Inc.

get_abs_filename() {
    # $1 : relative filename
    echo "$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
}

# Location to place all artifacts
if [ -z $CARGO_TARGET_DIR ] ; then
export CARGO_TARGET_DIR=$PWD/target
@@ -162,4 +167,21 @@ cbindgen --crate "${datadog_profiling_ffi}" \
--output "$destdir/include/datadog/profiling.h"
"$CARGO_TARGET_DIR"/debug/dedup_headers "$destdir/include/datadog/common.h" "$destdir/include/datadog/profiling.h"

# Don't build the crashtracker on Windows
if [[ "$target" != "x86_64-pc-windows-msvc" ]]; then
    echo "Building binaries"
    # $destdir might be relative. Get an absolute path that will work when we cd
    export ABS_DESTDIR=$(get_abs_filename "$destdir")
    export CRASHTRACKER_BUILD_DIR="$CARGO_TARGET_DIR/build/crashtracker-receiver"
    export CRASHTRACKER_SRC_DIR="$PWD/profiling-crashtracking-receiver"
    # Always start with a clean directory
    [ -d "$CRASHTRACKER_BUILD_DIR" ] && rm -r "$CRASHTRACKER_BUILD_DIR"
    mkdir -p "$CRASHTRACKER_BUILD_DIR"
    cd "$CRASHTRACKER_BUILD_DIR"
    cmake -S "$CRASHTRACKER_SRC_DIR" -DDatadog_ROOT="$ABS_DESTDIR"
    cmake --build .
    mkdir -p "$ABS_DESTDIR/bin"
    cp libdatadog-crashtracking-receiver "$ABS_DESTDIR/bin"
fi

echo "Done."
7 changes: 7 additions & 0 deletions profiling-crashtracking-receiver/CMakeLists.txt
@@ -0,0 +1,7 @@
cmake_minimum_required(VERSION 3.19)
project(datadog_profiling_crashtracking_receiver LANGUAGES C CXX)

find_package(Datadog REQUIRED)

add_executable(libdatadog-crashtracking-receiver libdatadog-crashtracking-receiver.c)
target_link_libraries(libdatadog-crashtracking-receiver PRIVATE Datadog::Profiling)
18 changes: 18 additions & 0 deletions profiling-crashtracking-receiver/libdatadog-crashtracking-receiver.c
@@ -0,0 +1,18 @@
// Unless explicitly stated otherwise all files in this repository are licensed under the Apache License Version 2.0.
// This product includes software developed at Datadog (https://www.datadoghq.com/). Copyright 2024-Present Datadog, Inc.

#include <datadog/common.h>
#include <datadog/profiling.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    ddog_prof_Profile_Result new_result = ddog_prof_crashtracker_receiver_entry_point();
    if (new_result.tag != DDOG_PROF_PROFILE_NEW_RESULT_OK) {
        ddog_CharSlice message = ddog_Error_message(&new_result.err);
        // `%.*s` caps the output at message.len bytes; `%*s` would only set a minimum field width.
        fprintf(stderr, "%.*s", (int)message.len, message.ptr);
        ddog_Error_drop(&new_result.err);
        exit(EXIT_FAILURE);
    }
    return 0;
}
178 changes: 178 additions & 0 deletions profiling-ffi/src/crashtracker.rs
@@ -0,0 +1,178 @@
// Unless explicitly stated otherwise all files in this repository are licensed under the Apache License Version 2.0.
// This product includes software developed at Datadog (https://www.datadoghq.com/). Copyright 2021-Present Datadog, Inc.
#![cfg(unix)]

use crate::exporter::{self, Endpoint};
use crate::profiles::ProfileResult;
use datadog_profiling::crashtracker;
use ddcommon::tag::Tag;
use ddcommon_ffi::slice::{AsBytes, CharSlice};
use ddcommon_ffi::Error;
use std::ops::Not;

pub use datadog_profiling::crashtracker::{CrashtrackerResolveFrames, ProfilingOpTypes};

#[repr(C)]
pub struct CrashtrackerConfiguration<'a> {
pub create_alt_stack: bool,
/// The endpoint to send the crash report to (can be a file://)
pub endpoint: Endpoint<'a>,
/// Optional filename to forward stderr to (useful for logging/debugging)
pub optional_stderr_filename: CharSlice<'a>,
/// Optional filename to forward stdout to (useful for logging/debugging)
pub optional_stdout_filename: CharSlice<'a>,
pub path_to_receiver_binary: CharSlice<'a>,
/// Whether/when we should attempt to resolve frames
pub resolve_frames: CrashtrackerResolveFrames,
}

impl<'a> TryFrom<CrashtrackerConfiguration<'a>> for crashtracker::CrashtrackerConfiguration {
type Error = anyhow::Error;
fn try_from(value: CrashtrackerConfiguration<'a>) -> anyhow::Result<Self> {
fn option_from_char_slice(s: CharSlice) -> anyhow::Result<Option<String>> {
let s = unsafe { s.try_to_utf8()?.to_string() };
Ok(s.is_empty().not().then_some(s))
}

let create_alt_stack = value.create_alt_stack;
let endpoint = unsafe { Some(exporter::try_to_endpoint(value.endpoint)?) };
let path_to_receiver_binary =
unsafe { value.path_to_receiver_binary.try_to_utf8()?.to_string() };
let resolve_frames = value.resolve_frames;
let stderr_filename = option_from_char_slice(value.optional_stderr_filename)?;
let stdout_filename = option_from_char_slice(value.optional_stdout_filename)?;

crashtracker::CrashtrackerConfiguration::new(
create_alt_stack,
endpoint,
path_to_receiver_binary,
resolve_frames,
stderr_filename,
stdout_filename,
)
}
}

#[repr(C)]
pub struct CrashtrackerMetadata<'a> {
pub profiling_library_name: CharSlice<'a>,
pub profiling_library_version: CharSlice<'a>,
pub family: CharSlice<'a>,
/// Should include "service", "environment", etc
pub tags: Option<&'a ddcommon_ffi::Vec<Tag>>,
}

impl<'a> TryFrom<CrashtrackerMetadata<'a>> for crashtracker::CrashtrackerMetadata {
type Error = anyhow::Error;
fn try_from(value: CrashtrackerMetadata<'a>) -> anyhow::Result<Self> {
let profiling_library_name =
unsafe { value.profiling_library_name.try_to_utf8()?.to_string() };
let profiling_library_version =
unsafe { value.profiling_library_version.try_to_utf8()?.to_string() };
let family = unsafe { value.family.try_to_utf8()?.to_string() };
let tags = value
.tags
.map(|tags| tags.iter().cloned().collect())
.unwrap_or_default();
Ok(crashtracker::CrashtrackerMetadata::new(
profiling_library_name,
profiling_library_version,
family,
tags,
))
}
}

#[no_mangle]
#[must_use]
pub unsafe extern "C" fn ddog_prof_crashtracker_begin_profiling_op(
op: ProfilingOpTypes,
) -> ProfileResult {
match crashtracker::begin_profiling_op(op) {
Ok(_) => ProfileResult::Ok(true),
Err(err) => ProfileResult::Err(Error::from(
err.context("ddog_prof_crashtracker_begin_profiling_op failed"),
)),
}
}

#[no_mangle]
#[must_use]
pub unsafe extern "C" fn ddog_prof_crashtracker_end_profiling_op(
op: ProfilingOpTypes,
) -> ProfileResult {
match crashtracker::end_profiling_op(op) {
Ok(_) => ProfileResult::Ok(true),
Err(err) => ProfileResult::Err(Error::from(
err.context("ddog_prof_crashtracker_end_profiling_op failed"),
)),
}
}

#[no_mangle]
#[must_use]
pub unsafe extern "C" fn ddog_prof_crashtracker_shutdown() -> ProfileResult {
match crashtracker::shutdown_crash_handler() {
Ok(_) => ProfileResult::Ok(true),
Err(err) => ProfileResult::Err(Error::from(
err.context("ddog_prof_crashtracker_shutdown failed"),
)),
}
}

#[no_mangle]
#[must_use]
pub unsafe extern "C" fn ddog_prof_crashtracker_update_on_fork(
config: CrashtrackerConfiguration,
metadata: CrashtrackerMetadata,
) -> ProfileResult {
match ddog_prof_crashtracker_update_on_fork_impl(config, metadata) {
Ok(_) => ProfileResult::Ok(true),
Err(err) => ProfileResult::Err(Error::from(
err.context("ddog_prof_crashtracker_update_on_fork failed"),
)),
}
}

unsafe fn ddog_prof_crashtracker_update_on_fork_impl(
config: CrashtrackerConfiguration,
metadata: CrashtrackerMetadata,
) -> anyhow::Result<()> {
let config = config.try_into()?;
let metadata = metadata.try_into()?;
crashtracker::on_fork(config, metadata)
}

#[no_mangle]
#[must_use]
pub unsafe extern "C" fn ddog_prof_crashtracker_receiver_entry_point() -> ProfileResult {
match crashtracker::receiver_entry_point() {
Ok(_) => ProfileResult::Ok(true),
Err(err) => ProfileResult::Err(Error::from(
err.context("ddog_prof_crashtracker_receiver_entry_point failed"),
)),
}
}

#[no_mangle]
#[must_use]
pub unsafe extern "C" fn ddog_prof_crashtracker_init(
config: CrashtrackerConfiguration,
metadata: CrashtrackerMetadata,
) -> ProfileResult {
match ddog_prof_crashtracker_init_impl(config, metadata) {
Ok(_) => ProfileResult::Ok(true),
Err(err) => ProfileResult::Err(Error::from(
err.context("ddog_prof_crashtracker_init failed"),
)),
}
}

unsafe fn ddog_prof_crashtracker_init_impl(
config: CrashtrackerConfiguration,
metadata: CrashtrackerMetadata,
) -> anyhow::Result<()> {
let config = config.try_into()?;
let metadata = metadata.try_into()?;
crashtracker::init(config, metadata)
}
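
The `option_from_char_slice` helper in the file above treats an empty slice as "not set", which is how the optional stderr/stdout filenames cross the FFI boundary without an `Option`. A minimal standalone sketch of that convention follows. It assumes a plain `&[u8]` as a stand-in for `CharSlice`, and the function name `option_from_bytes` and the example path are made up for illustration; none of this is the libdatadog API.

```rust
/// Stand-in for the FFI conversion: the bytes must be valid UTF-8, and an
/// empty slice means "option not set". `&[u8]` replaces `CharSlice` here
/// purely so the example needs no libdatadog types (an assumption).
fn option_from_bytes(bytes: &[u8]) -> Result<Option<String>, std::str::Utf8Error> {
    let s = std::str::from_utf8(bytes)?;
    // Same idea as `s.is_empty().not().then_some(s)` in the diff.
    Ok((!s.is_empty()).then(|| s.to_string()))
}

fn main() {
    // Empty input maps to `None`; anything else is copied into a `String`.
    println!("{:?}", option_from_bytes(b""));
    println!("{:?}", option_from_bytes(b"/tmp/crashtracker-stderr.log"));
}
```

One consequence of this design is that a caller cannot express "an empty string that is present"; for filenames that distinction does not matter.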
2 changes: 1 addition & 1 deletion profiling-ffi/src/exporter.rs
@@ -92,7 +92,7 @@ unsafe fn try_to_url(slice: CharSlice) -> anyhow::Result<hyper::Uri> {
Ok(hyper::Uri::from_str(str)?)
}

- unsafe fn try_to_endpoint(endpoint: Endpoint) -> anyhow::Result<exporter::Endpoint> {
+ pub unsafe fn try_to_endpoint(endpoint: Endpoint) -> anyhow::Result<exporter::Endpoint> {
// convert to utf8 losslessly -- URLs and API keys should all be ASCII, so
// a failed result is likely to be an error.
match endpoint {
1 change: 1 addition & 0 deletions profiling-ffi/src/lib.rs
@@ -6,6 +6,7 @@ use std::time::SystemTime;

use chrono::{DateTime, TimeZone, Utc};

mod crashtracker;
mod exporter;
mod profiles;

8 changes: 7 additions & 1 deletion profiling/Cargo.toml
@@ -13,14 +13,16 @@ crate-type = ["lib"]

[dependencies]
anyhow = "1.0"
backtrace = "0.3.69"
bitmaps = "3.2.0"
bytes = "1.1"
- chrono = {version = "0.4", default-features = false, features = ["std", "clock"]}
+ chrono = {version = "0.4", default-features = false, features = ["std", "clock", "serde"]}
ddcommon = {path = "../ddcommon"}
derivative = "2.2.0"
futures = { version = "0.3", default-features = false }
futures-core = {version = "0.3.0", default-features = false}
futures-util = {version = "0.3.0", default-features = false}
hex = { version = "0.4.3", features = ["serde"] }
http = "0.2"
http-body = "0.4"
hyper = {version = "0.14", features = ["client"], default-features = false}
@@ -30,10 +32,14 @@ libc = "0.2"
lz4_flex = { version = "0.9", default-features = false, features = ["std", "safe-encode", "frame"] }
mime = "0.3.16"
mime_guess = {version = "2.0", default-features = false}
nix = { version = "0.27.1", features = ["signal"] }
os_info = "3.7.0"
page_size = "0.6.0"
percent-encoding = "2.1"
prost = "0.11"
rustc-hash = { version = "1.1", default-features = false }
serde = {version = "1.0", features = ["derive"]}
serde_json = {version = "1.0"}
tokio = {version = "1.23", features = ["rt", "macros"]}
tokio-util = "0.7.1"
uuid = { version = "1.4.1", features = ["v4", "serde"] }
35 changes: 35 additions & 0 deletions profiling/src/crashtracker/README.md
@@ -0,0 +1,35 @@
# Libdatadog Crashtracker
This module implements a crash tracker that enables profilers built on libdatadog to send crash reports to the Datadog backend.

It has three related parts:
1. A `CrashInfo` struct which stores information about a crash in a structured format, and has built-in functions to upload itself to the Datadog backend.
This struct can be generated by the crash-handlers described below (e.g. for Python or Ruby), or by language runtime tools from a crash dump (e.g. .NET).

2. A UNIX crash-handler, which registers signal handlers to detect crashes.
The legal operations within a crash-handler are limited, so the handler does as much out of process as possible.
When a crash occurs, this handler collects relevant information, and puts it on a pipe to the receiver, which processes it out of process.

3. A binary application `libdatadog-crashtracking-receiver`, which receives the crash-report over a pipe and formats it for transmission to the backend.
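
The split between parts 2 and 3 can be sketched roughly as follows. This is only an illustration of the data flow: the line-based wire format (`SIGNAL`, `FRAME`, `DONE`), the `Report` struct, and both function names are hypothetical, and a `UnixStream` pair with a thread stands in for the real pipe and the separate receiver process. A real signal handler is far more restricted than the writer shown here.

```rust
use std::io::{BufRead, BufReader, Read, Write};
use std::os::unix::net::UnixStream;
use std::thread;

/// Hypothetical report as the receiver reconstructs it.
#[derive(Debug, Default, PartialEq)]
struct Report {
    signal: String,
    frames: Vec<String>,
}

/// "Handler" side: writes simple lines and leaves all processing to the
/// receiver. The line format is an assumption made for this sketch.
fn send_report(mut pipe: impl Write, signal: &str, frames: &[&str]) -> std::io::Result<()> {
    writeln!(pipe, "SIGNAL {signal}")?;
    for frame in frames {
        writeln!(pipe, "FRAME {frame}")?;
    }
    writeln!(pipe, "DONE")
}

/// "Receiver" side: parses lines until `DONE` or end of stream.
fn receive_report(pipe: impl Read) -> std::io::Result<Report> {
    let mut report = Report::default();
    for line in BufReader::new(pipe).lines() {
        let line = line?;
        if let Some(signal) = line.strip_prefix("SIGNAL ") {
            report.signal = signal.to_string();
        } else if let Some(frame) = line.strip_prefix("FRAME ") {
            report.frames.push(frame.to_string());
        } else if line == "DONE" {
            break;
        }
    }
    Ok(report)
}

fn main() -> std::io::Result<()> {
    // A socket pair stands in for the pipe between the crashing process and
    // the receiver binary; a thread stands in for the separate process.
    let (tx, rx) = UnixStream::pair()?;
    let receiver = thread::spawn(move || receive_report(rx));
    send_report(tx, "SIGSEGV", &["0x1000", "0x2000"])?;
    let report = receiver.join().expect("receiver thread panicked")?;
    println!("{report:?}");
    Ok(())
}
```

Keeping the writer this simple reflects the README's point that the handler should do as little as possible in the crashing process and leave formatting to the receiver.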

## How to use the crashhandler

1. Initialize it using `ddog_prof_crashtracker_init`.
2. After a fork, reset the crashtracker in the child using `ddog_prof_crashtracker_update_on_fork`.
This can be done in a `pthread_atfork` handler.
3. [Optional] The crash-tracker can be shut down, and the previous crash handler restored, using `ddog_prof_crashtracker_shutdown`.
Currently, there is a state machine that stops you from then restarting the crash-tracker.
Fixing this is a todo.
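
The lifecycle restriction in the last step can be pictured with a small state machine. The sketch below is an assumption about the general shape, not the crashtracker's actual code: the `State` and `Tracker` types, their method names, and the use of a string to represent the saved handler are all hypothetical.

```rust
use std::mem;

/// Hypothetical lifecycle states; the names are illustrative only.
#[derive(Debug, PartialEq)]
enum State {
    NotStarted,
    Running { previous_handler: &'static str },
    ShutDown,
}

struct Tracker {
    state: State,
}

impl Tracker {
    fn new() -> Self {
        Tracker { state: State::NotStarted }
    }

    /// Starting is only allowed from `NotStarted`.
    fn init(&mut self, previous_handler: &'static str) -> Result<(), String> {
        match self.state {
            State::NotStarted => {
                self.state = State::Running { previous_handler };
                Ok(())
            }
            State::Running { .. } => Err("already running".to_string()),
            State::ShutDown => Err("cannot restart after shutdown".to_string()),
        }
    }

    /// Returns the handler that should be restored.
    fn shutdown(&mut self) -> Result<&'static str, String> {
        // Take the current state out, leaving `ShutDown` in its place.
        match mem::replace(&mut self.state, State::ShutDown) {
            State::Running { previous_handler } => Ok(previous_handler),
            other => {
                // Nothing was running: put the original state back.
                self.state = other;
                Err("not running".to_string())
            }
        }
    }
}

fn main() {
    let mut tracker = Tracker::new();
    tracker.init("SIG_DFL").unwrap();
    println!("restore: {:?}", tracker.shutdown());
    // A second init is rejected, mirroring the restriction described above.
    println!("restart: {:?}", tracker.init("SIG_DFL"));
}
```

Lifting the restriction in a design like this would mean adding a transition from `ShutDown` back to `Running`, which is what the "multiple shutdown/restart cycles" todo asks for.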

## Todos

- [ ] Support Windows
- [ ] API for generating crash report externally.
- [ ] Instrumentation Telemetry support
- [ ] Enable multiple shutdown/restart cycles for crashtracker
- [ ] Enable dynamic update of metadata/configuration



## How to configure crashtracking

#