Skip to content

feat(bottlecap): Update Step Functions trace extraction #591

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Mar 10, 2025

Conversation

avedmala
Copy link
Contributor

@avedmala avedmala commented Mar 6, 2025

What

Updates the creation of trace context from a Step Function execution's context object to...

  1. Make use of State.RetryCount and Execution.RedriveCount for inferable parent ID generation in a backwards compatible manner (related to Update Step Functions Parent ID Generation datadog-lambda-python#559)
  2. Support the new trace context propagation for multi-level trace merging. These are cases where we're receiving a Step Function event but that Step Function has another parent service which can be either a Lambda or another Step Function (related Explicit trace ID propagation for SFN w/o Hashing datadog-lambda-python#537)

Why

Parity of DataDog/datadog-agent#34264

  1. Using these new values for parent ID generation will prevent collisions with "retry spans" which are Step Function spans that parent a Lambda. Without using the State.RetryCount the other values we use for the hash are identical across tries
  2. Multi-level trace merging is the future, it lets us merge an arbitrary number of Lambda and Step Function traces. The previous approach only lets us do a max depth of 2 services, losing the context after that. We're able to always preserve some about the top-most service or the root service to keep the Trace ID intact while using the context object to infer the parent ID.

Testing

Expanded unit tests to account for new cases

Live trace of the nested step function trace merging (SFN -> SFN -> Lambda)
Screenshot 2025-03-06 at 5 37 33 PM

Live trace of the lambda root trace merging (Lambda -> SFN -> SFN -> Lambda)
Screenshot 2025-03-06 at 5 41 10 PM

@@ -14,13 +14,6 @@ use crate::{
},
};

#[allow(clippy::module_name_repetitions)]
#[derive(Debug, Serialize, Deserialize, Clone, PartialEq)]
pub struct LegacyStepFunctionEvent {
Copy link
Contributor Author

@avedmala avedmala Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was never used as we were just unpacking "Payload": {...} if it existed rather than treating them as separate events

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good catch, it was being covered, just not through this struct

format!("{hi_tid:x}"),
)]);
let (lo_tid, tags) =
if let (Some(trace_id), Some(trace_tags)) = (&self.trace_id, &self.trace_tags) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We wouldn't need this if-else checking if we defined separate events rather than including all fields in one and making some optional

But, this way we're keeping all the complexity in this file, otherwise we'd have to match on all the different types of events in the span_inferrer

This logic will need infrequent edits so it made sense to me to pull it down similar to what we do in the python layer

@avedmala avedmala marked this pull request as ready for review March 6, 2025 22:46
@avedmala avedmala requested a review from a team as a code owner March 6, 2025 22:46
Copy link
Contributor

@astuyve astuyve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty solid and well tested!

Comment on lines 64 to 68
let p = payload
.get("Payload")
.unwrap_or(&payload)
.get("_datadog")
.unwrap_or(payload.get("Payload").unwrap_or(&payload));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified

Copy link
Contributor

@duncanista duncanista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments!

@avedmala avedmala force-pushed the avedmala/sfn-trace-context-update branch from ad55c16 to e8cfc56 Compare March 10, 2025 17:48
@avedmala
Copy link
Contributor Author

/merge

@dd-devflow
Copy link

dd-devflow bot commented Mar 10, 2025

View all feedbacks in Devflow UI.
2025-03-10 17:48:26 UTC ℹ️ Start processing command /merge


2025-03-10 17:48:32 UTC ℹ️ MergeQueue: pull request added to the queue

The median merge time in main is 0s.


2025-03-10 17:50:07 UTC ⚠️ MergeQueue: This merge request build was cancelled

abhinav.vedmala@datadoghq.com cancelled this merge request build

@avedmala
Copy link
Contributor Author

/remove

@dd-devflow
Copy link

dd-devflow bot commented Mar 10, 2025

View all feedbacks in Devflow UI.
2025-03-10 17:49:58 UTC ℹ️ Start processing command /remove


2025-03-10 17:50:04 UTC ℹ️ Devflow: /remove

@avedmala avedmala merged commit f3c3d2e into main Mar 10, 2025
28 of 33 checks passed
@avedmala avedmala deleted the avedmala/sfn-trace-context-update branch March 10, 2025 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants