Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(tracing): Add tracestate header handling #1179

Merged
merged 7 commits into from
Sep 16, 2021

Conversation

lobsterkatie
Copy link
Member

@lobsterkatie lobsterkatie commented Sep 1, 2021

This PR introduces handling of the tracestate header, as described in the W3C Trace Context spec and our own corresponding spec.

Key features:

  • Deprecation of from_traceparent in favor of continue_from_headers, which now propagates both incoming sentry-trace and incoming tracestate headers.
  • Propagation of tracestate value as a header on outgoing HTTP requests when they're made during a transaction.
  • Addition of tracestate data to transaction envelope headers.

Supporting changes:

Note: tracestate handling is currently feature-gated by the flag propagate_tracestate in the _experiments SDK option.

More details can be found in the main PR on this branch, #971.

@lobsterkatie lobsterkatie marked this pull request as draft September 1, 2021 16:39
@lobsterkatie lobsterkatie force-pushed the kmclb-add-tracestate-header-handling branch from 5f3f8be to 3877df9 Compare September 2, 2021 21:38
No behavior changes - just moving stuff around.
# Conflicts:
#	sentry_sdk/tracing.py
Mostly docstrings and comments. Also a stray, no-longer-necessary variable initialization and a few log statements. Pulled from an upcoming PR in order to make it less noisy.
Three small changes here, mostly unrelated. If you're specifically interested in one of them, it's probably easiest to look at them commit by commit in the PR.

- Correct return type of `compute_tracestate_entry` to correctly return `None` when we're missing a client or DSN.

- Let transactions (effectively) point to themselves as the "containing transaction." In order to avoid a circular reference, this is handled through a new property on both spans and transactions. As a result of this change, we can eliminate a good deal of type-checking that we've been doing up until now.

- Switch `tracestate` creation to be lazy. Rather than always creating a value in the transaction constructor, we now wait until we need it - either for an outgoing HTTP header or for the envelope header when sending the transaction event.
)

This adds user data (specifically `id` and `segment`) and transaction name to the `tracetate` value. 

Doing this has two known limitations, which, though we've discussed them, I'm adding here for posterity:

1) Adding this data puts us over the character limit for tracestate values [listed in the W3C spec](https://www.w3.org/TR/trace-context/#value) (256 characters). For reference:

    Tracestate data:

    ```
    {
        "trace_id": "12312012123120121231201212312012",
        "environment": "dogpark",
        "release": "off.leash.park",
        "public_key": "dogsarebadatkeepingsecrets",
        "user": {"id": 12312013, "segment": "bigs"},
        "transaction": "/interactions/other-dogs/new-dog/"
    }
    ```

    `tracestate`  with without either: 196 characters
    `tracestate` with user data: 256 characters
    `tracestate` with transaction name: 264 characters
    `tracestate` with both: 324 characters

2) This data may change and/or get added to the scope after the tracestate value has been calculated, making it impossible to sample based on those attributes. This is especially a problem for transaction name, which in some frameworks isn't set to its final value until the transaction ends. This poses the added problem that the transaction name in its raw, un-finalized form may contain PII, because it is often the raw URL as opposed to the parameterized one (so, `/users/maisey/tricks/` rather than `/users/:username/tricks/`). More work needs to be done to investigate whether the final transaction name can be set earlier in any/all of the frameworks where this poses a problem. (For instance, it is a known problem in our Express integration, but not yet clear if it is a problem in any Python frameworks.)
This adds a feature flag, `"propagate_tracestate"`, to `_experiments`, which enables the tracestate header handling behavior.

In particular, if the flag is `False` or missing:

- `tracestate` headers won't be attached to outgoing HTTP requests, and
- `trace` data will not be added to envelope headers.
… garbage collection (#1184)

There are a few places in the SDK where we have circular references:

1) Transaction -> span recorder -> spans including transaction itself
2) Child span -> span recorder -> spans including child span itself
3) Transaction -> span recorder -> spans -> containing transaction
4) In `serializer.py`, `_serialize_node()` -> `_serialize_node_impl()` -> `_serialize_node()`, making each a closure for the other.

This PR addresses points 1-3*, by making the following changes:

1) Transactions are no longer added to their own span recorders. (The SDK doesn't ever use the fact that they're there, and they're stripped out before the event is sent to Sentry.)

2) Child spans no longer have their own span recorder pointer, and instead access the recorder through their containing transaction.

3) When a transaction ends, after it harvests any completed spans, it now jettisons its link to its span recorder before it (the transaction) goes out of scope.

It also adds to/modifies tests covering all three scenarios

*Point 4, which we only discovered in the process of fixing 1-3, concerns a different system than the rest, and therefore will need a separate fix.
@lobsterkatie lobsterkatie force-pushed the kmclb-add-tracestate-header-handling branch from 48cadf1 to 27240ee Compare September 13, 2021 20:47
@lobsterkatie lobsterkatie marked this pull request as ready for review September 13, 2021 21:38
@lobsterkatie lobsterkatie changed the title [WIP] feat(tracing): Add tracestate header handling feat(tracing): Add tracestate header handling Sep 13, 2021
Copy link
Member

@untitaker untitaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feedback: this PR is very hard to review. it appears to consist of already-reviewed PRs exclusively which have been reviewed by a different set of people sometimes. and now I can re-review 1.3k added lines at once

If we're going to release all of those changes at once I think it could be rather risky. It seems to me that we have not ever deployed any part of this into our own SaaS usage. I would suggest to split this PR back up into many PRs, merge those, run a (stable) release for each, deploy each release separately to SaaS. Otherwise I think the risk is too great.

Copy link
Member

@untitaker untitaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rubberstamping. apparently this featurebranch has been deployed to saas already at one point, and there's not many ways to reduce risk afaict, unfortunately

@lobsterkatie
Copy link
Member Author

feedback: this PR is very hard to review. it appears to consist of already-reviewed PRs exclusively which have been reviewed by a different set of people sometimes. and now I can re-review 1.3k added lines at once

If we're going to release all of those changes at once I think it could be rather risky. It seems to me that we have not ever deployed any part of this into our own SaaS usage. I would suggest to split this PR back up into many PRs, merge those, run a (stable) release for each, deploy each release separately to SaaS. Otherwise I think the risk is too great.

rubberstamping. apparently this featurebranch has been deployed to saas already at one point, and there's not many ways to reduce risk afaict, unfortunately

We talked about this offline, but for posterity:

This was a feature branch, so yes, it's a bunch of already-reviewed PRs. But we did actually already deploy it, in our back and front ends (see getsentry/sentry#24177 and https://github.com/getsentry/getsentry/pull/5192), at a point when 80-90 percent of the work had already been done.)

We can talk process for future work, but in the meantime, I'm going to merge this as agreed.

@lobsterkatie lobsterkatie merged commit 54bc81c into master Sep 16, 2021
@lobsterkatie lobsterkatie deleted the kmclb-add-tracestate-header-handling branch September 16, 2021 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants