Skip to content

ref(spans): Detect performance issues directly in segments consumer #86595

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 10, 2025

Conversation

jan-auer
Copy link
Member

@jan-auer jan-auer commented Mar 7, 2025

Creates a single step in the segments consumer that detects performance issues
and produces occurrences to the issue platform. This function no longer relies
on EventManager, but it does create an event payload internally.

Issue detectors have quite a few places where they read directly from the event
data dict. We're keeping this for now, so that issue detectors can be refactored
at a later point when issue detection in the transaction pipeline has been
disabled.

Changes:

  • Call detect_performance_issues directly in the consumer.
  • Simplify the function that creates the shim event dict for issue detectors.
  • Move topological sorting of spans into the performance detection module.
  • Inline producing issue occurrences into the detection step in the consumer.
    This will become a single side-effect step in the later platform.

Ref https://github.com/getsentry/streaming-planning/issues/67

@jan-auer jan-auer requested review from a team as code owners March 7, 2025 14:07
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Mar 7, 2025
def _detect_performance_problems(
jobs: Sequence[Job], projects: ProjectsMapping, is_standalone_spans: bool = False
) -> None:
def _detect_performance_problems(jobs: Sequence[Job], projects: ProjectsMapping) -> None:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We no longer need the is_standalone_spans flag here since we don't call this from the consumer anymore.

@@ -213,63 +99,100 @@ def _create_models(segment: Span, project: Project) -> None:
)


def transform_spans_to_event_dict(segment_span: Span, spans: list[Span]) -> dict[str, Any]:
event_spans: list[dict[str, Any]] = []
def _detect_performance_problems(segment_span: Span, spans: list[Span], project: Project) -> None:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please have a look at what this function does. It now:

  • Creates an event payload with as many fields as possible populated.
  • Runs that event payload through issue detectors.
  • Trims the event payload to a smaller subset.
  • Creates an issue occurrence and sends it into the issue platform.

I would suggest to keep this for now, until we've disabled issue detection in the old transactions pipeline. That will make it easier for the performance/issues teams to refactor issue detectors on span payloads and create this flow instead:

  • Run the segment's spans directly through issue detectors. Internally, detectors pull top-level attributes (like "browser.name") from the segment span.
  • Create a minimal event payload for the issue occurrence
  • Send it to the issue platform

Comment on lines +189 to +190
event_span["start_timestamp"] = span["start_timestamp_precise"]
event_span["timestamp"] = span["end_timestamp_precise"]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue detectors don't read the _precise timestamps from standalone spans yet, since Relay doesn't produce them for embedded spans in transaction events.

  1. We can inject the _precise fields into Event spans in Relay so that we can update detectors and get rid of this modification here.
  2. Alternatively, we always create timestamp and start_timestamp and operate on those in the entire pipeline + we pull the attributes from there in EAP ingest.

@mjq Is there already a definition of how spans should look wrt this?

Comment on lines +340 to +347
if standalone:
# The performance detectors expect the span list to be ordered/flattened in the way they
# are structured in the tree. This is an implicit assumption in the performance detectors.
# So we build a tree and flatten it depth first.
# TODO: See if we can update the detectors to work without this assumption so we can
# just pass it a list of spans.
tree, segment_id = build_tree(data.get("spans", []))
data = {**data, "spans": flatten_tree(tree, segment_id)}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was moved from the old spans consumer. I'd like to leave it to the owners of performance issue detection whether detectors should be updated or this code remains.

The only change here was that we prevent modifying the original payload.

Copy link

codecov bot commented Mar 7, 2025

Codecov Report

Attention: Patch coverage is 97.22222% with 2 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
...sentry/spans/consumers/process_segments/message.py 92.30% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #86595      +/-   ##
==========================================
- Coverage   87.79%   87.79%   -0.01%     
==========================================
  Files        9771     9771              
  Lines      553386   553369      -17     
  Branches    21640    21640              
==========================================
- Hits       485864   485809      -55     
- Misses      67151    67189      +38     
  Partials      371      371              

Comment on lines +144 to +145
event_data=event_data,
is_buffered_spans=True,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sends the trimmed down event payload along with the is_buffered_spans flag into the issue platform. @mrduncan we can further trim down the payload based on what is needed for the created issues to work.

Note that the issues are still created with PerformanceStreamedSpansGroupTypeExperimental - I didn't check what exactly this type does different. I just noticed that when I remove it, at least in my test there are no more occurrences created.

@jan-auer jan-auer merged commit f94aba6 into master Mar 10, 2025
49 checks passed
@jan-auer jan-auer deleted the ref/detect-perf-issues-spans branch March 10, 2025 10:28
jan-auer added a commit that referenced this pull request Mar 10, 2025
* master: (57 commits)
  ref(getting-started): Change code to always update loader script when products changes (#86583)
  ref(spans): Detect performance issues directly in segments consumer (#86595)
  ref(getting-started): Update copies, replacing 'Sentry dashboard' with 'Sentry Issues' (#86672)
  ref(product-selection): Update code to not strip url (#86582)
  ref(quick-start): Replace 'record' with 'create_or_update' in 'record_new_project' (#86663)
  ref(assemble): Remove old `find_missing_chunks` method (#86588)
  fix(views):Exclude newly added views from last visited update as well (#86653)
  feat(workflow_engine): Only execute enabled Detectors (#86652)
  fix(autofix): Github links, remove seer branding, and UI cleanup (#86640)
  fix(seer-issues-patch) More parsing of functions for Python (#86558)
  ref(ui): Remove sentry.eot (#86649)
  feat(alerts): Restrict uptime/crons overview buttons for alerts:write (#86436)
  chore(deps): bump axios from 1.7.7 to 1.8.2 (#86642)
  deps(ui): Upgrade prettier (#86634)
  deps(ui): Upgrade eslint, biome (#86630)
  fix(shared-views): Fix default view passing to last visited endpoint (#86632)
  feat(billing): update copy for payg disabled CTA (#86143)
  ref(ui): Remove usage of withOrganization from organizationAuthList (#86554)
  fix(crons): Fix disabled state of disable button (#86637)
  feat(ui): Replace OrganizationAuth DeprecatedAsyncComponent (#86556)
  ...
@github-actions github-actions bot locked and limited conversation to collaborators Mar 25, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants