Workflow 2.0: Issue Grouping #34970

mitsuhiko · 2022-05-24T19:12:07Z

It’s been a couple of weeks since @dcramer’s initial discussion on Workflow 2.0. As a reminder, our focus is on addressing these two key areas:

Deliver an experience that ensures a developer is informed about relevant issues after a new release of their software.
Expand the Issues product to cover additional current and future concerns, including bringing first-class performance issues.

We’ve identified a few paths to those outcomes and we are opening the discussion to the community. This conversation is focused on issue grouping, but we’d love to hear from you on issue notifications and performance issues as well.

Where We Are

Sometimes issues that Sentry detects as new are really duplicates of older issues. These failures in grouping lead to notifications that aren’t actionable and a worse in-product experience.

Making This Better

We will improve issue grouping by democratizing access to the algorithm internally and externally. This approach ensures that grouping is continually improved across platforms and communities as opposed to being a one-time enhancement.

We will also present opinions on the quality of the group using information beyond the event itself and refresh groups based on new information.

1. Better access to the grouping algorithm

In the past, enhancing grouping has been challenging because of how intertwined it is with our product code. This has also limited internal and external contributions to the grouping algorithm. We are looking at having language- and platform-specific configuration bundles in a separate repository on GitHub (example: getsentry/platform-tweaks) so that improvements to grouping and related modifications can be committed in the form of better default rules or algorithms.

With this mechanism, enhancements get added as a new grouping configuration to the main Sentry repository, allowing us to iterate on our grouping strategy. This will also support gradual rollouts, and improvements will be available to customers on a regular cadence without their intervention. The goal is a monthly cadence for grouping improvements.

2. Not all Issues are equal

Today, our grouping algorithm relies exclusively on fingerprinting. We will augment this with other context we may have on the event, allowing us to show confidence levels on issues. For example, we could detect low-quality data and use this to inform our notification decisions. Specifically, we would attach this information to the event so that alerting can be automatically disabled for such events. Similarly, with platform-specific tweaks, the grouping algorithm could surface different types of signals that can act as a better indicator for alerts (example: network and socket errors are noisier and more global than an attribute error).
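
One way to picture the confidence idea is as a small scoring step that runs alongside fingerprinting. The sketch below is purely illustrative: the `Event` fields, the penalty weights, the 0.5 threshold and the list of noisy error types are all assumptions made up for this example, not Sentry's data model or algorithm.

```python
from dataclasses import dataclass

# Hypothetical set of error types treated as noisy, following the
# "network and socket errors" example above.
NOISY_TYPES = {"ConnectionError", "TimeoutError", "socket.error"}


@dataclass
class Event:
    error_type: str
    frame_count: int          # number of stack frames on the event
    in_app_frame_count: int   # frames that belong to the user's own code


def grouping_confidence(event: Event) -> float:
    """Return a score in [0, 1]; lower means the group is less trustworthy."""
    score = 1.0
    if event.frame_count == 0:
        score -= 0.5   # no stack trace: grouping relies on weak signals
    elif event.in_app_frame_count == 0:
        score -= 0.25  # only library frames: harder to attribute
    if event.error_type in NOISY_TYPES:
        score -= 0.25  # noisy, often global failure modes
    return max(score, 0.0)


def should_alert(event: Event, threshold: float = 0.5) -> bool:
    """Suppress alerts when the grouping confidence is below the threshold."""
    return grouping_confidence(event) >= threshold
```

With these made-up weights, a stackless `ConnectionError` scores 0.25 and would not alert, while an `AttributeError` with in-app frames scores 1.0 and would.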

3. Group fast but revisit with new data

We are evaluating a three-minute lookback window in which we sweep up multiple noisy issues and create a new “supergroup” from them. The user could see a new type of issue comprising multiple related issues. Resolving the top-level issue could also resolve the related/contained issues.
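
As a rough sketch of how such a sweep could work: collect the issues first seen inside the window, bucket them by some similarity key, and wrap each bucket of two or more into a supergroup. The `Issue` and `Supergroup` types, the `related_key` field and the "at least two issues" rule are assumptions for illustration only; the proposal does not specify how related issues are detected.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

LOOKBACK = timedelta(minutes=3)  # window size taken from the proposal above


@dataclass
class Issue:
    issue_id: int
    first_seen: datetime
    related_key: str  # hypothetical similarity key, e.g. a coarse fingerprint


@dataclass
class Supergroup:
    related_key: str
    issues: list = field(default_factory=list)
    resolved: bool = False

    def resolve(self) -> list:
        """Resolve the supergroup and return the ids of contained issues."""
        self.resolved = True
        return [issue.issue_id for issue in self.issues]


def sweep(issues, now):
    """Collect issues created within the lookback window into supergroups.

    Only keys shared by at least two recent issues produce a supergroup.
    """
    by_key = {}
    for issue in issues:
        if now - issue.first_seen <= LOOKBACK:
            by_key.setdefault(issue.related_key, []).append(issue)
    return [
        Supergroup(related_key=key, issues=members)
        for key, members in by_key.items()
        if len(members) >= 2
    ]
```

Calling `resolve()` on a supergroup returns the ids of the contained issues, which is where resolving the top-level issue could cascade to its members.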

This is not an exhaustive list and there are other areas we’re looking into.

Again, we want to hear from you. Do any of the proposed solutions feel like they'd be helpful? Do you have other ideas we should consider? Please let us know in this conversation.

ilyazub · 2022-06-10T12:35:46Z

Airbrake for Ruby groups errors better than Sentry. Haven't checked their algorithm yet.

Sorry for the not really useful comment.

1 reply

mitsuhiko Jun 22, 2022
Author

I believe airbrake creates larger groups by only taking a single frame into account if the documentation is correct: https://docs.airbrake.io/docs/features/error-grouping/

mitsuhiko · 2022-07-04T10:07:11Z

Author

I want to give an update here on where we are with the work on grouping. Grouping is fundamentally a pretty hard problem, and we will increase our investment in it going forward. We are likely going to break the problem into smaller chunks that can be optimized independently. We want to ensure that we get the best possible fingerprinting for stack traces and errors, and then separate this from the actual group creation process.

This type of change might be quite fundamental to Sentry, which is why it will not be an immediate change. In the foreseeable future, expect the following things to happen:

We will add support for automatically upgrading grouping rules. Historically we have optimized for a consistent grouping experience for projects, which meant that once a project was created it kept the same grouping rules until a user pressed the upgrade button, at which point new groups were likely created. We found a way to make this transition a bit smoother, and with that our goal is now to allow automatic upgrading. We're doing this because we found that the vast majority of customers never upgrade their rules, so improvements are invisible to most. Worse, once improvements have landed, it's an emotional roller-coaster to actually click the upgrade button.
We are going to invest in fingerprinting. With automatic grouping upgrades in place, we will keep upgrading the fingerprinting algorithm to create the best stack and error fingerprints we can for each specific platform, and cut new grouping versions regularly. More importantly, we're looking at overhauling our tech here and making it usable in isolation as well. The goal is that both our internal platform engineers and interested contributors can help improve fingerprinting. With that in mind, we will try to separate our fingerprinting algorithm so it's usable even outside of Sentry, in a separate repository (or at least as an independent library within the main sentry repo).
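
To make "usable in isolation" a bit more concrete, here is a minimal sketch of fingerprinting as a pure function with no dependency on the rest of the product. The `Frame` type, the choice to hash module and function names of in-app frames, and the `example-v1` version string are assumptions for illustration; this is not Sentry's actual fingerprinting algorithm.

```python
import hashlib
from typing import NamedTuple, Sequence


class Frame(NamedTuple):
    module: str
    function: str
    in_app: bool


def fingerprint(frames: Sequence[Frame], version: str = "example-v1") -> str:
    """Hash the in-app frames of a stack trace into a stable fingerprint.

    Frames carry no line numbers here, so unrelated edits to a file do not
    change the result. Falls back to all frames when none are marked in-app.
    """
    relevant = [f for f in frames if f.in_app] or list(frames)
    digest = hashlib.sha1()
    # Mixing in the version means a new grouping version yields new fingerprints.
    digest.update(version.encode("utf-8"))
    for frame in relevant:
        digest.update(b"\x00")  # separator between frames
        digest.update(f"{frame.module}:{frame.function}".encode("utf-8"))
    return digest.hexdigest()
```

Because the function only depends on its inputs, it could be tested and iterated on per platform without running the rest of the system, which is the property the separation is after.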

There is now a repository for work related to grouping which will capture our documentation and plans. Right now there are still quite a few internal notes that need to be migrated, but it already contains a description of what makes fingerprinting hard and what constraints exist: https://github.com/getsentry/grouping-ideas

dcramer · 2022-07-26T17:49:02Z

Maintainer

Example of a culprit - somehow this exception is getting swallowed and turned into a stackless log entry, and thus has no useful grouping.

#37023 https://sentry.io/share/issue/12a98725a623428a8e51b32b76fba713/

Even if it was stackless and kept the exception in check it would at least have the class name it could utilize.
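
A fallback chain along these lines is sketched below: use the stack trace when there is one, otherwise the exception class name, and only then the raw message. The function name `grouping_key` and its signature are hypothetical and only meant to illustrate the point, not to describe how Sentry selects grouping components.

```python
from typing import Optional, Sequence


def grouping_key(
    frames: Sequence[str],
    exception_type: Optional[str],
    message: str,
) -> tuple:
    """Pick the strongest available signal to group an event by."""
    if frames:
        return ("stacktrace", tuple(frames))
    if exception_type:
        # Stackless, but the class name still separates unrelated errors.
        return ("exception_type", exception_type)
    # Weakest signal: a plain log message with no exception attached.
    return ("message", message)
```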

zsd4yr · 2025-02-06T17:51:33Z

Would love to see more granularity here for arbitrary groupings -- in app, in sdk, in engine, in renderer, etc... just anything and everything we want instead of just app/not app
