Workflow 2.0: Issue Grouping #34970
Replies: 3 comments 1 reply
-
Airbrake for Ruby groups errors better than Sentry. I haven't checked their algorithm yet. Sorry for the not very useful comment.
-
I want to give an update on where we are with the work on grouping. Grouping is fundamentally a hard problem, and we will increase our investment in it going forward. We are likely going to break the problem into smaller chunks that can be optimized independently. We want to ensure that we get the best possible fingerprinting for stack traces and errors, and then separate this from the actual group creation process. This type of change might be quite fundamental to Sentry, which is why it will not be an immediate change. In the foreseeable future, expect the following things to happen:
There is now a repository for work related to grouping which will capture our documentation and plans. Right now there are still quite a few internal notes that need to be migrated, but it already contains a description of what makes fingerprinting hard and what constraints exist: https://github.com/getsentry/grouping-ideas
-
Example of a culprit: somehow this exception is getting swallowed and turned into a stackless log entry, and thus has no useful grouping. #37023 https://sentry.io/share/issue/12a98725a623428a8e51b32b76fba713/ Even if it were stackless and kept the exception in check, it would at least have the class name to use for grouping.
-
It’s been a couple of weeks since @dcramer’s initial discussion on Workflow 2.0. As a reminder, our focus is on addressing these two key areas:
We’ve identified a few paths to those outcomes and we are opening the discussion to the community. This conversation is focused on issue grouping, but we’d love to hear from you on issue notifications and performance issues as well.
Where We Are
Sometimes issues that Sentry detects as new are really duplicates of older issues. These failures in grouping lead to notifications that aren’t actionable and a worse in-product experience.
Making This Better
We will improve issue grouping by democratizing access to the algorithm internally and externally. This approach ensures that grouping is continually improved across platforms and communities as opposed to being a one-time enhancement.
We will also present opinions on the quality of the group using information beyond the event itself and refresh groups based on new information.
1. Better access to the grouping algorithm
In the past, enhancing grouping has been challenging because of how intertwined it is with our product code. This has also limited internal and external contributions to the grouping algorithm. We are looking at having language- and platform-specific configuration bundles in a separate repository on GitHub (example: getsentry/platform-tweaks) so that improvements to grouping and related modifications can be committed in the form of better default rules or algorithms.
With this mechanism, enhancements get added as a new grouping configuration to the main Sentry repository allowing us to iterate on our grouping strategy. This will also support gradual rollouts and improvements will be available to customers on a regular cadence without their intervention. The goal is to have a monthly cadence for grouping improvements.
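To make the gradual rollout idea concrete, here is a minimal sketch of how a project could be assigned deterministically to either the current or a candidate grouping configuration. Everything in it is an assumption for illustration: `GroupingConfig`, `rollout_bucket`, `select_config`, and the hash-based bucketing are hypothetical names and choices, not Sentry's actual rollout mechanism.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupingConfig:
    # Hypothetical identifier, e.g. one new config per release.
    config_id: str
    # Share of projects (0-100) that should receive this config.
    rollout_percent: int


def rollout_bucket(project_id: int, config_id: str) -> int:
    """Map a project to a stable bucket in [0, 100) for a given config."""
    digest = hashlib.sha256(f"{config_id}:{project_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def select_config(
    project_id: int, current: GroupingConfig, candidate: GroupingConfig
) -> GroupingConfig:
    """Use the candidate config only for projects inside its rollout share."""
    if rollout_bucket(project_id, candidate.config_id) < candidate.rollout_percent:
        return candidate
    return current
```

Because the bucket depends only on the project and the config identifier, a project would stay on the same side of the rollout between events, and raising `rollout_percent` only ever moves more projects onto the candidate.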
2. Not all Issues are equal
Today, our grouping algorithm relies exclusively on fingerprinting. We will augment this with other context we may have on the event, allowing us to show confidence levels on issues. For example, we could detect low-quality data and use this to inform our notification decisions. Specifically, we would attach this information to the event so that alerting can be automatically disabled for such events. Similarly, with platform-specific tweaks, the grouping algorithm could surface different types of signals that act as better indicators for alerts (example: network and socket errors are noisier and more global than an attribute error).
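As a rough illustration of what a confidence level could look like, the sketch below scores an event and uses the score to decide whether to alert. The `Event` shape, the penalty values, the `NOISY_TYPES` set, and the threshold are all made up for this example; none of them come from Sentry's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative only: exception types assumed to be noisy and global.
NOISY_TYPES = {"ConnectionError", "TimeoutError"}


@dataclass
class Event:
    exception_type: Optional[str] = None
    frames: List[str] = field(default_factory=list)


def grouping_confidence(event: Event) -> int:
    """Return a confidence score from 0 to 100 using hypothetical penalties."""
    score = 100
    if not event.frames:
        score -= 50  # no stack trace to fingerprint
    if event.exception_type is None:
        score -= 30  # not even a class name to fall back on
    elif event.exception_type in NOISY_TYPES:
        score -= 20  # network-style errors tend to be noisier
    return max(score, 0)


def should_alert(event: Event, threshold: int = 60) -> bool:
    """Suppress alerts for events whose grouping confidence is low."""
    return grouping_confidence(event) >= threshold
```

With these made-up numbers, an `AttributeError` with a stack trace would alert, while a stackless log entry with no exception type would not.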
3. Group fast but revisit with new data
We are evaluating a three-minute lookback window in which we sweep up multiple noisy issues and create a new “supergroup” from them. The user could then see a new type of issue comprising multiple related issues. Resolving the top-level issue could also resolve the related, contained issues.
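One way to picture the lookback sweep is the sketch below. It assumes that related issues can be recognized through a shared `similarity_key`, which is a placeholder: how relatedness would actually be determined is not specified here. `Issue`, `Supergroup`, and `sweep` are hypothetical names as well.

```python
from dataclasses import dataclass, field
from typing import Dict, List

LOOKBACK_SECONDS = 180  # the three-minute window under evaluation


@dataclass
class Issue:
    issue_id: int
    created_at: float  # seconds since some epoch
    similarity_key: str  # hypothetical signal that two issues are related
    resolved: bool = False


@dataclass
class Supergroup:
    members: List[Issue] = field(default_factory=list)

    def resolve(self) -> None:
        """Resolving the top-level issue resolves every contained issue."""
        for issue in self.members:
            issue.resolved = True


def sweep(issues: List[Issue], now: float) -> List[Supergroup]:
    """Collect issues created within the window that share a similarity key."""
    recent: Dict[str, List[Issue]] = {}
    for issue in issues:
        if 0 <= now - issue.created_at <= LOOKBACK_SECONDS:
            recent.setdefault(issue.similarity_key, []).append(issue)
    # Only keys with more than one issue form a supergroup.
    return [Supergroup(members) for members in recent.values() if len(members) > 1]
```

Issues older than the window, or with no related sibling, would stay as ordinary issues in this sketch.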
This is not an exhaustive list and there are other areas we’re looking into.
Again, we want to hear from you. Do any of the proposed solutions feel like they'd be helpful? Do you have other ideas we should consider? Please let us know in this conversation.