Our core workflow has begun to suffer over the years - from the growth of technology, the scale of data, and a general lack of attention. Recognizing this, we had a conversation internally this week and are setting out to tackle this concern. That core workflow includes several components that are loosely coupled together:
- the "release" metadata, which allows us to identify the version of code an event is present in
- the "new alert" (and other variations) approach to notifications
- the "new deployment" notification, which requires a good amount of effort to achieve correctness
- the "resolved in release" and "ignore issue" actions, and the general triage flow
- the "regression" notification, which is key to understanding if an issue remains unresolved
These all string together to create a workflow that is tightly coupled to how I - and I believe most developers - think about shipping code. We dry run a bunch of changes, pray our tests are accurate enough, ship the code, and inevitably find a problem. The faster Sentry can connect those dots with accurate diagnostics, the better the outcomes for our customers.
The challenge here is that there's a fundamental gap in how the workflow notifies about issues. We rely on "New Issue" to be timely and contextual. That is, we duct-taped together a solution that assumed new issues were usually caused by new code. That's not always true, and technologies like JavaScript have decreased the signal-to-noise ratio over the years. So let's tackle that problem. There are a few key things we should look at as part of this:
- What does a timely notification look like when it's connected to the release lifecycle? It should focus on helping identify truly "new issues caused by code changes".
- How do those notifications change (or become additive) with things like code push or feature flags, where we're making behavioral changes without shipping a new SHA?
- How can we make things like "Ignore Issue" more functional and usable? We shouldn't rely on custom alerts or custom saved searches for such common concerns.
- "Resolved in Next Release" can be problematic in many environments, but we now have SemVer support. Can we leverage that better?
- Where is fingerprinting breaking down? Are there platforms where our native heuristics are ineffective and need to be fixed?
- How does this apply to performance? To CSP? To other "problems" (à la "issues")? Issues needs to be a platform that enables the workflow.
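To make the "Resolved in Next Release" and SemVer question a little more concrete, here is a minimal Python sketch of one possible rule. It is not Sentry's implementation; it assumes a hypothetical policy where an issue resolved in version V counts as regressed only when an event arrives from a release that sorts at or after V under SemVer 2.0.0 precedence. The names `parse_semver` and `is_regression` are hypothetical. The point it illustrates is that version ordering, not release creation time, decides the outcome, so late events from old clients do not reopen the issue.

```python
import re

# SemVer 2.0.0: MAJOR.MINOR.PATCH, optional -prerelease, optional +build.
_SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+[0-9A-Za-z.-]+)?$"
)


def parse_semver(version: str):
    """Return a sort key following SemVer precedence; build metadata is ignored."""
    m = _SEMVER.match(version)
    if m is None:
        # e.g. a bare commit SHA: no ordering can be inferred from it.
        raise ValueError(f"not a SemVer version: {version!r}")
    major, minor, patch = (int(g) for g in m.group(1, 2, 3))
    pre = m.group(4)
    if pre is None:
        # A final release sorts after all of its pre-releases.
        pre_key = (1,)
    else:
        idents = []
        for ident in pre.split("."):
            # Numeric identifiers compare numerically and sort before
            # alphanumeric ones, which compare lexically.
            idents.append((0, int(ident), "") if ident.isdigit() else (1, 0, ident))
        # Tuple comparison makes a longer identifier list win when the prefix is equal.
        pre_key = (0, tuple(idents))
    return (major, minor, patch, pre_key)


def is_regression(resolved_in: str, event_release: str) -> bool:
    """Hypothetical rule: an issue resolved in `resolved_in` regresses only
    when an event comes from that release or a later one."""
    return parse_semver(event_release) >= parse_semver(resolved_in)


if __name__ == "__main__":
    print(is_regression("2.4.0", "2.3.9"))   # old client still reporting
    print(is_regression("2.4.0", "2.10.0"))  # numeric, not string, ordering
```

Note that plain string comparison would put "2.10.0" before "2.4.0", and that the sketch deliberately rejects non-SemVer release names, which is exactly where the environment-dependent problems would remain.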
Most importantly, with all this in mind, how do we fix the workflow without requiring customers to configure anything? We've - for better, or more likely worse - taken the approach over the years of putting this problem in our customers' hands: giving them complicated solutions for creating alerts and complicated solutions for improving fingerprinting, and we've stopped trying to build a first-class, curated solution to the problem. This is our chance to correct that.
For the community, if you have opinions, what are they? What can we do better here? What works well? What is completely terrible? We already have some solid foundations, but if you've got an opinion, let's hear it!