Opened on Oct 12, 2020
Summary of the problem (If there are multiple problems or use cases, prioritize them)
We have lots of data points but they are siloed
A user receives an alert from a signal (e.g. a failed test), but the related underlying data we hold cannot be shown in a single view
It takes a lot of time to troubleshoot an error
User stories
As an Operations Engineer/SRE/DevOps Engineer
I need to be made aware of functional issues on my website(s) that are potentially impacting end users and see contextual information to understand potential root causes
So that I can take meaningful action to resolve the problem in the most efficient manner
The error details page includes the following: screenshot filmstrip, page load waterfall (failing objects highlighted), history of recent runs, synopsis of steps up to the failed step, availability percentage by geolocation, APM trace for that specific test run, all log entries that fall within the period the test was run, relevant metrics, and RUM-generated information for the identified page/step for the impacted period
The error details page has a persistent link which can be shared
There is a free-text notes field that allows end users to record notes about the event or to claim the event
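To make the criteria above easier to discuss, here is a minimal sketch of how a single error details record might tie a failed run to its logs, a shareable link, and a notes field. It is not an existing API: every name (`SyntheticRun`, `LogEntry`, `ErrorDetails`, `logs_for_run`, `build_error_details`) and the `/errors/<run_id>` link format are hypothetical, and only the log-matching criterion is worked out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class LogEntry:
    # Hypothetical shape of one log line held in a separate data silo.
    timestamp: datetime
    message: str


@dataclass
class SyntheticRun:
    # Hypothetical record of one test run that raised the alert.
    run_id: str
    started_at: datetime
    finished_at: datetime
    failed_step: Optional[int] = None  # index of the failed step, if any


@dataclass
class ErrorDetails:
    # Single view that gathers data related to one failed run.
    run: SyntheticRun
    logs: List[LogEntry] = field(default_factory=list)
    notes: str = ""  # free-text notes, e.g. to claim the event

    @property
    def permalink(self) -> str:
        # Assumed link format; stable because it depends only on the run id.
        return f"/errors/{self.run.run_id}"


def logs_for_run(
    run: SyntheticRun,
    entries: List[LogEntry],
    padding: timedelta = timedelta(0),
) -> List[LogEntry]:
    # Keep entries whose timestamp falls within the run window (inclusive),
    # optionally widened by `padding` on both sides.
    start = run.started_at - padding
    end = run.finished_at + padding
    return [e for e in entries if start <= e.timestamp <= end]


def build_error_details(run: SyntheticRun, entries: List[LogEntry]) -> ErrorDetails:
    # Assemble the single view from the run and the matching log entries.
    return ErrorDetails(run=run, logs=logs_for_run(run, entries))
```

The remaining items (filmstrip, waterfall, APM trace, metrics, RUM data) could be added as further fields on the same record, each fetched by run id or by the same time window.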