Update net triage docs.
In particular, rework triage instructions, and clean up
net-internals doc a little.

NOTRY=true
BUG=none

Review URL: https://codereview.chromium.org/1774733002

Cr-Commit-Position: refs/heads/master@{#380421}
mmenke authored and Commit bot committed Mar 10, 2016
1 parent 381b049 commit 212fe43
Showing 3 changed files with 155 additions and 154 deletions.
148 changes: 56 additions & 92 deletions net/docs/bug-triage-suggested-workflow.md
@@ -2,59 +2,6 @@

[TOC]

## Looking for new crashers

1. Go to [go/chromecrash](https://goto.google.com/chromecrash).

2. For each platform, decide which releases to investigate. As per
bug-triage.txt, this should be the most recent canary,
the previous canary (if the most recent is less than a day old), and any of
dev/beta/stable that were released in the last couple of days.

3. For each release, in the "Process Type" frame, click on "browser".

4. At the bottom of the "Magic Signature" frame, click "limit 1000". Reported
crashers are sorted in decreasing order of the number of reports for that
crash signature.

5. Search the page for *"net::"*.

6. For each found signature:
* If there is a bug already filed, make sure it is correctly describing the
current bug (e.g. not closed, or not describing a long-past issue), and
make sure that if it is a *net* bug, that it is labeled as such.
* Ignore signatures that only occur once, as memory corruption can easily
cause one-off failures when the sample size is large enough.
* Ignore signatures that only come from a single client ID, as individual
machine malware and breakage can also easily cause one-off failures.
* Click on the number of reports field to see details of crash. Ignore it
if it doesn't appear to be a network bug.
* Otherwise, file a new bug directly from chromecrash. Note that this may
result in filing bugs for low- and very-low- frequency crashes. That's
ok; the bug tracker is a better tool to figure out whether or not we put
resources into those crashes than a snap judgement when filing bugs.
* For each bug you file, include the following information:
* The backtrace. Note that the backtrace should not be added to the
bug if Restrict-View-Google isn't set on the bug as it may contain
PII. Filing the bug from the crash reporter should do this
automatically, but check.
* The channel in which the bug is seen (canary/dev/beta/stable), its
frequency in that channel, and its rank among crashers in the
channel.
* The frequency of this signature in recent releases. This information
is available by:
1. Clicking on the signature in the "Magic Signature" list
2. Clicking "Edit" on the dremel query at the top of the page
3. Removing the "product.version='X.Y.Z.W' AND" string and clicking
"Update".
4. Clicking "Limit 1000" in the Product Version list in the
resulting page (without this, the listing will be restricted to
the releases in which the signature is most common, which will
often not include the canary/dev release being investigated).
5. Choose some subset of that list, or all of it, to include in the
bug. Make sure to indicate if there is a defined point in the
past before which the signature is not present.

## Identifying unlabeled network bugs on the tracker

* Look at new unconfirmed bugs since noon PST on the last triager's rotation.
@@ -72,8 +19,7 @@
related. Be sure to check if other bug reports have that stack trace, and
mark as a dupe if so. Even if the bug isn't network related, paste the stack
trace in the bug, so no one else has to look up the crash stack from the ID.
* If there's no other information than the crash ID, ask for more details
and add the Needs-Feedback label.
* If there's just a blank form and a crash ID, just ignore the bug.

* If network causes are possible, ask for a net-internals log (If it's not a
browser crash) and attach the most specific internals-network label that's
@@ -96,11 +42,10 @@

* Look through unconfirmed and untriaged component=Internals>Network bugs,
prioritizing those updated within the last week. [Use this issue tracker
query](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=component%3DInternals%3ENetwork+-status%3AAssigned+-status%3AStarted+-status%3AAvailable&sort=-modified&colspec=ID+Pri+M+Stars+ReleaseBlock+Component+Status+Owner+Summary+OS+Modified&x=m&y=releaseblock&cells=ids).
query](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=component%3DInternals%3ENetwork+status%3AUnconfirmed,Untriaged+-label:Needs-Feedback&sort=-modified).

* If more information is needed from the reporter, ask for it and add the
Needs-Feedback label. If the reporter has answered an earlier request for
information, remove that label.
Needs-Feedback label.

* While investigating a new issue, change the status to Untriaged.

@@ -112,7 +57,8 @@
mark it with component Privacy.

* For bugs that already have a more specific network component, go ahead and
remove the Internals>Network component and move on.
remove the Internals>Network component to get them off the next triager's
radar and move on.

* Try to figure out if it's really a network bug. See common non-network
components section for description of common components for issues incorrectly
@@ -161,14 +107,7 @@
subcomponent applies, or only the Internals>Network>HTTP subcomponent
applies, and there's no clear owner), try to figure out the exact cause.
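
The issue tracker query linked earlier in this section packs its search terms into the URL's `q` parameter. As a rough sketch, the Python below decodes the updated query URL with the standard library so the filter is easier to read and adjust. The helper name `decode_tracker_query` is hypothetical and not part of any triage tooling.

```python
from urllib.parse import parse_qs, urlsplit

# Query URL copied from the triage instructions above.
TRIAGE_QUERY_URL = (
    "https://bugs.chromium.org/p/chromium/issues/list?can=2"
    "&q=component%3DInternals%3ENetwork+status%3AUnconfirmed,Untriaged"
    "+-label:Needs-Feedback&sort=-modified"
)


def decode_tracker_query(url):
    """Return the decoded query parameters of an issue tracker URL."""
    params = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; each key appears once in this URL.
    return {key: values[0] for key, values in params.items()}


decoded = decode_tracker_query(TRIAGE_QUERY_URL)
print(decoded["q"])
# component=Internals>Network status:Unconfirmed,Untriaged -label:Needs-Feedback
```

Read this way, the query selects Unconfirmed or Untriaged issues in Internals>Network that do not carry the Needs-Feedback label, sorted by most recently modified.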

## Monitoring UMA histograms and Chirp/Gasper alerts

Sign up to chrome-network-debugging@google.com mailing list to receive automated
e-mails about UMA alerts. Chirp is the new alert system, sending automated
e-mails with sender address finch-chirp@google.com. Gasper is the old alert
system, sending automated e-mails with sender address gasper-alerts@google.com.
While Chirp is of higher priority, Gasper is not deprecated yet, so both alerts
should be monitored for the time being.
## Investigate UMA notifications

For each alert that fires, determine if it's a real alert and file a bug if so.

@@ -186,8 +125,56 @@ For each alert that fires, determine if it's a real alert and file a bug if so.
* SimpleCache on Windows
* DiskCache on Android.

For each alert, respond to chrome-network-debugging@google.com with a summary of
the action you've taken and why, including issue link if an issue was filed.
## Looking for new crashers

1. Go to [go/chromecrash](https://goto.google.com/chromecrash).

2. For each platform, decide which releases to investigate. As per
bug-triage.txt, this should be the most recent canary,
the previous canary (if the most recent is less than a day old), and any of
dev/beta/stable that were released in the last couple of days.

3. For each release, in the "Process Type" frame, click on "browser".

4. At the bottom of the "Magic Signature" frame, click "limit 1000" (or reduce
the limit to 100 first, since that is all the triager needs to look at).
Reported crashers are sorted in decreasing order of the number of reports for
that crash signature.

5. Search the page for *"net::"*.

6. For each found signature:
* Ignore signatures that only occur once or twice, as memory corruption can
easily cause one-off failures when the sample size is large enough. Also
ignore crashers that are not in the top 100 for that platform / release.
* If there is a bug already filed, make sure it is correctly describing the
current bug (e.g. not closed, or not describing a long-past issue), and
make sure that if it is a *net* bug, that it is labeled as such.
* Ignore signatures that only come from one or two client IDs, as individual
machine malware and breakage can cause one-off failures.
* Click on the number of reports field to see details of crash. Ignore it
if it doesn't appear to be a network bug.
* Otherwise, file a new bug directly from chromecrash.
* For each bug you file, include the following information:
* The backtrace. Note that the backtrace should not be added to the
bug if Restrict-View-Google isn't set on the bug as it may contain
PII. Filing the bug from the crash reporter should do this
automatically, but check.
* The channel in which the bug is seen (canary/dev/beta/stable), and its
rank among crashers in the channel.
* The frequency of this signature in recent releases. This information
is available by:
1. Clicking on the signature in the "Magic Signature" list
2. Clicking "Edit" on the dremel query at the top of the page
3. Removing the "product.version='X.Y.Z.W' AND" string and clicking
"Update".
4. Clicking "Limit 1000" in the Product Version list in the
resulting page (without this, the listing will be restricted to
the releases in which the signature is most common, which will
often not include the canary/dev release being investigated).
5. Choose some subset of that list, or all of it, to include in the
bug. Make sure to indicate if there is a defined point in the
past before which the signature is not present.
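
The frequency filters in step 6 can be summarized as a small predicate. The following is only a sketch: the `CrashSignature` record, its field names, and the example signatures are hypothetical, and the thresholds are one reading of "once or twice", "one or two client IDs", and "top 100" from the list above. Deciding whether a crash is really a network bug still requires looking at the report.

```python
from dataclasses import dataclass


@dataclass
class CrashSignature:
    # Hypothetical record; the field names are illustrative only.
    name: str
    rank: int          # 1 = most frequently reported signature
    report_count: int
    client_count: int  # number of distinct client IDs


def worth_investigating(sig, max_rank=100, min_reports=3, min_clients=3):
    """Apply the frequency filters from step 6 to one signature."""
    return (
        "net::" in sig.name
        and sig.rank <= max_rank
        and sig.report_count >= min_reports
        and sig.client_count >= min_clients
    )


signatures = [
    CrashSignature("net::ExampleA", rank=12, report_count=40, client_count=31),
    CrashSignature("net::ExampleB", rank=57, report_count=2, client_count=2),
    CrashSignature("net::ExampleC", rank=140, report_count=9, client_count=8),
    CrashSignature("ui::ExampleD", rank=3, report_count=900, client_count=700),
]
candidates = [s.name for s in signatures if worth_investigating(s)]
print(candidates)  # ['net::ExampleA']
```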

## Investigating crashers

@@ -221,26 +208,3 @@ the action you've taken and why, including issue link if an issue was filed.

* Load crash dumps, try to figure out a cause. See
http://www.chromium.org/developers/crash-reports for more information

## Dealing with old bugs

* For all network issues (Even those with owners, or a more specific component):

* If the issue has had the Needs-Feedback label for over a month, verify it
is waiting on feedback from the user. If not, remove the label.
Otherwise, go ahead and mark the issue WontFix due to lack of response
and suggest the user file a new bug if the issue is still present. [Use
this issue tracker query for old Needs-Feedback
issues](https://code.google.com/p/chromium/issues/list?can=2&q=component%3AInternals>Network%20Needs=Feedback+modified-before%3Atoday-30&sort=-modified).

* If a bug is over 2 months old, and the underlying problem was never
reproduced or really understood:
* If it's over a year old, go ahead and mark the issue as Archived.
* Otherwise, ask reporters if the issue is still present, and attach
the Needs-Feedback label.

* Old unconfirmed or untriaged Internals>Network issues can be investigated
just like newer ones. Crashers should generally be given higher priority,
since we can verify if they still occur, and then newer issues, as they're
more likely to still be present, and more likely to have a still responsive
bug reporter.
141 changes: 89 additions & 52 deletions net/docs/bug-triage.md
@@ -6,16 +6,16 @@ label seems suitable.

## Responsibilities

### Required:
* Identify new crashers
* Identify new network issues.
* Request data about recent Internals>Network issue.
* Investigate each recent Internals>Network issue.
* Monitor UMA histograms and Chirp/Gasper alerts.
### Required, in rough order of priority:
* Identify new network bugs on the tracker.
* Investigate UMA notifications.
* Investigate recent Internals>Network issues with no subcomponent.
* Follow up on Needs-Feedback issues for all network components.
* Identify and file bugs for significant new crashers.

### Best effort:
* Investigate unowned and owned-but-forgotten net/ crashers
* Investigate old bugs
* Investigate unowned and owned-but-forgotten net/ crashers.
* Investigate old bugs.
* Close obsolete bugs.

All of the above is to be done on each rotation. These responsibilities should
@@ -30,67 +30,104 @@ uniform, predictable two day commitment for all triagers.

### Required:

* Identify new crashers that are potentially network related. You should check
the most recent canary, the previous canary (if the most recent is less than a
day old), and any of dev/beta/stable that were released in the last couple of
days, for each platform. File Internals>Network bugs on the tracker when
new crashers are found.

* Identify new network bugs, both on the bug tracker and on the crash server.
All Unconfirmed issues filed during your triage rotation should be scanned,
and, for suspected network bugs, a network component assigned. A triager is
responsible for looking at bugs reported from noon PST / 3:00 pm EST of the
last day of the previous triager's rotation until the same time on the last
day of their rotation.

* Investigate each recent (new comment within the past week or so)
Internals>Network issue, driving getting information from reporters as
needed, until you can do one of the following:
* Identify new network bugs on the bug tracker. All Unconfirmed issues filed
during your triage rotation should be scanned, and, for suspected network
bugs, a network component assigned and an about:net-internals log requested.
A triager is responsible for looking at bugs reported from noon PST / 3:00 pm
EST of the last day of the previous triager's rotation until the same time on
the last day of their rotation. Once you've changed labels on a bug, mark it
Untriaged, so other triagers sorting through Unconfirmed bugs won't see it.

* For desktop bugs, ask for a net-internals log and give the user a link to
https://sites.google.com/a/chromium.org/dev/for-testers/providing-network-details
(A link there appears on about:net-internals, for easy reference) for
instructions. On mobile, point them to about:net-export. In either case,
attach the Needs-Feedback label.

* Investigate UMA notifications.

* UMA notifications ("chirps") are alerts based on UMA histograms that are
sent to chrome-network-debugging@google.com. Triagers should subscribe
to this list. When an alert fires, the triager should determine if the
alert looks to be real and file a bug with the appropriate label if so.
Note that if no label more specific than Internals>Network is appropriate,
the responsibility remains with the triager to continue investigating the
bug, as above.

* The triager is responsible for looking at any notification previous
triagers did not, so when an issue is investigated, the person who did
so should respond to chrome-network-debugging@google.com with a short
email describing their conclusions. Future triagers can then use the
fact that an alert was responded to as an indicator of which alerts still
need to be followed up on.

* Investigate [Unconfirmed / Untriaged Internals>Network issues that don't
belong to a more specific network component](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=component%3DInternals%3ENetwork+status%3AUnconfirmed,Untriaged+-label:Needs-Feedback&sort=-modified),
prioritizing the most recent issues, ones with the most responsive reporters,
and major crashers. This will generally take up the majority of your time as
triager. Continue digging until you can do one of the following:

* Mark it as *WontFix* (working as intended, obsolete issue) or a
duplicate.

* Mark it as a feature request.

* Remove the Internals>Network component, replacing it with at least one
more specific network component or non-network component. Promptly adding
non-network components when appropriate is important to get new bugs in front
of someone familiar with the relevant code, and to remove them from the
next triager's radar. Because of the way the bug report wizard works, a
lot of bugs incorrectly end up with the network component.

* The issue is assigned to an appropriate owner.

* If there is no more specific component for a bug, it should be investigated
until we have a good understanding of the cause of the problem, and some
idea how it should be fixed, at which point its status should be set to
Available. Future triagers should ignore bugs with this status, unless
investigating stale bugs.
* Mark it as Needs-Feedback.

* Monitor UMA histograms and Chirp/Gasper alerts.

* For each Chirp and Gasper alert that fires, the triager should determine
if the alert is real (not due to noise), and file a bug with the
appropriate component if so. Note that if no component more specific than
Internals>Network is appropriate, the responsibility remains with the
triager to continue investigating the bug, as above.
* Remove the Internals>Network component, replacing it with at least one
more specific network component or non-network component. Replacing the
Internals>Network component gets it off the next triager's radar, and
in front of someone more familiar with the relevant code. Note that
due to the way the bug report wizard works, a lot of bugs incorrectly end
up with the network component.

* Assign the issue to an appropriate owner, and make sure to mark it
as "assigned" so the next triager doesn't run into it.

* If there is no more specific component for a bug, it should be
investigated by the triager until we have a good understanding of the
cause of the problem, and some idea how it should be fixed, at which point
its status should be set to Available. Future triagers should ignore bugs
with this status, unless investigating stale bugs.

* Follow up on [Needs-Feedback issues for all components owned by the network
stack team](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=component%3AInternals%3ENetwork%2CUI>Browser>Downloads+-component%3AInternals%3ENetwork%3EDataProxy+-component%3AInternals%3ENetwork%3EDataUse+-component%3AInternals%3ENetwork%3EVPN+Needs%3DFeedback).

* Remove label once feedback is provided. Continue to investigate, if
the previous section applies.

* If the Needs-Feedback label has been present for one week, ping the
reporter.

* Archive after two weeks with no feedback, telling users to file a new
bug if they still have the issue, with the requested information, unless
the reporter indicates they'll provide data when they can. In that case,
use your own judgment for further pings or archiving.

* Identify significant new browser process
[crashers](https://goto.google.com/chromecrash) that are potentially network
related. You should look at crashes for the most recent canary that has at
least a day of data, and if there's been a dev or beta release from the start
of the last triager's shift to the start of yours, you should also look at
that once it has at least a day of data. Recent releases are listed
[here](https://omahaproxy.appspot.com/). If both dev and beta have been
released in that period, just look at beta. File Internals>Network bugs on
the tracker when new crashers are found. Bugs should only be filed for
crashes that are both in the top 100 for each release and occurred for more
than two users.
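
The Needs-Feedback follow-up schedule described in the list above (ping after one week, archive after two weeks without feedback) can be sketched as a small decision function. This is an illustration only: the function name, its parameters, and the returned strings are hypothetical, and the "use judgment" branch stands in for the case the text leaves to the triager.

```python
from datetime import date, timedelta


def needs_feedback_action(label_added, today, feedback_received=False,
                          reporter_promised_data=False):
    """Suggest the next step for an issue carrying Needs-Feedback."""
    if feedback_received:
        return "remove label"
    waiting = today - label_added
    if waiting >= timedelta(weeks=2):
        # The text leaves this case to the triager's judgment.
        return "use judgment" if reporter_promised_data else "archive"
    if waiting >= timedelta(weeks=1):
        return "ping reporter"
    return "wait"


print(needs_feedback_action(date(2016, 3, 1), date(2016, 3, 10)))
# ping reporter
```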

### Best Effort (As you have time):

* Investigate old bugs, and bugs associated with Internals>Network
subcomponents.

* Investigate unowned and owned but forgotten net/ crashers that are still
occurring (As indicated by
[go/chromecrash](https://goto.google.com/chromecrash)), prioritizing frequent
and long standing crashers.

* Investigate old bugs, prioritizing the most recent.

* Close obsolete bugs.

If you've investigated an issue (in code you don't normally work on) to an
extent that you know how to fix it, and the fix is simple, feel free to take
ownership of the issue and create a patch while on triage duty, but other tasks
should take priority.

See [bug-triage-suggested-workflow.md](bug-triage-suggested-workflow.md) for
suggested workflows.
