Change significance to be determined by IQR fencing #996
This PR changes how we define (and implement in code) a "significant" test result, replacing the old fixed thresholds with a more formal and less arbitrary mechanism, described below. The documentation is also updated to reflect this change.
Before
Until now, we've used a simple threshold: a change of 0.2% for non-"dodgy" test cases (i.e., test cases we've determined not to have historical noise) and 0.8% for "dodgy" test cases.
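For comparison, the previous rule amounts to something like the sketch below. This is an illustration, not the project's actual code: the function name `is_significant_old` is hypothetical, and it assumes the change is given in percent and that a result counts as significant when the magnitude of the change exceeds the threshold.

```python
def is_significant_old(pct_change: float, dodgy: bool) -> bool:
    """Hypothetical sketch of the old fixed-threshold rule.

    pct_change: relative change in percent (e.g. 0.3 means +0.3%).
    dodgy: True if the test case is known to be historically noisy.
    """
    # Noisy ("dodgy") test cases get a looser threshold.
    threshold = 0.8 if dodgy else 0.2
    # Either direction (regression or improvement) can be significant.
    return abs(pct_change) > threshold
```

The same 0.3% change is significant for a regular test case but not for a dodgy one, which is the kind of arbitrary cutoff the new approach aims to replace.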
After
Significance is defined as being an outlier when compared with historical data. We use interquartile range fencing to determine whether a given result is an outlier.
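A minimal sketch of IQR fencing, assuming the conventional Tukey multiplier of 1.5 and the quartiles computed by Python's `statistics.quantiles`. The multiplier, the quartile method, and the names `iqr_fences` and `is_outlier` are illustrative assumptions and are not necessarily what this PR uses. A result is treated as significant when it falls outside the fences derived from the historical values.

```python
import statistics


def iqr_fences(history: list[float], k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences for the historical values.

    k is the fence multiplier; 1.5 is the conventional Tukey value
    (an assumption here, not taken from the PR).
    """
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
    # Requires at least two data points on Python versions before 3.13.
    q1, _median, q3 = statistics.quantiles(history, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value: float, history: list[float], k: float = 1.5) -> bool:
    """True if value lies outside the IQR fences of the history."""
    lower, upper = iqr_fences(history, k)
    return value < lower or value > upper


if __name__ == "__main__":
    history = [1, 2, 3, 4, 5, 6, 7, 8]
    print(iqr_fences(history))      # fences from Q1 = 2.25, Q3 = 6.75
    print(is_outlier(20, history))  # far above the upper fence
    print(is_outlier(5, history))   # well inside the fences
```

Note that different quartile interpolation methods (e.g. `method="inclusive"` in `statistics.quantiles`) give slightly different fences on small samples, so the choice matters when only a few historical data points are available.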
IQR fencing uses this formula: