-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HDFS-16949 Introduce inverse quantiles for metrics where higher numer… #5495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
fd95d54
to
989ca5f
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
public class MutableInverseQuantiles extends MutableQuantiles{ | ||
|
||
@VisibleForTesting | ||
public static final Quantile[] INVERSE_QUANTILES = { new Quantile(0.50, 0.050), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should either invert the percentile, or report all percentiles. This only changes the range without the list-order traversal change.
I would expect that this PR for "inverse quantiles" would report the P10 as the P90, but not directly emit the P10. If we emit the P10 we should enhance quantiles to emit all percents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just reversing the list-order traversal does not work well with Inverse Quantiles as seen in PR #5486.
It does not work because for optimization not all values are stored in memory.
For quantiles more values at the higher percentile are stored ( 99, 95, 90 ..) for a smaller allowed error percentage ( 0.1, 0.5, 1 % ). For lower percentiles the allowed error increases (+-10% error for 1 percentile) giving us less accurate values.
If we store all the quantiles ( p99, p95, p90, p75, p50, p25, p10, p5, p1 ) with allowed error percentages as (0.1, 0.5, 1, 2.5, 5, 2.5, 1, 0.5. 0.1 ) then it will defeat the purpose of having space optimization since we will end up storing all the values anyways.
Thus limiting normal quantiles to p99, p95, p90, p75, p50 (with existing allowed error % to be .1, .5, 1, 2.5, 5)
And inverse quantiles to just p1, p5, p10, p25, p50 (with allowed error % to be .1, .5, 1, 2.5, 5) will give us the optimization that we need as well as the more accurate results for the metric we care for both of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
String descTemplate = "%d inverse percentile " + lvName + " with " + interval | ||
+ " second interval for " + desc; | ||
for (int i = 0; i < INVERSE_QUANTILES.length; i++) { | ||
int inversePercentile = (int) (100 * (1 - INVERSE_QUANTILES[i].quantile)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this. We are still inverting the quantile, but also reporting P10?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are reporting P10 as P90 due to inversion.
*/ | ||
public MutableInverseQuantiles(String name, String description, String sampleName, | ||
String valueName, int interval) { | ||
String ucName = StringUtils.capitalize(name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend two things going this route:
- Encapsulate any logic you will not add a test for in a common function shared between implementations.
- Any logic that could be parametarized or overridden.
- Test all new logic added.
This function repeats much of the constructor from the superclass to supply the inverse percentile. I would advocate for a DRY subclass (assuming we subclass).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense. Will try to refactor the common part here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactored
...ject/hadoop-common/src/main/java/org/apache/hadoop/metrics2/lib/MutableInverseQuantiles.java
Show resolved
Hide resolved
@VisibleForTesting | ||
public static final Quantile[] INVERSE_QUANTILES = { new Quantile(0.50, 0.050), | ||
new Quantile(0.25, 0.025), new Quantile(0.10, 0.010), | ||
new Quantile(0.05, 0.005), new Quantile(0.01, 0.001) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you are using the same structure all the time: quantile, error=quantile/10.
To make this more readable, I wonder if we could do something like:
class PercentileQuantile extend Quantile {
PercentileQuantile(double percentile) {
super(percentile/100.0, percentile/1000.0);
}
}
public static final Quantile[] INVERSE_QUANTILES = {
new PercentileQuantile(50), // 50th
new PercentileQuantile(25), // 25th
new PercentileQuantile(10), // 10th
new PercentileQuantile(5), // 5th
new PercentileQuantile(1), // 1th
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. This looks much cleaner & readable.
* @param description long-form textual description of the metric | ||
* @param sampleName type of items in the stream (e.g., "Ops") | ||
* @param valueName type of the values | ||
* @param interval rollover interval (in seconds) of the estimator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add seconds to the variable name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could even go further with TimeUnit but that might be overkilled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intervalSecs seems sufficient, updated.
String descTemplate = "%d inverse percentile " + lvName + " with " + interval | ||
+ " second interval for " + desc; | ||
for (int i = 0; i < INVERSE_QUANTILES.length; i++) { | ||
int inversePercentile = (int) (100 * (1 - INVERSE_QUANTILES[i].quantile)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have issues with percentiles like 99.9th when rounding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would have, changed to double.
if (scheduledTask != null) { | ||
scheduledTask.cancel(false); | ||
} | ||
scheduledTask = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just do the scheduledTask = null;
inside the if.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
...mon-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/lib/MutableQuantiles.java
Outdated
Show resolved
Hide resolved
Random r = new Random(0xDEADDEAD); | ||
Long[] values = new Long[count]; | ||
for (int i = 0; i < count; i++) { | ||
values[i] = (long) (i + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could just do:
for (long i = 0; i < count; i++) {
values[i] = i + 1;
}
public void testInverseQuantiles() throws IOException { | ||
SampleQuantiles inverseQuantilesEstimator = new SampleQuantiles(MutableInverseQuantiles.INVERSE_QUANTILES); | ||
final int count = 100000; | ||
Random r = new Random(0xDEADDEAD); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are you using this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Random object is used as source of randomness to shuffle the sample array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make it rnd
; it is hard to find.
...project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/util/TestSampleQuantiles.java
Show resolved
Hide resolved
import static org.apache.hadoop.test.MetricsAsserts.assertQuantileGauges; | ||
import static org.apache.hadoop.test.MetricsAsserts.getLongCounter; | ||
import static org.apache.hadoop.test.MetricsAsserts.getMetrics; | ||
import static org.apache.hadoop.test.MetricsAsserts.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -413,7 +410,7 @@ public Boolean get() { | |||
final long endWriteValue = getLongCounter("TotalWriteTime", rbNew); | |||
final long endReadValue = getLongCounter("TotalReadTime", rbNew); | |||
assertCounter("ReadTransferRateNumOps", 1L, rbNew); | |||
assertQuantileGauges("ReadTransferRate" + "60s", rbNew, "Rate"); | |||
assertInverseQuantileGauges("ReadTransferRate" + "60s", rbNew, "Rate"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"ReadTransferRate60s"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
9571544
to
19432f5
Compare
19432f5
to
51be61b
Compare
@goiri Thanks for your review. |
@@ -92,27 +93,68 @@ public void testClear() throws IOException { | |||
public void testQuantileError() throws IOException { | |||
final int count = 100000; | |||
Random r = new Random(0xDEADDEAD); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are we using this random?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, is the shuffle.
Hard to search single letter vars, make it rnd
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
public void testInverseQuantiles() throws IOException { | ||
SampleQuantiles inverseQuantilesEstimator = new SampleQuantiles(MutableInverseQuantiles.INVERSE_QUANTILES); | ||
final int count = 100000; | ||
Random r = new Random(0xDEADDEAD); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make it rnd
; it is hard to find.
inverseQuantilesEstimator.clear(); | ||
|
||
// Insert | ||
for (int j = 0; j < count; j++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (int value : values) {
inverseQuantilesEstimator.insert(value);
}
} | ||
|
||
// Do 10 shuffle/insert/check cycles | ||
for (int i = 0; i < 10; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make 10 a constant just to show is NUM_REPEATS or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
Collections.shuffle(Arrays.asList(values), r); | ||
estimator.clear(); | ||
|
||
// Insert | ||
for (int j = 0; j < count; j++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we are at cleaning:
for (int value : values) {
estimator.insert(value);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
3a3fcf1
to
ad11f1b
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
ad11f1b
to
3b06c5b
Compare
💔 -1 overall
This message was automatically generated. |
|
||
@Before | ||
@Before |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spacing is wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In one of my commits I had unintentionally added extra space. Removed and fixed it now.
...project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/util/TestSampleQuantiles.java
Show resolved
Hide resolved
💔 -1 overall
This message was automatically generated. |
The failed UTs seem flaky and unrelated to the change in this PR. They pass locally. |
As he had comments, I'd like to get an approval from @mkuchenbecker if possible. |
Can we also get a clean build?
Are flaky or actually broken. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
...project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/util/TestSampleQuantiles.java
Show resolved
Hide resolved
public class MutableInverseQuantiles extends MutableQuantiles{ | ||
|
||
@VisibleForTesting | ||
public static final Quantile[] INVERSE_QUANTILES = { new Quantile(0.50, 0.050), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
🎊 +1 overall
This message was automatically generated. |
Got +1 from @mkuchenbecker and the build has passed now as well. @goiri Can you please help merging this PR. Thanks! |
Hi @goiri friendly ping if you can help in merging the PR whenever you get time. Thanks! |
…ic value is better
Description of PR
Currently quantiles are used for latencies, where lower numeric value is better.
Hence p90 gives us a value
val(p90)
such that 90% of our sample set has a value better (lower) thanval(p90)
However for metrics such as calculating transfer rates (eg : HDFS-16917 ) higher numeric value is better. Thus for such metrics the current quantiles dont work.
For these metrics in order for p90 to give a value
val(p90)
where 90% of the sample set is better (higher) thanval(p90)
we need to inverse the selection by choosing a value at the (100 - 90)th location instead of the usual 90th position of the sample size sorted in ascending order.Note: There will be allowable error percentage for percentiles following Cormode, Korn, Muthukrishnan, and Srivastava algorithm
In another approach #5486 I reversed the loop traversal for finding the position. This did not work as expected because at the 1st, 5th, 10th position the allowable error percentage was on the higher side( 10%, 9.5%, 9%). Note that for optimization we dont store all the values in memory anymore and hence for higher allowable errors less values are stored giving us less accurate values at the 1st, 5th positions & 10th positions.
In the new approach, I am making sure that the allowable error at the 1st, 5th, 10th position is less ( 0.1%, 0.5%, 1%) which gives us more accurate values for the inverse quantiles that we are looking for. This can be seen by comparing the UT results in both the PR descriptions.
How was this patch tested?
Results from UT testInverseQuantiles()
Starting run 0
For inverse quantile 0.500000 Expected 50000 with error 5000, estimated 49827
For inverse quantile 0.750000 Expected 25000 with error 2500, estimated 24966
For inverse quantile 0.900000 Expected 10000 with error 1000, estimated 10020
For inverse quantile 0.950000 Expected 5000 with error 500, estimated 5011
For inverse quantile 0.990000 Expected 1000 with error 100, estimated 997
Starting run 1
For inverse quantile 0.500000 Expected 50000 with error 5000, estimated 49947
For inverse quantile 0.750000 Expected 25000 with error 2500, estimated 25038
For inverse quantile 0.900000 Expected 10000 with error 1000, estimated 10015
For inverse quantile 0.950000 Expected 5000 with error 500, estimated 5000
For inverse quantile 0.990000 Expected 1000 with error 100, estimated 992
Starting run 2
For inverse quantile 0.500000 Expected 50000 with error 5000, estimated 49825
For inverse quantile 0.750000 Expected 25000 with error 2500, estimated 25052
For inverse quantile 0.900000 Expected 10000 with error 1000, estimated 9952
For inverse quantile 0.950000 Expected 5000 with error 500, estimated 5038
For inverse quantile 0.990000 Expected 1000 with error 100, estimated 997
Starting run 3
For inverse quantile 0.500000 Expected 50000 with error 5000, estimated 50009
For inverse quantile 0.750000 Expected 25000 with error 2500, estimated 24937
For inverse quantile 0.900000 Expected 10000 with error 1000, estimated 9988
For inverse quantile 0.950000 Expected 5000 with error 500, estimated 4997
For inverse quantile 0.990000 Expected 1000 with error 100, estimated 1009
Starting run 4
For inverse quantile 0.500000 Expected 50000 with error 5000, estimated 49790
For inverse quantile 0.750000 Expected 25000 with error 2500, estimated 24893
For inverse quantile 0.900000 Expected 10000 with error 1000, estimated 9975
For inverse quantile 0.950000 Expected 5000 with error 500, estimated 4984
For inverse quantile 0.990000 Expected 1000 with error 100, estimated 1011
Starting run 5
For inverse quantile 0.500000 Expected 50000 with error 5000, estimated 49836
For inverse quantile 0.750000 Expected 25000 with error 2500, estimated 25047
For inverse quantile 0.900000 Expected 10000 with error 1000, estimated 10033
For inverse quantile 0.950000 Expected 5000 with error 500, estimated 4983
For inverse quantile 0.990000 Expected 1000 with error 100, estimated 1008
Starting run 6
For inverse quantile 0.500000 Expected 50000 with error 5000, estimated 49887
For inverse quantile 0.750000 Expected 25000 with error 2500, estimated 25032
For inverse quantile 0.900000 Expected 10000 with error 1000, estimated 10062
For inverse quantile 0.950000 Expected 5000 with error 500, estimated 5000
For inverse quantile 0.990000 Expected 1000 with error 100, estimated 999
Starting run 7
For inverse quantile 0.500000 Expected 50000 with error 5000, estimated 49911
For inverse quantile 0.750000 Expected 25000 with error 2500, estimated 25008
For inverse quantile 0.900000 Expected 10000 with error 1000, estimated 9981
For inverse quantile 0.950000 Expected 5000 with error 500, estimated 4989
For inverse quantile 0.990000 Expected 1000 with error 100, estimated 1003
Starting run 8
For inverse quantile 0.500000 Expected 50000 with error 5000, estimated 49925
For inverse quantile 0.750000 Expected 25000 with error 2500, estimated 24941
For inverse quantile 0.900000 Expected 10000 with error 1000, estimated 10006
For inverse quantile 0.950000 Expected 5000 with error 500, estimated 5013
For inverse quantile 0.990000 Expected 1000 with error 100, estimated 1005
Starting run 9
For inverse quantile 0.500000 Expected 50000 with error 5000, estimated 50001
For inverse quantile 0.750000 Expected 25000 with error 2500, estimated 24990
For inverse quantile 0.900000 Expected 10000 with error 1000, estimated 9951
For inverse quantile 0.950000 Expected 5000 with error 500, estimated 5009
For inverse quantile 0.990000 Expected 1000 with error 100, estimated 1001
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?