Add distribution metric to semantic-core and ffwd-reporter#79
Add distribution metric to semantic-core and ffwd-reporter#79ao2017 merged 13 commits intospotify:masterfrom ao2017:adele/distribution
Conversation
core/src/main/java/com/spotify/metrics/core/codahale/metrics/ext/Distribution.java
Outdated
Show resolved
Hide resolved
core/src/main/java/com/spotify/metrics/core/SemanticMetricRegistryListenerV2.java
Outdated
Show resolved
Hide resolved
core/src/main/java/com/spotify/metrics/core/codahale/metrics/ext/Distribution.java
Outdated
Show resolved
Hide resolved
core/src/main/java/com/spotify/metrics/core/codahale/metrics/ext/Distribution.java
Outdated
Show resolved
Hide resolved
Hey @ao2017 - could you provide a Use Case-like description please? With links to the actual US/Issue and supporting links e.g. T-digest, codehale. Also, I'm quite confused about |
| /** | ||
| * A no-op implementation of {@link MetricRegistryListener}. | ||
| */ | ||
| abstract class Base implements MetricRegistryListener { |
There was a problem hiding this comment.
can we call this NoOpMetricRegistryListener, or something else meaningful?
There was a problem hiding this comment.
That 's the original name Let me see if changing the name won't brake anything.
core/src/main/java/com/spotify/metrics/core/codahale/metrics/ext/MetricRegistryListener.java
Outdated
Show resolved
Hide resolved
| * @param name the meter's name | ||
| * @param derivingMeter the meter | ||
| */ | ||
| void onDerivingMeterAdded(String name, DerivingMeter derivingMeter); |
There was a problem hiding this comment.
is a distribution a Deriving Meter? I can't quite tell. Here's the definition of a deriving meter:
* A meter that takes the derivative of a value that is expected to be almost monotonically
* increasing. A typical use case is to get the rate of change of a counter of the total number of
* occurrences of something.
* <p>
* Implementations will ignore updates that are a decrease of the counter value. The rationale is
* that the counter is expected to be monotonically increasing between infrequent resets (such as
* when a process has been restarted). Thus, negative values should only happen on restart, and it
* should be safe to discard those.There was a problem hiding this comment.
Nope, it is not. Distribution captures the recorded data distribution. It is used to compute percentile in a distributed environment. Can one use distribution to compute rate ? I doubt it won't be accurate since distribution is just capturing an approximation not the actual count.
There was a problem hiding this comment.
How do distributions relate to histograms? https://github.com/infusionsoft/yammer-metrics/blob/master/metrics-core/src/main/java/com/codahale/metrics/Histogram.java
Do we need to have both? Is the newly added Distribution a superset of histogram?
There was a problem hiding this comment.
So distribution is a name for the type of Histogram that computes rank related stat using actual data distribution.
Dropwizard histogram doesn't support that feature. For more info, take a look at this issue spotify/heroic#476
Do we need to have both ?
At the moment we think yes. Some users may have a preference for local histogram. And I think it is the standard in the industry to support both at least DataDog and OpenTelemetry support both types.
|
|
||
|
|
||
| @Override | ||
| public void onDerivingMeterAdded(String name, DerivingMeter derivingMeter) { |
There was a problem hiding this comment.
I'm curious about onHistogramRemoved() and onHistogramAdded() above. Are they a different "kind" of histogram or something?
There was a problem hiding this comment.
Sorry I don't understand this question :).
| } | ||
|
|
||
| /** | ||
| * Returns the current count. |
There was a problem hiding this comment.
not terribly useful :) Either remove or explain what's being counted, cheers.
There was a problem hiding this comment.
Ok, something is better than guessing :) . I will update.
| private final AtomicReference<TDigest> distRef; | ||
|
|
||
| protected DistributionImpl() { | ||
| this.distRef = new AtomicReference<>(TDigest.createDigest(COMPRESSION_DEFAULT_LEVEL)); |
There was a problem hiding this comment.
what's the rationale behind using an AtomicReference? With AtomicRefernce, the only guarantee is that this reference will be updated/set in a thread-safe manner and AFAICT we don't update it...?
Also, AFAICT, multiple threads could still overwrite this.distRef here, no?
There was a problem hiding this comment.
That's ok, As long as each thread has its own copy we are good.
ffwd-reporter/src/main/java/com/spotify/metrics/ffwd/FastForwardReporter.java
Show resolved
Hide resolved
ffwd-reporter/src/main/java/com/spotify/metrics/ffwd/FastForwardReporter.java
Show resolved
Hide resolved
core/src/main/java/com/spotify/metrics/core/SemanticMetricRegistry.java
Outdated
Show resolved
Hide resolved
|
|
||
| @Override | ||
| public void record(double val) { | ||
| distRef.get().add(val); |
There was a problem hiding this comment.
TDigest.add() does not appear to be threadsafe
There was a problem hiding this comment.
Yeah, I am looking into this. I checked couple of Dropwizard metrics there are threadsafe so I will make this thread safe.
There was a problem hiding this comment.
I think we could either
- add synchronized around it and hope it's enough.
- Use ThreadLocal and then merge all the thread local entries somehow inside getValueAndFlush
- Push to a queue and have another thread popping them off and adding to the TDigest object
There was a problem hiding this comment.
Please check the implementation I added yesterday. I thought about adding threadLocal then change mind. ThreadLocal may introduce memory leak if getValueAndFlush isn't called often.
There was a problem hiding this comment.
Yes, that should work now I think (but it's hard to know if there will be much contention, some multithreaded JMH benchmarks might be useful).
There was a problem hiding this comment.
Ok, sure will do. Please take a look I need a +1
mattnworb
left a comment
There was a problem hiding this comment.
(@malish8632 asked me to take a look since I've worked with these classes before)
The main question I have is - what is a Distribution? I think the initial PR summary is missing the motivation behind the addition or what it means to achieve (except adding new classes).
Is it necessary to have a separate interface and implementation for Distribution and DistributionImpl, when the other core types (Timer, Meter, Histogram) don't follow this pattern? Those classes implement interfaces, but those are more generic interfaces that have multiple implementations (i.e. Timer and Meter are implementations of Metered) which allows for abstraction in some places (and code reuse). I don't see the similar need for Distribution unless you plan to add more implementations of it later on, in which case DistributionImpl could use a more descriptive name.
A few suggestions/questions below - I don't think you need to go to the trouble of declaring V2 interfaces or new packages, it'd be simpler to update the version of the library to indicate changes to the API of SemanticMetricRegistry and friends.
core/src/main/java/com/spotify/metrics/core/SemanticMetricRegistryListenerV2.java
Outdated
Show resolved
Hide resolved
| * under the License. | ||
| */ | ||
|
|
||
| package com.spotify.metrics.core.codahale.metrics.ext; |
There was a problem hiding this comment.
why add the new Distribution type to this package, instead of com.spotify.metrics.core? There are interfaces in core that refer to this type, so I think it would be natural to have it live in com.spotify.metrics.core as well
If you want to add it in a subpackage, I'd still recommend against this specific name, its not like any classes live in or com.spotify.metrics.core.codahale com.spotify.metrics.core.codahale.metrics
There was a problem hiding this comment.
@mattnworb I don't see how SemanticMetricRegistry.addListener(SemanticMetricRegistryListener). is broken with this PR.
There was a problem hiding this comment.
I think I agree with Matt - could we use com.spotify.metrics.core instead?
core/src/main/java/com/spotify/metrics/core/codahale/metrics/ext/Distribution.java
Outdated
Show resolved
Hide resolved
core/src/main/java/com/spotify/metrics/core/codahale/metrics/ext/MetricRegistryListener.java
Outdated
Show resolved
Hide resolved
ffwd-reporter/src/main/java/com/spotify/metrics/ffwd/FastForwardReporter.java
Show resolved
Hide resolved
| stop(); | ||
| } | ||
|
|
||
| public static final class Builder { |
There was a problem hiding this comment.
It'd probably be better to make separate PRs for changes like reordering the structure of the class or whitespace changes, as it makes the git history harder to follow if its combined with this PR
There was a problem hiding this comment.
+1 to this comment - pls separate this into diff PR
1. Remove SemanticMetricRegistryListenerV2
2. Added Default methods to SemanticMetricRegistryListener
3. We understand that there is a chance for SemanticMetricRegistry to call a no op methods.
This can lead to confusion hopefully users of this lib will look at the source code or documentation.
core/src/main/java/com/spotify/metrics/core/SemanticMetricRegistry.java
Outdated
Show resolved
Hide resolved
core/src/main/java/com/spotify/metrics/core/SemanticMetricRegistryListener.java
Show resolved
Hide resolved
malish8632
left a comment
There was a problem hiding this comment.
Looks much better. Some changes should help simplify this PR and clarify type packages.
…tricRegistry.getDistribution
mattnworb
left a comment
There was a problem hiding this comment.
thanks for resolving all of my earlier comments, and I find the PR summary very helpful now - thanks for both of those.
I have a few small questions below
core/src/main/java/com/spotify/metrics/core/SemanticMetricDistribution.java
Show resolved
Hide resolved
| * is reset and a new recording starts. | ||
| * @return | ||
| */ | ||
| ByteBuffer getValueAndFlush(); |
There was a problem hiding this comment.
I am still curious about the idea of exposing the "value" of the Distribution as a ByteBuffer. It seems like it would couple this Distribution metric to the way in which the t-digest library serializes the underlying com.tdunning.math.stats.TDigest into bytes.
For instance, if there was an alternate implementation of Distribution in the future, it would have to return the values in the same byte format, since FastForwardReporter has the expectation that the results of getValueAndFlush() must look a certain way.
Is there any other data type that could be used? Or at least could the format of the bytes be documented here?
There was a problem hiding this comment.
If we use Distribution as the return type then the serializer will have to know something about Distribution implementation. For instance, FFW reporter will have an implementation of this method for every Distribution type.
https://github.com/spotify/semantic-metrics/pull/79/files/4ae53fe81b12e35376529ab79184798e982e7ade#diff-455c8027b591558e13ad0a4d245b2518R265. That will not align with the current paradigm of not exposing the detail of the metric implementation to the serializer.
Documentation for bytes format is handled by FFW. There is a version number, we will use that number upstream to determine how to handle the bytes.
There was a problem hiding this comment.
- Return something immutable like ByteString
- Instead of ByteString, return something more semantically meaningful like a DistributionSnapshot wrapper
There was a problem hiding this comment.
- I am not sure about DistributionSnapshot, that will expose Distribution implementation to ffwReporter as noted above.
- Let's merge this branch. Bytebuffer is transformed into ByteString before serialization in ffw-client. I will change the return type later.
| private void reportDistribution(final com.spotify.ffwd.v1.Metric metric, | ||
| final Distribution distribution) { | ||
| ByteBuffer byteBuffer = distribution.getValueAndFlush(); | ||
| Value value = Value.distributionValue(byteBuffer); |
There was a problem hiding this comment.
I just noticed this PR: spotify/ffwd-client-java#10
Can we update the ffwd API to use ByteString instead of ByteBuffer?
I think it should be safe to change that if no one is using the distribution types yet (since they are not implemented yet)
There was a problem hiding this comment.
If you look at the entire PR you will see that ByteBuffer is transformed into ByteString before serialization. Moving byteString up in the stack is something we can do in the next PR.
| @@ -0,0 +1,88 @@ | |||
| /* | |||
| * Copyright (c) 2016 Spotify AB. | |||
| * Just get an instance through SemanticMetricBuilder and record data. | ||
| * | ||
| * <p> {@link Distribution} is a good choice if you care about percentile accuracy in | ||
| * a distributed environment and you want to rely on P99 to set SLO. |
| */ | ||
| public final class SemanticMetricDistribution implements Distribution { | ||
|
|
||
| private static final int COMPRESSION_DEFAULT_LEVEL = 100; |
There was a problem hiding this comment.
Might we want to change this level? Should it be configuration-driven perhaps?
There was a problem hiding this comment.
I think for the time being it is better to keep every aspect of Tdigest private. Power users have the ability to extend this class and use a different compression level. The typical use case 100 is ok.
| @Override | ||
| public void onDerivingMeterRemoved(MetricId name) { | ||
| } | ||
|
|
There was a problem hiding this comment.
I'm still not a fan of this class name. Is it convention?
| private TagExtractor tagExtractor; | ||
| private ScheduledExecutorService executorService; | ||
|
|
||
| private Set<Percentile> histogramPercentiles = |
There was a problem hiding this comment.
Why these particular numbers? Can we comment or const them please.
| public Builder histogramQuantiles(double... quantiles) { | ||
| histogramPercentiles = new HashSet<>(); | ||
| for (double q : quantiles) { | ||
| histogramPercentiles.add(new Percentile(q)); |
There was a problem hiding this comment.
Is there a reason we're not validating that the double is in range?
PR raison d'être
We are adding distribution support to resolve 2 problems.
1. Accurate Aggregated Histogram Data.
Currently percentile data provided by heroic are computed locally.
It is practically impossible to aggregate that data.
With this PR we are introducing a mechanism to record data and send data sketches
to heroic. A sketch of a dataset is a small data structure that let you approximate certain characteristics of the original
dataset. We use sketches to compute rank based statistics such as percentile. Sketches are mergeable and can be used
to compute any percentile on the entire data distribution.
2. Support sophisticated data point value.
We currently have many libraries that emit Open-census metrics. We cannot export Open-census metric to heroic because heroic supports only double data point value. With the introduction of Distribution we will be able to support sophisticated data point value such as Open-census metric distribution.
What's new ?
This PR introduces Distribution. Unlike most metric supported by SemanticMetric this metric is not a DropWizard metric.
However, Distribution implements DropWizard Metric and Counting interface. This is done so we can manage Distribution metric through Semantic Registry which is an extension of DropWizard Registry.
This PR also introduces SemanticMetricDistribution. SemanticMetricDistribution is an implementation of Distribution backed by Tdigest. Check the links below for the rational of using Tdigest.
This PR also refactors FastForwardReporter to add distribution support.
All efforts were made to ensure compatibility. This changes should be an Opt-in for Heroic Users. They shouldn't have to make any change to existing code unless they want to use distribution. At least that is the goal.
Related Issues and PR's
spotify/heroic#476
spotify/ffwd-client-java#13
#78
spotify/heroic#673
spotify/heroic#677