OTLP Metrics update #162
Conversation
Please review the
Looks good. Nothing blocking; I'm interested in moving forward with this approach. There will likely be iterations on documentation, but structurally it looks good.
// Remove the labels, start_time, and time TODO.
//
// The cumulative value is over the time interval (start_time_unix_nano, time_unix_nano].
// Value is UNIX Epoch time in nanoseconds since 00:00:00 UTC on 1 January 1970.
// fields: Add min, max, last fields
// as described in https://github.com/open-telemetry/oteps/pull/117
//
// Value of 0 indicates that the timestamp is unspecified. In that case the timestamp
// may be decided by the backend.
fixed64 start_time_unix_nano = 2;
// (Same comment)
Done?
// time_unix_nano is the moment when this value was recorded.
// Value is UNIX Epoch time in nanoseconds since 00:00:00 UTC on 1 January 1970.
fixed64 time_unix_nano = 3;
// (Same comment)
missing TODO?
I remembered the problem with using one DataPoint. The main argument was overall memory usage (not the number of allocations). A single metric can have many data points, and now each data point requires more space (even when using pointers for Histogram and Summary). For example, an int metric with 10 data points now allocates an extra 10*3 int64-sized fields of memory (assuming Histogram and Summary are pointers); every DataPoint carries 3 extra int64-sized fields.
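To make the memory argument concrete, here is a minimal Go sketch. This is an editorial illustration, not the actual protoc-generated code; the struct and field names are assumptions based on the discussion above.

// A rough sketch, not the actual generated code; field names are
// illustrative. It shows why a unified DataPoint costs roughly three
// extra 8-byte words per point compared to a per-aggregation point.
package main

import (
	"fmt"
	"unsafe"
)

// Per-aggregation point: only the fields an int64 point needs.
type Int64DataPoint struct {
	StartTimeUnixNano uint64
	TimeUnixNano      uint64
	Value             int64
}

// Unified point: an int64 point also pays for the unused double field
// and the nil Histogram/Summary pointers.
type UnifiedDataPoint struct {
	StartTimeUnixNano uint64
	TimeUnixNano      uint64
	Int64Value        int64
	DoubleValue       float64
	Histogram         *struct{} // stand-in for a HistogramValue message
	Summary           *struct{} // stand-in for a SummaryValue message
}

func main() {
	// On a 64-bit platform this prints 24 and 48: three extra 8-byte
	// words per point, matching the "3 int64 extra" estimate above.
	fmt.Println(unsafe.Sizeof(Int64DataPoint{}), unsafe.Sizeof(UnifiedDataPoint{}))
}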
I see. This makes me wonder about using a custom decoder, encoder, and representation of metric data as it passes through the collector. At least I'm much happier to deal with a performance concern than a philosophical one. Thanks, I'll evaluate.
// application called the API, as opposed to the SDK calling the
// application via a callback. When set, SYNCHRONOUS indicates
// that there may be an associated trace context, otherwise
// data points are considered asynchronous. Asynchronous Counts
What Count does this refer to?
I clarified that this refers to DataPoint.Histogram.Count or DataPoint.Summary.Count.
// MONOTONIC indicates that the calculated sum can be monitored
// as a rate of change.
//
// Synchronicity applies to data points produced when the
How is this consumed by a backend? If the data are coming from another library like Prometheus, how would I know what to set here? They define exemplars as well, and I don't know in advance whether a metric will have exemplars associated with it (in the collector). Is this something we really need, or can we remove it so as not to complicate translation from other protocols?
I rewrote this as:
// Synchronicity applies to data points produced when the
// application called the API, as opposed to the SDK calling the
// application via a callback. When set, SYNCHRONOUS indicates
// that there may be an associated trace context. The rate of
// synchronous events has a meaningful interpretation, when
// considering the DataPoint.Histogram.Count or
// DataPoint.Summary.Count fields, as these are true
// application-level events.
//
// Data points without SYNCHRONOUS are considered asynchronous.
// The DataPoint.Histogram.Count and DataPoint.Summary.Count
// fields record a number of observations, which is controlled
// both by the number of distinct label sets observed and by the
// rate of collection. Because asynchronous metric Count fields
// depend on the rate of collection, they should not be
// interpreted as event rates (however these Counts may still be
// used to form an average value of these metrics).
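As an editorial illustration of how a consumer might act on this distinction, here is a minimal Go sketch; the point struct, its fields, and the flag name are hypothetical and not part of the actual OTLP schema.

// A minimal consumer-side sketch. The point struct and its fields are
// hypothetical; they only model the synchronous/asynchronous distinction
// described in the comment above.
package main

import (
	"fmt"
	"time"
)

type point struct {
	Synchronous bool          // true when Count reflects application-level events
	Count       uint64        // e.g. a Histogram or Summary count
	Interval    time.Duration // time covered by this point
}

// eventRate returns a per-second event rate only for synchronous points.
// For asynchronous points Count depends on the collection rate, so a
// rate is not meaningful and ok is false (an average is still fine).
func eventRate(p point) (rate float64, ok bool) {
	if !p.Synchronous || p.Interval <= 0 {
		return 0, false
	}
	return float64(p.Count) / p.Interval.Seconds(), true
}

func main() {
	fmt.Println(eventRate(point{Synchronous: true, Count: 300, Interval: time.Minute}))  // 5 true
	fmt.Println(eventRate(point{Synchronous: false, Count: 300, Interval: time.Minute})) // 0 false
}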
If the data are coming from another library like Prometheus, how would I know what to set here?
Hopefully SYNCHRONOUS is always a safe option. This makes me think that the optional flag should be ASYNCHRONOUS, and that it should only be set when the value is known to have been generated by a variable-rate callback controlled by the SDK, not by user-level events. How does this sound?
It feels to me that this information cannot be guaranteed, and I would like to challenge whether this is an important and well-defined property that should be in the protocol. Maybe consider removing it instead.
If we cannot include this information, the entire part of the specification that deals with asynchronous instrument semantics was for nothing. I think when ASYNCHRONOUS is true it is guaranteed.
I did some performance analysis: open-telemetry/opentelemetry-collector#1346. It led me to think this PR is OK as it stands and that we can proceed as follows:
@jmacd do you have the results of the performance test you could post?
@MrAlias I performed a hack-and-slash operation on Tigran's repository, branch here. I took a number of shortcuts to avoid creating build rules that would automatically run protoc and produce the generated artifacts (since this is a tough problem and I feel time pressure). What I did instead was to check out two branches of opentelemetry-proto, run the same protoc/protoc-gen-go on both, and use go.mod replacements to build both dependencies into a single binary:
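For illustration only, here is a hedged sketch of what such go.mod wiring might look like; the module paths, versions, and directory names are assumptions, not the paths used in the actual branches.

// A hypothetical go.mod for the benchmark binary. The key idea is that
// the two sets of generated code get distinct module paths so both can
// be linked into one binary, with replace directives pointing each at
// a local checkout of the corresponding opentelemetry-proto branch.
module example.com/otlp-bench

go 1.14

require (
	example.com/otlp-baseline v0.0.0
	example.com/otlp-experiment v0.0.0
)

replace (
	example.com/otlp-baseline => ../opentelemetry-proto-baseline/gen/go
	example.com/otlp-experiment => ../opentelemetry-proto-experiment/gen/go
)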
There are a lot of parameters we could study to compare performance. First I'll post a sample comparing the two protocols with small requests consisting of one metric, one point, 10 labels per metric, 3 values per label:
The protocols are HTTP with 1 thread (http11) and 10 threads (http11conc). There are three aggregations tested: int64, double, and MMLSC. You can see, for example, comparing http11conc/experiment/mmlsc/10 vs http11conc/otlp/mmlsc/10: 121.0 cpuμs/point vs 124.2 cpuμs/point for the concurrent MMLSC case, or a loss of about 2.5% in throughput. I will post memory profiles for the same example next, then I'll post throughput results for a greater number of points per metric.
The same test with 10 points per metric, reducing the number of batches to keep approximately the same wall time for these tests:
Here the throughput difference is negligible, I suspect because label encoding becomes the dominant cost of transmitting metrics. Possibly I've used a larger number of labels than is realistic? I think that varies widely. I'm still reaching the conclusion that the detail we were focused on is relatively minor from a throughput perspective. Next I will share memory profile results.
I repeated a specific pair of tests with a larger number of metrics per batch, which reduces the overhead of HTTP requests in the benchmark. Here with 100 metrics per batch (above it was 1 per batch):
Now this shows a 15% loss of performance with the experimental change. (That's bad!) Here's the pair of memory profiles. For the baseline (OTLP+MMLSC):
and for the experiment, which as you can see allocates 624M times vs 581M (about 7% more memory allocations by count):
The use of MMLSC for this test is indicative of what we'd see across the board with a proper protobuf
@jmacd I'd advise running with larger batches to see the effect of your changes more easily. For tiny batches the performance profile is likely dominated by per-request constant overhead, which makes the serialization performance differences (which I assume are what you are optimizing here) less visible.
For the same data described above, here is the corresponding profile output. For OTLP (baseline):
For the experiment:
Yes, I think the 100/batch test shows enough of a problem that I'm going to revert the structural change.
// GROUPING structure means the value has been computed by combining
// individual values in a meaningful aggregation. GROUPING structure
// implies the sum of
incomplete sentence?
Closing this. I will factor the Kind and ValueType changes apart into a separate change, develop the MMLSC change independently, and I will recommend @cnnradams re-attempt the RawValue portion of this PR afterwards.
This is a substantial refactoring. The highlights:
- DataPoint type containing the three common fields

Resolves #125
Resolves #124
Resolves #81
Resolves #78
Partially implemented: open-telemetry/opentelemetry-go#884
This PR is a draft mainly because there are still a number of comments that have to be fixed in the lower half of the file.