
[exporter/prometheusexporter] accumulate delta temporality histograms #20530

Closed

Conversation

locmai
Contributor

@locmai locmai commented Mar 31, 2023

Description:

Rebased the original work from PR #9006 to introduce the ability to accumulate delta temporality histograms.

Link to tracking Issue:

nabam and others added 13 commits August 18, 2022 15:23
Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>
Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>
Signed-off-by: Loc Mai <locmai0201@gmail.com>
Signed-off-by: Loc Mai <locmai0201@gmail.com>
Signed-off-by: Loc Mai <locmai0201@gmail.com>
Signed-off-by: Loc Mai <locmai0201@gmail.com>
This reverts commit 21980b454cb0e2ad5a4dcfe35ba4426ef4a41460.

Signed-off-by: Loc Mai <locmai0201@gmail.com>
This reverts commit 08022fc9074a445533ab3cd8db22b8874ab4e00c.

Signed-off-by: Loc Mai <locmai0201@gmail.com>
Signed-off-by: Loc Mai <locmai0201@gmail.com>
Signed-off-by: Loc Mai <locmai0201@gmail.com>
@runforesight
runforesight bot commented Mar 31, 2023

Foresight Summary

Major Impacts

build-and-test-windows duration (8 seconds) has decreased by 31 minutes 46 seconds compared to the main branch average (31 minutes 54 seconds).

⭕  build-and-test-windows workflow finished in 8 seconds (31 minutes 46 seconds less than the main branch average) on 4th Apr, 2023.

✅  check-links workflow finished in 46 seconds on 4th Apr, 2023.

✅  telemetrygen workflow finished in 56 seconds on 4th Apr, 2023.

✅  changelog workflow finished in 2 minutes 4 seconds on 4th Apr, 2023.

✅  build-and-test workflow finished in 34 minutes 53 seconds (11 minutes 30 seconds less than the main branch average) on 4th Apr, 2023; all unit, integration, and correctness test jobs passed with no failures.

✅  load-tests workflow finished in 6 minutes 46 seconds (3 minutes 43 seconds less than the main branch average) on 4th Apr, 2023; all load-test jobs passed.

✅  prometheus-compliance-tests workflow finished in 3 minutes 22 seconds (3 minutes 3 seconds less than the main branch average) on 4th Apr, 2023; all 21 compliance tests passed.

✅  e2e-tests workflow finished in 12 minutes 18 seconds (1 minute 46 seconds less than the main branch average) on 4th Apr, 2023.

*You can configure Foresight comments in your organization settings page.

Signed-off-by: Loc Mai <locmai0201@gmail.com>
@locmai locmai marked this pull request as ready for review March 31, 2023 03:58
@locmai locmai requested a review from a team March 31, 2023 03:58
exporter/prometheusexporter/accumulator.go (outdated)
exporter/prometheusexporter/accumulator.go (outdated)
Comment on lines +364 to +378
if current.StartTimestamp().AsTime().Before(prev.StartTimestamp().AsTime()) {
dest.SetStartTimestamp(current.StartTimestamp())
} else {
dest.SetStartTimestamp(prev.StartTimestamp())
}

older := prev
newer := current
if current.Timestamp().AsTime().Before(prev.Timestamp().AsTime()) {
older = current
newer = prev
}

newer.Attributes().CopyTo(dest.Attributes())
dest.SetTimestamp(newer.Timestamp())
Member
What happens if the current.Timestamp() is older but current.StartTimestamp() is newer? Or the inverse? I'm not sure it's correct to allow these values to be taken from different data points.

Contributor Author

Do you mean something like this:

|---prev.StartTimestamp() ------------------------------------ prev.Timestamp(). ---|
           |---current.StartTimestamp() ---- current.Timestamp(). ---|

TBH I'm not sure. We had a similar case for Sums and would consider it a single-writer principle violation (https://opentelemetry.io/docs/reference/specification/metrics/data-model/#sums-detecting-alignment-issues).

With the current implementation, we would take the newer one based on the Timestamp() comparison. Though here are the options we could consider:

If the incoming delta time interval has significant overlap with the previous time interval, we assume a violation of the single-writer principle and can be handled with one of the following options:

  • Simply report the inconsistencies in time intervals, as the error condition could be caused by a misconfiguration.
  • Eliminate the overlap / deduplicate on the receiver side.
  • Correct the inconsistent time intervals by differentiating the given Resource and Attribute set used from overlapping time.


Do you need any help here?

Contributor Author

Hi @bitomaxsp, thanks for joining.

The above concern from @Aneurysm9 is what we have left (so far). I don't think I addressed it well. We are discussing what to do with the case where:

current.Timestamp() is older but current.StartTimestamp() is newer? Or the inverse?

As I understand it, this could happen due to multiple writers or bugs in the clients.

Member

Do you mean something like this:

|---prev.StartTimestamp() ------------------------------------ prev.Timestamp(). ---|
           |---current.StartTimestamp() ---- current.Timestamp(). ---|

In this case I think the current implementation would result in dest taking StartTimestamp from prev and Timestamp from current without any indication that it had combined them.

TBH I'm not sure. We had a similar case for Sums and would consider it a single-writer principle violation (https://opentelemetry.io/docs/reference/specification/metrics/data-model/#sums-detecting-alignment-issues).

With the current implementation, we would take the newer one based on the Timestamp() comparison. Though here are the options we could consider:

If the incoming delta time interval has significant overlap with the previous time interval, we assume a violation of the single-writer principle and can be handled with one of the following options:

  • Simply report the inconsistencies in time intervals, as the error condition could be caused by a misconfiguration.
  • Eliminate the overlap / deduplicate on the receiver side.
  • Correct the inconsistent time intervals by differentiating the given Resource and Attribute set used from overlapping time.

I'm not really sure what the right thing to do here is, but I would suggest at least logging a warning if current.StartTimestamp().Before(prev.Timestamp()). I think this situation would indicate an inconsistency in the production of the metrics and shouldn't be silently accepted.


Logging a warning seems reasonable.
But shouldn't we also drop points that don't fit in the time interval (in addition to logging a warning)?

I had a chance to test it a bit. I put it in production and it crashes.

panic: runtime error: index out of range [35] with length 35
...

Thanks for testing this out! It must be missing test cases for some parts. I will take a look at your logs and see what I can do.

I'm also going to add the warning log as @Aneurysm9 commented above.

Let me know, I can test it again when you fix it.


See my comments below for why it's crashing.

Contributor Author

Okay, my memory is coming back a little here, with the 3 cases:

Case 1: Delta-to-cumulative

|prev.StartTimestamp() - prev.Timestamp()|
                                         |current.StartTimestamp() - current.Timestamp()|
|dest.StartTimestamp() ------------------------------------------------ dest.Timestamp()|

We take the accumulated sum of them only if the current.StartTimestamp() == prev.StartTimestamp()

Case 2: Reset

|prev.StartTimestamp() - prev.Timestamp()|
                                                   |current.StartTimestamp() - current.Timestamp()|
                                                   |dest.StartTimestamp() ------- dest.Timestamp()|

Since current.StartTimestamp() is after prev.Timestamp(), we interpret it as the delta having been reset.

Case 3: Anything else - for example:

|prev.StartTimestamp() --------------------- prev.Timestamp()|
     |current.StartTimestamp() - current.Timestamp()|
|dest.StartTimestamp() --------------------- dest.Timestamp()|

We only keep the latest Timestamp(), meaning whichever point ended with the latest Timestamp() is kept. For this case we would also log a warning message.

Reference the Sum approach: https://opentelemetry.io/docs/reference/specification/metrics/data-model/#sums-delta-to-cumulative

Upon receiving future Delta points, we do the following:
- If the next point aligns with the expected next-time window (see [detecting delta restarts](https://opentelemetry.io/docs/reference/specification/metrics/data-model/#sums-detecting-alignment-issues))
  - Update the “last seen” time to align with the time of the current point.
  - Add the current value to the cumulative counter
  - Output a new cumulative point with the original start time and current last seen time and count.
- If the current point precedes the start time, then drop this point. Note: there are algorithms which can deal with late arriving points.
- If the next point does NOT align with the expected next-time window, then reset the counter following the same steps performed as if the current point was the first point seen.

@bitomaxsp

I went through Lev's original PR (#9006). I see that this PR doesn't include Sum accumulation. Did you forget it, or was it removed intentionally? I can help there if needed.

I did the delta-to-cumulative for Sum at the same time Lev made #9006, as I needed it for Sum/Counter. So for this PR I left that part out, since it has already been done.

exporter/prometheusexporter/accumulator.go (outdated)
exporter/prometheusexporter/accumulator.go (outdated)
Signed-off-by: Loc Mai <locmai0201@gmail.com>
@bitomaxsp

Hi. Is there any timeline for this? Any help needed with this change moving forward?

@GustavJaner

This would be much needed

@bitomaxsp

I had a chance to test it a bit. I put it in production and it crashes.

panic: runtime error: index out of range [35] with length 35

goroutine 13 [running]:
go.opentelemetry.io/collector/pdata/pcommon.Float64Slice.At(...)
	/Users/dmitry/go/pkg/mod/go.opentelemetry.io/collector/pdata@v1.0.0-rc9.0.20230424163446-8e33ded10872/pcommon/generated_float64slice.go:60
github.com/northvolt/nvlt-otel-collector/exporter/prometheusexporter.accumulateHistogramValues({0x0?}, {0xf57880?}, {0xc0005e1280?})
	/Users/dmitry/src/github.com/northvolt/nvlt-otel-collector/exporter/prometheusexporter/accumulator.go:358 +0x6f8
github.com/northvolt/nvlt-otel-collector/exporter/prometheusexporter.(*lastValueAccumulator).accumulateHistogram(0xc0000c8660, {0xc0005aa940?}, {0xc0005e1320?}, {0x0?}, {0x0?, 0x0?, 0x19ef5a0?})
	/Users/dmitry/src/github.com/northvolt/nvlt-otel-collector/exporter/prometheusexporter/accumulator.go:266 +0x5e7
github.com/northvolt/nvlt-otel-collector/exporter/prometheusexporter.(*lastValueAccumulator).addMetric(0xc0000c8660, {0x40e3a7?}, {0xc0005e13e8?}, {0x5d66a6?}, {0xc000148001?, 0xc0005e1490?, 0x19ef5a0?})
	/Users/dmitry/src/github.com/northvolt/nvlt-otel-collector/exporter/prometheusexporter/accumulator.go:98 +0x48b
github.com/northvolt/nvlt-otel-collector/exporter/prometheusexporter.(*lastValueAccumulator).Accumulate(0xc0000aa480?, {0x19ef5a0?})
	/Users/dmitry/src/github.com/northvolt/nvlt-otel-collector/exporter/prometheusexporter/accumulator.go:82 +0xcc
github.com/northvolt/nvlt-otel-collector/exporter/prometheusexporter.(*collector).processMetrics(...)
	/Users/dmitry/src/github.com/northvolt/nvlt-otel-collector/exporter/prometheusexporter/collector.go:101
github.com/northvolt/nvlt-otel-collector/exporter/prometheusexporter.(*prometheusExporter).ConsumeMetrics(0xc00016c700, {0xc0005e14d0?, 0x5d6aaa?}, {0x128ff38?})
	/Users/dmitry/src/github.com/northvolt/nvlt-otel-collector/exporter/prometheusexporter/prometheus.go:96 +0x46
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsRequest).Export(0x128ff38?, {0x128ff00?, 0xc0000aa480?})
	/Users/dmitry/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.75.0/exporterhelper/metrics.go:65 +0x34
go.opentelemetry.io/collector/exporter/exporterhelper.(*timeoutSender).send(0xc00059ddd0, {0x1295510, 0xc0000aa120})
	/Users/dmitry/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.75.0/exporterhelper/common.go:208 +0x96
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send(0xc0000b68c0, {0x1295510?, 0xc0000aa120?})
	/Users/dmitry/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.75.0/exporterhelper/queued_retry.go:365 +0x190
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send(0xc0000c64b0, {0x1295510, 0xc0000aa120})
	/Users/dmitry/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.75.0/exporterhelper/metrics.go:136 +0x88
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).send(0xc00016c7e0, {0x1295510, 0xc0000aa120})
	/Users/dmitry/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.75.0/exporterhelper/queued_retry.go:301 +0x44c
go.opentelemetry.io/collector/exporter/exporterhelper.NewMetricsExporter.func2({0x128ff38?, 0xc0000c8990}, {0x1?})
	/Users/dmitry/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.75.0/exporterhelper/metrics.go:116 +0xca
go.opentelemetry.io/collector/consumer.ConsumeMetricsFunc.ConsumeMetrics(...)
	/Users/dmitry/go/pkg/mod/go.opentelemetry.io/collector/consumer@v0.75.1-0.20230424163446-8e33ded10872/metrics.go:36
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/resourcetotelemetry.(*wrapperMetricsExporter).ConsumeMetrics(0xc00047b700, {0x128ff38, 0xc0000c8990}, {0x1?})
	/Users/dmitry/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/pkg/resourcetotelemetry@v0.75.0/resource_to_telemetry.go:43 +0x4d
go.opentelemetry.io/collector/processor/batchprocessor.(*batchMetrics).export(0xc0000c8930, {0x128ff38, 0xc0000c8990}, 0x2?, 0x0)
	/Users/dmitry/go/pkg/mod/go.opentelemetry.io/collector/processor/batchprocessor@v0.75.0/batch_processor.go:292 +0x117
go.opentelemetry.io/collector/processor/batchprocessor.(*batchProcessor).sendItems(0xc0001458f0, 0xc000063744?)
	/Users/dmitry/go/pkg/mod/go.opentelemetry.io/collector/processor/batchprocessor@v0.75.0/batch_processor.go:175 +0x5c
go.opentelemetry.io/collector/processor/batchprocessor.(*batchProcessor).startProcessingCycle(0xc0001458f0)
	/Users/dmitry/go/pkg/mod/go.opentelemetry.io/collector/processor/batchprocessor@v0.75.0/batch_processor.go:143 +0x185
created by go.opentelemetry.io/collector/processor/batchprocessor.(*batchProcessor).Start
	/Users/dmitry/go/pkg/mod/go.opentelemetry.io/collector/processor/batchprocessor@v0.75.0/batch_processor.go:101 +0x6a

I fail to understand how this is possible, because we check Len equality before entering the loop.
Any ideas what it could be?

I suspect the underlying structure is being modified during the function call. If that is the case, then we need to know a good pattern for working with such types. I am new to that; can some gurus maybe chime in here with a hint? Not sure whom to tag, even :(

@locmai
Contributor Author

locmai commented Apr 26, 2023

I had a chance to test it a bit. I put it in production and it crashes.

panic: runtime error: index out of range [35] with length 35
...

Thanks for testing this out! It must be missing test cases for some parts. I will take a look at your logs and see what I can do.

I'm also going to add the warning log as @Aneurysm9 commented above.

@bitomaxsp

bitomaxsp commented Apr 27, 2023

I went through Lev's original PR (#9006). I see that this PR doesn't include Sum accumulation. Did you forget it, or was it removed intentionally? I can help there if needed.

@bitomaxsp

bitomaxsp commented Apr 27, 2023

Regarding the crash. I found the issue.

From metrics.pb.go:

	// bucket_counts is an optional field contains the count values of histogram
	// for each bucket.
	//
	// The sum of the bucket_counts must equal the value in the count field.
	//
	// The number of elements in bucket_counts array must be by one greater than
	// the number of elements in explicit_bounds array.
	BucketCounts []uint64 `protobuf:"fixed64,6,rep,packed,name=bucket_counts,json=bucketCounts,proto3" json:"bucket_counts,omitempty"`
	// explicit_bounds specifies buckets with explicitly defined bounds for values.
	//
	// The boundaries for bucket at index i are:
	//
	// (-infinity, explicit_bounds[i]] for i == 0
	// (explicit_bounds[i-1], explicit_bounds[i]] for 0 < i < size(explicit_bounds)
	// (explicit_bounds[i-1], +infinity) for i == size(explicit_bounds)
	//
	// The values in the explicit_bounds array must be strictly increasing.
	//
	// Histogram buckets are inclusive of their upper boundary, except the last
	// bucket where the boundary is at infinity. This format is intentionally
	// compatible with the OpenMetrics histogram definition.
	ExplicitBounds []float64 `protobuf:"fixed64,7,rep,packed,name=explicit_bounds,json=explicitBounds,proto3" json:"explicit_bounds,omitempty"`

We should probably use "ExplicitBounds" instead of BucketCounts.
The test should also be fixed to accommodate that.
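For illustration, the upper-inclusive boundary convention quoted from metrics.pb.go can be sketched over plain slices. The helper below is hypothetical (not part of the exporter) and exists only to show why bucket_counts always has one more element than explicit_bounds:

```go
package main

import "fmt"

// bucketIndex returns the bucket a value falls into given explicit
// upper bounds, following the convention from metrics.pb.go:
// (-inf, b0], (b0, b1], ..., (b[n-1], +inf).
func bucketIndex(bounds []float64, v float64) int {
	for i, b := range bounds {
		if v <= b { // buckets are inclusive of their upper boundary
			return i
		}
	}
	// Overflow bucket: this is why bucket_counts needs exactly
	// len(bounds)+1 entries.
	return len(bounds)
}

func main() {
	bounds := []float64{1, 5, 10}
	fmt.Println(bucketIndex(bounds, 0.5)) // 0: falls in (-inf, 1]
	fmt.Println(bucketIndex(bounds, 5))   // 1: upper bound is inclusive
	fmt.Println(bucketIndex(bounds, 42))  // 3: the implicit +Inf bucket
}
```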

@Aneurysm9
Member

We should probably use "ExplicitBounds" instead of BucketCounts.
The test should also be fixed to accommodate that.

Both are needed. ExplicitBounds describes the boundaries between buckets and BucketCounts describes the values within each bucket. The issue is probably in this code:

	if older.ExplicitBounds().Len() == newer.ExplicitBounds().Len() {
		for i := 0; i < newer.BucketCounts().Len(); i++ {
			if older.ExplicitBounds().At(i) != newer.ExplicitBounds().At(i) {
				match = false
				break
			}
		}

ExplicitBounds().Len() should equal BucketCounts().Len()-1, as there is an implicit bucket for <=Inf. The for loop should probably be checking ExplicitBounds().Len(), and maybe also that it is equal to BucketCounts().Len()-1.
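A corrected bucket-layout check along the lines Aneurysm9 describes could look like the following sketch. It operates over plain slices rather than the pdata types used in the real accumulator, and the function name is illustrative:

```go
package main

import "fmt"

// boundsMatch reports whether two histogram points use the same bucket
// layout, iterating over the bounds (not the counts) so the index can
// never run past ExplicitBounds.
func boundsMatch(older, newer []float64, olderCounts, newerCounts int) bool {
	// Each point must have exactly one more bucket count than bounds
	// (the implicit +Inf bucket).
	if olderCounts != len(older)+1 || newerCounts != len(newer)+1 {
		return false
	}
	// The bound slices must be the same length and identical.
	if len(older) != len(newer) {
		return false
	}
	for i := range newer {
		if older[i] != newer[i] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(boundsMatch([]float64{1, 5}, []float64{1, 5}, 3, 3)) // true
	fmt.Println(boundsMatch([]float64{1, 5}, []float64{1, 5}, 2, 3)) // false: count/bounds mismatch
}
```

Iterating over the shorter ExplicitBounds slice (and validating the count/bounds length invariant up front) avoids the index-out-of-range panic reported above.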

@bitomaxsp
Copy link

We should probably use "ExplicitBounds" instead of BucketCounts.
The test should also be fixed to accommodate that.

Both are needed. ExplicitBounds describes the boundaries between buckets and BucketCounts describes the values within each bucket. The issue is probably in this code:

	if older.ExplicitBounds().Len() == newer.ExplicitBounds().Len() {
		for i := 0; i < newer.BucketCounts().Len(); i++ {
			if older.ExplicitBounds().At(i) != newer.ExplicitBounds().At(i) {
				match = false
				break
			}
		}

ExplicitBounds().Len() should equal BucketCounts().Len()-1 as there is an implicit bucket for <=Inf. The for loop should probably be checking ExplicitBounds().Len and maybe that it is equal to BucketCounts().Len()-1.

I already made that fix in my local branch and tested it in prod. It works.

@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label May 26, 2023
@github-actions
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@hkfgo
Contributor

hkfgo commented Jun 28, 2023

Hey @bitomaxsp, @Aneurysm9, and anyone else who's watching: I've continued @locmai's work in a new PR. He couldn't find a way to reopen this PR, hence the new one.

djaglowski pushed a commit that referenced this pull request Jan 8, 2024
…#23790)

**Description:**
This continues the work done in the now closed
[PR](#20530).
I have addressed issues raised in the original PR by
- Adding logic to handle timestamp misalignments
- Adding a fix for an out-of-bounds bug

In addition, I have performed end-to-end testing in a local setup, and
confirmed that accumulated histogram time series are correct.

**Link to tracking Issue:**

#4968

#9006

#19153
**Testing:**
Added tests for timestamp misalignment and an out-of-bounds bug
discovered in the previous PR.
End-to-end testing to ensure histogram bucket counts exported to
Prometheus are correct

---------

Signed-off-by: Loc Mai <locmai0201@gmail.com>
Signed-off-by: xchen <xchen@axon.com>
Signed-off-by: stephenchen <x.chen1016@gmail.com>
Co-authored-by: Lev Popov <nabam@nabam.net>
Co-authored-by: Lev Popov <leo@nabam.net>
Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>
Co-authored-by: Loc Mai <locmai0201@gmail.com>
Co-authored-by: Alex Boten <aboten@lightstep.com>
cparkins pushed a commit to AmadeusITGroup/opentelemetry-collector-contrib that referenced this pull request Jan 10, 2024
…open-telemetry#23790)
Successfully merging this pull request may close these issues.

[exporter/prometheus] collector do not support metric histogram with temporality = DELTA
7 participants