Regression: Delegation broken for global meterproviders #5827

Closed
dashpole opened this issue Sep 17, 2024 · 5 comments · Fixed by #5828 or #5881
@dashpole (Contributor) commented:

From #5753 (comment), reported by @Jesse-Bonfire:

It appears that this change has broken metrics that are registered before otel.SetMeterProvider is called. We get into a state where the delegate on the siUpDownCounter never gets set: with the new map we create new instruments and throw away the old ones, even though the old counter has already been registered with meter.RegisterCallback.
https://github.com/open-telemetry/opentelemetry-go/blob/42fd8fe325c125a40a21796c3f14b599e5e5f586/internal/global/meter.go#L175C8-L175C9

I am pretty new to OTel and Go, but this change seems to break the connection pools semantic convention outlined here.
https://github.com/open-telemetry/semantic-conventions/blob/422eeb71f40c73d1b195723cc1cffeb09abf2391/docs/database/database-metrics.md#connection-pools

Currently I am seeing this when instrumenting a Redis Cluster, as it creates instrumentation for the connection pool of each node in the cluster. https://github.com/redis/go-redis/blob/233f97accde15bcb2e3d86ee744109b4dce34171/extra/redisotel/metrics.go#L85

The problem goes away if we apply the instrumentation after calling otel.SetMeterProvider, because the delegate then just creates the counter directly. I'm not sure of the best way to proceed.

I am able to reproduce the panic with the following test case:

package bench

import (
	"context"
	"testing"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

const (
	meterName  = "foo"
	metricName = "bar"
)

func BenchmarkSimple(b *testing.B) {
	b.ReportAllocs()
	// Instruments and callbacks are created against the global meter before
	// any real MeterProvider is installed, so they start out as delegating
	// placeholders.
	meter := otel.Meter(meterName)
	for i := 0; i < 5; i++ {
		m, _ := meter.Int64ObservableUpDownCounter(metricName)
		meter.RegisterCallback(func(ctx context.Context, o metric.Observer) error {
			o.ObserveInt64(m, int64(10))
			return nil
		}, m)
	}
	// Installing the SDK provider triggers delegation of the placeholders;
	// this is where the regression shows up.
	p1 := sdkmetric.NewMeterProvider()
	otel.SetMeterProvider(p1)
}
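
For comparison, here is a minimal sketch of the ordering that avoids the problem, assuming the same package, imports, and constants as the test case above (setupAfterProvider is a hypothetical name). Because the SDK provider is installed first, otel.Meter is already backed by the SDK and no delegation step is involved.

func setupAfterProvider() {
	// Install the SDK provider before any instruments are created.
	otel.SetMeterProvider(sdkmetric.NewMeterProvider())

	// This meter is backed by the SDK, so the observable and its callback
	// are created directly by the SDK rather than by the delegating globals.
	meter := otel.Meter(meterName)
	m, _ := meter.Int64ObservableUpDownCounter(metricName)
	meter.RegisterCallback(func(ctx context.Context, o metric.Observer) error {
		o.ObserveInt64(m, int64(10))
		return nil
	}, m)
}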
@dashpole added the bug label on Sep 17, 2024
@Jesse-Bonfire (Contributor) commented:

I think I have a fix, but I don't know the underlying intent well enough to say whether it is the correct way to solve the problem. I will post a PR in a few hours so there can be some discussion.

@jmacd (Contributor) commented Oct 9, 2024:

I am not convinced that the test for this fix in #5828 is completely functional. I will present the evidence in this issue.

@jmacd (Contributor) commented Oct 9, 2024:

Note that in internal/global/meter.go there is an interface named wrapped with an unwrap() method:

unwrap() metric.Observable

Note that in instruments.go there is an interface named unwrapper with an Unwrap() method:

Unwrap() metric.Observable
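
For reference, a minimal sketch of the two declarations described above, using only the names and method signatures quoted in this comment (the real definitions in the repository may carry additional methods):

// In internal/global/meter.go (per the comment above).
type wrapped interface {
	unwrap() metric.Observable
}

// In instruments.go (per the comment above).
type unwrapper interface {
	Unwrap() metric.Observable
}

The observation presumably flags that there are two parallel unwrapping mechanisms, one un-exported and one exported, which is easy to confuse when reasoning about where delegation happens.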

@jmacd (Contributor) commented Oct 10, 2024:

I have a test to reproduce the problem. At this point, I have confirmed that #5828 addressed a crash, but it does not restore the feature that allows delegation to an alternative metric SDK.
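
A hedged sketch of what such a test might look like, assuming the package, imports, and constants from the test case above plus go.opentelemetry.io/otel/sdk/metric/metricdata (TestDelegatedObservable is a hypothetical name): it registers an observable and callback against the global meter first, installs an SDK provider backed by a ManualReader, and then checks that collection actually sees data produced by the delegated callback.

func TestDelegatedObservable(t *testing.T) {
	// Register against the global (delegating) meter before the SDK exists.
	meter := otel.Meter(meterName)
	m, _ := meter.Int64ObservableUpDownCounter(metricName)
	meter.RegisterCallback(func(ctx context.Context, o metric.Observer) error {
		o.ObserveInt64(m, 10)
		return nil
	}, m)

	// Installing the SDK provider should delegate the instrument and callback.
	reader := sdkmetric.NewManualReader()
	otel.SetMeterProvider(sdkmetric.NewMeterProvider(sdkmetric.WithReader(reader)))

	// If delegation works end to end, Collect runs the callback and the
	// observation appears in the collected metrics.
	var rm metricdata.ResourceMetrics
	if err := reader.Collect(context.Background(), &rm); err != nil {
		t.Fatal(err)
	}
	if len(rm.ScopeMetrics) == 0 {
		t.Fatal("observation was not delegated to the installed SDK")
	}
}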

@jmacd (Contributor) commented Oct 10, 2024:

False alarm. See #5881 (comment)

@jmacd closed this as completed Oct 10, 2024
dashpole added a commit that referenced this issue Oct 11, 2024
### Added

- Add `go.opentelemetry.io/otel/sdk/metric/exemplar` package which
includes `Exemplar`, `Filter`, `TraceBasedFilter`, `AlwaysOnFilter`,
`HistogramReservoir`, `FixedSizeReservoir`, `Reservoir`, `Value` and
`ValueType` types. These will be used for configuring the exemplar
reservoir for the metrics SDK. (#5747, #5862)
- Add `WithExportBufferSize` option to the log batch processor. (#5877)

### Changed

- Enable exemplars by default in `go.opentelemetry.io/otel/sdk/metric`.
Exemplars can be disabled by setting
`OTEL_METRICS_EXEMPLAR_FILTER=always_off`. (#5778)
- `Logger.Enabled` in `go.opentelemetry.io/otel/log` now accepts a newly
introduced `EnabledParameters` type instead of `Record`. (#5791)
- `FilterProcessor.Enabled` in
`go.opentelemetry.io/otel/sdk/log/internal/x` now accepts
`EnabledParameters` instead of `Record`. (#5791)
- The `Record` type in `go.opentelemetry.io/otel/log` is no longer
comparable. (#5847)
- Performance improvements for the trace SDK `SetAttributes` method in
`Span`. (#5864)
- Reduce memory allocations for the `Event` and `Link` lists in `Span`.
(#5858)
- Performance improvements for the trace SDK `AddEvent`, `AddLink`,
`RecordError` and `End` methods in `Span`. (#5874)

### Deprecated

- Deprecate all examples under `go.opentelemetry.io/otel/example` as
they are moved to [Contrib
repository](https://github.com/open-telemetry/opentelemetry-go-contrib/tree/main/examples).
(#5854)

### Fixed

- The race condition for multiple `FixedSize` exemplar reservoirs
identified in #5814 is resolved. (#5819)
- Fix log record duplication in the case of heterogeneous resource
attributes by correctly mapping each log record to its resource and
scope. (#5803)
- Fix timer channel drain to avoid hanging on Go 1.23. (#5868)
- Fix delegation for global meter providers, and a panic when calling
otel.SetMeterProvider. (#5827)
- Change `reflect.TypeOf` to use a nil pointer so it does not allocate on
the heap unless necessary. (#5827)
jmacd added a commit to lightstep/otel-launcher-go that referenced this issue Oct 15, 2024
**Description:** Updates to OTel-Go 1.31 require minor LS Metric SDK
fixes (see Unwrap()), discussed and fixed in the upstream PR
open-telemetry/opentelemetry-go#5881. The
Unwrap() calls here are to work around the problem until fully fixed
upstream.

**Link to tracking Issue:**
open-telemetry/opentelemetry-go#5827 explains
how we got here.

Part of #794.

**Testing:** Incomplete. Existing tests were logging messages about
foreign instruments in the asynchronous call path, but nothing failed as
a result.
MrAlias pushed a commit that referenced this issue Oct 18, 2024

~Two defects are fixed here. However, note that async instrument
delegation appears to have been broken for a long time.~

Internalizes and tests the behavior of the Global MeterProvider. This
moves the call to `Unwrap()` out of the SDK, fully concealing it within
the internal/global package (using an un-exported method).

This adds a test for the new functionality. While the test is not
comprehensive, because it does not cover every instrument variation, it
explicitly checks that both the NewCallback function and the Observe
functions receive objects constructed by the alternate SDK.

Fixes #5827

---------

Co-authored-by: David Ashpole <dashpole@google.com>
Co-authored-by: Robert Pająk <pellared@hotmail.com>