Verify compliant metric SDK specification implementation: MeterProvider/Observations inside asynchronous callbacks #3652

Closed
2 tasks done
Tracked by #3674
MrAlias opened this issue Feb 3, 2023 · 8 comments
Assignees
Labels
area:metrics Part of OpenTelemetry Metrics pkg:SDK Related to an SDK package

Comments

@MrAlias
Contributor

MrAlias commented Feb 3, 2023

  • Identify all the normative requirements, recommendations, and options the specification defines as comments to this issue
  • Ensure the current metric SDK implementation is compliant with these normative requirements, recommendations, and options in those comments.
@MrAlias MrAlias added pkg:SDK Related to an SDK package area:metrics Part of OpenTelemetry Metrics labels Feb 3, 2023
@MrAlias MrAlias self-assigned this Jun 1, 2023
@MrAlias
Contributor Author

MrAlias commented Jun 1, 2023

Callback functions MUST be invoked for the specific MetricReader performing collection, such that observations made or produced by executing callbacks only apply to the intended MetricReader during collection.

The SDK does not look compliant with this:

func TestMeterProviderMixingOnRegisterErrors(t *testing.T) {
	otel.SetLogger(testr.New(t))

	rdr0 := NewManualReader()
	mp0 := NewMeterProvider(WithReader(rdr0))

	rdr1 := NewManualReader()
	mp1 := NewMeterProvider(WithReader(rdr1))

	// Meters with the same scope but different MeterProviders.
	m0 := mp0.Meter("TestMeterProviderMixingOnRegisterErrors")
	m0Ctr, err := m0.Float64ObservableCounter("float64 ctr")
	require.NoError(t, err)

	m1 := mp1.Meter("TestMeterProviderMixingOnRegisterErrors")
	m1Ctr, err := m1.Int64ObservableCounter("int64 ctr")
	require.NoError(t, err)

	_, err = m0.RegisterCallback(
		func(_ context.Context, o metric.Observer) error {
			o.ObserveFloat64(m0Ctr, 2)
			// Observe an instrument from a different MeterProvider.
			o.ObserveInt64(m1Ctr, 1)

			return nil
		},
		m0Ctr, m1Ctr,
	)
	assert.Error(
		t,
		err,
		"Instrument registered with Meter from different MeterProvider",
	)

	var data metricdata.ResourceMetrics
	_ = rdr0.Collect(context.Background(), &data)
	// Only the metrics from mp0 should be produced.
	assert.Len(t, data.ScopeMetrics, 1)

	err = rdr0.Collect(context.Background(), &data)
	assert.NoError(t, err, "Errored when collect should be a noop")
	assert.Len(
		t, data.ScopeMetrics, 0,
		"Metrics produced for instrument collected by different MeterProvider",
	)
}
go test ./...
?   	go.opentelemetry.io/otel/sdk/metric/metricdata	[no test files]
--- FAIL: TestMeterProviderMixingOnRegisterErrors (0.00s)
    provider_test.go:111:
        	Error Trace:	/home/tyler/go/src/go.opentelemetry.io/otel/sdk/metric/provider_test.go:111
        	Error:      	An error is expected but got nil.
        	Test:       	TestMeterProviderMixingOnRegisterErrors
        	Messages:   	Instrument registered with Meter from different MeterProvider
    provider_test.go:124:
        	Error Trace:	/home/tyler/go/src/go.opentelemetry.io/otel/sdk/metric/provider_test.go:124
        	Error:      	"[{{TestMeterProviderMixingOnRegisterErrors  } [{float64 ctr   {[{{{[]}} 2023-06-01 13:29:57.873249248 -0700 PDT m=+0.021975681 2023-06-01 13:29:57.873306387 -0700 PDT m=+0.022032821 %!s(float64=2) []}] CumulativeTemporality %!s(bool=true)}}]}]" should have 0 item(s), but has 1
        	Test:       	TestMeterProviderMixingOnRegisterErrors
        	Messages:   	Metrics produced for instrument collected by different MeterProvider
FAIL
FAIL	go.opentelemetry.io/otel/sdk/metric	0.026s
ok  	go.opentelemetry.io/otel/sdk/metric/aggregation	(cached)
ok  	go.opentelemetry.io/otel/sdk/metric/internal	(cached)
ok  	go.opentelemetry.io/otel/sdk/metric/metricdata/metricdatatest	(cached)
FAIL
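
For illustration, the behavior the test expects from RegisterCallback can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the SDK's implementation: the `provider`, `meter`, and `instrument` types, the `registerCallback` method, and the ownership check are all hypothetical stand-ins. It only shows the idea of rejecting a registration when an instrument was created by a different provider.

```go
package main

import (
	"errors"
	"fmt"
)

// errProviderMismatch mirrors the message asserted in the test above.
var errProviderMismatch = errors.New("instrument registered with Meter from different MeterProvider")

// provider, meter, and instrument are hypothetical stand-ins, not SDK types.
type provider struct{ name string }

type meter struct {
	owner     *provider
	callbacks []func()
}

type instrument struct {
	name  string
	owner *provider
}

func (p *provider) newMeter() *meter { return &meter{owner: p} }

func (m *meter) newInstrument(name string) instrument {
	return instrument{name: name, owner: m.owner}
}

// registerCallback rejects the whole registration if any instrument was
// created by a different provider, so the callback is never stored or run.
func (m *meter) registerCallback(cb func(), insts ...instrument) error {
	for _, inst := range insts {
		if inst.owner != m.owner {
			return fmt.Errorf("%w: %q", errProviderMismatch, inst.name)
		}
	}
	m.callbacks = append(m.callbacks, cb)
	return nil
}

// mixedRegistrationFails reports whether mixing providers is rejected.
func mixedRegistrationFails() bool {
	m0 := (&provider{name: "mp0"}).newMeter()
	m1 := (&provider{name: "mp1"}).newMeter()
	c0 := m0.newInstrument("float64 ctr")
	c1 := m1.newInstrument("int64 ctr")

	err := m0.registerCallback(func() {}, c0, c1)
	return errors.Is(err, errProviderMismatch) && len(m0.callbacks) == 0
}

func main() {
	fmt.Println(mixedRegistrationFails()) // prints "true"
}
```

Rejecting at registration time, rather than at collection time, means a mixed callback never produces observations for any reader.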

@MrAlias
Contributor Author

MrAlias commented Jun 1, 2023

The SDK does not look compliant with this

Tracking with #4164

@MrAlias
Contributor Author

MrAlias commented Jun 2, 2023

The implementation SHOULD disregard the accidental use of APIs appurtenant to asynchronous instruments outside of registered callbacks in the context of a single MetricReader collection.

This sounds like a complex way of saying that calls to observable instruments outside of callbacks need to be ignored. Given that the observable instruments here do not have any methods, we comply with this implicitly.
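
A simplified model of why this holds is sketched below. It assumes hypothetical `observable`, `observer`, and `meter` types written only for this example (they are not the OpenTelemetry API). The point is that the instrument handle has no methods of its own, so a value can only be reported through the observer handed to a callback during collection.

```go
package main

import "fmt"

// observable is a hypothetical instrument handle. It deliberately has no
// methods, so there is nothing to call outside of a callback.
type observable struct{ name string }

// observer is created per collection and is the only way to report a value.
type observer struct{ values map[string]int64 }

func (o *observer) observeInt64(inst observable, v int64) { o.values[inst.name] = v }

// meter holds the registered callbacks (hypothetical, not the SDK type).
type meter struct{ callbacks []func(*observer) }

func (m *meter) registerCallback(cb func(*observer)) {
	m.callbacks = append(m.callbacks, cb)
}

// collect builds a fresh observer, runs every callback with it, and returns
// whatever the callbacks observed during this round.
func (m *meter) collect() map[string]int64 {
	o := &observer{values: make(map[string]int64)}
	for _, cb := range m.callbacks {
		cb(o)
	}
	return o.values
}

// collectedValue registers one callback and returns what a collection yields.
func collectedValue() int64 {
	m := &meter{}
	ctr := observable{name: "int64 ctr"}
	m.registerCallback(func(o *observer) { o.observeInt64(ctr, 42) })
	return m.collect()[ctr.name]
}

func main() {
	fmt.Println(collectedValue()) // prints 42
}
```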

@MrAlias
Contributor Author

MrAlias commented Jun 2, 2023

The implementation SHOULD use a timeout to prevent indefinite callback execution.

The implementation does not explicitly use a timeout for callback execution. However, it passes each callback the context given to the Collect call, and that context may include a timeout.

I do not think the appropriate, or idiomatic, behavior here is to run callbacks in a goroutine and abandon them if the timeout expires. Instead, the readers should document that the context they pass to Collect carries any timeout through to the callbacks, and the callbacks need to be documented as having to honor timeouts in the passed context.

For the periodic reader, there is a timeout used for an export:

c, cancel := context.WithTimeout(ctx, r.timeout)

It probably makes sense to include this timeout in the collection process as well.
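
A minimal sketch of that approach, using only the standard library, follows. The `callback` type and the `collect`, `slowCallback`, and `timedOut` functions are hypothetical names made up for this example and are not the SDK's code. It assumes callbacks cooperate by watching `ctx.Done()`; no goroutine is started or abandoned.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// callback is a hypothetical stand-in for an asynchronous instrument callback.
type callback func(ctx context.Context) error

// slowCallback simulates work that takes d but returns early once ctx is done.
func slowCallback(d time.Duration) callback {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// collect runs every callback synchronously with a context bounded by
// timeout and returns the first error. Each callback is trusted to return
// when the context is done.
func collect(ctx context.Context, timeout time.Duration, cbs ...callback) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	for _, cb := range cbs {
		if err := cb(ctx); err != nil {
			return err
		}
	}
	return nil
}

// timedOut reports whether collecting a callback that needs workMillis of
// work fails with a deadline error under a timeout of timeoutMillis.
func timedOut(workMillis, timeoutMillis int) bool {
	work := time.Duration(workMillis) * time.Millisecond
	timeout := time.Duration(timeoutMillis) * time.Millisecond
	err := collect(context.Background(), timeout, slowCallback(work))
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(timedOut(1, 1000)) // fast callback: prints "false"
	fmt.Println(timedOut(1000, 10)) // slow callback: prints "true"
}
```

A callback that ignores its context would still block collection in this sketch, which is why the documentation requirement matters.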

@MrAlias
Contributor Author

MrAlias commented Jun 2, 2023

#4166

@MrAlias
Contributor Author

MrAlias commented Jun 5, 2023

The implementation MUST complete the execution of all callbacks for a given instrument before starting a subsequent round of collection.

The collection process is guarded by a lock that is unique to a pipeline (reader/views/exporter):

p.Lock()
defer p.Unlock()

That ensures that all the callbacks will be completed before a "subsequent round of collection" for the pipeline is started.
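
The effect of the lock can be demonstrated with a small standalone sketch. The `pipeline` type here is a hypothetical, stripped-down stand-in for the real one, and `maxOverlap` is a helper invented for this example. It starts several collections concurrently and records the largest number of callbacks that were ever running at the same time.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pipeline is a hypothetical, simplified stand-in for the SDK pipeline.
type pipeline struct {
	sync.Mutex
	callbacks []func()
}

// collect holds the pipeline lock for the whole round, so every callback of
// one round finishes before the next round can start.
func (p *pipeline) collect() {
	p.Lock()
	defer p.Unlock()
	for _, cb := range p.callbacks {
		cb()
	}
}

// maxOverlap starts rounds concurrent collections and returns the largest
// number of callbacks observed running at the same time.
func maxOverlap(rounds int) int {
	var (
		mu           sync.Mutex // guards active and peak; not the pipeline lock
		active, peak int
	)
	cb := func() {
		mu.Lock()
		active++
		if active > peak {
			peak = active
		}
		mu.Unlock()

		time.Sleep(time.Millisecond) // simulate work inside the callback

		mu.Lock()
		active--
		mu.Unlock()
	}
	p := &pipeline{callbacks: []func(){cb, cb}}

	var wg sync.WaitGroup
	for i := 0; i < rounds; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.collect()
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println(maxOverlap(8)) // prints 1: rounds never interleave
}
```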

@MrAlias
Contributor Author

MrAlias commented Jun 5, 2023

@MrAlias
Contributor Author

MrAlias commented Jul 20, 2023

Done.

@MrAlias MrAlias closed this as completed Jul 20, 2023
@MrAlias MrAlias added this to the v1.17.0/v0.40.0 milestone Aug 3, 2023