Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data source HTTP metrics #1130

Merged
merged 10 commits into from
Nov 5, 2024
Merged

Data source HTTP metrics #1130

merged 10 commits into from
Nov 5, 2024

Conversation

aangelisc
Copy link
Contributor

We're currently lacking insight into key metrics that will aid us in understanding how HTTP data sources are used.

This PR ports the datasource_metrics_middleware from core Grafana which adds the following metrics under the plugins namespace:

  • datasource_request_total - a counter of the total number of outgoing requests for the data source
  • datasource_request_duration_seconds - a histogram of the duration of outgoing requests for the data source
  • datasource_response_size_bytes - a histogram of the size of the response body for the outgoing requests
  • datasource_request_in_flight - a gauge of the requests in flight for the data source

@aangelisc aangelisc self-assigned this Oct 28, 2024
@aangelisc aangelisc requested a review from a team as a code owner October 28, 2024 15:53
@aangelisc aangelisc requested review from wbrowne, andresmgot and xnyo and removed request for a team October 28, 2024 15:53
Copy link

@Multimo Multimo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work thanks for following up on this 👍

marefr
marefr previously requested changes Oct 29, 2024
Copy link
Contributor

@marefr marefr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general. Some minor things left and would like to know if native histograms for external datasource plugins works as expected.

backend/httpclient/datasource_metrics_middleware.go Outdated Show resolved Hide resolved
Comment on lines +47 to +49
NativeHistogramBucketFactor: 1.1,
NativeHistogramMaxBucketNumber: 100,
NativeHistogramMinResetDuration: time.Hour,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to verify this works as expected for an external datasource plugin.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this doesn't look like it works atm, maybe best to remove?

Namespace: "plugins",
Name: "datasource_request_duration_seconds",
Help: "histogram of durations of outgoing external data source requests sent from Grafana",
Buckets: []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10, 25, 50, 100},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to consider native histogram here as well if it works for datasource_response_size_bytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
This doesn't look like a native histogram response :/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about

histogram_quantile(0.99, sum(rate(plugins_datasource_response_size_bytes{datasource_type="$plugin"}[$__rate_interval])) by (le))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope 😞 definitely shows native histograms aren't supported atm

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created an issue to investigate this further #1132

FYI @wbrowne


var executeMiddlewareFunc = executeMiddleware

func DataSourceMetricsMiddleware() Middleware {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if this could be re-used by Grafana itself here https://github.com/grafana/grafana/blob/f062c66f89b09fca7716438ac488a76bf81ed7f8/pkg/infra/httpclient/httpclientprovider/http_client_provider.go#L28, perhaps a follow up. But then it would require an option to configure the namespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed although I'm not sure how to change the namespace without m moving the creation of the metrics into a function which then causes duplicate register errors.

backend/httpclient/http_client.go Outdated Show resolved Hide resolved
@@ -3,15 +3,16 @@ package backend
import (
"context"

ctxHelpers "github.com/grafana/grafana-plugin-sdk-go/backend/context"
Copy link
Member

@wbrowne wbrowne Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I missed this somewhere, but why the need for a new package for this? Is it due to requiring the backend package in httpclient?

Yes this same old pesky problem. I recommend we find a better name for this just so we don't always have to alias the import. Though I dislike utility/helper packages, maybe ctxutil? In the meantime I might do a quick look to see how much work it would be to shift some things around to avoid this entirely

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like it, yes, a cyclic dependency because of this line:

		labels["endpoint"] = ctxHelpers.EndpointFromContext(ctx).String()

My suggestion to avoid the breaking change:

  • Create a package under backend/internal/endpointctx and expose EndpointCtxKeyType and EndpointCtxKey. Since it's within an internal folder, it won't be exposed outside of the repository.
  • Change backend/endpoint.go to use the above.
  • Change backend/httpclient/datasource_metrics_middleware.go and manually read the exposed key. Something like:
		labels["endpoint"] = ""
		if ep := ctx.Value(endpointctx.EndpointCtxKey); ep != nil {
			labels["endpoint"] = ep.(string)
		}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a good alternative, I will implement and request a re-review 😊

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice @andresmgot - I think that'll do well for now. Ket's keep an eye on if this becomes more common, then we may look at a broader change to these packages perhaps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented changes 😄

Copy link
Contributor

@andresmgot andresmgot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

backend/internal/endpointctx/endpointctx.go Show resolved Hide resolved
@aangelisc aangelisc merged commit 501b836 into main Nov 5, 2024
3 checks passed
@aangelisc aangelisc deleted the andreas/ds-response-sizes branch November 5, 2024 12:15
ctx := r.Context()
labels["endpoint"] = ""
if ep := ctx.Value(endpointctx.EndpointCtxKey); ep != nil {
labels["endpoint"] = ep.(string)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this is not working as expected, sorry. I'm seeing some panics in my local instance:

logger=plugin.example-httpbackend-datasource t=2024-11-06T12:46:05.012205+01:00 level=error msg="panic triggered" error="interface conversion: interface {} is backend.Endpoint, not string"

Can you verify this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants