
Implement Prometheus metrics reader constructor #2988

Merged
albertteoh merged 9 commits into jaegertracing:master from add-promreader-ctor on May 17, 2021

Conversation

albertteoh (Contributor)

Signed-off-by: albertteoh <albert.teoh@logz.io>

Which problem is this PR solving?

Short description of the changes

  • Adds the Prometheus metrics reader constructor implementation and tests.


codecov bot commented May 12, 2021

Codecov Report

Merging #2988 (47fb91f) into master (7636530) will decrease coverage by 0.00%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #2988      +/-   ##
==========================================
- Coverage   95.98%   95.97%   -0.01%     
==========================================
  Files         224      224              
  Lines        9731     9747      +16     
==========================================
+ Hits         9340     9355      +15     
  Misses        323      323              
- Partials       68       69       +1     
Impacted Files Coverage Δ
plugin/metrics/prometheus/metricsstore/reader.go 100.00% <100.00%> (ø)
cmd/query/app/server.go 95.62% <0.00%> (-1.46%) ⬇️
cmd/query/app/static_handler.go 96.77% <0.00%> (ø)
plugin/storage/integration/integration.go 77.90% <0.00%> (+0.55%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7636530...47fb91f.

Signed-off-by: albertteoh <albert.teoh@logz.io>
@albertteoh albertteoh marked this pull request as ready for review May 12, 2021 03:51
@albertteoh albertteoh requested a review from a team as a code owner May 12, 2021 03:51
@albertteoh albertteoh requested a review from vprithvi May 12, 2021 03:51
conn.Close()

return &MetricsReader{
url: fmt.Sprintf("http://%s/api/v1/query_range", hostPort),
Contributor Author

This URL will be used to query metrics for the configured Prometheus-compatible backend, i.e. Prometheus, M3, Thanos, etc.
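For context, the Prometheus HTTP API serves range queries at /api/v1/query_range with query, start, end and step parameters. The following is a minimal sketch of calling such a URL with the standard library; the hostPort value and PromQL expression are illustrative assumptions, not the PR's actual code.

package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

func main() {
	hostPort := "localhost:9090" // assumed address, not from the PR
	base := fmt.Sprintf("http://%s/api/v1/query_range", hostPort)

	params := url.Values{}
	params.Set("query", `sum(rate(http_requests_total[5m]))`) // illustrative PromQL
	params.Set("start", fmt.Sprintf("%d", time.Now().Add(-time.Hour).Unix()))
	params.Set("end", fmt.Sprintf("%d", time.Now().Unix()))
	params.Set("step", "60s") // resolution step

	resp, err := http.Get(base + "?" + params.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // JSON: {"status":"success","data":{"resultType":"matrix",...}}
}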

return nil, fmt.Errorf("no prometheus query host:port provided")
}
errs := make([]error, 0)
for _, hostPort := range hostPorts {
Member

This is kind of an odd behavior: why do we need to loop through hosts looking for a reachable one? Why wouldn't they all be reachable? Or, to put it another way, what's the point of providing unreachable host:port pairs as input?

If these addresses are somehow "unstable", then they might as well become unreachable after the reader is created.

Contributor Author (albertteoh, May 13, 2021)

Thanks for pointing this out @yurishkuro.

The initial motivation was to maintain config consistency with the ES and Cassandra span stores, which both support configuring multiple server URLs, presumably for redundancy. But yes, good call that the selected address could become unreachable after the reader is created.

This constructor behaviour followed ES's lead in how it initializes a new client by health checking each server: https://github.com/olivere/elastic/blob/v6.2.35/client.go#L1153.

However, the piece I'm missing, which you correctly picked up on, is maintaining this pool of connections and finding the first healthy connection on each request, for example: https://github.com/olivere/elastic/blob/v6.2.35/client.go#L1301

The trade-off is that it's a fair bit more complicated.

What direction do you suggest we take? A simple single host:port configuration? Or a pool of connections that we select from on each request?
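For readers weighing the trade-off above, this is a minimal sketch of the "probe each host:port and keep the first reachable one" pattern under discussion; the function name and error message are illustrative, not the PR's actual constructor.

import (
	"fmt"
	"net"
	"time"
)

// firstReachable dials each candidate host:port and returns the first one
// that accepts a TCP connection, or an error aggregating all failures.
func firstReachable(hostPorts []string, timeout time.Duration) (string, error) {
	var errs []error
	for _, hostPort := range hostPorts {
		conn, err := net.DialTimeout("tcp", hostPort, timeout)
		if err != nil {
			errs = append(errs, err)
			continue
		}
		conn.Close() // only probing reachability; the real client connects later
		return hostPort, nil
	}
	return "", fmt.Errorf("no reachable prometheus host:port: %v", errs)
}

As the review notes, this only validates reachability at construction time; the chosen address can still become unreachable afterwards.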

Member

Does Prom only expose an HTTP API, or is there a gRPC version? gRPC would've taken care of the connection pooling.

Contributor Author

To the best of my knowledge, Prometheus only exposes an HTTP API.

I found some notes in the Prometheus docs referring to using gRPC for remote storage reads/writes in the future, but that's not related to the query API.

Contributor Author (albertteoh, May 13, 2021)

Another alternative is to use the Prometheus Go client, but it means adding another dependency.

Worth noting: this client takes a single URL: https://github.com/prometheus/client_golang/blob/master/api/client.go#L78

Given we're only using a single Prometheus endpoint, /api/v1/query_range, I feel the Prometheus client offers more than we need and may be a bit of overkill.

At the same time, we could adopt this client's approach of a single connection URL, which I assume is the more idiomatic option given it's the official Prometheus client.
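A minimal sketch of what using the Prometheus Go client with a single address could look like; the address, PromQL expression and range values are assumptions for illustration only.

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	promapi "github.com/prometheus/client_golang/api/prometheus/v1"
)

func queryRangeExample(ctx context.Context) error {
	client, err := api.NewClient(api.Config{Address: "http://localhost:9090"}) // assumed address
	if err != nil {
		return err
	}
	promAPI := promapi.NewAPI(client)

	value, warnings, err := promAPI.QueryRange(ctx,
		`sum(rate(http_requests_total[5m]))`, // illustrative PromQL
		promapi.Range{
			Start: time.Now().Add(-time.Hour),
			End:   time.Now(),
			Step:  time.Minute,
		})
	if err != nil {
		return err
	}
	if len(warnings) > 0 {
		fmt.Println("warnings:", warnings)
	}
	fmt.Println(value) // a model.Matrix for range queries
	return nil
}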

Contributor Author

If we want to proceed with the Prom client library, I think it would be best to modify our metrics data model to match Prometheus's, to avoid mapping between the OpenTelemetry and Prometheus data models. I guess this might mean using OpenMetrics, unless there's a good reason to stick with OpenTelemetry's.

Otherwise, should we maintain the simple HTTP client we have now, configured with a single Prom address?

Member

I don't have a strong opinion. Which of the two formats is more mature at this point? I thought OTLP was still changing its proto.

Member

We just discussed this in the Jaeger call. The general agreement was that OpenMetrics would be a better choice:

  • OTLP is still changing, and we already took a copy of the data model, which might be different now
  • OTLP is supposed to be fully compatible with OpenMetrics by the end of the year
  • OpenMetrics is the de facto Prometheus format, already used by many different backends
  • OpenMetrics is stable
  • OTEL will support OTLP<->OpenMetrics conversion in the future, so we would still be able to implement support for OTLP-native backends

Contributor Author

Okay, thanks for raising this in the Jaeger call. Those reasons make sense, so I'll make the switch over to OpenMetrics.

It also sounds like we're okay with using the Prom client for HTTP queries (even though it's in alpha), but please correct me if my assumption is wrong.

Member

It also sounds like we're okay with using the Prom client for HTTP queries

sgtm

@albertteoh albertteoh marked this pull request as draft May 14, 2021 11:54
Signed-off-by: albertteoh <albert.teoh@logz.io>
albertteoh (Contributor Author)

PR updated to use prom client and OpenMetrics.

Did an initial e2e test with a POC version, and the prom client looks okay so far. There's still a bit of work involved in converting from the prom client's data model to OpenMetrics, and I couldn't find a library that does this conversion, which was a little surprising. It wasn't too difficult in any case.
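As a rough illustration of the kind of conversion mentioned above, here is a sketch that flattens the client's model.Matrix into a simplified metric-family shape. The target types are hypothetical stand-ins, not Jaeger's generated OpenMetrics types.

import (
	"github.com/prometheus/common/model"
)

// Hypothetical, simplified stand-ins for the generated OpenMetrics structs.
type gaugePoint struct {
	TimestampMs int64
	Value       float64
	Labels      map[string]string
}

type metricFamily struct {
	Name   string
	Points []gaugePoint
}

// toMetricFamily flattens a Prometheus range-query result (model.Matrix)
// into the simplified family above.
func toMetricFamily(name string, matrix model.Matrix) metricFamily {
	mf := metricFamily{Name: name}
	for _, stream := range matrix {
		labels := make(map[string]string, len(stream.Metric))
		for k, v := range stream.Metric {
			labels[string(k)] = string(v)
		}
		for _, sample := range stream.Values {
			mf.Points = append(mf.Points, gaugePoint{
				TimestampMs: int64(sample.Timestamp), // model.Time is milliseconds since epoch
				Value:       float64(sample.Value),
				Labels:      labels,
			})
		}
	}
	return mf
}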

@albertteoh albertteoh marked this pull request as ready for review May 14, 2021 13:24
yurishkuro
yurishkuro previously approved these changes May 15, 2021
// The OpenMetrics protobuf schema which defines the protobuf wire format.
// Ensure to interpret "required" as semantically required for a valid message.
// All string fields MUST be UTF-8 encoded strings.
// Based on: https://github.com/OpenObservability/OpenMetrics/blob/v1.0.0/proto/openmetrics_data_model.proto
Member

Is it possible to auto-generate this file from openmetrics_data_model.proto? If not, please summarize the manual changes required.

Member

Looking at the file itself, it seems that the main IDL definitions are unchanged from OpenMetrics; we're only adding some preamble?

Contributor Author (albertteoh, May 16, 2021)

The package name and gogoproto options below were added; they are required to marshal the metrics data over the wire in the response. A note was added in the README for posterity.

It should be possible to auto-generate this file. Off the top of my head, perhaps by copying the file from a submodule and doing some string replacements. Is there a better approach, or is that getting a bit too unwieldy, such that manual instructions would be better?

Member

Describing the additions in the README is a bit removed. If there is a concrete, single block of text that's been added compared to the official file, I would mark it with comments in the file directly. When looking at the source file, it's convenient for it to be self-descriptive, e.g.

// This file is a copy of {path to source}, with the following additions 
//  * ...
//  * ...

// -- BEGIN ADDITIONS
...
// -- END ADDITIONS

@@ -18,38 +18,53 @@ import (
"context"
"time"

"github.com/prometheus/client_golang/api"
"github.com/prometheus/client_golang/api/prometheus/v1"
Member

Suggested change
"github.com/prometheus/client_golang/api/prometheus/v1"
prom_api "github.com/prometheus/client_golang/api/prometheus/v1"

Member

or something like that; v1 is a terrible package name

yurishkuro
yurishkuro previously approved these changes May 16, 2021
// TODO: Implement me
return nil, nil
return metrics.MetricFamily{}, nil
Member

Any reason to return MetricFamily by value instead of a pointer?

Contributor Author

No strong reason; it's just a continuation of the API definition, which I defined as not nullable.

Perhaps it's better to return by pointer for performance reasons?

Contributor Author

And the reason for defining MetricFamily as not nullable is to act as a forcing function to return an empty result both when there is no data and when there is an error, which I think is more meaningful than nil (since it also carries metadata).
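To make the value-vs-pointer distinction concrete, here is a small sketch with a hypothetical MetricFamily stand-in; the real type lives in Jaeger's generated metrics package and is not reproduced here.

import "errors"

// Hypothetical stand-in; Name represents metadata that stays meaningful
// even when there are no data points.
type MetricFamily struct {
	Name string
}

var errBackend = errors.New("backend unavailable")

// Returning by value: callers always get a usable (possibly empty) family,
// even alongside an error, so there is no nil to dereference.
func byValue() (MetricFamily, error) {
	return MetricFamily{Name: "service_latencies"}, errBackend
}

// Returning a pointer: callers must nil-check the result before using it.
func byPointer() (*MetricFamily, error) {
	return nil, errBackend
}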

albertteoh added 5 commits May 17, 2021 10:36
Signed-off-by: albertteoh <albert.teoh@logz.io>
Signed-off-by: albertteoh <albert.teoh@logz.io>
Signed-off-by: albertteoh <albert.teoh@logz.io>
Signed-off-by: albertteoh <albert.teoh@logz.io>
Signed-off-by: albertteoh <albert.teoh@logz.io>
albertteoh added 2 commits May 17, 2021 10:37
Signed-off-by: albertteoh <albert.teoh@logz.io>
func TestNewMetricsReaderInvalidAddress(t *testing.T) {
logger := zap.NewNop()
reader, err := NewMetricsReader(logger, "\n", defaultTimeout)
const wantErrMsg = `parse "http://\n": net/url: invalid control character in URL`
Member (yurishkuro, May 17, 2021)

It's better not to depend on exact error messages from the stdlib, but to wrap the error with our own message and assert only on our own string, e.g.:

require.Error(t, err)
assert.Contains(t, err.Error(), expectedMsg)

Contributor Author

yup, I'll address this in the next PR, thanks!
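A short sketch of the wrap-and-assert pattern suggested above, using fmt.Errorf with %w and asserting only on the wrapping message; the helper name and message are illustrative, not the PR's actual code.

import (
	"fmt"
	"net/url"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// newQueryURL wraps the stdlib error with our own message rather than
// surfacing the stdlib text verbatim.
func newQueryURL(hostPort string) (*url.URL, error) {
	u, err := url.Parse("http://" + hostPort)
	if err != nil {
		return nil, fmt.Errorf("failed to initialize prometheus query URL: %w", err)
	}
	return u, nil
}

func TestNewQueryURLInvalidAddress(t *testing.T) {
	_, err := newQueryURL("\n")
	require.Error(t, err)
	// Assert only on our own wrapping message, not the exact stdlib error text.
	assert.Contains(t, err.Error(), "failed to initialize prometheus query URL")
}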

@albertteoh albertteoh merged commit 726c352 into jaegertracing:master May 17, 2021
@albertteoh albertteoh deleted the add-promreader-ctor branch May 17, 2021 01:21
@jpkrohling jpkrohling added this to the Release 1.23.0 milestone Jun 4, 2021
@albertteoh albertteoh mentioned this pull request Jul 9, 2021