Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

registry: drop duplicate checks #1621

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

lukedirtwalker
Copy link

In processMetrics drop duplicate checks. If the metricFamily exists we already have a type check with the call to checkMetricConsistency. The help string is already checked in the checkDescConsistency albeit only if pedantic mode is enabled. However this is probably what is desired anyway.

In processMetrics drop duplicate checks. If the metricFamily exists we
already have a type check with the call to checkMetricConsistency. The
help string is already checked in the checkDescConsistency albeit only
if pedantic mode is enabled. However this is probably what is desired
anyway.
@lukedirtwalker
Copy link
Author

I took the freedom to remove the description consistency check in non-pedantic mode.
This was causing issues in opentelemetry-collector prometheus exporter (see open-telemetry/opentelemetry-collector-contrib#28617 for more details)

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Type checks were dups. The bigger change is that indeed registry will now not check the help consistency (you propose relaxing it). Let's update PR title, that's the bigger change here (1).

Without this check we need to handle random helps for the same metric family somehow. Quickly looking, I think the logic seems to be the first help from the first metric within a metric family will be used in the collected channel which mean.. random. This can be confusing and even worse, causing flipping help on scrape trashing potentially collector/Prometheus metadata storage.

Can you fix DCO (2) and add test for multiple helps against registry and pedantic registry confirming the behaviour (random/deterministic) (3).

Another issue is that I can see another check that we missed here, wonder why you didn't hit that first? 🤔 Hopefully test can help us figure out, we might need to update hashing there (4).
Then another check that is still there, it would be weird if we relax in single gatherer, then for other cases we check perhaps (5)?

Then I found various descriptions around logic that will change after this change e.g.

// Note that even after unregistering, it will not be possible to
so we will need to update and test this (6).

Generally it would epic if there was a simple solution for callers like Otel to dedup and decide what help you want to pick. Have you considered that? Then registry can maintain semantics of consistent help for better UX and contract, plus you can see this invariance is tied to various places of this library. Not saying it's impossible to get rid of, but a bit more work is needed.

To sum up, I enumerated 1-6 action items here at the minimum.

cc @beorn7 - do you envision any other surprises we don't see yet here for that decision?

@beorn7
Copy link
Member

beorn7 commented Sep 6, 2024

My memories of implementation details are pretty much gone…
I would be surprised if there were too many checks, but who knows…

Note that the Pushgateway has the same problem (of diverging help strings from different sources). It did not need a change of client_golang to cope.

@beorn7
Copy link
Member

beorn7 commented Sep 6, 2024

Generally it would epic if there was a simple solution for callers like Otel to dedup and decide what help you want to pick. Have you considered that? Then registry can maintain semantics of consistent help for better UX and contract, plus you can see this invariance is tied to various places of this library.

This sounds very reasonable to me and in line with my vague memories (but I might misremember something).

I've just checked what PGW is doing. It is doing the help string check on its own indeed, and then fixes it on its own and logs about it.

@lukedirtwalker
Copy link
Author

Thank you both for the quick feedback. If I read it right, you both seem to prefer to close this PR and solve the problem on the client side? Works for me as well, it's certainly possible to fix this in the code that uses client_golang (otel in this case).

Let me know if you like to leave it here as is, then I close the PR.

@bwplotka
Copy link
Member

bwplotka commented Sep 6, 2024

I mean it would epic to help you in the easiest possible way for everyone.

I looked a bit, plus we discussed on Slack too. I noticed you are using unchecked collector, which makes sense for otel: aggregating data from various sources. I think it would make sense to do this help check ONLY if registeredDescIDs != nil (representing case of checked collector), given it's already "unchecked". This might allow us to generally take it lightly and not need to redesign the whole desc consistency model.

However... flapping help issue remains...maybe, probably depending on order of code movings metrics into channel. We could have a simple framework on client_golang side e.g. take the longer one / strings.Compare > 0 or so, but it's pretty important to have it somewhat stable (if possible).

On client side all of this is possible, just a bit more overhead. Ideas welcome, we are not saying no, but some tests would be mandatory here!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants