Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add utf8 support to the prometheus exporter #5755

Merged
merged 4 commits into from
Sep 9, 2024

Conversation

dashpole
Copy link
Contributor

@dashpole dashpole commented Aug 28, 2024

Changes

Disable sanitization when the UTF-8 support is enabled in the Prometheus library.

Usage

To enable UTF-8 support for the Prometheus exporter after this change, set the following in your application:

import "github.com/prometheus/common/model"

func init() {
    model.NameValidationScheme = model.UTF8Validation
}

See exporters/prometheus/testdata/counter_utf8.txt for an example of the text exposition format including names/labels with dots.

@dashpole
Copy link
Contributor Author

cc @ywwg

Copy link

codecov bot commented Aug 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.5%. Comparing base (506a9ba) to head (ff43372).
Report is 1 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #5755     +/-   ##
=======================================
- Coverage   84.5%   84.5%   -0.1%     
=======================================
  Files        272     272             
  Lines      22798   22773     -25     
=======================================
- Hits       19283   19257     -26     
- Misses      3172    3173      +1     
  Partials     343     343             

see 3 files with indirect coverage changes

@dashpole dashpole marked this pull request as ready for review August 28, 2024 21:19
@dashpole dashpole added pkg:exporter:prometheus Related to the Prometheus exporter package enhancement New feature or request labels Aug 28, 2024
@dashpole
Copy link
Contributor Author

cc @open-telemetry/wg-prometheus

@ywwg
Copy link

ywwg commented Aug 29, 2024

    model.NameEscapingScheme = model.NoEscaping

The escaping scheme is designed to be used as a hint when a UTF-8-capable system is delivering metrics to a system that does not or one that does but has UTF-8 turned off. In this case, the specified escaping scheme setting is used rather than using the default non-round-trippable underscore-replacement scheme. Therefore, this option is not intended to be set to NoEscaping when UTF8 is the default. I can see how this is confusing, I might update the documentation to make this more clear.

@ywwg
Copy link

ywwg commented Aug 29, 2024

(part of the confusion is because the same common library is used by metrics producers and consumers, so it's not clear which side of the conversation these settings apply to)

Copy link

@ywwg ywwg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with this exporter, does it do content negotiation with prometheus? That handshake is the intended way to determine whether prometheus can accept UTF8 or needs escaping.

exporters/prometheus/exporter.go Outdated Show resolved Hide resolved
@ywwg
Copy link

ywwg commented Aug 29, 2024

Some more detail: prometheus will put "escaping=allow-utf-8" in the Accept header if it accepts UTF-8, otherwise it may specify a different escaping scheme. If there is no "escaping" term, the producer should use its own default escaping scheme. (If it set "NoEscaping", then it would send UTF-8 and the metrics would be rejected. (That could be an intended behavior, I suppose)

https://github.com/prometheus/prometheus/blob/main/config/config.go#L482
https://github.com/prometheus/common/blob/main/expfmt/expfmt.go#L168-L184

@dashpole
Copy link
Contributor Author

I am not familiar with this exporter, does it do content negotiation with prometheus?

This is just a wrapper around the Prometheus go client library, and users will still use promhttp to expose metrics endpoints, and should get all content negotiation, etc.

Therefore, this option is not intended to be set to NoEscaping when UTF8 is the default

For our tests, If I only set model.NameValidationScheme = model.UTF8Validation, it still used a replacing scheme (e.g. U__rpc_2e_durations_2e_histogram_2e_seconds from rpc.durations.histogram.seconds) during GatherAndCompare, so I assumed users would need to do the same. Do we need to update that function in client_golang? I would have expected the default to be UnderscoreEscaping, rather than ValueEncodingEscaping. Should we tell users to set it to UnderscoreEscaping instead?

@ywwg
Copy link

ywwg commented Sep 3, 2024

For our tests, If I only set model.NameValidationScheme = model.UTF8Validation, it still used a replacing scheme (e.g. U__rpc_2e_durations_2e_histogram_2e_seconds from rpc.durations.histogram.seconds) during GatherAndCompare

It sounds like GatherAndCompare is not respecting the default. Are you able to find where in the call stack the escaping is happening? We may have missed an entry point. I tried setting some breakpoints and was unable to find where the call was happening. I think it would have to be a call to model.EscapeName.

Should we tell users to set it to UnderscoreEscaping instead?

That's a big question tbh! I selected Values escaping because it is roundtrippable, so producers sending UTF-8 metrics can reliably query them back again without risk of collisions. But this may be unexpected behavior for Prometheus users, in which case we could change the built-in default escaping method in common/model.

@dashpole
Copy link
Contributor Author

dashpole commented Sep 4, 2024

It sounds like GatherAndCompare is not respecting the default. Are you able to find where in the call stack the escaping is happening? We may have missed an entry point. I tried setting some breakpoints and was unable to find where the call was happening. I think it would have to be a call to model.EscapeName.

I think it happens from here, with expfmt.NewFormat(expfmt.TypeTextPlain): https://github.com/prometheus/client_golang/blob/97aa0493ebfa75f20fe51ebb32f1e5c5315e5fa1/prometheus/testutil/testutil.go#L301

That should return expfmt.FmtText: https://github.com/prometheus/common/blob/b1880d0dabb633dbf29b999c6a046637efb602fe/expfmt/expfmt.go#L49

Then the encoder calls format.ToEscapingScheme on expfmt.FmtText: https://github.com/prometheus/common/blob/b1880d0dabb633dbf29b999c6a046637efb602fe/expfmt/encode.go#L150

expfmt.FmtText doesn't include an escaping clause, so it returns the global model.NameEscapingScheme: https://github.com/prometheus/common/blob/b1880d0dabb633dbf29b999c6a046637efb602fe/expfmt/expfmt.go#L183, which is model.ValueEncodingEscaping by default.

That eventually is used to call model.EscapeMetricFamily inside of the expfmt encoder: https://github.com/prometheus/common/blob/b1880d0dabb633dbf29b999c6a046637efb602fe/model/metric.go#L195-L195

@ywwg
Copy link

ywwg commented Sep 4, 2024

so this could be fixed by having compare build a format that includes the escaping=allow-utf-8 term. There should probably be a func (format Format) WithEscapingScheme(s model.EscapingScheme) Format{} to add an escaping term to a Format so other people can do this.

ywwg added a commit to ywwg/common that referenced this pull request Sep 4, 2024
should be helpful for open-telemetry/opentelemetry-go#5755

Signed-off-by: Owen Williams <owen.williams@grafana.com>
@ywwg
Copy link

ywwg commented Sep 4, 2024

prometheus/common#688

@dashpole
Copy link
Contributor Author

dashpole commented Sep 4, 2024

@ywwg so the plan is to wait for client_golang to update to the latest prom/common release, then add the escaping option when comparing. We will probably need to remove the existing sanitization with a feature gate.

That's a big question tbh! I selected Values escaping because it is roundtrippable, so producers sending UTF-8 metrics can reliably query them back again without risk of collisions. But this may be unexpected behavior for Prometheus users, in which case we could change the built-in default escaping method in common/model.

This sounds like something we should track separately. Should I open an issue in prom/common?

@ywwg
Copy link

ywwg commented Sep 4, 2024

@ywwg so the plan is to wait for client_golang to update to the latest prom/common release, then add the escaping option when comparing. We will probably need to remove the existing sanitization with a feature gate.

yup, I'll poke people to push out another common release

This sounds like something we should track separately. Should I open an issue in prom/common?

yes that sounds like a good idea

ywwg added a commit to prometheus/client_golang that referenced this pull request Sep 4, 2024
For open-telemetry/opentelemetry-go#5755

Signed-off-by: Owen Williams <owen.williams@grafana.com>
@dashpole
Copy link
Contributor Author

dashpole commented Sep 4, 2024

converting to draft for now

ywwg added a commit to ywwg/client_golang that referenced this pull request Sep 5, 2024
For open-telemetry/opentelemetry-go#5755

Signed-off-by: Owen Williams <owen.williams@grafana.com>
ywwg added a commit to prometheus/client_golang that referenced this pull request Sep 5, 2024
For open-telemetry/opentelemetry-go#5755

Signed-off-by: Owen Williams <owen.williams@grafana.com>
@ywwg
Copy link

ywwg commented Sep 5, 2024

client_golang updated, so you should be unblocked

@dashpole
Copy link
Contributor Author

dashpole commented Sep 6, 2024

I am able to remove overriding model.NameEscapingScheme with model.NoEscaping now. I was also able to switch to using model.EscapeName, rather than our own escaping function.

But I wasn't able to remove the check for model.NameValidationScheme != model.UTF8Validation in the exporter because I still get errors during testing and when running our example. Do we need more updates to client_golang to allow (but escape) names with dots by default?

@ywwg
Copy link

ywwg commented Sep 6, 2024

Right now common.model is still defaulted to Legacy validation, and when 3.0 comes out we will switch this to UTF-8. Until then, it's necessary to set that value manually, either at the top of tests, or in a func init(). In Prometheus, there are a bunch of tests for legacy behavior that I wanted to keep, so even after 3.0 it'll be necessary to override that value to exercise the old behavior.

Feel free to hit me up on CNCF slack, or we can do a video conference to talk it through.

@dashpole
Copy link
Contributor Author

dashpole commented Sep 6, 2024

Great. Just making sure I wasn't missing something

@dashpole dashpole marked this pull request as ready for review September 6, 2024 14:06
Copy link

@ywwg ywwg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just one question, otherwise lgtm

exporters/prometheus/exporter.go Outdated Show resolved Hide resolved
Copy link

@ywwg ywwg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for battle-testing the new API!

example/prometheus/go.mod Outdated Show resolved Hide resolved
@dashpole dashpole merged commit 71b341f into open-telemetry:main Sep 9, 2024
32 checks passed
@dashpole dashpole deleted the prometheus_utf8_support branch September 9, 2024 15:08
@XSAM XSAM added this to the v1.30.0 milestone Sep 9, 2024
XSAM added a commit that referenced this pull request Sep 10, 2024
### Added

- Support `OTEL_EXPORTER_OTLP_LOGS_INSECURE` and
`OTEL_EXPORTER_OTLP_INSECURE` environments in
`go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc`. (#5739)
- The `WithResource` option for `NewMeterProvider` now merges the
provided resources with the ones from environment variables. (#5773)
- The `WithResource` option for `NewLoggerProvider` now merges the
provided resources with the ones from environment variables. (#5773)
- Add UTF-8 support to `go.opentelemetry.io/otel/exporters/prometheus`.
(#5755)
### Fixed

- Fix memory leak in the global `MeterProvider` when identical
instruments are repeatedly created. (#5754)
- Fix panic instruments creation when setting meter provider. (#5758)
- Fix panic on instruments creation when setting meter provider. (#5758)
- Fix an issue where `SetMeterProvider` in `go.opentelemetry.io/otel`
might miss the delegation for instruments and registries. (#5780)

### Removed

- Drop support for [Go 1.21](https://go.dev/doc/go1.21). (#5736, #5740,
#5800)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request pkg:exporter:prometheus Related to the Prometheus exporter package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants