Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add log formatting options to Telemetry API #2094

Closed

Conversation

douglas-reid
Copy link
Contributor

@douglas-reid douglas-reid commented Sep 9, 2021

This PR brings logging customization controls up to parity with Tracing / Metrics, allowing tailoring of log formats on a per-workload basis. The API is based on the existing Open Telemetry Logging API specification, which should enable mapping into existing provider formats (file, stream, open_telemetry, and stackdriver).

This addresses several open FRs, including istio/istio#33307.

@istio-policy-bot
Copy link

😊 Welcome @douglas-reid! This is either your first contribution to the Istio api repo, or it's been
awhile since you've been here.

You can learn more about the Istio working groups, code of conduct, and contributing guidelines
by referring to Contributing to Istio.

Thanks for contributing!

Courtesy of your friendly welcome wagon.

@istio-testing istio-testing added the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Sep 9, 2021
@istio-testing
Copy link
Collaborator

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@google-cla google-cla bot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Sep 9, 2021
@istio-testing istio-testing added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 9, 2021
@douglas-reid
Copy link
Contributor Author

@mandarjog @kyessenov @howardjohn i'm trying to kick off Telemetry API work for 1.12. please take a look and provide some feedback. I opt'd out of trying to pull in the OTel protos directly for an initial round of discussion, etc., but they can be put back as required. I think it should be relatively straightforward to map into existing providers here, and this avoids having to try to model structured logging formats, etc., in an envoy-specific way.

//
// TODO(dougreid): discuss mapping into supported providers?
//
// Example: ``body { string_value: "%PROTOCOL%" }``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not yaml so its a bit confusing how a user is supposed to apply it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apologies, i'll update with some yaml examples soon. i was just trying to put a strawman together quickly to spur discussion.

// allowing mapping into various access logging providers.
message LogFormat {
// Optional. A value containing the body of the log record. Valid uses include
// a human-readable free-form string message and structured data composed of arrays and maps
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have evidence structure format is useful? As far as I can tell NGINX, HAProxy, and Envoy all use a free form string. It seems quite a bit simpler to understand.

I couldn't tell how to use the structured form in this API (or the free form, really...), and the otel spec is like 20 pages which is a bit of a red flag it may be over complicated

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it makes a bit more sense after poking around the upstream docs a bit. is the otel proto meant for humans writing it though? It seems pretty unfriendly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: structured format utility -- support by major vendors, including cloud logging, and envoy itself, suggests that there are indeed those that use JSON-formatted logs. i know i have open issues against me about structured logging.

This is a bit verbose. That is definitely true. Adopting something like SubstitutionFormatString also seemed a bit onerous.

This is sort of what I'm trying to tease out here. Do we want to provide any sort of structured customization (perhaps even CEL-based), or just say "here's a string, put whatever you want here" ?

I think we do want an ability to provide structured customization (say adding fields to Cloud Logging formats), but I'm not clear on what the "best" way forward is. In past API reviews, we rejected putting the customization directly in the providers (having provider-specific overrides). Trying to generalize across formats unfortunately leads to bloat.

Maybe we could try to slim down the options, or abstract them in a way that makes writing them easier. I'll have to do more thinking about what that might look like.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the functionality gap between just plain JSON (and raw string) vs the full otel proto? We already support json format today and its not hard to use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we support JSON format, but do we support easy customization of the JSON format ? does accessLogFormat string in proxy config count as easy to use for JSON format cases?

if we want to customize the attributes in any way, having some structured control over the key/values seems useful.

The bulk of my experience is with Cloud Logging. Having a way to separate out the body from additional attributes is useful for that case. That actually maps to some of the old Mixer configuration, where we had a payload and http_mapping sections.

That may be too much for most users though?

We could have a oneof with a text format and a json format to allow for slightly simpler config, at the risk of inventing yet another logging API. The initial appeal of OTel to me is two-fold: (1) it exists already and (2) it can be mapped to other formats.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, for default cases. But you could put anything you want in, potentially. so you could have: body: "serviceA --> serviceB" and the attributes be all the other info. The precedent for this would be something like the existing stackdriver TCP logs (example).

I'd be fine with starting without it, if we feel it is overkill.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting... I guess you would need templating in the body as well for it to be useful.

Really it seems like body is just a specialized attribute that may be displayed different in various logging systems

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above - is it Istio business to define and maintain a cross-vendor API for configuring the log format ?
And is this something that should be done at workload-level granularity, i.e. each workload sending metrics in a slightly different format and to a different sink ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, if stackdriver or opentelemetry don't have an API or config of their own we could create a placeholder
'stackdriver' CRD, with stackdriver-specific configuration including format and anything else that could be configured.
(same for opentelemetry, prometheus or any other integration ), and just refer to it.

The config can be a simple configmap - giving the vendors a chance to create their own CRDs.

Long term, each vendor library should be able to consume his own vendor config, and we don't have to worry about it -
just reference the config we want to use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@costinm I don't believe it is necessarily Istio business to define and maintain a cross-vendor API for log formats. This principle has been at the core of the Telemetry API design discussion for a long while. We were able to avoid it for Tracing, and paper over it for Metrics, but logging is much thornier.

I don't know that you'd necessarily want different formats at the individual workload level, but we do see metrics customization at the workload level, so it isn't outside the realm of possibilities. Imagine having extra headers you want to collect (maybe just for a brief debugging, etc.).

The notion of their own CRDs is essentially the Extension Provider API that currently lives inside of Mesh Config (and is the approach mostly taken by the alternative PR to this one).

I'm intrigued by the idea of having the configuration for extension providers being a reference to a configmap, as that provides a workaround to allowing users to update without access to meshconfig. I worry a bit about the control plane keeping up with those references and updating appropriately.

// LogFormat provides control over the generated access log records.
// It adopts portions of the Open Telemetry model for record formatting,
// allowing mapping into various access logging providers.
message LogFormat {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to make this provider agnostic? It can just be pass thru, no?

@kyessenov
Copy link
Contributor

I think the main point that needs a decision is whether we want to support Envoy format strings as Istio APIs. The alternative is to make the format string pre-defined and not expose it in API. Otherwise, this is fairly consistent with other telemetry providers.

@douglas-reid
Copy link
Contributor Author

I think the main point that needs a decision is whether we want to support Envoy format strings as Istio APIs. The alternative is to make the format string pre-defined and not expose it in API. Otherwise, this is fairly consistent with other telemetry providers.

I suspect that users will want to customize the log formats -- to add/remove attributes, etc. (based on the forums, open issues, etc.) Pre-defining the format string seems fairly limiting. Providing an Istio layer that translates down into Envoy format strings also seems like a step too far. So, my instinct is to allow users to pass in format strings opaquely, with some control over the structure that results. Curious to hear thoughts of others.

@mandarjog
Copy link
Contributor

Yeah, I think pass thru opaque string is fine. We can add an enum later to catalog "standard formats", but we don't need to start there.

Freeform opaque string is both necessary and sufficient. Struct offers some validation, however we will have provider specific validation that can do what ever it wants with the string.

@kyessenov
Copy link
Contributor

If you want envoy formatters, those are evolving and we need some way to selectively trigger extensions. E.g. some old envoys do not have REQ_WITHOUT_QUERY while newer ones do.


// Optional. Controls the format of the generated log records.
// If not specified, the istio default logging format will be used.
LogFormat log_format = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we need is far from customizing log format, but we also want specify headers to add

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent behind LogFormat is to allow exactly what you ask for. The idea would be to pass in format specifiers that reference headers, etc., that would be included in the log.

Perhaps we need a better name? Or maybe something less opaque to make that more explicit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an example

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

show we use some config like k8s volume for better usage, something like:

- name: file-log
  file:
    path: /dev/stdout
    format: "[%START_TIME%]"
- name: grpc-http-log
  gprc:
    grpcService:
      address: grpc-als.istio-system.svc:9811
    additionalRequestHeadersToLog:
    - x-request-user
    additionalResponseHeadersToLog:
    - x-response-cost
- name: grpc-http-log
  grpcOtel:
    grpcService:
      address: grpc-als.istio-system.svc:9811
      

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zirain does this mean that you want to log different things per provider?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@douglas-reid please take a look on this issue istio/istio#34911

@istio-testing istio-testing added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 30, 2021
@douglas-reid
Copy link
Contributor Author

@hzxuzhonghu I added some potential examples to the top comments in the proto file.

@hzxuzhonghu @mandarjog @kyessenov @howardjohn @zirain I took a quick stab at an alternate provider-based approach for the same functionality in #2110. Would appreciate your feedback on that approach versus this style.

@istio-testing
Copy link
Collaborator

@douglas-reid: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Oct 2, 2021
// string_value: "[%START_TIME%] "%REQ(:METHOD)% %RESPONSE_CODE% %RESPONSE_FLAGS%\n"
// ```
//
// Policy to change the access logging format for JSON loggers:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid the word 'policy' ? I'm not native speaker, but every time I hear Policy I think about security and enforcement. This is just configuring the log format - there is no policy involved...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. we can worry about the wording once we decide on approach :).

// accessLogging:
// - log_format:
// body:
// string_value: "[%START_TIME%] "%REQ(:METHOD)% %RESPONSE_CODE% %RESPONSE_FLAGS%\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this format something defined by open telemetry or a common format, or envoy specific ?

It would be great to avoid envoy-specific internals leaking into Istio APIs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the syntax should be consistent with what we use for VirtualService ( header value format ). Which is based on envoy internals, but now is grandfathered in - so if it is the same syntax we can keep it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Format, as proposed here, is based on OTel's logging format. The values of string_value (or whatever ends up in body) would be opaque to Istio. The only job of the control plane would be to map those opaque values to the "correct" parts of the various provider configs. This could prove challenging.

}

// Taken from OTel ("opentelemetry/proto/common/v1/common.proto")
message AnyValue {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it viable to just use a Struct type? I believe that would allow the user to just write normal yaml with whatever types they want but not actually declaring them like [{foo: 1, bar: "baz"}].

If we need to convert that to OTel proto in the control plane it should be no problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would also work for both envoy string and envoy json format I believe. Unless Struct must be an object as the root

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. needs-rebase Indicates a PR needs to be rebased before being merged size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants