Tracing KEP: Move to Implementable #1458
Conversation
I decided to repurpose this PR to move the KEP to implementable, and will include other changes addressing feedback in it. cc @kubernetes/sig-instrumentation-api-reviews
/assign @brancz
Any particular reason this is marked WIP?
Removed the WIP tag.
I think we should follow a versioning convention similar to other annotations, e.g. "alpha.trace.kubernetes.io/context".
Makes sense. I'm going to wait until the discussion about updating annotations with a subresource update is completed before updating this.
@BenTheElder can you take a pass on the testing paragraph? It should be straightforward, but it would be good to get eyes from sig-testing on it.
If this is for #647 then the dir should be: 647-apiserver-tracing
KEP-647, etc...
We track KEPs by issue number, and issue 34, which you are referencing, seems to be something... completely different. :)
@kikisdeliveryservice I updated the KEP number. The KEP was created back when KEP numbers were based on a next-kep-number file...
### Test Plan

We will e2e test this feature by deploying an OpenTelemetry Collector on the master and configuring it to export traces using the [stdout exporter](https://github.com/open-telemetry/opentelemetry-go/tree/master/exporters/stdout), which logs the spans in JSON format. We can then verify that the logs contain the expected traces when making calls to the API Server.
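For a rough illustration of what the test would assert on, here is a minimal sketch using the current opentelemetry-go stdout exporter (import paths may differ from the tree linked above, and the tracer/span names are placeholders): each finished span is printed to stdout as JSON, which an e2e test can then grep for.

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	ctx := context.Background()

	// Exporter that writes finished spans to stdout as JSON.
	exporter, err := stdouttrace.New(stdouttrace.WithPrettyPrint())
	if err != nil {
		log.Fatalf("creating stdout exporter: %v", err)
	}

	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter))
	otel.SetTracerProvider(tp)
	// Shutdown flushes any batched spans before the program exits.
	defer tp.Shutdown(ctx)

	// Create one span; once flushed, it appears in the output, so a test
	// can assert that the expected span name shows up in the logs.
	_, span := otel.Tracer("e2e-example").Start(ctx, "example-span")
	span.End()
}
```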
Do you plan for this to ever be conformance tested? It doesn't sound easy to conformance test.
Yes, it would be good to have this conformance tested. I added it as a GA requirement. With the current configuration options, we can run the OTel Collector as a service in the cluster instead of on the master node.
The thing is, users likely won't be able to configure that config file. But yeah, I agree with testing via running this in the cluster.
## Production Readiness Survey

* Feature enablement and rollback
  - How can this feature be enabled / disabled in a live cluster? **Feature gate: APIServerTracing. The API Server must be restarted to enable/disable exporting spans.**
Once the feature is GA and we remove the feature gate, how will one turn it off if it's not desired?
I specified that tracing is disabled if no configuration file path is provided. Alternatively, we could add an explicit option in the configuration file to disable it.
WDYT?
Yeah, going off of the config file's presence is fine.
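For illustration only, the configuration file discussed above might deserialize into a small Go type along these lines; the type name and fields here are hypothetical placeholders, not an API defined by this KEP.

```go
// Package config sketches a hypothetical shape for the API Server tracing
// configuration file discussed above. Names and fields are illustrative only.
package config

// TracingConfiguration would be read from the file whose path is passed to
// the API Server; if no path is given, tracing stays disabled.
type TracingConfiguration struct {
	// Endpoint of the collector that spans are sent to, e.g. "localhost:4317".
	Endpoint string `json:"endpoint,omitempty"`

	// SamplingRatePerMillion is the number of requests per million that are
	// sampled; 0 means no new traces are started by the API Server.
	SamplingRatePerMillion int32 `json:"samplingRatePerMillion,omitempty"`
}
```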
We will wrap the API Server's HTTP server and HTTP clients with [otelhttp](https://github.com/open-telemetry/opentelemetry-go-contrib/tree/master/instrumentation/net/http/otelhttp) to get spans for incoming and outgoing HTTP requests. This generates spans for all sampled incoming requests and propagates context with all client requests. For incoming requests, this would go below [WithRequestInfo](https://github.com/kubernetes/kubernetes/blob/9eb097c4b07ea59c674a69e19c1519f0d10f2fa8/staging/src/k8s.io/apiserver/pkg/server/config.go#L676) in the filter stack, as it must run after authentication and authorization, before the panic filter, and it is closest in function to the WithRequestInfo filter.
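As a minimal sketch of the otelhttp wrapping described above (this is not the actual apiserver filter code; the handler, route, and operation names are placeholders):

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

func main() {
	// Server side: wrapping a handler starts a span for every sampled
	// incoming request and extracts any incoming trace context headers.
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	handler := otelhttp.NewHandler(mux, "example-server")

	// Client side: wrapping the transport creates a client span and injects
	// trace context headers into every outgoing request.
	client := &http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)}
	_ = client // shown only to illustrate construction in this sketch

	log.Fatal(http.ListenAndServe(":8080", handler))
}
```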
Note that some clients of the API Server, such as webhooks, may make reentrant calls to the API Server. To gain the full benefit of tracing, such clients should propagate context with requests back to the API Server.
Please expand on this: how do clients do that?
I have an extremely strong interest in apiserver knowing when an incoming API call is re-entrant, and ideally, what its parent call is. Additionally, I'd like this to work in a multiple apiserver setup. So:
- Can the tracing header be reverse-engineered to figure this out?
- Can I pack metadata into the tracing header to make this possible?
I gave an example of how a webhook would do that. The tracing header will tell you if it is a direct child of the API Server trace, but you can't tell if it is an indirect child (e.g. if the admission controller webhook adds a span in-between). You can pack metadata into the tracing header (well, it is a different header, but still), using "Baggage": https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/baggage/api.md#baggage-api.
Thanks, love the name "Baggage"
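For concreteness, a hedged sketch of attaching and reading baggage with the current go.opentelemetry.io/otel/baggage package (package paths may differ from the SDK version referenced in the KEP, and the key/value here are made up):

```go
package main

import (
	"context"
	"fmt"

	"go.opentelemetry.io/otel/baggage"
)

func main() {
	ctx := context.Background()

	// Attach an arbitrary key/value to the context; when the configured
	// propagator includes Baggage, it travels in the "baggage" header.
	member, err := baggage.NewMember("example.origin", "apiserver")
	if err != nil {
		panic(err)
	}
	bag, err := baggage.New(member)
	if err != nil {
		panic(err)
	}
	ctx = baggage.ContextWithBaggage(ctx, bag)

	// A downstream handler (for example, a webhook) can read it back.
	fmt.Println(baggage.FromContext(ctx).Member("example.origin").Value())
}
```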
This KEP proposes the use of the [OpenTelemetry tracing framework](https://opentelemetry.io/) to create and export spans to configured backends.
The API Server will use the [OpenTelemetry exporter format](https://github.com/open-telemetry/opentelemetry-proto) and the [OTLP exporter](https://github.com/open-telemetry/opentelemetry-go/tree/master/exporters/otlp#opentelemetry-collector-go-exporter) to export traces. This format is easy to use with the [OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector), which allows exporters for trace storage backends to be imported and configured out-of-tree, in addition to providing other useful features.
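A minimal sketch of wiring up an OTLP trace exporter with the current opentelemetry-go packages (the import paths and collector endpoint are assumptions; the repository layout has changed since the links above were written):

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	ctx := context.Background()

	// Export spans over gRPC to an OpenTelemetry Collector, which fans them
	// out to whichever storage backends it is configured with.
	exporter, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("localhost:4317"), // assumed collector address
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		log.Fatalf("creating OTLP exporter: %v", err)
	}

	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter))
	otel.SetTracerProvider(tp)
	defer tp.Shutdown(ctx)

	_, span := otel.Tracer("otlp-example").Start(ctx, "example-span")
	span.End()
}
```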
I think I'm OK with approving this for alpha; for beta, I think @deads2k should also approve, and at least one of us actually needs to look at the OpenTelemetry code; I haven't yet...
Got it. I added a "revisit format" item for beta.
I just noticed this and the Structured Logging KEP. I see this is listed as a non-goal:
Having recently worked on observability for Knative, I discovered that having a common set of labels across all the observability components (metrics, logs, tracing) helped with both reducing boilerplate and increasing the accuracy/coverage of labels. It's also really painful to align these after you've done a bunch of instrumentation work for each separately and downstream teams have taken dependencies on the specific labels. (The interface I'm looking at extracts a logger from the context, and that logger can be augmented with trace labels before it is returned.) It looks like KEP 1602 doesn't define many well-known keys (https://github.com/kubernetes/kubernetes/pull/91773/files#diff-06ce45a2e81bbfd5d4ce7e3c67d49a84R47 also seems to introduce/use additional keys).
(Don't consider my comment blocking; it's a drive-by because I was commenting on my experience and someone upstream pointed out this KEP.)
@evankanderson thanks for the feedback; you make good points. I definitely agree with the value of having the same set of labels. One problem right now is that we have two different standards: the long-standing Kubernetes instrumentation guidelines and the OpenTelemetry semantic conventions. The Kubernetes standards suggest using "pod", while the OpenTelemetry standards suggest "k8s.pod", for example. I don't have a good answer, but I'll bring it up for discussion at next week's instrumentation meeting.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dashpole, lavalamp. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/hold cancel
I think we'll want to encourage clients as strongly as possible to pass those headers through. Is it easy to describe in non-Go terms what a client has to do? Hopefully just copy-paste some headers matching a particular name format?
Yes. This describes the trace context header: https://www.w3.org/TR/trace-context/. And this describes the baggage header: https://w3c.github.io/baggage/. So you would have to copy the `traceparent`, `tracestate`, and `baggage` headers from incoming requests to outgoing requests. But in general, OpenTelemetry has broad language support, which should be easier than manually copying headers.
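A minimal Go sketch of the "just copy the headers" approach; the helper below is made up for illustration (the header names come from the W3C specs linked above), and a client using an OpenTelemetry SDK would normally let its propagators do this instead:

```go
// Package tracingutil is an illustrative sketch, not Kubernetes code.
package tracingutil

import "net/http"

// copyTraceHeaders copies the W3C trace-context and baggage headers from an
// incoming request to an outgoing one, which is all a re-entrant client
// (for example, a webhook calling back into the API Server) strictly needs
// to do to keep the trace connected.
func copyTraceHeaders(in *http.Request, out *http.Request) {
	for _, h := range []string{"traceparent", "tracestate", "baggage"} {
		if v := in.Header.Get(h); v != "" {
			out.Header.Set(h, v)
		}
	}
}
```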
cc @kubernetes/sig-api-machinery-misc @lavalamp @logicalhan @deads2k @liggitt @mikedanese