Auto instrumentation via pod mutation #455
Comments
I love this idea and I know @pavolloffay is interested in this topic as well. I'd say that you can go ahead with a PoC :-)
hi @anuraaga I have already built a PoC, but in a separate operator: https://github.com/pavolloffay/opentelemetry-instrumentation-operator. I was planning to submit a PR to bring it here (at least for Java initially). For languages without an "agent" feature, we can still provide this functionality as a control plane - e.g. configuring the SDK (reporting, sampling...). I am willing to submit a PR with this functionality if you haven't started working on this already.
Thanks @pavolloffay - I looked through that code and the approach looks quite similar; it would be great if you could add it here for Java! And I could help extend that with another language - I think we'll be able to get most languages supported, not only as a control plane but with actual auto instrumentation, which will be quite cool.
This is a great idea, though I'm sad Go can't take advantage of it. :) I wonder about the footprint of copying the instrumentation libraries to a new volume for each pod. Could that get to be rather large with a large number of pods? Would it be possible to use a PersistentVolume with ReadOnlyMany mode to share access to the libraries?
Yup, this is something on my mind. A PersistentVolume comes to mind, but it has its own complexity, such as the long time it can take to provision one, capacity-related inability to do so, or whether the cluster even has a PersistentVolume controller at all (my understanding is that EKS by default doesn't, for example). I think we will want to explore these sorts of optimizations going forward, but I'm only aware of the init container as a foolproof, if possibly inefficient, approach.
The initial implementation will be merged soon. I'm adding my task list for the follow-up PRs here.
I think we can close this issue now and create dedicated, well-defined follow-up issues.
I'd like to propose a new feature for the k8s operator (which I can work on): the ability to inject and enable auto instrumentation with no user changes to their code or Dockerfiles. Being able to opt pods, or even entire namespaces, into auto instrumentation could be a transformative experience on k8s, where observability is ensured by the infrastructure team without involvement from app teams.
This is somewhat related to opentelemetry-lambda - it has a similar job of injecting auto instrumentation into Lambda runtimes and the approaches will generally be similar.
Basic premise
Enabling auto instrumentation generally requires two things to happen: making the instrumentation libraries available on the container filesystem (e.g. node_modules for opentelemetry-js instrumentation libraries, etc.), and updating the runtime command or environment to reference them. These can happen as part of building an image by modifying the Dockerfile, but the k8s operator could instead inject the files and edit the runtime command without build changes.
Packaging Instrumentation
The package format / ecosystem for k8s is docker images. For each implemented language, we would publish a docker image containing the instrumentation libraries for the language. GHCR may be an appropriate location, though any container registry could be used. For example, ghcr.io/open-telemetry/opentelemetry-operator/opentelemetry-k8s-java-autoinstrumentation
Init container / volume
The operator can mutate a pod manifest to make instrumentation libraries available to an app container by copying from the docker image into a local volume. The simplest approach that I know of is using an init container, with a volume mounted RW and a simple cp command line. The app container would be modified to reference the same volume as RO.
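A minimal sketch of the kind of mutation described here, using the example image name from above; the volume name, mount path, and the path inside the instrumentation image are hypothetical:

```yaml
spec:
  volumes:
    - name: otel-auto-instrumentation        # hypothetical shared volume
      emptyDir: {}
  initContainers:
    - name: copy-otel-auto-instrumentation
      image: ghcr.io/open-telemetry/opentelemetry-operator/opentelemetry-k8s-java-autoinstrumentation
      # Copy the instrumentation libraries out of the published image into the shared volume.
      command: ["cp", "-r", "/autoinstrumentation/.", "/otel-auto-instrumentation/"]
      volumeMounts:
        - name: otel-auto-instrumentation
          mountPath: /otel-auto-instrumentation
  containers:
    - name: app
      image: my-app:latest                   # the user's unmodified application image
      volumeMounts:
        - name: otel-auto-instrumentation
          mountPath: /otel-auto-instrumentation
          readOnly: true
```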
Update runtime
The app's container can be mutated in a language-specific way to reference the instrumentation in the mounted volume.
One corner case: if an environment variable the operator updates is also set in the Dockerfile, the Dockerfile's value may get overridden, requiring the user to copy that environment variable into their k8s yaml. There is probably an approach to work around this, though (see the hypothetical sketch below).
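As a hypothetical illustration of that corner case: if the image's Dockerfile already sets an env var such as JAVA_TOOL_OPTIONS, the user may need to restate it in their k8s yaml so that the operator's mutation can include both the original flags and the injected ones (the flag value here is illustrative):

```yaml
containers:
  - name: app
    env:
      # Originally set via ENV in the Dockerfile; copied here so it is not lost
      # if the operator appends its own flags to this variable.
      - name: JAVA_TOOL_OPTIONS
        value: "-Xmx512m"
```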
Language specific details
I've tried the approach for Java using k8s yaml and it worked well; the others I haven't vetted with yaml yet. My assumption is that any hand-written yaml boilerplate I could write could instead be applied by the operator automatically.
Java
Package contents: opentelemetry javaagent
Runtime update: Add or update JAVA_TOOL_OPTIONS to reference the java agent
This is identical to the approach taken by opentelemetry-lambda
https://github.com/open-telemetry/opentelemetry-lambda/blob/main/java/layer-javaagent/scripts/otel-handler
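A hand-written sketch of the resulting container mutation for Java; the mount path and JAR name are assumptions, not confirmed paths:

```yaml
containers:
  - name: app
    image: my-java-app:latest                # hypothetical application image
    env:
      # Tell the JVM to load the injected agent from the shared volume.
      - name: JAVA_TOOL_OPTIONS
        value: "-javaagent:/otel-auto-instrumentation/opentelemetry-javaagent.jar"
    volumeMounts:
      - name: otel-auto-instrumentation      # shared with the init container above
        mountPath: /otel-auto-instrumentation
        readOnly: true
```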
JS
Package contents: A wrapper library that initializes instrumentation, with the node_modules generated by npm install on a package.json referencing all instrumentation libraries that are used by the wrapper.
Runtime update: Add or update NODE_OPTIONS to reference the wrapper
TODO: Find the best option for adding the wrapper / libraries to the module lookup path
This is identical to the approach taken by opentelemetry-lambda
https://github.com/open-telemetry/opentelemetry-lambda/blob/main/nodejs/packages/layer/scripts/otel-handler
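A similar hand-written sketch for Node.js; the wrapper filename is hypothetical, and the open TODO above about the module lookup path (e.g. whether NODE_PATH is also needed) still applies:

```yaml
containers:
  - name: app
    image: my-node-app:latest                # hypothetical application image
    env:
      # Preload the injected wrapper, which registers the instrumentation libraries.
      - name: NODE_OPTIONS
        value: "--require /otel-auto-instrumentation/autoinstrumentation.js"
    volumeMounts:
      - name: otel-auto-instrumentation
        mountPath: /otel-auto-instrumentation
        readOnly: true
```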
Python
Package contents: Site packages created by pip install of all opentelemetry-python instrumentation libraries. While most apps use opentelemetry-bootstrap to automatically determine a subset of instrumentation to include, our volume should contain all of them to allow full auto instrumentation.
Runtime update: Prepend the container entrypoint with opentelemetry-instrument
TODO: Find the best option to add the instrumentation packages to the module lookup path
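A hand-written sketch for Python; the paths and entrypoint are illustrative (opentelemetry-instrument would need to be resolvable from the shared volume), and the PYTHONPATH handling is exactly the open TODO above:

```yaml
containers:
  - name: app
    image: my-python-app:latest              # hypothetical application image
    # Prepend the original entrypoint with opentelemetry-instrument.
    command: ["opentelemetry-instrument", "python", "app.py"]
    env:
      # Make the injected site packages importable from the shared volume.
      - name: PYTHONPATH
        value: "/otel-auto-instrumentation/site-packages"
    volumeMounts:
      - name: otel-auto-instrumentation
        mountPath: /otel-auto-instrumentation
        readOnly: true
```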
Ruby
TBD
Dotnet
TBD
PHP
TBD
Go
Likely not possible due to static compilation