Auto instrumentation via pod mutation #455
Comments
I love this idea and I know @pavolloffay is interested in this topic as well. I'd say that you can go ahead with a PoC :-)
hi @anuraaga I have already built a PoC, but in a separate operator: https://github.com/pavolloffay/opentelemetry-instrumentation-operator. I was planning to submit a PR to bring it here (at least for Java initially). For languages without an "agent" feature, we can still provide this functionality as a control plane - e.g. configuring the SDK (reporting, sampling...). I am willing to submit a PR with this functionality if you haven't started working on this already.
Thanks @pavolloffay - I looked through that code and the approach looks quite similar; it would be great if you could add it here for Java! And I could help extend that with another language - I think we'll be able to get most languages supported, not only as a control plane but with actual auto instrumentation, which will be quite cool.
This is a great idea, though I'm sad Go can't take advantage of it. :) I wonder about the footprint of copying the instrumentation libraries to a new volume for each pod. Could that get to be rather large with a large number of pods? Would it be possible to use a PersistentVolume with ReadOnlyMany mode to share access to the libraries?
Yup, this is something on my mind. A PersistentVolume comes to mind, but it has its own complexity, such as the long time it can take to provision one, capacity-related inability to do so, or whether the cluster even has a PersistentVolume controller at all (my understanding is that EKS by default doesn't, for example). I think we will want to explore these sorts of optimizations going forward, but I'm only aware of the init container as a foolproof, if possibly inefficient, approach.
The initial implementation will be merged soon. I'm adding my task list for the follow-up PRs here.
I think we can close this issue now and create dedicated, well-defined follow-up issues.
I'd like to propose a new feature for the k8s operator (which I can work on): the ability to inject and enable auto instrumentation with no user changes to their code or Dockerfiles. Being able to opt pods, or even entire namespaces, into auto instrumentation could be a transformative experience on k8s, where observability is ensured by the infrastructure team without involvement from app teams.
This is somewhat related to opentelemetry-lambda - it has a similar job of injecting auto instrumentation into Lambda runtimes and the approaches will generally be similar.
Basic premise
Enabling auto instrumentation generally requires two things to happen: making the instrumentation libraries available on the container filesystem (e.g. node_modules for opentelemetry-js instrumentation libraries, etc.), and updating the runtime command or environment to reference them. These can happen as part of building an image by modifying the Dockerfile, but the k8s operator could instead inject the files and edit the runtime command without build changes.
Packaging Instrumentation
The package format / ecosystem for k8s is docker images. For each implemented language, we would publish a docker image containing the instrumentation libraries for the language. GHCR may be an appropriate location, though any container registry could be used. For example, ghcr.io/open-telemetry/opentelemetry-operator/opentelemetry-k8s-java-autoinstrumentation
Init container / volume
The operator can mutate a pod manifest to make instrumentation libraries available to an app container by copying from the docker image into a local volume. The simplest approach that I know of is using an init container, with a volume mounted RW and a simple cp command line. The app container would be modified to reference the same volume as RO.
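A minimal sketch of the kind of mutation described here, using the example image name from above; the volume name, mount path, and the path inside the instrumentation image are hypothetical:

```yaml
spec:
  volumes:
    - name: otel-auto-instrumentation        # hypothetical shared volume
      emptyDir: {}
  initContainers:
    - name: copy-otel-auto-instrumentation
      image: ghcr.io/open-telemetry/opentelemetry-operator/opentelemetry-k8s-java-autoinstrumentation
      # Copy the instrumentation libraries out of the published image into the shared volume.
      command: ["cp", "-r", "/autoinstrumentation/.", "/otel-auto-instrumentation/"]
      volumeMounts:
        - name: otel-auto-instrumentation
          mountPath: /otel-auto-instrumentation
  containers:
    - name: app
      image: my-app:latest                   # the user's unmodified application image
      volumeMounts:
        - name: otel-auto-instrumentation
          mountPath: /otel-auto-instrumentation
          readOnly: true
```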
Update runtime
The app's container can be mutated in a language-specific way to reference the instrumentation in the mounted volume.
One corner case: if an environment variable the operator updates is also set in the Dockerfile, the Dockerfile's value may get overridden, requiring the user to copy that environment variable into their k8s yaml. There is probably an approach to work around this, though (see the hypothetical sketch below).
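As a hypothetical illustration of that corner case: if the image's Dockerfile already sets an env var such as JAVA_TOOL_OPTIONS, the user may need to restate it in their k8s yaml so that the operator's mutation can include both the original flags and the injected ones (the flag value here is illustrative):

```yaml
containers:
  - name: app
    env:
      # Originally set via ENV in the Dockerfile; copied here so it is not lost
      # if the operator appends its own flags to this variable.
      - name: JAVA_TOOL_OPTIONS
        value: "-Xmx512m"
```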
Language specific details
I've tried the approach for Java using k8s yaml and it worked well; the others I haven't vetted with yaml yet. My assumption is that any hand-written yaml boilerplate I could write could instead be applied by the operator automatically.
Java
Package contents: opentelemetry javaagent
Runtime update: Add or update JAVA_TOOL_OPTIONS to reference the java agent
This is identical to the approach taken by opentelemetry-lambda
https://github.com/open-telemetry/opentelemetry-lambda/blob/main/java/layer-javaagent/scripts/otel-handler
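A hand-written sketch of the resulting container mutation for Java; the mount path and JAR name are assumptions, not confirmed paths:

```yaml
containers:
  - name: app
    image: my-java-app:latest                # hypothetical application image
    env:
      # Tell the JVM to load the injected agent from the shared volume.
      - name: JAVA_TOOL_OPTIONS
        value: "-javaagent:/otel-auto-instrumentation/opentelemetry-javaagent.jar"
    volumeMounts:
      - name: otel-auto-instrumentation      # shared with the init container above
        mountPath: /otel-auto-instrumentation
        readOnly: true
```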
JS
Package contents: A wrapper library that initializes instrumentation, with the node_modules generated by npm install on a package.json referencing all instrumentation libraries that are used by the wrapper.
Runtime update: Add or update NODE_OPTIONS to reference the wrapper
TODO: Find the best option for adding the wrapper / libraries to the module lookup path
This is identical to the approach taken by opentelemetry-lambda
https://github.com/open-telemetry/opentelemetry-lambda/blob/main/nodejs/packages/layer/scripts/otel-handler
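A similar hand-written sketch for Node.js; the wrapper filename is hypothetical, and the open TODO above about the module lookup path (e.g. whether NODE_PATH is also needed) still applies:

```yaml
containers:
  - name: app
    image: my-node-app:latest                # hypothetical application image
    env:
      # Preload the injected wrapper, which registers the instrumentation libraries.
      - name: NODE_OPTIONS
        value: "--require /otel-auto-instrumentation/autoinstrumentation.js"
    volumeMounts:
      - name: otel-auto-instrumentation
        mountPath: /otel-auto-instrumentation
        readOnly: true
```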
Python
Package contents: Site packages created by pip install of all opentelemetry-python instrumentation libraries. While most apps use opentelemetry-bootstrap to automatically determine a subset of instrumentation to include, our volume should contain all of them to allow full auto instrumentation.
Runtime update: Prepend the container entrypoint with opentelemetry-instrument
TODO: Find the best option to add the instrumentation packages to the module lookup path
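A hand-written sketch for Python; the paths and entrypoint are illustrative (opentelemetry-instrument would need to be resolvable from the shared volume), and the PYTHONPATH handling is exactly the open TODO above:

```yaml
containers:
  - name: app
    image: my-python-app:latest              # hypothetical application image
    # Prepend the original entrypoint with opentelemetry-instrument.
    command: ["opentelemetry-instrument", "python", "app.py"]
    env:
      # Make the injected site packages importable from the shared volume.
      - name: PYTHONPATH
        value: "/otel-auto-instrumentation/site-packages"
    volumeMounts:
      - name: otel-auto-instrumentation
        mountPath: /otel-auto-instrumentation
        readOnly: true
```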
Ruby
TBD
Dotnet
TBD
PHP
TBD
Go
Likely not possible due to static compilation