Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add container resource detector. #340

Merged

Conversation

scbjans
Copy link
Contributor

@scbjans scbjans commented Feb 16, 2023

This PR adds a resource detector for Containers.

It will add the container ID if possible.

For containers running on a Docker runtime, it will also try to add
* container.image.name
* container.image.tag
* container.name
* container.runtime

but that depends on the availability of a readable Docker Socket.

I've decided to remove setting additional container resource attributes and only support setting the container.id.

Reasoning behind that:

  • Setting other container resource attributes requires a readable docker.sock/containerd.sock, which requires correct - probably most of the time equals to elevated - permissions. Most easy way to do that is of course running the container as root. I do not want to promote that.
  • I could try and add support for configuring access to the docker.sock using TLS certs; but that would make auto-instrumentation a lot less auto
  • Other languages also only seem to be setting container.id

Screenshot 2023-02-16 at 15-42-04 Jaeger UI

Fixes: #179

@scbjans scbjans marked this pull request as draft March 29, 2023 07:11
@scbjans scbjans marked this pull request as ready for review April 3, 2023 15:06
Copy link
Member

@mwear mwear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mostly LGTM. I just had a question about cgroup v2 this time. I'm not requesting changes, but I'd like to clarify that the strategy we're using is at least as reliable as others I've seen.

parts.shift

parts.each do |part|
return part if part.match?(CONTAINER_REGEX)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other implementations that I have seen specifically look for a line containing "hostname" to extract the container id from. For example:

However, looking at your test, and inspecting /proc/self/mountinfo on some containers running locally, it looks like this should work just fine. Do you know of any other implementations that use this strategy? I'm curious if they know something we don't, or if there are multiple solutions that work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your re-review @mwear.

My guess would be that filtering on "hostname" is done to limit the amount of regexmatching?
I initially included the filter on "hostname" as well; but decided to remove that; since it cannot be used reliably. e.g. when a user set the hostname themselves in kubernetes or swarm; we would not be able to determine the container.id (although I think setting the "hostname" will probably be an edge case).

If we decide that we don't want to filter on "hostname" before regex matching; we might also remove container_id_v1; since def container_id_v2 will also work for cgroupv1 - although I still want to benchmark that before doing so - to keep the code a bit simpler. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong preference. I am most interested in the detector reliably and consistently detecting the container id. As far as performance goes, detecting the container id will be something that happens once at startup. While we don't want to give performance away unnecessarily, this is not a hot code path, so favoring simplicity is reasonable thing to do here. Let me know when you're happy with everything and I'll review it again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've simplified the code a bit; according to me this should do the trick.

@scbjans scbjans requested a review from mwear April 7, 2023 12:31
Copy link
Member

@mwear mwear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks @scbjans.

@mwear
Copy link
Member

mwear commented Apr 13, 2023

@robertlaurin could we get an additional set of eyes on this when you get a chance?

@scbjans
Copy link
Contributor Author

scbjans commented Apr 16, 2023

@mwear @robertlaurin from the SIG I understood that the plan was to split the resource detectors gem into separate gems. That totally makes sense to me; and I'll rework this PR a bit to reflect that. Once that's done and approved, I'm happy to also split of the other two into separate gems.

One point on which I would like advice or some directions is the addition of other container resource attributes (see PR description, and - currently outdated - image). It'll be helpful to add the other container resource attributes as well; but I do not want to promote running the container as root. Do you perhaps have another idea on how I could add them?

# @return [String] container.id
# May be nil.
def container_id
[CGROUP_V2_PATH, CGROUP_V1_PATH].each do |cgroup|
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would emitting some kind of warning or error message be helpful to folks trying to debug why this resource detector is unable to read the cgroup files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally my idea was that we do not want to flood the logs if someone uses this gems Autodetector.detect, but since we're going to split the resource detectors up into separate gems, that reasoning is no longer valid.

I'll add some logs once I get to splitting this up into it's own gem.

# If no container attributes could be determined an empty resource is returned
#
# @return [OpenTelemetry::SDK::Resources::Resource]
def detect
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely certain what the spec says about error handling in Resource detectors but I recommend you consider adding error handling with diagnostic information that will clue folks into why this resource detector may be failing and avoid raising errors that would cause the application to crash at boot time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do so once I split this of into it's own separate gem (see comment above). However, I do believe that currently this resource detector will not crash if it cannot determine the container.id?

next unless File.readable?(cgroup)

File.readlines(cgroup, chomp: true).each do |line|
parts = line.split('/')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some nuance I am missing here. Does scanning the full string result in false positives?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scanning the full string works for cgroupv1; but I couldn't get it to work for cgroupv2. This way it's compatible with both

@scbjans scbjans marked this pull request as draft April 20, 2023 19:50
@scbjans
Copy link
Contributor Author

scbjans commented Apr 20, 2023

The container.id cannot be determined reliably when the container has additional copy-on-write layers written to it's filesystem. I'll have to sort that out first -> returning to draft mode.

@scbjans scbjans marked this pull request as ready for review May 9, 2023 10:53
@scbjans
Copy link
Contributor Author

scbjans commented May 9, 2023

Updated PR to only filter out lines including an actual container id. Previously I ran into problems because storage ids also matched. As a bonus I've included a test so that it should also work for Podman (cgroupfile copied from the java instrumentation).

@scbjans
Copy link
Contributor Author

scbjans commented May 9, 2023

Regarding adding the other container resource attributes: I've learned that I can use the otelcols k8sattributesprocessor to fix that.

@scbjans
Copy link
Contributor Author

scbjans commented Jun 1, 2023

@robertlaurin is this how you envisioned splitting up the resource detectors into it's own seperate gem? If so, I'm happy to do it for the other two resource detectors (Azure and GCP) as well.

@arielvalentin
Copy link
Collaborator

@mwear @ericmustin anything left here to do before merging?

@ericmustin
Copy link
Contributor

@arielvalentin lgtm

@arielvalentin
Copy link
Collaborator

@ericmustin would you mind merging and releasing?

@github-actions
Copy link
Contributor

👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this pull request will be closed eventually by the stale bot

@github-actions github-actions bot added the stale Marks an issue/PR stale label Jul 23, 2023
@scbjans
Copy link
Contributor Author

scbjans commented Jul 25, 2023

@mwear @ericmustin @arielvalentin anything I can do to get this merged?

@github-actions github-actions bot removed the stale Marks an issue/PR stale label Jul 26, 2023
@scbjans
Copy link
Contributor Author

scbjans commented Jul 28, 2023

CI is broken; but there's already a PR which fixes that: #581

@arielvalentin arielvalentin merged commit 2039d15 into open-telemetry:main Jul 28, 2023
43 checks passed
@scbjans scbjans deleted the 179-add-container-resource-detector branch July 31, 2023 08:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Request - New Resource Detector - dockerCGroupV1Detector #2353
4 participants