
feat: adding k8 support for container id detection #1962

Open
wants to merge 11 commits into base: main

Conversation

abhee11
Contributor

@abhee11 abhee11 commented Feb 26, 2024

Which problem is this PR solving?

  • This PR extends the functionality of the container detector.

Short description of the changes

  • Adds a check for whether the application is running in a Kubernetes (k8s) environment; if so, the detector uses the Kubernetes API to fetch the container ID of the container name specified in an environment variable (see the sketch below).
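
For illustration, a minimal sketch of extracting a container ID from a pod's status once the pod object has been fetched from the Kubernetes API (the types and selection logic here are assumptions for illustration, not necessarily what the PR implements):

// Sketch only: given a Pod object returned by the Kubernetes API, pick the
// containerID for the container whose name matches the configured name.
interface ContainerStatus { name: string; containerID?: string; }
interface Pod { status?: { containerStatuses?: ContainerStatus[] }; }

function containerIdFromPod(pod: Pod, containerName: string): string | undefined {
  const status = pod.status?.containerStatuses?.find(s => s.name === containerName);
  // containerID is reported as "<runtime>://<64-hex-id>", e.g. "containerd://abc123...".
  return status?.containerID?.split('://').pop();
}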

@abhee11 abhee11 requested a review from a team February 26, 2024 09:13
@abhee11 abhee11 changed the title Enhancement/adding k8 support for container id detection feat: adding k8 support for container id detection Feb 26, 2024
@@ -51,6 +51,7 @@
"@opentelemetry/api": "^1.0.0"
},
"dependencies": {
"@kubernetes/client-node": "^0.20.0",
Contributor

@trentm trentm Apr 3, 2024

A concern here is that this is a large dependency, but mostly that it still uses the deprecated request HTTP client library. This would bring in some CVEs (kubernetes-client/javascript#754 (comment)).

There is an effort (https://github.com/kubernetes-client/javascript/blob/master/FETCH_MIGRATION.md kubernetes-client/javascript#754) to move away from request to fetch(). That looks active. There was a recent release-candidate release with that work: '1.0.0-rc4': '2023-12-23T15:06:31.075Z'. However, I haven't looked deeply and I'm not sure how soon that might come.

Even if that came, an issue with @kubernetes/client-node@1.x might be that I believe it dropped compat with Node.js v14 in kubernetes-client/javascript@3a995fb
That commit unfortunately doesn't mention the Node.js version, nor is it tied to a PR with any discussion.
That same commit is dropping an AbortController shim for older Node.js version support. AbortController is available starting in Node.js v14.17.0, so it is possible that @kubernetes/client-node@1.x would work with node >=14.17.0. They don't test with Node v14 though, so that is tenuous support.

I haven't read through this PR yet. Would it be possible to do whatever k8s client calls without using @kubernetes/client-node? I don't know if it is a very complex API to call manually.

Another option would possibly be to wait for @kubernetes/client-node@1.x and the OTel SDK 2.0 milestone (https://github.com/open-telemetry/opentelemetry-js/milestone/17), because that will drop support for Node.js v14.

Contributor Author

I believe this API takes care of providing the file paths required for certificates etc. to be present.
I think we can certainly do all of the operations it does manually; it may be a bit more code. I can look into it.

I see the migration to fetch and the partial compatibility for Node 14 as well. I think Node 14 is officially not supported now.

Let me take a look at that effort.

Contributor Author

I was looking at the library itself.

It is officially maintained by Kubernetes SIG API Machinery.

I suggest we use this and create an issue there regarding Node 14 and request --> fetch (which I believe is already in progress).

@trentm let me know what you think.

@@ -25,6 +25,9 @@ import * as fs from 'fs';
import * as util from 'util';
import { diag } from '@opentelemetry/api';

const { KubeConfig, CoreV1Api } = require('@kubernetes/client-node');
require('dotenv').config();
Contributor

Is this use of dotenv your debugging code that should be removed?

Contributor Author

Yes, makes sense.

Comment on lines 142 to 143
describe('Detect containerId in k8 environment', () => {
it('should return an empty Resource when not in a K8 environment', async () => {
Contributor

Could you please change all uses of "k8" to "k8s" (upper and lower case)? AFAIK "k8s" is the more common abbreviation -- at least "k8s" is used as a prefix in some OpenTelemetry specs / semantic conventions.

Contributor Author

Absolutely

@@ -100,6 +108,43 @@ export class ContainerDetector implements Detector {
return containerIdStr || '';
}

private _isInKubernetesEnvironment(): boolean {
return process.env.KUBERNETES_SERVICE_HOST !== undefined;
Contributor

I don't know k8s well enough to have confidence in its standard envvars and its client API usage. I'd be happier if others with k8s experience could weigh in.

It might help to add some comments linking to official k8s docs justifying the envvars and APIs being used.
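
For example, something like the following (a sketch of the kind of doc-linked comment being suggested, shown as a standalone function for brevity; the links point at the Kubernetes service environment-variable and in-cluster API access docs):

// Kubernetes injects KUBERNETES_SERVICE_HOST (and KUBERNETES_SERVICE_PORT)
// into every container so workloads can reach the API server; see
// https://kubernetes.io/docs/concepts/services-networking/service/#environment-variables
// and https://kubernetes.io/docs/tasks/run-application/access-api-from-pod/
function isInKubernetesEnvironment(): boolean {
  return process.env.KUBERNETES_SERVICE_HOST !== undefined;
}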

Contributor Author

This is purely based on my research - let me find other PRs or articles supporting this.

Thanks for the review


@abhee11
Contributor Author

abhee11 commented Apr 23, 2024

@trentm I have updated the PR as per comments and added relevant comments - please have a look.

throw new Error('Container name not specified in environment');
}

const response = await api.listNamespacePod(namespace);


Is this listing all pods in the namespace? What if there are multiple pods in the namespace with the same container name (as would happen with a k8s deployment)? Will each detect a random container ID?

Listing all pods in a namespace is also much more expensive than getting a single pod.

Contributor Author

Hello @dashpole
I believe this statement is true; it will list all pods.

Maybe I can add a condition so that it looks for a specific pod defined as an environment variable.

@aabmass
Member

aabmass commented Apr 24, 2024

@abhee11 I had a few questions

  • Why is the cgroup-based mechanism not good enough? Is this for non-standard container runtimes?
  • I don't see any call-out on permissions, wouldn't every workload need an RBAC config in order to call the k8s API?

Usually I've seen this kind of annotation done with the collector's k8sattributes processor, which supports container.id. The obvious downside is you have to run a collector, but then you only need to grant RBAC permissions to the collector's service account. Some more info here: https://opentelemetry.io/blog/2022/k8s-metadata/

@abhee11
Contributor Author

abhee11 commented Apr 24, 2024

@aabmass

  • Why is the cgroup-based mechanism not good enough? Is this for non-standard container runtimes?
    I believe the cgroup mechanism works fine for container runtime v1 and v2, but fails for containerd. The cgroup mechanism itself is not standard: we look for a file in a specific location, and that file is not available to us in the containerd runtime (see the sketch below).
  • I don't see any call-out on permissions, wouldn't every workload need an RBAC config in order to call the k8s API?
    The client-node library takes care of the permissions and the certificate paths.
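
For context, a simplified sketch of what the existing cgroup-based detection roughly does (not the detector's exact code): it scans /proc/self/cgroup for a 64-hex-character container ID embedded in the cgroup path, which some runtimes/cgroup layouts do not expose.

import * as fs from 'fs';

// Simplified sketch: look for a 64-hex-character container ID in /proc/self/cgroup.
// When the runtime does not embed the ID in the cgroup path, this returns undefined,
// which is the failure mode described above.
function containerIdFromCgroup(): string | undefined {
  const contents = fs.readFileSync('/proc/self/cgroup', 'utf8');
  for (const line of contents.split('\n')) {
    const match = line.match(/[0-9a-f]{64}/);
    if (match) {
      return match[0];
    }
  }
  return undefined;
}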

@aabmass
Member

aabmass commented Apr 25, 2024

  • The client-node library takes care of the permissions and the certificate paths.

The actual k8s service account needs to have permission to call the k8s API. There's some info here. Or are you finding this isn't needed?

@abhee11
Contributor Author

abhee11 commented May 6, 2024

@aabmass Sorry for the delayed response. Yes, I believe the client library was handling the certificates.

Now, in the manual implementation calling the k8s API without any client lib, I am just asserting that the certificate file exists and is passed to the https.request parameters.

Member

@aabmass aabmass left a comment


Now, in the manual implementation calling the k8s API without any client lib, I am just asserting that the certificate file exists and is passed to the https.request parameters.

With the client library or calling the API directly, I believe the certificate is just providing authentication so the k8s API knows who is sending the request. However, the service account still has to have permission to make the API call.

I tried running a snippet of this code in a GKE cluster and it does fail with a 403 error:

const { KubeConfig, CoreV1Api } = require('@kubernetes/client-node');
const kubeconfig = new KubeConfig();
kubeconfig.loadFromDefault();
const api = kubeconfig.makeApiClient(CoreV1Api);
api.listNamespacedPod("default")

pods is forbidden: User "system:serviceaccount:default:default" cannot list resource "pods" in API group "" in the namespace "default"

You can also check with kubectl auth can-i:

kubectl auth can-i list pods --as=system:serviceaccount:default:default 

outputs no in my cluster with no additional config.
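
For reference, granting the missing permission would typically look something like this (a sketch; the role and binding names here are arbitrary placeholders):

kubectl create role pod-reader --verb=get --verb=list --resource=pods
kubectl create rolebinding default-pod-reader --role=pod-reader --serviceaccount=default:default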

@@ -25,6 +25,8 @@ import * as fs from 'fs';
import * as util from 'util';
import { diag } from '@opentelemetry/api';

const { KubeConfig, CoreV1Api } = require('@kubernetes/client-node');
Member

Any reason to use require instead of import?

throw new Error('Container name not specified in environment');
}

const response = await api.listNamespacePod(namespace);
Member

This is a typo, should be api.listNamespacedPod. I'm not sure why typescript isn't complaining

@abhee11
Contributor Author

abhee11 commented May 22, 2024

@aabmass I am pushing out a change without the client-node library.
The service account needs to be there in order to make the API call; it places a certificate in a directory. With my new change, the detector basically checks for the existence of the certificate in that directory, and the certificate is passed as part of the https request.

I am not sure if there is more I need to validate in terms of the certificate. Here is my breakdown:

  1. We need a certificate in order to make the kube API call.
  2. The certificate associated with the service account is validated (it exists and is in the path).
  3. If it is, we should be able to make the call and get the metadata.

Thanks again for the review. I should be able to push the changes today itself. Let me know what you think; a rough sketch of the approach is below.
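
A minimal sketch of that in-cluster approach (my reading of it, assuming the standard service-account mount paths; not necessarily the exact code in the PR):

import * as fs from 'fs';
import * as https from 'https';

// Sketch: call the Kubernetes API without a client library, using the
// credentials mounted into every pod at the standard service-account path.
// The service account still needs RBAC permission for the call to succeed.
const SA_DIR = '/var/run/secrets/kubernetes.io/serviceaccount';

function getPod(namespace: string, podName: string): Promise<string> {
  const token = fs.readFileSync(`${SA_DIR}/token`, 'utf8');
  const ca = fs.readFileSync(`${SA_DIR}/ca.crt`);
  const options: https.RequestOptions = {
    host: process.env.KUBERNETES_SERVICE_HOST,
    port: process.env.KUBERNETES_SERVICE_PORT,
    path: `/api/v1/namespaces/${namespace}/pods/${podName}`,
    headers: { Authorization: `Bearer ${token}` },
    ca,
  };
  return new Promise((resolve, reject) => {
    https
      .get(options, res => {
        let body = '';
        res.on('data', chunk => (body += chunk));
        res.on('end', () => resolve(body));
      })
      .on('error', reject);
  });
}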

@iskiselev

We are doing something similar for C# at open-telemetry/opentelemetry-dotnet-contrib#1699.
It would be great to use the same environment variables for the things not provided by K8s by default. We plan:

  • for the container name, use the KUBERNETES_CONTAINER_NAME environment variable
  • for the pod name, use the KUBERNETES_POD_NAME environment variable, with a fallback to HOSTNAME if it is not provided (as the hostname will be the pod name in most environments if it was not overridden).

We are also discussing making it possible to provide the container name / pod name on the resource detector constructor, with environment variable resolution as a fallback; the analogue here would be to read these values from ResourceDetectionConfig, as I understand it.
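
A minimal sketch of the proposed resolution order (variable names follow the proposal above; this is illustrative, not an agreed implementation):

// Prefer an explicit KUBERNETES_POD_NAME, falling back to HOSTNAME, which
// defaults to the pod name in most environments unless overridden.
function resolvePodName(): string | undefined {
  return process.env.KUBERNETES_POD_NAME || process.env.HOSTNAME;
}

function resolveContainerName(): string | undefined {
  return process.env.KUBERNETES_CONTAINER_NAME;
}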

@abhee11
Contributor Author

abhee11 commented Jun 13, 2024

@aabmass Can you please take a look at the implementation and see if it makes sense? I have addressed your comments and added a comment for further clarification.

@aabmass
Member

aabmass commented Jun 27, 2024

@abhee11 please take another look at my comments. I understand this authenticates as the service account using a local certificate. There are no instructions here for authorizing said service account. The PR that @iskiselev linked for C# does have these instructions https://github.com/open-telemetry/opentelemetry-dotnet-contrib/pull/1699/files#diff-a77c04d30700216dec1c4dbb106e5c0d8490b01b1e9ca20d99fbcc174a4803b5

Reemphasizing my above concern with having every workload call the k8s API:

I don't see any call-out on permissions, wouldn't every workload need an RBAC config in order to call the k8s API?
