@cyrille-leclerc cyrille-leclerc commented Oct 2, 2025

Use the OpenTelemetry Kubernetes Operator (aka OTel Operator) to inject SDK configuration and manage OTel Collectors.

To ease usage of the OTel Operator, we deploy it through the OpenTelemetry Kube Stack Helm Chart.

Noteworthy:

  • Instrumentation: we only inject the OTel configuration through the inject-sdk annotation, with env vars like OTEL_EXPORTER_OTLP_ENDPOINT. We don't inject the SDKs themselves (for example with inject-java) because the OTel SDKs are already bundled in the container images. Bundling the OTel SDKs in the container images lets the OTel Demo produce the same container images for the Docker Compose and Kubernetes deployments.
  • The OTel Demo components are inconsistent in their use of OTLP grpc and http/protobuf, so both the grpc and the http/protobuf endpoint configurations need to be injected. The OTel Operator Instrumentation CRD doesn't offer that flexibility, so for the moment some components override the OTEL_EXPORTER_OTLP_ENDPOINT env var at the component level.
  • The OTel Demo Helm Chart MUST ensure that the OTel Operator is up and the Instrumentation CRD is deployed before starting the components (e.g. ad, fraud-detection...) so that these components get the OTel configuration injected through env vars.
    • ⚠️ The solution in this PR is to restart all components after the Helm chart has installed the resources, through the restart-services-after-otel-operator-is-ready job, so that the components start after the OTel Operator instrumentation is up.
  • Important limitation on Docker Desktop Kubernetes: limitations of file system mounts require disabling pod logs collection and host metrics scraping.
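
To illustrate the first two points, here is a minimal sketch of an Instrumentation resource and a component that opts into config-only injection. It is not taken from this PR: the resource names, image, and collector address are hypothetical, and the override assumes the operator leaves an env var untouched when the container already defines it.

```yaml
# Sketch only: names, image, and endpoint values are hypothetical.
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: otel-demo-instrumentation   # hypothetical name
spec:
  exporter:
    # One endpoint for every injected pod (grpc port assumed here).
    endpoint: http://otel-collector.example:4317   # hypothetical address
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-component   # hypothetical component
spec:
  selector:
    matchLabels:
      app: example-component
  template:
    metadata:
      labels:
        app: example-component
      annotations:
        # Inject only the SDK configuration (env vars), not an SDK or agent.
        instrumentation.opentelemetry.io/inject-sdk: "true"
    spec:
      containers:
        - name: example-component
          image: example/component:latest   # hypothetical image
          env:
            # Component-level override for a service that speaks http/protobuf.
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: http://otel-collector.example:4318   # hypothetical address
```

Because the injection happens at pod admission, a pod created before the Instrumentation resource exists would start without the injected env vars, which is why the components are restarted once the operator is ready.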

Pending work:

  • Review all ./examples and propose alternatives:
    • bring-your-own-observability: should just be a change in the otel-col config in values.yaml to add exporters to "your own observability"
    • collector-as-daemonset: I deleted it as daemonset becomes the default deployment
    • custom-environment-variables: should just be a change in values.yaml
    • kubernetes-infra-monitoring: I guess we could delete it, it will just be changing the presets in values.yaml: logsCollection, hostMetrics, kubeletMetrics, kubernetesEvents, and clusterMetrics
    • public-hosted-ingress: TODO
  • Fix the problem of OTel Col receivers (httpcheck/frontend-proxy, nginx, postgresql, redis...) that are currently deployed on daemon collectors even though they should run as singletons. Solutions like leader election, OTel Col receiver creator, or the creation of a Gateway collector should be looked at.
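
As a rough idea of what "changing the presets" could look like, the sketch below toggles the presets named above, with pod logs and host metrics disabled as needed on Docker Desktop. The key paths are assumptions: the parent key depends on how the demo chart names the kube-stack dependency, and the daemon/cluster split follows my understanding of the kube-stack chart defaults rather than this PR.

```yaml
# Hypothetical values.yaml override; check key paths against the chart in use.
opentelemetry-kube-stack:   # assumed subchart key
  collectors:
    daemon:
      presets:
        logsCollection:
          enabled: false    # pod logs need host file system mounts
        hostMetrics:
          enabled: false    # host metrics need host file system mounts
        kubeletMetrics:
          enabled: true
    cluster:
      presets:
        kubernetesEvents:
          enabled: true
        clusterMetrics:
          enabled: true
```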

Successfully tested on

  • Docker Desktop Mac Kubernetes: cpu_limit=8, memory_limit=8GB, swap=1GB
  • DigitalOcean Kubernetes: 2 nodes with 4 vCPUs and 8GB each

FYI @rogercoll

APM dashboard

@rogercoll rogercoll left a comment

Could you share more context on why we should use auto-instrumentation for the otel-demo services? The otel-demo services are already configured with their corresponding OTel SDKs, so what would be the use case for dynamically injecting another SDK?

An alternative would be to use the kube-stack Helm Chart for deploying the collectors and to create a new k8s-only uninstrumented (no SDK) service in the otel-demo for the auto-instrumentation use case.

cyrille-leclerc commented Oct 8, 2025

Could you share more context on why we should use auto-instrumentation for the otel-demo services? The otel-demo services are already configured with their corresponding OTel SDKs, so what would be the use case for dynamically injecting another SDK?

+1 that the otel-demo services are instrumented with OTel SDKs today. However, I think it's valuable to demo the OTel Operator Instrumentation CRD with the inject-sdk pod annotation today:

  • Show best practices and, hopefully soon, stop bundling the OTel SDKs in the container images of the otel-demo services
  • Get the benefit of the OTel Operator injecting the SDK config through the OTEL_ env vars, particularly resource attributes, with the following benefits:
    • For OTel practitioners, get their telemetry fully enriched and compliant with OTel specs like "Specify resource attributes using Kubernetes annotations". Manually setting resource attributes on K8s is very error-prone.
    • For the OTel project, verify that the OTel Operator Instrumentation CRD doesn't forget anything. For example, I discovered through this PR that most services of the demo set a wrong host.name value, using the k8s.pod.name value. I guess it comes from the OTel SDK HostNameProvider, which does not return the desired value in containers.
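
For readers unfamiliar with the annotation-based approach, a pod template fragment could look like the sketch below. The attribute values are hypothetical, and the exact set of attributes the operator derives on its own should be checked against the operator documentation.

```yaml
# Fragment of a pod template; values are hypothetical.
metadata:
  annotations:
    # Ask the operator to inject the SDK configuration only.
    instrumentation.opentelemetry.io/inject-sdk: "true"
    # Resource attributes declared as annotations instead of hand-written env vars.
    resource.opentelemetry.io/service.name: "example-component"
    resource.opentelemetry.io/service.version: "1.0.0"
```

The point of declaring attributes this way is that the operator builds the resource attribute env vars itself, rather than each component maintaining them by hand.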

An alternative would be to use the kube-stack Helm Chart for deploying the collectors and to create a new k8s-only uninstrumented (no SDK) service in the otel-demo for the auto-instrumentation use case.

+1 I see it as a subsequent milestone.
