Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distribution Proposal: Kubernetes-specific distribution #357

Closed
TylerHelmuth opened this issue Jun 9, 2023 · 18 comments · Fixed by #507
Closed

Distribution Proposal: Kubernetes-specific distribution #357

TylerHelmuth opened this issue Jun 9, 2023 · 18 comments · Fixed by #507

Comments

@TylerHelmuth
Copy link
Member

TylerHelmuth commented Jun 9, 2023

Description

Many many users run the collector in Kubernetes today and the OpenTelemetry project has encouraged that behavior via the helm charts, operator, and Kubernetes-specific components. To continue fostering the concept that the Collector is a terrific solution for monitoring Kubernetes I believe we need a more direct approach.

In order to facilitate a good default image for the OpenTelemetry-helm-charts, OpenTelemetry-operator, and any user looking to monitor Kubernetes, I propose that we introduce a new community-supported distribution specifically made for monitoring Kubernetes. This distribution will provide an alternative to the Contrib distribution, which we do not recommend be used in production, for users who want to use a community supported image vs building their own.

Benefits:

  • Ease of use for users, especially those getting started. Not all OTel users also want to manage a build pipeline for the Collector. In order to build an image for Kubernetes today you must run ocb with your manifest AND include the output in a docker image that you then host somewhere. This image provides a solution for those looking to run the collector to monitor Kubernetes but also don’t want the burden of maintaining a collector build pipeline.
  • Following our own best practices. Today the Collector and Operator helm charts use the Contrib image by default which enables bad practices if the user chooses to use the chart in production without modification. The operator uses the Core repo, which is limiting for most users. A Kubernetes-specific distribution would be in line with our best practices and would not limit users by default
  • Smaller than Contrib. A Kubernetes-specific distribution should naturally be smaller than Contrib. This follows our and industry best practices.

Rules for components in the image:

This is only my initial ruleset, I am very interested in the community's opinions and feedback.

  • Only includes components from Contrib and Core
  • Only components that are Alpha or higher
  • All receivers must facilitate the collection of data that is generated by Kubernetes or services running in Kubernetes
    • Although you could run a collector in Kubernetes to reach out to any endpoint, this image is not targeting that use case. This image is for monitoring Kubernetes and the processes it runs.
  • All components must be vendor-neutral. (I don’t see another way to ensure vendor neutrality in a community distribution that does not want to be the Contrib distribution - without this rule we'd have to allow every receiver and exporter in Contrib, at which point this image has no relevance)

Proposed manifest:

Please carefully review this manifest. I believe I have created it correctly based on the initial rules laid out above, but I may have missed a component or two that meets the criteria.

extensions:
  - gomod: go.opentelemetry.io/collector/extension/zpagesextension v0.79.0
  - gomod: go.opentelemetry.io/collector/extension/ballastextension v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/basicauthextension v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/bearertokenauthextension v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/headerssetterextension v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/healthcheckextension v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/httpforwarder v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/oauth2clientauthextension v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/observer/hostobserver v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/observer/k8sobserver v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/oidcauthextension v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/pprofextension v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/storage v0.79.0
    import: github.com/open-telemetry/opentelemetry-collector-contrib/extension/storage/filestorage

exporters:
  - gomod: go.opentelemetry.io/collector/exporter/loggingexporter v0.79.0
  - gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.79.0
  - gomod: go.opentelemetry.io/collector/exporter/otlphttpexporter v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/loadbalancingexporter v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter v0.79.0

processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.79.0
  - gomod: go.opentelemetry.io/collector/processor/memorylimiterprocessor v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/attributesprocessor v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/cumulativetodeltaprocessor v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/deltatorateprocessor v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/filterprocessor v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/groupbyattrsprocessor v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/groupbytraceprocessor v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sattributesprocessor v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/metricstransformprocessor v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/probabilisticsamplerprocessor v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/redactionprocessor v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/resourcedetectionprocessor v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/resourceprocessor v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/routingprocessor v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/servicegraphprocessor v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/spanmetricsprocessor v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/transformprocessor v0.79.0

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/dockerstatsreceiver v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/filelogreceiver v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/fluentforwardreceiver v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/httpcheckreceiver v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/k8sclusterreceiver v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/k8seventsreceiver v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/k8sobjectsreceiver v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kubeletstatsreceiver v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/receivercreator v0.79.0

connectors:
  - gomod: go.opentelemetry.io/collector/connector/forwardconnector v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/connector/countconnector v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/connector/servicegraphconnector v0.79.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/connector/spanmetricsconnector v0.79.0

Manifest Change Histroy

  • Removed dockerobserverextension and dockerstatsreceiver. Added loadbalancingexporter.
  • Added receivercreator

Additional Info

This topic has come up recently in the Collector SIG, but again in the Operator SIG. Please see the SIG notes for details.

/cc @open-telemetry/operator-approvers

@TylerHelmuth TylerHelmuth added enhancement New feature or request distribution proposal and removed enhancement New feature or request labels Jun 9, 2023
@codeboten
Copy link
Contributor

Although I'm not opposed to a k8s distro, I would like to see further discussions on what distros are officially supported by the community progress before making a decision here. See the otep: open-telemetry/oteps#229

@rmfitzpatrick
Copy link

A note for the proposed manifest is that the observer extensions are primarily supported by the receiver creator so it would ideally be included as well.

@jpkrohling
Copy link
Member

Thank you for the proposal, @TylerHelmuth. It makes a lot of sense. I would even go one step further and suggest yet another one for sidecars, containing only OTLP receiver/exporter and batching processor. We can talk about that on another issue, though.

I have a few comments/questions about some components.

  • gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/observer/dockerobserver v0.79.0

I feel ashamed for not knowing this, but does this work on Kubernetes? I was always under the impression that this was for "pure" Docker environments. It would be nice to get a positive confirmation from someone using this with their Collector on Kubernetes.

  • gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter v0.79.0

I'm curious why Prometheus components were selected but not other open-source solutions we promised compatibility with, like Jaeger and Zipkin. I have good arguments for having a Prometheus receiver while still not having a Jaeger receiver, but not so much for a Prometheus exporter when Jaeger is missing. And even if we can make a case for a Prometheus exporter, I would keep only the remote-write exporter, not this one. My suggestion: remove this one, keep the remote-write exporter, and document why we have this exporter but not Jaeger (because it can ingest OTLP already?) and Zipkin (no active code owners?)

  • gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor v0.79.0

If you include this, you probably need the load-balancing exporter as well so that users can have a scalable tail-sampling setup.

receivers:

I would include a Jaeger receiver there. While the Jaeger client libraries have been deprecated for some time, some users still have services instrumented with OpenTracing+Jaeger.

@pavolloffay
Copy link
Member

+1 on the proposal.

I agree with @jpkrohling I would expect that other OSS receivers (e.g. Zipkin, Jaeger, OpenCensus) would be included in the distro.

@codeboten
Copy link
Contributor

One question i have is how the release process can be improved to support N distributions. The goreleaser tooling has already hit disk limits with only the two distros we've currently supporting. This isn't to say we shouldn't do this, but that we'll need to spend some cycles finding ways to split the release process to support this. Maybe this could be a separate issue.

Specifically around the k8s distro, would it make sense for different owners to be listed? I understand the overlap of the groups may be enough to have collector-contrib-approvers as the owners for now, but it may not be the case with future distros (one example is @MovieStoreGuy's suggestion of a lambda distro).

@TylerHelmuth
Copy link
Member Author

@rmfitzpatrick I don't know much about the receiver creator, but is it useful for monitoring Kubernetes or applications that run on Kubernetes? If not, I'd opt to remove the observer extensions if the other components don't require them.

@jpkrohling after doing a quick test I think you're right. I have removed the dockerobserverextension and dockerstatsreceiver.

I included all prometheus receiver because it directly related to the rule: "All receivers must facilitate the collection of data that is generated by Kubernetes or services running in Kubernetes". Looking at the rules now they need some work to clarify which exporters should be included.

As for jaeger and zipkin, I think their inclusion will be based on some of the outcomes of #360. Again, I think this could use more rules clarification. Part of me says they can be included bc the facilitate monitoring applications running in Kubernetes and the other part of me says that OTLP meets that goal so we don't need them. A big question that #360 will answer is "do all the distros this community supports need to include support for previous standards?". Would definitely appreciate some ideas on how to adjust to rules to clearly allow or disallow the receivers and/or exporters. We may need a specific rule for which receivers the distro allows and which exporters the distro allows.

Good point on the loadbalancingexporter, that definitely facilitates telemetry collection in k8s. I have added it.

@codeboten I added the Release pipeline question to #360 for now. I think open-telemetry/collector-contrib-approvers should continue to own everything in the repo to guarantee someone is responsible, but I like the idea of different distributions having other supporting owners. I can see the opentelemetry-operator-approvers and opentelemetry-helm-approvers helping support this distribution.

@TylerHelmuth
Copy link
Member Author

TylerHelmuth commented Jun 16, 2023

Based on the discussion so far there there is definitely a need to touch up the rules. Here are some ideas popping into my head:

  • All OSS receivers should be included: This would allow receivers like jaeger, zipkin, prometheus and opencensus, but I think opens the flood gates too wide. I would need to check but I think this would include many many more receivers in contrib that are for open source technologies. In an effort to keep this distribution slim and targeted I feel a rule like this is too broad.

  • All technologies OTel promised to support will be included [as receivers]: This is entirely dependent on Create rules for supported distributions #360. One possible outcome of that issue is that we decide that we need to keep our support promises in all distros we support. If that is the case, then we include the related component in this distro. That support might only be for receivers, but might need to include exporters as well.

  • the only supported export format is otlp (except the logging exporter): This rule would be easy to maintain and pushes forward the idea that OTLP is "all thats needed", but I imaging would be pretty limiting for many users as prometheus is still a huge part of the k8s ecosystem.

@rmfitzpatrick
Copy link

@rmfitzpatrick I don't know much about the receiver creator, but is it useful for monitoring Kubernetes or applications that run on Kubernetes? If not, I'd opt to remove the observer extensions if the other components don't require them.

It's helpful for dynamically instantiating other included receivers based on discovered pod/port/node endpoints as presented by the k8s observer: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/receivercreator#rule-expressions. It operates similarly to the prometheus kubernetes_sd_config but for arbitrary receivers and telemetry types.

@TylerHelmuth
Copy link
Member Author

Ok, it sounds like it qualifies then. I've updated the manifest.

@andrzej-stencel
Copy link
Member

the Contrib distribution, which we do not recommend be used in production

@TylerHelmuth can you point to a piece of documentation that mentions this? I searched but couldn't find any.

@TylerHelmuth
Copy link
Member Author

@astencel-sumo the recommendation can be found here: https://github.com/open-telemetry/opentelemetry-collector-releases/tree/main/distributions/otelcol-contrib#recommendation

@jpkrohling
Copy link
Member

I think we should have only OTLP exporters for the Kubernetes distribution. Pretty much everything out there supports ingesting OTLP already, including Jaeger. I remember seeing an issue on Prometheus' side about adding native OTLP ingestion there as well, so, I believe we won't need a Prometheus exporter for this new distribution.

On the receiver side, I have two ways of seeing this:

  • There are still quite a few workloads out there instrumented with OpenTracing and Jaeger. Having a Jaeger receiver would help them. I haven't seen that many Zipkin workloads out there, but perhaps I'm just an ecosystem away from them? From this perspective, I would vote for having a Jaeger receiver, alongside Prometheus and OTLP.
  • "All technologies OTel promised to support will be included [as receivers]" is effectively like the above, except that it includes Zipkin.

Between the two, I prefer the second, as it's more objective than the first.

@TylerHelmuth
Copy link
Member Author

@jpkrohling I'm good with supporting only exporters that use OTLP and also the logging exporter.

I'm also ok with bullet 2, I have a feeling that's where #360 will go anyways.

@jsirianni
Copy link
Member

When using a DaemonSet to read container logs and kubelet logs, the journald receiver is sometimes needed for the kubelet, as it can run as a systemd service outside of the cluster. Should this be considered? The journald receiver complicates the build due to the required dependencies inside the container image.

@TylerHelmuth
Copy link
Member Author

I'd say if it is a useful component in k8s then it should be considered for inclusion. What extra dependencies does it require? I am not aware of anything special our existing Contrib pipeline is doing for this receiver.

@jsirianni
Copy link
Member

I'd say if it is a useful component in k8s then it should be considered for inclusion. What extra dependencies does it require? I am not aware of anything special our existing Contrib pipeline is doing for this receiver.

We (observIQ) install systemd, which provides the required journalctl command and its libraries. The journald receiver runs the journalctl command with os.exec. https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/operator/input/journald/journald.go#L87. I have seen others install only the required binary and libraries, instead of the entire systemd package but I cannot find the thread right now.

AFAIK, a native pure Go implementation for reading from the journal does not exist, so dependence on the journalctl command is necessary.

@sathieu
Copy link

sathieu commented Oct 12, 2023

journald support can be implemented with go bindings (see coreos/go-systemd). This is what is used in promtail (see here). systemd libs are still needed and promtail uses FROM debian:bullseye-slim.

@swiatekm
Copy link

The journald dependency problem is a bit complicated. If you include a specific version of the systemd libs in the container image, you can run into compatibility issues if the delta between your version and the host's is large enough. It's safest to mount the libraries from the host, but that requires more work from the user.

In the interest of unblocking this release, I'd suggest including the receiver without the binary dependencies, and moving this discussion to a new issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants