
New component: k8slog receiver #23339

Closed

h0cheung opened this issue Jun 13, 2023 · 19 comments

@h0cheung
Contributor

h0cheung commented Jun 13, 2023

The purpose and use-cases of the new component

The existing filelog receiver can find and collect logs in files based on configured path expressions.

However, for k8s containers, the log files inside a container are only reachable through special paths on the host. It is not convenient to match those paths directly, and it is difficult to obtain the containers' metadata or filter by it.

The sidecar solution brings additional problems, such as deployment overhead and wasted resources. At present, there is no solution in OTel that meets our needs in all aspects.

So we decided to develop a new component, the k8slog receiver, to provide full functionality for collecting logs from k8s containers. It is an all-in-one solution that supports:

  • collect logs from files inside k8s containers via daemonset
  • support docker and cri-containerd
  • support many docker graph drivers / snapshotters
  • collect logs from stdout of k8s containers via daemonset
  • filter containers by metadata
  • extract metadata of containers
  • collect file logs in k8s containers via sidecar

If we want to collect logs from k8s, we can use this component.

Example configuration for the component

receivers:
  k8slog:
    discovery:
      mode: 1 # choices:
              # 1 (default): collect logs from files inside containers via daemonset
              # 2: collect logs from stdout of containers via daemonset
              # 3: collect logs from files via sidecar
      host_root: /host-root # default. We need to mount the host's / into the collector container at this path (read-only is enough).
      runtime_apis: # APIs to be used for discovering pods. By default it will try to connect to both docker and cri-containerd.
        type: docker
        addr: unix:///<host_root>/var/run/docker.sock # default. address of docker api.
        containerd_addr: unix:///<host_root>/run/docker/containerd/containerd.sock # default. address of underlying containerd api.
      filter: # filters for pods. By default all containers will be included.
        annotations:
        labels:
        namespaces:
        containers:
        pods:
        uids:
        expr: # custom expression (using antonmedv/expr)
      extract: # extract resource attributes, like the k8sattributes processor.
        metadata:
        env:
        annotations:
        labels:
    # configs below are (almost) the same as filelog receiver.
    include: # In `daemonset-file` mode, these paths are paths inside the containers. In `daemonset-stdout` mode, these paths are ignored.
      - /var/log/*.log
    exclude:
    start_at:
    multiline:
    force_flush_period:
    encoding:
    include_file_name:
    poll_interval:
    fingerprint_size:
    max_log_size:
    max_concurrent_files:
    attributes:
    resource:
    operators:
    storage:
    preserve_leading_whitespaces:
    preserve_trailing_whitespaces:
    max_batches:
    header:
    retry_on_failure:

Telemetry data types supported

logs

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am proposing to contribute this as a representative of the vendor.

Sponsor (optional)

@TylerHelmuth

Additional context

No response

@h0cheung h0cheung added the needs triage New item requiring triage label Jun 13, 2023
@h0cheung h0cheung changed the title New component: New component: k8slog receiver Jun 13, 2023
@songy23 songy23 added the Sponsor Needed New component seeking sponsor label Jun 14, 2023
@djaglowski djaglowski removed the needs triage New item requiring triage label Jun 28, 2023
@TylerHelmuth
Member

@h0cheung I will sponsor this component.

A couple of comments on the proposal:

  • let's not introduce expr at the start. If we need custom expressions I would prefer to use OTTL, but we could introduce it later.
  • for mode, let's use strings instead of integers, as they make it easier for users reading the configuration to understand what the component is doing (see the sketch after this list).
  • Do we definitely need the include configuration option if we already have the filter configuration section? Could we combine them in some way?
  • I think it would be nice to nest the filelogreceiver configuration options under another section instead of placing them all at the root.
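
As an illustration of the string-mode suggestion, here is a minimal Go sketch. The mode names daemonset-file and daemonset-stdout appear elsewhere in this thread; sidecar-file and all the Go identifiers are assumptions, not part of the proposal.

// Hypothetical sketch of string-valued modes for the k8slog receiver config.
package k8slogreceiver

import "fmt"

// Mode selects how the receiver discovers and reads logs.
type Mode string

const (
    ModeDaemonSetFile   Mode = "daemonset-file"   // files inside containers, via daemonset
    ModeDaemonSetStdout Mode = "daemonset-stdout" // container stdout, via daemonset
    ModeSidecarFile     Mode = "sidecar-file"     // files, via sidecar (name assumed)
)

// Validate could run during config unmarshaling to reject unknown modes.
func (m Mode) Validate() error {
    switch m {
    case ModeDaemonSetFile, ModeDaemonSetStdout, ModeSidecarFile:
        return nil
    default:
        return fmt.Errorf("unknown mode %q", m)
    }
}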

@TylerHelmuth TylerHelmuth added Accepted Component New component has been sponsored and removed Sponsor Needed New component seeking sponsor labels Aug 14, 2023
@djaglowski
Member

djaglowski commented Aug 15, 2023

I generally think this is a good idea but strongly believe we need to depend on the fileconsumer package for tracking and reading log files. It appears this is the intention, but I'd like to understand what, if any, differences you foresee.

configs below are (almost) the same as filelog receiver.

Can you clarify in which ways it is different? Is it only in the way the include paths are interpreted?

When reading from files, to what extent should users be able to apply operators? If there is some pre-configured set of operators that handles parsing, would a user be able to change these? Would they be able to place operators before and/or after the pre-defined operators?

@TylerHelmuth
Member

@djaglowski I agree that we should use fileconsumer.

I believe this component will pre-configure some operators based on kubernetes requirements. Since this is an opinionated receiver, I don't think those should be configurable (at least not the operators themselves - the proposal is using a different interface to configure how the resource attributes should be set and I like separating this config from the operator itself).

Users should be able to add operators after the component's pre-determined manipulation/setup.

@h0cheung
Contributor Author

h0cheung commented Aug 20, 2023

@djaglowski @TylerHelmuth

As planned, I will write this component from scratch. Some of the configuration items are the same as filelog's; this is to make it easier for users to get started, not because the same implementation is used. There are several reasons for this:

  • fileconsumer currently has some imperfections, such as those [pkg/stanza/fileconsumer] Add ability to read files asynchronously #23056 tries to address. It's not easy to make large changes to already-released components, but for new components I want to do as good a job as possible from the start.
  • For symbolic links, we need special handling based on the mount point mapping when looking for files.
  • In daemonset-stdout mode, the file path of the logs is fully determined, so no glob is needed. We can manage the reader directly after getting the file path; there is no need for resource-consuming mechanisms like fingerprinting, only handling of log rotation (a sketch of the path construction follows at the end of this comment).

For efficiency and flexibility, parsing the k8s standard output format and splicing containerd logs (based on the "P" and "F" tags) will be implemented in separate code. There are no preconfigured operators.
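
For context, here is a minimal sketch of how the fully determined stdout log path could be built, assuming the standard kubelet layout under /var/log/pods and the host root mounted at /host-root as in the example configuration above. The function and parameter names are hypothetical, not the proposed implementation.

// Sketch: derive the stdout log path for a container in daemonset-stdout mode.
package main

import (
    "fmt"
    "path/filepath"
)

// stdoutLogPath joins the host root with the standard kubelet log layout:
// /var/log/pods/<namespace>_<pod>_<uid>/<container>/<restart-count>.log
func stdoutLogPath(hostRoot, namespace, podName, podUID, container string, restartCount int) string {
    return filepath.Join(
        hostRoot,
        "var/log/pods",
        fmt.Sprintf("%s_%s_%s", namespace, podName, podUID),
        container,
        fmt.Sprintf("%d.log", restartCount),
    )
}

func main() {
    // Prints /host-root/var/log/pods/default_my-app_1234/my-container/0.log
    fmt.Println(stdoutLogPath("/host-root", "default", "my-app", "1234", "my-container", 0))
}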

@djaglowski
Member

As planned, I will write this component from scratch. Some of the configuration items are the same as filelog's; this is to make it easier for users to get started, not because the same implementation is used.

fileconsumer currently has some imperfections, such as those #23056 tries to address. It's not easy to make large changes to already-released components, but for new components I want to do as good a job as possible from the start.

I strongly disagree with the idea of writing a completely separate implementation for file tracking and ingestion. We should address imperfections in pkg/stanza/fileconsumer rather than maintaining two separate implementations. For the particular issue mentioned here, this is a performance enhancement for an edge case - hardly a reason to reroll a major package and maintain two implementations.

For symbolic links, we need special handling based on the mount point mapping when looking for files.

I recommend we define an interface or function that can be passed to fileconsumer to customize behavior for handling symlinks. We can have a default behavior and another that is substituted by k8slog and perhaps reused by other components in the future.

In daemonset-stdout mode, the file path of the logs is fully determined, so no glob is needed. We can manage the reader directly after getting the file path; there is no need for resource-consuming mechanisms like fingerprinting, only handling of log rotation.

Similarly, I recommend we define an interface for finding files. We already have a matcher package which fully encapsulates the finding and filtering of files. It should be possible to substitute another implementation if necessary.
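
As an illustration only, one hypothetical shape such a pluggable finder could take is sketched below. The names are illustrative and do not reflect the existing matcher package API.

// Hypothetical sketch of a pluggable file-finding interface for fileconsumer.
package finder

// FileFinder returns the set of absolute paths that should currently be read.
type FileFinder interface {
    FindFiles() ([]string, error)
}

// GlobFinder stands in for the existing glob/filter based matching (the default behavior).
type GlobFinder struct {
    Include []string
    Exclude []string
}

func (g *GlobFinder) FindFiles() ([]string, error) {
    // The real implementation would expand Include globs and drop Exclude matches.
    return nil, nil
}

// K8sFinder is what the k8slog receiver could substitute: it resolves paths from
// container mount-point information (and symlink targets) instead of globbing.
type K8sFinder struct {
    HostRoot string
}

func (k *K8sFinder) FindFiles() ([]string, error) {
    // The real implementation would map container-internal paths to host paths.
    return nil, nil
}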

@TylerHelmuth
Member

I agree with @djaglowski.

@h0cheung
Contributor Author

h0cheung commented Aug 24, 2023

I think defining interfaces for the components in fileconsumer is a great idea.

I think we can try to implement asynchronous file reading, optimized fingerprint matching, and optimized log rotation determination in the k8slog receiver's implementation of the fileconsumer interfaces.

I'll make changes to the initial commit as soon as possible.

Let's work together to make this component happen.

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Oct 24, 2023
@h0cheung
Contributor Author

h0cheung commented Nov 6, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

remove stale

@github-actions github-actions bot removed the Stale label Nov 7, 2023
Contributor

github-actions bot commented Jan 8, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Jan 8, 2024
@djaglowski djaglowski removed the Stale label Jan 8, 2024
@swiatekm
Contributor

Looking at the proposed configuration, three questions come to mind.

  1. Why interface directly with the container runtime, as opposed to the kubelet? If we want to filter over Pod attributes like labels, does the container runtime even know about them?

  2. Mounting the whole host root, even as read-only, looks problematic from a security standpoint. Do we actually need anything outside of /var/log/pods?

  3. How is mode 1 supposed to work in terms of implementation?

@h0cheung
Contributor Author

h0cheung commented Feb 20, 2024

Looking at the proposed configuration, three questions come to mind.

  1. Why interface directly with the container runtime, as opposed to the kubelet? If we want to filter over Pod attributes like labels, does the container runtime even know about them?
  2. Mounting the whole host root, even as read-only, looks problematic from a security standpoint. Do we actually need anything outside of /var/log/pods?
  3. How is mode 1 supposed to work in terms of implementation?

Answers:

  1. In addition to the runtime API, we also use one of the kube API or the kubelet.
  2. For daemonset-stdout mode there are often symbolic links under /var/log/pods, so we need to mount at least the targets of the symbolic links. For daemonset-file mode, we need to mount all the paths that are used as mount points for the containers.
  3. Use the runtime API to get the mount-point information (including the container root) and then map the pattern inside the container to the corresponding path outside the container to find the log files.

For example, a docker container could have the following mount points:

  • / -> /var/lib/docker/overlay2/xxxxxxxxxxxxx/merged, where the newly generated files are under /var/lib/docker/overlay2/xxxxxxxxxxxxx/upper
  • /logs -> /home/xxx/logs/

Also, the host root is mounted at /host-root in the otel container.
Then, when the user configures include=["/logs/*.log", "/var/log/*.log"], we can find log files at /host-root/home/xxx/logs/*.log and /host-root/var/lib/docker/overlay2/xxxxxxxxxxxxx/upper/var/log/*.log.
If symbolic links are involved, we need to recursively apply the above mount-point-based mapping to the target of each link. A rough sketch of this mapping is below.
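
A minimal Go sketch of the mount-point based mapping described above, using the mounts from this example and assuming the host root at /host-root. Type and function names are hypothetical, not the proposed implementation.

// Sketch: rewrite a container-internal include pattern into a host path.
package main

import (
    "fmt"
    "path/filepath"
    "sort"
    "strings"
)

// Mount maps a path inside the container to a path on the host.
type Mount struct {
    ContainerPath string // e.g. "/logs"
    HostPath      string // e.g. "/home/xxx/logs"
}

// mapToHost picks the longest matching mount point for the pattern,
// replaces it with the host path, and prefixes the host root.
func mapToHost(hostRoot, pattern string, mounts []Mount) string {
    sort.Slice(mounts, func(i, j int) bool {
        return len(mounts[i].ContainerPath) > len(mounts[j].ContainerPath)
    })
    for _, m := range mounts {
        if strings.HasPrefix(pattern, m.ContainerPath) {
            rel := strings.TrimPrefix(pattern, m.ContainerPath)
            return filepath.Join(hostRoot, m.HostPath, rel)
        }
    }
    return filepath.Join(hostRoot, pattern)
}

func main() {
    mounts := []Mount{
        // "/" backed by the overlay upper dir, "/logs" bind-mounted from the host.
        {ContainerPath: "/", HostPath: "/var/lib/docker/overlay2/xxxxxxxxxxxxx/upper"},
        {ContainerPath: "/logs", HostPath: "/home/xxx/logs"},
    }
    fmt.Println(mapToHost("/host-root", "/logs/*.log", mounts))    // /host-root/home/xxx/logs/*.log
    fmt.Println(mapToHost("/host-root", "/var/log/*.log", mounts)) // /host-root/var/lib/docker/overlay2/xxxxxxxxxxxxx/upper/var/log/*.log
}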

@swiatekm
Contributor

What you want to do looks quite complex. I think I'd prefer if we started with just mode 2, providing a nicer interface and more features than using the filelog receiver directly for that use case, and then possibly expand.

My general feeling is that philosophically it's a better approach to have a sidecar simply print logs from files inside the container to stdout, and then handle them via the standard Kubernetes logging mechanism, rather than go around that mechanism by digging into container runtime internals.

@dmitryax
Member

For example, a docker container could have the following mount points:

  • / -> /var/lib/docker/overlay2/xxxxxxxxxxxxx/merged, where the newly generated files are under /var/lib/docker/overlay2/xxxxxxxxxxxxx/upper
  • /logs -> /home/xxx/logs/

Why do we need docker runtime support? It's gone since k8s 1.24

@swiatekm
Contributor

For example, a docker container could have the following mount points:

  • / -> /var/lib/docker/overlay2/xxxxxxxxxxxxx/merged, where the newly generated files are under /var/lib/docker/overlay2/xxxxxxxxxxxxx/upper
  • /logs -> /home/xxx/logs/

Why do we need docker runtime support? It's gone since k8s 1.24

Some popular cloud providers offer extended support for K8s versions, one year after upstream EOL to be exact. They still support 1.23 until October 2024.

Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label May 27, 2024
@LH8PPL

LH8PPL commented Jun 18, 2024

Is someone really working on this? Do we know when we will have a working receiver?

@github-actions github-actions bot removed the Stale label Jun 19, 2024
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Aug 19, 2024
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Oct 18, 2024