
[receiver/hostmetrics] filesystem scraper doesn't respect root_path #35990

Closed
povilasv opened this issue Oct 25, 2024 · 13 comments · Fixed by #36000
Labels: bug (Something isn't working), receiver/hostmetrics

Comments

@povilasv
Contributor

povilasv commented Oct 25, 2024

Component(s)

receiver/hostmetrics

What happened?

Description

Looks like #35504 gave us a regression.

Now the filesystem scraper ignores root_path and tries to open /mounts directly, although root_path is set. Users with root_path configured will start getting these errors, and no filesystem metrics:

2024-10-25T08:45:13.299+0300    error   scraperhelper/scrapercontroller.go:205  Error scraping metrics  {"kind": "receiver", "name": "hostmetrics", "data_type": "metrics", "error": "open /mounts: no such file or directory", "scraper": "filesystem"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
        go.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:205
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
        go.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:177

IMO this is an important regression, as we use this in the opentelemetry-helm-charts hostMetrics preset: https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/templates/_config.tpl#L71-L88

Steps to Reproduce

receivers:
  hostmetrics:
    root_path: /tmp
    collection_interval: 30s
    scrapers:
      cpu:
        metrics:
          system.cpu.time:
            enabled: true
      disk:
        metrics:
          system.disk.io:
            enabled: true
          system.disk.operations:
            enabled: true
      filesystem:
        metrics:
          system.filesystem.usage:
            enabled: true
      load:
        metrics:
          system.cpu.load_average.1m:
            enabled: true
          system.cpu.load_average.5m:
            enabled: true
          system.cpu.load_average.15m:
            enabled: true
      memory:
        metrics:
          system.memory.usage:
            enabled: true
      network:
        metrics:
          system.network.connections:
            enabled: true
          system.network.io:
            enabled: true
      paging:
        metrics:
          system.paging.operations:
            enabled: true
          system.paging.usage:
            enabled: true
      process:
        metrics:
          process.cpu.time:
            enabled: true
          process.memory.usage:
            enabled: true

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [batch]
      exporters: [debug]

  telemetry:
    logs:
      level: "debug"

You will get:

2024-10-25T09:03:56.691+0300    error   scraperhelper/scrapercontroller.go:205  Error scraping metrics  {"kind": "receiver", "name": "hostmetrics", "data_type": "metrics", "error": "open /mounts: no such file or directory", "scraper": "filesystem"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
        go.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:205
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
        go.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:177

even though it should try to open /tmp/mounts instead.

Expected Result

Actual Result

Collector version

v0.102.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler (if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

@povilasv povilasv added bug Something isn't working needs triage New item requiring triage labels Oct 25, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@TylerHelmuth
Member

@povilasv is there a workaround?

@povilasv
Contributor Author

The workaround is to manually set the env var:

HOST_PROC_MOUNTINFO=/hostfs/proc/1
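
In a Kubernetes deployment this workaround might be wired in roughly like this — a sketch, assuming the host root filesystem is mounted into the collector container at /hostfs (the container/volume names are illustrative, not part of the original report):

```yaml
# Illustrative sketch: collector container env override so gopsutil reads
# mount information from PID 1's mountinfo under the /hostfs prefix.
env:
  - name: HOST_PROC_MOUNTINFO
    value: /hostfs/proc/1
volumeMounts:
  - name: hostfs
    mountPath: /hostfs
    readOnly: true
volumes:
  - name: hostfs
    hostPath:
      path: /
```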

@marcelaraujo

@povilasv, I'm seeing the same issue here after upgrading to the latest version.

{"level":"error","ts":1729866966.393247,"caller":"scraperhelper/scrapercontroller.go:205","msg":"Error scraping metrics","kind":"receiver","name":"hostmetrics","data_type":"metrics","error":"open /mounts: no such file or directory","scraper":"hostmetrics","stacktrace":"go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport\n\tgo.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:205\ngo.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1\n\tgo.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:181"}
{"level":"error","ts":1729866996.3939178,"caller":"scraperhelper/scrapercontroller.go:205","msg":"Error scraping metrics","kind":"receiver","name":"hostmetrics","data_type":"metrics","error":"open /mounts: no such file or directory","scraper":"hostmetrics","stacktrace":"go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport\n\tgo.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:205\ngo.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1\n\tgo.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:181"}
{"level":"error","ts":1729867026.3932698,"caller":"scraperhelper/scrapercontroller.go:205","msg":"Error scraping metrics","kind":"receiver","name":"hostmetrics","data_type":"metrics","error":"open /mounts: no such file or directory","scraper":"hostmetrics","stacktrace":"go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport\n\tgo.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:205\ngo.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1\n\tgo.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:181"}

@povilasv
Contributor Author

@marcelaraujo try setting the environment variable HOST_PROC_MOUNTINFO=/hostfs/proc/1

That should work around this issue.

@marcelaraujo

marcelaraujo commented Oct 25, 2024

Hi @povilasv

It didn't work.

What should be the values for this case?

env:
   - name: HOST_PROC_MOUNTINFO
     value: /proc/1/self
volumes:
   - name: hostfs
     hostPath:
        path: /
volumeMounts:
   - name: hostfs
     mountPath: /hostfs
     readOnly: true
     mountPropagation: HostToContainer
config:
   receivers:
      hostmetrics:
          root_path: /hosts

When I tried using your environment variable, I got a different issue

{"level":"error","ts":1729870334.3869734,"caller":"scraperhelper/scrapercontroller.go:205","msg":"Error scraping metrics","kind":"receiver","name":"hostmetrics","data_type":"metrics","error":"failed to read usage at /hostfs/conf: no such file or directory; failed to read usage at /hostfs/hostfs/run/containerd/io.containerd.runtime.v2.task/k8s.io/b9d436f09506a98c09ba162dc17d2692f70430adf0601dfc1cd2f676c0253b80/rootfs/conf: no such file or directory","scraper":"hostmetrics","stacktrace":"go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport\n\tgo.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:205\ngo.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1\n\tgo.opentelemetry.io/collector/receiver@v0.112.0/scraperhelper/scrapercontroller.go:177"}

@atoulme
Contributor

atoulme commented Oct 25, 2024

Looking at this issue and going to attempt to reproduce; the environment variable is used in two places, and setting it as a workaround might not be the fix.

@atoulme
Contributor

atoulme commented Oct 25, 2024

Using your config file, on Mac, trying to reproduce:

docker run --rm -it -v $(pwd)/config.yaml:/etc/otelcol-contrib/config.yaml -v /:/tmp otel/opentelemetry-collector-contrib:latest

I don't see the error reported. I will try to reproduce on a Linux VM next.

@atoulme
Contributor

atoulme commented Oct 25, 2024

Reproducing on Linux now, taking it further.

@atoulme
Contributor

atoulme commented Oct 25, 2024

Adding -e HOST_PROC_MOUNTINFO=/tmp/proc/1 doesn't fix the issue.

2024-10-25T16:20:16.489Z	error	scraperhelper/scrapercontroller.go:205	Error scraping metrics	{"kind": "receiver", "name": "hostmetrics", "data_type": "metrics", "error": "failed to read usage at /tmp/tmp/boot: no such file or directory; failed to read usage at /tmp/tmp/boot/efi: no such file or directory; failed to read usage at /tmp/tmp/snap/snapd/21759: no such file or directory; failed to read usage at /tmp/tmp/snap/amazon-ssm-agent/7993: no such file or directory; failed to read usage at /tmp/tmp/snap/core18/2829: no such file or directory; failed to read usage at /tmp/tmp/snap/core22/1621: no such file or directory; failed to read usage at /tmp/tmp/snap/amazon-ssm-agent/9565: no such file or directory; failed to read usage at /tmp/tmp/snap/core18/2846: no such file or directory; failed to read usage at /tmp/tmp/snap/snapd/22991: no such file or directory; failed to read usage at /tmp/tmp/snap/core22/1663: no such file or directory; failed to read usage at /tmp/etc/otelcol-contrib/config.yaml: no such file or directory", "scraper": "hostmetrics"}

@atoulme
Contributor

atoulme commented Oct 25, 2024

Using -e HOST_PROC_MOUNTINFO="" fixes the issue by restoring the correct (empty) default for this env var. The fix is to not set a default value for HOST_PROC_MOUNTINFO in the envMap at all, since any non-empty value is going to skew things. I will open a PR with a fix in a second.
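
The fix described above can be sketched as follows — a minimal illustration, not the actual collector code. The helper name `buildEnvMap` and its signature are assumptions; `HOST_PROC` and `HOST_PROC_MOUNTINFO` are real gopsutil environment overrides, and the bare "open /mounts" error in the logs comes from gopsutil joining an empty-but-set HOST_PROC_MOUNTINFO with "mounts":

```go
package main

import "fmt"

// buildEnvMap sketches the fix: derive gopsutil env overrides from root_path,
// but never give HOST_PROC_MOUNTINFO a default — gopsutil joins its value
// with "mounts", so a defaulted value produces paths like "/mounts" and
// breaks mount discovery under root_path.
func buildEnvMap(rootPath, explicitMountinfo string) map[string]string {
	env := map[string]string{}
	if rootPath != "" {
		env["HOST_PROC"] = rootPath + "/proc"
		env["HOST_SYS"] = rootPath + "/sys"
		env["HOST_ETC"] = rootPath + "/etc"
		// Pre-fix behavior (the bug): HOST_PROC_MOUNTINFO was also set here,
		// overriding the path gopsutil would otherwise derive from HOST_PROC.
	}
	// Only pass HOST_PROC_MOUNTINFO through when the user sets it explicitly.
	if explicitMountinfo != "" {
		env["HOST_PROC_MOUNTINFO"] = explicitMountinfo
	}
	return env
}

func main() {
	fmt.Println(buildEnvMap("/hostfs", ""))
}
```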

@atoulme
Contributor

atoulme commented Oct 25, 2024

#36000 is the fix.

I am going to try to revive #32536 to make sure this fix is covered by tests.

@marcelaraujo

@atoulme Confirming the suggestion worked.

@atoulme atoulme removed the needs triage New item requiring triage label Oct 28, 2024
zzhlogin pushed a commit to zzhlogin/opentelemetry-collector-contrib-aws that referenced this issue Nov 12, 2024
…e system is mounted (open-telemetry#36000)

#### Description
Do not set the default value of HOST_PROC_MOUNTINFO to respect root_path

#### Link to tracking issue
Fixes open-telemetry#35990