feat: Add systemd (unit) receiver #37169

Draft
wants to merge 6 commits into
base: main

pieterlexis-tomtom

Description

This PR adds an initial systemd unit receiver, as requested in #33532.
I would describe the receiver as a very early alpha, but usable.

Comments are welcome.

Link to tracking issue

Fixes #33532

Testing

This receiver has had some light testing. With this config:

---
receivers:
  systemd:
    units:
      - named.service
      - cups.service

exporters:
  debug:
    verbosity: detailed
  prometheus:
    endpoint: 127.0.0.1:8080
    resource_to_telemetry_conversion:
      enabled: true

service:
  telemetry:
    logs:
      level: debug
  pipelines:
    metrics:
      receivers:
        - systemd
      exporters:
        - prometheus
        - debug

it outputs these metrics:

# HELP systemd_failed_jobs_total How many jobs have ever failed in total
# TYPE systemd_failed_jobs_total counter
systemd_failed_jobs_total 0
# HELP systemd_installed_jobs_total How many jobs have ever been queued in total
# TYPE systemd_installed_jobs_total counter
systemd_installed_jobs_total 2136
# HELP systemd_jobs How many jobs are currently queued
# TYPE systemd_jobs gauge
systemd_jobs 0
# HELP systemd_system_state The current state of the service manager
# TYPE systemd_system_state gauge
systemd_system_state{architecture="x86-64",system_state="running",systemd_version="249.11-0ubuntu3.12",virtualization=""} 3
# HELP systemd_unit_errno The errno (exit code) of the last error/exit
# TYPE systemd_unit_errno gauge
systemd_unit_errno{systemd_unit_name="cups.service"} 0
systemd_unit_errno{systemd_unit_name="named.service"} 0
# HELP systemd_unit_restarts_total Amount of time the unit was restarted this boot
# TYPE systemd_unit_restarts_total counter
systemd_unit_restarts_total{systemd_unit_name="cups.service"} 0
systemd_unit_restarts_total{systemd_unit_name="named.service"} 0
# HELP systemd_unit_state The full state of the unit. The gauge value is the sub state, but all states (load, active, sub) are exposed as attributes
# TYPE systemd_unit_state gauge
systemd_unit_state{active_state="active",load_state="loaded",sub_state="running",systemd_unit_name="cups.service"} 6
systemd_unit_state{active_state="active",load_state="loaded",sub_state="running",systemd_unit_name="named.service"} 6

Apart from that, there are several Go tests for the scraper.
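
For readers comparing the Prometheus output above with the metric names in the receiver's metadata (for example `systemd.failed_jobs` and `systemd.unit.restarts`), the following is a minimal sketch of the renaming that appears to be happening: dots become underscores and monotonic counters gain a `_total` suffix. The function `promName` is hypothetical and written only for illustration; the actual Prometheus exporter applies more rules (unit suffixes, character sanitization) that are not modeled here.

```go
package main

import (
	"fmt"
	"strings"
)

// promName is a hypothetical, simplified approximation of the naming seen in
// the output above. It is NOT the Prometheus exporter's real implementation.
func promName(otelName string, monotonicSum bool) string {
	// Prometheus metric names cannot contain dots, so replace them.
	name := strings.ReplaceAll(otelName, ".", "_")
	// Counters conventionally end in "_total"; avoid adding it twice.
	if monotonicSum && !strings.HasSuffix(name, "_total") {
		name += "_total"
	}
	return name
}

func main() {
	fmt.Println(promName("systemd.failed_jobs", true))
	fmt.Println(promName("systemd.unit.restarts", true))
}
```

Under these assumptions, the two calls yield `systemd_failed_jobs_total` and `systemd_unit_restarts_total`, which matches the names in the scrape output.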

CLA Not Signed

@@ -4,8 +4,172 @@ status:
   class: receiver
   stability:
     development: [metrics]
-  distributions: []
+  distributions: [contrib]
Contributor

no it's not in contrib yet, please remove for now

- active_state
- sub_state
systemd.unit.restarts:
description: Amount of time the unit was restarted this boot
Contributor

the unit {restarts} doesn't match the description

unit: "{system_state}"
attributes:
- systemd_version
- system_state
Contributor

No, that doesn't really work: you are sending a metric but only care about its attribute. You cannot make the unit match the attribute name.

The value of the metric must itself be meaningful, or just send 1 if you're just going to create a time series with the attributes.
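
A minimal sketch of the "just send 1" option described above, assuming the state is carried only in attributes. The `dataPoint` type and `systemStatePoint` function are hypothetical stand-ins for illustration; they are not the collector's pdata types or the receiver's actual code.

```go
package main

import "fmt"

// dataPoint is a hypothetical stand-in for a gauge data point.
type dataPoint struct {
	Value int64
	Attrs map[string]string
}

// systemStatePoint builds an "info"-style point: the value is always 1 and
// the interesting information lives entirely in the attributes.
func systemStatePoint(state, version string) dataPoint {
	return dataPoint{
		Value: 1,
		Attrs: map[string]string{
			"system_state":    state,
			"systemd_version": version,
		},
	}
}

func main() {
	p := systemStatePoint("running", "249.11-0ubuntu3.12")
	fmt.Println(p.Value, p.Attrs["system_state"]) // prints: 1 running
}
```

With this shape the numeric value no longer pretends to encode the state, so a unit such as "{system_state}" is not needed.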

- systemd_version
- system_state
- architecture
- virtualization
Contributor

I would disable those 2 attributes by default

gauge:
value_type: int
unit: "{jobs}"
systemd.installed_jobs:
Contributor

Is installed jobs the right name? How about systemd.jobs.total?

monotonic: true
value_type: int
unit: "{jobs}"
systemd.failed_jobs:
Contributor

Suggested change
- systemd.failed_jobs:
+ systemd.jobs.failed:

enabled: false
gauge:
value_type: int
unit: "{active_state}"
Contributor

What value are we reporting?

attributes:
- load_state
- active_state
- sub_state
Contributor

this might create high cardinality as your metric will have changing attribute values, creating multiple time series.
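
To illustrate the concern above, here is a small sketch that treats each distinct combination of metric name and attribute values as its own time series and counts how many series one unit produces as its state changes. The helper names `seriesKey` and `countSeries` are hypothetical, and the state sequence is made up for illustration rather than taken from a real system.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey builds an identity string from a metric name and its attributes.
// Keys are sorted so that map iteration order does not affect the result.
func seriesKey(name string, attrs map[string]string) string {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString(name)
	for _, k := range keys {
		fmt.Fprintf(&b, ",%s=%s", k, attrs[k])
	}
	return b.String()
}

// countSeries returns the number of distinct series the observations create.
func countSeries(name string, observations []map[string]string) int {
	seen := make(map[string]struct{})
	for _, attrs := range observations {
		seen[seriesKey(name, attrs)] = struct{}{}
	}
	return len(seen)
}

func main() {
	// Illustrative scrapes of a single unit over time (assumed, not measured).
	observations := []map[string]string{
		{"systemd_unit_name": "cups.service", "active_state": "active", "sub_state": "running"},
		{"systemd_unit_name": "cups.service", "active_state": "failed", "sub_state": "failed"},
		{"systemd_unit_name": "cups.service", "active_state": "activating", "sub_state": "auto-restart"},
		{"systemd_unit_name": "cups.service", "active_state": "active", "sub_state": "running"},
	}
	fmt.Println(countSeries("systemd_unit_state", observations)) // prints: 3
}
```

One unit yields three series here because the state attributes changed between scrapes; only the repeated active/running observation maps back to an existing series.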

@pieterlexis-tomtom
Author

@atoulme I'll fix the issues mentioned sometime in the coming week.

Contributor

github-actions bot commented Feb 6, 2025

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Feb 6, 2025
@pieterlexis-tomtom
Author

@atoulme I have some problems with rebasing. Is there a guide somewhere that describes how to fix the go.mod files, builder-config, etc.?

@github-actions github-actions bot removed the Stale label Feb 12, 2025
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Feb 27, 2025