Add system shutdown timestamp #3111
Add a metric for the scheduled shutdown time from systemd. Signed-off-by: Ben Kochie <superq@gmail.com>
haha excellent. i should really have just dumped my code over the fence last night, as I ended up with something very similar:
modified collector/systemd_linux.go
@@ -130,6 +130,9 @@ func NewSystemdCollector(logger log.Logger) (Collector, error) {
socketRefusedConnectionsDesc := prometheus.NewDesc(
prometheus.BuildFQName(namespace, subsystem, "socket_refused_connections_total"),
"Total number of refused socket connections", []string{"name"}, nil)
+ scheduledShutdownTime := prometheus.NewDesc(
+ prometheus.BuildFQName(namespace, subsystem, "scheduled_shutdown_timestamp_seconds"),
+ "time of the next scheduled reboot, or zero", []string{"kind"}, nil)
systemdVersionDesc := prometheus.NewDesc(
prometheus.BuildFQName(namespace, subsystem, "version"),
"Detected systemd version", []string{"version"}, nil)
@@ -170,6 +173,7 @@ func NewSystemdCollector(logger log.Logger) (Collector, error) {
systemdVersionDesc: systemdVersionDesc,
systemdUnitIncludePattern: systemdUnitIncludePattern,
systemdUnitExcludePattern: systemdUnitExcludePattern,
+ scheduledShutdownTime: scheduledShutdownTime,
logger: logger,
}, nil
}
@@ -194,6 +198,13 @@ func (c *systemdCollector) Update(ch chan<- prometheus.Metric) error {
systemdVersion,
systemdVersionFull,
)
+ shutdownTimestamp, shutdownKind := c.getShutdownTime(conn)
+ ch <- prometheus.MustNewConstMetric(
+ c.scheduledShutdownTime,
+ prometheus.GaugeValue,
+ shutdownTimestamp,
+ shutdownKind,
+ )
allUnits, err := c.getAllUnits(conn)
if err != nil {
@@ -506,3 +517,20 @@ func (c *systemdCollector) getSystemdVersion(conn *dbus.Conn) (float64, string)
}
return v, version
}
+
+func (c *systemdCollector) getShutdownTime(conn *dbus.Conn) (float64, string) {
+ timestamp, err := conn.GetManagerProperty("ScheduledShutdown")
+ if err != nil {
+ level.Debug(c.logger).Log("msg", "Unable to get scheduled shutdown time, defaulting to 0")
+ return 0, ""
+ }
+ version = strings.TrimPrefix(strings.TrimSuffix(version, `"`), `"`)
+ level.Debug(c.logger).Log("msg", "Got systemd version", "version", version)
+ parsedVersion := systemdVersionRE.FindString(version)
+ v, err := strconv.ParseFloat(parsedVersion, 64)
+ if err != nil {
+ level.Debug(c.logger).Log("msg", "Got invalid systemd version", "version", version)
+ return 0, ""
+ }
+ return v, version
+}
be warned: there's leftover copy-paste from the version fetch function above. :) i even filed coreos/go-systemd#447 upstream to figure out how to wrestle that thing out of that byzantine API. :) thank you so much for working on this!
great start, stuck like i was at "wtf is this dbus interface gaah" :)
func (c *systemdCollector) collectScheduledShutdownMetrics(conn *dbus.Conn, ch chan<- prometheus.Metric) error {
var shutdownTimeUsec uint64

timestampValue, err := conn.GetServicePropertyContext(context.TODO(), "org.freedesktop.login1", "ScheduledShutdown")
just so you know, i'm not sure this returns a single integer. if it behaves like the commandline tool, it returns a tuple of 3 elements. with a pending reboot:
root@perdulce:~# busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager ScheduledShutdown
(st) "reboot" 1725545703588789
without:
anarcat@angela:~$ busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager ScheduledShutdown
(st) "" 18446744073709551615
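notice how the timestamp is in microseconds, not seconds, and how completely out of whack it is when there's no scheduled shutdown: 18446744073709551615 is 2^64-1, the largest unsigned 64-bit value, so it looks like some kind of "nothing scheduled" sentinel. not sure what's going on there.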
the script i wrote in #3110 (comment) does this somewhat properly, and outputs the following metrics, with the first example:
# HELP node_shutdown_scheduled_timestamp_seconds time of the next scheduled reboot, or zero
# TYPE node_shutdown_scheduled_timestamp_seconds gauge
node_shutdown_scheduled_timestamp_seconds{kind=reboot} 1725545703.588789
with the second, it does that:
# HELP node_shutdown_scheduled_timestamp_seconds time of the next scheduled reboot, or zero
# TYPE node_shutdown_scheduled_timestamp_seconds gauge
node_shutdown_scheduled_timestamp_seconds 0
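For reference, a minimal sketch of the conversion the two outputs above imply: split the `busctl` line into the type string and the raw timestamp, then divide by 1e6 to go from microseconds to seconds. `parseScheduledShutdown` and `usecToSeconds` are hypothetical helper names, not part of node_exporter or go-systemd, and this parses the command-line output rather than talking to D-Bus directly.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseScheduledShutdown parses one line of `busctl get-property` output of
// the form `(st) "reboot" 1725545703588789` into the shutdown type and the
// raw timestamp. Hypothetical helper, for illustration only.
func parseScheduledShutdown(line string) (string, uint64, error) {
	fields := strings.Fields(line)
	if len(fields) != 3 || fields[0] != "(st)" {
		return "", 0, fmt.Errorf("unexpected ScheduledShutdown output: %q", line)
	}
	// The type is printed as a quoted string, possibly empty ("").
	kind, err := strconv.Unquote(fields[1])
	if err != nil {
		return "", 0, fmt.Errorf("bad type field %q: %w", fields[1], err)
	}
	// The timestamp is an unsigned 64-bit integer.
	usec, err := strconv.ParseUint(fields[2], 10, 64)
	if err != nil {
		return "", 0, fmt.Errorf("bad timestamp field %q: %w", fields[2], err)
	}
	return kind, usec, nil
}

// usecToSeconds converts microseconds since the UNIX epoch to seconds.
func usecToSeconds(usec uint64) float64 {
	return float64(usec) / 1e6
}

func main() {
	kind, usec, err := parseScheduledShutdown(`(st) "reboot" 1725545703588789`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("kind=%s seconds=%f\n", kind, usecToSeconds(usec))
}
```

This only covers the parsing and unit conversion; what to do with the "nothing scheduled" value is a separate question.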
@@ -112,6 +113,11 @@ func NewSystemdCollector(logger log.Logger) (Collector, error) {
"Whether the system is operational (see 'systemctl is-system-running')",
nil, nil,
)
systemShutdownDesc := prometheus.NewDesc(
prometheus.BuildFQName(namespace, subsystem, "system_shutdown_timestamp"),
"Time for a scheduled shutdown (see 'systemctl status systemd-shutdownd.service')",
that command outputs:
Unit systemd-shutdownd.service could not be found.
here.
I found a systemd-shutdown(8) manual page, but it doesn't help much in understanding that component. In fact, I had a frustrating time trying to find any meaningful documentation on how that damn thing works... The org.freedesktop.login1(5) manual page does mention it though:
ScheduledShutdown shows the value pair set with the
ScheduleShutdown() method described above.
That's the property we're trying to fetch here... The method referenced there is:
ScheduleShutdown() schedules a shutdown operation type at time
usec in microseconds since the UNIX epoch. type can be one of
"poweroff", "dry-poweroff", "reboot", "dry-reboot", "halt", and
"dry-halt". (The "dry-" variants do not actually execute the
shutdown action.) CancelScheduledShutdown() cancels a scheduled
shutdown. The output parameter cancelled is true if a shutdown
operation was scheduled.
... which is, frankly, not all that helpful.
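One thing the quoted text does pin down is the set of type strings and the "dry-" prefix. A rough sketch of how a collector could split that into an action and a dry-run flag (for example, to use as labels) is below. `splitShutdownType` and `isDocumentedAction` are hypothetical names, and the list only covers the six types quoted above; the property may well carry other values in practice.

```go
package main

import (
	"fmt"
	"strings"
)

// documentedActions lists the base actions named in the quoted manual page
// text; each also has a "dry-" variant. Other values may exist in practice.
var documentedActions = map[string]bool{
	"poweroff": true,
	"reboot":   true,
	"halt":     true,
}

// splitShutdownType separates an optional "dry-" prefix from the action.
// Hypothetical helper, for illustration only.
func splitShutdownType(kind string) (action string, dryRun bool) {
	if strings.HasPrefix(kind, "dry-") {
		return strings.TrimPrefix(kind, "dry-"), true
	}
	return kind, false
}

// isDocumentedAction reports whether the action is one of the quoted types.
func isDocumentedAction(action string) bool {
	return documentedActions[action]
}

func main() {
	for _, kind := range []string{"reboot", "dry-poweroff", ""} {
		action, dry := splitShutdownType(kind)
		fmt.Printf("type=%q action=%q dry=%v documented=%v\n",
			kind, action, dry, isDocumentedAction(action))
	}
}
```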
@@ -112,6 +113,11 @@ func NewSystemdCollector(logger log.Logger) (Collector, error) {
"Whether the system is operational (see 'systemctl is-system-running')",
nil, nil,
)
systemShutdownDesc := prometheus.NewDesc(
prometheus.BuildFQName(namespace, subsystem, "system_shutdown_timestamp"),
i think that should be system_shutdown_timestamp_seconds, no?
@@ -112,6 +113,11 @@ func NewSystemdCollector(logger log.Logger) (Collector, error) {
"Whether the system is operational (see 'systemctl is-system-running')",
nil, nil,
)
systemShutdownDesc := prometheus.NewDesc(
prometheus.BuildFQName(namespace, subsystem, "system_shutdown_timestamp"),
"Time for a scheduled shutdown (see 'systemctl status systemd-shutdownd.service')",
I'm also not sure how to represent the "no shutdown scheduled" state. in my script, i used "zero seconds" as a value for that, but the property returned somehow uses something else (which looks a lot like the maximum unsigned 64-bit integer, AKA 2^64-1, AKA 18446744073709551615 ≈ 1.8446744074 × 10^19)
also note that on my laptop, this morning, after the device went to sleep on its own after a timeout, dbus says this:
anarcat@angela:~$ busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager ScheduledShutdown
(st) "suspend" 0
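Given those three observed values, one possible way to normalize the raw timestamp is sketched below. It assumes that both 0 and 2^64-1 mean "nothing scheduled"; that is a guess from the outputs above, not something the quoted documentation confirms. `scheduledShutdownSeconds` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"math"
)

// scheduledShutdownSeconds converts the raw ScheduledShutdown timestamp
// (microseconds since the UNIX epoch) to seconds. It reports false when the
// value looks like "nothing scheduled".
// ASSUMPTION: both 0 and 2^64-1 are treated as "nothing scheduled", based
// only on the busctl outputs seen in this thread.
func scheduledShutdownSeconds(usec uint64) (float64, bool) {
	if usec == 0 || usec == math.MaxUint64 {
		return 0, false
	}
	return float64(usec) / 1e6, true
}

func main() {
	samples := []uint64{1725545703588789, math.MaxUint64, 0}
	for _, usec := range samples {
		seconds, scheduled := scheduledShutdownSeconds(usec)
		fmt.Printf("usec=%d scheduled=%v seconds=%f\n", usec, scheduled, seconds)
	}
}
```

Returning a separate boolean leaves the choice open between exporting zero and not exporting the metric at all when nothing is scheduled.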