Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref!: convert linux meminfo implementation to use procfs lib #3049

Merged
merged 3 commits into from
Jul 14, 2024

Conversation

tjhop
Copy link
Contributor

@tjhop tjhop commented Jun 12, 2024

Part of #2957

Prometheus' procfs lib supports collecting memory info and we're using a
new enough version of the lib that has it available, so this converts
the meminfo collector for Linux to use data from procfs lib instead. The
bits I've touched for darwin/openbsd/netbsd are with intent to preserve
the original struct implementation/backwards compatibility.

Signed-off-by: TJ Hoplock t.hoplock@gmail.com

@tjhop
Copy link
Contributor Author

tjhop commented Jun 12, 2024

Looking at the failing tests

@tjhop
Copy link
Contributor Author

tjhop commented Jun 13, 2024

@SuperQ would you mind taking a look when you get time? Lemme know your thoughts 👍

@tjhop
Copy link
Contributor Author

tjhop commented Jun 13, 2024

Ah, good point, I appreciate the forethought. I'll rework it to not coerce nils and leave them out 👍

tjhop added 3 commits June 13, 2024 13:54
Part of prometheus#2957

Prometheus' procfs lib supports collecting memory info and we're using a
new enough version of the lib that has it available, so this converts
the meminfo collector for Linux to use data from procfs lib instead. The
bits I've touched for darwin/openbsd/netbsd are with intent to preserve
the original struct implementation/backwards compatibility.

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Fixes:

```
ts=2024-06-11T19:04:55.591Z caller=meminfo.go:44 level=debug collector=meminfo msg="Set node_mem" memInfo="unsupported value type"
```

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Nil entries in procfs.Meminfo fields indicate that the value isn't
present on the system. Coercing those nil values to `0` introduces new
metrics on systems that should not be present and can break some
queries.

Addresses PR feedback:
prometheus#3049 (comment)
prometheus#3049 (comment)

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
@tjhop tjhop force-pushed the feat/meminfo-collector-procfs-lib branch from 4ee485e to d088970 Compare June 14, 2024 04:33
@tjhop
Copy link
Contributor Author

tjhop commented Jun 14, 2024

@SuperQ is this more in line with what you were thinking?

@SuperQ
Copy link
Member

SuperQ commented Jun 16, 2024

Yes, this is looking much better. I'll try and do a deeper review soon.

@tjhop
Copy link
Contributor Author

tjhop commented Jul 1, 2024

Friendly bump. @SuperQ please lemme know if you have any further thoughts 👍

return nil, fmt.Errorf("invalid line in meminfo: %s", line)
}
memInfo[key] = fv
if meminfo.ActiveBytes != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd convert this to a loop, something like:

for k, v := range map[string]*int {..} {
  if v == nil {
    continue
  }
  metrics[k] = float64(*v)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we wanted to switch to an explicit mapping.

Copy link
Contributor Author

@tjhop tjhop Jul 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I think we need an explicit mapping here, especially since the field names in the Meminfo struct don't 100% align with the existing metric names, ex:

	if meminfo.ActiveAnonBytes != nil {
		metrics["Active_anon_bytes"] = float64(*meminfo.ActiveAnonBytes)

@SuperQ SuperQ merged commit fdaa8fc into prometheus:master Jul 14, 2024
7 checks passed
v-zhuravlev pushed a commit to grafana/node_exporter that referenced this pull request Nov 1, 2024
…eus#3049)

* ref!: convert linux meminfo implementation to use procfs lib

Part of prometheus#2957

Prometheus' procfs lib supports collecting memory info and we're using a
new enough version of the lib that has it available, so this converts
the meminfo collector for Linux to use data from procfs lib instead. The
bits I've touched for darwin/openbsd/netbsd are with intent to preserve
the original struct implementation/backwards compatibility.

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>

* fix: meminfo debug log unsupported value

Fixes:

```
ts=2024-06-11T19:04:55.591Z caller=meminfo.go:44 level=debug collector=meminfo msg="Set node_mem" memInfo="unsupported value type"
```

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>

* fix: don't coerce nil Meminfo entries to 0, leave out if nil

Nil entries in procfs.Meminfo fields indicate that the value isn't
present on the system. Coercing those nil values to `0` introduces new
metrics on systems that should not be present and can break some
queries.

Addresses PR feedback:
prometheus#3049 (comment)
prometheus#3049 (comment)

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>

---------

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
SuperQ added a commit that referenced this pull request Feb 16, 2025
* [CHANGE] meminfo: Convert linux implementation to use procfs lib #3049
* [CHANGE] Update logging to use Go log/slog #3097
* [FEATURE] filesystem: Add `node_filesystem_mount_info` metric #2970
* [FEATURE] btrfs: Add metrics for commit statistics #3010
* [FEATURE] interrupts: Add collector include/exclude filtering #3028
* [FEATURE] interrupts: Add "exclude zeros" filtering #3028
* [FEATURE] slabinfo: Add filters for slab name. #3041
* [FEATURE] pressure: add IRQ PSI metrics #3048
* [FEATURE] hwmon: Add include and exclude filter for sensors #3072
* [FEATURE] filesystem: Add NetBSD support #3082
* [FEATURE] netdev: Add ifAlias label #3087
* [FEATURE] hwmon: Add Support for GPU Clock Frequencies #3093
* [FEATURE] Add `exclude[]` URL parameter #3116
* [FEATURE] Add AIX support #3136
* [FEATURE] filesystem: Add fs-types/mount-points include flags #3171
* [FEATURE] netstat: Add collector for tcp packet counters for FreeBSD. #3177
* [ENHANCEMENT] ethtool: Add logging for filtering flags #2979
* [ENHANCEMENT] netstat: Add TCPRcvQDrop to default metrics #3021
* [ENHANCEMENT] diskstats: Add block device rotational #3022
* [ENHANCEMENT] cpu: Support CPU online status #3032
* [ENHANCEMENT] arp: optimize interface name resolution #3133
* [ENHANCEMENT] textfile: Allow specifiying multiple directory globs #3135
* [ENHANCEMENT] filesystem: Add reporting of purgeable space on MacOS #3206
* [ENHANCEMENT] ethtool: Skip full scan of NetClass directories #3239
* [BUGFIX] zfs: Prevent `procfs` integer underflow #2961
* [BUGFIX] pressure: Fix collection on systems that do not expose a full CPU stat #3054
* [BUGFIX] cpu: Fix FreeBSD 32-bit host support and plug memory leak #3083
* [BUGFIX] hwmon: Add safety check to hwmon read #3134
* [BUGFIX] zfs: Allow space in dataset name #3186

Signed-off-by: Ben Kochie <superq@gmail.com>
@SuperQ SuperQ mentioned this pull request Feb 16, 2025
SuperQ added a commit that referenced this pull request Feb 17, 2025
* [CHANGE] meminfo: Convert linux implementation to use procfs lib #3049
* [CHANGE] Update logging to use Go log/slog #3097
* [FEATURE] filesystem: Add `node_filesystem_mount_info` metric #2970
* [FEATURE] btrfs: Add metrics for commit statistics #3010
* [FEATURE] interrupts: Add collector include/exclude filtering #3028
* [FEATURE] interrupts: Add "exclude zeros" filtering #3028
* [FEATURE] slabinfo: Add filters for slab name. #3041
* [FEATURE] pressure: add IRQ PSI metrics #3048
* [FEATURE] hwmon: Add include and exclude filter for sensors #3072
* [FEATURE] filesystem: Add NetBSD support #3082
* [FEATURE] netdev: Add ifAlias label #3087
* [FEATURE] hwmon: Add Support for GPU Clock Frequencies #3093
* [FEATURE] Add `exclude[]` URL parameter #3116
* [FEATURE] Add AIX support #3136
* [FEATURE] filesystem: Add fs-types/mount-points include flags #3171
* [FEATURE] netstat: Add collector for tcp packet counters for FreeBSD. #3177
* [ENHANCEMENT] ethtool: Add logging for filtering flags #2979
* [ENHANCEMENT] netstat: Add TCPRcvQDrop to default metrics #3021
* [ENHANCEMENT] diskstats: Add block device rotational #3022
* [ENHANCEMENT] cpu: Support CPU online status #3032
* [ENHANCEMENT] arp: optimize interface name resolution #3133
* [ENHANCEMENT] textfile: Allow specifiying multiple directory globs #3135
* [ENHANCEMENT] filesystem: Add reporting of purgeable space on MacOS #3206
* [ENHANCEMENT] ethtool: Skip full scan of NetClass directories #3239
* [BUGFIX] zfs: Prevent `procfs` integer underflow #2961
* [BUGFIX] pressure: Fix collection on systems that do not expose a full CPU stat #3054
* [BUGFIX] cpu: Fix FreeBSD 32-bit host support and plug memory leak #3083
* [BUGFIX] hwmon: Add safety check to hwmon read #3134
* [BUGFIX] zfs: Allow space in dataset name #3186

Signed-off-by: Ben Kochie <superq@gmail.com>
@juliusv
Copy link
Member

juliusv commented Feb 24, 2025

I just updated the Node Exporter on https://demo.promlabs.com/ from 1.7.0 to 1.9.0 and noticed that a few meminfo time series disappeared after the update. These were there previously but no longer present after the update:

node_memory_FileHugePages_bytes{instance="node-exporter:9100", job="node"}	0
node_memory_FilePmdMapped_bytes{instance="node-exporter:9100", job="node"}	0
node_memory_Hugetlb_bytes{instance="node-exporter:9100", job="node"}	0
node_memory_KReclaimable_bytes{instance="node-exporter:9100", job="node"}	84930560
node_memory_SecPageTables_bytes{instance="node-exporter:9100", job="node"}	0
node_memory_ShadowCallStack_bytes{instance="node-exporter:9100", job="node"}	1843200
node_memory_Zswap_bytes{instance="node-exporter:9100", job="node"}	0
node_memory_Zswapped_bytes{instance="node-exporter:9100", job="node"}	0

See also https://demo.promlabs.com/query?g0.expr=%7B__name__%3D%7E%22node_memory_.*%22%7D+offset+1h+unless+on%28__name__%29+%7B__name__%3D%7E%22node_memory_.*%22%7D&g0.show_tree=0&g0.tab=table&g0.end_input=2025-02-24+09%3A18%3A00&g0.moment_input=2025-02-24+09%3A18%3A00&g0.range_input=1h&g0.res_type=auto&g0.res_density=medium&g0.display_mode=lines&g0.show_exemplars=0

Is this related to the new explicit mapping not catching all possible fields?

@SuperQ
Copy link
Member

SuperQ commented Feb 24, 2025

Yes, that is probably related. I guess we need to update the procfs parsing library.

@juliusv
Copy link
Member

juliusv commented Feb 24, 2025

Yes, that is probably related. I guess we need to update the procfs parsing library.

Previously we just parsed whatever fields the /proc/meminfo file contained and translated those directly into metric names. With the procfs library, we now explicitly map each known field into a Go struct field and then read it out again. That seems brittle, as you'll never know what kind of memory usage field the kernel will add next, or which fields are only present with certain kernel / host combinations. Would it be better to go back to just passing through all fields that are actually there, as before?

@SuperQ
Copy link
Member

SuperQ commented Feb 24, 2025

I think some of the intention here was to better curate the output of these metrics for HELP text and cardinality control.

@tjhop
Copy link
Contributor Author

tjhop commented Feb 24, 2025

Ironically, moving away from the raw mapping -> struct was an explicit design decision in the beginning to better align with other collectors and to take advantage of the fact that struct breakage surfaces a compile time error:

#2957 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants