Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Stack Monitoring] Switch cgroup memory fields to keyword #88260

Merged

Conversation

matschaffer
Copy link
Contributor

Rel elastic/beats#31765

Per https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html they should be strings. The internal collection template (monitoring-es.json) has them as keywords without any ignore_above.

This template uses ignore_above on all keywords, so keeping that convention for these mapping updates.

Opening as draft since we won't want to merge this until there's a corresponding beats change in place to match.

Testing

So far I can't figure out how to get ES to respond with max in the cgroup memory setting:

$ curl -s -k -u elastic:'(password)' https://localhost:9200/_nodes/stats/os | jq '.nodes | values | .[].os.cgroup.memory'
{
  "control_group": "/",
  "limit_in_bytes": "9223372036854771712",
  "usage_in_bytes": "17973219328"
}

I'll need that before I can advise on how to test this.

@matschaffer matschaffer requested a review from jbaiera July 5, 2022 07:52
@matschaffer matschaffer self-assigned this Jul 5, 2022
@matschaffer matschaffer requested a review from a team July 5, 2022 07:53
@elasticsearchmachine elasticsearchmachine added the external-contributor Pull request authored by a developer outside the Elasticsearch team label Jul 5, 2022
@matschaffer
Copy link
Contributor Author

elastic/beats#32197 is open, considering this ready for review.

@matschaffer matschaffer marked this pull request as ready for review July 5, 2022 08:04
@@ -732,14 +732,16 @@
"usage": {
"properties": {
"bytes": {
"type": "long"
"ignore_above": 1024,
"type": "keyword"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means we won't be able to run metric aggregations on this field anymore, is that okay?

Copy link
Contributor Author

@matschaffer matschaffer Jul 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I should at least check that we're not trying to graph it in stack monitoring. ES returns a string (potentially max) so if we needed to keep it graphable I guess we could have metricbeat just drop the field if it's not a number.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see it in https://github.com/elastic/kibana/blob/b8c1323d5bdc921a6238003bc698bc03750c5320/x-pack/plugins/monitoring/server/lib/metrics/elasticsearch/metrics.ts

Additionally it's already keyword in internal collection so I think we're safe here. Would it be nice to aggregate cgroup memory, perhaps. But I'd probably opt to tackle that as a new feature.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine
Copy link
Collaborator

Hi @matschaffer, I've created a changelog YAML for you.

@matschaffer matschaffer requested a review from a team July 7, 2022 02:24
@matschaffer
Copy link
Contributor Author

Guessing I can probably merge this but maybe someone from @elastic/es-data-management should weigh in too. Also maybe they have ideas on how to get ES producing the max value seen in forum posts.

@pugnascotia pugnascotia added v8.3.3 and removed v8.3.2 labels Jul 7, 2022
@jbaiera
Copy link
Member

jbaiera commented Jul 8, 2022

The change looks good to me.

To weigh in on the cgroups question about why "max" appears as a value: The code that collects the cgroup memory values is just reading a single line from either the /sys/fs/cgroup/<GROUP>/memory.limit_in_bytes file (cgroups v1) or the /sys/fs/cgroup/<GROUP>/memory.max file (cgroups v2).

In cgroups v1, a value of -1 indicates that there is no limit on a cgroup's memory, but in v2 a value of max indicates that there is no limit. The value max is default in cgroups v2.

Incidentally, the api that returns these values originally returned a string not because max is a valid value (though it is now) but because some linux distros would return the maximum value of an unsigned 64 bit long if there was no limit. That would overflow a signed long in Java and thus a string was returned instead.

@matschaffer
Copy link
Contributor Author

Ah, so 9223372036854771712 is max when running on docker for mac I'd guess. Interesting. Thanks @jbaiera !

@matschaffer
Copy link
Contributor Author

Finally got the metricbeat suite to run against a docker image with this PR (./gradlew buildDockerImage + a follow-up PR to metricbeat mage setup). I'd say we're good to go.

@matschaffer matschaffer merged commit 6e4cb3d into elastic:master Jul 11, 2022
@matschaffer matschaffer deleted the beats-31765-cgroup-memory-mapping branch July 11, 2022 07:44
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.3

weizijun added a commit to weizijun/elasticsearch that referenced this pull request Jul 12, 2022
* upstream/master:
  Pass IndexMetadata to AllocationDecider.can_remain (elastic#88453)
  [TSDB] Cache rollup bucket timestamp to reduce rounding cost (elastic#88420)
  Correct some typos/mistakes in comments/docs (elastic#88446)
  Make ClusterInfo use immutable maps in all cases (elastic#88447)
  Reduce map lookups (elastic#88418)
  Don't index geo_shape field in AbstractBuilderTestCase (elastic#88437)
  Remove usages of TestGeoShapeFieldMapperPlugin from enrich module (elastic#88440)
  Fix test memory leak (elastic#88362)
  Improve error when sorting on incompatible types (elastic#88399)
  Remove usages of BucketCollector#getLeafCollector(LeafReaderContext) (elastic#88414)
  Mute ReactiveStorageIT::testScaleWhileShrinking (elastic#88431)
  Clarify snapshot docs on archive indices (elastic#88417)
  [Stack Monitoring] Switch cgroup memory fields to keyword (elastic#88260)
  Fix RealmIdentifier XContent parser (elastic#88410)
  Make LoggedExec gradle task configuration cache compatible (elastic#87621)
  Update CorruptedFileIT so that it passes with new allocation strategy (elastic#88314)
  Update RareClusterStateIT to work with the new shards allocator (elastic#87922)
  Ensure CreateApiKey always creates a new document (elastic#88413)

# Conflicts:
#	x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/v2/RollupShardIndexer.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Data Management/Monitoring external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Data Management Meta label for data/management team Team:Infra Monitoring UI v8.3.3 v8.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants