Move data tier usage calculation to node level (#100230) #101128
Pinging @elastic/es-data-management (Team:Data Management)
Hi @gmarouli, I've created a changelog YAML for you.
Reopened: #47875
@elasticmachine run elasticsearch-ci/part-2
@elasticmachine update branch
@elasticmachine run elasticsearch-ci/part-2
@elasticmachine update branch
Hi @gmarouli, I've updated the changelog YAML for you.
I am switching to Buildkite so I can re-run the failing test more quickly. I haven't been able to find the connection between the failing test and this code, and I cannot reproduce it locally. But it appears to fail consistently only in this PR, so there must be something I am missing.
I cannot figure out why the test is failing. I opened another PR in which I am going to apply the changes step by step and see which one triggers the test failure.
Also, I was just able to reproduce the failure when the test runs as part of the whole suite, not as an individual test. I will follow up on this too.
Duplicate of #101599
Current situation
DataTiersUsageTransportAction executes an internal nodes stats action with all the trimmings. This puts a lot of memory pressure on the coordinating node (which in this case is always the elected master) and can cause further instabilities.
Proposed solution
We could trim down the data we request, since we only care about the docs and store stats per shard; that alone would reduce what needs to be kept in memory. However, with the optimisations of the many shards project, we would like to make this even more lightweight. We chose to do both: trim the data and push part of the calculation to the nodes themselves. Namely, each node sends to the elected master, grouped per preferred tier:
The elected master then collects the data and aggregates it into a single response.
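A minimal Java sketch of this two-step flow follows. It assumes hypothetical types (LocalShard, TierSummary) and methods (summarizeOnNode, aggregateOnMaster) that are not the PR's actual classes, and it only tracks counts and sums, since the exact list of per-tier fields is not reproduced here. Each node collapses its local shards into one summary per preferred tier, and the master merges those summaries by addition.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these types and methods are illustrative, not Elasticsearch classes.
public class TierUsageSketch {

    // Per-shard data a node already has locally (hypothetical fields).
    record LocalShard(String preferredTier, boolean primary, long docCount, long sizeInBytes) {}

    // Per-tier summary a node would send to the elected master (hypothetical fields).
    record TierSummary(long shardCount, long primaryShardCount, long docCount, long totalBytes) {
        // Summaries made of counts and sums combine by plain addition.
        TierSummary add(TierSummary other) {
            return new TierSummary(
                shardCount + other.shardCount,
                primaryShardCount + other.primaryShardCount,
                docCount + other.docCount,
                totalBytes + other.totalBytes);
        }
    }

    // Runs on each data node: collapse local shards into one summary per preferred tier.
    static Map<String, TierSummary> summarizeOnNode(List<LocalShard> shards) {
        Map<String, TierSummary> byTier = new HashMap<>();
        for (LocalShard s : shards) {
            TierSummary one = new TierSummary(1, s.primary() ? 1 : 0, s.docCount(), s.sizeInBytes());
            byTier.merge(s.preferredTier(), one, TierSummary::add);
        }
        return byTier;
    }

    // Runs on the elected master: merge the per-node summaries into a single response.
    static Map<String, TierSummary> aggregateOnMaster(List<Map<String, TierSummary>> nodeSummaries) {
        Map<String, TierSummary> total = new HashMap<>();
        for (Map<String, TierSummary> node : nodeSummaries) {
            node.forEach((tier, summary) -> total.merge(tier, summary, TierSummary::add));
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, TierSummary> nodeA = summarizeOnNode(List.of(
            new LocalShard("data_hot", true, 100, 1_000),
            new LocalShard("data_hot", false, 100, 1_000),
            new LocalShard("data_warm", true, 50, 400)));
        Map<String, TierSummary> nodeB = summarizeOnNode(List.of(
            new LocalShard("data_hot", true, 30, 300)));
        // prints TierSummary[shardCount=3, primaryShardCount=2, docCount=230, totalBytes=2300]
        System.out.println(aggregateOnMaster(List.of(nodeA, nodeB)).get("data_hot"));
    }
}
```

With this shape the master holds one small summary per node and tier instead of stats for every shard. Counts and sums merge by simple addition; a statistic such as a median cannot be merged that way, so it would require nodes to send more than a single number.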
Fixes: #100230