
[ML] Investigate alternative methods for sharing job memory usage information #34084

Closed
@davidkyle

Description


When there are multiple ML nodes in the cluster, the job allocation decision is based on the number of open jobs on each node and how much memory they use. Job memory usage is stored in the job configuration and is updated periodically while the job runs, whenever a model size stats doc is emitted by autodetect. This can lead to frequent job config updates (cluster state updates), particularly for historical look-back jobs.

  1. Consider moving the job's established memory usage out of the config, since it is a result of the job running, not part of its setup.
  2. Consider alternative methods to gather the open jobs' memory usage and make that information trivially available to the code making the allocation decision (a sketch of one option follows this list).
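As a sketch of option 2, and assuming nothing about the eventual design: each node could keep the latest model memory for its open jobs in a plain in-memory map, updated whenever autodetect emits model size stats, instead of writing the value back into the job config via a cluster state update. The allocation code would then read this map (or a node-level summary of it) directly. `JobMemoryTracker` and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: a node-local tracker of job memory usage that avoids
// persisting every model size stats update into the job configuration.
public class JobMemoryTracker {

    private final Map<String, Long> memoryByJobId = new ConcurrentHashMap<>();

    /** Called when a new model size stats document is seen for an open job. */
    public void updateMemory(String jobId, long modelBytes) {
        memoryByJobId.put(jobId, modelBytes);
    }

    /** Called when a job closes, so stale entries do not skew allocation. */
    public void removeJob(String jobId) {
        memoryByJobId.remove(jobId);
    }

    /** Latest known memory for a job, or a caller-supplied default if unknown. */
    public long memoryOrDefault(String jobId, long defaultBytes) {
        return memoryByJobId.getOrDefault(jobId, defaultBytes);
    }
}
```

Whatever the mechanism, the trade-off is the same: frequent updates stay local and cheap, at the cost of the allocating node needing a way to collect or query the per-node numbers when it makes the decision.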

This is pertinent to the job config migration project #32905, where the job's memory usage is not available in the cluster state during the allocation decision. A temporary workaround was implemented in #33994, basing the decision on the job count rather than memory usage.
