Description
The memory estimation process runs the data extractor against the source index before the data frame analytics process starts. In doing so, however, it ignores the data frame analytics source query.
The extractor simply runs a match_all query: https://github.com/elastic/elasticsearch/blob/1dd816f030e256f497c1726695d1209a481b9d8e/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/dataframe/extractor/DataFrameDataExtractorFactory.java#L42..L58
The estimation process then uses that extractor to determine the row and column counts: https://github.com/elastic/elasticsearch/blob/12528b351a99941ffc1d44dd2c0265768ef79635/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/dataframe/process/MemoryUsageEstimationProcessManager.java#L59..L60
The memory estimator needs to apply the configured source query when estimating against the source index, so that the row count reflects only the documents the analytics job will actually process.
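To illustrate the effect, here is a minimal, dependency-free sketch of why ignoring the source query skews the row count. It is not Elasticsearch code: documents are plain maps, queries are `Predicate`s, and `RowCountSketch`, `countRows`, `sampleIndex`, `MATCH_ALL` and `CATEGORY_A` are hypothetical names chosen for this example only.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration only: documents are plain maps and queries are predicates.
// None of these names correspond to Elasticsearch classes or APIs.
public class RowCountSketch {

    // Stand-in for match_all: every document in the source index matches.
    static final Predicate<Map<String, Object>> MATCH_ALL = doc -> true;

    // Stand-in for a user-defined source query: keep documents whose "category" is "a".
    static final Predicate<Map<String, Object>> CATEGORY_A =
            doc -> "a".equals(doc.get("category"));

    // Small in-memory stand-in for the source index.
    static List<Map<String, Object>> sampleIndex() {
        return List.of(
                Map.of("category", "a", "price", 10.0),
                Map.of("category", "b", "price", 20.0),
                Map.of("category", "a", "price", 30.0));
    }

    // Counts the rows an estimator would see if the extractor ran with the given query.
    static long countRows(List<Map<String, Object>> index, Predicate<Map<String, Object>> query) {
        return index.stream().filter(query).count();
    }

    public static void main(String[] args) {
        List<Map<String, Object>> index = sampleIndex();
        System.out.println("rows with match_all:    " + countRows(index, MATCH_ALL));
        System.out.println("rows with source query: " + countRows(index, CATEGORY_A));
    }
}
```

Running `main` reports 3 rows for match_all and 2 rows for the source query. An estimate built from the match_all count would size memory for documents the job never reads.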