
Data Frame Analytics memory estimation ignores Query #49454

Closed
@benwtrent

Description


The memory estimation process runs the data extractor against the source index before the data frame analytics process starts. In doing so, however, it ignores the source query defined in the data frame analytics config.

The extractor simply issues a match_all query: https://github.com/elastic/elasticsearch/blob/1dd816f030e256f497c1726695d1209a481b9d8e/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/dataframe/extractor/DataFrameDataExtractorFactory.java#L42..L58

The estimation process then relies on that extractor to determine the row and column counts: https://github.com/elastic/elasticsearch/blob/12528b351a99941ffc1d44dd2c0265768ef79635/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/dataframe/process/MemoryUsageEstimationProcessManager.java#L59..L60
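To make the effect concrete, here is a minimal, self-contained Java sketch that does not use any Elasticsearch classes. It assumes documents can be modeled as maps and the data frame analytics source query as a `Predicate`; the class and method names (`MatchAllRowCount`, `countRowsMatchAll`, `countRowsWithQuery`) are hypothetical and only illustrate the discrepancy between what the estimator counts and what the job processes.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration only: not the Elasticsearch extractor code.
public class MatchAllRowCount {

    // What the estimator effectively sees today: every document, as with match_all.
    public static long countRowsMatchAll(List<Map<String, Object>> docs) {
        return docs.size();
    }

    // What the analytics job actually processes: only documents matching the source query.
    public static long countRowsWithQuery(List<Map<String, Object>> docs,
                                          Predicate<Map<String, Object>> sourceQuery) {
        return docs.stream().filter(sourceQuery).count();
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
                Map.of("category", "a", "value", 1.0),
                Map.of("category", "b", "value", 2.0),
                Map.of("category", "a", "value", 3.0),
                Map.of("category", "c", "value", 4.0));

        // Stand-in (assumption) for a source query such as a term query on category "a".
        Predicate<Map<String, Object>> sourceQuery = d -> "a".equals(d.get("category"));

        System.out.println("match_all rows:    " + countRowsMatchAll(docs));
        System.out.println("source query rows: " + countRowsWithQuery(docs, sourceQuery));
    }
}
```

With these four sample documents the match_all count is 4, while the job would only see 2 rows, so any estimate derived from the first number overstates the input.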

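The memory estimator needs to apply the configured source query when estimating against the source index. Otherwise the row count, and therefore the memory estimate, can be higher than what the analytics job will actually process.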
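One way to express the fix, sketched here under stated assumptions: route both the estimator and the real run through a single summary function that always applies the source query, so the two paths cannot diverge. Everything below (`QueryAwareEstimation`, `SourceConfig`, `DataSummary`, `collectSummary`) is hypothetical; the query is again modeled as a `Predicate`, and the column count is assumed to be the number of analyzed fields. This is not the actual Elasticsearch change.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Predicate;

// Hypothetical sketch: these types are illustrative, not the Elasticsearch classes.
public class QueryAwareEstimation {

    // Stand-in for the config's source section: index contents plus the source query.
    public record SourceConfig(List<Map<String, Object>> docs,
                               Predicate<Map<String, Object>> query) {}

    // Row and column counts that would feed the memory estimate.
    public record DataSummary(long rows, int cols) {}

    // Shared by the estimator AND the real run, so the source query is never dropped.
    public static DataSummary collectSummary(SourceConfig source, List<String> analyzedFields) {
        // Require an explicit query; callers wanting match_all pass d -> true.
        Objects.requireNonNull(source.query(), "source query must be set");
        long rows = source.docs().stream().filter(source.query()).count();
        return new DataSummary(rows, analyzedFields.size());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
                Map.of("category", "a", "value", 1.0),
                Map.of("category", "b", "value", 2.0),
                Map.of("category", "a", "value", 3.0));

        SourceConfig filtered = new SourceConfig(docs, d -> "a".equals(d.get("category")));
        SourceConfig matchAll = new SourceConfig(docs, d -> true);

        DataSummary f = collectSummary(filtered, List.of("value"));
        DataSummary m = collectSummary(matchAll, List.of("value"));
        System.out.println("filtered: rows=" + f.rows() + ", cols=" + f.cols());
        System.out.println("match_all: rows=" + m.rows() + ", cols=" + m.cols());
    }
}
```

Making the query mandatory, rather than defaulting silently to match_all, is a design choice meant to surface exactly the kind of omission this issue describes.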

Labels: :ml (Machine learning), >bug
