At present the ML UI can calculate a rough estimate of the model memory requirement for certain types of anomaly detection jobs. However, it covers neither all detector functions nor population jobs.
The ML API in Elasticsearch should provide an endpoint that encapsulates the various formulas, can be extended to cover all possible configurations, and can be kept up to date when model sizes change.
The inputs to this endpoint will be:
- An `analysis_config`, in the same format as would be provided to the create job endpoint - documented in https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-put-job.html#ml-put-job-path-parms
- Overall cardinalities for the `by`, `over` and `partition` fields
- Max bucket cardinalities for `influencer` fields that are not also `by`, `over` or `partition` fields (one way the caller might obtain these cardinalities is sketched below)
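
Since the endpoint only encapsulates the formulas, the cardinality inputs are supplied by the caller. As a sketch, they could be gathered beforehand with standard aggregations; the index name `network-traffic` and time field `@timestamp` here are illustrative:

```
GET network-traffic/_search
{
  "size": 0,
  "aggs": {
    "src_ip_overall": {
      "cardinality": { "field": "src_ip" }
    },
    "buckets": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "10m"
      },
      "aggs": {
        "dest_ip_per_bucket": {
          "cardinality": { "field": "dest_ip" }
        }
      }
    },
    "dest_ip_max_bucket": {
      "max_bucket": {
        "buckets_path": "buckets>dest_ip_per_bucket"
      }
    }
  }
}
```

Here `src_ip_overall` would supply the `overall_cardinality` entry and `dest_ip_max_bucket` the `max_bucket_cardinality` entry; the histogram interval matches the job's `bucket_span` so the per-bucket maximum is measured over the same window. The cardinality aggregation is approximate, which is acceptable given the estimate itself is rough.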
An example of the proposed request format is:
```
POST _ml/anomaly_detectors/_estimate_model_memory
{
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [
      {
        "function": "sum",
        "field_name": "bytes",
        "partition_field_name": "src_ip"
      }
    ],
    "influencers": [ "src_ip", "dest_ip" ]
  },
  "overall_cardinality": {
    "src_ip": 567483
  },
  "max_bucket_cardinality": {
    "dest_ip": 7456
  }
}
```
An example of the proposed response format is:
```
{
  "model_memory_estimate": "836mb"
}
```
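
The returned value is in the same byte-size format accepted by `model_memory_limit` under `analysis_limits`, so it could be fed straight into a subsequent create job call. A minimal sketch, assuming an illustrative job ID of `network-bytes` and a `@timestamp` time field:

```
PUT _ml/anomaly_detectors/network-bytes
{
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [
      {
        "function": "sum",
        "field_name": "bytes",
        "partition_field_name": "src_ip"
      }
    ],
    "influencers": [ "src_ip", "dest_ip" ]
  },
  "analysis_limits": {
    "model_memory_limit": "836mb"
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
```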