Description
Elasticsearch Version
8.8.0
Installed Plugins
No response
Java Version
bundled
OS Version
All
Problem Description
Aggregation on the _size field done by a Rollup job fails with the following error:
[es/i-1/es.log] [2023-06-20T14:50:00.062Z][WARN][org.elasticsearch.xpack.core.indexing.AsyncTwoPhaseIndexer] [instance-0000000001] Error while attempting to bulk index documents: failure in bulk execution:
[0]: index [rolluptest_rollup], id [rolluptest$XKr_GEPutrdX778J5QWByg], message [org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [_size] of type [integer] in document with id 'rolluptest$XKr_GEPutrdX778J5QWByg'. Preview of field's value: '{sum={value=109.0}}']
[1]: index [rolluptest_rollup], id [rolluptest$AmH9kDfoBhyB01E2E7JGSw], message [org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [_size] of type [integer] in document with id 'rolluptest$AmH9kDfoBhyB01E2E7JGSw'. Preview of field's value: '{sum={value=115.0}}']
Indexing fails because, after a bucket is aggregated, the Rollup job tries to write a field named _size whose value is the aggregation over time (sum) of the _size values of all documents in a specific time bucket. Unfortunately, this is not possible, because _size is a special meta field used to indicate the size of the _source of a document. It comes from the MapperSize plugin.
A workaround exists which takes advantage of runtime fields: we can read the _size field through a runtime field script and then aggregate on the runtime field, so that the Rollup job writes a field whose name is not _size but some other name we are allowed to use.
Example
PUT rolluptest
{
"mappings": {
"runtime": {
"size": {
"type": "long",
"script": {
"source": "emit(doc['_size'].value)"
}
}
},
"_size": {
"enabled": true
}
}
}
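Before creating the rollup job, the runtime field can be sanity-checked with a plain sum aggregation against the source index (a quick check, assuming the mapper-size plugin is installed and some documents have been indexed; the aggregation name total_size is arbitrary):

```
GET rolluptest/_search
{
  "size": 0,
  "aggs": {
    "total_size": {
      "sum": { "field": "size" }
    }
  }
}
```

If the runtime script works, total_size reports the summed _source sizes of the matching documents.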
PUT _rollup/job/rolluptest
{
"index_pattern": "rolluptest",
"rollup_index": "rolluptest_rollup",
"cron": "*/30 * * * * ?",
"page_size": 1000,
"groups": {
"date_histogram": {
"field": "@timestamp",
"fixed_interval": "1h",
"delay": "7d"
},
"terms": {
"fields": [ "name.keyword" ]
}
},
"metrics": [
{
"field": "size",
"metrics": [ "sum" ]
}
]
}
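Once the job has run, the pre-aggregated buckets can be queried through the rollup search API. A minimal sketch, assuming the job above has already processed the documents (the aggregation name total_size is arbitrary):

```
GET rolluptest_rollup/_rollup_search
{
  "size": 0,
  "aggs": {
    "total_size": {
      "sum": { "field": "size" }
    }
  }
}
```

Because the job records the sum metric for size, the rollup search can answer this aggregation from the rolled-up data.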
Steps to Reproduce
Step 1
PUT rolluptest/_doc/1
{
"text": "This is a document",
"@timestamp": "2023-05-12T00:00:00Z",
"number": 10,
"name": "app1"
}
PUT rolluptest/_doc/2
{
"text": "This is another document",
"@timestamp": "2023-05-13T00:00:00Z",
"number": 20,
"name": "app2"
}
Step 2
PUT _rollup/job/rolluptest
{
"index_pattern": "rolluptest",
"rollup_index": "rolluptest_rollup",
"cron": "*/30 * * * * ?",
"page_size": 1000,
"groups": {
"date_histogram": {
"field": "@timestamp",
"fixed_interval": "1h",
"delay": "7d"
},
"terms": {
"fields": [ "name.keyword" ]
}
},
"metrics": [
{
"field": "_size",
"metrics": [ "sum" ]
}
]
}
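Step 3

Creating the job does not run it; it has to be started explicitly, after which the cron schedule triggers the indexing failure shown in the logs:

```
POST _rollup/job/rolluptest/_start
```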
Logs (if relevant)
[es/i-1/es.log] [2023-06-20T14:50:00.062Z][WARN][org.elasticsearch.xpack.core.indexing.AsyncTwoPhaseIndexer] [instance-0000000001] Error while attempting to bulk index documents: failure in bulk execution:
[0]: index [rolluptest_rollup], id [rolluptest$XKr_GEPutrdX778J5QWByg], message [org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [_size] of type [integer] in document with id 'rolluptest$XKr_GEPutrdX778J5QWByg'. Preview of field's value: '{sum={value=109.0}}']
[1]: index [rolluptest_rollup], id [rolluptest$AmH9kDfoBhyB01E2E7JGSw], message [org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [_size] of type [integer] in document with id 'rolluptest$AmH9kDfoBhyB01E2E7JGSw'. Preview of field's value: '{sum={value=115.0}}']