Skip to content

SQL: GROUPing BY YEAR with interval formula generates the incorrect query #46428

Closed
@astefan

Description

@astefan

For this SQL query SELECT YEAR(birth_date + INTERVAL 1 YEAR) AS x FROM test_emp GROUP BY x, the following ES query is generated:

{
    "size": 0,
    "_source": false,
    "stored_fields": "_none_",
    "aggregations": {
        "groupby": {
            "composite": {
                "size": 1000,
                "sources": [
                    {
                        "1185": {
                            "date_histogram": {
                                "field": "birth_date + INTERVAL 1 YEAR",
                                "missing_bucket": true,
                                "value_type": "date",
                                "order": "asc",
                                "fixed_interval": "31536000000ms",
                                "time_zone": "Z"
                            }
                        }
                    }
                ]
            }
        }
    }
}

Which is obviously wrong. Also, the results of the query will be wrong, as well. A possible query should be:

{
    "size": 0,
    "_source": false,
    "stored_fields": "_none_",
    "aggregations": {
        "groupby": {
            "composite": {
                "size": 1000,
                "sources": [
                    {
                        "1037": {
                            "date_histogram": {
                                "script": {
                                    "source": "InternalSqlScriptUtils.add(InternalSqlScriptUtils.docValue(doc,params.v0),InternalSqlScriptUtils.intervalYearMonth(params.v1,params.v2))",
                                    "lang": "painless",
                                    "params": {
                                        "v0": "birth_date",
                                        "v1": "P1Y",
                                        "v2": "INTERVAL_YEAR"
                                    }
                                },
                                "missing_bucket": true,
                                "value_type": "date",
                                "order": "asc",
                                "fixed_interval": "31536000000ms",
                                "time_zone": "Z",
                                "format": "YYYY"
                            }
                        }
                    }
                ]
            }
        }
    }
}

This fix should probably be combined with a fix for #40162 where we should probably use a calendar_interval of 1 year only for grouping by YEAR function scenarios.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions