Skip to content

[Enhancement] Keep aggregation results in Calcite consistent with current PPL behavior #3403

@LantaoJin

Description

@LantaoJin

Some behaviours between PPL and Databases are different.

In stats command of PPL-on-OpenSearch, the order of result is quite different with other databases.

As an example, check command stats count() by colA, colB:

  1. the sequence of output schema is different.

In PPL v2, the sequence of output schema is count, colA, colB. But in most databases, the sequence of output schema is colA, colB, count.

  1. the output order id different.

In PPL v2, the order of output results is ordered by colA then colB. But in most databases, the order of output is random. User must add add ORDER BY clause after GROUP BY clause to keep the results aligning.

Query without order by in database/Calcite:

select deptno,job, MAX(SAL) max_sal from emp group by deptno,job;
+--------+-----------+---------+
| DEPTNO |    JOB    | MAX_SAL |
+--------+-----------+---------+
| 20     | CLERK     | 1100.00 |
| 30     | SALESMAN  | 1600.00 |
| 20     | MANAGER   | 2975.00 |
| 30     | MANAGER   | 2850.00 |
| 10     | MANAGER   | 2450.00 |
| 20     | ANALYST   | 3000.00 |
| 10     | PRESIDENT | 5000.00 |
| 30     | CLERK     | 950.00  |
| 10     | CLERK     | 1300.00 |
+--------+-----------+---------+

Query without sort in PPL v2, the stats-by command will sort by-expression by default.

source = opensearch_dashboards_sample_data_flights | stats count() by DestCountry,OriginWeather
"datarows": [
    [
      7,
      "AE",
      "Clear"
    ],
    [
      3,
      "AE",
      "Cloudy"
    ],
    [
      5,
      "AE",
      "Damaging Wind"
    ],
...
[
      208,
      "CN",
      "Clear"
    ],
    [
      187,
      "CN",
      "Cloudy"
    ],
    [
      79,
      "CN",
      "Damaging Wind"
    ],
...
  1. Span column in by-expression always the first column in result.

stats count() by gender, state, span(age,10) outputs

1, 20, "F", "VA"
1, 30, "F", "IN"
1, 30, "F", "PA"
1, 30, "M", "IL"

This issue is to keep alignment with V2 behaviour in new Calcite implementation.

Sub-issues

Metadata

Metadata

Assignees

Labels

bugSomething isn't workingcalcitecalcite migration releated

Type

No type

Projects

Status

Not Started

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions