-
Notifications
You must be signed in to change notification settings - Fork 181
Description
Some behaviours between PPL and Databases are different.
In stats command of PPL-on-OpenSearch, the order of result is quite different with other databases.
As an example, check command stats count() by colA, colB:
- the sequence of output schema is different.
In PPL v2, the sequence of output schema is count, colA, colB. But in most databases, the sequence of output schema is colA, colB, count.
- the output order id different.
In PPL v2, the order of output results is ordered by colA then colB. But in most databases, the order of output is random. User must add add ORDER BY clause after GROUP BY clause to keep the results aligning.
Query without order by in database/Calcite:
select deptno,job, MAX(SAL) max_sal from emp group by deptno,job;
+--------+-----------+---------+
| DEPTNO | JOB | MAX_SAL |
+--------+-----------+---------+
| 20 | CLERK | 1100.00 |
| 30 | SALESMAN | 1600.00 |
| 20 | MANAGER | 2975.00 |
| 30 | MANAGER | 2850.00 |
| 10 | MANAGER | 2450.00 |
| 20 | ANALYST | 3000.00 |
| 10 | PRESIDENT | 5000.00 |
| 30 | CLERK | 950.00 |
| 10 | CLERK | 1300.00 |
+--------+-----------+---------+
Query without sort in PPL v2, the stats-by command will sort by-expression by default.
source = opensearch_dashboards_sample_data_flights | stats count() by DestCountry,OriginWeather
"datarows": [
[
7,
"AE",
"Clear"
],
[
3,
"AE",
"Cloudy"
],
[
5,
"AE",
"Damaging Wind"
],
...
[
208,
"CN",
"Clear"
],
[
187,
"CN",
"Cloudy"
],
[
79,
"CN",
"Damaging Wind"
],
...
- Span column in by-expression always the first column in result.
stats count() by gender, state, span(age,10) outputs
1, 20, "F", "VA"
1, 30, "F", "IN"
1, 30, "F", "PA"
1, 30, "M", "IL"
This issue is to keep alignment with V2 behaviour in new Calcite implementation.
Sub-issues
Metadata
Metadata
Assignees
Labels
Type
Projects
Status