Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kusto phase1 #37961

Merged
merged 38 commits into from
Sep 27, 2022
Merged

Kusto phase1 #37961

merged 38 commits into from
Sep 27, 2022

Conversation

kashwy
Copy link
Contributor

@kashwy kashwy commented Jun 10, 2022

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

This is the initial implement of Kusto Query Language. (MVP)

Add Support to Kusto Query Language (KQL) as mentioned in this issue:
#36806

In this PR, we support the following features .

  • Tabular expression statements
  • Limit returned results
  • Select Column (basic project)
  • sort, order
  • Perform string equality operations
  • Filter using a list of elements
  • Filter using common string operations
  • Aggregate by columns
  • Basic aggregate functions

We parse KQL queries to ClickHouse ASTs, use a setting called sql_dialect to switch between ClickHouse and KQL:
By default sql_dialect = 'clickhouse'
set sql_dialect = 'kusto'
Switch to parse KQL
set sql_dialect = 'clickhouse'
switch back to parse ClickHouse SQL.

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

@robot-ch-test-poll robot-ch-test-poll added the pr-feature Pull request with new product feature label Jun 10, 2022
@qoega qoega added the can be tested Allows running workflows for external contributors label Jun 10, 2022
@yakov-olkhovskiy yakov-olkhovskiy self-assigned this Jun 11, 2022
kashwy and others added 10 commits June 13, 2022 06:28
This is the initial implement of Kusto Query Language.

in this commit, we support the following features as MVP :

Tabular expression statements
Limit returned results
Select Column (basic project)
sort, order
Perform string equality operations
Filter using a list of elements
Filter using common string operations
Some string  operators
Aggregate by columns
Base aggregate functions
only support avg, count ,min, max, sum
Aggregate by time intervals
Add new test cases
Copy link
Member

@yakov-olkhovskiy yakov-olkhovskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall approach with translation into SQL seems fine, though maybe not very efficient in the future for some KQL specific features.
Also I see some issues with expressions sequencing - for example expression t1 | take 3 | order by id will be evaluated to the same SQL as t1 | order by id | take 3 - apparently you are trying to combine everything into one SELECT which is not always correct...

@@ -38,6 +38,7 @@ static constexpr UInt64 operator""_GiB(unsigned long long value)
*/

#define COMMON_SETTINGS(M) \
M(String, sql_dialect, "clickhouse", "Which SQL dialect will be used to parse query", 0)\
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make it just "dialect"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feature looks quite marginal, so I would prefer to use longer setting name. We could use just dialect setting of type Enum (not String), but we don't have other dialects for now (and I'm not sure we will have more dialects in the near future). What do you think about something like use_kusto_query_language_dialect of type Bool?

Copy link
Member

@yakov-olkhovskiy yakov-olkhovskiy Jun 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tavplubix Bool will exclude any other dialects in the future if we decide to add any - it should be mutually exclusive...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is another PR use the dialect.:
https://github.com/ClickHouse/ClickHouse/pull/36308/files
so there will be more dialects

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so it makes sense to name it just dialect, but it must be Enum, not String

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so it makes sense to name it just dialect, but it must be Enum, not String
sure

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we changed the dialect to Enum.

@kashwy
Copy link
Contributor Author

kashwy commented Jun 30, 2022

Overall approach with translation into SQL seems fine, though maybe not very efficient in the future for some KQL specific features. Also I see some issues with expressions sequencing - for example expression t1 | take 3 | order by id will be evaluated to the same SQL as t1 | order by id | take 3 - apparently you are trying to combine everything into one SELECT which is not always correct...

yes it's not efficient for some KQL specific features, according the concept of KQL, output of a pipeline is the input of next pipeline, so each pipeline will generate a query, I am not sure if this is efficient in ClickHouse when KQL has a lot of pipe

@yakov-olkhovskiy
Copy link
Member

yakov-olkhovskiy commented Jun 30, 2022

yes it's not efficient for some KQL specific features, according the concept of KQL, output of a pipeline is the input of next pipeline, so each pipeline will generate a query, I am not sure if this is efficient in ClickHouse when KQL has a lot of pipe

ClickHouse will try to optimize of course, but it's not always possible I guess. But for simple cases like this it can be done in KQL parser, so t1 | take 3 | order by id suppose to translate into select with sub-select

@kashwy
Copy link
Contributor Author

kashwy commented Jun 30, 2022

yes it's not efficient for some KQL specific features, according the concept of KQL, output of a pipeline is the input of next pipeline, so each pipeline will generate a query, I am not sure if this is efficient in ClickHouse when KQL has a lot of pipe

ClickHouse will try to optimize of course, but it's not always possible I guess. But for simple cases like this it can be done in KQL parser, so t1 | take 3 | order by id suppose to translate into select with sub-select

yes, we can do for this case

kashwy and others added 10 commits August 16, 2022 17:31
This is the initial implement of Kusto Query Language.

in this commit, we support the following features as MVP :

Tabular expression statements
Limit returned results
Select Column (basic project)
sort, order
Perform string equality operations
Filter using a list of elements
Filter using common string operations
Some string  operators
Aggregate by columns
Base aggregate functions
only support avg, count ,min, max, sum
Aggregate by time intervals
Add new test cases
@kashwy
Copy link
Contributor Author

kashwy commented Aug 24, 2022

@kashwy will review and test it soon

thanks

@yakov-olkhovskiy
Copy link
Member

I don't think it's correct:

ip-172-31-42-195.us-east-2.compute.internal :) tbl

SELECT *
FROM tbl

┌─a─┬─b─┐
│ 1 │ 1 │
│ 1 │ 2 │
│ 2 │ 1 │
│ 2 │ 2 │
└───┴───┘
4 rows in set. Elapsed: 0.009 sec.

ip-172-31-42-195.us-east-2.compute.internal :) tbl | order by b | limit 2

SELECT *
FROM
(
    SELECT *
    FROM tbl
    LIMIT 2
)
ORDER BY b DESC

┌─a─┬─b─┐
│ 1 │ 2 │
│ 1 │ 1 │
└───┴───┘
2 rows in set. Elapsed: 0.018 sec.

ip-172-31-42-195.us-east-2.compute.internal :) tbl | limit 2 | order by b

SELECT *
FROM
(
    SELECT *
    FROM tbl
    LIMIT 2
)
ORDER BY b DESC

┌─a─┬─b─┐
│ 1 │ 2 │
│ 1 │ 1 │
└───┴───┘
2 rows in set. Elapsed: 0.018 sec.

I see at least two discrepancies here - default order should be ASC I think, and both KQL expressions are evaluated to the same SQL, which is not correct - the first one should be SELECT * FROM tbl ORDER BY b ASC LIMIT 2

@kashwy
Copy link
Contributor Author

kashwy commented Aug 25, 2022

@yakov-olkhovskiy thanks for the test, default for order in KQL is desc. I will fix the subquery for the limit.

@kashwy
Copy link
Contributor Author

kashwy commented Sep 7, 2022

Hi, @yakov-olkhovskiy I have updated the PR, can you do a test and review again ?

Thanks

@kashwy
Copy link
Contributor Author

kashwy commented Sep 12, 2022

Hi @yakov-olkhovskiy , did you get chance to test and review ?

Thanks.

@yakov-olkhovskiy
Copy link
Member

I will check it soon

@kashwy
Copy link
Contributor Author

kashwy commented Sep 20, 2022

@yakov-olkhovskiy I updated the PR according to your suggestions.

Thanks

@yakov-olkhovskiy
Copy link
Member

@kashwy will review it soon

Copy link
Member

@yakov-olkhovskiy yakov-olkhovskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me

@kashwy
Copy link
Contributor Author

kashwy commented Sep 27, 2022

looks good to me

@yakov-olkhovskiy thank you very much.

is it able to merge ?

@yakov-olkhovskiy yakov-olkhovskiy merged commit 7c94f98 into ClickHouse:master Sep 27, 2022
@yakov-olkhovskiy
Copy link
Member

@kashwy merged
for future - these tests don't look very reliable - they will fail if anything will be changed in the query printout - maybe try to find a better way to test

@kashwy
Copy link
Contributor Author

kashwy commented Sep 27, 2022

thanks, we have added functional test in our phase2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
can be tested Allows running workflows for external contributors pr-feature Pull request with new product feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants