Skip to content

Port test in select.rs to sqllogic #6158

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Apr 30, 2023
Merged

Port test in select.rs to sqllogic #6158

merged 12 commits into from
Apr 30, 2023

Conversation

aprimadi
Copy link
Contributor

Which issue does this PR close?

More work toward #4495

Rationale for this change

Cause I like easy task

What changes are included in this PR?

Just porting some tests to sqllogic

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Apr 29, 2023
@aprimadi
Copy link
Contributor Author

Tests that cannot be ported (yet):

  • query_get_indexed_field - unsupported SQL types ARRAY
  • query_nested_get_indexed_field - unsupported SQL types ARRAY
  • query_nested_get_indexed_field_on_struct - unsupported SQL types ARRAY
  • query_on_string_dictionary - using DictionaryArray
  • sort_on_window_null_string - using DictionaryArray
  • test_prepare_statement - using partitioned csv
  • parallel_query_with_filter - using partitioned csv
  • boolean_literal - using partitioned csv
  • unprojected_filter - this test is using dataframe interface

@aprimadi aprimadi marked this pull request as ready for review April 30, 2023 06:49
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @aprimadi -- this looks great. I reviewed all the changes carefully

Cause I like easy task

😆 -- thank you -- this is really helpful overall and I think makes it easier, over the long term, to keep up with DataFusion and improve its test coverage

You are inspiring me to try and paralleize this work more -- I'll try and write up some tickets that can be split up for others who want to get involved with DataFusion with a useful but low overhead tasks (and it will be nice to keep track of how much we have to go)

false
false

# explain select between
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for testing the whole plan

@@ -168,7 +168,7 @@ async fn context_for_test_file(relative_path: &Path) -> SessionContext {
let ctx = SessionContext::with_config(config);

match relative_path.file_name().unwrap().to_str().unwrap() {
"aggregate.slt" | "select.slt" => {
"aggregate.slt" => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@alamb alamb merged commit 2f39479 into apache:main Apr 30, 2023
@aprimadi aprimadi deleted the ap-sqllogic branch April 30, 2023 16:45
@aprimadi
Copy link
Contributor Author

No worries @alamb. This keeps me engaged while I'm learning through the more difficult bits of the project.

@alamb
Copy link
Contributor

alamb commented Apr 30, 2023

No worries @alamb. This keeps me engaged while I'm learning through the more difficult bits of the project.

It is much appreciated. While you have a newcomer's eyes, I would be interested in the parts you found the most difficult to pick up (really I am interested in how to make it less difficult).

@aprimadi
Copy link
Contributor Author

aprimadi commented May 1, 2023

The external sort come to mind and also the k-way sort preserving merge.

K-way sort preserving merge because there is a clever bit of the code that uses an array representation of a binary tree (to implement losers tree).

Like for example this code:
https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/src/physical_plan/sorts/merge.rs#L252
https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/src/physical_plan/sorts/merge.rs#L271
These really find the leaf node of the loser tree that this cursor index belongs to but it's not immediately apparent without realizing that the binary tree is represented this way:

   0
   1 
 2   3
4 5 6 7

i.e. the root of the tree is an element at index 1 in the vector and the first child of the root is an element at index 2 in the vector and so on...

I guess I'm just hesitant to work on larger tasks due to:

  1. Whether someone else is currently working on it
  2. Whether it's high priority. Don't want to block the issue for a prolonged period.
  3. If the issue touch areas that I'm currently not very familiar with. Again for the same reason as 2, don't want to block the issue for a prolonged period.

Currently I prefer to work on issue that I can finish on one weekend and yeah currently #4495 is my default go to issue if I don't find issues that are interesting or that I'm sure I can finish on one weekend 😆.

@alamb
Copy link
Contributor

alamb commented May 1, 2023

K-way sort preserving merge because there is a clever bit of the code that uses an array representation of a binary tree (to implement losers tree).

Yes, I agree this is quite a nice piece of code that is non trivial. I think the parts of DataFusion that are the most performance critical are also the parts that are the hardest to understand. It would be great to improve the understandability.

@aprimadi I wonder if you might be willing to take a shot at adding some comments (perhaps copy/paste/modify your comments in #6158 (comment)) to help other future readers

Also, I will try and be better about triaging tickets as they come in and marking the relevant ones as "good first issues" -- https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22

It is a little lacking now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants