Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Enable Spark SQL tests for Spark 3.5.1 #603

Merged
merged 15 commits into from
Jul 2, 2024

Conversation

andygrove
Copy link
Member

@andygrove andygrove commented Jun 27, 2024

Which issue does this PR close?

Closes #589

Rationale for this change

What changes are included in this PR?

  • Add diff file for 3.5.1
  • Enable Spark SQL tests for 3.5.1 in CI
  • Add documentation on how I did this

How are these changes tested?

Comment on lines 23 to 24
results as that version of Spark. To do this, we need to make some changes to the Apache Spark source code so that
it enables Comet.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to make some changes
Technically we do not need to make changes in Spark to enable Comet, but for testing purpose, we implemented some convenient features.

## Generating The Diff File

```shell
git diff v3.5.1 58c7ce4407d2c4685a1feaf3e60cefac32de0d39 > ../datafusion-comet/dev/diffs/3.5.1.diff
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be simply git diff v3.5.1 > ..

@andygrove andygrove marked this pull request as ready for review June 28, 2024 17:49
@codecov-commenter
Copy link

codecov-commenter commented Jun 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 34.06%. Comparing base (2aa20f0) to head (345592a).
Report is 4 commits behind head on main.

Additional details and impacted files
@@              Coverage Diff              @@
##               main     #603       +/-   ##
=============================================
- Coverage     54.66%   34.06%   -20.61%     
- Complexity      792      796        +4     
=============================================
  Files           103      108        +5     
  Lines          9713    38731    +29018     
  Branches       1851     8568     +6717     
=============================================
+ Hits           5310    13195     +7885     
- Misses         3451    22793    +19342     
- Partials        952     2743     +1791     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@andygrove
Copy link
Member Author

@kazuyukitanimura Spark SQL tests are now passing in CI, but I had to ignore quite a few. I will file a follow on issue for resolving those.

Copy link
Contributor

@kazuyukitanimura kazuyukitanimura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.
What would be the best way to find ignored tests that are specific to 3.5?
For example, https://github.com/apache/datafusion-comet/blob/main/dev/diffs/4.0.0-preview1.diff#L445
Spark 4.0.0-preview1.diff has #551 mentioned for all additionally ignored tests compared 3.4.3.diff

@andygrove
Copy link
Member Author

Looks good. What would be the best way to find ignored tests that are specific to 3.5? For example, https://github.com/apache/datafusion-comet/blob/main/dev/diffs/4.0.0-preview1.diff#L445 Spark 4.0.0-preview1.diff has #551 mentioned for all additionally ignored tests compared 3.4.3.diff

Good point. I will go ahead and update the diff with links for all newly ignored tests compared to 3.4.3.

Copy link
Contributor

@kazuyukitanimura kazuyukitanimura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@andygrove
Copy link
Member Author

Thanks for the review @kazuyukitanimura

@andygrove andygrove merged commit 917fd43 into apache:main Jul 2, 2024
73 checks passed
@andygrove andygrove deleted the spark-sql-tests-351 branch July 2, 2024 02:18
himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
* Enable Spark SQL tests for Spark 3.5.1

* fix conflicts

* finish creating diff

* improve docs, update slf4j version

* improve docs

* fix diff

* fix diff

* address feedback

* add IgnoreComet to patch

* update test

* remove unused imports

* ignore 2 failing tests

* scalastyle

* ignore failing parquet metadata tests

* add links to tracking issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Spark SQL tests for Spark 3.5.1
3 participants