build: Add spark-4.0 profile and shims #407

Merged: @kazuyukitanimura merged 27 commits into apache:main on May 28, 2024

Conversation

@kazuyukitanimura (Contributor, author)

Which issue does this PR close?

Part of #372

Rationale for this change

To be ready for Spark 4.0

What changes are included in this PR?

This PR adds the spark-4.0 profile and shims.

How are these changes tested?

This is an initial commit. Tests with the spark-4.0 profile do not pass yet. Tests for spark-3.x should pass.
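
For background on the approach, here is a hedged sketch of the shim pattern (not the actual Comet code; all names below are invented for illustration). Each Maven profile compiles a version-specific source directory that defines the same shim types, so shared code depends only on a common interface and never sees Spark API differences directly.

// Hypothetical sketch of the shim pattern this PR extends. In the real
// build, the active Maven profile compiles exactly one variant; both are
// shown together here only for illustration.

// Common interface, identical in every version-specific source tree.
trait ShimExample {
  def describeSparkApi: String
}

// Variant compiled under a spark-3.x profile.
class Spark3ShimExample extends ShimExample {
  def describeSparkApi: String = "compiled against Spark 3.x APIs"
}

// Variant compiled under the spark-4.0 profile added in this PR.
class Spark4ShimExample extends ShimExample {
  def describeSparkApi: String = "compiled against Spark 4.0 APIs"
}

object ShimExampleDemo {
  def main(args: Array[String]): Unit = {
    // Shared code depends only on the trait; the build selects the impl.
    val shim: ShimExample = new Spark4ShimExample
    println(shim.describeSparkApi)
  }
}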

@kazuyukitanimura (Contributor, author)

@viirya @andygrove Please approve to start CI.

@viirya (Member) commented on May 9, 2024

Triggered.

@kazuyukitanimura (Contributor, author)

@viirya @andygrove Is there a way to start CI without bothering you?

@viirya (Member) commented on May 9, 2024

I remember that only first-time contributors need approval to trigger CI.

@kazuyukitanimura (Contributor, author) commented on May 11, 2024

@viirya @andygrove All tests passed on my personal GitHub Actions.
It still looks like CI is not triggering here. Maybe committership is needed to start CI when the GitHub Actions workflow itself changes?

@viirya (Member) commented on May 11, 2024

Triggered CI pipelines.

pom.xml (two review threads, resolved)
override def write(
    rdd: RDD[_],
    inputs: Iterator[_],

@viirya (Member)

If the input iterator has a partitionId, we still need to get rid of it. I haven't run this, but it looks like this will cause failures.

@kazuyukitanimura (Contributor, author)

We have ShimCometShuffleWriteProcessor.write() for Spark 3.x.

@viirya (Member)

Yes, I mean that for the Spark 4.0 case we also need to get rid of the partitionId. I saw you replied to my other comment: #407 (comment)

@kazuyukitanimura (Contributor, author)

I implemented the partitionId removal.
Thanks @viirya
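
For readers following along, a minimal sketch of the kind of fix discussed in this thread (illustrative only; writeRecord and the object name are hypothetical, not the actual Comet implementation): in the Spark 4.0 write path the input iterator can carry (partitionId, record) pairs, so the shim strips the partition id before handing each record to the writer.

// Illustrative only: drop the partitionId that the Spark 4.0 write path
// includes in the input iterator, as discussed above.
object PartitionIdStripSketch {
  // Hypothetical stand-in for the real shuffle writer call.
  private def writeRecord(record: Any): Unit =
    println(s"writing $record")

  def write(inputs: Iterator[_]): Unit =
    inputs.foreach {
      case (_: Int, record) => writeRecord(record) // (partitionId, record) pair
      case record           => writeRecord(record) // already a bare record
    }

  def main(args: Array[String]): Unit =
    write(Iterator((0, "row-a"), (1, "row-b")))
}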

@kazuyukitanimura (Contributor, author)

@viirya Please take another look.
cc @andygrove @comphead

matrix:
  os: [ubuntu-latest]
  java_version: [17]
  test-target: [rust, java]

@viirya (Member)

Is it necessary to run the Rust tests for Spark 4.0 separately? There should be no difference from 3.4 or 3.3/3.2. We also don't run the Rust tests for Spark 3.3/3.2, only for Spark 3.4.

@kazuyukitanimura (Contributor, author)

Removed.

Comment on lines 109 to 113

- if: matrix.test-target == 'java'
  name: Install Spark
  shell: bash
  working-directory: ./apache-spark
  run: build/mvn install -Phive -Phadoop-cloud -DskipTests

@viirya (Member)

Is this only needed for Spark 4.0? I don't see us installing it for other Spark versions.

@kazuyukitanimura (Contributor, author)

Yes, this is only needed for Spark 4.0 because there is no Spark 4.0.0-SNAPSHOT jar publicly available.

Comment on lines 27 to 29

/**
 * Returns a tuple of expressions for the `unhex` function.
 */

@viirya (Member)

Suggested change: the same comment text with corrected indentation (the before and after differ only in whitespace).

@kazuyukitanimura (Contributor, author)

Updated.

@@ -29,7 +29,7 @@ import org.apache.comet.CometConf

/**
 * This test suite contains tests for only Spark 3.4.

@viirya (Member)

nit:

Suggested change:
- * This test suite contains tests for only Spark 3.4.
+ * This test suite contains tests for only Spark 3.4+.

@kazuyukitanimura (Contributor, author)

Updated.

@viirya (Member) left a review

Looks good overall. I have some minor comments.

@kazuyukitanimura (Contributor, author) commented on May 23, 2024

Thank you @viirya. Please take a final look: 32bc314

@viirya (Member) commented on May 28, 2024

@kazuyukitanimura I think you can try to merge this.

@kazuyukitanimura merged commit 9b3e87b into apache:main on May 28, 2024. 43 checks passed.

@kazuyukitanimura (Contributor, author)

Thank you @viirya, merged.

himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request on Sep 7, 2024:

This PR adds the spark-4.0 profile and shims.
This is an initial commit. Tests with the spark-4.0 profile do not pass yet. Tests for spark-3.x should pass.

(cherry picked from commit 9b3e87b)