Adapt code and dependencies to Spark 3.0.1 #12


Open
asqasq wants to merge 1 commit into master

Conversation

@asqasq asqasq (Contributor) commented Dec 14, 2020

Adapt the plugin to Spark 3.0. The version for Spark 2.2.0 now lives on a new branch, spark_2_2_0, so that master can keep the version for the newest Spark release.

I have tested the plugin with Spark 3.0.1, Hadoop 2.7, Apache Crail 1.3 and Crail Spark Terasort with 1GB, 4GB, 16GB and 64GB data sets, and validated correct sorting with and without this plugin. I did not run into problems or incorrect sort results.

Please have a look at the code.




override def getReaderForRange[K, C](
asqasq (Contributor Author):
Unclear how to specify a range for CrailShuffleReader.

PepperJo (Contributor):
Can you clarify?
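
For context, Spark 3.0.x adds getReaderForRange to the ShuffleManager trait alongside getReader; the extra start/end map indices let adaptive query execution read only a sub-range of map outputs. A minimal sketch of an implementation inside CrailShuffleManager, assuming the Spark 3.0.x signature and simply ignoring the map-index range by delegating to getReader, could look like the following (safe only if nothing ever requests a true map sub-range, e.g. with the AQE local shuffle reader disabled):

    // Sketch only: assumes this sits in CrailShuffleManager extends ShuffleManager,
    // with org.apache.spark.TaskContext and org.apache.spark.shuffle._ in scope.
    override def getReaderForRange[K, C](
        handle: ShuffleHandle,
        startMapIndex: Int,
        endMapIndex: Int,
        startPartition: Int,
        endPartition: Int,
        context: TaskContext,
        metrics: ShuffleReadMetricsReporter): ShuffleReader[K, C] = {
      // CrailShuffleReader has no notion of a map-output range, so this simply
      // reads all map outputs for [startPartition, endPartition) via getReader.
      getReader(handle, startPartition, endPartition, context, metrics)
    }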

@@ -82,7 +82,7 @@ class CrailShuffleWriter[K, V](
initRatio = runTime/initTime
overhead = 100/initRatio
logInfo("shuffler writer: initTime " + initTime + ", runTime " + runTime + ", initRatio " + initRatio + ", overhead " + overhead)
-return Some(MapStatus(blockManager.shuffleServerId, sizes))
+return Some(MapStatus(blockManager.shuffleServerId, sizes, context.taskAttemptId()))
asqasq (Contributor Author):
Unclear what a good value would be. It also works with a constant value.
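
For reference, Spark 3.0 identifies each shuffle map output by a unique map task id rather than the old partition-based map id, and the MapStatus factory grew a third parameter for it; Spark's built-in writers pass the task attempt id there. A hedged sketch of the call, assuming the Spark 3.0.x MapStatus.apply signature:

    // Spark 3.0.x factory (paraphrased):
    //   MapStatus.apply(loc: BlockManagerId, uncompressedSizes: Array[Long], mapTaskId: Long)
    // Spark's own shuffle uses mapTaskId to name shuffle blocks. A constant may
    // happen to work here only because the Crail reader fetches data through
    // Crail rather than by Spark block id, so mirroring the built-in writers
    // with the task attempt id is the safer choice.
    Some(MapStatus(blockManager.shuffleServerId, sizes, context.taskAttemptId()))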

@asqasq asqasq (Contributor Author) left a comment

Added two comments in the code; please have a look and let me know your thoughts.

@@ -42,6 +42,12 @@
<spark.version>2.2.0</spark.version>
</properties>
</profile>
+<profile>
+<id>spark-3.0.1</id>
PepperJo (Contributor):

Do the 2.X profiles still work, or should we remove them?

<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.crail</groupId>
<artifactId>crail-client</artifactId>
-<version>1.2-incubating-SNAPSHOT</version>
+<version>1.3-incubating-SNAPSHOT</version>
PepperJo (Contributor):

Should we depend on a non-snapshot? Or are there changes that require the latest and greatest? (Since we do not push the SNAPSHOT to Maven Central, this would require the Crail source to compile.)

@PepperJo (Contributor):

Feel free to add yourself as an author in https://github.com/zrlio/crail-spark-io/blob/master/AUTHORS

2 participants