Adapt code and dependencies to Spark 3.0.1 #12
base: master
Conversation
override def getReaderForRange[K, C](
Unclear how to specify a range for CrailShuffleReader.
Can you clarify?
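For context, Spark 3.0 added getReaderForRange to the ShuffleManager interface so that AQE's local shuffle reader can fetch only a sub-range of map outputs. A minimal sketch of how a plugin that stores shuffle data per reduce partition (as CrailShuffleReader appears to) might respond — the fail-fast behaviour below is an assumption for illustration, not the code in this PR:

```scala
// Hypothetical sketch for a Spark 3.0 ShuffleManager implementation.
override def getReaderForRange[K, C](
    handle: ShuffleHandle,
    startMapIndex: Int,
    endMapIndex: Int,
    startPartition: Int,
    endPartition: Int,
    context: TaskContext,
    metrics: ShuffleReadMetricsReporter): ShuffleReader[K, C] = {
  // Crail stores shuffle output per reduce partition, so a map-index
  // range cannot be honoured. Delegating to getReader would silently
  // re-read ALL map outputs for the partition range, which is incorrect
  // for AQE's local shuffle reader. Failing fast is the safer default;
  // users can disable spark.sql.adaptive.localShuffleReader.enabled.
  throw new UnsupportedOperationException(
    "CrailShuffleManager cannot read a sub-range of map outputs; " +
      "disable the AQE local shuffle reader to use this plugin")
}
```

An alternative design choice would be to delegate to getReader only when the given map range covers all map outputs, and throw otherwise.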
@@ -82,7 +82,7 @@ class CrailShuffleWriter[K, V](
     initRatio = runTime/initTime
     overhead = 100/initRatio
     logInfo("shuffler writer: initTime " + initTime + ", runTime " + runTime + ", initRatio " + initRatio + ", overhead " + overhead)
-    return Some(MapStatus(blockManager.shuffleServerId, sizes))
+    return Some(MapStatus(blockManager.shuffleServerId, sizes, context.taskAttemptId()))
Unclear what a good value would be. It also works with a constant value.
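For background: in Spark 3.0, MapStatus.apply grew a third parameter, the map task attempt id (a Long). A short sketch of the call as it might appear in the writer's stop() path — the surrounding names follow the diff above, and the reasoning comments are my reading, not the PR author's:

```scala
// Sketch, assuming Spark 3.0's MapStatus.apply(loc, uncompressedSizes, mapTaskId).
val status = MapStatus(
  blockManager.shuffleServerId,  // BlockManagerId where the map output lives
  sizes,                         // per-reduce-partition byte sizes
  context.taskAttemptId())       // unique id of this task attempt
Some(status)
```

Using the real attempt id (rather than a constant) lets the driver tell apart output from speculative or retried attempts of the same map task; a constant may appear to work in simple runs but could conflate attempts when speculation or task retries occur.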
Added two comments in the code; please have a look and let me know your thoughts.
@@ -42,6 +42,12 @@
       <spark.version>2.2.0</spark.version>
     </properties>
   </profile>
+  <profile>
+    <id>spark-3.0.1</id>
Do the 2.X profiles still work, or should we remove them?
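For reference, the new profile presumably mirrors the existing spark-2.2.0 profile shown in the hunk above. A sketch of its likely shape (the property value is an assumption based on the profile id; the real pom may differ):

```xml
<!-- Hypothetical sketch of the added Maven profile. -->
<profile>
  <id>spark-3.0.1</id>
  <properties>
    <spark.version>3.0.1</spark.version>
  </properties>
</profile>
```

Selecting it at build time would then be `mvn -P spark-3.0.1 package`.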
       <scope>compile</scope>
     </dependency>
     <dependency>
       <groupId>org.apache.crail</groupId>
       <artifactId>crail-client</artifactId>
-      <version>1.2-incubating-SNAPSHOT</version>
+      <version>1.3-incubating-SNAPSHOT</version>
Should we depend on a non-snapshot? Or are there changes that require the latest and greatest? (Since we do not push the SNAPSHOT to Maven Central, this would require compiling Crail from source.)
Feel free to add yourself as an author in https://github.com/zrlio/crail-spark-io/blob/master/AUTHORS
Adapt the plugin to Spark 3.0. The version for Spark 2.2.0 now lives in a new branch, spark_2_2_0, so that master can track the newest Spark version.
I have tested the plugin with Spark 3.0.1, Hadoop 2.7, Apache Crail 1.3, and Crail Spark Terasort at 1GB, 4GB, 16GB, and 64GB, and validated correct sorting with and without this plugin. I did not run into problems or incorrect sorting results.
Please have a look at the code.