-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SPARK-6991] [SparkR] Adds support for zipPartitions. #5568
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Test build #30512 has finished for PR 5568 at commit
|
Thanks @hlin09 - one high-level question: do you have an intended use case in mind for arbitrary / large arity functions? In Spark's Scala API up to 4 RDDs are supported, which has perf. benefits (avoid an union + shuffle). |
I do have some user cases that would best to have more that 4 arity zip function. Another concern is that the current communication pattern between Spark and R does not allows tuples of more than RDDs. From perf point of view, "union + shuffle" might be better than recursively zip 2 of them. |
#' Zip an RDD's partitions with one (or more) RDD(s). | ||
#' | ||
#' Return a new RDD by applying a function to the zipped partitions. | ||
#' Assumes that all the RDDs have the same number of partitions*, but does *not* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unmatched *
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Done.
@hlin09 - cool, I don't have a strong opinion towards either approach for now. Did a pass and left some minor style comments. |
Test build #31005 has started for PR 5568 at commit |
LGTM. /cc @shivaram |
Thanks @concretevitamin for the review. Will merge this after Jenkins passes |
Hmm this is weird - somehow the test results for this PR were never posted to github. https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31005/consoleFull reports a json parsing error cc @shaneknapp |
I also re-ran the R tests locally to just verify -- Merging this. |
Author: hlin09 <hlin09pu@gmail.com> Closes apache#5568 from hlin09/zipPartitions and squashes the following commits: 12c08a5 [hlin09] Fix comments d2d32db [hlin09] Merge branch 'master' into zipPartitions ec56d2f [hlin09] Fix test. 27655d3 [hlin09] Adds support for zipPartitions.
Author: hlin09 <hlin09pu@gmail.com> Closes apache#5568 from hlin09/zipPartitions and squashes the following commits: 12c08a5 [hlin09] Fix comments d2d32db [hlin09] Merge branch 'master' into zipPartitions ec56d2f [hlin09] Fix test. 27655d3 [hlin09] Adds support for zipPartitions.
No description provided.