[SPARK-2823][GraphX]fix GraphX EdgeRDD zipPartitions #1763

Closed
luluorta wants to merge 1 commit from luluorta/fix-graph-zip

luluorta
Contributor

@luluorta luluorta commented Aug 4, 2014

If a user sets "spark.default.parallelism" to a value that differs from the EdgeRDD's partition count, GraphX jobs throw:
java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions
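
The same exception can be reproduced at the plain RDD level. The following is a minimal sketch, not the GraphX code path touched by this patch: it assumes a local Spark master, and the object and helper names (`ZipPartitionsMismatch`, `zipError`) are made up for illustration. It zips two RDDs with a chosen number of partitions each and reports the error message, if any.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper object, for illustration only.
object ZipPartitionsMismatch {

  // Zips two RDDs with the given partition counts and returns the
  // IllegalArgumentException message if Spark rejects the zip.
  def zipError(sc: SparkContext, leftParts: Int, rightParts: Int): Option[String] = {
    val left = sc.parallelize(1 to 100, leftParts)
    val right = sc.parallelize(1 to 100, rightParts)
    try {
      // zipPartitions requires both RDDs to have the same number of partitions.
      left.zipPartitions(right) { (l, r) => l.zip(r) }.count()
      None
    } catch {
      case e: IllegalArgumentException => Some(e.getMessage)
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("zip-mismatch")
    val sc = new SparkContext(conf)
    try {
      println(zipError(sc, 4, 4)) // equal partition counts: no error expected
      println(zipError(sc, 4, 3)) // unequal partition counts: error message expected
    } finally {
      sc.stop()
    }
  }
}
```

In GraphX the mismatch arises indirectly, when one side of the zip takes its partition count from "spark.default.parallelism" and the other keeps the EdgeRDD's own count.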

@luluorta luluorta changed the title fix GraphX EdgeRDD zipPartitions [SPARK-2823]fix GraphX EdgeRDD zipPartitions Aug 4, 2014
@AmplabJenkins

Can one of the admins verify this patch?

@luluorta luluorta changed the title [SPARK-2823]fix GraphX EdgeRDD zipPartitions [SPARK-2823][GraphX]fix GraphX EdgeRDD zipPartitions Aug 4, 2014
@ankurdave
Contributor

ok to test

@SparkQA

SparkQA commented Aug 4, 2014

QA tests have started for PR 1763. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17863/consoleFull

@SparkQA

SparkQA commented Aug 4, 2014

QA results for PR 1763:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17863/consoleFull

@ankurdave
Contributor

Sorry for the delay on this. It would be great if the PR also added a unit test to reproduce the bug. I can add that if you don't have time.

asfgit pushed a commit that referenced this pull request Sep 3, 2014
If a user sets "spark.default.parallelism" to a value that differs from the EdgeRDD's partition count, GraphX jobs throw:
java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions

Author: luluorta <luluorta@gmail.com>

Closes #1763 from luluorta/fix-graph-zip and squashes the following commits:

8338961 [luluorta] fix GraphX EdgeRDD zipPartitions

(cherry picked from commit 9b225ac)
Signed-off-by: Ankur Dave <ankurdave@gmail.com>
@asfgit asfgit closed this in 9b225ac Sep 3, 2014
@ankurdave
Contributor

Thanks! I added a test, verified that it failed before and succeeds now, and merged this into master, branch-1.1, and branch-1.0.

xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
ankurdave pushed a commit to ankurdave/spark that referenced this pull request Nov 19, 2014
@Earne
Contributor

Earne commented Dec 23, 2014

@ankurdave I think you missed this PR when you extracted the interfaces in [Extract interfaces for EdgeRDD and VertexRDD](https://github.com//pull/2530).

SPARK-2823 was reopened because of this.
Can we just have a numPartitions parameter in EdgeRDD.scala, like VertexRDD#L313 does?
Is coalesce necessary in GraphLoader#L70? The RDD returned by coalesce(numEdgePartitions) may not have partitions.length == numEdgePartitions.
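
To make the coalesce point concrete, here is a minimal sketch on a plain RDD, assuming a local Spark master; the object and helper names (`CoalesceUpperBound`, `coalescedPartitions`) are hypothetical and this is not the GraphLoader code. Without a shuffle, `coalesce` can only reduce the number of partitions, so asking for more partitions than the RDD already has leaves the count unchanged.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper object, for illustration only.
object CoalesceUpperBound {

  // Number of partitions after a shuffle-free coalesce(requested)
  // applied to an RDD that starts with `initial` partitions.
  def coalescedPartitions(sc: SparkContext, initial: Int, requested: Int): Int =
    sc.parallelize(1 to 1000, initial).coalesce(requested).partitions.length

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("coalesce-upper-bound")
    val sc = new SparkContext(conf)
    try {
      println(coalescedPartitions(sc, 2, 8)) // prints 2: coalesce cannot grow the count without a shuffle
      println(coalescedPartitions(sc, 8, 2)) // prints 2: coalesce shrinks the count as requested
    } finally {
      sc.stop()
    }
  }
}
```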

@luluorta
Contributor Author

Thanks, @Earne

Actually, we already have a way to customize the number of EdgeRDD partitions: Graph.partitionBy (Graph.scala#L136).
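
A minimal sketch of that approach, assuming a local Spark master and a Spark version in which `Graph.partitionBy` accepts a partition count; the object name, helper name, and the tiny triangle graph are made up for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy}

// Hypothetical helper object, for illustration only.
object RepartitionEdges {

  // Builds a small graph whose edges start in 2 partitions, repartitions the
  // edges with Graph.partitionBy, and returns the resulting edge partition count.
  def edgePartitionsAfter(sc: SparkContext, numPartitions: Int): Int = {
    val edges = sc.parallelize(
      Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 1L, 1)), 2)
    val graph = Graph.fromEdges(edges, defaultValue = 0)
    graph
      .partitionBy(PartitionStrategy.EdgePartition2D, numPartitions)
      .edges
      .partitions
      .length
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("repartition-edges")
    val sc = new SparkContext(conf)
    try {
      println(edgePartitionsAfter(sc, 4)) // edge partition count after repartitioning
    } finally {
      sc.stop()
    }
  }
}
```

Unlike coalesce, this redistributes the edges with a shuffle, so the requested count is not limited by the number of partitions the edges started with.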

I guess a better name for the parameter in coalesce(numEdgePartitions) would be maxEdgePartitions, because it ensures that the generated EdgeRDD has no more than maxEdgePartitions partitions.
