[SPARK-12425][STREAMING] DStream union optimisation #10382
Conversation
Could you create a JIRA and add the JIRA number to the title?
Jenkins, test this please
@@ -46,7 +45,11 @@ class UnionDStream[T: ClassTag](parents: Array[DStream[T]])
         s" time $validTime")
     }
     if (rdds.size > 0) {
-      Some(new UnionRDD(ssc.sc, rdds))
+      if(rdds.forall(_.partitioner.isDefined) && rdds.flatMap(_.partitioner).toSet.size == 1) {
You can use `Some(ssc.sc.union(rdds))` directly here.
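To make the condition in the diff easier to follow, here is a minimal sketch of the decision it encodes. It deliberately avoids the Spark dependency: `Partitioner`, `FakeRdd`, `UnionKind` and `UnionChoice` are hypothetical stand-ins invented for illustration, not Spark classes. Only the boolean condition is taken from the diff. My understanding is that `SparkContext.union` applies the same check internally, which is why calling it directly removes the duplication, but treat that as an assumption to verify against the Spark source.

```scala
// Hypothetical stand-ins for Spark's types; these are NOT the real classes.
final case class Partitioner(numPartitions: Int)
final case class FakeRdd(name: String, partitioner: Option[Partitioner])

sealed trait UnionKind
case object PartitionerAware extends UnionKind // models PartitionerAwareUnionRDD
case object Plain extends UnionKind            // models UnionRDD

object UnionChoice {
  // Same condition as in the diff: every input has a partitioner,
  // and all of those partitioners are equal to each other.
  def choose(rdds: Seq[FakeRdd]): UnionKind =
    if (rdds.forall(_.partitioner.isDefined) &&
        rdds.flatMap(_.partitioner).toSet.size == 1) PartitionerAware
    else Plain

  def main(args: Array[String]): Unit = {
    val p4 = Partitioner(4)
    // Both inputs share one partitioner, so it can be preserved.
    println(choose(Seq(FakeRdd("a", Some(p4)), FakeRdd("b", Some(p4)))))
    // One input has no partitioner, so a plain union is the fallback.
    println(choose(Seq(FakeRdd("a", Some(p4)), FakeRdd("b", None))))
    // Different partitioners also fall back to a plain union.
    println(choose(Seq(FakeRdd("a", Some(p4)), FakeRdd("b", Some(Partitioner(8))))))
  }
}
```

Note that an empty input makes `forall` true but leaves the partitioner set empty, so the sketch falls back to the plain union. In the diff that case is already excluded by the `rdds.size > 0` guard.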
Test build #48025 has finished for PR 10382 at commit
@zsxwing based on your comment, I took the liberty of deduplicating the logic to determine if
retest this please
Test build #48214 has finished for PR 10382 at commit
retest this please
ping @davies to take a look
Test build #48220 has finished for PR 10382 at commit
It would be nice to add a unit test in the WindowOperationsSuite for this, to make sure that windowed RDDs always have the partitioner attached.
LGTM, could you fix the conflict?
@gpoulin are you able to follow up on this or should I take it over?
Sorry for the delay, I have been quite busy lately. I should have some time over Easter to look at this.
@gpoulin I'll push this through if you'll rebase, or I can just update it separately myself.
acf6f4e to be31680
Use PartitionerAwareUnionRDD when possible for optimizing shuffling and preserving the partitioner.
Deduplicate logic to determine if `PartitionerAwareUnionRDD` should be used instead of `UnionRDD`.
be31680 to 3bb5ea3
Jenkins retest this please
Test build #54817 has finished for PR 10382 at commit
Jenkins retest this please
Test build #54861 has finished for PR 10382 at commit
Jenkins retest this please
Test build #54871 has finished for PR 10382 at commit
Merged to master
Use PartitionerAwareUnionRDD when possible for optimizing shuffling and
preserving the partitioner.