Skip to content

[SPARK-12425][STREAMING] DStream union optimisation #10382

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

gpoulin
Copy link
Contributor

@gpoulin gpoulin commented Dec 18, 2015

Use PartitionerAwareUnionRDD when possbile for optimizing shuffling and
preserving the partitioner.

@gpoulin gpoulin changed the title DStream union optimisation [STREAMING] DStream union optimisation Dec 18, 2015
@zsxwing
Copy link
Member

zsxwing commented Dec 18, 2015

Could you create a JIRA and add the JIRA number to the title?

@gpoulin gpoulin changed the title [STREAMING] DStream union optimisation [SPARK-12425][STREAMING] DStream union optimisation Dec 18, 2015
@zsxwing
Copy link
Member

zsxwing commented Dec 18, 2015

Jenkins, test this please

@@ -46,7 +45,11 @@ class UnionDStream[T: ClassTag](parents: Array[DStream[T]])
s" time $validTime")
}
if (rdds.size > 0) {
Some(new UnionRDD(ssc.sc, rdds))
if(rdds.forall(_.partitioner.isDefined) && rdds.flatMap(_.partitioner).toSet.size == 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use Some(ssc.sc.union(rdds)) directly here

@SparkQA
Copy link

SparkQA commented Dec 18, 2015

Test build #48025 has finished for PR 10382 at commit 839d9ee.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gpoulin
Copy link
Contributor Author

gpoulin commented Dec 22, 2015

@zsxwing based on your comment, I took the liberty to deduplicate the logic to determine if UnionRDD or a PartitionerAwareUnionRDD

@zsxwing
Copy link
Member

zsxwing commented Dec 22, 2015

retest this please

@SparkQA
Copy link

SparkQA commented Dec 22, 2015

Test build #48214 has finished for PR 10382 at commit acf6f4e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Copy link
Member

zsxwing commented Dec 22, 2015

retest this please

@zsxwing
Copy link
Member

zsxwing commented Dec 23, 2015

ping @davies to take a look

@SparkQA
Copy link

SparkQA commented Dec 23, 2015

Test build #48220 has finished for PR 10382 at commit acf6f4e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas
Copy link
Contributor

tdas commented Dec 23, 2015

Would be nice to add a unit test in the WindowOperationsSuite for this. To make sure that windowed RDDs always have the partitioner attached.

@zsxwing
Copy link
Member

zsxwing commented Jan 5, 2016

@gpoulin could you resolve the conflicts and add a unit test for window as per @tdas's suggestion? Thanks!

@davies
Copy link
Contributor

davies commented Mar 17, 2016

LGTM, could you fix the conflict?

@srowen
Copy link
Member

srowen commented Mar 21, 2016

@gpoulin are you able to follow up on this or should I take it over?

@gpoulin
Copy link
Contributor Author

gpoulin commented Mar 21, 2016

Sorry, for the delay, I was quite busy lately. I should have sometime over easter to look at this.

@srowen
Copy link
Member

srowen commented Apr 1, 2016

@gpoulin I'll push this through if you'll rebase, or can just update it separately myself

@gpoulin gpoulin force-pushed the dstream_union_optimisation branch from acf6f4e to be31680 Compare April 3, 2016 16:28
gpoulin added 2 commits April 3, 2016 12:31
Use PartitionerAwareUnionRDD when possbile for optimizing shuffling and
preserving the partitioner.
Deduplicate logic to determine if `PartitionerAwareUnionRDD` should be
used instead of `UnionRDD`.
@gpoulin gpoulin force-pushed the dstream_union_optimisation branch from be31680 to 3bb5ea3 Compare April 3, 2016 16:33
@srowen
Copy link
Member

srowen commented Apr 4, 2016

Jenkins retest this please

@SparkQA
Copy link

SparkQA commented Apr 4, 2016

Test build #54817 has finished for PR 10382 at commit 3bb5ea3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Apr 4, 2016

Jenkins retest this please

@SparkQA
Copy link

SparkQA commented Apr 4, 2016

Test build #54861 has finished for PR 10382 at commit 3bb5ea3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Apr 4, 2016

Jenkins retest this please

@SparkQA
Copy link

SparkQA commented Apr 4, 2016

Test build #54871 has finished for PR 10382 at commit 3bb5ea3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Apr 5, 2016

Merged to master

@asfgit asfgit closed this in 7201f03 Apr 5, 2016
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Apr 6, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants