Skip to content

[SPARK-18750][yarn] Avoid using "mapValues" when allocating containers. #16667

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

vanzin
Copy link
Contributor

@vanzin vanzin commented Jan 21, 2017

That method is prone to stack overflows when the input map is really
large; instead, use plain "map". Also includes a unit test that was
tested and caused stack overflows without the fix.

That method is prone to stack overflows when the input map is really
large; instead, use plain "map". Also includes a unit tests that was
tested and caused stack overflows without the fix.
@SparkQA
Copy link

SparkQA commented Jan 21, 2017

Test build #71751 has finished for PR 16667 at commit 5d33e3e.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor Author

vanzin commented Jan 21, 2017

Argh, api not available in old hadoop... fix coming.

@SparkQA
Copy link

SparkQA commented Jan 21, 2017

Test build #71756 has finished for PR 16667 at commit 16a99fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor Author

vanzin commented Jan 22, 2017

@srowen @tgravescs

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine. I'm surprised there is a difference but take your word for it if it fixes it. It should be functionally equivalent anyway.

Copy link
Contributor

@tgravescs tgravescs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Just a couple small nits.

import org.apache.hadoop.fs.CommonConfigurationKeysPublic
import org.apache.hadoop.net.DNSToSwitchMapping
import org.apache.hadoop.yarn.api.records._
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not used that I see, remove

yarnConf, resource)

val totalTasks = 32 * 1024
val totalContainers = totalTasks / 16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any particular reason for the 16 here? I assume its just random selected that shows the issue but perhaps add in a comment.

val count = containers.size / hosts.size / 2

val hostToContainerMap = new HashMap[String, Set[ContainerId]]()
hosts.keys.take(hosts.size / 2).zipWithIndex.foreach { case (host, i) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar wouldn't hurt to have small description here for someone looking at it later

@SparkQA
Copy link

SparkQA commented Jan 23, 2017

Test build #71867 has finished for PR 16667 at commit 68c8925.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Copy link
Contributor

+1

asfgit pushed a commit that referenced this pull request Jan 25, 2017
That method is prone to stack overflows when the input map is really
large; instead, use plain "map". Also includes a unit test that was
tested and caused stack overflows without the fix.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #16667 from vanzin/SPARK-18750.

(cherry picked from commit 76db394)
Signed-off-by: Tom Graves <tgraves@yahoo-inc.com>
@asfgit asfgit closed this in 76db394 Jan 25, 2017
asfgit pushed a commit that referenced this pull request Jan 25, 2017
That method is prone to stack overflows when the input map is really
large; instead, use plain "map". Also includes a unit test that was
tested and caused stack overflows without the fix.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #16667 from vanzin/SPARK-18750.

(cherry picked from commit 76db394)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
@vanzin vanzin deleted the SPARK-18750 branch January 27, 2017 00:57
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
That method is prone to stack overflows when the input map is really
large; instead, use plain "map". Also includes a unit test that was
tested and caused stack overflows without the fix.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes apache#16667 from vanzin/SPARK-18750.
zzcclp added a commit to zzcclp/spark that referenced this pull request Feb 7, 2017
…ainers. apache#16667

without adding LocalityPlacementStrategySuite.scala
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
That method is prone to stack overflows when the input map is really
large; instead, use plain "map". Also includes a unit test that was
tested and caused stack overflows without the fix.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes apache#16667 from vanzin/SPARK-18750.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants