SPARK-4040. Update documentation to exemplify use of local (n) value, fo... #2964

Closed
wants to merge 1 commit into from

jayunit100

This is a minor docs update which helps to clarify the way local[n] is used for streaming apps.

@AmplabJenkins

Can one of the admins verify this patch?

@jayunit100
Author

(bump) - any thoughts on this? I'd also like to roll some more improvements into it in a follow-up.


{% highlight scala %}
val conf = new SparkConf()
.setMaster("local")
.setMaster("local[1]")
Member

I actually thought the default behavior of local was 2 threads, but from the code I don't think that's true. I thought @mateiz mentioned one time that it's better to run with minimal parallelism by default to expose issues that might only appear when there are multiple executors.

In any event, given that, and the thrust of this doc change, is it good to encourage people to use 1 worker? How about explicitly 2? Making it explicit is a small good thing anyway.

@jayunit100
Author

Hi Sean. I like the idea of running with 2 threads and making it explicit; that's the main purpose of the PR. I'll update that (and the // stuff), and then rebase this PR.

@jayunit100 jayunit100 force-pushed the SPARK-4040 branch 2 times, most recently from 0ea9a4b to 6bcab3f Compare October 31, 2014 00:14
@jayunit100
Author

Okay, updated. After this I think we can look into some deeper updates to the streaming docs as well. (FYI @srowen) Looking good now?

.setAppName("CountingSheep")
.set("spark.executor.memory", "1g")
val sc = new SparkContext(conf)
{% endhighlight %}

Note that we can have more than 1 worker in local mode, and in cases like spark streaming, we may actually
require one to prevent any sort of starvation issues.
Contributor

One note on this, the threads shouldn't be called "workers", since that means something else in our distributed cluster mode. It's better to call them threads here.

Also, on this line, capitalize Spark Streaming.
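
For illustration only (not part of the patch): a minimal sketch of the configuration this thread converges on, assuming `local[2]` as suggested earlier in the review and a Spark dependency on the classpath. The object name `LocalThreadsSketch` and the small summing job are hypothetical, added just to make the snippet self-contained.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical wrapper object so the snippet compiles on its own.
object LocalThreadsSketch {
  def main(args: Array[String]): Unit = {
    // "local[2]" runs Spark in local mode with two threads in one JVM.
    // These are threads, not "workers" in the cluster-mode sense.
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("CountingSheep")
      .set("spark.executor.memory", "1g")
    val sc = new SparkContext(conf)

    // Trivial job (an assumption for illustration) to exercise the context.
    val total = sc.parallelize(1 to 100).reduce(_ + _)
    println(s"master=${sc.master}, sum=$total")

    sc.stop()
  }
}
```

The starvation concern in the doc text comes from Spark Streaming receivers: a receiver occupies one thread for as long as it runs, so with a single local thread nothing is left to process the received data.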

@mateiz
Contributor

mateiz commented Nov 1, 2014

Thanks for adding these clarifications, it's a good idea.

@jayunit100
Author

@mateiz @srowen Okay, updated with the threads vs. workers disambiguation. Thanks for the feedback; just ping me if any other updates are necessary.

@jayunit100
Author

(bump) All set on this one? Or shall we wait until after the upcoming Spark release?

@mateiz
Contributor

mateiz commented Nov 5, 2014

This looks fine to merge into 1.2; will do so. Thanks!

asfgit pushed a commit that referenced this pull request Nov 5, 2014
… fo...

This is a minor docs update which helps to clarify the way local[n] is used for streaming apps.

Author: jay@apache.org <jayunit100>

Closes #2964 from jayunit100/SPARK-4040 and squashes the following commits:

35b5a5e [jay@apache.org] SPARK-4040: Update documentation to exemplify use of local (n) value.

(cherry picked from commit 868cd4c)
Signed-off-by: Matei Zaharia <matei@databricks.com>
@asfgit asfgit closed this in 868cd4c Nov 5, 2014