[SPARK-17062][MESOS] add conf option to mesos dispatcher #14650

Closed
wants to merge 5 commits into master from skonto:dipatcher_conf

Conversation

skonto
Contributor

@skonto skonto commented Aug 15, 2016

What changes were proposed in this pull request?

Adds a --conf option for setting Spark configuration properties on the Mesos dispatcher.
Properties provided with --conf take precedence over properties from the properties file.
The motivation is that, without this option, even simple configuration or testing requires providing a properties file (ideally a shared one for the cluster) just to set a single property.
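For illustration, a minimal sketch of the intended precedence rule in Scala (names here are hypothetical, not the actual patch code): values from the properties file are applied first, then --conf entries overwrite any keys they share with the file.

import org.apache.spark.SparkConf

// Hypothetical sketch: file properties load first, --conf entries win on conflict.
object ConfPrecedenceSketch {
  def buildConf(
      fileProps: Map[String, String],  // parsed from --properties-file
      cliProps: Map[String, String])   // parsed from --conf key=value pairs
    : SparkConf = {
    val conf = new SparkConf(loadDefaults = false)
    fileProps.filter { case (k, _) => k.startsWith("spark.") }
      .foreach { case (k, v) => conf.set(k, v) }
    cliProps.foreach { case (k, v) => conf.set(k, v) }  // overrides file values
    conf
  }
}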

How was this patch tested?

Manually tested.

@skonto
Contributor Author

skonto commented Aug 15, 2016

@mgummelt pls review.

@SparkQA

SparkQA commented Aug 15, 2016

Test build #63790 has finished for PR 14650 at commit db42e31.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Aug 16, 2016

Does the existing spark-submit not work on Mesos?

@skonto
Contributor Author

skonto commented Aug 16, 2016

@rxin:
MesosDispatcher is launched via spark-daemon.sh, which uses class, not submit.
It also creates the Spark config from scratch using its own logic, so no luck there, I guess.
I have not checked the implications of launching it with spark-submit and whether it could re-use that logic, but isn't class more suitable here, at least for consistency? For example, Spark executors on Mesos are started with class. We are just starting a class within the project, not submitting something like an app.

@skonto
Contributor Author

skonto commented Aug 16, 2016

@srowen @mgummelt what do you think?

@srowen
Member

srowen commented Aug 17, 2016

Seems OK to me, to the limits of my understanding, and given the logic of #14650 (comment)

@skonto
Contributor Author

skonto commented Aug 17, 2016

@mgummelt?

@skonto
Contributor Author

skonto commented Aug 19, 2016

@srowen could you merge it pls?

@mgummelt
Contributor

I'm generally fine with this, though one downside is that it introduces an inconsistency with other daemon classes such as Master.scala, which only accept a properties file. Maybe we should open a JIRA to add this to the other classes.

Not that Spark configuration has any sort of sane consistency to be upheld. There are ~10 ways of setting config properties. It's one of the most confusing things about operations.

@srowen Do you know why we even have a separate set of Spark config properties, rather than just using Java system properties? I know you can load Spark conf from Java properties, but you can also load them via --conf (and only in spark-submit), which seems like an unnecessary, nonstandard interface.

@@ -73,6 +82,10 @@ private[mesos] class MesosClusterDispatcherArguments(args: Array[String], conf:
propertiesFile = value
parse(tail)

case ("--conf") :: value :: tail =>
cliSparkConfig += value
Contributor

Let's make this consistent with SparkSubmitArguments, and parse these into a HashMap here.
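A simplified sketch of what that could look like (the real SparkSubmitArguments handling differs in details):

import scala.collection.mutable.HashMap

object ConfParseSketch {
  // Accumulate --conf key=value pairs into a map instead of a raw string list.
  val sparkProperties = new HashMap[String, String]()

  def handleConf(pair: String): Unit = pair.split("=", 2) match {
    case Array(k, v) => sparkProperties(k) = v
    case _ => sys.error(s"Spark config without '=': $pair")
  }
}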

@srowen
Member

srowen commented Aug 20, 2016

Responding to #14650 (comment): IMHO it's best if configuration never uses environment variables or system properties. They're global to a process, and that eventually ends in tears, despite the convenience. Hence I personally support standardizing on explicit configuration.

In practice some env variables are used here and there for legacy reasons. And yes, you can set props via system properties. I think being consistent is OK, right or wrong, but I would generally move away from env / sys properties given the choice.

@skonto
Contributor Author

skonto commented Aug 29, 2016

WIP

@SparkQA

SparkQA commented Sep 13, 2016

Test build #65311 has finished for PR 14650 at commit 3532f5d.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@skonto
Contributor Author

skonto commented Sep 13, 2016

@mgummelt @srowen I refactored MesosClusterDispatcherArguments a bit to follow the SparkSubmit pattern, and added several tests.
My thinking is that all objects with a main method (SparkSubmit, MesosClusterDispatcher, the YARN classes...) that use Spark conf and other args should share a common structure for arg parsing and testing. So I re-used the spark-submit design to drive this, minimizing technical debt in a sense.
It is a minor thing, but it is good to have consistency across the code base. If this is acceptable, I can refactor the remaining pieces in another PR.

@SparkQA

SparkQA commented Sep 13, 2016

Test build #65333 has finished for PR 14650 at commit 497e517.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait SparkConfigParseUtils

@skonto
Contributor Author

skonto commented Sep 14, 2016

WIP fixing test, removing public class...

@SparkQA

SparkQA commented Sep 14, 2016

Test build #65358 has finished for PR 14650 at commit 342d8d3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@skonto
Contributor Author

skonto commented Sep 14, 2016

@srowen @mgummelt pls review...

@skonto
Contributor Author

skonto commented Sep 19, 2016

@srowen what do you think?

Member

@srowen srowen left a comment

CC @vanzin because this is touching spark submit.

import org.apache.spark.{SPARK_BRANCH, SPARK_BUILD_DATE, SPARK_BUILD_USER, SPARK_REPO_URL}
import org.apache.spark.{SPARK_REVISION, SPARK_VERSION, SparkException, SparkUserAppException}
Member

This is trivial, but I think at this point a "_" import is fine and called for. It's split over two imports now.

Contributor Author

OK, cool. It was there before my PR, but I'll fix it, no problem.

Contributor Author

Fixed

@@ -27,28 +27,18 @@ import org.scalatest.{BeforeAndAfterEach, Matchers}
import org.scalatest.concurrent.Timeouts
import org.scalatest.time.SpanSugar._

import org.apache.spark._
import org.apache.spark.{SparkFunSuite, _}
Member

Not needed? it already imports everything

Contributor Author

@skonto skonto Sep 19, 2016

Fixed

// scalastyle:on println
printUsageAndExit(1)
}

case _ =>
case value@_ =>
Member

case value => or maybe I misunderstand why it's written this way

Contributor Author

It's the same thing; case value => is just less verbose. Will change it.

Contributor Author

fixed.

/*
* Adds main method to a class or object
*/
private[spark] trait Executable {
Member

This trait feels funny. I didn't see what it is needed for though. It's used in tests only, but isn't used in a place where you need to abstract over both implementations?

Contributor Author

@skonto skonto Sep 19, 2016

I needed to mark specific classes as having main support. I could have discovered that differently, e.g. via duck typing or reflection, but those are either not permitted (e.g. by scalastyle) or too complex. It is not bad to tag a class this way; it may seem redundant, even funny (I would not call it that), but it serves a purpose: it keeps the test code simple.

Member

I don't think you need any of that; it seems like this is only called where SparkSubmit is definitely being invoked. I didn't see where a different implementation would even be used.

Contributor Author

@skonto skonto Sep 21, 2016

OK, so in the main code there is no reason for it, as you said. In the test code I tried to re-use this function here after some refactoring. For that to work, I need to pass in an object for testing, and that object needs to implement some specific functions.
The Spark project does not allow duck typing, and I thought reflection was too much. I could have avoided the trait in the first place by not reusing the function, but I thought I could improve the test code too.

Contributor Author

@skonto skonto Sep 21, 2016

Anyway, I can eliminate it with reflection; it is test code anyway. Is that OK, @srowen?

Contributor

@vanzin vanzin Sep 21, 2016

Agree that this trait seems unnecessary. For the test, you can use two arguments: a "SparkConfigParseUtils" and a second one that's a main function (main: Array[String] => Unit).
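A minimal sketch of that suggestion (output capture and assertions simplified; names are illustrative):

import java.io.{ByteArrayOutputStream, PrintStream}

object PrematureExitSketch {
  // The entry point is passed as a plain function, so no marker trait is needed.
  def testPrematureExit(
      input: Array[String],
      searchString: String,
      main: Array[String] => Unit): Unit = {
    val out = new ByteArrayOutputStream()
    Console.withOut(new PrintStream(out)) {
      try main(input) catch { case _: Throwable => () }  // a test exitFn may throw
    }
    assert(out.toString.contains(searchString))
  }
}

// Usage: any object's main method works as the function value, e.g.
// PrematureExitSketch.testPrematureExit(args, "Usage:", SparkSubmit.main)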

Contributor

This trait still exists and still looks out of place. It could be merged with CommandLineUtils, which then could be called something like SparkEntryPoint (although CommandLineUtils is also probably fine).

propertiesFile = Utils.loadDefaultSparkProperties(conf, propertiesFile)
Utils.updateSparkConfigFromProperties(conf, confProperties)
Member

This seems like it differs from Mesos to non-Mesos implementations. Here you can update props with --conf but wouldn't this be identical elsewhere? I may have missed the purpose.

Contributor Author

@skonto skonto Sep 19, 2016

The logic differs from the others (I would otherwise have to refactor everything, which is not wise given the code size for now). I just added support for the --conf option, which is the same across implementations, and re-used common code for the parsing.

Member

I guess I mean, why isn't this same code then called elsewhere? You know more about this than I, but I suppose I'd expect more parallel similarity between the Mesos and non-Mesos code, for supporting the same functionality. There's a method factored out here but not reused. Marcelo may understand this more anyway.

Contributor Author

@skonto skonto Sep 21, 2016

Non-Mesos code either does not support --conf or does not need to set Spark configuration in all cases. Check here and here. If you compare the same functionality (loading the properties file) in the MesosClusterDispatcherArguments file, things were completely different even before my PR. Check here.

So I don't expect parallel similarity, since for example in this case ApplicationMaster does not need to load the Spark configuration into a SparkConf.

@SparkQA

SparkQA commented Sep 19, 2016

Test build #65594 has finished for PR 14650 at commit d40b928.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@vanzin vanzin left a comment

I need to do a second pass (have to run now), but I really would like that Executable trait to go away.

exitFn(1)
}

private[spark] def parseSparkConfProperty(pair: String): Either[Unit, (String, String)] = {
Contributor

Returning Either here is weird. It should return (String, String). I know you did this because of the error handling case, but that's better handled by:

printErrorAndExit(s"Spark config without '=': $pair")
throw new SparkException(s"Spark config without '=': $pair")

That way if exitFn doesn't really exit, you get the exception. Hopefully only tests override exitFn and those shouldn't really trigger the exception, so it's there just to make the compiler happy.
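Put together, a sketch of the suggested shape (printErrorAndExit and exitFn here stand in for the trait's overridable helpers):

import org.apache.spark.SparkException

object ParseSketch {
  // Tests can override this so that "exit" does not kill the JVM.
  var exitFn: Int => Unit = code => sys.exit(code)

  private def printErrorAndExit(msg: String): Unit = {
    System.err.println(s"Error: $msg")
    exitFn(1)
  }

  // Returns (String, String) directly; the throw is unreachable unless a test
  // overrides exitFn, and exists only to satisfy the compiler.
  def parseSparkConfProperty(pair: String): (String, String) =
    pair.split("=", 2) match {
      case Array(k, v) => (k, v)
      case _ =>
        printErrorAndExit(s"Spark config without '=': $pair")
        throw new SparkException(s"Spark config without '=': $pair")
    }
}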

Contributor Author

@skonto skonto Oct 20, 2016

ok

Contributor Author

@skonto skonto Oct 20, 2016

OK, cool. I was not sure whether I could throw a SparkException at that point (whether the rest of the code base follows a similar approach)... the exception is much better; I was just wondering about it.

Contributor Author

@skonto skonto Nov 16, 2016

OK, I fixed that, although throw new SparkException(s"Spark config without '=': $pair") is a bit redundant; it is there only to make the compiler happy, which is again a bit strange to me. Anyway.
Use of Either was a bit too much as well. Not so happy in general.

*/
private[spark] trait CommandLineUtils {

// scalastyle:off println
Contributor

nit: you're missing // scalastyle:on println

Contributor Author

yeah..

}
SparkSubmit.parseSparkConfProperty(value)
.right
.foreach{case (k, v) => sparkProperties(k) = v}
Contributor

nit: space before and after {

Contributor Author

ok

def updateSparkConfigFromProperties(
conf: SparkConf,
properties: Map[String, String])
: Unit = {
Contributor

nit: move to previous line

input: Array[String],
searchString: String,
mainObject: Executable with CommandLineUtils = SparkSubmit)
: Unit = {
Contributor

nit: move to previous line

Contributor Author

ok

import org.apache.spark.deploy.TestPrematureExit

class MesosClusterDispatcherArgumentsSuite extends SparkFunSuite
with TestPrematureExit{
Contributor

nit: space before '{'

Contributor Author

ok

@skonto
Contributor Author

skonto commented Oct 20, 2016

@vanzin & @srowen I will update the PR asap thnx for the review.

@skonto
Contributor Author

skonto commented Nov 15, 2016

WIP

@skonto skonto force-pushed the dipatcher_conf branch 2 times, most recently from e089939 to 19dbd73 on November 16, 2016 at 12:01
@skonto
Contributor Author

skonto commented Nov 16, 2016

@vanzin @srowen @mgummelt pls review. I am sorry for the late response.

@SparkQA

SparkQA commented Nov 16, 2016

Test build #68714 has finished for PR 14650 at commit 19dbd73.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@skonto
Contributor Author

skonto commented Nov 16, 2016

Jenkins, retest this please.

@SparkQA

SparkQA commented Nov 16, 2016

Test build #68715 has finished for PR 14650 at commit 19dbd73.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 16, 2016

Test build #68718 has finished for PR 14650 at commit cfadc06.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@skonto
Contributor Author

skonto commented Nov 17, 2016

@srowen could I get a merge please?

@rxin
Contributor

rxin commented Nov 17, 2016

@vanzin did you do your 2nd pass?

@vanzin
Contributor

vanzin commented Nov 17, 2016

Not yet, I'll try to find some time today.

Contributor

@vanzin vanzin left a comment

I'm not sure updateSparkConfigFromProperties should be touching system properties. We really should avoid adding more code that messes with those.

Other than that, looks ok.

case Seq(k, v) => sparkProperties(k) = v
case _ => SparkSubmit.printErrorAndExit(s"Spark config without '=': $value")
}
val pair = SparkSubmit.parseSparkConfProperty(value)
Contributor

nit: val (confName, confValue) = SparkSubmit.parseSparkConfProperty(value)

Contributor Author

ok.

*/
private[spark] trait CommandLineUtils {

// scalastyle:on println
Contributor

nit: you don't need this

Contributor Author

ok

throw new SparkException(s"Spark config without '=': $pair")
}
}
// scalastyle:on println
Contributor

nit: you can move this after printErrorAndExit.

Contributor Author

ok


// scalastyle:off println
private[spark] def printWarning(str: String): Unit = printStream.println("Warning: " + str)

Contributor

nit: remove extra empty line

Contributor Author

ok.

Contributor Author

@skonto skonto Nov 17, 2016

I removed the empty line; the one still showing there separates statements.

k.startsWith("spark.")
}.foreach { case (k, v) =>
conf.set(k, v)
sys.props.getOrElseUpdate(k, v)
Contributor

Why is this method updating system properties? Neither the name nor the scaladoc mentions that fact, and it's a pretty important thing the method does. Not only that, but the behavior is different from what it does to the SparkConf (set if not yet set, instead of always set).

Contributor Author

@skonto skonto Nov 17, 2016

The loadDefaultSparkProperties method a few lines above does that, so I thought it was common practice to load things there in case they are reused elsewhere. Also, this method is used on the dispatcher side only. I can remove it, for sure.

Contributor Author

removed.
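Roughly the resulting shape, as a sketch (only the SparkConf is mutated now; system properties are left alone):

import org.apache.spark.SparkConf

object DispatcherConfSketch {
  def updateSparkConfigFromProperties(
      conf: SparkConf,
      properties: Map[String, String]): Unit = {
    properties.filter { case (k, _) => k.startsWith("spark.") }
      .foreach { case (k, v) => conf.set(k, v) }  // no sys.props update
  }
}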

@skonto
Contributor Author

skonto commented Nov 17, 2016

@vanzin I updated the PR. Thank you for the review.
@rxin it's ready, I guess.

@SparkQA

SparkQA commented Nov 18, 2016

Test build #68805 has finished for PR 14650 at commit 946202d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@skonto
Contributor Author

skonto commented Nov 18, 2016

@rxin @srowen could I get a merge, pls, if there are no other issues?

@srowen
Member

srowen commented Nov 19, 2016

Since @vanzin has looked at this most closely I'd defer to him for a merge when you both think it's ready.

@vanzin
Contributor

vanzin commented Nov 20, 2016

LGTM. There are a few style nits that I'll fix during the merge to avoid another round.

Merging to master.

@skonto
Contributor Author

skonto commented Nov 20, 2016

@vanzin if there is anything to fix, I can do it; if you want, I can do another pass to check it against the style guide, e.g. avoiding imports such as import org.apache.spark.util._

I thought that after your 2nd pass we were OK with the result.
As for the style nits, I guess it's sometimes a matter of taste (although you're technically correct)...

@vanzin
Contributor

vanzin commented Nov 20, 2016

This is already merged. Maybe there's something wrong with the apache / github sync.

https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=ea77c81ec0db27ea4709f71dc080d00167505a7d

@asfgit asfgit closed this in ea77c81 Nov 20, 2016
@rxin
Contributor

rxin commented Nov 20, 2016

Guys, this broke the build. Please don't use Option.contains in the future. It is both confusing and does not work on Scala 2.10.
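For context, Option.contains was only added in Scala 2.11; on 2.10 an explicit predicate is needed (hypothetical example):

object OptionContainsExample {
  val master: Option[String] = Some("mesos://host:5050")

  // Scala 2.11+ only: master.contains("mesos://host:5050")
  // 2.10-compatible equivalent:
  val matches = master.exists(_ == "mesos://host:5050")
}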

@vanzin
Contributor

vanzin commented Nov 20, 2016

I see you already pushed a fix, thanks.

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
Adds a --conf option for setting Spark configuration properties on the Mesos dispatcher.
Properties provided with --conf take precedence over properties from the properties file.
The motivation is that, without this option, even simple configuration or testing requires providing a properties file (ideally a shared one for the cluster) just to set a single property.

Manually tested.

Author: Stavros Kontopoulos <st.kontopoulos@gmail.com>
Author: Stavros Kontopoulos <stavros.kontopoulos@lightbend.com>

Closes apache#14650 from skonto/dipatcher_conf.