[SPARK-17062][MESOS] add conf option to mesos dispatcher #14650
Conversation
@mgummelt please review.
Test build #63790 has finished for PR 14650 at commit
Does the existing spark-submit not work on Mesos?
@rxin:
Seems OK to me, to the limits of my understanding, and given the logic of #14650 (comment).
@srowen could you merge it please?
I'm generally fine with this, though one downside is that it introduces an inconsistency with other daemon classes such as Master.scala, which only accepts a properties file. Maybe we should file a JIRA to add this to the other classes. Not that Spark configuration has any sort of sane consistency to be upheld: there are ~10 ways of setting config properties, and it's one of the most confusing things about operations. @srowen Do you know why we even have a separate set of Spark config properties, rather than just using Java system properties? I know you can load Spark conf from Java properties, but you can also load them via…
@@ -73,6 +82,10 @@ private[mesos] class MesosClusterDispatcherArguments(args: Array[String], conf:
        propertiesFile = value
        parse(tail)

      case ("--conf") :: value :: tail =>
        cliSparkConfig += value
Let's make this consistent with SparkSubmitArguments, and parse these into a HashMap here.
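For illustration, parsing repeated --conf key=value pairs into a HashMap, in the spirit of SparkSubmitArguments, might look roughly like this (a simplified sketch; the object and map names are made up, not the actual Spark code):

```scala
import scala.collection.mutable

// Simplified sketch (not the actual SparkSubmitArguments code): collect
// repeated --conf key=value arguments into a mutable HashMap.
object ConfParsingSketch {
  val sparkProperties = new mutable.HashMap[String, String]()

  def parse(args: List[String]): Unit = args match {
    case "--conf" :: value :: tail =>
      value.split("=", 2) match {
        case Array(k, v) => sparkProperties(k) = v   // last occurrence wins
        case _ => sys.error(s"Spark config without '=': $value")
      }
      parse(tail)
    case _ :: tail => parse(tail)   // ignore other options in this sketch
    case Nil =>                     // done
  }
}
```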
Responding to #14650 (comment): IMHO it's best if configuration never uses environment variables or system properties. They're global to a process, and that eventually ends in some tears, despite their convenience. Hence I personally support standardizing towards explicit configuration. In practice some env variables are used here and there for legacy reasons, and yes, you can set props via system properties. I think being consistent is OK, right or wrong, but I would generally move away from env / sys properties given the choice.
WIP
Test build #65311 has finished for PR 14650 at commit
@mgummelt @srowen I refactored MesosClusterDispatcherArguments a bit to follow the SparkSubmit pattern. Added several tests.
Test build #65333 has finished for PR 14650 at commit
WIP fixing test, removing public class...
Test build #65358 has finished for PR 14650 at commit
@srowen what do you think?
CC @vanzin because this is touching spark submit.
import org.apache.spark.{SPARK_BRANCH, SPARK_BUILD_DATE, SPARK_BUILD_USER, SPARK_REPO_URL}
import org.apache.spark.{SPARK_REVISION, SPARK_VERSION, SparkException, SparkUserAppException}
This is trivial, but I think at this point a "_" import is fine and called for. It's split over two imports now.
OK cool, it was there before my PR, but I will fix it, no problem.
Fixed
@@ -27,28 +27,18 @@ import org.scalatest.{BeforeAndAfterEach, Matchers}
import org.scalatest.concurrent.Timeouts
import org.scalatest.time.SpanSugar._

import org.apache.spark._
import org.apache.spark.{SparkFunSuite, _}
Not needed? It already imports everything.
Fixed
// scalastyle:on println
printUsageAndExit(1)
}

case _ =>
case value@_ =>
Just case value =>, or maybe I misunderstand why it's written this way.
It's the same thing; case value => is less verbose. Will change it.
fixed.
/*
 * Adds main method to a class or object
 */
private[spark] trait Executable {
This trait feels funny; I didn't see what it is needed for, though. It's used in tests only, but it isn't used in a place where you need to abstract over both implementations, is it?
I needed to mark specific classes as having main support. I could have detected that differently, with duck typing or reflection etc., but those are either not permitted (e.g., by Scala style) or too complex. It is not bad to tag a class that way. It may sound redundant, even funny (I would not call it that), but it serves a purpose: making things easy for the test code.
I don't think you need any of that; it seems like this is only called where SparkSubmit is definitely being invoked. I didn't see where a different implementation would even be used.
OK, so in the main code there is no reason for it, as you said. In the test code I tried to re-use this function, doing some refactoring first. For this to work, I need to pass an object for testing, and that object needs to implement some specific functions.
The Spark project does not allow duck typing, and I thought reflection was too much. I could have avoided the trait in the first place by not reusing the function, but I thought I could improve the test code too.
Anyway, I can eliminate it with reflection; it is test code anyway. Is that OK, @srowen?
Agree that this trait seems unnecessary. For the test, you can use two arguments: a "SparkConfigParseUtils" and a second one that's a main function (main: Array[String] => Unit).
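A rough sketch of that suggestion (hypothetical helper, not the actual test code; a real version would also capture output and the exit hook):

```scala
// Hypothetical sketch of the suggestion above: the test helper takes the
// entry point as a plain function value instead of requiring a marker trait.
object PrematureExitSketch {
  def testPrematureExit(
      input: Array[String],
      searchString: String,
      main: Array[String] => Unit): Unit = {
    // A real helper would capture stdout/stderr and the exit hook here,
    // then assert that the captured output contains searchString.
    main(input)
  }
}
// Usage: any object's main method converts to a function value, e.g.
//   PrematureExitSketch.testPrematureExit(Array("--unknown"), "Usage:", SparkSubmit.main)
```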
This trait still exists and still looks out of place. It could be merged with CommandLineUtils, which then could be called something like SparkEntryPoint (although CommandLineUtils is also probably fine).
propertiesFile = Utils.loadDefaultSparkProperties(conf, propertiesFile)
Utils.updateSparkConfigFromProperties(conf, confProperties)
This seems like it differs between the Mesos and non-Mesos implementations. Here you can update props with --conf, but wouldn't this be identical elsewhere? I may have missed the purpose.
The logic differs with respect to the others (I would have to refactor everything otherwise, which is not wise given the code size for now). I just added support for the --conf option, which is the same across implementations, and re-used common code for parsing.
I guess I mean: why isn't this same code then called elsewhere? You know more about this than I do, but I'd expect more parallel similarity between the Mesos and non-Mesos code for supporting the same functionality. There's a method factored out here but not reused. Marcelo may understand this better anyway.
Non-Mesos code either does not support --conf or does not need, in all cases, to set the Spark configuration. Check here and here. If you compare the same functionality (the one about loading the properties file) with the MesosClusterDispatcherArguments file, things are completely different, even before my PR. Check here.
So I don't expect any parallel similarity, since, for example, in this case ApplicationMaster does not need to load the Spark configuration into a SparkConf.
Force-pushed from 342d8d3 to d40b928.
Test build #65594 has finished for PR 14650 at commit
I need to do a second pass (have to run now), but I really would like that Executable trait to go away.
  exitFn(1)
}

private[spark] def parseSparkConfProperty(pair: String): Either[Unit, (String, String)] = {
Returning Either here is weird. It should return (String, String). I know you did this because of the error handling case, but that's better handled by:

printErrorAndExit(s"Spark config without '=': $pair")
throw new SparkException(s"Spark config without '=': $pair")

That way, if exitFn doesn't really exit, you get the exception. Hopefully only tests override exitFn, and those shouldn't really trigger the exception, so it's there just to make the compiler happy.
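Put together, the suggested shape would be roughly the following (a standalone sketch; exitFn and printErrorAndExit here are simplified stand-ins for the real CommandLineUtils members):

```scala
import org.apache.spark.SparkException

// Standalone sketch of the suggested shape; exitFn and printErrorAndExit
// are simplified stand-ins for the real CommandLineUtils members.
object ParseSketch {
  var exitFn: Int => Unit = (code: Int) => sys.exit(code)

  def printErrorAndExit(str: String): Unit = {
    Console.err.println("Error: " + str)
    exitFn(1)
  }

  def parseSparkConfProperty(pair: String): (String, String) = {
    pair.split("=", 2) match {
      case Array(k, v) => (k, v)
      case _ =>
        printErrorAndExit(s"Spark config without '=': $pair")
        // Unreachable when exitFn really exits; if a test overrides exitFn,
        // this throw surfaces the bad input instead of returning garbage.
        throw new SparkException(s"Spark config without '=': $pair")
    }
  }
}
```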
ok
OK cool, I was not aware whether I could throw a SparkException at that point (whether the rest of the code base follows a similar approach)... The exception is much better; I was just wondering about it.
OK, I fixed that, although throw new SparkException(s"Spark config without '=': $pair") is a bit redundant; it is there only to make the compiler happy, which is again a bit strange to me, anyway.
Use of Either is a bit too much as well. Not so happy in general.
 */
private[spark] trait CommandLineUtils {

// scalastyle:off println
nit: you're missing // scalastyle:on println
yeah..
}
SparkSubmit.parseSparkConfProperty(value)
  .right
  .foreach{case (k, v) => sparkProperties(k) = v}
nit: space before and after {
ok
def updateSparkConfigFromProperties(
    conf: SparkConf,
    properties: Map[String, String])
  : Unit = {
nit: move to previous line
    input: Array[String],
    searchString: String,
    mainObject: Executable with CommandLineUtils = SparkSubmit)
  : Unit = {
nit: move to previous line
ok
import org.apache.spark.deploy.TestPrematureExit

class MesosClusterDispatcherArgumentsSuite extends SparkFunSuite
  with TestPrematureExit{
nit: space before '{'
ok
WIP
Force-pushed from e089939 to 19dbd73.
Test build #68714 has finished for PR 14650 at commit
Jenkins, retest this please.
Test build #68715 has finished for PR 14650 at commit
Force-pushed from 19dbd73 to cfadc06.
Test build #68718 has finished for PR 14650 at commit
@srowen could I get a merge please?
@vanzin did you do your 2nd pass?
Not yet, I'll try to find some time today.
I'm not sure updateSparkConfigFromProperties should be touching system properties. We really should avoid adding more code that messes with those. Other than that, looks ok.
  case Seq(k, v) => sparkProperties(k) = v
  case _ => SparkSubmit.printErrorAndExit(s"Spark config without '=': $value")
}
val pair = SparkSubmit.parseSparkConfProperty(value)
nit: val (confName, confValue) = SparkSubmit.parseSparkConfProperty(value)
ok.
 */
private[spark] trait CommandLineUtils {

// scalastyle:on println
nit: you don't need this
ok
    throw new SparkException(s"Spark config without '=': $pair")
  }
}
// scalastyle:on println
nit: you can move this after printErrorAndExit.
ok
// scalastyle:off println
private[spark] def printWarning(str: String): Unit = printStream.println("Warning: " + str)

nit: remove extra empty line
ok.
Removed the extra empty line; one still shows, but it is used to separate statements there.
  k.startsWith("spark.")
}.foreach { case (k, v) =>
  conf.set(k, v)
  sys.props.getOrElseUpdate(k, v)
Why is this method updating system properties? Neither the name nor the scaladoc mentions that fact, and it is a pretty important thing the method does. Not only that, but the behavior is different from what it does to the SparkConf (set if not yet set, instead of always set).
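For illustration, the asymmetry being pointed out (a minimal sketch; the spark.foo key is made up):

```scala
import org.apache.spark.SparkConf

// Minimal illustration of the asymmetry: SparkConf.set always overwrites,
// while sys.props.getOrElseUpdate only writes when the key is absent.
object SetVsGetOrElseUpdate extends App {
  val conf = new SparkConf(loadDefaults = false).set("spark.foo", "old")
  conf.set("spark.foo", "new")                   // always overwrites -> "new"

  sys.props("spark.foo") = "old"
  sys.props.getOrElseUpdate("spark.foo", "new")  // key present -> keeps "old"

  println(conf.get("spark.foo"))   // new
  println(sys.props("spark.foo"))  // old
}
```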
The loadDefaultSparkProperties method a few lines above does that, so I thought it was common practice to load things there because they might be reused elsewhere. Also, this method is used on the dispatcher side only. I can remove it for sure.
removed.
Force-pushed from cfadc06 to 946202d.
Test build #68805 has finished for PR 14650 at commit
Since @vanzin has looked at this most closely, I'd defer to him for a merge when you both think it's ready.
LGTM. There are a few style nits that I'll fix during the merge to avoid another round. Merging to master.
@vanzin if there is anything to fix, I can do it; if you want, I can do another pass to check it against the style guide, e.g. avoiding imports such as import org.apache.spark.util._. I thought that when you did the 2nd pass you were OK with the result.
This is already merged. Maybe there's something wrong with the apache / github sync.
Guys, this broke the build. Please don't use Option.contains in the future; it is both confusing and does not work with Scala 2.10.
I see you already pushed a fix, thanks.
Adds a --conf option to set Spark configuration properties in the Mesos dispatcher. Properties provided with --conf take precedence over properties within the properties file. The reason for this PR is that for simple configuration or testing purposes we need to provide a properties file (ideally a shared one for a cluster) even if we just provide a single property. Manually tested. Author: Stavros Kontopoulos <st.kontopoulos@gmail.com> Author: Stavros Kontopoulos <stavros.kontopoulos@lightbend.com> Closes apache#14650 from skonto/dipatcher_conf.
What changes were proposed in this pull request?
Adds a --conf option to set Spark configuration properties in the Mesos dispatcher.
Properties provided with --conf take precedence over properties within the properties file.
The reason for this PR is that, for simple configuration or testing purposes, we need to provide a properties file (ideally a shared one for a cluster) even if we just want to set a single property.
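The precedence rule amounts to something like the following (a schematic sketch, not the dispatcher's actual code; the values are made up):

```scala
// Schematic sketch of the documented precedence: start from the properties
// file, then let explicit --conf pairs win on key collisions.
object PrecedenceSketch extends App {
  val fromPropertiesFile = Map(
    "spark.mesos.dispatcher.webui.url" -> "http://file-host:8081",
    "spark.cores.max" -> "4")
  val fromCliConf = Map(
    "spark.mesos.dispatcher.webui.url" -> "http://cli-host:8081")

  // ++ keeps the right-hand value on duplicate keys, so --conf wins.
  val effective = fromPropertiesFile ++ fromCliConf

  assert(effective("spark.mesos.dispatcher.webui.url") == "http://cli-host:8081")
  assert(effective("spark.cores.max") == "4")
}
```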
How was this patch tested?
Manually tested.