
[SPARK-4449][Core] Specify port range in Spark #5722


Closed
wants to merge 21 commits

Conversation

WangTaoTheTonic
Contributor

Based on #3314, this uses a range for port retries, per comments from @srowen and @tgravescs.

Many function signatures change: a user can now set "spark.*.port" to a string like "a:b", in which "a" is the minimum port services will start on and "b" the maximum.

Backward compatibility is still guaranteed: a user can still use a single number as a port's value.
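
For illustration, a minimal sketch of the proposed syntax (property names are standard Spark port settings; the range form is the one proposed here, not merged Spark behavior):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.ui.port", "4040:4055")   // proposed range form: try 4040 through 4055
  .set("spark.driver.port", "51000")   // legacy single-port form still accepted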

@SparkQA

SparkQA commented Apr 27, 2015

Test build #30967 has finished for PR 5722 at commit d4da80b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MasterWebUI(val master: Master, requestedPort: String)
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 28, 2015

Test build #31096 has finished for PR 5722 at commit a01a20e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch adds the following new dependencies:
    • tachyon-0.6.4.jar
    • tachyon-client-0.6.4.jar
  • This patch removes the following dependencies:
    • tachyon-0.5.0.jar
    • tachyon-client-0.5.0.jar

@SparkQA

SparkQA commented Apr 28, 2015

Test build #31108 has finished for PR 5722 at commit db0d3ee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MasterWebUI(val master: Master, requestedPort: String)
  • This patch does not change any dependencies.

@WangTaoTheTonic
Contributor Author

@srowen @tgravescs Could you guys take a look at this? Thanks~

@SparkQA

SparkQA commented Apr 29, 2015

Test build #31232 has finished for PR 5722 at commit a149eba.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MasterWebUI(val master: Master, requestedPort: String)
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 29, 2015

Test build #31257 has finished for PR 5722 at commit 083760a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MasterWebUI(val master: Master, requestedPort: String)
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 30, 2015

Test build #31416 has finished for PR 5722 at commit 5e2940d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MasterWebUI(val master: Master, requestedPort: String)
  • This patch does not change any dependencies.

s"Maximum port ${maxPort} should be between 1024 and 65535 (inclusive)," +
" or 0 for a random free port.")
require(minPort <= maxPort, s"Minimum port ${minPort} should not be" +
s" greater than the maximum port ${maxPort}.")

val serviceString = if (serviceName.isEmpty) "" else s" '$serviceName'"
val maxRetries = portMaxRetries(conf)
Contributor

So you can have a case where maxPort - minPort > maxRetries and that will potentially throw an exception the user doesn't expect, right?

I think the right thing to do here is to set maxPort to minPort + maxRetries if it's not defined. And potentially deprecate spark.port.maxRetries in the process.
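
A minimal sketch of that suggestion, assuming hypothetical names (this is not code from the patch):

// Default the top of the range from spark.port.maxRetries when the user
// gives only a minimum port.
def resolveRange(minPort: Int, maxPortOpt: Option[Int], maxRetries: Int): (Int, Int) = {
  val maxPort = maxPortOpt.getOrElse(minPort + maxRetries)
  require(minPort <= maxPort,
    s"Minimum port $minPort should not be greater than maximum port $maxPort.")
  (minPort, maxPort)
}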

Contributor Author

Ehh.. I don't think I fully understand. What do you mean by "that will potentially throw an exception the user doesn't expect"? If maxPort - minPort > maxRetries, the bind is retried from minPort up to minPort + maxRetries, and an exception is thrown if no port in that range is available.

I think minPort and maxPort are the range the port will be limited to (on success), and maxRetries is the maximum number of bind attempts.

On second thought, these three configs were proposed for the same reason (limit port range #1777), and they may overlap, but maxRetries is kind of a global setting.

Anyway, I will set maxPort to minPort + maxRetries if it's not defined, per your comment, as 65535 is rather too big a default value.

@SparkQA

SparkQA commented May 4, 2015

Test build #31753 has finished for PR 5722 at commit 28a3adf.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MasterWebUI(val master: Master, requestedPort: String)

@WangTaoTheTonic
Contributor Author

Test error is:
[error] oro#oro;2.0.8!oro.jar origin location must be absolute: file:/home/jenkins/.m2/repository/oro/oro/2.0.8/oro-2.0.8.jar.

Seems like it has nothing to do with this patch. So Jenkins, retest this please.

@WangTaoTheTonic
Contributor Author

Jenkins, test this please.

@SparkQA

SparkQA commented May 4, 2015

Test build #31756 has finished for PR 5722 at commit 28a3adf.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MasterWebUI(val master: Master, requestedPort: String)

@WangTaoTheTonic
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented May 4, 2015

Test build #31757 has finished for PR 5722 at commit 28a3adf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MasterWebUI(val master: Master, requestedPort: String)


val serviceString = if (serviceName.isEmpty) "" else s" '$serviceName'"
val maxRetries = portMaxRetries(conf)
for (offset <- 0 to maxRetries) {
Contributor

So you still have the same bug here. Imagine your port range is 4200:4300 (100 ports to try), but your spark.port.maxRetries is the default (16). You'll only try 16 ports before throwing an exception, when the user explicitly asked for a 100-port range to be tried.
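
One possible way to honor an explicit range, sketched with hypothetical names (not what the patch currently does):

// Let an explicit range size win over the global retry cap; fall back to
// spark.port.maxRetries when no real range was given.
def portsToTry(minPort: Int, maxPort: Int, maxRetries: Int): Seq[Int] = {
  val attempts = if (maxPort > minPort) maxPort - minPort else maxRetries
  (0 to attempts).map(minPort + _)
}
// portsToTry(4200, 4300, 16) yields all 101 ports from 4200 to 4300, while
// portsToTry(4200, 4200, 16) keeps the legacy 17 attempts.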

Contributor Author

Nah... like I said above:

I think minPort and maxPort are the range the port will be limited to (on success), and maxRetries is the maximum number of bind attempts.
On second thought, these three configs were proposed for the same reason (limit port range #1777), and they may overlap, but maxRetries is kind of a global setting.

So basically I think spark.port.maxRetries is kind of a global setting while min:max is specific to a single port.

Contributor

Then why bother with port ranges at all? All port ranges are basically port:port + spark.port.maxRetries with your current code.

Contributor Author

In case max - min < maxRetries they are not.

Member

@vanzin since you may for example want a single port for one setting, a tight range of 3 for another, and a wide range of 100 for another. I could conceive that you want to choose from potentially 100 ports, but not retry 100 times, or specify 1 port but actually allow some retries. But I do agree that maxRetries becomes largely redundant with this change.

Contributor

It's not only redundant, it's super confusing. The behavior is not "define a min and max port", but rather "define a min and max port, but hey, the max port is actually bounded by the max number of retries, which is a global configuration that applies to all ports".

Member

It's a good question how to reconcile this with current behavior, yes. Right now, everything is a single port that is implicitly allowed to try many different ports, up to a number of retries. After this, a single port will mean a single port only. I don't think both can be supported at the same time.

The main issue I foresee is that port collisions on default ports will no longer be resolved automatically. I suppose that where a user has set something to an explicit port, he/she probably doesn't want or expect different ports to get used. But the user who has configured nothing may be surprised that starting two Spark drivers on one machine no longer works.

The various defaults in Spark (e.g. web UI on 4040) might change to "[4040,4055]" to match current behavior. That would be more explicit, too. But it makes Spark and users hard-code this arbitrary "max retries" into the ranges.

Possibly there could be a new syntax like "[4040,]", meaning "try ports from 4040". Then maxRetries keeps its current role of controlling just how many more ports to try.
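
A hedged sketch of parsing such a syntax (illustrative only, not code from this PR):

// "4040" -> start at 4040 with legacy retries; "[4040,4055]" -> explicit
// range; "[4040,]" -> open-ended, letting maxRetries decide the width.
val RangePattern = """\[(\d+),(\d*)\]""".r

def parsePortSpec(spec: String): (Int, Option[Int]) = spec match {
  case RangePattern(min, "")  => (min.toInt, None)
  case RangePattern(min, max) => (min.toInt, Some(max.toInt))
  case single                 => (single.toInt, None)
}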

Contributor

After this, a single port will mean a single port only.

I don't think that's the case in the current code. And in any case, the behaviour could not become that without changing the property names, otherwise it would break backwards compatibility.

In any case, it just seems extremely confusing to be able to specify a max but not have that really take effect unless you also modify some other property. That doesn't add anything to the existing behavior (which is already a range, except the max is implicit by being min + maxRetries). It makes the feature pretty useless in my view.

@SparkQA

SparkQA commented May 24, 2015

Test build #33440 has finished for PR 5722 at commit dff9542.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MasterWebUI(val master: Master, requestedPort: String)

@WangTaoTheTonic
Contributor Author

@srowen Is it ok to go?

@WangTaoTheTonic
Contributor Author

Looks like @vanzin is okay with this; what other concerns do you have, @srowen?

@SparkQA

SparkQA commented May 29, 2015

Test build #33737 has finished for PR 5722 at commit 4db351a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MasterWebUI(val master: Master, requestedPort: String)

@tgravescs
Contributor

I haven't been keeping up with this. @vanzin @srowen do you have further comments on this?

@andrewor14
Contributor

Any updates on this @vanzin @srowen? I took another quick look myself and found it kind of weird that we're passing strings as ports everywhere. What was the original motivation for doing this again? Revisiting the discussion about maxRetries: is it worth having an essentially different retry range for each port? Personally I'm not super sold on this feature unless there's a compelling use case for it.

@vanzin
Contributor

vanzin commented Jul 2, 2015

Code-wise I think this is OK, but I raised the same concern in a previous comment (#5722 (comment)). Basically, having all ranges be implicit (start port + a shared "max range size") seems to be the least problematic approach (you may still run into issues, but at least at that point it's not a configuration issue).

@WangTaoTheTonic
Contributor Author

@andrewor14 There are two motivations for this. The first is mentioned in #3314: we would like to control the retry range when a user sets some port to 0, but @srowen came up with this idea of setting each port to a range.

The other motivation is mentioned in #5657. When we start a public service on a specified port, it had better not retry, because once a retry happens clients will fail to connect to it.

@andrewor14
Contributor

The other motivation is mentioned in #5657. When we start a public service on a specified port, it had better not retry, because once a retry happens clients will fail to connect to it.

Users can just configure spark.port.maxRetries themselves, can't they?

@andrewor14 There are two motivations for this. The first is mentioned in #3314: we would like to control the retry range when a user sets some port to 0, but @srowen came up with this idea of setting each port to a range.

I see, the idea is that if a service has a random port (0), it will pick a random one within a particular range. However, I still don't see how much value that actually adds at the cost of invasive changes throughout the code base. It's weird that all the ports are now strings, even the ones that will never start at 0. I'm personally still not sold on the extra complexity this feature brings to the code.
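
A sketch of the behavior being described, with hypothetical names:

import scala.util.Random

// When the configured port is 0, pick a random candidate inside the
// user-supplied range instead of an arbitrary OS-chosen ephemeral port.
def candidatePort(configured: Int, minPort: Int, maxPort: Int): Int =
  if (configured == 0) minPort + Random.nextInt(maxPort - minPort + 1)
  else configured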

@srowen
Member

srowen commented Sep 2, 2015

@andrewor14 my short summary of the problem and/or appeal of a change is that, right now, if you set a port to P, then Spark will always try ports P to P+N, where N is spark.port.maxRetries. This makes sense for some ports (executors, etc.), even if you can imagine that in some cases you want executors to try 9000-9010 and web UIs to try 8000-8080 or something, to match firewall rules -- that is, a different N per port. More importantly, imagine running the web UI on port 80; it doesn't make sense to march through 81, 82, 83, as those are different well-known ports.

The problem is indeed that it's a fairly invasive change. I think it would go something like this:

  • If a port is specified as a single number, continue to try the following ports according to spark.port.maxRetries, for backward compatibility
  • Deprecate spark.port.maxRetries, I think
  • Introduce new "[min,max]" range syntax for ports
  • Change all the port handling code everywhere to handle (Int,Int) (see the sketch below)
  • Change port arg parsing code to handle both cases

I admit I think this would be a good change; I'm also put off by how much change it would be.
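
As a sketch of what the (Int, Int) plumbing in the list above could look like (hypothetical helper, not Spark's actual signature):

import java.net.BindException

// Services would receive a concrete (min, max) pair rather than a raw
// string or a single Int, and try each port in the range once.
def startServiceOnPortRange[T](range: (Int, Int))(tryStart: Int => Option[T]): (T, Int) = {
  val (min, max) = range
  var result: Option[(T, Int)] = None
  var port = min
  while (result.isEmpty && port <= max) {
    result = tryStart(port).map(service => (service, port))
    port += 1
  }
  result.getOrElse(throw new BindException(s"Could not bind any port in $min-$max"))
}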

@vanzin
Contributor

vanzin commented Sep 2, 2015

It's true that for some ports we want a well-defined, fixed value, but perhaps this is not the right way to fix it? For ports that require a well-defined value, isn't it better to make sure that the code that starts that service does not retry at all, instead of making configuration for other ports more complicated?

@tgravescs
Contributor

It's not necessarily a well-defined value but a well-defined range. This is how many network ACLs work: you can open up specific port ranges for clients to use.

@vanzin
Contributor

vanzin commented Sep 2, 2015

You can open up specific port ranges for clients to use.

But that's handled today with "port" + "maxPortRetries", right? The other use case is, for example, "the shuffle service must bind to port X, and no other port". That's independent from, for example, the port range where the user wants to set up his executor's akka service.

@andrewor14
Contributor

More importantly imagine running the web UI on port 80; it doesn't make sense to march through 81, 82, 83 as those are different well-known ports.

I would argue that Spark should never use any well-known ports (below 1024). I think it's OK for Spark to retry at 1025 and beyond. Also, I thought the original motivation was for ports that start at 0, where we pick a random port within a user-specified range narrower than the full 1024-65535 space. Is that not the case?

@andrewor14
Contributor

Either way, whether or not we want this feature, I agree with @vanzin that this is not the right fix. I've run Spark in a firewalled environment where I had to assign a special port to each component that would otherwise default to 0, e.g. connection manager at port 9910, block manager at port 9920, executor at port 9930, etc. This is effectively the same as specifying a port range if we set spark.port.maxRetries = 10.

@srowen
Member

srowen commented Sep 2, 2015

@vanzin you're arguing that you either want to retry all ports N times, or not at all (N=0). That does describe a lot of use cases, but it wouldn't let you, say, let some ports take on a fixed value and not others. If you mean, let N vary per port, yes, that seems to me exactly what a range is. 9999 or [9999,9999] means 9999 + 0 retries; [9999,10001] means 9999 + 2 retries.

Sure, it's an open question: how badly does anyone need to do that?

@andrewor14 yes this came up again for exactly this reason -- someone really wanted to run the web UIs on port 80 (can't find the JIRA just now). I also wondered if this is good practice; not sure it is. There's a separate issue right now that the logic won't allow you to do this. Even if that were fixed you're faced with having to use fixed ports for all ports if any of them (like port 80) must be fixed.

I'm pretty neutral -- if this were being written from scratch I'd push for range syntax, but not sure now just given the code change needed.

@vanzin
Contributor

vanzin commented Sep 2, 2015

If you mean, let N vary per port, yes, that seems to me exactly what a range is.

Correct, but my original question was the same as yours: is there a use case for that?

My view is that no, there isn't. Let's take two ports, "webui" and "akka", and assign different ranges to them. There's only a 10 port range for "webui", and a 1000 port range for "akka". In practice, though, it's as if both ranges were 10, since as soon as you start the 11th application, the webui can't bind and Spark will exit.

@srowen
Member

srowen commented Sep 2, 2015

What if you want to run your driver UI on exactly one port, but need the executors to try many ports because they may collide with other executors on the cluster from other apps, or even because two of your executors may be scheduled on one machine? Is that realistic? I wonder if I'm missing some reason this won't happen.

@vanzin
Contributor

vanzin commented Sep 2, 2015

What if you want to run your driver UI on exactly one port

Ok, I can somewhat see that argument, although I don't see why someone would require that Spark's UI bind to a specific port. If they're building some application on top of Spark that has its own UI, I'd understand, but then they'd have their own code to decide to which port that UI would bind.

@srowen
Member

srowen commented Sep 2, 2015

I think it's almost all about binding to port 80. It doesn't make sense to retry on 81, 82, etc. even if it makes sense for all the other daemons to retry 9000, 9001, etc. Right now the code won't even let you use port 80. Interesting question whether it should be allowed. But yeah that is I think the most legitimate use case you can construct for this, which isn't crazy at least.

@andrewor14
Contributor

Ok, I can somewhat see that argument, although I don't see why someone would require that Spark's UI bind to a specific port. If they're building some application on top of Spark that has its own UI, I'd understand, but then they'd have their own code to decide to which port that UI would bind.

Ah I see, I think the use case is if a third party app wants to always point to the SparkUI at 4040 and would rather fail the driver early if it can't bind to that port. I can buy that.

I think my main issue with this patch in its current state is that we're propagating the port string semantics everywhere. Instead I would like to see the complexity contained no further than parsing the config, i.e. we should not pass strings around everywhere and expect people to understand how to parse it downstream.

For instance, we could have something like:

val (port1: Int, maxRetries1: Int) = conf.getPortAndMaxRetries(..., "5050:5080")
val (port2: Int, _) = conf.getPortAndMaxRetries(..., "9988")

and this method will take care of all the validation and parsing.
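
A rough sketch of what such a helper could look like (the method name comes from the comment above; SparkConf has no such method, so the body is purely illustrative):

// Return the starting port plus a per-port retry budget, derived either
// from an explicit "min:max" range or from the global default.
def getPortAndMaxRetries(setting: String, defaultRetries: Int = 16): (Int, Int) =
  setting.split(":") match {
    case Array(port)     => (port.toInt, defaultRetries)
    case Array(min, max) => (min.toInt, max.toInt - min.toInt)
    case _               => throw new IllegalArgumentException(s"Invalid port setting: $setting")
  }

// getPortAndMaxRetries("5050:5080") == (5050, 30)
// getPortAndMaxRetries("9988")      == (9988, 16)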

@tgravescs
Contributor

So it seems that the documentation is not very clear on this also:

spark.port.maxRetries 16 Default maximum number of retries when binding to a port before giving up.

When I read that, it doesn't sound like the port is incremented from the starting point; I read it as retrying the same port 16 times.

That can be fixed in a separate jira though.

https://issues.apache.org/jira/browse/SPARK-10432

@WangTaoTheTonic
Contributor Author

Hi guys, sorry for seeing your comments so late. I think we have already reached agreement on the two cases this patch aims to fix:

One is that some ports should bind to a specified value so that clients can connect to them without retries.
The other is that for random ports (starting at 0) we want to control their range too.

The main problem is that the current code propagates the port range as a string everywhere; the better way is to convert the port argument at the parsing stage and then pass it as (min, max) through argument lists.

Did I understand this right? If yes, I will rebase/rewrite the code.

@srowen
Member

srowen commented Sep 25, 2015

I'd say it slightly differently: the goal is to define, for every configurable port, a range of 1 or more port numbers that may be bound to it. It could be a single port, or a range.

Yes, plumbing through a range of (Int,Int) seems better than String or Int, although the plumbing work will be significant.

The other problem is backwards compatibility. A single number like "8080" could be interpreted as the range "[8080,8080]", but that changes behavior: previously, it would cause retries through to port 8095. We can either make a breaking change, or preserve the behavior so that 8080 means [8080,8095] and callers use [8080,8080] to mean "only 8080", which is clunkier.

@WangTaoTheTonic
Contributor Author

I think we had better keep backwards compatibility and treat [min, max] as fine-grained control over an exact port range, while a bare "8080" with max retries is the coarse one. That means if a user configures a range, we ignore max retries; if a user specifies an exact value for a port, we retry from the configured value until we reach the max retries.

Maybe weakening the max-retries config as a first step and removing it over the next few releases is a better idea.

@andrewor14
Contributor

I would like to move the discussion to the JIRA, since there's still disagreement over the high-level approach and the changes here are not what we want. I think this requires a broader discussion; considering Spark 2.0 is around the corner, an invasive change is less of a concern, provided we do it the right way this time.

@WangTaoTheTonic would you mind closing this PR for now since there hasn't been activity for more than 2 months?

asfgit closed this in ce5fd40 on Dec 17, 2015