Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-5388] Provide a stable application submission gateway for standalone cluster mode #4216

Closed
wants to merge 59 commits into from

Conversation

andrewor14
Copy link
Contributor

The goal is to provide a stable, REST-based application submission gateway that is not inherently based on Akka, which is unstable across versions. This PR targets standalone cluster mode, but is implemented in a general enough manner that can be potentially extended to other modes in the future. Client mode is currently not included in the changes here because there are many more Akka messages exchanged there.

As of the changes here, the Master will advertise two ports, 7077 and 6066. We need to keep around the old one (7077) for client mode and older versions of Spark submit. However, all new versions of Spark submit will use the REST gateway (6066).

By the way this includes ~700 lines of tests and ~200 lines of license.

Andrew Or added 14 commits January 16, 2015 17:51
This commit introduces type-safe schemas for all messages exchanged
in the REST protocol. Each message is expected to contain an ACTION
field that has only one possible value for each message type.

Before the message is sent, we validate that all required fields
are in fact present, and that the value of the action field is the
correct type.

The next step is to actually integrate this in standalone mode.
This commit embeds the REST server in the standalone Master and
forces Spark submit to submit applications through the REST client.
This is the first working end-to-end implementation of a stable
submission interface in standalone cluster mode.
Previously the REST protocol was very explicitly tied to the
standalone mode. This commit frees the protocol from this
restriction.
Previously APP_ARGs, SPARK_PROPERTYs and ENVIRONMENT_VARIABLEs
will appear in the JSON at random places. Now they are grouped
together at the end of the JSON blob.
This is applicable to application arguments, Spark properties, and
environment variables, all of which were previously handled through
parameterized fields, which were cumbersome to parse. Since JSON
naturally supports nesting, we should take advantage of it too.

This commit refactors the code that converts the messages to and
from JSON in a way that subclasses can easily override the conversion
behavior without duplicating code.
This involves refactoring SparkSubmit a little to put the code
that launches the REST client in the right place. This commit also
adds port retry logic in the REST server, which was previously
missing.
This commit makes the StandaloneRestServer actually handle status
requests. The existing polling behavior from o.a.s.deploy.Client
is also implemented in the StandaloneRestClient and amended.

Additionally, the validation behavior was confusing before this
commit. Previously the error message would seem to indicate that
the user constructed a malformed message even if the message was
constructed on the server side. This commit ensures that the error
message is different for these two cases.
Otherwise it's a little ambiguous what we mean by SPARK_VERSION.
This commit does the following major things:
  (1) Refactor SparkSubmit such that SparkSubmitSuite now passes
  (2) Refactor the REST messages such that it's easier to test them
  (3) Add type-safety validation logic for REST fields
  (4) Move REST fields to its own file
  (5) Maintain ordering of fields added to REST messages
  (6) Add an option to disable the REST server, as we do in tests
@SparkQA
Copy link

SparkQA commented Jan 27, 2015

Test build #26138 has started for PR 4216 at commit 6568ca5.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Jan 27, 2015

Test build #26138 has finished for PR 4216 at commit 6568ca5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • * (4) the main class for the child

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26138/
Test FAILed.

The motivation is to fix failing tests SparkSubmitSuite and
DriverSuite.
@SparkQA
Copy link

SparkQA commented Jan 27, 2015

Test build #26186 has started for PR 4216 at commit d8d3717.

  • This patch merges cleanly.

@vanzin
Copy link
Contributor

vanzin commented Jan 27, 2015

Hi @andrewor14 , I see the bug is targeted at 1.3. This seems like a pretty considerable change to come so close to the branching point. Should we target it at 1.4 instead?

@SparkQA
Copy link

SparkQA commented Jan 27, 2015

Test build #26186 has finished for PR 4216 at commit d8d3717.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • * (4) the main class for the child

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26186/
Test PASSed.

Andrew Or added 2 commits January 27, 2015 15:56
This is actually non-trivial because we must run standalone mode
instead of relying on the existing local-cluster mode. This means
we must manually start our own Master and Workers, and provide a
real jar when submitting the test application, which involves
manually packaging our own jar. Further, since the driver output
is difficult to obtain programmatically in cluster mode, we need
to write the results to a special file and verify them later.
@andrewor14 andrewor14 changed the title [WIP][SPARK-5388] Provide a stable application submission gateway [SPARK-5388] Provide a stable application submission gateway Jan 28, 2015
@SparkQA
Copy link

SparkQA commented Jan 28, 2015

Test build #26217 has started for PR 4216 at commit 5879286.

  • This patch merges cleanly.

@andrewor14
Copy link
Contributor Author

@vanzin I understand your concern. Although this patch adds many lines, its scope is actually limited only to standalone cluster mode, and the default submit behavior is actually unchanged. The idea is to provide only an alternative to the existing submission gateway, not a replacement (at least for now). Also, it should be fairly straightforward to test this as we only need to test one mode. I will be sure to reiterate on review comments promptly so as not to potentially delay the release.

@SparkQA
Copy link

SparkQA commented Jan 28, 2015

Test build #26218 has started for PR 4216 at commit efa5e18.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Feb 6, 2015

Test build #26892 has started for PR 4216 at commit d2b1ef8.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Feb 6, 2015

Test build #26892 has finished for PR 4216 at commit d2b1ef8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class MasterStateResponse(
    • class LocalSparkCluster(
    • * (4) the main class for the child
    • case class BoundPortsResponse(actorPort: Int, webUIPort: Int, restPort: Option[Int])
    • throw new SubmitRestMissingFieldException("Main class is missing.")

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26892/
Test FAILed.

@pwendell
Copy link
Contributor

pwendell commented Feb 6, 2015

@andrewor14 okay I think this time you are causing the test failure :)

val sparkProperties: HashMap[String, String] = new HashMap[String, String]()

// Standalone cluster mode only
var useRest: Boolean = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not actually settable by the user, so for this one it might be good to indicate it's used only internally

@pwendell
Copy link
Contributor

pwendell commented Feb 6, 2015

I did a quick pass, this is looking good, but there are some comments on the JIRA worth addressing.

@SparkQA
Copy link

SparkQA commented Feb 6, 2015

Test build #26902 has started for PR 4216 at commit b9e2a08.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Feb 6, 2015

Test build #26903 has started for PR 4216 at commit dfe4bd7.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Feb 6, 2015

Test build #26902 has finished for PR 4216 at commit b9e2a08.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class MasterStateResponse(
    • class LocalSparkCluster(
    • * (4) the main class for the child
    • case class BoundPortsResponse(actorPort: Int, webUIPort: Int, restPort: Option[Int])
    • throw new SubmitRestMissingFieldException("Main class is missing.")

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26902/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Feb 6, 2015

Test build #26903 has finished for PR 4216 at commit dfe4bd7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class MasterStateResponse(
    • class LocalSparkCluster(
    • * (4) the main class for the child
    • case class BoundPortsResponse(actorPort: Int, webUIPort: Int, restPort: Option[Int])
    • throw new SubmitRestMissingFieldException("Main class is missing.")

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26903/
Test PASSed.

Andrew Or added 2 commits February 6, 2015 14:18
This allows us to use the raw values for these types of fields
in the JSON instead of a string version of them.
Conflicts:
	core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@andrewor14
Copy link
Contributor Author

Alright, as of the latest commit I believe I have addressed all comments both on this PR and on the JIRA. This PR is ready to go from my side.

@SparkQA
Copy link

SparkQA commented Feb 6, 2015

Test build #26967 has started for PR 4216 at commit 8d7ce07.

  • This patch merges cleanly.

@pwendell
Copy link
Contributor

pwendell commented Feb 6, 2015

Okay, this LGTM pending tests. I believe all of the feedback has been addressed. Thanks @vanzin for thorough and very helpful feedback on this patch. This protocol looks much better than it did at the beginning.

@SparkQA
Copy link

SparkQA commented Feb 6, 2015

Test build #26967 has finished for PR 4216 at commit 8d7ce07.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class MasterStateResponse(
    • class LocalSparkCluster(
    • * (4) the main class for the child
    • case class BoundPortsResponse(actorPort: Int, webUIPort: Int, restPort: Option[Int])
    • throw new SubmitRestMissingFieldException("Main class is missing.")

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26967/
Test PASSed.

asfgit pushed a commit that referenced this pull request Feb 6, 2015
…dalone cluster mode

The goal is to provide a stable, REST-based application submission gateway that is not inherently based on Akka, which is unstable across versions. This PR targets standalone cluster mode, but is implemented in a general enough manner that can be potentially extended to other modes in the future. Client mode is currently not included in the changes here because there are many more Akka messages exchanged there.

As of the changes here, the Master will advertise two ports, 7077 and 6066. We need to keep around the old one (7077) for client mode and older versions of Spark submit. However, all new versions of Spark submit will use the REST gateway (6066).

By the way this includes ~700 lines of tests and ~200 lines of license.

Author: Andrew Or <andrew@databricks.com>

Closes #4216 from andrewor14/rest and squashes the following commits:

8d7ce07 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest
6f0c597 [Andrew Or] Use nullable fields for integer and boolean values
dfe4bd7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest
b9e2a08 [Andrew Or] Minor comments
02b5cea [Andrew Or] Fix tests
d2b1ef8 [Andrew Or] Comment changes + minor code refactoring across the board
9c82a36 [Andrew Or] Minor comment and wording updates
b4695e7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest
c9a8ad7 [Andrew Or] Do not include appResource and mainClass as properties
6fc7670 [Andrew Or] Report REST server response back to the user
40e6095 [Andrew Or] Pass submit parameters through system properties
cbd670b [Andrew Or] Include unknown fields, if any, in server response
9fee16f [Andrew Or] Include server protocol version on mismatch
09f873a [Andrew Or] Fix style
8188e61 [Andrew Or] Upgrade Jackson from 2.3.0 to 2.4.4
37538e0 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest
9165ae8 [Andrew Or] Fall back to Akka if endpoint was not REST
252d53c [Andrew Or] Clean up server error handling behavior further
c643f64 [Andrew Or] Fix style
bbbd329 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest
792e112 [Andrew Or] Use specific HTTP response codes on error
f98660b [Andrew Or] Version the protocol and include it in REST URL
721819f [Andrew Or] Provide more REST-like interface for submit/kill/status
581f7bf [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest
9e0d1af [Andrew Or] Move some classes around to reduce number of files (minor)
42e5de4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest
1f1c03f [Andrew Or] Use Jackson's DefaultScalaModule to simplify messages
9229433 [Andrew Or] Reduce duplicate naming in REST field
ade28fd [Andrew Or] Clean up REST response output in Spark submit
b2fef8b [Andrew Or] Abstract the success field to the general response
6c57b4b [Andrew Or] Increase timeout in end-to-end tests
bf696ff [Andrew Or] Add checks for enabling REST when using kill/status
7ee6737 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest
e2f7f5f [Andrew Or] Provide more safeguard against missing fields
9581df7 [Andrew Or] Clean up uses of exceptions
914fdff [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest
e2104e6 [Andrew Or] stable -> rest
3db7379 [Andrew Or] Fix comments and name fields for better error messages
8d43486 [Andrew Or] Replace SubmitRestProtocolAction with class name
df90e8b [Andrew Or] Use Jackson for JSON de/serialization
d7a1f9f [Andrew Or] Fix local cluster tests
efa5e18 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest
e42c131 [Andrew Or] Add end-to-end tests for standalone REST protocol
837475b [Andrew Or] Show the REST port on the Master UI
d8d3717 [Andrew Or] Use a daemon thread pool for REST server
6568ca5 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest
77774ba [Andrew Or] Minor fixes
206cae4 [Andrew Or] Refactor and add tests for the REST protocol
63c05b3 [Andrew Or] Remove MASTER as a field (minor)
9e21b72 [Andrew Or] Action -> SparkSubmitAction (minor)
51c5ca6 [Andrew Or] Distinguish client and server side Spark versions
b44e103 [Andrew Or] Implement status requests + fix validation behavior
120ab9d [Andrew Or] Support kill and request driver status through SparkSubmit
544de1d [Andrew Or] Major clean ups in code and comments
e958cae [Andrew Or] Supported nested values in messages
484bd21 [Andrew Or] Specify an ordering for fields in SubmitDriverRequestMessage
6ff088d [Andrew Or] Rename classes to generalize REST protocol
af9d9cb [Andrew Or] Integrate REST protocol in standalone mode
53e7c0e [Andrew Or] Initial client, server, and all the messages

(cherry picked from commit 1390e56)
Signed-off-by: Patrick Wendell <patrick@databricks.com>
@asfgit asfgit closed this in 1390e56 Feb 6, 2015
@andrewor14 andrewor14 deleted the rest branch February 7, 2015 00:00
dongjoon-hyun pushed a commit that referenced this pull request Dec 27, 2023
### What changes were proposed in this pull request?
This pr aims to upgrade jackson from 2.16.0 to 2.16.1

### Why are the changes needed?
The new version bring some fix:

- [#4200](FasterXML/jackson-databind#4200): JsonSetter(contentNulls = FAIL) is ignored in delegating JsonCreator argument
- [#4216](FasterXML/jackson-databind#4216): Primitive array deserializer not being captured by DeserializerModifier
- [#4219](FasterXML/jackson-databind#4219): JsonNode.findValues() and findParents() missing expected values in 2.16.0

The full release notes as follows:

- https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.16.1

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass Github Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44494 from LuciferYang/SPARK-46508.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Aug 7, 2024
This pr aims to upgrade jackson from 2.16.0 to 2.16.1

The new version bring some fix:

- [apache#4200](FasterXML/jackson-databind#4200): JsonSetter(contentNulls = FAIL) is ignored in delegating JsonCreator argument
- [apache#4216](FasterXML/jackson-databind#4216): Primitive array deserializer not being captured by DeserializerModifier
- [apache#4219](FasterXML/jackson-databind#4219): JsonNode.findValues() and findParents() missing expected values in 2.16.0

The full release notes as follows:

- https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.16.1

No

Pass Github Actions

No

Closes apache#44494 from LuciferYang/SPARK-46508.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants