---
layout: global
title: Spark on Kubernetes Development
---

[Kubernetes](https://kubernetes.io/) is a framework for easily deploying, scaling, and managing containerized
applications. Supporting Kubernetes as a cluster manager lets users run their Spark jobs on a Kubernetes cluster
alongside their other Kubernetes-managed applications. For more about the motivations for adding this feature, see the
umbrella JIRA ticket that tracks this project: [SPARK-18278](https://issues.apache.org/jira/browse/SPARK-18278).

This submodule is an initial implementation of Kubernetes as a supported cluster manager for Spark, alongside Mesos,
Hadoop YARN, and Standalone mode. This document summarizes important points to keep in mind when developing this
feature.

# Building Spark with Kubernetes Support

To build Spark with Kubernetes support, use the `kubernetes` profile when invoking Maven. For example, to compile just
the Kubernetes core implementation module along with its dependencies:

    build/mvn compile -Pkubernetes -pl resource-managers/kubernetes/core -am

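The same `-pl`/`-am` pattern works for other Maven goals as well. For example, a sketch of running the unit tests of
the core module (assuming Spark's standard Maven test setup; the specific goal is an illustration, not a command this
document prescribes):

    build/mvn test -Pkubernetes -pl resource-managers/kubernetes/core -am
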
To build a distribution of Spark with Kubernetes support, use the `dev/make-distribution.sh` script, and add the
`kubernetes` profile as part of the build arguments. Any other build arguments can be specified as one would expect when
building Spark normally. For example, to build Spark against Hadoop 2.7 and Kubernetes:

    dev/make-distribution.sh --tgz -Phadoop-2.7 -Pkubernetes

# Kubernetes Code Modules

Below is a list of the submodules for this cluster manager and what they do; a sketch of building an individual
submodule follows the list.

* `core`: Implementation of the Kubernetes cluster manager support.
* `integration-tests`: Integration tests for the project.
* `docker-minimal-bundle`: Base Dockerfiles for the driver and the executors. The Dockerfiles are used for integration
  tests as well as being provided in packaged distributions of Spark.
* `integration-tests-spark-jobs`: Spark jobs that are only used in integration tests.
* `integration-tests-spark-jobs-helpers`: Dependencies for the Spark jobs used in integration tests. These dependencies
  are separated out to facilitate testing the shipping of jars to drivers running on Kubernetes clusters.

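As a sketch only (it reuses the same `-pl`/`-am` pattern shown above; the goal chosen here is an assumption, not a
command prescribed by this document), an individual submodule such as `integration-tests-spark-jobs` can be packaged on
its own with:

    build/mvn package -Pkubernetes -pl resource-managers/kubernetes/integration-tests-spark-jobs -am
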
# Running the Kubernetes Integration Tests

Note that the integration test framework is currently being heavily revised and is subject to change.

Running any of the integration tests requires including the `kubernetes-integration-tests` profile in the build command.
To prepare the environment for running the integration tests, the `pre-integration-test` phase must be run in Maven on
the `resource-managers/kubernetes/integration-tests` module:

    build/mvn pre-integration-test -Pkubernetes -Pkubernetes-integration-tests -pl resource-managers/kubernetes/integration-tests -am

Afterwards, the integration tests can be executed with Maven or your IDE. Note that when running tests from an IDE, the
`pre-integration-test` phase must be run every time the Spark main code changes. When running tests from the
command line, the `pre-integration-test` phase should automatically be invoked if the `integration-test` phase is run.
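
For example, a sketch of running the integration tests end to end from the command line (the flags mirror the
`pre-integration-test` command above; invoking the `integration-test` phase this way relies on the standard Maven
lifecycle rather than a command prescribed by this document):

    build/mvn integration-test -Pkubernetes -Pkubernetes-integration-tests -pl resource-managers/kubernetes/integration-tests -am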

# Usage Guide

See the [usage guide](../../docs/running-on-kubernetes.md) for more information.