
Commit 77b287e

mccheah authored and foxish committed
Development workflow documentation for the current state of the world. (alteryx#20)
* Development workflow documentation for the current state of the world.
* Address comments.
* Clarified code change and added ticket link
1 parent 909b281 · commit 77b287e

1 file changed: 56 additions & 0 deletions

---
layout: global
title: Spark on Kubernetes Development
---

[Kubernetes](https://kubernetes.io/) is a framework for easily deploying, scaling, and managing containerized applications. Running Spark jobs on a Kubernetes cluster lets users manage them alongside their other Kubernetes-managed applications. For more about the motivation for adding this feature, see the umbrella JIRA ticket that tracks this project: [SPARK-18278](https://issues.apache.org/jira/browse/SPARK-18278).

This submodule is an initial implementation of Kubernetes support as a cluster manager for Spark, alongside Mesos, Hadoop YARN, and Standalone mode. This document summarizes the important matters to keep in mind when developing this feature.

# Building Spark with Kubernetes Support

To build Spark with Kubernetes support, use the `kubernetes` profile when invoking Maven. For example, to compile the Kubernetes core implementation module along with its dependencies:

    build/mvn compile -Pkubernetes -pl resource-managers/kubernetes/core -am
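
The same profile and module flags combine with other standard Maven phases. As a sketch (assuming the module's unit tests follow the usual Spark Maven conventions), the unit tests for the core module could be run with:

    # Compile and run the unit tests for the Kubernetes core module and its dependencies
    build/mvn test -Pkubernetes -pl resource-managers/kubernetes/core -am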

To build a distribution of Spark with Kubernetes support, use the `dev/make-distribution.sh` script, and add the `kubernetes` profile as part of the build arguments. Any other build arguments can be specified as one would expect when building Spark normally. For example, to build Spark against Hadoop 2.7 and Kubernetes:

    dev/make-distribution.sh --tgz -Phadoop2.7 -Pkubernetes
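
The `--tgz` flag leaves a tarball of the distribution in the project root. As a rough check of the result (the exact archive name depends on the Spark and Hadoop versions being built, so the wildcards below are an assumption):

    # Unpack the distribution and confirm the Kubernetes jars are included
    tar -xzf spark-*.tgz
    ls spark-*/jars | grep -i kubernetes
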
# Kubernetes Code Modules

Below is a list of the submodules for this cluster manager and what they do; an illustrative directory sketch follows the list.

* `core`: Implementation of the Kubernetes cluster manager support.
* `integration-tests`: Integration tests for the project.
* `docker-minimal-bundle`: Base Dockerfiles for the driver and the executors. The Dockerfiles are used for integration tests as well as being provided in packaged distributions of Spark.
* `integration-tests-spark-jobs`: Spark jobs that are only used in integration tests.
* `integration-tests-spark-jobs-helpers`: Dependencies for the Spark jobs used in integration tests. These dependencies are separated out to facilitate testing the shipping of jars to drivers running on Kubernetes clusters.
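
Assuming the standard Spark source layout, these modules live under `resource-managers/kubernetes/` (an illustrative sketch, not an authoritative listing):

    resource-managers/kubernetes/
    ├── core/
    ├── docker-minimal-bundle/
    ├── integration-tests/
    ├── integration-tests-spark-jobs/
    └── integration-tests-spark-jobs-helpers/
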
# Running the Kubernetes Integration Tests

Note that the integration test framework is currently being heavily revised and is subject to change.

Running any of the integration tests requires including the `kubernetes-integration-tests` profile in the build command. To prepare the environment for running the integration tests, the `pre-integration-test` step must be run in Maven on the `resource-managers/kubernetes/integration-tests` module:

    build/mvn pre-integration-test -Pkubernetes -Pkubernetes-integration-tests -pl resource-managers/kubernetes/integration-tests -am

Afterwards, the integration tests can be executed with Maven or from your IDE. Note that when running tests from an IDE, the `pre-integration-test` phase must be run every time the Spark main code changes. When running tests from the command line, the `pre-integration-test` phase should automatically be invoked if the `integration-test` phase is run.
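
For example, a single command-line invocation that prepares the environment and then executes the tests would look like the following (a sketch based on the phase ordering described above; the exact goals may shift as the test framework is revised):

    # The integration-test phase triggers pre-integration-test automatically first
    build/mvn integration-test -Pkubernetes -Pkubernetes-integration-tests -pl resource-managers/kubernetes/integration-tests -am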

# Usage Guide

See the [usage guide](../../docs/running-on-kubernetes.md) for more information.
