Add support for dynamic allocation via shuffle tracking (kubeflow#976)
liyinan926 authored Jul 16, 2020
1 parent 555c27a commit 3ca7472
Showing 9 changed files with 326 additions and 9 deletions.
114 changes: 112 additions & 2 deletions docs/api-docs.md
Original file line number Diff line number Diff line change
@@ -586,6 +586,21 @@ SparkUIConfiguration
<p>SparkUIOptions allows configuring the Service and the Ingress to expose the sparkUI</p>
</td>
</tr>
<tr>
<td>
<code>dynamicAllocation</code></br>
<em>
<a href="#sparkoperator.k8s.io/v1beta2.DynamicAllocation">
DynamicAllocation
</a>
</em>
</td>
<td>
<em>(Optional)</em>
<p>DynamicAllocation configures dynamic allocation, which becomes available for the Kubernetes
scheduler backend starting with Spark 3.0.</p>
</td>
</tr>
</table>
</td>
</tr>
@@ -969,6 +984,86 @@ executors to connect to the driver.</p>
<p>
<p>DriverState tells the current state of a spark driver.</p>
</p>
<h3 id="sparkoperator.k8s.io/v1beta2.DynamicAllocation">DynamicAllocation
</h3>
<p>
(<em>Appears on:</em>
<a href="#sparkoperator.k8s.io/v1beta2.SparkApplicationSpec">SparkApplicationSpec</a>)
</p>
<p>
<p>DynamicAllocation contains configuration options for dynamic allocation.</p>
</p>
<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<code>enabled</code></br>
<em>
bool
</em>
</td>
<td>
<p>Enabled controls whether dynamic allocation is enabled or not.</p>
</td>
</tr>
<tr>
<td>
<code>initialExecutors</code></br>
<em>
int32
</em>
</td>
<td>
<em>(Optional)</em>
<p>InitialExecutors is the initial number of executors to request. If .spec.executor.instances
is also set, the initial number of executors is set to the bigger of that and this option.</p>
</td>
</tr>
<tr>
<td>
<code>minExecutors</code></br>
<em>
int32
</em>
</td>
<td>
<em>(Optional)</em>
<p>MinExecutors is the lower bound for the number of executors if dynamic allocation is enabled.</p>
</td>
</tr>
<tr>
<td>
<code>maxExecutors</code></br>
<em>
int32
</em>
</td>
<td>
<em>(Optional)</em>
<p>MaxExecutors is the upper bound for the number of executors if dynamic allocation is enabled.</p>
</td>
</tr>
<tr>
<td>
<code>shuffleTrackingTimeout</code></br>
<em>
int64
</em>
</td>
<td>
<em>(Optional)</em>
<p>ShuffleTrackingTimeout controls the timeout in milliseconds for executors that are holding
shuffle data if shuffle tracking is enabled (true by default if dynamic allocation is enabled).</p>
</td>
</tr>
</tbody>
</table>
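The fields above map one-to-one onto a `dynamicAllocation` block in a `SparkApplication` manifest. A hypothetical spec fragment for illustration (all values, including the timeout, are examples, not defaults):

```yaml
spec:
  dynamicAllocation:
    enabled: true
    initialExecutors: 2
    minExecutors: 2
    maxExecutors: 10
    # Timeout in milliseconds for executors holding shuffle data.
    shuffleTrackingTimeout: 60000
```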
<h3 id="sparkoperator.k8s.io/v1beta2.ExecutorSpec">ExecutorSpec
</h3>
<p>
@@ -2072,6 +2167,21 @@ SparkUIConfiguration
<p>SparkUIOptions allows configuring the Service and the Ingress to expose the sparkUI</p>
</td>
</tr>
<tr>
<td>
<code>dynamicAllocation</code></br>
<em>
<a href="#sparkoperator.k8s.io/v1beta2.DynamicAllocation">
DynamicAllocation
</a>
</em>
</td>
<td>
<em>(Optional)</em>
<p>DynamicAllocation configures dynamic allocation, which becomes available for the Kubernetes
scheduler backend starting with Spark 3.0.</p>
</td>
</tr>
</tbody>
</table>
<h3 id="sparkoperator.k8s.io/v1beta2.SparkApplicationStatus">SparkApplicationStatus
@@ -2582,7 +2692,7 @@ string
<a href="#sparkoperator.k8s.io/v1beta2.SparkApplicationSpec">SparkApplicationSpec</a>)
</p>
<p>
<p>Specific SparkUI config parameters</p>
<p>SparkUIConfiguration is for driver UI specific configuration parameters.</p>
</p>
<table>
<thead>
@@ -2636,5 +2746,5 @@ map[string]string
<hr/>
<p><em>
Generated with <code>gen-crd-api-reference-docs</code>
on git commit <code>f313873</code>.
on git commit <code>555c27a</code>.
</em></p>
16 changes: 16 additions & 0 deletions docs/user-guide.md
@@ -34,6 +34,7 @@ The Kubernetes Operator for Apache Spark ships with a command-line tool called `
* [Using Container LifeCycle Hooks](#using-container-lifecycle-hooks)
* [Python Support](#python-support)
* [Monitoring](#monitoring)
* [Dynamic Allocation](#dynamic-allocation)
* [Working with SparkApplications](#working-with-sparkapplications)
* [Creating a New SparkApplication](#creating-a-new-sparkapplication)
* [Deleting a SparkApplication](#deleting-a-sparkapplication)
@@ -638,6 +639,21 @@ spec:

The operator automatically adds annotations such as `prometheus.io/scrape=true` to the driver and/or executor pods (depending on the values of `.spec.monitoring.exposeDriverMetrics` and `.spec.monitoring.exposeExecutorMetrics`) so that the metrics exposed on the pods can be scraped by a Prometheus server in the same cluster.

### Dynamic Allocation

The operator supports a limited form of [Spark Dynamic Resource Allocation](http://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation) through the shuffle tracking enhancement introduced in Spark 3.0.0, *without needing an external shuffle service* (which is not available in Kubernetes mode). See this [issue](https://issues.apache.org/jira/browse/SPARK-27963) for details on the enhancement. To enable this limited form of dynamic allocation, follow the example below:

```yaml
spec:
dynamicAllocation:
enabled: true
initialExecutors: 2
minExecutors: 2
maxExecutors: 10
```
Note that if dynamic allocation is enabled and both `.spec.dynamicAllocation.initialExecutors` and `.spec.executor.instances` are set, the initial number of executors to request is the bigger of the two.
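The precedence rule above, and the mapping from the `dynamicAllocation` spec onto Spark properties, can be sketched as follows. This is a hypothetical helper written for illustration, not the operator's actual implementation; only the `spark.dynamicAllocation.*` key names are taken from the commit (see `pkg/config/constants.go` in the diff):

```go
package main

import "fmt"

// DynamicAllocation mirrors the v1beta2 API type added in this commit.
type DynamicAllocation struct {
	Enabled                bool
	InitialExecutors       *int32
	MinExecutors           *int32
	MaxExecutors           *int32
	ShuffleTrackingTimeout *int64
}

// sparkConfFor sketches how the spec could translate into Spark properties.
// The key names match the constants added in pkg/config/constants.go; the
// helper itself is illustrative, not the operator's code.
func sparkConfFor(da *DynamicAllocation, instances *int32) map[string]string {
	conf := map[string]string{}
	if da == nil || !da.Enabled {
		return conf
	}
	conf["spark.dynamicAllocation.enabled"] = "true"
	// Shuffle tracking stands in for the external shuffle service,
	// which is not available in Kubernetes mode.
	conf["spark.dynamicAllocation.shuffleTracking.enabled"] = "true"
	var initial int32
	if da.InitialExecutors != nil {
		initial = *da.InitialExecutors
	}
	// If .spec.executor.instances is also set, the bigger value wins.
	if instances != nil && *instances > initial {
		initial = *instances
	}
	if initial > 0 {
		conf["spark.dynamicAllocation.initialExecutors"] = fmt.Sprintf("%d", initial)
	}
	if da.MinExecutors != nil {
		conf["spark.dynamicAllocation.minExecutors"] = fmt.Sprintf("%d", *da.MinExecutors)
	}
	if da.MaxExecutors != nil {
		conf["spark.dynamicAllocation.maxExecutors"] = fmt.Sprintf("%d", *da.MaxExecutors)
	}
	if da.ShuffleTrackingTimeout != nil {
		conf["spark.dynamicAllocation.shuffleTracking.timeout"] = fmt.Sprintf("%d", *da.ShuffleTrackingTimeout)
	}
	return conf
}

func main() {
	two, five, ten := int32(2), int32(5), int32(10)
	conf := sparkConfFor(&DynamicAllocation{Enabled: true, InitialExecutors: &two, MinExecutors: &two, MaxExecutors: &ten}, &five)
	// initialExecutors is the bigger of 2 (spec) and 5 (instances).
	fmt.Println(conf["spark.dynamicAllocation.initialExecutors"])
}
```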

## Working with SparkApplications

### Creating a New SparkApplication
17 changes: 17 additions & 0 deletions manifest/crds/sparkoperator.k8s.io_scheduledsparkapplications.yaml
@@ -1800,6 +1800,23 @@ spec:
type: object
type: array
type: object
dynamicAllocation:
properties:
enabled:
type: boolean
initialExecutors:
format: int32
type: integer
maxExecutors:
format: int32
type: integer
minExecutors:
format: int32
type: integer
shuffleTrackingTimeout:
format: int64
type: integer
type: object
executor:
properties:
affinity:
17 changes: 17 additions & 0 deletions manifest/crds/sparkoperator.k8s.io_sparkapplications.yaml
@@ -1786,6 +1786,23 @@ spec:
type: object
type: array
type: object
dynamicAllocation:
properties:
enabled:
type: boolean
initialExecutors:
format: int32
type: integer
maxExecutors:
format: int32
type: integer
minExecutors:
format: int32
type: integer
shuffleTrackingTimeout:
format: int64
type: integer
type: object
executor:
properties:
affinity:
26 changes: 25 additions & 1 deletion pkg/apis/sparkoperator.k8s.io/v1beta2/types.go
@@ -276,6 +276,10 @@ type SparkApplicationSpec struct {
// SparkUIOptions allows configuring the Service and the Ingress to expose the sparkUI
// +optional
SparkUIOptions *SparkUIConfiguration `json:"sparkUIOptions,omitempty"`
	// DynamicAllocation configures dynamic allocation, which becomes available for the Kubernetes
	// scheduler backend starting with Spark 3.0.
// +optional
DynamicAllocation *DynamicAllocation `json:"dynamicAllocation,omitempty"`
}

// BatchSchedulerConfiguration used to configure how to batch scheduling Spark Application
@@ -288,7 +292,7 @@ type BatchSchedulerConfiguration struct {
PriorityClassName *string `json:"priorityClassName,omitempty"`
}

// Specific SparkUI config parameters
// SparkUIConfiguration is for driver UI specific configuration parameters.
type SparkUIConfiguration struct {
// ServicePort allows configuring the port at service level that might be different from the targetPort.
// TargetPort should be the same as the one defined in spark.ui.port
@@ -631,6 +635,26 @@ type GPUSpec struct {
Quantity int64 `json:"quantity"`
}

// DynamicAllocation contains configuration options for dynamic allocation.
type DynamicAllocation struct {
// Enabled controls whether dynamic allocation is enabled or not.
Enabled bool `json:"enabled,omitempty"`
// InitialExecutors is the initial number of executors to request. If .spec.executor.instances
// is also set, the initial number of executors is set to the bigger of that and this option.
// +optional
InitialExecutors *int32 `json:"initialExecutors,omitempty"`
// MinExecutors is the lower bound for the number of executors if dynamic allocation is enabled.
// +optional
MinExecutors *int32 `json:"minExecutors,omitempty"`
// MaxExecutors is the upper bound for the number of executors if dynamic allocation is enabled.
// +optional
MaxExecutors *int32 `json:"maxExecutors,omitempty"`
// ShuffleTrackingTimeout controls the timeout in milliseconds for executors that are holding
// shuffle data if shuffle tracking is enabled (true by default if dynamic allocation is enabled).
// +optional
ShuffleTrackingTimeout *int64 `json:"shuffleTrackingTimeout,omitempty"`
}

// PrometheusMonitoringEnabled returns if Prometheus monitoring is enabled or not.
func (s *SparkApplication) PrometheusMonitoringEnabled() bool {
return s.Spec.Monitoring != nil && s.Spec.Monitoring.Prometheus != nil
41 changes: 41 additions & 0 deletions pkg/apis/sparkoperator.k8s.io/v1beta2/zz_generated.deepcopy.go

Some generated files are not rendered by default.

18 changes: 18 additions & 0 deletions pkg/config/constants.go
@@ -148,6 +148,24 @@ const (
SparkDriverKubernetesMaster = "spark.kubernetes.driver.master"
// SparkDriverServiceAnnotationKeyPrefix is the key prefix of annotations to be added to the driver service.
SparkDriverServiceAnnotationKeyPrefix = "spark.kubernetes.driver.service.annotation."
// SparkDynamicAllocationEnabled is the Spark configuration key for specifying if dynamic
// allocation is enabled or not.
SparkDynamicAllocationEnabled = "spark.dynamicAllocation.enabled"
// SparkDynamicAllocationShuffleTrackingEnabled is the Spark configuration key for
// specifying if shuffle data tracking is enabled.
SparkDynamicAllocationShuffleTrackingEnabled = "spark.dynamicAllocation.shuffleTracking.enabled"
// SparkDynamicAllocationShuffleTrackingTimeout is the Spark configuration key for specifying
// the shuffle tracking timeout in milliseconds if shuffle tracking is enabled.
SparkDynamicAllocationShuffleTrackingTimeout = "spark.dynamicAllocation.shuffleTracking.timeout"
// SparkDynamicAllocationInitialExecutors is the Spark configuration key for specifying
// the initial number of executors to request if dynamic allocation is enabled.
SparkDynamicAllocationInitialExecutors = "spark.dynamicAllocation.initialExecutors"
// SparkDynamicAllocationMinExecutors is the Spark configuration key for specifying the
// lower bound of the number of executors to request if dynamic allocation is enabled.
SparkDynamicAllocationMinExecutors = "spark.dynamicAllocation.minExecutors"
// SparkDynamicAllocationMaxExecutors is the Spark configuration key for specifying the
// upper bound of the number of executors to request if dynamic allocation is enabled.
SparkDynamicAllocationMaxExecutors = "spark.dynamicAllocation.maxExecutors"
)

const (
