Docker image for Apache Spark History Server with the DataFlint plugin and UI pre-installed.
# Build the image (downloads JAR from Maven Central)
docker build -t dataflint-history-server .
# Run with your event logs directory
docker run -d \
-p 18080:18080 \
-v /path/to/spark-events:/spark-history:ro \
--name dataflint-history-server \
dataflint-history-server

Access the History Server at http://localhost:18080. Click on any application to see the DataFlint tab.
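
To confirm the server is up without opening a browser, one option is to query the History Server's REST API, which Spark serves under `/api/v1`. The following is a minimal sketch, assuming the `-p 18080:18080` mapping used above; the helper names `summarize_applications` and `list_applications` are hypothetical and not part of this image.

```python
import json
from urllib.request import urlopen


def summarize_applications(payload):
    """Return (id, name) pairs from the JSON text of /api/v1/applications."""
    return [(app["id"], app.get("name", "")) for app in json.loads(payload)]


def list_applications(base_url="http://localhost:18080"):
    """Fetch the application list from a running History Server.

    base_url is an assumption based on the default port mapping;
    adjust it if you publish the container on a different host port.
    """
    with urlopen(f"{base_url}/api/v1/applications", timeout=10) as resp:
        return summarize_applications(resp.read().decode("utf-8"))
```

Calling `list_applications()` while the container is running should return one pair per application found in the mounted event logs directory. An empty list usually means the server started but found no readable event logs.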
# Step 1: Build the JARs locally
./build-jars.sh
# Step 2: Build Docker image with local JAR
docker build -t dataflint-history-server --build-arg USE_LOCAL_JAR=true .
# Step 3: Run
docker run -d \
-p 18080:18080 \
-v /path/to/spark-events:/spark-history:ro \
dataflint-history-server

# Set your event logs directory
export SPARK_HISTORY_DIR=/path/to/spark-events
# Start the service
docker-compose up -d

| Argument | Default | Description |
|---|---|---|
| `SPARK_VERSION` | `3.5.1` | Apache Spark version |
| `SCALA_VERSION` | `2.12` | Scala binary version (2.12 or 2.13) |
| `DATAFLINT_VERSION` | `0.8.3` | DataFlint plugin version |
| `USE_LOCAL_JAR` | `false` | Use locally built JAR instead of Maven |
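
When builds are scripted, the build arguments in the table can be assembled programmatically. The sketch below is one way to do that, assuming the defaults listed in the table; `docker_build_command` is a hypothetical helper, not something shipped with this image. It passes only the arguments that differ from the defaults, so the Dockerfile's own defaults stay in effect for everything else.

```python
# Defaults copied from the build-argument table above.
DEFAULT_BUILD_ARGS = {
    "SPARK_VERSION": "3.5.1",
    "SCALA_VERSION": "2.12",
    "DATAFLINT_VERSION": "0.8.3",
    "USE_LOCAL_JAR": "false",
}


def docker_build_command(tag, overrides=None, context="."):
    """Return the argv list for `docker build` with non-default build args."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_BUILD_ARGS)
    if unknown:
        raise ValueError(f"unknown build arguments: {sorted(unknown)}")

    cmd = ["docker", "build", "-t", tag]
    # Iterate in table order so the generated command is stable.
    for name, default in DEFAULT_BUILD_ARGS.items():
        value = overrides.get(name)
        if value is not None and str(value) != default:
            cmd += ["--build-arg", f"{name}={value}"]
    cmd.append(context)
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)` from the directory that contains the Dockerfile.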
# Spark 3.5.3 from Maven
docker build -t dataflint-hs:3.5.3 --build-arg SPARK_VERSION=3.5.3 .
# Spark 3.4.1 with Scala 2.13 from Maven
docker build -t dataflint-hs:3.4.1 \
--build-arg SPARK_VERSION=3.4.1 \
--build-arg SCALA_VERSION=2.13 .
# Spark 4.0.0 from Maven (automatically uses Scala 2.13)
docker build -t dataflint-hs:4.0.0 --build-arg SPARK_VERSION=4.0.0 .
# Local JAR for Spark 3.x
./build-jars.sh
docker build -t dataflint-hs:local --build-arg USE_LOCAL_JAR=true .
# Local JAR for Spark 4.x
./build-jars.sh
docker build -t dataflint-hs:4.0.0-local \
--build-arg USE_LOCAL_JAR=true \
--build-arg SPARK_VERSION=4.0.0 .

The `build-jars.sh` script automates the local build process:
- Builds the React UI (`spark-ui`)
- Builds plugin JARs for all Scala versions (`spark-plugin`)
- Copies JARs to the `docker/jars/` directory
Prerequisites:
- Node.js 20+
- Java 8+
- sbt
| Variable | Default | Description |
|---|---|---|
| `SPARK_HISTORY_DIR` | `./spark-events` | Host path to Spark event logs |
| `HISTORY_SERVER_PORT` | `18080` | Host port for History Server |
The container expects event logs at /spark-history. Mount your Spark event logs directory:
docker run -v /your/spark/events:/spark-history:ro ...

| Spark Version | Scala Version | Notes |
|---|---|---|
| 3.2.x - 3.5.x | 2.12, 2.13 | Default: 2.12 |
| 4.0.x | 2.13 | Automatically selected |
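
The compatibility table can be expressed as a small validation step before a build is started. This is a sketch that mirrors the table only; it is not the logic the Dockerfile uses, and `resolve_scala_version` is a hypothetical name. Rejecting Scala 2.12 for Spark 4.0.x is an assumption drawn from the table listing only 2.13 for that line.

```python
def resolve_scala_version(spark_version, requested=None):
    """Pick a Scala binary version for a Spark version, per the table above."""
    parts = spark_version.split(".")
    major, minor = int(parts[0]), int(parts[1])

    if (major, minor) == (4, 0):
        allowed, default = ("2.13",), "2.13"
    elif major == 3 and 2 <= minor <= 5:
        allowed, default = ("2.12", "2.13"), "2.12"
    else:
        raise ValueError(f"Spark {spark_version} is not in the compatibility table")

    chosen = requested or default
    if chosen not in allowed:
        raise ValueError(
            f"Scala {chosen} is not listed for Spark {spark_version}; "
            f"expected one of {allowed}"
        )
    return chosen
```

For example, `resolve_scala_version("3.4.1", "2.13")` accepts the combination used in the build examples, while a Spark 4.0.x version resolves to 2.13 when no Scala version is requested.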
To enable event logging in your Spark applications:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.eventLog.enabled", "true") \
    .config("spark.eventLog.dir", "/path/to/spark-events") \
    .getOrCreate()

Or via spark-submit:
spark-submit \
--conf spark.eventLog.enabled=true \
--conf spark.eventLog.dir=/path/to/spark-events \
your_app.py

For S3, GCS, or other cloud storage, you may need additional JARs and configuration:
docker run -d \
-p 18080:18080 \
-e SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=s3a://bucket/spark-events" \
-v /path/to/aws-hadoop-jars:/opt/spark/jars/cloud:ro \
dataflint-history-server

If no applications are listed:
- Verify event logs exist in the mounted directory
- Check logs: `docker logs dataflint-history-server`
- Ensure event log files are complete (not still being written)

If the DataFlint tab is missing:
- The DataFlint tab appears when you click on a specific application
- Verify the plugin JAR was downloaded: `docker exec dataflint-history-server sh -c 'ls /opt/spark/jars/dataflint*'`

If the local build fails:
- Ensure Node.js 20+ is installed: `node --version`
- Ensure Java 8+ is installed: `java -version`
- Ensure sbt is installed: `sbt --version`
- Check that you're running from the `docker/` directory
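
For the event log checks above, it can help to separate finished logs from ones that are still being written. Spark appends an `.inprogress` suffix to an event log while the application is running, so a quick scan of the host directory shows which entries are complete. The sketch below is a hedged example: `split_event_logs` is a hypothetical helper, and it inspects only top-level entries, so rolling event log directories are reported by directory name without looking inside them.

```python
from pathlib import Path

# Suffix Spark uses for event logs that are still being written.
IN_PROGRESS_SUFFIX = ".inprogress"


def split_event_logs(log_dir):
    """Return (complete, in_progress) lists of entry names in log_dir."""
    complete, in_progress = [], []
    for entry in sorted(Path(log_dir).iterdir()):
        if entry.name.startswith("."):
            continue  # skip hidden files such as editor or OS metadata
        if entry.name.endswith(IN_PROGRESS_SUFFIX):
            in_progress.append(entry.name)
        else:
            complete.append(entry.name)
    return complete, in_progress
```

Run it against the host path you mount at `/spark-history`. If every entry lands in the second list, the applications have not finished writing their logs yet.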