Description
What docker image(s) are you using?
pyspark-notebook
OS system and architecture running docker image
any
What Docker command are you running?
docker build ./docker-stacks/pyspark-notebook \
--build-arg spark_version="3.1.3" \
--build-arg spark_checksum="2AF22A096CAA616F7C2045D2CA2291DCBBB9112233434C2249944FBD6CE85AF58743A479A86BA8454E86EF3A3D5730664466EE6D12D05AB03C9F2128CCD8AB1B" \
--build-arg openjdk_version="11" \
--build-arg hadoop_version="3.2"
How to Reproduce the problem?
Building the pyspark-notebook Docker image for 3.1.3 (as well as 3.1.2 or 2.4.8) is no longer possible as PR #1727 introduced a mandatory ${scala_version} variable to the file path that's used to download the respective Spark archive:
docker-stacks/pyspark-notebook/Dockerfile
Line 35 in aeb9402
RUN wget -q "https://archive.apache.org/dist/spark/spark-${APACHE_SPARK_VERSION}/spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}-scala${scala_version}.tgz" && \
that is not applicable to these older versions of Spark.
Neither 3.1.3 nor 3.1.2 is offered by the Apache Spark project for specific Scala versions (https://archive.apache.org/dist/spark/spark-3.1.3/ and https://archive.apache.org/dist/spark/spark-3.1.2/), nor are the Hadoop binaries for 2.4.8 provided for specific Scala versions (https://archive.apache.org/dist/spark/spark-2.4.8/).
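One possible way to support both naming schemes is to append the Scala suffix only when a Scala version is actually supplied. The following is a minimal sketch of that idea, not the project's actual fix: the function name spark_archive_url and its positional arguments are hypothetical, and the URL pattern is taken from the Dockerfile line quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of docker-stacks): build the Spark archive URL,
# adding the "-scala<version>" suffix only when a Scala version is passed.
spark_archive_url() {
    spark_version="$1"
    hadoop_version="$2"
    scala_version="${3:-}"   # optional; leave empty for releases without Scala-specific archives

    name="spark-${spark_version}-bin-hadoop${hadoop_version}"
    if [ -n "${scala_version}" ]; then
        name="${name}-scala${scala_version}"
    fi
    printf '%s\n' "https://archive.apache.org/dist/spark/spark-${spark_version}/${name}.tgz"
}

# No Scala version given: the suffix is omitted.
spark_archive_url "3.1.3" "3.2"
# prints https://archive.apache.org/dist/spark/spark-3.1.3/spark-3.1.3-bin-hadoop3.2.tgz

# Scala version given: the suffix is appended.
spark_archive_url "3.3.0" "3" "2.13"
# prints https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3-scala2.13.tgz
```

In a Dockerfile, the same conditional could live inside the RUN step that downloads the archive, with scala_version defaulting to an empty build argument.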
Command output
No response
Expected behavior
The notebook should build for all versions of Spark, at least the ones that are actively supported/maintained by the Apache Spark project.
Actual behavior
File download fails because the specified file simply does not exist.
Anything else?
This regression breaks builds for the Anovos library (cf. https://github.com/anovos/anovos/runs/7518229159?check_suite_focus=true)