[BUG] - pyspark-notebook no longer builds for Spark 3.1.3, 3.1.2, 2.4.8 #1756

@ionicsolutions

What docker image(s) are you using?

pyspark-notebook

OS system and architecture running docker image

any

What Docker command are you running?

docker build ./docker-stacks/pyspark-notebook \
  --build-arg spark_version="3.1.3" \
  --build-arg spark_checksum="2AF22A096CAA616F7C2045D2CA2291DCBBB9112233434C2249944FBD6CE85AF58743A479A86BA8454E86EF3A3D5730664466EE6D12D05AB03C9F2128CCD8AB1B" \
  --build-arg openjdk_version="11" \
  --build-arg hadoop_version="3.2"

How to Reproduce the problem?

Building the pyspark-notebook Docker image for Spark 3.1.3 (as well as 3.1.2 or 2.4.8) is no longer possible because PR #1727 added a mandatory ${scala_version} variable to the file path used to download the respective Spark archive:

RUN wget -q "https://archive.apache.org/dist/spark/spark-${APACHE_SPARK_VERSION}/spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}-scala${scala_version}.tgz" && \

The resulting "-scala${scala_version}" suffix does not apply to these older versions of Spark.

Neither 3.1.3 nor 3.1.2 is offered by the Apache Spark project for specific Scala versions (https://archive.apache.org/dist/spark/spark-3.1.3/ and https://archive.apache.org/dist/spark/spark-3.1.2/), and the Hadoop binaries for 2.4.8 are not provided for specific Scala versions either (https://archive.apache.org/dist/spark/spark-2.4.8/).
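
One possible way to keep the download path valid for both kinds of releases would be to append the Scala suffix only when a Scala version is actually supplied. The following is a minimal sketch of that idea, not the fix adopted by the project: the helper name spark_archive_url is hypothetical, and the version values in the calls are purely illustrative.

```shell
#!/bin/sh
# Hypothetical helper (not part of docker-stacks): build the Spark archive URL,
# adding the "-scala<version>" suffix only when a Scala version is supplied.
spark_archive_url() {
    spark_version="$1"
    hadoop_version="$2"
    scala_version="${3:-}"   # optional; leave empty for releases without a Scala-specific archive

    name="spark-${spark_version}-bin-hadoop${hadoop_version}"
    if [ -n "${scala_version}" ]; then
        name="${name}-scala${scala_version}"
    fi
    printf '%s\n' "https://archive.apache.org/dist/spark/spark-${spark_version}/${name}.tgz"
}

# Older release: no Scala suffix in the file name.
spark_archive_url "3.1.3" "3.2"
# Illustrative values for a release that is published per Scala version.
spark_archive_url "3.3.0" "3" "2.13"
```

In a Dockerfile, the same conditional could be expressed inside the RUN step by leaving the scala_version build argument empty for the older releases.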

Command output

No response

Expected behavior

The notebook should build for all versions of Spark, at least the ones that are actively supported/maintained by the Apache Spark project.

Actual behavior

File download fails because the specified file simply does not exist.

Anything else?

This regression breaks builds for the Anovos library (cf. https://github.com/anovos/anovos/runs/7518229159?check_suite_focus=true)

Labels

type:Bug (A problem with the definition of one of the docker images maintained here)