Miniconda py38 and Spark 3.0 #1115

Merged: 3 commits into jupyter:master on Jul 4, 2020

Conversation

@Bidek56 (Contributor) commented Jun 19, 2020

- Upgrading Spark to 3.0
- Upgrading Python to 3.8
- Removing Toree due to lack of support
- OpenJDK needs to stay at 11 because of a SparkR limitation
- The Spylon kernel may need to be removed soon since it has not been maintained in years

@parente (Member) commented Jun 19, 2020

We should consider swapping out Toree and spylon for https://almond.sh/

@Bidek56 (Contributor, Author) commented Jun 19, 2020

We could add it, but Almond already has a Dockerfile that extends from base-notebook; I'm not sure what value a copy of someone else's code adds here. It would also make the image larger than it already is. Maybe add a link to the Almond repo in the docs?

@parente (Member) commented Jun 21, 2020

We could add it, but Almond already has a Dockerfile that extends from base-notebook ...

Good call. That it does.

I'm not sure what value a copy of someone else's code adds here

Without Toree or Spylon, the all-spark-notebook image no longer offers any Scala support in Jupyter. Maybe it's fine to support only Python and R, but that's a significant departure from the original purpose of the image.

@Bidek56 (Contributor, Author) commented Jun 22, 2020

There's a good chance Toree will be upgraded to work with JDK 11+ now that Spark 3.0 is out.
@parente do you still have access to Spylon so it can be upgraded? Thx

@@ -88,7 +88,7 @@ lint-build-test-all: $(foreach I,$(ALL_IMAGES),lint/$(I) arch_patch/$(I) build/$

lint-install: ## install hadolint
@echo "Installing hadolint at $(HADOLINT) ..."
-@curl -sL -o $(HADOLINT) "https://github.com/hadolint/hadolint/releases/download/v1.17.6/hadolint-$(shell uname -s)-$(shell uname -m)"
+@curl -sL -o $(HADOLINT) "https://github.com/hadolint/hadolint/releases/download/v1.18.0/hadolint-$(shell uname -s)-$(shell uname -m)"

Collaborator commented:

Thank you for this update. I didn't realize that the version was hardcoded -- my bad. It could be an opportunity to put the hadolint version in a variable and print it in the echo so we know which version was used.
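
For illustration, here is a minimal sketch of that idea, assuming the existing lint-install target and HADOLINT variable; the HADOLINT_VERSION name is hypothetical and not part of this PR (recipe lines are tab-indented):

# Hypothetical sketch: pin the hadolint version once, reuse it in the URL, and echo it.
HADOLINT_VERSION ?= 1.18.0

lint-install: ## install hadolint
	@echo "Installing hadolint v$(HADOLINT_VERSION) at $(HADOLINT) ..."
	@curl -sL -o $(HADOLINT) "https://github.com/hadolint/hadolint/releases/download/v$(HADOLINT_VERSION)/hadolint-$(shell uname -s)-$(shell uname -m)"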

@romainx (Collaborator) left a comment

Hello,

Just a remark: if the Scala kernel(s) are removed, we also need to remove the corresponding example(s) from the Image Specifics page.

Best

@@ -76,14 +76,14 @@ RUN mkdir /home/$NB_USER/work && \

# Install conda as jovyan and check the md5 sum provided on the download site
ENV MINICONDA_VERSION=4.8.2 \

Collaborator commented:

There is a new version, 4.8.3 (Miniconda3-py38_4.8.3-Linux-x86_64.sh), in the Miniconda repo. We should use this new version too.

Contributor (Author) commented:

I was trying to limit the change set to avoid breaking things. I figured that once this PR gets merged, I can do it in a follow-up PR, unless you disagree.

Collaborator commented:

Ok it makes sense. Thanks.

@parente (Member) commented Jun 25, 2020

@parente do you still have access to Spylon so it can be upgraded?

I would remove spylon from the image as unmaintained.

@Bidek56 (Contributor, Author) commented Jun 25, 2020

@parente do you still have access to Spylon so it can be upgraded?

I would remove spylon from the image as unmaintained.

Do you want to do it in this PR or the next one? It still works with Spark 3.0.

@parente (Member) commented Jun 26, 2020

If it's working with 3.0, go ahead and leave it for now.

@maresb (Contributor) commented Jun 30, 2020

Looks like Travis CI failed due to a web error? It builds locally for me.

Note the warning:

Solving environment: ...working... WARNING conda.core.solve:_add_specs(601): pinned spec conda==4.8.2 conflicts with explicit specs.  Overriding pinned spec.

I don't know where the aforementioned conflicting explicit spec is, but perhaps it would be better after all to just bump to 4.8.3:

Miniconda3-py38_4.8.3-Linux-x86_64.sh | 88.7M | 2020-06-16 14:57:56 | d63adf39f2c220950a063e0529d4ff74
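
If we did bump it, a rough sketch of the change in the base-notebook Dockerfile could look like the following; the md5 is the one from the listing above, while the variable names other than MINICONDA_VERSION are assumptions to be checked against the real Dockerfile:

# Assumed layout, following the ENV MINICONDA_VERSION pattern shown in the diff above;
# verify the md5 variable name and the conda pin against the actual Dockerfile.
ENV MINICONDA_VERSION=4.8.3 \
    MINICONDA_MD5=d63adf39f2c220950a063e0529d4ff74 \
    CONDA_VERSION=4.8.3

Bumping whatever produces the conda==4.8.2 pin at the same time should also clear the pinned-spec warning quoted above.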

@Bidek56 (Contributor, Author) commented Jun 30, 2020

@parente Can we merge this PR yet? Thx

@maresb (Contributor) commented Jun 30, 2020

To be more specific, base-notebook builds for me. I didn't try the others...

@Bidek56 (Contributor, Author) commented Jun 30, 2020

The PR builds fine; the failure is a Travis issue, not the code.

@parente (Member) commented Jul 2, 2020

@Bidek56 I'm planning on tagging the last images containing Spark 2.x and then merging this PR when I get a few minutes in the next day or two.

@parente (Member) commented Jul 4, 2020

I've tagged the latest pyspark and all-spark images with spark-2. Merging now.

parente merged commit 229c7fe into jupyter:master on Jul 4, 2020
Bidek56 deleted the miniconda-py38 branch on Jul 4, 2020 at 14:33

@lresende (Member) commented:

There's a good chance Toree will be upgraded to work with JDK 11+ now that Spark 3.0 is out.
@parente do you still have access to Spylon so it can be upgraded? Thx

Yes, release coming soon.

@parente (Member) commented Jul 20, 2020

@lresende Nothing prevents coexistence. There was a request and push to get Spark 3.0 into the images. We can add Toree back as soon as it's compatible.

romainx added a commit to romainx/docker-stacks that referenced this pull request Aug 15, 2020
Allow to build `pyspark-notebook` image with an alternative Spark version.

- Define arguments for Spark installation
- Add a note in "Image Specifics" explaining how to build an image with an alternative Spark version
- Remove Toree documentation from "Image Specifics" since its support has been dropped in jupyter#1115
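
As an illustration of that approach, building with a different Spark version might look roughly like this; the --build-arg names are placeholders here and should be checked against the actual pyspark-notebook/Dockerfile and the "Image Specifics" page:

# Hypothetical example: override the Spark version at image build time.
docker build --rm --force-rm \
    -t jupyter/pyspark-notebook:spark-2.4.6 ./pyspark-notebook \
    --build-arg spark_version=2.4.6 \
    --build-arg hadoop_version=2.7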