[SPARK-32204][SPARK-32182][DOCS] Add a quickstart page with Binder integration in PySpark documentation #29491
binder/apt.txt
@@ -0,0 +1 @@
+openjdk-8-jre
binder/postBuild
@@ -0,0 +1,24 @@
+#!/bin/bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# This file is used for Binder integration to install PySpark and make it
+# available in the Jupyter notebook.
+
+VERSION=$(python -c "exec(open('python/pyspark/version.py').read()); print(__version__)")
+pip install "pyspark[sql,ml,mllib]<=$VERSION"
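For readers unfamiliar with the shell one-liner above: it executes `python/pyspark/version.py`, which defines a module-level `__version__`, and prints the result, which then caps the PyPI install. A minimal Python sketch of the same lookup, assuming only that `version.py` sets `__version__` (the value shown in the comment is hypothetical):

```python
# Sketch of postBuild's version lookup; assumes python/pyspark/version.py
# contains a line such as `__version__ = "3.1.0"` (value hypothetical).
namespace = {}
exec(open("python/pyspark/version.py").read(), namespace)
version = namespace["__version__"]
# postBuild then pins the install: pip install "pyspark[sql,ml,mllib]<=<version>"
print(version)
```

The `<=` bound presumably lets pip fall back to the latest released PySpark that does not exceed the version of the checked-out sources.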
dev/create-release/spark-rm/Dockerfile
@@ -36,7 +36,7 @@ ARG APT_INSTALL="apt-get install --no-install-recommends -y"
 # TODO(SPARK-32407): Sphinx 3.1+ does not correctly index nested classes.
 # See also https://github.com/sphinx-doc/sphinx/issues/7551.
 # We should use the latest Sphinx version once this is fixed.
-ARG PIP_PKGS="sphinx==3.0.4 mkdocs==1.0.4 numpy==1.18.1 pydata_sphinx_theme==0.3.1"
+ARG PIP_PKGS="sphinx==3.0.4 mkdocs==1.0.4 numpy==1.18.1 pydata_sphinx_theme==0.3.1 ipython==7.16.1 nbsphinx==0.7.1"
this makes me think we should have a shared requirements.txt somewhere.

We tried that in #27928 but couldn't get buy-in from a release manager. Also discussed briefly here: #29491 (comment)

Yeah, let's run it separately - there's already a JIRA for it, SPARK-31167, as @nchammas pointed out. It's a bit more complicated than I thought.
 ARG GEM_PKGS="jekyll:4.0.0 jekyll-redirect-from:0.16.0 rouge:3.15.0"

 # Install extra needed repos and refresh.

@@ -75,6 +75,7 @@ RUN apt-get clean && apt-get update && $APT_INSTALL gnupg ca-certificates && \
     pip3 install $PIP_PKGS && \
     # Install R packages and dependencies used when building.
     # R depends on pandoc*, libssl (which are installed above).
+    # Note that PySpark doc generation also needs pandoc due to nbsphinx.
     $APT_INSTALL r-base r-base-dev && \
     $APT_INSTALL texlive-latex-base texlive texlive-fonts-extra texinfo qpdf && \
     Rscript -e "install.packages(c('curl', 'xml2', 'httr', 'devtools', 'testthat', 'knitr', 'rmarkdown', 'roxygen2', 'e1071', 'survival'), repos='https://cloud.r-project.org/')" && \
dev/requirements.txt
@@ -4,3 +4,5 @@ PyGithub==1.26.0
 Unidecode==0.04.19
 sphinx
 pydata_sphinx_theme
+ipython
+nbsphinx

Are there other implications to this? Like, does our Python packaging now require ipython when installing pyspark? (Sorry, ignorant question.)

Nope, it means nothing special, and this file is used nowhere within Spark. It's just for some dev people, but I doubt it is actually used often. We should leverage this file and standardize the dependencies somehow, as @nchammas and @dongjoon-hyun tried before, but I currently don't have a good idea about how to handle it.

The packages that are installed when installing pyspark are defined in setup.py: https://github.com/apache/spark/blob/master/python/setup.py#L206. The de facto way is to add an `extras_require` entry. Happy to migrate this to setup.py if you like.
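For illustration only, here is a hedged sketch of what moving these doc dependencies into `setup.py` as a pip extra might look like. The `docs` extra name and the unpinned package list are hypothetical and are not something this PR adds:

```python
# Hypothetical sketch only: exposing the doc-build toolchain as a pip extra
# in python/setup.py, installable via `pip install pyspark[docs]`.
# The "docs" extra does not exist in Spark; names here are illustrative.
from setuptools import setup

setup(
    name="pyspark",
    version="0.0.0",  # placeholder; the real setup.py derives this from version.py
    extras_require={
        # Existing extras in setup.py look like this (e.g. "ml" pulls in numpy).
        "ml": ["numpy>=1.7"],
        # Hypothetical extra bundling the documentation dependencies:
        "docs": ["sphinx", "pydata_sphinx_theme", "ipython", "nbsphinx"],
    },
)
```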
python/docs/source/conf.py
@@ -45,8 +45,20 @@
     'sphinx.ext.viewcode',
     'sphinx.ext.mathjax',
     'sphinx.ext.autosummary',
+    'nbsphinx',  # Converts Jupyter Notebooks to reStructuredText files for Sphinx.
+    # For the ipython directive in the reStructuredText files generated from the notebooks.
+    'IPython.sphinxext.ipython_console_highlighting'
 ]

+# Links used globally in the RST files.
+# These are defined here to allow link substitutions dynamically.
+rst_epilog = """
+.. |binder| replace:: Live Notebook
+.. _binder: https://mybinder.org/v2/gh/apache/spark/{0}?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart.ipynb
+.. |examples| replace:: Examples
+.. _examples: https://github.com/apache/spark/tree/{0}/examples/src/main/python
+""".format(os.environ.get("RELEASE_TAG", "master"))

 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']
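Since Sphinx appends `rst_epilog` to every RST source file, any page can write `|binder|_` or `|examples|_` to get a "Live Notebook" or "Examples" link. A small illustrative snippet of how the Binder link target resolves, assuming `RELEASE_TAG` is unset (i.e. docs built from master):

```python
import os

# Illustrative only: the same fallback logic conf.py uses for the link target.
tag = os.environ.get("RELEASE_TAG", "master")
binder_url = (
    "https://mybinder.org/v2/gh/apache/spark/{0}"
    "?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart.ipynb"
).format(tag)
print(binder_url)  # ends with /apache/spark/master?filepath=... when RELEASE_TAG is unset
```

At release time, setting `RELEASE_TAG` points both links at the tagged sources instead of master.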
@@ -84,7 +96,7 @@

 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.
-exclude_patterns = ['_build']
+exclude_patterns = ['_build', '.DS_Store', '**.ipynb_checkpoints']

 # The reST default role (used for this markup: `text`) to use for all
 # documents.

Do we really need to include the `.DS_Store`?

It's an excluding pattern :D.
python/docs/source/getting_started/index.rst
@@ -20,3 +20,7 @@
 Getting Started
 ===============

+.. toctree::
+   :maxdepth: 2
+
+   quickstart
Should we have a requirements.txt somewhere instead?