PySpark Notebook Setup Guide
Chaokun Yang edited this page Oct 21, 2018 · 1 revision
      
- Download Anaconda

```shell
wget https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh
```
- Install

Reference: https://conda.io/docs/user-guide/install/macos.html#install-macos-silent

```shell
sh Anaconda3-5.2.0-Linux-x86_64.sh -b -p . -f
conda create -n bigdata python=3.6 anaconda
source activate bigdata
# Useful for nice tables of contents in the notebooks, but not required.
conda install -n bigdata -c conda-forge jupyter_contrib_nbextensions
# If you want to use the Jupyter extensions (optional, mainly useful for tables of contents), install them first:
jupyter contrib nbextension install --user
# Then you can activate an extension, such as the Table of Contents (2) extension:
jupyter nbextension enable toc2/main
# You can now start Jupyter:
jupyter notebook
```

Note: you can also visit http://localhost:8888/nbextensions to activate and configure Jupyter extensions.
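The `jupyter nbextension enable` command above records the enabled extension in the classic notebook's nbconfig file (typically `~/.jupyter/nbconfig/notebook.json`; that path and the `load_extensions` key are assumptions based on the classic notebook's config layout). A minimal sketch for inspecting such a config:

```python
import json

def enabled_extensions(config_text):
    """Parse an nbconfig notebook.json blob and return the enabled extensions."""
    config = json.loads(config_text)
    return [name for name, on in config.get("load_extensions", {}).items() if on]

# Sample blob of the form written by `jupyter nbextension enable toc2/main`:
sample = '{"load_extensions": {"toc2/main": true, "foo/bar": false}}'
print(enabled_extensions(sample))  # ['toc2/main']
```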
```shell
# Configure Jupyter and prompt for a password
jupyter notebook --generate-config
jupass=`python -c "from notebook.auth import passwd; print(passwd())"`
echo "c.NotebookApp.password = u'"$jupass"'" >> $HOME/.jupyter/jupyter_notebook_config.py
echo "c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False" >> $HOME/.jupyter/jupyter_notebook_config.py
```

There are two ways to get PySpark available in a Jupyter Notebook:
- Configure the PySpark driver to use Jupyter Notebook: running `pyspark` will automatically open a Jupyter Notebook.

  ```shell
  source activate bigdata
  export PYSPARK_DRIVER_PYTHON=jupyter
  export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
  # Specify the Python used by the Spark cluster
  export PYSPARK_PYTHON=`which python`
  pyspark --master yarn
  ```
- Load a regular Jupyter Notebook and use the findspark package to make a Spark context available in your code.
  - Install findspark:

    ```shell
    pip install findspark
    ```

  - Start a Jupyter Notebook:

    ```shell
    jupyter notebook
    ```

  - Run the following code in the notebook:

    ```python
    import findspark
    findspark.init()  # or findspark.init('/path/to/spark_home')
    import pyspark
    sc = pyspark.SparkContext(appName="myAppName")
    ```
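The first option works because the `pyspark` launcher consults the `PYSPARK_DRIVER_PYTHON*` variables when choosing which command starts the driver. A rough sketch of that selection logic (a simplification for illustration, not the actual `bin/pyspark` script):

```python
def resolve_driver(env):
    """Roughly mimic how the pyspark launcher picks the driver command:
    PYSPARK_DRIVER_PYTHON overrides PYSPARK_PYTHON, which overrides 'python';
    PYSPARK_DRIVER_PYTHON_OPTS supplies extra arguments."""
    python = env.get("PYSPARK_PYTHON", "python")
    driver = env.get("PYSPARK_DRIVER_PYTHON", python)
    opts = env.get("PYSPARK_DRIVER_PYTHON_OPTS", "")
    return f"{driver} {opts}".strip()

# With the exports from the first option, the driver becomes a notebook server:
env = {
    "PYSPARK_DRIVER_PYTHON": "jupyter",
    "PYSPARK_DRIVER_PYTHON_OPTS": "notebook",
    "PYSPARK_PYTHON": "/opt/conda/envs/bigdata/bin/python",
}
print(resolve_driver(env))  # jupyter notebook
```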
 
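For the second option, `findspark.init()` essentially locates `SPARK_HOME` and prepends Spark's bundled Python bindings to `sys.path`. A simplified sketch of that idea (not findspark's actual implementation; the `py4j-*.zip` name pattern under `python/lib` is an assumption about the Spark distribution layout):

```python
import glob
import os
import sys

def init_spark(spark_home=None):
    """Simplified sketch of findspark.init(): resolve SPARK_HOME and
    prepend Spark's Python bindings (pyspark + bundled py4j) to sys.path."""
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise ValueError("SPARK_HOME is not set and no path was given")
    python_dir = os.path.join(spark_home, "python")
    py4j_zips = glob.glob(os.path.join(python_dir, "lib", "py4j-*.zip"))
    paths = [python_dir] + py4j_zips
    sys.path[:0] = paths  # prepend so these copies win over any others
    return paths
```

After this, `import pyspark` works in a plain Python or notebook session without launching through `bin/pyspark`.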
Reference: https://blog.sicara.com/get-started-pyspark-jupyter-guide-tutorial-ae2fe84f594f