I jotted down the steps to set up Apache Spark and run PySpark in a Jupyter notebook on my local machine (Mac).
- Download Spark from http://spark.apache.org/downloads.html
- Unzip the archive
$ tar -xzf spark-x.y.z-bin-hadoopx.y.tgz
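- Move to the standard location (on Mac), creating the parent directory first so that mv does not fail
$ mkdir -p /usr/local/Cellar/spark
$ mv spark-x.y.z-bin-hadoopx.y /usr/local/Cellar/spark/x.y.z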
- Create a symbolic link
$ ln -s /usr/local/Cellar/spark/x.y.z /usr/local/opt/spark
- Set the SPARK_HOME environment variable and add Spark to $PATH
$ echo 'export SPARK_HOME=/usr/local/opt/spark' >> ~/.bash_profile
$ echo 'export PATH="$SPARK_HOME/bin:$PATH"' >> ~/.bash_profile
$ source ~/.bash_profile
- Install Jupyter
$ pip3 install jupyter
- Update the PySpark driver environment variables
$ echo 'export PYSPARK_DRIVER_PYTHON=jupyter' >> ~/.bash_profile
$ echo 'export PYSPARK_DRIVER_PYTHON_OPTS=notebook' >> ~/.bash_profile
$ echo 'export PYSPARK_PYTHON=python3' >> ~/.bash_profile
$ source ~/.bash_profile
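- Start PySpark; with the driver variables above, this launches Jupyter notebook
$ pyspark
- Or start Jupyter notebook on its own
$ jupyter notebook
Your browser will open http://localhost:8888/tree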
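Once the notebook is open, a first cell along these lines can confirm that the setup works. This is only a sketch: check_spark_env and smoke_test are hypothetical helper names of my own, and the bin/pyspark check assumes the standard layout of the Spark download. The PySpark import is kept inside smoke_test so the environment check still runs in a kernel where pyspark is not importable.

```python
import os


def check_spark_env(env):
    """Return a list of problems found in the given environment mapping."""
    problems = []
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    elif not os.path.isdir(spark_home):
        problems.append("SPARK_HOME is not a directory: " + spark_home)
    elif not os.path.isfile(os.path.join(spark_home, "bin", "pyspark")):
        # Assumption: the Spark download keeps its launcher at bin/pyspark.
        problems.append("bin/pyspark not found under " + spark_home)
    return problems


def smoke_test():
    """Run a tiny Spark job; needs pyspark importable in this kernel."""
    from pyspark.sql import SparkSession  # imported lazily on purpose

    # getOrCreate() reuses a session if the kernel already has one.
    spark = SparkSession.builder.appName("smoke-test").getOrCreate()
    return spark.range(5).count()  # counts the rows 0..4


print(check_spark_env(os.environ) or "environment looks fine")
# print(smoke_test())  # uncomment in the notebook started by `pyspark`
```

If check_spark_env reports a problem, re-check the SPARK_HOME lines in ~/.bash_profile before trying smoke_test.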
Note: Everything done so far can be achieved quickly using the Jupyter Docker Stacks image:
$ docker run -it --rm -p 8888:8888 jupyter/pyspark-notebook
Note: PySpark in Jupyter might fail on a system whose default Java is newer than 1.8 (Java 8). I had Java 13 and was getting the error: Exception: Java gateway process exited before sending its port number. Multiple JDKs can be managed as follows.
Manage multiple JDKs through jEnv
- Install
$ brew install jenv
- Add to PATH
$ echo 'export PATH="$HOME/.jenv/bin:$PATH"' >> ~/.bash_profile
$ echo 'eval "$(jenv init -)"' >> ~/.bash_profile
$ source ~/.bash_profile
- Add multiple JDKs to jenv
$ jenv add /Library/Java/JavaVirtualMachines/openjdk64-1.8.0.241.jdk/Contents/Home
$ jenv add /Library/Java/JavaVirtualMachines/openjdk64-1.8.0.242.jdk/Contents/Home
$ jenv add /Library/Java/JavaVirtualMachines/openjdk64-13.0.2.jdk/Contents/Home
- List all JDKs
$ jenv versions
- To configure a Java version, say 1.8.0.242, for a particular project
$ cd /path/to/project/dir/
$ jenv local 1.8.0.242
- Verify the Java version for that project (Java 8 only accepts the single-dash form)
$ java -version
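The same check can be made from Python, for example in a notebook cell, to see which Java the kernel would hand to Spark. The sketch below is an illustration only: java_major and current_java_major are hypothetical helper names, and the parsing assumes the usual version "..." string that java -version prints.

```python
import re
import subprocess


def java_major(version_text):
    """Extract the major Java version from `java -version` style output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_text)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier report themselves as 1.x (for example 1.8.0_242).
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def current_java_major():
    """Ask the `java` on PATH for its version; it prints to stderr."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major(result.stderr + result.stdout)
```

Calling current_java_major() from the project directory should give 8 after the jenv local step above, provided java is on the PATH seen by the notebook.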