- Installing system requirements (Spark, Java, Anaconda)
Mac
- Use this guide to install Java, Spark, and Anaconda on an M1 Mac: https://mungingdata.com/pyspark/install-delta-lake-jupyter-conda-mac/
Linux or Windows with WSL
# Versions and install location for the Spark distribution
export SPARK_VERSION=3.2.0
export SPARK_DIRECTORY=/opt/spark
export HADOOP_VERSION=2.7
# /opt is usually root-owned, so create the directory with sudo and make it writable
sudo mkdir -p ${SPARK_DIRECTORY}
sudo chown $USER ${SPARK_DIRECTORY}
# Spark needs a JDK; install OpenJDK 8
sudo apt-get update
sudo apt-get -y install openjdk-8-jdk
# Download the prebuilt Spark release and unpack it to ${SPARK_DIRECTORY}/spark
curl https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
    --output ${SPARK_DIRECTORY}/spark.tgz
cd ${SPARK_DIRECTORY} && tar -xvzf spark.tgz && mv spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION} spark
- Installing Python library requirements in a conda env. Check out the [spark-branch](https://github.com/ydataai/pandas-profiling/tree/spark-branch) and run
conda env create -f venv/spark.yml
This creates a conda environment for Spark called spark-env with all of the requirements installed.
Then activate the environment using
source activate spark-env
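At this point you can optionally run a quick sanity check (assuming pyspark is included in the spark.yml environment) to confirm that the activated environment can start a local Spark session against the Java/Spark installation from the previous step:

```python
from pyspark.sql import SparkSession

# Start a throwaway local session to confirm Java and pyspark are wired up correctly
spark = SparkSession.builder.master("local[*]").appName("sanity-check").getOrCreate()
print(spark.version)
spark.stop()
```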
- Finally, run the example script, which should execute and produce a profiling report for some sample Spark data
python tests/backends/spark_backend/example.py
Don't worry about any errors you see for now, as long as the report builds properly.
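For orientation, the example script boils down to something like the sketch below: it starts a local Spark session, builds a small Spark DataFrame, and hands it to ProfileReport. The DataFrame contents and output filename here are made up for illustration, and it is assumed that on the spark branch ProfileReport accepts a Spark DataFrame directly:

```python
from pyspark.sql import SparkSession
from pandas_profiling import ProfileReport

# Local Spark session backed by the installation above
spark = SparkSession.builder.master("local[*]").appName("profiling-example").getOrCreate()

# A tiny, made-up DataFrame purely for illustration
df = spark.createDataFrame(
    [(1, "a", 10.0), (2, "b", 20.5), (3, "c", None)],
    ["id", "category", "value"],
)

# Assumption: on the spark branch, ProfileReport can consume a Spark DataFrame
report = ProfileReport(df, title="Spark profiling example")
report.to_file("spark_profile.html")
```

If that runs end to end, open the generated HTML file in a browser to inspect the report.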