[SPARK-32363][PYTHON][BUILD] Fix flakiness in pip package testing in Jenkins

This PR proposes:

- Don't use `--user` in the pip packaging test.
- Pull `source` out of the subshell and run it first.
- Exclude the user site-packages from the Python path during the pip installation test.

to address the flakiness of the pip packaging test in Jenkins.

(I think) apache#29116 caused this flakiness, judging from the Jenkins log. I had worked around it by specifying `--user`, but it turned out that this does not work properly with the old Conda on Jenkins for some reason. Therefore, this PR reverts that change.

(I think) installing into the user site-packages affects other environments created by the old Conda version that Jenkins has; it seems to fail to isolate the environments for some reason. So, this PR excludes the user site-packages from the Python path during the test.
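
As a rough illustration (a minimal sketch, not part of this patch): `PYTHONNOUSERSITE` is a standard CPython environment variable, and its effect is visible in `sys.flags.no_user_site`. The variable names `default_flag` and `excluded_flag` are made up for this example, and `python3` is assumed to be on the `PATH`.

```bash
# Read the interpreter flag with PYTHONNOUSERSITE unset. Command substitution
# runs in a subshell, so the unset does not leak into the calling shell.
default_flag=$(unset PYTHONNOUSERSITE; python3 -c 'import sys; print(sys.flags.no_user_site)')

# Read it again with the variable set, the way the test script exports it.
excluded_flag=$(PYTHONNOUSERSITE=1 python3 -c 'import sys; print(sys.flags.no_user_site)')

echo "no_user_site by default:            $default_flag"
echo "no_user_site with PYTHONNOUSERSITE: $excluded_flag"
```

When the flag is set, the `site` module does not add the user site-packages directory to `sys.path`, so packages installed there should not leak into the environment under test.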

In addition, apache#29116 also added fallback logic between `conda (de)activate` and `source (de)activate`, because Conda now prefers `conda (de)activate` per the official documentation, and `source (de)activate` doesn't work in certain environments for some reason (see also conda/conda#7980). The problem was that the fallback ran `source` inside a subshell; `source` loads things into the shell that runs it, so the activation did not affect the current shell. Therefore, this PR pulls `source` out of the subshell and tries it first.
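
The subshell behaviour can be reproduced without Conda. The sketch below is only an illustration under stated assumptions: the temporary file is a hypothetical stand-in for an activation script, `DEMO_ENV`, `in_subshell`, and `in_current` are invented names, and `false` stands in for a failing `conda activate`. It uses `.`, the POSIX spelling of `source`, so it also runs under plain `sh`.

```bash
# Hypothetical stand-in for an activation script: it only exports a variable.
activate_script=$(mktemp)
echo 'export DEMO_ENV=activated' > "$activate_script"

# Old shape: the fallback sources the script inside ( ... ), i.e. in a subshell.
# The export is lost as soon as the subshell exits.
unset DEMO_ENV
false || (echo "Falling back to sourcing in a subshell" && . "$activate_script")
in_subshell=${DEMO_ENV:-unset}

# New shape: source first, in the current shell, and keep only the fallback
# inside the parentheses. The export survives.
unset DEMO_ENV
. "$activate_script" || (echo "Fallback would run here")
in_current=${DEMO_ENV:-unset}

rm -f "$activate_script"
echo "sourced in subshell:      DEMO_ENV=$in_subshell"
echo "sourced in current shell: DEMO_ENV=$in_current"
```

Note that the fallback in the new shape still sits inside parentheses, so the same caveat would presumably apply to it if the first command failed.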

Disclaimer: I made this analysis purely based on the Jenkins machine's log in this PR. There may be a different cause that I missed.

Why the changes are needed: to make the build and tests pass in Jenkins.

User-facing change: no, dev-only.

How this was tested: the Jenkins tests should exercise it.

Closes apache#29117 from HyukjinKwon/debug-conda.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
HyukjinKwon committed Aug 19, 2020
1 parent 4762cd5 commit aff7106
Showing 1 changed file with 7 additions and 6 deletions.
13 changes: 7 additions & 6 deletions dev/run-pip-tests
@@ -68,11 +68,15 @@ fi
PYSPARK_VERSION=$(python3 -c "exec(open('python/pyspark/version.py').read());print(__version__)")
PYSPARK_DIST="$FWDIR/python/dist/pyspark-$PYSPARK_VERSION.tar.gz"
# The pip install options we use for all the pip commands
-PIP_OPTIONS="--user --upgrade --no-cache-dir --force-reinstall "
+PIP_OPTIONS="--upgrade --no-cache-dir --force-reinstall"
# Test both regular user and edit/dev install modes.
PIP_COMMANDS=("pip install $PIP_OPTIONS $PYSPARK_DIST"
"pip install $PIP_OPTIONS -e python/")

+# Jenkins has PySpark installed under user sitepackages shared for some reasons.
+# In this test, explicitly exclude user sitepackages to prevent side effects
+export PYTHONNOUSERSITE=1

for python in "${PYTHON_EXECS[@]}"; do
for install_command in "${PIP_COMMANDS[@]}"; do
echo "Testing pip installation with python $python"
@@ -86,7 +90,7 @@ for python in "${PYTHON_EXECS[@]}"; do
source "$CONDA_PREFIX/etc/profile.d/conda.sh"
fi
conda create -y -p "$VIRTUALENV_PATH" python=$python numpy pandas pip setuptools
-conda activate "$VIRTUALENV_PATH" || (echo "Falling back to 'source activate'" && source activate "$VIRTUALENV_PATH")
+source activate "$VIRTUALENV_PATH" || (echo "Falling back to 'conda activate'" && conda activate "$VIRTUALENV_PATH")
else
mkdir -p "$VIRTUALENV_PATH"
virtualenv --python=$python "$VIRTUALENV_PATH"
@@ -101,8 +105,6 @@ for python in "${PYTHON_EXECS[@]}"; do
cd "$FWDIR"/python
# Delete the egg info file if it exists, this can cache the setup file.
rm -rf pyspark.egg-info || echo "No existing egg info file, skipping deletion"
-# Also, delete the symbolic link if exists. It can be left over from the previous editable mode installation.
-python -c "from distutils.sysconfig import get_python_lib; import os; f = os.path.join(get_python_lib(), 'pyspark.egg-link'); os.unlink(f) if os.path.isfile(f) else 0"
python setup.py sdist


@@ -121,7 +123,6 @@ for python in "${PYTHON_EXECS[@]}"; do
cd /

echo "Run basic sanity check on pip installed version with spark-submit"
-export PATH="$(python3 -m site --user-base)/bin:$PATH"
spark-submit "$FWDIR"/dev/pip-sanity-check.py
echo "Run basic sanity check with import based"
python "$FWDIR"/dev/pip-sanity-check.py
@@ -132,7 +133,7 @@ for python in "${PYTHON_EXECS[@]}"; do

# conda / virtualenv environments need to be deactivated differently
if [ -n "$USE_CONDA" ]; then
-conda deactivate || (echo "Falling back to 'source deactivate'" && source deactivate)
+source deactivate || (echo "Falling back to 'conda deactivate'" && conda deactivate)
else
deactivate
fi