Merged
3 changes: 3 additions & 0 deletions .azure-pipelines/gpu-tests.yml
@@ -59,8 +59,11 @@ jobs:
displayName: 'Install dependencies'

- bash: |
set -e
python requirements/collect_env_details.py
python -c "import torch ; mgpu = torch.cuda.device_count() ; assert mgpu >= 2, f'GPU: {mgpu}'"
python requirements/check-avail-strategies.py
python requirements/check-avail-extras.py
displayName: 'Env details'

- bash: |
4 changes: 2 additions & 2 deletions .azure-pipelines/ipu-tests.yml
@@ -53,8 +53,8 @@ jobs:
export GIT_TERMINAL_PROMPT=1
python -c "fname = 'requirements/extra.txt' ; lines = [line for line in open(fname).readlines() if 'fairscale' not in line] ; open(fname, 'w').writelines(lines)"
python -c "fname = 'requirements/extra.txt' ; lines = [line for line in open(fname).readlines() if 'horovod' not in line] ; open(fname, 'w').writelines(lines)"
python ./requirements/adjust_versions.py requirements/extra.txt
python ./requirements/adjust_versions.py requirements/examples.txt
python ./requirements/adjust-versions.py requirements/extra.txt
python ./requirements/adjust-versions.py requirements/examples.txt
pip install . --requirement requirements/devel.txt
pip list
displayName: 'Install dependencies'
2 changes: 1 addition & 1 deletion .github/BECOMING_A_CORE_CONTRIBUTOR.md
@@ -38,7 +38,7 @@ Here, we describe general expectations from core contributors:

### Pull Requests (PRs)

- Pull requests are the evolutionary mechanism of Lightning, so quality is extremely important. Make sure contributors adhere to the guidelines described in the [contributing section](CONTRIBUTING.md#Pull-request).
- Pull requests are the evolutionary mechanism of Lightning, so quality is extremely important. Make sure contributors adhere to the guidelines described in the [contributing section](CONTRIBUTING.md#Pull-Request).

- Some PRs are from people who want to get involved and try to add something unnecessary. We do want their help though! So don’t approve the PR, but direct them to a Github issue that they might be interested in helping with instead!

13 changes: 9 additions & 4 deletions .github/workflows/ci_test-conda.yml
@@ -11,6 +11,10 @@ concurrency:
group: ${{ github.workflow }}-${{ github.ref }}-${{ github.head_ref }}
cancel-in-progress: ${{ ! (github.ref == 'refs/heads/master' || startsWith(github.ref, 'refs/heads/release/')) }}

defaults:
run:
shell: bash -l {0}

jobs:
conda:
runs-on: ubuntu-20.04
@@ -33,12 +37,14 @@ jobs:
conda info
conda list
# adjust versions according installed Torch version
python ./requirements/adjust_versions.py requirements/extra.txt
python ./requirements/adjust_versions.py requirements/examples.txt
pip install --requirement requirements/devel.txt --find-links https://download.pytorch.org/whl/test/torch_test.html
python ./requirements/adjust-versions.py requirements/extra.txt
python ./requirements/adjust-versions.py requirements/examples.txt
pip install --requirement requirements/devel.txt --find-links https://download.pytorch.org/whl/cpu/torch_stable.html
# set a per-test timeout of 2.5 minutes to fail sooner. this aids with hanging tests
pip install pytest-timeout
pip list
# sanity check
python requirements/check-avail-extras.py

- name: Pull checkpoints from S3
working-directory: ./legacy
@@ -51,7 +57,6 @@ jobs:
- name: Tests
run: |
coverage run --source pytorch_lightning -m pytest --timeout 150 pytorch_lightning tests -v --durations=50 --junitxml=junit/test-results-${{ runner.os }}-torch${{ matrix.pytorch-version }}.xml
shell: bash -l {0}

- name: Upload pytest results
uses: actions/upload-artifact@v2
16 changes: 7 additions & 9 deletions .github/workflows/ci_test-full.yml
@@ -44,13 +44,6 @@ jobs:
run: echo "::set-output name=period::$(python -c 'import time ; days = time.time() / 60 / 60 / 24 ; print(int(days / 7))' 2>&1)"
id: times

- name: Upgrade pip
run: |
python --version
# needed for `pip cache` command
pip install --upgrade pip --user
pip --version

# Github Actions: Run step on specific OS: https://stackoverflow.com/a/57948488/4521646
- name: Setup macOS
if: runner.os == 'macOS'
@@ -96,7 +89,7 @@ jobs:
url=$(python -c "print('test/cpu/torch_test.html' if '${{matrix.release}}' == 'pre' else 'cpu/torch_stable.html')" 2>&1)
pip install --requirement requirements.txt --upgrade $flag --find-links "https://download.pytorch.org/whl/${url}"
# adjust versions according installed Torch version
python ./requirements/adjust_versions.py requirements/examples.txt
python ./requirements/adjust-versions.py requirements/examples.txt
pip install --requirement requirements/examples.txt --find-links https://download.pytorch.org/whl/cpu/torch_stable.html --upgrade
pip install --requirement requirements/test.txt --upgrade
pip list
@@ -109,7 +102,7 @@ jobs:
HOROVOD_WITHOUT_TENSORFLOW: 1
run: |
# adjust versions according installed Torch version
python ./requirements/adjust_versions.py requirements/extra.txt
python ./requirements/adjust-versions.py requirements/extra.txt
pip install --requirement ./requirements/extra.txt --find-links https://download.pytorch.org/whl/cpu/torch_stable.html --upgrade
pip list
shell: bash
@@ -128,6 +121,7 @@ jobs:
pip install --no-cache-dir -r requirements/horovod.txt
fi
horovodrun --check-build
python -c "import horovod.torch"
shell: bash

- name: Cache datasets
@@ -136,6 +130,10 @@ jobs:
path: Datasets
key: pl-dataset

- name: Sanity check
run: |
python requirements/check-avail-extras.py

- name: Tests
run: |
# NOTE: do not include coverage report here, see: https://github.com/nedbat/coveragepy/issues/1003
2 changes: 1 addition & 1 deletion .github/workflows/ci_test-slow.yml
@@ -49,7 +49,7 @@ jobs:
- name: Install dependencies
run: |
# adjust versions according installed Torch version
python ./requirements/adjust_versions.py requirements.txt ${{ matrix.pytorch-version }}
python ./requirements/adjust-versions.py requirements.txt ${{ matrix.pytorch-version }}
pip install --requirement requirements.txt --find-links https://download.pytorch.org/whl/cpu/torch_stable.html --upgrade
pip install --requirement requirements/test.txt
pip list
45 changes: 36 additions & 9 deletions .github/workflows/code-checks.yml
@@ -13,13 +13,40 @@ concurrency:
jobs:
mypy:
runs-on: ubuntu-20.04
#strategy:
# fail-fast: false
# matrix:
# include:
# - {python-version: "3.8", pytorch-version: "1.8"}
# - {python-version: "3.9", pytorch-version: "1.10"}
steps:
- uses: actions/checkout@master
- uses: actions/setup-python@v2
with:
python-version: 3.9
- name: Install dependencies
run: |
pip install '.[dev]'
pip list
- run: mypy --install-types --non-interactive
- uses: actions/checkout@master
- uses: actions/setup-python@v2
with:
# python-version: ${{ matrix.python-version }}
python-version: 3.9

# Note: This uses an internal pip API and may not always work
# https://github.com/actions/cache/blob/master/examples.md#multiple-oss-in-a-workflow
- name: Cache pip
uses: actions/cache@v2
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
restore-keys: |
${{ runner.os }}-pip-

- name: Install dependencies
env:
# TORCH_VERSION: ${{ matrix.pytorch-version }}
TORCH_VERSION: "1.10"
run: |
pip install "torch==$TORCH_VERSION" --find-links https://download.pytorch.org/whl/cpu/torch_stable.html
# adjust versions according installed Torch version
python ./requirements/adjust-versions.py requirements/extra.txt
python ./requirements/adjust-versions.py requirements/examples.txt
pip install '.[dev]' --upgrade-strategy only-if-needed --find-links https://download.pytorch.org/whl/cpu/torch_stable.html
pip list

- name: Type check
run: mypy --install-types --non-interactive
2 changes: 1 addition & 1 deletion .github/workflows/release-docker.yml
@@ -39,7 +39,7 @@ jobs:
- name: Publish Latest to Docker
uses: docker/build-push-action@v1.1.0
# only on releases and latest Python and PyTorch
if: matrix.python_version == "3.9" && matrix.pytorch_version == "1.10"
if: matrix.python_version == '3.9' && matrix.pytorch_version == '1.10'
with:
repository: pytorchlightning/pytorch_lightning
username: ${{ secrets.DOCKER_USERNAME }}
2 changes: 1 addition & 1 deletion dockers/base-conda/Dockerfile
@@ -87,7 +87,7 @@ ENV \
COPY ./requirements/extra.txt requirements-extra.txt
COPY ./requirements/examples.txt requirements-examples.txt
COPY ./requirements/test.txt requirements-test.txt
COPY ./requirements/adjust_versions.py requirements_adjust_versions.py
COPY ./requirements/adjust-versions.py requirements_adjust_versions.py
COPY ./.github/prune-packages.py requirements_prune_packages.py

RUN \
14 changes: 5 additions & 9 deletions dockers/base-cuda/Dockerfile
@@ -16,7 +16,6 @@ ARG CUDA_VERSION=10.2

FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu18.04

ARG BAGUA_CUDA_VERSION=102
ARG PYTHON_VERSION=3.9
ARG PYTORCH_VERSION=1.8

@@ -74,12 +73,13 @@ RUN \
python${PYTHON_VERSION} get-pip.py && \
rm get-pip.py && \

# Disable cache
# Disable cache \
export BAGUA_CUDA_VERSION=${CUDA_VERSION//"."/""} && \
pip config set global.cache-dir false && \
# set particular PyTorch version
python ./requirements/adjust_versions.py requirements.txt ${PYTORCH_VERSION} && \
python ./requirements/adjust_versions.py requirements/extra.txt ${PYTORCH_VERSION} && \
python ./requirements/adjust_versions.py requirements/examples.txt ${PYTORCH_VERSION} && \
python ./requirements/adjust-versions.py requirements.txt ${PYTORCH_VERSION} && \
python ./requirements/adjust-versions.py requirements/extra.txt ${PYTORCH_VERSION} && \
python ./requirements/adjust-versions.py requirements/examples.txt ${PYTORCH_VERSION} && \
python -c "print(' '.join([ln for ln in open('requirements/extra.txt').readlines() if 'horovod' in ln]))" > ./requirements/horovod.txt && \
python requirements/prune_packages.py requirements/extra.txt "horovod" && \
# Install all requirements
@@ -138,10 +138,6 @@ RUN \
pip install deepspeed==0.5.7 && \
python -c "import deepspeed; print(deepspeed.__version__)"

RUN \
# install Bagua
pip install bagua-cuda${BAGUA_CUDA_VERSION}==0.9.0

RUN \
# Show what we have
pip --version && \
4 changes: 2 additions & 2 deletions dockers/base-ipu/Dockerfile
@@ -75,13 +75,13 @@ ENV \

COPY ./requirements/extra.txt requirements-extra.txt
COPY ./requirements/test.txt requirements-test.txt
COPY ./requirements/adjust_versions.py adjust_versions.py
COPY ./requirements/adjust-versions.py adjust-versions.py
COPY ./.github/prune-packages.py prune_packages.py

RUN \
pip list | grep torch && \
python -c "import torch; print(torch.__version__)" && \
python adjust_versions.py requirements-extra.txt && \
python adjust-versions.py requirements-extra.txt && \
python prune_packages.py requirements-extra.txt "fairscale" "horovod" && \
# Install remaining requirements
pip install -r requirements-extra.txt --no-cache-dir && \
2 changes: 1 addition & 1 deletion dockers/base-xla/Dockerfile
@@ -96,7 +96,7 @@ RUN \
python .github/prune-packages.py requirements/examples.txt "torchvision" && \
# drop unnecessary packages
python .github/prune-packages.py requirements/extra.txt "fairscale" "horovod" && \
python ./requirements/adjust_versions.py ./requirements/extra.txt && \
python ./requirements/adjust-versions.py ./requirements/extra.txt && \
# install PL dependencies
pip install --requirement ./requirements/devel.txt --no-cache-dir && \
cd .. && \
File renamed without changes.
7 changes: 7 additions & 0 deletions requirements/check-avail-extras.py
@@ -0,0 +1,7 @@
# import gcsfs
import hydra # noqa: F401
import jsonargparse # noqa: F401
import matplotlib # noqa: F401
import omegaconf # noqa: F401
import rich # noqa: F401
import torchtext # noqa: F401
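
Editorial sketch (not part of this PR): the check above stops at the first failing import, so a CI log names only one missing package at a time. One possible variant, shown here purely as an illustration, collects every missing name before failing. The helper name `find_missing` is hypothetical, and the sketch assumes a lookup via `importlib.util.find_spec` is acceptable. Note that `find_spec` only locates a top-level module and does not execute it, so unlike a real `import` it will not catch a package that is installed but broken.

```python
import importlib.util


def find_missing(names):
    """Return the top-level module names that cannot be located.

    Hypothetical helper: it looks modules up without importing them,
    so it is cheaper than `import` but also a weaker check.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Demonstration with one standard-library module and one made-up name.
print(find_missing(["json", "surely_not_a_real_module_0123"]))
```

In a script like the one above, the list passed in would be the extras it imports (`hydra`, `jsonargparse`, `matplotlib`, `omegaconf`, `rich`, `torchtext`), followed by a non-zero exit if the result is non-empty.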
7 changes: 7 additions & 0 deletions requirements/check-avail-strategies.py
@@ -0,0 +1,7 @@
import bagua # noqa: F401
import deepspeed # noqa: F401
import fairscale # noqa: F401
import horovod.torch

# returns an error code
assert horovod.torch.nccl_built()
Contributor

This assertion is failing and blocking master:

Extension horovod.torch has not been built: /usr/local/lib/python3.7/dist-packages/horovod/torch/mpi_lib/_mpi_lib.cpython-37m-x86_64-linux-gnu.so not found
If this is not expected, reinstall Horovod with HOROVOD_WITH_PYTORCH=1 to debug the build error.
Warning! MPI libs are missing, but python applications are still available.
Traceback (most recent call last):
  File "requirements/check-avail-strategies.py", line 7, in <module>
    assert horovod.torch.nccl_built()
AttributeError: module 'horovod.torch' has no attribute 'nccl_built'
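
Editorial sketch (not from the thread): the log shows two distinct situations that the bare `assert` cannot tell apart, namely the `horovod.torch` extension not being built at all (so `nccl_built` is missing) versus the extension being built without NCCL. A minimal way to separate them is to look the attribute up defensively. The function name `nccl_status` and its return strings are hypothetical, and the sketch assumes that a missing `nccl_built` attribute always means the extension was not built, which the log suggests but does not prove. Stand-in objects are used so the example runs without Horovod installed.

```python
import types


def nccl_status(hvd_torch):
    """Classify a horovod.torch-like module without raising AttributeError."""
    check = getattr(hvd_torch, "nccl_built", None)
    if check is None:
        # Assumption: the attribute is absent only when the extension is not built.
        return "extension-not-built"
    return "nccl" if check() else "no-nccl"


# Stand-ins for the real module (hypothetical, for illustration only).
unbuilt = types.SimpleNamespace()
with_nccl = types.SimpleNamespace(nccl_built=lambda: True)

print(nccl_status(unbuilt))
print(nccl_status(with_nccl))
```

A check script could then print a targeted message for each case instead of surfacing an `AttributeError`.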

Collaborator Author

OK, so it seems the check is working as expected :)
Which jobs are failing?