Merge latest bug fixes from release/1.3.x into main #1314

Merged: 38 commits, Jan 3, 2024

Commits
a6f7e75
add back sparse module
ClaudiaComito Mar 28, 2023
0ee2b38
bring back test_signal to pre-merge state
ClaudiaComito Mar 29, 2023
7677483
undo merge damage, part 2 of n
ClaudiaComito Mar 29, 2023
010eb61
undo merge damage 2 of 2(?)
ClaudiaComito Mar 29, 2023
c157b62
Merge branch 'main' into workflows/merge-release-into-main
ClaudiaComito Mar 29, 2023
b82da70
reinstate quick_start.md
ClaudiaComito Mar 29, 2023
952eac1
copy from fix/1168-update-docker-image-and-documentation-on-release13…
JuanPedroGHM Oct 31, 2023
5b37e05
corrected bug
Nov 7, 2023
2289eb3
docker scripts documentation
JuanPedroGHM Nov 14, 2023
198380c
Fix tzdata handling and merging multiple actions
bhagemeier Nov 21, 2023
e23d9d7
update pre-commit-config
ClaudiaComito Nov 22, 2023
ee39c63
Fix Pytorch release tracking workflows (#1264)
mtar Nov 22, 2023
3d3e8e8
Merge branch 'release/1.3.x' into bugs/1258-_Bug_Lasso_does_not_work_…
ClaudiaComito Nov 22, 2023
4a7b155
Merge branch 'release/1.3.x' into docker-release-update
ClaudiaComito Nov 22, 2023
ce2c3ef
Merge pull request #1257 from helmholtz-analytics/docker-release-update
mrfh92 Nov 22, 2023
e57f11e
Merge pull request #1267 from helmholtz-analytics/workflows/ci-matrix…
mrfh92 Nov 22, 2023
f1e0894
Merge pull request #1266 from helmholtz-analytics/workflows/update-pr…
mrfh92 Nov 22, 2023
a8cebca
Merge pull request #1259 from helmholtz-analytics/bugs/1258-_Bug_Lass…
mrfh92 Nov 22, 2023
94cd067
Fix `ht.diff` for 1-element-axis edge case (#1201)
mtar Nov 22, 2023
a1b0053
update version to 1.3.1 before release
ClaudiaComito Nov 23, 2023
e3af04b
revert
ClaudiaComito Nov 23, 2023
05325e2
Update version before release (#1274)
ClaudiaComito Nov 23, 2023
2de6410
Merge branch 'release/1.3.x' of github.com:helmholtz-analytics/heat i…
ClaudiaComito Nov 23, 2023
3db7af7
Update pytorch release PR workflow (#1286)
mtar Dec 6, 2023
d19f024
Pin `setup-mpi` version to 1.2.0 in CI matrix (#1313)
ClaudiaComito Dec 20, 2023
0d11791
Merge branch 'release/1.3.x' of github.com:helmholtz-analytics/heat i…
ClaudiaComito Dec 22, 2023
f077c20
Merge branch 'release/1.3.x' into workflows/merge-release-into-main
ClaudiaComito Dec 22, 2023
152239e
update version
ClaudiaComito Dec 22, 2023
656cef4
Merge branch 'main' into workflows/merge-release-into-main
ClaudiaComito Dec 22, 2023
b24a956
skip ihfftn tests for older torch versions
ClaudiaComito Dec 22, 2023
b008f53
add reason for skipping tests
ClaudiaComito Dec 22, 2023
6c3cbef
fix test skipping heuristics
ClaudiaComito Dec 22, 2023
fcd0218
raise NotImplementedError for ihfftn with torch<1.11
ClaudiaComito Dec 22, 2023
cce6057
fix check for ihfftn
ClaudiaComito Dec 22, 2023
968736e
raise error re: ihfftn support on older torch versions
ClaudiaComito Dec 22, 2023
ebadf7e
expand tests
ClaudiaComito Dec 22, 2023
1dd8d62
Apply suggestions from code review
mtar Jan 3, 2024
5bd3cb6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 3, 2024
10 changes: 5 additions & 5 deletions .github/workflows/docker.yml
@@ -5,17 +5,17 @@ on:
heat_version:
description: 'Heat version'
required: true
default: '1.2.2'
default: 'latest'
type: string
pytorch_img:
description: 'Base PyTorch Img'
required: true
default: '23.03-py3'
default: '23.05-py3'
type: string
name:
description: 'Output Image name'
required: true
default: 'heat:1.2.2_torch1.13_cu12.1'
default: 'heat:1.3.0_torch2.0.0_cu12.1'
type: string
jobs:
build-and-push-img:
@@ -43,7 +43,7 @@ jobs:
name: Build
uses: docker/build-push-action@v4
with:
context: docker/
file: docker/Dockerfile.release
build-args: |
HEAT_VERSION=${{ inputs.heat_version }}
PYTORCH_IMG=${{ inputs.pytorch_img}}
@@ -59,7 +59,7 @@ jobs:
name: Build and push
uses: docker/build-push-action@v4
with:
context: docker/
file: docker/Dockerfile.release
build-args: |
HEAT_VERSION=${{ inputs.heat_version }}
PYTORCH_IMG=${{ inputs.pytorch_img}}
21 changes: 0 additions & 21 deletions docker/Dockerfile

This file was deleted.

18 changes: 18 additions & 0 deletions docker/Dockerfile.release
@@ -0,0 +1,18 @@
ARG HEAT_VERSION=latest
ARG PYTORCH_IMG=23.05-py3

FROM nvcr.io/nvidia/pytorch:${PYTORCH_IMG} AS base
COPY ./tzdata.seed /tmp/tzdata.seed
RUN debconf-set-selections /tmp/tzdata.seed
RUN apt update && DEBIAN_FRONTEND=noninteractive apt install -y build-essential openssh-client python3-dev git && apt clean && rm -rf /var/lib/apt/lists/*

FROM base AS release-install
ARG HEAT_VERSION
RUN pip install --upgrade pip
RUN pip install mpi4py --no-binary :all:
RUN echo ${HEAT_VERSION}
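# [[ ... =~ ... ]] is a bash feature, and POSIX ERE has no \d, so use bash and [0-9].
SHELL ["/bin/bash", "-c"]
RUN if [[ "${HEAT_VERSION}" =~ ^(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*)){2}$ ]]; then \
pip install "heat[hdf5,netcdf]==${HEAT_VERSION}"; \
else \
pip install "heat[hdf5,netcdf]"; \
fi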
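The version check in `Dockerfile.release` decides between a pinned and an unpinned `pip install`. The following is a standalone bash sketch of that decision, useful for trying out version strings before building an image. It assumes bash (plain POSIX `sh` has no `[[ ... =~ ... ]]`) and uses `[0-9]` because the extended regular expressions behind `=~` do not define `\d`. The function name `heat_pip_spec` is hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the HEAT_VERSION check in Dockerfile.release:
# it prints the pip requirement that the install step would use.
heat_pip_spec() {
    local version="$1"
    # X.Y.Z without leading zeros; digits matched with [0-9], not \d.
    local semver_re='^(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*)){2}$'
    if [[ "$version" =~ $semver_re ]]; then
        echo "heat[hdf5,netcdf]==${version}"
    else
        # Anything else, e.g. "latest", installs the newest release from PyPI.
        echo "heat[hdf5,netcdf]"
    fi
}

heat_pip_spec 1.3.0    # prints heat[hdf5,netcdf]==1.3.0
heat_pip_spec latest   # prints heat[hdf5,netcdf]
```

Storing the pattern in a variable and leaving it unquoted on the right of `=~` is the usual bash idiom: quoting it would make bash match it as a literal string instead of a regex.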
13 changes: 13 additions & 0 deletions docker/Dockerfile.source
@@ -0,0 +1,13 @@
ARG PYTORCH_IMG=23.05-py3
ARG HEAT_BRANCH=main

FROM nvcr.io/nvidia/pytorch:${PYTORCH_IMG} AS base
COPY ./tzdata.seed /tmp/tzdata.seed
RUN debconf-set-selections /tmp/tzdata.seed
RUN apt update && DEBIAN_FRONTEND=noninteractive apt install -y build-essential openssh-client python3-dev git && apt clean && rm -rf /var/lib/apt/lists/*

FROM base AS source-install
ARG HEAT_BRANCH
RUN pip install --upgrade pip
RUN git clone -b ${HEAT_BRANCH} https://github.com/helmholtz-analytics/heat.git
RUN pip install mpi4py --no-binary :all: && cd heat && pip install ".[hdf5,netcdf]" && cd .. && rm -rf heat
51 changes: 30 additions & 21 deletions docker/README.md
@@ -2,30 +2,35 @@

There is some flexibility to building the Docker images of Heat.

Firstly, one can build from the released version taken from PyPI. This will either be
the latest release or the version set through the `--build-arg=HEAT_VERSION=1.2.0`
Firstly, one can build from the released version taken from PyPI using `Dockerfile.release`. This will either be
the latest release or the version set through the `--build-arg HEAT_VERSION=X.Y.Z`
argument.

Secondly one can build a docker image from the GitHub sources, selected through
`--build-arg=INSTALL_TYPE=source`. The default branch to be built is main, other
branches can be specified using `--build-arg=HEAT_BRANCH=branchname`.
Secondly, one can build a Docker image from the GitHub sources using `Dockerfile.source`. The default branch is `main`; other
branches can be specified using `--build-arg HEAT_BRANCH=<branch-name>`.

## General build

### Docker

The [Dockerfile](./Dockerfile) guiding the build of the Docker image is located in this
directory. It is typically most convenient to `cd` over here and run the Docker build as:
The Dockerfiles guiding the build of the Docker image, [Dockerfile.release](./Dockerfile.release) and [Dockerfile.source](./Dockerfile.source), are located in this directory. It is typically most convenient to `cd` to the `docker` directory and run the build command as:

```console
$ docker build --build-args HEAT_VERSION=1.2.2 --PYTORCH_IMG=22.05-py3 -t heat:local .
$ docker build -t heat:latest -f Dockerfile.source .
```

We also offer prebuilt images in our [Package registry](https://github.com/helmholtz-analytics/heat/pkgs/container/heat) from which you can pull existing images:
Or optionally, using a particular version and pytorch base image:

```console
$ docker build --build-arg HEAT_VERSION=X.Y.Z --build-arg PYTORCH_IMG=<nvcr-tag> -t heat:X.Y.Z -f Dockerfile.release .
```

The Heat image is based on the NVIDIA PyTorch container. You can find existing tags in the [NVIDIA container catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch/tags).

We also offer prebuilt images in our [Package registry](https://github.com/helmholtz-analytics/heat/pkgs/container/heat) from which you can pull existing images:

```console
$ docker pull ghcr.io/helmholtz-analytics/heat:1.2.0-dev_torch1.12_cuda11.7_py3.8
$ docker pull ghcr.io/helmholtz-analytics/heat:<version-tag>
```

### Building for HPC
@@ -38,45 +43,45 @@ image also for HPC systems, such as the ones available at [Jülich Supercomputin

To use one of the existing images from our registry:

$ apptainer build heat.sif docker://ghcr.io/helmholtz-analytics/heat:1.2.0-dev_torch1.12_cuda11.7_py3.8
$ apptainer build heat.sif docker://ghcr.io/helmholtz-analytics/heat:<version-tag>

Building the image can require root access in some systems. If that is the case, we recomend build the image on a local machine, and then upload it to the desired HPC system.
Building the image can require root access on some systems. If that is the case, we recommend building the image on a local machine and then uploading it to the desired HPC system.

If you see an error indicating that there is not enough space, use the --tmpdir flag of the build command. [Apptainer docs](https://apptainer.org/docs/user/latest/build_a_container.html)

#### SIB (Singularity Image Builder)
#### SIB (Singularity Image Builder) for Apptainer images

A simple `Dockerfile` (in addition to the one above) to be used with SIB could look like
this:

FROM ghcr.io/helmholtz-analytics/heat:1.2.0_torch1.12_cuda11.7_py3.8
FROM ghcr.io/helmholtz-analytics/heat:<version-tag>

The invocation to build the image would be:

$ sib upload ./Dockerfile heat_1.2.0_torch1.12_cuda11.7_py3.8
$ sib build --recipe-name heat_1.2.0_torch1.12_cuda11.7_py3.8
$ sib download --recipe-name heat_1.2.0_torch1.12_cuda11.7_py3.8
$ sib upload ./Dockerfile heat
$ sib build --recipe-name heat
$ sib download --recipe-name heat

However, SIB is capable of using just about any available Docker image from any
registry, such that a specific Singularity image can be built by simply referencing the
available image. SIB is thus used as a conversion tool.

## Running on HPC

$ singularity run --nv heat_1.2.0_torch.11_cuda11.5_py3.9.sif /bin/bash
$ apptainer run --nv heat.sif /bin/bash
$ python
Python 3.8.13 (default, Mar 28 2022, 11:38:47)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import heat as ht
...

The `--nv` argument to `singularity`enables NVidia GPU support, which is desired for
The `--nv` argument to `apptainer` enables NVIDIA GPU support, which is desired for
Heat.

### Multi-node example

The following file can be used as an example to use the singularity file together with SLURM, which allows heat to work in a multi-node environment.
The following batch script is an example of using the Apptainer image together with SLURM, which allows Heat to run in a multi-node environment.

```bash
#!/bin/bash
@@ -86,5 +91,9 @@ The following file can be used as an example to use the singularity file togethe

...

srun --mpi="pmi2" singularity exec --nv heat_1.2.0_torch.11_cuda11.5_py3.9.sif bash -c "cd ~/code/heat/examples/lasso; python demo.py"
srun --mpi="pmi2" apptainer exec --nv heat.sif bash -c "cd ~/code/heat/examples/lasso; python demo.py"
```

## Scripts

The `scripts` folder contains a small collection of helper scripts that automate certain tasks, primarily meant for Heat developers. Explanations are given at the top of each script.
68 changes: 68 additions & 0 deletions docker/scripts/build_and_push.sh
@@ -0,0 +1,68 @@
#!/bin/bash
### As the name suggests, this script is meant for Heat developers to quickly build a new Docker image with the specified Heat version and PyTorch image version. The arguments TORCH_VERSION, CUDA_VERSION, and PYTHON_VERSION should indicate the versions of those libraries found in the NVIDIA PyTorch image; they are used only to create the image tag.
# If you want to upload the image to the GitHub package registry, use the '--upload' option. You need to be logged in to the registry. Instructions here: https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry#authenticating-to-the-container-registry

GHCR_UPLOAD=false

while [[ $# -gt 0 ]]; do
case $1 in
--heat-version)
HEAT_VERSION="$2"
shift # past argument
shift # past value
;;
--pytorch-img)
PYTORCH_IMG="$2"
shift # past argument
shift # past value
;;
--torch-version)
TORCH_VERSION="$2"
shift # past argument
shift # past value
;;
--cuda-version)
CUDA_VERSION="$2"
shift # past argument
shift # past value
;;
--python-version)
PYTHON_VERSION="$2"
shift # past argument
shift # past value
;;
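--upload)
GHCR_UPLOAD=true
shift # flag only, no value to skip
;;
-*|--*)
echo "Unknown option $1"
exit 1
;;
*)
# Without a shift or exit here, a positional argument would loop forever.
echo "Unexpected argument $1"
exit 1
;;
esac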
done

echo "HEAT_VERSION=$HEAT_VERSION"
echo "PYTORCH_IMG=$PYTORCH_IMG"
echo "TORCH_VERSION=$TORCH_VERSION"
echo "CUDA_VERSION=$CUDA_VERSION"
echo "PYTHON_VERSION=$PYTHON_VERSION"


ghcr_tag="ghcr.io/helmholtz-analytics/heat:${HEAT_VERSION}_torch${TORCH_VERSION}_cu${CUDA_VERSION}_py${PYTHON_VERSION}"

echo "Building image $ghcr_tag"

# Use the parent docker/ directory as build context so that COPY ./tzdata.seed finds the file.
docker build --file ../Dockerfile.release \
--build-arg HEAT_VERSION="$HEAT_VERSION" \
--build-arg PYTORCH_IMG="$PYTORCH_IMG" \
--tag "$ghcr_tag" \
..

if [ "$GHCR_UPLOAD" = true ]; then
echo "Push image"
echo "You might need to log in to ghcr.io (https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry#authenticating-to-the-container-registry)"
docker push "$ghcr_tag"
fi
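Because the image tag is assembled purely from the informational version arguments, nothing checks that it matches the contents of the image. A minimal sketch of the same composition as a standalone function makes the naming scheme explicit. The function name `heat_image_tag` and all version numbers below are illustrative assumptions, not values read from a real NVIDIA image; `install_print_test.sh` can be used to look up the actual versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper reproducing how build_and_push.sh composes ghcr_tag.
# The torch/CUDA/Python versions only label the image; they do not change
# what gets installed, so they must be read off the base image by hand.
heat_image_tag() {
    local heat_version="$1" torch_version="$2" cuda_version="$3" python_version="$4"
    echo "ghcr.io/helmholtz-analytics/heat:${heat_version}_torch${torch_version}_cu${cuda_version}_py${python_version}"
}

# Illustrative values only. A matching invocation from docker/scripts would be:
#   ./build_and_push.sh --heat-version 1.3.0 --pytorch-img 23.05-py3 \
#       --torch-version 2.0.0 --cuda-version 12.1 --python-version 3.10 --upload
heat_image_tag 1.3.0 2.0.0 12.1 3.10
# prints ghcr.io/helmholtz-analytics/heat:1.3.0_torch2.0.0_cu12.1_py3.10
```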
21 changes: 21 additions & 0 deletions docker/scripts/install_print_test.sh
@@ -0,0 +1,21 @@
#!/bin/bash
# Script to quickly obtain all relevant information from a new NVIDIA PyTorch container. Run it inside an NVIDIA PyTorch container: it first prints the software stack (CUDA version, torch version, ...), then installs Heat from source and runs the Heat unit tests. Useful to quickly check whether a container is compatible with Heat.

# Container setup
apt update && DEBIAN_FRONTEND=noninteractive apt install -y build-essential openssh-client python3-dev git && apt clean && rm -rf /var/lib/apt/lists/*

# Print environment
pip list | grep torch
python --version
nvcc --version
mpirun --version

# Install heat from source.
git clone https://github.com/helmholtz-analytics/heat.git
cd heat
pip install --upgrade pip
pip install mpi4py --no-binary :all:
pip install .[netcdf,hdf5,dev]

# Run tests
HEAT_TEST_USE_DEVICE=gpu mpirun -n 1 pytest heat/
19 changes: 19 additions & 0 deletions docker/scripts/test_nvidia_image_haicore_enroot.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
#!/bin/bash
# Example SLURM/ENROOT script. It will mount the container using enroot, and then run the test script to test the compatibility of the image with the source version of heat.

# Clear environment, else mpi4py will fail to install.
ml purge

SBATCH_PARAMS=(
--partition normal
--time 00:10:00
--nodes 1
--tasks-per-node 1
--gres gpu:1
--container-image ~/containers/nvidia+pytorch+23.05-py3.sqsh
--container-writable
--container-mounts /etc/slurm/task_prolog.hk:/etc/slurm/task_prolog.hk,/scratch:/scratch
--container-mount-home
)

sbatch "${SBATCH_PARAMS[@]}" ./install_print_test.sh
2 changes: 1 addition & 1 deletion docker/singularity-dockerfile.sample
@@ -1,2 +1,2 @@
# This is a sample file to use with the Singularity image builder
FROM ghcr.io/helmholtz-analytics/heat:1.2.0_torch1.11_cuda11.5_py3.9
FROM ghcr.io/helmholtz-analytics/heat:1.3.0_torch1.12_cuda11.7_py3.8
7 changes: 4 additions & 3 deletions quick_start.md
@@ -35,19 +35,20 @@ pip install heat[hdf5,netcdf]
Get the docker image from our package repository

```
docker pull ghcr.io/helmholtz-analytics/heat:1.2.0-dev_torch1.12_cuda11.7_py3.8
docker pull ghcr.io/helmholtz-analytics/heat:<version-tag>
```

or build it from our Dockerfile

```
git clone https://github.com/helmholtz-analytics/heat.git
cd heat/docker
docker build -t heat:latest .
docker build --build-arg HEAT_VERSION=X.Y.Z --build-arg PYTORCH_IMG=<nvcr-tag> -t heat:X.Y.Z -f Dockerfile.release .
```

See [our docker README](https://github.com/helmholtz-analytics/heat/tree/main/docker/README.md) for other details.
`<nvcr-tag>` should be replaced with an existing tag of the official NVIDIA PyTorch container image. Information and existing tags can be found [here](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch).

See [our docker README](https://github.com/helmholtz-analytics/heat/tree/main/docker/README.md) for other details.

### Test
In your terminal, test your setup with the [`heat_test.py`](https://github.com/helmholtz-analytics/heat/blob/main/scripts/heat_test.py) script: