From 68d93eec347ab7de4d13233674da477c0856440c Mon Sep 17 00:00:00 2001 From: ClaudiaComito Date: Tue, 1 Nov 2022 14:45:58 +0000 Subject: [PATCH 01/57] New PyTorch release --- .github/pytorch-release-versions/pytorch-latest.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/pytorch-release-versions/pytorch-latest.txt b/.github/pytorch-release-versions/pytorch-latest.txt index f8f4f03b3d..feaae22bac 100644 --- a/.github/pytorch-release-versions/pytorch-latest.txt +++ b/.github/pytorch-release-versions/pytorch-latest.txt @@ -1 +1 @@ -1.12.1 +1.13.0 From 771a5570f839c9bc754fe20ef3d505cbdeb5087b Mon Sep 17 00:00:00 2001 From: Ashwath V A <73862377+Mystic-Slice@users.noreply.github.com> Date: Thu, 3 Nov 2022 16:14:58 +0530 Subject: [PATCH 02/57] `ht.array`, closed loophole allowing `DNDarray` construction with incompatible shapes of local arrays (#1034) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Replace bug report MD template with form in view of further automation * Fix bug report file name * Update bug_report.yml * Update bug_report.yml * Update bug_report.yml * Update bug_report.yml * Auto generated release notes and changelog (#974) * wip: Initial release draft and changelog updater actions configuration * doc: pr title style guide in contibuting.md * ci: improved release draft templates * ci: extra release draft categories * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Tutorial note about local and global printing (#972) * doc: parallel tutorial note metioning local and global printing * doc: extenden local print note with ``ht.local_printing()`` * Fix typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Updated the tutorial document. (#977) * Updated the tutorial document. 1. Corrected the spelling mistake -> (sigular to single) 2. Corrected the statement -> the number of dimensions is the rank of the array. 3. Made 2 more small changes. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix typo Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Set write permissions for workflow * Update schedule * Update schedule * Update schedule * Move pytorch version file out of workflows dir * Update paths * [pre-commit.ci] pre-commit autoupdate updates: - [github.com/psf/black: 22.3.0 → 22.6.0](https://github.com/psf/black/compare/22.3.0...22.6.0) * Push pytorch release update to release/1.2.x branch, not main * Update schedule * Bypass `on push` trigger * Update schedule * Fix condition syntax * Fix syntax * On push trigger workaround * Update schedule * Update schedule * Enable non-negative sample size * Read `min` value directly from torch return object * Enable non-negative number of samples for `logspace` * Add test for `logspace` * Add MPI version field to bug report template * fix: set cuda rng state on gpu tests for test_random.py (#1014) * Test latest pyorch on both main and release branch * Move pytorch release record out of workflows directory * Update paths * New PyTorch release * Temporarily remove trigger * Update pytorch-latest.txt * Reinstate trigger * New PyTorch release * Remove matrix strategy * Update pytorch-latest.txt * New PyTorch release * New PyTorch release * fix: set cuda rng state on gpu tests for test_random.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added tests for python 3.9 and pytorch 1.12 Co-authored-by: Claudia Comito Co-authored-by: Daniel Coquelin Co-authored-by: ClaudiaComito Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [pre-commit.ci] pre-commit autoupdate (#1024) updates: - [github.com/psf/black: 22.6.0 → 22.8.0](https://github.com/psf/black/compare/22.6.0...22.8.0) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> * Refactored code for readability * rename file and activate force push * Update bug_report.yml fixes formatting issues * Update bug_report.yml fixes an issue where the bug label is not set.
* Update README.md Use status badge from a different workflow action * Update codecov.yml * Update codecov.yml * Fixed code checking for non-matching local shapes while using is_split + Added test * Add section `Google Summer of Code 2022` * Bug/1017 `prod` / `sum` with empty arrays (#1018) * Check for split in `__reduce_op` * Check whether x is distributed Co-authored-by: mtar Co-authored-by: mtar Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> * Add section "Array API" * Mirror Repository and run GitHub CI at HZDR (#1032) * Update ci worflow action * Update codecov.yml * Bug/999 Fix `keepdim` in `any`/`all` (#1000) * Fix `all` * Fix `any` * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add distributed tests * Expanded tests for combination of axis/split axis Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Co-authored-by: mtar * [pre-commit.ci] pre-commit autoupdate updates: - [github.com/psf/black: 22.8.0 → 22.10.0](https://github.com/psf/black/compare/22.8.0...22.10.0) * Updated error message Co-authored-by: Claudia Comito Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Co-authored-by: JuanPedroGHM Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: SaiSuraj27 <87087741+SaiSuraj27@users.noreply.github.com> Co-authored-by: neosunhan Co-authored-by: Markus Goetz Co-authored-by: mtar Co-authored-by: Daniel Coquelin Co-authored-by: ClaudiaComito Co-authored-by: neosunhan <97215518+neosunhan@users.noreply.github.com> --- .github/ISSUE_TEMPLATE/bug_report.yml | 9 ++-- .github/PULL_REQUEST_TEMPLATE.md | 3 -- .github/release-drafter.yml | 4 ++ .../workflows/{mirrorci.yml => ci_cpu.yml} | 10 ++-- .gitlab-ci.yml | 42 +++++++++------ .pre-commit-config.yaml | 2 +- README.md | 4 +- codecov.yml | 15 ------ heat/core/_operations.py | 2 +- heat/core/factories.py | 24 +++++---- heat/core/logical.py | 6 +++ heat/core/tests/test_arithmetics.py | 8 +++ heat/core/tests/test_factories.py | 6 +++ heat/core/tests/test_logical.py | 52 +++++++++++++++++++ setup.py | 2 +- 15 files changed, 131 insertions(+), 58 deletions(-) rename .github/workflows/{mirrorci.yml => ci_cpu.yml} (58%) diff --git a/.github/ISSUE_TEMPLATE/bug_report.yml b/.github/ISSUE_TEMPLATE/bug_report.yml index d09661121b..707a87bc1e 100644 --- a/.github/ISSUE_TEMPLATE/bug_report.yml +++ b/.github/ISSUE_TEMPLATE/bug_report.yml @@ -1,7 +1,7 @@ name: Bug Report description: File a bug report title: "[Bug]: " -labels: ["bug"] +labels: ["bug :bug:"] body: - type: markdown @@ -44,18 +44,19 @@ body: label: Python version description: What Python version? options: + - 3.7 - 3.8 - 3.9 - - 3.10 - - 3.7 + - "3.10" - type: dropdown id: pytorch-version attributes: label: PyTorch version description: What PyTorch version? options: + - 1.12 - 1.11 - - 1.10 + - "1.10" - 1.9 - 1.8 - 1.7 diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 267787f455..5185637281 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -50,6 +50,3 @@ my be illegible. It may be easiest to save the output of each to a file. #### Does this change modify the behaviour of other functions? If so, which?
yes / no - - -skip ci diff --git a/.github/release-drafter.yml b/.github/release-drafter.yml index 1791f1dd8f..1d823f452e 100644 --- a/.github/release-drafter.yml +++ b/.github/release-drafter.yml @@ -29,6 +29,10 @@ categories: labels: - 'io' - 'communication' + - title: 'Google Summer of Code 2022' + label: 'GSoC22' + - title: 'Array API' + label: 'array API' change-template: '- #$NUMBER $TITLE (by @$AUTHOR)' categorie-template: '### $TITLE' exclude-labels: diff --git a/.github/workflows/mirrorci.yml b/.github/workflows/ci_cpu.yml similarity index 58% rename from .github/workflows/mirrorci.yml rename to .github/workflows/ci_cpu.yml index 54d0f4a273..a8b3f2efa7 100644 --- a/.github/workflows/mirrorci.yml +++ b/.github/workflows/ci_cpu.yml @@ -10,11 +10,11 @@ jobs: - name: Mirror + trigger CI uses: SvanBoxel/gitlab-mirror-and-ci-action@master with: - args: "https://gitlab.jsc.fz-juelich.de/haf/heat" + args: "https://gitlab.hzdr.de/haf/heat" env: - FORCE_PUSH: "false" - GITLAB_HOSTNAME: "gitlab.jsc.fz-juelich.de" + FORCE_PUSH: "true" + GITLAB_HOSTNAME: "gitlab.hzdr.de" GITLAB_USERNAME: "" - GITLAB_PASSWORD: ${{ secrets.GITLAB_TOKEN }} - GITLAB_PROJECT_ID: "4935" + GITLAB_PASSWORD: ${{ secrets.GITLAB_TOKEN_1 }} + GITLAB_PROJECT_ID: "845" GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml index 10a3b4f889..822a501a9a 100644 --- a/.gitlab-ci.yml +++ b/.gitlab-ci.yml @@ -1,19 +1,29 @@ test: - image: ubuntu:20.04 + image: nvidia/cuda:11.6.2-runtime-ubuntu20.04 tags: - - heat + - cuda + - x86_64 script: - - apt update - - apt -y install build-essential python3-pip curl - - DEBIAN_FRONTEND=noninteractive apt -y install libopenmpi-dev openmpi-bin openmpi-doc - - apt -y install libhdf5-openmpi-dev libpnetcdf-dev - - pip install pytest coverage - - pip install .[hdf5,netcdf] - - COVERAGE_FILE=report/cov/coverage1 mpirun --allow-run-as-root -n 1 coverage run --source=heat --parallel-mode -m pytest --junitxml=report/test/report1.xml heat/ - - COVERAGE_FILE=report/cov/coverage2 mpirun --allow-run-as-root -n 3 coverage run --source=heat --parallel-mode -m pytest --junitxml=report/test/report2.xml heat/ - - COVERAGE_FILE=report/cov/coverage5 mpirun --allow-run-as-root -n 5 coverage run --source=heat --parallel-mode -m pytest --junitxml=report/test/report5.xml heat/ - - COVERAGE_FILE=report/cov/coverage8 mpirun --allow-run-as-root -n 8 coverage run --source=heat --parallel-mode -m pytest --junitxml=report/test/report8.xml heat/ - - coverage combine report/cov/* - - coverage report - - coverage xml - - curl -s https://codecov.io/bash | bash -s -- -c -F unit -f coverage.xml -t $CODECOV_TOKEN || echo "Codecov failed to upload" + - apt update + - apt -y install build-essential python3-pip curl git + - DEBIAN_FRONTEND=noninteractive apt -y install libopenmpi-dev openmpi-bin openmpi-doc + - apt -y install libhdf5-openmpi-dev libpnetcdf-dev + - pip install pytest coverage + - pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116 + - pip install .[hdf5,netcdf] + - COVERAGE_FILE=report/cov/coverage1 HEAT_TEST_USE_DEVICE=cpu mpirun --allow-run-as-root -n 1 coverage run --source=heat --parallel-mode -m pytest --junitxml=report/test/report1.xml heat/ + - COVERAGE_FILE=report/cov/coverage2 HEAT_TEST_USE_DEVICE=gpu mpirun --allow-run-as-root -n 3 coverage run --source=heat --parallel-mode -m pytest --junitxml=report/test/report3.xml heat/ + - COVERAGE_FILE=report/cov/coverage5 HEAT_TEST_USE_DEVICE=cpu mpirun --allow-run-as-root -n 5 coverage 
run --source=heat --parallel-mode -m pytest --junitxml=report/test/report5.xml heat/ + - COVERAGE_FILE=report/cov/coverage8 HEAT_TEST_USE_DEVICE=gpu mpirun --allow-run-as-root -n 6 coverage run --source=heat --parallel-mode -m pytest --junitxml=report/test/report6.xml heat/ + - coverage combine report/cov/* + - coverage report + - coverage xml + - curl -Os https://uploader.codecov.io/latest/linux/codecov + - chmod +x codecov + - ./codecov -F unit -f ./coverage.xml -t $CODECOV_TOKEN -Z + artifacts: + when: always + paths: + - report/test/report*.xml + reports: + junit: report/test/report*.xml diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index bff221de53..20132efc45 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -10,7 +10,7 @@ repos: - id: check-added-large-files - id: flake8 - repo: https://github.com/psf/black - rev: 22.6.0 + rev: 22.10.0 hooks: - id: black - repo: https://github.com/pycqa/pydocstyle diff --git a/README.md b/README.md index 7d9510a677..fc2b53886e 100644 --- a/README.md +++ b/README.md @@ -8,9 +8,7 @@ Heat is a distributed tensor framework for high performance data analytics. Project Status -------------- - -[![Jenkins](https://img.shields.io/jenkins/build?jobUrl=https%3A%2F%2Fheat-ci.fz-juelich.de%2Fjob%2Fheat%2Fjob%2Fheat%2Fjob%2Fmain%2F&label=CPU)](https://heat-ci.fz-juelich.de/blue/organizations/jenkins/heat%2Fheat/activity?branch=main) -[![Jenkins](https://img.shields.io/jenkins/build?jobUrl=https%3A%2F%2Fheat-ci.fz-juelich.de%2Fjob%2FGPU%2520Cluster%2Fjob%2Fmain%2F&label=GPU)](https://heat-ci.fz-juelich.de/blue/organizations/jenkins/GPU%20Cluster%2Fmain/activity) +[![Mirror and run GitLab CI](https://github.com/helmholtz-analytics/heat/actions/workflows/ci_cpu.yml/badge.svg)](https://github.com/helmholtz-analytics/heat/actions/workflows/ci_cpu.yml) [![Documentation Status](https://readthedocs.org/projects/heat/badge/?version=latest)](https://heat.readthedocs.io/en/latest/?badge=latest) [![codecov](https://codecov.io/gh/helmholtz-analytics/heat/branch/main/graph/badge.svg)](https://codecov.io/gh/helmholtz-analytics/heat) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) diff --git a/codecov.yml b/codecov.yml index 169384c39e..95cd8ab0a5 100644 --- a/codecov.yml +++ b/codecov.yml @@ -13,31 +13,16 @@ coverage: # basic target: auto threshold: 3% - base: auto flags: - unit - - gpu paths: - "heat" - # advanced settings - branches: - - main - if_ci_failed: error #success, failure, error, ignore - informational: false - only_pulls: false patch: default: # basic target: auto threshold: 3% - base: auto - # advanced - branches: - - main - if_ci_failed: error #success, failure, error, ignore - only_pulls: false flags: - "unit" - - "gpu" paths: - "heat" diff --git a/heat/core/_operations.py b/heat/core/_operations.py index 75217f6b2d..08f04e305c 100644 --- a/heat/core/_operations.py +++ b/heat/core/_operations.py @@ -422,7 +422,7 @@ def __reduce_op( balanced = x.balanced # if local tensor is empty, replace it with the identity element - if 0 in x.lshape and (axis is None or (x.split in axis)): + if x.is_distributed() and 0 in x.lshape and (axis is None or split in axis): if neutral is None: neutral = float("nan") neutral_shape = x.gshape[:split] + (1,) + x.gshape[split + 1 :] diff --git a/heat/core/factories.py b/heat/core/factories.py index 695545c879..926cdc5dea 100644 --- a/heat/core/factories.py +++ b/heat/core/factories.py @@ -381,9 +381,13 @@ def array( obj = 
sanitize_memory_layout(obj, order=order) # check with the neighboring rank whether the local shape would fit into a global shape elif is_split is not None: - gshape = np.array(gshape) - lshape = np.array(lshape) obj = sanitize_memory_layout(obj, order=order) + + # Check whether the shape of distributed data + # matches in all dimensions except the split axis + neighbour_shape = np.array(gshape) + lshape = np.array(lshape) + if comm.rank < comm.size - 1: comm.Isend(lshape, dest=comm.rank + 1) if comm.rank != 0: @@ -395,21 +399,23 @@ def array( if length != len(lshape): discard_buffer = np.empty(length) comm.Recv(discard_buffer, source=comm.rank - 1) - gshape[is_split] = np.iinfo(gshape.dtype).min + neighbour_shape[is_split] = np.iinfo(neighbour_shape.dtype).min else: # check whether the individual shape elements match - comm.Recv(gshape, source=comm.rank - 1) + comm.Recv(neighbour_shape, source=comm.rank - 1) for i in range(length): if i == is_split: continue - elif lshape[i] != gshape[i] and lshape[i] - 1 != gshape[i]: - gshape[is_split] = np.iinfo(gshape.dtype).min + elif lshape[i] != neighbour_shape[i]: + neighbour_shape[is_split] = np.iinfo(neighbour_shape.dtype).min # sum up the elements along the split dimension - reduction_buffer = np.array(gshape[is_split]) - comm.Allreduce(MPI.IN_PLACE, reduction_buffer, MPI.SUM) + reduction_buffer = np.array(neighbour_shape[is_split]) + comm.Allreduce(MPI.IN_PLACE, reduction_buffer, MPI.MIN) if reduction_buffer < 0: - raise ValueError("unable to construct tensor, shape of local data chunk does not match") + raise ValueError( + "Unable to construct DNDarray. Local data slices have inconsistent shapes or dimensions." + ) ttl_shape = np.array(obj.shape) ttl_shape[is_split] = lshape[is_split] comm.Allreduce(MPI.IN_PLACE, ttl_shape, MPI.SUM) diff --git a/heat/core/logical.py b/heat/core/logical.py index 9b4dcc8f79..a6be081ea7 100644 --- a/heat/core/logical.py +++ b/heat/core/logical.py @@ -91,6 +91,9 @@ def all( def local_all(t, *args, **kwargs): return torch.all(t != 0, *args, **kwargs) + if keepdim and axis is None: + axis = tuple(range(x.ndim)) + return _operations.__reduce_op( x, local_all, MPI.LAND, axis=axis, out=out, neutral=1, keepdim=keepdim ) @@ -196,6 +199,9 @@ def any( def local_any(t, *args, **kwargs): return torch.any(t != 0, *args, **kwargs) + if keepdim and axis is None: + axis = tuple(range(x.ndim)) + return _operations.__reduce_op( x, local_any, MPI.LOR, axis=axis, out=out, neutral=0, keepdim=keepdim ) diff --git a/heat/core/tests/test_arithmetics.py b/heat/core/tests/test_arithmetics.py index f1fc79d2d3..9e89a7e119 100644 --- a/heat/core/tests/test_arithmetics.py +++ b/heat/core/tests/test_arithmetics.py @@ -646,6 +646,10 @@ def test_prod(self): self.assertEqual(shape_split_axis_tuple_prod.split, None) self.assertTrue((shape_split_axis_tuple_prod == expected_result).all()) + # empty array + empty = ht.array([]) + self.assertEqual(ht.prod(empty), ht.array([1.0])) + # exceptions with self.assertRaises(ValueError): ht.ones(array_len).prod(axis=1) @@ -792,6 +796,10 @@ def test_sum(self): self.assertEqual(shape_split_axis_tuple_sum.split, None) self.assertTrue((shape_split_axis_tuple_sum == expected_result).all()) + # empty array + empty = ht.array([]) + self.assertEqual(ht.sum(empty), ht.array([0.0])) + # exceptions with self.assertRaises(ValueError): ht.ones(array_len).sum(axis=1) diff --git a/heat/core/tests/test_factories.py b/heat/core/tests/test_factories.py index dd973f988e..c7c48bd013 100644 --- a/heat/core/tests/test_factories.py 
+++ b/heat/core/tests/test_factories.py @@ -308,6 +308,12 @@ def test_array(self): with self.assertRaises(TypeError): ht.array((4,), comm={}) + # data already distributed but don't match in shape + if self.get_size() > 1: + with self.assertRaises(ValueError): + dim = self.get_rank() + 1 + ht.array([[0] * dim] * dim, is_split=0) + def test_asarray(self): # same heat array arr = ht.array([1, 2]) diff --git a/heat/core/tests/test_logical.py b/heat/core/tests/test_logical.py index a995d53db3..691df7ec62 100644 --- a/heat/core/tests/test_logical.py +++ b/heat/core/tests/test_logical.py @@ -140,6 +140,32 @@ def test_all(self): out_noaxis = ht.zeros((1, 2, 3, 5), split=1) ht.all(ones_noaxis_split_axis_neg, axis=-2, out=out_noaxis) + # test keepdim + ones_2d = ht.ones((1, 1)) + self.assertEqual(ones_2d.all(keepdim=True).shape, ones_2d.shape) + + ones_2d_split = ht.ones((2, 2), split=0) + keepdim_is_one = ones_2d_split.all(keepdim=True) + self.assertEqual(keepdim_is_one.shape, (1, 1)) + self.assertEqual(keepdim_is_one.split, None) + keepdim_is_one = ones_2d_split.all(axis=0, keepdim=True) + self.assertEqual(keepdim_is_one.shape, (1, 2)) + self.assertEqual(keepdim_is_one.split, None) + keepdim_is_one = ones_2d_split.all(axis=1, keepdim=True) + self.assertEqual(keepdim_is_one.shape, (2, 1)) + self.assertEqual(keepdim_is_one.split, 0) + + ones_2d_split = ht.ones((2, 2), split=1) + keepdim_is_one = ones_2d_split.all(keepdim=True) + self.assertEqual(keepdim_is_one.shape, (1, 1)) + self.assertEqual(keepdim_is_one.split, None) + keepdim_is_one = ones_2d_split.all(axis=0, keepdim=True) + self.assertEqual(keepdim_is_one.shape, (1, 2)) + self.assertEqual(keepdim_is_one.split, 1) + keepdim_is_one = ones_2d_split.all(axis=1, keepdim=True) + self.assertEqual(keepdim_is_one.shape, (2, 1)) + self.assertEqual(keepdim_is_one.split, None) + # exceptions with self.assertRaises(ValueError): ht.ones(array_len).all(axis=1) @@ -212,6 +238,32 @@ def test_any(self): self.assertEqual(any_tensor.dtype, ht.bool) self.assertTrue(ht.equal(any_tensor, res)) + # test keepdim + ones_2d = ht.ones((1, 1)) + self.assertEqual(ones_2d.any(keepdim=True).shape, ones_2d.shape) + + ones_2d_split = ht.ones((2, 2), split=0) + keepdim_any = ones_2d_split.any(keepdim=True) + self.assertEqual(keepdim_any.shape, (1, 1)) + self.assertEqual(keepdim_any.split, None) + keepdim_any = ones_2d_split.any(axis=0, keepdim=True) + self.assertEqual(keepdim_any.shape, (1, 2)) + self.assertEqual(keepdim_any.split, None) + keepdim_any = ones_2d_split.any(axis=1, keepdim=True) + self.assertEqual(keepdim_any.shape, (2, 1)) + self.assertEqual(keepdim_any.split, 0) + + ones_2d_split = ht.ones((2, 2), split=1) + keepdim_any = ones_2d_split.any(keepdim=True) + self.assertEqual(keepdim_any.shape, (1, 1)) + self.assertEqual(keepdim_any.split, None) + keepdim_any = ones_2d_split.any(axis=0, keepdim=True) + self.assertEqual(keepdim_any.shape, (1, 2)) + self.assertEqual(keepdim_any.split, 1) + keepdim_any = ones_2d_split.any(axis=1, keepdim=True) + self.assertEqual(keepdim_any.shape, (2, 1)) + self.assertEqual(keepdim_any.split, None) + def test_isclose(self): size = ht.communication.MPI_WORLD.size a = ht.float32([[2, 2], [2, 2]]) diff --git a/setup.py b/setup.py index 5aaea7ff3e..e97991ec3b 100644 --- a/setup.py +++ b/setup.py @@ -33,7 +33,7 @@ install_requires=[ "mpi4py>=3.0.0", "numpy>=1.13.0", - "torch>=1.7.0, <=1.12.1", + "torch>=1.7.0, <1.13", "scipy>=0.14.0", "pillow>=6.0.0", "torchvision>=0.8.0", From f04b123c5610bb856c43ab3521b42429f9eff82d Mon Sep 17 00:00:00 
2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Tue, 8 Nov 2022 05:37:01 +0100 Subject: [PATCH 03/57] Support PyTorch 1.13.0 on branch release/1.2.x (#1048) * Support latest PyTorch release * Update setup.py * Specify allclose tolerance in test_inv() * Increase allclose tolerance in test_inv * Increase allclose tolerance for distributed floating-point operations Co-authored-by: ClaudiaComito Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Co-authored-by: Claudia Comito --- heat/core/linalg/tests/test_basics.py | 27 ++++++++++++++------------- setup.py | 2 +- 2 files changed, 15 insertions(+), 14 deletions(-) diff --git a/heat/core/linalg/tests/test_basics.py b/heat/core/linalg/tests/test_basics.py index 275590a4bf..a3cb827b84 100644 --- a/heat/core/linalg/tests/test_basics.py +++ b/heat/core/linalg/tests/test_basics.py @@ -235,7 +235,7 @@ def test_inv(self): self.assertEqual(ainv.split, a.split) self.assertEqual(ainv.device, a.device) self.assertTupleEqual(ainv.shape, a.shape) - self.assertTrue(ht.allclose(ainv, ares)) + self.assertTrue(ht.allclose(ainv, ares, atol=1e-6)) # distributed a = ht.array([[5.0, -3, 2], [-3, 2, -1], [-3, 2, -2]], split=0) @@ -243,14 +243,14 @@ def test_inv(self): self.assertEqual(ainv.split, a.split) self.assertEqual(ainv.device, a.device) self.assertTupleEqual(ainv.shape, a.shape) - self.assertTrue(ht.allclose(ainv, ares)) + self.assertTrue(ht.allclose(ainv, ares, atol=1e-6)) a = ht.array([[5.0, -3, 2], [-3, 2, -1], [-3, 2, -2]], split=1) ainv = ht.linalg.inv(a) self.assertEqual(ainv.split, a.split) self.assertEqual(ainv.device, a.device) self.assertTupleEqual(ainv.shape, a.shape) - self.assertTrue(ht.allclose(ainv, ares)) + self.assertTrue(ht.allclose(ainv, ares, atol=1e-6)) # array Size=(2,2,2,2) ares = ht.array( @@ -267,7 +267,7 @@ def test_inv(self): self.assertEqual(ainv.split, a.split) self.assertEqual(ainv.device, a.device) self.assertTupleEqual(ainv.shape, a.shape) - self.assertTrue(ht.allclose(ainv, ares)) + self.assertTrue(ht.allclose(ainv, ares, atol=1e-6)) a = ht.array( [[[2, 1], [6, 4]], [[1, 2], [2, 3]], [[1, 2], [2, 3]], [[2, 1], [6, 4]]], @@ -278,7 +278,7 @@ def test_inv(self): self.assertEqual(ainv.split, a.split) self.assertEqual(ainv.device, a.device) self.assertTupleEqual(ainv.shape, a.shape) - self.assertTrue(ht.allclose(ainv, ares)) + self.assertTrue(ht.allclose(ainv, ares, atol=1e-6)) # pivoting row change ares = ht.array([[-1, 0, 2], [2, 0, -1], [-6, 3, 0]], dtype=ht.double) / 3.0 @@ -287,26 +287,27 @@ def test_inv(self): self.assertEqual(ainv.split, a.split) self.assertEqual(ainv.device, a.device) self.assertTupleEqual(ainv.shape, a.shape) - self.assertTrue(ht.allclose(ainv, ares)) + self.assertTrue(ht.allclose(ainv, ares, atol=1e-6)) a = ht.array([[1, 2, 0], [2, 4, 1], [2, 1, 0]], dtype=ht.double, split=1) ainv = ht.linalg.inv(a) self.assertEqual(ainv.split, a.split) self.assertEqual(ainv.device, a.device) self.assertTupleEqual(ainv.shape, a.shape) - self.assertTrue(ht.allclose(ainv, ares)) + self.assertTrue(ht.allclose(ainv, ares, atol=1e-15)) ht.random.seed(42) a = ht.random.random((20, 20), dtype=ht.float64, split=1) ainv = ht.linalg.inv(a) i = ht.eye(a.shape, split=1, dtype=a.dtype) - self.assertTrue(ht.allclose(a @ ainv, i)) + # loss of precision in distributed floating-point ops + self.assertTrue(ht.allclose(a @ ainv, i, atol=1e-12)) - # ht.random.seed(42) - # a = ht.random.random((20, 20), dtype=ht.float64, split=0) - # ainv = ht.linalg.inv(a) - # i = 
ht.eye(a.shape, split=0, dtype=a.dtype) - # self.assertTrue(ht.allclose(a @ ainv, i)) + ht.random.seed(42) + a = ht.random.random((20, 20), dtype=ht.float64, split=0) + ainv = ht.linalg.inv(a) + i = ht.eye(a.shape, split=0, dtype=a.dtype) + self.assertTrue(ht.allclose(a @ ainv, i, atol=1e-12)) with self.assertRaises(RuntimeError): ht.linalg.inv(ht.array([1, 2, 3], split=0)) diff --git a/setup.py b/setup.py index e97991ec3b..2210ceaf97 100644 --- a/setup.py +++ b/setup.py @@ -33,7 +33,7 @@ install_requires=[ "mpi4py>=3.0.0", "numpy>=1.13.0", - "torch>=1.7.0, <1.13", + "torch>=1.7.0, <1.13.1", "scipy>=0.14.0", "pillow>=6.0.0", "torchvision>=0.8.0", From 3b6acc2fe4f5acde107a3b1db76bd8676adeb473 Mon Sep 17 00:00:00 2001 From: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Date: Tue, 8 Nov 2022 05:45:06 +0100 Subject: [PATCH 04/57] Update schedule --- .github/workflows/pytorch-latest-release.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/pytorch-latest-release.yml b/.github/workflows/pytorch-latest-release.yml index 584ef8866b..cb61ee512b 100644 --- a/.github/workflows/pytorch-latest-release.yml +++ b/.github/workflows/pytorch-latest-release.yml @@ -1,7 +1,7 @@ name: Get latest PyTorch release version on: schedule: - - cron: '25 14 * * 2' + - cron: '0 3 * * 1,4' permissions: contents: write jobs: From 147b2e996c3f827b41a383d60313ffd07b27f4fe Mon Sep 17 00:00:00 2001 From: mtar Date: Wed, 16 Nov 2022 12:36:17 +0100 Subject: [PATCH 05/57] update gitlab url --- .github/workflows/{ci_cpu.yml => ci_cb.yml} | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) rename .github/workflows/{ci_cpu.yml => ci_cb.yml} (79%) diff --git a/.github/workflows/ci_cpu.yml b/.github/workflows/ci_cb.yml similarity index 79% rename from .github/workflows/ci_cpu.yml rename to .github/workflows/ci_cb.yml index a8b3f2efa7..c376f656b0 100644 --- a/.github/workflows/ci_cpu.yml +++ b/.github/workflows/ci_cb.yml @@ -10,10 +10,10 @@ jobs: - name: Mirror + trigger CI uses: SvanBoxel/gitlab-mirror-and-ci-action@master with: - args: "https://gitlab.hzdr.de/haf/heat" + args: "https://codebase.helmholtz.cloud/haf/heat" env: FORCE_PUSH: "true" - GITLAB_HOSTNAME: "gitlab.hzdr.de" + GITLAB_HOSTNAME: "codebase.helmholtz.cloud" GITLAB_USERNAME: "" GITLAB_PASSWORD: ${{ secrets.GITLAB_TOKEN_1 }} GITLAB_PROJECT_ID: "845" From a4a00bb4711b15b37a7e21971d74cfed2ba671c3 Mon Sep 17 00:00:00 2001 From: mtar Date: Wed, 16 Nov 2022 13:29:36 +0100 Subject: [PATCH 06/57] update badge --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index fc2b53886e..4f1f9204dd 100644 --- a/README.md +++ b/README.md @@ -8,7 +8,7 @@ Heat is a distributed tensor framework for high performance data analytics. 
Project Status -------------- -[![Mirror and run GitLab CI](https://github.com/helmholtz-analytics/heat/actions/workflows/ci_cpu.yml/badge.svg)](https://github.com/helmholtz-analytics/heat/actions/workflows/ci_cpu.yml) +[![Mirror and run GitLab CI](https://github.com/helmholtz-analytics/heat/actions/workflows/ci_cb.yml/badge.svg)](https://github.com/helmholtz-analytics/heat/actions/workflows/ci_cb.yml) [![Documentation Status](https://readthedocs.org/projects/heat/badge/?version=latest)](https://heat.readthedocs.io/en/latest/?badge=latest) [![codecov](https://codecov.io/gh/helmholtz-analytics/heat/branch/main/graph/badge.svg)](https://codecov.io/gh/helmholtz-analytics/heat) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) From 5f95a5fd99aa5839b86778fe01f066503e16e8ad Mon Sep 17 00:00:00 2001 From: Claudia Comito Date: Fri, 18 Nov 2022 06:03:34 +0100 Subject: [PATCH 07/57] import CI from main --- .github/workflows/{ci_cpu.yml => ci_cb.yml} | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) rename .github/workflows/{ci_cpu.yml => ci_cb.yml} (79%) diff --git a/.github/workflows/ci_cpu.yml b/.github/workflows/ci_cb.yml similarity index 79% rename from .github/workflows/ci_cpu.yml rename to .github/workflows/ci_cb.yml index a8b3f2efa7..c376f656b0 100644 --- a/.github/workflows/ci_cpu.yml +++ b/.github/workflows/ci_cb.yml @@ -10,10 +10,10 @@ jobs: - name: Mirror + trigger CI uses: SvanBoxel/gitlab-mirror-and-ci-action@master with: - args: "https://gitlab.hzdr.de/haf/heat" + args: "https://codebase.helmholtz.cloud/haf/heat" env: FORCE_PUSH: "true" - GITLAB_HOSTNAME: "gitlab.hzdr.de" + GITLAB_HOSTNAME: "codebase.helmholtz.cloud" GITLAB_USERNAME: "" GITLAB_PASSWORD: ${{ secrets.GITLAB_TOKEN_1 }} GITLAB_PROJECT_ID: "845" From 900588a47df74b788f7a0ee8dec003ff05a478c0 Mon Sep 17 00:00:00 2001 From: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Date: Fri, 18 Nov 2022 09:47:01 +0100 Subject: [PATCH 08/57] Update torch 1.12 versions --- .github/workflows/ci.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml index d61f034c6d..f76462be3b 100644 --- a/.github/workflows/ci.yaml +++ b/.github/workflows/ci.yaml @@ -21,7 +21,7 @@ jobs: - 'torch==1.7.1+cpu torchvision==0.8.2+cpu torchaudio==0.7.2' - 'torch==1.8.1+cpu torchvision==0.9.1+cpu torchaudio==0.8.1' - 'torch==1.9.1+cpu torchvision==0.10.1+cpu torchaudio==0.9.1' - - 'torch==1.12.0+cpu torchvision==0.13.0+cpu torchaudio==0.12.0' + - 'torch==1.12.1+cpu torchvision==0.13.1+cpu torchaudio==0.12.1' exclude: - py-version: 3.9 pytorch-version: 'torch==1.7.1+cpu torchvision==0.8.2+cpu torchaudio==0.7.2' From 61d3e109c98412b4145ddebe43dadf09f1320c49 Mon Sep 17 00:00:00 2001 From: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Date: Wed, 30 Nov 2022 11:54:42 +0100 Subject: [PATCH 09/57] Lanczos decomposition `linalg.solver.lanczos`: Support double precision, complex data types (#1038) * Randomized v0 to inherit input's dtype * Add test_lanczos * Allocate buffer of correct dtype for T * Add correctness test for Lanczos * Matrix size change * Remove split=1 test, test dtypes * Enable initialization of random complex v0 * Add correctness test for complex input * Fix complex dtype heuristics, docs * Raise error if input is int * Test exceptions * Test exceptions, take 2 * Fix exception tests * replace rand with randn in random initialization of v0 * replace rand with randn * replace rand 
with randn * remove print statements * Increase allclose tolerance for torch 1.13 * Increase test coverage * normalize v0 * increase tolerance for single precision check * replace inverse V with conj transp V * last attempt with complex single precision, tolerance way up * Generalize algorithm for complex input * Expand correctness tests complex input * Adapt algo to complex input, sanitize output buffers * Test output buffers * Update docs * test non-distributed output buffer * Simplify T.dtype assignment for complex input matrix * expand test coverage * separate complex from real implementation --- heat/core/linalg/solver.py | 228 ++++++++++++++++++-------- heat/core/linalg/tests/test_solver.py | 109 ++++++++++++ 2 files changed, 268 insertions(+), 69 deletions(-) diff --git a/heat/core/linalg/solver.py b/heat/core/linalg/solver.py index e35ab25656..1ce03491c0 100644 --- a/heat/core/linalg/solver.py +++ b/heat/core/linalg/solver.py @@ -3,6 +3,7 @@ """ import heat as ht from ..dndarray import DNDarray +from ..sanitation import sanitize_out from typing import List, Dict, Any, TypeVar, Union, Tuple, Optional import torch @@ -85,100 +86,189 @@ def lanczos( Parameters ---------- A : DNDarray - 2D symmetric, positive definite Matrix + 2D Hermitian (if complex) or symmetric positive-definite matrix. + Only distribution along axis 0 is supported, i.e. `A.split` must be `0` or `None`. m : int Number of Lanczos iterations v0 : DNDarray, optional - 1D starting vector of Euclidian norm 1. If not provided, a random vector will be used to start the algorithm + 1D starting vector of Euclidean norm 1. If not provided, a random vector will be used to start the algorithm V_out : DNDarray, optional - Output Matrix for the Krylow vectors, Shape = (n, m) + Output Matrix for the Krylow vectors, Shape = (n, m), dtype=A.dtype, must be initialized to zero T_out : DNDarray, optional - Output Matrix for the Tridiagonal matrix, Shape = (m, m) + Output Matrix for the Tridiagonal matrix, Shape = (m, m), must be initialized to zero """ if not isinstance(A, DNDarray): - raise TypeError("A needs to be of type ht.dndarra, but was {}".format(type(A))) - + raise TypeError("A needs to be of type ht.dndarray, but was {}".format(type(A))) if not (A.ndim == 2): raise RuntimeError("A needs to be a 2D matrix") + if A.dtype is ht.int32 or A.dtype is ht.int64: + raise TypeError("A can be float or complex, got {}".format(A.dtype)) if not isinstance(m, (int, float)): - raise TypeError("m must be eiter int or float, but was {}".format(type(m))) + raise TypeError("m must be int, got {}".format(type(m))) n, column = A.shape if n != column: - raise TypeError("Input Matrix A needs to be symmetric.") - T = ht.zeros((m, m)) + raise TypeError("Input Matrix A needs to be symmetric positive-definite.") + + # output data types: T is always Real + A_is_complex = A.dtype is ht.complex128 or A.dtype is ht.complex64 + T_dtype = A.real.dtype + + # initialize or sanitize output buffers + if T_out is not None: + sanitize_out( + T_out, + output_shape=(m, m), + output_split=None, + output_device=A.device, + output_comm=A.comm, + ) + T = T_out + else: + T = ht.zeros((m, m), dtype=T_dtype, device=A.device, comm=A.comm) if A.split == 0: - # This is done for better memory access in the reorthogonalization Gram-Schmidt algorithm - V = ht.ones((n, m), split=0, dtype=A.dtype, device=A.device) + if V_out is not None: + sanitize_out( + V_out, + output_shape=(n, m), + output_split=0, + output_device=A.device, + output_comm=A.comm, + ) + V = V_out + else: + # 
This is done for better memory access in the reorthogonalization Gram-Schmidt algorithm + V = ht.zeros((n, m), split=0, dtype=A.dtype, device=A.device, comm=A.comm) else: - V = ht.ones((n, m), split=None, dtype=A.dtype, device=A.device) + if A.split == 1: + raise NotImplementedError("Distribution along axis 1 not implemented yet.") + if V_out is not None: + sanitize_out( + V_out, + output_shape=(n, m), + output_split=None, + output_device=A.device, + output_comm=A.comm, + ) + V = V_out + else: + V = ht.zeros((n, m), split=None, dtype=A.dtype, device=A.device, comm=A.comm) + + if A_is_complex: + if v0 is None: + vr = ( + ht.random.rand(n, split=V.split, dtype=T_dtype, device=V.device, comm=V.comm) + + ht.random.rand(n, split=V.split, dtype=T_dtype, device=V.device, comm=V.comm) * 1j + ) + v0 = vr / ht.norm(vr) + else: + if v0.split != V.split: + v0.resplit_(axis=V.split) + # # 0th iteration + # # vector v0 has Euclidean norm = 1 + w = ht.matmul(A, v0) + alpha = ht.dot(ht.conj(w).T, v0) + w = w - alpha * v0 + T[0, 0] = alpha.real + V[:, 0] = v0 + for i in range(1, int(m)): + beta = ht.norm(w) + if ht.abs(beta) < 1e-10: + # print("Lanczos breakdown in iteration {}".format(i)) + # Lanczos Breakdown, pick a random vector to continue + vr = ( + ht.random.rand(n, split=V.split, dtype=T_dtype, device=V.device, comm=V.comm) + + ht.random.rand(n, split=V.split, dtype=T_dtype, device=V.device, comm=V.comm) + * 1j + ) + # orthogonalize v_r with respect to all vectors v[i] + for j in range(i): + vi_loc = V._DNDarray__array[:, j] + a = torch.dot(vr.larray, torch.conj(vi_loc)) + b = torch.dot(vi_loc, torch.conj(vi_loc)) + A.comm.Allreduce(ht.communication.MPI.IN_PLACE, a, ht.communication.MPI.SUM) + A.comm.Allreduce(ht.communication.MPI.IN_PLACE, b, ht.communication.MPI.SUM) + vr._DNDarray__array = vr._DNDarray__array - a / b * vi_loc + # normalize v_r to Euclidean norm 1 and set as ith vector v + vi = vr / ht.norm(vr) + else: + vr = w + + # Reorthogonalization + for j in range(i): + vi_loc = V.larray[:, j] + a = torch.dot(vr._DNDarray__array, torch.conj(vi_loc)) + b = torch.dot(vi_loc, torch.conj(vi_loc)) + A.comm.Allreduce(ht.communication.MPI.IN_PLACE, a, ht.communication.MPI.SUM) + A.comm.Allreduce(ht.communication.MPI.IN_PLACE, b, ht.communication.MPI.SUM) + vr._DNDarray__array = vr._DNDarray__array - a / b * vi_loc + + vi = vr / ht.norm(vr) - if v0 is None: - vr = ht.random.rand(n, split=V.split) - v0 = vr / ht.norm(vr) + w = ht.matmul(A, vi) + alpha = ht.dot(ht.conj(w).T, vi) + + w = w - alpha * vi - beta * V[:, i - 1] + + T[i - 1, i] = beta.real + T[i, i - 1] = beta.real + T[i, i] = alpha.real + V[:, i] = vi else: - if v0.split != V.split: - v0.resplit_(axis=V.split) - # # 0th iteration - # # vector v0 has euklidian norm = 1 - w = ht.matmul(A, v0) - alpha = ht.dot(w, v0) - w = w - alpha * v0 - T[0, 0] = alpha - V[:, 0] = v0 - for i in range(1, int(m)): - beta = ht.norm(w) - if ht.abs(beta) < 1e-10: - # print("Lanczos breakdown in iteration {}".format(i)) - # Lanczos Breakdown, pick a random vector to continue - vr = ht.random.rand(n, dtype=A.dtype, split=V.split) - # orthogonalize v_r with respect to all vectors v[i] - for j in range(i): - vi_loc = V.larray[:, j] - a = torch.dot(vr.larray, vi_loc) - b = torch.dot(vi_loc, vi_loc) - A.comm.Allreduce(ht.communication.MPI.IN_PLACE, a, ht.communication.MPI.SUM) - A.comm.Allreduce(ht.communication.MPI.IN_PLACE, b, ht.communication.MPI.SUM) - vr.larray = vr.larray - a / b * vi_loc - # normalize v_r to Euclidian norm 1 and set as ith vector v - vi = vr / 
ht.norm(vr) + if v0 is None: + vr = ht.random.rand(n, split=V.split, dtype=T_dtype, device=V.device, comm=V.comm) + v0 = vr / ht.norm(vr) else: - vr = w + if v0.split != V.split: + v0.resplit_(axis=V.split) + # # 0th iteration + # # vector v0 has Euclidean norm = 1 + w = ht.matmul(A, v0) + alpha = ht.dot(w, v0) + w = w - alpha * v0 + T[0, 0] = alpha + V[:, 0] = v0 + for i in range(1, int(m)): + beta = ht.norm(w) + if ht.abs(beta) < 1e-10: + # print("Lanczos breakdown in iteration {}".format(i)) + # Lanczos Breakdown, pick a random vector to continue + vr = ht.random.rand(n, split=V.split, dtype=T_dtype, device=V.device, comm=V.comm) + # orthogonalize v_r with respect to all vectors v[i] + for j in range(i): + vi_loc = V._DNDarray__array[:, j] + a = torch.dot(vr.larray, vi_loc) + b = torch.dot(vi_loc, vi_loc) + A.comm.Allreduce(ht.communication.MPI.IN_PLACE, a, ht.communication.MPI.SUM) + A.comm.Allreduce(ht.communication.MPI.IN_PLACE, b, ht.communication.MPI.SUM) + vr._DNDarray__array = vr._DNDarray__array - a / b * vi_loc + # normalize v_r to Euclidean norm 1 and set as ith vector v + vi = vr / ht.norm(vr) + else: + vr = w - # Reorthogonalization - # ToDo: Rethink this; mask torch calls, See issue #494 - # This is the fast solution, using item access on the ht.dndarray level is way slower - for j in range(i): - vi_loc = V.larray[:, j] - a = torch.dot(vr._DNDarray__array, vi_loc) - b = torch.dot(vi_loc, vi_loc) - A.comm.Allreduce(ht.communication.MPI.IN_PLACE, a, ht.communication.MPI.SUM) - A.comm.Allreduce(ht.communication.MPI.IN_PLACE, b, ht.communication.MPI.SUM) - vr._DNDarray__array = vr._DNDarray__array - a / b * vi_loc + # Reorthogonalization + for j in range(i): + vi_loc = V.larray[:, j] + a = torch.dot(vr._DNDarray__array, vi_loc) + b = torch.dot(vi_loc, vi_loc) + A.comm.Allreduce(ht.communication.MPI.IN_PLACE, a, ht.communication.MPI.SUM) + A.comm.Allreduce(ht.communication.MPI.IN_PLACE, b, ht.communication.MPI.SUM) + vr._DNDarray__array = vr._DNDarray__array - a / b * vi_loc - vi = vr / ht.norm(vr) + vi = vr / ht.norm(vr) - w = ht.matmul(A, vi) - alpha = ht.dot(w, vi) + w = ht.matmul(A, vi) + alpha = ht.dot(w, vi) - w = w - alpha * vi - beta * V[:, i - 1] + w = w - alpha * vi - beta * V[:, i - 1] - T[i - 1, i] = beta - T[i, i - 1] = beta - T[i, i] = alpha - V[:, i] = vi + T[i - 1, i] = beta + T[i, i - 1] = beta + T[i, i] = alpha + V[:, i] = vi if V.split is not None: V.resplit_(axis=None) - if T_out is not None: - T_out = T.copy() - if V_out is not None: - V_out = V.copy() - return V_out, T_out - return V, T_out - elif V_out is not None: - V_out = V.copy() - return V_out, T - return V, T diff --git a/heat/core/linalg/tests/test_solver.py b/heat/core/linalg/tests/test_solver.py index 2cc16de55b..8e23bbd984 100644 --- a/heat/core/linalg/tests/test_solver.py +++ b/heat/core/linalg/tests/test_solver.py @@ -29,3 +29,112 @@ def test_cg(self): ht.linalg.cg(b, b, x0) with self.assertRaises(RuntimeError): ht.linalg.cg(A, b, A) + + def test_lanczos(self): + # define positive definite matrix (n,n), split = 0 + n = 100 + A = ht.random.randn(n, n, dtype=ht.float64, split=0) + B = A @ A.T + # Lanczos decomposition with iterations m = n + V, T = ht.lanczos(B, m=n) + self.assertTrue(V.dtype is B.dtype) + self.assertTrue(T.dtype is B.dtype) + # V must be unitary + V_inv = ht.linalg.inv(V) + self.assertTrue(ht.allclose(V_inv, V.T)) + # V T V.T must be = B, V transposed = V inverse + lanczos_B = V @ T @ V_inv + self.assertTrue(ht.allclose(lanczos_B, B)) + + # complex128, output buffers + A = ( + 
ht.random.rand(n, n, dtype=ht.float64, split=0) + + ht.random.rand(n, n, dtype=ht.float64, split=0) * 1j + ) + A_conj = ht.conj(A) + B = A @ A_conj.T + m = n + V_out = ht.zeros((n, m), dtype=B.dtype, split=B.split, device=B.device, comm=B.comm) + T_out = ht.zeros((m, m), dtype=ht.float64, device=B.device, comm=B.comm) + # Lanczos decomposition with iterations m = n + ht.lanczos(B, m=m, V_out=V_out, T_out=T_out) + # V must be unitary + V_inv = ht.linalg.inv(V_out) + self.assertTrue(ht.allclose(V_inv, ht.conj(V_out).T)) + # V T V* must be = B, V conjugate transpose = V inverse + lanczos_B = V_out @ T_out @ V_inv + self.assertTrue(ht.allclose(lanczos_B, B)) + + # single precision tolerance + if int(torch.__version__.split(".")[1]) == 13: + tolerance = 1e-3 + else: + tolerance = 1e-4 + + # float32, pre_defined v0, split mismatch + A = ht.random.randn(n, n, dtype=ht.float32, split=0) + B = A @ A.T + v0 = ht.random.randn(n, device=A.device, split=None) + v0 = v0 / ht.norm(v0) + # Lanczos decomposition with iterations m = n + V, T = ht.lanczos(B, m=n, v0=v0) + self.assertTrue(V.dtype is B.dtype) + self.assertTrue(T.dtype is B.dtype) + # V must be unitary + V_inv = ht.linalg.inv(V) + self.assertTrue(ht.allclose(V_inv, V.T, atol=tolerance)) + # V T V.T must be = B, V transposed = V inverse + lanczos_B = V @ T @ V_inv + self.assertTrue(ht.allclose(lanczos_B, B, atol=tolerance)) + + # complex64 + A = ( + ht.random.randn(n, n, dtype=ht.float32, split=0) + + ht.random.randn(n, n, dtype=ht.float32, split=0) * 1j + ) + A_conj = ht.conj(A) + B = A @ A_conj.T + # Lanczos decomposition with iterations m = n + V, T = ht.lanczos(B, m=n) + # V must be unitary + # V T V* must be = B, V conjugate transpose = V inverse + V_conj = ht.conj(V) + lanczos_B = V @ T @ V_conj.T + self.assertTrue(ht.allclose(lanczos_B, B, atol=tolerance)) + + # non-distributed + A = ht.random.randn(n, n, dtype=ht.float64, split=None) + B = A @ A.T + # Lanczos decomposition with iterations m = n + m = n + V_out = ht.zeros((n, m), dtype=B.dtype, split=B.split, device=B.device, comm=B.comm) + T_out = ht.zeros((m, m), dtype=ht.float64, device=B.device, comm=B.comm) + ht.lanczos(B, m=m, V_out=V_out, T_out=T_out) + self.assertTrue(V_out.dtype is B.dtype) + self.assertTrue(T_out.dtype is B.real.dtype) + # V must be unitary + V_inv = ht.linalg.inv(V_out) + self.assertTrue(ht.allclose(V_inv, V_out.T)) + # without output buffers + V, T = ht.lanczos(B, m=m) + # V T V.T must be = B, V transposed = V inverse + lanczos_B = V @ T @ V.T + self.assertTrue(ht.allclose(lanczos_B, B)) + + with self.assertRaises(TypeError): + V, T = ht.lanczos(B, m="3") + with self.assertRaises(TypeError): + A = ht.random.randint(0, 5, (10, 10)) + V, T = ht.lanczos(A, m=3) + with self.assertRaises(TypeError): + A = torch.randn(10, 10) + V, T = ht.lanczos(A, m=3) + with self.assertRaises(TypeError): + A = ht.random.randn(10, 12) + V, T = ht.lanczos(A, m=3) + with self.assertRaises(RuntimeError): + A = ht.random.randn(10, 12, 12) + V, T = ht.lanczos(A, m=3) + with self.assertRaises(NotImplementedError): + A = ht.random.randn(10, 10, split=1) + V, T = ht.lanczos(A, m=3) From 90f072f639255923c8a238447fd053abe2a5bfbf Mon Sep 17 00:00:00 2001 From: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Date: Wed, 30 Nov 2022 13:20:00 +0100 Subject: [PATCH 10/57] Sort release draft entries --- .github/release-drafter.yml | 42 +++++++++++++++++++------------------ 1 file changed, 22 insertions(+), 20 deletions(-) diff --git a/.github/release-drafter.yml 
b/.github/release-drafter.yml index 1d823f452e..c1abd3124d 100644 --- a/.github/release-drafter.yml +++ b/.github/release-drafter.yml @@ -5,34 +5,36 @@ categories: - title: '🐛 Bug Fixes' labels: - 'bug :bug:' - - title: '🧹 Maintenance' - label: 'chore' - - title: '📜 Documentation' - label: 'documentation :book:' - - title: '🧪 Testing' - label: 'testing' - - title: '💯 Benchmarking' - label: 'benchmarking' - - title: 'Linear Algebra' - label: 'linalg' - - title: 'DNDarray' - label: 'dndarray' - title: 'Arithmetic' label: 'arithmetic' - - title: 'Random' - label: 'random' - - title: 'Logical' - label: 'logical' - - title: 'Manipulation' - label: 'manipulation' + - title: 'Array API' + label: 'array API' - title: 'Communication' labels: - 'io' + - 'I/O' - 'communication' + - title: 'DNDarray' + label: 'dndarray' + - title: 'Linear Algebra' + label: 'linalg' + - title: 'Logical' + label: 'logical' + - title: 'Manipulations' + label: 'manipulation' + - title: 'Random' + label: 'random' - title: 'Google Summer of Code 2022' label: 'GSoC22' - - title: 'Array API' - label: 'array API' + - title: '💯 Benchmarking' + label: 'benchmarking' + - title: '📜 Documentation' + label: 'documentation :book:' + - title: '🧹 Maintenance' + label: 'chore' + - title: '🧪 Testing' + label: 'testing' + change-template: '- #$NUMBER $TITLE (by @$AUTHOR)' categorie-template: '### $TITLE' exclude-labels: From 3cb291c0661c8a73e3d0eb18dfaaf8d367e15aa1 Mon Sep 17 00:00:00 2001 From: Claudia Comito Date: Wed, 30 Nov 2022 14:44:35 +0100 Subject: [PATCH 11/57] Bump up version --- heat/core/version.py | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/heat/core/version.py b/heat/core/version.py index b402b86818..9146a07726 100644 --- a/heat/core/version.py +++ b/heat/core/version.py @@ -1,10 +1,10 @@ -"""This module contains HeAT's version information.""" +"""This module contains Heat's version information.""" major: int = 1 -"""Indicates HeAT's main version.""" +"""Indicates Heat's main version.""" minor: int = 2 """Indicates feature extension.""" -micro: int = 0 +micro: int = 1 """Indicates revisions for bugfixes.""" extension: str = "dev" """Indicates special builds, e.g. for specific hardware.""" if not extension: From e788831eb8807f317b03b79cca47eec0e1d42bda Mon Sep 17 00:00:00 2001 From: Claudia Comito Date: Thu, 1 Dec 2022 06:12:14 +0100 Subject: [PATCH 12/57] Remove dev extension from version --- heat/core/version.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/heat/core/version.py b/heat/core/version.py index 9146a07726..a8d2fa2c5b 100644 --- a/heat/core/version.py +++ b/heat/core/version.py @@ -6,7 +6,7 @@ """Indicates feature extension.""" micro: int = 1 """Indicates revisions for bugfixes.""" -extension: str = "dev" +extension: str = "" """Indicates special builds, e.g. for specific hardware.""" if not extension: From b4a189d8ed8bac12189cad5fa0008a810ba8f65e Mon Sep 17 00:00:00 2001 From: Claudia Comito Date: Thu, 1 Dec 2022 06:13:42 +0100 Subject: [PATCH 13/57] Set extension to None --- heat/core/version.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/heat/core/version.py b/heat/core/version.py index a8d2fa2c5b..eacc02cd3e 100644 --- a/heat/core/version.py +++ b/heat/core/version.py @@ -6,7 +6,7 @@ """Indicates feature extension.""" micro: int = 1 """Indicates revisions for bugfixes.""" -extension: str = "" +extension: str = None """Indicates special builds, e.g.
for specific hardware.""" if not extension: From 1d4b1458661555e3af905fa1deb399d827252515 Mon Sep 17 00:00:00 2001 From: Claudia Comito Date: Thu, 1 Dec 2022 11:15:19 +0100 Subject: [PATCH 14/57] update main version to 1.2.1-dev --- heat/core/version.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/heat/core/version.py b/heat/core/version.py index eacc02cd3e..9146a07726 100644 --- a/heat/core/version.py +++ b/heat/core/version.py @@ -6,7 +6,7 @@ """Indicates feature extension.""" micro: int = 1 """Indicates revisions for bugfixes.""" -extension: str = None +extension: str = "dev" """Indicates special builds, e.g. for specific hardware.""" if not extension: From 158a0047fda3c37e2f14c19ee01cf83a6955f87e Mon Sep 17 00:00:00 2001 From: Claudia Comito Date: Fri, 2 Dec 2022 09:41:38 +0100 Subject: [PATCH 15/57] automate +1 increment of new torch version in setup.py --- .github/workflows/latest-pytorch-support.yml | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/.github/workflows/latest-pytorch-support.yml b/.github/workflows/latest-pytorch-support.yml index a0390fb530..15338f0f6e 100644 --- a/.github/workflows/latest-pytorch-support.yml +++ b/.github/workflows/latest-pytorch-support.yml @@ -5,8 +5,13 @@ on: paths: - '.github/pytorch-release-versions/*' env: - previous_pytorch: $(grep 'torch>=' setup.py | awk -F '<=' '{print $2}' | tr -d '",') + previous_pytorch: $(grep 'torch>=' setup.py | awk -F '<' '{print $2}' | tr -d '",') new_pytorch: $(<.github/pytorch-release-versions/pytorch-latest.txt) + new_major: $(<.github/pytorch-release-versions/pytorch-latest.txt | cut -d'.' -f1) + new_minor: $(<.github/pytorch-release-versions/pytorch-latest.txt | cut -d'.' -f2) + new_micro: $(<.github/pytorch-release-versions/pytorch-latest.txt | cut -d'.' -f3) + new_setup_micro: $(($env.new_micro+1)) + new_setup_pytorch: ("$env.new_major"."$env.new_minor"."$env.new_setup_micro") permissions: contents: write issues: write @@ -38,7 +43,7 @@ jobs: run: | echo ${{ env.previous_pytorch }} echo ${{ env.new_pytorch }} - sed -i '/torch>=/ s/'"${{ env.previous_pytorch }}"'/'"${{ env.new_pytorch }}"'/g' setup.py + sed -i '/torch>=/ s/'"${{ env.previous_pytorch }}"'/'"${{ env.new_setup_pytorch }}"'/g' setup.py sed -i 's/'"${{ env.previous_pytorch }}"'/'"${{ env.new_pytorch }}"'/g' .github/pytorch-release-versions/pytorch-latest.txt - name: Define env variable run: | @@ -56,4 +61,4 @@ jobs: Issue/s resolved: #${{ steps.create-issue.outputs.number }} Auto-generated by [create-pull-request][1] [1]: https://github.com/peter-evans/create-pull-request - reviewers: ClaudiaComito, mtar, coquelin77, JuanPedroGHM + reviewers: ClaudiaComito, mtar, JuanPedroGHM From 77dd56c338754a9f18825e35bd9c92e7610f0d36 Mon Sep 17 00:00:00 2001 From: Claudia Comito Date: Fri, 2 Dec 2022 09:45:52 +0100 Subject: [PATCH 16/57] edit terminology --- .github/workflows/latest-pytorch-support.yml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/.github/workflows/latest-pytorch-support.yml b/.github/workflows/latest-pytorch-support.yml index 15338f0f6e..d9759d4510 100644 --- a/.github/workflows/latest-pytorch-support.yml +++ b/.github/workflows/latest-pytorch-support.yml @@ -9,9 +9,9 @@ env: new_pytorch: $(<.github/pytorch-release-versions/pytorch-latest.txt) new_major: $(<.github/pytorch-release-versions/pytorch-latest.txt | cut -d'.' -f1) new_minor: $(<.github/pytorch-release-versions/pytorch-latest.txt | cut -d'.' 
-f2) - new_micro: $(<.github/pytorch-release-versions/pytorch-latest.txt | cut -d'.' -f3) - new_setup_micro: $(($env.new_micro+1)) - new_setup_pytorch: ("$env.new_major"."$env.new_minor"."$env.new_setup_micro") + new_patch: $(<.github/pytorch-release-versions/pytorch-latest.txt | cut -d'.' -f3) + new_setup_patch: $(($env.new_patch+1)) + new_setup_pytorch: ("$env.new_major"."$env.new_minor"."$env.new_setup_patch") permissions: contents: write issues: write From 6158fa9804684fef7169ceebb3f9c3d7a7c04672 Mon Sep 17 00:00:00 2001 From: Claudia Comito Date: Fri, 2 Dec 2022 09:53:02 +0100 Subject: [PATCH 17/57] update latest-pytorch workflow from main --- .github/workflows/latest-pytorch-support.yml | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/.github/workflows/latest-pytorch-support.yml b/.github/workflows/latest-pytorch-support.yml index a0390fb530..d9759d4510 100644 --- a/.github/workflows/latest-pytorch-support.yml +++ b/.github/workflows/latest-pytorch-support.yml @@ -5,8 +5,13 @@ on: paths: - '.github/pytorch-release-versions/*' env: - previous_pytorch: $(grep 'torch>=' setup.py | awk -F '<=' '{print $2}' | tr -d '",') + previous_pytorch: $(grep 'torch>=' setup.py | awk -F '<' '{print $2}' | tr -d '",') new_pytorch: $(<.github/pytorch-release-versions/pytorch-latest.txt) + new_major: $(<.github/pytorch-release-versions/pytorch-latest.txt | cut -d'.' -f1) + new_minor: $(<.github/pytorch-release-versions/pytorch-latest.txt | cut -d'.' -f2) + new_patch: $(<.github/pytorch-release-versions/pytorch-latest.txt | cut -d'.' -f3) + new_setup_patch: $(($env.new_patch+1)) + new_setup_pytorch: ("$env.new_major"."$env.new_minor"."$env.new_setup_patch") permissions: contents: write issues: write @@ -38,7 +43,7 @@ jobs: run: | echo ${{ env.previous_pytorch }} echo ${{ env.new_pytorch }} - sed -i '/torch>=/ s/'"${{ env.previous_pytorch }}"'/'"${{ env.new_pytorch }}"'/g' setup.py + sed -i '/torch>=/ s/'"${{ env.previous_pytorch }}"'/'"${{ env.new_setup_pytorch }}"'/g' setup.py sed -i 's/'"${{ env.previous_pytorch }}"'/'"${{ env.new_pytorch }}"'/g' .github/pytorch-release-versions/pytorch-latest.txt - name: Define env variable run: | @@ -56,4 +61,4 @@ jobs: Issue/s resolved: #${{ steps.create-issue.outputs.number }} Auto-generated by [create-pull-request][1] [1]: https://github.com/peter-evans/create-pull-request - reviewers: ClaudiaComito, mtar, coquelin77, JuanPedroGHM + reviewers: ClaudiaComito, mtar, JuanPedroGHM From fdfbde7af26924081bd123602374f721782c66e9 Mon Sep 17 00:00:00 2001 From: mtar Date: Fri, 9 Dec 2022 10:27:49 +0100 Subject: [PATCH 18/57] Use ubuntu 20.04 --- .github/workflows/ci.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml index f76462be3b..2b455e3953 100644 --- a/.github/workflows/ci.yaml +++ b/.github/workflows/ci.yaml @@ -7,7 +7,7 @@ on: jobs: approved: if: github.event.review.state == 'approved' - runs-on: ubuntu-latest + runs-on: ubuntu-20.04 strategy: fail-fast: false matrix: From 0b386930629f2e96ceb9285a4c682ca779dab663 Mon Sep 17 00:00:00 2001 From: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Date: Fri, 9 Dec 2022 13:19:58 +0100 Subject: [PATCH 19/57] Add curly brackets in env variable calls --- .github/workflows/latest-pytorch-support.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/latest-pytorch-support.yml b/.github/workflows/latest-pytorch-support.yml index d9759d4510..d3834da16c 100644 --- 
a/.github/workflows/latest-pytorch-support.yml +++ b/.github/workflows/latest-pytorch-support.yml @@ -10,8 +10,8 @@ env: new_major: $(<.github/pytorch-release-versions/pytorch-latest.txt | cut -d'.' -f1) new_minor: $(<.github/pytorch-release-versions/pytorch-latest.txt | cut -d'.' -f2) new_patch: $(<.github/pytorch-release-versions/pytorch-latest.txt | cut -d'.' -f3) - new_setup_patch: $(($env.new_patch+1)) - new_setup_pytorch: ("$env.new_major"."$env.new_minor"."$env.new_setup_patch") + new_setup_patch: $((${{env.new_patch}}+1)) + new_setup_pytorch: ("${{env.new_major}}"."${{env.new_minor}}"."${{env.new_setup_patch}}") permissions: contents: write issues: write From 46d80031522b7a66aae08419c72777095727e75d Mon Sep 17 00:00:00 2001 From: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Date: Fri, 9 Dec 2022 13:22:25 +0100 Subject: [PATCH 20/57] Test workflow --- .github/pytorch-release-versions/pytorch-latest.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/pytorch-release-versions/pytorch-latest.txt b/.github/pytorch-release-versions/pytorch-latest.txt index feaae22bac..b50dd27dd9 100644 --- a/.github/pytorch-release-versions/pytorch-latest.txt +++ b/.github/pytorch-release-versions/pytorch-latest.txt @@ -1 +1 @@ -1.13.0 +1.13.1 From 9753126f4bb2cb4b87766ca6cec859b3e90deeec Mon Sep 17 00:00:00 2001 From: mtar Date: Fri, 9 Dec 2022 13:22:31 +0100 Subject: [PATCH 21/57] upgrade checkout & python actions --- .github/workflows/ci.yaml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml index 2b455e3953..956c1ce5d6 100644 --- a/.github/workflows/ci.yaml +++ b/.github/workflows/ci.yaml @@ -30,13 +30,13 @@ jobs: name: Python ${{ matrix.py-version }} with ${{ matrix.pytorch-version }}; options ${{ matrix.install-options }} steps: - name: Checkout - uses: actions/checkout@v2 + uses: actions/checkout@v3 - name: Setup MPI uses: mpi4py/setup-mpi@v1 with: mpi: ${{ matrix.mpi }} - name: Use Python ${{ matrix.py-version }} - uses: actions/setup-python@v2 + uses: actions/setup-python@v4 with: python-version: ${{ matrix.py-version }} architecture: x64 From 3328ac8469f01dab5226f7e0076719922cdd9f1d Mon Sep 17 00:00:00 2001 From: ClaudiaComito Date: Mon, 12 Dec 2022 03:12:22 +0000 Subject: [PATCH 22/57] New PyTorch release --- .github/pytorch-release-versions/pytorch-latest.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/pytorch-release-versions/pytorch-latest.txt b/.github/pytorch-release-versions/pytorch-latest.txt index b50dd27dd9..feaae22bac 100644 --- a/.github/pytorch-release-versions/pytorch-latest.txt +++ b/.github/pytorch-release-versions/pytorch-latest.txt @@ -1 +1 @@ -1.13.1 +1.13.0 From 9cce973868326e6b909441a5dfd61e83c36879f9 Mon Sep 17 00:00:00 2001 From: "codesee-maps[bot]" <86324825+codesee-maps[bot]@users.noreply.github.com> Date: Mon, 12 Dec 2022 14:13:50 +0100 Subject: [PATCH 23/57] Install the CodeSee workflow. 
Learn more at https://docs.codesee.io (#1055) Co-authored-by: codesee-maps[bot] <86324825+codesee-maps[bot]@users.noreply.github.com> Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Co-authored-by: mtar --- .github/workflows/codesee-arch-diagram.yml | 77 +++------------------- 1 file changed, 9 insertions(+), 68 deletions(-) diff --git a/.github/workflows/codesee-arch-diagram.yml b/.github/workflows/codesee-arch-diagram.yml index 5ff5de7b11..01d7f7e629 100644 --- a/.github/workflows/codesee-arch-diagram.yml +++ b/.github/workflows/codesee-arch-diagram.yml @@ -1,3 +1,5 @@ +# This workflow was added by CodeSee. Learn more at https://codesee.io/ +# This is v2.0 of this workflow file on: push: branches: @@ -5,77 +7,16 @@ on: pull_request_target: types: [opened, synchronize, reopened] -name: CodeSee Map +name: CodeSee + +permissions: read-all jobs: - test_map_action: + codesee: runs-on: ubuntu-latest continue-on-error: true - name: Run CodeSee Map Analysis + name: Analyze the repo with CodeSee steps: - - name: checkout - id: checkout - uses: actions/checkout@v2 - with: - repository: ${{ github.event.pull_request.head.repo.full_name }} - ref: ${{ github.event.pull_request.head.ref }} - fetch-depth: 0 - - # codesee-detect-languages has an output with id languages. - - name: Detect Languages - id: detect-languages - uses: Codesee-io/codesee-detect-languages-action@latest - - - name: Configure JDK 16 - uses: actions/setup-java@v2 - if: ${{ fromJSON(steps.detect-languages.outputs.languages).java }} - with: - java-version: '16' - distribution: 'zulu' - - # CodeSee Maps Go support uses a static binary so there's no setup step required. - - - name: Configure Node.js 14 - uses: actions/setup-node@v2 - if: ${{ fromJSON(steps.detect-languages.outputs.languages).javascript }} - with: - node-version: '14' - - - name: Configure Python 3.x - uses: actions/setup-python@v2 - if: ${{ fromJSON(steps.detect-languages.outputs.languages).python }} - with: - python-version: '3.10' - architecture: 'x64' - - - name: Configure Ruby '3.x' - uses: ruby/setup-ruby@v1 - if: ${{ fromJSON(steps.detect-languages.outputs.languages).ruby }} - with: - ruby-version: '3.0' - - # CodeSee Maps Rust support uses a static binary so there's no setup step required. 
- - - name: Generate Map - id: generate-map - uses: Codesee-io/codesee-map-action@latest - with: - step: map - github_ref: ${{ github.ref }} - languages: ${{ steps.detect-languages.outputs.languages }} - - - name: Upload Map - id: upload-map - uses: Codesee-io/codesee-map-action@latest - with: - step: mapUpload - api_token: ${{ secrets.CODESEE_ARCH_DIAG_API_TOKEN }} - github_ref: ${{ github.ref }} - - - name: Insights - id: insights - uses: Codesee-io/codesee-map-action@latest + - uses: Codesee-io/codesee-action@v2 with: - step: insights - api_token: ${{ secrets.CODESEE_ARCH_DIAG_API_TOKEN }} - github_ref: ${{ github.ref }} + codesee-token: ${{ secrets.CODESEE_ARCH_DIAG_API_TOKEN }} From 18f66ea378d8be2657de34ec0b19bd099802da4d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Guti=C3=A9rrez=20Hermosillo=20Muriedas=2C=20Juan=20Pedro?= Date: Mon, 12 Dec 2022 14:51:43 +0100 Subject: [PATCH 24/57] Updated CHANGELOG.md and fixed changelog updater target branch --- .github/workflows/changelog-updater.yml | 2 +- CHANGELOG.md | 19 +++++++++++++++++++ 2 files changed, 20 insertions(+), 1 deletion(-) diff --git a/.github/workflows/changelog-updater.yml b/.github/workflows/changelog-updater.yml index 15cef8c19e..4044743591 100644 --- a/.github/workflows/changelog-updater.yml +++ b/.github/workflows/changelog-updater.yml @@ -22,6 +22,6 @@ jobs: - name: Commit updated CHANGELOG uses: stefanzweifel/git-auto-commit-action@v4 with: - branch: main + branch: release/1.2.x commit_message: Update CHANGELOG file_pattern: CHANGELOG.md diff --git a/CHANGELOG.md b/CHANGELOG.md index 9d49c9f203..d5c06d2dde 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,22 @@ +# v1.2.1 + +## Changes + +- #1048 Support PyTorch 1.13.0 on branch release/1.2.x (by @github-actions) + +## ๐Ÿ› Bug Fixes + +- #1038 Lanczos decomposition `linalg.solver.lanczos`: Support double precision, complex data types (by @ClaudiaComito) +- #1034 `ht.array`, closed loophole allowing `DNDarray` construction with incompatible shapes of local arrays (by @Mystic-Slice) + +## Linear Algebra + +- #1038 Lanczos decomposition `linalg.solver.lanczos`: Support double precision, complex data types (by @ClaudiaComito) + +## ๐Ÿงช Testing + +- #1025 mirror repository on gitlab + ci (by @mtar) +- #1014 fix: set cuda rng state on gpu tests for test_random.py (by @JuanPedroGHM) # v1.2.0 From 65f77ef9c505038d23f0d688bc1ad926a40939d1 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Tue, 13 Dec 2022 03:20:57 +0000 Subject: [PATCH 25/57] [pre-commit.ci] pre-commit autoupdate MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit updates: - [github.com/psf/black: 22.10.0 โ†’ 22.12.0](https://github.com/psf/black/compare/22.10.0...22.12.0) --- .pre-commit-config.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 20132efc45..4b1cee7560 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -10,7 +10,7 @@ repos: - id: check-added-large-files - id: flake8 - repo: https://github.com/psf/black - rev: 22.10.0 + rev: 22.12.0 hooks: - id: black - repo: https://github.com/pycqa/pydocstyle From b1e08f3222c19db4ac152e13787f39cd269a483e Mon Sep 17 00:00:00 2001 From: ClaudiaComito Date: Mon, 19 Dec 2022 03:06:41 +0000 Subject: [PATCH 26/57] New PyTorch release --- .github/pytorch-release-versions/pytorch-latest.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git 
a/.github/pytorch-release-versions/pytorch-latest.txt b/.github/pytorch-release-versions/pytorch-latest.txt index feaae22bac..b50dd27dd9 100644 --- a/.github/pytorch-release-versions/pytorch-latest.txt +++ b/.github/pytorch-release-versions/pytorch-latest.txt @@ -1 +1 @@ -1.13.0 +1.13.1 From 69473d56d0b82913fb6e63a774fa815b9d8058a4 Mon Sep 17 00:00:00 2001 From: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Date: Mon, 19 Dec 2022 09:20:13 +0100 Subject: [PATCH 27/57] FIx env variables call --- .github/workflows/latest-pytorch-support.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/latest-pytorch-support.yml b/.github/workflows/latest-pytorch-support.yml index d3834da16c..2d8d20c904 100644 --- a/.github/workflows/latest-pytorch-support.yml +++ b/.github/workflows/latest-pytorch-support.yml @@ -10,8 +10,8 @@ env: new_major: $(<.github/pytorch-release-versions/pytorch-latest.txt | cut -d'.' -f1) new_minor: $(<.github/pytorch-release-versions/pytorch-latest.txt | cut -d'.' -f2) new_patch: $(<.github/pytorch-release-versions/pytorch-latest.txt | cut -d'.' -f3) - new_setup_patch: $((${{env.new_patch}}+1)) - new_setup_pytorch: ("${{env.new_major}}"."${{env.new_minor}}"."${{env.new_setup_patch}}") + new_setup_patch: $((${{new_patch}}+1)) + new_setup_pytorch: ("${{new_major}}"."${{new_minor}}"."${{new_setup_patch}}") permissions: contents: write issues: write From 166e8b30badd140ed41d38af6340914aef3a75d5 Mon Sep 17 00:00:00 2001 From: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Date: Mon, 19 Dec 2022 09:20:46 +0100 Subject: [PATCH 28/57] Trigger workflow --- .github/pytorch-release-versions/pytorch-latest.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/pytorch-release-versions/pytorch-latest.txt b/.github/pytorch-release-versions/pytorch-latest.txt index b50dd27dd9..1a6a57522e 100644 --- a/.github/pytorch-release-versions/pytorch-latest.txt +++ b/.github/pytorch-release-versions/pytorch-latest.txt @@ -1 +1 @@ -1.13.1 +1.13.1 From 41c20daa1e604fd1d7b1837b56803bad24d0d291 Mon Sep 17 00:00:00 2001 From: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Date: Mon, 19 Dec 2022 09:29:38 +0100 Subject: [PATCH 29/57] Env variables manipulation --- .github/workflows/latest-pytorch-support.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/latest-pytorch-support.yml b/.github/workflows/latest-pytorch-support.yml index 2d8d20c904..7030d914f2 100644 --- a/.github/workflows/latest-pytorch-support.yml +++ b/.github/workflows/latest-pytorch-support.yml @@ -10,8 +10,6 @@ env: new_major: $(<.github/pytorch-release-versions/pytorch-latest.txt | cut -d'.' -f1) new_minor: $(<.github/pytorch-release-versions/pytorch-latest.txt | cut -d'.' -f2) new_patch: $(<.github/pytorch-release-versions/pytorch-latest.txt | cut -d'.' 
-f3) - new_setup_patch: $((${{new_patch}}+1)) - new_setup_pytorch: ("${{new_major}}"."${{new_minor}}"."${{new_setup_patch}}") permissions: contents: write issues: write @@ -43,6 +41,8 @@ jobs: run: | echo ${{ env.previous_pytorch }} echo ${{ env.new_pytorch }} + # new_setup_patch: $((${{new_patch}}+1)) + echo "new_setup_pytorch=$("${{env.new_major}}"."${{env.new_minor}}"."${{((${{env.new_patch}}+1))}}")" >> $GITHUB_ENV sed -i '/torch>=/ s/'"${{ env.previous_pytorch }}"'/'"${{ env.new_setup_pytorch }}"'/g' setup.py sed -i 's/'"${{ env.previous_pytorch }}"'/'"${{ env.new_pytorch }}"'/g' .github/pytorch-release-versions/pytorch-latest.txt - name: Define env variable From 4f9a3b7bcbcf6b1b89109426c18379cea1038150 Mon Sep 17 00:00:00 2001 From: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Date: Mon, 19 Dec 2022 09:30:02 +0100 Subject: [PATCH 30/57] Trigger --- .github/pytorch-release-versions/pytorch-latest.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/pytorch-release-versions/pytorch-latest.txt b/.github/pytorch-release-versions/pytorch-latest.txt index 1a6a57522e..b50dd27dd9 100644 --- a/.github/pytorch-release-versions/pytorch-latest.txt +++ b/.github/pytorch-release-versions/pytorch-latest.txt @@ -1 +1 @@ -1.13.1 +1.13.1 From e992f98b711ebaa95c54fe703c40734873637689 Mon Sep 17 00:00:00 2001 From: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Date: Mon, 19 Dec 2022 10:58:28 +0100 Subject: [PATCH 31/57] Env variables --- .github/workflows/latest-pytorch-support.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/latest-pytorch-support.yml b/.github/workflows/latest-pytorch-support.yml index 7030d914f2..69e5c8c58c 100644 --- a/.github/workflows/latest-pytorch-support.yml +++ b/.github/workflows/latest-pytorch-support.yml @@ -41,8 +41,8 @@ jobs: run: | echo ${{ env.previous_pytorch }} echo ${{ env.new_pytorch }} - # new_setup_patch: $((${{new_patch}}+1)) - echo "new_setup_pytorch=$("${{env.new_major}}"."${{env.new_minor}}"."${{((${{env.new_patch}}+1))}}")" >> $GITHUB_ENV + echo "new_setup_patch=$((${{ env.new_patch }}+1))" >> $GITHUB_ENV + echo "new_setup_pytorch=$("${{ env.new_major }}"."${{ env.new_minor }}"."${{ env.new_setup_patch }}")" >> $GITHUB_ENV sed -i '/torch>=/ s/'"${{ env.previous_pytorch }}"'/'"${{ env.new_setup_pytorch }}"'/g' setup.py sed -i 's/'"${{ env.previous_pytorch }}"'/'"${{ env.new_pytorch }}"'/g' .github/pytorch-release-versions/pytorch-latest.txt - name: Define env variable From 43029132b85338bd90dbe9414c53192bf8b5ea00 Mon Sep 17 00:00:00 2001 From: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Date: Mon, 19 Dec 2022 10:59:09 +0100 Subject: [PATCH 32/57] Trigger workflow --- .github/pytorch-release-versions/pytorch-latest.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/pytorch-release-versions/pytorch-latest.txt b/.github/pytorch-release-versions/pytorch-latest.txt index b50dd27dd9..57b1e5a8bc 100644 --- a/.github/pytorch-release-versions/pytorch-latest.txt +++ b/.github/pytorch-release-versions/pytorch-latest.txt @@ -1 +1 @@ -1.13.1 + 1.13.1 From 6cb5c40534638b1badc9a20b79bd48fa0daea07c Mon Sep 17 00:00:00 2001 From: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Date: Mon, 19 Dec 2022 11:10:54 +0100 Subject: [PATCH 33/57] Test env variables --- .github/workflows/latest-pytorch-support.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git 
a/.github/workflows/latest-pytorch-support.yml b/.github/workflows/latest-pytorch-support.yml index 69e5c8c58c..0cfa068bbc 100644 --- a/.github/workflows/latest-pytorch-support.yml +++ b/.github/workflows/latest-pytorch-support.yml @@ -43,8 +43,8 @@ jobs: echo ${{ env.new_pytorch }} echo "new_setup_patch=$((${{ env.new_patch }}+1))" >> $GITHUB_ENV echo "new_setup_pytorch=$("${{ env.new_major }}"."${{ env.new_minor }}"."${{ env.new_setup_patch }}")" >> $GITHUB_ENV + echo ${{ env.new_setup_pytorch }} sed -i '/torch>=/ s/'"${{ env.previous_pytorch }}"'/'"${{ env.new_setup_pytorch }}"'/g' setup.py - sed -i 's/'"${{ env.previous_pytorch }}"'/'"${{ env.new_pytorch }}"'/g' .github/pytorch-release-versions/pytorch-latest.txt - name: Define env variable run: | echo "new=$(<.github/pytorch-release-versions/pytorch-latest.txt)" >> $GITHUB_ENV From 370d8e277dd1924ee06377a9809c7d15b395d63c Mon Sep 17 00:00:00 2001 From: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Date: Mon, 19 Dec 2022 11:11:37 +0100 Subject: [PATCH 34/57] Trigger workflow --- .github/pytorch-release-versions/pytorch-latest.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/pytorch-release-versions/pytorch-latest.txt b/.github/pytorch-release-versions/pytorch-latest.txt index 57b1e5a8bc..b50dd27dd9 100644 --- a/.github/pytorch-release-versions/pytorch-latest.txt +++ b/.github/pytorch-release-versions/pytorch-latest.txt @@ -1 +1 @@ - 1.13.1 +1.13.1 From bde18c107eb6f9ed3478220d06eff903893c0088 Mon Sep 17 00:00:00 2001 From: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Date: Mon, 19 Dec 2022 11:50:25 +0100 Subject: [PATCH 35/57] Handle env variables --- .github/workflows/latest-pytorch-support.yml | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/.github/workflows/latest-pytorch-support.yml b/.github/workflows/latest-pytorch-support.yml index 0cfa068bbc..d2e957b88d 100644 --- a/.github/workflows/latest-pytorch-support.yml +++ b/.github/workflows/latest-pytorch-support.yml @@ -37,12 +37,14 @@ jobs: with: token: ${{ secrets.GITHUB_TOKEN }} ref: 'release/1.2.x' - - name: Update setup.py + - name: Increment patch run: | - echo ${{ env.previous_pytorch }} - echo ${{ env.new_pytorch }} echo "new_setup_patch=$((${{ env.new_patch }}+1))" >> $GITHUB_ENV + - name: Define version string + run: | echo "new_setup_pytorch=$("${{ env.new_major }}"."${{ env.new_minor }}"."${{ env.new_setup_patch }}")" >> $GITHUB_ENV + - name: Update setup.py + run: | echo ${{ env.new_setup_pytorch }} sed -i '/torch>=/ s/'"${{ env.previous_pytorch }}"'/'"${{ env.new_setup_pytorch }}"'/g' setup.py - name: Define env variable From d9ac4edc6e23e2ab4d582b93e8f9353ea94fb64f Mon Sep 17 00:00:00 2001 From: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Date: Mon, 19 Dec 2022 11:51:24 +0100 Subject: [PATCH 36/57] Test workflow [skip ci] --- .github/pytorch-release-versions/pytorch-latest.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/pytorch-release-versions/pytorch-latest.txt b/.github/pytorch-release-versions/pytorch-latest.txt index b50dd27dd9..61ce01b301 100644 --- a/.github/pytorch-release-versions/pytorch-latest.txt +++ b/.github/pytorch-release-versions/pytorch-latest.txt @@ -1 +1 @@ -1.13.1 +1.13.2 From ab7c8c0d58fd81e2ea2ae8df5ff12ce88bccbc5b Mon Sep 17 00:00:00 2001 From: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Date: Mon, 19 Dec 2022 11:52:08 +0100 Subject: [PATCH 37/57] test workflow --- 
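For readers following the series of workflow fixes above: the intent of the "Increment patch" and "Define version string" steps is to take the newly released PyTorch version from .github/pytorch-release-versions/pytorch-latest.txt (for example 1.13.1), bump its patch number by one, and use the result (1.13.2) as the exclusive upper bound of the "torch>=..., <..." requirement in setup.py. What follows is a minimal Python sketch of that arithmetic, kept separate from the GitHub Actions shell syntax; the helper names are illustrative only, while the file path and the requirement format are taken from the diffs above.

import re

def next_patch_bound(version: str) -> str:
    # "1.13.1" -> "1.13.2": bump the patch number by one.
    major, minor, patch = (int(part) for part in version.strip().split("."))
    return f"{major}.{minor}.{patch + 1}"

def update_torch_requirement(setup_text: str, new_bound: str) -> str:
    # Rewrite the exclusive upper bound of the torch requirement line in setup.py.
    return re.sub(r'(torch>=[\d.]+, <)[\d.]+', lambda m: m.group(1) + new_bound, setup_text)

latest = "1.13.1"  # normally read from .github/pytorch-release-versions/pytorch-latest.txt
print(next_patch_bound(latest))                                        # 1.13.2
print(update_torch_requirement('"torch>=1.7.0, <1.13.1",', "1.13.2"))  # "torch>=1.7.0, <1.13.2",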
.github/pytorch-release-versions/pytorch-latest.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/pytorch-release-versions/pytorch-latest.txt b/.github/pytorch-release-versions/pytorch-latest.txt index 61ce01b301..01b7568230 100644 --- a/.github/pytorch-release-versions/pytorch-latest.txt +++ b/.github/pytorch-release-versions/pytorch-latest.txt @@ -1 +1 @@ -1.13.2 +1.13.3 From 52e74cd9840c9edebc58afa8eef31e16be5338b0 Mon Sep 17 00:00:00 2001 From: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Date: Mon, 19 Dec 2022 11:56:21 +0100 Subject: [PATCH 38/57] Support PyTorch 1.13.1 --- setup.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/setup.py b/setup.py index 2210ceaf97..0e8f00b0de 100644 --- a/setup.py +++ b/setup.py @@ -33,7 +33,7 @@ install_requires=[ "mpi4py>=3.0.0", "numpy>=1.13.0", - "torch>=1.7.0, <1.13.1", + "torch>=1.7.0, <1.13.2", "scipy>=0.14.0", "pillow>=6.0.0", "torchvision>=0.8.0", From 955729f32b1f1249c220923f519a20c4042e99b1 Mon Sep 17 00:00:00 2001 From: Ashwath V A <73862377+Mystic-Slice@users.noreply.github.com> Date: Mon, 19 Dec 2022 17:57:30 +0530 Subject: [PATCH 39/57] Distributed Compressed Sparse Row Matrix (#1028) * Chunk function for CSR sparse arrays * Handled split and is_split parameters + Added getters and setters for member variables of csr class * Bug fixes * Tests for sparse csr method * Separate module for sparse class and operations * Refactored class to hold torch.sparse tensor inside instead of separate data items as DND arrays * Arithmetics for sparse: Add and Mul (using the torch's operations) * Convert sparse matrix to dense * Return Global indptr as DNDarray connected with other processes + Code style fixes * dtype of sparse matrices corrected * device for sparse matrices corrected * shape for sparse matrices is corrected + Fixed bug with is_split * Bug fix: Recalculate gnnz when is_split is not None * Added methods to retrieve data items from Dcsr_matrix (data, indices, ind_ptr) * Fixed typos + minor corrections * Fixed bug in calculating global indptr * Improved APIs for accessing data, indices and indptr * Binary Operator implemented for Dcsr_matrix. 
Element-wise addition and multiplication works * Documentation + Bug fix in shape calculation when using is_split * Modified parameter used by sparse matrix in the chunk function and added documentation * Rename file * Removed unnecessary attributes in Dcsr_matrix class * torch.sparse_csr_tensor is just a torch.Tensor with a particular layout setting and not a class by itself * Fixed lshape to be a Tuple * Fixed bug with indptr * Tests for dcsr_matrix.py * Fixed bug with recalculation of gnnz after element-wise operation * Tests for sparse/arithmetics.py * Tests for sparse/manipulations.py * Tests for sparse/factories.py * Ensured indices to be of dtype torch.int64 to support large data + Supporting tests * Added check for is_split to ensure that the distributed data match in shape * Code Cleanup * Fixed tests * Added tests for dtype and device attributes in sparse matrix builder method * Added tests for usage of output buffer in todense() * Stray comment removed * Scalar arithmetics for Dcsr_matrix + Supporting tests * Updated release drafter * Corrected code for checking if lshape matches with other processes when using is_split + Fixed tests for the same * Code corrections * Code corrections * Code corrections * Renamed binary operator and other arithmetic functions * Fixed code to handle case where number of processes > number of rows + Supporting tests * Added more tests for case when number of processes > number of rows * Fixed bug: Mismatched shapes when using scipy sparse matrix as input * Fixed device mismatch error in tests * Fixed code to avoid undefined behaviour of torch sparse tensors when no data in process * torch version check * Fixed broken tests * Added tests for torch versions check * Fixed version checks in tests * Skip tests incase of version incompatibility * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Check major version numbers of torch too * New pytorch version Co-authored-by: mtar Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- .github/release-drafter.yml | 2 + heat/__init__.py | 1 + heat/core/_operations.py | 4 +- heat/core/communication.py | 13 +- heat/sparse/__init__.py | 7 + heat/sparse/_operations.py | 168 +++ heat/sparse/arithmetics.py | 89 ++ heat/sparse/dcsr_matrix.py | 339 ++++++ heat/sparse/factories.py | 236 ++++ heat/sparse/manipulations.py | 79 ++ heat/sparse/tests/__init__.py | 4 + heat/sparse/tests/test_arithmetics.py | 1390 +++++++++++++++++++++++ heat/sparse/tests/test_dcsrmatrix.py | 287 +++++ heat/sparse/tests/test_factories.py | 504 ++++++++ heat/sparse/tests/test_manipulations.py | 93 ++ setup.py | 2 +- 16 files changed, 3212 insertions(+), 6 deletions(-) create mode 100644 heat/sparse/__init__.py create mode 100644 heat/sparse/_operations.py create mode 100644 heat/sparse/arithmetics.py create mode 100644 heat/sparse/dcsr_matrix.py create mode 100644 heat/sparse/factories.py create mode 100644 heat/sparse/manipulations.py create mode 100644 heat/sparse/tests/__init__.py create mode 100644 heat/sparse/tests/test_arithmetics.py create mode 100644 heat/sparse/tests/test_dcsrmatrix.py create mode 100644 heat/sparse/tests/test_factories.py create mode 100644 heat/sparse/tests/test_manipulations.py diff --git a/.github/release-drafter.yml b/.github/release-drafter.yml index 7fef410249..a7aa0f06c3 100644 --- a/.github/release-drafter.yml +++ 
b/.github/release-drafter.yml @@ -24,6 +24,8 @@ categories: label: 'manipulation' - title: 'Random' label: 'random' + - title: 'Sparse' + label: 'sparse' - title: 'Google Summer of Code 2022' label: 'GSoC22' - title: '๐Ÿ’ฏ Benchmarking' diff --git a/heat/__init__.py b/heat/__init__.py index b89a900f42..0cac56b609 100644 --- a/heat/__init__.py +++ b/heat/__init__.py @@ -14,5 +14,6 @@ from . import nn from . import optim from . import regression +from . import sparse from . import spatial from . import utils diff --git a/heat/core/_operations.py b/heat/core/_operations.py index 08f04e305c..17d8f73f28 100644 --- a/heat/core/_operations.py +++ b/heat/core/_operations.py @@ -1,6 +1,4 @@ -""" -Generalized MPI operations. i.e. element-wise binary operations -""" +"""Generalized operations for DNDarray""" import builtins import numpy as np diff --git a/heat/core/communication.py b/heat/core/communication.py index ad58dae964..2c7ecb40c3 100644 --- a/heat/core/communication.py +++ b/heat/core/communication.py @@ -159,7 +159,12 @@ def is_distributed(self) -> bool: return self.size > 1 def chunk( - self, shape: Tuple[int], split: int, rank: int = None, w_size: int = None + self, + shape: Tuple[int], + split: int, + rank: int = None, + w_size: int = None, + sparse: bool = False, ) -> Tuple[int, Tuple[int], Tuple[slice]]: """ Calculates the chunk of data that will be assigned to this compute node given a global data shape and a split @@ -179,7 +184,8 @@ def chunk( w_size : int, optional The MPI world size, defaults to ``self.size``. Intended for creating chunk maps without communication - + sparse : bool, optional + Specifies whether the array is a sparse matrix """ # ensure the split axis is valid, we actually do not need it split = sanitize_axis(shape, split) @@ -202,6 +208,9 @@ def chunk( start = rank * chunk + remainder end = start + chunk + if sparse: + return start, end + return ( start, tuple(shape[i] if i != split else end - start for i in range(dims)), diff --git a/heat/sparse/__init__.py b/heat/sparse/__init__.py new file mode 100644 index 0000000000..538cb9fd48 --- /dev/null +++ b/heat/sparse/__init__.py @@ -0,0 +1,7 @@ +"""add sparse heat function to the ht.sparse namespace""" + +from .arithmetics import * +from .dcsr_matrix import * +from .factories import * +from ._operations import * +from .manipulations import * diff --git a/heat/sparse/_operations.py b/heat/sparse/_operations.py new file mode 100644 index 0000000000..1d38114955 --- /dev/null +++ b/heat/sparse/_operations.py @@ -0,0 +1,168 @@ +"""Generalized operations for DCSR_matrix""" +import torch +import numpy as np + +from heat.sparse.dcsr_matrix import DCSR_matrix + +from . import factories +from ..core.communication import MPI +from ..core.dndarray import DNDarray +from ..core import types + +from typing import Callable, Optional, Dict + +__all__ = [] + + +def __binary_op_csr( + operation: Callable, + t1: DCSR_matrix, + t2: DCSR_matrix, + out: Optional[DCSR_matrix] = None, + fn_kwargs: Optional[Dict] = {}, +) -> DCSR_matrix: + """ + Generic wrapper for element-wise binary operations of two operands. + Takes the operation function and the two operands involved in the operation as arguments. + + Parameters + ---------- + operation : PyTorch function + The operation to be performed. Function that performs operation elements-wise on the involved tensors, + e.g. add values from other to self + t1: DCSR_matrix + The first operand involved in the operation. + t2: DCSR_matrix + The second operand involved in the operation. 
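A note on the sparse flag added to MPICommunication.chunk above: with sparse=True the method returns only the (start, end) row range owned by the calling rank, which is all a CSR matrix needs in order to slice its row pointer. Below is a self-contained sketch of the same row partitioning with the MPI object stripped away; the function name is made up for illustration.

def row_range(n_rows: int, rank: int, world_size: int):
    # Same arithmetic as chunk(): the first (n_rows % world_size) ranks
    # receive one extra row each.
    chunk, remainder = divmod(n_rows, world_size)
    if remainder > rank:
        chunk += 1
        start = rank * chunk
    else:
        start = rank * chunk + remainder
    return start, start + chunk

# A 5-row matrix distributed over 2 processes:
assert row_range(5, 0, 2) == (0, 3)  # rank 0 owns rows 0, 1, 2
assert row_range(5, 1, 2) == (3, 5)  # rank 1 owns rows 3, 4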
+ out: DCSR_matrix, optional + Output buffer in which the result is placed. If not provided, a freshly allocated matrix is returned. + fn_kwargs: Dict, optional + keyword arguments used for the given operation + Default: {} (empty dictionary) + + Returns + ------- + result: ht.sparse.DCSR_matrix + A DCSR_matrix containing the results of element-wise operation. + """ + if not np.isscalar(t1) and not isinstance(t1, DCSR_matrix): + raise TypeError( + f"Only Dcsr_matrices and numeric scalars are supported, but input was {type(t1)}" + ) + if not np.isscalar(t2) and not isinstance(t2, DCSR_matrix): + raise TypeError( + f"Only Dcsr_matrices and numeric scalars are supported, but input was {type(t2)}" + ) + + if not isinstance(t1, DCSR_matrix) and not isinstance(t2, DCSR_matrix): + raise TypeError( + f"Operator only to be used with Dcsr_matrices, but input types were {type(t1)} and {type(t2)}" + ) + + promoted_type = types.result_type(t1, t2).torch_type() + + # If one of the inputs is a scalar + # just perform the operation on the data tensor + # and create a new sparse matrix + if np.isscalar(t1) or np.isscalar(t2): + matrix = t1 + scalar = t2 + + if np.isscalar(t1): + matrix = t2 + scalar = t1 + + res_values = operation(matrix.larray.values().to(promoted_type), scalar, **fn_kwargs) + res_torch_sparse_csr = torch.sparse_csr_tensor( + matrix.lindptr, + matrix.lindices, + res_values, + size=matrix.lshape, + device=matrix.device.torch_device, + ) + return factories.sparse_csr_matrix( + res_torch_sparse_csr, is_split=matrix.split, comm=matrix.comm, device=matrix.device + ) + + if t1.shape != t2.shape: + raise ValueError( + f"Dcsr_matrices of different shapes are not supported, but input shapes were {t1.shape} and {t2.shape}" + ) + output_shape = t1.shape + + if t1.split is not None or t2.split is not None: + if t1.split is None: + t1 = factories.sparse_csr_matrix(t1.larray, split=0) + + if t2.split is None: + t2 = factories.sparse_csr_matrix(t2.larray, split=0) + + output_split = t1.split + output_device = t1.device + output_comm = t1.comm + output_balanced = t1.balanced + output_lshape = t1.lshape + + # sanitize out buffer + if out is not None: + if out.shape != output_shape: + raise ValueError( + f"Output buffer shape is not compatible with the result. Expected {output_shape}, received {out.shape}" + ) + + if out.split != output_split: + if out.split is None: + out = factories.sparse_csr_matrix(out.larray, split=0) + else: + out = factories.sparse_csr_matrix( + torch.sparse_csr_tensor( + torch.tensor(out.indptr, dtype=torch.int64), + torch.tensor(out.indices, dtype=torch.int64), + torch.tensor(out.data), + ) + ) + + out.device = output_device + out.balanced = ( + output_balanced # At this point, inputs and out buffer assumed to be balanced + ) + + # If there are no non-zero elements in either tensors, skip torch operation to + # 1. Avoid unnecessary computation + # 2. 
Avoid undefined behaviour when no data in process + if t1.lnnz == 0 and t2.lnnz == 0: + result = t1.larray + else: + result = operation(t1.larray.to(promoted_type), t2.larray.to(promoted_type), **fn_kwargs) + + if output_split is not None: + output_gnnz = torch.tensor(result._nnz()) + output_comm.Allreduce(MPI.IN_PLACE, output_gnnz, MPI.SUM) + output_gnnz = output_gnnz.item() + else: + output_gnnz = torch.tensor(result._nnz()) + + output_type = types.canonical_heat_type(result.dtype) + + if out is None: + return DCSR_matrix( + array=torch.sparse_csr_tensor( + result.crow_indices().to(torch.int64), + result.col_indices().to(torch.int64), + result.values(), + size=output_lshape, + ), + gnnz=output_gnnz, + gshape=output_shape, + dtype=output_type, + split=output_split, + device=output_device, + comm=output_comm, + balanced=output_balanced, + ) + + out.larray.copy_(result) + out.gnnz = output_gnnz + out.dtype = output_type + out.comm = output_comm + return out diff --git a/heat/sparse/arithmetics.py b/heat/sparse/arithmetics.py new file mode 100644 index 0000000000..1029bd53d0 --- /dev/null +++ b/heat/sparse/arithmetics.py @@ -0,0 +1,89 @@ +"""Arithmetic functions for Dcsr_matrices""" +from __future__ import annotations + +import torch + +from .dcsr_matrix import DCSR_matrix + +from . import _operations + +__all__ = [ + "add", + "mul", +] + + +def add(t1: DCSR_matrix, t2: DCSR_matrix) -> DCSR_matrix: + """ + Element-wise addition of values from two operands, commutative. + Takes the first and second operand (scalar or :class:`~heat.sparse.DCSR_matrix`) whose elements are to be added + as argument and returns a ``DCSR_matrix`` containing the results of element-wise addition of ``t1`` and ``t2``. + + Parameters + ---------- + t1: DCSR_matrix + The first operand involved in the addition + t2: DCSR_matrix + The second operand involved in the addition + + Examples + -------- + >>> heat_sparse_csr + (indptr: tensor([0, 2, 3]), indices: tensor([0, 2, 2]), data: tensor([1., 2., 3.]), dtype=ht.float32, device=cpu:0, split=0) + >>> heat_sparse_csr.todense() + DNDarray([[1., 0., 2.], + [0., 0., 3.]], dtype=ht.float32, device=cpu:0, split=0) + >>> sum_sparse = heat_sparse_csr + heat_sparse_csr + (or) + >>> sum_sparse = ht.sparse.sparse_add(heat_sparse_csr, heat_sparse_csr) + >>> sum_sparse + (indptr: tensor([0, 2, 3], dtype=torch.int32), indices: tensor([0, 2, 2], dtype=torch.int32), data: tensor([2., 4., 6.]), dtype=ht.float32, device=cpu:0, split=0) + >>> sum_sparse.todense() + DNDarray([[2., 0., 4.], + [0., 0., 6.]], dtype=ht.float32, device=cpu:0, split=0) + """ + return _operations.__binary_op_csr(torch.add, t1, t2) + + +DCSR_matrix.__add__ = lambda self, other: add(self, other) +DCSR_matrix.__add__.__doc__ = add.__doc__ +DCSR_matrix.__radd__ = lambda self, other: add(self, other) +DCSR_matrix.__radd__.__doc__ = add.__doc__ + + +def mul(t1: DCSR_matrix, t2: DCSR_matrix) -> DCSR_matrix: + """ + Element-wise multiplication (NOT matrix multiplication) of values from two operands, commutative. + Takes the first and second operand (scalar or :class:`~heat.sparse.DCSR_matrix`) whose elements are to be + multiplied as argument. 
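A side note on the scalar branch of __binary_op_csr above: when one operand is a scalar, the operation is applied to the stored values only, and the local sparsity pattern (crow_indices, col_indices) is reused unchanged. The following torch-only sketch shows that idea with plain torch.sparse_csr_tensor rather than the Heat wrapper; it assumes torch >= 1.10.

import torch

indptr = torch.tensor([0, 2, 3])
indices = torch.tensor([0, 2, 2])
values = torch.tensor([1.0, 2.0, 3.0])
csr = torch.sparse_csr_tensor(indptr, indices, values, size=(2, 3))  # [[1, 0, 2], [0, 0, 3]]

scaled = torch.sparse_csr_tensor(
    csr.crow_indices(),            # row pointer reused unchanged
    csr.col_indices(),             # column indices reused unchanged
    torch.mul(csr.values(), 2.0),  # only the stored values are touched
    size=csr.size(),
)
print(scaled.to_dense())  # tensor([[2., 0., 4.], [0., 0., 6.]])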
+ + Parameters + ---------- + t1: DCSR_matrix + The first operand involved in the multiplication + t2: DCSR_matrix + The second operand involved in the multiplication + + Examples + -------- + >>> heat_sparse_csr + (indptr: tensor([0, 2, 3]), indices: tensor([0, 2, 2]), data: tensor([1., 2., 3.]), dtype=ht.float32, device=cpu:0, split=0) + >>> heat_sparse_csr.todense() + DNDarray([[1., 0., 2.], + [0., 0., 3.]], dtype=ht.float32, device=cpu:0, split=0) + >>> pdt_sparse = heat_sparse_csr * heat_sparse_csr + (or) + >>> pdt_sparse = ht.sparse.sparse_mul(heat_sparse_csr, heat_sparse_csr) + >>> pdt_sparse + (indptr: tensor([0, 2, 3]), indices: tensor([0, 2, 2]), data: tensor([1., 4., 9.]), dtype=ht.float32, device=cpu:0, split=0) + >>> pdt_sparse.todense() + DNDarray([[1., 0., 4.], + [0., 0., 9.]], dtype=ht.float32, device=cpu:0, split=0) + """ + return _operations.__binary_op_csr(torch.mul, t1, t2) + + +DCSR_matrix.__mul__ = lambda self, other: mul(self, other) +DCSR_matrix.__mul__.__doc__ = mul.__doc__ +DCSR_matrix.__rmul__ = lambda self, other: mul(self, other) +DCSR_matrix.__rmul__.__doc__ = mul.__doc__ diff --git a/heat/sparse/dcsr_matrix.py b/heat/sparse/dcsr_matrix.py new file mode 100644 index 0000000000..468ab84c27 --- /dev/null +++ b/heat/sparse/dcsr_matrix.py @@ -0,0 +1,339 @@ +"""Provides DCSR_matrix, a distributed compressed sparse row matrix""" +from __future__ import annotations + +import torch +from mpi4py import MPI +from typing import Union, Tuple, TypeVar + +from ..core.devices import Device +from ..core.dndarray import DNDarray +from ..core.factories import array +from ..core.types import datatype, canonical_heat_type + +__all__ = ["DCSR_matrix"] + +Communication = TypeVar("Communication") + + +class DCSR_matrix: + """ + Distributed Compressed Sparse Row Matrix. It is composed of + PyTorch sparse_csr_tensors local to each process. + + Parameters + ---------- + array : torch.Tensor (layout ==> torch.sparse_csr) + Local sparse array + gnnz: int + Total number of non-zero elements across all processes + gshape : Tuple[int,...] + The global shape of the array + dtype : datatype + The datatype of the array + split : int or None + If split is not None, it denotes the axis on which the array is divided between processes. + DCSR_matrix only supports distribution along axis 0. + device : Device + The device on which the local arrays are using (cpu or gpu) + comm : Communication + The communications object for sending and receiving data + balanced: bool or None + Describes whether the data are evenly distributed across processes. 
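To make the storage scheme described here concrete: with split=0, each process holds a complete local CSR triple for its block of rows, and the global row pointer is recovered by shifting each local indptr by the number of non-zeros held on lower ranks (compare global_indptr below). A hand-worked example in plain Python, with no MPI involved, using the 3x3 matrix that also appears in the factory docstrings:

# Global 3x3 matrix [[1, 0, 2], [0, 0, 3], [4, 5, 6]] on 2 processes, split=0:
# rank 0 owns rows 0-1, rank 1 owns row 2.
local = {
    0: {"indptr": [0, 2, 3], "indices": [0, 2, 2], "data": [1, 2, 3]},  # lnnz = 3
    1: {"indptr": [0, 3], "indices": [0, 1, 2], "data": [4, 5, 6]},     # lnnz = 3
}

offsets = {0: 0, 1: 3}  # non-zeros held on lower ranks (prefix sum of lnnz)
g0 = [p + offsets[0] for p in local[0]["indptr"][:-1]]  # every rank but the last drops its trailing entry
g1 = [p + offsets[1] for p in local[1]["indptr"]]
assert g0 + g1 == [0, 2, 3, 6]  # the indptr of the undistributed matrix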
+ """ + + def __init__( + self, + array: torch.Tensor, + gnnz: int, + gshape: Tuple[int, ...], + dtype: datatype, + split: Union[int, None], + device: Device, + comm: Communication, + balanced: bool, + ): + self.__array = array + self.__gnnz = gnnz + self.__gshape = gshape + self.__dtype = dtype + self.__split = split + self.__device = device + self.__comm = comm + self.__balanced = balanced + + def global_indptr(self) -> DNDarray: + """ + Global indptr of the ``DCSR_matrix`` as a ``DNDarray`` + """ + if self.split is None: + raise ValueError("This method works only for distributed matrices") + + # Need to know the number of non-zero elements + # in the processes with lesser rank + all_nnz = torch.zeros(self.comm.size + 1, device=self.device.torch_device) + + # Each process must drop their nnz in index = rank + 1 + all_nnz[self.comm.rank + 1] = self.lnnz + self.comm.Allreduce(MPI.IN_PLACE, all_nnz, MPI.SUM) + + # Build prefix array out of all the nnz + all_nnz = torch.cumsum(all_nnz, dim=0) + + global_indptr = self.lindptr + int(all_nnz[self.comm.rank]) + + # Remove the (n+1) the element from all the processes except last + if self.comm.rank != self.comm.size - 1: + global_indptr = global_indptr[:-1] + + # NOTE: indptr might be unbalanced in distribution but should not be self balanced + return array( + global_indptr, + dtype=self.lindptr.dtype, + device=self.device, + comm=self.comm, + is_split=self.split, + ) + + @property + def balanced(self) -> bool: + """ + Boolean value indicating if the DCSR_matrix is balanced between the MPI processes + """ + return self.__balanced + + @property + def comm(self) -> Communication: + """ + The :class:`~heat.core.communication.Communication` of the ``DCSR_matrix`` + """ + return self.__comm + + @property + def device(self) -> Device: + """ + The :class:`~heat.core.devices.Device` of the ``DCSR_matrix`` + """ + return self.__device + + @property + def larray(self) -> torch.Tensor: + """ + Local data of the ``DCSR_matrix`` + """ + return self.__array + + @property + def data(self) -> torch.Tensor: + """ + Global data of the ``DCSR_matrix`` + """ + if self.split is None: + return self.ldata + + data_buffer = torch.zeros( + size=(self.gnnz,), dtype=self.dtype.torch_type(), device=self.device.torch_device + ) + counts, displs = self.counts_displs_nnz() + self.comm.Allgatherv(self.ldata, (data_buffer, counts, displs)) + return data_buffer + + @property + def gdata(self) -> torch.Tensor: + """ + Global data of the ``DCSR_matrix`` + """ + return self.data + + @property + def ldata(self) -> torch.Tensor: + """ + Local data of the ``DCSR_matrix`` + """ + return self.__array.values() + + @property + def indptr(self) -> torch.Tensor: + """ + Global indptr of the ``DCSR_matrix`` + """ + if self.split is None: + return self.lindptr + + return self.global_indptr().resplit(axis=None).larray + + @property + def gindptr(self) -> torch.Tensor: + """ + Global indptr of the ``DCSR_matrix`` + """ + return self.indptr + + @property + def lindptr(self) -> torch.Tensor: + """ + Local indptr of the ``DCSR_matrix`` + """ + return self.__array.crow_indices() + + @property + def indices(self) -> torch.Tensor: + """ + Global indices of the ``DCSR_matrix`` + """ + if self.split is None: + return self.lindices + + indices_buffer = torch.zeros( + size=(self.gnnz,), dtype=self.lindices.dtype, device=self.device.torch_device + ) + counts, displs = self.counts_displs_nnz() + self.comm.Allgatherv(self.lindices, (indices_buffer, counts, displs)) + return indices_buffer + + @property + def 
gindices(self) -> torch.Tensor: + """ + Global indices of the ``DCSR_matrix`` + """ + return self.indices + + @property + def lindices(self) -> torch.Tensor: + """ + Local indices of the ``DCSR_matrix`` + """ + return self.__array.col_indices() + + @property + def ndim(self) -> int: + """ + Number of dimensions of the ``DCSR_matrix`` + """ + return len(self.__gshape) + + @property + def nnz(self) -> int: + """ + Total number of non-zero elements of the ``DCSR_matrix`` + """ + return self.__gnnz + + @property + def gnnz(self) -> int: + """ + Total number of non-zero elements of the ``DCSR_matrix`` + """ + return self.nnz + + @property + def lnnz(self) -> int: + """ + Number of non-zero elements on the local process of the ``DCSR_matrix`` + """ + return self.__array._nnz() + + @property + def shape(self) -> Tuple[int, ...]: + """ + Global shape of the ``DCSR_matrix`` + """ + return self.__gshape + + @property + def gshape(self) -> Tuple[int, ...]: + """ + Global shape of the ``DCSR_matrix`` + """ + return self.shape + + @property + def lshape(self) -> Tuple[int, ...]: + """ + Local shape of the ``DCSR_matrix`` + """ + return tuple(self.__array.size()) + + @property + def dtype(self): + """ + The :class:`~heat.core.types.datatype` of the ``DCSR_matrix`` + """ + return self.__dtype + + @property + def split(self) -> int: + """ + Returns the axis on which the ``DCSR_matrix`` is split + """ + return self.__split + + def is_distributed(self) -> bool: + """ + Determines whether the data of this ``DCSR_matrix`` is distributed across multiple processes. + """ + return self.split is not None and self.comm.is_distributed() + + def counts_displs_nnz(self) -> Tuple[Tuple[int], Tuple[int]]: + """ + Returns actual counts (number of non-zero items per process) and displacements (offsets) of the DCSR_matrix. + Does not assume load balance. + """ + if self.split is not None: + counts = torch.zeros(self.comm.size) + counts[self.comm.rank] = self.lnnz + self.comm.Allreduce(MPI.IN_PLACE, counts, MPI.SUM) + displs = [0] + torch.cumsum(counts, dim=0)[:-1].tolist() + return tuple(counts.tolist()), tuple(displs) + else: + raise ValueError( + "Non-distributed DCSR_matrix. Cannot calculate counts and displacements." + ) + + def astype(self, dtype, copy=True) -> DCSR_matrix: + """ + Returns a casted version of this matrix. + Casted matrix is a new matrix of the same shape but with given type of this matrix. If copy is ``True``, the + same matrix is returned instead. + + Parameters + ---------- + dtype : datatype + HeAT type to which the matrix is cast + copy : bool, optional + By default the operation returns a copy of this matrix. If copy is set to ``False`` the cast is performed + in-place and this matrix is returned + """ + dtype = canonical_heat_type(dtype) + casted_matrix = self.__array.type(dtype.torch_type()) + if copy: + return DCSR_matrix( + casted_matrix, + self.gnnz, + self.gshape, + dtype, + self.split, + self.device, + self.comm, + self.balanced, + ) + + self.__array = casted_matrix + self.__dtype = dtype + + return self + + def __repr__(self) -> str: + """ + Computes a printable representation of the passed DCSR_matrix. 
+ """ + print_string = ( + f"(indptr: {self.indptr}, indices: {self.indices}, data: {self.data}, " + f"dtype=ht.{self.dtype.__name__}, device={self.device}, split={self.split})" + ) + + # Check has to happen after generating string because + # generation of string invokes functions that require + # participation from all processes + if self.comm.rank != 0: + return "" + return print_string diff --git a/heat/sparse/factories.py b/heat/sparse/factories.py new file mode 100644 index 0000000000..04c1607ddd --- /dev/null +++ b/heat/sparse/factories.py @@ -0,0 +1,236 @@ +"""Provides high-level DCSR_matrix initialization functions""" + +import torch +import numpy as np +from scipy.sparse import csr_matrix as scipy_csr_matrix + +from typing import Optional, Type, Union +import warnings + +from ..core import devices +from ..core import types +from ..core.communication import MPI, sanitize_comm, Communication +from ..core.devices import Device +from ..core.types import datatype + +from .dcsr_matrix import DCSR_matrix + +__all__ = [ + "sparse_csr_matrix", +] + + +def sparse_csr_matrix( + obj: Union[torch.Tensor, scipy_csr_matrix], + dtype: Optional[Type[datatype]] = None, + split: Optional[int] = None, + is_split: Optional[int] = None, + device: Optional[Device] = None, + comm: Optional[Communication] = None, +) -> DCSR_matrix: + """ + Create a :class:`~heat.sparse.DCSR_matrix`. + + Parameters + ---------- + obj : :class:`torch.Tensor` (layout ==> torch.sparse_csr) or :class:`scipy.sparse.csr_matrix` + Sparse tensor that needs to be distributed + dtype : datatype, optional + The desired data-type for the sparse matrix. If not given, then the type will be determined as the minimum type required + to hold the objects in the sequence. This argument can only be used to โ€˜upcastโ€™ the array. For downcasting, use + the :func:`~heat.sparse.dcsr_matrix.astype` method. + split : int or None, optional + The axis along which the passed array content ``obj`` is split and distributed in memory. DCSR_matrix only supports + distribution along axis 0. Mutually exclusive with ``is_split``. + is_split : int or None, optional + Specifies the axis along which the local data portions, passed in obj, are split across all machines. DCSR_matrix only + supports distribution along axis 0. Useful for interfacing with other distributed-memory code. The shape of the global + array is automatically inferred. Mutually exclusive with ``split``. + device : str or Device, optional + Specifies the :class:`~heat.core.devices.Device` the array shall be allocated on (i.e. globally set default + device). + comm : Communication, optional + Handle to the nodes holding distributed array chunks. + + Raises + ------ + ValueError + If split and is_split parameters are not one of 0 or None. 
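As a reading aid for the split == 0 branch implemented further down: distributing a CSR matrix by rows only requires slicing the three buffers and rebasing the sliced row pointer so that it starts at zero. The torch-only sketch below illustrates this under that assumption; the helper name is made up for illustration.

import torch

# Global 3x3 CSR matrix [[1, 0, 2], [0, 0, 3], [4, 5, 6]]
indptr = torch.tensor([0, 2, 3, 6])
indices = torch.tensor([0, 2, 2, 0, 1, 2])
data = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

def local_rows(start: int, end: int):
    # Slice rows [start, end) out of the global buffers and rebase the row pointer.
    lo, hi = indptr[start].item(), indptr[end].item()
    return indptr[start : end + 1] - indptr[start], indices[lo:hi], data[lo:hi]

print(local_rows(0, 2))  # (tensor([0, 2, 3]), tensor([0, 2, 2]), tensor([1., 2., 3.]))
print(local_rows(2, 3))  # (tensor([0, 3]), tensor([0, 1, 2]), tensor([4., 5., 6.]))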
+ + Examples + -------- + Create a :class:`~heat.sparse.DCSR_matrix` from :class:`torch.Tensor` (layout ==> torch.sparse_csr) + >>> indptr = torch.tensor([0, 2, 3, 6]) + >>> indices = torch.tensor([0, 2, 2, 0, 1, 2]) + >>> data = torch.tensor([1, 2, 3, 4, 5, 6], dtype=torch.float) + >>> torch_sparse_csr = torch.sparse_csr_tensor(indptr, indices, data) + >>> heat_sparse_csr = ht.sparse.sparse_csr_matrix(torch_sparse_csr, split=0) + >>> heat_sparse_csr + (indptr: tensor([0, 2, 3, 6]), indices: tensor([0, 2, 2, 0, 1, 2]), data: tensor([1., 2., 3., 4., 5., 6.]), dtype=ht.float32, device=cpu:0, split=0) + + Create a :class:`~heat.sparse.DCSR_matrix` from :class:`scipy.sparse.csr_matrix` + >>> scipy_sparse_csr = scipy.sparse.csr_matrix((data, indices, indptr)) + >>> heat_sparse_csr = ht.sparse.sparse_csr_matrix(scipy_sparse_csr, split=0) + >>> heat_sparse_csr + (indptr: tensor([0, 2, 3, 6], dtype=torch.int32), indices: tensor([0, 2, 2, 0, 1, 2], dtype=torch.int32), data: tensor([1., 2., 3., 4., 5., 6.]), dtype=ht.float32, device=cpu:0, split=0) + + Create a :class:`~heat.sparse.DCSR_matrix` using data that is already distributed (with `is_split`) + >>> indptrs = [torch.tensor([0, 2, 3]), torch.tensor([0, 3])] + >>> indices = [torch.tensor([0, 2, 2]), torch.tensor([0, 1, 2])] + >>> data = [torch.tensor([1, 2, 3], dtype=torch.float), + torch.tensor([4, 5, 6], dtype=torch.float)] + >>> rank = ht.MPI_WORLD.rank + >>> local_indptr = indptrs[rank] + >>> local_indices = indices[rank] + >>> local_data = data[rank] + >>> local_torch_sparse_csr = torch.sparse_csr_tensor(local_indptr, local_indices, local_data) + >>> heat_sparse_csr = ht.sparse.sparse_csr_matrix(local_torch_sparse_csr, is_split=0) + >>> heat_sparse_csr + (indptr: tensor([0, 2, 3, 6]), indices: tensor([0, 2, 2, 0, 1, 2]), data: tensor([1., 2., 3., 4., 5., 6.]), dtype=ht.float32, device=cpu:0, split=0) + """ + # version check + if int(torch.__version__.split(".")[1]) < 10: + raise RuntimeError(f"ht.sparse requires torch >= 1.10. Found version {torch.__version__}.") + + # sanitize the data type + if dtype is not None: + dtype = types.canonical_heat_type(dtype) + + # sanitize device + if device is not None: + device = devices.sanitize_device(device) + + # Convert input into torch.Tensor (layout ==> torch.sparse_csr) + if isinstance(obj, scipy_csr_matrix): + obj = torch.sparse_csr_tensor( + obj.indptr, + obj.indices, + obj.data, + device=device.torch_device if device is not None else devices.get_device().torch_device, + size=obj.shape, + ) + + # infer dtype from obj if not explicitly given + if dtype is None: + dtype = types.canonical_heat_type(obj.dtype) + else: + torch_dtype = dtype.torch_type() + if obj.dtype != torch_dtype: + obj = obj.type(torch_dtype) + + # infer device from obj if not explicitly given + if device is None: + device = devices.sanitize_device(obj.device.type) + + if str(obj.device) != device.torch_device: + warnings.warn( + "Array 'obj' is not on device '{}'. 
It will be moved to it.".format(device), UserWarning + ) + obj = obj.to(device.torch_device) + + comm = sanitize_comm(comm) + gshape = tuple(obj.shape) + lshape = gshape + gnnz = obj.values().shape[0] + + if split == 0: + start, end = comm.chunk(gshape, split, sparse=True) + + # Find the starting and ending indices for + # col_indices and values tensors for this process + indices_start = obj.crow_indices()[start] + indices_end = obj.crow_indices()[end] + + # Slice the data belonging to this process + data = obj.values()[indices_start:indices_end] + # start:(end + 1) because indptr is of size (n + 1) for array with n rows + indptr = obj.crow_indices()[start : end + 1] + indices = obj.col_indices()[indices_start:indices_end] + + indptr = indptr - indptr[0] + + lshape = list(lshape) + lshape[split] = end - start + lshape = tuple(lshape) + + elif split is not None: + raise ValueError(f"Split axis {split} not supported for class DCSR_matrix") + + elif is_split == 0: + # Check whether the distributed data matches in + # all dimensions other than axis 0 + neighbour_shape = np.array(gshape) + lshape = np.array(lshape) + + if comm.rank < comm.size - 1: + comm.Isend(lshape, dest=comm.rank + 1) + if comm.rank != 0: + # Dont have to check whether the number of dimensions are same since + # both torch.sparse_csr_tensor and scipy.sparse.csr_matrix are 2D only + + # check whether the individual shape elements match + comm.Recv(neighbour_shape, source=comm.rank - 1) + for i in range(len(lshape)): + if i == is_split: + continue + elif lshape[i] != neighbour_shape[i]: + neighbour_shape[is_split] = np.iinfo(neighbour_shape.dtype).min + + lshape = tuple(lshape) + + # sum up the elements along the split dimension + reduction_buffer = np.array(neighbour_shape[is_split]) + # To check if any process has found that its neighbour + # does not match with itself in shape + comm.Allreduce(MPI.IN_PLACE, reduction_buffer, MPI.MIN) + if reduction_buffer < 0: + raise ValueError( + "Unable to construct DCSR_matrix. Local data slices have inconsistent shapes or dimensions." 
+ ) + + data = obj.values() + indptr = obj.crow_indices() + indices = obj.col_indices() + + # Calculate gshape + gshape_split = torch.tensor(gshape[is_split]) + comm.Allreduce(MPI.IN_PLACE, gshape_split, MPI.SUM) + gshape = list(gshape) + gshape[is_split] = gshape_split.item() + gshape = tuple(gshape) + + # Calculate gnnz + lnnz = data.shape[0] + gnnz_buffer = torch.tensor(lnnz) + comm.Allreduce(MPI.IN_PLACE, gnnz_buffer, MPI.SUM) + gnnz = gnnz_buffer.item() + + split = is_split + + elif is_split is not None: + raise ValueError(f"Split axis {split} not supported for class DCSR_matrix") + + else: # split is None and is_split is None + data = obj.values() + indptr = obj.crow_indices() + indices = obj.col_indices() + + sparse_array = torch.sparse_csr_tensor( + indptr.to(torch.int64), + indices.to(torch.int64), + data, + size=lshape, + dtype=dtype.torch_type(), + device=device.torch_device, + ) + + return DCSR_matrix( + array=sparse_array, + gnnz=gnnz, + gshape=gshape, + dtype=dtype, + split=split, + device=device, + comm=comm, + balanced=True, + ) diff --git a/heat/sparse/manipulations.py b/heat/sparse/manipulations.py new file mode 100644 index 0000000000..a8952c410d --- /dev/null +++ b/heat/sparse/manipulations.py @@ -0,0 +1,79 @@ +"""Manipulation operations for (potentially distributed) `DCSR_matrix`.""" +from __future__ import annotations + +from heat.sparse.dcsr_matrix import DCSR_matrix + +from ..core.memory import sanitize_memory_layout +from ..core.dndarray import DNDarray +from ..core.factories import empty + +__all__ = [ + "todense", +] + + +def todense(sparse_matrix: DCSR_matrix, order="C", out: DNDarray = None) -> DNDarray: + """ + Convert :class:`~heat.sparse.DCSR_matrix` to a dense :class:`~heat.core.DNDarray`. + Output follows the same distribution among processes as the input + + Parameters + ---------- + sparse_matrix : :class:`~heat.sparse.DCSR_matrix` + The sparse csr matrix which is to be converted to a dense array + order: str, optional + Options: ``'C'`` or ``'F'``. Specifies the memory layout of the newly created `DNDarray`. Default is ``order='C'``, + meaning the array will be stored in row-major order (C-like). If ``order=โ€˜Fโ€™``, the array will be stored in + column-major order (Fortran-like). + out : DNDarray + Output buffer in which the values of the dense format is stored. + If not specified, a new DNDarray is created. + + Raises + ------ + ValueError + If shape of output buffer does not match that of the input. + ValueError + If split axis of output buffer does not match that of the input. 
+ + Examples + -------- + >>> indptr = torch.tensor([0, 2, 3, 6]) + >>> indices = torch.tensor([0, 2, 2, 0, 1, 2]) + >>> data = torch.tensor([1, 2, 3, 4, 5, 6], dtype=torch.float) + >>> torch_sparse_csr = torch.sparse_csr_tensor(indptr, indices, data) + >>> heat_sparse_csr = ht.sparse.sparse_csr_matrix(torch_sparse_csr, split=0) + >>> heat_sparse_csr + (indptr: tensor([0, 2, 3, 6]), indices: tensor([0, 2, 2, 0, 1, 2]), data: tensor([1., 2., 3., 4., 5., 6.]), dtype=ht.float32, device=cpu:0, split=0) + >>> heat_sparse_csr.todense() + DNDarray([[1., 0., 2.], + [0., 0., 3.], + [4., 5., 6.]], dtype=ht.float32, device=cpu:0, split=0) + """ + if out is not None: + if out.shape != sparse_matrix.shape: + raise ValueError( + f"Expected output buffer with shape {sparse_matrix.shape} but was {out.shape}" + ) + + if out.split != sparse_matrix.split: + raise ValueError( + f"Expected output buffer with split axis {sparse_matrix.split} but was {out.split}" + ) + + if out is None: + out = empty( + shape=sparse_matrix.shape, + split=sparse_matrix.split, + dtype=sparse_matrix.dtype, + device=sparse_matrix.device, + comm=sparse_matrix.comm, + order=order, + ) + + out.larray = sanitize_memory_layout(sparse_matrix.larray.to_dense(), order=order) + return out + + +DCSR_matrix.todense = lambda self, order="C", out=None: todense(self, order, out) +DCSR_matrix.to_dense = lambda self, order="C", out=None: todense(self, order, out) diff --git a/heat/sparse/tests/__init__.py b/heat/sparse/tests/__init__.py new file mode 100644 index 0000000000..da01ac5e0f --- /dev/null +++ b/heat/sparse/tests/__init__.py @@ -0,0 +1,4 @@ +from .test_arithmetics import * +from .test_dcsrmatrix import * +from .test_factories import * +from .test_manipulations import * diff --git a/heat/sparse/tests/test_arithmetics.py b/heat/sparse/tests/test_arithmetics.py new file mode 100644 index 0000000000..e1629d80a4 --- /dev/null +++ b/heat/sparse/tests/test_arithmetics.py @@ -0,0 +1,1390 @@ +import unittest +import heat as ht +import torch +import numpy as np + +import random + +from heat.core.tests.test_suites.basic_test import TestCase + + +@unittest.skipIf( + int(torch.__version__.split(".")[0]) <= 1 and int(torch.__version__.split(".")[1]) < 10, + f"ht.sparse requires torch >= 1.10. 
Found version {torch.__version__}.", +) +class TestArithmetics(TestCase): + @classmethod + def setUpClass(self): + super(TestArithmetics, self).setUpClass() + + """ + A = [[0, 0, 1, 0, 2] + [0, 0, 0, 0, 0] + [0, 3, 0, 0, 0] + [4, 0, 0, 5, 0] + [0, 0, 0, 0, 6]] + """ + self.ref_indptr_A = torch.tensor( + [0, 2, 2, 3, 5, 6], dtype=torch.int, device=self.device.torch_device + ) + self.ref_indices_A = torch.tensor( + [2, 4, 1, 0, 3, 4], dtype=torch.int, device=self.device.torch_device + ) + self.ref_data_A = torch.tensor( + [1, 2, 3, 4, 5, 6], dtype=torch.float, device=self.device.torch_device + ) + self.ref_torch_sparse_csr_A = torch.sparse_csr_tensor( + self.ref_indptr_A, + self.ref_indices_A, + self.ref_data_A, + device=self.device.torch_device, + ) + + """ + B = [[2, 0, 0, 0, 3] + [0, 0, 4, 0, 0] + [0, 1, 0, 1, 0] + [0, 0, 0, 0, 0] + [0, 3, 0, 4, 0]] + """ + self.ref_indptr_B = torch.tensor( + [0, 2, 3, 5, 5, 7], dtype=torch.int, device=self.device.torch_device + ) + self.ref_indices_B = torch.tensor( + [0, 4, 2, 1, 3, 1, 3], dtype=torch.int, device=self.device.torch_device + ) + self.ref_data_B = torch.tensor( + [2, 3, 4, 1, 1, 3, 4], dtype=torch.float, device=self.device.torch_device + ) + self.ref_torch_sparse_csr_B = torch.sparse_csr_tensor( + self.ref_indptr_B, + self.ref_indices_B, + self.ref_data_B, + device=self.device.torch_device, + ) + + self.world_size = ht.communication.MPI_WORLD.size + self.rank = ht.communication.MPI_WORLD.rank + + self.scalar = np.array(random.randint(1, 100)) + if self.world_size > 0: + ht.communication.MPI_WORLD.Bcast(self.scalar, root=0) + self.scalar = self.scalar.item() + + def test_add(self): + + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_A) + heat_sparse_csr_B = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_B) + + """ + C = [[2, 0, 1, 0, 5] + [0, 0, 4, 0, 0] + [0, 4, 0, 1, 0] + [4, 0, 0, 5, 0] + [0, 3, 0, 4, 6]] + """ + indptr_C = [0, 3, 4, 6, 8, 11] + indices_C = [0, 2, 4, 2, 1, 3, 0, 3, 1, 3, 4] + data_C = [2, 1, 5, 4, 4, 1, 4, 5, 3, 4, 6] + + heat_sparse_csr_C = heat_sparse_csr_A + heat_sparse_csr_B + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + (heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device)).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.split, None) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_A.shape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + # Distributed case + if self.world_size == 2: + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_A, split=0) + heat_sparse_csr_B = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_B, split=0) + + indptr_C_dist = [[0, 3, 4, 6], [0, 2, 5]] + indices_C_dist = [[0, 2, 4, 2, 1, 3], [0, 3, 1, 3, 4]] + data_C_dist = [[2, 1, 5, 4, 4, 1], [4, 5, 3, 4, 6]] + + heat_sparse_csr_C = heat_sparse_csr_A + heat_sparse_csr_B + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr + == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindptr + == torch.tensor(indptr_C_dist[self.rank], device=self.device.torch_device) + ).all() + 
) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindices + == torch.tensor(indices_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.ldata + == torch.tensor(data_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.lnnz, len(data_C_dist[self.rank])) + self.assertEqual(heat_sparse_csr_C.split, 0) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_A.shape) + self.assertEqual(heat_sparse_csr_C.lshape, heat_sparse_csr_A.lshape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + # Operands with different splits + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_A, split=0) + heat_sparse_csr_B = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_B, split=None) + heat_sparse_csr_C = heat_sparse_csr_A + heat_sparse_csr_B + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr + == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindptr + == torch.tensor(indptr_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindices + == torch.tensor(indices_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.ldata + == torch.tensor(data_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.lnnz, len(data_C_dist[self.rank])) + self.assertEqual(heat_sparse_csr_C.split, 0) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_A.shape) + self.assertEqual(heat_sparse_csr_C.lshape, heat_sparse_csr_A.lshape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_A, split=None) + heat_sparse_csr_B = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_B, split=0) + heat_sparse_csr_C = heat_sparse_csr_A + heat_sparse_csr_B + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr + == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindptr + == torch.tensor(indptr_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindices + == torch.tensor(indices_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.ldata + == torch.tensor(data_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, 
len(data_C)) + self.assertEqual(heat_sparse_csr_C.lnnz, len(data_C_dist[self.rank])) + self.assertEqual(heat_sparse_csr_C.split, 0) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_B.shape) + self.assertEqual(heat_sparse_csr_C.lshape, heat_sparse_csr_B.lshape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + if self.world_size == 3: + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_A, split=0) + heat_sparse_csr_B = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_B, split=0) + + indptr_C_dist = [[0, 3, 4], [0, 2, 4], [0, 3]] + indices_C_dist = [[0, 2, 4, 2], [1, 3, 0, 3], [1, 3, 4]] + data_C_dist = [[2, 1, 5, 4], [4, 1, 4, 5], [3, 4, 6]] + + heat_sparse_csr_C = heat_sparse_csr_A + heat_sparse_csr_B + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr + == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindptr + == torch.tensor(indptr_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindices + == torch.tensor(indices_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.ldata + == torch.tensor(data_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.lnnz, len(data_C_dist[self.rank])) + self.assertEqual(heat_sparse_csr_C.split, 0) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_A.shape) + self.assertEqual(heat_sparse_csr_C.lshape, heat_sparse_csr_A.lshape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + # Operands with different splits + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_A, split=0) + heat_sparse_csr_B = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_B, split=None) + heat_sparse_csr_C = heat_sparse_csr_A + heat_sparse_csr_B + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr + == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindptr + == torch.tensor(indptr_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindices + == torch.tensor(indices_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.ldata + == torch.tensor(data_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.lnnz, len(data_C_dist[self.rank])) + self.assertEqual(heat_sparse_csr_C.split, 0) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_A.shape) + self.assertEqual(heat_sparse_csr_C.lshape, heat_sparse_csr_A.lshape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + heat_sparse_csr_A = 
ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_A, split=None) + heat_sparse_csr_B = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_B, split=0) + heat_sparse_csr_C = heat_sparse_csr_A + heat_sparse_csr_B + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr + == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindptr + == torch.tensor(indptr_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindices + == torch.tensor(indices_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.ldata + == torch.tensor(data_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.lnnz, len(data_C_dist[self.rank])) + self.assertEqual(heat_sparse_csr_C.split, 0) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_B.shape) + self.assertEqual(heat_sparse_csr_C.lshape, heat_sparse_csr_B.lshape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + # Number of processes > Number of rows + if self.world_size == 6: + indptr_A = torch.tensor( + [0, 2, 2, 2, 2, 2], dtype=torch.int, device=self.device.torch_device + ) + indices_A = torch.tensor([2, 4], dtype=torch.int, device=self.device.torch_device) + data_A = torch.tensor([1, 2], dtype=torch.float, device=self.device.torch_device) + torch_sparse_csr_A = torch.sparse_csr_tensor( + indptr_A, indices_A, data_A, device=self.device.torch_device + ) + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(torch_sparse_csr_A, split=0) + + indptr_B = torch.tensor( + [0, 2, 3, 5, 5, 5], dtype=torch.int, device=self.device.torch_device + ) + indices_B = torch.tensor( + [0, 4, 2, 1, 3], dtype=torch.int, device=self.device.torch_device + ) + data_B = torch.tensor( + [2, 3, 4, 1, 1], dtype=torch.float, device=self.device.torch_device + ) + torch_sparse_csr_B = torch.sparse_csr_tensor( + indptr_B, indices_B, data_B, device=self.device.torch_device + ) + heat_sparse_csr_B = ht.sparse.sparse_csr_matrix(torch_sparse_csr_B, split=0) + + indptr_C_dist = [[0, 3], [0, 1], [0, 2], [0, 0], [0, 0], [0]] + indices_C_dist = [[0, 2, 4], [2], [1, 3], [], [], []] + data_C_dist = [[2, 1, 5], [4], [1, 1], [], [], []] + + indptr_C = [0, 3, 4, 6, 6, 6] + indices_C = [0, 2, 4, 2, 1, 3] + data_C = [2, 1, 5, 4, 1, 1] + + heat_sparse_csr_C = heat_sparse_csr_A + heat_sparse_csr_B + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr + == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindptr + == torch.tensor(indptr_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindices + == torch.tensor(indices_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device) + ).all() + ) + 
self.assertTrue( + ( + heat_sparse_csr_C.ldata + == torch.tensor(data_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.lnnz, len(data_C_dist[self.rank])) + self.assertEqual(heat_sparse_csr_C.split, 0) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_A.shape) + self.assertEqual(heat_sparse_csr_C.lshape, heat_sparse_csr_A.lshape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + # Operands with different splits + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(torch_sparse_csr_A, split=0) + heat_sparse_csr_B = ht.sparse.sparse_csr_matrix(torch_sparse_csr_B, split=None) + heat_sparse_csr_C = heat_sparse_csr_A + heat_sparse_csr_B + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr + == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindptr + == torch.tensor(indptr_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindices + == torch.tensor(indices_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.ldata + == torch.tensor(data_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.lnnz, len(data_C_dist[self.rank])) + self.assertEqual(heat_sparse_csr_C.split, 0) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_A.shape) + self.assertEqual(heat_sparse_csr_C.lshape, heat_sparse_csr_A.lshape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(torch_sparse_csr_A, split=None) + heat_sparse_csr_B = ht.sparse.sparse_csr_matrix(torch_sparse_csr_B, split=0) + heat_sparse_csr_C = heat_sparse_csr_A + heat_sparse_csr_B + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr + == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindptr + == torch.tensor(indptr_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindices + == torch.tensor(indices_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.ldata + == torch.tensor(data_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.lnnz, len(data_C_dist[self.rank])) + self.assertEqual(heat_sparse_csr_C.split, 0) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_B.shape) + self.assertEqual(heat_sparse_csr_C.lshape, heat_sparse_csr_B.lshape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + # scalar + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_A) + + indptr_C = 
self.ref_indptr_A + indices_C = self.ref_indices_A + data_C = self.ref_data_A + self.scalar + heat_sparse_csr_C = heat_sparse_csr_A + self.scalar + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + (heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device)).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.split, None) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_A.shape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + if self.world_size == 2: + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_A, split=0) + indptr_C_dist = [[0, 2, 2, 3], [0, 2, 3]] + indices_C_dist = [[2, 4, 1], [0, 3, 4]] + data_C_dist = [[1, 2, 3], [4, 5, 6]] + data_C_dist = [[x + self.scalar for x in data] for data in data_C_dist] + heat_sparse_csr_C = heat_sparse_csr_A + self.scalar + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr + == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindptr + == torch.tensor(indptr_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindices + == torch.tensor(indices_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.ldata + == torch.tensor(data_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.lnnz, len(data_C_dist[self.rank])) + self.assertEqual(heat_sparse_csr_C.split, 0) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_A.shape) + self.assertEqual(heat_sparse_csr_C.lshape, heat_sparse_csr_A.lshape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + if self.world_size == 3: + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_A, split=0) + indptr_C_dist = [[0, 2, 2], [0, 1, 3], [0, 1]] + indices_C_dist = [[2, 4], [1, 0, 3], [4]] + data_C_dist = [[1, 2], [3, 4, 5], [6]] + data_C_dist = [[x + self.scalar for x in data] for data in data_C_dist] + heat_sparse_csr_C = heat_sparse_csr_A + self.scalar + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr + == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindptr + == torch.tensor(indptr_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindices + == torch.tensor(indices_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + 
heat_sparse_csr_C.ldata + == torch.tensor(data_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.lnnz, len(data_C_dist[self.rank])) + self.assertEqual(heat_sparse_csr_C.split, 0) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_A.shape) + self.assertEqual(heat_sparse_csr_C.lshape, heat_sparse_csr_A.lshape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + """ + [[1, 0] + [0, 1]] + """ + torch_sparse_csr_2x2 = torch.sparse_csr_tensor( + [0, 1, 2], [0, 1], [1, 1], device=self.device.torch_device + ) + heat_sparse_csr_2x2 = ht.sparse.sparse_csr_matrix(torch_sparse_csr_2x2) + with self.assertRaises(ValueError): + heat_sparse_csr_C = heat_sparse_csr_A + heat_sparse_csr_2x2 + + with self.assertRaises(TypeError): + heat_sparse_csr_C = ht.sparse.add(2, 3) + with self.assertRaises(TypeError): + heat_sparse_csr_C = ht.sparse.add(heat_sparse_csr_2x2, torch_sparse_csr_2x2) + with self.assertRaises(TypeError): + heat_sparse_csr_C = ht.sparse.add(torch_sparse_csr_2x2, heat_sparse_csr_2x2) + with self.assertRaises(ValueError): + heat_sparse_csr_C = ht.sparse.add(heat_sparse_csr_2x2, heat_sparse_csr_A) + + def test_mul(self): + + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_A) + heat_sparse_csr_B = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_B) + + """ + C = [[0, 0, 0, 0, 6] + [0, 0, 0, 0, 0] + [0, 3, 0, 0, 0] + [0, 0, 0, 0, 0] + [0, 0, 0, 0, 0]] + """ + indptr_C = [0, 1, 1, 2, 2, 2] + indices_C = [4, 1] + data_C = [6, 3] + + heat_sparse_csr_C = heat_sparse_csr_A * heat_sparse_csr_B + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + (heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device)).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.split, None) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_A.shape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + # Distributed case + if self.world_size == 2: + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_A, split=0) + heat_sparse_csr_B = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_B, split=0) + + indptr_C_dist = [[0, 1, 1, 2], [0, 0, 0]] + indices_C_dist = [[4, 1], []] + data_C_dist = [[6, 3], []] + + heat_sparse_csr_C = heat_sparse_csr_A * heat_sparse_csr_B + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr + == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindptr + == torch.tensor(indptr_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindices + == torch.tensor(indices_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.ldata + == torch.tensor(data_C_dist[self.rank], 
device=self.device.torch_device) + ).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.lnnz, len(data_C_dist[self.rank])) + self.assertEqual(heat_sparse_csr_C.split, 0) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_A.shape) + self.assertEqual(heat_sparse_csr_C.lshape, heat_sparse_csr_A.lshape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + # Operands with different splits + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_A, split=0) + heat_sparse_csr_B = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_B, split=None) + heat_sparse_csr_C = heat_sparse_csr_A * heat_sparse_csr_B + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr + == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindptr + == torch.tensor(indptr_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindices + == torch.tensor(indices_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.ldata + == torch.tensor(data_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.lnnz, len(data_C_dist[self.rank])) + self.assertEqual(heat_sparse_csr_C.split, 0) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_A.shape) + self.assertEqual(heat_sparse_csr_C.lshape, heat_sparse_csr_A.lshape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_A, split=None) + heat_sparse_csr_B = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_B, split=0) + heat_sparse_csr_C = heat_sparse_csr_A * heat_sparse_csr_B + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr + == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindptr + == torch.tensor(indptr_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindices + == torch.tensor(indices_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.ldata + == torch.tensor(data_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.lnnz, len(data_C_dist[self.rank])) + self.assertEqual(heat_sparse_csr_C.split, 0) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_B.shape) + self.assertEqual(heat_sparse_csr_C.lshape, heat_sparse_csr_B.lshape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + if self.world_size == 3: + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_A, split=0) + heat_sparse_csr_B = 
ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_B, split=0) + + indptr_C_dist = [[0, 1, 1], [0, 1, 1], [0, 0]] + indices_C_dist = [[4], [1], []] + data_C_dist = [[6], [3], []] + + heat_sparse_csr_C = heat_sparse_csr_A * heat_sparse_csr_B + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr + == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindptr + == torch.tensor(indptr_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindices + == torch.tensor(indices_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.ldata + == torch.tensor(data_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.lnnz, len(data_C_dist[self.rank])) + self.assertEqual(heat_sparse_csr_C.split, 0) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_A.shape) + self.assertEqual(heat_sparse_csr_C.lshape, heat_sparse_csr_A.lshape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + # Operands with different splits + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_A, split=0) + heat_sparse_csr_B = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_B, split=None) + heat_sparse_csr_C = heat_sparse_csr_A * heat_sparse_csr_B + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr + == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindptr + == torch.tensor(indptr_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindices + == torch.tensor(indices_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.ldata + == torch.tensor(data_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.lnnz, len(data_C_dist[self.rank])) + self.assertEqual(heat_sparse_csr_C.split, 0) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_A.shape) + self.assertEqual(heat_sparse_csr_C.lshape, heat_sparse_csr_A.lshape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_A, split=None) + heat_sparse_csr_B = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_B, split=0) + heat_sparse_csr_C = heat_sparse_csr_A * heat_sparse_csr_B + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr + == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindptr + == torch.tensor(indptr_C_dist[self.rank], device=self.device.torch_device) + ).all() 
+ ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindices + == torch.tensor(indices_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.ldata + == torch.tensor(data_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.lnnz, len(data_C_dist[self.rank])) + self.assertEqual(heat_sparse_csr_C.split, 0) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_B.shape) + self.assertEqual(heat_sparse_csr_C.lshape, heat_sparse_csr_B.lshape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + # Number of processes > Number of rows + if self.world_size == 6: + indptr_A = torch.tensor( + [0, 2, 2, 2, 2, 2], dtype=torch.int, device=self.device.torch_device + ) + indices_A = torch.tensor([2, 4], dtype=torch.int, device=self.device.torch_device) + data_A = torch.tensor([1, 2], dtype=torch.float, device=self.device.torch_device) + torch_sparse_csr_A = torch.sparse_csr_tensor( + indptr_A, indices_A, data_A, device=self.device.torch_device + ) + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(torch_sparse_csr_A, split=0) + + indptr_B = torch.tensor( + [0, 2, 3, 5, 5, 5], dtype=torch.int, device=self.device.torch_device + ) + indices_B = torch.tensor( + [0, 4, 2, 1, 3], dtype=torch.int, device=self.device.torch_device + ) + data_B = torch.tensor( + [2, 3, 4, 1, 1], dtype=torch.float, device=self.device.torch_device + ) + torch_sparse_csr_B = torch.sparse_csr_tensor( + indptr_B, indices_B, data_B, device=self.device.torch_device + ) + heat_sparse_csr_B = ht.sparse.sparse_csr_matrix(torch_sparse_csr_B, split=0) + + indptr_C_dist = [[0, 1], [0, 0], [0, 0], [0, 0], [0, 0], [0]] + indices_C_dist = [[4], [], [], [], [], []] + data_C_dist = [[6], [], [], [], [], []] + + indptr_C = [0, 1, 1, 1, 1, 1] + indices_C = [4] + data_C = [6] + + heat_sparse_csr_C = heat_sparse_csr_A * heat_sparse_csr_B + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr + == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindptr + == torch.tensor(indptr_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindices + == torch.tensor(indices_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.ldata + == torch.tensor(data_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.lnnz, len(data_C_dist[self.rank])) + self.assertEqual(heat_sparse_csr_C.split, 0) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_A.shape) + self.assertEqual(heat_sparse_csr_C.lshape, heat_sparse_csr_A.lshape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + # Operands with different splits + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(torch_sparse_csr_A, 
split=0) + heat_sparse_csr_B = ht.sparse.sparse_csr_matrix(torch_sparse_csr_B, split=None) + heat_sparse_csr_C = heat_sparse_csr_A * heat_sparse_csr_B + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr + == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindptr + == torch.tensor(indptr_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindices + == torch.tensor(indices_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.ldata + == torch.tensor(data_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.lnnz, len(data_C_dist[self.rank])) + self.assertEqual(heat_sparse_csr_C.split, 0) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_A.shape) + self.assertEqual(heat_sparse_csr_C.lshape, heat_sparse_csr_A.lshape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(torch_sparse_csr_A, split=None) + heat_sparse_csr_B = ht.sparse.sparse_csr_matrix(torch_sparse_csr_B, split=0) + heat_sparse_csr_C = heat_sparse_csr_A * heat_sparse_csr_B + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr + == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindptr + == torch.tensor(indptr_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindices + == torch.tensor(indices_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.ldata + == torch.tensor(data_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.lnnz, len(data_C_dist[self.rank])) + self.assertEqual(heat_sparse_csr_C.split, 0) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_B.shape) + self.assertEqual(heat_sparse_csr_C.lshape, heat_sparse_csr_B.lshape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + # scalar + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_A) + + indptr_C = self.ref_indptr_A + indices_C = self.ref_indices_A + data_C = self.ref_data_A * self.scalar + heat_sparse_csr_C = heat_sparse_csr_A * self.scalar + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + (heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device)).all() + ) + 
self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.split, None) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_A.shape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + if self.world_size == 2: + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_A, split=0) + indptr_C_dist = [[0, 2, 2, 3], [0, 2, 3]] + indices_C_dist = [[2, 4, 1], [0, 3, 4]] + data_C_dist = [[1, 2, 3], [4, 5, 6]] + data_C_dist = [[x * self.scalar for x in data] for data in data_C_dist] + heat_sparse_csr_C = heat_sparse_csr_A * self.scalar + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr + == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindptr + == torch.tensor(indptr_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindices + == torch.tensor(indices_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.ldata + == torch.tensor(data_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.lnnz, len(data_C_dist[self.rank])) + self.assertEqual(heat_sparse_csr_C.split, 0) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_A.shape) + self.assertEqual(heat_sparse_csr_C.lshape, heat_sparse_csr_A.lshape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + if self.world_size == 3: + heat_sparse_csr_A = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr_A, split=0) + indptr_C_dist = [[0, 2, 2], [0, 1, 3], [0, 1]] + indices_C_dist = [[2, 4], [1, 0, 3], [4]] + data_C_dist = [[1, 2], [3, 4, 5], [6]] + data_C_dist = [[x * self.scalar for x in data] for data in data_C_dist] + heat_sparse_csr_C = heat_sparse_csr_A * self.scalar + + self.assertIsInstance(heat_sparse_csr_C, ht.sparse.DCSR_matrix) + self.assertTrue( + ( + heat_sparse_csr_C.indptr + == torch.tensor(indptr_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindptr + == torch.tensor(indptr_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.indices + == torch.tensor(indices_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.lindices + == torch.tensor(indices_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.data == torch.tensor(data_C, device=self.device.torch_device) + ).all() + ) + self.assertTrue( + ( + heat_sparse_csr_C.ldata + == torch.tensor(data_C_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertEqual(heat_sparse_csr_C.nnz, len(data_C)) + self.assertEqual(heat_sparse_csr_C.lnnz, len(data_C_dist[self.rank])) + self.assertEqual(heat_sparse_csr_C.split, 0) + self.assertEqual(heat_sparse_csr_C.shape, heat_sparse_csr_A.shape) + self.assertEqual(heat_sparse_csr_C.lshape, heat_sparse_csr_A.lshape) + self.assertEqual(heat_sparse_csr_C.dtype, ht.float) + + """ + [[1, 0] + [0, 1]] + """ + torch_sparse_csr_2x2 = torch.sparse_csr_tensor( + [0, 1, 2], [0, 1], [1, 1], 
device=self.device.torch_device + ) + heat_sparse_csr_2x2 = ht.sparse.sparse_csr_matrix(torch_sparse_csr_2x2) + with self.assertRaises(ValueError): + heat_sparse_csr_C = heat_sparse_csr_A * heat_sparse_csr_2x2 + + with self.assertRaises(TypeError): + heat_sparse_csr_C = ht.sparse.mul(2, 3) + with self.assertRaises(TypeError): + heat_sparse_csr_C = ht.sparse.mul(heat_sparse_csr_2x2, torch_sparse_csr_2x2) + with self.assertRaises(TypeError): + heat_sparse_csr_C = ht.sparse.mul(torch_sparse_csr_2x2, heat_sparse_csr_2x2) + with self.assertRaises(ValueError): + heat_sparse_csr_C = ht.sparse.mul(heat_sparse_csr_2x2, heat_sparse_csr_A) diff --git a/heat/sparse/tests/test_dcsrmatrix.py b/heat/sparse/tests/test_dcsrmatrix.py new file mode 100644 index 0000000000..137c231ed1 --- /dev/null +++ b/heat/sparse/tests/test_dcsrmatrix.py @@ -0,0 +1,287 @@ +import unittest +import heat as ht +import torch + +from heat.core.tests.test_suites.basic_test import TestCase + +from typing import Tuple + + +@unittest.skipIf( + int(torch.__version__.split(".")[0]) <= 1 and int(torch.__version__.split(".")[1]) < 10, + f"ht.sparse requires torch >= 1.10. Found version {torch.__version__}.", +) +class TestDCSR_matrix(TestCase): + @classmethod + def setUpClass(self): + + super(TestDCSR_matrix, self).setUpClass() + """ + A = [[0, 0, 1, 0, 2] + [0, 0, 0, 0, 0] + [0, 3, 0, 0, 0] + [4, 0, 0, 5, 0] + [0, 0, 0, 0, 6]] + """ + self.ref_indptr = torch.tensor( + [0, 2, 2, 3, 5, 6], dtype=torch.int, device=self.device.torch_device + ) + self.ref_indices = torch.tensor( + [2, 4, 1, 0, 3, 4], dtype=torch.int, device=self.device.torch_device + ) + self.ref_data = torch.tensor( + [1, 2, 3, 4, 5, 6], dtype=torch.float, device=self.device.torch_device + ) + self.ref_torch_sparse_csr = torch.sparse_csr_tensor( + self.ref_indptr, self.ref_indices, self.ref_data, device=self.device.torch_device + ) + + self.world_size = ht.communication.MPI_WORLD.size + self.rank = ht.communication.MPI_WORLD.rank + + def test_larray(self): + + heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr) + + self.assertIsInstance(heat_sparse_csr.larray, torch.Tensor) + self.assertEqual(heat_sparse_csr.larray.layout, torch.sparse_csr) + self.assertEqual(tuple(heat_sparse_csr.larray.shape), heat_sparse_csr.lshape) + self.assertEqual(tuple(heat_sparse_csr.larray.shape), heat_sparse_csr.gshape) + + # Distributed case + if self.world_size > 1: + heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr, split=0) + + self.assertIsInstance(heat_sparse_csr.larray, torch.Tensor) + self.assertEqual(heat_sparse_csr.larray.layout, torch.sparse_csr) + self.assertEqual(tuple(heat_sparse_csr.larray.shape), heat_sparse_csr.lshape) + self.assertNotEqual(tuple(heat_sparse_csr.larray.shape), heat_sparse_csr.gshape) + + def test_nnz(self): + + heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr) + + self.assertIsInstance(heat_sparse_csr.nnz, int) + self.assertIsInstance(heat_sparse_csr.gnnz, int) + self.assertIsInstance(heat_sparse_csr.lnnz, int) + + self.assertEqual(heat_sparse_csr.nnz, self.ref_torch_sparse_csr._nnz()) + self.assertEqual(heat_sparse_csr.nnz, heat_sparse_csr.gnnz) + self.assertEqual(heat_sparse_csr.nnz, heat_sparse_csr.lnnz) + + # Distributed case + heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr, split=0) + + if self.world_size == 2: + nnz_dist = [3, 3] + self.assertEqual(heat_sparse_csr.nnz, self.ref_torch_sparse_csr._nnz()) + self.assertEqual(heat_sparse_csr.lnnz, nnz_dist[self.rank]) + 
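+        # Illustrative note (not in the original patch): with 3 processes the 5 rows are split 2/2/1, so the local nnz counts below are 2, 3 and 1.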
+ if self.world_size == 3: + nnz_dist = [2, 3, 1] + self.assertEqual(heat_sparse_csr.nnz, self.ref_torch_sparse_csr._nnz()) + self.assertEqual(heat_sparse_csr.lnnz, nnz_dist[self.rank]) + + # Number of processes > Number of rows + if self.world_size == 6: + nnz_dist = [2, 0, 1, 2, 1, 0] + self.assertEqual(heat_sparse_csr.nnz, self.ref_torch_sparse_csr._nnz()) + self.assertEqual(heat_sparse_csr.lnnz, nnz_dist[self.rank]) + + def test_shape(self): + + heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr) + + self.assertIsInstance(heat_sparse_csr.shape, Tuple) + self.assertIsInstance(heat_sparse_csr.gshape, Tuple) + self.assertIsInstance(heat_sparse_csr.lshape, Tuple) + + self.assertEqual(heat_sparse_csr.shape, self.ref_torch_sparse_csr.shape) + self.assertEqual(heat_sparse_csr.shape, heat_sparse_csr.gshape) + self.assertEqual(heat_sparse_csr.shape, heat_sparse_csr.lshape) + + # Distributed case + heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr, split=0) + + if self.world_size == 2: + lshape_dist = [(3, 5), (2, 5)] + + self.assertEqual(heat_sparse_csr.shape, self.ref_torch_sparse_csr.shape) + self.assertEqual(heat_sparse_csr.lshape, lshape_dist[self.rank]) + + if self.world_size == 3: + lshape_dist = [(2, 5), (2, 5), (1, 5)] + + self.assertEqual(heat_sparse_csr.shape, self.ref_torch_sparse_csr.shape) + self.assertEqual(heat_sparse_csr.lshape, lshape_dist[self.rank]) + + # Number of processes > Number of rows + if self.world_size == 6: + lshape_dist = [(1, 5), (1, 5), (1, 5), (1, 5), (1, 5), (0, 5)] + + self.assertEqual(heat_sparse_csr.shape, self.ref_torch_sparse_csr.shape) + self.assertEqual(heat_sparse_csr.lshape, lshape_dist[self.rank]) + + def test_dtype(self): + + heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr) + self.assertEqual(heat_sparse_csr.dtype, ht.float32) + + def test_data(self): + + heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr) + + self.assertTrue((heat_sparse_csr.data == self.ref_data).all()) + self.assertTrue((heat_sparse_csr.data == heat_sparse_csr.gdata).all()) + self.assertTrue((heat_sparse_csr.data == heat_sparse_csr.ldata).all()) + + heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr, split=0) + if self.world_size == 2: + data_dist = [[1, 2, 3], [4, 5, 6]] + + self.assertTrue((heat_sparse_csr.data == self.ref_data).all()) + self.assertTrue((heat_sparse_csr.data == heat_sparse_csr.gdata).all()) + self.assertTrue( + ( + heat_sparse_csr.ldata + == torch.tensor(data_dist[self.rank], device=self.device.torch_device) + ).all() + ) + + if self.world_size == 3: + data_dist = [[1, 2], [3, 4, 5], [6]] + + self.assertTrue((heat_sparse_csr.data == self.ref_data).all()) + self.assertTrue((heat_sparse_csr.data == heat_sparse_csr.gdata).all()) + self.assertTrue( + ( + heat_sparse_csr.ldata + == torch.tensor(data_dist[self.rank], device=self.device.torch_device) + ).all() + ) + + # Number of processes > Number of rows + if self.world_size == 6: + data_dist = [[1, 2], [], [3], [4, 5], [6], []] + + self.assertTrue((heat_sparse_csr.data == self.ref_data).all()) + self.assertTrue((heat_sparse_csr.data == heat_sparse_csr.gdata).all()) + self.assertTrue( + ( + heat_sparse_csr.ldata + == torch.tensor(data_dist[self.rank], device=self.device.torch_device) + ).all() + ) + + def test_indices(self): + + heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr) + + self.assertTrue((heat_sparse_csr.indices == self.ref_indices).all()) + 
self.assertTrue((heat_sparse_csr.indices == heat_sparse_csr.gindices).all())
+        self.assertTrue((heat_sparse_csr.indices == heat_sparse_csr.lindices).all())
+
+        heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr, split=0)
+        if self.world_size == 2:
+            indices_dist = [[2, 4, 1], [0, 3, 4]]
+
+            self.assertTrue((heat_sparse_csr.indices == self.ref_indices).all())
+            self.assertTrue((heat_sparse_csr.indices == heat_sparse_csr.gindices).all())
+            self.assertTrue(
+                (
+                    heat_sparse_csr.lindices
+                    == torch.tensor(indices_dist[self.rank], device=self.device.torch_device)
+                ).all()
+            )
+
+        if self.world_size == 3:
+            indices_dist = [[2, 4], [1, 0, 3], [4]]
+
+            self.assertTrue((heat_sparse_csr.indices == self.ref_indices).all())
+            self.assertTrue((heat_sparse_csr.indices == heat_sparse_csr.gindices).all())
+            self.assertTrue(
+                (
+                    heat_sparse_csr.lindices
+                    == torch.tensor(indices_dist[self.rank], device=self.device.torch_device)
+                ).all()
+            )
+
+        # Number of processes > Number of rows
+        if self.world_size == 6:
+            indices_dist = [[2, 4], [], [1], [0, 3], [4], []]
+
+            self.assertTrue((heat_sparse_csr.indices == self.ref_indices).all())
+            self.assertTrue((heat_sparse_csr.indices == heat_sparse_csr.gindices).all())
+            self.assertTrue(
+                (
+                    heat_sparse_csr.lindices
+                    == torch.tensor(indices_dist[self.rank], device=self.device.torch_device)
+                ).all()
+            )
+
+    def test_indptr(self):
+
+        heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr)
+
+        self.assertTrue((heat_sparse_csr.indptr == self.ref_indptr).all())
+        self.assertTrue((heat_sparse_csr.indptr == heat_sparse_csr.gindptr).all())
+        self.assertTrue((heat_sparse_csr.indptr == heat_sparse_csr.lindptr).all())
+
+        heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr, split=0)
+        if self.world_size == 2:
+            indptr_dist = [[0, 2, 2, 3], [0, 2, 3]]
+
+            self.assertTrue((heat_sparse_csr.indptr == self.ref_indptr).all())
+            self.assertTrue((heat_sparse_csr.indptr == heat_sparse_csr.gindptr).all())
+            self.assertTrue(
+                (
+                    heat_sparse_csr.lindptr
+                    == torch.tensor(indptr_dist[self.rank], device=self.device.torch_device)
+                ).all()
+            )
+
+        if self.world_size == 3:
+            indptr_dist = [[0, 2, 2], [0, 1, 3], [0, 1]]
+
+            self.assertTrue((heat_sparse_csr.indptr == self.ref_indptr).all())
+            self.assertTrue((heat_sparse_csr.indptr == heat_sparse_csr.gindptr).all())
+            self.assertTrue(
+                (
+                    heat_sparse_csr.lindptr
+                    == torch.tensor(indptr_dist[self.rank], device=self.device.torch_device)
+                ).all()
+            )
+
+        # Number of processes > Number of rows
+        if self.world_size == 6:
+            indptr_dist = [[0, 2], [0, 0], [0, 1], [0, 2], [0, 1], [0]]
+
+            self.assertTrue((heat_sparse_csr.indptr == self.ref_indptr).all())
+            self.assertTrue((heat_sparse_csr.indptr == heat_sparse_csr.gindptr).all())
+            self.assertTrue(
+                (
+                    heat_sparse_csr.lindptr
+                    == torch.tensor(indptr_dist[self.rank], device=self.device.torch_device)
+                ).all()
+            )
+
+    def test_astype(self):
+
+        heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr)
+
+        # check starting invariant
+        self.assertEqual(heat_sparse_csr.dtype, ht.float32)
+
+        # check the copy case for uint8
+        as_uint8 = heat_sparse_csr.astype(ht.uint8)
+        self.assertIsInstance(as_uint8, ht.sparse.DCSR_matrix)
+        self.assertEqual(as_uint8.dtype, ht.uint8)
+        self.assertEqual(as_uint8.larray.dtype, torch.uint8)
+        self.assertIsNot(as_uint8, heat_sparse_csr)
+
+        # check the non-copy case for float64
+        as_float64 = heat_sparse_csr.astype(ht.float64, copy=False)
+        self.assertIsInstance(as_float64,
ht.sparse.DCSR_matrix) + self.assertEqual(as_float64.dtype, ht.float64) + self.assertEqual(as_float64.larray.dtype, torch.float64) + self.assertIs(as_float64, heat_sparse_csr) diff --git a/heat/sparse/tests/test_factories.py b/heat/sparse/tests/test_factories.py new file mode 100644 index 0000000000..ed8ba5a946 --- /dev/null +++ b/heat/sparse/tests/test_factories.py @@ -0,0 +1,504 @@ +import unittest +import heat as ht +import torch +import scipy + +from heat.core.tests.test_suites.basic_test import TestCase + + +@unittest.skipIf( + int(torch.__version__.split(".")[0]) <= 1 and int(torch.__version__.split(".")[1]) < 10, + f"ht.sparse requires torch >= 1.10. Found version {torch.__version__}.", +) +class TestFactories(TestCase): + @classmethod + def setUpClass(self): + super(TestFactories, self).setUpClass() + + """ + A = [[0, 0, 1, 0, 2] + [0, 0, 0, 0, 0] + [0, 3, 0, 0, 0] + [4, 0, 0, 5, 0] + [0, 0, 0, 0, 6]] + """ + self.ref_indptr = torch.tensor( + [0, 2, 2, 3, 5, 6], dtype=torch.int, device=self.device.torch_device + ) + self.ref_indices = torch.tensor( + [2, 4, 1, 0, 3, 4], dtype=torch.int, device=self.device.torch_device + ) + self.ref_data = torch.tensor( + [1, 2, 3, 4, 5, 6], dtype=torch.float, device=self.device.torch_device + ) + + self.ref_torch_sparse_csr = torch.sparse_csr_tensor( + self.ref_indptr, self.ref_indices, self.ref_data, device=self.device.torch_device + ) + + self.ref_scipy_sparse_csr = scipy.sparse.csr_matrix( + ( + torch.tensor([1, 2, 3, 4, 5, 6], dtype=torch.float, device="cpu"), + torch.tensor([2, 4, 1, 0, 3, 4], dtype=torch.int, device="cpu"), + torch.tensor([0, 2, 2, 3, 5, 6], dtype=torch.int, device="cpu"), + ) + ) + + self.world_size = ht.communication.MPI_WORLD.size + self.rank = ht.communication.MPI_WORLD.rank + + def test_sparse_csr_matrix(self): + """ + Input sparse: torch.Tensor + """ + heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr) + + self.assertIsInstance(heat_sparse_csr, ht.sparse.DCSR_matrix) + self.assertEqual(heat_sparse_csr.dtype, ht.float32) + self.assertEqual(heat_sparse_csr.indptr.dtype, torch.int64) + self.assertEqual(heat_sparse_csr.indices.dtype, torch.int64) + self.assertEqual(heat_sparse_csr.shape, self.ref_torch_sparse_csr.shape) + self.assertEqual(heat_sparse_csr.lshape, self.ref_torch_sparse_csr.shape) + self.assertEqual(heat_sparse_csr.split, None) + self.assertTrue((heat_sparse_csr.indptr == self.ref_indptr).all()) + self.assertTrue((heat_sparse_csr.lindptr == self.ref_indptr).all()) + self.assertTrue((heat_sparse_csr.indices == self.ref_indices).all()) + self.assertTrue((heat_sparse_csr.lindices == self.ref_indices).all()) + self.assertTrue((heat_sparse_csr.data == self.ref_data).all()) + self.assertTrue((heat_sparse_csr.ldata == self.ref_data).all()) + + heat_sparse_csr = ht.sparse.sparse_csr_matrix( + self.ref_torch_sparse_csr, dtype=ht.float32, device=self.device + ) + self.assertEqual(heat_sparse_csr.dtype, ht.float32) + self.assertEqual(heat_sparse_csr.device, self.device) + + # Distributed case (split) + if self.world_size == 2: + indptr_dist = [[0, 2, 2, 3], [0, 2, 3]] + indices_dist = [[2, 4, 1], [0, 3, 4]] + data_dist = [[1, 2, 3], [4, 5, 6]] + + lshape_dist = [(3, 5), (2, 5)] + + heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr, split=0) + self.assertEqual(heat_sparse_csr.shape, self.ref_torch_sparse_csr.shape) + self.assertEqual(heat_sparse_csr.lshape, lshape_dist[self.rank]) + self.assertEqual(heat_sparse_csr.split, 0) + self.assertTrue((heat_sparse_csr.indptr == 
self.ref_indptr).all()) + self.assertTrue( + ( + heat_sparse_csr.lindptr + == torch.tensor(indptr_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue((heat_sparse_csr.indices == self.ref_indices).all()) + self.assertTrue( + ( + heat_sparse_csr.lindices + == torch.tensor(indices_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue((heat_sparse_csr.data == self.ref_data).all()) + self.assertTrue( + ( + heat_sparse_csr.ldata + == torch.tensor(data_dist[self.rank], device=self.device.torch_device) + ).all() + ) + + if self.world_size == 3: + indptr_dist = [[0, 2, 2], [0, 1, 3], [0, 1]] + indices_dist = [[2, 4], [1, 0, 3], [4]] + data_dist = [[1, 2], [3, 4, 5], [6]] + + lshape_dist = [(2, 5), (2, 5), (1, 5)] + + heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr, split=0) + self.assertEqual(heat_sparse_csr.shape, self.ref_torch_sparse_csr.shape) + self.assertEqual(heat_sparse_csr.lshape, lshape_dist[self.rank]) + self.assertEqual(heat_sparse_csr.split, 0) + self.assertTrue((heat_sparse_csr.indptr == self.ref_indptr).all()) + self.assertTrue( + ( + heat_sparse_csr.lindptr + == torch.tensor(indptr_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue((heat_sparse_csr.indices == self.ref_indices).all()) + self.assertTrue( + ( + heat_sparse_csr.lindices + == torch.tensor(indices_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue((heat_sparse_csr.data == self.ref_data).all()) + self.assertTrue( + ( + heat_sparse_csr.ldata + == torch.tensor(data_dist[self.rank], device=self.device.torch_device) + ).all() + ) + + # Number of processes > Number of rows + if self.world_size == 6: + indptr_dist = [[0, 2], [0, 0], [0, 1], [0, 2], [0, 1], [0]] + indices_dist = [[2, 4], [], [1], [0, 3], [4], []] + data_dist = [[1, 2], [], [3], [4, 5], [6], []] + + lshape_dist = [(1, 5), (1, 5), (1, 5), (1, 5), (1, 5), (0, 5)] + + heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr, split=0) + self.assertEqual(heat_sparse_csr.shape, self.ref_torch_sparse_csr.shape) + self.assertEqual(heat_sparse_csr.lshape, lshape_dist[self.rank]) + self.assertEqual(heat_sparse_csr.split, 0) + self.assertTrue((heat_sparse_csr.indptr == self.ref_indptr).all()) + self.assertTrue( + ( + heat_sparse_csr.lindptr + == torch.tensor(indptr_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue((heat_sparse_csr.indices == self.ref_indices).all()) + self.assertTrue( + ( + heat_sparse_csr.lindices + == torch.tensor(indices_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue((heat_sparse_csr.data == self.ref_data).all()) + self.assertTrue( + ( + heat_sparse_csr.ldata + == torch.tensor(data_dist[self.rank], device=self.device.torch_device) + ).all() + ) + + # Distributed case (is_split) + if self.world_size == 2: + indptr_dist = [[0, 2, 2, 3], [0, 2, 3]] + indices_dist = [[2, 4, 1], [0, 3, 4]] + data_dist = [[1, 2, 3], [4, 5, 6]] + + lshape_dist = [(3, 5), (2, 5)] + + dist_torch_sparse_csr = torch.sparse_csr_tensor( + torch.tensor(indptr_dist[self.rank], device=self.device.torch_device), + torch.tensor(indices_dist[self.rank], device=self.device.torch_device), + torch.tensor(data_dist[self.rank], device=self.device.torch_device), + size=lshape_dist[self.rank], + ) + + heat_sparse_csr = ht.sparse.sparse_csr_matrix(dist_torch_sparse_csr, is_split=0) + self.assertEqual(heat_sparse_csr.shape, self.ref_torch_sparse_csr.shape) + 
self.assertEqual(heat_sparse_csr.lshape, lshape_dist[self.rank]) + self.assertEqual(heat_sparse_csr.split, 0) + self.assertTrue((heat_sparse_csr.indptr == self.ref_indptr).all()) + self.assertTrue( + ( + heat_sparse_csr.lindptr + == torch.tensor(indptr_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue((heat_sparse_csr.indices == self.ref_indices).all()) + self.assertTrue( + ( + heat_sparse_csr.lindices + == torch.tensor(indices_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue((heat_sparse_csr.data == self.ref_data).all()) + self.assertTrue( + ( + heat_sparse_csr.ldata + == torch.tensor(data_dist[self.rank], device=self.device.torch_device) + ).all() + ) + + if self.world_size == 3: + indptr_dist = [[0, 2, 2], [0, 1, 3], [0, 1]] + indices_dist = [[2, 4], [1, 0, 3], [4]] + data_dist = [[1, 2], [3, 4, 5], [6]] + + lshape_dist = [(2, 5), (2, 5), (1, 5)] + + dist_torch_sparse_csr = torch.sparse_csr_tensor( + torch.tensor(indptr_dist[self.rank], device=self.device.torch_device), + torch.tensor(indices_dist[self.rank], device=self.device.torch_device), + torch.tensor(data_dist[self.rank], device=self.device.torch_device), + size=lshape_dist[self.rank], + ) + + heat_sparse_csr = ht.sparse.sparse_csr_matrix(dist_torch_sparse_csr, is_split=0) + self.assertEqual(heat_sparse_csr.shape, self.ref_torch_sparse_csr.shape) + self.assertEqual(heat_sparse_csr.lshape, lshape_dist[self.rank]) + self.assertEqual(heat_sparse_csr.split, 0) + self.assertTrue((heat_sparse_csr.indptr == self.ref_indptr).all()) + self.assertTrue( + ( + heat_sparse_csr.lindptr + == torch.tensor(indptr_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue((heat_sparse_csr.indices == self.ref_indices).all()) + self.assertTrue( + ( + heat_sparse_csr.lindices + == torch.tensor(indices_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue((heat_sparse_csr.data == self.ref_data).all()) + self.assertTrue( + ( + heat_sparse_csr.ldata + == torch.tensor(data_dist[self.rank], device=self.device.torch_device) + ).all() + ) + + """ + Input sparse: scipy.sparse.csr_matrix + """ + heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_scipy_sparse_csr) + + self.assertIsInstance(heat_sparse_csr, ht.sparse.DCSR_matrix) + self.assertEqual(heat_sparse_csr.dtype, ht.float32) + self.assertEqual(heat_sparse_csr.indptr.dtype, torch.int64) + self.assertEqual(heat_sparse_csr.indices.dtype, torch.int64) + self.assertEqual(heat_sparse_csr.shape, self.ref_scipy_sparse_csr.shape) + self.assertEqual(heat_sparse_csr.lshape, self.ref_scipy_sparse_csr.shape) + self.assertEqual(heat_sparse_csr.split, None) + self.assertTrue((heat_sparse_csr.indptr == self.ref_indptr).all()) + self.assertTrue((heat_sparse_csr.lindptr == self.ref_indptr).all()) + self.assertTrue((heat_sparse_csr.indices == self.ref_indices).all()) + self.assertTrue((heat_sparse_csr.lindices == self.ref_indices).all()) + self.assertTrue((heat_sparse_csr.data == self.ref_data).all()) + self.assertTrue((heat_sparse_csr.ldata == self.ref_data).all()) + + # Distributed case (split) + if self.world_size == 2: + indptr_dist = [[0, 2, 2, 3], [0, 2, 3]] + indices_dist = [[2, 4, 1], [0, 3, 4]] + data_dist = [[1, 2, 3], [4, 5, 6]] + + lshape_dist = [(3, 5), (2, 5)] + + heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_scipy_sparse_csr, split=0) + self.assertEqual(heat_sparse_csr.shape, self.ref_scipy_sparse_csr.shape) + self.assertEqual(heat_sparse_csr.lshape, lshape_dist[self.rank]) + 
self.assertEqual(heat_sparse_csr.split, 0) + self.assertTrue((heat_sparse_csr.indptr == self.ref_indptr).all()) + self.assertTrue( + ( + heat_sparse_csr.lindptr + == torch.tensor(indptr_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue((heat_sparse_csr.indices == self.ref_indices).all()) + self.assertTrue( + ( + heat_sparse_csr.lindices + == torch.tensor(indices_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue((heat_sparse_csr.data == self.ref_data).all()) + self.assertTrue( + ( + heat_sparse_csr.ldata + == torch.tensor(data_dist[self.rank], device=self.device.torch_device) + ).all() + ) + + if self.world_size == 3: + indptr_dist = [[0, 2, 2], [0, 1, 3], [0, 1]] + indices_dist = [[2, 4], [1, 0, 3], [4]] + data_dist = [[1, 2], [3, 4, 5], [6]] + + lshape_dist = [(2, 5), (2, 5), (1, 5)] + + heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_scipy_sparse_csr, split=0) + self.assertEqual(heat_sparse_csr.shape, self.ref_scipy_sparse_csr.shape) + self.assertEqual(heat_sparse_csr.lshape, lshape_dist[self.rank]) + self.assertEqual(heat_sparse_csr.split, 0) + self.assertTrue((heat_sparse_csr.indptr == self.ref_indptr).all()) + self.assertTrue( + ( + heat_sparse_csr.lindptr + == torch.tensor(indptr_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue((heat_sparse_csr.indices == self.ref_indices).all()) + self.assertTrue( + ( + heat_sparse_csr.lindices + == torch.tensor(indices_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue((heat_sparse_csr.data == self.ref_data).all()) + self.assertTrue( + ( + heat_sparse_csr.ldata + == torch.tensor(data_dist[self.rank], device=self.device.torch_device) + ).all() + ) + + # Number of processes > Number of rows + if self.world_size == 6: + indptr_dist = [[0, 2], [0, 0], [0, 1], [0, 2], [0, 1], [0]] + indices_dist = [[2, 4], [], [1], [0, 3], [4], []] + data_dist = [[1, 2], [], [3], [4, 5], [6], []] + + lshape_dist = [(1, 5), (1, 5), (1, 5), (1, 5), (1, 5), (0, 5)] + + heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_scipy_sparse_csr, split=0) + self.assertEqual(heat_sparse_csr.shape, self.ref_scipy_sparse_csr.shape) + self.assertEqual(heat_sparse_csr.lshape, lshape_dist[self.rank]) + self.assertEqual(heat_sparse_csr.split, 0) + self.assertTrue((heat_sparse_csr.indptr == self.ref_indptr).all()) + self.assertTrue( + ( + heat_sparse_csr.lindptr + == torch.tensor(indptr_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue((heat_sparse_csr.indices == self.ref_indices).all()) + self.assertTrue( + ( + heat_sparse_csr.lindices + == torch.tensor(indices_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue((heat_sparse_csr.data == self.ref_data).all()) + self.assertTrue( + ( + heat_sparse_csr.ldata + == torch.tensor(data_dist[self.rank], device=self.device.torch_device) + ).all() + ) + + # Distributed case (is_split) + if self.world_size == 2: + indptr_dist = [[0, 2, 2, 3], [0, 2, 3]] + indices_dist = [[2, 4, 1], [0, 3, 4]] + data_dist = [[1, 2, 3], [4, 5, 6]] + + lshape_dist = [(3, 5), (2, 5)] + + dist_scipy_sparse_csr = scipy.sparse.csr_matrix( + ( + torch.tensor(data_dist[self.rank], device="cpu"), + torch.tensor(indices_dist[self.rank], device="cpu"), + torch.tensor(indptr_dist[self.rank], device="cpu"), + ), + shape=lshape_dist[self.rank], + ) + + heat_sparse_csr = ht.sparse.sparse_csr_matrix(dist_scipy_sparse_csr, is_split=0) + self.assertEqual(heat_sparse_csr.shape, 
self.ref_torch_sparse_csr.shape) + self.assertEqual(heat_sparse_csr.lshape, lshape_dist[self.rank]) + self.assertEqual(heat_sparse_csr.split, 0) + self.assertTrue((heat_sparse_csr.indptr == self.ref_indptr).all()) + self.assertTrue( + ( + heat_sparse_csr.lindptr + == torch.tensor(indptr_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue((heat_sparse_csr.indices == self.ref_indices).all()) + self.assertTrue( + ( + heat_sparse_csr.lindices + == torch.tensor(indices_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue((heat_sparse_csr.data == self.ref_data).all()) + self.assertTrue( + ( + heat_sparse_csr.ldata + == torch.tensor(data_dist[self.rank], device=self.device.torch_device) + ).all() + ) + + if self.world_size == 3: + indptr_dist = [[0, 2, 2], [0, 1, 3], [0, 1]] + indices_dist = [[2, 4], [1, 0, 3], [4]] + data_dist = [[1, 2], [3, 4, 5], [6]] + + lshape_dist = [(2, 5), (2, 5), (1, 5)] + + dist_scipy_sparse_csr = scipy.sparse.csr_matrix( + ( + torch.tensor(data_dist[self.rank], device="cpu"), + torch.tensor(indices_dist[self.rank], device="cpu"), + torch.tensor(indptr_dist[self.rank], device="cpu"), + ), + shape=lshape_dist[self.rank], + ) + + heat_sparse_csr = ht.sparse.sparse_csr_matrix(dist_scipy_sparse_csr, is_split=0) + self.assertEqual(heat_sparse_csr.shape, self.ref_torch_sparse_csr.shape) + self.assertEqual(heat_sparse_csr.lshape, lshape_dist[self.rank]) + self.assertEqual(heat_sparse_csr.split, 0) + self.assertTrue((heat_sparse_csr.indptr == self.ref_indptr).all()) + self.assertTrue( + ( + heat_sparse_csr.lindptr + == torch.tensor(indptr_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue((heat_sparse_csr.indices == self.ref_indices).all()) + self.assertTrue( + ( + heat_sparse_csr.lindices + == torch.tensor(indices_dist[self.rank], device=self.device.torch_device) + ).all() + ) + self.assertTrue((heat_sparse_csr.data == self.ref_data).all()) + self.assertTrue( + ( + heat_sparse_csr.ldata + == torch.tensor(data_dist[self.rank], device=self.device.torch_device) + ).all() + ) + + # Errors (torch.Tensor) + with self.assertRaises(ValueError): + heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr, split=1) + + with self.assertRaises(ValueError): + dist_torch_sparse_csr = torch.sparse_csr_tensor( + torch.tensor([0, 0, 0], device=self.device.torch_device), # indptr + torch.tensor([], dtype=torch.int64, device=self.device.torch_device), # indices + torch.tensor([], dtype=torch.int64, device=self.device.torch_device), # data + size=(2, 2), + ) + + heat_sparse_csr = ht.sparse.sparse_csr_matrix(dist_torch_sparse_csr, is_split=1) + + # Errors (scipy.sparse.csr_matrix) + with self.assertRaises(ValueError): + heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_scipy_sparse_csr, split=1) + + with self.assertRaises(ValueError): + dist_scipy_sparse_csr = scipy.sparse.csr_matrix( + ( + torch.tensor([], dtype=torch.int64, device="cpu"), # data + torch.tensor([], dtype=torch.int64, device="cpu"), # indices + torch.tensor([0, 0, 0], device="cpu"), # indptr + ), + shape=(2, 2), + ) + + heat_sparse_csr = ht.sparse.sparse_csr_matrix(dist_torch_sparse_csr, is_split=1) + + # Invalid distribution for is_split + if self.world_size > 1: + with self.assertRaises(ValueError): + dist_torch_sparse_csr = torch.sparse_csr_tensor( + torch.tensor( + [0] * ((self.rank + 1) + 1), device=self.device.torch_device + ), # indptr + torch.tensor([], dtype=torch.int64, device=self.device.torch_device), # indices + torch.tensor([], dtype=torch.int64, device=self.device.torch_device), # data +
size=(self.rank + 1, self.rank + 1), + ) + + heat_sparse_csr = ht.sparse.sparse_csr_matrix(dist_torch_sparse_csr, is_split=0) diff --git a/heat/sparse/tests/test_manipulations.py b/heat/sparse/tests/test_manipulations.py new file mode 100644 index 0000000000..f03bfe9955 --- /dev/null +++ b/heat/sparse/tests/test_manipulations.py @@ -0,0 +1,93 @@ +import unittest +import heat as ht +import torch + +from heat.core.tests.test_suites.basic_test import TestCase + + +@unittest.skipIf( + int(torch.__version__.split(".")[0]) <= 1 and int(torch.__version__.split(".")[1]) < 10, + f"ht.sparse requires torch >= 1.10. Found version {torch.__version__}.", +) +class TestManipulations(TestCase): + @classmethod + def setUpClass(self): + + super(TestManipulations, self).setUpClass() + """ + A = [[0, 0, 1, 0, 2] + [0, 0, 0, 0, 0] + [0, 3, 0, 0, 0] + [4, 0, 0, 5, 0] + [0, 0, 0, 0, 6]] + """ + self.ref_indptr = torch.tensor( + [0, 2, 2, 3, 5, 6], dtype=torch.int, device=self.device.torch_device + ) + self.ref_indices = torch.tensor( + [2, 4, 1, 0, 3, 4], dtype=torch.int, device=self.device.torch_device + ) + self.ref_data = torch.tensor( + [1, 2, 3, 4, 5, 6], dtype=torch.float, device=self.device.torch_device + ) + self.ref_torch_sparse_csr = torch.sparse_csr_tensor( + self.ref_indptr, self.ref_indices, self.ref_data, device=self.device.torch_device + ) + + def test_todense(self): + + heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr) + + ref_dense_array = ht.array( + [ + [0, 0, 1, 0, 2], + [0, 0, 0, 0, 0], + [0, 3, 0, 0, 0], + [4, 0, 0, 5, 0], + [0, 0, 0, 0, 6], + ] + ) + + dense_array = heat_sparse_csr.todense() + + self.assertTrue(ht.equal(ref_dense_array, dense_array)) + self.assertEqual(dense_array.split, None) + self.assertEqual(dense_array.dtype, heat_sparse_csr.dtype) + self.assertEqual(dense_array.shape, heat_sparse_csr.shape) + + # with output buffer + out_buffer = ht.empty(shape=[5, 5]) + heat_sparse_csr.todense(out=out_buffer) + + self.assertTrue(ht.equal(ref_dense_array, out_buffer)) + self.assertEqual(out_buffer.split, None) + self.assertEqual(out_buffer.dtype, heat_sparse_csr.dtype) + self.assertEqual(out_buffer.shape, heat_sparse_csr.shape) + + # Distributed case + heat_sparse_csr = ht.sparse.sparse_csr_matrix(self.ref_torch_sparse_csr, split=0) + + dense_array = heat_sparse_csr.todense() + ref_dense_array = ht.array(ref_dense_array, split=0) + + self.assertTrue(ht.equal(ref_dense_array, dense_array)) + self.assertEqual(dense_array.split, 0) + self.assertEqual(dense_array.dtype, heat_sparse_csr.dtype) + self.assertEqual(dense_array.shape, heat_sparse_csr.shape) + + # with output buffer + out_buffer = ht.empty(shape=[5, 5], split=0) + heat_sparse_csr.todense(out=out_buffer) + + self.assertTrue(ht.equal(ref_dense_array, out_buffer)) + self.assertEqual(out_buffer.split, 0) + self.assertEqual(out_buffer.dtype, heat_sparse_csr.dtype) + self.assertEqual(out_buffer.shape, heat_sparse_csr.shape) + + with self.assertRaises(ValueError): + out_buffer = ht.empty(shape=[3, 3], split=0) + heat_sparse_csr.todense(out=out_buffer) + + with self.assertRaises(ValueError): + out_buffer = ht.empty(shape=[5, 5], split=None) + heat_sparse_csr.todense(out=out_buffer) diff --git a/setup.py b/setup.py index 2210ceaf97..0e8f00b0de 100644 --- a/setup.py +++ b/setup.py @@ -33,7 +33,7 @@ install_requires=[ "mpi4py>=3.0.0", "numpy>=1.13.0", - "torch>=1.7.0, <1.13.1", + "torch>=1.7.0, <1.13.2", "scipy>=0.14.0", "pillow>=6.0.0", "torchvision>=0.8.0", From f98ed45c35da8d81f6749f4565e13f781d33abae Mon Sep 
17 00:00:00 2001 From: ClaudiaComito Date: Thu, 22 Dec 2022 03:05:55 +0000 Subject: [PATCH 40/57] New PyTorch release --- .github/pytorch-release-versions/pytorch-latest.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/pytorch-release-versions/pytorch-latest.txt b/.github/pytorch-release-versions/pytorch-latest.txt index 01b7568230..b50dd27dd9 100644 --- a/.github/pytorch-release-versions/pytorch-latest.txt +++ b/.github/pytorch-release-versions/pytorch-latest.txt @@ -1 +1 @@ -1.13.3 +1.13.1 From bdfc8d847f755219ef49881ef1f705c3b8205e8b Mon Sep 17 00:00:00 2001 From: ClaudiaComito Date: Mon, 2 Jan 2023 03:06:27 +0000 Subject: [PATCH 41/57] New PyTorch release --- .github/pytorch-release-versions/pytorch-latest.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/pytorch-release-versions/pytorch-latest.txt b/.github/pytorch-release-versions/pytorch-latest.txt index b50dd27dd9..19765bd501 100644 --- a/.github/pytorch-release-versions/pytorch-latest.txt +++ b/.github/pytorch-release-versions/pytorch-latest.txt @@ -1 +1 @@ -1.13.1 +null From 8fc5e2e3a431e456d9db9c433125740b3b46e277 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Tue, 3 Jan 2023 03:43:44 +0000 Subject: [PATCH 42/57] [pre-commit.ci] pre-commit autoupdate MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit updates: - [github.com/pycqa/pydocstyle: 6.1.1 → 6.2.0](https://github.com/pycqa/pydocstyle/compare/6.1.1...6.2.0) --- .pre-commit-config.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 4b1cee7560..257234a853 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -14,7 +14,7 @@ repos: hooks: - id: black - repo: https://github.com/pycqa/pydocstyle - rev: 6.1.1 # pick a git hash / tag to point to + rev: 6.2.0 # pick a git hash / tag to point to hooks: - id: pydocstyle exclude: 'tests|benchmarks|examples|scripts|setup.py' #|heat/utils/data/mnist.py|heat/utils/data/_utils.py ?
From 3e543b79ad60ba251a30f7616b1292498d46a7af Mon Sep 17 00:00:00 2001 From: ClaudiaComito Date: Thu, 5 Jan 2023 03:05:37 +0000 Subject: [PATCH 43/57] New PyTorch release --- .github/pytorch-release-versions/pytorch-latest.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/pytorch-release-versions/pytorch-latest.txt b/.github/pytorch-release-versions/pytorch-latest.txt index 19765bd501..b50dd27dd9 100644 --- a/.github/pytorch-release-versions/pytorch-latest.txt +++ b/.github/pytorch-release-versions/pytorch-latest.txt @@ -1 +1 @@ -null +1.13.1 From e273bf32e1e8637a3232a778624a10a6efca6b5a Mon Sep 17 00:00:00 2001 From: ClaudiaComito Date: Mon, 9 Jan 2023 03:05:48 +0000 Subject: [PATCH 44/57] New PyTorch release --- .github/pytorch-release-versions/pytorch-latest.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/pytorch-release-versions/pytorch-latest.txt b/.github/pytorch-release-versions/pytorch-latest.txt index b50dd27dd9..19765bd501 100644 --- a/.github/pytorch-release-versions/pytorch-latest.txt +++ b/.github/pytorch-release-versions/pytorch-latest.txt @@ -1 +1 @@ -1.13.1 +null From 0fc6b9fccd38ffb814d71fa75749bd0ac132ee9f Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Tue, 10 Jan 2023 03:33:41 +0000 Subject: [PATCH 45/57] [pre-commit.ci] pre-commit autoupdate MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit updates: - [github.com/pycqa/pydocstyle: 6.2.0 → 6.2.3](https://github.com/pycqa/pydocstyle/compare/6.2.0...6.2.3) --- .pre-commit-config.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 257234a853..4d5fcf5359 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -14,7 +14,7 @@ repos: hooks: - id: black - repo: https://github.com/pycqa/pydocstyle - rev: 6.2.0 # pick a git hash / tag to point to + rev: 6.2.3 # pick a git hash / tag to point to hooks: - id: pydocstyle exclude: 'tests|benchmarks|examples|scripts|setup.py' #|heat/utils/data/mnist.py|heat/utils/data/_utils.py ?
From 54db5063dcb0841752a0335daac42893f35d9497 Mon Sep 17 00:00:00 2001 From: ClaudiaComito Date: Thu, 12 Jan 2023 03:05:35 +0000 Subject: [PATCH 46/57] New PyTorch release --- .github/pytorch-release-versions/pytorch-latest.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/pytorch-release-versions/pytorch-latest.txt b/.github/pytorch-release-versions/pytorch-latest.txt index 19765bd501..b50dd27dd9 100644 --- a/.github/pytorch-release-versions/pytorch-latest.txt +++ b/.github/pytorch-release-versions/pytorch-latest.txt @@ -1 +1 @@ -null +1.13.1 From da69840a1cc6e5209e3a2b2535bcd4778f0da3fe Mon Sep 17 00:00:00 2001 From: Claudia Comito Date: Sat, 14 Jan 2023 10:33:28 +0100 Subject: [PATCH 47/57] [skip ci] Quick start instructions for newcomers --- README.md | 13 ++++++++++++- scripts/heat_dev.yml | 16 ++++++++++++++++ scripts/heat_env.yml | 16 ++++++++++++++++ scripts/heat_test.py | 9 +++++++++ 4 files changed, 53 insertions(+), 1 deletion(-) create mode 100644 scripts/heat_dev.yml create mode 100644 scripts/heat_env.yml create mode 100644 scripts/heat_test.py diff --git a/README.md b/README.md index 4f1f9204dd..8899b6880a 100644 --- a/README.md +++ b/README.md @@ -15,6 +15,12 @@ Project Status [![license: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT) [![Downloads](https://pepy.tech/badge/heat)](https://pepy.tech/project/heat) +NEW! +-------------- +- [Quick Start](quick_start.md) for new users and contributors (Jan 14, 2023) + + + Goals ----- @@ -43,7 +49,9 @@ Features Getting Started --------------- -Check out our Jupyter Notebook [tutorial](https://github.com/helmholtz-analytics/heat/blob/main/scripts/tutorial.ipynb) +TL;DR: [Quick Start](quick_start.md) + +Check out our Jupyter Notebook [tutorial]((https://github.com/helmholtz-analytics/heat/blob/main/scripts/)tutorial.ipynb) right here on Github or in the /scripts directory. The complete documentation of the latest version is always deployed on @@ -71,6 +79,8 @@ or automatically using the setup.py. Installation ------------ +TL;DR: [Quick Start](quick_start.md) + Tagged releases are made available on the [Python Package Index (PyPI)](https://pypi.org/project/heat/). You can typically install the latest version with @@ -87,6 +97,7 @@ More information can be found [here](https://pytorch.org/get-started/locally/). 
Hacking ------- +TL;DR: [Quick Start](quick_start.md) If you want to work with the development version, you can check out the sources using diff --git a/scripts/heat_dev.yml b/scripts/heat_dev.yml new file mode 100644 index 0000000000..3de812e489 --- /dev/null +++ b/scripts/heat_dev.yml @@ -0,0 +1,16 @@ +name: heat_dev +channels: + - conda-forge + - defaults +dependencies: + - python=3.9 + - openmpi + - mpi4py + - h5py[version='>=2.9',build=mpi*] + - netcdf4 + - pytorch=1.13.0 + - torchvision + - scipy + - pre-commit + - black + - flake8 diff --git a/scripts/heat_env.yml b/scripts/heat_env.yml new file mode 100644 index 0000000000..9d9130c22f --- /dev/null +++ b/scripts/heat_env.yml @@ -0,0 +1,16 @@ +name: heat_env +channels: + - conda-forge + - defaults +dependencies: + - python=3.9 + - openmpi + - mpi4py + - h5py[version='>=2.9',build=mpi*] + - netcdf4 + - pytorch=1.13.0 + - torchvision + - scipy + - pip + - pip: + - heat diff --git a/scripts/heat_test.py b/scripts/heat_test.py new file mode 100644 index 0000000000..98a5e49a37 --- /dev/null +++ b/scripts/heat_test.py @@ -0,0 +1,9 @@ +""" Test script for MPI & Heat installation """ + +import heat as ht + +x = ht.arange(10, split=0) +if x.comm.rank == 0: + print("x is distributed: ", x.is_distributed()) +print("Global DNDarray x: ", x) +print("Local torch tensor on rank ", x.comm.rank, ": ", x.larray) From 25827fe01bda3a3da2935f9b4235cc972f98d957 Mon Sep 17 00:00:00 2001 From: Claudia Comito Date: Sat, 14 Jan 2023 10:35:46 +0100 Subject: [PATCH 48/57] [skip ci] Add quick_start file --- quick_start.md | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 80 insertions(+) create mode 100644 quick_start.md diff --git a/quick_start.md b/quick_start.md new file mode 100644 index 0000000000..0fcf1b4f87 --- /dev/null +++ b/quick_start.md @@ -0,0 +1,80 @@ +## Heat Quick Start + +No-frills instructions for [new users](#new-users-condaconda-pippip-hpchpc-dockerdocker) and [new contributors](#new-contributors). + +## New Users ([conda](#conda), [pip](#pip), [HPC](#hpc), [Docker](#docker)) + +### `conda` +A Heat conda build is [in progress](https://github.com/helmholtz-analytics/heat/issues/1050). +The script [heat_env.yml](https://github.com/helmholtz-analytics/heat/blob/main/scripts/heat_env.yml): +- creates a virtual environment `heat_env` +- installs all dependencies including OpenMPI using [conda](https://conda.io/projects/conda/en/latest/user-guide/getting-started.html) +- installs Heat via `pip` + +``` +conda env create -f heat_env.yml +conda activate heat_env +``` + +[Test](#test) your installation. + +### `pip` + +Pre-requisite: MPI installation. We test with [OpenMPI](https://docs.open-mpi.org/en/v5.0.x/installing-open-mpi/index.html) + +Virtual environment and installation: +``` +python -m venv heat_env +source heat_env/bin/activate +pip install heat[hdf5,netcdf] +``` +[Test](#test) your installation. + +### HPC +Work in progress... 
+ +### Docker +Work in progress ([PR 970](https://github.com/helmholtz-analytics/heat/pull/970)) + +### Test +In your terminal, test your setup with the [`heat_test.py`](https://github.com/helmholtz-analytics/heat/blob/main/scripts/heat_test.py) script: + +``` +mpirun -n 2 python heat_test.py +``` + +It should print something like this: +``` +x is distributed: True +Global DNDarray x: DNDarray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=ht.int32, device=cpu:0, split=0) +Global DNDarray x: +Local torch tensor on rank 0 : tensor([0, 1, 2, 3, 4], dtype=torch.int32) +Local torch tensor on rank 1 : tensor([5, 6, 7, 8, 9], dtype=torch.int32) +``` + +## New Contributors + +1. Clone the [Heat repository](https://github.com/helmholtz-analytics/heat). +2. Create a virtual environment `heat_dev` with all dependencies via [heat_dev.yml](https://github.com/helmholtz-analytics/heat/blob/main/scripts/heat_dev.yml). Note that `heat_dev.yml` does not install Heat via `pip` (as opposed to [`heat_env.yml`](#conda) for users). + +``` +conda env create -f heat_dev.yml +conda activate heat_dev +``` + + +3. In the `/heat` directory of your local repo, install the [pre-commit hooks]( https://pre-commit.com/): +``` +cd $MY_REPO_DIR/heat/ +pre-commit install +``` + +4. Pick an Issue you'd like to work on. Check out [Good First Issues](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22), start from most recent. + +5. New branches should be named according to the following scheme: + - New feature: `features/ISSUE_NUMBER-my-new-feature` + - Bug fix: `bugs/ISSUE_NUMBER-my-bug-fix` + - Documentation: `docs/ISSUE_NUMBER-my-better-docs` + - Automation (CI, GitHub Actions etc.): `workflows/ISSUE_NUMBER-my-fancy-workflow` + +6. After making your changes, go ahead create a Pull Request so we can review them. Thank you so much! From dd87bed8811ef2597201a86904e09201be01c3b8 Mon Sep 17 00:00:00 2001 From: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Date: Sat, 14 Jan 2023 10:36:51 +0100 Subject: [PATCH 49/57] [skip ci] --- quick_start.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/quick_start.md b/quick_start.md index 0fcf1b4f87..b045aef358 100644 --- a/quick_start.md +++ b/quick_start.md @@ -1,4 +1,4 @@ -## Heat Quick Start +# Heat Quick Start No-frills instructions for [new users](#new-users-condaconda-pippip-hpchpc-dockerdocker) and [new contributors](#new-contributors). 
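A note on the quick-start test above: `heat_test.py` only prints the global array and the local chunks. For readers who want one extra sanity check that collective operations behave as expected, the sketch below can be run the same way (e.g. `mpirun -n 2 python <your_script>.py`, the script name being a placeholder); it is not part of the repository and only assumes the public API already used in `heat_test.py` plus `ht.sum`:

```
import heat as ht

x = ht.arange(10, split=0)  # global range 0..9, distributed along axis 0
total = ht.sum(x)           # collective reduction, result available on every rank
print("rank", x.comm.rank, "holds", x.larray.numel(), "local elements, global sum =", total)
```

Every rank should report the same global sum (45) while holding only its own slice of the ten elements.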
From 8a6fbe94f58b30c1b02b893576e54c2e049a1074 Mon Sep 17 00:00:00 2001 From: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Date: Sun, 15 Jan 2023 08:20:12 +0100 Subject: [PATCH 50/57] [skip ci] update pytorch latest version --- .github/pytorch-release-versions/pytorch-latest.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/pytorch-release-versions/pytorch-latest.txt b/.github/pytorch-release-versions/pytorch-latest.txt index feaae22bac..b50dd27dd9 100644 --- a/.github/pytorch-release-versions/pytorch-latest.txt +++ b/.github/pytorch-release-versions/pytorch-latest.txt @@ -1 +1 @@ -1.13.0 +1.13.1 From 5a03aae51f30ca6ff97f33f86b20584e559d190e Mon Sep 17 00:00:00 2001 From: Claudia Comito Date: Mon, 16 Jan 2023 07:07:43 +0100 Subject: [PATCH 51/57] [skip ci] Add paragraph on unit tests --- quick_start.md | 60 +++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 49 insertions(+), 11 deletions(-) diff --git a/quick_start.md b/quick_start.md index b045aef358..213234bacb 100644 --- a/quick_start.md +++ b/quick_start.md @@ -54,27 +54,65 @@ Local torch tensor on rank 1 : tensor([5, 6, 7, 8, 9], dtype=torch.int32) ## New Contributors -1. Clone the [Heat repository](https://github.com/helmholtz-analytics/heat). +1. [Fork](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) or, if you have write access, clone the [Heat repository](https://github.com/helmholtz-analytics/heat). + 2. Create a virtual environment `heat_dev` with all dependencies via [heat_dev.yml](https://github.com/helmholtz-analytics/heat/blob/main/scripts/heat_dev.yml). Note that `heat_dev.yml` does not install Heat via `pip` (as opposed to [`heat_env.yml`](#conda) for users). -``` -conda env create -f heat_dev.yml -conda activate heat_dev -``` + ``` + conda env create -f heat_dev.yml + conda activate heat_dev + ``` 3. In the `/heat` directory of your local repo, install the [pre-commit hooks]( https://pre-commit.com/): -``` -cd $MY_REPO_DIR/heat/ -pre-commit install -``` + + ``` + cd $MY_REPO_DIR/heat/ + pre-commit install + ``` 4. Pick an Issue you'd like to work on. Check out [Good First Issues](https://github.com/helmholtz-analytics/heat/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22), start from most recent. -5. New branches should be named according to the following scheme: +5. [New branches](https://docs.github.com/en/get-started/quickstart/contributing-to-projects#creating-a-branch-to-work-on) should be named according to the following scheme: - New feature: `features/ISSUE_NUMBER-my-new-feature` - Bug fix: `bugs/ISSUE_NUMBER-my-bug-fix` - Documentation: `docs/ISSUE_NUMBER-my-better-docs` - Automation (CI, GitHub Actions etc.): `workflows/ISSUE_NUMBER-my-fancy-workflow` -6. After making your changes, go ahead create a Pull Request so we can review them. Thank you so much! +6. Write and run (locally) [unit tests](https://docs.python.org/3/library/unittest.html) for any change you introduce. Here's a sample of our [test modules](https://github.com/helmholtz-analytics/heat/tree/main/heat/core/tests). + + Running all unit tests locally, e.g. on 3 processes: + + ``` + mpirun -n 3 python -m unittest + ``` + Testing one module only, e.g. `manipulations`: + + ``` + mpirun -n 3 python -m unittest heat/core/tests/test_manipulations.py + ``` + + Testing one function within a module, e.g. 
`manipulations.concatenate`: + + ``` + mpirun -n 3 python -m unittest heat.core.tests.test_manipulations.TestManipulations.test_concatenate + ``` + + Testing with CUDA (if available): + + ``` + export HEAT_TEST_USE_DEVICE=gpu + mpirun -n 3 python -m unittest + ``` + + Helpful options for debugging: + + ``` + mpirun --tag-output -n 3 python -m unittest -vf + ``` + + +7. After [making and pushing](https://docs.github.com/en/get-started/quickstart/contributing-to-projects#making-and-pushing-changes) your changes, go ahead and [create a Pull Request](https://docs.github.com/en/get-started/quickstart/contributing-to-projects#making-a-pull-request). Make sure you go through the Due Diligence checklist (part of our PR template). + + + ## Thank you so much for your time! From 73e62041893ea52a717d888972f1501acb947b11 Mon Sep 17 00:00:00 2001 From: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Date: Thu, 19 Jan 2023 09:14:21 +0100 Subject: [PATCH 52/57] Fix edge-case contiguity mismatch for Allgatherv (#1058) * Fix edge-case contiguity mismatch for Allgatherv * Update ubuntu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * switch back to ubuntu 20.04 * Upgrade CI to ubuntu 22.04 and cuda 11.7.1 * avoid unnecessary gathering of test DNDarrays * early out for resplit of non-distributed DNDarrays * match split of comparison array to expected output * avoid MPI calls in non-distributed cases * avoid MPI calls in non-distributed resplit * set default to None * remove print statement * upgrade torch version * copy to cpu before comparing * use ht.allclose instead of np.allclose * cast different dtype operands to promoted dtype within torch call * compare local tensors to corresponding slice of expected_array only * expand tests * remove redundant code * use pytorch with cuda117 support * [skip ci] Update heat/core/communication.py Co-authored-by: mtar * [skip ci] Update heat/core/communication.py Co-authored-by: mtar * [skip ci] Update heat/core/communication.py Co-authored-by: mtar * Remove dead code * Update pytorch-latest.txt Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: mtar --- .github/release-drafter.yml | 2 +- .gitlab-ci.yml | 4 +-- heat/core/communication.py | 42 +++++++++++++++-------- heat/core/dndarray.py | 3 ++ heat/core/linalg/basics.py | 27 +++++++++------ heat/core/linalg/tests/test_basics.py | 26 ++++++++++++-- heat/core/logical.py | 14 +++++++- heat/core/manipulations.py | 8 ++--- heat/core/tests/test_dndarray.py | 1 - heat/core/tests/test_logical.py | 2 ++ heat/core/tests/test_manipulations.py | 10 ++++++ heat/core/tests/test_suites/basic_test.py | 8 +++-- 12 files changed, 108 insertions(+), 39 deletions(-) diff --git a/.github/release-drafter.yml b/.github/release-drafter.yml index c1abd3124d..7fef410249 100644 --- a/.github/release-drafter.yml +++ b/.github/release-drafter.yml @@ -34,7 +34,7 @@ categories: label: 'chore' - title: '๐Ÿงช Testing' label: 'testing' - + change-template: '- #$NUMBER $TITLE (by @$AUTHOR)' categorie-template: '### $TITLE' exclude-labels: diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml index 822a501a9a..9be27312dd 100644 --- a/.gitlab-ci.yml +++ b/.gitlab-ci.yml @@ -1,5 +1,5 @@ test: - image: nvidia/cuda:11.6.2-runtime-ubuntu20.04 + image: nvidia/cuda:11.7.1-runtime-ubuntu22.04 tags: - cuda - x86_64 @@ -9,7 +9,7 @@ test: - DEBIAN_FRONTEND=noninteractive apt -y install libopenmpi-dev openmpi-bin openmpi-doc - apt -y install 
libhdf5-openmpi-dev libpnetcdf-dev - pip install pytest coverage - - pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116 + - pip3 install torch torchvision torchaudio - pip install .[hdf5,netcdf] - COVERAGE_FILE=report/cov/coverage1 HEAT_TEST_USE_DEVICE=cpu mpirun --allow-run-as-root -n 1 coverage run --source=heat --parallel-mode -m pytest --junitxml=report/test/report1.xml heat/ - COVERAGE_FILE=report/cov/coverage2 HEAT_TEST_USE_DEVICE=gpu mpirun --allow-run-as-root -n 3 coverage run --source=heat --parallel-mode -m pytest --junitxml=report/test/report3.xml heat/ diff --git a/heat/core/communication.py b/heat/core/communication.py index ad58dae964..9aa71323da 100644 --- a/heat/core/communication.py +++ b/heat/core/communication.py @@ -240,7 +240,11 @@ def counts_displs_shape( @classmethod def mpi_type_and_elements_of( - cls, obj: Union[DNDarray, torch.Tensor], counts: Tuple[int], displs: Tuple[int] + cls, + obj: Union[DNDarray, torch.Tensor], + counts: Tuple[int], + displs: Tuple[int], + is_contiguous: Optional[bool], ) -> Tuple[MPI.Datatype, Tuple[int, ...]]: """ Determines the MPI data type and number of respective elements for the given tensor (:class:`~heat.core.dndarray.DNDarray` @@ -255,12 +259,18 @@ def mpi_type_and_elements_of( Optional counts arguments for variable MPI-calls (e.g. Alltoallv) displs : Tuple[ints,...], optional Optional displacements arguments for variable MPI-calls (e.g. Alltoallv) + is_contiguous: bool + Information on global contiguity of the memory-distributed object. If `None`, it will be set to local contiguity via ``torch.Tensor.is_contiguous()``. # ToDo: The option to explicitely specify the counts and displacements to be send still needs propper implementation """ mpi_type, elements = cls.__mpi_type_mappings[obj.dtype], torch.numel(obj) - # simple case, continuous memory can be transmitted as is - if obj.is_contiguous(): + # simple case, contiguous memory can be transmitted as is + if is_contiguous is None: + # determine local contiguity + is_contiguous = obj.is_contiguous() + + if is_contiguous: if counts is None: return mpi_type, elements else: @@ -273,7 +283,7 @@ def mpi_type_and_elements_of( ), ) - # non-continuous memory, e.g. after a transpose, has to be packed in derived MPI types + # non-contiguous memory, e.g. after a transpose, has to be packed in derived MPI types elements = obj.shape[0] shape = obj.shape[1:] strides = [1] * len(shape) @@ -305,7 +315,11 @@ def as_mpi_memory(cls, obj) -> MPI.memory: @classmethod def as_buffer( - cls, obj: torch.Tensor, counts: Tuple[int] = None, displs: Tuple[int] = None + cls, + obj: torch.Tensor, + counts: Tuple[int] = None, + displs: Tuple[int] = None, + is_contiguous: Optional[bool] = None, ) -> List[Union[MPI.memory, Tuple[int, int], MPI.Datatype]]: """ Converts a passed ``torch.Tensor`` into a memory buffer object with associated number of elements and MPI data type. @@ -318,14 +332,16 @@ def as_buffer( Optional counts arguments for variable MPI-calls (e.g. Alltoallv) displs : Tuple[int,...], optional Optional displacements arguments for variable MPI-calls (e.g. Alltoallv) + is_contiguous: bool, optional + Optional information on global contiguity of the memory-distributed object. """ squ = False if not obj.is_contiguous() and obj.ndim == 1: # this makes the math work below this function. 
obj.unsqueeze_(-1) squ = True - mpi_type, elements = cls.mpi_type_and_elements_of(obj, counts, displs) + mpi_type, elements = cls.mpi_type_and_elements_of(obj, counts, displs, is_contiguous) mpi_mem = cls.as_mpi_memory(obj) if squ: # the squeeze happens in the mpi_type_and_elements_of function in the case of a @@ -1037,7 +1053,6 @@ def __allgather_like( type(sendbuf) ) ) - # unpack the receive buffer if isinstance(recvbuf, tuple): recvbuf, recv_counts, recv_displs = recvbuf @@ -1053,17 +1068,18 @@ def __allgather_like( # keep a reference to the original buffer object original_recvbuf = recvbuf - + sbuf_is_contiguous, rbuf_is_contiguous = None, None # permute the send_axis order so that the split send_axis is the first to be transmitted if axis != 0: send_axis_permutation = list(range(sendbuf.ndimension())) send_axis_permutation[0], send_axis_permutation[axis] = axis, 0 sendbuf = sendbuf.permute(*send_axis_permutation) + sbuf_is_contiguous = False - if axis != 0: recv_axis_permutation = list(range(recvbuf.ndimension())) recv_axis_permutation[0], recv_axis_permutation[axis] = axis, 0 recvbuf = recvbuf.permute(*recv_axis_permutation) + rbuf_is_contiguous = False else: recv_axis_permutation = None @@ -1074,20 +1090,18 @@ def __allgather_like( if sendbuf is MPI.IN_PLACE or not isinstance(sendbuf, torch.Tensor): mpi_sendbuf = sbuf else: - mpi_sendbuf = self.as_buffer(sbuf, send_counts, send_displs) + mpi_sendbuf = self.as_buffer(sbuf, send_counts, send_displs, sbuf_is_contiguous) if send_counts is not None: mpi_sendbuf[1] = mpi_sendbuf[1][0][self.rank] if recvbuf is MPI.IN_PLACE or not isinstance(recvbuf, torch.Tensor): mpi_recvbuf = rbuf else: - mpi_recvbuf = self.as_buffer(rbuf, recv_counts, recv_displs) + mpi_recvbuf = self.as_buffer(rbuf, recv_counts, recv_displs, rbuf_is_contiguous) if recv_counts is None: mpi_recvbuf[1] //= self.size - # perform the scatter operation exit_code = func(mpi_sendbuf, mpi_recvbuf, **kwargs) - return exit_code, sbuf, rbuf, original_recvbuf, recv_axis_permutation def Allgather( @@ -1260,7 +1274,7 @@ def __alltoall_like( # keep a reference to the original buffer object original_recvbuf = recvbuf - # Simple case, continuous buffers can be transmitted as is + # Simple case, contiguous buffers can be transmitted as is if send_axis < 2 and recv_axis < 2: send_axis_permutation = list(range(recvbuf.ndimension())) recv_axis_permutation = list(range(recvbuf.ndimension())) diff --git a/heat/core/dndarray.py b/heat/core/dndarray.py index 9ec0ea89e1..6e9d2c56ef 100644 --- a/heat/core/dndarray.py +++ b/heat/core/dndarray.py @@ -1268,8 +1268,11 @@ def resplit_(self, axis: int = None): axis = sanitize_axis(self.shape, axis) # early out for unchanged content + if self.comm.size == 1: + self.__split = axis if axis == self.split: return self + if axis is None: gathered = torch.empty( self.shape, dtype=self.dtype.torch_type(), device=self.device.torch_device diff --git a/heat/core/linalg/basics.py b/heat/core/linalg/basics.py index bc5d3e9e65..7a2776386b 100644 --- a/heat/core/linalg/basics.py +++ b/heat/core/linalg/basics.py @@ -510,6 +510,13 @@ def matmul(a: DNDarray, b: DNDarray, allow_resplit: bool = False) -> DNDarray: if b.dtype != c_type: b = c_type(b, device=b.device) + # early out for single-process setup, torch matmul + if a.comm.size == 1: + ret = factories.array(torch.matmul(a.larray, b.larray), device=a.device) + if gpu_int_flag: + ret = og_type(ret, device=a.device) + return ret + if a.split is None and b.split is None: # matmul from torch if len(a.gshape) < 2 or 
len(b.gshape) < 2 or not allow_resplit: # if either of A or B is a vector @@ -517,17 +524,17 @@ def matmul(a: DNDarray, b: DNDarray, allow_resplit: bool = False) -> DNDarray: if gpu_int_flag: ret = og_type(ret, device=a.device) return ret - else: - a.resplit_(0) - slice_0 = a.comm.chunk(a.shape, a.split)[2][0] - hold = a.larray @ b.larray - c = factories.zeros((a.gshape[-2], b.gshape[1]), dtype=c_type, device=a.device) - c.larray[slice_0.start : slice_0.stop, :] += hold - c.comm.Allreduce(MPI.IN_PLACE, c, MPI.SUM) - if gpu_int_flag: - c = og_type(c, device=a.device) - return c + a.resplit_(0) + slice_0 = a.comm.chunk(a.shape, a.split)[2][0] + hold = a.larray @ b.larray + + c = factories.zeros((a.gshape[-2], b.gshape[1]), dtype=c_type, device=a.device) + c.larray[slice_0.start : slice_0.stop, :] += hold + c.comm.Allreduce(MPI.IN_PLACE, c, MPI.SUM) + if gpu_int_flag: + c = og_type(c, device=a.device) + return c # if they are vectors they need to be expanded to be the proper dimensions vector_flag = False # flag to run squeeze at the end of the function diff --git a/heat/core/linalg/tests/test_basics.py b/heat/core/linalg/tests/test_basics.py index a3cb827b84..08e0ac43dd 100644 --- a/heat/core/linalg/tests/test_basics.py +++ b/heat/core/linalg/tests/test_basics.py @@ -237,7 +237,6 @@ def test_inv(self): self.assertTupleEqual(ainv.shape, a.shape) self.assertTrue(ht.allclose(ainv, ares, atol=1e-6)) - # distributed a = ht.array([[5.0, -3, 2], [-3, 2, -1], [-3, 2, -2]], split=0) ainv = ht.linalg.inv(a) self.assertEqual(ainv.split, a.split) @@ -245,6 +244,7 @@ def test_inv(self): self.assertTupleEqual(ainv.shape, a.shape) self.assertTrue(ht.allclose(ainv, ares, atol=1e-6)) + ares = ht.array([[2.0, 2, 1], [3, 4, 1], [0, 1, -1]], split=1) a = ht.array([[5.0, -3, 2], [-3, 2, -1], [-3, 2, -2]], split=1) ainv = ht.linalg.inv(a) self.assertEqual(ainv.split, a.split) @@ -281,7 +281,7 @@ def test_inv(self): self.assertTrue(ht.allclose(ainv, ares, atol=1e-6)) # pivoting row change - ares = ht.array([[-1, 0, 2], [2, 0, -1], [-6, 3, 0]], dtype=ht.double) / 3.0 + ares = ht.array([[-1, 0, 2], [2, 0, -1], [-6, 3, 0]], dtype=ht.double, split=0) / 3.0 a = ht.array([[1, 2, 0], [2, 4, 1], [2, 1, 0]], dtype=ht.double, split=0) ainv = ht.linalg.inv(a) self.assertEqual(ainv.split, a.split) @@ -289,6 +289,7 @@ def test_inv(self): self.assertTupleEqual(ainv.shape, a.shape) self.assertTrue(ht.allclose(ainv, ares, atol=1e-6)) + ares = ht.array([[-1, 0, 2], [2, 0, -1], [-6, 3, 0]], dtype=ht.double, split=1) / 3.0 a = ht.array([[1, 2, 0], [2, 4, 1], [2, 1, 0]], dtype=ht.double, split=1) ainv = ht.linalg.inv(a) self.assertEqual(ainv.split, a.split) @@ -365,9 +366,28 @@ def test_matmul(self): self.assertEqual(ret00.shape, (n, k)) self.assertEqual(ret00.dtype, ht.float) self.assertEqual(ret00.split, None) - self.assertEqual(a.split, 0) + if a.comm.size > 1: + self.assertEqual(a.split, 0) self.assertEqual(b.split, None) + # splits 0 None on 1 process + if a.comm.size == 1: + a = ht.ones((n, m), split=0) + b = ht.ones((j, k), split=None) + a[0] = ht.arange(1, m + 1) + a[:, -1] = ht.arange(1, n + 1) + b[0] = ht.arange(1, k + 1) + b[:, 0] = ht.arange(1, j + 1) + ret00 = ht.matmul(a, b, allow_resplit=True) + + self.assertEqual(ht.all(ret00 == ht.array(a_torch @ b_torch)), 1) + self.assertIsInstance(ret00, ht.DNDarray) + self.assertEqual(ret00.shape, (n, k)) + self.assertEqual(ret00.dtype, ht.float) + self.assertEqual(ret00.split, None) + self.assertEqual(a.split, 0) + self.assertEqual(b.split, None) + if a.comm.size > 1: # splits 
00 a = ht.ones((n, m), split=0, dtype=ht.float64) diff --git a/heat/core/logical.py b/heat/core/logical.py index a6be081ea7..8106a556ee 100644 --- a/heat/core/logical.py +++ b/heat/core/logical.py @@ -140,7 +140,19 @@ def allclose( t1, t2 = __sanitize_close_input(x, y) # no sanitation for shapes of x and y needed, torch.allclose raises relevant errors - _local_allclose = torch.tensor(torch.allclose(t1.larray, t2.larray, rtol, atol, equal_nan)) + try: + _local_allclose = torch.tensor(torch.allclose(t1.larray, t2.larray, rtol, atol, equal_nan)) + except RuntimeError: + promoted_dtype = torch.promote_types(t1.larray.dtype, t2.larray.dtype) + _local_allclose = torch.tensor( + torch.allclose( + t1.larray.type(promoted_dtype), + t2.larray.type(promoted_dtype), + rtol, + atol, + equal_nan, + ) + ) # If x is distributed, then y is also distributed along the same axis if t1.comm.is_distributed(): diff --git a/heat/core/manipulations.py b/heat/core/manipulations.py index 33ebf4d365..7cf02ab016 100644 --- a/heat/core/manipulations.py +++ b/heat/core/manipulations.py @@ -3372,6 +3372,9 @@ def resplit(arr: DNDarray, axis: int = None) -> DNDarray: # early out for unchanged content if axis == arr.split: return arr.copy() + if not arr.is_distributed(): + return factories.array(arr.larray, split=axis, device=arr.device, copy=True) + if axis is None: # new_arr = arr.copy() gathered = torch.empty( @@ -3381,11 +3384,6 @@ def resplit(arr: DNDarray, axis: int = None) -> DNDarray: arr.comm.Allgatherv(arr.larray, (gathered, counts, displs), recv_axis=arr.split) new_arr = factories.array(gathered, is_split=axis, device=arr.device, dtype=arr.dtype) return new_arr - # tensor needs be split/sliced locally - if arr.split is None: - temp = arr.larray[arr.comm.chunk(arr.shape, axis)[2]] - new_arr = factories.array(temp, is_split=axis, device=arr.device, dtype=arr.dtype) - return new_arr arr_tiles = tiling.SplitTiles(arr) new_arr = factories.empty(arr.gshape, split=axis, dtype=arr.dtype, device=arr.device) diff --git a/heat/core/tests/test_dndarray.py b/heat/core/tests/test_dndarray.py index e42c5a9a14..726a85e77a 100644 --- a/heat/core/tests/test_dndarray.py +++ b/heat/core/tests/test_dndarray.py @@ -126,7 +126,6 @@ def test_gethalo(self): # test no data on process data_np = np.arange(2 * 12).reshape(2, 12) data = ht.array(data_np, split=0) - print("DEBUGGING: data.lshape_map = ", data.lshape_map) data.get_halo(1) data_with_halos = data.array_with_halos diff --git a/heat/core/tests/test_logical.py b/heat/core/tests/test_logical.py index 691df7ec62..c2e3d1a786 100644 --- a/heat/core/tests/test_logical.py +++ b/heat/core/tests/test_logical.py @@ -182,6 +182,7 @@ def test_allclose(self): c = ht.zeros((4, 6), split=0) d = ht.zeros((4, 6), split=1) e = ht.zeros((4, 6)) + f = ht.float64([[2.000005, 2.000005], [2.000005, 2.000005]]) self.assertFalse(ht.allclose(a, b)) self.assertTrue(ht.allclose(a, b, atol=1e-04)) @@ -189,6 +190,7 @@ def test_allclose(self): self.assertTrue(ht.allclose(a, 2)) self.assertTrue(ht.allclose(a, 2.0)) self.assertTrue(ht.allclose(2, a)) + self.assertTrue(ht.allclose(f, a)) self.assertTrue(ht.allclose(c, d)) self.assertTrue(ht.allclose(c, e)) self.assertTrue(e.allclose(c)) diff --git a/heat/core/tests/test_manipulations.py b/heat/core/tests/test_manipulations.py index 9a41bceab8..4464053fd3 100644 --- a/heat/core/tests/test_manipulations.py +++ b/heat/core/tests/test_manipulations.py @@ -2992,6 +2992,16 @@ def test_resplit(self): self.assertEqual(data2.lshape, (data.comm.size, 1)) 
self.assertEqual(data2.split, 1) + # resplitting a non-distributed DNDarray with split not None + if ht.MPI_WORLD.size == 1: + data = ht.zeros(10, 10, split=0) + data2 = ht.resplit(data, 1) + data3 = ht.resplit(data, None) + self.assertTrue((data == data2).all()) + self.assertTrue((data == data3).all()) + self.assertEqual(data2.split, 1) + self.assertTrue(data3.split is None) + # splitting an unsplit tensor should result in slicing the tensor locally shape = (ht.MPI_WORLD.size, ht.MPI_WORLD.size) data = ht.zeros(shape) diff --git a/heat/core/tests/test_suites/basic_test.py b/heat/core/tests/test_suites/basic_test.py index f094668bc8..39f6a5f063 100644 --- a/heat/core/tests/test_suites/basic_test.py +++ b/heat/core/tests/test_suites/basic_test.py @@ -136,8 +136,12 @@ def assert_array_equal(self, heat_array, expected_array): "Local shapes do not match. " "Got {} expected {}".format(heat_array.lshape, expected_array[slices].shape), ) - local_heat_numpy = heat_array.numpy() - self.assertTrue(np.allclose(local_heat_numpy, expected_array)) + # compare local tensors to corresponding slice of expected_array + is_allclose = np.allclose(heat_array.larray.cpu(), expected_array[slices]) + ht_is_allclose = ht.array( + [is_allclose], dtype=ht.bool, is_split=0, device=heat_array.device + ) + self.assertTrue(ht.all(ht_is_allclose)) def assert_func_equal( self, From 4d89640d3dae44960cb678cd7eec82a18bdd6c6c Mon Sep 17 00:00:00 2001 From: Claudia Comito Date: Thu, 19 Jan 2023 09:25:39 +0100 Subject: [PATCH 53/57] [skip ci] Upgrade version before release --- heat/core/version.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/heat/core/version.py b/heat/core/version.py index eacc02cd3e..e363346349 100644 --- a/heat/core/version.py +++ b/heat/core/version.py @@ -4,7 +4,7 @@ """Indicates Heat's main version.""" minor: int = 2 """Indicates feature extension.""" -micro: int = 1 +micro: int = 2 """Indicates revisions for bugfixes.""" extension: str = None """Indicates special builds, e.g. for specific hardware.""" From efe1fdcd8fd34eaacfde1da2aba7027bba080c9b Mon Sep 17 00:00:00 2001 From: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Date: Thu, 19 Jan 2023 10:25:49 +0100 Subject: [PATCH 54/57] Update ubuntu version to latest --- .github/workflows/ci.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml index 956c1ce5d6..714337d9ef 100644 --- a/.github/workflows/ci.yaml +++ b/.github/workflows/ci.yaml @@ -7,7 +7,7 @@ on: jobs: approved: if: github.event.review.state == 'approved' - runs-on: ubuntu-20.04 + runs-on: ubuntu-latest strategy: fail-fast: false matrix: From 17e5e8970cf03d7b76b9347f048eb5286e6f945f Mon Sep 17 00:00:00 2001 From: mtar Date: Mon, 23 Jan 2023 14:57:11 +0100 Subject: [PATCH 55/57] add __eq__ to Device (#1063) * add __eq__ to Device * restructure tests * add equality with torch device objects * update docstring * minimise code --- heat/core/devices.py | 18 +++++++++++++++++- heat/core/tests/test_devices.py | 8 ++++++++ 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/heat/core/devices.py b/heat/core/devices.py index c70f212682..3e6f9fe3b6 100644 --- a/heat/core/devices.py +++ b/heat/core/devices.py @@ -6,7 +6,7 @@ import torch -from typing import Optional, Union +from typing import Any, Optional, Union from . 
import communication @@ -74,6 +74,22 @@ def __str__(self) -> str: """ return "{}:{}".format(self.device_type, self.device_id) + def __eq__(self, other: Any) -> bool: + """ + Overloads the `==` operator for local equal check. + + Parameters + ---------- + other : Any + The object to compare with + """ + if isinstance(other, Device): + return self.device_type == other.device_type and self.device_id == other.device_id + elif isinstance(other, torch.device): + return self.device_type == other.type and self.device_id == other.index + else: + return NotImplemented + # create a CPU device singleton cpu = Device("cpu", 0, "cpu") diff --git a/heat/core/tests/test_devices.py b/heat/core/tests/test_devices.py index e0ce2a758b..9b1158e125 100644 --- a/heat/core/tests/test_devices.py +++ b/heat/core/tests/test_devices.py @@ -3,11 +3,19 @@ import heat as ht from .test_suites.basic_test import TestCase +from torch import device as torch_device envar = os.getenv("HEAT_TEST_USE_DEVICE", "cpu") class TestDevices(TestCase): + def test_eq(self): + self.assertEqual(ht.Device("cpu", 0, "cpu:0"), ht.cpu) + self.assertEqual(ht.Device("cpu", 0, "cpu:0"), torch_device("cpu:0")) + self.assertNotEqual(ht.Device("gpu", 0, "cuda:0"), ht.cpu) + self.assertNotEqual(ht.Device("cpu", 1, "cpu:1"), ht.cpu) + self.assertNotEqual(1, ht.cpu) + @unittest.skipIf(envar not in ["cpu"], "only supported for cpu") def test_get_default_device_cpu(self): self.assertIs(ht.get_device(), ht.cpu) From 799abefc8e12d8244a330c7f0a05b8ecd31718f7 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Tue, 24 Jan 2023 18:51:28 +0100 Subject: [PATCH 56/57] [pre-commit.ci] pre-commit autoupdate (#1080) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit updates: - [github.com/pycqa/pydocstyle: 6.2.3 → 6.3.0](https://github.com/pycqa/pydocstyle/compare/6.2.3...6.3.0) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- .pre-commit-config.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 4d5fcf5359..51bf1fd594 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -14,7 +14,7 @@ repos: hooks: - id: black - repo: https://github.com/pycqa/pydocstyle - rev: 6.2.3 # pick a git hash / tag to point to + rev: 6.3.0 # pick a git hash / tag to point to hooks: - id: pydocstyle exclude: 'tests|benchmarks|examples|scripts|setup.py' #|heat/utils/data/mnist.py|heat/utils/data/_utils.py ?
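The `Device.__eq__` added in PATCH 55 above makes Heat devices comparable with each other and with `torch.device` objects: two devices compare equal when both the device type and the index match. A minimal sketch of the resulting semantics, not part of the patch series and only restating the behaviour covered by the new `test_eq`:

```
import heat as ht
import torch

# equal: same device type and same index
assert ht.Device("cpu", 0, "cpu:0") == ht.cpu
assert ht.cpu == torch.device("cpu:0")   # torch device with an explicit index

# not equal: different index, or an unrelated type (comparison falls back to NotImplemented)
assert ht.cpu != ht.Device("cpu", 1, "cpu:1")
assert ht.cpu != 1
```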
From 62410e74fca6dceb3798d072d001b5a6604ad3b1 Mon Sep 17 00:00:00 2001 From: Pratham Shah <82367556+shahpratham@users.noreply.github.com> Date: Tue, 24 Jan 2023 23:45:12 +0530 Subject: [PATCH 57/57] Signal processing: fully distributed 1D convolution (#983) * first commit * started distributed kernel support * fixed communication between processes * storing values from all calculated signals through communication * pads weights when kerenl is uneven * flipped input kernel dndarray * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * calculating correct convolution across all processes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added example and minor changes * minor change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing pre-commit hooks * swap a and v when v is larger * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * used bcast * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added tests for distributed kernels * Accumulate filtered_signal in 1D within first loop * Fix split axis of when signal is distributed * avoid empty local chunk condition * pre-commit auto fixes * added test for large random signal and removed earlier implementation * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * supported and addded test for all modes * added example and refactored * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added support for scalars and corrected halo_size * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * reformatted * coorected halo_size * cast t_v to float on cuda * error message on unbalanced weights * error message on unbalanced weight * resolved device issue Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Claudia Comito Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> --- heat/core/signal.py | 151 +++++++++++++++++++++++---------- heat/core/tests/test_signal.py | 70 +++++++++++---- 2 files changed, 158 insertions(+), 63 deletions(-) diff --git a/heat/core/signal.py b/heat/core/signal.py index b7556e50af..aab6b4d113 100644 --- a/heat/core/signal.py +++ b/heat/core/signal.py @@ -1,13 +1,13 @@ """Provides a collection of signal-processing operations""" import torch -from typing import Union, Tuple, Sequence +import numpy as np from .communication import MPI from .dndarray import DNDarray from .types import promote_types -from .manipulations import pad -from .factories import array +from .manipulations import pad, flip +from .factories import array, zeros import torch.nn.functional as fc __all__ = ["convolve"] @@ -15,14 +15,14 @@ def convolve(a: DNDarray, v: DNDarray, mode: str = "full") -> DNDarray: """ - Returns the discrete, linear convolution of two one-dimensional `DNDarray`s. + Returns the discrete, linear convolution of two one-dimensional `DNDarray`s or scalars. Parameters ---------- - a : DNDarray - One-dimensional signal `DNDarray` of shape (N,) - v : DNDarray - One-dimensional filter weight `DNDarray` of shape (M,). + a : DNDarray or scalar + One-dimensional signal `DNDarray` of shape (N,) or scalar. 
+ v : DNDarray or scalar + One-dimensional filter weight `DNDarray` of shape (M,) or scalar. mode : str Can be 'full', 'valid', or 'same'. Default is 'full'. 'full': @@ -40,15 +40,6 @@ def convolve(a: DNDarray, v: DNDarray, mode: str = "full") -> DNDarray: overlap completely. Values outside the signal boundary have no effect. - Notes - ----- - Contrary to the original `numpy.convolve`, this function does not - swap the input arrays if the second one is larger than the first one. - This is because `a`, the signal, might be memory-distributed, - whereas the filter `v` is assumed to be non-distributed, - i.e. a copy of `v` will reside on each process. - - Examples -------- Note how the convolution operator flips the second array @@ -62,7 +53,27 @@ def convolve(a: DNDarray, v: DNDarray, mode: str = "full") -> DNDarray: DNDarray([1., 3., 3., 3., 3.]) >>> ht.convolve(a, v, mode='valid') DNDarray([3., 3., 3.]) + >>> a = ht.ones(10, split = 0) + >>> v = ht.arange(3, split = 0).astype(ht.float) + >>> ht.convolve(a, v, mode='valid') + DNDarray([3., 3., 3., 3., 3., 3., 3., 3.]) + + [0/3] DNDarray([3., 3., 3.]) + [1/3] DNDarray([3., 3., 3.]) + [2/3] DNDarray([3., 3.]) + >>> a = ht.ones(10, split = 0) + >>> v = ht.arange(3, split = 0) + >>> ht.convolve(a, v) + DNDarray([0., 1., 3., 3., 3., 3., 3., 3., 3., 3., 3., 2.], dtype=ht.float32, device=cpu:0, split=0) + + [0/3] DNDarray([0., 1., 3., 3.]) + [1/3] DNDarray([3., 3., 3., 3.]) + [2/3] DNDarray([3., 3., 3., 2.]) """ + if np.isscalar(a): + a = array([a]) + if np.isscalar(v): + v = array([v]) if not isinstance(a, DNDarray): try: a = array(a) @@ -77,24 +88,25 @@ def convolve(a: DNDarray, v: DNDarray, mode: str = "full") -> DNDarray: a = a.astype(promoted_type) v = v.astype(promoted_type) - if v.is_distributed(): - raise TypeError("Distributed filter weights are not supported") if len(a.shape) != 1 or len(v.shape) != 1: raise ValueError("Only 1-dimensional input DNDarrays are allowed") - if a.shape[0] <= v.shape[0]: - raise ValueError("Filter size must not be greater than or equal to signal size") if mode == "same" and v.shape[0] % 2 == 0: raise ValueError("Mode 'same' cannot be used with even-sized kernel") + if not v.is_balanced(): + raise ValueError("Only balanced kernel weights are allowed") + + if v.shape[0] > a.shape[0]: + a, v = v, a # compute halo size - halo_size = v.shape[0] // 2 + halo_size = torch.max(v.lshape_map[:, 0]).item() // 2 # pad DNDarray with zeros according to mode if mode == "full": pad_size = v.shape[0] - 1 gshape = v.shape[0] + a.shape[0] - 1 elif mode == "same": - pad_size = halo_size + pad_size = v.shape[0] // 2 gshape = a.shape[0] elif mode == "valid": pad_size = 0 @@ -105,8 +117,10 @@ def convolve(a: DNDarray, v: DNDarray, mode: str = "full") -> DNDarray: a = pad(a, pad_size, "constant", 0) if a.is_distributed(): - if (v.shape[0] > a.lshape_map[:, 0]).any(): - raise ValueError("Filter weight is larger than the local chunks of signal") + if (v.lshape_map[:, 0] > a.lshape_map[:, 0]).any(): + raise ValueError( + "Local chunk of filter weight is larger than the local chunks of signal" + ) # fetch halos and store them in a.halo_next/a.halo_prev a.get_halo(halo_size) # apply halos to local array @@ -114,11 +128,21 @@ def convolve(a: DNDarray, v: DNDarray, mode: str = "full") -> DNDarray: else: signal = a.larray + # flip filter for convolution as Pytorch conv1d computes correlations + v = flip(v, [0]) + if v.larray.shape != v.lshape_map[0]: + # pads weights if input kernel is uneven + target = torch.zeros(v.lshape_map[0][0], 
dtype=v.larray.dtype, device=v.larray.device) + pad_size = v.lshape_map[0][0] - v.larray.shape[0] + target[pad_size:] = v.larray + weight = target + else: + weight = v.larray + + t_v = weight # stores temporary weight + # make signal and filter weight 3D for Pytorch conv1d function signal = signal.reshape(1, 1, signal.shape[0]) - - # flip filter for convolution as Pytorch conv1d computes correlations - weight = v.larray.flip(dims=(0,)) weight = weight.reshape(1, 1, weight.shape[0]) # cast to float if on GPU @@ -126,23 +150,56 @@ def convolve(a: DNDarray, v: DNDarray, mode: str = "full") -> DNDarray: float_type = promote_types(signal.dtype, torch.float32).torch_type() signal = signal.to(float_type) weight = weight.to(float_type) + t_v = t_v.to(float_type) - # apply torch convolution operator - signal_filtered = fc.conv1d(signal, weight) - - # unpack 3D result into 1D - signal_filtered = signal_filtered[0, 0, :] - - # if kernel shape along split axis is even we need to get rid of duplicated values - if a.comm.rank != 0 and v.shape[0] % 2 == 0: - signal_filtered = signal_filtered[1:] - - return DNDarray( - signal_filtered.contiguous(), - (gshape,), - signal_filtered.dtype, - a.split, - a.device, - a.comm, - balanced=False, - ).astype(a.dtype.torch_type()) + if v.is_distributed(): + size = v.comm.size + + for r in range(size): + rec_v = v.comm.bcast(t_v, root=r) + t_v1 = rec_v.reshape(1, 1, rec_v.shape[0]) + local_signal_filtered = fc.conv1d(signal, t_v1) + # unpack 3D result into 1D + local_signal_filtered = local_signal_filtered[0, 0, :] + + if a.comm.rank != 0 and v.lshape_map[0][0] % 2 == 0: + local_signal_filtered = local_signal_filtered[1:] + + # accumulate filtered signal on the fly + global_signal_filtered = array( + local_signal_filtered, is_split=0, device=a.device, comm=a.comm + ) + if r == 0: + # initialize signal_filtered, starting point of slice + signal_filtered = zeros( + gshape, dtype=a.dtype, split=a.split, device=a.device, comm=a.comm + ) + start_idx = 0 + + # accumulate relevant slice of filtered signal + # note, this is a binary operation between unevenly distributed dndarrays and will require communication, check out _operations.__binary_op() + signal_filtered += global_signal_filtered[start_idx : start_idx + gshape] + if r != size - 1: + start_idx += v.lshape_map[r + 1][0].item() + return signal_filtered + + else: + # apply torch convolution operator + signal_filtered = fc.conv1d(signal, weight) + + # unpack 3D result into 1D + signal_filtered = signal_filtered[0, 0, :] + + # if kernel shape along split axis is even we need to get rid of duplicated values + if a.comm.rank != 0 and v.shape[0] % 2 == 0: + signal_filtered = signal_filtered[1:] + + return DNDarray( + signal_filtered.contiguous(), + (gshape,), + signal_filtered.dtype, + a.split, + a.device, + a.comm, + balanced=False, + ).astype(a.dtype.torch_type()) diff --git a/heat/core/tests/test_signal.py b/heat/core/tests/test_signal.py index a471218f89..7abd69e183 100644 --- a/heat/core/tests/test_signal.py +++ b/heat/core/tests/test_signal.py @@ -20,59 +20,94 @@ def test_convolve(self): [0, 1, 3, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 42, 29, 15] ).astype(ht.int) - signal = ht.arange(0, 16, split=0).astype(ht.int) + dis_signal = ht.arange(0, 16, split=0).astype(ht.int) + signal = ht.arange(0, 16).astype(ht.int) + full_ones = ht.ones(7, split=0).astype(ht.int) kernel_odd = ht.ones(3).astype(ht.int) kernel_even = [1, 1, 1, 1] + dis_kernel_odd = ht.ones(3, split=0).astype(ht.int) + dis_kernel_even = 
diff --git a/heat/core/tests/test_signal.py b/heat/core/tests/test_signal.py
index a471218f89..7abd69e183 100644
--- a/heat/core/tests/test_signal.py
+++ b/heat/core/tests/test_signal.py
@@ -20,59 +20,94 @@ def test_convolve(self):
             [0, 1, 3, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 42, 29, 15]
         ).astype(ht.int)

-        signal = ht.arange(0, 16, split=0).astype(ht.int)
+        dis_signal = ht.arange(0, 16, split=0).astype(ht.int)
+        signal = ht.arange(0, 16).astype(ht.int)
+        full_ones = ht.ones(7, split=0).astype(ht.int)
         kernel_odd = ht.ones(3).astype(ht.int)
         kernel_even = [1, 1, 1, 1]
+        dis_kernel_odd = ht.ones(3, split=0).astype(ht.int)
+        dis_kernel_even = ht.ones(4, split=0).astype(ht.int)

         with self.assertRaises(TypeError):
             signal_wrong_type = [0, 1, 2, "tre", 4, "five", 6, "สปehiku", 8, 9, 10]
             ht.convolve(signal_wrong_type, kernel_odd, mode="full")
         with self.assertRaises(TypeError):
             filter_wrong_type = [1, 1, "pizza", "pineapple"]
-            ht.convolve(signal, filter_wrong_type, mode="full")
+            ht.convolve(dis_signal, filter_wrong_type, mode="full")
         with self.assertRaises(ValueError):
-            ht.convolve(signal, kernel_odd, mode="invalid")
+            ht.convolve(dis_signal, kernel_odd, mode="invalid")
         with self.assertRaises(ValueError):
-            s = signal.reshape((2, -1))
+            s = dis_signal.reshape((2, -1))
             ht.convolve(s, kernel_odd)
         with self.assertRaises(ValueError):
             k = ht.eye(3)
-            ht.convolve(signal, k)
-        with self.assertRaises(ValueError):
-            ht.convolve(kernel_even, full_even)
+            ht.convolve(dis_signal, k)
         with self.assertRaises(ValueError):
-            ht.convolve(signal, kernel_even, mode="same")
+            ht.convolve(dis_signal, kernel_even, mode="same")
         if self.comm.size > 1:
-            with self.assertRaises(TypeError):
-                k = ht.ones(4, split=0).astype(ht.int)
-                ht.convolve(signal, k)
-        if self.comm.size >= 5:
             with self.assertRaises(ValueError):
-                ht.convolve(signal, kernel_even, mode="valid")
+                ht.convolve(full_ones, kernel_even, mode="valid")
+            with self.assertRaises(ValueError):
+                ht.convolve(kernel_even, full_ones, mode="valid")
+        if self.comm.size > 5:
+            with self.assertRaises(ValueError):
+                ht.convolve(dis_signal, kernel_even)

         # test modes, avoid kernel larger than signal chunk
         if self.comm.size <= 3:
             modes = ["full", "same", "valid"]
             for i, mode in enumerate(modes):
                 # odd kernel size
-                conv = ht.convolve(signal, kernel_odd, mode=mode)
+                conv = ht.convolve(dis_signal, kernel_odd, mode=mode)
+                gathered = manipulations.resplit(conv, axis=None)
+                self.assertTrue(ht.equal(full_odd[i : len(full_odd) - i], gathered))
+
+                conv = ht.convolve(dis_signal, dis_kernel_odd, mode=mode)
+                gathered = manipulations.resplit(conv, axis=None)
+                self.assertTrue(ht.equal(full_odd[i : len(full_odd) - i], gathered))
+
+                conv = ht.convolve(signal, dis_kernel_odd, mode=mode)
                 gathered = manipulations.resplit(conv, axis=None)
                 self.assertTrue(ht.equal(full_odd[i : len(full_odd) - i], gathered))

                 # different data types
-                conv = ht.convolve(signal.astype(ht.float), kernel_odd)
+                conv = ht.convolve(dis_signal.astype(ht.float), kernel_odd)
+                gathered = manipulations.resplit(conv, axis=None)
+                self.assertTrue(ht.equal(full_odd.astype(ht.float), gathered))
+
+                conv = ht.convolve(dis_signal.astype(ht.float), dis_kernel_odd)
+                gathered = manipulations.resplit(conv, axis=None)
+                self.assertTrue(ht.equal(full_odd.astype(ht.float), gathered))
+
+                conv = ht.convolve(signal.astype(ht.float), dis_kernel_odd)
                 gathered = manipulations.resplit(conv, axis=None)
                 self.assertTrue(ht.equal(full_odd.astype(ht.float), gathered))

                 # even kernel size
                 # skip mode 'same' for even kernels
                 if mode != "same":
-                    conv = ht.convolve(signal, kernel_even, mode=mode)
+                    conv = ht.convolve(dis_signal, kernel_even, mode=mode)
+                    dis_conv = ht.convolve(dis_signal, dis_kernel_even, mode=mode)
                     gathered = manipulations.resplit(conv, axis=None)
+                    dis_gathered = manipulations.resplit(dis_conv, axis=None)

                     if mode == "full":
                         self.assertTrue(ht.equal(full_even, gathered))
+                        self.assertTrue(ht.equal(full_even, dis_gathered))
                     else:
                         self.assertTrue(ht.equal(full_even[3:-3], gathered))
+                        self.assertTrue(ht.equal(full_even[3:-3], dis_gathered))
+
+                    # distributed large signal and kernel
+                    np.random.seed(12)
+                    np_a = np.random.randint(1000, size=4418)
+                    np_b = np.random.randint(1000, size=1543)
+                    np_conv = np.convolve(np_a, np_b, mode=mode)
+
+                    a = ht.array(np_a, split=0, dtype=ht.int32)
+                    b = ht.array(np_b, split=0, dtype=ht.int32)
+                    conv = ht.convolve(a, b, mode=mode)
+                    self.assert_array_equal(conv, np_conv)

         # test edge cases
         # non-distributed signal, size-1 kernel
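
The assertions in the hunk above check that a distributed kernel produces the same result as the replicated one. An interactive spot-check along the same lines might look as follows, assuming the script is launched under MPI with a few processes (e.g. `mpirun -np 3 python spot_check.py`; the file name is hypothetical):

import heat as ht

dis_signal = ht.arange(0, 16, split=0).astype(ht.int)
dis_kernel = ht.ones(3, split=0).astype(ht.int)

conv = ht.convolve(dis_signal, dis_kernel, mode="full")
# gather the distributed result on every process before printing
print(ht.resplit(conv, None))
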
@@ -81,3 +116,6 @@ def test_convolve(self):
         kernel = ht.ones(1).astype(ht.int)
         conv = ht.convolve(alt_signal, kernel)
         self.assertTrue(ht.equal(signal, conv))
+
+        conv = ht.convolve(1, 5)
+        self.assertTrue(ht.equal(ht.array([5]), conv))
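
Finally, the scalar path added in `heat/core/signal.py` (the `np.isscalar` checks) is what makes the last assertion above possible: scalar operands are wrapped into one-element arrays before convolving. A minimal usage sketch of that behaviour (illustrative, not part of the patch):

import heat as ht

print(ht.convolve(1, 5))           # single-element result, equal to ht.array([5])
print(ht.convolve(ht.ones(4), 2))  # every sample of the signal scaled by 2
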