v1.4.0 - Interactive HPC tutorials, distributed FFT, batch-parallel clustering, support PyTorch 2.2.2
- #1406 New tutorials for interactive parallel mode for both HPC and local usage (by @ClaudiaComito)
- #1288 Batch-parallel K-means and K-medians (by @mrfh92)
- #1228 Introduce in-place-operators for `arithmetics.py` (by @LScheib)
- #1218 Distributed Fast Fourier Transforms (by @ClaudiaComito)
- #1363 `ht.array` constructor respects implicit torch device when copy is set to false (by @JuanPedroGHM)
- #1216 Avoid unnecessary gathering of distributed operand (by @samadpls)
- #1329 Refactoring of QR: stabilized Gram-Schmidt for split=1 and TS-QR for split=0 (by @mrfh92)
- #1418 and #1290: Support PyTorch 2.2.2 (by @mtar)
- #1315 and #1337: Fix some NumPy deprecations in the core and statistics tests (by @FOsterfeld)
- #1259 Bug-fix for `ht.regression.Lasso()` on GPU (by @mrfh92)
- #1201 Fix `ht.diff` for 1-element-axis edge case (by @mtar)
- #1257 Docker release 1.3.x update (by @JuanPedroGHM)
- #1274 Update version before release (by @ClaudiaComito)
- #1267 Unit tests: Increase tolerance for `ht.allclose` on `ht.inv` operations for all torch versions (by @ClaudiaComito)
- #1266 Sync `pre-commit` configuration with `main` branch (by @ClaudiaComito)
- #1264 Fix PyTorch release tracking workflows (by @mtar)
- #1234 Update sphinx package requirements (by @mtar)
- #1187 Create configuration file for Read the Docs (by @mtar)
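
The 1-element-axis edge case mentioned in #1201 can be pictured without Heat at all. The pure-Python sketch below assumes NumPy-like `diff` semantics; the `diff` helper is a hypothetical stand-in for illustration, not Heat's distributed implementation.

```python
def diff(values, n=1):
    """n-th discrete difference of a 1-D sequence (illustrative helper)."""
    out = list(values)
    for _ in range(n):
        # Each pass pairs neighbouring elements and shortens the axis by one.
        out = [b - a for a, b in zip(out, out[1:])]
    return out


print(diff([1, 4, 9, 16]))       # [3, 5, 7]
print(diff([1, 4, 9, 16], n=2))  # [2, 2]
print(diff([42]))                # [] (a 1-element axis has no neighbouring pairs)
```

An axis of length 1 yields an empty result rather than an error, which is the behaviour an edge-case fix of this kind would be expected to preserve.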
v1.3.0 - Scalable SVD, GSoC`22 contributions, Docker image, PyTorch 2 support, AMD GPUs acceleration
This release includes many important updates (see below). We particularly would like to thank our enthusiastic GSoC2022 / tentative GSoC2023 contributors @Mystic-Slice @neosunhan @Sai-Suraj-27 @shahpratham @AsRaNi1 @Ishaan-Chandak 🙏🏼 Thank you so much!
- #1155 Support PyTorch 2.0.1 (by @ClaudiaComito)
- #1152 Support AMD GPUs (by @mtar)
- #1126 Distributed hierarchical SVD (by @mrfh92)
- #1028 Introducing the `sparse` module: Distributed Compressed Sparse Row Matrix (by @Mystic-Slice)
- Performance improvements:
  - #1125 distributed `heat.reshape()` speed-up (by @ClaudiaComito)
  - #1141 `heat.pow()` speed-up when exponent is `int` (by @ClaudiaComito @coquelin77)
  - #1119 `heat.array()` default to `copy=None` (i.e., copy only if necessary) (by @ClaudiaComito @neosunhan)
- #970 Dockerfile and accompanying documentation (by @bhagemeier)
- #1154 Introduce `DNDarray.__array__()` method for interoperability with `numpy`, `xarray` (by @ClaudiaComito)
- #1147 Adopt NEP29, drop support for PyTorch 1.7, Python 3.6 (by @mtar)
- #1119 `ht.array()` default to `copy=None` (i.e., copy only if necessary) (by @ClaudiaComito)
- #1020 Implement `broadcast_arrays`, `broadcast_to` (by @neosunhan)
- #1008 API: Rename `keepdim` kwarg to `keepdims` (by @neosunhan)
- #788 Interface for DPPY interoperability (by @coquelin77 @fschlimb)
- #1126 Distributed hierarchical SVD (by @mrfh92)
- #1020 Implement `broadcast_arrays`, `broadcast_to` (by @neosunhan)
- #983 Signal processing: fully distributed 1D convolution (by @shahpratham)
- #1063 add eq to Device (by @mtar)
- #1141 `heat.pow()` speed-up when exponent is `int` (by @ClaudiaComito)
- #1136 Fixed PyTorch version check in `sparse` module (by @Mystic-Slice)
- #1098 Validates number of dimensions in input to `ht.sparse.sparse_csr_matrix` (by @Ishaan-Chandak)
- #1095 Convolve with distributed kernel on multiple GPUs (by @shahpratham)
- #1094 Fix division precision error in `random` module (by @Mystic-Slice)
- #1075 Fixed initialization of DNDarrays communicator in some routines (by @AsRaNi1)
- #1066 Verify input object type and layout + Supporting tests (by @Mystic-Slice)
- #1037 Distributed weighted `average()` along tuple of axes: shape of `weights` to match shape of input (by @Mystic-Slice)
- #1137 Continuous Benchmarking of runtime (by @JuanPedroGHM)
- #1150 Refactoring for efficiency and readability (by @Sai-Suraj-27)
- #1130 Reintroduce Quick Start (by @ClaudiaComito)
- #1079 A better README file (by @Sai-Suraj-27)
- #1126, #1160 Distributed hierarchical SVD (by @mrfh92 @ClaudiaComito )
- #1058 Fix edge-case contiguity mismatch for Allgatherv (by @ClaudiaComito)
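
For #1037, the requirement that `weights` match the shape of the input when averaging over a tuple of axes can be sketched in plain Python. This is only an illustration of the arithmetic on a nested list; the `weighted_average` helper is hypothetical and says nothing about how Heat distributes the computation.

```python
def weighted_average(x, weights):
    """Weighted average over both axes of a 2-D nested list.

    `weights` must have the same shape as `x` (illustrative helper).
    """
    same_rows = len(x) == len(weights)
    if not same_rows or any(len(r) != len(w) for r, w in zip(x, weights)):
        raise ValueError("weights must match the shape of the input")
    # Sum of element-wise products divided by the sum of the weights.
    num = sum(v * w for row, wrow in zip(x, weights) for v, w in zip(row, wrow))
    den = sum(w for wrow in weights for w in wrow)
    return num / den


x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 1.0], [1.0, 5.0]]
print(weighted_average(x, w))  # 3.25
```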
@ClaudiaComito, @JuanPedroGHM
- #1048 Support PyTorch 1.13.0 on branch release/1.2.x (by @github-actions)
- #1038 Lanczos decomposition `linalg.solver.lanczos`: Support double precision, complex data types (by @ClaudiaComito)
- #1034 `ht.array`, closed loophole allowing `DNDarray` construction with incompatible shapes of local arrays (by @Mystic-Slice)
- #1038 Lanczos decomposition `linalg.solver.lanczos`: Support double precision, complex data types (by @ClaudiaComito)
- #1025 mirror repository on gitlab + ci (by @mtar)
- #1014 fix: set cuda rng state on gpu tests for test_random.py (by @JuanPedroGHM)
- #906 PyTorch 1.11 support
- #595 Distributed 1-D convolution: `ht.convolve`
- #941 Parallel I/O: write to CSV file with `ht.save_csv`.
- #887 Binary operations between operands of equal shapes, equal `split` axes, but different distribution maps.
- Expanded memory-distributed linear algebra, manipulations modules.
- #826 Fixed `__setitem__` handling of distributed `DNDarray` values which have a different shape in the split dimension
- #846 Fixed an issue in `_reduce_op` when axis and keepdim were set.
- #846 Fixed an issue in `min`, `max` where DNDarrays with empty processes can't be computed.
- #868 Fixed an issue in `__binary_op` where data was falsely distributed if a DNDarray has single element.
- #916 Fixed an issue in `random.randint` where the size parameter does not accept ints.
- #945 `ht.divide` now supports `out` and `where` kwargs
- #868 New `MPICommunication` method `Split`
- #940 and #967 Duplicate `MPI.COMM_WORLD` and `MPI_SELF` to make library more independent.
- #840 New feature: `vecdot()`
- #842 New feature: `vdot`
- #846 New features `norm`, `vector_norm`, `matrix_norm`
- #850 New Feature `cross`
- #877 New feature `det`
- #875 New feature `inv`
- #862 New feature `signbit`
- #816 New feature: Local printing (`ht.local_printing()`) and global printing options
- #816 New feature: print only on process 0 with `print0(...)` and `ht.print0(...)`
- #858 New Feature: `standard_normal`, `normal`
- #827 New feature: `sign`, `sgn`
- #928 New feature: `bucketize`, `digitize`
- #876 Fix examples (Lasso and kNN)
- #894 Change inclusion of license file
- #948 Improve CSV write performance.
- #960 Bypass unnecessary communication by replacing `factories.array` with `DNDarray` construct in random.py
- #864 Dependencies: constrain `torchvision` version range to match supported `pytorch` version range.
- Slicing/indexing overhaul for a more NumPy-like user experience. Warning for distributed arrays: breaking change! Indexing one element along the distribution axis now implies the indexed element is communicated to all processes.
- More flexibility in handling non-load-balanced distributed arrays.
- More distributed operations, incl. meshgrid.
- #758 Indexing a distributed `DNDarray` along the `DNDarray.split` dimension now returns a non-distributed `DNDarray`, i.e. the indexed element is MPI-broadcasted. Example on 2 processes:

  ```python
  a = ht.arange(5 * 5, split=0).reshape((5, 5))
  print(a.larray)
  >>> [0] tensor([[ 0,  1,  2,  3,  4],
  >>> [0]         [ 5,  6,  7,  8,  9],
  >>> [0]         [10, 11, 12, 13, 14]], dtype=torch.int32)
  >>> [1] tensor([[15, 16, 17, 18, 19],
  >>> [1]         [20, 21, 22, 23, 24]], dtype=torch.int32)
  b = a[:, 2]
  print(b.larray)
  >>> [0] tensor([ 2,  7, 12], dtype=torch.int32)
  >>> [1] tensor([17, 22], dtype=torch.int32)
  print(b.shape)
  >>> [0] (5,)
  >>> [1] (5,)
  print(b.split)
  >>> [0] 0
  >>> [1] 0
  c = a[4]
  print(c.larray)
  >>> [0] tensor([20, 21, 22, 23, 24], dtype=torch.int32)
  >>> [1] tensor([20, 21, 22, 23, 24], dtype=torch.int32)
  print(c.shape)
  >>> [0] (5,)
  >>> [1] (5,)
  print(c.split)
  >>> [0] None
  >>> [1] None
  ```
- #758 Fix indexing inconsistencies in `DNDarray.__getitem__()`
- #768 Fixed an issue where `deg2rad` and `rad2deg` are not working with the 'out' parameter.
- #785 Removed `storage_offset` when finding the mpi buffer (`communication.MPICommunication.as_mpi_memory()`).
- #785 added allowance for 1 dimensional non-contiguous local tensors in `communication.MPICommunication.mpi_type_and_elements_of()`
- #787 Fixed an issue where Heat cannot be imported when some optional dependencies are not available.
- #790 catch incorrect device after `bcast` in `DNDarray.__getitem__`
- #796 `heat.reshape(a, shape, new_split)` now always returns a distributed `DNDarray` if `new_split is not None` (including when the original input `a` is not distributed)
- #811 Fixed memory leak in `DNDarray.larray`
- #820 `randn` values are pushed away from 0 by the minimum value of the given dtype before being transformed into the Gaussian shape
- #821 Fixed `__getitem__` handling of distributed `DNDarray` key element
- #831 `__getitem__` handling of `array-like` 1-element key
- #812 New feature: `logaddexp`, `logaddexp2`
- #718 New feature: `trace()`
- #768 New feature: unary positive and negative operations
- #820 `dot` can handle matrix-vector operation now
- #796 `DNDarray.reshape(shape)`: method now allows shape elements to be passed in as single arguments.
- #761 New feature: `result_type`
- #788 Added the partition interface `DNDarray` for use with DPPY
- #794 New feature: `meshgrid`
- #821 Enhancement: it is no longer necessary to load-balance an imbalanced `DNDarray` before gathering it onto all processes. In short: `ht.resplit(array, None)` now works on imbalanced arrays as well.
- #660 NN module for data parallel neural networks
- #699 Support for complex numbers; New functions: `angle`, `real`, `imag`, `conjugate`
- #702 Support channel stackoverflow
- #728 `DASO` optimizer
- #757 Major documentation overhaul, custom docstrings formatting
- #706 Bug fix: prevent `__setitem__`, `__getitem__` from modifying key in place
- #709 Set the encoding for README.md in setup.py explicitly.
- #716 Bugfix: Finding clusters by spectral gap fails when multiple diffs identical
- #732 Corrected logic in `DNDarray.__getitem__` to produce the correct split axis
- #734 Fix division by zero error in `__local_op` with out != None on empty local arrays.
- #735 Set return type to bool in relational functions.
- #744 Fix split semantics for reduction operations
- #756 Keep track of sent items while balancing within `sort()`
- #764 Fixed an issue where `repr` was giving the wrong output.
- #767 Corrected `std` to not use numpy
- #680 New property: `larray`: extract local torch.Tensor
- #683 New properties: `nbytes`, `gnbytes`, `lnbytes`
- #687 New property: `balanced`
- #707 New feature: `asarray()`
- #559 Enhancement: `save_netcdf` allows naming dimensions, creating unlimited dimensions, using existing dimensions and variables, slicing
- #658 Bugfix: `matmul` on GPU will cast away from `int`s to `float`s for the operation and cast back upon its completion. This may result in numerical inaccuracies for very large `int64` DNDarrays
- #677 New features: `split`, `vsplit`, `dsplit`, `hsplit`
- #690 New feature: `ravel`
- #690 Enhancement: `reshape` accepts shape arguments with one unknown dimension
- #706 Bug fix: prevent `__setitem__`, `__getitem__` from modifying key in place
- #660 New submodule: `nn.DataParallel` for creating and training data parallel neural networks
- #660 New feature: Synchronous and Asynchronous gradient updates available for `ht.nn.DataParallel`
- #660 New feature: `utils.data.datatools.DataLoader` for creating a local `torch.utils.data.DataLoader` for use with `ht.nn.DataParallel`
- #660 New feature: `utils.data.datatools.Dataset` for creating a local `torch.utils.data.Dataset` for use with `ht.nn.DataParallel`
- #660 Added MNIST example to `example/nn` to show the use of `ht.nn.DataParallel`. The `MNISTDataset` can be found in `ht.utils.data.mnist.py`
- #660 New feature: Data loader for H5 datasets which shuffles data in the background during training (`utils.data.partial_dataset.PartialH5Dataset`)
- #728 New feature: `nn.DataParallelMultiGPU` which uses `torch.distributed` for local communication (for use with `optim.DASO`)
- #728 New feature: `optim.DetectMetricPlateau` detects when a given metric plateaus.
- #792 API extension (aliases): `greater`, `greater_equal`, `less`, `less_equal`, `not_equal`
- #679 New feature: `histc()` and `histogram()`
- #709 Set the encoding for README.md in setup.py explicitly.
- #716 Bugfix: Finding clusters by spectral gap fails when multiple diffs identical
- #732 Corrected logic in `DNDarray.__getitem__` to produce the correct split axis
- #734 Fix division by zero error in `__local_op` with out != None on empty local arrays.
- #735 Set return type to bool in relational functions.
- #744 Fix split semantics for reduction operations
- #756 Keep track of sent items while balancing within `sort()`
- #764 Fixed an issue where `repr` was giving the wrong output.
- #690 Enhancement: reshape accepts shape arguments with one unknown dimension.
- #706 Bug fix: prevent `__setitem__`, `__getitem__` from modifying key in place
- #717 Switch CPU CI over to Jenkins and pre-commit to GitHub action.
- #720 Ignore test files in codecov report and allow drops in code coverage.
- #725 Add tests for expected warnings.
- #736 Reference Jenkins CI tests and set development status to Beta.
- #678 Bugfix: Internal functions now use explicit device parameters for DNDarray and torch.Tensor initializations.
- #684 Bug fix: distributed `reshape` now works on booleans as well.
- #488 Enhancement: Rework of the test device selection.
- #569 New feature: distributed `percentile()` and `median()`
- #572 New feature: distributed `pad()`
- #573 Bugfix: matmul fixes: early out for 2 vectors, remainders not added if inner block is 1 for split 10 case
- #575 Bugfix: Binary operations use proper type casting
- #575 Bugfix: `where()` and `cov()` convert ints to floats when given as parameters
- #577 Add `DNDarray.ndim` property
- #578 Bugfix: Bad variable in `reshape()`
- #580 New feature: distributed `fliplr()`
- #581 New Feature: `DNDarray.tolist()`
- #583 New feature: distributed `rot90()`
- #593 New feature distributed `arctan2()`
- #594 New feature: Advanced indexing
- #594 Bugfix: distributed `__getitem__` and `__setitem__` memory consumption heavily reduced
- #596 New feature: distributed `outer()`
- #598 Type casting changed to PyTorch style casting (i.e. intuitive casting) instead of safe casting
- #600 New feature: `shape()`
- #608 New features: distributed `stack()`, `column_stack()`, `row_stack()`
- #614 New feature: printing of DNDarrays and `__repr__` and `__str__` functions
- #615 New feature: distributed `skew()`
- #615 New feature: distributed `kurtosis()`
- #618 Printing of unbalanced DNDarrays added
- #620 New feature: distributed `knn`
- #624 Bugfix: distributed `median()` indexing and casting
- #629 New features: distributed `asin`, `acos`, `atan`, `atan2`
- #631 Bugfix: get_halo behaviour when rank has no data.
- #634 New features: distributed `kmedians`, `kmedoids`, `manhattan`
- #633 Documentation: updated contributing.md
- #635 `DNDarray.__getitem__` balances and resplits the given key to None if the key is a DNDarray
- #638 Fix: arange returns float32 with single input of type float & update skipped device tests
- #639 Bugfix: balanced array in demo_knn, changed behaviour of knn
- #648 Bugfix: tensor printing with PyTorch 1.6.0
- #651 Bugfix: `NotImplemented` is now `NotImplementedError` in `core.communication.Communication` base class
- #652 Feature: benchmark scripts and jobscript generation
- #653 Printing above threshold gathers the data without a buffer now
- #653 Bugfixes: Update unittests argmax & argmin + force index order in mpi_argmax & mpi_argmin. Add device parameter for tensor creation in dndarray.get_halo().
- #659 New feature: distributed `random.permutation` + `random.randperm`
- #662 Bugfixes: `minimum()` and `maximum()` split semantics, scalar input, different input dtype
- #664 New feature / enhancement: distributed `random.random_sample`, `random.random`, `random.sample`, `random.ranf`, `random.random_integer`
- #666 New feature: distributed prepend/append for `diff()`.
- #667 Enhancement `reshape`: rename axis parameter
- #678 New feature: distributed `tile`
- #670 New Feature: `bincount()`
- #674 New feature: `repeat`
- #670 New Feature: distributed `bincount()`
- #672 Bug / Enhancement: Remove `MPIRequest.wait()`, rewrite calls with capital letters. lower case `wait()` now falls back to the `mpi4py` function
- Update documentation theme to "Read the Docs"
- #429 Create submodule for Linear Algebra functions
- #429 Implemented QR
- #429 Implemented a tiling class to create Square tiles along the diagonal of a 2D matrix
- #429 Added PyTorch Jitter to inner function of matmul for increased speed
- #483 Bugfix: Underlying torch tensor moves to the right device on array initialisation
- #483 Bugfix: DNDarray.cpu() changes heat device to cpu
- #496 New feature: flipud()
- #498 Feature: flip()
- #499 Bugfix: MPI datatype mapping: `torch.int16` now maps to `MPI.SHORT` instead of `MPI.SHORT_INT`
- #501 New Feature: flatten
- #506 Bugfix: setup.py has correct version parsing
- #507 Bugfix: sanitize_axis changes axis of 0-dim scalars to None
- #511 New feature: reshape
- #515 ht.var() now returns the unadjusted sample variance by default, Bessel's correction can be applied by setting ddof=1.
- #518 Implementation of Spectral Clustering.
- #519 Bugfix: distributed slicing with empty list or scalar as input; distributed nonzero() of empty (local) tensor.
- #520 Bugfix: Resplit returns correct values now.
- #520 Feature: SplitTiles class, used in new resplit, tiles with theoretical and actual split axes
- #521 Add documentation for the dtype reduce_op in Heat's core
- #522 Added CUDA-aware MPI detection for MVAPICH, MPICH and ParaStation.
- #524 New Feature: cumsum & cumprod
- #526 float32 is now consistent default dtype for factories.
- #531 Tiling objects are not separate from the DNDarray
- #534 `eye()` supports all 2D split combinations and matrix configurations.
- #535 Introduction of BaseEstimator and clustering, classification and regression mixins.
- #536 Getting rid of the docs folder
- #541 Introduction of basic halo scheme for inter-rank operations
- #558 `sanitize_memory_layout` assumes default memory layout of the input tensor
- #558 Support for PyTorch 1.5.0 added
- #562 Bugfix: split semantics of ht.squeeze()
- #567 Bugfix: split differences for setitem are now assumed to be correctly given, error will come from torch upon the setting of the value
- #454 Update lasso example
- #474 New feature: distributed Gaussian Naive Bayes classifier
- #473 Matmul now will not split any of the input matrices if both have `split=None`. To toggle splitting of one input for increased speed use the allow_resplit flag.
- #473 `dot` handles 2 split None vectors correctly now
- #470 Enhancement: Accelerate distance calculations in kmeans clustering by introduction of new module spatial.distance
- #478 `ht.array` now typecasts the local torch tensors if the torch tensors given are not the torch version of the specified dtype + unit test updates
- #479 Completion of spatial.distance module to support 2D input arrays of different splittings (None or 0) and different datatypes, also if second input argument is None
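
The `ddof` convention referenced in #515 above (unadjusted sample variance by default, Bessel's correction with `ddof=1`) comes down to the divisor. The following is a plain-Python sketch of that arithmetic only; the `variance` helper is hypothetical and is not Heat's `ht.var()`.

```python
import statistics


def variance(values, ddof=0):
    """Variance with divisor (n - ddof); ddof=1 applies Bessel's correction."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - ddof)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(variance(data))          # 4.0 (divides by n)
print(variance(data, ddof=1))  # 32/7, about 4.571 (divides by n - 1)

# The standard library draws the same distinction:
print(statistics.pvariance(data), statistics.variance(data))
```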
This version adds support for PyTorch 1.4.0. There are also several minor feature improvements and bug fixes listed below.
- #443 added option for neutral elements to be used in the place of empty tensors in reduction operations (`operations.__reduce_op`) (cf. #369 and #444)
- #445 `var` and `std` both now support iterable axis arguments
- #452 updated pull request template
- #465 bug fix: `x.unique()` returns a DNDarray both in distributed and non-distributed mode (cf. [#464])
- #463 Bugfix: Lasso tests now run with both GPUs and CPUs
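
The idea behind the neutral elements in #443 can be sketched without MPI: a process that holds no data contributes the identity of the reduction, so the global result is unaffected. The snippet below simulates processes as plain lists; the names `NEUTRAL`, `local_reduce` and `global_reduce` are hypothetical and do not mirror Heat's `operations.__reduce_op` signature.

```python
import math
import operator
from functools import reduce

# Identity ("neutral") element for each reduction.
NEUTRAL = {"sum": 0, "prod": 1, "max": -math.inf, "min": math.inf}
OPS = {"sum": operator.add, "prod": operator.mul, "max": max, "min": min}


def local_reduce(chunk, op):
    """Reduce one process-local chunk; an empty chunk yields the neutral element."""
    return reduce(OPS[op], chunk, NEUTRAL[op])


def global_reduce(chunks, op):
    """Combine the per-process partial results, as an all-reduce would."""
    partials = [local_reduce(chunk, op) for chunk in chunks]
    return reduce(OPS[op], partials, NEUTRAL[op])


chunks = [[3, 1], [], [4]]  # the middle "process" holds no data

print(global_reduce(chunks, "sum"))   # 8
print(global_reduce(chunks, "prod"))  # 12
print(global_reduce(chunks, "max"))   # 4
```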
This version fixes the packaging, such that installed versions of HeAT contain all required Python packages.
This version varies greatly from the previous version (0.1.0). This version includes a great increase in functionality and there are many changes. Many functions which were working previously now behave more closely to their numpy counterparts. Although a large amount of progress has been made, work is still ongoing. We appreciate everyone who uses this package and we work hard to solve the issues which you report to us. Thank you!
- python >= 3.5
- mpi4py >= 3.0.0
- numpy >= 1.13.0
- torch >= 1.3.0
- h5py >= 2.8.0
- netCDF4 >= 1.4.0, <= 1.5.2
- pre-commit >= 1.18.3 (development requirement)
#415 GPU support was added for this release. To set the default device use `ht.use_device(dev)`, where `dev` can be either "gpu" or "cpu". Make sure to specify the device when creating DNDarrays if the desired device is different from the default. If no device is specified, the device is assumed to be "cpu".
- #308 balance
- #308 convert DNDarray to numpy NDarray
- #412 diag and diagonal
- #388 diff
- #362 distributed random numbers
- #327 exponents and logarithms
- #423 Fortran memory layout
- #330 load csv
- #326 maximum
- #324 minimum
- #304 nonzero
- #367 prod
- #402 modf
- #428 redistribute
- #345 resplit out of place
- #402 round
- #312 sort
- #423 strides
- #304 where
- Code of conduct
- Contribution guidelines
- pre-commit and black checks added to Pull Requests to ensure proper formatting
- Issue templates
- #357 Logspace factory
- #428 lshape map creation
- Pull Request Template
- Removal of the ml folder in favor of regression and clustering folders
- #365 Test suite
- KMeans bug fixes
- Working in distributed mode
- Fixed shape cluster centers for `init='kmeans++'`
- __local_op now returns proper gshape
- allgatherv fix -> elements now sorted in the correct order
- getitem fixes and improvements
- unique now returns a distributed result if the input was distributed
- AllToAll on single process now functioning properly
- optional packages are truly optional for running the unit tests
- the output of mean and var (and std) now set the correct split axis for the returned DNDarray