Release v0.103 · LLNL/lbann

============================== Release Notes: v0.103 ==============================

C++ API:

Support for new training algorithms:

2nd-order optimization with K-FAC.
Currently supports fully-connected, convolution, and GRU layers.
Added Sub-graph parallelism support for multi-branch architectures
(split, slice, sum, and concat layers)
Data + sub-graph parallelism for in-core models (D&SP and D&SP-cSub)
Initial sub-graph parallelism support for common layers in
Transformers
Model topology mutation in LTFB/PBT
Added sub-grid parallelism support for K-FAC using primary and
secondary grids
Truncation selection exchange for LTFB/PBT
Regularized evolution for LTFB/PBT
Hyperparameter grid search
Multi-GAN training algorithm with multiple discriminators

Support for new network structures:

Support for new layers:

Added support for 2D Matrices for Scatter and Gather layers
Added support for distributed Scatter and Gather layers
DistConv enabled 3D MatMul
Added image rotation layer and composite image transformation layer
(rotate, shear, translate)
Added distributed tensor parallelism with channelwise decomposition for channelwise fully connected layer
Added "binary-with-constant" operators
Updated deconvolution layer to match PyTorch's API
Updated identity layer to copy tensors to enable tensor parallelism in subsequent layers in the compute graph
Added IdentityZero layer that allows alternating generator/discriminator
updates for training GANs.
Added an External layer that enables separately-compiled library to be loaded dynamically
Added support for labels_only mode on data-parallel cross entropy layer

Python front-end:

Performance optimizations:

Enabled the input layers to use a view of the I/O buffers in the
buffered data coordinator
Use default-allocated GPU memory for long-lived buffers
Optimized GPU kernels for entry-wise operators
Optionally use default-allocated GPU memory for long-lived buffers

Model portability & usability:

Experiments & Applications:

Internal features:

Added operator class
Added AlternateUpdates callback to be used with IdentityZero layers for
training GANs.
Added support for serializing network architectures to protobuf format.
Reformatted headers and implementation files for a more IWYU paradigm.
General support for ROCm-enabled DistConv
Support fo use of libfabric plugin for RCCL and NCCL
Framework-wide improvements in support for ROCm and MIOpen
Callback for alternating optimizer layer update
Command line argument to hang the LBANN application for debuggin
Add a cuTT/hipTT backend to the permute layer
Add a permute layer utilizing cuTENSOR for the permute implementation
Weight initializer from NumPy file

I/O & data readers:

Updated SMILES data reader to use sample lists
Added explicitly managed buffered reading and local unpacking for the
SMILES data reader to minimize file access
Sample lists with integral indices can use range format (start ... end)
Added a new extensible HDF5 data reader that uses a data set schema
and experiment schema files to define how the data is represented.
This allows the user to change the representation of data without
changing the data reader.
Changed the input layer to take a data field and only produce a
single output. Currently valid Data fields are samples, labels,
and responses.
Added support for using arbitrary field names with HDF5 data reader.
Updated the data coordinator and data readers to
take dynamic data fields rather than fixed fields. Input buffers
are no long allocated for fields that are not used in active
models.
Added support in the generic data reader and synthetic data reader
clases for arbitrary data fields.
Added support for data readers to return full Conduit nodes to the
Data Coordinator.
Data coordinator can now directly return packed data fields to
input layers.
Added padding and cutout transformations

Build system:

Added support for using uptream Spack repositories
Added support to reuse existing Spack environments, which
significantly decreases the startup time of running a CI job
Enforce consistent GPU targets in Spack environment
Switched from Bamboo to GitLab CI framework

Bug fixes:

Fixed GPU kernels that launched with more blocks than allowed
Fixed build and runtime errors with DistConv
Use correct LayerNorm operaton in "Attention Is All You Need"
Transformer
Fixed a bug where the input layer performed unnecessary memory
allocations.
Bug fixes within Cosmoflow and U-Net models
Fixed a bug in the GPU-based computation of the batchnorm
statistics
Patch for when distconv'd input layer is followed by non-distconv layer
Bugfix input layer activations: Fixed the input layer so that it
would only resize the activation matrix if it wasn't already setup
to be a view of the data_coordinator's matrix. This addresses a
signficant performance bug in the data ingestion where the
activation matrix was a view into the data coordinator's internal buffers.
Fixed bad convolution parameters producing incorrect layer shapes.
Enabling tensor copy on distconv-enabled Identity layer
General cleanup and improvement in the coverage and robustness of
CI testing
Fix buffer overflow in SMILES data reader
Fix a bug in TSE
Do not construct bias weights when not needed in conv and FC modules
Use tournament set in LTFB with truncation selection exchange
Cleanup data reader tests memory leaks
Fixed a buffer overrun, heap overflow, and double allocation of the
data store in the SMILES data reader
Match LayerNorm and InstanceNorm layers to PyTorch
Make sure GPU grid dims are valid in slice/concat layers
Fixed incorrect matrix ording in K-FAC for conv layer
Bugfix for polynomial learning rate schedule

Provide feedback