
Fastor V0.5

@romeric romeric released this 20 Mar 15:21
· 473 commits to master since this release

Fastor V0.5 is one hell of a release: it brings many new features, fundamental performance improvements, greater flexibility when working with tensors, and many bug fixes:

New Features

  1. Improved IO formatting. Flexible, configurable formatting for all derived tensor classes
  2. Generic matmul function for AbstractTensors and expressions
  3. Introduce a new tensor type SingleValueTensor for tensors of any size and dimension whose values are all the same. It is extremely space-efficient, as it stores a single value under the hood, and it provides a more optimised route for certain linear algebra functions. For instance, matmul of a Tensor with a SingleValueTensor is O(n) and transpose is O(1)
  4. New evaluation methods for all expressions teval and teval_s that provide fast evaluation of higher order tensors
  5. cast method to cast a tensor to a tensor of different data type
  6. get_mem_index and get_flat_index to generalise indexing across all tensor classes. Eval methods now use these
  7. Binary comparison operators for expressions that evaluate lazily. Also binary comparison operators for SIMDVectors
  8. Constructing column major tensors is now supported by using Tensor(external_data,ColumnMajor)
  9. tocolumnmajor and torowmajor free functions
  10. all_of, any_of and none_of free-function reducers that work on boolean expressions
  11. Fixed views now support noalias feature
  12. FASTOR_IF_CONSTEXPR macro for C++17
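To illustrate the idea behind SingleValueTensor (item 3), here is a minimal standalone sketch, not Fastor's actual implementation: the names UniformMatrix, transpose and matmul below are hypothetical, and only demonstrate why storing one scalar makes transpose O(1) and makes matmul collapse to row sums.

```cpp
#include <array>
#include <cstddef>

// Hypothetical, simplified analogue of a SingleValueTensor:
// an M x N matrix whose entries are all equal, stored as one scalar.
template <typename T, std::size_t M, std::size_t N>
struct UniformMatrix {
    T value;  // the single stored entry
    // Transpose is O(1): only the compile-time shape flips.
    UniformMatrix<T, N, M> transpose() const { return {value}; }
};

// matmul of a dense M x K matrix with a uniform K x N matrix:
// (A * S)[i][j] = S.value * sum_k A[i][k], which is independent of j,
// so one row-sum pass per row replaces the usual triple loop.
template <typename T, std::size_t M, std::size_t K, std::size_t N>
std::array<std::array<T, N>, M>
matmul(const std::array<std::array<T, K>, M>& A, const UniformMatrix<T, K, N>& S) {
    std::array<std::array<T, N>, M> C{};
    for (std::size_t i = 0; i < M; ++i) {
        T rowsum = T(0);
        for (std::size_t k = 0; k < K; ++k) rowsum += A[i][k];
        for (std::size_t j = 0; j < N; ++j) C[i][j] = S.value * rowsum;
    }
    return C;
}
```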

Performance and other key improvements

  1. The Tensor class can now be treated as a compile-time type, as it can be initialised as constexpr by defining the macro FASTOR_ZERO_INITIALISE
  2. Higher order einsum functions now dispatch to matmul whenever possible which is much faster
  3. Much faster generic permutation, contraction and einsum algorithms, now based on recursive templates, that beat the speed of hand-written C code. CONTRACT_OPT is no longer necessary
  4. A much faster loop-tiling based transpose function, at least 2X faster than the implementations in other ET libraries
  5. Introduce a libxsmm backend for matmul. The switch from in-built to libxsmm routines for matmul can be configured by the user using BLAS_SWITCH_MATRIX_SIZE_S for square matrices and BLAS_SWITCH_MATRIX_SIZE_NS for non-square matrices. The default sizes are 16 and 13, respectively. libxsmm brings substantial improvements for larger matrices
  6. Condensed unary ops and binary ops into a single more maintainable macro
  7. FASTOR_ASSERT is now a macro over assert, which optimises better in release builds
  8. Optimised determinant for 4x4 cases. Determinant now works on all types and not just float and double
  9. all is now an alias to fall, which means many tensor view expressions can now be dispatched to tensor fixed views. The implication is that expressions like a(all) and A(all,all) can simply return the underlying tensor instead of creating a view with unnecessary sequences and offsets. This is much faster
  10. Specialised constructors for many view types that construct the tensor much faster
  11. Improved support for TensorMap class to behave exactly the same as Tensor class including views, block indexing and so on
  12. Improved unit-testing under many configurations (debug and release)
  13. Many Tensor-related methods and functionalities have been split into separate files that are now usable by other tensor-type classes
  14. Division of an expression by a scalar can now be dispatched to multiplication, which creates the opportunity for FMA
  15. Cofactor and adjoint can now fall back to a scalar version when SIMD types are not available
  16. Documentation is now available under Wiki pages
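The loop-tiled transpose (item 4) can be sketched in standalone C++ as follows. This is only an illustration of the tiling idea, not Fastor's kernel: the function name, tile size and row-major layout are assumptions for the example.

```cpp
#include <cstddef>
#include <vector>

// Illustrative tile size; real implementations pick it to match
// SIMD width and cache-line size.
constexpr std::size_t BLOCK = 16;

// Out-of-place transpose of a row-major rows x cols matrix.
// Working on BLOCK x BLOCK sub-tiles keeps both the reads from `in`
// and the writes to `out` within nearby cache lines, which is where
// the speed-up over a naive double loop comes from.
void transpose_tiled(const std::vector<double>& in, std::vector<double>& out,
                     std::size_t rows, std::size_t cols) {
    for (std::size_t ib = 0; ib < rows; ib += BLOCK)
        for (std::size_t jb = 0; jb < cols; jb += BLOCK)
            // transpose one tile; the `&& i < rows` / `&& j < cols`
            // guards handle partial tiles at the edges
            for (std::size_t i = ib; i < ib + BLOCK && i < rows; ++i)
                for (std::size_t j = jb; j < jb + BLOCK && j < cols; ++j)
                    out[j * rows + i] = in[i * cols + j];
}
```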

Bug fixes

  1. Fix a bug in product method of Tensor class (99e3ff0)
  2. Fix AVX store bug in backend matmul 3k3 (8f4c6ae)
  3. Fix bug in tensor matmul for matrix-vector case (899c6c0)
  4. Fix a bug in SIMDVector under scalar mode with mixed types (f707070)
  5. Fix bugs with math functions on SIMDVector with size>256 not compiling (ca2c74d)
  6. Fix bugs with matrix-vector einsum (8241ac8, 70838d2)
  7. Fix a bug with strided_contraction when the second matrix disappears (4ff2ea0)
  8. Fix a bug in 4D tensor initializer_list constructor (901d8b1)
  9. Fixes to fully support SIMDVector fallback to scalar version
  10. and many more undocumented fixes

Key changes

  1. Complete re-architecture of Fastor's directory hierarchy. Fastor should now be included as #include <Fastor/Fastor.h>
  2. The TensorRef class has been renamed to TensorMap
  3. Expressions now evaluate based on the type of their underlying derived classes rather than the tensor they are assigned to
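A minimal fragment showing the new include convention and the rename from the notes above. The include path and the TensorMap name come from this release; the fill and data calls are assumed illustrative usage of the Tensor API, so treat this as a sketch rather than a verified program.

```cpp
// New single-header entry point under the Fastor/ directory:
#include <Fastor/Fastor.h>

int main() {
    Fastor::Tensor<double, 2, 2> A;
    A.fill(0.0);                                  // assumed fill method, for illustration
    Fastor::TensorMap<double, 2, 2> M(A.data());  // formerly TensorRef
    return 0;
}
```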

There are many more major and minor undocumented changes.