
Fastor V0.6

@romeric romeric released this 01 May 18:21
· 245 commits to master since this release

Fastor V0.6 is a major release that brings a lot of fundamental internal redesign and performance improvements. This is perhaps the biggest release since the inception of project Fastor. The following is a list of the changes and new features released in this version:

  1. The whole of Fastor's expression template engine has been reworked to facilitate arbitrary restructuring of expressions. Most users will not notice this change, as it pertains to the internal architecture, but the change is quite significant. The main driver for this has been to introduce linear algebra expressions and chain them with other element-wise operations.
  2. As a result, a series of linear algebra expressions with less verbose names is introduced, and the existing linear algebra routines are now moved to a dedicated linear algebra expression module. This lays out the basic building blocks of Fastor's tensor algebra library
  3. Multiplication operator % introduced that evaluates lazily and takes any expression
  4. Greedy-like matmul implemented. Chained operations like A%B%C%D%... will be evaluated in the most efficient order
  5. inv function introduced for lazy inversion. Extremely fast matrix inversion up to stack size 256x256
  6. trans function introduced for lazy transpose. Extremely fast AVX512 8x8 double and 16x16 float transpose using explicit SIMD introduced
  7. det function introduced for lazy determinant
  8. solve function introduced for lazy solve. If both inputs are Tensor, solve evaluates immediately; if either input is an expression, it delays the evaluation. solve is now also able to handle matrices of up to stack size 256x256
  9. qr function introduced for QR factorisation using the modified Gram-Schmidt process, which has the potential to be easily SIMD vectorised in the future. The scalar implementation at the moment has good performance
  10. absdet and logdet functions introduced for lazy computation of the absolute value and the natural logarithm of a determinant
  11. determinant, matmul, transpose and most verbose linear algebra functions can now take expressions but evaluate immediately
  12. einsum, contraction, inner, outer, permutation, cross, sum and product now all work on expressions. einsum/contraction for expressions also dispatch to the same operation minimisation algorithms as the non-expression versions, hence these functions are as fast for expressions as they are for tensor types. cross function for cross product of vectors is introduced as well
  13. Most linear algebra operations like qr, det, solve take optional template parameters (class enums) to request the type of computation, for instance det<DetCompType::Simple>, qr<QRCompType::MGSR> etc
  14. MKL (JIT) backend introduced which can be used in the same way as libxsmm
  15. The backend _matmul routines are reworked and specifically tuned for AVX512, and _matmul_mk_smalln is cleaned up and made uniform for up to 5::SIMDVector::Size. Most matmul routines are now available at SSE2 level when it makes sense. matmul is now as fast as the dedicated MKL JIT API
  16. AVX512 SIMDVector for int32_t and int64_t introduced. SIMDVector for int32_t and int64_t are now activated at SSE2 level as well
  17. Most intrinsics are now activated at SSE2 level
  18. All views are now reworked, so there is no longer any need for the FASTOR_USE_VECTORISED_EXPR_ASSIGN macro unless one wants to vectorise strided views
  19. Multi-dimensional TensorFixedViews introduced. This makes it possible to create arbitrary dimensional tensor views with compile time deducible sizes. This, together with dynamic views, completes the set of view expressions of Fastor
  20. diag function introduced for viewing the diagonal elements of 2D tensors. It works just like other views in that it can appear on either side of an equation (can be assigned to)
  21. Major bug fix for in-place division of all expressions by integral numbers
  22. A lot of new features, traits and internal development tools have been added.
  23. As a result Fastor now requires a C++14 supporting compiler
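
To make the idea of lazy, chainable expressions (items 1 and 3) more concrete, here is a minimal, standalone sketch of the expression template technique in C++14. It does not use Fastor and is not Fastor's engine: the names `Expr`, `Vec` and `Add` are hypothetical and exist only for illustration. An expression such as `a + b + a` builds a lightweight tree of `Add` nodes, and no arithmetic happens until the tree is assigned to a `Vec`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// CRTP base: every expression type E derives from Expr<E> (hypothetical name).
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// A small fixed-size vector that can be built from any expression.
template <std::size_t N>
struct Vec : Expr<Vec<N>> {
    std::array<double, N> data{};

    Vec() = default;

    // Evaluation happens here: one loop over the whole expression tree.
    template <class E>
    Vec(const Expr<E>& e) {
        for (std::size_t i = 0; i < N; ++i) data[i] = e.self()[i];
    }

    double operator[](std::size_t i) const { assert(i < N); return data[i]; }
    double& operator[](std::size_t i) { assert(i < N); return data[i]; }
};

// Lazy node representing l + r; it stores references and computes on demand.
template <class L, class R>
struct Add : Expr<Add<L, R>> {
    const L& l;
    const R& r;
    Add(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

// Building the node does no arithmetic.
template <class L, class R>
Add<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Add<L, R>(l.self(), r.self());
}
```

With two vectors `a` and `b`, writing `Vec<3> c = a + b + a;` evaluates the whole sum in a single pass without intermediate temporaries. The nodes hold references, so in this sketch an expression should be consumed within the statement that creates it.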
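
Item 4 states that chained products are evaluated in an efficient order. The release notes do not describe the algorithm, so the following is only a sketch of one possible greedy heuristic, written in plain C++ without Fastor: at each step it multiplies the adjacent pair that is cheapest. The function names `left_to_right_cost` and `greedy_cost` are hypothetical, and the cost model (number of scalar multiplications) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// dims has n+1 entries for a chain of n matrices:
// matrix i has shape dims[i] x dims[i+1].

// Scalar multiplications needed when the chain is evaluated left to right.
inline std::size_t left_to_right_cost(const std::vector<std::size_t>& dims) {
    assert(dims.size() >= 2);
    std::size_t cost = 0;
    for (std::size_t k = 1; k + 1 < dims.size(); ++k) {
        cost += dims[0] * dims[k] * dims[k + 1];
    }
    return cost;
}

// Greedy heuristic: repeatedly contract the cheapest adjacent pair.
// This is not guaranteed to be optimal for every chain.
inline std::size_t greedy_cost(std::vector<std::size_t> dims) {
    assert(dims.size() >= 2);
    std::size_t cost = 0;
    while (dims.size() > 2) {
        std::size_t best = 1;
        std::size_t best_cost = dims[0] * dims[1] * dims[2];
        for (std::size_t k = 2; k + 1 < dims.size(); ++k) {
            const std::size_t c = dims[k - 1] * dims[k] * dims[k + 1];
            if (c < best_cost) {
                best_cost = c;
                best = k;
            }
        }
        cost += best_cost;
        // Removing the shared dimension merges the two matrices into one.
        dims.erase(dims.begin() + static_cast<std::ptrdiff_t>(best));
    }
    return cost;
}
```

For a chain with shapes 50x5, 5x100 and 100x2, `left_to_right_cost({50, 5, 100, 2})` gives 35000 multiplications while `greedy_cost({50, 5, 100, 2})` gives 1500, which illustrates why the evaluation order matters.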
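
The modified Gram-Schmidt process mentioned in item 9 can be sketched in a few lines of scalar C++. This is a textbook version for small square matrices stored in `std::array`, not Fastor's `qr` implementation; the alias `Matrix` and the function `mgs_qr` are hypothetical names, and the input is assumed to have linearly independent columns.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

template <std::size_t N>
using Matrix = std::array<std::array<double, N>, N>;

// Factorises A = Q * R with orthonormal columns in Q and upper triangular R.
template <std::size_t N>
void mgs_qr(const Matrix<N>& A, Matrix<N>& Q, Matrix<N>& R) {
    Q = A;
    for (auto& row : R) row.fill(0.0);

    for (std::size_t j = 0; j < N; ++j) {
        // Normalise column j; its norm becomes the diagonal entry of R.
        double norm = 0.0;
        for (std::size_t i = 0; i < N; ++i) norm += Q[i][j] * Q[i][j];
        norm = std::sqrt(norm);
        assert(norm > 0.0);  // assumes linearly independent columns
        R[j][j] = norm;
        for (std::size_t i = 0; i < N; ++i) Q[i][j] /= norm;

        // Immediately remove the q_j component from all remaining columns.
        // Updating the working columns in place is what distinguishes the
        // modified variant from classical Gram-Schmidt.
        for (std::size_t k = j + 1; k < N; ++k) {
            double dot = 0.0;
            for (std::size_t i = 0; i < N; ++i) dot += Q[i][j] * Q[i][k];
            R[j][k] = dot;
            for (std::size_t i = 0; i < N; ++i) Q[i][k] -= dot * Q[i][j];
        }
    }
}
```

The inner loops run over contiguous index ranges with no data-dependent branching, which is consistent with the note's remark that this scheme lends itself to SIMD vectorisation.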
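
Item 13 describes selecting the computation type through a class enum passed as a template parameter, as in `det<DetCompType::Simple>`. The sketch below shows how such compile-time selection can be wired up in C++14 using tag dispatch. It is a standalone illustration and not Fastor's code: `DetMethod`, its enumerators, `det3` and the `detail` helpers are all hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>

// Hypothetical enum; the enumerators are illustrative only.
enum class DetMethod { Cofactor, Elimination };

using Mat3 = std::array<std::array<double, 3>, 3>;

namespace detail {

// An empty tag type per enumerator lets overload resolution pick the kernel.
template <DetMethod M>
struct DetTag {};

// Closed-form cofactor expansion along the first row.
inline double det_impl(const Mat3& a, DetTag<DetMethod::Cofactor>) {
    return a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
         - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
         + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]);
}

// Gaussian elimination with partial pivoting on a local copy.
inline double det_impl(Mat3 a, DetTag<DetMethod::Elimination>) {
    double d = 1.0;
    for (std::size_t c = 0; c < 3; ++c) {
        std::size_t p = c;
        for (std::size_t r = c + 1; r < 3; ++r) {
            if (std::fabs(a[r][c]) > std::fabs(a[p][c])) p = r;
        }
        if (a[p][c] == 0.0) return 0.0;  // singular matrix
        if (p != c) {
            std::swap(a[p], a[c]);
            d = -d;  // a row swap flips the sign
        }
        assert(a[c][c] != 0.0);  // pivot is non-zero at this point
        d *= a[c][c];
        for (std::size_t r = c + 1; r < 3; ++r) {
            const double f = a[r][c] / a[c][c];
            for (std::size_t k = c; k < 3; ++k) a[r][k] -= f * a[c][k];
        }
    }
    return d;
}

}  // namespace detail

// The enum is an optional template parameter with a default.
template <DetMethod M = DetMethod::Cofactor>
double det3(const Mat3& a) {
    return detail::det_impl(a, detail::DetTag<M>{});
}
```

A caller can write `det3(a)` for the default or `det3<DetMethod::Elimination>(a)` to request a specific algorithm. Because the choice is resolved at compile time, there is no runtime branch on the method.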

The next few releases from here on will be incremental and will focus on ironing out corner cases while new features will be continuously rolled out.