Fastor V0.6
Fastor V0.6 is a major release that brings a lot of fundamental internal redesign and performance improvements. This is perhaps the biggest release since the inception of the Fastor project. The following is a list of the changes and new features in this version:
- The whole of Fastor's expression template engine has been reworked to facilitate arbitrary restructuring of expressions. Most users will not notice this change as it pertains to internal re-architecting, but the change is quite significant. The main driver has been to introduce linear algebra expressions and chain them with other element-wise operations.
- A series of linear algebra expressions with less verbose names is introduced as a result, and the other existing linear algebra routines are now moved to a dedicated linear algebra expression module. This lays out the basic building blocks of Fastor's tensor algebra library.
- Multiplication operator `%` introduced that evaluates lazily and takes any expression. A greedy matmul-like algorithm is implemented, so operations like `A%B%C%D%...` will be evaluated in the most efficient order.
- `inv` function introduced for lazy inversion. Extremely fast matrix inversion for stack sizes up to `256x256`.
- `trans` function introduced for lazy transpose. Extremely fast AVX512 `8x8` double and `16x16` float transpose using explicit SIMD introduced.
- `det` function introduced for lazy determinant.
- `solve` function introduced for lazy solve. `solve` has the behaviour that if both inputs are `Tensor`s it evaluates immediately, and if either input is an expression it delays the evaluation. `solve` is now also able to solve matrices for stack sizes up to `256x256`.
- `qr` function introduced for QR factorisation using modified Gram-Schmidt factorisation, which has the potential to be easily SIMD-vectorised in the future. The scalar implementation currently has good performance.
- `absdet` and `logdet` functions introduced for lazy computation of the absolute value and the natural logarithm of a determinant.
- `determinant`, `matmul`, `transpose` and most verbose linear algebra functions can now take expressions but evaluate immediately.
- `einsum`, `contraction`, `inner`, `outer`, `permutation`, `cross`, `sum` and `product` now all work on expressions. `einsum`/`contraction` for expressions also dispatch to the same operation minimisation algorithms that the non-expression versions do, hence these functions are as fast for expressions as they are for tensor types.
- `cross` function for the cross product of vectors is introduced as well.
- Most linear algebra operations like `qr`, `det` and `solve` take optional parameters (class enums) to request the type of computation, for instance `det<DetCompType::Simple>`, `qr<QRCompType::MGSR>`, etc.
- MKL (JIT) backend introduced, which can be used in the same way as libxsmm.
- The backend `_matmul` routines are reworked and specifically tuned for AVX512, and `_matmul_mk_smalln` is cleaned up and uniformed for up to `5*SIMDVector::Size`. Most matmul routines are now available at SSE2 level when it makes sense. `matmul` is now as fast as the dedicated MKL JIT API.
- AVX512 `SIMDVector` for `int32_t` and `int64_t` introduced. `SIMDVector` for `int32_t` and `int64_t` is now activated at SSE2 level as well.
- Most intrinsics are now activated at SSE2 level.
- All views are now reworked, so there is no need for the `FASTOR_USE_VECTORISED_EXPR_ASSIGN` macro unless one wants to vectorise strided views.
- Multi-dimensional `TensorFixedViews` introduced. This makes it possible to create arbitrary-dimensional tensor views with compile-time deducible sizes. Together with dynamic views, this completes Fastor's view expressions.
- `diag` function introduced for viewing the diagonal elements of 2D tensors. It works just like other views in that it can appear on either side of an equation (can be assigned to).
- Major bug fix for in-place division of all expressions by integral numbers.
- A lot of new features, traits and internal development tools have been added.
- As a result, Fastor now requires a C++14-supporting compiler.
The next few releases from here on will be incremental and will focus on ironing out corner cases while new features will be continuously rolled out.