Conversation
2751cd6 to
8adaf32
Compare
…amax(). Make potrf() compile on fma3<avx2> Smarter way of calculating register matrix number of columns in potrf() Make the code compile for ARM again Fixed calculation of RegisterMatrix columns number in potrf()
roversch
left a comment
There was a problem hiding this comment.
Some minor comments. Please address here or in a follow-up MR
| #if XSIMD_WITH_NEON64 | ||
| # include <blast/math/algorithm/arch/neon64/Tile.hpp> | ||
| #endif | ||
|
|
There was a problem hiding this comment.
Maybe use #if, #elif, #else and #error for more robust error messages.
There was a problem hiding this comment.
Yes. But on the other hand, this #if-#elif thingy is already in xsimd. And if it happens that different architecture constants are defined at the same time, then you just have different specializations of tile() declared. The architecture is a parameter of tile(), so they will not conflict. And if you require to instantiate blast::tile() with Arch template parameter for which detail::tile<...>() does not exist, the compiler should give a pretty clear message IMO.
What do you think?
| { | ||
| size_t constexpr SS = SimdSize_v<ET>; | ||
| size_t constexpr TILE_STEP = 4; // TODO: this is almost arbitrary and needs to be ppoperly determined | ||
| size_t constexpr TILE_STEP = 4; // TODO: this is almost arbitrary and needs to be properly determined |
There was a problem hiding this comment.
This is the step in 'columns' right? It depends on the architecture as well. E.g. the optimal tiles are different for AVX and AVX2, AVX512
There was a problem hiding this comment.
Yes, true. Also depends on data type (float or double, and potentially std::complex<float> and std::complex<double>). This is a reminder to do it in a more clever way. It would be also good to avoid duplicating the tile() function code between architectures.
| /** | ||
| * @brief Default size of a panel (in a panel matrix) for a given architecture and data type | ||
| * | ||
| * TODO: Is it always equal to SIMD size? Deprecate? | ||
| * | ||
| * @tparam Arch architecture | ||
| */ |
| { | ||
| std::size_t constexpr registerCapacity(xsimd::avx2) | ||
| { | ||
| return 16; |
There was a problem hiding this comment.
In bytes? Couldn't find the documentation on this function
There was a problem hiding this comment.
Ah, no, it's num_of_vector_registers. Better name?
There was a problem hiding this comment.
It is more precise, yes.
Uh oh!
There was an error while loading. Please reload this page.