simd-diagonal-load is a C++ library that efficiently loads diagonals into SIMD vectors, when the data is stored in such a way that only columns can be loaded.
The problem is described in
A naive implementation could be formulated as:
int width=1000000; // a big number
uint8_t matrix[width][16];
fill_matrix_with_interesting_values(&matrix);
for (int i=0; i < width - 16; ++i) {
uint8_t diagonal_vector[16];
for (int j=0; j<16; ++j) {
diagonal_vector[j] = matrix[i+j][j];
}
do_something(&diagonal_vector);
}
This library is demnostrated in the
https://github.com/eriksjolund/simd-diagonal-load-json-demo#quick-demonstration