There are two variants:

* AVX512_VNNI (Tiger Lake, Rocket Lake) - 512-bit/256-bit/128-bit
* AVX_VNNI (upcoming Alder Lake) - 256-bit/128-bit

VNNI replaces three SIMD instructions with one. For 8-bit data, `vpdpbusd` does the work of the `vpmaddubsw`, `vpmaddwd`, `vpaddd` sequence. It seems that we can use it inside `MultiplyGroup()`.

https://software.intel.com/content/www/us/en/develop/articles/intel-advanced-vector-extensions-512-intel-avx-512-new-vector-neural-network-instruction.html
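To make the "three instructions into one" point concrete, here is a minimal scalar sketch in C of what one 32-bit lane computes in each case. It is only a model for reasoning about the change, not the code in `MultiplyGroup()`, and the helper names (`dpbusd_lane`, `legacy_lane`, `dot_u8s8`) are made up for illustration. It assumes the 8-bit case (unsigned bytes times signed bytes), where the older sequence is `vpmaddubsw` + `vpmaddwd` (against a vector of ones) + `vpaddd`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 32-bit add that wraps like the hardware does. */
static int32_t wrap_add32(int32_t x, int32_t y)
{
    return (int32_t)((uint32_t)x + (uint32_t)y);
}

/* Hypothetical helper: clamp to the signed 16-bit range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Scalar model of one 32-bit lane of VPDPBUSD: four u8*s8 products are
 * summed and added to the accumulator, with no 16-bit saturation. */
static int32_t dpbusd_lane(int32_t acc, const uint8_t a[4], const int8_t b[4])
{
    int32_t sum = 0;
    for (int k = 0; k < 4; ++k)
        sum += (int32_t)a[k] * (int32_t)b[k];
    return wrap_add32(acc, sum);
}

/* Scalar model of the same lane computed with the three-instruction sequence. */
static int32_t legacy_lane(int32_t acc, const uint8_t a[4], const int8_t b[4])
{
    /* VPMADDUBSW: adjacent u8*s8 products, summed with 16-bit saturation. */
    int16_t lo = sat16(a[0] * b[0] + a[1] * b[1]);
    int16_t hi = sat16(a[2] * b[2] + a[3] * b[3]);
    /* VPMADDWD against ones: widens and adds the two 16-bit values. */
    int32_t pair = (int32_t)lo + (int32_t)hi;
    /* VPADDD: accumulate. */
    return wrap_add32(acc, pair);
}

/* Dot product over n bytes (n must be a multiple of 4), one lane at a time. */
static int32_t dot_u8s8(const uint8_t *a, const int8_t *b, size_t n)
{
    assert(n % 4 == 0);
    int32_t acc = 0;
    for (size_t i = 0; i < n; i += 4)
        acc = dpbusd_lane(acc, a + i, b + i);
    return acc;
}
```

One thing to watch if we switch: the two paths are not bit-identical for all inputs. `vpmaddubsw` saturates its 16-bit intermediate, while `vpdpbusd` does not, so results can differ when a pair of products exceeds the signed 16-bit range. With intrinsics, the one-instruction form would be `_mm512_dpbusd_epi32` (AVX512_VNNI) or `_mm256_dpbusd_avx_epi32` (AVX_VNNI), guarded by a CPU feature check.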