matmul: modify split 10 case to avoid possible memory issues #360
Open
Description
Related
Currently, the split 10 case of matmul creates a matrix the size of the full result on each process. This can exhaust a node's memory when the result is large.
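To give a rough sense of scale, here is a small back-of-the-envelope calculation. It assumes that "split 10" means the left operand is split along axis 1 and the right operand along axis 0, so every process holds a full `(m, n)` partial result. The shapes, the `float32` item size, and the helper name `partial_result_bytes` are made up for illustration.

```python
def partial_result_bytes(m, n, itemsize=4):
    """Bytes needed to hold one full (m x n) partial result on one process."""
    return m * n * itemsize


# Hypothetical problem size: result is (m, n) with float32 entries.
m, n = 50_000, 50_000
nprocs = 10  # hypothetical number of processes

full = partial_result_bytes(m, n)
print(f"{full / 1e9:.1f} GB per process for the full partial result")
# -> 10.0 GB per process for the full partial result

# For comparison: the share of the result one process would own
# if the output were distributed evenly.
print(f"{full // nprocs / 1e9:.1f} GB per process for an even share of the output")
# -> 1.0 GB per process for an even share of the output
```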
Feature functionality
The split 10 case should be updated to use blocking so that the per-node memory footprint is reduced.
Additional context
The block sizes should be based on the split sizes of the output, not of the inputs. It should be clear why this is a special case and not the general rule.
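A minimal sketch of what such blocking could look like, simulated with plain NumPy and no MPI. This is not the library's implementation. It assumes that "split 10" means `a` is split along axis 1 and `b` along axis 0, and that the output is distributed along axis 0; the function names `output_row_blocks` and `blocked_matmul_split10` are hypothetical. Under that reading, one plausible reason this is a special case is that the inputs are split along the contracted dimension, so their split sizes say nothing about how the output is partitioned.

```python
import numpy as np


def output_row_blocks(m, nprocs):
    """Row ranges of an (m x n) result distributed along axis 0 over nprocs."""
    base, rem = divmod(m, nprocs)
    bounds, start = [], 0
    for rank in range(nprocs):
        stop = start + base + (1 if rank < rem else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def blocked_matmul_split10(a_chunks, b_chunks):
    """Simulate the split 10 case block by block.

    a_chunks[r] has shape (m, k_r) and b_chunks[r] has shape (k_r, n),
    i.e. process r holds a slice of the contracted dimension.
    Returns one result block per simulated process.
    """
    nprocs = len(a_chunks)
    m = a_chunks[0].shape[0]
    result_blocks = []
    for lo, hi in output_row_blocks(m, nprocs):
        # Each process contributes only a (hi - lo, n) partial block
        # instead of materializing a full (m, n) partial result.
        partials = [a_r[lo:hi] @ b_r for a_r, b_r in zip(a_chunks, b_chunks)]
        # Local stand-in for a reduction onto the process owning this block.
        result_blocks.append(np.sum(partials, axis=0))
    return result_blocks


# Usage: 3 simulated processes, inputs split along the contracted dimension.
rng = np.random.default_rng(0)
a = rng.standard_normal((7, 6))
b = rng.standard_normal((6, 5))
a_chunks = np.array_split(a, 3, axis=1)
b_chunks = np.array_split(b, 3, axis=0)

blocks = blocked_matmul_split10(a_chunks, b_chunks)
assert np.allclose(np.vstack(blocks), a @ b)
```

The block boundaries here come only from the output's row count and the number of processes, which matches the requirement that block sizes follow the output split and not the input splits.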