Use optimal kernel parameters (architectures, matrix layouts)

I am trying to figure out which kernel parameters are optimal for different architectures.
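For context, "kernel parameters" here means the register-block sizes of the GEMM microkernel, usually written MR x NR: the kernel keeps an MR x NR tile of C in registers and updates it with k rank-1 products of packed A and B panels. The scalar sketch below shows only that loop structure. The names `microkernel_ref`, `MR` and `NR` are hypothetical and are not BLIS's API. A real kernel replaces `acc` with vector registers, which is why MR and NR are limited by the register file.

```c
#include <stddef.h>

/* Hypothetical register-block sizes; a library picks these per architecture. */
#define MR 8
#define NR 4

/*
 * Scalar reference sketch of a GEMM microkernel: C_tile += A_panel * B_panel.
 *   a   : packed MR x k panel of A, MR contiguous values per step p
 *   b   : packed k x NR panel of B, NR contiguous values per step p
 *   c   : MR x NR tile of C, column-major with leading dimension ldc
 * An optimized kernel keeps acc[][] entirely in vector registers.
 */
void microkernel_ref(size_t k, const double *a, const double *b,
                     double *c, size_t ldc)
{
    double acc[MR][NR] = {{0.0}};

    /* k rank-1 updates: column p of A times row p of B */
    for (size_t p = 0; p < k; ++p)
        for (size_t j = 0; j < NR; ++j)
            for (size_t i = 0; i < MR; ++i)
                acc[i][j] += a[p * MR + i] * b[p * NR + j];

    /* Write the finished tile back to C once, after the k loop */
    for (size_t j = 0; j < NR; ++j)
        for (size_t i = 0; i < MR; ++i)
            c[i + j * ldc] += acc[i][j];
}
```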

For example, BLIS appears to use an 8x4 microkernel for Sandy Bridge but an 8x6 microkernel for Haswell. Why? What led the developers to this choice? In particular, since operations usually work on 4 doubles at a time, how does the 6 fit in? Is Haswell able to separately execute a `_mm256` and a `_mm` operation *at the same time*?
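One way to reason about the 6 is a register budget, sketched below under a simplified model that is an assumption, not BLIS's actual selection procedure. It assumes a broadcast-style kernel: the MR dimension is vectorized (W doubles per register), and each of the NR elements of B is broadcast into a register and multiplied by the MR/W vectors of A. It uses two Haswell facts: AVX2 provides 16 ymm registers, and Haswell has two FMA units with a 5-cycle latency, so roughly 5 x 2 = 10 independent accumulators are needed to keep both units busy. `estimate_budget` and `kernel_budget` are hypothetical names. Sandy Bridge has no FMA (separate 256-bit multiply and add units), which this model does not capture.

```c
#include <stdbool.h>

/*
 * Simplified register-budget model for a broadcast-based GEMM microkernel
 * (an assumption: real kernels differ in unrolling, prefetching, etc.).
 */
typedef struct {
    int accumulators;   /* vector registers holding the C tile             */
    int a_regs;         /* vector registers holding one column of A        */
    int total;          /* accumulators + A column + 1 broadcast register  */
    bool fits;          /* total <= available vector registers             */
    bool hides_latency; /* enough independent FMA chains to fill all units */
} kernel_budget;

kernel_budget estimate_budget(int mr, int nr, int w, int vregs,
                              int fma_latency, int fma_units)
{
    kernel_budget k;
    k.a_regs = mr / w;                 /* assumes mr is a multiple of w */
    k.accumulators = k.a_regs * nr;
    k.total = k.accumulators + k.a_regs + 1;
    k.fits = k.total <= vregs;
    /* Each accumulator is one dependency chain; keeping every FMA unit busy
       needs at least latency * units chains in flight (Little's law). */
    k.hides_latency = k.accumulators >= fma_latency * fma_units;
    return k;
}
```

With Haswell-like inputs (`w = 4`, `vregs = 16`, latency 5, two units), this model gives 8x6 12 accumulators + 2 A registers + 1 broadcast = 15 of 16 registers, with 12 >= 10 chains in flight. 8x4 fits easily but has only 8 chains, and 8x8 would need 19 registers. In this scheme every FMA is a full 256-bit operation: the 6 counts broadcasts of B, not half-width `_mm` operations.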

Furthermore, for non-square microkernels such as those used for dgemm, is there a scenario where choosing 4x8 over 8x4 is better?
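A rough way to compare the two shapes is to count the work per step of the k loop. The sketch below is a hypothetical cost model that assumes the same broadcast-style kernel, vectorized along MR. `estimate_cost` and `kernel_cost` are illustrative names. Both shapes perform 2 x 8 x 4 = 64 flops per (8 + 4) loaded panel elements, so their data reuse is identical. What differs is how many vector loads and broadcasts each needs per FMA.

```c
/*
 * Hypothetical per-k-step cost model for a broadcast-based microkernel
 * whose vectorized dimension is mr (w doubles per vector register).
 */
typedef struct {
    int vector_loads;          /* mr / w loads of the A column           */
    int broadcasts;            /* nr broadcasts of B elements            */
    int fmas;                  /* (mr / w) * nr vector FMAs              */
    double flops_per_element;  /* 2*mr*nr flops per (mr + nr) elements   */
} kernel_cost;

kernel_cost estimate_cost(int mr, int nr, int w)
{
    kernel_cost c;
    c.vector_loads = mr / w;   /* assumes mr is a multiple of w */
    c.broadcasts = nr;
    c.fmas = c.vector_loads * nr;
    c.flops_per_element = 2.0 * mr * nr / (mr + nr);
    return c;
}
```

With `w = 4`, this model gives 8x4 2 loads + 4 broadcasts (6 memory operations) per 8 FMAs, while 4x8 needs 1 load + 8 broadcasts (9) for the same 8 FMAs. Because (AB)^T = B^T A^T, an 8x4 kernel vectorized along columns of a column-major C can also serve a row-major C by swapping the roles of A and B. So, at least under this model, the better orientation depends mainly on C's storage order rather than on the shape alone.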

Use optimal kernel parameters (architectures, matrix layouts) #34

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Use optimal kernel parameters (architectures, matrix layouts) #34

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions