dgemm performance degradation on ARM NEOVERSEV1 with lower P*Q 

I see about a 2-10% performance degradation in the OpenBLAS/gemm.c benchmark on a single core (also seen on multicore) of a Graviton3 machine.
The cause appears to be that the P*Q used for NEOVERSEV1 is out of proportion to its L2 cache size (which is 1MB).
From the OpenBLAS FAQ: "A general rule of thumb for selecting a starting point seems to be that PxQ is about half the size of L2 cache."
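
To make the rule of thumb concrete, here is a rough arithmetic sketch of how one might compare a P x Q block of doubles against a 1MB L2. The `P` and `Q` values in it are hypothetical placeholders chosen for illustration, not the values currently in OpenBLAS for NEOVERSEV1, and the helper names are my own.

```python
# Rough footprint check of a dgemm P x Q block against the L2 cache.
# Assumption: the block holds P*Q doubles (8 bytes each).
# P and Q used below are hypothetical placeholders, NOT the NEOVERSEV1 defaults.

L2_BYTES = 1024 * 1024   # 1MB L2, as stated in this issue
DOUBLE_BYTES = 8         # element size for dgemm


def block_bytes(p, q, elem_bytes=DOUBLE_BYTES):
    """Bytes occupied by a P x Q block of elements."""
    return p * q * elem_bytes


def l2_fraction(p, q, l2_bytes=L2_BYTES):
    """Fraction of the L2 cache taken by a P x Q block of doubles."""
    return block_bytes(p, q) / l2_bytes


def q_for_half_l2(p, l2_bytes=L2_BYTES, elem_bytes=DOUBLE_BYTES):
    """Largest Q for which a P x Q block still fits in half of L2."""
    return (l2_bytes // 2) // (p * elem_bytes)


if __name__ == "__main__":
    p, q = 128, 128  # hypothetical values for illustration only
    print(f"P*Q footprint: {block_bytes(p, q)} bytes")
    print(f"fraction of L2: {l2_fraction(p, q):.3f}")
    print(f"Q that reaches half of L2 for P={p}: {q_for_half_l2(p)}")
```

This only gives a starting point. Any candidate values would still have to respect the kernel's unroll factors and be confirmed with the gemm benchmark.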

![image](https://github.com/OpenMathLib/OpenBLAS/assets/129051745/b14837de-be1d-4f1a-88d1-c537d54e2166)


So maybe we should update the NEOVERSEV1 P*Q parameters to match the L2 cache size.

Or can anyone help me understand why the NEOVERSEV1 P*Q is kept that low?