Home
chunying edited this page Dec 27, 2022
A72 peak: 1.8 GHz * 2 FLOPs/MLA (multiply + add) * 4 floats/NEON register = 14.4 GFlops
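The peak figure is just a product of three factors, which a small helper makes explicit. This is a minimal sketch: `peak_gflops` and its parameter names are hypothetical, and it assumes one vector MLA issued per cycle, with each MLA counted as 2 FLOPs per lane.

```c
/* Theoretical single-core FP32 peak, in GFLOPS.
 *   freq_ghz      : core clock in GHz
 *   flops_per_mla : FLOPs per lane for one multiply-accumulate (2: mul + add)
 *   lanes         : FP32 lanes per vector register (4 for 128-bit NEON)
 * Assumption: one vector MLA is issued every cycle. */
double peak_gflops(double freq_ghz, int flops_per_mla, int lanes)
{
    return freq_ghz * (double)flops_per_mla * (double)lanes;
}
```

Calling `peak_gflops(1.8, 2, 4)` reproduces the 14.4 GFlops figure used as the denominator for the measurements that follow.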
- test with MegPeak:
fmla_x2 throughput: 1.116263 ns 14.333539 GFlops latency: 7.808628 ns
- test with tengine 16x4 kernel:
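/* Drive the micro-kernel over the output in tiles:
 * i steps through m in blocks of 16, j through n in blocks of 4. */
void sgemm_A16_B4(float *mid_A, float *B, float *mid_B, float *C, int m, int n, int k)
{
    for (int i = 0; i < m; i += 16) {
        for (int j = 0; j < n; j += 4) {
            /* Each call gets the k-length panels starting at mid_B + j*k and mid_A + i*k. */
            tengine_4x16_kernel(C, mid_B + j * k, mid_A + i * k, k);
        }
    }
}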
firefly@chun:~/chun/Tengine_gemm_tutorial/step3$ taskset 0x10 ./test
[m n k]: 512 512 512
[tengine 4x16]: 22.13 ms , GFLOPS = 12.129880
12.12988/14.4 = 0.8423
✔️ This kernel attains 84.23% of peak performance.
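The GFLOPS and efficiency numbers above can be recomputed from the problem size and the elapsed time. The sketch below assumes the usual convention that an m x n x k GEMM costs 2*m*n*k FLOPs (one multiply and one add per inner-product term). The helper names `gemm_gflops` and `peak_fraction` are hypothetical and are not part of the tutorial code.

```c
/* GFLOPS achieved by an m x n x k GEMM that took elapsed_ms milliseconds.
 * Assumption: the GEMM is counted as 2*m*n*k floating-point operations. */
double gemm_gflops(int m, int n, int k, double elapsed_ms)
{
    double flops = 2.0 * (double)m * (double)n * (double)k;
    double seconds = elapsed_ms * 1e-3;
    return flops / seconds / 1e9;
}

/* Fraction of the theoretical peak that a measurement reaches. */
double peak_fraction(double measured_gflops, double peak_gflops_value)
{
    return measured_gflops / peak_gflops_value;
}
```

With m = n = k = 512 and the rounded 22.13 ms from the log, `gemm_gflops` gives roughly 12.13 GFLOPS. The small difference from the logged 12.129880 comes from the timer value being printed to two decimals. `peak_fraction(12.12988, 14.4)` gives about 0.8423.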