
[Query] Similar performance with/without polycc for matmul on AMD Ryzen 5 3600 #113

@avithemad


This is just a query to check whether I am doing something wrong when compiling the examples.

I was following this tutorial: https://github.com/bondhugula/llvm-project/blob/hop/mlir/docs/HighPerfCodeGen.md. In the section that compares OpenBLAS/MKL with gcc, clang and Pluto, I expected the tiled schedule to give a similar improvement of about 5x to 10x, but I see almost no speedup (5.62 vs. 5.14 GFLOPS, under 10%).

With Pluto:

➜  matmul git:(master) ✗ make tiled
../../polycc matmul.c --noparallel  --second-level-tile  -o matmul.tiled.c
[pluto] compute_deps (isl)
[pluto] Number of statements: 1
[pluto] Total number of loops: 3
[pluto] Number of deps: 3
[pluto] Maximum domain dimensionality: 3
[pluto] Number of parameters: 3
[pluto] Diamond tiling not possible/useful
[pluto] Affine transformations [<iter coeff's> <param> <const>]

T(S1): (i, j, k)
loop types (loop, loop, loop)

[Pluto] After tiling:
T(S1): (zT3/16, zT4/2, zT5/16, zT3, zT4, zT5, i, j, k)
loop types (loop, loop, loop, loop, loop, loop, loop, loop, loop)

[Pluto] After intra-tile optimize
T(S1): (zT3/16, zT4/2, zT5/16, zT3, zT4, zT5, i, k, j)
loop types (loop, loop, loop, loop, loop, loop, loop, loop, loop)

[pluto] using statement-wise -fs/-ls options: S1(4,9), 
[Pluto] Output written to matmul.tiled.c

[pluto] Timing statistics
[pluto] SCoP extraction + dependence analysis time: 0.000710s
[pluto] Auto-transformation time: 0.002295s
[pluto] Tile size selection time: 0.000000s
[pluto]                 Total constraint solving time (LP/MIP/ILP) time: 0.000482s
[pluto] Code generation time: 0.023760s
[pluto] Other/Misc time: 0.075358s
[pluto] Total time: 0.102123s
[pluto] All times: 0.000710 0.002295 0.023760 0.075358
gcc -O3 -march=native -mtune=native -ffast-math -DTIME matmul.tiled.c -o tiled -lm

➜  matmul git:(master) ✗ ./tiled
3.056028s
5.62 GFLOPS
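Reading the log above, Pluto adds tile-space dimensions in front of `i, j, k` (two levels, because of `--second-level-tile`), and the intra-tile step reorders the point loops from `i, j, k` to `i, k, j`. The sketch below hand-writes that shape with a single tiling level so the transformation is easy to see. It is not Pluto's generated code: the size `N = 256`, the tile size `T = 32` and the helper `tiled_matches_naive` are hypothetical, chosen only for illustration.

```c
#include <string.h>

/* Hypothetical sizes for illustration only; Pluto chooses its own tile
   sizes and the real matmul.c uses a larger problem. */
#define N 256
#define T 32
#define MIN(a, b) ((a) < (b) ? (a) : (b))

static double A[N][N], B[N][N], C_naive[N][N], C_tiled[N][N];

/* Untransformed loop nest in i, j, k order. */
static void matmul_naive(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C_naive[i][j] += A[i][k] * B[k][j];
}

/* One level of rectangular tiling over (i, j, k). Inside each tile the
   point loops run in i, k, j order, so the innermost loop reads B[k][j]
   and updates C[i][j] with unit stride. */
static void matmul_tiled(void) {
    for (int it = 0; it < N; it += T)
        for (int jt = 0; jt < N; jt += T)
            for (int kt = 0; kt < N; kt += T)
                for (int i = it; i < MIN(it + T, N); i++)
                    for (int k = kt; k < MIN(kt + T, N); k++)
                        for (int j = jt; j < MIN(jt + T, N); j++)
                            C_tiled[i][j] += A[i][k] * B[k][j];
}

/* Hypothetical check: fills the inputs with small integers (so every
   partial sum is exact in double precision, whatever the summation
   order), runs both versions and returns 1 if they agree. */
int tiled_matches_naive(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = (i + j) % 7;
            B[i][j] = (i * j) % 5;
        }
    memset(C_naive, 0, sizeof C_naive);
    memset(C_tiled, 0, sizeof C_tiled);
    matmul_naive();
    matmul_tiled();
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (C_naive[i][j] != C_tiled[i][j])
                return 0;
    return 1;
}
```

Tiling only reorders the iterations, so both nests produce the same result. Any speedup comes from better cache reuse and vectorization of the unit-stride `j` loop, not from doing less work.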

With plain `gcc -O3`:

➜  matmul git:(master) ✗ gcc  matmul.c -o matmul.gcc -ffast-math -lm -DTIME -O3 -march=native -mtune=native
➜  matmul git:(master) ✗ ./matmul.gcc 
3.340085s
5.14 GFLOPS

For reference, here is OpenBLAS on the same machine, to show roughly what peak performance looks like:

➜  matmul git:(master) ✗ ./openblas 
0.080084s
214.52 GFLOPS
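To put the three timings on one scale, the GFLOPS figures can be recomputed from the run times. The problem size of `matmul.c` is not stated here; square matrices with `N = 2048` and a flop count of `2 * N^3` (one multiply and one add per inner iteration) are assumptions, but they reproduce all three reported GFLOPS values. The helper names `gflops` and `speedup` are hypothetical.

```c
/* Assumption: square N x N matrices and 2*N^3 floating-point operations
   (one multiply plus one add per innermost iteration). */
double gflops(double n, double seconds) {
    return 2.0 * n * n * n / seconds / 1e9;
}

/* Ratio of a baseline run time to a new run time (> 1 means faster). */
double speedup(double baseline_seconds, double new_seconds) {
    return baseline_seconds / new_seconds;
}
```

With `N = 2048`, `gflops(2048, 3.056028)` gives about 5.62, `gflops(2048, 3.340085)` about 5.14 and `gflops(2048, 0.080084)` about 214.52. `speedup(3.340085, 3.056028)` is about 1.09 (tiled vs. plain gcc), while `speedup(3.056028, 0.080084)` is about 38 (OpenBLAS vs. tiled).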

Please let me know if any additional information is needed.
