This is a query to check whether I am doing something wrong when compiling the examples.
I was following this tutorial: https://github.com/bondhugula/llvm-project/blob/hop/mlir/docs/HighPerfCodeGen.md. In the section that compares OpenBLAS/MKL with gcc, clang, and Pluto, I expected the tiled schedule to give a similar improvement of about 5x to 10x, but I see essentially no performance improvement.
With Pluto:
➜ matmul git:(master) ✗ make tiled
../../polycc matmul.c --noparallel --second-level-tile -o matmul.tiled.c
[pluto] compute_deps (isl)
[pluto] Number of statements: 1
[pluto] Total number of loops: 3
[pluto] Number of deps: 3
[pluto] Maximum domain dimensionality: 3
[pluto] Number of parameters: 3
[pluto] Diamond tiling not possible/useful
[pluto] Affine transformations [<iter coeff's> <param> <const>]
T(S1): (i, j, k)
loop types (loop, loop, loop)
[Pluto] After tiling:
T(S1): (zT3/16, zT4/2, zT5/16, zT3, zT4, zT5, i, j, k)
loop types (loop, loop, loop, loop, loop, loop, loop, loop, loop)
[Pluto] After intra-tile optimize
T(S1): (zT3/16, zT4/2, zT5/16, zT3, zT4, zT5, i, k, j)
loop types (loop, loop, loop, loop, loop, loop, loop, loop, loop)
[pluto] using statement-wise -fs/-ls options: S1(4,9),
[Pluto] Output written to matmul.tiled.c
[pluto] Timing statistics
[pluto] SCoP extraction + dependence analysis time: 0.000710s
[pluto] Auto-transformation time: 0.002295s
[pluto] Tile size selection time: 0.000000s
[pluto] Total constraint solving time (LP/MIP/ILP) time: 0.000482s
[pluto] Code generation time: 0.023760s
[pluto] Other/Misc time: 0.075358s
[pluto] Total time: 0.102123s
[pluto] All times: 0.000710 0.002295 0.023760 0.075358
gcc -O3 -march=native -mtune=native -ffast-math -DTIME matmul.tiled.c -o tiled -lm
➜ matmul git:(master) ✗ ./tiled
3.056028s
5.62 GFLOPS
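For readers unfamiliar with the schedule in the log above, here is a minimal sketch of what a tiled matmul loop nest looks like. It is not the code Pluto generated: it uses a single level of tiling rather than the two levels requested by --second-level-tile, the tile size `TILE` and the function name `matmul_tiled` are hypothetical, and row-major `double` matrices are assumed. The intra-tile loop order (i, k, j) mirrors the "After intra-tile optimize" line.

```c
#include <stddef.h>

#define TILE 32 /* hypothetical tile size, not taken from the Pluto output */

/* Computes C += A * B for n x n row-major matrices with all three loops tiled.
 * Illustrative only: one tiling level, whereas the log above uses two. */
void matmul_tiled(size_t n, const double *A, const double *B, double *C)
{
    for (size_t it = 0; it < n; it += TILE)
        for (size_t jt = 0; jt < n; jt += TILE)
            for (size_t kt = 0; kt < n; kt += TILE) {
                /* Clamp tile bounds so n need not be a multiple of TILE. */
                size_t i_end = it + TILE < n ? it + TILE : n;
                size_t j_end = jt + TILE < n ? jt + TILE : n;
                size_t k_end = kt + TILE < n ? kt + TILE : n;

                /* Intra-tile order (i, k, j): the innermost loop walks
                 * contiguous rows of B and C. */
                for (size_t i = it; i < i_end; i++)
                    for (size_t k = kt; k < k_end; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jt; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

The expected benefit of tiling is that each tile of A, B, and C is reused while it is still in cache, which is why a speedup over the untiled loop nest was expected here.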
With plain gcc and the -O3 flag:
➜ matmul git:(master) ✗ gcc matmul.c -o matmul.gcc -ffast-math -lm -DTIME -O3 -march=native -mtune=native
➜ matmul git:(master) ✗ ./matmul.gcc
3.340085s
5.14 GFLOPS
For reference, to get the peak performance, here is the result of using OpenBLAS on the same machine:
➜ matmul git:(master) ✗ ./openblas
0.080084s
214.52 GFLOPS
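To put the three results side by side, the sketch below shows the arithmetic behind the GFLOPS figures. It assumes the usual 2·M·N·K operation count for matmul and M = N = K = 2048; the size is an assumption inferred from the reported times and rates, not stated anywhere above. The helper names `matmul_gflops` and `speedup` are hypothetical.

```c
/* GFLOPS for an M x N x K matmul, counting one multiply and one add
 * per inner-loop iteration (2*M*N*K floating-point operations). */
double matmul_gflops(double m, double n, double k, double seconds)
{
    return 2.0 * m * n * k / seconds / 1e9;
}

/* Speedup of a run that took `fast_seconds` over one that took `slow_seconds`. */
double speedup(double slow_seconds, double fast_seconds)
{
    return slow_seconds / fast_seconds;
}
```

With the times reported here, `speedup(3.340085, 3.056028)` is about 1.09 for the tiled version over plain gcc, while `speedup(3.056028, 0.080084)` is about 38 for OpenBLAS over the tiled version.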
Please let me know in case any additional information is needed.