We'll start exploring the use of the MLIR transform dialect to do codegen for (fused) compute-intensive patterns. The initial target is to support GEMM codegen on the ARM platform, addressing the dynamic-shape limitations of the Arm Compute Library.
The initial plan is:
- Step 1, enhance the fusion decision pass. We'll add a new fusion kind `kTransform` for the transform-based fusion pattern.
- Step 2, lower the lmhlo fusion op to linalg on tensor.
- Step 3, transform the linalg computation to loops using the transform dialect.
- Step 4, refine the transformed loops to make them suitable for the BladeDISC runtime.
- Step 5, add a new pass to the disc pass pipeline to drive the above process.
- Step 6, weight pre-packing support:
  - add a `disc_linalg.multi_level_pack` op, used for doing the packing.
  - add a `transform.disc.cache_read` transform op, relying on the `disc_linalg.multi_level_pack` op.
  - add folding support for `disc_linalg.multi_level_pack`.
  - lower `disc_linalg.multi_level_pack` to loops if it cannot be folded.
  - fuse const weight ops into the `kTransform` fusion pattern, lower them to linalg, and then schedule them.
- Step 7, assign a default schedule for each `kTransform` pattern.
- Step 8, schedule selection logic injection.
- Step 9, initial model-level testing: BERT (ALBERT).
- Step 10, support NT, TN, and TT format GEMM.
- Step 11, support batch matmul.
- Step 12, support GEMM epilogue fusion.
- Step 13, performance optimization.
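As a concrete illustration of Steps 2 and 3 above, the sketch below shows a `linalg.matmul` payload (what the lmhlo fusion op would lower to) and a transform-dialect sequence that tiles it into loops. This is a hypothetical schedule written against recent upstream MLIR syntax, which has changed across versions; it is not BladeDISC's actual implementation, and the tile sizes are placeholder values, not tuned ones.

```mlir
// Payload IR after Step 2: the fusion lowered to linalg on tensors
// (dynamic shapes, matching the Arm Compute Library limitation above).
func.func @gemm(%lhs: tensor<?x?xf32>, %rhs: tensor<?x?xf32>,
                %init: tensor<?x?xf32>) -> tensor<?x?xf32> {
  %0 = linalg.matmul ins(%lhs, %rhs : tensor<?x?xf32>, tensor<?x?xf32>)
                     outs(%init : tensor<?x?xf32>) -> tensor<?x?xf32>
  return %0 : tensor<?x?xf32>
}

// Step 3: a transform-dialect schedule that matches the matmul and
// tiles it into scf.for loops over the M, N, and K dimensions.
transform.sequence failures(propagate) {
^bb0(%root: !transform.any_op):
  %matmul = transform.structured.match ops{["linalg.matmul"]} in %root
      : (!transform.any_op) -> !transform.any_op
  %tiled, %loops:3 = transform.structured.tile_using_for %matmul
      tile_sizes [288, 48, 1]
      : (!transform.any_op) -> (!transform.any_op, !transform.any_op,
                                !transform.any_op, !transform.any_op)
  transform.yield
}
```

In a pipeline like Step 5's, such a sequence would typically be applied by a transform-interpreter pass over each `kTransform` fusion.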
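Step 6's weight pre-packing can be illustrated with the upstream `tensor.pack` op, which performs one level of the data-layout rewrite that `disc_linalg.multi_level_pack` presumably generalizes to several levels (the DISC op's actual syntax is not given in this issue, so the upstream op stands in as an analogue):

```mlir
// One-level packing via upstream tensor.pack (illustrative analogue of
// disc_linalg.multi_level_pack). A 128x256 constant weight is rewritten
// into contiguous 8x16 tiles, so the inner GEMM kernel reads packed
// panels with unit stride; for a const weight this can be folded away
// (computed once), which is the point of the folding support in Step 6.
func.func @pack_weight(%src: tensor<128x256xf32>,
                       %dest: tensor<16x16x8x16xf32>)
    -> tensor<16x16x8x16xf32> {
  %packed = tensor.pack %src
      inner_dims_pos = [0, 1] inner_tiles = [8, 16]
      into %dest : tensor<128x256xf32> -> tensor<16x16x8x16xf32>
  return %packed : tensor<16x16x8x16xf32>
}
```

When the source is not a foldable constant, the packing has to be materialized, which corresponds to the "lower to loops if it cannot be folded" item above.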