[WIP] [AMD] Emit AMD specific intrinsics for dot #4594

binarman · 2024-08-28T19:22:16Z

This PR:

- Makes the AccelerateAMDMatmul pass emit FMA dot for the i8xi8->i32 and fp16xfp16->fp32 cases
- Extends AMD FMA Dot code generation with new v_dot instructions for fp16xfp16 and int8 dtypes
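For readers unfamiliar with packed dot instructions, here is a rough pure-Python model of what an i8xi8->i32 dot-accumulate step could look like. It is only a sketch and is not taken from this PR: the helper names (`pack_i8x4`, `unpack_i8x4`, `dot4_i32_i8`, `fma_dot_i8`) are hypothetical, and the lane count (4 int8 values per 32-bit word), the byte order, and the wrap-around on int32 overflow are all assumptions of the model, not statements about the generated code.

```python
def pack_i8x4(values):
    """Pack four signed 8-bit integers into one 32-bit word, lowest lane first."""
    assert len(values) == 4
    word = 0
    for i, v in enumerate(values):
        assert -128 <= v <= 127
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack_i8x4(word):
    """Split a 32-bit word into four signed 8-bit lanes, lowest lane first."""
    lanes = []
    for i in range(4):
        byte = (word >> (8 * i)) & 0xFF
        lanes.append(byte - 256 if byte >= 128 else byte)
    return lanes


def dot4_i32_i8(a_word, b_word, acc):
    """Model of a packed dot step: acc + sum(a[i] * b[i]) over 4 lanes.

    Assumption: the result wraps to a signed 32-bit integer (no clamping).
    """
    a = unpack_i8x4(a_word)
    b = unpack_i8x4(b_word)
    total = (acc + sum(x * y for x, y in zip(a, b))) & 0xFFFFFFFF
    return total - (1 << 32) if total >= (1 << 31) else total


def fma_dot_i8(a_row, b_col):
    """Reduce along K in chunks of 4, one packed dot step per chunk."""
    assert len(a_row) == len(b_col) and len(a_row) % 4 == 0
    acc = 0
    for k in range(0, len(a_row), 4):
        acc = dot4_i32_i8(pack_i8x4(a_row[k:k + 4]), pack_i8x4(b_col[k:k + 4]), acc)
    return acc


if __name__ == "__main__":
    a_row = [1, -2, 3, -4, 5, 6, -7, 8]
    b_col = [-1, 2, -3, 4, 127, -128, 1, 0]
    # The chunked result should match a plain element-wise multiply-add.
    print(fma_dot_i8(a_row, b_col) == sum(x * y for x, y in zip(a_row, b_col)))
```

The point of the model is that one packed step replaces four scalar multiply-adds along K. The fp16xfp16->fp32 case would follow the same shape with fewer, wider lanes and a floating-point accumulator.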

~~This PR is part of a PR series. The final goal is to improve the efficiency of small dot operations and bypass as many shared memory accesses as possible.~~

Rough list of PRs:

- Basic FMA dot fixes, dot 3d support and relaxing small dimensions for dot: [Backend] Improve dot support to target FMA #4516
- Blocked->dotOp shared memory bypassing: [Backend] Bypass conversion for suitable blocked to dotOperand layout #4538
- Accelerate AMD Matmul + emit dot operations (this PR): [WIP] [AMD] Emit AMD specific intrinsics for dot #4594
- ~~Layout optimization, so operand B is loaded in the proper mfma layout and does not need to go through LDS: [WIP] Optimize fma dot #4581~~
- ~~Vectorization optimization of dot operands/results (in case LLVM cannot do this internally)~~
- ~~Reduction operation hoisting out of the K loop (the reduction operation is a byproduct of the layout optimization step): Hoist reduction outside a loop #4559~~

binarman · 2024-08-28T19:33:17Z

This PR depends on #4516

binarman · 2024-11-18T13:13:16Z

Closing this PR for now.
Will reopen it if the base PR #4516 is merged.

binarman · 2024-12-24T19:03:50Z

Reopening after the base FMA fixes were merged (#4516).
This PR currently depends on #5469, because it introduces some changes to the shared to dot op conversion.

Will rebase it and move it out of WIP after #5469 is merged.

binarman changed the title ~~[AMD] Emit AMD specific intrinsics for dot~~ [WIP] [AMD] Emit AMD specific intrinsics for dot Aug 28, 2024

alefimov-amd force-pushed the v_dot_codegen branch from b3f384c to 90a467a August 28, 2024 19:51

binarman closed this Nov 18, 2024

binarman added 8 commits December 20, 2024 14:39

Implement conversion from FMA dot operand to linear layout

0ac924a

This PR introduces FMA dot operand converter and related tests.

fix repetitions in FMA dot inputs and outputs

69c3354

- Remove orderedOutDimNames function

547f7f8

- Fix compiler crashes in FMA.cpp
- Fix lit test

remove redundant changes

2157f20

fix typo

a7c978b

generate warp and lane layout in broadcast form

eb33d00

- remove legacy converter from pattern

7f2d2a6

- cleanup hash function in FMA.cpp
- add more details in TODO in SharedToDotOperandFMA.cpp
- cleanup DotOperandEncodingAttr::toLinearLayout

add dot 3d test

c372e5a

binarman reopened this Dec 24, 2024

binarman added 2 commits December 24, 2024 19:26

[AMD] Emit AMD specific intrinsics for dot

f81987f

This PR:

- Makes AccelerateAMDMatmul pass to emit FMA i8xi8->i32 and fp16xfp16->fp32 cases
- Extends AMD FMA Dot code generation with new v_dot instructions for fp16xfp16 and int8 dtypes

unify common and amd FMA path

8e03e70

binarman force-pushed the v_dot_codegen branch from 90a467a to 8e03e70 December 24, 2024 19:27

binarman added 2 commits December 26, 2024 16:49

fix test_min_dot_size test

bb519bd

Merge remote-tracking branch 'openai/main' into v_dot_codegen

0e77da7
