[WIP] [AMD] Emit AMD specific intrinsics for dot #4594

Draft
wants to merge 12 commits into
base: main

binarman
Contributor

@binarman binarman commented Aug 28, 2024

This PR:

  • Makes the AccelerateAMDMatmul pass emit FMA for the i8xi8->i32 and fp16xfp16->fp32 cases
  • Extends AMD FMA dot code generation with new v_dot instructions for the fp16xfp16 and int8 dtypes
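
For readers unfamiliar with dot-accumulate instructions, here is a minimal Python reference model of the arithmetic such an instruction performs. It is only a sketch under assumptions: the function names are hypothetical, the 4-wide int8 and 2-wide fp16 groupings and the wrapping (non-saturating) int32 accumulation are assumed for illustration, and fp16 hardware rounding is only approximated. It is not the code this PR generates.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest fp16-representable value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def dot4_i8_acc_i32(a, b, acc):
    """acc + sum(a[i] * b[i]) over four signed 8-bit values (hypothetical model)."""
    assert len(a) == len(b) == 4
    for v in (*a, *b):
        assert -128 <= v <= 127, "inputs must fit in int8"
    total = (acc + sum(x * y for x, y in zip(a, b))) & 0xFFFFFFFF
    # Assumption: the accumulator wraps as a signed 32-bit integer.
    return total - (1 << 32) if total >= (1 << 31) else total


def dot2_f16_acc_f32(a, b, acc):
    """acc + a[0]*b[0] + a[1]*b[1] with fp16 inputs (hypothetical model).

    Inputs are rounded to fp16 first; the accumulation itself is done in
    Python floats, which only approximates fp32 hardware behaviour.
    """
    assert len(a) == len(b) == 2
    a16 = [to_fp16(x) for x in a]
    b16 = [to_fp16(x) for x in b]
    return acc + a16[0] * b16[0] + a16[1] * b16[1]


def dot_i8(a_row, b_col, acc=0):
    """Reduce a K-long int8 dot product as a chain of 4-wide dot steps."""
    assert len(a_row) == len(b_col) and len(a_row) % 4 == 0
    for k in range(0, len(a_row), 4):
        acc = dot4_i8_acc_i32(a_row[k:k + 4], b_col[k:k + 4], acc)
    return acc
```

The point of the sketch is that one dot step replaces several separate multiply-add steps along K, which is where the expected gain for small dot operations comes from.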

This PR is part of a PR series. The final goal is to improve the efficiency of small dot operations and bypass as many shared memory accesses as possible.

Rough list of PRs:

@binarman
Contributor Author

This PR depends on #4516

@binarman
Contributor Author

Closing this PR for now.
Will reopen it if the base PR #4516 is merged.

@binarman binarman closed this Nov 18, 2024
This PR introduces an FMA dot operand converter and related tests.
- Fix compiler crashes in FMA.cpp
- Fix lit test
- Clean up hash function in FMA.cpp
- Add more details to the TODO in SharedToDotOperandFMA.cpp
- Clean up DotOperandEncodingAttr::toLinearLayout
@binarman
Contributor Author

Reopening now that the base FMA fixes are merged (#4516).
This PR currently depends on #5469, because that PR introduces some changes to the shared-to-dot-operand conversion.

I will rebase this PR and move it out of WIP after #5469 is merged.

@binarman binarman reopened this Dec 24, 2024