Grouped GEMM with ck_tile #434
Open: matthiasdiener wants to merge 58 commits into dev from ck-grouped-gemm
+381 −11
Commits (58):
ad748da GEMM reference HIP implementation (matthiasdiener)
11e090b blockwise amax (matthiasdiener)
9006224 Merge branch 'dev' into compute-ref-offload (matthiasdiener)
3ecea7f Change to use Tensor arguments, combine mxfp8/non-mxfp8 paths (matthiasdiener)
cafee59 Merge remote-tracking branch 'origin/dev' into compute-ref-offload (matthiasdiener)
86fbbac skip on SwizzleScale limitation on gfx950 (matthiasdiener)
54de3db Revert "skip on SwizzleScale limitation on gfx950" (matthiasdiener)
311ddfe MXFP8 fix (matthiasdiener)
306e432 Merge remote-tracking branch 'origin/dev' into compute-ref-offload (matthiasdiener)
445e64f correct scale_inv packing and exp2(biased−127) conversion (matthiasdiener)
462945f cleanups (matthiasdiener)
e32fb3d Merge branch 'dev' into compute-ref-offload (matthiasdiener)
7bf8adb Merge remote-tracking branch 'origin/dev' into compute-ref-offload (matthiasdiener)
e11e400 use Tensor class for more device objects (matthiasdiener)
325ece6 Pass D Tensor into run_reference and move RefD allocation into Perfor… (matthiasdiener)
fc64b8c [WIP] proof-of-concept: grouped GEMM with ck_tile (matthiasdiener)
134b350 Merge branch 'dev' into ck-grouped-gemm (matthiasdiener)
9091e6c restructure and enable tests (matthiasdiener)
7435062 Merge remote-tracking branch 'origin/dev' into ck-grouped-gemm (matthiasdiener)
a00a1c8 Merge remote-tracking branch 'origin/dev' into ck-grouped-gemm (matthiasdiener)
4e9ead9 grid improvements (matthiasdiener)
259645c restructure (matthiasdiener)
9986bd4 reduce code duplication & simplify (matthiasdiener)
355ec2f make the code more similar to nv, check emopty gelu/bias (matthiasdiener)
df5e3ea Merge branch 'dev' into ck-grouped-gemm (matthiasdiener)
a42f7ca further simplify & make closer to nv (matthiasdiener)
fac7c11 add ck_tile reference (matthiasdiener)
71b97e0 rename in error messages (matthiasdiener)
dd3ed2f allow flattened higher-D tensors (matthiasdiener)
7b0413e Merge remote-tracking branch 'origin/dev' into ck-grouped-gemm (matthiasdiener)
ebc005f relax tolerance on gfx942 (matthiasdiener)
c0bf502 enable more tests (matthiasdiener)
0b16287 return early when num_gemms<=0 (matthiasdiener)
58b34e7 simplify normalization (matthiasdiener)
74f229a Merge remote-tracking branch 'origin/dev' into ck-grouped-gemm (matthiasdiener)
e28c801 run hipblaslt for num_gemms==1 (matthiasdiener)
6151b96 Merge remote-tracking branch 'origin/dev' into ck-grouped-gemm (matthiasdiener)
5c57d47 disable ck_tile when accumulate=true (matthiasdiener)
29d6ab7 Merge remote-tracking branch 'origin/dev' into ck-grouped-gemm (matthiasdiener)
6e9aae4 Merge remote-tracking branch 'origin/dev' into ck-grouped-gemm (matthiasdiener)
2e844d9 remove test file (matthiasdiener)
4aa8229 Merge branch 'dev' into ck-grouped-gemm (matthiasdiener)
f680d6a fix copyright header (matthiasdiener)
6d85088 simplify calls in dispatch_grouped (matthiasdiener)
7910038 remove is_mi3*0_class (matthiasdiener)
e8ebb0e disable unused constants (matthiasdiener)
deb7474 Merge remote-tracking branch 'origin/dev' into ck-grouped-gemm (matthiasdiener)
e866bc6 add another fallback (matthiasdiener)
ee438fb implement Primus-Turbo selection logic, persistent descs (matthiasdiener)
b65dbfa Merge remote-tracking branch 'origin/dev' into ck-grouped-gemm (matthiasdiener)
0cbf1cd tighten tolerances (matthiasdiener)
98e0c66 use namespace, various cleanups (matthiasdiener)
36bd68e avoid creating vector with Tensors (matthiasdiener)
070c58d Merge remote-tracking branch 'origin/dev' into ck-grouped-gemm (matthiasdiener)
c5d83a4 merge dispatch_grouped into ck_tile_grouped_gemm (matthiasdiener)
56afb04 Merge remote-tracking branch 'origin/dev' into ck-grouped-gemm (matthiasdiener)
26dfbb6 same tolerances for gfx950 (matthiasdiener)
7b1dbfa add to readme (matthiasdiener)
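Several commit messages in the list describe small rules: "return early when num_gemms<=0", "run hipblaslt for num_gemms==1", "disable ck_tile when accumulate=true", and "exp2(biased−127) conversion". The diff itself is not visible on this page, so the following is only a minimal sketch of what those rules might look like. Every identifier (`GroupedGemmBackend`, `select_backend`, `decode_e8m0_scale`) is hypothetical and not taken from the PR. Two points are assumptions: that the `accumulate=true` case falls back to hipBLASLt, and that the scale byte is an E8M0 exponent in which 0xFF encodes NaN.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical names throughout; none of these identifiers come from the PR.
enum class GroupedGemmBackend { None, HipBlasLt, CkTile };

// Chooses a backend for a grouped GEMM call, following the commit messages.
GroupedGemmBackend select_backend(int num_gemms, bool accumulate) {
    if (num_gemms <= 0) {
        return GroupedGemmBackend::None;       // "return early when num_gemms<=0"
    }
    if (num_gemms == 1) {
        return GroupedGemmBackend::HipBlasLt;  // "run hipblaslt for num_gemms==1"
    }
    if (accumulate) {
        // "disable ck_tile when accumulate=true".
        // Assumption: the fallback in this case is hipBLASLt.
        return GroupedGemmBackend::HipBlasLt;
    }
    return GroupedGemmBackend::CkTile;
}

// Converts a biased 8-bit exponent to a float scale: 2^(biased - 127).
// std::ldexp(1.0f, e) equals exp2(e) for integer e and is exact.
// Assumption: 0xFF is treated as NaN, as in the E8M0 scale format.
float decode_e8m0_scale(std::uint8_t biased) {
    if (biased == 0xFF) {
        return std::numeric_limits<float>::quiet_NaN();
    }
    return std::ldexp(1.0f, static_cast<int>(biased) - 127);
}
```

Under these assumptions, `select_backend(4, false)` picks the ck_tile path, while `select_backend(4, true)` and `select_backend(1, false)` both fall back to hipBLASLt. A biased exponent of 127 decodes to a scale of 1.0.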