-
Couldn't load subscription status.
- Fork 286
[AMD] fix bug&add amd fp8 examples #966
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
46794c4
499daa3
555537a
f84bc97
21cf0c3
9b2fab3
24e08ae
bc2663a
4d427d9
4fd8529
cf99bef
e839192
70f3f6a
2bf7961
f7f6131
91e9548
460c64f
8eefca0
7bd45c5
4cf8c30
bc22219
50b97e1
05305f2
923fc6d
7b7fda3
1008679
f490b4a
b39ada8
d62b898
e7b0f30
8c73c9c
c8aec22
4549e0e
ccadc2e
b491082
3289910
10870e1
f20cd33
570c6c9
fff5543
d5e3b6b
3f15c59
0582143
e8f0d9f
335bbc6
cf8cc88
3a00c4d
3b839d2
4c11021
bc9a5fb
dd5b64f
42e5538
9d53c8a
cd3b6b5
3072de6
1913abb
acaf988
4bc49cd
ae39e35
0c0fa53
9a4a08f
8b345ae
c34315c
cd1564d
426df21
40eabcd
5b6bcaa
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Large diffs are not rendered by default.
This file was deleted.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -109,3 +109,13 @@ template <typename T1, typename T2> | |
| TL_DEVICE void AtomicAdd(T1 *address, T2 val) { | ||
| atomicAdd(reinterpret_cast<T1 *>(address), static_cast<T1>(val)); | ||
| } | ||
|
|
||
| // Overload for when the first argument is a value instead of a pointer | ||
| template <typename T1, typename T2> | ||
| TL_DEVICE void AtomicAdd(T1 address, T2 val) { | ||
| atomicAdd(reinterpret_cast<T1 *>(&address), static_cast<T1>(val)); | ||
| } | ||
|
|
||
| template <typename T1, typename T2> TL_DEVICE T1 AtomicAddRet(T1 &ref, T2 val) { | ||
| return atomicAdd(&ref, static_cast<T1>(val)); | ||
| } | ||
|
Comment on lines
+113
to
+121
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Invalid AtomicAdd overload takes a value; atomics must target real memory
Remove this overload and, if needed, provide a reference overload (non‑returning) alongside AtomicAddRet: -// Overload for when the first argument is a value instead of a pointer
-template <typename T1, typename T2>
-TL_DEVICE void AtomicAdd(T1 address, T2 val) {
- atomicAdd(reinterpret_cast<T1 *>(&address), static_cast<T1>(val));
-}
+// Preferred non-returning reference overload (mirrors AtomicAddRet)
+template <typename T1, typename T2>
+TL_DEVICE void AtomicAdd(T1& ref, T2 val) {
+ atomicAdd(&ref, static_cast<T1>(val));
+}Optionally constrain T1 to HIP‑supported atomic types via static_assert. 🤖 Prompt for AI Agents |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -70,7 +70,9 @@ template <int M, int N, int K, int num_warp_m, int num_warp_n, bool TransposeA, | |
| typename B_type, typename C_type, typename AccDataType = float> | ||
| class GemmTensorOp { | ||
| public: | ||
| static_assert(!clear_accum, "clear_accum=true is not supported yet"); | ||
| // Note: clear_accum=true is not fully supported in HIP implementation | ||
| // but we'll handle it by manually clearing the accumulator | ||
| // static_assert(!clear_accum, "clear_accum=true is not supported yet"); | ||
|
Comment on lines
+73
to
+75
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chainVerify that clear_accum=true is properly handled. The static_assert has been removed and replaced with a comment stating that clear_accum will be "manually cleared." However, examining the If Run the following script to check if clear_accum is handled elsewhere or if callers are responsible for zeroing: 🏁 Script executed: #!/bin/bash
# Search for clear_accum usage patterns and accumulator initialization
rg -n -C5 'clear_accum.*true|C_local.*=.*0|memset.*C_local' --type=cpp --type=hLength of output: 3003 Restore or implement clear_accum handling 🤖 Prompt for AI Agents
Comment on lines
+73
to
+75
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do not drop support guard without implementing clear_accum behavior Removing the static_assert enables clear_accum=true but no code zeros accumulators, yielding nondeterministic results. Add zeroing when clear_accum is true. Example insertion at the start of body() and body_rs() before the ki loop: if constexpr (clear_accum) {
for (int i = 0; i < warp_rows; ++i) {
for (int j = 0; j < warp_cols; ++j) {
((float32x4*)C_local)[i * warp_cols + j] = 0;
}
}
}Optionally re‑enable the static_assert if you cannot guarantee correct zeroing on all code paths. 🤖 Prompt for AI Agents |
||
|
|
||
| static constexpr int micro_size_x = 16; | ||
| static constexpr int micro_size_y = 16; | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical logic error: taking address of pass-by-value parameter.
This overload takes
T1 addressby value and then uses&addressto get a pointer. This retrieves the address of the local copy created for the function parameter, not the address of the caller's original variable. The atomic operation would be performed on the local copy, which is destroyed when the function returns, making the atomic operation completely ineffective.Either remove this overload entirely if it's not needed, or if the intent was to support non-pointer types, the signature should be
AtomicAdd(T1 &address, T2 val)(taking a reference).Apply this diff to fix the signature to take a reference:
🤖 Prompt for AI Agents