content/blog/2025-11-05-1762335811.md (1 addition & 1 deletion)
@@ -20,7 +20,7 @@ Some final takeaways:
It's pretty clear that ML compilers are going to be a big deal. NVIDIA's TensorRT is also an ML compiler, but it only targets their GPUs. Once the generated machine code (from cross-vendor ML compilers) is comparable in performance to hand-tuned kernels, these compilers are going to break the (in)famous moat of CUDA.
- And thankfully, this will also finally make AMD's consumer GPUs more accessible to developers (by making AMD's terrible support for ROCm on consumer GPUs unnecessary). Yes, cheap shot, but I've lost a lot of hair trying to support AMD's consumer GPUs over the years.
+ And thankfully, this will also finally make AMD's consumer GPUs more accessible to developers (by codifying the immense tribal knowledge of various ROCm versions on AMD's consumer GPUs).
Hand-written kernels could go the way of hand-written assembly code. This was always going to happen eventually, but I think it's pretty close now.