1 parent 56de92d · commit bd5218d
README.md
```diff
@@ -10,7 +10,6 @@ High-performance Diffusion Transformer (DiT) implementation from scratch using C
 - Memory coalescing for Q, K, V matrix operations
 
 ### Attention Kernel Performance Results
-I'll convert this into a clean markdown table format.
 
 | Metric | CUDA Implementation | PyTorch Reference | Improvement |
 |--------|-------------------|------------------|-------------|
```
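The "Memory coalescing for Q, K, V matrix operations" bullet in the unchanged context refers to arranging global-memory loads so that consecutive threads in a warp read consecutive addresses. The sketch below is illustrative only and is not taken from this repository's kernels; the kernel name, tile sizes, and the row-major `[seq_len, HEAD_DIM]` layout are assumptions.

```cuda
// Hypothetical sketch of a coalesced Q tile load (not the repo's actual kernel).
// Assumes Q is a row-major [seq_len, HEAD_DIM] float array.
#include <cuda_runtime.h>

#define HEAD_DIM  64   // assumed head dimension
#define TILE_ROWS 16   // rows of Q staged per thread block

__global__ void stage_q_tile(const float* __restrict__ Q,
                             float* __restrict__ out,
                             int seq_len) {
    __shared__ float q_tile[TILE_ROWS][HEAD_DIM];

    int row = blockIdx.x * TILE_ROWS + threadIdx.y;  // query row handled by this thread
    int col = threadIdx.x;                           // adjacent threads -> adjacent columns

    if (row < seq_len && col < HEAD_DIM) {
        // threadIdx.x varies fastest, so a warp reads 32 consecutive floats;
        // the hardware coalesces these into a minimal number of transactions.
        q_tile[threadIdx.y][col] = Q[row * HEAD_DIM + col];
    }
    __syncthreads();

    // K and V tiles would be staged the same way before computing QK^T and PV.
    if (row < seq_len && col < HEAD_DIM) {
        out[row * HEAD_DIM + col] = q_tile[threadIdx.y][col];  // placeholder use of the tile
    }
}

// Launch example: dim3 block(HEAD_DIM, TILE_ROWS);            // 64 x 16 = 1024 threads
//                 dim3 grid((seq_len + TILE_ROWS - 1) / TILE_ROWS);
```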