Commit a7c188f

Update README.md
1 parent af01550 commit a7c188f

1 file changed: +1 -1 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ One of the primary objectives for this project is to develop a solution that can
 
 ### **Current Implementation:**
 
-This currently supports unconditional diffusion model training, and the end-to-end training loop is currently running at about 55% the speed of PyTorch with `torch.compile` when run on a single H100. Further detailed benchmarks will have to be done to understand bottlenecks + adjust implementation for better performance. I do think we can incorporate low-precision training here, though.
+This currently supports unconditional diffusion model training, and the end-to-end training loop is currently running at about 55% the speed of PyTorch with `torch.compile` when run on a single H100. Further detailed benchmarks will have to be done to understand bottlenecks + adjust implementation for better performance. I do think we can incorporate low-precision training here, though (probably FP16 w/ loss scaling).
 
 | Platform | Time on H100 (ms) |
 |--------------------------------------|-------------------|
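
The added note only names "FP16 w/ loss scaling" as a probable direction, so the following is a minimal sketch of why scaling matters, not this project's implementation. It uses the standard library's half-precision `struct` format to emulate an FP16 round trip. The helper `to_fp16`, the constant `LOSS_SCALE`, and the gradient value are all hypothetical and chosen only for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # The "e" format code packs a 16-bit float; unpacking returns the rounded value.
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical static loss scale; real setups often adjust this dynamically.
LOSS_SCALE = 1024.0

# An illustrative gradient smaller than FP16's smallest subnormal (about 5.96e-8).
true_grad = 1e-8

# Without scaling, the gradient underflows to zero when stored in FP16.
unscaled = to_fp16(true_grad)

# With scaling, the loss (and therefore every gradient) is multiplied by
# LOSS_SCALE before the FP16 store, then divided back out in higher precision.
scaled = to_fp16(true_grad * LOSS_SCALE)
recovered = scaled / LOSS_SCALE

print(f"unscaled FP16 gradient:  {unscaled}")
print(f"recovered after scaling: {recovered:.4e}")
```

In a full training loop the unscaling step would happen on a higher-precision copy of the gradients before the optimizer update, and the scale would typically be reduced whenever a scaled gradient overflows.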
