This repository contains materials for the lecture on FP8 & Triton, part of a short course on scaling LLM training organized in collaboration with Yandex and the Yandex School of Data Analysis.
For materials in Russian, use the ru/ directory. For materials in English, use the en/ directory.
To open the notebook locally, run the following commands from the root of this repo:

```bash
cd trace-viewer
npm install
npm run dev
```

Then navigate to either
http://localhost:5173?trace=var/traces/ru.lecture_triton_fp8.json or
http://localhost:5173?trace=var/traces/en.lecture_triton_fp8.json,
depending on your preferred language.
To re-generate the traces, run:

```bash
python execute.py -m ru.lecture_triton_fp8
python execute.py -m en.lecture_triton_fp8
```
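As a quick sanity check after regeneration, you can verify that the trace files exist; this sketch assumes the traces are written to `var/traces/`, as the viewer URLs above suggest:

```bash
# Assumption: execute.py writes traces to var/traces/, matching the paths
# used in the trace-viewer URLs above.
ls -lh var/traces/ru.lecture_triton_fp8.json var/traces/en.lecture_triton_fp8.json
```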
If you find this content useful, consider citing it as follows:

```bibtex
@misc{LLMScalingWeekFP8Triton,
  author = {Savinov, Vladislav},
  title = {Speeding up training with Triton and FP8},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/acforvs/ysda-llm-scaling-week}},
  year = {2025}
}
```

References:

- DeepSeek-AI. (2024). DeepSeek-V3 Technical Report.
- DeepSeek-AI. DeepGEMM.
- Team Cohere. (2025). Command A: An Enterprise-Ready Large Language Model.
- Micikevicius, P. et al. (2022). FP8 Formats for Deep Learning.
- OpenAI. Triton.
- NVIDIA. TransformerEngine.
- Austin et al. (2025). How to Scale Your Model. Google DeepMind.
- Modal Labs. GPU Glossary.
- Meta AI. (2025). The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation.
I'd like to thank the team behind CS336, which was a big inspiration for how the materials are structured and for some parts of the talk.