This repository contains materials for the lecture on FP8 & Triton, part of a short course on scaling LLM training organized in collaboration with Yandex and the Yandex School of Data Analysis.
For materials in Russian, use the ru/ directory. For materials in English, use the en/ directory.
To open the notebook locally, run the following commands from the root of this repo:

```bash
cd trace-viewer
npm install
npm run dev
```

Then navigate to either
http://localhost:5173?trace=var/traces/ru.lecture_triton_fp8.json or
http://localhost:5173?trace=var/traces/en.lecture_triton_fp8.json,
depending on your preferred language.
To re-generate the traces, run:

```bash
python execute.py -m ru.lecture_triton_fp8
python execute.py -m en.lecture_triton_fp8
```
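As a quick sanity check after regeneration, you can verify that the trace files exist; this sketch assumes the traces are written to `var/traces/`, as the viewer URLs above suggest:

```bash
# Assumption: execute.py writes traces to var/traces/, matching the paths
# used in the trace-viewer URLs above.
ls -lh var/traces/ru.lecture_triton_fp8.json var/traces/en.lecture_triton_fp8.json
```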
If you find this content useful, consider citing it as follows:

```bibtex
@misc{LLMScalingWeekFP8Triton,
  author = {Savinov, Vladislav},
  title = {Speeding up training with Triton and FP8},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/acforvs/ysda-llm-scaling-week}},
  year = {2025}
}
```

References:

- DeepSeek-AI. (2024). DeepSeek-V3 Technical Report.
- DeepSeek-AI. DeepGEMM.
- Team Cohere. (2025). Command A: An Enterprise-Ready Large Language Model.
- Micikevicius, P. et al. (2022). FP8 Formats for Deep Learning.
- OpenAI. Triton.
- NVIDIA. TransformerEngine.
- Austin et al. (2025). How to Scale Your Model. Google DeepMind.
- Modal Labs. GPU Glossary.
- Meta AI. (2025). The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation.
I'd like to thank the team behind CS336, which was a big inspiration for how the materials are structured and for some parts of the talk.