# b200
Here are 2 public repositories matching this topic...
Code and a short tutorial for pre-training a GPT-2 model on an NVIDIA DGX B200 (eight B200 GPUs). Uses PyTorch and Hugging Face Transformers; pre-trains GPT-2 Small on 32 GB of data in around 2.5 hours and handles dataset tokenization as well (see the sketch below).
Updated Sep 6, 2025 · Python
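
As a rough illustration of the approach the repository describes, here is a minimal pre-training sketch using Hugging Face Transformers; the corpus file, block length, and hyperparameters are illustrative assumptions, not values taken from the repository:

```python
# Minimal GPT-2 pre-training sketch with Hugging Face Transformers.
# Corpus path, batch size, and training length are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    GPT2Config, GPT2LMHeadModel, GPT2TokenizerFast,
    DataCollatorForLanguageModeling, Trainer, TrainingArguments,
)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# GPT-2 Small (~124M parameters) initialized from scratch, not from a checkpoint.
model = GPT2LMHeadModel(GPT2Config())

# Tokenize a plain-text corpus into sequences for causal language modeling.
raw = load_dataset("text", data_files={"train": "corpus.txt"})  # hypothetical corpus file

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

train = raw["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gpt2-pretrain",
        per_device_train_batch_size=32,  # per GPU; illustrative value
        bf16=True,                       # B200 GPUs support bfloat16
        num_train_epochs=1,
    ),
    train_dataset=train,
    # mlm=False selects causal (next-token) language modeling for GPT-2.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

On a multi-GPU system like a DGX B200, a script along these lines would typically be launched with `torchrun --nproc_per_node=8 train.py`, letting Trainer handle data-parallel distribution across the eight GPUs.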