Merged
139 commits
7488385
create transformer_backend folder with debug run
3outeille Aug 28, 2025
39a3b34
add hf config
3outeille Aug 28, 2025
ea7c594
can now register train spec for hf model
3outeille Aug 28, 2025
5f0adf5
can now switch with different flavors using HF Llama modeling
3outeille Aug 28, 2025
7c3795c
it is now working up to apply_ac
3outeille Aug 28, 2025
3fb2bf8
now working up to init_weights
3outeille Sep 6, 2025
25daeca
fix mapping when convert_to_hf_config + add breaking test to ensure p…
3outeille Sep 6, 2025
3e67f2c
define own apply_ac for transformer backend instead of reusing llama3
3outeille Sep 8, 2025
8c5c0ae
HF model without any parallelism now train (but grad_norm is high)
3outeille Sep 9, 2025
4ae9560
a bit cleaner way to get passed args
3outeille Sep 10, 2025
9be95f9
now same number of params + same attention backend but noticed highe…
3outeille Sep 10, 2025
bf91447
fix seed and deterministic
3outeille Sep 11, 2025
4c2fc0b
fix torch deterministic for HF modeling that was producing Nans
3outeille Sep 11, 2025
9bffa38
HF model now numerically stable compared to TT (given a fixed attent…
3outeille Sep 15, 2025
40d84cc
handling the is_hf_initialized flag in patch
3outeille Sep 15, 2025
bd3f332
refactor HF transformer model args
3outeille Sep 16, 2025
249be92
wrapper model class to avoid transformers to be explicit in train.py
3outeille Sep 16, 2025
e2d4ada
add better testing script with reference log for later sanity check
3outeille Sep 16, 2025
4b498a9
no need to fill passed args
3outeille Sep 16, 2025
eb403d5
can now handle multiple HF modeling
3outeille Sep 16, 2025
a0d67a7
handle pref logits accessing inside HF model wrapper
3outeille Sep 16, 2025
ea05552
isolate HF patch for llama in another file
3outeille Sep 16, 2025
adefa2c
find hacky way to pass HF model.name through CLI
3outeille Sep 16, 2025
a235863
more granularity of logging when doing parameter breakdown
3outeille Sep 17, 2025
fc43dc8
add __repr__ to HFTransformerModelArgs for better debugging logs
3outeille Sep 17, 2025
23ae378
HF deepseek v3 is now training
3outeille Sep 17, 2025
2573be4
refactor to make it clear which args comes from which parts
3outeille Sep 17, 2025
46ae0a3
fix refactor and simplify things
3outeille Sep 18, 2025
b33d575
hacky way to switch flavors for now
3outeille Sep 18, 2025
007f005
hf deepseek train while matching same param counts as tt deepseek
3outeille Sep 18, 2025
dd2b04c
wtf deepseek q_proj weight init differ ???
3outeille Sep 22, 2025
9abdae3
deepseek now has same weight init in HF & TT. Reasons was rng_state w…
3outeille Sep 22, 2025
f9e90bc
adapt mfu to handle moe
3outeille Sep 22, 2025
ba5d6d1
beginning parallelism by setting tests
3outeille Sep 23, 2025
338a250
better compare_distributed_run test
3outeille Sep 24, 2025
36a5673
add seed + deterministic to compare_distributed_run
3outeille Sep 24, 2025
ed892a2
better extract and compare metrics
3outeille Sep 24, 2025
1c1452f
refactor to introduce slurm
3outeille Sep 24, 2025
5e4911f
error handling with subprocess
3outeille Sep 24, 2025
4891a47
FSDP for llama in 1D works
3outeille Sep 24, 2025
9e260a0
better formatting of compare_distributed_run + display min/max grad_n…
3outeille Sep 24, 2025
a604bee
make FSDP work in a cleaner way (mapping instead of renaming)
3outeille Sep 25, 2025
0b38d0d
Improve logging in compare_distributed_run
3outeille Sep 26, 2025
025a86f
PP for llama in 1D works
3outeille Sep 26, 2025
590737f
simplify PP logic by flattening the named_children hierarchy. This wi…
3outeille Sep 28, 2025
1a9af68
TP now works in 1D
3outeille Sep 28, 2025
e6b9ff5
add test filtering in compare distributed run
3outeille Sep 28, 2025
a4cb8c3
dont generate EP config if model is not a MoE
3outeille Sep 28, 2025
12c0c47
disable torch.utils.deterministic.fill_uninitialized_memory for Moe d…
3outeille Sep 28, 2025
13edc66
CP is now supported
3outeille Sep 29, 2025
52250fb
some cleaning
3outeille Sep 29, 2025
c523ede
cleaner way to make create_causal_mask = None
3outeille Sep 29, 2025
f9f5c66
uniformize llama and moe args passing
3outeille Sep 29, 2025
5a875b6
cleaning code
3outeille Sep 29, 2025
e4d963c
fix same global_batch_size across training + fix float32 for test (ev…
3outeille Sep 30, 2025
957cc4a
refactor compare_distributed_run to make it slurm compatible
3outeille Sep 30, 2025
a317c53
breaking test
3outeille Oct 1, 2025
d2f80a2
refactor test
3outeille Oct 4, 2025
6454e40
fix running job to slurm
3outeille Oct 5, 2025
b99a4d2
finally have a better testing xp with slurm
3outeille Oct 5, 2025
218f400
now everything works (1D/2D/3D/4D). need to fix correctness with PP
3outeille Oct 9, 2025
bb080ad
fix and uniformize weight init of llama-like model + various fix
3outeille Oct 14, 2025
3168f9e
support moe init and fix with moe layer (TP for lora layers)
3outeille Oct 15, 2025
a9a65b7
begin TP + EP with MoE model
3outeille Oct 15, 2025
b4a1b88
cleaning
3outeille Oct 15, 2025
5f1075b
add small example scripts
3outeille Oct 15, 2025
81f1855
Merge branch 'main' into 3outeille/transformers_backend
3outeille Oct 17, 2025
c35ccfc
fix all the merge issues
3outeille Oct 20, 2025
d5ce2e9
get rid of hf patches files and put it in hf_transformer_args
3outeille Oct 20, 2025
8d46723
remove eos_id + refactor Optional[int] to comply with torchtitan conv…
3outeille Oct 20, 2025
087f841
move torch.utils.deterministic.fill_uninitialized_memory = False to u…
3outeille Oct 20, 2025
937c68d
remove test_template for base_config instead
3outeille Oct 20, 2025
4f2b357
separate args &model + dont extract loss metrics -1.0 when double PP …
3outeille Oct 20, 2025
154289d
use recent refactoring for flops computation for dense and moe model
3outeille Oct 21, 2025
1b2cfd7
fix tie_embedding
3outeille Oct 21, 2025
0f2c51e
remove pad_token_id=None
3outeille Oct 21, 2025
4c8b4b7
make it clearer about args
3outeille Oct 21, 2025
c61271e
remove local testing scripts
3outeille Oct 21, 2025
a848545
fix linting
3outeille Oct 21, 2025
9488a16
create CI jobs to guard
3outeille Oct 21, 2025
5be438b
Merge branch 'main' into 3outeille/transformers_backend
3outeille Oct 29, 2025
e8a1757
update the way we register_train_spec
3outeille Oct 29, 2025
141c377
relative path for qwen3_fsdp2_tp2_pp2.toml
3outeille Oct 29, 2025
a67e971
dont use os.environ, use debugmodel or debugmodel_moe
3outeille Oct 29, 2025
060befe
refactor args to make it clearer
3outeille Oct 30, 2025
3425b12
add README
3outeille Oct 31, 2025
7b0ee5d
add requirements.txt
3outeille Oct 31, 2025
3e2222c
fix linting
3outeille Oct 31, 2025
70c348d
fix bug related to training with different seq_len than max_seq_len
3outeille Nov 1, 2025
af0a1cb
decouple MoE logic to another PR
3outeille Nov 1, 2025
980a92b
update experiments README
3outeille Nov 3, 2025
06b6f24
update README to confirm torch.compile support
3outeille Nov 3, 2025
a70c4c4
custom job_config
3outeille Nov 4, 2025
42884cd
remove unecessary change in train_spec
3outeille Nov 4, 2025
4fa0874
rename file to comply with torchtitan style
3outeille Nov 4, 2025
8ffa7f4
reuse ac form torchtitan
3outeille Nov 4, 2025
ff21c2b
reuse ddp from torchtitan
3outeille Nov 4, 2025
0a43a8a
reuse compile from torchtitan llama3
3outeille Nov 4, 2025
8026bc7
reuse compile from torchtitan
3outeille Nov 4, 2025
cd4042f
update parallelize with main
3outeille Nov 4, 2025
0700bdb
remove moe ep tp for now
3outeille Nov 4, 2025
767f71d
fix SequenceParallel for q and k norm
3outeille Nov 5, 2025
7f71f88
job_config.training will always have seq_len
3outeille Nov 5, 2025
7e63a82
fix loading weights in PP by using Module Dict
3outeille Nov 7, 2025
04fb8eb
clean reference qwen config
3outeille Nov 13, 2025
0d80f62
error out if no layer_idx
3outeille Nov 13, 2025
09f0c94
reuse pipeline from torchtitan
3outeille Nov 13, 2025
78d26ff
use c4 test for integration_tests
3outeille Nov 13, 2025
5243795
fix ci
3outeille Nov 13, 2025
84af768
Merge branch 'main' of github.com:huggingface/torchtitan into 3outeil…
3outeille Nov 13, 2025
fe691b8
fix linting
3outeille Nov 13, 2025
5d5ce2b
fix head dims in flops counting
3outeille Nov 14, 2025
6ace9f4
propose an alternative to passing name
3outeille Nov 14, 2025
97cd6fe
fix linting
3outeille Nov 14, 2025
5f1695f
bump transformers version from 4.55.4 to 4.57.1
3outeille Nov 14, 2025
2d2b612
change qwen3 config name
3outeille Nov 18, 2025
a2ea2ef
reuse fsdp from llama3. Moe will be handle in another PR
3outeille Nov 18, 2025
47fb2ea
clean logging
3outeille Nov 18, 2025
20308d3
move TitanDenseModelArgs to args
3outeille Nov 18, 2025
019f2cc
clean
3outeille Nov 18, 2025
fc93b4f
fix integration tests
3outeille Nov 18, 2025
f9e8e11
rename integration test file
3outeille Nov 18, 2025
83b0437
update README
3outeille Nov 18, 2025
fb978dd
revert accidental changes linting
3outeille Nov 18, 2025
71ff098
typo in naming
3outeille Nov 18, 2025
663a415
refactor
3outeille Nov 18, 2025
3dbe6fa
revert the way we select HF modeling in config
3outeille Nov 18, 2025
9be95da
Revert "reuse pipeline from torchtitan"
3outeille Nov 19, 2025
c0c273c
pass deterministic.fill_uninitialized_memory to HF model
3outeille Nov 19, 2025
4c50a00
fix linting
3outeille Nov 19, 2025
5b8d38c
fix integration tests
3outeille Nov 19, 2025
57bb8dd
fix minor stuff
3outeille Nov 20, 2025
3e6013b
Merge branch 'main' into update_transformers_backend_name
3outeille Nov 21, 2025
bffe1c8
rename transformers_backend to transformers_modeling_backend
3outeille Nov 21, 2025
1bbb3a8
Merge branch 'main' into 3outeille/transformers_backend
3outeille Nov 21, 2025
7dcc4a1
Merge branch '3outeille/transformers_backend' into update_transformer…
3outeille Nov 21, 2025
089f31a
typo
3outeille Nov 21, 2025
79e44c1
fix typo
3outeille Nov 21, 2025
d7c21e5
fix Dockerfile
3outeille Nov 22, 2025
Conversations

Contributor

Because the name changed, you will also need to change the name here to make CI run:

pip_install -r /opt/conda/requirements-transformers-backend.txt
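(Presumably this line becomes `pip_install -r /opt/conda/requirements-transformers-modeling-backend.txt` after the rename; the exact new filename is an assumption, not confirmed in this thread.)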

File renamed without changes.
```diff
@@ -1,13 +1,13 @@
-name: Transformers Backend 8 GPU Integration Tests
+name: Transformers Modeling Backend 8 GPU Integration Tests
 
 on:
   push:
     branches: [ main ]
     paths:
-      - 'torchtitan/experiments/transformers_backend/**'
+      - 'torchtitan/experiments/transformers_modeling_backend/**'
   pull_request:
     paths:
-      - 'torchtitan/experiments/transformers_backend/**'
+      - 'torchtitan/experiments/transformers_modeling_backend/**'
   schedule:
     # Runs every 12 hours
     - cron: '0 */12 * * *'
@@ -50,4 +50,4 @@ jobs:
           USE_CPP=0 python -m pip install --pre torchao --index-url https://download.pytorch.org/whl/nightly/cu126
 
           mkdir artifacts-to-be-uploaded
-          python -m torchtitan.experiments.transformers_backend.tests.integration_tests artifacts-to-be-uploaded --ngpu 8
+          python -m torchtitan.experiments.transformers_modeling_backend.tests.integration_tests artifacts-to-be-uploaded --ngpu 8
```
2 changes: 1 addition & 1 deletion torchtitan/experiments/README.md
```diff
@@ -31,4 +31,4 @@ We provide this `experiments/` folder to host experiments that add significant v
 | [moe_symm_mem_kernels](./moe_symm_mem_kernels/) | TBA | [@kwen2501](https://github.com/kwen2501) |
 | [gpt_oss](./gpt_oss/) | TBA | [@jianiw](https://github.com/jianiw) |
 | [compiler_toolkit](./compiler_toolkit/) | [![Compiler Toolkit 8 GPU Integration Tests](https://github.com/pytorch/torchtitan/actions/workflows/integration_test_8gpu_compiler_toolkit.yaml/badge.svg?branch=main)](https://github.com/pytorch/torchtitan/actions/workflows/integration_test_8gpu_compiler_toolkit.yaml?query=branch%3Amain) | [@SherlockNoMad](https://github.com/SherlockNoMad) [@yiming0416](https://github.com/yiming0416) |
-| [transformers_backend](./transformers_backend/) | [![Transformers backend 8 GPU Integration Tests](https://github.com/pytorch/torchtitan/actions/workflows/integration_test_8gpu_transformers_backend.yaml/badge.svg?branch=main)](https://github.com/pytorch/torchtitan/actions/workflows/integration_test_8gpu_transformers_backend.yaml?query=branch%3Amain) | [@3outeille](https://github.com/3outeille) |
+| [transformers_modeling_backend](./transformers_modeling_backend/) | [![Transformers modeling backend 8 GPU Integration Tests](https://github.com/pytorch/torchtitan/actions/workflows/integration_test_8gpu_transformers_modeling_backend.yaml/badge.svg?branch=main)](https://github.com/pytorch/torchtitan/actions/workflows/integration_test_8gpu_transformers_modeling_backend.yaml?query=branch%3Amain) | [@3outeille](https://github.com/3outeille) |
```
2 changes: 1 addition & 1 deletion torchtitan/experiments/__init__.py
```diff
@@ -12,6 +12,6 @@
         "vlm",
         "compiler_toolkit.deepseek_v3",
         "compiler_toolkit.llama3",
-        "transformers_backend",
+        "transformers_modeling_backend",
     ]
 )
```
@@ -4,20 +4,20 @@

Contributor
Maybe also change the title name in the README, and add some description that we are only using the Hugging Face model definition as the backend.

 - Requirements `transformers==4.57.1`
 
-- Config: `torchtitan/torchtitan/experiments/transformers_backend/configs/qwen3.toml`
+- Config: `torchtitan/torchtitan/experiments/transformers_modeling_backend/configs/qwen3.toml`
 ```diff
 ...
 [model]
 - name = "llama3"
-+ name = "transformers_backend"
++ name = "transformers_modeling_backend"
 flavor = "debugmodel"
 hf_assets_path = "./tests/assets/tokenizer"
 
 +[hf_transformers]
 +model = "Qwen/Qwen3-4B-Instruct-2507"
 ...
 ```
-- Train: `LOG_RANK=7 CONFIG_FILE=<YOUR_PATH>/torchtitan/experiments/transformers_backend/configs/qwen3.toml ./run_train.sh --job.custom_config_module=torchtitan.experiments.transformers_backend.job_config --compile.enable`
+- Train: `LOG_RANK=7 CONFIG_FILE=<YOUR_PATH>/torchtitan/experiments/transformers_modeling_backend/configs/qwen3.toml ./run_train.sh --job.custom_config_module=torchtitan.experiments.transformers_modeling_backend.job_config --compile.enable`
 - Make sure you have created the tokenizers beforehand
 <img width="1334" height="453" alt="image" src="https://github.com/user-attachments/assets/da459448-027b-4af9-8176-6a3e433a272c" />
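For reference, the HF model selection can also be driven entirely from the CLI instead of the TOML file; a minimal sketch combining the README's train command with the `--hf_transformers.model` and `--model.name` overrides used by the integration test further down (the `<YOUR_PATH>` placeholder and `LOG_RANK` setting are illustrative):

```sh
# Sketch: pick the transformers modeling backend and the HF model via CLI overrides.
# All flags appear in this PR; adjust <YOUR_PATH> to your checkout.
LOG_RANK=7 \
CONFIG_FILE=<YOUR_PATH>/torchtitan/experiments/transformers_modeling_backend/configs/qwen3.toml \
./run_train.sh \
  --job.custom_config_module=torchtitan.experiments.transformers_modeling_backend.job_config \
  --model.name transformers_modeling_backend \
  --hf_transformers.model Qwen/Qwen3-4B-Instruct-2507 \
  --compile.enable
```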
```diff
@@ -20,7 +20,7 @@ save_tb_folder = "tb"
 enable_wandb = false
 
 [model]
-name = "transformers_backend"
+name = "transformers_modeling_backend"
 flavor = "debugmodel"
 # test folder with tokenizer.json, for debug purpose only
 hf_assets_path = "./tests/assets/tokenizer"
```
```diff
@@ -20,7 +20,7 @@ save_tb_folder = "tb"
 enable_wandb = false
 
 [model]
-name = "transformers_backend"
+name = "transformers_modeling_backend"
 flavor = "full"
 # test folder with tokenizer.json, for debug purpose only
 hf_assets_path = "./tests/assets/tokenizer"
```
```diff
@@ -22,7 +22,7 @@
 from torchtitan.distributed.activation_checkpoint import apply_ac
 
 from torchtitan.distributed.tensor_parallel import maybe_enable_async_tp
-from torchtitan.experiments.transformers_backend.job_config import JobConfig
+from torchtitan.experiments.transformers_modeling_backend.job_config import JobConfig
 from torchtitan.models.llama3.infra.parallelize import apply_compile, apply_ddp
 from torchtitan.tools.logging import logger
```
```diff
@@ -21,7 +21,7 @@
 from torchtitan.components.loss import LossFunction
 from torchtitan.distributed import ParallelDims
 from torchtitan.distributed.pipeline_parallel import build_pipeline_schedule
-from torchtitan.experiments.transformers_backend.job_config import JobConfig
+from torchtitan.experiments.transformers_modeling_backend.job_config import JobConfig
 from torchtitan.protocols.train_spec import BaseModelArgs, ParallelizeFunction
 from torchtitan.tools.logging import logger
```
```diff
@@ -11,7 +11,7 @@
 from tests.integration_tests.run_tests import run_tests
 
 
-def build_transformers_backend_test_list() -> list[OverrideDefinitions]:
+def build_transformers_modeling_backend_test_list() -> list[OverrideDefinitions]:
     """
     key is the config file name and value is a list of OverrideDefinitions
     that is used to generate variations of integration tests based on the
@@ -21,8 +21,8 @@ def build_transformers_backend_test_list() -> list[OverrideDefinitions]:
         OverrideDefinitions(
             [
                 [
-                    "--model.name transformers_backend",
-                    "--job.custom_config_module=torchtitan.experiments.transformers_backend.job_config",
+                    "--model.name transformers_modeling_backend",
+                    "--job.custom_config_module=torchtitan.experiments.transformers_modeling_backend.job_config",
                     "--hf_transformers.model Qwen/Qwen2.5-7B",
                     "--parallelism.data_parallel_shard_degree 2",
                     "--parallelism.tensor_parallel_degree 2",
@@ -31,15 +31,15 @@
                 ],
             ],
             "Transformers Backend FSDP+TP+PP",
-            "transformers_backend_fsdp+tp+pp",
+            "transformers_modeling_backend_fsdp+tp+pp",
             ngpu=8,
         ),
     ]
     return integration_tests_flavors
 
 
 _TEST_SUITES_FUNCTION = {
-    "transformers_backend": build_transformers_backend_test_list,
+    "transformers_modeling_backend": build_transformers_modeling_backend_test_list,
 }
@@ -64,7 +64,7 @@ def main():
     if os.listdir(args.output_dir):
         raise RuntimeError("Please provide an empty output directory.")
 
-    test_list = _TEST_SUITES_FUNCTION["transformers_backend"]()
+    test_list = _TEST_SUITES_FUNCTION["transformers_modeling_backend"]()
     run_tests(args, test_list)
```
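For reference, this test module is exactly what the CI workflow above invokes; it can be run locally the same way (assuming 8 GPUs and an empty output directory, per the `RuntimeError` check in `main()`):

```sh
# Same invocation as the 8 GPU integration test workflow in this PR.
mkdir artifacts-to-be-uploaded
python -m torchtitan.experiments.transformers_modeling_backend.tests.integration_tests artifacts-to-be-uploaded --ngpu 8
```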