
Conversation

@ArthurZucker (Collaborator) commented May 21, 2025

What does this PR do?

This PR re-lands a change that was reverted because it was failing... 711d78d

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker ArthurZucker enabled auto-merge (squash) May 21, 2025 08:16
@ArthurZucker ArthurZucker disabled auto-merge May 21, 2025 08:22
@ArthurZucker ArthurZucker merged commit e288ee0 into main May 21, 2025
21 checks passed
@ArthurZucker ArthurZucker deleted the nouamane/nanotron branch May 21, 2025 08:22
Member

yeah!

Cyrilvallez pushed a commit that referenced this pull request May 21, 2025
* accept custom device_mesh
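
  A minimal sketch of what this entry point looks like from the user side, assuming torch's public `init_device_mesh` API; passing the mesh through a `device_mesh` keyword on `from_pretrained` is inferred from the commit message, not a confirmed signature:

  ```python
  # Sketch only: build a custom 2D mesh (data parallel x tensor parallel)
  # and hand its TP slice to transformers. Run under torchrun, e.g.
  #   torchrun --nproc-per-node 4 script.py
  from torch.distributed.device_mesh import init_device_mesh
  from transformers import AutoModelForCausalLM

  mesh = init_device_mesh("cuda", (2, 2), mesh_dim_names=("dp", "tp"))

  model = AutoModelForCausalLM.from_pretrained(
      "meta-llama/Llama-3.2-1B",  # any checkpoint with a tensor-parallel plan
      tp_plan="auto",             # use the model's built-in TP plan
      device_mesh=mesh["tp"],     # assumed kwarg, per "accept custom device_mesh"
  )
  ```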

* fix device_map

* assert that num_heads % tp_size == 0
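
  The guard itself is simple; an illustrative version (names are mine, not the actual transformers code):

  ```python
  def check_tp_divisibility(num_attention_heads: int, tp_size: int) -> None:
      # Column-sharding attention splits heads across TP ranks, so each rank
      # must own a whole number of heads.
      if num_attention_heads % tp_size != 0:
          raise ValueError(
              f"num_attention_heads={num_attention_heads} must be divisible "
              f"by tp_size={tp_size}"
          )

  check_tp_divisibility(32, 4)  # ok: 8 heads per rank
  check_tp_divisibility(30, 4)  # raises ValueError
  ```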

* todo.

* ReplicateParallel

* handle tied weights

* handle dtensor in save_pretrained with safe_serialization
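
  The issue behind this commit: safetensors only serializes plain tensors, so sharded `DTensor` parameters have to be gathered first. A rough sketch under that assumption (`densify_state_dict` is an illustrative helper, not the actual code path):

  ```python
  import torch
  from torch.distributed.tensor import DTensor  # torch >= 2.4 import path
  from safetensors.torch import save_file

  def densify_state_dict(state_dict: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
      dense = {}
      for name, tensor in state_dict.items():
          if isinstance(tensor, DTensor):
              tensor = tensor.full_tensor()  # all-gather shards into one tensor
          dense[name] = tensor.contiguous()
      return dense

  # e.g. save_file(densify_state_dict(model.state_dict()), "model.safetensors")
  ```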

* tp test works

* doesn't work

* fix: shard_and_distribute_module's rank should be local_rank

* tp=4 is correct

* dp+tp is broken

* todo: allreduce with dtensors on another dim is annoying

* workaround to sync dp grads when using dtensors
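
  An illustrative version of such a workaround, assuming the gradients are `DTensor`s sharded along the TP mesh dim and the mesh has a named "dp" dim; a sketch in the spirit of the commit, not the actual implementation:

  ```python
  import torch.distributed as dist
  from torch.distributed.tensor import DTensor

  def sync_dp_grads(model, device_mesh):
      # All-reducing a DTensor across a mesh dim it is not sharded on is the
      # annoying part; averaging the *local* shards over the DP process group
      # sidesteps it.
      dp_group = device_mesh.get_group("dp")
      for param in model.parameters():
          if param.grad is None:
              continue
          grad = param.grad
          local = grad.to_local() if isinstance(grad, DTensor) else grad
          dist.all_reduce(local, op=dist.ReduceOp.AVG, group=dp_group)  # in place
  ```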

* loading a checkpoint works

* wandb and compare losses with different tp/dp

* cleaning

* cleaning

* .

* .

* logs

* CP2 DP2 no mask works after commenting out attn_mask and is_causal from scaled_dot_product_attention

* DP=2 TP=2 now works even with tied embeddings

* model.parameters() and model.module.parameters() are empty...

* reformat sanity_check_tensor_sync

* set atol=1e-4 for CP to pass

* try populate _parameters from named_modules
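
  A rough sketch of that workaround (illustrative helper, not the committed code): walk `named_modules()` and read each module's `_parameters` dict directly, since the usual `model.parameters()` traversal was coming back empty:

  ```python
  def collect_parameters(model):
      params = {}
      for module_name, module in model.named_modules():
          for param_name, param in module._parameters.items():
              if param is not None:
                  key = f"{module_name}.{param_name}" if module_name else param_name
                  params[key] = param
      return params
  ```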

* refactors
TP2 DP2 works
CP2 DP2 works

* is_causal=True and pack sequences, no attn mask, and preshuffle dataset
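
  A minimal packing sketch along these lines (purely illustrative): concatenate tokenized examples into fixed-length rows so no padding or attention mask is needed, and let `is_causal=True` handle masking:

  ```python
  import torch

  def pack_sequences(token_ids: list[list[int]], seq_len: int) -> torch.Tensor:
      flat = [tok for seq in token_ids for tok in seq]
      n_rows = len(flat) // seq_len
      return torch.tensor(flat[: n_rows * seq_len]).view(n_rows, seq_len)

  batch = pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9]], seq_len=4)
  input_ids, labels = batch[:, :-1], batch[:, 1:]  # standard next-token targets
  ```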

* fix packing

* CP=4 doesn't work

* fix labels and position_ids for CP
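
  The pitfall, sketched under assumed shapes: with context parallelism each rank owns a slice of the sequence, so labels must be shifted globally *before* slicing, and position_ids must stay global rather than restarting at 0 on each rank (illustrative, not the committed fix):

  ```python
  import torch

  def shard_for_cp(input_ids: torch.Tensor, cp_rank: int, cp_size: int):
      bs, seq_len = input_ids.shape
      # Global next-token shift; the wrapped-around final target would be
      # masked out in real training.
      labels = torch.roll(input_ids, shifts=-1, dims=1)
      position_ids = torch.arange(seq_len).expand(bs, -1)  # global positions
      chunk = seq_len // cp_size
      sl = slice(cp_rank * chunk, (cp_rank + 1) * chunk)
      return input_ids[:, sl], labels[:, sl], position_ids[:, sl]
  ```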

* DP CP works with transformers 🥳🥳🥳

* refactor

* add example cp

* fixup

* revert sdpa changes

* example cleared

* add CP, DP to the mesh init
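
  A sketch of what a 3D mesh init like this can look like with torch's public API (sizes and dim names are assumptions):

  ```python
  from torch.distributed.device_mesh import init_device_mesh

  # 8 GPUs as DP=2 x CP=2 x TP=2, e.g. torchrun --nproc-per-node 8
  mesh = init_device_mesh("cuda", (2, 2, 2), mesh_dim_names=("dp", "cp", "tp"))
  dp_mesh, cp_mesh, tp_mesh = mesh["dp"], mesh["cp"], mesh["tp"]
  ```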

* nit

* clean

* use `ALL_PARALLEL_STYLES`

* style

* FSDP works

* log on 1 rank

* .

* fix?

* FSDP1 also has .parameters() bug

* reported gradnorm when using FSDP1 is wrong, but loss is correct so it's okay

* .

* style and fixup

* move stuff around

* fix tests

* style

* let's make it a check

* add missing licences

* warning should be an info

* tp plan should not be NONE

* test all

* god damn it

* test all

---------

Co-authored-by: nouamanetazi <nouamane98@gmail.com>