test tensor parallel: make tests for dense model more robust #41968

3outeille · 2025-10-31T15:26:08Z

The previous tests were not robust enough. The new tests compare:

Non-TP model against TP model (in both eval and train modes), in forward and in forward + backward
Non-TP model against TP model (in both eval and train modes), in forward + compile
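As a rough illustration of the kind of check described above, here is a minimal single-process sketch. It is not the test code from this PR: it simulates tensor parallelism by sharding the weights of a tiny two-layer MLP by hand, and the helper names (`dense_forward`, `tp_forward`), shapes, and `world_size` are all hypothetical. It compares the sharded forward output and gradients against the unsharded reference.

```python
import torch

torch.manual_seed(0)


def dense_forward(x, w1, w2):
    # Reference (non-TP) two-layer MLP: linear -> ReLU -> linear.
    return torch.relu(x @ w1) @ w2


def tp_forward(x, w1_shards, w2_shards):
    # Simulated tensor parallelism in a single process (assumption: no real
    # process group). w1 is split by columns, w2 by rows; each "rank" produces
    # a partial output, and the sum stands in for the all-reduce.
    partials = [torch.relu(x @ a) @ b for a, b in zip(w1_shards, w2_shards)]
    return torch.stack(partials).sum(dim=0)


world_size = 2  # hypothetical number of TP ranks
x = torch.randn(4, 8, dtype=torch.float64)
w1 = torch.randn(8, 16, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(16, 8, dtype=torch.float64, requires_grad=True)

# Each shard is an independent leaf tensor, like a per-rank parameter.
w1_shards = [s.detach().clone().requires_grad_(True) for s in w1.chunk(world_size, dim=1)]
w2_shards = [s.detach().clone().requires_grad_(True) for s in w2.chunk(world_size, dim=0)]

# Forward: the sharded output should match the unsharded output.
out_ref = dense_forward(x, w1, w2)
out_tp = tp_forward(x, w1_shards, w2_shards)
torch.testing.assert_close(out_tp, out_ref)

# Backward: reassembled shard gradients should match the full gradients.
out_ref.sum().backward()
out_tp.sum().backward()
grad_w1_tp = torch.cat([s.grad for s in w1_shards], dim=1)
grad_w2_tp = torch.cat([s.grad for s in w2_shards], dim=0)
torch.testing.assert_close(grad_w1_tp, w1.grad)
torch.testing.assert_close(grad_w2_tp, w2.grad)
```

A real multi-process test would presumably replace the explicit sum with a collective across ranks and load an actual checkpoint; the comparison logic stays the same.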

HuggingFaceDocBuilderDev · 2025-10-31T16:04:55Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker · 2025-11-03T10:22:22Z

tests/tensor_parallel/test_tensor_parallel.py

+    # Load non-TP model and move to same device as TP model
+    device = model_tp.device
+    model = AutoModelForCausalLM.from_pretrained(model_id, dtype="auto")
+    model = model.to(device)


Ah, no `device_map="auto"` because you always run with torchrun?

I run with pytest, which calls `torch.multiprocessing.spawn`; that is pretty much like running with torchrun. I don't want to set `device_map="auto"` because I already move the model to a specific device manually right after loading it.


github-actions · 2025-11-03T12:42:09Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: fastspeech2_conformer, parakeet

…face#41968)

* make test forward and backward more robust
* refactor compile part of test tensor parallel
* linting
* pass rank around instead of calling it over and over
* Run slow v2 (huggingface#41914)
* Super
* Super
* Super
* Super
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix `detectron2` installation in docker files (huggingface#41975)
* detectron2 - part 1
* detectron2 - part 2
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix `autoawq[kernels]` installation in quantization docker file (huggingface#41978)
fix autoawq[kernels]
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* add support for saving encoder only so any parakeet model can be loaded for inference (huggingface#41969)
* add support for saving encoder only so any decoder model can be loaded
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* use convolution_bias
* convert modular
* convolution_bias in convertion script
---------
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
---------
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>

3outeille added 2 commits October 31, 2025 15:50

make test forward and backward more robust

302ab8a

refactor compile part of test tensor parallel

cd032c1

3outeille force-pushed the v4.57.1-test_tensor_parallel branch from daf1f25 to cd032c1 Compare October 31, 2025 15:50

linting

68501dd

3outeille changed the title ~~V4.57.1 test tensor parallel: make them more robust~~ test tensor parallel: make tests for dense model more robust Nov 1, 2025

pass rank around instead of calling it over and over

b57731b

3outeille requested review from ArthurZucker and Cyrilvallez November 1, 2025 13:12

ArthurZucker approved these changes Nov 3, 2025

ydshieh and others added 4 commits November 3, 2025 12:40

Run slow v2 (#41914)

ddeeea8

* Super * Super * Super * Super --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Fix detectron2 installation in docker files (#41975)

7d29ac3

* detectron2 - part 1 * detectron2 - part 2 --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Fix autoawq[kernels] installation in quantization docker file (#41978)

c921f53

fix autoawq[kernels] Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

3outeille force-pushed the v4.57.1-test_tensor_parallel branch from dea23c9 to 2d435ba Compare November 3, 2025 12:41

Merge branch 'main' into v4.57.1-test_tensor_parallel

fc550b2

3outeille merged commit b433ec8 into main Nov 3, 2025
16 checks passed

3outeille deleted the v4.57.1-test_tensor_parallel branch November 3, 2025 12:56

6 participants
