Tensor Parallelism Training and Inference #2445

vince62s · 2023-07-19T14:46:40Z

The goal is to use torch.distributed and split modules where possible (along columns or rows) so that larger models can be sharded across multiple GPUs.
New option: parallel_mode, which defaults to "data_parallel"; set it to "tensor_parallel" to enable tensor parallelism.
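
To make the column/row split concrete, here is a minimal sketch of the idea, not the code from this PR. It assumes plain Python lists stand in for tensors and a loop over ranks stands in for separate GPU processes; the helper names (`matmul`, `column_parallel_linear`, `row_parallel_linear`) are hypothetical and exist only for illustration. In a real run the final concatenation and summation would be collective operations (all-gather and all-reduce) over torch.distributed.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def column_parallel_linear(x, w, world_size):
    """Split w along its columns: each rank computes a slice of the output."""
    cols = len(w[0]) // world_size
    per_rank = []
    for rank in range(world_size):  # each iteration mimics one GPU
        w_shard = [row[rank * cols:(rank + 1) * cols] for row in w]
        per_rank.append(matmul(x, w_shard))
    # Stand-in for an all-gather: concatenate the output slices column-wise.
    return [[v for out in per_rank for v in out[i]] for i in range(len(x))]


def row_parallel_linear(x, w, world_size):
    """Split w along its rows (and x along its columns): each rank
    computes a partial sum of the full output."""
    rows = len(w) // world_size
    partials = []
    for rank in range(world_size):  # each iteration mimics one GPU
        lo, hi = rank * rows, (rank + 1) * rows
        w_shard = w[lo:hi]
        x_shard = [row[lo:hi] for row in x]
        partials.append(matmul(x_shard, w_shard))
    # Stand-in for an all-reduce: sum the partial outputs element-wise.
    return [
        [sum(p[i][j] for p in partials) for j in range(len(w[0]))]
        for i in range(len(x))
    ]


# Toy input (2 x 4) and weight (4 x 4), split across 2 simulated ranks.
x = [[1, 2, 3, 4], [5, 6, 7, 8]]
w = [[1, 0, 2, 1], [0, 1, 1, 2], [3, 1, 0, 0], [1, 2, 1, 1]]

assert column_parallel_linear(x, w, 2) == matmul(x, w)
assert row_parallel_linear(x, w, 2) == matmul(x, w)
```

Both splits reproduce the unsharded result, but each simulated rank only holds half of the weight matrix, which is what lets a model too large for one GPU fit across several.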

eval_llm/MMLU/run_mmlu_opennmt.py

onmt/model_builder.py

onmt/models/model.py

onmt/modules/multi_headed_attn.py

onmt/utils/distributed.py

francoishernandez · 2023-08-01T14:55:33Z

eval_llm/MMLU/run_mmlu_opennmt.py

@@ -7,13 +7,14 @@
 import time
 import pandas as pd
 from onmt.utils.logging import init_logger
-from onmt.utils.distributed import MPInference
+from onmt.inference_engine import InferenceEngine


I think we're missing the new inference_engine file in this commit 🙃

Tensor Parralelism Training and Inference

fa592e7

vince62s changed the title ~~[WIP] Tensor Parralelism Training and Inference~~ [WIP] Tensor Parallelism Training and Inference Jul 19, 2023

francoishernandez reviewed Jul 19, 2023

eval_llm/MMLU/run_mmlu_opennmt.py

onmt/model_builder.py (outdated)

onmt/models/model.py

onmt/modules/multi_headed_attn.py

onmt/utils/distributed.py (outdated)

vince62s changed the title ~~[WIP] Tensor Parallelism Training and Inference~~ Tensor Parallelism Training and Inference Jul 19, 2023

vince62s added 7 commits July 19, 2023 22:27

fix comments

bfbb42d

hotfix

81baffb

Merge branch 'master' into tp

c51e1dd

Merge branch 'master' into tp

f37a58d

better code and results

4fedfaa

remove comment

8f1eadb

results 70B

7165e92

vince62s merged commit 4958bb1 into OpenNMT:master Jul 22, 2023

vince62s deleted the tp branch July 26, 2023 15:20

francoishernandez reviewed Aug 1, 2023
