[FIX] Fix multi-modal training #648

lianqing11 · 2023-09-21T02:47:59Z

support multi-gpu training for multi-modal training
Support Deepspeed's "zero3" training mode for the model in hf_encoder_decoder.py

research4pan

Added features:

Deepspeed zero3 support
Add alignment for data type/weight type, converting them automatically to match each other (data type will be converted to the same types as weights)
Add alignment for data device/weight device, data devices will be converted to the same device as weights. This enables multi-gpu features.
Add an improved dataset data/llava_instruct_80k_truncated.json, which removes very-long-sentence samples from the original dataset.

research4pan

LGTM, thanks!

lianqing11 added 3 commits September 20, 2023 20:17

support low resource inference and training for multimodal model

45b9f47

fix multigpu finetune and zero3 training

fbee26f

remove script

6c015d9

research4pan reviewed Sep 21, 2023

View reviewed changes

lianqing11 added 2 commits September 21, 2023 17:12

update preprocess script and the path to save language projection

3a1fdef

remove zero3 script

19f4ac0

research4pan approved these changes Sep 21, 2023

View reviewed changes

research4pan merged commit c6b2f14 into main Sep 21, 2023
2 checks passed

research4pan deleted the lianqing/multi_modal_training branch March 31, 2024 09:41

Provide feedback