When I use flux_train.py for a full FLUX fine-tune across multiple GPUs with --optimizer_type adamw8bit and --batch_size 1, the run always hits OOM. However, single-GPU training works fine with --optimizer_type adamw8bit and --batch_size 8, using almost 79 GB of GPU memory. How can I fix the multi-GPU OOM problem? Thanks for your reply~
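
For reference, here is roughly how the two runs are launched. This is only a sketch, not my exact command: the process count, and all model/dataset arguments are placeholders I've omitted; the only flags shown are the ones quoted above.

```bash
# Multi-GPU run that OOMs (process count is a placeholder; other args omitted):
accelerate launch --multi_gpu --num_processes 2 flux_train.py \
  --optimizer_type adamw8bit --batch_size 1  # ...plus the usual model/dataset args

# Single-GPU run that fits in ~79 GB per the report above:
accelerate launch --num_processes 1 flux_train.py \
  --optimizer_type adamw8bit --batch_size 8  # ...plus the usual model/dataset args
```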