
Increased VRAM consumption when coupled with DDP #1555

Open
@TParcollet

Description


System Info

Hello there,

I'm fine-tuning a Llama 3 model from HuggingFace with PEFT and BitsAndBytes. Interestingly, when wrapping the model with DDP, the training ends up taking more VRAM on the master GPU. Even more interestingly, this extra VRAM usage grows with the number of GPUs. Do you see any reason why this could happen?
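
Roughly, the setup looks like this (a simplified sketch, not my exact script; the model id and LoRA hyperparameters are placeholders, and the job is launched with torchrun so `LOCAL_RANK` is set):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

local_rank = int(os.environ.get("LOCAL_RANK", 0))
dist.init_process_group(backend="nccl")
torch.cuda.set_device(local_rank)

# 4-bit quantization via BitsAndBytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",   # placeholder model id
    quantization_config=bnb_config,
    device_map={"": local_rank},    # keep each replica on its own GPU
)

# LoRA adapters on top of the quantized base model
lora_config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

# Wrapping with DDP is where the extra VRAM on the master GPU shows up
model = DDP(model, device_ids=[local_rank])
```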

Reproduction

Not easy to reproduce.

Expected behavior

VRAM consumption is constant w.r.t. the number of DDP processes.
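
One way to check this is to log per-rank memory right after the DDP wrap and compare runs with different world sizes; if consumption were independent of the number of processes, rank 0's numbers should not grow as GPUs are added. A sketch (again assuming torchrun sets `LOCAL_RANK`):

```python
import os
import torch
import torch.distributed as dist

local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.synchronize(local_rank)

# Memory currently held by tensors vs. reserved by the caching allocator
allocated = torch.cuda.memory_allocated(local_rank) / 2**20
reserved = torch.cuda.memory_reserved(local_rank) / 2**20
rank = dist.get_rank() if dist.is_initialized() else 0
print(f"rank {rank}: allocated={allocated:.0f} MiB, reserved={reserved:.0f} MiB")
```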
