
4-bit Adam should support non-constant lr #730

Closed
@msaroufim

Description

Our low-bit optimizers were merged into Hugging Face in huggingface/transformers#31865, but we have a known limitation: the 4-bit optimizer does not work well when the learning rate is not constant.
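To make the scenario concrete, here is a rough sketch of the kind of training loop that hits this. The optimizer class name `AdamW4bit` and its import path are assumed from the README linked below and may differ between versions; treat this as illustrative rather than authoritative:

```python
import torch
# 4-bit AdamW from torchao's low-bit optimizer prototype; the class name is
# assumed from the README linked below and may differ between versions.
from torchao.prototype.low_bit_optim import AdamW4bit

model = torch.nn.Linear(1024, 1024, device="cuda")
optim = AdamW4bit(model.parameters(), lr=1e-3)
# A scheduler like this writes a new Python float into param_groups[0]["lr"]
# every step, which is exactly the non-constant-lr case described below.
sched = torch.optim.lr_scheduler.CosineAnnealingLR(optim, T_max=1_000)

for _ in range(10):
    loss = model(torch.randn(8, 1024, device="cuda")).sum()
    loss.backward()
    optim.step()
    optim.zero_grad()
    sched.step()
```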

This is mentioned in the README: https://github.com/pytorch/ao/tree/main/torchao/prototype/low_bit_optim

> Known issue: When learning rate is updated every step (e.g. using cosine learning rate scheduler), training speed is slower. This is because we have to convert learning rate to a CUDA tensor (which incurs expensive memory transfer cost), since torch.compile() will treat a Python float as a constant and trigger recompile whenever the value is changed.
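A minimal sketch of the underlying behavior and one possible workaround. This is my illustration, not code from the README; `compiled_step` is a toy stand-in for the real fused 4-bit update:

```python
import torch

@torch.compile
def compiled_step(param, exp_avg, grad, lr):
    # Toy single-tensor update standing in for the real 4-bit Adam kernel.
    exp_avg.mul_(0.9).add_(grad, alpha=0.1)
    param.sub_(exp_avg * lr)

param = torch.randn(16, device="cuda")
grad = torch.randn_like(param)
exp_avg = torch.zeros_like(param)

# Case 1: lr is a Python float. torch.compile treats it as a constant, so a
# scheduler that changes it every step triggers recompiles (or forces the
# optimizer to copy the new value into a CUDA tensor each step).
for step in range(3):
    compiled_step(param, exp_avg, grad, 1e-3 * 0.9**step)

# Case 2 (possible workaround): keep lr as a 0-dim CUDA tensor and update it
# in place. The compiled graph sees the same tensor every step, so there is
# no recompile and no per-step host-to-device transfer of the lr value.
lr = torch.tensor(1e-3, device="cuda")
for step in range(3):
    lr.fill_(1e-3 * 0.9**step)
    compiled_step(param, exp_avg, grad, lr)
```

If I read the README note correctly, the ask here is essentially for the optimizer to maintain (or accept) a tensor lr that schedulers can update in place, rather than re-transferring a Python float on every step.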

However, this limitation is preventing @winglian from adopting this work.

cc @gau-nernst @mlazos @janeyx99
