Our low-bit optimizers were merged into HF in huggingface/transformers#31865, but we have a known limitation: the 4-bit optimizer trains more slowly when the learning rate is not constant.
This is mentioned in the README https://github.com/pytorch/ao/tree/main/torchao/prototype/low_bit_optim:

> Known issue: When learning rate is updated every step (e.g. using cosine learning rate scheduler), training speed is slower. This is because we have to convert learning rate to a CUDA tensor (which incurs expensive memory transfer cost), since torch.compile() will treat a Python float as a constant and trigger recompile whenever the value is changed.
However, this is preventing @winglian from adopting this work.
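
For context, here is a minimal sketch of the mechanism described in the README note. The `apply_update` function is hypothetical (not the torchao optimizer step): it just illustrates that torch.compile specializes on a Python-float learning rate and recompiles when the value changes, whereas keeping the learning rate in a tensor avoids recompiles at the cost of a memory transfer per update.

```python
import math

import torch


@torch.compile
def apply_update(param, grad, lr):
    # `lr` may be a Python float (torch.compile specializes on the value and
    # recompiles when it changes) or a 0-dim tensor (compiled once).
    param.sub_(lr * grad)


device = "cuda" if torch.cuda.is_available() else "cpu"
param = torch.randn(1024, device=device)
grad = torch.randn_like(param)

# Slow pattern: a Python-float lr that changes every step (e.g. a cosine
# schedule) triggers a recompile for each new value.
for step in range(5):
    lr = 0.01 * (1 + math.cos(math.pi * step / 5)) / 2
    apply_update(param, grad, lr)

# Pattern that avoids recompiles: keep lr in a tensor on the same device and
# write each new value into it in-place. No recompile, but every update incurs
# a host-to-device memory transfer when running on CUDA.
lr_tensor = torch.tensor(0.01, device=device)
for step in range(5):
    lr_tensor.fill_(0.01 * (1 + math.cos(math.pi * step / 5)) / 2)
    apply_update(param, grad, lr_tensor)
```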