
CPUOffloadOptimizer issues #1209

Open

Description

@felipemello1

Hi all, I was giving the CPUOffloadOptimizer a try and found two issues when using it with QLoRA single device in torchtune:

  1. When using an LR scheduler I got the error below. Maybe there is a way to inherit from the Optimizer class? (A minimal repro sketch is included after the tracebacks.)
File "/data/users/felipemello/torchtune/torchtune/training/lr_schedulers.py", line 58, in get_cosine_schedule_with_warmup
    return LambdaLR(optimizer, lr_lambda, last_epoch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/felipemello/.conda/envs/torchtune/lib/python3.11/site-packages/torch/optim/lr_scheduler.py", line 336, in __init__
    super().__init__(optimizer, last_epoch, verbose)
  File "/home/felipemello/.conda/envs/torchtune/lib/python3.11/site-packages/torch/optim/lr_scheduler.py", line 99, in __init__
    raise TypeError(f"{type(optimizer).__name__} is not an Optimizer")
TypeError: CPUOffloadOptimizer is not an Optimizer
  2. When passing model.parameters() I got the error below. I imagine that a simple fix is to keep only the params that require grad, like the AdamW implementation does. (A possible workaround sketch also follows the tracebacks.)
  File "/home/felipemello/.conda/envs/torchtune/lib/python3.11/site-packages/torchao/prototype/low_bit_optim/cpu_offload.py", line 76, in __init__
    p_cuda.register_post_accumulate_grad_hook(backward_hook)
  File "/home/felipemello/.conda/envs/torchtune/lib/python3.11/site-packages/torch/_tensor.py", line 678, in register_post_accumulate_grad_hook
    raise RuntimeError(
RuntimeError: cannot register a hook on a tensor that doesn't require gradient
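For reference, a minimal sketch of how the first error is triggered. The model and hyperparameters here are placeholders, not the actual torchtune setup, and this assumes CPUOffloadOptimizer is imported from torchao.prototype.low_bit_optim:

```python
import torch
from torch.optim.lr_scheduler import LambdaLR
from torchao.prototype.low_bit_optim import CPUOffloadOptimizer

# placeholder model, just to have some CUDA params to offload
model = torch.nn.Linear(16, 16).cuda()

opt = CPUOffloadOptimizer(model.parameters(), torch.optim.AdamW, lr=1e-4)

# torch's LRScheduler.__init__ checks isinstance(optimizer, torch.optim.Optimizer)
# and raises TypeError otherwise; CPUOffloadOptimizer does not subclass
# torch.optim.Optimizer, hence "CPUOffloadOptimizer is not an Optimizer".
scheduler = LambdaLR(opt, lr_lambda=lambda step: 1.0)
```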

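And a possible user-side workaround sketch for the second error, assuming filtering at the call site is acceptable until it is handled inside CPUOffloadOptimizer itself (the frozen/trainable split below is a stand-in for a QLoRA-style model):

```python
import torch
from torchao.prototype.low_bit_optim import CPUOffloadOptimizer

# placeholder model: first layer frozen, second layer trainable
model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.Linear(16, 16)).cuda()
model[0].weight.requires_grad_(False)
model[0].bias.requires_grad_(False)

# keep only params that require grad, so CPUOffloadOptimizer never calls
# register_post_accumulate_grad_hook on a frozen tensor
trainable_params = [p for p in model.parameters() if p.requires_grad]
opt = CPUOffloadOptimizer(trainable_params, torch.optim.AdamW, lr=1e-4)
```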
cc: @gau-nernst
