Hi all, I was giving the CPUOffloadOptimizer a try and found two issues when using it with single-device QLoRA in torchtune:
- When using an LR scheduler I got the error below. Maybe there is a way to make CPUOffloadOptimizer inherit from torch.optim.Optimizer? (A minimal repro sketch follows the traceback.)
File "/data/users/felipemello/torchtune/torchtune/training/lr_schedulers.py", line 58, in get_cosine_schedule_with_warmup
return LambdaLR(optimizer, lr_lambda, last_epoch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/felipemello/.conda/envs/torchtune/lib/python3.11/site-packages/torch/optim/lr_scheduler.py", line 336, in __init__
super().__init__(optimizer, last_epoch, verbose)
File "/home/felipemello/.conda/envs/torchtune/lib/python3.11/site-packages/torch/optim/lr_scheduler.py", line 99, in __init__
raise TypeError(f"{type(optimizer).__name__} is not an Optimizer")
TypeError: CPUOffloadOptimizer is not an Optimizer
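For reference, a minimal sketch of how this can be reproduced outside of torchtune. The `nn.Linear` model, learning rate, and lambda are placeholders rather than the actual recipe setup, and it assumes the prototype `CPUOffloadOptimizer(params, optimizer_class, **kwargs)` signature:

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import LambdaLR
from torchao.prototype.low_bit_optim import CPUOffloadOptimizer

# Toy stand-in for the fine-tuned model; params must live on CUDA.
model = nn.Linear(16, 16, device="cuda")
optimizer = CPUOffloadOptimizer(model.parameters(), torch.optim.AdamW, lr=1e-4)

# LambdaLR (like every torch.optim scheduler) checks
# isinstance(optimizer, torch.optim.Optimizer), so this raises
# "TypeError: CPUOffloadOptimizer is not an Optimizer".
scheduler = LambdaLR(optimizer, lr_lambda=lambda step: 1.0)
```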
- When passing model.parameters() I got the error below. I imagine a simple fix is to keep only the params that require grad, like the AdamW implementation does. (A sketch of the failure and a possible workaround follows the traceback.)
File "/home/felipemello/.conda/envs/torchtune/lib/python3.11/site-packages/torchao/prototype/low_bit_optim/cpu_offload.py", line 76, in __init__
p_cuda.register_post_accumulate_grad_hook(backward_hook)
File "/home/felipemello/.conda/envs/torchtune/lib/python3.11/site-packages/torch/_tensor.py", line 678, in register_post_accumulate_grad_hook
raise RuntimeError(
RuntimeError: cannot register a hook on a tensor that doesn't require gradient
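A sketch of the second error plus the filtering workaround I have in mind. The two-layer toy model with one frozen weight is just a stand-in for the QLoRA model, where the quantized base weights have `requires_grad=False`:

```python
import torch
import torch.nn as nn
from torchao.prototype.low_bit_optim import CPUOffloadOptimizer

model = nn.Sequential(nn.Linear(16, 16), nn.Linear(16, 16)).cuda()
model[0].weight.requires_grad_(False)  # stand-in for a frozen (quantized) base weight

# Passing every parameter fails on the frozen one:
# "RuntimeError: cannot register a hook on a tensor that doesn't require gradient"
try:
    CPUOffloadOptimizer(model.parameters(), torch.optim.AdamW, lr=1e-4)
except RuntimeError as e:
    print(e)

# Workaround until the optimizer filters internally: pass only trainable params.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = CPUOffloadOptimizer(trainable, torch.optim.AdamW, lr=1e-4)
```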
cc: @gau-nernst