NF4 quantization slower on 0.3 vs 0.1 #642

@ebsmothers

Hi, we're observing a slowdown in our torchtune QLoRA recipe initialization after upgrading torchao from version 0.1 to 0.3 (I haven't checked 0.4 yet but will do so shortly). This was first pointed out in pytorch/torchtune#1246, and I believe the cause is some changes in torchao.

Repro: from a torchtune git install

# Just some commit hash from right before we upgraded to 0.3
git checkout 52e328337579e9b84ba7f2448b29a6de7c5d8db3
pip install torchao==0.1

# Save time.perf_counter() on init and then log the delta with perf_counter()
# here: https://github.com/pytorch/torchtune/blob/0a407712eda252573326074d33af0a66c2d2990e/recipes/lora_finetune_single_device.py#L539
tune run lora_finetune_single_device --config llama3/8B_qlora_single_device
>>> 15.1960636760341

# Do the same on 0.3
pip install torchao==0.3
# also need to comment some quant APIs out to fix import errors
tune run lora_finetune_single_device --config llama3/8B_qlora_single_device
>>> 95.78260190901347
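
For anyone reproducing this, the timing step in the comments above can be sketched roughly as follows. This is only an illustration of the `time.perf_counter()` delta approach, not the actual change made to the recipe: `log_elapsed` and `fake_setup` are hypothetical names, and `fake_setup` is a stand-in for the recipe's initialization, which is not shown here.

```python
import time
from contextlib import contextmanager


@contextmanager
def log_elapsed(label, sink=print):
    """Log the wall-clock time spent inside the `with` block.

    `log_elapsed` is a hypothetical helper, not part of torchtune or torchao.
    """
    start = time.perf_counter()  # save the counter on entry
    try:
        yield
    finally:
        # log the delta against a second perf_counter() reading
        sink(f"{label}: {time.perf_counter() - start:.4f}s")


def fake_setup():
    # Hypothetical stand-in for the recipe's initialization work.
    time.sleep(0.05)


with log_elapsed("setup"):
    fake_setup()
```

Wrapping the same region under both torchao versions keeps the two measurements comparable, since `perf_counter()` is a monotonic clock intended for measuring short durations.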


Labels: bug (Something isn't working)
