Description
Autoquant fails when CPU-only packages are used. I tried the latest nightly packages, installing torchao and torch with:
pip install --pre torchao-nightly torch --index-url https://download.pytorch.org/whl/nightly/cpu
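As a sanity check (a minimal sketch, assuming only a standard PyTorch install), the CPU-only build can be confirmed like this:

```python
import torch

# A CPU-only nightly typically reports a "+cpu" suffix in its version string
print(torch.__version__)

# False on a CPU-only build, which seems to be what autoquant trips over
print(torch.cuda.is_available())
```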
I adapted the simple example from the documentation to run on CPU:
import torch
import torchao

# Plug in your model and example input
model = torch.nn.Sequential(torch.nn.Linear(32, 64))
example_input = torch.randn(32, 32)

# Perform autoquantization and torch.compile
model = torchao.autoquant(torch.compile(model, mode='max-autotune'))

# Pass in an input, which is used to pick the fastest quantization
# operations and apply torch compilation
model(example_input)
Running the script produces the output below:
activation_shapes: torch.Size([32, 32]), times_seen: 1
weight_shape: torch.Size([64, 32]), dtype: torch.float32, bias_shape: torch.Size([64])
warning: failed to autoquant AQFloatLinearWeight for shape: (torch.Size([32, 32]), torch.Size([64, 32]), torch.Size([64]), torch.float32) due to Torch not compiled with CUDA enabled
warning: failed to autoquant AQWeightOnlyQuantizedLinearWeight for shape: (torch.Size([32, 32]), torch.Size([64, 32]), torch.Size([64]), torch.float32) due to Torch not compiled with CUDA enabled
warning: failed to autoquant AQWeightOnlyQuantizedLinearWeight2 for shape: (torch.Size([32, 32]), torch.Size([64, 32]), torch.Size([64]), torch.float32) due to Torch not compiled with CUDA enabled
warning: failed to autoquant AQInt8DynamicallyQuantizedLinearWeight for shape: (torch.Size([32, 32]), torch.Size([64, 32]), torch.Size([64]), torch.float32) due to Torch not compiled with CUDA enabled
best_cls=<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'>
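For what it's worth, the message in those warnings looks like the standard error PyTorch raises whenever CUDA is initialized on a CPU-only build, which would suggest autoquant's benchmarking path touches CUDA unconditionally (an assumption on my part). A minimal check that reproduces the same message on a CPU-only install:

```python
import torch

# On a CPU-only build, any call that initializes CUDA raises the same
# "Torch not compiled with CUDA enabled" error seen in the warnings above
if not torch.cuda.is_available():
    try:
        torch.cuda.synchronize()
    except (AssertionError, RuntimeError) as e:
        print(e)
```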
I hope I am not missing any steps here. Does autoquant support CPU? Any advice would be appreciated. Thank you.