# make quantize_.set_inductor_config None by default (#1716)

Merged · 1 commit · Feb 14, 2025
**torchao/quantization/README.md** (3 additions, 0 deletions)
@@ -386,6 +386,9 @@ The benchmarks below were run on a single NVIDIA-A6000 GPU.
You can try out these APIs with the `quantize_` API as above, alongside the constructor `codebook_weight_only`; an example can be found in `torchao/_models/llama/generate.py`.

### Automatic Inductor Configuration

:warning: <em>This functionality is being migrated from the top level `quantize_` API to individual workflows, see https://github.com/pytorch/ao/issues/1715 for more details.</em>

The `quantize_` and `autoquant` APIs now automatically use our recommended inductor configuration settings. You can replicate the same settings in your own experiments by calling `torchao.quantization.utils.recommended_inductor_config_setter`. If you wish to disable the recommended settings instead, pass the keyword argument `set_inductor_config=False` to `quantize_` or `autoquant`. You can also overwrite individual settings after they are assigned, as long as you do so before passing any inputs to the torch.compile'd model. This means that older flows which manually set a variety of inductor configurations are now outdated, though continuing to set those same configurations manually is unlikely to cause any issues.
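For illustration, here is a minimal sketch of these options. The toy model is a hypothetical placeholder and `int4_weight_only` is just one example config; only `quantize_`, `set_inductor_config`, and `recommended_inductor_config_setter` come from the text above.

```python
import torch
from torchao.quantization import int4_weight_only, quantize_
from torchao.quantization.utils import recommended_inductor_config_setter

# int4 weight-only quantization expects a CUDA bfloat16 model.
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).cuda().to(torch.bfloat16)

# Option 1 (default): quantize_ applies the recommended inductor settings.
quantize_(model, int4_weight_only(group_size=32))

# Option 2: opt out and keep whatever inductor settings you already have.
# quantize_(model, int4_weight_only(group_size=32), set_inductor_config=False)

# Option 3: apply the same recommended settings manually, e.g. in experiments
# that do not go through quantize_ or autoquant.
recommended_inductor_config_setter()

# Manual overrides are fine as long as they happen before the first call to
# the compiled model.
torch._inductor.config.coordinate_descent_tuning = True  # illustrative override
model = torch.compile(model)
```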

## (To be moved to prototype) A16W4 WeightOnly Quantization with GPTQ
**torchao/quantization/quant_api.py** (11 additions, 2 deletions)
```diff
@@ -488,7 +488,7 @@ def quantize_(
     model: torch.nn.Module,
     config: Union[AOBaseConfig, Callable[[torch.nn.Module], torch.nn.Module]],
     filter_fn: Optional[Callable[[torch.nn.Module, str], bool]] = None,
-    set_inductor_config: bool = True,
+    set_inductor_config: Optional[bool] = None,
     device: Optional[torch.types.Device] = None,
 ):
     """Convert the weight of linear modules in the model with `config`, model is modified inplace
@@ -498,7 +498,7 @@ def quantize_(
         config (Union[AOBaseConfig, Callable[[torch.nn.Module], torch.nn.Module]]): either (1) a workflow configuration object or (2) a function that applies tensor subclass conversion to the weight of a module and returns the module (e.g. converts the weight tensor of a linear layer to an affine quantized tensor). Note: (2) will be deleted in a future release.
         filter_fn (Optional[Callable[[torch.nn.Module, str], bool]]): function that takes an nn.Module instance and the fully qualified name of the module, and returns True if we want to run `config` on
             the weight of the module
-        set_inductor_config (bool, optional): Whether to automatically use recommended inductor config settings (defaults to True)
+        set_inductor_config (Optional[bool]): Whether to automatically use recommended inductor config settings (defaults to None, which is currently treated as True)
         device (device, optional): Device to move module to before applying `filter_fn`. This can be set to `"cuda"` to speed up quantization. The final model will be on the specified `device`.
             Defaults to None (do not change device).
 
@@ -522,6 +522,15 @@ def quantize_(
         quantize_(m, int4_weight_only(group_size=32))
 
     """
+    if set_inductor_config is not None:
+        warnings.warn(
+            """The `set_inductor_config` argument to `quantize_` will be removed in a future release. This functionality is being migrated to individual workflows. Please see https://github.com/pytorch/ao/issues/1715 for more details."""
+        )
+    else:
+        # For now, default to True so that existing behavior is unchanged
+        # when the argument is not specified.
+        set_inductor_config = True
+
     if set_inductor_config:
         torchao.quantization.utils.recommended_inductor_config_setter()
```
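As a sketch of the resulting behavior, the example below exercises both paths. The toy modules are illustrative assumptions, and `int8_weight_only` is an existing torchao config chosen here to keep the example CPU-friendly; the warning text comes from the diff above.

```python
import warnings

import torch
from torchao.quantization import int8_weight_only, quantize_

# Default (argument omitted, i.e. None): the recommended inductor settings
# are still applied and no warning is raised, so existing callers see no
# behavior change.
m = torch.nn.Sequential(torch.nn.Linear(1024, 1024))
quantize_(m, int8_weight_only())

# Passing the argument explicitly still works, but now emits a UserWarning
# pointing to https://github.com/pytorch/ao/issues/1715.
m2 = torch.nn.Sequential(torch.nn.Linear(1024, 1024))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    quantize_(m2, int8_weight_only(), set_inductor_config=False)
assert any("set_inductor_config" in str(w.message) for w in caught)
```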