
Conversation

@codewithdark-git (Owner)

No description provided.

Addresses an AttributeError in AWQ quantization where `QuantizedLinear`, an `nn.Module`, was incorrectly passed to `move_to_device`, which expects a tensor. This change ensures `QuantizedLinear` modules are moved to the target device using the correct `.to(device)` method.
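A minimal sketch of the failure mode and the fix, assuming a PyTorch-style codebase. `move_to_device`, `QuantizedLinear`, and `.to(device)` are named in the commit; the body of the helper and the surrounding code shape are illustrative assumptions.

```python
import torch
import torch.nn as nn

def move_to_device(tensor: torch.Tensor, device: torch.device) -> torch.Tensor:
    # Tensor-only helper (illustrative body): nn.Module has no .device
    # attribute, so passing a QuantizedLinear module here raises
    # AttributeError on the first line.
    if tensor.device != device:
        tensor = tensor.to(device)
    return tensor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
quantized_linear = nn.Linear(16, 16)  # stand-in for the real QuantizedLinear

# Before the fix (fails): move_to_device(quantized_linear, device)
# After the fix: modules are moved with nn.Module's own .to() method.
quantized_linear = quantized_linear.to(device)
```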

Additionally, this commit includes documentation updates:
- Docs for AWQ quantization now cover the `scale_dtype`, `enable_mnn_kernel`, and `batch_size` parameters (a hedged usage sketch follows this list).
- Clarified the inference procedure for AWQ-quantized models.
- README.md now lists AWQ as a supported method, and the roadmap was revised.
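The documented parameters might be passed as in the sketch below. Only the parameter names (`scale_dtype`, `enable_mnn_kernel`, `batch_size`) come from this commit; the import path, class name `AWQQuantizer`, `quantize()` entry point, and the model choice are all hypothetical placeholders, not this library's confirmed API.

```python
import torch

# Hypothetical import path and class name, for illustration only.
from quantllm.quant import AWQQuantizer  # assumed module path

quantizer = AWQQuantizer(
    model_name="facebook/opt-125m",  # illustrative model choice
    scale_dtype=torch.float16,       # dtype used for AWQ scales (documented name)
    enable_mnn_kernel=False,         # toggle the MNN kernel path (documented name)
    batch_size=8,                    # calibration batch size (documented name)
)
model = quantizer.quantize()         # assumed entry point
```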
Extends the previous fix for AWQ to the GPTQ and GGUF quantizers. Addresses an AttributeError where `QuantizedLinear` (an `nn.Module`) was incorrectly passed to `move_to_device`, a function expecting a tensor. This change ensures `QuantizedLinear` modules are moved to their target device using the correct `.to(device)` method in the AWQ, GPTQ, and GGUF quantizers.

This commit ensures consistent and correct device handling for quantized layers created by these methods.
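An alternative to patching each call site would be to make the helper itself accept both tensors and modules. This is a sketch of that design option, not what the commit does (the commit calls `.to(device)` directly in each quantizer, which keeps the tensor helper's contract narrow):

```python
import torch
import torch.nn as nn
from typing import Union

def move_to_device(obj: Union[torch.Tensor, nn.Module],
                   device: torch.device) -> Union[torch.Tensor, nn.Module]:
    # Both Tensor.to() and nn.Module.to() accept a device, so a single
    # call covers both cases; note Module.to() moves parameters and
    # buffers in place, while Tensor.to() returns a new tensor.
    return obj.to(device)
```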
@codewithdark-git merged commit a55dc1e into main on May 25, 2025. 1 check passed.