
Conversation

@codewithdark-git (Owner)

No description provided.

Addresses an AttributeError in AWQ quantization where `QuantizedLinear`, an `nn.Module`, was incorrectly passed to `move_to_device`, which expects a tensor. This change ensures `QuantizedLinear` modules are moved to the target device using the correct `.to(device)` method.
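A minimal sketch of the failure mode and the fix, assuming a PyTorch-style codebase. `move_to_device`, `QuantizedLinear`, and `.to(device)` are named in the commit; the body of the helper and the surrounding code shape are illustrative assumptions.

```python
import torch
import torch.nn as nn

def move_to_device(tensor: torch.Tensor, device: torch.device) -> torch.Tensor:
    # Tensor-only helper (illustrative body): nn.Module has no .device
    # attribute, so passing a QuantizedLinear module here raises
    # AttributeError on the first line.
    if tensor.device != device:
        tensor = tensor.to(device)
    return tensor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
quantized_linear = nn.Linear(16, 16)  # stand-in for the real QuantizedLinear

# Before the fix (fails): move_to_device(quantized_linear, device)
# After the fix: modules are moved with nn.Module's own .to() method.
quantized_linear = quantized_linear.to(device)
```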

Additionally, this commit includes documentation updates:
- Docs for AWQ quantization now cover the `scale_dtype`, `enable_mnn_kernel`, and `batch_size` parameters (a hedged usage sketch follows this list).
- Clarified the inference procedure for AWQ-quantized models.
- README.md now lists AWQ as a supported method, and the roadmap was revised.
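The documented parameters might be passed as in the sketch below. Only the parameter names (`scale_dtype`, `enable_mnn_kernel`, `batch_size`) come from this commit; the import path, class name `AWQQuantizer`, `quantize()` entry point, and the model choice are all hypothetical placeholders, not this library's confirmed API.

```python
import torch

# Hypothetical import path and class name, for illustration only.
from quantllm.quant import AWQQuantizer  # assumed module path

quantizer = AWQQuantizer(
    model_name="facebook/opt-125m",  # illustrative model choice
    scale_dtype=torch.float16,       # dtype used for AWQ scales (documented name)
    enable_mnn_kernel=False,         # toggle the MNN kernel path (documented name)
    batch_size=8,                    # calibration batch size (documented name)
)
model = quantizer.quantize()         # assumed entry point
```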
Extends the previous fix for AWQ to the GPTQ and GGUF quantizers. Addresses an AttributeError where `QuantizedLinear` (an `nn.Module`) was incorrectly passed to `move_to_device`, a function expecting a tensor. This change ensures `QuantizedLinear` modules are moved to their target device using the correct `.to(device)` method in the AWQ, GPTQ, and GGUF quantizers.

This commit ensures consistent and correct device handling for quantized layers created by these methods.
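An alternative to patching each call site would be to make the helper itself accept both tensors and modules. This is a sketch of that design option, not what the commit does (the commit calls `.to(device)` directly in each quantizer, which keeps the tensor helper's contract narrow):

```python
import torch
import torch.nn as nn
from typing import Union

def move_to_device(obj: Union[torch.Tensor, nn.Module],
                   device: torch.device) -> Union[torch.Tensor, nn.Module]:
    # Both Tensor.to() and nn.Module.to() accept a device, so a single
    # call covers both cases; note Module.to() moves parameters and
    # buffers in place, while Tensor.to() returns a new tensor.
    return obj.to(device)
```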
@codewithdark-git merged commit a55dc1e into main on May 25, 2025. 1 check passed.