Commit dd7811e

Fix severe VRAM leak regression in auto-round format packing
1 parent 6e8732c commit dd7811e

1 file changed, +1 -1 lines changed
auto_round/export/export_to_autoround/export.py

Lines changed: 1 addition & 1 deletion
@@ -234,7 +234,7 @@ def pack_layer(layer_name, model, backend, device=None):
             qlayer.pack(layer, scale, device=device)
         else:
             qlayer.pack(layer, scale, zp, None, device=device)
-        qlayer.to(device)
+        qlayer.to(orig_device)
     else:
         scale = scale.to(torch.float32).t().contiguous()
         if isinstance(zp, torch.Tensor):
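
The one-line change above can be illustrated with a toy sketch. It assumes that `orig_device` holds the device the layer was on before packing; the hunk does not show where it is assigned, so that is a guess. `ToyLayer` and `pack_and_restore` are hypothetical stand-ins, not part of auto-round, and the device is tracked as a plain string so the example runs without a GPU or PyTorch.

```python
from dataclasses import dataclass


@dataclass
class ToyLayer:
    """Hypothetical stand-in for a quantized layer; it only records its device."""
    device: str = "cpu"

    def to(self, device: str) -> "ToyLayer":
        # Mimics the in-place behaviour of a module's .to(): update and return self.
        self.device = device
        return self


def pack_and_restore(layer: ToyLayer, pack_device: str) -> ToyLayer:
    # Remember where the layer lived before packing (assumed meaning of orig_device).
    orig_device = layer.device
    # Move to the packing device, e.g. a GPU, for the heavy work.
    layer.to(pack_device)
    # ... the real packing step would run here ...
    # Move back so packed layers do not pile up on the packing device.
    layer.to(orig_device)
    return layer


layers = [ToyLayer("cpu") for _ in range(4)]
packed = [pack_and_restore(layer, "cuda:0") for layer in layers]
# Count how many packed layers are still resident on the packing device.
still_on_pack_device = sum(layer.device == "cuda:0" for layer in packed)
```

With `layer.to(pack_device)` as the final move instead, every packed layer in this sketch would stay on `cuda:0`, which is the kind of accumulation the commit title describes as a VRAM leak.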
