Commit 1a58087

Reduce memory usage for fp8 scaled op. (#10531)

Parent: 6c14f3a

File tree: 1 file changed (+1, -1)

comfy/quant_ops.py

Lines changed: 1 addition & 1 deletion
@@ -358,7 +358,7 @@ def quantize(cls, tensor, scale=None, dtype=torch.float8_e4m3fn):
         scale = scale.to(device=tensor.device, dtype=torch.float32)
 
         lp_amax = torch.finfo(dtype).max
-        tensor_scaled = tensor.float() / scale
+        tensor_scaled = tensor * (1.0 / scale).to(tensor.dtype)
         torch.clamp(tensor_scaled, min=-lp_amax, max=lp_amax, out=tensor_scaled)
         qdata = tensor_scaled.to(dtype, memory_format=torch.contiguous_format)
 
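Why the new line uses less memory: `tensor.float()` materializes a full fp32 copy of the input before dividing, so a bf16/fp16 tensor briefly occupies twice its normal footprint during quantization. The replacement inverts only the small fp32 `scale`, casts that reciprocal down to `tensor.dtype`, and multiplies, so the large temporary stays in the input's (typically half-precision) dtype. The sketch below is not ComfyUI code; the function names, tensor size, and scalar scale are illustrative assumptions used only to contrast peak CUDA memory for the two variants.

import torch

def quantize_old(tensor, scale, dtype=torch.float8_e4m3fn):
    # Pre-#10531 path: tensor.float() allocates a full fp32 copy of the input.
    lp_amax = torch.finfo(dtype).max
    tensor_scaled = tensor.float() / scale
    torch.clamp(tensor_scaled, min=-lp_amax, max=lp_amax, out=tensor_scaled)
    return tensor_scaled.to(dtype, memory_format=torch.contiguous_format)

def quantize_new(tensor, scale, dtype=torch.float8_e4m3fn):
    # Post-#10531 path: only the scale is inverted in fp32; the product
    # (and the clamp that follows) stays in the input dtype.
    lp_amax = torch.finfo(dtype).max
    tensor_scaled = tensor * (1.0 / scale).to(tensor.dtype)
    torch.clamp(tensor_scaled, min=-lp_amax, max=lp_amax, out=tensor_scaled)
    return tensor_scaled.to(dtype, memory_format=torch.contiguous_format)

if torch.cuda.is_available():
    t = torch.randn(4096, 4096, dtype=torch.bfloat16, device="cuda")  # 32 MiB input
    s = torch.tensor(0.5, dtype=torch.float32, device="cuda")  # hypothetical scale
    for fn in (quantize_old, quantize_new):
        torch.cuda.reset_peak_memory_stats()
        q = fn(t, s)
        torch.cuda.synchronize()
        peak = torch.cuda.max_memory_allocated() / 2**20
        print(f"{fn.__name__}: peak {peak:.0f} MiB")

One numerical subtlety: the old path rounded the quotient once into fp32, while the new path rounds `1.0 / scale` to the input dtype before multiplying, so intermediate values can differ by a ulp or two in the low-precision dtype; that difference is typically negligible next to the rounding of the final fp8 cast.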
