Optimize RAM usage of AutoScheme #912

@wenhuach21

| Models | Scheme | VRAM Cost (torch compile) | Time Cost (torch compile) | VRAM Cost (without torch compile) | Time Cost (without torch compile) |
| --- | --- | --- | --- | --- | --- |
| Qwen3-8B | W2A16/W4A16/W8A16 | 34G | 30s * len of options | 61G | 40s * len of options |
| Qwen3-8B | MXFP4/MXFP8 | 36G | 60s * len of options | 54G | 120s * len of options |
| Qwen3-8B | GGUF* | 54G | 30s * len of options | 50G | 23s * len of options |
| Qwen3-32B | W2A16/W4A16/W8A16 | OOM with 240G | --- | OOM with 240G | --- |
| Qwen3-32B | MXFP4/MXFP8 | 160G | 200s * len of options | 200G | 240s * len of options |
| Qwen3-32B | GGUF* | 210G | 80s * len of options | 200G | 60s * len of options |
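
To make the "Ns * len of options" entries concrete, here is a minimal sketch that turns the Qwen3-8B per-option times from the table into a total time estimate. It assumes "len of options" means the number of candidate schemes passed to AutoScheme; the helper name `estimate_total_seconds` and the dictionary are hypothetical and not part of any library API.

```python
# Per-option time cost in seconds for Qwen3-8B, copied from the table above.
# Key: (scheme family, whether torch compile is enabled).
PER_OPTION_SECONDS = {
    ("W2A16/W4A16/W8A16", True): 30,
    ("W2A16/W4A16/W8A16", False): 40,
    ("MXFP4/MXFP8", True): 60,
    ("MXFP4/MXFP8", False): 120,
    ("GGUF*", True): 30,
    ("GGUF*", False): 23,
}


def estimate_total_seconds(scheme_family, num_options, torch_compile=True):
    """Hypothetical helper: total time = per-option time * number of options.

    Assumes the cost scales linearly with the number of options, as the
    table's "* len of options" notation suggests.
    """
    if num_options < 0:
        raise ValueError("num_options must be non-negative")
    return PER_OPTION_SECONDS[(scheme_family, torch_compile)] * num_options


if __name__ == "__main__":
    # Two MXFP options (e.g. MXFP4 and MXFP8), with and without torch compile.
    print(estimate_total_seconds("MXFP4/MXFP8", 2, torch_compile=True))
    print(estimate_total_seconds("MXFP4/MXFP8", 2, torch_compile=False))
```

Under that reading, a two-option MXFP search on Qwen3-8B takes roughly half as long with torch compile as without it.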
