| Model | Scheme | VRAM Cost (torch compile) | Time Cost (torch compile) | VRAM Cost (w/o torch compile) | Time Cost (w/o torch compile) |
|---|---|---|---|---|---|
| Qwen3-8B | W2A16/W4A16/W8A16 | 34 GB | 30 s × number of options | 61 GB | 40 s × number of options |
| Qwen3-8B | MXFP4/MXFP8 | 36 GB | 60 s × number of options | 54 GB | 120 s × number of options |
| Qwen3-8B | GGUF* | 54 GB | 30 s × number of options | 50 GB | 23 s × number of options |
| Qwen3-32B | W2A16/W4A16/W8A16 | OOM with 240 GB | --- | OOM with 240 GB | --- |
| Qwen3-32B | MXFP4/MXFP8 | 160 GB | 200 s × number of options | 200 GB | 240 s × number of options |
| Qwen3-32B | GGUF* | 210 GB | 80 s × number of options | 200 GB | 60 s × number of options |
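The time columns scale linearly with the number of candidate options. As a rough sketch of that arithmetic, the snippet below copies the per-option times from the table and multiplies them by an option count. `PER_OPTION_SECONDS` and `estimate_seconds` are hypothetical helpers written for illustration; they are not part of any library API. The snippet also assumes the linear scaling holds exactly, as the "× number of options" notation suggests.

```python
from typing import Optional

# Per-option time cost in seconds, transcribed from the table above.
# None marks configurations that ran out of memory (OOM with 240 GB).
PER_OPTION_SECONDS: dict[tuple[str, str], dict[str, Optional[int]]] = {
    ("Qwen3-8B", "W2A16/W4A16/W8A16"): {"compile": 30, "no_compile": 40},
    ("Qwen3-8B", "MXFP4/MXFP8"): {"compile": 60, "no_compile": 120},
    ("Qwen3-8B", "GGUF*"): {"compile": 30, "no_compile": 23},
    ("Qwen3-32B", "W2A16/W4A16/W8A16"): {"compile": None, "no_compile": None},
    ("Qwen3-32B", "MXFP4/MXFP8"): {"compile": 200, "no_compile": 240},
    ("Qwen3-32B", "GGUF*"): {"compile": 80, "no_compile": 60},
}


def estimate_seconds(model: str, scheme: str, num_options: int, use_compile: bool = True) -> int:
    """Hypothetical estimate: total time = per-option time x number of options."""
    column = "compile" if use_compile else "no_compile"
    per_option = PER_OPTION_SECONDS[(model, scheme)][column]
    if per_option is None:
        # These rows have no timing because the run hit OOM.
        raise ValueError(f"{model} with {scheme} ran out of memory in this benchmark")
    return per_option * num_options


# Example: Qwen3-32B, MXFP4/MXFP8, two candidate options, with torch compile.
print(estimate_seconds("Qwen3-32B", "MXFP4/MXFP8", 2))
```

The table lists VRAM as a fixed figure per configuration and does not scale it with the option count, so the sketch estimates time only.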