Replies: 1 comment
@hipudding is there something we can help you with to make it happen?
Ascend NPUs seem to be a great alternative (to Mac Studio and EPYC) for running quantized R1.
For example, the Atlas 300I Duo offers 140 TFLOPS FP16, 408 GB/s memory bandwidth, and 96 GB of VRAM.
Two of these cards in a PC could run the quantized 671B R1 relatively well, I would say.
However, as shown in https://github.com/ggerganov/llama.cpp/blob/master/docs/backend/CANN.md, there is no DeepSeek architecture support yet, and low-bit quantization does not appear to be validated yet.
@hipudding Do you have plans to port low-bit quantized R1 to Ascend cards via the gguf-cann backend?
That seems like a pretty valid use case to me...
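For what it's worth, here is a rough back-of-envelope sketch of the sizing above. It only checks whether the weights fit in 2 x 96 GB and gives a bandwidth-bound ceiling on decode speed. The card figures and the 671B total come from the numbers quoted above; the 37B activated parameters per token is taken from the DeepSeek model card, not from this thread. The bits-per-weight values are hypothetical round numbers, not real GGUF quant sizes (those carry extra scale data), and KV cache, activations, and inter-card transfer are ignored, so treat the output as an optimistic upper bound.

```python
# Back-of-envelope sizing for quantized R1 on two Atlas 300I Duo cards.
# Assumptions are marked; this is not a benchmark.

TOTAL_PARAMS = 671e9    # total parameters (from the post)
ACTIVE_PARAMS = 37e9    # activated per token (MoE); from the DeepSeek model card
CARD_VRAM_GB = 96       # per card (from the post)
CARD_BW_GBPS = 408      # per card, GB/s (from the post)
NUM_CARDS = 2


def weights_gb(params, bits_per_weight):
    """Size of `params` weights in decimal GB at the given bit width."""
    return params * bits_per_weight / 8 / 1e9


def fits(bits_per_weight, overhead_gb=0.0):
    """True if all weights plus an assumed overhead fit in total VRAM."""
    total_vram = CARD_VRAM_GB * NUM_CARDS
    return weights_gb(TOTAL_PARAMS, bits_per_weight) + overhead_gb <= total_vram


def decode_ceiling_tok_s(bits_per_weight, bandwidth_gbps=CARD_BW_GBPS):
    """Upper bound on tokens/s if decode is purely memory-bandwidth bound.

    Assumes layers are split across cards and processed one card at a
    time, so a single card's bandwidth is the limit for one token.
    """
    return bandwidth_gbps / weights_gb(ACTIVE_PARAMS, bits_per_weight)


if __name__ == "__main__":
    # Hypothetical bit widths, chosen only for illustration.
    for bpw in (1.58, 2.0, 4.0):
        print(
            f"{bpw:>4} bpw: weights {weights_gb(TOTAL_PARAMS, bpw):6.1f} GB, "
            f"fits={fits(bpw)}, ceiling {decode_ceiling_tok_s(bpw):5.1f} tok/s"
        )
```

Under these assumptions, roughly 2 bits per weight or less fits in 192 GB, while 4 bits per weight does not, which is why low-bit quantization support matters for this setup.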