As far as I know, vLLM and Ray don't support 8-bit quantization yet. I think it's the most viable quantization technique available today, and supporting it would enable faster inference and reduced memory usage.
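For context, 8-bit quantization typically maps floating-point weights to int8 using a scale factor, cutting weight storage to a quarter of float32 (half of float16). Below is a minimal NumPy sketch of per-tensor absmax quantization, purely illustrative of the general technique — it is not vLLM's or bitsandbytes' actual implementation:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    # Absmax quantization: choose a scale so the largest |weight| maps to 127.
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)
q, scale = quantize_int8(w)
# int8 storage is 1/4 the size of float32.
print(w.nbytes // q.nbytes)  # 4
# Rounding error per weight is bounded by scale / 2.
err = np.abs(dequantize(q, scale) - w).max()
```

Production 8-bit schemes (e.g. LLM.int8(), SmoothQuant) refine this with per-channel scales and outlier handling, but the memory math is the same.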