Closed as not planned
Description
ExLlama (https://github.com/turboderp/exllama)
It is currently the fastest and most memory-efficient model executor that I'm aware of.
Is there interest from the maintainers in adding support for it?