Description
With https://github.com/huggingface/text-generation-inference adopting a less friendly license, this seems like a good opportunity to add best-effort support for all Hugging Face transformers models that generate text, e.g. via `AutoModelForCausalLM` and `AutoModelForSeq2SeqLM`. This would let such models take advantage of vLLM's other serving features, while specific models can retain optimized implementations or gain them as they are implemented. TGI's generic implementations illustrate the approach:
- https://github.com/huggingface/text-generation-inference/blob/ecf6dc3a5a31c1b0e1ed48988ddf2416b5e35660/server/text_generation_server/models/causal_lm.py#L451
- https://github.com/huggingface/text-generation-inference/blob/ecf6dc3a5a31c1b0e1ed48988ddf2416b5e35660/server/text_generation_server/models/seq2seq_lm.py#L501
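
For context, a minimal sketch of what such a generic fallback path boils down to on the transformers side: loading an arbitrary checkpoint through the `Auto*` classes and generating without any model-specific code. The checkpoint names here are just examples, not anything vLLM-specific:

```python
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
)

# Decoder-only path: any AutoModelForCausalLM-compatible checkpoint
# ("gpt2" is only a placeholder example).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Encoder-decoder path: any AutoModelForSeq2SeqLM-compatible checkpoint
# ("t5-small" is likewise just an example).
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
inputs = tokenizer("translate English to German: Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The best-effort support proposed here would essentially wrap this generic path behind vLLM's scheduler and batching, while registered model-specific implementations continue to take precedence.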