[Docs] More information regarding text generation & LLM inference #2564

Open
@jaywonchung

Description

📚 The doc issue

I am new to TorchServe and was looking for some features I would need before I could consider using TorchServe for LLM text generation.

Today, there are a couple of LLM inference serving solutions out there, including text-generation-inference and vLLM. It would be great if the documentation could mention how TorchServe compares with these at the moment. For instance:

  • Does TorchServe support continuous batching?
  • Does TorchServe support paged attention?
  • Does TorchServe support streaming generated text through its inference API?
  • What are some LLMs that TorchServe is known to work well with (e.g., Llama 2, Falcon), apart from the Hugging Face integration example provided?
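
To clarify what I mean by continuous batching: finished sequences leave the batch after each decode step and waiting requests immediately take their slot, rather than the server waiting for a whole batch to drain. A rough, framework-agnostic sketch (the request lengths and batch size are made up for illustration):

```python
from collections import deque

def continuous_batching(requests, max_batch=2):
    """Simulate continuous batching.

    `requests` maps a request id to the number of decode steps it needs.
    Returns the decode step at which each request finishes.
    """
    waiting = deque(requests)  # request ids not yet scheduled
    running = {}               # request id -> remaining decode steps
    finished_at = {}
    step = 0
    while waiting or running:
        # Key idea: admission happens at every step, not once per batch.
        while waiting and len(running) < max_batch:
            rid = waiting.popleft()
            running[rid] = requests[rid]
        step += 1
        for rid in list(running):  # one decode step for the whole batch
            running[rid] -= 1
            if running[rid] == 0:
                del running[rid]
                finished_at[rid] = step
    return finished_at

# "c" joins as soon as "a" finishes, instead of waiting for "b" too.
print(continuous_batching({"a": 1, "b": 3, "c": 2}))
```

With static batching, "c" would only start after both "a" and "b" finished, so knowing whether TorchServe schedules this way matters a lot for throughput.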

Suggest a potential alternative/fix

A dedicated page for text generation and LLM inference could make sense, given that many people are likely to be interested in this.

Metadata

Labels

documentation (Improvements or additions to documentation), llm, question (Further information is requested)
