### 📚 The doc issue
I am new to TorchServe and am looking for the features I would need in order to consider TorchServe for LLM text generation.
There are currently a couple of inference serving solutions out there, including text-generation-inference and vLLM. It would be great if the documentation mentioned how TorchServe currently compares with these. For instance:
- Does TorchServe support continuous batching?
- Does TorchServe support paged attention?
- Does TorchServe support streaming generated text through its inference API? (See the handler sketch after this list.)
- What are some LLMs that TorchServe is known to work well with (e.g., Llama 2, Falcon), apart from the Hugging Face integration example that is provided?
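
For the streaming question above, recent TorchServe releases expose a `send_intermediate_predict_response` helper that a custom handler can call to push partial results back to the client. Below is a minimal sketch, assuming a hypothetical `_generate_tokens` helper standing in for a real token-by-token generation loop; it is illustrative only, not TorchServe's official LLM handler.

```python
# Sketch of a TorchServe custom handler that streams generated text.
# `send_intermediate_predict_response` is the streaming hook available in
# recent TorchServe releases; `_generate_tokens` is a hypothetical helper.
from ts.protocol.otf_message_handler import send_intermediate_predict_response
from ts.torch_handler.base_handler import BaseHandler


class StreamingTextHandler(BaseHandler):
    def inference(self, input_batch):
        # Push each chunk to the client as soon as it is produced,
        # instead of waiting for the full sequence.
        for token_text in self._generate_tokens(input_batch):
            send_intermediate_predict_response(
                [token_text],
                self.context.request_ids,
                "Intermediate Prediction success",
                200,
                self.context,
            )
        return [""]  # the final return closes the stream
```

On the client side, the chunks can be consumed with a streaming HTTP request; the model name in the URL below is a placeholder:

```python
import requests

# Iterate over the response body as chunks arrive from the handler.
response = requests.post(
    "http://localhost:8080/predictions/streaming_text",
    data="my prompt",
    stream=True,
)
for chunk in response.iter_content(chunk_size=None):
    print(chunk.decode("utf-8"), end="", flush=True)
```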
### Suggest a potential alternative/fix
A dedicated documentation page on text generation and LLM inference would make sense, given how many people are likely to be interested in this topic.