From 55dc5f5586d838dcaf2861f65e4814dc441db321 Mon Sep 17 00:00:00 2001
From: William Falcon
Date: Sat, 28 Sep 2024 10:33:17 -0400
Subject: [PATCH] Update README.md

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index ffeae3ea..f1c0c2bb 100644
--- a/README.md
+++ b/README.md
@@ -117,7 +117,7 @@ curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -
 
 ### LLM serving
 LitServe isn’t *just* for LLMs like vLLM or Ollama; it serves any AI model with full control over internals ([learn more](https://lightning.ai/docs/litserve/features/serve-llms)).
 
-For easy LLM serving, use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm) (built on LitServe).
+For easy LLM serving, integrate [vLLM with LitServe](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api), or use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm) (built on LitServe).
 
 ```
 litgpt serve microsoft/phi-2
 ```
@@ -146,7 +146,7 @@ Use LitServe to deploy any model or AI service: (Gen AI, classical ML, embedding
 Toy model:      Hello world
 LLMs:           Llama 3.2, LLM Proxy server, Agent with tool use
-RAG:            RAG API (LlamaIndex)
+RAG:            vLLM RAG (Llama 3.2), RAG API (LlamaIndex)
 NLP:            Hugging face, BERT, Text embedding API
 Multimodal:     OpenAI Clip, MiniCPM, Phi-3.5 Vision Instruct, Qwen2-VL, Pixtral
 Audio:          Whisper, AudioCraft, StableAudio, Noise cancellation (DeepFilterNet)
@@ -201,7 +201,7 @@ Reproduce the full benchmarks [here](https://lightning.ai/docs/litserve/home/ben
 
 These results are for image and text classification ML tasks. The performance relationships hold for other ML tasks (embedding, LLM serving, audio, segmentation, object detection, summarization etc...).   
     
-***💡 Note on LLM serving:*** For high-performance LLM serving (like Ollama/VLLM), use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm) or build your custom VLLM-like server with LitServe. Optimizations like kv-caching, which can be done with LitServe, are needed to maximize LLM performance.
+***💡 Note on LLM serving:*** For high-performance LLM serving (like Ollama/vLLM), integrate [vLLM with LitServe](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api), use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm), or build your own vLLM-like server with LitServe. Optimizations like kv-caching, which can be done with LitServe, are needed to maximize LLM performance.
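
For concreteness, below is a minimal sketch of the vLLM-with-LitServe pattern the changed lines point to. It is an illustration only, not code from the linked Studio: the model name, sampling settings, request schema (`{"prompt": ...}`), and port are assumptions.

```python
# Minimal sketch: a vLLM engine wrapped in a LitServe API (illustrative only).
import litserve as ls
from vllm import LLM, SamplingParams


class VLLMLitAPI(ls.LitAPI):
    def setup(self, device):
        # Load the model once per worker; the model name is an assumption.
        self.llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct")
        self.sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

    def decode_request(self, request):
        # Assumed request schema: {"prompt": "..."}
        return request["prompt"]

    def predict(self, prompt):
        # Run generation with the vLLM engine and return the first completion.
        outputs = self.llm.generate([prompt], self.sampling_params)
        return outputs[0].outputs[0].text

    def encode_response(self, output):
        return {"response": output}


if __name__ == "__main__":
    server = ls.LitServer(VLLMLitAPI(), accelerator="gpu")
    server.run(port=8000)
```

With the server running, the `/predict` endpoint is queried the same way as the README's hello-world example, e.g. `curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"prompt": "Hello"}'`.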