diff --git a/README.md b/README.md
index ffeae3ea..f1c0c2bb 100644
--- a/README.md
+++ b/README.md
@@ -117,7 +117,7 @@ curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -
 
 ### LLM serving
 LitServe isn’t *just* for LLMs like vLLM or Ollama; it serves any AI model with full control over internals ([learn more](https://lightning.ai/docs/litserve/features/serve-llms)).
-For easy LLM serving, use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm) (built on LitServe).
+For easy LLM serving, integrate [vLLM with LitServe](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api), or use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm) (built on LitServe).
 
 ```
 litgpt serve microsoft/phi-2
@@ -146,7 +146,7 @@ Use LitServe to deploy any model or AI service: (Gen AI, classical ML, embedding
 
 Toy model: Hello world
 LLMs: Llama 3.2, LLM Proxy server, Agent with tool use
-RAG: RAG API (LlamaIndex)
+RAG: vLLM RAG (Llama 3.2), RAG API (LlamaIndex)
 NLP: Hugging face, BERT, Text embedding API
 Multimodal: OpenAI Clip, MiniCPM, Phi-3.5 Vision Instruct, Qwen2-VL, Pixtral
 Audio: Whisper, AudioCraft, StableAudio, Noise cancellation (DeepFilterNet)
@@ -201,7 +201,7 @@ Reproduce the full benchmarks [here](https://lightning.ai/docs/litserve/home/ben
 
 These results are for image and text classification ML tasks. The performance relationships hold for other ML tasks (embedding, LLM serving, audio, segmentation, object detection, summarization etc...).
 
-***💡 Note on LLM serving:*** For high-performance LLM serving (like Ollama/VLLM), use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm) or build your custom VLLM-like server with LitServe. Optimizations like kv-caching, which can be done with LitServe, are needed to maximize LLM performance.
+***💡 Note on LLM serving:*** For high-performance LLM serving (like Ollama/vLLM), integrate [vLLM with LitServe](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api), use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm), or build your custom vLLM-like server with LitServe. Optimizations like kv-caching, which can be done with LitServe, are needed to maximize LLM performance.
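The links added in this diff point at running vLLM behind a LitServe endpoint. For reviewers, here is a minimal sketch of what that integration can look like, assuming vLLM's offline `LLM`/`SamplingParams` API and LitServe's standard `LitAPI` hooks; the model name, request schema (`{"prompt": ...}`), and sampling parameters are illustrative placeholders, not taken from the linked Studio:

```python
import litserve as ls
from vllm import LLM, SamplingParams

class VLLMLitAPI(ls.LitAPI):
    def setup(self, device):
        # Load the model once per worker; vLLM handles batching and kv-caching internally.
        # The model name is a placeholder; any vLLM-supported checkpoint works.
        self.llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct")

    def decode_request(self, request):
        # Assumed JSON body: {"prompt": "..."}
        return request["prompt"]

    def predict(self, prompt):
        # Illustrative sampling settings.
        params = SamplingParams(temperature=0.7, max_tokens=256)
        outputs = self.llm.generate([prompt], params)
        return outputs[0].outputs[0].text

    def encode_response(self, output):
        return {"text": output}

if __name__ == "__main__":
    server = ls.LitServer(VLLMLitAPI(), accelerator="auto")
    server.run(port=8000)
```

Served this way, the endpoint answers the same style of request as the `curl` call in the first hunk's context, e.g. `curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"prompt": "Hello"}'`.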