From 4232377cde69713c1b9a222005614d56c16ded2d Mon Sep 17 00:00:00 2001
From: Aniket Maurya
Date: Sun, 29 Sep 2024 17:41:58 +0530
Subject: [PATCH] fix vLLM capitalization (#303)

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 702f58d8..3e5da022 100644
--- a/README.md
+++ b/README.md
@@ -202,7 +202,7 @@ Reproduce the full benchmarks [here](https://lightning.ai/docs/litserve/home/ben
 These results are for image and text classification ML tasks. The performance relationships hold for other ML tasks (embedding, LLM serving, audio, segmentation, object detection, summarization etc...).
 
-***💡 Note on LLM serving:*** For high-performance LLM serving (like Ollama/VLLM), integrate [vLLM with LitServe](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api), use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm), or build your custom VLLM-like server with LitServe. Optimizations like kv-caching, which can be done with LitServe, are needed to maximize LLM performance.
+***💡 Note on LLM serving:*** For high-performance LLM serving (like Ollama/vLLM), integrate [vLLM with LitServe](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api), use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm), or build your custom vLLM-like server with LitServe. Optimizations like kv-caching, which can be done with LitServe, are needed to maximize LLM performance.