Commit 5c28b33 ("update and fix README"), parent 2d768aa

1 file changed: README.md (8 additions, 80 deletions)

@@ -20,94 +20,22 @@ LightLLM is a Python-based LLM (Large Language Model) inference and serving framework
 
 [English Docs](https://lightllm-en.readthedocs.io/en/latest/) | [中文文档](https://lightllm-cn.readthedocs.io/en/latest/) | [Blogs](https://modeltc.github.io/lightllm-blog/)
 
-## Features
-
-- Tri-process asynchronous collaboration: tokenization, model inference, and detokenization are performed asynchronously, leading to a considerable improvement in GPU utilization.
-- Nopad (Unpad): offers support for nopad attention operations across multiple models to efficiently handle requests with large length disparities.
-- Dynamic Batch: enables dynamic batch scheduling of requests.
-- [FlashAttention](https://github.com/Dao-AILab/flash-attention): incorporates FlashAttention to improve speed and reduce GPU memory footprint during inference.
-- Tensor Parallelism: utilizes tensor parallelism over multiple GPUs for faster inference.
-- [Token Attention](./docs/TokenAttention.md): implements a token-wise KV cache memory management mechanism, allowing for zero memory waste during inference.
-- High-performance Router: collaborates with Token Attention to meticulously manage the GPU memory of each token, thereby optimizing system throughput.
-- Int8KV Cache: nearly doubles the token capacity of the KV cache. Currently supported only for LLaMA.
-
-## Supported Model List
-
-The following table lists the supported models along with any special launch arguments they require.
-
-| Model Name | Comments |
-|------------|----------|
-| [BLOOM](https://huggingface.co/bigscience/bloom) | None |
-| [LLaMA](https://github.com/facebookresearch/llama) | None |
-| [LLaMA V2](https://huggingface.co/meta-llama) | None |
-| [StarCoder](https://github.com/bigcode-project/starcoder) | None |
-| [Qwen-7b](https://github.com/QwenLM/Qwen-7B) | `--eos_id 151643 --trust_remote_code` |
-| [ChatGLM2-6b](https://github.com/THUDM/ChatGLM2-6B) | `--trust_remote_code` |
-| [InternLM-7b](https://github.com/InternLM/InternLM) | `--trust_remote_code` |
-| [InternVL-Chat](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5) | `--eos_id 32007 --trust_remote_code` (Phi3) or `--eos_id 92542 --trust_remote_code` (InternLM2) |
-| [Qwen-VL](https://huggingface.co/Qwen/Qwen-VL) | None |
-| [Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat) | None |
-| [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) | `--eos_id 151645 --trust_remote_code`, and run `pip install git+https://github.com/huggingface/transformers` |
-| [Llava-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b) | None |
-| [Llava-13b](https://huggingface.co/liuhaotian/llava-v1.5-13b) | None |
-| [Mixtral](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) | None |
-| [Stablelm](https://huggingface.co/stabilityai/stablelm-2-1_6b) | `--trust_remote_code` |
-| [MiniCPM](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) | None |
-| [Phi-3](https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3) | Only supports Mini and Small |
-| [CohereForAI](https://huggingface.co/CohereForAI/c4ai-command-r-plus) | None |
-| [DeepSeek-V2-Lite](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite) | `--data_type bfloat16` |
-| [DeepSeek-V2](https://huggingface.co/deepseek-ai/DeepSeek-V2) | `--data_type bfloat16` |
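For illustration, a hedged example of launching one of the table's models with its special arguments. Only the `--eos_id 151643 --trust_remote_code` flags come from the table; the `lightllm.server.api_server` entry point and the model path are assumptions:

```shell
# Hypothetical launch of Qwen-7b using the special arguments listed above
# (server module and model path are illustrative assumptions)
python -m lightllm.server.api_server --model_dir /path/to/Qwen-7B \
                                     --eos_id 151643 --trust_remote_code
```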
+## News
+- [2025/02] 🔥 LightLLM v1.0.0 released, achieving the **fastest DeepSeek-R1** serving performance on a single H200 machine.
 
 ## Get started
 
-### Installation
-
-Use LightLLM with `docker`:
-
-```shell
-docker pull ghcr.io/modeltc/lightllm:main
-```
-
-To start a container with GPU support and port mapping:
-
-```shell
-docker run -it --gpus all -p 8080:8080 \
-        --shm-size 1g -v your_local_path:/data/ \
-        ghcr.io/modeltc/lightllm:main /bin/bash
-```
-
-Note: if multiple GPUs are used, increase `--shm-size` in the `docker run` command.
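As a purely illustrative reading of that note, the same run command with a larger shared-memory allocation (the 4g value is an arbitrary example, not an official recommendation):

```shell
# Illustrative only: enlarge shared memory when serving across multiple GPUs
docker run -it --gpus all -p 8080:8080 \
        --shm-size 4g -v your_local_path:/data/ \
        ghcr.io/modeltc/lightllm:main /bin/bash
```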
-
-
-Alternatively, you can [build the docker image](https://lightllm-en.readthedocs.io/en/latest/getting_started/installation.html#installing-with-docker) or [install from source with pip](https://lightllm-en.readthedocs.io/en/latest/getting_started/installation.html#installing-from-source).
-
-### Quick Start
-
-LightLLM provides LLM inference services with state-of-the-art throughput via its efficient request router and TokenAttention.
-
-We provide examples of launching the LightLLM service and querying the model (via console and Python) for both text and multimodal models.
-
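For a sense of what those examples cover, here is a minimal sketch of launching and querying a text model; the `api_server` entry point and the `/generate` payload shape follow LightLLM's docs, while the model directory, port, and parameter values are placeholders:

```shell
# Launch the API server (model path and tuning flags are placeholders)
python -m lightllm.server.api_server --model_dir /data/llama-7b \
                                     --host 0.0.0.0 \
                                     --port 8080 \
                                     --tp 1 \
                                     --max_total_token_num 120000

# Query the model from the console
curl http://127.0.0.1:8080/generate \
     -H 'Content-Type: application/json' \
     -d '{"inputs": "What is AI?", "parameters": {"max_new_tokens": 17}}'
```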
+- [Install LightLLM](https://lightllm-en.readthedocs.io/en/latest/getting_started/installation.html)
 - [Quick Start](https://lightllm-en.readthedocs.io/en/latest/getting_started/quickstart.html)
-- [Text Model Service](https://lightllm-en.readthedocs.io/en/latest/models/test.html#llama)
-- [Multimodal Model Service](https://lightllm-en.readthedocs.io/en/latest/models/test.html#llava)
+- [LLM Service](https://lightllm-en.readthedocs.io/en/latest/models/test.html#llama)
+- [VLM Service](https://lightllm-en.readthedocs.io/en/latest/models/test.html#llava)
 
-Note: the additional parameters for multimodal models (`--enable_multimodal`, `--cache_capacity`) require a larger `--shm-size`.
-If LightLLM is run with `--tp > 1`, the visual model runs on GPU 0.
-Input images are given as a list of dicts like `{'type': 'url'/'base64', 'data': xxx}`.
-The special image tag is `<img></img>` for Qwen-VL (`<image>` for Llava); the length of `data["multimodal_params"]["images"]` should equal the number of image tags, which can be 0, 1, 2, ...
-
-
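To make that request format concrete, a hedged sketch of a multimodal query consistent with the notes above (endpoint, URL, prompt, and parameter values are illustrative):

```shell
# Illustrative multimodal query: one <img></img> tag pairs with one entry in "images"
curl http://127.0.0.1:8080/generate \
     -H 'Content-Type: application/json' \
     -d '{
           "inputs": "<img></img> Describe this image.",
           "parameters": {"max_new_tokens": 64},
           "multimodal_params": {
             "images": [{"type": "url", "data": "https://example.com/cat.jpg"}]
           }
         }'
```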
-### Other
-
-Please refer to the [documentation](https://lightllm-en.readthedocs.io/en/latest/) for more information.
 
 ## Performance
 
-LightLLM provides high-throughput services. A performance comparison between LightLLM and vLLM is shown [here](https://lightllm-en.readthedocs.io/en/latest/dev/performance.html); compared with vLLM 0.1.2, LightLLM achieved up to 2x higher throughput.
-
+Learn more in the release blogs: [v1.0.0 blog](https://www.light-ai.top/lightllm-blog//by%20mtc%20team/2025/02/16/lightllm/).
 
-### FAQ
+## FAQ
 
 Please refer to the [FAQ](https://lightllm-en.readthedocs.io/en/latest/faq.html) for more information.
 
@@ -138,7 +66,7 @@ We welcome any cooperation and contribution. If there is a project that requires lightllm...
 
 ## Community
 
-For further information and discussion, [join our discord server](https://discord.gg/WzzfwVSguU).
+For further information and discussion, [join our Discord server](https://discord.gg/WzzfwVSguU). We welcome new members and look forward to your contributions!
 
 ## License
 