
Commit 824a21f

update readme for sglang support (#953)
* update readme for sglang support

  Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* refine doc

  Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* Update README.md

---------

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Co-authored-by: Wenhua Cheng <wenhua.cheng@intel.com>
1 parent f1b5c72 commit 824a21f

File tree

1 file changed: +23 −1 lines


README.md

Lines changed: 23 additions & 1 deletion
@@ -27,6 +27,8 @@ and [fbaldassarri](https://huggingface.co/fbaldassarri). For usage instructions,
 
 
 ## 🆕 What's New
+[2025/10] AutoRound has been integrated into **SGLang**. You can now run models in the AutoRound format directly using SGLang versions later than v0.5.4.
+
 [2025/10] We enhanced the RTN mode (--iters 0) to significantly reduce quantization cost compared to the default tuning mode. Check out [this doc](./docs/opt_rtn.md) for some accuracy results. If you don’t have sufficient resources, you can use this mode for 4-bit quantization.
 
 [2025/10] We proposed a fast algorithm to generate **mixed bits/datatypes** schemes in minutes. Please
@@ -268,7 +270,6 @@ ar.quantize_and_save(output_dir)
 ## Model Inference
 
 ### vLLM (CPU/Intel GPU/CUDA)
-Please note that support for the MoE models and visual language models is currently limited.
 ```python
 from vllm import LLM, SamplingParams
 
@@ -287,6 +288,26 @@ for output in outputs:
     print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
 ```
 
+
+### SGLang (Intel GPU/CUDA)
+Please note that support for MoE models and visual language models is currently limited.
+
+```python
+import sglang as sgl
+
+llm = sgl.Engine(model_path="Intel/DeepSeek-R1-0528-Qwen3-8B-int4-AutoRound")
+prompts = [
+    "Hello, my name is",
+]
+sampling_params = {"temperature": 0.6, "top_p": 0.95}
+
+outputs = llm.generate(prompts, sampling_params)
+for prompt, output in zip(prompts, outputs):
+    print("===============================")
+    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
+```
+
+
 ### Transformers (CPU/Intel GPU/Gaudi/CUDA)
 
 
@@ -318,3 +339,4 @@ If you find AutoRound helpful, please ⭐ star the repo and share it with your c
 
 
 
+
