
Commit 29a12cf

cherry-pick speculative decoding related PR #133 and #135 (#136)
* docs: move Constrained_Decoding and Function_Calling to Feature_Guide | rm AI_Agents_Guide folder (#135)
* docs: Add EAGLE/SpS Speculative Decoding support with vLLM (#133)
1 parent f6fd598 commit 29a12cf

File tree

20 files changed: +356 -79 lines changed


AI_Agents_Guide/README.md

Lines changed: 0 additions & 62 deletions
This file was deleted.
File renamed without changes.
File renamed without changes.

Feature_Guide/Speculative_Decoding/README.md

Lines changed: 3 additions & 1 deletion
@@ -54,4 +54,6 @@ may prove simpler than generating a summary for an article. [Spec-Bench](https:/
 shows the performance of different speculative decoding approaches on different tasks.
 
 ## Speculative Decoding with Triton Inference Server
-Follow [here](TRT-LLM/README.md) to learn how Triton Inference Server supports speculative decoding with [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM).
+Triton Inference Server supports speculative decoding on different types of Triton backends. See what a Triton backend is [here](https://github.com/triton-inference-server/backend).
+- Follow [here](TRT-LLM/README.md) to learn how Triton Inference Server supports speculative decoding with [TensorRT-LLM Backend](https://github.com/triton-inference-server/tensorrtllm_backend).
+- Follow [here](vLLM/README.md) to learn how Triton Inference Server supports speculative decoding with [vLLM Backend](https://github.com/triton-inference-server/vllm_backend).
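For context on what the new vLLM guide wires up: the Triton vLLM backend takes its vLLM engine arguments, including the ones that enable speculative decoding, from a `model.json` file inside the model repository. The sketch below writes a hypothetical `model.json`; it assumes a vLLM version that exposes the `speculative_model` and `num_speculative_tokens` engine arguments, and the draft-model id is a placeholder. The commit's vLLM/README.md is the authoritative reference for the exact schema.

```python
# Hypothetical model.json contents for the Triton vLLM backend, enabling
# draft-model (SpS-style) speculative decoding. Field names follow vLLM's
# engine arguments (speculative_model / num_speculative_tokens); whether
# your vLLM version accepts them should be checked against vLLM/README.md.
import json

engine_args = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",  # target (base) model
    "speculative_model": "<draft-model-id>",         # placeholder: smaller draft model
    "num_speculative_tokens": 5,                     # draft tokens proposed per step
    "gpu_memory_utilization": 0.5,
    "disable_log_requests": True,
}

# The backend reads this file from the model repository, e.g.
# model_repository/vllm_model/1/model.json.
with open("model.json", "w") as f:
    json.dump(engine_args, f, indent=4)
```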
