This repository was archived by the owner on Mar 13, 2025. It is now read-only.

Commit 436f478 ("rename")
1 parent 0557452

File tree: 2 files changed (+1 line, -1 line)


models/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -32,7 +32,7 @@ Engine is the abstraction for interacting with a model. It is responsible for sc
 
 The `engine_config` section specifies the Hugging Face model ID (`model_id`), how to initialize it and what parameters to use when generating tokens with an LLM.
 
-RayLLM supports continuous batching, meaning incoming requests are processed as soon as they arrive, and can be added to batches that are already being processed. This means that the model is not slowed down by certain sentences taking longer to generate than others. RayLLM also supports quantization, meaning compressed models can be deployed with cheaper hardware requirements. For more details on using quantized models in RayLLM, see the [quantization guide](continuous_batching/quantization/quantization.md).
+RayLLM supports continuous batching, meaning incoming requests are processed as soon as they arrive, and can be added to batches that are already being processed. This means that the model is not slowed down by certain sentences taking longer to generate than others. RayLLM also supports quantization, meaning compressed models can be deployed with cheaper hardware requirements. For more details on using quantized models in RayLLM, see the [quantization guide](continuous_batching/quantization/README.md).
 
 * `model_id` is the ID that refers to the model in the RayLLM or OpenAI API.
 * `type` is the type of inference engine. Only `VLLMEngine` is currently supported.
```
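For context, an `engine_config` section of the kind described above might look like the following YAML sketch. Only `model_id` (a Hugging Face model ID) and `type: VLLMEngine` are confirmed by the README excerpt; the example model name and the generation parameters are illustrative assumptions, not taken from this commit:

```yaml
# Hypothetical RayLLM model config sketch.
# Confirmed by the README excerpt: model_id, type (VLLMEngine).
# Everything else below is an illustrative assumption.
engine_config:
  model_id: meta-llama/Llama-2-7b-chat-hf   # Hugging Face model ID (example, assumed)
  type: VLLMEngine                          # only supported engine type per the README
  # Assumed generation parameters to show where token-generation
  # settings would live; names are not taken from this diff:
  generation:
    max_new_tokens: 512
    temperature: 0.7
```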
File renamed without changes.
