[docs][lmi] update guidance on advanced configurations #1716
Merged
Conversation
lanking520 reviewed Apr 1, 2024
@@ -108,7 +108,7 @@ The following list of configurations is intended to highlight the relevant confi
 | option.revision | The commit hash of a HuggingFace Hub Model Id. We recommend setting this value to ensure you use a specific version of the model artifacts | None | `dc1d3b3bfdb69df26f8fc966c16353274b138c56` |
 | option.rolling_batch | Enables continuous batching (iteration level batching) with one of the supported backends. Available backends differ by container; see [Inference Library Configurations](#inference-library-configuration) for mappings | None | `auto`, `vllm`, `lmi-dist`, `deepspeed`, `trtllm` |
 | option.max_rolling_batch_size | The maximum number of requests/sequences the model can process at a time. This parameter should be tuned to maximize throughput while staying within the available memory limits. `job_queue_size` should be set to a value equal to or higher than this value. If the current batch is full, new requests will be queued until they can be processed. | `32` for all backends except DeepSpeed. `4` for DeepSpeed | Integer |
-| option.dtype | The data type you plan to cast the model weights to | `fp16` | `fp32`, `fp16`, `bf16` (only on G5/P4/P5 or newer instance types), `int8` (only in lmi-dist) |
+| option.dtype | The data type you plan to cast the model weights to. If not provided, LMI will use the model's default data type. | `fp16` | `fp32`, `fp16`, `bf16` (only on G5/P4/P5 or newer instance types), `int8` (only in lmi-dist) |
lanking520: we will use fp16 as default
updated
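For context, the options in the table above are typically set together in a `serving.properties` file. A minimal sketch follows; the model id is hypothetical and the values are illustrative, not recommendations:

```
# serving.properties - a minimal sketch; values are illustrative only
engine=Python
# hypothetical model id; replace with your own HuggingFace Hub model
option.model_id=my-org/my-model
# pin a specific commit of the model artifacts (example hash from the table above)
option.revision=dc1d3b3bfdb69df26f8fc966c16353274b138c56
# enable continuous batching; backend availability depends on the container
option.rolling_batch=auto
option.max_rolling_batch_size=32
# job_queue_size should be equal to or higher than max_rolling_batch_size
job_queue_size=32
# cast the model weights to fp16
option.dtype=fp16
```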
lanking520 approved these changes Apr 2, 2024
Description
Updating the guidance for our advanced configurations. The feedback from PMs was that our previous documentation didn't help users take action.
The new guidance is as follows: