
[docs][lmi] update guidance on advanced configurations #1716

Merged 1 commit into deepjavalibrary:master from lmi-docs on Apr 2, 2024

Conversation

@siddvenk (Contributor) commented on Apr 1, 2024

Description

Updating the guidance for our advanced configurations. The feedback from PMs was that our previous documentation didn't help users take action.

The new guidance is as follows:

There are two types of advanced configurations: `LMI` and `Pass Through`.
`LMI` configurations are processed by LMI and translated into configurations that DeepSpeed uses.
`Pass Through` configurations are passed directly to the backend library. These are opaque configurations from the perspective of the model server and LMI.
We recommend that you file an [issue](https://github.com/deepjavalibrary/djl-serving/issues/new?assignees=&labels=bug&projects=&template=bug_report.md&title=) for any problems you encounter with configurations.
For `LMI` configurations, if we determine there is an issue with the configuration, we will attempt to provide a workaround for the currently released version and fix the issue in the next release.
For `Pass Through` configurations, it is possible that our investigation reveals an issue with the backend library.
In that situation, there is nothing LMI can do until the issue is fixed in the backend library.
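As a minimal sketch of the distinction (the pass-through option shown is an assumption for illustration: `option.rolling_batch`, `option.max_rolling_batch_size`, and `option.dtype` are LMI configurations from the table below, while `option.max_model_len` stands in for an option forwarded opaquely to a vLLM backend):

```properties
# serving.properties -- illustrative sketch, not taken from this PR

# LMI configurations: validated by LMI and translated for the backend
option.rolling_batch=vllm
option.max_rolling_batch_size=32
option.dtype=fp16

# Pass Through configuration (assumed example): forwarded as-is to the
# backend library (vLLM here); opaque to the model server and LMI
option.max_model_len=4096
```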

@siddvenk requested review from zachgk, frankfliu, and a team as code owners on April 1, 2024 at 17:31
```diff
@@ -108,7 +108,7 @@ The following list of configurations is intended to highlight the relevant confi
 | option.revision | The commit hash of a HuggingFace Hub Model Id. We recommend setting this value to ensure you use a specific version of the model artifacts | None | `dc1d3b3bfdb69df26f8fc966c16353274b138c56` |
 | option.rolling_batch | Enables continuous batching (iteration level batching) with one of the supported backends. Available backends differ by container, see [Inference Library Configurations](#inference-library-configuration) for mappings | None | `auto`, `vllm`, `lmi-dist`, `deepspeed`, `trtllm` |
 | option.max_rolling_batch_size | The maximum number of requests/sequences the model can process at a time. This parameter should be tuned to maximize throughput while staying within the available memory limits. `job_queue_size` should be set to a value equal to or higher than this value. If the current batch is full, new requests will be queued until they can be processed. | `32` for all backends except DeepSpeed. `4` for DeepSpeed | Integer |
-| option.dtype | The data type you plan to cast the model weights to | `fp16` | `fp32`, `fp16`, `bf16` (only on G5/P4/P5 or newer instance types), `int8` (only in lmi-dist) |
+| option.dtype | The data type you plan to cast the model weights to. If not provided, LMI will use the model's default data type. | `fp16` | `fp32`, `fp16`, `bf16` (only on G5/P4/P5 or newer instance types), `int8` (only in lmi-dist) |
```
Contributor:
we will use fp16 as default

siddvenk (Author):

updated
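For context, a hypothetical `serving.properties` combining the options touched by this diff might look like the following; the values are illustrative only, and the model id is made up:

```properties
# serving.properties -- hypothetical example combining the options above
option.model_id=my-org/my-model        # hypothetical HuggingFace Hub model id
option.revision=dc1d3b3bfdb69df26f8fc966c16353274b138c56  # pin a specific artifact version
option.rolling_batch=vllm              # enable continuous batching via vLLM
option.max_rolling_batch_size=32       # documented default for non-DeepSpeed backends
job_queue_size=64                      # should be >= max_rolling_batch_size
option.dtype=fp16                      # documented default data type
```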

@siddvenk merged commit b4325ca into deepjavalibrary:master on Apr 2, 2024 (2 checks passed)
@siddvenk deleted the lmi-docs branch on April 2, 2024 at 22:52