
[TrtLLM] Support JIT compilation and dynamic batch for TrtLLM python backend #1678

Merged · 1 commit · Mar 29, 2024

Conversation

sindhuvahinis
Contributor

@sindhuvahinis sindhuvahinis commented Mar 27, 2024

Description

T5 handler support is in a separate PR (#1680); I split my changes into multiple PRs for easier review.

This PR is for the TensorRT-LLM T5 Python backend. It:

  • Performs JIT compilation of T5 models
  • Automatically sets dynamic batch if it is not provided
  • Automatically sets engine=MPI and mpi_mode=True
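The defaulting behavior in the bullets above can be sketched as follows. This is a minimal, hypothetical Python sketch, not the actual djl_python implementation; the property names and the batch-size default are assumptions for illustration only.

```python
# Hypothetical sketch of the auto-defaulting described above.
# Property keys and the default batch size are illustrative assumptions,
# not the real djl_python.tensorrt_llm logic.
def apply_trtllm_defaults(properties: dict) -> dict:
    props = dict(properties)
    # Only when the user has disabled rolling batch do we fall back to
    # dynamic batching and force the MPI engine (see the NOTE below).
    if props.get("option.rolling_batch") == "disable":
        # Set a dynamic batch size only if the user did not provide one.
        props.setdefault("batch_size", "32")  # assumed default value
        props["engine"] = "MPI"
        props["option.mpi_mode"] = "true"
    return props
```

With rolling batch disabled, the sketch fills in `batch_size`, `engine`, and `mpi_mode`; otherwise it leaves the properties untouched so a future rolling-batch path stays usable.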

NOTE: The user has to disable rolling_batch in order for us to set dynamic batch and engine=mpi with these changes. We could disable rolling batch ourselves, but chose not to: if TRTLLM releases C++ backend support for T5 between our releases, the user could upgrade their TRTLLM in requirements.txt and enable rolling batch to test it. Open to suggestions here.

Testing:
Tested with flan-t5-xl on my EC2 machine.

My serving.properties

option.model_id=/opt/ml/model/t5/hf-model
option.rolling_batch=disable
option.entryPoint=djl_python.tensorrt_llm
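For reference, serving.properties uses plain Java-properties-style `key=value` lines. A generic Python sketch of parsing such a file into a dict (this is not DJL's actual parser, just an illustration of the format):

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

# The serving.properties shown above:
example = """\
option.model_id=/opt/ml/model/t5/hf-model
option.rolling_batch=disable
option.entryPoint=djl_python.tensorrt_llm
"""
```

Parsing `example` yields three entries, with `option.rolling_batch` set to `disable` as required for this PR's code path.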

@@ -167,7 +168,7 @@ public void load(Path modelPath, String prefix, Map<String, ?> options) throws I
         } else if ("nc".equals(manager.getDevice().getDeviceType())
                 && pyEnv.getTensorParallelDegree() > 0) {
             entryPoint = "djl_python.transformers_neuronx";
-        } else if (isTrtLlmBackend) {
+        } else if ("trtllm".equals(features)) {
Contributor:
Why can't we use the old param?

Contributor Author:
Because that param checks whether rolling_batch == trtllm. We might as well remove the param.

case "rolling_batch":
    isTrtLlmBackend = "trtllm".equals(value);
    break;

https://github.com/deepjavalibrary/djl-serving/pull/1678/files#diff-244860c398daaa2ff4bc328b98c867ccce93643b1f65c4f1024bf6bdd5a7beecR124

@sindhuvahinis sindhuvahinis merged commit a5957f3 into deepjavalibrary:master Mar 29, 2024
7 checks passed
@sindhuvahinis sindhuvahinis deleted the jit branch April 4, 2024 17:06