
[TrtLLM] Support JIT compilation and dynamic batch for TrtLLM python backend #1678

Merged · 1 commit · Mar 29, 2024

Conversation

sindhuvahinis
Contributor

@sindhuvahinis sindhuvahinis commented Mar 27, 2024

Description

T5 handler support is in a separate PR (#1680); I split my changes into multiple PRs for easier review.

This PR is for the TensorRT-LLM T5 Python backend. It:

  • Performs JIT compilation of T5 models
  • Automatically sets dynamic batch if it is not provided
  • Automatically sets engine=MPI and mpi_mode=True
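The defaulting behavior in the bullets above can be sketched as follows. This is a minimal, hypothetical Python sketch, not the actual djl_python implementation; the property names and the batch-size default are assumptions for illustration only.

```python
# Hypothetical sketch of the auto-defaulting described above.
# Property keys and the default batch size are illustrative assumptions,
# not the real djl_python.tensorrt_llm logic.
def apply_trtllm_defaults(properties: dict) -> dict:
    props = dict(properties)
    # Only when the user has disabled rolling batch do we fall back to
    # dynamic batching and force the MPI engine (see the NOTE below).
    if props.get("option.rolling_batch") == "disable":
        # Set a dynamic batch size only if the user did not provide one.
        props.setdefault("batch_size", "32")  # assumed default value
        props["engine"] = "MPI"
        props["option.mpi_mode"] = "true"
    return props
```

With rolling batch disabled, the sketch fills in `batch_size`, `engine`, and `mpi_mode`; otherwise it leaves the properties untouched so a future rolling-batch path stays usable.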

NOTE: The user has to disable rolling_batch in order for us to set dynamic batch and engine=mpi with these changes. We could disable rolling batch ourselves, but chose not to: if TRTLLM releases C++ backend support for T5 between our releases, the user could upgrade their TRTLLM in requirements.txt and enable rolling batch to test it. Open to suggestions here.

Testing:
Tested with flan-t5-xl on my EC2 machine.

My serving.properties

option.model_id=/opt/ml/model/t5/hf-model
option.rolling_batch=disable
option.entryPoint=djl_python.tensorrt_llm
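For reference, serving.properties uses plain Java-properties-style `key=value` lines. A generic Python sketch of parsing such a file into a dict (this is not DJL's actual parser, just an illustration of the format):

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

# The serving.properties shown above:
example = """\
option.model_id=/opt/ml/model/t5/hf-model
option.rolling_batch=disable
option.entryPoint=djl_python.tensorrt_llm
"""
```

Parsing `example` yields three entries, with `option.rolling_batch` set to `disable` as required for this PR's code path.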

@@ -167,7 +168,7 @@ public void load(Path modelPath, String prefix, Map<String, ?> options) throws I
         } else if ("nc".equals(manager.getDevice().getDeviceType())
                 && pyEnv.getTensorParallelDegree() > 0) {
             entryPoint = "djl_python.transformers_neuronx";
-        } else if (isTrtLlmBackend) {
+        } else if ("trtllm".equals(features)) {
Contributor:
Why can't we use the old param?

Contributor Author:
Because that param checks whether rolling_batch == trtllm. We might as well remove the param.

case "rolling_batch":
    isTrtLlmBackend = "trtllm".equals(value);
    break;

https://github.com/deepjavalibrary/djl-serving/pull/1678/files#diff-244860c398daaa2ff4bc328b98c867ccce93643b1f65c4f1024bf6bdd5a7beecR124

@sindhuvahinis sindhuvahinis merged commit a5957f3 into deepjavalibrary:master Mar 29, 2024
7 checks passed
@sindhuvahinis sindhuvahinis deleted the jit branch April 4, 2024 17:06