
Add trt-llm engine build step during model initialization #1235

Merged: 9 commits merged into deepjavalibrary:master on Nov 1, 2023

Conversation

@rohithkrn (Contributor) commented Oct 30, 2023

Description

Add TRT-LLM engine build step during model initialization

TODO: Tests

@rohithkrn rohithkrn requested review from zachgk, frankfliu and a team as code owners October 30, 2023 00:21
@lanking520 (Contributor) left a comment:

Your logic doesn't cover the case where the model is a SageMaker uncompressed model: the model is saved to /opt/ml/models (read-only). The model can be a Triton repo or a standard HF model. In this case you should scan the model_dir and find out whether there are model files there.
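For illustration, that scan could look roughly as follows. This is a minimal sketch, not the actual ModelInfo code; the marker files used to recognize a Triton repo or an HF model (config.pbtxt, config.json, weight files) are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper: detect whether model_dir already contains model files.
final class ModelDirScanner {

    static boolean containsModelFiles(Path modelDir) throws IOException {
        try (Stream<Path> files = Files.walk(modelDir, 2)) {
            return files.anyMatch(p -> {
                String name = p.getFileName().toString();
                return "config.pbtxt".equals(name)       // Triton model repo marker
                        || "config.json".equals(name)    // HF model config
                        || name.endsWith(".safetensors") // HF weights
                        || name.endsWith(".bin");        // PyTorch weights
            });
        }
    }
}
```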

@@ -165,6 +165,8 @@ public void load(Path modelPath, String prefix, Map<String, ?> options) throws I
} else if ("nc".equals(manager.getDevice().getDeviceType())
&& pyEnv.getTensorParallelDegree() > 0) {
entryPoint = "djl_python.transformers_neuronx";
} else if ("TRT-LLM".equals(Utils.getenv("LMI_BACKEND"))) {
Contributor: It's better to check option.rolling_batch=trtllm.

Contributor Author (@rohithkrn): That doesn't cover the case where customers do not want to use rolling batch. Also, I think there's an effort to make rolling_batch a boolean property.
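For clarity, the two detection strategies under discussion amount to roughly this. The helper class and the props map are illustrative, not code from this PR.

```java
import java.util.Map;

// Sketch of the two TRT-LLM detection approaches discussed in this thread.
final class TrtLlmDetection {

    // Option A (what this PR does): key off the container-level env var.
    static boolean viaEnvVar() {
        return "TRT-LLM".equals(System.getenv("LMI_BACKEND"));
    }

    // Option B (the suggestion above): key off option.rolling_batch.
    // As noted, this misses deployments that disable rolling batch.
    static boolean viaRollingBatch(Map<String, String> props) {
        return "trtllm".equals(props.get("option.rolling_batch"));
    }
}
```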

serving/docker/partition/trt_llm_partition.py (review thread resolved)
wlm/src/main/java/ai/djl/serving/wlm/ModelInfo.java (review thread resolved, outdated)
@rohithkrn (Contributor Author) commented:

> Your logic doesn't cover the case where the model is a SageMaker uncompressed model: the model is saved to /opt/ml/models (read-only). The model can be a Triton repo or a standard HF model. In this case you should scan the model_dir and find out whether there are model files there.

This change relies on model_id set in serving.properties or an env var. For an uncompressed model, what would model_id look like?
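For context, a typical compressed-model deployment sets model_id in serving.properties along these lines (illustrative values; the S3 path is made up):

```
engine=Python
option.model_id=s3://my-bucket/my-model/
option.tensor_parallel_degree=4
```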

@@ -463,6 +471,10 @@ public void initialize() throws IOException, ModelException {
downloadModel();
loadServingProperties();
downloadS3();
isTrtLlmBackend = "TRT-LLM".equals(Utils.getenv("LMI_BACKEND"));
if (isTrtLlmBackend) {
initTrtLlmModel();
Contributor: Please move these changes to the Python engine, probably the PyModel.load() function. ModelInfo is a higher abstraction that shouldn't be modified by these changes. It would also be easier to test, because you could test it by loading and predicting with the Python engine through the standard DJL predictor API.
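A rough sketch of such a test using the standard DJL Criteria/Predictor API. The model path and request payload are illustrative, not from this PR.

```java
import ai.djl.inference.Predictor;
import ai.djl.modality.Input;
import ai.djl.modality.Output;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;
import java.nio.file.Paths;

public class TrtLlmLoadTest {
    public static void main(String[] args) throws Exception {
        Criteria<Input, Output> criteria =
                Criteria.builder()
                        .setTypes(Input.class, Output.class)
                        .optModelPath(Paths.get("/opt/ml/model")) // illustrative path
                        .optEngine("Python")                      // exercises PyModel.load()
                        .build();
        try (ZooModel<Input, Output> model = criteria.loadModel();
                Predictor<Input, Output> predictor = model.newPredictor()) {
            Input input = new Input();
            input.add("{\"inputs\": \"Hello\"}"); // illustrative payload
            Output output = predictor.predict(input);
            System.out.println(output.getData().getAsString());
        }
    }
}
```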

Contributor Author (@rohithkrn): Agreed. I will refactor this in a separate PR, as I don't have the bandwidth to refactor and test it currently.

@rohithkrn (Contributor Author) commented Oct 31, 2023

CI seems to be flaky: tests pass locally on Mac with the PR branch, and the master branch also fails locally on Ubuntu.

@lanking520 (Contributor) commented Nov 1, 2023

I fixed the PMD rules; please rebase.

@@ -150,6 +151,21 @@ public void load(Path modelPath, String prefix, Map<String, ?> options) throws I
pyEnv.setFailOnInitialize(false);
}

// Handle TRT-LLM
if ("TRT-LLM".equals(Utils.getenv("LMI_BACKEND")) || Boolean.parseBoolean(getProperty("option.trt_llm"))) {
Contributor Author (@rohithkrn): The env var is always set in the container, and we don't want to introduce another option. I will update this.

String modelId = trtLlmRepoDir.toAbsolutePath().toString();
setProperty("model_id", modelId);
pyEnv.addParameter("model_id", modelId);
entryPoint = "djl_python.tensorrt_llm";
Contributor Author (@rohithkrn) commented Nov 1, 2023: This overrides a user-set entryPoint.

Contributor: Yes. If that isn't desired, you can move it back, or set it only if entryPoint is null.
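That null guard would amount to something like this (a sketch; the helper name is made up):

```java
// Sketch: apply the TRT-LLM entry point only when the user hasn't set one.
final class EntryPointResolver {
    static String resolve(String userEntryPoint) {
        return userEntryPoint != null ? userEntryPoint : "djl_python.tensorrt_llm";
    }
}
```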

@rohithkrn merged commit a58a735 into deepjavalibrary:master on Nov 1, 2023. 5 of 8 checks passed.