Add trt-llm engine build step during model initialization #1235
Conversation
Your logic doesn't cover the condition where the model is a SageMaker uncompressed model: the model is saved to /opt/ml/models (read-only). The model can be a Triton repo or an HF standard model. In this case you should scan the model_dir and find out whether there are model files there.
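A minimal sketch of the scan the reviewer suggests. The class name, the marker file names, and the walk depth are all assumptions for illustration, not the project's actual detection logic:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper: detects whether a directory already contains model
// files, covering the SageMaker uncompressed case where the model is
// pre-extracted into a read-only directory such as /opt/ml/models.
public final class ModelDirScanner {

    // Returns true if the directory looks like a Triton repo (config.pbtxt)
    // or an HF standard model (config.json / *.safetensors / *.bin).
    public static boolean containsModelFiles(Path modelDir) throws IOException {
        if (!Files.isDirectory(modelDir)) {
            return false;
        }
        try (Stream<Path> files = Files.walk(modelDir, 2)) {
            return files.anyMatch(p -> {
                String name = p.getFileName().toString();
                return name.equals("config.pbtxt") // Triton model repo marker
                        || name.equals("config.json") // HF standard model marker
                        || name.endsWith(".safetensors")
                        || name.endsWith(".bin");
            });
        }
    }
}
```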
@@ -165,6 +165,8 @@ public void load(Path modelPath, String prefix, Map<String, ?> options) throws I
        } else if ("nc".equals(manager.getDevice().getDeviceType())
                && pyEnv.getTensorParallelDegree() > 0) {
            entryPoint = "djl_python.transformers_neuronx";
        } else if ("TRT-LLM".equals(Utils.getenv("LMI_BACKEND"))) {
It's better to check option.rolling_batch=trtllm.
That doesn't cover the case where customers do not want to use rolling batch. Also, I think there's an effort to make rolling_batch a boolean property.
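The two positions in this exchange can be sketched as a pair of checks. This is an illustrative reconstruction only; the class name and the assumption that serving properties are readable as a `Properties` object are hypothetical:

```java
import java.util.Properties;

// Sketch of the two detection strategies discussed in the review.
public final class BackendCheck {

    // Suggested check: rely on the serving property option.rolling_batch.
    public static boolean isTrtLlmRollingBatch(Properties prop) {
        return "trtllm".equals(prop.getProperty("option.rolling_batch"));
    }

    // Counter-argument: also honor the LMI_BACKEND env var, which the
    // container always sets, so customers who opt out of rolling batch
    // are still routed to the TRT-LLM backend.
    public static boolean isTrtLlmBackend(Properties prop, String lmiBackendEnv) {
        return "TRT-LLM".equals(lmiBackendEnv) || isTrtLlmRollingBatch(prop);
    }
}
```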
This change relies on model_id set in
@@ -463,6 +471,10 @@ public void initialize() throws IOException, ModelException {
        downloadModel();
        loadServingProperties();
        downloadS3();
        isTrtLlmBackend = "TRT-LLM".equals(Utils.getenv("LMI_BACKEND"));
        if (isTrtLlmBackend) {
            initTrtLlmModel();
Please move these changes to the Python engine, probably the PyModel.load() function. ModelInfo is a higher abstraction that shouldn't be modified by these changes. It should also be easier to test, because you can just test it by loading and predicting with the Python engine using the standard DJL predictor API.
Agreed on your point. Will refactor this in a separate PR, as I don't have the bandwidth to refactor and test it currently.
CI seems to be flaky.
I fixed the PMD rules, please rebase.
Force-pushed from 8fe39bf to e454bd9
@@ -150,6 +151,21 @@ public void load(Path modelPath, String prefix, Map<String, ?> options) throws I
            pyEnv.setFailOnInitialize(false);
        }

        // Handle TRT-LLM
        if ("TRT-LLM".equals(Utils.getenv("LMI_BACKEND")) || Boolean.parseBoolean(getProperty("option.trt_llm"))) {
The env var is always set in the container, and we don't want to introduce another option. I will update this.
            String modelId = trtLlmRepoDir.toAbsolutePath().toString();
            setProperty("model_id", modelId);
            pyEnv.addParameter("model_id", modelId);
            entryPoint = "djl_python.tensorrt_llm";
This overrides a user-set entryPoint.
Yes. If that isn't desired, you can move it back, or set it only when entryPoint is null.
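The suggested guard can be sketched as follows. The class and method names are hypothetical; only the entry point string "djl_python.tensorrt_llm" comes from the diff under review:

```java
// Sketch of the reviewer's suggestion: default the TRT-LLM entry point only
// when the user has not supplied one, so user configuration is never
// overridden.
public final class EntryPointDefaulting {

    public static String resolveEntryPoint(String userEntryPoint, boolean trtLlmBackend) {
        if (userEntryPoint == null && trtLlmBackend) {
            return "djl_python.tensorrt_llm"; // default for the TRT-LLM backend
        }
        return userEntryPoint; // respect what the user configured (may be null)
    }
}
```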
Description
Add TRT-LLM engine build step during model initialization
TODO: Tests