[python] Update lmi_dist warmup logic #1367
Conversation
-            self.properties.get("max_rolling_batch_prefill_tokens", -1))
-        self.model.warmup(batch_size, max_batch_prefill_tokens)
+        max_prefill_tokens = int(
+            self.properties.get("max_rolling_batch_prefill_tokens", 4096))
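In isolation, the change swaps a -1 "unset" sentinel for a concrete 4096-token default. A minimal sketch of that lookup, where a plain dict stands in for `self.properties` and the helper name is hypothetical:

```python
# Sketch of the default-value change above. `properties` stands in for the
# handler's configuration dict; resolve_max_prefill_tokens is a hypothetical
# helper, not actual lmi-dist code.

def resolve_max_prefill_tokens(properties: dict) -> int:
    # Old behavior: default -1 acted as an "unset" sentinel.
    # New behavior: fall back to a concrete 4096-token warmup budget.
    return int(properties.get("max_rolling_batch_prefill_tokens", 4096))

print(resolve_max_prefill_tokens({}))  # 4096 when the property is unset
print(resolve_max_prefill_tokens({"max_rolling_batch_prefill_tokens": "8192"}))  # explicit value wins
```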
The purpose of max_rolling_batch_prefill_tokens has changed. Do we still want to keep this concept? And why do we now need to introduce max_prefill_tokens? Can we just set some small value like 512 or 1024?
Also, why are we starting to expose this externally? Can this be done inside lmi-dist?
This was external before.
Why do we set it to such a small value as 4096? Does that impact the total number of tokens we can support? And conversely, if a customer sets this value very high, like 32000, will that impact model loading?
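To make that trade-off concrete: warmup runs a prefill pass sized by this value, so its peak activation footprint scales roughly linearly with it. A back-of-the-envelope sketch with made-up numbers (fp16, hidden size 4096; none of this is measured from lmi-dist):

```python
# Illustrative arithmetic only: peak warmup activation memory grows roughly
# linearly with max_prefill_tokens. The per-token byte count is an assumed
# figure (fp16 activations, hidden size 4096, ~2 live buffers), not a
# measurement of lmi-dist.

BYTES_PER_TOKEN = 2 * 4096 * 2  # dtype bytes * hidden size * assumed live buffers

def warmup_activation_mb(max_prefill_tokens: int) -> float:
    return max_prefill_tokens * BYTES_PER_TOKEN / 1e6

print(round(warmup_activation_mb(4096), 1))   # modest footprint for the 4096 default
print(round(warmup_activation_mb(32000), 1))  # ~8x larger: can slow loading or OOM at load time
```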
72a0134 to 70f7204
tests/integration/llm/prepare.py (Outdated)
@@ -773,6 +773,7 @@ def build_lmi_dist_model(model):
     )
     options = lmi_dist_model_list[model]
     options["engine"] = "MPI"
+    options["option.mpi_mode"] = "true"
Is this needed, given that the engine is MPI?
Because this change requires this property: https://github.com/deepjavalibrary/djl-serving/blob/master/engines/python/setup/djl_python/rolling_batch/lmi_dist_rolling_batch.py#L42
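The linked line performs a startup check on that property. A hypothetical sketch of that kind of guard (function name and error message are illustrative, not the actual djl-serving source):

```python
# Hypothetical guard resembling the linked check: the lmi-dist rolling batch
# refuses to initialize unless mpi_mode is enabled. Not actual djl-serving code.

def ensure_mpi_mode(properties: dict) -> None:
    if str(properties.get("mpi_mode", "false")).lower() != "true":
        raise ValueError(
            "lmi-dist rolling batch requires option.mpi_mode=true "
            "(set engine=MPI and option.mpi_mode=true)")

ensure_mpi_mode({"mpi_mode": "true"})  # passes silently when the property is set
```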
I got an error without adding mpi_mode.