[python] Rolling batch support for flash models #865
Conversation
engines/python/setup/djl_python/rolling_batch/lmi_dist_rolling_batch.py
self.rolling_batch = SchedulerRollingBatch(model_id_or_path,
                                           self.device, properties,
                                           **kwargs)
if os.getenv('OMPI_COMM_WORLD_SIZE'):
Check for the engine instead: if engine != Python and tensor_parallel != 1.
Changed.
if self.enable_rolling_batch and self.enable_rolling_batch.lower(
) == "false":
    self.enable_rolling_batch = None
self.enable_rolling_batch = properties.get("rolling_batch") is not None
self.enable_rolling_batch = properties.get("rolling_batch", None)
Done
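The suggestion above can be illustrated with a minimal sketch (not the actual djl_python code; `properties` and the variable names are placeholders). It contrasts the original string comparison against `"false"` with the suggested `properties.get("rolling_batch", None)`, which keeps the raw value and treats a missing key as disabled:

```python
# Illustrative sketch only, assuming a plain dict of serving properties.
properties = {"rolling_batch": "auto"}

# Original pattern: fetch, then null out the literal string "false".
enable = properties.get("rolling_batch")
if enable and enable.lower() == "false":
    enable = None

# Suggested pattern: keep the configured value; absent key means disabled.
enable_rolling_batch = properties.get("rolling_batch", None)
```

The suggested form also preserves the configured string (e.g. `"auto"`), so it can later be passed to a dispatch function rather than being collapsed to a boolean.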
**kwargs)
if self.engine != "Python" and tp_degree != 1:
    self.device = int(os.getenv("LOCAL_RANK", 0))
rolling_batch_type = properties.get("rolling_batch")
No need for this line.
Done.
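The device-selection logic under discussion can be sketched as follows. This is a hedged illustration, not the actual handler code: `select_device` is a hypothetical helper, and it assumes the launcher (MPI or a torch distributed launcher) sets `LOCAL_RANK` for each worker:

```python
import os

# Illustrative helper: pick the per-worker device index when tensor
# parallelism is active and the engine is not plain Python.
def select_device(engine: str, tp_degree: int) -> int:
    if engine != "Python" and tp_degree != 1:
        # LOCAL_RANK is set per worker by the distributed launcher.
        return int(os.getenv("LOCAL_RANK", 0))
    return 0
```

With `LOCAL_RANK=2` in the environment, `select_device("MPI", 4)` returns 2, while `select_device("Python", 1)` always returns 0.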
self.device = int(os.getenv("LOCAL_RANK", 0))
rolling_batch_type = properties.get("rolling_batch")
model_config = AutoConfig.from_pretrained(model_id_or_path, **kwargs)
_rolling_batch_cls = get_rolling_batch_class_from_str(rolling_batch_type, model_config)
_rolling_batch_cls = get_rolling_batch_class_from_str(self.enable_rolling_batch, model_config)
Done
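The string-to-class dispatch used here can be sketched as below. This is a hypothetical reconstruction: the real `get_rolling_batch_class_from_str` in djl_python also takes the model config to decide whether the model is supported, which is omitted here, and the class names and keys are assumptions:

```python
# Hypothetical sketch of a string-to-class dispatch table; the real
# djl_python implementation also inspects the model config.
class SchedulerRollingBatch: ...
class LmiDistRollingBatch: ...

_ROLLING_BATCH_TYPES = {
    "scheduler": SchedulerRollingBatch,
    "lmi-dist": LmiDistRollingBatch,
}

def get_rolling_batch_class_from_str(rolling_batch_type: str):
    try:
        return _ROLLING_BATCH_TYPES[rolling_batch_type]
    except KeyError:
        raise ValueError(f"Unknown rolling batch type: {rolling_batch_type}")
```

A dict-based dispatch keeps the mapping in one place and fails loudly on an unrecognized `rolling_batch` value instead of silently falling through an if/elif chain.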
* [python] Add rolling batch for flash attention models
* Flash gptneox support
* fix py tests
* Set sharded to false for tp 1
* Review changes

Co-authored-by: sindhuso <somasundaram.sindhu@gmail.com>