[Transform] Better dispatch support for offloaded and multi-gpu #423
Conversation
brian-dellabetta left a comment: approving with question
brian-dellabetta left a comment: thanks!
dsikka left a comment:
"Instead use first device seen (typically CPU) and ensure future devices match that device"
How is this ensured?
@dsikka This is ensured by moving the weight to the device of the value: https://github.com/neuralmagic/compressed-tensors/pull/423/files#diff-be313d6f55c99277b8d747f1d5470f9cf0f08d99ffdb1fc45ac8df80f8784c59R114
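For illustration, a minimal sketch of that device-matching behavior, assuming a PyTorch weight shared across modules; apply_transform and its signature are hypothetical, not the PR's actual code:

```python
import torch

def apply_transform(weight: torch.Tensor, value: torch.Tensor) -> torch.Tensor:
    # Hypothetical sketch: the shared transform weight may live on another
    # device (CPU when offloaded, or a different GPU under pipeline
    # parallelism). Moving it to the device of the incoming value guarantees
    # the two operands always match, whichever device the module runs on.
    weight = weight.to(device=value.device)
    return value @ weight
```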
…-project#423)

* key by weight only
* always return on CPU, onload at runtime
* fix get_offloaded_device
* reduce diff
* reduce diff
* reduce diff
* move to device to support pipeline parallel
* eagerly generate with precision
* add comment

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
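The caching commits above ("key by weight only", "always return on CPU, onload at runtime") might look roughly like this sketch; the cache layout and helper names are assumptions for illustration, not the PR's code:

```python
import torch

# Cache keyed only by the attributes that define the weight (not by device),
# with every entry created and stored on CPU.
_cache: dict[tuple[int, torch.dtype], torch.Tensor] = {}

def get_shared_weight(size: int, precision: torch.dtype) -> torch.Tensor:
    key = (size, precision)
    if key not in _cache:
        # Eagerly generate with the requested precision, always on CPU.
        _cache[key] = torch.eye(size, dtype=precision)
    return _cache[key]

def onload(weight: torch.Tensor, device: torch.device) -> torch.Tensor:
    # Onload at runtime: one CPU copy serves every execution device,
    # including the multiple devices of a pipeline-parallel model.
    return weight.to(device)
```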
Purpose

* Better dispatch support for transforms on offloaded and multi-GPU models

Changes

* Use scheme.precision rather than module.dtype when constructing parameters
* Fix get_offloaded_device in the case that the module is not offloaded (such as attention)

Testing
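As an illustration of the Changes above, a minimal sketch, not the PR's actual code; create_transform_parameter and its fallback logic are assumptions:

```python
import torch
from torch.nn import Module, Parameter

def create_transform_parameter(
    module: Module, size: int, precision: torch.dtype
) -> Parameter:
    # Construct in the scheme's precision rather than module.dtype, so the
    # transform's numerics do not depend on how the model was loaded.
    param = next(module.parameters(), None)
    if param is not None:
        device = param.device
    else:
        # Module owns no parameters of its own (e.g. an attention wrapper),
        # so there is no offload device to look up; fall back to CPU.
        device = torch.device("cpu")
    data = torch.eye(size, dtype=precision, device=device)
    return Parameter(data, requires_grad=False)
```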