[Misc] Add CustomOp interface for device portability #5255
Conversation
vllm/model_executor/custom_op.py
Outdated
```python
def dispatch_forward(self):
    if is_hip():
        return self.forward_hip
    elif is_cpu():
        return self.forward_cpu
    else:
        return self.forward_cuda
```
I feel we need more flexibility here in the future. For example, we may build a wheel with both CPU and CUDA enabled but want to configure which one to use on the fly. On the other hand, this may not be necessary at the moment.
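For illustration only, a sketch of what such on-the-fly selection could look like; the `VLLM_DEVICE` environment variable here is hypothetical, not an existing vLLM option:

```python
import os

def dispatch_forward(self):
    # Hypothetical runtime override; falls back to the build-time
    # platform checks when the variable is unset.
    device = os.environ.get("VLLM_DEVICE")  # e.g. "cpu" or "cuda"; not a real vLLM flag
    if device == "cpu":
        return self.forward_cpu
    if device == "cuda":
        return self.forward_cuda
    if is_hip():
        return self.forward_hip
    elif is_cpu():
        return self.forward_cpu
    else:
        return self.forward_cuda
```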
Good point. For now, vLLM is bound to a specific backend at build time. I added a note that we currently do not support dynamic dispatching.
PyTorch has quite a lot of dispatching utilities; can we reuse some?
@youkaichao Good point. I moved the dispatching logic to
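For context, `torch.library` is one such utility: you define a single op schema and register separate implementations per dispatch key, and PyTorch routes each call based on the input tensor's device. A minimal, self-contained sketch (the `my_ns`/`scale` names are placeholders, not anything in vLLM):

```python
import torch
from torch.library import Library

# Define one schema in a placeholder namespace.
my_lib = Library("my_ns", "DEF")
my_lib.define("scale(Tensor x) -> Tensor")

def scale_cpu(x: torch.Tensor) -> torch.Tensor:
    return x * 2.0  # stand-in for a CPU-specific kernel

def scale_cuda(x: torch.Tensor) -> torch.Tensor:
    return x * 2.0  # stand-in for a CUDA-specific kernel

# Register one implementation per dispatch key.
my_lib.impl("scale", scale_cpu, "CPU")
my_lib.impl("scale", scale_cuda, "CUDA")

# The dispatcher picks the implementation from the input's device.
y = torch.ops.my_ns.scale(torch.ones(4))
```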
LGTM!
Hi @WoosukKwon, would you mind taking a look at #5047 before you land this? I've been working on registering all the custom operations via
@bnellnm Thanks for bringing it up. If I understand correctly, this PR is orthogonal to yours. Basically, I believe your PR does NOT include per-device dispatching, because vLLM always builds the custom library for at most one device. Also, in our situation, dispatching can't be implemented at the C++ level, because we'd like to use Python libraries to implement some custom ops.
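For readers following the thread, here is a condensed sketch of a `CustomOp`-style indirection layer along the lines discussed above. Details may differ from the merged code; `is_hip`/`is_cpu` are assumed to be vLLM's platform checks, as in the snippet above.

```python
import torch.nn as nn
from vllm.utils import is_cpu, is_hip  # assumed location of the platform checks

class CustomOp(nn.Module):
    """Indirection layer: `forward` is bound to a device-specific
    implementation once, at construction time."""

    def __init__(self, *args, **kwargs):
        super().__init__()
        self._forward_method = self.dispatch_forward()

    def forward(self, *args, **kwargs):
        return self._forward_method(*args, **kwargs)

    def forward_native(self, *args, **kwargs):
        # Pure-PyTorch fallback for backends without custom kernels.
        raise NotImplementedError

    def forward_cuda(self, *args, **kwargs):
        raise NotImplementedError

    def forward_hip(self, *args, **kwargs):
        # By default, HIP shares the CUDA path.
        return self.forward_cuda(*args, **kwargs)

    def forward_cpu(self, *args, **kwargs):
        return self.forward_native(*args, **kwargs)

    def dispatch_forward(self):
        # NOTE: the backend is fixed at build time; no dynamic dispatching.
        if is_hip():
            return self.forward_hip
        elif is_cpu():
            return self.forward_cpu
        else:
            return self.forward_cuda
```

Subclasses then override only the `forward_*` methods for the backends they support, and any `_custom_ops` imports can live inside those methods so they happen lazily, only on the matching device.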
lgtm :)
Currently, the custom layers have two issues. First, they directly import `_custom_ops`, which is not supported on devices such as TPU and Gaudi. Second, they assume that the custom ops are implemented in the same way for all devices. To address these issues, this PR adds a `CustomOp` interface, an indirection layer that implements the device-specific `forward` methods. This allows the custom kernels to be lazily imported only for the associated device. According to the benchmarks, the lazy import does not affect performance: