[WIP][Core] fully composible launcher/task/coordinator/communicator design and implementation #3762
Conversation
Great stuff! Some questions:
- Can the interfaces be kept as pure abstract classes? It is a little hard to follow the layers upon first reading -- I feel we should avoid `super().method()` for user implementations as much as possible (see the sketch below), so that the code relevant to the layer is completely contained within the implementation and not spread between implementation and interface.
- The MPLauncher starts processes for the task. Is it possible with this design to run the Engine in a different process? The concern I have is that the Engine will do a lot of work that can contend for CPU with the model execution: things like receiving requests, async detokenization, and async tokenization. Of course we can push all those things to different threads, but the dependency on Python for the LLM ecosystem means we won't be able to escape the GIL unless we cordon it off in a different process.
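To make the first concern concrete, here is a minimal sketch of the two styles under discussion; the class and method names are hypothetical, not the actual interfaces in this PR:

```python
from abc import ABC, abstractmethod

# Style the comment argues against: the interface carries part of the logic,
# so every user implementation must remember to chain up with super().
class CoordinatorBase(ABC):
    def initialize(self) -> None:
        self._register_with_launcher()   # layer logic lives in the interface ...

    def _register_with_launcher(self) -> None:
        pass

class MyCoordinator(CoordinatorBase):
    def initialize(self) -> None:
        super().initialize()             # ... so the implementation must call super()
        self._open_sockets()

    def _open_sockets(self) -> None:
        pass

# Pure abstract alternative: the interface only declares behavior, and all of
# the layer's logic is contained in the implementation itself.
class CoordinatorInterface(ABC):
    @abstractmethod
    def initialize(self) -> None:
        ...
```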
```python
from vllm.interfaces.coordinator import Coordinator


class Communicator(object):
```
Why does this extend `object`?
> I feel we should avoid `super().method()` for user implementations as much as possible, so that the code relevant to the layer is completely contained within the implementation and not spread between implementation and interface.
Technically it is doable. But then some code will be replicated over many implementations. There is a trade-off here.
> Is it possible with this design to run the Engine in a different process?
Yes, we can. The launcher design is general, and we can have both CPU workers and GPU workers, although we need to think about how they coordinate with each other.
> Why does this extend `object`?
See https://stackoverflow.com/questions/4015417/why-do-python-classes-inherit-object
👍
btw Python 2 was EOL'd a long time ago (in 2020, IIRC); we don't need to extend `object` anymore!
I disagree that code will be replicated if we don't use `super().method()`. There is a third way: factor out common utilities that can be used in multiple implementations. Basically, composition vs. inheritance.
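A minimal sketch of the composition approach being suggested; the utility and coordinator names here are made up for illustration and are not part of this PR:

```python
from abc import ABC, abstractmethod

class Coordinator(ABC):
    """Pure abstract interface: declares behavior only."""

    @abstractmethod
    def barrier(self) -> None: ...

    @abstractmethod
    def get_rank(self) -> int: ...

# Common logic lives in a free-standing utility instead of a base-class
# method, so no implementation needs to chain into super().
def barrier_and_report(coordinator: Coordinator, message: str) -> None:
    coordinator.barrier()
    print(f"[rank {coordinator.get_rank()}] {message}")

class TorchDistributedCoordinator(Coordinator):
    """One possible implementation; any other backend can reuse the utility."""

    def barrier(self) -> None:
        import torch.distributed as dist
        dist.barrier()

    def get_rank(self) -> int:
        import torch.distributed as dist
        return dist.get_rank()
```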
I am a bit confused about this PR. Where will these communicators be used? How do they interact with the existing code?
They will be used for GPU collective communication, for example:

```python
class AllReduceDistributedTask(GlobalCoordinatorDistributedTask):
    def post_init_distributed(self, **kwargs):
        tensor = torch.ones(16, 1024, 1024, dtype=torch.float32).cuda(
            self.coordinator.get_local_rank())
        self.communicator.all_reduce(tensor_in=tensor)
        result = tensor.mean().cpu().item()
        assert result == self.coordinator.get_local_world_size()
```

I didn't push this code into the existing code yet. This is just a draft for comment.
I think for several current uses, usage like … But this can happen at higher abstraction layers, and in stages.

"Coordinator" and "Communicator" is a confusing split to me, and it doesn't help that the words sound too similar to each other. Maybe this is yak shaving, but could we consider different names that make the distinction clearer? Like "WorkerCoordinator" vs. "DeviceCommunicator", or something? Or even "CollectiveComms" or "CollectiveCommunicator" to make the "NCCL wrapper" idea more clear?
That's a great idea! Indeed, I ran into this exact problem when I tried to convey this abstraction to others: they get confused about "Coordinator" and "Communicator". I think "WorkerCoordinator" and "DeviceCommunicator" are much better. Thank you for the proposal!
Closed, as it is broken up into several PRs.
An implementation draft of #3587 .

Check the test code for example:

And the code for defining a task type:

We can see the full composability. `MPLauncher` only accepts launcher-specific args and `task_type`. It passes the rest of the args to the task, which initializes the coordinator and communicator.

It is worth noting that `GlobalCoordinatorDistributedTask` knows nothing about any specific coordinator or communicator. It just operates on the interfaces provided by `Communicator` and `Coordinator`.

This is a draft implementation, and is open for discussion.
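To illustrate the forwarding pattern described above, here is a toy stand-in, not the actual `MPLauncher` from this PR, showing how a launcher can accept only its own arguments plus a task type and pass everything else through to the task:

```python
import multiprocessing as mp

class ToyMPLauncher:
    """Toy illustration of the described pattern; NOT the PR's MPLauncher."""

    def __init__(self, n_processes: int):
        # Launcher-specific argument: how many processes to spawn.
        self.n_processes = n_processes

    def launch(self, task_type, **task_kwargs):
        # Everything beyond task_type is forwarded untouched; the task is
        # responsible for building its own coordinator and communicator.
        procs = [
            mp.Process(target=self._run_task, args=(task_type,), kwargs=task_kwargs)
            for _ in range(self.n_processes)
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

    @staticmethod
    def _run_task(task_type, **task_kwargs):
        task = task_type(**task_kwargs)   # the launcher never inspects the kwargs
        task.run()                        # assumed task entry point
```

The point of the split is that the launcher stays oblivious to coordination and communication details, which is what the composability claim above relies on.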