[ Frontend ] Multiprocessing for OpenAI Server with zeromq #6883
Conversation
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
) -> PreTrainedTokenizer:
    """Get the appropriate Tokenizer for the request"""

async def is_tracing_enabled(self) -> bool:
@joerunde why are these `pass`?
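For context, one common reason for `pass` (or `...`) bodies is that the class is an interface whose concrete implementations live elsewhere. The sketch below is a hedged guess at that shape; the class name and method set are illustrative assumptions, not the exact vLLM code:

```python
# Hedged sketch: an interface class whose stub bodies are intentionally
# empty, with real implementations provided by concrete clients.
from typing import Protocol

from transformers import PreTrainedTokenizer


class EngineClient(Protocol):
    """Interface assumed to be shared by in-process and RPC-backed clients."""

    async def get_tokenizer(self) -> PreTrainedTokenizer:
        """Get the appropriate Tokenizer for the request"""
        ...

    async def is_tracing_enabled(self) -> bool:
        ...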
Do you want this in automation?
TPOT speedup: [benchmark chart]
Nice win + more to go in the follow-ups (especially protobufs).
async def _send_one_way_rpc_request(self, request: RPC_REQUEST_TYPE,
                                    error_message: str):
    """Send one-way RPC request to trigger an action."""
    with self.socket() as socket:
I'm trying to understand this. We create a socket every time we call this function?
Yes. I believe this is a common paradigm for zeromq-based clients. Also, this function is mostly used in setup.
Also, this does not mean a new Unix socket is created. This is a zeromq socket; how zeromq handles this internally is opaque to us.
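For illustration, here is a minimal sketch of that per-call socket pattern using pyzmq with asyncio. The class, path, and method names are assumptions for the example, not the exact vLLM code:

```python
# Minimal sketch of the per-call zeromq socket pattern (assumed names).
import contextlib

import zmq
import zmq.asyncio


class RPCClient:
    def __init__(self, rpc_path: str):
        # The Context holds the heavyweight state (IO threads, connection
        # handling) and is shared; individual sockets are cheap to create.
        self.context = zmq.asyncio.Context()
        self.rpc_path = rpc_path  # e.g. "tcp://127.0.0.1:5570"

    @contextlib.contextmanager
    def socket(self):
        socket = self.context.socket(zmq.DEALER)
        try:
            socket.connect(self.rpc_path)
            yield socket
        finally:
            # linger=0 drops unsent messages so close() never blocks.
            socket.close(linger=0)

    async def send_one_way(self, request: bytes):
        """Send a one-way RPC request to trigger an action."""
        with self.socket() as socket:
            await socket.send(request)
```

Since each socket exists only for the duration of one call, there is no shared connection state to synchronize across concurrent requests, which fits the mostly-setup-time usage described above.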
port = get_open_port(envs.VLLM_RPC_PORT)
rpc_server_process = Process(target=run_rpc_server,
                             args=(engine_args,
                                   UsageContext.OPENAI_API_SERVER,
                                   port))
Why do we need a new env var here? Isn't `get_open_port()` enough?
I would be open to reverting this; will just need to check how it interacts with tensor parallelism (TP).
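For context, a rough sketch of what an env-var-aware port helper might look like; the preferred-port fallback behavior here is an assumption, not necessarily how vLLM's `get_open_port` works:

```python
# Hedged sketch of a port helper that prefers an env-configured port
# and falls back to an OS-assigned ephemeral port.
import socket
from typing import Optional


def get_open_port(preferred: Optional[int] = None) -> int:
    """Return `preferred` if it is free, otherwise ask the OS for a port."""
    if preferred is not None:
        try:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
                s.bind(("", preferred))
                return preferred  # note: small race until the caller rebinds
        except OSError:
            pass  # preferred port is taken; fall through to an ephemeral one
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))  # port 0 lets the kernel pick a free port
        return s.getsockname()[1]
```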
@@ -0,0 +1,715 @@
"""
Repeat of tests in test_completion.py with the non-mp backend.
Why do we copy this file instead of adding a few new lines to pass a new arg to the server, e.g. via `@pytest.mark.parametrize`?
Because we need to modify `default_server_args`, which is a fixture for the whole file, so unfortunately I think this has to be a separate file.
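For reference, the parametrized alternative the reviewer suggests would look roughly like the sketch below; the flag name and model are illustrative assumptions, not necessarily the real CLI:

```python
# Sketch: parametrizing the module-scoped fixture itself so both
# backends run from one file. Flag and model names are assumed.
import pytest


@pytest.fixture(scope="module", params=["mp", "no-mp"])
def default_server_args(request):
    args = ["--model", "facebook/opt-125m"]
    if request.param == "no-mp":
        # Hypothetical flag selecting the non-multiprocessing backend.
        args.append("--disable-frontend-multiprocessing")
    return args
```

Even parametrized this way, the module-scoped server fixture would be torn down and rebuilt once per parameter, so a separate file can still be the simpler trade-off in practice.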
Thanks for the great work! I just left some nit comments on things I don't understand. If you have time, could you please briefly draw some diagrams showing how many processes we use and how they interact with each other?
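In lieu of a full diagram, here is a rough sketch of the two-process layout as described in the PR summary below (details assumed from that summary):

```
┌────────────────────────────┐                      ┌────────────────────────────┐
│ OpenAI API server process  │        zeromq        │ RPC server process         │
│ (FastAPI / uvicorn)        │ <──── requests ────> │ running the AsyncLLMEngine │
└────────────────────────────┘     and results      └────────────────────────────┘
```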
RFC: #6797

SUMMARY:
* Run the `AsyncLLMEngine` in a separate process from the OpenAI API server
* Use `zeromq` for networking between the OAI process and the `AsyncLLMEngine` (a toy sketch of this split follows at the end of this description)
* Benchmarked with `llama-8b-instruct` on H100 (550 input tokens, 150 output tokens)

TODO:

PERFORMANCE:
* The `zeromq`-based protocol is delivering speedup in the high-QPS scenario. Configurations compared: `main`, `grpc`, `zmq + pickle` (this PR), and `zmq + protobuf`. [benchmark table omitted]
* Clear win + clear we should move forward with `zeromq`
* Benchmark script for reference: [script omitted]

FOLLOW UP WORK:
* `protobuf` communication layer for the RPC Server
* `N` uvicorn processes + expose a flag
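To make the proposed architecture concrete, here is a self-contained toy version of the RPC split using plain pyzmq REQ/REP and pickle for brevity; `run_rpc_server` below is a stand-in for the real engine loop, not vLLM's implementation:

```python
# Toy two-process RPC split: an "engine" child process answers pickled
# requests over zeromq, mirroring the API-server / engine separation.
import pickle
from multiprocessing import Process

import zmq


def run_rpc_server(port: int) -> None:
    """Toy engine process: answer pickled requests until told to stop."""
    context = zmq.Context()
    socket = context.socket(zmq.REP)
    socket.bind(f"tcp://127.0.0.1:{port}")
    while True:
        request = pickle.loads(socket.recv())
        if request == "shutdown":
            socket.send(pickle.dumps("ok"))
            break
        # In the real design, this is where the AsyncLLMEngine would
        # handle the request and stream generation results back.
        socket.send(pickle.dumps(f"handled: {request!r}"))
    socket.close()
    context.term()


if __name__ == "__main__":
    port = 5570  # assumed-free port for the example
    rpc_server_process = Process(target=run_rpc_server, args=(port,))
    rpc_server_process.start()

    # Toy API-server side: one request/response round trip, then shutdown.
    context = zmq.Context()
    with context.socket(zmq.REQ) as socket:
        socket.connect(f"tcp://127.0.0.1:{port}")
        socket.send(pickle.dumps("completion request"))
        print(pickle.loads(socket.recv()))
        socket.send(pickle.dumps("shutdown"))
        print(pickle.loads(socket.recv()))
    rpc_server_process.join()
```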