-
-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bugfix]: serialize config instances by value when using --trust-remote-code #6751
[Bugfix]: serialize config instances by value when using --trust-remote-code #6751
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
🚀 |
Great observation! In general we should not expect any new code when we use multi-node serving. HF dynamically downloaded code and module is a pain in this case. I think converting the object to a dict makes sense to me. Does it have any side effect? Or we can just do it in all the cases? |
Yeah, this is something that we'll need to look at further. If the custom config class only adds new attributes and default values (eg. no new methods used by the modeling code), then this this conversion to a |
Will this be merged in the next release? It'd be great to have engine Ray support for Phi3 and others with remote code. |
e25a0f0
to
602c0d3
Compare
In #6607 (comment), the I'm also still working out how to test that the conversion to |
In my testing, I found that most attributes of the custom config could be attached to the PretrainedConfig, but that some configurations are expected to be class attributes and those would not be preserved (eg. I did some more investigation into the serialization and found a much better solution: the This means that the |
@youkaichao @rkooo567 This PR is now ready for review. Please take a look :) |
do you happen to know how does the |
Not in any detail, but I assume that it somehow serializes the class definition along with the instance data in what it communicates between the workers. |
@njhill @tjohnson31415 can we try to get this merged in? |
Previously, I was hesitant to merge this pr, as I was not confident about if removing To proceed, @tjohnson31415 can you restore the change of |
Just a heads up, I am trying to incorporate this into my pipeline parallel setup for deepseek v2. I modified my docker file to build vllm from this branch. I got an error and am falling back to an older flash attention just to test. I'm building from the vllm-openai:v0.6.2 image.
|
the main branch of vllm builds flash-attention inside. to test this branch, I suggest you adding changes for |
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
03e21d9
to
d03107a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just questions/thoughts out of curiosity as a downstream user/watcher of this branch.
vllm/transformers_utils/config.py
Outdated
cloudpickle.register_pickle_by_value(transformers_modules) | ||
# Ray vendors its own version of cloudpickle | ||
ray.cloudpickle.register_pickle_by_value(transformers_modules) | ||
# ignore import errors in the case that trust_remote_code is set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this still be a debug log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, definitely. Updated to incldue log messages instead of just the comment.
Thanks for the review!
@@ -945,6 +945,8 @@ def flatten_2d_lists(lists: List[List[T]]) -> List[T]: | |||
return [item for sublist in lists for item in sublist] | |||
|
|||
|
|||
# TODO: This function can be removed if transformer_modules classes are | |||
# serialized by value when communicating between processes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be removed, and properly tested with it's absence, as part of this PR's goals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my plan as well but I undid the removal due to this comment from @youkaichao. It doesn't hurt to keep it in for now if it can help the PR get merged. The removal can be a quick follow-on PR where we can test edge cases.
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
@justinthelaw @mphilippnv does this pr solve your problem? @tjohnson31415 can you add one testcase for this? we can have tests for 2-node case. |
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
6f9208a
to
2b318c7
Compare
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
@mphilippnv Have you had a chance to verify these changes in your environment? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the tests pass, this PR should be good to go now. What do you think @youkaichao ?
No. I have too much work coming in. Haven't had time to really play with it. |
d9546a2
to
ab7b095
Compare
…ject#6751) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: charlifu <charlifu@amd.com>
…ject#6751) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: Vinay Damodaran <vrdn@hey.com>
…ject#6751) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: Alvant <alvasian@yandex.ru>
…ject#6751) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: Erkin Sagiroglu <erkin@infra-aipipeline-1-at1-prox-prod-a.ipa.corp.telnyx.com>
…ject#6751) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: Amit Garg <mitgarg17495@gmail.com>
…ject#6751) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: qishuai <ferdinandzhong@gmail.com>
…ject#6751) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: Sumit Dubey <sumit.dubey2@ibm.com>
…ject#6751) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
…ject#6751) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
…ject#6751) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
It is not currently possible to run vLLM with a model that requires
--trust-remote-code
if the server spans multiple nodes. The server will crash with an error when it attempts to communicate the dynamically generated configuration class to a remote Ray worker. The crux of the error isThis error arises due to the dynamic
transformers_modules
module generated by Transformers when it loads the remote code configuration for the model. In a multi-node context, this module is generated on the head node whentransformers.AutoConfig.from_pretrained()
is called, but it isn't generated on the other nodes.This is a very similar issue to what was resolved in #871 and #4285, but now in a multi-node context. As noted in #871, the failure to import occurs when Ray attempts to communicate the
ModelConfig
object containinghf_config
andhf_text_config
referencing the dynamically imported config class fromtransformers_modules
to the worker node. The fix in #871 became the util functioninit_cached_hf_modules
that runstransformers.dynamic_module_utils
on each worker during the initialization of the WorkerWrapperBase. This generates the dynamic module base in~/.cache/huggingface/modules
(which does need to happen once on each node) and also modifies the module search path to include the cache directory (which needs to happen in every worker), but it does not generatetransformers_modules
. Use ofinit_cached_hf_modules
fixed the single node case due to the modification to the module import path, but doesn't fix the multi-node case.A work around would be to run vllm or
transformers.AutoConfig.from_pretrained
on each node manually to generate the modules (or get the generated module files onto each node some other way).The implementation proposed in this PR is to utilize a feature in the
cloudpickle
library that allows the config objects to be serialized by-value instead of by-reference so that the custom config class does not need to be importable in the remote workers.See https://github.com/cloudpipe/cloudpickle?tab=readme-ov-file#overriding-pickles-serialization-mechanism-for-importable-constructs
Doing this also obviates the need for
init_cached_hf_modules()
(added a TODO to remove in a later PR once edge-cases can be verified).A similar error is reported in #6607, even without multiple GPUs. In that case, the failure occurs when using
--trust-remote-code
with--engine-use-ray
. The fix proposed here resolves this issue as well.Alternatives Considered for multi-node (do not fix to the
--engine-use-ray
case):AutoConfig.from_pretrained()
on each worker to usetransformers
to generate the dynamic moduleFIX #3593
FIX #4169
FIX #6263
FIX #6607
FIX #8553
Also fixes the issue raised in #4986 (comment)
There's a related issue with serializing these custom config classes when running TP with
VLLM_WORKER_MULTIPROC_METHOD=spawn
and a model with a.
in its name such asmicrosoft/Phi-3.5-MoE-instruct
. This results in an error like:(NB: the module name is truncated at the
.
inPhi-3.5-MoE
)This error arises when
multiprocessing
pickles the custom config class embedded inModelConfig
to send the arguments forcreate_worker
to the newly spawned process. Even though the module is importable, it fails due to the extra.
in the name (transformers
has special handling for this). This too can be fixed by registering by-value serialization of these objects by usingcloudpickle
as a custom "reducer"/serializer inpickle
.FIX #8553