Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bugfix]: serialize config instances by value when using --trust-remote-code #6751

Merged

Conversation

tjohnson31415
Copy link
Contributor

@tjohnson31415 tjohnson31415 commented Jul 24, 2024

It is not currently possible to run vLLM with a model that requires --trust-remote-code if the server spans multiple nodes. The server will crash with an error when it attempts to communicate the dynamically generated configuration class to a remote Ray worker. The crux of the error is

ray.exceptions.RaySystemError: System error: No module named 'transformers_modules'

This error arises due to the dynamic transformers_modules module generated by Transformers when it loads the remote code configuration for the model. In a multi-node context, this module is generated on the head node when transformers.AutoConfig.from_pretrained() is called, but it isn't generated on the other nodes.

This is a very similar issue to what was resolved in #871 and #4285, but now in a multi-node context. As noted in #871, the failure to import occurs when Ray attempts to communicate the ModelConfig object containing hf_config and hf_text_config referencing the dynamically imported config class from transformers_modules to the worker node. The fix in #871 became the util function init_cached_hf_modules that runs transformers.dynamic_module_utils on each worker during the initialization of the WorkerWrapperBase. This generates the dynamic module base in ~/.cache/huggingface/modules (which does need to happen once on each node) and also modifies the module search path to include the cache directory (which needs to happen in every worker), but it does not generate transformers_modules. Use of init_cached_hf_modules fixed the single node case due to the modification to the module import path, but doesn't fix the multi-node case.

A work around would be to run vllm or transformers.AutoConfig.from_pretrained on each node manually to generate the modules (or get the generated module files onto each node some other way).


The implementation proposed in this PR is to utilize a feature in the cloudpickle library that allows the config objects to be serialized by-value instead of by-reference so that the custom config class does not need to be importable in the remote workers.
See https://github.com/cloudpipe/cloudpickle?tab=readme-ov-file#overriding-pickles-serialization-mechanism-for-importable-constructs

Doing this also obviates the need for init_cached_hf_modules() (added a TODO to remove in a later PR once edge-cases can be verified).

A similar error is reported in #6607, even without multiple GPUs. In that case, the failure occurs when using --trust-remote-code with --engine-use-ray. The fix proposed here resolves this issue as well.

Alternatives Considered for multi-node (do not fix to the --engine-use-ray case):

  • serialize configs as instances of classes that aren't dynamically imported, eg. PretrainedConfig or Dict
  • figure out how to copy files from the head node to all other nodes via Ray
  • add code to run AutoConfig.from_pretrained() on each worker to use transformers to generate the dynamic module

FIX #3593
FIX #4169
FIX #6263
FIX #6607
FIX #8553
Also fixes the issue raised in #4986 (comment)


There's a related issue with serializing these custom config classes when running TP with VLLM_WORKER_MULTIPROC_METHOD=spawn and a model with a . in its name such as microsoft/Phi-3.5-MoE-instruct. This results in an error like:

ModuleNotFoundError: No module named 'transformers_modules.microsoft.Phi-3'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
ModuleNotFoundError: No module named 'transformers_modules.microsoft.Phi-3'

(NB: the module name is truncated at the . in Phi-3.5-MoE)

This error arises when multiprocessing pickles the custom config class embedded in ModelConfig to send the arguments for create_worker to the newly spawned process. Even though the module is importable, it fails due to the extra . in the name (transformers has special handling for this). This too can be fixed by registering by-value serialization of these objects by using cloudpickle as a custom "reducer"/serializer in pickle.

FIX #8553

Copy link

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which consists a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of default ones by unblocking the steps in your fast-check build on Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

@youkaichao
Copy link
Member

Great observation! In general we should not expect any new code when we use multi-node serving. HF dynamically downloaded code and module is a pain in this case. I think converting the object to a dict makes sense to me. Does it have any side effect? Or we can just do it in all the cases?

@tjohnson31415
Copy link
Contributor Author

tjohnson31415 commented Jul 25, 2024

converting the object to a dict makes sense to me. Does it have any side effect? Or we can just do it in all the cases?

Yeah, this is something that we'll need to look at further. If the custom config class only adds new attributes and default values (eg. no new methods used by the modeling code), then this this conversion to a PretrainedConfig while preserving extra attributes should be fine. With the assumption that downstream code mostly treats the config objects as a plain-old-data class with read-only attributes. I think that is the standard case, but I haven't done much investigation into edge cases.

@justinthelaw
Copy link

Will this be merged in the next release? It'd be great to have engine Ray support for Phi3 and others with remote code.

@tjohnson31415 tjohnson31415 force-pushed the fix-distributed-trust-remote-code branch from e25a0f0 to 602c0d3 Compare August 1, 2024 17:52
@tjohnson31415
Copy link
Contributor Author

tjohnson31415 commented Aug 1, 2024

In #6607 (comment), the No module named 'transformers_modules' error can also occur when using --engine-use-ray with --trust-remote-code. In that case, the engine worker does not have its python path updated to load the dynamic modules. The idea in this PR would resolve that issue as well if the conversion to PretrainedConfig happens earlier. I've updated the PR to include the fix for this.

I'm also still working out how to test that the conversion to PretrainedConfig preserves all relevant attributes.

@tjohnson31415 tjohnson31415 changed the title [Bugfix]: use PretrainedConfig to communicate config objects with trust remote code [Bugfix]: serialize config instances by value when using --trust-remote-code Aug 1, 2024
@tjohnson31415
Copy link
Contributor Author

tjohnson31415 commented Aug 1, 2024

In my testing, I found that most attributes of the custom config could be attached to the PretrainedConfig, but that some configurations are expected to be class attributes and those would not be preserved (eg. attribute_map, keys_to_ignore_at_inference).

I did some more investigation into the serialization and found a much better solution: the cloudpickle serialization library used by Ray supports serializing instances of dynamic classes by-value:
https://github.com/cloudpipe/cloudpickle?tab=readme-ov-file#overriding-pickles-serialization-mechanism-for-importable-constructs

This means that the transformers_modules class no longer needs to be importable on remote workers. Using this feature simply requires registering the dynamic modules for by-value serialization instead of the default by-reference serialization. I'll note that cloudpickle considers the feature experimental, but it seems much better than the other options I've tried.

@tjohnson31415 tjohnson31415 marked this pull request as ready for review August 1, 2024 23:13
@tjohnson31415
Copy link
Contributor Author

@youkaichao @rkooo567 This PR is now ready for review. Please take a look :)

@youkaichao
Copy link
Member

do you happen to know how does the serialization by value work?

@tjohnson31415
Copy link
Contributor Author

do you happen to know how does the serialization by value work?

Not in any detail, but I assume that it somehow serializes the class definition along with the instance data in what it communicates between the workers.

@TheAhmadOsman
Copy link

@njhill @tjohnson31415 can we try to get this merged in?

@youkaichao youkaichao mentioned this pull request Oct 6, 2024
@youkaichao
Copy link
Member

Previously, I was hesitant to merge this pr, as I was not confident about if removing init_cached_hf_modules() would have unexpected consequence.

To proceed, @tjohnson31415 can you restore the change of init_cached_hf_modules() , and only pass the config via cloudpickle ?

@mphilippnv
Copy link

mphilippnv commented Oct 6, 2024

Just a heads up, I am trying to incorporate this into my pipeline parallel setup for deepseek v2. I modified my docker file to build vllm from this branch. I got an error and am falling back to an older flash attention just to test. I'm building from the vllm-openai:v0.6.2 image.

> [16/20] RUN git fetch origin pull/6751/head:pr-6751 &&     git checkout pr-6751 &&     pip install -e .:

1.034 From https://github.com/vllm-project/vllm

1.034  * [new ref]           refs/pull/6751/head -> pr-6751

1.107 Switched to branch 'pr-6751'

1.275 Obtaining file:///workspace/vllm

1.276   Installing build dependencies: started

70.29   Installing build dependencies: still running...

132.2   Installing build dependencies: still running...

205.7   Installing build dependencies: still running...

205.9   Installing build dependencies: finished with status 'done'

205.9   Checking if build backend supports build_editable: started

206.0   Checking if build backend supports build_editable: finished with status 'done'

206.0   Getting requirements to build editable: started

206.9   Getting requirements to build editable: finished with status 'done'

206.9   Preparing editable metadata (pyproject.toml): started

207.7   Preparing editable metadata (pyproject.toml): finished with status 'done'

207.8 Collecting cmake>=3.21 (from vllm==0.5.3.post1+cu124)

207.8   Using cached cmake-3.30.4-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.4 kB)

217.8 Collecting ninja (from vllm==0.5.3.post1+cu124)

217.8   Using cached ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl.metadata (5.3 kB)

217.8 Requirement already satisfied: psutil in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (6.0.0)

217.8 Requirement already satisfied: sentencepiece in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (0.2.0)

217.8 Requirement already satisfied: numpy<2.0.0 in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (1.26.4)

217.8 Requirement already satisfied: requests in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (2.32.3)

217.8 Requirement already satisfied: tqdm in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (4.66.5)

217.8 Requirement already satisfied: py-cpuinfo in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (9.0.0)

217.8 Requirement already satisfied: transformers>=4.43.2 in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (4.45.0)

217.8 Requirement already satisfied: tokenizers>=0.19.1 in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (0.20.0)

217.8 Requirement already satisfied: fastapi in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (0.110.0)

217.8 Requirement already satisfied: aiohttp in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (3.10.6)

217.8 Requirement already satisfied: openai in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (1.48.0)

217.8 Requirement already satisfied: uvicorn[standard] in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (0.29.0)

217.8 Requirement already satisfied: pydantic>=2.0 in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (2.9.2)

217.8 Requirement already satisfied: pillow in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (10.4.0)

217.8 Requirement already satisfied: prometheus-client>=0.18.0 in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (0.21.0)

217.8 Requirement already satisfied: prometheus-fastapi-instrumentator>=7.0.0 in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (7.0.0)

217.8 Requirement already satisfied: tiktoken>=0.6.0 in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (0.7.0)

217.9 Collecting lm-format-enforcer==0.10.3 (from vllm==0.5.3.post1+cu124)

218.0   Downloading lm_format_enforcer-0.10.3-py3-none-any.whl.metadata (16 kB)

218.0 Requirement already satisfied: outlines<0.1,>=0.0.43 in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (0.0.46)

218.0 Requirement already satisfied: typing-extensions in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (4.12.2)

218.0 Requirement already satisfied: filelock>=3.10.4 in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (3.16.1)

218.0 Requirement already satisfied: pyzmq in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (26.2.0)

218.0 Requirement already satisfied: ray>=2.9 in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (2.37.0)

218.0 Requirement already satisfied: nvidia-ml-py in /usr/local/lib/python3.12/dist-packages (from vllm==0.5.3.post1+cu124) (12.560.30)

218.1 Collecting torch==2.3.1 (from vllm==0.5.3.post1+cu124)

218.1   Using cached torch-2.3.1-cp312-cp312-manylinux1_x86_64.whl.metadata (26 kB)

218.1 Collecting torchvision==0.18.1 (from vllm==0.5.3.post1+cu124)

218.2   Downloading torchvision-0.18.1-cp312-cp312-manylinux1_x86_64.whl.metadata (6.6 kB)

218.2 Collecting xformers==0.0.27 (from vllm==0.5.3.post1+cu124)

218.3   Downloading xformers-0.0.27-cp312-cp312-manylinux2014_x86_64.whl.metadata (1.0 kB)

218.3 INFO: pip is looking at multiple versions of vllm to determine which version is compatible with other requirements. This could take a while.

218.3 ERROR: Could not find a version that satisfies the requirement vllm-flash-attn==2.5.9.post1 (from vllm) (from versions: 2.6.1, 2.6.2)

218.5 ERROR: No matching distribution found for vllm-flash-attn==2.5.9.post1

------

Dockerfile:57

--------------------

  56 |     WORKDIR /workspace/vllm

  57 | >>> RUN git fetch origin pull/6751/head:pr-6751 && \

  58 | >>>     git checkout pr-6751 && \

  59 | >>>     pip install -e .

  60 |     

--------------------

ERROR: failed to solve: process "/bin/sh -c git fetch origin pull/6751/head:pr-6751 &&     git checkout pr-6751 &&     pip install -e ." did not complete successfully: exit code: 1

@youkaichao
Copy link
Member

I got an error and am falling back to an older flash attention just to test

the main branch of vllm builds flash-attention inside.

to test this branch, I suggest you adding changes for vllm/transformers_utils/config.py manually on top of the main branch.

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
@tjohnson31415 tjohnson31415 force-pushed the fix-distributed-trust-remote-code branch from 03e21d9 to d03107a Compare October 7, 2024 15:51
Copy link

@justinthelaw justinthelaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just questions/thoughts out of curiosity as a downstream user/watcher of this branch.

cloudpickle.register_pickle_by_value(transformers_modules)
# Ray vendors its own version of cloudpickle
ray.cloudpickle.register_pickle_by_value(transformers_modules)
# ignore import errors in the case that trust_remote_code is set

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this still be a debug log?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, definitely. Updated to incldue log messages instead of just the comment.
Thanks for the review!

@@ -945,6 +945,8 @@ def flatten_2d_lists(lists: List[List[T]]) -> List[T]:
return [item for sublist in lists for item in sublist]


# TODO: This function can be removed if transformer_modules classes are
# serialized by value when communicating between processes

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be removed, and properly tested with it's absence, as part of this PR's goals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my plan as well but I undid the removal due to this comment from @youkaichao. It doesn't hurt to keep it in for now if it can help the PR get merged. The removal can be a quick follow-on PR where we can test edge cases.

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
@youkaichao
Copy link
Member

@justinthelaw @mphilippnv does this pr solve your problem?

@tjohnson31415 can you add one testcase for this? we can have tests for 2-node case.

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
@tjohnson31415 tjohnson31415 force-pushed the fix-distributed-trust-remote-code branch from 6f9208a to 2b318c7 Compare October 8, 2024 07:15
@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 11, 2024
@tjohnson31415
Copy link
Contributor Author

@mphilippnv Have you had a chance to verify these changes in your environment?

Copy link
Member

@DarkLight1337 DarkLight1337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the tests pass, this PR should be good to go now. What do you think @youkaichao ?

@mphilippnv
Copy link

@mphilippnv Have you had a chance to verify these changes in your environment?

No. I have too much work coming in. Haven't had time to really play with it.

@tjohnson31415 tjohnson31415 force-pushed the fix-distributed-trust-remote-code branch from d9546a2 to ab7b095 Compare October 21, 2024 22:31
@youkaichao youkaichao merged commit b729901 into vllm-project:main Oct 22, 2024
61 checks passed
@tjohnson31415 tjohnson31415 deleted the fix-distributed-trust-remote-code branch October 22, 2024 20:12
charlifu pushed a commit to charlifu/vllm that referenced this pull request Oct 23, 2024
…ject#6751)

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: charlifu <charlifu@amd.com>
vrdn-23 pushed a commit to vrdn-23/vllm that referenced this pull request Oct 23, 2024
…ject#6751)

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Vinay Damodaran <vrdn@hey.com>
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
…ject#6751)

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Alvant <alvasian@yandex.ru>
MErkinSag pushed a commit to MErkinSag/vllm that referenced this pull request Oct 26, 2024
…ject#6751)

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Erkin Sagiroglu <erkin@infra-aipipeline-1-at1-prox-prod-a.ipa.corp.telnyx.com>
garg-amit pushed a commit to garg-amit/vllm that referenced this pull request Oct 28, 2024
…ject#6751)

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Amit Garg <mitgarg17495@gmail.com>
FerdinandZhong pushed a commit to FerdinandZhong/vllm that referenced this pull request Oct 29, 2024
…ject#6751)

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: qishuai <ferdinandzhong@gmail.com>
sumitd2 pushed a commit to sumitd2/vllm that referenced this pull request Nov 14, 2024
…ject#6751)

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Sumit Dubey <sumit.dubey2@ibm.com>
KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request Nov 20, 2024
…ject#6751)

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
mfournioux pushed a commit to mfournioux/vllm that referenced this pull request Nov 20, 2024
…ject#6751)

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
tlrmchlsmth pushed a commit to neuralmagic/vllm that referenced this pull request Nov 23, 2024
…ject#6751)

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready ONLY add when PR is ready to merge/full CI is needed
Projects
None yet
7 participants