[V1] Enable custom ops with piecewise CUDA graphs #10228

Merged

WoosukKwon merged 1 commit from v1-use-custom-op into main on Nov 11, 2024

Conversation

WoosukKwon (Collaborator)

A correct version of #10227

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@@ -405,6 +406,7 @@ def load_model(self) -> None:
         if self.use_cuda_graph:
             # FIXME(woosuk): Currently, we do not use inductor to reduce the
             # compilation time and avoid any potential issues with inductor.
+            os.environ["VLLM_CUSTOM_OPS"] = "all"
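For context, a minimal, hypothetical sketch (not vLLM's actual CustomOp implementation) of how an environment flag like VLLM_CUSTOM_OPS can gate dispatch between a custom kernel and a torch-native fallback; the class and method names below are illustrative only:

import os

class SketchCustomOp:
    # Illustrative stand-in for a custom-op wrapper; names are hypothetical.
    def forward_native(self, x):
        # Placeholder for the torch-native reference path.
        return x

    def forward_cuda(self, x):
        # Custom (e.g. hand-written CUDA) path; here it simply reuses the
        # native placeholder so the sketch stays runnable.
        return self.forward_native(x)

    def __call__(self, x):
        if os.environ.get("VLLM_CUSTOM_OPS", "") == "all":
            # "all" is what the diff above sets: with piecewise CUDA graphs
            # and no inductor, every op takes its custom implementation.
            return self.forward_cuda(x)
        return self.forward_native(x)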
youkaichao (Member) commented on the added line:

@ProExpertProg we should improve the default option, based on the value of use_inductor.
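A rough sketch of what that improved default could look like (the helper name and wiring are hypothetical, not a committed design): force all custom ops only when inductor is not compiling the graph, and otherwise leave them off so inductor can handle op fusion itself.

import os

def set_custom_ops_default(use_inductor: bool) -> None:
    # Hypothetical helper: respect an explicit user setting, otherwise derive
    # the default from use_inductor.
    if "VLLM_CUSTOM_OPS" not in os.environ:
        os.environ["VLLM_CUSTOM_OPS"] = "none" if use_inductor else "all"

set_custom_ops_default(use_inductor=False)  # piecewise CUDA graphs without inductor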

youkaichao (Member) left a comment:

thanks for figuring it out!

WoosukKwon merged commit 9d5b4e4 into main on Nov 11, 2024
13 of 16 checks passed
WoosukKwon deleted the v1-use-custom-op branch on November 11, 2024 at 19:58
WoosukKwon (Collaborator, Author)

Actually this PR broke CUDA graphs :( Needs to be fixed.

rickyyx pushed a commit to rickyyx/vllm that referenced this pull request Nov 13, 2024
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
sumitd2 pushed a commit to sumitd2/vllm that referenced this pull request Nov 14, 2024
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Sumit Dubey <sumit.dubey2@ibm.com>
KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request Nov 20, 2024
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
mfournioux pushed a commit to mfournioux/vllm that referenced this pull request Nov 20, 2024
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
tlrmchlsmth pushed a commit to neuralmagic/vllm that referenced this pull request Nov 23, 2024
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>