
[Core] Update to outlines >= 0.1.8 #10576

Merged
1 commit merged into vllm-project:main from outlines-0.1.4 on Dec 10, 2024

Conversation

@russellb (Collaborator) commented Nov 22, 2024

This PR updates to the latest release of outlines that works with vllm.

It is a draft while we wait for 0.1.8 to be on PyPI.

FIX #3794
FIX #10489

commit 279ccc9
Author: Russell Bryant <rbryant@redhat.com>
Date: Thu Nov 21 21:25:22 2024 +0000

[Core] Update to outlines >= 0.1.8

outlines 0.1.x releases prior to 0.1.8 + outlines-core 0.1.18 had
serialization issues that broke the vllm integration.

Also change our code slightly to account for an API change in
outlines.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
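
For context, the call-site change looks roughly like the sketch below. The `outlines_core.fsm.guide` import path and the `RegexGuide.from_regex` classmethod reflect the outlines-core 0.1.x API as understood here; the `tokenizer` argument is a placeholder for vllm's adapted tokenizer and is not spelled out.

```python
# Sketch of the API change referenced in the commit message. The import path
# and the from_regex classmethod are assumed from the outlines-core 0.1.x API;
# `tokenizer` is a placeholder for vllm's adapted tokenizer object.
from outlines_core.fsm.guide import RegexGuide

def build_guide(regex_string: str, tokenizer) -> RegexGuide:
    # outlines 0.0.x built the guide with a plain constructor:
    #   guide = RegexGuide(regex_string, tokenizer)
    # outlines >= 0.1.x moves construction to a classmethod:
    return RegexGuide.from_regex(regex_string, tokenizer)
```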


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge

🚀

@russellb (Collaborator, Author) commented:

0.1.5 has been released, so this is ready to test

@russellb marked this pull request as ready for review November 22, 2024 16:27
@russellb marked this pull request as draft November 22, 2024 16:59
@russellb (Collaborator, Author) commented:

Converted back to a draft because I don't have 0.1.5 working locally yet...

@joennlae (Contributor) commented Nov 30, 2024

I opened this PR against outlines:

dottxt-ai/outlines-core#99

The caching issue (or rather, not being able to pickle) also affects the multiprocessing-based engine in vllm, since we need to be able to pickle the RegexGuide. The above PR introduces the pickling capability.

References:

`lp_bytes = cloudpickle.dumps(logits_processors)`
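
To make the constraint concrete, here is a minimal, self-contained sketch of the failure mode: the multiprocessing engine cloudpickles its logits processors before shipping them to worker processes, so anything they hold, such as a RegexGuide, must survive a pickle round-trip. `FakeGuide` and `RegexLogitsProcessor` below are hypothetical stand-ins, not vllm's actual classes.

```python
# Why picklability matters: hypothetical stand-ins for vllm's setup, where
# logits processors (and the guides they hold) cross a process boundary.
import cloudpickle

class FakeGuide:
    """Stand-in for outlines' RegexGuide; plain attributes pickle fine."""
    def __init__(self, regex: str):
        self.regex = regex

class RegexLogitsProcessor:
    """Stand-in for vllm's guided-decoding logits processor."""
    def __init__(self, guide: FakeGuide):
        self.guide = guide

logits_processors = [RegexLogitsProcessor(FakeGuide(r"\d+"))]

# The step referenced above; it raises if the guide is not picklable.
lp_bytes = cloudpickle.dumps(logits_processors)
restored = cloudpickle.loads(lp_bytes)
assert restored[0].guide.regex == r"\d+"
```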

@russellb force-pushed the outlines-0.1.4 branch 2 times, most recently from 1b6e5f6 to 22ea8e8, on December 2, 2024 15:15
@markmc (Contributor) commented Dec 2, 2024

> I opened this PR against outlines:
>
> dottxt-ai/outlines-core#99
>
> The caching issue (or rather, not being able to pickle) also affects the multiprocessing-based engine in vllm, since we need to be able to pickle the RegexGuide. The above PR introduces the pickling capability.

Thank you @joennlae - this works for me

Here are some performance measurements with (a) older outlines, (b) newer outlines with your patch but no caching, (c) newer outlines with your patch and caching reinstated:

(a) commit f9310cbd0c1109c4f22cf9f1dc615b2d08f06408 from main
    with pull/10557/head:xuechendi-benchmark-structured-output commit 41966603
    outlines==0.0.46

    $ python benchmarks/benchmark_guided.py --async-engine --model meta-llama/Llama-3.2-1B-Instruct --output-len 2048 --num-prompts 10 --save-results
    Throughput: 0.14 requests/s, 361.06 total tokens/s, 281.59 output tokens/s Correct rate is 100.0 %
    Throughput: 0.16 requests/s, 414.12 total tokens/s, 322.97 output tokens/s Correct rate is 100.0 %
    Throughput: 0.16 requests/s, 414.91 total tokens/s, 323.59 output tokens/s Correct rate is 100.0 %

(b) commit b8c2895a2cbe249e86713615c9ed3ab132812b08 from russellb/outlines-0.1.4
    with pull/10557/head:xuechendi-benchmark-structured-output commit 41966603
    outlines==0.1.7 (from PyPI)
    outlines_core 412ef296392a0814a5490ccc15080e79f98cd411 from 44ai-labs/serialization "release build" (pip install .)

    $ python benchmarks/benchmark_guided.py --async-engine --model meta-llama/Llama-3.2-1B-Instruct --output-len 2048 --num-prompts 10 --save-results
    Throughput: 0.06 requests/s, 167.18 total tokens/s, 130.38 output tokens/s Correct rate is 100.0 %
    Throughput: 0.06 requests/s, 167.48 total tokens/s, 130.61 output tokens/s Correct rate is 100.0 %
    Throughput: 0.07 requests/s, 171.11 total tokens/s, 133.45 output tokens/s Correct rate is 100.0 %

(c) same as above, with "Stop using outlines.caching.cache" reverted
    Throughput: 0.16 requests/s, 410.16 total tokens/s, 319.88 output tokens/s Correct rate is 100.0 %
    Throughput: 0.15 requests/s, 406.16 total tokens/s, 316.76 output tokens/s Correct rate is 100.0 %
    Throughput: 0.16 requests/s, 408.68 total tokens/s, 318.73 output tokens/s Correct rate is 100.0 %
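
The gap between (b) and the other two runs is the cost of recompiling guides from scratch: turning a regex or JSON schema into a token-level guide is expensive, and caching the result by its inputs recovers the lost throughput when requests reuse the same schema. A minimal sketch of that memoization pattern, with an in-process dict standing in for outlines' disk-backed cache and a hypothetical `_compile_guide` as the expensive step:

```python
# Memoization pattern behind the (b) vs (c) difference. _compile_guide is a
# hypothetical stand-in for the costly regex -> token-FSM compilation, and a
# plain dict stands in for outlines' disk-backed cache.
import time

_guide_cache: dict[tuple[str, str], dict] = {}

def _compile_guide(regex_string: str, tokenizer_name: str) -> dict:
    time.sleep(0.5)  # pretend this is the expensive compilation step
    return {"regex": regex_string, "tokenizer": tokenizer_name}

def cached_build_guide(regex_string: str, tokenizer_name: str) -> dict:
    key = (regex_string, tokenizer_name)
    if key not in _guide_cache:
        _guide_cache[key] = _compile_guide(regex_string, tokenizer_name)
    return _guide_cache[key]

# First call pays the compilation cost; repeats of the same schema are free.
cached_build_guide(r"\d+", "meta-llama/Llama-3.2-1B-Instruct")
cached_build_guide(r"\d+", "meta-llama/Llama-3.2-1B-Instruct")
```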

torymur pushed a commit to dottxt-ai/outlines-core that referenced this pull request Dec 2, 2024
I understand that pickling support is not your priority right now, but the
`RegexGuide` needs to be pickled for `vllm` production use, which is
multiprocessing-based.

This PR reintroduces the pickling capability, plus some tests.

I understand that this adds more effort on your side.

References:
dottxt-ai/outlines#1274
vllm-project/vllm#10490
vllm-project/vllm#10576
vllm-project/vllm#10489

It would also tackle the current caching issues: 
huggingface/text-generation-inference#2766
dottxt-ai/outlines#1283

Closes:
#95
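
The actual change lives in outlines-core's Rust layer, but the shape of the fix can be sketched in Python: implement the pickle protocol so the object rebuilds itself from cheap-to-serialize data instead of trying to pickle native state directly. `ToyGuide` below is hypothetical, and rebuilding via recompilation is just the simplest variant of the pattern.

```python
# Python sketch of the pickling pattern reintroduced for RegexGuide.
# ToyGuide is hypothetical; __reduce__ tells pickle how to reconstruct an
# object whose internal state (here a fake native "index") should not be
# pickled directly.
import pickle

class ToyGuide:
    def __init__(self, regex: str):
        self.regex = regex
        self._index = self._compile(regex)  # imagine an unpicklable native object

    @staticmethod
    def _compile(regex: str) -> dict:
        return {"regex": regex, "initial_state": 0}  # placeholder "index"

    def __reduce__(self):
        # Rebuild from the constructor argument; the index is recomputed on load.
        return (ToyGuide, (self.regex,))

guide = ToyGuide(r"[0-9]+")
clone = pickle.loads(pickle.dumps(guide))
assert clone.regex == guide.regex and clone._index == guide._index
```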

mergify bot commented Dec 3, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @russellb.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify bot added the needs-rebase label Dec 3, 2024
@russellb changed the title from "[Core] Update to outlines > 0.1.4" to "[Core] Update to outlines >= 0.1.8" Dec 3, 2024
@russellb (Collaborator, Author) commented Dec 3, 2024

I think this will be ready once outlines 0.1.8 is available on PyPI.

@mergify bot removed the needs-rebase label Dec 3, 2024
@russellb marked this pull request as ready for review December 6, 2024 16:13
@mgoin added the ready label (ONLY add when PR is ready to merge/full CI is needed) Dec 6, 2024
@rlouf commented Dec 6, 2024

It is!

@mgoin (Member) left a comment


LGTM pending green CI!

@youkaichao merged commit e739194 into vllm-project:main on Dec 10, 2024
72 of 74 checks passed
Akshat-Tripathi pushed a commit to krai/vllm that referenced this pull request Dec 12, 2024
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
Signed-off-by: Russell Bryant <rbryant@redhat.com>
ZenPuzzle pushed a commit to ZenPuzzle/vllm that referenced this pull request Dec 24, 2024
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Labels
ci/build · ready (ONLY add when PR is ready to merge/full CI is needed)

Development
Successfully merging this pull request may close these issues:

  • [Feature]: Support outlines versions > v0.1.0
  • [Feature]: Make outlines dependency optional