Add logits processor #176
Hey @rlouf, thanks for reaching out and offering to help with the PR! Outlines is an awesome project, and one that's definitely been on my radar, so excited for this. Can you help me understand the interaction between Outlines and the LLM server (like LoRAX)? One way I could imagine this working is:
Is that the right way to think about this v1 integration, or is there a different way to approach this? We can certainly add a generic logit processor interface, but I wanted to first make sure I understand how it will be used so we can get the interface right.
Looks like we could implement something similar to: https://github.com/outlines-dev/outlines/blob/main/outlines/serve/vllm.py

I could see two routes:
+1, I am extremely interested in this. For reference, vLLM applies logit processing here: https://github.com/vllm-project/vllm/blob/2a18da257ccd0d5beafcebe93246e4e220c88a12/vllm/model_executor/layers/sampler.py#L155

As for 1 vs 2... why not both? Make Outlines work out of the box, with zero extra configuration, as a "default" logit processing function, or structure it as an optional dependency for those who don't want the processing. Then allow users to supply their own logit processing function as well, perhaps hidden behind a launch flag with a warning and a very minimal DIY interface.

Also, there is a performance concern here. I'm not familiar with LoRAX's code at all, but heavy grammar processing can become a single-core performance bottleneck in some implementations.
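To make the "very minimal DIY interface" idea concrete, here is a sketch of what such an interface could look like. The `LogitsProcessor` protocol and the `BlockTokens` example are purely illustrative, not from LoRAX or Outlines; the callable signature follows the common convention (token history in, adjusted logits out) used by vLLM-style processors:

```python
from typing import Protocol, Sequence


class LogitsProcessor(Protocol):
    """Hook called once per decoding step, before sampling."""

    def __call__(self, input_ids: Sequence[int], logits: list) -> list: ...


class BlockTokens:
    """Toy processor: mask a fixed set of token ids to -inf so they can't be sampled."""

    def __init__(self, banned):
        self.banned = set(banned)

    def __call__(self, input_ids, logits):
        return [float("-inf") if i in self.banned else s
                for i, s in enumerate(logits)]


# Mask token ids 1 and 3; the other scores pass through unchanged.
proc = BlockTokens({1, 3})
masked = proc([0], [0.1, 0.2, 0.3, 0.4])
```

A grammar-guided processor like Outlines' would do the same thing, but compute the banned set from an FSM state instead of a fixed list, which is also where the single-core bottleneck concern comes in.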
This looks like a reasonable approach to me

@tgaddair not to rush or push. Just wanted to check in if this is on the roadmap :)
Hey @AdithyanI, definitely still on the roadmap! We should have capacity to take this on our side in about 2 weeks if that works, but we're also open to contributions if someone wants to take this on sooner. From the discussion, it sounds like the main items needed are:
Draft PR is up in #224. @jeffreyftang will be picking up the work from here to test it out and resolve any issues (of which there are likely to be many, haha). But my hope is we can land this some time this week.
@rlouf @AdithyanI @brucethemoose this has landed, please feel free to try it out. We're going to follow up with some official docs, but the general usage is:

REST:

Python:
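The REST and Python snippets were not captured in this thread. As a rough sketch of what a guided-generation request body could look like, the example below builds a JSON payload in Python; the `response_format` field name, its shape, and the `/generate` endpoint are assumptions for illustration, not verbatim from LoRAX:

```python
import json

# Hypothetical request body for schema-guided generation over REST.
# Field names ("response_format", "json_object") are assumptions.
payload = {
    "inputs": "Generate a person record:",
    "parameters": {
        "response_format": {
            "type": "json_object",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                },
                "required": ["name", "age"],
            },
        }
    },
}

# Serialized body you would POST to the server's generate endpoint.
body = json.dumps(payload)
```

See the official docs linked below for the authoritative request format.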
The prebuilt Docker image now comes with Outlines preinstalled as well.

This is awesome!
Documentation up here: https://predibase.github.io/lorax/guides/guided_generation/ Next up will be to support the OpenAI API and Pydantic schemas in the Python client.
@rlouf I was taking a stab at enabling CFG generation with Outlines. I also tried running the CFG example from the Outlines readme directly, and things just seemed to hang. Looks like there are quite a few open issues related to grammar in the Outlines repo; just curious if you have some insight as to what might be going wrong here. Thanks!
Lark grammar is somewhat "dangerous": certain configs can make Outlines (and other implementations) hang. The same goes for llama.cpp's grammar. It also has to "agree" with the prompt and model so that there are valid logits to choose from. @jeffreyftang do you have an example of the exact grammar file/string you used? And yeah, Outlines may be having some CFG issues on top of that, but I'm not up to speed.
@brucethemoose the one where it hung was actually the arithmetic example from the Outlines readme.
Feature request
Add the possibility for users to specify a function that processes the logits before sampling. This function would be called right before `next_token_chooser`.

Motivation
I am a maintainer of Outlines, a library for guided generation: regex, JSON, grammar, etc. A user recently asked if we could integrate with LoRAX, and I think it would be beneficial for both libraries.
We can then discuss a potential deeper integration (allowing function calling via JSON-guided generation), but this feature request is minimally intrusive and low-maintenance.
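To show where such a hook would sit, here is a toy decode step. Only the `next_token_chooser` name comes from the request above; the greedy chooser and the surrounding names are illustrative stand-ins, not LoRAX code:

```python
# Sketch of a decode step with an optional logits-processing hook that
# runs right before token selection.

def next_token_chooser(logits):
    # Stand-in greedy chooser: pick the highest-scoring token id.
    return max(range(len(logits)), key=lambda i: logits[i])


def decode_step(input_ids, logits, logits_processor=None):
    if logits_processor is not None:
        # e.g. a guided-generation library could mask disallowed tokens here
        logits = logits_processor(input_ids, logits)
    return next_token_chooser(logits)


# Without a processor the argmax token wins; a masking processor changes the pick.
ban_token_one = lambda ids, lg: [float("-inf") if i == 1 else s
                                 for i, s in enumerate(lg)]
plain = decode_step([0], [0.1, 0.9, 0.4])
guided = decode_step([0], [0.1, 0.9, 0.4], logits_processor=ban_token_one)
```

This is why the change is minimally intrusive: the hook is a single optional call site in the sampling path.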
Your contribution
I could help with submitting a PR.