
Add logits processor #176

Closed · rlouf opened this issue Jan 11, 2024 · 14 comments · Fixed by #224
Labels: enhancement (New feature or request)

rlouf commented Jan 11, 2024

Feature request

Add the possibility for users to specify a function that processes the logits before sampling. This function would be called right before next_token_chooser.

Motivation

I am a maintainer of Outlines, a library for guided generation (regex, JSON, grammars, etc.). A user recently asked if we could integrate with LoRAX, and I think it would be beneficial for both libraries.

We can then discuss a potential deeper integration (e.g., allowing function calling via JSON-guided generation), but this feature request is minimally intrusive and low maintenance.

Your contribution

I could help with submitting a PR.

tgaddair added the enhancement label Jan 11, 2024
tgaddair (Contributor) commented:

Hey @rlouf, thanks for reaching out and offering to help with the PR! Outlines is an awesome project, and one that's definitely been on my radar, so excited for this.

Can you help me understand the interaction between Outlines and the LLM server (like LoRAX)? One way I could imagine this working is:

  1. The user provides a JSON schema to LoRAX as a request parameter (similar to the vLLM example here):

     curl http://127.0.0.1:8000 \
         -d '{
             "inputs": "What is the capital of France?",
             "parameters": {
                 "schema": {"type": "string"}
             }
         }'

  2. In the backend, LoRAX executes some custom Outlines code that warps the logits just prior to the next_token_chooser call here.

Is that the right way to think about this v1 integration, or is there a different way to approach this?

We can certainly add a generic logit processor interface, but I wanted to first make sure I understand how it will be used so we can get the interface right.
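
For concreteness, here is a minimal sketch of what such an interface might look like (all names here are hypothetical, not LoRAX's actual API; the real design would be settled in the PR):

from typing import Protocol

import torch


class LogitsProcessor(Protocol):
    """Hypothetical interface: called once per decoding step, right before
    next_token_chooser, to transform the raw logits."""

    def __call__(self, input_ids: torch.Tensor, logits: torch.Tensor) -> torch.Tensor:
        ...


class TemperatureProcessor:
    """Toy implementation of the interface: temperature scaling."""

    def __init__(self, temperature: float = 1.0):
        self.temperature = temperature

    def __call__(self, input_ids: torch.Tensor, logits: torch.Tensor) -> torch.Tensor:
        return logits / self.temperature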

tgaddair mentioned this issue Jan 12, 2024
tgaddair (Contributor) commented:

Looks like we could implement something similar to: https://github.com/outlines-dev/outlines/blob/main/outlines/serve/vllm.py

I could see two routes:

  1. We hardcode this implementation into LoRAX, so there is native support for Outlines (just requires including outlines as a dep, I imagine).
  2. We provide a generic interface to process logits and allow the user to provide a path to a file, loaded on initialization, that contains the implementation. This would work, but could get tricky with dependencies and create security vulnerabilities through remote code execution.
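
For reference, the gist of that vllm.py integration, paraphrased as a sketch (the fsm object here is an assumption standing in for Outlines' FSM; the actual class and method names vary between Outlines versions):

import math

import torch


class GuidedLogitsProcessor:
    """Sketch of the masking approach from Outlines' vLLM integration: track
    an FSM state per sequence and add -inf to every token the FSM does not
    allow next, so sampling can only pick valid continuations."""

    def __init__(self, fsm):
        self.fsm = fsm  # assumed to expose next_state() and allowed_token_ids()
        self.state = 0  # FSM start state

    def __call__(self, input_ids: list, logits: torch.Tensor) -> torch.Tensor:
        if input_ids:  # advance the FSM with the last sampled token
            self.state = self.fsm.next_state(self.state, input_ids[-1])
        allowed = self.fsm.allowed_token_ids(self.state)
        mask = torch.full_like(logits, -math.inf)  # logits: (vocab_size,)
        mask[allowed] = 0.0
        return logits + mask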


brucethemoose commented Jan 15, 2024

+1, I am extremely interested in this.

For reference, vllm applies logit processing here: https://github.com/vllm-project/vllm/blob/2a18da257ccd0d5beafcebe93246e4e220c88a12/vllm/model_executor/layers/sampler.py#L155

As for 1 vs. 2... why not both? How about making Outlines work out of the box, with zero extra configuration, as a "default" logit processing function? Or structure it as an optional dependency for those who don't want the processing, if that's desirable.

Then allow users to supply their own logit processing function as well, perhaps hidden behind a launch flag with a warning and a very minimal DIY interface.

Also, there is a performance concern here. I'm not familiar with LoRAX's code at all, but heavy grammar processing can become a single-core performance bottleneck in some implementations.

rlouf (Author) commented Jan 18, 2024

> We can certainly add a generic logit processor interface, but I wanted to first make sure I understand how it will be used so we can get the interface right.

This looks like a reasonable approach to me.

AdithyanI commented:

@tgaddair not to rush or push, just wanted to check whether this is on the roadmap :)
Asking so I can align my internal team's roadmap accordingly.
Also happy to pitch in with any help if required.

tgaddair (Contributor) commented:

Hey @AdithyanI, definitely still on the roadmap! We should have capacity on our side to take this on in about two weeks, but we're also open to contributions if someone wants to take it on sooner.

From the discussion, it sounds like the main items needed are:

  • A custom logit processor interface, plus a list of logit processors as an attribute of the Model class
  • The ability to load logit processors from a file during LoRAX initialization (on startup); see the sketch after this list
  • The ability to specify generation constraints in the request that trigger the logit processors (for example, a --schema param that triggers JSON enforcement through an Outlines logit processor)
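
To illustrate the second item, a minimal sketch of loading a processor from a user-supplied file at startup (load_logits_processor is a hypothetical helper, and the remote-code-execution caveat raised earlier applies):

import importlib.util
from pathlib import Path


def load_logits_processor(path: str, class_name: str = "LogitsProcessor"):
    """Hypothetical loader: import a Python file at startup and return the
    processor class it defines. This executes arbitrary code, so it should
    only ever be used with trusted files."""
    spec = importlib.util.spec_from_file_location(Path(path).stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, class_name)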

tgaddair (Contributor) commented Feb 5, 2024

Draft PR is up in #224. @jeffreyftang will be picking up the work from here to test it out and resolve any issues (of which there are likely to be many, haha). But my hope is we can land this some time this week.

tgaddair (Contributor) commented:

@rlouf @AdithyanI @brucethemoose this has landed, please feel free to try it out. We're going to follow up with some official docs, but the general usage is:

REST:

schema: "<string containing valid JSON schema>"

Python:

schema=<dict containing json schema>
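
For example (hedged: the host, port, and /generate path here are assumptions extrapolated from the earlier curl example; only the schema parameter itself is confirmed above):

import json

import requests
from lorax import Client

schema = {"type": "object", "properties": {"answer": {"type": "string"}}}

# REST: schema is a string containing a valid JSON schema.
resp = requests.post(
    "http://127.0.0.1:8080/generate",  # hypothetical host/port
    json={
        "inputs": "What is the capital of France?",
        "parameters": {"schema": json.dumps(schema)},
    },
)
print(resp.json())

# Python client: schema is a dict containing the JSON schema.
client = Client("http://127.0.0.1:8080")
print(client.generate("What is the capital of France?", schema=schema).generated_text)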

tgaddair (Contributor) commented:

The prebuilt Docker image now comes with Outlines preinstalled as well.

rlouf (Author) commented Feb 13, 2024

This is awesome!

tgaddair (Contributor) commented:

Documentation up here: https://predibase.github.io/lorax/guides/guided_generation/

cc @AdithyanI @brucethemoose

Next up will be support for the OpenAI API and Pydantic schemas in the Python client.

cc @jeffreyftang
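
In the meantime, Pydantic models can already emit JSON schemas, so a stopgap might look like this (a sketch assuming Pydantic v2's model_json_schema() and the schema= parameter shown above):

from lorax import Client
from pydantic import BaseModel


class Answer(BaseModel):
    capital: str


client = Client("http://127.0.0.1:8080")  # hypothetical endpoint

# model_json_schema() returns the JSON schema as a dict, which matches
# the schema=<dict> parameter described above.
response = client.generate(
    "What is the capital of France?",
    schema=Answer.model_json_schema(),
)
print(response.generated_text)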

jeffreyftang (Contributor) commented:

@rlouf I was taking a stab at enabling CFG generation with Outlines (using grammars.json), but consistently got garbage output (usually something like 1. followed by a bunch of newlines or similar).

I also tried running the CFG example from the Outlines readme directly, and it just seemed to hang.

Looks like there are quite a few open grammar-related issues in the Outlines repo; just curious whether you have any insight into what might be going wrong here. Thanks!


brucethemoose commented Feb 16, 2024

Lark grammars are somewhat "dangerous": certain configs can make Outlines (and other implementations) hang. The same goes for llama.cpp's grammars.

The grammar also kind of has to "agree" with the prompt and model so that there are valid logits to choose from.

@jeffreyftang Do you have an example of the exact grammar file/string you used?

And yeah, Outlines may be having some CFG issues on top of that, but I'm not up to speed.

jeffreyftang (Contributor) commented:

@brucethemoose the one where it hung was actually the arithmetic example from the Outlines readme.
