JSON formatting issue #1191

How can I get the response from vLLM in proper JSON format? Does vLLM support the outlines, guidance, or jsonformer libraries?

Comments
Indeed, supporting guided generation would be more efficient, but I don't see support for any of the libraries you mentioned on the roadmap. Until then there are some workarounds; one minimal option is sketched below.
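For illustration, one such workaround is a prompt-and-retry loop: ask the model for JSON, parse the output, and resample if it does not parse. A minimal sketch under assumed names (the model and prompt are placeholders, not from this thread):

```python
import json

from vllm import LLM, SamplingParams

# Placeholder model name; use whatever model you are actually serving.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.1")
# Non-zero temperature so a retry can actually produce a different completion.
params = SamplingParams(temperature=0.7, max_tokens=256)

PROMPT = (
    "Extract the name and age from the sentence below and reply with ONLY a "
    'JSON object of the form {"name": <string>, "age": <integer>}.\n\n'
    "Sentence: Alice turned 31 last week."
)

def generate_json(prompt: str, retries: int = 3) -> dict:
    """Sample, try to parse the completion as JSON, and resample on failure."""
    for _ in range(retries):
        text = llm.generate([prompt], params)[0].outputs[0].text
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            continue  # malformed output: try again
    raise ValueError("model did not return valid JSON within the retry budget")

print(generate_json(PROMPT))
```

This gives no hard guarantee, only a bounded number of attempts, which is exactly why guided generation would be more efficient.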
I think the best approach may be to add an API for integrating guidance libraries, and then provide an adapter for each of the popular libraries separately. @WoosukKwon, what do you think?
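To make the proposal concrete, here is a hedged sketch of what such an adapter interface could look like, assuming it would wrap vLLM's logits-processor hook (a callable that receives the generated token ids and the logits for the next token). The class names are invented for illustration and are not part of vLLM:

```python
from typing import List, Protocol

import torch

class GuidedDecodingAdapter(Protocol):
    """Hypothetical per-library adapter interface (not an existing vLLM class).

    Each guidance library (LMFE, Outlines, jsonformer, ...) would ship one
    implementation that tracks its parser state and constrains the next token.
    """

    def __call__(self, token_ids: List[int], logits: torch.Tensor) -> torch.Tensor:
        """Return logits with disallowed tokens masked out or down-weighted."""
        ...

class NoOpAdapter:
    """Trivial adapter that constrains nothing; a real one would consult the
    guidance library to decide which tokens are currently legal."""

    def __call__(self, token_ids: List[int], logits: torch.Tensor) -> torch.Tensor:
        return logits
```

An adapter instance would then be passed through `SamplingParams(logits_processors=[...])`, which is the integration point the rest of this thread revolves around.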
Related to #535
Mostly the same as #288
LM Format Enforcer is a library that achieves this and supports vLLM. (Disclosure: I am the author of the library.)
@noamgat Does LM Format Enforcer support API access to vLLM?
With LM Format Enforcer it's the other way around: it integrates into the inference engine's pipeline. So if you have existing vLLM code, it's easy to plug LMFE into it.
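For readers landing here, the offline integration looks roughly like the following. The helper names are from LMFE's vLLM integration module as I recall them, so verify against the example notebook linked below; the model name and schema are placeholders:

```python
from vllm import LLM, SamplingParams
from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.vllm import (
    build_vllm_logits_processor,
    build_vllm_token_enforcer_tokenizer_data,
)

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.1")  # placeholder model

# Toy schema: the output must be an object with a string "name" field.
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
}

# Build a logits processor that only allows tokens consistent with the schema.
tokenizer_data = build_vllm_token_enforcer_tokenizer_data(llm)
logits_processor = build_vllm_logits_processor(tokenizer_data, JsonSchemaParser(schema))

params = SamplingParams(max_tokens=128, logits_processors=[logits_processor])
result = llm.generate(["Return a JSON object with the user's name. User: Alice."], params)
print(result[0].outputs[0].text)
```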
Since it's not an integration with the vLLM server, though, I can't just point an existing LLM deployment at the format enforcer; I need the model loaded locally, in the same place as the format-enforcer code.
Yes, this is a known issue. If vLLM were interested in adopting a similar solution, LMFE would also work with server / multi-GPU deployments.
That would be nice. Support should be added for LMQL as well, I think.
Isn't that what's happening in #535, @noamgat @viktor-ferenczi? Any chance we can get this in?
A similar PR to #535 (disclosure: mine) was already merged. This is how LMFE works with vLLM; see the example notebook. The challenge (which #535 does not cover) is how to pass this information to the custom code in a multiprocess / networked environment. A serialization / "simple object" parametrization solution is required, which is what was proposed in the PR to huggingface-inference-server.
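To illustrate the "simple object" parametrization idea: a logits processor is a Python callable and cannot cross a process or network boundary, but a JSON schema is a plain dict and can. A client request might then look something like this; the endpoint path and field names are hypothetical, not an existing vLLM API:

```python
import requests

# Hypothetical endpoint and field names, purely to illustrate the idea:
# only JSON-serializable data crosses the process/network boundary.
payload = {
    "prompt": "Return the user's name as a JSON object. User: Alice.",
    "max_tokens": 128,
    "json_schema": {  # a plain dict, trivially serializable
        "type": "object",
        "properties": {"name": {"type": "string"}},
        "required": ["name"],
    },
}
response = requests.post("http://localhost:8000/generate_guided", json=payload)
print(response.json())
```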
@noamgat So maybe I'm not entirely clear on what's going on internally; I haven't done my full due diligence and deep-dived the code, but I saw your PR that got in. I thought we just needed another one that uses what you put in, inside the API / OpenAI-compatible layer. My understanding was that the final comment on #535 was a request for him to rebase and use your code in the API layer. I was thinking about making a dumb solution for myself where I install LMFE into vLLM and then create a new API endpoint that is essentially a parameterized version of the notebook you shared. I'd put it inside the OpenAI-compatible server and make it look like function calling. Not sure how efficient it would be, but it could get the job done quickly and unblock me. This would probably just be a personal fork until the better solution gets in. I don't feel qualified to implement the correct pattern, but I'm happy to give it a shot if you'll hold my hand a bit through the process.
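A rough sketch of what that personal-fork endpoint could look like, pairing with the hypothetical client payload above: it accepts a schema, rebuilds the LMFE logits processor server-side, and generates. None of this is vLLM's actual OpenAI-compatible server code; the route and names are made up for illustration, and the LMFE helpers should be checked against its docs:

```python
from fastapi import FastAPI
from pydantic import BaseModel
from vllm import LLM, SamplingParams
from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.vllm import (
    build_vllm_logits_processor,
    build_vllm_token_enforcer_tokenizer_data,
)

app = FastAPI()
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.1")  # placeholder model
tokenizer_data = build_vllm_token_enforcer_tokenizer_data(llm)

class GuidedRequest(BaseModel):
    prompt: str
    json_schema: dict
    max_tokens: int = 128

@app.post("/generate_guided")  # hypothetical route, matching the client sketch above
def generate_guided(req: GuidedRequest) -> dict:
    # Rebuild the logits processor from the serializable schema on every request.
    processor = build_vllm_logits_processor(tokenizer_data, JsonSchemaParser(req.json_schema))
    params = SamplingParams(max_tokens=req.max_tokens, logits_processors=[processor])
    # The synchronous LLM class blocks the event loop; vLLM's real
    # OpenAI-compatible server is built on AsyncLLMEngine instead.
    text = llm.generate([req.prompt], params)[0].outputs[0].text
    return {"text": text}
```

A real version would use the async engine and would also validate the returned text against the schema before responding.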
Outlines now provides a "fork" of vLLM's FastAPI deployment example, if that helps: https://outlines-dev.github.io/outlines/reference/vllm/
That is super cool! Congrats on the release!
Support for guided decoding using