Transformers separate model server? #48
Comments
In the works.
If I understand the OP, this is something I am looking for as well. I want to host an ONNX model with Triton and have that interface with Guidance. @marcotcr, will what you have in the works support this?
I think this will be an issue for many, as the specifics of running an LLM are changing so fast that Guidance will have a hard time keeping up (see exllama for an example). If Guidance is in fact just using a REST API to talk to OpenAI, then depending on the API features being used, it should be possible to swap out OpenAI's server for a local server running an OpenAI-compatible API, such as text-generation-webui. To that end, it would be really interesting/useful to see a list of all the API features that Guidance uses, so developers of open-source OpenAI-compatible APIs could prioritize those features, since the OpenAI API support in projects like text-generation-webui is certainly not complete.
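For anyone who wants to experiment with this route today, here is a minimal, untested sketch. It assumes two things that are worth verifying: (1) that Guidance's OpenAI class routes its requests through the `openai` Python package (pre-1.0), and (2) that your local server (e.g. text-generation-webui's OpenAI-compatible extension) is listening on the URL shown, which is just a placeholder:

```python
import openai
import guidance

# Point the openai<1.0 client at a local OpenAI-compatible server instead of
# api.openai.com. The URL is hypothetical; adjust it to wherever your local
# server actually listens.
openai.api_base = "http://localhost:5000/v1"
openai.api_key = "sk-dummy"  # many local servers ignore the key but require one to be set

# If Guidance's OpenAI wrapper goes through the openai package (an assumption,
# check your installed version's source), requests now hit the local server.
# Use whatever model name your local server expects.
guidance.llm = guidance.llms.OpenAI("gpt-3.5-turbo")

program = guidance("""Tell me a one-liner about {{topic}}: {{gen 'joke'}}""")
print(program(topic="compilers"))
```

If that works, the remaining question is exactly the one raised above: which OpenAI API features (logprobs, logit_bias, streaming, etc.) Guidance actually depends on, since those are the ones local servers would need to implement faithfully.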
This may be a stupid question; please forgive me if so.
The OpenAI interface obviously relies on an independently existing server for gpt-3.5 and gpt-4.
The Transformers interface, though, assumes Guidance will load the model internally. Loading models in Transformers takes forever, even when already cached.
Is there a way to point to an existing 'guidance' server to handle guidance prompts, so I don't have to wait through an entire model startup cycle for every prompt test when using Transformers models like Wizard-13B?
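Until a dedicated server mode exists, one workaround is to pay the load cost once per session instead of once per prompt: load the model and tokenizer yourself in a long-lived process (a REPL or notebook), build the Guidance LLM object from them, and reuse it across prompt tests. A rough sketch below; it assumes your guidance version's Transformers class accepts a preloaded model/tokenizer pair (check the signature in your install), and the model name is only an example:

```python
import guidance
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pay the slow load exactly once, in a long-lived process (REPL/notebook).
model_name = "TheBloke/wizardLM-13B-1.0-fp16"  # example name, substitute your own
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# Reuse one LLM object for every prompt you test; no reload between runs.
# (Assumes guidance.llms.Transformers accepts a preloaded model + tokenizer.)
llm = guidance.llms.Transformers(model, tokenizer=tokenizer)

prompt = guidance("""Q: {{question}}
A: {{gen 'answer' max_tokens=64}}""", llm=llm)

print(prompt(question="What is guidance?")["answer"])
```

This doesn't give you a separate server process, but it does remove the per-prompt startup cycle, which seems to be the main pain point here.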