Transformers separate model server? #48
Comments
In the works.
If I understand the OP, this is something I am looking for as well. I want to host an ONNX model with Triton and have that interface with Guidance. @marcotcr, will what you have in the works support this?
I think this will be an issue for many, as the specifics of running an LLM are changing so fast that Guidance will have a hard time keeping up (see exllama for an example). If Guidance is in fact just using a REST API to talk to OpenAI, then depending on the API features being used, it should be possible to swap out OpenAI's server for a local server running an OpenAI-compatible API, such as text-generation-webui. To that end, it would be really interesting/useful to see a list of all the API features that Guidance uses, so developers of open-source OpenAI-compatible APIs could prioritize those features, since the OpenAI API support in projects like text-generation-webui is certainly not complete.
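For anyone who wants to experiment with this route today, here is a minimal, untested sketch. It assumes two things that are worth verifying: (1) that Guidance's OpenAI class routes its requests through the `openai` Python package (pre-1.0), and (2) that your local server (e.g. text-generation-webui's OpenAI-compatible extension) is listening on the URL shown, which is just a placeholder:

```python
import openai
import guidance

# Point the openai<1.0 client at a local OpenAI-compatible server instead of
# api.openai.com. The URL is hypothetical; adjust it to wherever your local
# server actually listens.
openai.api_base = "http://localhost:5000/v1"
openai.api_key = "sk-dummy"  # many local servers ignore the key but require one to be set

# If Guidance's OpenAI wrapper goes through the openai package (an assumption,
# check your installed version's source), requests now hit the local server.
# Use whatever model name your local server expects.
guidance.llm = guidance.llms.OpenAI("gpt-3.5-turbo")

program = guidance("""Tell me a one-liner about {{topic}}: {{gen 'joke'}}""")
print(program(topic="compilers"))
```

If that works, the remaining question is exactly the one raised above: which OpenAI API features (logprobs, logit_bias, streaming, etc.) Guidance actually depends on, since those are the ones local servers would need to implement faithfully.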
This may be a stupid question; please forgive me if so.
The OpenAI interface obviously relies on an independently existing server for gpt-3.5 and gpt-4.
The Transformers interface, though, assumes Guidance will load the model internally. Loading models in Transformers takes forever, even when already cached.
Is there a way to point to an existing 'guidance' server to handle guidance prompts, so I don't have to wait through an entire model startup cycle for every prompt test when using Transformers models like Wizard-13B?
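Until a dedicated server mode exists, one workaround is to pay the load cost once per session instead of once per prompt: load the model and tokenizer yourself in a long-lived process (a REPL or notebook), build the Guidance LLM object from them, and reuse it across prompt tests. A rough sketch below; it assumes your guidance version's Transformers class accepts a preloaded model/tokenizer pair (check the signature in your install), and the model name is only an example:

```python
import guidance
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pay the slow load exactly once, in a long-lived process (REPL/notebook).
model_name = "TheBloke/wizardLM-13B-1.0-fp16"  # example name, substitute your own
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# Reuse one LLM object for every prompt you test; no reload between runs.
# (Assumes guidance.llms.Transformers accepts a preloaded model + tokenizer.)
llm = guidance.llms.Transformers(model, tokenizer=tokenizer)

prompt = guidance("""Q: {{question}}
A: {{gen 'answer' max_tokens=64}}""", llm=llm)

print(prompt(question="What is guidance?")["answer"])
```

This doesn't give you a separate server process, but it does remove the per-prompt startup cycle, which seems to be the main pain point here.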