
Added local llm functionality by incorporating text-generation-webui #289

Merged
merged 37 commits into from
Jun 12, 2023

Conversation

sirajperson
Contributor

@sirajperson sirajperson commented Jun 9, 2023

In this PR I have integrated text-generation-webui as a means of managing locally hosted LLMs.
The changes are as follows:
Created a setting for OPENAI_API_BASE_URL: this allows one to set the URL that the openai library points to.
Created a Docker image of Text Generation Web UI that includes multi-GPU offloading of GGMLs.
Configured SuperAGI to use the TGWUI Docker image by default.

With this PR one can run the docker-compose up --build command and then navigate to localhost:7860 to download models for use with SuperAGI from huggingface.co.
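Roughly, the redirect amounts to the following (a minimal sketch using the pre-1.0 openai Python client; the model name and prompt are placeholders, and the local endpoint ignores the API key):

import openai

# Point the client at the OpenAI-compatible endpoint exposed by the TGWUI
# container instead of api.openai.com.
openai.api_base = "http://super__tgwui:5001/v1"   # value of OPENAI_API_BASE
openai.api_key = "sk-dummy"                       # placeholder; the local endpoint ignores it

# Existing ChatCompletion calls now hit whichever model is loaded locally.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # placeholder name; the backend serves the loaded model
    messages=[{"role": "user", "content": "How many legs does a spider have?"}],
)
print(response["choices"][0]["message"]["content"])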

@Renegadesoffun

I'm all about this!! I think we all wanted AutoGPT to run locally to push the limits without breaking the bank! Stoked to see what we can run locally. And yeah, optimization of GPU presets, CLBlast, llama, etc. will be next, to auto-optimize local LLMs. Great job tho! 👏

@TransformerOptimus
Collaborator

TransformerOptimus commented Jun 9, 2023

@sirajperson. This is awesome.

Tried running docker-compose on the new changes (on a MacBook with 16GB RAM). Getting the following error.

0 32.90 Ignoring bitsandbytes: markers 'platform_system == "Windows"' don't match your environment
0 32.90 Ignoring llama-cpp-python: markers 'platform_system == "Windows"' don't match your environment
0 32.90 Ignoring auto-gptq: markers 'platform_system == "Windows"' don't match your environment
0 32.90 ERROR: auto_gptq-0.2.0+cu117-cp310-cp310-linux_x86_64.whl is not a supported wheel on this platform.

failed to solve: executor failed running [/bin/sh -c pip3 install -r /app/requirements.txt]: exit code: 1

@sirajperson
Contributor Author

@sirajperson. This is awesome.

Tried running docker-compose on new changes(on macbook 16gb RAM). Getting the following error.

0 32.90 Ignoring bitsandbytes: markers 'platform_system == "Windows"' don't match your environment

0 32.90 Ignoring llama-cpp-python: markers 'platform_system == "Windows"' don't match your environment
0 32.90 Ignoring auto-gptq: markers 'platform_system == "Windows"' don't match your environment
0 32.90 ERROR: auto_gptq-0.2.0+cu117-cp310-cp310-linux_x86_64.whl is not a supported wheel on this platform.
failed to solve: executor failed running [/bin/sh -c pip3 install -r /app/requirements.txt]: exit code: 1

Thanks for the reply. As for the build, are you able to build the docker image on the main branch?

Removed device-specific launch arguments. Settings must be done after installation.
Removed additional default settings for GPU usage. These settings should perhaps be configured via the configuration.yaml file.
Added generalized default device settings.
@sirajperson
Contributor Author

sirajperson commented Jun 9, 2023

Okay, it looks like the error you are getting comes from installing the requirements.txt for Text Generation Web UI. Try commenting out the last line, line 25, of tgwui_requirements.txt. To make this work on your local machine there are a couple of installation steps you may have to take on a Mac. I'm not sure what kind of video card you have, or whether you are on a laptop, but removing that last line from the requirements file should get it installed.

Also, I have removed the following items from the launch arguments so that TGWUI doesn't automatically target devices with nvidia GPUs.

For now, configuration of the docker-compose.yaml needs to be done manually. I will create a build.sh script tonight that generates the docker-compose.yaml file with build options for the target installation environment. Until then I have commented out GPU offloading and GPU configuration. This makes the model's API responses much slower, but greatly increases the number of devices the containers can run on without having to modify the docker-compose.yaml file.

Also, llama.cpp GPU offloading doesn't presently support removing offloaded layers from system RAM. Instead, it currently makes a copy in VRAM and executes the layers there. From what I understand, this is being addressed.

Please re-clone the repository and try running from scratch. You may need to remove the containers you already tried to build; refer to the Docker container management docs for how to remove them. If this project is the only thing you are using containers for on your system, you can run 'docker system prune' to remove containers that aren't running and wipe the previous build cache. Don't run prune if you have other Docker images you would like to keep, or it will delete them.
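For reference, a typical cleanup sequence from the project directory looks roughly like this (skip the prune if you have other Docker resources you want to keep):

docker-compose down    # stops and removes this project's containers and networks
docker system prune    # removes stopped containers, dangling images, and the build cache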

@sirajperson
Contributor Author

sirajperson commented Jun 9, 2023

@TransformerOptimus
Another hiccup I've run into while working with local LLMs is the difference in token limits between llama (and its derivatives) and the OpenAI models. The token limit of llama is 2048, while the limits for gpt-3.5 and gpt-4 are 4096 and 8192 (for the API version) respectively. I was thinking it might be a good idea to consider token limits when forming the session. I'll work on something like that later tonight, but any ideas would be greatly appreciated.
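As a rough illustration of what I mean (a sketch only; the numbers just mirror the limits mentioned above, and the completion budget is an arbitrary placeholder):

MODEL_TOKEN_LIMITS = {
    "gpt-4": 8192,          # API version
    "gpt-3.5-turbo": 4096,
    "llama": 2048,          # llama and most derivatives
}

def max_prompt_tokens(model_name: str, completion_budget: int = 512) -> int:
    # Leave room for the completion so the prompt never overflows the context window.
    return MODEL_TOKEN_LIMITS.get(model_name, 2048) - completion_budget

print(max_prompt_tokens("llama"))          # 1536
print(max_prompt_tokens("gpt-3.5-turbo"))  # 3584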

Searx is a free internet metasearch engine which aggregates
results from more than 70 search services. I added a simple
beautifulsoup scraper that allows many of the instances on
https://searx.space/ to be used without an API key. Uses the
bs4, httpx, and pydantic packages which are already in the
`requirements.txt`.
@TransformerOptimus
Collaborator

@TransformerOptimus
Another hiccup I've run into in working with local LLMs is the difference in token length between llama's, and it's derivatives', token limit. The token limit of lama is 2048 while the token limit for gpt-3.5 and gpt-4 are 4096 and 8192 (for the api version) respectively. I was thinking that it might be a good idea to consider token limits in the session formation. I'll will work on a something like that later tonight, but any ideas would be greatly appreciated.

There are multiple components in a prompt.

  1. Base prompt - includes goals, constraints, tools etc.
  2. Short term memory
  3. Long term memory(WIP)
  4. Knowledge base - preseeded knowledge for agents (it is WIP).

We can give a certain percentage of weight to each of these components.
The base prompt weight can't be changed, but we can come up with variations. STM, LTM and Knowledge can each have a weight of 1/3 of the remaining tokens available, or it can be kept configurable (see the sketch below).
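A minimal sketch of that budgeting idea (the 1/3 split is just the default described above, not anything final):

def split_token_budget(base_prompt_tokens: int, model_token_limit: int) -> dict:
    # The base prompt is fixed; STM, LTM and knowledge each get a third of what remains.
    per_section = (model_token_limit - base_prompt_tokens) // 3
    return {
        "short_term_memory": per_section,
        "long_term_memory": per_section,
        "knowledge": per_section,
    }

# e.g. a llama-style 2048-token window with a 600-token base prompt
print(split_token_budget(600, 2048))  # each section gets 482 tokens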

@@ -51,7 +51,7 @@ def extract_json_section(cls, input_str: str = ""):

     @classmethod
     def remove_escape_sequences(cls, string):
-        return string.encode('utf-8').decode('unicode_escape').encode('raw_unicode_escape').decode('utf-8')
+        return string.encode('utf-8').decode('unicode_escape')
Collaborator


Please revert this to the old change. It is required for non-English encoding.
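To make the reason concrete, a small illustration (the string is just an example) of what the longer chain preserves and the shortened version loses for non-ASCII text:

s = "héllo\\n"  # a non-ASCII character plus a literal backslash escape

# Shortened version: unicode_escape decodes the UTF-8 bytes as Latin-1,
# which mangles the multi-byte character.
print(s.encode('utf-8').decode('unicode_escape'))
# -> hÃ©llo  (with a real newline at the end)

# Old version: re-encoding with raw_unicode_escape restores the original bytes,
# so they can be decoded as UTF-8 again.
print(s.encode('utf-8').decode('unicode_escape').encode('raw_unicode_escape').decode('utf-8'))
# -> héllo  (with a real newline at the end)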

sirajperson and others added 3 commits June 10, 2023 01:03
Updated reversion on line 54
fix misspelled word SERACH to SEARCH
@luciferlinx101
Collaborator

@sirajperson Hey, can you let me know what to do after localhost:7860? I was able to set it up locally, but how do I choose and test a different model?
Can you explain in detail what the next steps are after docker-compose up --build?

@TransformerOptimus
Collaborator

Can we keep the new docker-compose with the local LLM separate from the current docker-compose file (something like docker-compose.local_llm.yaml)? We don't know how many devs want to run the local model directly by default. We can add a section in the README for the local model.

If a local LLM URL is set in the config.yaml file, it uses the tgwui container, which can be managed from 127.0.0.1:7860
TGWUI is configured to run models in CPU-only mode by default, but can be configured to run in other modes via the build option settings in the docker-compose.yaml
@sirajperson
Contributor Author

@TransformerOptimus Sure, it would be nice to be able to specify the use of local LLMs as a build arg. If we hand docker-compose something like --build-arg use_local_llm=true, then compose executes the tgwui build.
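Roughly, the compose-side wiring could look like this (a sketch only; the use_local_llm arg is hypothetical, the paths are approximate, and the Dockerfile would still need to act on the arg):

services:
  super__tgwui:
    build:
      context: .
      dockerfile: DockerfileTGWUI
      args:
        use_local_llm: "true"   # hypothetical flag, passed via --build-arg use_local_llm=true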

@sirajperson
Contributor Author

In my last commit everything seems to be basically working. I've had 4 successful runs in the past 24 hours. I'll go ahead and separate the docker-compose files. Let me know if the last commit is working well on Mac. I'm on Linux.

@sirajperson
Contributor Author

sirajperson commented Jun 10, 2023

@luciferlinx101 As of the last commit, Jun 10, to use Local LLMs follow these steps:
Clone my development branch
Copy the config_template.yaml to config.yaml

Edit the config.yaml file and make the following changes:
Comment out line 7: OPENAI_API_BASE: https://api.openai.com/v1
Uncomment line 8: #OPENAI_API_BASE: "http://super__tgwui:5001/v1"

Modify the following lines to match the model you plan on using:
MAX_TOOL_TOKEN_LIMIT: 800
MAX_MODEL_TOKEN_LIMIT: 2048 # set to 2048 for llama or 4032 for GPT-3.5

For llama-based models I have successfully been using 500 and 2048, respectively; the combined excerpt below shows the resulting config.
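Putting those edits together, the relevant part of config.yaml should end up looking roughly like this (values shown for a llama-based model):

#OPENAI_API_BASE: https://api.openai.com/v1
OPENAI_API_BASE: "http://super__tgwui:5001/v1"
MAX_TOOL_TOKEN_LIMIT: 500      # 800 works for gpt-3.5
MAX_MODEL_TOKEN_LIMIT: 2048    # 2048 for llama, 4032 for gpt-3.5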

Run docker compose:
docker-compose up --build

Note that if you want to use more advanced features like loading models onto GPUs, you will need to do additional configuration in the docker-compose.yaml file. I have tried to leave comments in the current file as basic instructions. For more information on specific text-generation-webui builds, I recommend reviewing the instructions in the text-generation-webui-docker GitHub repo: https://github.com/Atinoda/text-generation-webui-docker
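As a rough illustration of what that configuration typically involves (a sketch of standard compose GPU syntax, not necessarily the exact contents of this repo's file), GPU access for the tgwui service boils down to an NVIDIA device reservation:

services:
  super__tgwui:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]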

After you have successfully built the containers, point your browser to 127.0.0.1:7860 and click on the Models tab. In the field "Download custom model or LoRA", enter the Hugging Face model identifier you would like to use, such as:

TheBloke/Vicuna-13B-CoT-GGML

Then click Download. In the model selection dropdown, select the model you just downloaded and wait for it to load.

Finally, point your browser to 127.0.0.1:3000 to begin using the agent.

Cheers!

Please be aware that my fork is a development branch undergoing a PR and will be changing more soon. In other words, it isn't stable.

@TransformerOptimus
Collaborator

In my last commit everything seems to be basically working. I've had 4 successful runs in the past 24 hours. I'll go ahead and separate the docker-compose files. Let me know if the last commit is working well on Mac. I'm on Linux.

On Mac it is still failing. Getting this error. It seems to run fine on Ubuntu.
#0 17.80
#0 17.80 note: This error originates from a subprocess, and is likely not a problem with pip.
#0 17.80 ERROR: Failed building wheel for llama-cpp-python
#0 17.80 Failed to build llama-cpp-python
#0 17.80 ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

failed to solve: executor failed running [/bin/sh -c pip uninstall -y llama-cpp-python && CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python]: exit code: 1

@TransformerOptimus
Collaborator

In my last commit everything seems to be basically working. I've had 4 successful runs in the past 24 hours. I'll go ahead and separate the docker-compose files. Let me know if the last commit is working well on Mac. I'm on Linux.

Mac it is still failing. Getting this error. Seems running fine on ubuntu.

#0 17.80
#0 17.80 note: This error originates from a subprocess, and is likely not a problem with pip.
#0 17.80 ERROR: Failed building wheel for llama-cpp-python
#0 17.80 Failed to build llama-cpp-python
#0 17.80 ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
failed to solve: executor failed running [/bin/sh -c pip uninstall -y llama-cpp-python && CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python]: exit code: 1

Ran the code change on https://lmsys.org/about/ and was able to get

@TransformerOptimus Sure, It would be nice to be able to specify the use of local LLMs as a build arg. If we hand docker-compose something like --build-args use_local_llm=true than the compose executes the tgwui build.

Sounds good

@sirajperson
Contributor Author

@TransformerOptimus I'm modifying the default build to remove llama-cuda.

Changed build to default instead of cublas
@sirajperson
Contributor Author

Okay, I'll do a fresh clone and see if it works. I might just pull the MacBook off the shelf and spin it up to debug, although my Mac is an x86 from '18, so I'm not sure if I can reproduce it. Let me know if commenting out the build line resolved the build issue.

I’m and others added 5 commits June 11, 2023 16:14
Removed the installation video (temporarily)
	renamed:    docker-compose.yaml.bak -> local-llm
	new file:   local-llm-gpu
	modified:   superagi/helper/json_cleaner.py
	renamed:    DockerfileTGWUI -> tgwui/DockerfileTGWUI
	deleted:    tgwui/config/place-your-models-here.txt
	deleted:    tgwui/tgwui_requirements.txt
@sirajperson
Contributor Author

sirajperson commented Jun 11, 2023

@TransformerOptimus
In the last commit I've separated the docker-compose files with the following build scheme:

Default build, no local LLMs or OPENAI_API_BASE redirect.
docker-compose up --build

Build with local LLM support. This mode runs models with memory caching (CPU) by default. It isn't very fast, but it works on a much greater number of host machines than GPU mode.
docker-compose -f local-llm up --build

And finally, the more advanced GPU build. This build may require additional packages to be installed on the host machine. I would suggest that anyone trying to build a GPU install of the container read the Text Generation Web UI doc files; they are very informative and well written. TGWUI Docs
docker-compose -f local-llm-gpu up --build

Please note that Text Generation Web UI is a large project with lots of functionality. It's worth taking the time to get to know it in order to use local LLMs efficiently.

@sirajperson
Contributor Author

sirajperson commented Jun 11, 2023

@luciferlinx101 I spent quite a bit of time looking for what was causing a hang in the agent execution. Please look over the README.md file in the OpenAI plugin root folder of TGWUI. The openai plugin is currently under development and is not yet fully implemented. This PR is for the integration of TGWUI as a model management tool. The issue might be more easily resolved by filing it on that project's issue tracker as not being able to sequence API calls to the OpenAI API plugin. The readme can be found here.

@luciferlinx101
Collaborator

@sirajperson Sure I will test and let you know if it works properly.

@TransformerOptimus
Collaborator

Merging it to dev. We will merge dev -> main along with other changes tomorrow. I was able to get it running on Windows and in an Ubuntu VM.
Mac is still throwing the "ERROR: auto_gptq-0.2.2+cu117-cp310-cp310-linux_x86_64.whl is not a supported wheel on this platform" error.

@TransformerOptimus TransformerOptimus changed the base branch from main to dev June 12, 2023 18:55
@TransformerOptimus TransformerOptimus merged commit 6789ab6 into TransformerOptimus:dev Jun 12, 2023
@luciferlinx101
Collaborator

luciferlinx101 commented Jun 12, 2023

@sirajperson I was also able to get to the tool at http://localhost:7860/ and download the model, but we want a proper README after this step that explains how to use the local model as part of SuperAGI, so that in the frontend people can see the agent actually runs on this local model and not OpenAI.

This will help all users once we merge dev to main.

The minimum requirement for the README is to showcase setup and a use case end to end with at least one particular model and SuperAGI.

@sirajperson
Contributor Author

sirajperson commented Jun 13, 2023

@luciferlinx101
Sure, I'll be happy to do that.

@sirajperson
Contributor Author

@TransformerOptimus

I'll go ahead and get to the bottom of that bug. Commenting out the line changes the Dockerfile that is present in the TGWUI docker GitHub project. I would like to avoid changing files and try to keep it the same; that way, instead of accumulating additional folders, files, and configurations down the line, we can just do a git pull for the repo, so long as it's maintained. For now, I will comment it out until the issue is investigated further.

@atxcowboy

@TransformerOptimus In the last commit I'v separated the docker compose files such with the following build schema:

Default build, no local LLMs or OPENAI_API_BASE redirect. docker-compose up --build

Build with local GPU LLM support. This mode uses memory caching by default. It isn't very fast, but it works on a much greater number of host machines than using GPU mode. docker-compose -f local-llm up --build

And finally, the more advanced GPU build. This build may require that additional packages be installed on the host machine. I would suggest that anyone trying to build a GPU install of the container read the docs on the in the Text Generation Web UI doc files. They are very informative and well written. TGWUI Docs docker-compose -f local-llm-gpu up --build

Please note, that Text Generation Web UI is a large project with lots of functionality. It's worth taking time to get to know it to use local LLMs efficiently

This is great! I am familiar with Oobabooga and developed a little extension for it.

A few observations:

  • I got it to spin up with GPU support, and the model loads about 10 times faster with GPU than CPU on my system. I would clearly prefer to be able to just point to my own customized Oobabooga installation at a certain URL, as I'd want to make use of GPTQ and other extended capabilities. I guess ideally a user gets the choice between using the dockerized version or a custom endpoint.

  • On the GPU I also get the error you mentioned (see the sketch after this list):
    redis.exceptions.DataError: Invalid input of type: 'list'. Convert to a bytes, string, int or float first.

  • On the Create Agent screen, SuperAGI would need to be aware of the available LLMs, as it currently only shows the predefined ChatGPT models.
    (screenshot: createagent)
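On the redis error above: redis-py only accepts bytes, strings, ints or floats as values, so any list has to be serialized before it is stored. A minimal sketch of the usual workaround (host, key and payload are made up for illustration):

import json
import redis

r = redis.Redis(host="localhost", port=6379)  # adjust host/port to your setup

agent_steps = ["browse", "summarize", "write file"]  # example payload

# Passing the list directly raises redis.exceptions.DataError;
# serialize it to a string first and deserialize on the way out.
r.set("agent:123:steps", json.dumps(agent_steps))
steps = json.loads(r.get("agent:123:steps"))
print(steps)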

@luciferlinx101
Collaborator

@luciferlinx101 Sure, I'll be happy to do that.

Hey, any updates on the step-by-step README showing an end-to-end use case with SuperAGI?

@sirajperson
Contributor Author

@luciferlinx101
Sorry about the delay. Been a bit busy this week. LoL

@GuruVirus

Adding encouragement for @sirajperson: more of us are watching for the steps to successfully set this up.

@luciferlinx101
Collaborator

luciferlinx101 commented Jun 17, 2023

@luciferlinx101 Sorry, about the delay. Been a bit busy this week. LoL

No problem! Let me know if it is done.

@malicorX

I'm running this for Ooba:

malicor@DESKTOP-I087DO5:/mnt/d/ai/oobabooga_WSL/text-generation-webui$ python3 server.py --wbits 4 --groupsize 128 --model_type llama --model WizardLM-7B-uncensored-GPTQ --api --extensions long_term_memory, EdgeGPT --no-stream

I can use Ooba via http://localhost:7860/ and if I ask it "how many legs does a spider have", it gives me the correct answer.

I started SuperAGI with:

D:\AI\SuperAGI>docker-compose -f local-llm-gpu up --build

It did a lot of installing and ended up with:

superagi-gui-1 | - ready started server on 0.0.0.0:3000, url: http://localhost:3000
superagi-gui-1 | - event compiled client and server successfully in 957 ms (1314 modules)
superagi-gui-1 | - wait compiling...
superagi-gui-1 | - event compiled client and server successfully in 218 ms (1314 modules)
superagi-super__postgres-1 | 2023-06-17 08:26:33.196 UTC [26] LOG: checkpoint starting: time
superagi-super__postgres-1 | 2023-06-17 08:26:33.230 UTC [26] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.008 s, sync=0.004 s, total=0.035 s; sync files=2, longest=0.002 s, average=0.002 s; distance=0 kB, estimate=0 kB
superagi-gui-1 | - wait compiling /_error (client and server)...
superagi-gui-1 | - event compiled client and server successfully in 135 ms (1315 modules)
superagi-gui-1 | Do not add stylesheets using next/head (see tag with href="https://fonts.googleapis.com/css2?family=Source+Code+Pro&display=swap"). Use Document instead.
superagi-gui-1 | See more info here: https://nextjs.org/docs/messages/no-stylesheets-in-head-component
superagi-gui-1 | Do not add stylesheets using next/head (see tag with href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&display=swap"). Use Document instead.
superagi-gui-1 | See more info here: https://nextjs.org/docs/messages/no-stylesheets-in-head-component
superagi-gui-1 | - wait compiling /favicon.ico/route (client and server)...
superagi-gui-1 | - event compiled successfully in 43 ms (146 modules)

In the config.yaml file I set this:

# For locally hosted LLMs comment out the next line and uncomment the one after
# To configure a local llm point your browser to 127.0.0.1:7860 and click on the model tab in text generation web ui.
#OPENAI_API_BASE: https://api.openai.com/v1
OPENAI_API_BASE: "http://super__tgwui:5001/v1"

"gpt-3.5-turbo-0301": 4032, "gpt-4-0314": 8092, "gpt-3.5-turbo": 4032, "gpt-4": 8092, "llama":2048, "mpt-7b-storywriter":45000

MODEL_NAME: "gpt-3.5-turbo-0301"
MAX_TOOL_TOKEN_LIMIT: 500 # or 800 for gpt3.5
MAX_MODEL_TOKEN_LIMIT: 2048 # set to 2048 for llama or 4032 for gpt3.5

When I now start http://localhost:3000/ I get the purple SuperAGI screen saying "Initializing SuperAGI", but it's stuck there.

Any idea what I'm doing wrong?

@DaKingof

It would be fantastic to have a tutorial that goes over how to set up open-source LLMs with SuperAGI. I would love to do test runs on my own hardware to see if my configuration looks good, then take it out to the more powerful LLMs after I have confirmed I am close to where I need to be with my instructions.

Also, we could save some work by pointing to the Oobabooga installation instructions for their part... but hooking into SuperAGI seems to be the part I am missing here.

@sirajperson
Contributor Author

sirajperson commented Jun 26, 2023

i'm running this for ooba:

malicor@DESKTOP-I087DO5:/mnt/d/ai/oobabooga_WSL/text-generation-webui$ python3 server.py --wbits 4 --groupsize 128 --model_type llama --model WizardLM-7B-uncensored-GPTQ --api --extensions long_term_memory, EdgeGPT --no-stream

i can use ooba via http://localhost:7860/ and if i ask it "how many legs does a spider have", it gives me correct answer

i started superAGI with:

D:\AI\SuperAGI>docker-compose -f local-llm-gpu up --build

it did a lot of installing and ended up with:

superagi-gui-1 | - ready started server on 0.0.0.0:3000, url: http://localhost:3000 superagi-gui-1 | - event compiled client and server successfully in 957 ms (1314 modules) superagi-gui-1 | - wait compiling... superagi-gui-1 | - event compiled client and server successfully in 218 ms (1314 modules) superagi-super__postgres-1 | 2023-06-17 08:26:33.196 UTC [26] LOG: checkpoint starting: time superagi-super__postgres-1 | 2023-06-17 08:26:33.230 UTC [26] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.008 s, sync=0.004 s, total=0.035 s; sync files=2, longest=0.002 s, average=0.002 s; distance=0 kB, estimate=0 kB superagi-gui-1 | - wait compiling /_error (client and server)... superagi-gui-1 | - event compiled client and server successfully in 135 ms (1315 modules) superagi-gui-1 | Do not add stylesheets using next/head (see tag with href="https://fonts.googleapis.com/css2?family=Source+Code+Pro&display=swap"). Use Document instead. superagi-gui-1 | See more info here: https://nextjs.org/docs/messages/no-stylesheets-in-head-component superagi-gui-1 | Do not add stylesheets using next/head (see tag with href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&display=swap"). Use Document instead. superagi-gui-1 | See more info here: https://nextjs.org/docs/messages/no-stylesheets-in-head-component superagi-gui-1 | - wait compiling /favicon.ico/route (client and server)... superagi-gui-1 | - event compiled successfully in 43 ms (146 modules)

in the config.yaml file i set this:

For locally hosted LLMs comment out the next line and uncomment the one after

to configure a local llm point your browser to 127.0.0.1:7860 and click on the model tab in text generation web ui.

#OPENAI_API_BASE: https://api.openai.com/v1 OPENAI_API_BASE: "http://super__tgwui:5001/v1"

"gpt-3.5-turbo-0301": 4032, "gpt-4-0314": 8092, "gpt-3.5-turbo": 4032, "gpt-4": 8092, "llama":2048, "mpt-7b-storywriter":45000

MODEL_NAME: "gpt-3.5-turbo-0301" MAX_TOOL_TOKEN_LIMIT: 500 # or 800 for gpt3.5 MAX_MODEL_TOKEN_LIMIT: 2048 # set to 2048 for llama or 4032 for gpt3.5

when i now start http://localhost:3000/ i get the purple superAGI screen and it saying "Initializing superAGI", but it's stuck there.

any idea what i m doing wrong?

Yes. The IP that you're using is for the Docker image. Since you aren't running Docker to use TGWUI, you are going to need to change the line

OPENAI_API_BASE: "http://super__tgwui:5001/v1"

to

OPENAI_API_BASE: "http://[your host machine's LAN address]:5001/v1"

Then you are going to want to either create a port forward from your LAN interface to your loopback interface, or just run TGWUI on your LAN interface too. You can do that by setting the listen address to 0.0.0.0.

@DiamondGlassDrill

i'm running this for ooba:
malicor@DESKTOP-I087DO5:/mnt/d/ai/oobabooga_WSL/text-generation-webui$ python3 server.py --wbits 4 --groupsize 128 --model_type llama --model WizardLM-7B-uncensored-GPTQ --api --extensions long_term_memory, EdgeGPT --no-stream
i can use ooba via http://localhost:7860/ and if i ask it "how many legs does a spider have", it gives me correct answer
i started superAGI with:
D:\AI\SuperAGI>docker-compose -f local-llm-gpu up --build
it did a lot of installing and ended up with:
superagi-gui-1 | - ready started server on 0.0.0.0:3000, url: http://localhost:3000 superagi-gui-1 | - event compiled client and server successfully in 957 ms (1314 modules) superagi-gui-1 | - wait compiling... superagi-gui-1 | - event compiled client and server successfully in 218 ms (1314 modules) superagi-super__postgres-1 | 2023-06-17 08:26:33.196 UTC [26] LOG: checkpoint starting: time superagi-super__postgres-1 | 2023-06-17 08:26:33.230 UTC [26] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.008 s, sync=0.004 s, total=0.035 s; sync files=2, longest=0.002 s, average=0.002 s; distance=0 kB, estimate=0 kB superagi-gui-1 | - wait compiling /_error (client and server)... superagi-gui-1 | - event compiled client and server successfully in 135 ms (1315 modules) superagi-gui-1 | Do not add stylesheets using next/head (see tag with href="https://fonts.googleapis.com/css2?family=Source+Code+Pro&display=swap"). Use Document instead. superagi-gui-1 | See more info here: https://nextjs.org/docs/messages/no-stylesheets-in-head-component superagi-gui-1 | Do not add stylesheets using next/head (see tag with href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&display=swap"). Use Document instead. superagi-gui-1 | See more info here: https://nextjs.org/docs/messages/no-stylesheets-in-head-component superagi-gui-1 | - wait compiling /favicon.ico/route (client and server)... superagi-gui-1 | - event compiled successfully in 43 ms (146 modules)
in the config.yaml file i set this:

For locally hosted LLMs comment out the next line and uncomment the one after

to configure a local llm point your browser to 127.0.0.1:7860 and click on the model tab in text generation web ui.

#OPENAI_API_BASE: https://api.openai.com/v1 OPENAI_API_BASE: "http://super__tgwui:5001/v1"

"gpt-3.5-turbo-0301": 4032, "gpt-4-0314": 8092, "gpt-3.5-turbo": 4032, "gpt-4": 8092, "llama":2048, "mpt-7b-storywriter":45000

MODEL_NAME: "gpt-3.5-turbo-0301" MAX_TOOL_TOKEN_LIMIT: 500 # or 800 for gpt3.5 MAX_MODEL_TOKEN_LIMIT: 2048 # set to 2048 for llama or 4032 for gpt3.5
when i now start http://localhost:3000/ i get the purple superAGI screen and it saying "Initializing superAGI", but it's stuck there.
any idea what i m doing wrong?

Yes. The IP that you're using is for the docker image. Since you aren't running docker to use TGUI you are going to need to change the line

OPENAI_API_BASE: "http://super__tgwui:5001/v1"

to

OPENAI_API_BASE URL to http://[your host machines LAN address]:5001/v1

Then you are going to want to either create a port forward from your lan interface controller to your loop back interface, or just run TGWU on your lan interface too. You can do that by setting the run address to 0.0.0.0.

Tried everything; it says it cannot connect to OpenAI: OPENAI_API_BASE: "http://localhost:5000/api". Connecting to the Oobabooga API with SillyTavern works, but I cannot make it work with SuperAGI. Any thoughts? Thanks in advance.

Error I receive:

superagi-celery-1 | 2023-07-04 15:15:09 UTC - Super AGI - INFO - [/app/superagi/llms/openai.py:79] - Exception:
superagi-celery-1 | [2023-07-04 15:15:09,629: INFO/ForkPoolWorker-8] Exception:
superagi-celery-1 | 2023-07-04 15:15:09 UTC - Super AGI - INFO - [/app/superagi/llms/openai.py:79] - Error communicating with OpenAI: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /api/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f97baaac880>: Failed to establish a new connection: [Errno 111] Connection refused'))
superagi-celery-1 | [2023-07-04 15:15:09,629: INFO/ForkPoolWorker-8] Error communicating with OpenAI: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /api/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f97baaac880>: Failed to establish a new connection: [Errno 111] Connection refused'))

@sirajperson
Contributor Author

sirajperson commented Jul 5, 2023

@DiamondGlassDrill Port 5000 is the Text Generation Web UI API, which differs from the OpenAI API endpoints. You will need to enable the openai extension to work with local LLM models. It can get tricky because not every model is compatible with instruct; you would also need to set up the correct prompting template.

python server.py --listen --listen-host 0.0.0.0 --verbose --extensions openai 

Be sure to check the LAN IP address of your computer. If it is, say, 192.168.1.100, then in the config.yaml file you would set the OpenAI base setting as follows:

OPENAI_API_BASE: "http://192.168.1.100:5001/v1"

There's a wide array of configurable settings for local LLMs. To manage the model, navigate to localhost:7860 and click on the Models tab.
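A quick way to check that the endpoint is reachable before pointing SuperAGI at it (assuming the openai extension exposes the standard /v1/models route; adjust the LAN address to yours):

import requests

# 5001 is the extension's OpenAI-compatible port used above.
resp = requests.get("http://192.168.1.100:5001/v1/models", timeout=10)
print(resp.status_code, resp.json())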

@neelayan7
Collaborator

@sirajperson A lot of people have been asking for a guide on how to use SuperAGI with local LLMs. Can you please help us out with a README.md? I think @luciferlinx101 also mentioned this a few weeks back.

@juangea

juangea commented Jul 7, 2023

I tried to make this work, at least to be able to access the minimal UI, but it doesn't seem to work. I used the main SuperAGI branch and ran it with
docker-compose -f local-llm-gpu up --build
but the container related to Oobabooga never runs; it complains about being unable to create a volume or something similar.

I want to try Nous-Hermes 13B GPTQ SuperHOT with exllama_hf; I had a good experience trying it directly in Oobabooga's UI.

@IsleOf

IsleOf commented Sep 11, 2023

How about adding Petals support to take load off local resources? https://github.com/petals-infra/chat.petals.dev

@NikolaT0mic

Is there any progress on adding support for locally running instances of TGWUI? I would love to use the version I already have installed instead of the dockerized version.
