Added local llm functionality by incorporating text-generation-webui #289
Conversation
I'm all about this!! I think we all wanted AutoGPT to run locally to push the limits without breaking the bank! Stoked to see what we can run locally. And yeah, optimization of GPU presets, CLBlast, llama.cpp, etc. will be next, to auto-optimize local LLMs. Great job though! 👏
@sirajperson This is awesome. Tried running docker-compose on the new changes (on a MacBook, 16 GB RAM). Getting the following error: `0 32.90 Ignoring bitsandbytes: markers 'platform_system == "Windows"' don't match your environment`
Thanks for the reply. As for the build, are you able to build the docker image on the main branch?
Removed device-specific launch arguments. Settings must be done after installation.
Removed additional default settings for GPU usage. These settings may need to be configured via the configuration.yaml file.
Additional generalized default device settings.
Okay, it looks like the error you are getting occurs while installing the requirements for text-generation-webui. On line 25 of tgwui_requirements.txt, try commenting out the last line. To make this work on your local machine there are a couple of installation steps you may have to take on a Mac. I'm not sure what kind of video card you have, or if you are using a laptop, but you should be able to remove that last line from the requirements file to get it installed.

Also, I have removed the following items from the launch arguments so that TGWUI doesn't automatically target devices with NVIDIA GPUs. For now, configuration of the docker-compose.yaml needs to be done manually. I will create a build.sh script tonight that generates the docker-compose.yaml file with build options based on the target installation environment. Until then I have commented out GPU offloading and GPU configuration. This will make the model's API responses much slower, but will greatly increase the number of devices that the containers can run on without having to modify the docker-compose.yaml file. Also, llama.cpp GPU offloading doesn't presently support removing offloaded layers from system RAM; instead, it currently makes a copy in vRAM and executes the layers there. From what I understand, this is being addressed.

Please re-clone the repository and try running from scratch. You may need to remove the containers that you already tried to build; please refer to the docker container management docs for information on how to remove containers. If these are the only containers you are using on your system, you can run 'docker system prune' to remove containers that aren't running and to wipe the previous build cache. Don't run prune if you have other docker images that you would like to keep, or it will delete them.
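A sketch of the cleanup described above (the fork URL is a placeholder; `docker system prune` is the only command named in the comment, the rest is ordinary docker-compose usage):

```bash
# Stop and remove the containers from the previous attempt.
docker-compose down

# WARNING: this also removes any other stopped containers, dangling images,
# and the build cache on your machine -- skip it if you need to keep them.
docker system prune

# Re-clone and rebuild from scratch (replace the placeholder with the fork's URL).
git clone <fork-url> SuperAGI && cd SuperAGI
docker-compose up --build
```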
@TransformerOptimus
Searx is a free internet metasearch engine which aggregates results from more than 70 search services. I added a simple beautifulsoup scraper that allows many of the instances on https://searx.space/ to be used without an API key. Uses the bs4, httpx, and pydantic packages which are already in the `requirements.txt`.
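For reference, a minimal sketch of what such a scraper might look like (this is illustrative, not the PR's actual code: the instance URL, CSS selectors, and function name are assumptions, and the pydantic validation used by the real tool is omitted):

```python
# Minimal sketch of scraping a public Searx/SearxNG instance with httpx + BeautifulSoup.
# Result markup varies between instances, so the selectors below are assumptions.
import httpx
from bs4 import BeautifulSoup


def searx_search(query: str, instance: str = "https://searx.be") -> list[dict]:
    response = httpx.get(
        f"{instance}/search",
        params={"q": query},
        headers={"User-Agent": "Mozilla/5.0"},  # many instances reject default clients
        timeout=10,
    )
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    results = []
    for item in soup.select(".result"):  # selector is an assumption
        link = item.select_one("a[href]")
        if link is None:
            continue
        results.append({"title": link.get_text(strip=True), "url": link.get("href")})
    return results
```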
There are multiple components in a prompt.
We can give a certain percentage of weight to each component.
Adds support for searx search
superagi/helper/json_cleaner.py
Outdated
@@ -51,7 +51,7 @@ def extract_json_section(cls, input_str: str = ""):

     @classmethod
     def remove_escape_sequences(cls, string):
-        return string.encode('utf-8').decode('unicode_escape').encode('raw_unicode_escape').decode('utf-8')
+        return string.encode('utf-8').decode('unicode_escape')
Please revert this to the old change. This is required for non-English encodings.
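For context, a small sketch (illustrative, not part of the PR) of why the extra round-trip matters for non-English text:

```python
# decode('unicode_escape') interprets the UTF-8 bytes as Latin-1, so non-English
# text comes out as mojibake; re-encoding with 'raw_unicode_escape' and decoding
# as UTF-8 restores the original characters while still unescaping sequences.
s = '이름'  # Korean text containing no escape sequences

broken = s.encode('utf-8').decode('unicode_escape')
restored = broken.encode('raw_unicode_escape').decode('utf-8')

print(broken)         # mojibake (Latin-1 view of the UTF-8 bytes)
print(restored == s)  # True
```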
Updated reversion on line 54
fix misspelled word SERACH to SEARCH
Update config_template.yaml
@sirajperson Hey, can you let me know what to do after localhost:7860? I was able to set it up locally, but how do I choose and test with different models?
Can we keep the new docker-compose with the local llm separate from the current docker-compose file (something like docker-compose.local_llm.yaml)? We don't know how many devs want to run the local model directly by default. We can add a section in the readme for the local model.
If a local LLM URL is set in the config.yaml file, it uses the tgwui container, which can be managed from 127.0.0.1:7860. TGWUI is configured to run models in CPU-only mode by default, but it can be configured to run in other modes via the build option settings in docker-compose.yaml.
@TransformerOptimus Sure, it would be nice to be able to specify the use of local LLMs as a build arg. If we hand docker-compose something like --build-arg use_local_llm=true, then the compose executes the tgwui build.
In my last commit everything seems to be basically working. I've had 4 successful runs in the past 24 hours. I'll go ahead and separate the docker-compose files. Let me know if the last commit is working well on Mac. I'm on Linux.
@luciferlinx101 As of the last commit (Jun 10), to use local LLMs follow these steps:

Edit the config.yaml file and modify the relevant lines to match the model you plan on using. For llama-based models I have successfully been using 500 and 2048 respectively.

Run docker compose. Note that if you want to use more advanced features like loading models into GPUs, you will need to do additional configuration in the docker-compose.yaml file. I have tried to leave comments in the current file as basic instructions. For more information on specific text-generation-webui builds, I recommend reviewing the instructions in the text-generation-webui-docker GitHub repo: https://github.com/Atinoda/text-generation-webui-docker

After you have successfully built the containers, point your browser to 127.0.0.1:7860 and click on the Models tab. In the field "Download custom model or LoRA", enter the Hugging Face model identifier you would like to use, such as TheBloke/Vicuna-13B-CoT-GGML, then click Download. In the selection drop-down menu, select the model that you just downloaded and wait for it to load. Finally, point your browser to 127.0.0.1:3000 to begin using the agent. Cheers!

Please be aware that my fork is a development branch undergoing a PR and will be changing more soon. In other words, it isn't stable.
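A sketch of the kind of config.yaml edits being described (the keys and values below are assumptions pieced together from this thread, not a verbatim excerpt; the exact keys the 500/2048 token limits map to are not preserved in the comment above):

```yaml
# Point the openai client at the TGWUI container instead of api.openai.com
# (container name and port are taken from later comments in this thread).
# OPENAI_API_BASE: https://api.openai.com/v1
OPENAI_API_BASE: "http://super__tgwui:5001/v1"

# Use a model name whose token limit is defined in the config's token-limit map,
# e.g. the "llama": 2048 entry quoted elsewhere in this thread.
MODEL_NAME: "llama"
```

Then run `docker-compose up --build` and continue with the model download steps above.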
On Mac it is still failing. Getting this error. It seems to run fine on Ubuntu.
Ran the code change on https://lmsys.org/about/ and was able to get
Sounds good
@TransformerOptimus I'm modifying the default build to remove llama-cuda.
Changed build to default instead of cublas
Okay, I'll do a fresh clone and see if it works. I might just pull the MacBook off the shelf and spin it up to debug, although my Mac is an x86 from '18, so I'm not sure if I can reproduce. Let me know if commenting out the build line resolved the build issue.
Fix typo in agent_executor.py
Removed the installation video (temporarily)
…dme_removed_vid Update README.MD
renamed: docker-compose.yaml.bak -> local-llm
new file: local-llm-gpu
modified: superagi/helper/json_cleaner.py
renamed: DockerfileTGWUI -> tgwui/DockerfileTGWUI
deleted: tgwui/config/place-your-models-here.txt
deleted: tgwui/tgwui_requirements.txt
@TransformerOptimus There are three builds:

Default build: no local LLMs or OPENAI_API_BASE redirect.

Build with local LLM support: this mode uses memory caching by default. It isn't very fast, but it works on a much greater number of host machines than the GPU mode.

And finally, the more advanced GPU build: this build may require that additional packages be installed on the host machine. I would suggest that anyone trying to build a GPU install of the container read the Text Generation Web UI docs (TGWUI Docs). They are very informative and well written.

Please note that Text Generation Web UI is a large project with lots of functionality. It's worth taking time to get to know it in order to use local LLMs efficiently.
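Based on the compose files named in the commit above, the three builds would presumably be invoked along these lines (the `-f local-llm-gpu` form appears verbatim later in this thread; the other two commands are assumptions):

```bash
# Default build: OpenAI API only, no local LLM / OPENAI_API_BASE redirect.
docker-compose up --build

# Local LLM build (CPU, memory caching; slower but runs on most hosts).
docker-compose -f local-llm up --build

# Advanced GPU build; may require extra packages (drivers, CUDA) on the host.
docker-compose -f local-llm-gpu up --build
```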
@luciferlinx101 I spent quite a bit of time looking for what was causing a hang with the agent execution. Please look over the README.md file in the OpenAI plugin root folder of TGWUI. The openai plugin is currently under development and is not yet fully implemented. This PR is for integration of TGWUI as a model management tool. The issue might be more easily resolved by creating an issue on the project's issue tracker about not being able to sequence API calls to the OpenAI API plugin. The readme can be found here.
@sirajperson Sure, I will test and let you know if it works properly.
Merging it to dev. We will merge dev -> main along with the other changes tomorrow. I was able to get it running on Windows and in an Ubuntu VM.
@sirajperson I was also able to get to the tool at http://localhost:7860/ and download the model, but we want a proper readme after this step that can be used to run this local model as part of SuperAGI, such that people see in the frontend that the agent actually runs on this local model, not OpenAI. This will help all users once we merge dev to main. The minimum requirement for the README is to showcase setup and use with at least one particular model, end to end, with SuperAGI.
@luciferlinx101
I'll go ahead and get to the bottom of that bug. Commenting out the line changes the Dockerfile that is present in the TGWUI docker GitHub project. I would like to avoid changing files and try to keep it the same; that way, instead of having additional folders, files, and configurations down the line, we can just do a git pull for the repo, so long as it's maintained. For now, I will go ahead and comment it out until the issue is investigated further.
This is great! I am familiar with Oobabooga and developed a little extension for it. A few observations:
Hey, any updates on the stepwise readme showing an end-to-end use case with SuperAGI?
@luciferlinx101
Adding encouragement for @sirajperson: more of us are watching for the steps to successfully set this up.
No problem! Let me know if it is done.
I'm running this for Ooba:
malicor@DESKTOP-I087DO5:/mnt/d/ai/oobabooga_WSL/text-generation-webui$ python3 server.py --wbits 4 --groupsize 128 --model_type llama --model WizardLM-7B-uncensored-GPTQ --api --extensions long_term_memory, EdgeGPT --no-stream
I can use Ooba via http://localhost:7860/ and if I ask it "how many legs does a spider have", it gives me the correct answer.
I started SuperAGI with:
D:\AI\SuperAGI>docker-compose -f local-llm-gpu up --build
It did a lot of installing and ended up with:
superagi-gui-1 | - ready started server on 0.0.0.0:3000, url: http://localhost:3000
In the config.yaml file I set this:
# For locally hosted LLMs comment out the next line and uncomment the one after
# to configure a local llm point your browser to 127.0.0.1:7860 and click on the model tab in text generation web ui.
#OPENAI_API_BASE: https://api.openai.com/v1
"gpt-3.5-turbo-0301": 4032, "gpt-4-0314": 8092, "gpt-3.5-turbo": 4032, "gpt-4": 8092, "llama": 2048, "mpt-7b-storywriter": 45000
MODEL_NAME: "gpt-3.5-turbo-0301"
When I now start http://localhost:3000/, I get the purple SuperAGI screen saying "Initializing SuperAGI", but it's stuck there. Any idea what I'm doing wrong?
It would be fantastic to have a tutorial that goes over how to set up open-source LLMs with SuperAGI. I would love to do test runs on my own hardware to see if my configuration looks good, then take it out to the more powerful LLMs after I have confirmed that I am close to where I need to be with my instructions. Also, we could save some work by pointing to the oobabooga installation instructions for their part... but hooking into SuperAGI seems to be the part I am missing here.
Yes. The IP that you're using is for the docker image. Since you aren't running docker to use TGWUI, you are going to need to change the line OPENAI_API_BASE: "http://super__tgwui:5001/v1" to point at http://[your host machine's LAN address]:5001/v1. Then you are going to want to either create a port forward from your LAN interface to your loopback interface, or just run TGWUI on your LAN interface too. You can do that by setting the listen address to 0.0.0.0.
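A sketch of what that might look like, based on the server command quoted earlier in this thread (the --listen and --extensions flags are standard text-generation-webui options, but verify them against your version; the openai extension's default port of 5001 is an assumption consistent with the URL above):

```bash
# Same command as before, with two changes: --listen binds to 0.0.0.0 so the
# SuperAGI containers can reach it over the LAN, and the openai extension
# exposes an OpenAI-compatible API (on port 5001 by default).
python3 server.py --wbits 4 --groupsize 128 --model_type llama \
    --model WizardLM-7B-uncensored-GPTQ \
    --listen --extensions openai
```

Then set OPENAI_API_BASE in config.yaml to `http://<your LAN address>:5001/v1`.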
Tried everything; it says it cannot connect to OpenAI. I have OPENAI_API_BASE: "http://localhost:5000/api". Connecting with SillyTavern to the API of oobabooga works, but I cannot make it work with SuperAGI. Any thoughts? Thanks in advance. Error I receive: superagi-celery-1 | 2023-07-04 15:15:09 UTC - Super AGI - INFO - [/app/superagi/llms/openai.py:79] - Exception:
@DiamondGlassDrill Port 5000 is the Text Generation Web UI API, which differs from the OpenAI API endpoints. You will need to enable the openai extension to work with local LLM models. It can get tricky because not every model is compatible with instruct. You would also need to set up the correct template for prompting.
Be sure to check the LAN IP address of your computer; if it is 192.168.1.100, for example, then in the config.yaml file you would set the OpenAI base setting as such:
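(A sketch of that setting; the example LAN address comes from the comment above, and port 5001 is the openai extension's default mentioned earlier in the thread.)

```yaml
OPENAI_API_BASE: "http://192.168.1.100:5001/v1"
```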
There's a wide array of configurable settings for local LLMs. To manage the model, you can navigate to localhost:7860 and click on the Models tab.
@sirajperson A lot of people have been asking for a guide on how to use SuperAGI with local LLMs. Can you please help us out with a README.md? I think @luciferlinx101 also mentioned this a few weeks back.
I tried to make this work, at least to be able to access the minimal UI, but it doesn't seem to work. I used the main SuperAGI branch and I ran it with
I want to try nous-hermes 13B GPTQ SuperHOT with exllama_hf; I had a good experience trying it directly in Oobabooga's UI.
How about adding Petals support to take the load off local resources? https://github.com/petals-infra/chat.petals.dev
Is there any progress on adding support for locally running instances of tgwui? I would love to use the version I already have installed instead of using the dockerized version.
In this PR I have integrated text-generation-webui (TGWUI) as a means of managing locally hosted LLMs.
In this PR the changes are as follows:
Created a setting for OPENAI_API_BASE_URL: this allows one to set the URL that the openai library is pointed to.
Created a docker image of Text Generation Web UI that includes multi-GPU offloading of GGMLs.
Configured SuperAGI to use the TGWUI docker image by default.
With this PR one can run the docker-compose up --build command, then navigate to localhost:7860 to download models from huggingface.co for use with SuperAGI.
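As an illustration of what the OPENAI_API_BASE_URL setting enables (a sketch using the pre-1.0 openai Python package; the endpoint, model name, and message are illustrative, not SuperAGI's actual wiring):

```python
# Sketch: point the openai client at an OpenAI-compatible local endpoint
# (e.g. text-generation-webui's openai extension) instead of api.openai.com.
import openai

openai.api_base = "http://super__tgwui:5001/v1"  # value taken from config.yaml
openai.api_key = "sk-dummy"  # local endpoints generally ignore the key

response = openai.ChatCompletion.create(
    model="llama",  # model name as configured in config.yaml (illustrative)
    messages=[{"role": "user", "content": "Hello from a locally hosted LLM"}],
)
print(response["choices"][0]["message"]["content"])
```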