Fix browse_website model context overflow issue #2542
Merged: Pwuts merged 6 commits into Significant-Gravitas:master from bszollosinagy:prompt_context_overflow_fix on Apr 19, 2023
* change BROWSE_CHUNK_MAX_LENGTH to use token counts instead of characters. Added comment in .env.template
* move BROWSE_CHUNK_MAX_LENGTH to the BROWSER section in .env.template
* add BROWSE_SPACY_LANGUAGE_MODEL to .env.template. This is the language model in SpaCy that is used to split text into sentences.
* add SpaCy to requirements
* improved summarize_text() and split_text() to use tokens instead of character counts, and to split based on sentences instead of paragraphs.
* Added token count display for each chunk that is summarized during a "browse_website" command
…flow_fix # Conflicts: # autogpt/processing/text.py
* applied black formatting
This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.
…flow_fix # Conflicts: # .env.template
Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.
Pwuts approved these changes on Apr 19, 2023
Seems to work! :)
Log (warning: contains info on sea urchins)
Warning: The file 'auto-gpt.json' does not exist. Local memory would not be saved to a file.
NEWS: ::UPDATED:: Welcome to Auto-GPT! We'll keep you informed of the latest news and features by printing messages here. If you don't wish to see this message, you can run Auto-GPT with the --skip-news flag
Welcome to Auto-GPT! run with '--help' for more information.
Create an AI-Assistant: Enter the name of your AI and its role below. Entering nothing will load defaults.
Name your AI: For example, 'Entrepreneur-GPT'
AI Name: Sea Urchin GPT
Sea Urchin GPT here! I am at your service.
Describe your AI's role: For example, 'an AI designed to autonomously develop and run businesses with the sole goal of increasing your net worth.'
Sea Urchin GPT is: a fan of sea urchins, wants to know all about them!
Enter up to 5 goals for your AI: For example: Increase net worth, Grow Twitter Account, Develop and manage multiple businesses autonomously'
Enter nothing to load defaults, enter nothing when finished.
Goal 1: Find the wikipedia page on sea urchins
Goal 2: Summarize the wikipedia page on sea urchins in 5 sentences
Goal 3: terminate
Goal 4:
Using memory of type: LocalCache
Using Browser: chrome
THOUGHTS: I will use the 'google' command to search for the wikipedia page on sea urchins
REASONING: I need to find information on sea urchins, and wikipedia is a good source of information
PLAN:
- Use the 'google' command to search for the wikipedia page on sea urchins
CRITICISM: I should ensure that the wikipedia page I find is reliable and accurate
NEXT ACTION: COMMAND = google ARGUMENTS = {'input': 'sea urchin wikipedia'}
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for ...
Input:y
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-=
SYSTEM: Command google returned: [ { "title": "Sea urchin - Wikipedia", "href": "https://en.wikipedia.org/wiki/Sea_urchin", "body": "Sea urchins (/ ˈ ɜːr tʃ ɪ n z /) are spiny, globular echinoderms in the class Echinoidea.About 950 species of sea urchin live on the seabed of every ocean and inhabit every depth zone from the intertidal seashore down to 5,000 meters (16,000 ft; 2,700 fathoms). The spherical, hard shells of sea urchins are round and covered in spines.Most urchin spines range in length from 3 to 10 cm (1 ..." }, { "title": "Sea urchin - Simple English Wikipedia, the free encyclopedia", "href": "https://simple.wikipedia.org/wiki/Sea_urchin", "body": "Sea urchins are spiky members of the class Echinoidea of the phylum Echinodermata. Like the rest of the echinoderms, they are entirely marine. They are globe-shaped, and protected by calcareous plates and spines. Urchin is an old word for hedgehog, and in many foreign languages these animals are called sea hedgehogs." }, { "title": "Sea urchin | Description, Anatomy, & Facts | Britannica", "href": "https://www.britannica.com/animal/sea-urchin", "body": "sea urchin, any of about 950 living species of spiny marine invertebrate animals (class Echinoidea, phylum Echinodermata) with a globular body and a radial arrangement of organs, shown by five bands of pores running from mouth to anus over the test (internal skeleton)." }, { "title": "Diadema antillarum - Wikipedia", "href": "https://en.wikipedia.org/wiki/Diadema_antillarum", "body": "Diadema antillarum, also known as the lime urchin, black sea urchin, or the long-spined sea urchin, [2] is a species of sea urchin in the family Diadematidae . This sea urchin is characterized by its exceptionally long black spines. It is the most abundant and important herbivore on the coral reefs of the western Atlantic and Caribbean basin." 
}, { "title": "Red sea urchin - Wikipedia", "href": "https://en.wikipedia.org/wiki/Red_sea_urchin", "body": "The red sea urchin ( Mesocentrotus franciscanus) [1] is a sea urchin found in the northeastern Pacific Ocean from Alaska to Baja California." }, { "title": "40 Sea Urchin Facts About These Spiky Creatures - Facts.net", "href": "https://facts.net/sea-urchin-facts/", "body": "Sea urchins make up some of the most visible, and also the most fascinating animals in the sea. Their vibrant color is certainly eye-catching. Sea urchins come in many colors from purple to red, to green, etc. An average sea urchin is pretty small, around 3 to 10 centimeters in diameter." }, { "title": "Echinus esculentus - Wikipedia", "href": "https://en.wikipedia.org/wiki/Echinus_esculentus", "body": "Echinus esculentus, the European edible sea urchin or common sea urchin, is a species of marine invertebrate in the Echinidae family. It is found in coastal areas of western Europe down to a depth of 1,200 m (3,900 ft). [3] It is considered \"Near threatened\" in the IUCN Red List of Threatened Species. [1] Test of Echinus esculentus" }, { "title": "Echinocardium cordatum - Wikipedia", "href": "https://en.wikipedia.org/wiki/Echinocardium_cordatum", "body": "Echinocardium cordatum, also known as the common heart urchin [2] or the sea potato, [3] is a sea urchin in the family Loveniidae. It is found in sub-tidal regions in the NE Atlantic. Other un-named species have been identified as this species from temperate seas around the world. [4] It lives buried in the sandy sea floor. Description [ edit]" } ]
THOUGHTS: I will use the 'browse_website' command to access the wikipedia page on sea urchins and summarize the information in 5 sentences
REASONING: I need to summarize the wikipedia page on sea urchins, and browsing the website will allow me to access the information I need
PLAN:
- Use the 'browse_website' command to access the wikipedia page on sea urchins
- Summarize the information in 5 sentences
CRITICISM: I should ensure that the information I summarize is accurate and relevant to sea urchins
NEXT ACTION: COMMAND = browse_website ARGUMENTS = {'url': 'https://en.wikipedia.org/wiki/Sea_urchin', 'question': 'summary'}
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for ...
Input:y
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-=
[WDM] - Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 6.75M/6.75M [00:01<00:00, 6.69MB/s]
Text length: 57125 characters
Adding chunk 1 / 6 to memory
Summarizing chunk 1 / 6 of length 10867 characters, or 2997 tokens
Added chunk 1 summary to memory, of length 764 characters
Adding chunk 2 / 6 to memory
Summarizing chunk 2 / 6 of length 12774 characters, or 2999 tokens
Added chunk 2 summary to memory, of length 1020 characters
Adding chunk 3 / 6 to memory
Summarizing chunk 3 / 6 of length 12042 characters, or 2993 tokens
Added chunk 3 summary to memory, of length 630 characters
Adding chunk 4 / 6 to memory
Summarizing chunk 4 / 6 of length 9422 characters, or 2997 tokens
Added chunk 4 summary to memory, of length 491 characters
Adding chunk 5 / 6 to memory
Summarizing chunk 5 / 6 of length 8813 characters, or 2978 tokens
Added chunk 5 summary to memory, of length 761 characters
Adding chunk 6 / 6 to memory
Summarizing chunk 6 / 6 of length 3205 characters, or 819 tokens
Added chunk 6 summary to memory, of length 285 characters
Summarized 6 chunks.
SYSTEM: Command browse_website returned: ("Answer gathered from website: Sea urchins are spiny echinoderms found in oceans around the world. They have a hard shell and move using tube feet and spines. Sea urchins primarily eat algae but can also eat slow-moving animals. They are important members of marine ecosystems and can be eaten by humans in various parts of the world. Sea urchins have a unique immune system and are being studied for their ability to survive in acidic environments. They have been used for various purposes throughout history, including as amulets and in traditional medicine. \n \n Links: ['Jump to content (https://en.wikipedia.org/wiki/Sea_urchin#bodyContent)', 'Main page (https://en.wikipedia.org/wiki/Main_Page)', 'Contents (https://en.wikipedia.org/wiki/Wikipedia:Contents)', 'Current events (https://en.wikipedia.org/wiki/Portal:Current_events)', 'Random article (https://en.wikipedia.org/wiki/Special:Random)']", <selenium.webdriver.chrome.webdriver.WebDriver (session="09f1fab205605cac9e03112862dcdc41")>)
THOUGHTS: I will use the 'write_to_file' command to save the summary of the wikipedia page on sea urchins to a file
REASONING: I need to save the summary of the wikipedia page on sea urchins for future reference, and the 'write_to_file' command will allow me to do so
PLAN:
- Use the 'write_to_file' command to save the summary of the wikipedia page on sea urchins to a file
CRITICISM: I should ensure that the file I save the summary to is easily accessible and properly named
NEXT ACTION: COMMAND = write_to_file ARGUMENTS = {'file': 'sea_urchin_summary.txt', 'text': 'Sea urchins are spiny echinoderms found in oceans around the world. They have a hard shell and move using tube feet and spines. Sea urchins primarily eat algae but can also eat slow-moving animals. They are important members of marine ecosystems and can be eaten by humans in various parts of the world. Sea urchins have a unique immune system and are being studied for their ability to survive in acidic environments. They have been used for various purposes throughout history, including as amulets and in traditional medicine.'}
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for ...
Input:y
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-=
SYSTEM: Command write_to_file returned: File written to successfully.
THOUGHTS: I will use the 'read_file' command to confirm that the summary of the wikipedia page on sea urchins was saved correctly
REASONING: I need to confirm that the summary of the wikipedia page on sea urchins was saved correctly, and the 'read_file' command will allow me to do so
PLAN:
- Use the 'read_file' command to confirm that the summary of the wikipedia page on sea urchins was saved correctly
CRITICISM: I should ensure that the file I saved the summary to is still accessible and properly named
NEXT ACTION: COMMAND = read_file ARGUMENTS = {'file': 'sea_urchin_summary.txt'}
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for ...
Input:y
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-=
SYSTEM: Command read_file returned: Sea urchins are spiny echinoderms found in oceans around the world. They have a hard shell and move using tube feet and spines. Sea urchins primarily eat algae but can also eat slow-moving animals. They are important members of marine ecosystems and can be eaten by humans in various parts of the world. Sea urchins have a unique immune system and are being studied for their ability to survive in acidic environments. They have been used for various purposes throughout history, including as amulets and in traditional medicine.
THOUGHTS: I have completed the task of summarizing the wikipedia page on sea urchins and saving the summary to a file. I will use the 'task_complete' command to end this session
REASONING: I have completed the task I was given, and there are no further actions required
PLAN:
- Use the 'task_complete' command to end this session
CRITICISM: I should ensure that I have completed all necessary actions before ending the session
NEXT ACTION: COMMAND = task_complete ARGUMENTS = {'reason': 'Task of summarizing the wikipedia page on sea urchins and saving the summary to a file is complete'}
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for ...
Input:y
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-=
Shutting down...
sindlinger pushed a commit to Orgsindlinger/Auto-GPT-WebUI that referenced this pull request on Sep 25, 2024
Co-authored-by: batyu <batyu@localhost>
Background
Sometimes, when browsing the web using the command "browse_website", AutoGPT would crash with a message:
SYSTEM: Command browse_website returned: Error: This model's maximum context length is 4097 tokens. However, your messages resulted in 6470 tokens. Please reduce the length of the messages.
There are a few reasons for this crash:
- split_text() (in autogpt/processing/text.py): The original algorithm split a webpage into paragraphs and kept adding paragraphs to a "chunk" list until it hit some max_length. Then it yielded that chunk and started the next chunk by simply adding the next paragraph. The issue is that some paragraphs are huge, without any newline in them, and this "let's add a paragraph as the first element of a chunk" step is what caused the error in all of the issues.
- split_text() ignored BROWSE_CHUNK_MAX_LENGTH.
- BROWSE_CHUNK_MAX_LENGTH was understood as a number of characters, whereas it should really be a number of tokens.
Changes
- Instead of splitting into paragraphs, we now split into sentences, using SpaCy for the sentence splitting.
- BROWSE_CHUNK_MAX_LENGTH is now understood as a maximum number of tokens, not characters, and it is no longer ignored. The summaries are now actually much larger than before, leading to better-quality AutoGPT performance. (split_text had a default max_length=8192, which was a number of characters. The current default of 3000 tokens allows slightly more content from the webpage.)
- There are now webpages that AutoGPT can open, whereas earlier it simply could not parse them.
Fixes why it shows " This model's maximum context length is 8191 tokens ? " #2366
Fixes Maximum context length exceeded after browse_website #796
Fixes This model's maximum context length is 8191 tokens, however you requested 89686 tokens (89686 in your prompt) #1639
Fixes Prompt overflows aren't handled gracefully #1841
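For readers who want the idea without opening the diff, here is a minimal sketch of sentence-based chunking with a token budget. It is not the code from this PR: `split_text_by_tokens`, `count_tokens` and `split_sentences` are hypothetical names, the token counter is a whitespace word count standing in for a real model tokenizer, and the sentence splitter is a simple regex standing in for SpaCy.

```python
import re
from typing import Callable, Iterator, List


def count_tokens(text: str) -> int:
    """Stand-in token counter (assumption): one token per whitespace-separated word."""
    return len(text.split())


def split_sentences(text: str) -> List[str]:
    """Stand-in sentence splitter (assumption): break after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_text_by_tokens(
    text: str,
    max_tokens: int,
    token_counter: Callable[[str], int] = count_tokens,
) -> Iterator[str]:
    """Yield chunks made of whole sentences, each at most max_tokens long."""
    current: List[str] = []
    for sentence in split_sentences(text):
        candidate = " ".join(current + [sentence])
        if token_counter(candidate) <= max_tokens:
            # The sentence still fits into the chunk being built.
            current.append(sentence)
            continue
        if current:
            # Chunk is full: emit it before starting a new one.
            yield " ".join(current)
        if token_counter(sentence) > max_tokens:
            # Unlike the old paragraph logic, never start a chunk with an
            # element that already exceeds the budget on its own.
            raise ValueError("a single sentence exceeds max_tokens")
        current = [sentence]
    if current:
        yield " ".join(current)
```

With `max_tokens=5`, the text "One two three. Four five. Six seven eight nine. Ten." comes out as two chunks of five words each. Raising on an oversized sentence is only one possible policy; a real implementation might truncate or split further instead.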
Documentation
This PR is the documentation, and the code is fairly readable.
Test Plan
Tested with this command:
Without this fix, it fails with the error message shown above; with the fix, it works correctly.
I did not test with Chinese, but if you install the Chinese model via pip and set it in the .env file, it should work and fix #2366:
python -m spacy download zh_core_web_sm
Add this to .env:
BROWSE_SPACY_LANGUAGE_MODEL=zh_core_web_sm
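As a rough illustration of how the two settings could be consumed, the sketch below reads them from an environment-style mapping. The function name `load_browse_settings` is hypothetical and this is not AutoGPT's actual config code. The variable names BROWSE_CHUNK_MAX_LENGTH and BROWSE_SPACY_LANGUAGE_MODEL come from the commit message, 3000 is the token default mentioned in this PR, and `en_core_web_sm` as the fallback model is an assumption.

```python
import os
from typing import Mapping, Tuple


def load_browse_settings(env: Mapping[str, str] = os.environ) -> Tuple[int, str]:
    """Return (chunk_max_tokens, spacy_model_name) from environment-style settings."""
    # 3000 is the token default named in this PR; "en_core_web_sm" is an
    # assumed fallback for the SpaCy model name.
    max_tokens = int(env.get("BROWSE_CHUNK_MAX_LENGTH", "3000"))
    model = env.get("BROWSE_SPACY_LANGUAGE_MODEL", "en_core_web_sm").strip()
    if max_tokens <= 0:
        raise ValueError("BROWSE_CHUNK_MAX_LENGTH must be a positive token count")
    return max_tokens, model
```

With SpaCy installed and the model downloaded, the returned model name would typically be passed to `spacy.load(...)`, and sentences read from the resulting document's `sents` attribute.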