Fix browse_website model context overflow issue #2542

Merged

@bszollosinagy (Contributor) commented Apr 19, 2023

Background

Sometimes, when browsing the web with the "browse_website" command, AutoGPT would crash with this message: SYSTEM: Command browse_website returned: Error: This model's maximum context length is 4097 tokens. However, your messages resulted in 6470 tokens. Please reduce the length of the messages.

There were three reasons for this crash:

  • There was a bug in split_text() (in autogpt/processing/text.py). The original algorithm split a webpage into paragraphs and kept adding paragraphs to a "chunk" list until it reached max_length. It then yielded that chunk and started the next chunk with the next paragraph, without checking that paragraph's length. Some paragraphs are huge and contain no newline at all, so placing one of them at the start of a new chunk produced a chunk far over the limit. This is what caused the error in all of the reported issues.
  • Second, split_text() ignored BROWSE_CHUNK_MAX_LENGTH.
  • Third, BROWSE_CHUNK_MAX_LENGTH was interpreted as a number of characters, whereas it should be a number of tokens, since the model's context limit is measured in tokens.
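The first bug can be reproduced with a simplified reconstruction of the old paragraph-based splitter. This is a hypothetical sketch written from the description above, not the original AutoGPT code: the name `split_text_paragraphs` and the character-based length accounting are assumptions.

```python
from typing import Iterator, List


def split_text_paragraphs(text: str, max_length: int) -> Iterator[str]:
    """Hypothetical reconstruction of the old paragraph-based splitter."""
    paragraphs = text.split("\n")
    current_chunk: List[str] = []
    current_length = 0
    for paragraph in paragraphs:
        # +1 accounts for the newline used to re-join paragraphs
        if current_length + len(paragraph) + 1 <= max_length:
            current_chunk.append(paragraph)
            current_length += len(paragraph) + 1
        else:
            yield "\n".join(current_chunk)
            # Bug: the next chunk starts with this paragraph without checking
            # whether the paragraph alone already exceeds max_length.
            current_chunk = [paragraph]
            current_length = len(paragraph) + 1
    if current_chunk:
        yield "\n".join(current_chunk)


if __name__ == "__main__":
    page = "short\n" + "x" * 50 + "\nend"
    for chunk in split_text_paragraphs(page, max_length=20):
        print(len(chunk))  # prints 5, 50, 3: the middle chunk exceeds the limit of 20
```

A single newline-free paragraph passes straight through as its own chunk, which is how a page could produce a prompt larger than the model's context window.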

Changes

Documentation

This PR description serves as the documentation, and the code itself is fairly readable.

Test Plan

Tested with this command:

command_name = 'browse_website'
arguments = {'url': 'http://www.ontario.ca/page/cherry-cultivars-sweet-and-tart', 'question': 'list of commercially available cherry cultivars'}

Without this fix, the command fails with the error message above; with the fix, it works correctly.

I did not test with Chinese, but installing the Chinese model with pip and setting it in the .env file should make it work and fix #2366:

python -m spacy download zh_core_web_sm

Then add this to .env:
BROWSE_SPACY_LANGUAGE_MODEL=zh_core_web_sm
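A sketch of how the two browse settings might be read once the .env values are in the process environment. The variable names BROWSE_CHUNK_MAX_LENGTH and BROWSE_SPACY_LANGUAGE_MODEL come from this PR's commit notes; the `BrowseSettings` type, the `load_browse_settings` helper and the default values (3000 tokens, `en_core_web_sm`) are assumptions for illustration, not AutoGPT's actual config code.

```python
import os
from typing import NamedTuple


class BrowseSettings(NamedTuple):
    chunk_max_length: int  # a token budget, not a character count
    spacy_language_model: str  # spaCy pipeline used for sentence splitting


def load_browse_settings() -> BrowseSettings:
    """Hypothetical helper; the defaults below are assumptions."""
    return BrowseSettings(
        chunk_max_length=int(os.getenv("BROWSE_CHUNK_MAX_LENGTH", "3000")),
        spacy_language_model=os.getenv("BROWSE_SPACY_LANGUAGE_MODEL", "en_core_web_sm"),
    )
```

With BROWSE_SPACY_LANGUAGE_MODEL=zh_core_web_sm set, the returned name would be passed to `spacy.load()`, which only succeeds if that pipeline was installed first with `python -m spacy download`.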

PR Quality Checklist

  • My pull request is atomic and focuses on a single change.
  • I have thoroughly tested my changes with multiple different prompts.
  • I have considered potential risks and mitigations for my changes.
  • I have documented my changes clearly and comprehensively.
  • I have not snuck in any "extra" small tweaks or changes.

* change BROWSE_CHUNK_MAX_LENGTH to count tokens instead of characters; add a comment in .env.template
* move BROWSE_CHUNK_MAX_LENGTH to the BROWSER section in .env.template
* add BROWSE_SPACY_LANGUAGE_MODEL to .env.template: the spaCy language model used to split text into sentences
* add spaCy to requirements
* improve summarize_text() and split_text() to use token counts instead of character counts, and to split on sentences instead of paragraphs
* add a token count display for each chunk that is summarized during a "browse_website" command
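The approach described in the last two bullets (count tokens, split on sentences) can be sketched as follows. This is not the merged code: a regular expression stands in for spaCy's sentence segmentation, a whitespace word count stands in for a real tokenizer such as tiktoken, the sketch ignores the tokens taken by the summarization prompt itself, and raising ValueError for an oversized sentence is a choice of this sketch. All function names are hypothetical.

```python
import re
from typing import Callable, Iterator, List


def naive_count_tokens(text: str) -> int:
    """Hypothetical stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def split_sentences(text: str) -> List[str]:
    """Crude stand-in for spaCy sentence segmentation: split after . ! or ?"""
    flattened = " ".join(text.split("\n"))
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", flattened) if s.strip()]


def split_text_by_tokens(
    text: str,
    max_tokens: int,
    count_tokens: Callable[[str], int] = naive_count_tokens,
) -> Iterator[str]:
    """Yield chunks made of whole sentences, each within max_tokens."""
    current: List[str] = []
    for sentence in split_sentences(text):
        candidate = " ".join(current + [sentence])
        if count_tokens(candidate) <= max_tokens:
            current.append(sentence)
            continue
        # The sentence does not fit: emit what has been collected so far.
        if current:
            yield " ".join(current)
        # The case that broke the old splitter: one unit is already too big.
        # This sketch refuses it instead of emitting an oversized chunk.
        if count_tokens(sentence) > max_tokens:
            raise ValueError(
                f"sentence needs {count_tokens(sentence)} tokens, limit is {max_tokens}"
            )
        current = [sentence]
    if current:
        yield " ".join(current)


if __name__ == "__main__":
    page = "One two three. Four five six. Seven eight nine ten.\nEleven twelve."
    chunks = list(split_text_by_tokens(page, max_tokens=6))
    for i, chunk in enumerate(chunks, start=1):
        # Same format as the per-chunk log line shown in the review below
        print(
            f"Summarizing chunk {i} / {len(chunks)} of length {len(chunk)} characters, "
            f"or {naive_count_tokens(chunk)} tokens"
        )
```

Because the limit is checked on the token count of the candidate chunk, not on its character length, every yielded chunk stays within the budget regardless of how dense the text is.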
@bszollosinagy mentioned this pull request Apr 19, 2023
5 tasks
@Pwuts assigned Pwuts and BillSchumacher and unassigned BillSchumacher and Pwuts Apr 19, 2023
@github-actions bot added the "conflicts" label (Automatically applied to PRs with merge conflicts) Apr 19, 2023
@github-actions (Contributor)

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

@github-actions bot removed the "conflicts" label (Automatically applied to PRs with merge conflicts) Apr 19, 2023
@github-actions (Contributor)

Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.

@Pwuts self-requested a review April 19, 2023 20:50
@Pwuts (Member) left a comment

Seems to work! :)

Log (warning: contains info on sea urchins)
Warning: The file 'auto-gpt.json' does not exist. Local memory would not be saved to a file.
NEWS:  ::UPDATED:: Welcome to Auto-GPT! We'll keep you informed of the latest news and features by printing messages here. If you don't wish to see this message, you can run Auto-GPT with the --skip-news flag
Welcome to Auto-GPT!  run with '--help' for more information.
Create an AI-Assistant:  Enter the name of your AI and its role below. Entering nothing will load defaults.
Name your AI:  For example, 'Entrepreneur-GPT'
AI Name: Sea Urchin GPT
Sea Urchin GPT here!  I am at your service.
Describe your AI's role:  For example, 'an AI designed to autonomously develop and run businesses with the sole goal of increasing your net worth.'
Sea Urchin GPT is: a fan of sea urchins, wants to know all about them!
Enter up to 5 goals for your AI:  For example: Increase net worth, Grow Twitter Account, Develop and manage multiple businesses autonomously'
Enter nothing to load defaults, enter nothing when finished.
Goal 1: Find the wikipedia page on sea urchins
Goal 2: Summarize the wikipedia page on sea urchins in 5 sentences
Goal 3: terminate
Goal 4: 
Using memory of type:  LocalCache
Using Browser:  chrome
 THOUGHTS:  I will use the 'google' command to search for the wikipedia page on sea urchins
REASONING:  I need to find information on sea urchins, and wikipedia is a good source of information
PLAN: 
-  Use the 'google' command to search for the wikipedia page on sea urchins
CRITICISM:  I should ensure that the wikipedia page I find is reliable and accurate
NEXT ACTION:  COMMAND = google ARGUMENTS = {'input': 'sea urchin wikipedia'}
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for ...
Input:y
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-= 
SYSTEM:  Command google returned: [ { "title": "Sea urchin - Wikipedia", "href": "https://en.wikipedia.org/wiki/Sea_urchin", "body": "Sea urchins (/ ˈ ɜːr tʃ ɪ n z /) are spiny, globular echinoderms in the class Echinoidea.About 950 species of sea urchin live on the seabed of every ocean and inhabit every depth zone from the intertidal seashore down to 5,000 meters (16,000 ft; 2,700 fathoms). The spherical, hard shells of sea urchins are round and covered in spines.Most urchin spines range in length from 3 to 10 cm (1 ..." }, { "title": "Sea urchin - Simple English Wikipedia, the free encyclopedia", "href": "https://simple.wikipedia.org/wiki/Sea_urchin", "body": "Sea urchins are spiky members of the class Echinoidea of the phylum Echinodermata. Like the rest of the echinoderms, they are entirely marine. They are globe-shaped, and protected by calcareous plates and spines. Urchin is an old word for hedgehog, and in many foreign languages these animals are called sea hedgehogs." }, { "title": "Sea urchin | Description, Anatomy, & Facts | Britannica", "href": "https://www.britannica.com/animal/sea-urchin", "body": "sea urchin, any of about 950 living species of spiny marine invertebrate animals (class Echinoidea, phylum Echinodermata) with a globular body and a radial arrangement of organs, shown by five bands of pores running from mouth to anus over the test (internal skeleton)." }, { "title": "Diadema antillarum - Wikipedia", "href": "https://en.wikipedia.org/wiki/Diadema_antillarum", "body": "Diadema antillarum, also known as the lime urchin, black sea urchin, or the long-spined sea urchin, [2] is a species of sea urchin in the family Diadematidae . This sea urchin is characterized by its exceptionally long black spines. It is the most abundant and important herbivore on the coral reefs of the western Atlantic and Caribbean basin." 
}, { "title": "Red sea urchin - Wikipedia", "href": "https://en.wikipedia.org/wiki/Red_sea_urchin", "body": "The red sea urchin ( Mesocentrotus franciscanus) [1] is a sea urchin found in the northeastern Pacific Ocean from Alaska to Baja California." }, { "title": "40 Sea Urchin Facts About These Spiky Creatures - Facts.net", "href": "https://facts.net/sea-urchin-facts/", "body": "Sea urchins make up some of the most visible, and also the most fascinating animals in the sea. Their vibrant color is certainly eye-catching. Sea urchins come in many colors from purple to red, to green, etc. An average sea urchin is pretty small, around 3 to 10 centimeters in diameter." }, { "title": "Echinus esculentus - Wikipedia", "href": "https://en.wikipedia.org/wiki/Echinus_esculentus", "body": "Echinus esculentus, the European edible sea urchin or common sea urchin, is a species of marine invertebrate in the Echinidae family. It is found in coastal areas of western Europe down to a depth of 1,200 m (3,900 ft). [3] It is considered \"Near threatened\" in the IUCN Red List of Threatened Species. [1] Test of Echinus esculentus" }, { "title": "Echinocardium cordatum - Wikipedia", "href": "https://en.wikipedia.org/wiki/Echinocardium_cordatum", "body": "Echinocardium cordatum, also known as the common heart urchin [2] or the sea potato, [3] is a sea urchin in the family Loveniidae. It is found in sub-tidal regions in the NE Atlantic. Other un-named species have been identified as this species from temperate seas around the world. [4] It lives buried in the sandy sea floor. Description [ edit]" } ]
 THOUGHTS:  I will use the 'browse_website' command to access the wikipedia page on sea urchins and summarize the information in 5 sentences
REASONING:  I need to summarize the wikipedia page on sea urchins, and browsing the website will allow me to access the information I need
PLAN: 
-  Use the 'browse_website' command to access the wikipedia page on sea urchins
-  Summarize the information in 5 sentences
CRITICISM:  I should ensure that the information I summarize is accurate and relevant to sea urchins
NEXT ACTION:  COMMAND = browse_website ARGUMENTS = {'url': 'https://en.wikipedia.org/wiki/Sea_urchin', 'question': 'summary'}
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for ...
Input:y
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-= 
[WDM] - Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 6.75M/6.75M [00:01<00:00, 6.69MB/s]
Text length: 57125 characters
Adding chunk 1 / 6 to memory
Summarizing chunk 1 / 6 of length 10867 characters, or 2997 tokens
Added chunk 1 summary to memory, of length 764 characters
Adding chunk 2 / 6 to memory
Summarizing chunk 2 / 6 of length 12774 characters, or 2999 tokens
Added chunk 2 summary to memory, of length 1020 characters
Adding chunk 3 / 6 to memory
Summarizing chunk 3 / 6 of length 12042 characters, or 2993 tokens
Added chunk 3 summary to memory, of length 630 characters
Adding chunk 4 / 6 to memory
Summarizing chunk 4 / 6 of length 9422 characters, or 2997 tokens
Added chunk 4 summary to memory, of length 491 characters
Adding chunk 5 / 6 to memory
Summarizing chunk 5 / 6 of length 8813 characters, or 2978 tokens
Added chunk 5 summary to memory, of length 761 characters
Adding chunk 6 / 6 to memory
Summarizing chunk 6 / 6 of length 3205 characters, or 819 tokens
Added chunk 6 summary to memory, of length 285 characters
Summarized 6 chunks.
SYSTEM:  Command browse_website returned: ("Answer gathered from website: Sea urchins are spiny echinoderms found in oceans around the world. They have a hard shell and move using tube feet and spines. Sea urchins primarily eat algae but can also eat slow-moving animals. They are important members of marine ecosystems and can be eaten by humans in various parts of the world. Sea urchins have a unique immune system and are being studied for their ability to survive in acidic environments. They have been used for various purposes throughout history, including as amulets and in traditional medicine. \n \n Links: ['Jump to content (https://en.wikipedia.org/wiki/Sea_urchin#bodyContent)', 'Main page (https://en.wikipedia.org/wiki/Main_Page)', 'Contents (https://en.wikipedia.org/wiki/Wikipedia:Contents)', 'Current events (https://en.wikipedia.org/wiki/Portal:Current_events)', 'Random article (https://en.wikipedia.org/wiki/Special:Random)']", <selenium.webdriver.chrome.webdriver.WebDriver (session="09f1fab205605cac9e03112862dcdc41")>)
 THOUGHTS:  I will use the 'write_to_file' command to save the summary of the wikipedia page on sea urchins to a file
REASONING:  I need to save the summary of the wikipedia page on sea urchins for future reference, and the 'write_to_file' command will allow me to do so
PLAN: 
-  Use the 'write_to_file' command to save the summary of the wikipedia page on sea urchins to a file
CRITICISM:  I should ensure that the file I save the summary to is easily accessible and properly named
NEXT ACTION:  COMMAND = write_to_file ARGUMENTS = {'file': 'sea_urchin_summary.txt', 'text': 'Sea urchins are spiny echinoderms found in oceans around the world. They have a hard shell and move using tube feet and spines. Sea urchins primarily eat algae but can also eat slow-moving animals. They are important members of marine ecosystems and can be eaten by humans in various parts of the world. Sea urchins have a unique immune system and are being studied for their ability to survive in acidic environments. They have been used for various purposes throughout history, including as amulets and in traditional medicine.'}
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for ...
Input:y
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-= 
SYSTEM:  Command write_to_file returned: File written to successfully.
 THOUGHTS:  I will use the 'read_file' command to confirm that the summary of the wikipedia page on sea urchins was saved correctly
REASONING:  I need to confirm that the summary of the wikipedia page on sea urchins was saved correctly, and the 'read_file' command will allow me to do so
PLAN: 
-  Use the 'read_file' command to confirm that the summary of the wikipedia page on sea urchins was saved correctly
CRITICISM:  I should ensure that the file I saved the summary to is still accessible and properly named
NEXT ACTION:  COMMAND = read_file ARGUMENTS = {'file': 'sea_urchin_summary.txt'}
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for ...
Input:y
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-= 
SYSTEM:  Command read_file returned: Sea urchins are spiny echinoderms found in oceans around the world. They have a hard shell and move using tube feet and spines. Sea urchins primarily eat algae but can also eat slow-moving animals. They are important members of marine ecosystems and can be eaten by humans in various parts of the world. Sea urchins have a unique immune system and are being studied for their ability to survive in acidic environments. They have been used for various purposes throughout history, including as amulets and in traditional medicine.
 THOUGHTS:  I have completed the task of summarizing the wikipedia page on sea urchins and saving the summary to a file. I will use the 'task_complete' command to end this session
REASONING:  I have completed the task I was given, and there are no further actions required
PLAN: 
-  Use the 'task_complete' command to end this session
CRITICISM:  I should ensure that I have completed all necessary actions before ending the session
NEXT ACTION:  COMMAND = task_complete ARGUMENTS = {'reason': 'Task of summarizing the wikipedia page on sea urchins and saving the summary to a file is complete'}
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for ...
Input:y
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-= 
Shutting down...
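The chunk statistics in the log above also illustrate the third issue from the PR description: the number of characters per token varies from chunk to chunk, so no fixed character limit can guarantee a token limit. A back-of-the-envelope check using only the logged numbers:

```python
# (characters, tokens) per summarized chunk, copied from the log above
logged_chunks = [
    (10867, 2997),
    (12774, 2999),
    (12042, 2993),
    (9422, 2997),
    (8813, 2978),
    (3205, 819),
]

ratios = [chars / tokens for chars, tokens in logged_chunks]
print(f"characters per token: min {min(ratios):.2f}, max {max(ratios):.2f}")

# Estimated token cost of the longest chunk (12774 characters)
# if its text were as dense as the densest chunk observed
worst_case_tokens = 12774 / min(ratios)
print(f"estimated tokens: {worst_case_tokens:.0f}")
```

The ratio ranges from about 2.96 to 4.26 characters per token. At the densest ratio, the 12,774-character chunk, which here came to 2,999 tokens, would need roughly 4,300 tokens, already above the 4,097-token limit in the original error message.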

@Pwuts changed the title from "Prompt context overflow issue fixed" to "Fix model context overflow issue" Apr 19, 2023
@Pwuts merged commit fa91bc1 into Significant-Gravitas:master Apr 19, 2023
@Pwuts mentioned this pull request Apr 19, 2023
1 task
@Pwuts linked an issue Apr 19, 2023 that may be closed by this pull request
@Pwuts mentioned this pull request Apr 21, 2023
5 tasks
@Pwuts changed the title from "Fix model context overflow issue" to "Fix browse_website model context overflow issue" Apr 22, 2023