Fix `browse_website` model context overflow issue #2542

bszollosinagy · 2023-04-19T14:35:58Z

Background

Sometimes, when browsing the web with the browse_website command, AutoGPT would crash with the message: SYSTEM: Command browse_website returned: Error: This model's maximum context length is 4097 tokens. However, your messages resulted in 6470 tokens. Please reduce the length of the messages.

There are a few reasons for this crash:

There was a bug in split_text() (in autogpt/processing/text.py). The original algorithm split a webpage into paragraphs and kept appending paragraphs to a "chunk" list until it reached max_length. It then yielded that chunk and started the next one by simply adding the next paragraph. The problem is that some paragraphs are huge, with no newline in them, and seeding a new chunk with such a paragraph without checking its size is what caused the error in all of the linked issues.
The second issue was that split_text() ignored BROWSE_CHUNK_MAX_LENGTH.
The third issue was that BROWSE_CHUNK_MAX_LENGTH was interpreted as a number of characters, whereas the model's limit is expressed in tokens, so the setting should be a number of tokens.
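The first failure mode can be reproduced with a minimal sketch. This is a hypothetical reconstruction written from the description above, not the original split_text() code; the function name split_text_by_paragraph and the sample input are assumptions made for illustration.

```python
from typing import Iterator


def split_text_by_paragraph(text: str, max_length: int = 8192) -> Iterator[str]:
    """Hypothetical reconstruction of the old paragraph-based chunking."""
    current_chunk = []
    current_length = 0
    for paragraph in text.split("\n"):
        if current_length + len(paragraph) + 1 <= max_length:
            current_chunk.append(paragraph)
            current_length += len(paragraph) + 1
        else:
            yield "\n".join(current_chunk)
            # Bug: the next paragraph seeds the new chunk with no size check,
            # so a single huge paragraph becomes one oversized chunk.
            current_chunk = [paragraph]
            current_length = len(paragraph) + 1
    if current_chunk:
        yield "\n".join(current_chunk)


# A 50-character paragraph with no newline, against a limit of 20 characters.
page = "short intro\n" + "x" * 50 + "\nshort outro"
chunks = list(split_text_by_paragraph(page, max_length=20))
print([len(chunk) for chunk in chunks])
```

The middle chunk is 50 characters long even though max_length is 20, which mirrors how an oversized paragraph could reach the model unchanged.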

Changes

Instead of splitting into paragraphs, we now split into sentences, using spaCy.
BROWSE_CHUNK_MAX_LENGTH is now interpreted as a maximum number of tokens, not characters, and it is no longer ignored. The summaries are now much larger than before, which leads to better-quality AutoGPT performance. (split_text() had a default max_length=8192, counted in characters. The current default of 3000 tokens allows slightly more content from the webpage.)
There are now webpages that AutoGPT can open which it previously could not parse at all.
Fixes why it shows " This model's maximum context length is 8191 tokens ? " #2366
Fixes Maximum context length exceeded after browse_website #796
Fixes This model's maximum context length is 8191 tokens, however you requested 89686 tokens (89686 in your prompt) #1639
Fixes Prompt overflows aren't handled gracefully #1841
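The sentence-based, token-budgeted chunking described in this section can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: the PR uses spaCy for sentence splitting and counts model tokens, whereas the sketch swaps in a regex splitter and a whitespace word count (split_sentences_stub and count_tokens_stub, both hypothetical) so that it runs with the standard library only.

```python
import re
from typing import Callable, Iterator, List


def count_tokens_stub(text: str) -> int:
    # Stand-in token counter (whitespace-separated words).
    # The PR counts model tokens instead.
    return len(text.split())


def split_sentences_stub(text: str) -> List[str]:
    # Stand-in sentence splitter; the PR uses spaCy for this step.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_text_by_tokens(
    text: str,
    max_tokens: int,
    count_tokens: Callable[[str], int] = count_tokens_stub,
) -> Iterator[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens."""
    current = []
    current_tokens = 0
    for sentence in split_sentences_stub(text):
        n = count_tokens(sentence)
        # Close the current chunk before it would exceed the budget.
        if current and current_tokens + n > max_tokens:
            yield " ".join(current)
            current, current_tokens = [], 0
        current.append(sentence)
        current_tokens += n
    if current:
        yield " ".join(current)


text = "One two three. Four five. Six seven eight nine. Ten."
sentence_chunks = list(split_text_by_tokens(text, max_tokens=5))
print(sentence_chunks)
```

Note that this sketch does not handle a single sentence that is itself longer than max_tokens; such a sentence would still form an oversized chunk and would need further splitting.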

Documentation

This PR is the documentation, and the code is fairly readable.

Test Plan

Tested with this command:

command_name = 'browse_website'
arguments = {'url': 'http://www.ontario.ca/page/cherry-cultivars-sweet-and-tart', 'question': 'list of commercially available cherry cultivars'}

Without this fix, the command fails with the error message above; with the fix, it works correctly.

Did not test with Chinese, but if you install the Chinese spaCy model and set it in the .env file, it should work and fix #2366:

python -m spacy download zh_core_web_sm

Add this to .env:
BROWSE_SPACY_LANGUAGE_MODEL=zh_core_web_sm
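As a rough illustration of how such a setting might be consumed, the sketch below reads the model name from the environment. The variable name BROWSE_SPACY_LANGUAGE_MODEL is taken from the commit message in this PR; the helper name get_spacy_model_name and the fallback value en_core_web_sm are assumptions made for this example, not confirmed defaults.

```python
import os
from typing import Mapping


def get_spacy_model_name(
    env: Mapping[str, str] = os.environ,
    default: str = "en_core_web_sm",  # assumed fallback, not confirmed by the PR
) -> str:
    """Return the spaCy model name configured for sentence splitting."""
    return env.get("BROWSE_SPACY_LANGUAGE_MODEL", default)


# Passing a plain dict stands in for the variables loaded from .env.
print(get_spacy_model_name({"BROWSE_SPACY_LANGUAGE_MODEL": "zh_core_web_sm"}))
```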

PR Quality Checklist

My pull request is atomic and focuses on a single change.
I have thoroughly tested my changes with multiple different prompts.
I have considered potential risks and mitigations for my changes.
I have documented my changes clearly and comprehensively.
I have not snuck in any "extra" small tweaks or changes.

* change BROWSE_CHUNK_MAX_LENGTH to use token counts instead of characters. Added comment in .env.template
* move BROWSE_CHUNK_MAX_LENGTH to the BROWSER section in .env.template
* add BROWSE_SPACY_LANGUAGE_MODEL to .env.template. This is the language model in SpaCy that is used to split text into sentences.
* add SpaCy to requirements
* improved summarize_text() and split_text() to use tokens instead of character counts, and to split based on sentences instead of paragraphs.
* Added token count display for each chunk that is summarized during a "browse_website" command

* applied black formatting

github-actions · 2023-04-19T17:29:55Z

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

github-actions · 2023-04-19T18:14:29Z

Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.

Pwuts

Seems to work! :)

Log (warning: contains info on sea urchins)

Warning: The file 'auto-gpt.json' does not exist. Local memory would not be saved to a file.
NEWS:  ::UPDATED:: Welcome to Auto-GPT! We'll keep you informed of the latest news and features by printing messages here. If you don't wish to see this message, you can run Auto-GPT with the --skip-news flag
Welcome to Auto-GPT!  run with '--help' for more information.
Create an AI-Assistant:  Enter the name of your AI and its role below. Entering nothing will load defaults.
Name your AI:  For example, 'Entrepreneur-GPT'
AI Name: Sea Urchin GPT
Sea Urchin GPT here!  I am at your service.
Describe your AI's role:  For example, 'an AI designed to autonomously develop and run businesses with the sole goal of increasing your net worth.'
Sea Urchin GPT is: a fan of sea urchins, wants to know all about them!
Enter up to 5 goals for your AI:  For example: Increase net worth, Grow Twitter Account, Develop and manage multiple businesses autonomously'
Enter nothing to load defaults, enter nothing when finished.
Goal 1: Find the wikipedia page on sea urchins
Goal 2: Summarize the wikipedia page on sea urchins in 5 sentences
Goal 3: terminate
Goal 4: 
Using memory of type:  LocalCache
Using Browser:  chrome
 THOUGHTS:  I will use the 'google' command to search for the wikipedia page on sea urchins
REASONING:  I need to find information on sea urchins, and wikipedia is a good source of information
PLAN: 
-  Use the 'google' command to search for the wikipedia page on sea urchins
CRITICISM:  I should ensure that the wikipedia page I find is reliable and accurate
NEXT ACTION:  COMMAND = google ARGUMENTS = {'input': 'sea urchin wikipedia'}
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for ...
Input:y
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-= 
SYSTEM:  Command google returned: [ { "title": "Sea urchin - Wikipedia", "href": "https://en.wikipedia.org/wiki/Sea_urchin", "body": "Sea urchins (/ ˈ ɜːr tʃ ɪ n z /) are spiny, globular echinoderms in the class Echinoidea.About 950 species of sea urchin live on the seabed of every ocean and inhabit every depth zone from the intertidal seashore down to 5,000 meters (16,000 ft; 2,700 fathoms). The spherical, hard shells of sea urchins are round and covered in spines.Most urchin spines range in length from 3 to 10 cm (1 ..." }, { "title": "Sea urchin - Simple English Wikipedia, the free encyclopedia", "href": "https://simple.wikipedia.org/wiki/Sea_urchin", "body": "Sea urchins are spiky members of the class Echinoidea of the phylum Echinodermata. Like the rest of the echinoderms, they are entirely marine. They are globe-shaped, and protected by calcareous plates and spines. Urchin is an old word for hedgehog, and in many foreign languages these animals are called sea hedgehogs." }, { "title": "Sea urchin | Description, Anatomy, & Facts | Britannica", "href": "https://www.britannica.com/animal/sea-urchin", "body": "sea urchin, any of about 950 living species of spiny marine invertebrate animals (class Echinoidea, phylum Echinodermata) with a globular body and a radial arrangement of organs, shown by five bands of pores running from mouth to anus over the test (internal skeleton)." }, { "title": "Diadema antillarum - Wikipedia", "href": "https://en.wikipedia.org/wiki/Diadema_antillarum", "body": "Diadema antillarum, also known as the lime urchin, black sea urchin, or the long-spined sea urchin, [2] is a species of sea urchin in the family Diadematidae . This sea urchin is characterized by its exceptionally long black spines. It is the most abundant and important herbivore on the coral reefs of the western Atlantic and Caribbean basin." 
}, { "title": "Red sea urchin - Wikipedia", "href": "https://en.wikipedia.org/wiki/Red_sea_urchin", "body": "The red sea urchin ( Mesocentrotus franciscanus) [1] is a sea urchin found in the northeastern Pacific Ocean from Alaska to Baja California." }, { "title": "40 Sea Urchin Facts About These Spiky Creatures - Facts.net", "href": "https://facts.net/sea-urchin-facts/", "body": "Sea urchins make up some of the most visible, and also the most fascinating animals in the sea. Their vibrant color is certainly eye-catching. Sea urchins come in many colors from purple to red, to green, etc. An average sea urchin is pretty small, around 3 to 10 centimeters in diameter." }, { "title": "Echinus esculentus - Wikipedia", "href": "https://en.wikipedia.org/wiki/Echinus_esculentus", "body": "Echinus esculentus, the European edible sea urchin or common sea urchin, is a species of marine invertebrate in the Echinidae family. It is found in coastal areas of western Europe down to a depth of 1,200 m (3,900 ft). [3] It is considered \"Near threatened\" in the IUCN Red List of Threatened Species. [1] Test of Echinus esculentus" }, { "title": "Echinocardium cordatum - Wikipedia", "href": "https://en.wikipedia.org/wiki/Echinocardium_cordatum", "body": "Echinocardium cordatum, also known as the common heart urchin [2] or the sea potato, [3] is a sea urchin in the family Loveniidae. It is found in sub-tidal regions in the NE Atlantic. Other un-named species have been identified as this species from temperate seas around the world. [4] It lives buried in the sandy sea floor. Description [ edit]" } ]
 THOUGHTS:  I will use the 'browse_website' command to access the wikipedia page on sea urchins and summarize the information in 5 sentences
REASONING:  I need to summarize the wikipedia page on sea urchins, and browsing the website will allow me to access the information I need
PLAN: 
-  Use the 'browse_website' command to access the wikipedia page on sea urchins
-  Summarize the information in 5 sentences
CRITICISM:  I should ensure that the information I summarize is accurate and relevant to sea urchins
NEXT ACTION:  COMMAND = browse_website ARGUMENTS = {'url': 'https://en.wikipedia.org/wiki/Sea_urchin', 'question': 'summary'}
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for ...
Input:y
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-= 
[WDM] - Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 6.75M/6.75M [00:01<00:00, 6.69MB/s]
Text length: 57125 characters
Adding chunk 1 / 6 to memory
Summarizing chunk 1 / 6 of length 10867 characters, or 2997 tokens
Added chunk 1 summary to memory, of length 764 characters
Adding chunk 2 / 6 to memory
Summarizing chunk 2 / 6 of length 12774 characters, or 2999 tokens
Added chunk 2 summary to memory, of length 1020 characters
Adding chunk 3 / 6 to memory
Summarizing chunk 3 / 6 of length 12042 characters, or 2993 tokens
Added chunk 3 summary to memory, of length 630 characters
Adding chunk 4 / 6 to memory
Summarizing chunk 4 / 6 of length 9422 characters, or 2997 tokens
Added chunk 4 summary to memory, of length 491 characters
Adding chunk 5 / 6 to memory
Summarizing chunk 5 / 6 of length 8813 characters, or 2978 tokens
Added chunk 5 summary to memory, of length 761 characters
Adding chunk 6 / 6 to memory
Summarizing chunk 6 / 6 of length 3205 characters, or 819 tokens
Added chunk 6 summary to memory, of length 285 characters
Summarized 6 chunks.
SYSTEM:  Command browse_website returned: ("Answer gathered from website: Sea urchins are spiny echinoderms found in oceans around the world. They have a hard shell and move using tube feet and spines. Sea urchins primarily eat algae but can also eat slow-moving animals. They are important members of marine ecosystems and can be eaten by humans in various parts of the world. Sea urchins have a unique immune system and are being studied for their ability to survive in acidic environments. They have been used for various purposes throughout history, including as amulets and in traditional medicine. \n \n Links: ['Jump to content (https://en.wikipedia.org/wiki/Sea_urchin#bodyContent)', 'Main page (https://en.wikipedia.org/wiki/Main_Page)', 'Contents (https://en.wikipedia.org/wiki/Wikipedia:Contents)', 'Current events (https://en.wikipedia.org/wiki/Portal:Current_events)', 'Random article (https://en.wikipedia.org/wiki/Special:Random)']", <selenium.webdriver.chrome.webdriver.WebDriver (session="09f1fab205605cac9e03112862dcdc41")>)
 THOUGHTS:  I will use the 'write_to_file' command to save the summary of the wikipedia page on sea urchins to a file
REASONING:  I need to save the summary of the wikipedia page on sea urchins for future reference, and the 'write_to_file' command will allow me to do so
PLAN: 
-  Use the 'write_to_file' command to save the summary of the wikipedia page on sea urchins to a file
CRITICISM:  I should ensure that the file I save the summary to is easily accessible and properly named
NEXT ACTION:  COMMAND = write_to_file ARGUMENTS = {'file': 'sea_urchin_summary.txt', 'text': 'Sea urchins are spiny echinoderms found in oceans around the world. They have a hard shell and move using tube feet and spines. Sea urchins primarily eat algae but can also eat slow-moving animals. They are important members of marine ecosystems and can be eaten by humans in various parts of the world. Sea urchins have a unique immune system and are being studied for their ability to survive in acidic environments. They have been used for various purposes throughout history, including as amulets and in traditional medicine.'}
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for ...
Input:y
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-= 
SYSTEM:  Command write_to_file returned: File written to successfully.
 THOUGHTS:  I will use the 'read_file' command to confirm that the summary of the wikipedia page on sea urchins was saved correctly
REASONING:  I need to confirm that the summary of the wikipedia page on sea urchins was saved correctly, and the 'read_file' command will allow me to do so
PLAN: 
-  Use the 'read_file' command to confirm that the summary of the wikipedia page on sea urchins was saved correctly
CRITICISM:  I should ensure that the file I saved the summary to is still accessible and properly named
NEXT ACTION:  COMMAND = read_file ARGUMENTS = {'file': 'sea_urchin_summary.txt'}
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for ...
Input:y
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-= 
SYSTEM:  Command read_file returned: Sea urchins are spiny echinoderms found in oceans around the world. They have a hard shell and move using tube feet and spines. Sea urchins primarily eat algae but can also eat slow-moving animals. They are important members of marine ecosystems and can be eaten by humans in various parts of the world. Sea urchins have a unique immune system and are being studied for their ability to survive in acidic environments. They have been used for various purposes throughout history, including as amulets and in traditional medicine.
 THOUGHTS:  I have completed the task of summarizing the wikipedia page on sea urchins and saving the summary to a file. I will use the 'task_complete' command to end this session
REASONING:  I have completed the task I was given, and there are no further actions required
PLAN: 
-  Use the 'task_complete' command to end this session
CRITICISM:  I should ensure that I have completed all necessary actions before ending the session
NEXT ACTION:  COMMAND = task_complete ARGUMENTS = {'reason': 'Task of summarizing the wikipedia page on sea urchins and saving the summary to a file is complete'}
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for ...
Input:y
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-= 
Shutting down...
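The chunk sizes reported in the log above can be checked against the 3000-token limit with a few lines of arithmetic. The numbers are copied from the log; the 3000-token limit is the default stated in the PR description, and the variable names are only illustrative.

```python
import math

# Per-chunk sizes copied from the "Summarizing chunk N / 6" log lines.
chunk_tokens = [2997, 2999, 2993, 2997, 2978, 819]
chunk_chars = [10867, 12774, 12042, 9422, 8813, 3205]
max_tokens = 3000  # default BROWSE_CHUNK_MAX_LENGTH after this PR

total_tokens = sum(chunk_tokens)
total_chars = sum(chunk_chars)
min_chunks = math.ceil(total_tokens / max_tokens)

print(total_tokens)   # 15783 tokens across all chunks
print(total_chars)    # 57123 characters, close to the reported 57125
print(min_chunks)     # 6, matching the number of chunks in the log
print(all(t <= max_tokens for t in chunk_tokens))  # True
```

Every chunk stays under the limit, and six chunks is the minimum possible for 15783 tokens at 3000 tokens per chunk, so the sentences are packed tightly.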

Co-authored-by: batyu <batyu@localhost>

bszollosinagy mentioned this pull request Apr 19, 2023

Fix split_text chunking bug #2088

Closed

5 tasks

batyu added 4 commits April 19, 2023 16:50

* corrected spacy model requirement

f1bb99d

Merge remote-tracking branch 'origin/master' into prompt_context_over…

33ddb53

…flow_fix # Conflicts: # autogpt/processing/text.py

* bugfix: removed casting string to int

bfbbbb8

* applied black formatting

* one more black formatting

7635085

Pwuts assigned Pwuts and BillSchumacher and unassigned BillSchumacher and Pwuts Apr 19, 2023

github-actions bot added the conflicts Automatically applied to PRs with merge conflicts label Apr 19, 2023

Merge remote-tracking branch 'origin/master' into prompt_context_over…

4963199

…flow_fix # Conflicts: # .env.template

github-actions bot removed the conflicts Automatically applied to PRs with merge conflicts label Apr 19, 2023

Pwuts self-requested a review April 19, 2023 20:50

Pwuts approved these changes Apr 19, 2023

Pwuts changed the title ~~Prompt context overflow issue fixed~~ Fix model context overflow issue Apr 19, 2023

Pwuts merged commit fa91bc1 into Significant-Gravitas:master Apr 19, 2023

Pwuts mentioned this pull request Apr 19, 2023

Exceeding tokenlimit leads to loop #2590

Closed

1 task

Pwuts added the function: process text label Apr 19, 2023

Pwuts linked an issue Apr 19, 2023 that may be closed by this pull request

Improve chunking and chunk handling #38

Closed

Pwuts mentioned this pull request Apr 19, 2023

Improve chunking and chunk handling #38

Closed

Fadude mentioned this pull request Apr 20, 2023

Prompt overflows aren't handled gracefully #1841

Closed

1 task

Pwuts mentioned this pull request Apr 21, 2023

Possible workaround for #796 #1302

Closed

5 tasks

Pwuts changed the title ~~Fix model context overflow issue~~ Fix browse_website model context overflow issue Apr 22, 2023

Pwuts changed the title ~~Fix browse_website model context overflow issue~~ Fix `browse_website` model context overflow issue Apr 22, 2023

Pwuts mentioned this pull request Apr 22, 2023

Maximum context length exceeded after get_hyperlinks #2906

Closed

1 task

This was referenced Apr 22, 2023

This model's maximum context length is 8191 tokens, however you requested 89686 tokens (89686 in your prompt) #1639

Closed

how to reduce prompt? #2469

Closed

Crock-nic mentioned this pull request Oct 9, 2023

[Snyk] Upgrade zod from 3.21.4 to 3.22.2 Crock-nic/AutoGPT#2

Open

tdarlak-mwb mentioned this pull request Apr 12, 2024

[Snyk] Fix for 3 vulnerabilities tdarlak-mwb/AutoGPT#2

Merged

matr1xp mentioned this pull request May 3, 2024

[Snyk] Security upgrade zod from 3.21.4 to 3.22.3 matr1xp/AutoGPT#8

Merged

tdarlak-mwb mentioned this pull request May 13, 2024

[Snyk] Fix for 5 vulnerabilities tdarlak-mwb/AutoGPT#16

Merged

sindlinger pushed a commit to Orgsindlinger/Auto-GPT-WebUI that referenced this pull request Sep 25, 2024

Fix model context overflow issue (Significant-Gravitas#2542)

d210269

Co-authored-by: batyu <batyu@localhost>
