Repairs broken changes, cleans up browsing, adds a switch to choose Selenium or headless #1534
Repairs broken changes, cleans up browsing, adds a switch to choose between Selenium and headless browsing, and reduces code duplication between Selenium-based and requests-based browsing.
Background
AutoGPT received a working implementation that lets it browse the web with Selenium. Among other benefits, each page loads with JavaScript enabled and shows its full content, which many websites today require.
However, the implementation was somewhat hasty with respect to existing features (such as PR #968) and duplicated much of the code, which risks introducing bugs whenever only one of the near-identical copies is changed.
This was the motivation behind making the code more DRY.
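A minimal sketch of the DRY layout described above, assuming (hypothetically) that the only backend-specific step is obtaining a page's HTML: both the requests-based and the Selenium-based path hand their HTML to one shared extraction function, so a fix such as the link-scraper correction only has to be made once. The names `extract_text_and_links`, `browse` and the `fetch_html` callable are illustrative, not the actual functions in browse.py, and the standard-library `html.parser` stands in for whatever parser the project really uses.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional, Tuple


class _PageParser(HTMLParser):
    """Collects visible text and (link text, href) pairs from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: List[str] = []
        self.links: List[Tuple[str, str]] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self._current_href: Optional[str] = None
        self._current_link_text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            self._current_href = dict(attrs).get("href")
            self._current_link_text = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a" and self._current_href is not None:
            label = "".join(self._current_link_text).strip()
            self.links.append((label, self._current_href))
            self._current_href = None

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and style contents
        if data.strip():
            self.text_parts.append(data.strip())
        if self._current_href is not None:
            self._current_link_text.append(data)


def extract_text_and_links(html: str) -> Tuple[str, List[str]]:
    """Shared by every backend: returns the page text and links as plain strings."""
    parser = _PageParser()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.text_parts)
    # Only formatted strings are returned, never parser or element objects.
    links = [f"{label} ({href})" for label, href in parser.links]
    return text, links


def browse(url: str, fetch_html: Callable[[str], str]) -> Tuple[str, List[str]]:
    """Only fetch_html differs between backends; everything after it is shared."""
    return extract_text_and_links(fetch_html(url))
```

With this split, switching browsers means passing a different `fetch_html`, while text and link extraction exist in exactly one place.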
Using a headless browser on servers was also raised in PR #1520 and PR #1473, and the switch implemented here allows a slightly simpler headless mode: the original requests-based one. Nevertheless, headless Selenium might also be a good idea, since it lets a page run its JavaScript.
Changes
Combined summary.py, web.py and browse.py, because they duplicated each other's efforts in many respects.
Added a new config parameter in .env to control which kind of browser the user wants: headless (requests-based) or full Selenium with Chrome.
Restored browse_website() to commands.py.
PR #1397 (Implemented Selenium based web browsing) introduced a working Selenium adapter, but inadvertently clobbered PR #968 (Add visited website to memory for recalling content without being limited by the website summary) and replicated most of browse.py based on an old version, without triggering any merge conflicts. This is now rectified by moving the Selenium code into browse.py and reducing duplication as much as possible.
Fixed a small typo in the link scraper that caused an object reference to be returned along with the links.
Listed the pros and cons of each browser in the source code.
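A sketch of how the .env switch might select a backend. The variable name BROWSER_MODE, its values "headless" and "selenium", and the headless default are hypothetical, since the PR text does not name the parameter. The requests and Selenium imports are deferred into each fetcher so that the headless mode does not need Selenium or Chrome installed.

```python
import os
from typing import Callable, Dict, Mapping, Optional


def fetch_with_requests(url: str) -> str:
    """Headless mode: a plain HTTP GET; the page's JavaScript is not executed."""
    import requests  # deferred so this module loads without requests installed

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def fetch_with_selenium(url: str) -> str:
    """Full browser mode: Chrome renders the page, including its JavaScript."""
    from selenium import webdriver  # only needed when this mode is selected

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


# Hypothetical mapping from the .env value to a backend.
FETCHERS: Dict[str, Callable[[str], str]] = {
    "headless": fetch_with_requests,
    "selenium": fetch_with_selenium,
}


def select_fetcher(env: Optional[Mapping[str, str]] = None) -> Callable[[str], str]:
    """Pick a backend from the hypothetical BROWSER_MODE variable (default: headless)."""
    env = os.environ if env is None else env
    mode = env.get("BROWSER_MODE", "headless").strip().lower()
    try:
        return FETCHERS[mode]
    except KeyError:
        raise ValueError(
            f"Unknown BROWSER_MODE {mode!r}; expected one of {sorted(FETCHERS)}"
        ) from None
```

A dictionary lookup keeps the switch in one place: adding a headless Selenium mode later would mean adding one fetcher and one entry, without touching the shared extraction code.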
Documentation
In-code comments, "readable code", and the git commit messages serve as documentation.
Test Plan
This was tested manually by trying both browsers. Now the Selenium part also makes use of the changes in PR #968.
Some unit tests are not passing. This is because they are tightly coupled to requests and have probably not been passing since AutoGPT started using sessions.get instead of requests.get, which defeats the mock.
Fixing that is outside the scope of this PR.
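To illustrate the mocking problem, here is a sketch that uses a stand-in Session class so it runs without the requests library installed. Once the code calls get on a session object rather than the module-level requests.get, a test has to patch the object that is actually called. The names Session, session, fetch_text and get_text_with_patched_session are hypothetical and do not come from AutoGPT's test suite.

```python
from unittest import mock


class Session:
    """Stand-in for requests.Session; a real call here would hit the network."""

    def get(self, url, timeout=None):
        raise RuntimeError(f"unexpected network access to {url}")


# Module-level session object, as code that calls session.get(...) would use.
session = Session()


def fetch_text(url: str) -> str:
    response = session.get(url, timeout=10)
    return response.text


def get_text_with_patched_session(url: str, fake_body: str) -> str:
    """Patch the method that is actually called (session.get), not a module-level get."""
    with mock.patch.object(session, "get") as fake_get:
        fake_get.return_value.text = fake_body
        result = fetch_text(url)
        fake_get.assert_called_once_with(url, timeout=10)
    return result
```

Patching a module-level get would leave session.get untouched, so the code under test would still try to reach the network; that matches the failure described above.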
PR Quality Checklist