Introduce SeleniumBrowser #1733

Closed
wants to merge 39 commits into from

signalprime
Contributor

@signalprime signalprime commented Feb 20, 2024

Key Contributions:

  • SeleniumBrowser Function:
    Adds a headless, fully-functional desktop web driver to enable dynamic interactions with web pages, crucial for accessing content reliant on JavaScript or other client-side scripts.

  • SeleniumBrowserWrapper Class:
    Offers a seamless alternative to the SimpleTextBrowser, enhancing agent capabilities in interacting with web pages that require advanced browsing functionalities.

  • WebSurferAgent Enhancements:
    This update enables the ability to select between SimpleTextBrowser or SeleniumBrowserWrapper based on the provided browser_config and opens the door to vision-based function calling and interactions, significantly broadening the scope of tasks and potential use cases for Autogen agents in web interaction scenarios going forward.

  • Unit Testing:
    Extends unit tests to cover the enhanced WebSurferAgent, ensuring both functionality and reliability in the agents' operations.

  • Notebooks:
    Adds agentchat_surfer_edge.ipynb to demonstrate cross-compatibility and the new graphical functionality.
    Both notebooks specify gpt-3.5-turbo and rely on round-trip generations without follow-ons.
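
The browser selection described under WebSurferAgent Enhancements can be pictured roughly as follows. This is a minimal sketch rather than the PR's code: the `type` key, the `make_browser` helper, and both stub classes are hypothetical names used only for illustration.

```python
from typing import Any, Dict, Optional


class SimpleTextBrowserStub:
    """Hypothetical stand-in for the text-only browser."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


class SeleniumBrowserWrapperStub:
    """Hypothetical stand-in for the Selenium-backed browser."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


# Assumed mapping from a config value to a browser class.
_BROWSERS = {"text": SimpleTextBrowserStub, "selenium": SeleniumBrowserWrapperStub}


def make_browser(browser_config: Optional[Dict[str, Any]] = None):
    """Pick a browser class from the config, defaulting to the text browser."""
    config = dict(browser_config or {})  # copy so the caller's dict is untouched
    kind = config.pop("type", "text")  # "type" is an assumed key name
    try:
        browser_cls = _BROWSERS[kind]
    except KeyError:
        raise ValueError(f"Unknown browser type: {kind!r}") from None
    # Remaining keys are forwarded to the chosen browser unchanged.
    return browser_cls(**config)
```

Keeping the text browser as the default means existing configurations would keep working without changes, which matches the "optional drop-in replacement" intent stated in this PR.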

Benefits

  • Broadens the scope of web-based content that can be collected
  • Facilitates future development of sophisticated agents capable of complex web-based vision tasks
  • Enhances the WebSurferAgent with configurable browser behavior

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checks

Tests for the new Selenium WebDriver addition
Includes `SeleniumBrowserWrapper`, `SeleniumBrowser`, and several required helper functions that are part of the upcoming `ContentCollector` class and the `WebCollectionAgent`.
Provides `SeleniumBrowserWrapper` as an optional drop-in replacement for `SimpleTextBrowser`, for use cases such as pages that depend on JavaScript or that block calls from `requests`. Compatibility is preserved almost entirely, with the exception of page numbering.
    The ContentAgent class is a custom Autogen agent that can be used to collect and store online content from different web pages. It extends the ConversableAgent class and provides additional functionality for managing a list of additional links, storing collected content in local directories, and customizing request headers.
    ContentAgent uses deque to manage a list of additional links for further exploration, with a maximum depth limit set by max_depth parameter. The collected content is stored in the specified storage path (storage_path) using local directories.
    ContentAgent can be customized with request_kwargs and llm_config parameters during instantiation. The default User-Agent header is used for requests, but it can be overridden by providing a new dictionary of headers under request_kwargs.
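
To illustrate the deque-based link queue with a depth limit, here is a minimal sketch. It is not the ContentAgent implementation: the `crawl` function, the `get_links` callback, and the toy link graph are assumptions made for the example, and only `max_depth` mirrors a parameter named above.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(start_url: str, get_links: Callable[[str], Iterable[str]], max_depth: int) -> List[str]:
    """Visit pages breadth-first, following links at most max_depth hops from start_url."""
    queue = deque([(start_url, 0)])  # (url, depth) pairs waiting to be visited
    seen = {start_url}
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not expand links beyond the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Toy link graph standing in for real pages (hypothetical data).
LINKS = {"a": ["b", "c"], "b": ["d"], "c": ["a"], "d": ["e"]}
order = crawl("a", lambda u: LINKS.get(u, []), max_depth=2)
print(order)  # ['a', 'b', 'c', 'd']
```

Page "e" is never visited because it sits three hops from the start, and the `seen` set keeps the cycle between "a" and "c" from looping.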
Very minor updates prior to submitting a PR
Small fix in the `fix_missing_protocol` function
Small addition to maintain a dictionary of processed HTML content, keyed by the source URL
We cover a small sample of websites, asserting expectations against a number of measurements performed on the collected content.  

The assertions include, but are not limited to: 
- the expected variables contain values
- the presence of the expected output files
- that the expected output files are not empty
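
The three kinds of assertion listed above could be expressed along these lines. This is a sketch under assumptions: `check_collected_output`, the `content.txt` file name, and the `title` field are hypothetical and do not come from the PR's test suite.

```python
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def check_collected_output(output_dir: Path, expected_files: Iterable[str], metadata: Dict[str, str]) -> None:
    """Assert that collected values are set and that output files exist and are non-empty."""
    for key, value in metadata.items():
        assert value, f"expected a value for {key!r}"
    for name in expected_files:
        path = output_dir / name
        assert path.is_file(), f"missing output file: {path}"
        assert path.stat().st_size > 0, f"empty output file: {path}"


# Usage with a temporary directory standing in for the agent's storage path.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "content.txt").write_text("example page text", encoding="utf-8")
    check_collected_output(out, ["content.txt"], {"title": "Example"})
```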

Further improvements can include:
- evaluation against all choices of WebDriver to confirm functionality 
- evaluation against a larger sample of websites
Note that `_set_page_content`, `_split_pages`, and `viewport` are likely not yet compatible with the Selenium browser wrapper class, but they do not appear to be needed at this time.
Small updates to imports that were recently moved to other locations. Specifically:
```
from ..agent import Agent
from .. import ConversableAgent, AssistantAgent, UserProxyAgent, GroupChatManager, GroupChat
from ...oai.client import OpenAIWrapper
```
A small change: `self.browser_kwargs` is now declared before the parent class (`ConversableAgent`) is initialized. This avoids triggering an unexpected-argument error for `browser_kwargs`.
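
One common way to get this effect is to take the subclass-only option out of the keyword arguments before calling the parent initializer. The sketch below assumes that pattern, and the PR's actual code may differ. `BaseAgentStub` and `SurferAgentSketch` are hypothetical classes, with the stub standing in for `ConversableAgent`.

```python
from typing import Any, Dict


class BaseAgentStub:
    """Hypothetical parent with a fixed signature (stands in for ConversableAgent)."""

    def __init__(self, name: str) -> None:
        self.name = name


class SurferAgentSketch(BaseAgentStub):
    def __init__(self, name: str, **kwargs: Any) -> None:
        # Take the subclass-only option out of kwargs first ...
        self.browser_kwargs: Dict[str, Any] = kwargs.pop("browser_kwargs", None) or {}
        # ... so the parent never receives an argument it does not accept.
        super().__init__(name, **kwargs)


agent = SurferAgentSketch("surfer", browser_kwargs={"headless": True})
print(agent.browser_kwargs)  # {'headless': True}
```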
@codecov-commenter

codecov-commenter commented Feb 20, 2024

Codecov Report

Attention: Patch coverage is 2.32172%, with 589 lines in your changes missing coverage. Please review.

Project coverage is 41.64%. Comparing base (8ec1c3e) to head (c06f6fd).

| Files | Patch % | Lines |
|---|---|---|
| autogen/browser_utils.py | 2.90% | 333 Missing and 1 partial ⚠️ |
| autogen/agentchat/contrib/web_archiver_agent.py | 0.00% | 215 Missing and 3 partials ⚠️ |
| autogen/agentchat/contrib/web_surfer.py | 9.75% | 37 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1733      +/-   ##
==========================================
+ Coverage   37.05%   41.64%   +4.59%     
==========================================
  Files          62       63       +1     
  Lines        6499     7096     +597     
  Branches     1438     1675     +237     
==========================================
+ Hits         2408     2955     +547     
- Misses       3898     3905       +7     
- Partials      193      236      +43     
| Flag | Coverage Δ |
|---|---|
| unittests | 41.61% <2.32%> (+4.56%) ⬆️ |

@signalprime
Contributor Author

signalprime commented Feb 20, 2024 via email

fixing the following pre-commit errors:
autogen/agentchat/contrib/content_agent.py:21:1: E402 Module level import not at top of file
autogen/agentchat/contrib/content_agent.py:34:1: E402 Module level import not at top of file
autogen/agentchat/contrib/content_agent.py:65:33: F811 Redefinition of unused `deque` from line 6
autogen/agentchat/contrib/content_agent.py:374:26: F811 Redefinition of unused `filename` from line 7
@signalprime
Contributor Author

Made some last updates based on the generous feedback from @skzhang1. The only two other things I can think to include would be an update to the docs for the graphical WebSurferAgent and additional tests to test/test_browser_utils.py.

@sonichi thank you for running the OAI tests. I noticed test_web_surfer.py failed with a NameError about WebSurferAgent being undefined. That test file is unchanged in this PR, but the agent has small changes. Looking into it, we do see the agent class imported in that test file, with a catch to disable all tests if the import fails. Your thoughts?
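
For readers unfamiliar with the pattern being discussed, a guarded import that disables tests usually looks something like the sketch below. It uses the standard library's `unittest` and a deliberately nonexistent module name (`hypothetical_websurfer_module`). It is an illustration of the general technique, not the contents of test_web_surfer.py.

```python
import unittest

try:
    # Hypothetical optional module; this import fails when it is not installed.
    from hypothetical_websurfer_module import WebSurferAgent  # noqa: F401

    skip_all = False
except ImportError:
    skip_all = True


class SurferTests(unittest.TestCase):
    @unittest.skipIf(skip_all, "optional dependency not installed")
    def test_agent_is_importable(self):
        # Only runs when the import above succeeded, so the name is defined.
        self.assertIsNotNone(WebSurferAgent)
```

With a guard of this kind, any test that uses the guarded name without a matching skip condition would hit a `NameError` when the import fails, which is consistent with the failure mode described above.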

@signalprime signalprime changed the title from "Introduce ContentAgent with SeleniumBrowser for Enhanced Web Content Collection" to "Introduce WebArchiverAgent with SeleniumBrowser for Enhanced Web Content Collection" Feb 26, 2024
Contributor

@sonichi sonichi left a comment

I'd like to get @afourney 's opinion on the structure of the changes, e.g., whether it's ok to directly modify the current web surfer agent or should a separate agent be created. And whether a separate optional dependency is desired. My other comments may not apply if a structural change is needed.

autogen/browser_utils.py (outdated, resolved)
autogen/agentchat/contrib/web_surfer.py (outdated, resolved)
.github/workflows/contrib-openai.yml (outdated, resolved)
@signalprime
Contributor Author

> I'd like to get @afourney 's opinion on the structure of the changes, e.g., whether it's ok to directly modify the current web surfer agent or should a separate agent be created. And whether a separate optional dependency is desired. My other comments may not apply if a structural change is needed.

Hi @afourney, I saw in an issue that you were thinking to pass a browser object to the agent. Would you prefer that to the method I have configured? I could expose an option to provide a browser object while maintaining the current behavior where the browser type is specified in the config to keep things simple for the avg end-user. Your thoughts?

@signalprime signalprime changed the title from "Introduce WebArchiverAgent with SeleniumBrowser for Enhanced Web Content Collection" to "Introduce SeleniumBrowser for Enhanced Web Content Collection" Mar 11, 2024
@signalprime signalprime changed the title from "Introduce SeleniumBrowser for Enhanced Web Content Collection" to "Introduce SeleniumBrowser" Mar 11, 2024
@afourney afourney mentioned this pull request Mar 14, 2024
change _set_page_content to set_page_content
change _set_page_content to set_page_content
Removing the exception messages related to Selenium
Minor fix to permit testing

gitguardian bot commented Jul 20, 2024

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

Since your pull request originates from a forked repository, GitGuardian is not able to associate the secrets uncovered with secret incidents on your GitGuardian dashboard.
Skipping this check run and merging your pull request will create secret incidents on your GitGuardian dashboard.

🔎 Detected hardcoded secret in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
|---|---|---|---|---|
| 10404662 | Triggered | Generic CLI Secret | 841ed31 | .github/workflows/dotnet-release.yml |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

@MohammedNagdy

@signalprime I would like you to take a look at my PR. Maybe we can adopt one design, as I’m working on a similar part.

@ekzhu ekzhu changed the base branch from main to 0.2 October 2, 2024 18:30
@jackgerrits jackgerrits added the 0.2 Issues which are related to the pre 0.4 codebase label Oct 4, 2024
@rysweet
Collaborator

rysweet commented Oct 10, 2024

This PR is against AutoGen 0.2. AutoGen 0.2 has been moved to the 0.2 branch. Please rebase your PR on the 0.2 branch or update it to work with the new AutoGen 0.4 that is now in main.

@rysweet rysweet added the awaiting-op-response Issue or pr has been triaged or responded to and is now awaiting a reply from the original poster label Oct 10, 2024
@rysweet
Collaborator

rysweet commented Oct 11, 2024

@signalprime unfortunately, after the rebase there are still some open conflicts. If you are still interested in bringing this one forward, please see if you can get those resolved and get the latest CI green.

@rysweet
Collaborator

rysweet commented Oct 18, 2024

closing as stale, please reopen if you would like to update

@rysweet rysweet closed this Oct 18, 2024
Labels
0.2 Issues which are related to the pre 0.4 codebase awaiting-op-response Issue or pr has been triaged or responded to and is now awaiting a reply from the original poster
8 participants