Develop a version of WebSurferAgent that uses a proper headless browser (perhaps with multimodal LLM support) #1481

Closed
@afourney

Description

Possible roadmap:

  • Investigate [Headless Chrome](https://developer.chrome.com/blog/headless-chrome/) or equivalent.
  • Add a HeadlessChromeBrowser to complement SimpleTextBrowser in https://github.com/microsoft/autogen/blob/main/autogen/browser_utils.py
  • Update WebSurferAgent to accept a web_browser instance rather than a web_browser_config, and pass in either a SimpleTextBrowser or HeadlessChromeBrowser as appropriate
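
A rough sketch of what the last roadmap item could look like: the agent depends on a small browser interface and is handed a concrete instance. Every name here (`Browser`, `StubTextBrowser`, `SurferSketch`, `visit_page`, `summarize`) is hypothetical and for illustration only; none of it is the actual autogen API.

```python
from typing import Protocol


class Browser(Protocol):
    """Hypothetical minimal interface shared by every browser backend."""

    def visit_page(self, url: str) -> str:
        """Load url and return the page text."""
        ...


class StubTextBrowser:
    """Stand-in for a text-only backend; serves canned pages instead of fetching."""

    def __init__(self, pages: dict[str, str]) -> None:
        self._pages = pages

    def visit_page(self, url: str) -> str:
        return self._pages.get(url, "")


class SurferSketch:
    """Hypothetical agent that receives a ready-made browser, not a config dict."""

    def __init__(self, browser: Browser) -> None:
        self.browser = browser

    def summarize(self, url: str, max_chars: int = 40) -> str:
        # The agent only relies on the shared interface, so any backend works.
        text = self.browser.visit_page(url)
        return text[:max_chars]


agent = SurferSketch(StubTextBrowser({"https://example.com": "Example Domain"}))
print(agent.summarize("https://example.com"))
```

With this shape, a headless backend would only need to provide the same method to be passed in the same way, and the agent code would not need to know which one it got.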

Additional thoughts: We should take full advantage of having a browser under our control. Don't just dump the DOM to HTML for BeautifulSoup to parse (like what LangChain does). Rather, use JavaScript running privileged in the page context to query the document, extract text, interact with links, etc.
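
To make the idea concrete, here is a minimal sketch of querying the live document with JavaScript instead of parsing dumped HTML. It assumes the browser backend exposes some callable that evaluates a JavaScript expression in the page and returns the JSON-serializable result; `LINKS_JS`, `extract_links`, and `fake_evaluate` are hypothetical names, and the fake evaluator exists only so the sketch runs without a browser.

```python
from typing import Any, Callable

# JavaScript expression evaluated inside the page; it reads the live DOM directly.
LINKS_JS = """
Array.from(document.querySelectorAll('a[href]')).map(a => ({
    text: (a.innerText || '').trim(),
    href: a.href
}))
"""


def extract_links(evaluate: Callable[[str], Any]) -> list[dict[str, str]]:
    """Run LINKS_JS through `evaluate` and keep only links with visible text."""
    raw = evaluate(LINKS_JS) or []
    return [link for link in raw if link.get("text")]


def fake_evaluate(expression: str) -> list[dict[str, str]]:
    """Stand-in for a real browser: returns what the page script might produce."""
    return [
        {"text": "Docs", "href": "https://example.com/docs"},
        {"text": "", "href": "https://example.com/tracking-pixel"},
    ]


links = extract_links(fake_evaluate)
print(links)
```

If Playwright's sync API were the backend (an assumption, not something this issue settles), `page.evaluate` could likely be passed as the `evaluate` callable, since it accepts a JavaScript expression and returns the serialized value.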
