WebRover

Your AI Co-pilot for Web Navigation 🚀

Autonomous Web Agent | Task Automation | Information Retrieval | Deep Research

Overview

WebRover is an autonomous AI agent designed to interpret user input and execute actions by interacting with web elements to accomplish tasks or answer questions. It leverages advanced language models and web automation tools to navigate the web, gather information, and provide structured responses based on the user's needs.

Key Features

Agent Capabilities

  • Three specialized agents for different use cases (Task, Research, Deep Research)
  • Dynamic agent selection based on task complexity
  • Real-time agent state visualization
  • Streaming agent actions and thoughts
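Streaming agent actions and thoughts to a chat UI is commonly done with Server-Sent Events (SSE). The sketch below assumes that approach; it is not WebRover's documented wire format. It shows how a backend generator could turn agent steps into SSE frames (a `data: <json>` line followed by a blank line). The event fields `type` and `content` and both function names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def agent_steps() -> Iterator[dict]:
    """Hypothetical agent run: yields thoughts, actions, and a final answer."""
    yield {"type": "thought", "content": "Need to open the search page"}
    yield {"type": "action", "content": "navigate https://example.com"}
    yield {"type": "final_answer", "content": "Done"}


def to_sse(events: Iterable[dict]) -> Iterator[str]:
    """Serialize each event as one Server-Sent Events frame."""
    for event in events:
        # json.dumps escapes newlines, so each payload fits on a single data: line.
        yield f"data: {json.dumps(event)}\n\n"


if __name__ == "__main__":
    for frame in to_sse(agent_steps()):
        print(frame, end="")
```

In a FastAPI backend, a generator like this could be passed to `StreamingResponse(..., media_type="text/event-stream")`; how WebRover actually wires its stream is not described here.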

Browser Integration

  • Local browser instance for privacy and control
  • Multi-tab management
  • PDF document handling
  • Secure browsing sessions

User Interface

  • Modern chat interface with real-time updates
  • Interactive agent selection
  • Action streaming with visual feedback
  • Real-time page annotations and highlights

Output Options

  • Direct chat responses
  • One-click Google Docs export
  • PDF download functionality
  • Copy to clipboard support

Research Tools

  • Vector store for information retention
  • Multi-source verification
  • Academic paper generation
  • Reference management
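A minimal sketch of the vector-store idea (store text chunks as vectors, retrieve the closest ones for a query) using cosine similarity. It assumes nothing about WebRover's actual store or embedding model: a bag-of-words counter stands in for a real embedding, and the `TinyVectorStore` class is hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase word counts (a real system would call an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store of (vector, text, source) triples."""

    def __init__(self) -> None:
        self._items: list[tuple[Counter, str, str]] = []

    def add(self, text: str, source: str) -> None:
        self._items.append((embed(text), text, source))

    def search(self, query: str, k: int = 2) -> list[tuple[str, str]]:
        """Return the k most similar (text, source) pairs."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [(text, source) for _, text, source in ranked[:k]]


store = TinyVectorStore()
store.add("Playwright automates Chromium, Firefox and WebKit", "source-1")
store.add("LangGraph builds stateful agent workflows as graphs", "source-2")
print(store.search("which tool automates browsers?", k=1))
```

Keeping the source alongside each chunk is what makes reference management possible later: every retrieved passage can be traced back to where it came from.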

Technical Features

  • State-of-the-art LLM integration (GPT-4o, o3-mini-high, Claude 3.5 Sonnet)
  • RAG pipeline for enhanced responses
  • LangGraph for state management
  • Playwright for reliable web automation
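LangGraph models an agent as a graph of nodes that read and update a shared state, with conditional edges choosing the next node. The plain-Python sketch below imitates that pattern without using LangGraph's API; the node names (`plan`, `act`), the `review` router, the state fields, and the stopping rule are assumptions for illustration, not WebRover's actual graph.

```python
from typing import Callable, TypedDict

END = "__end__"


class AgentState(TypedDict):
    task: str
    steps: list[str]
    done: bool


def plan(state: AgentState) -> AgentState:
    state["steps"].append(f"plan: next step for '{state['task']}'")
    return state


def act(state: AgentState) -> AgentState:
    state["steps"].append("act: perform browser action")
    # Hypothetical completion rule: the task counts as done after two actions.
    state["done"] = sum(s.startswith("act") for s in state["steps"]) >= 2
    return state


def review(state: AgentState) -> str:
    """Conditional edge: loop back to planning until the task is done."""
    return END if state["done"] else "plan"


NODES: dict[str, Callable[[AgentState], AgentState]] = {"plan": plan, "act": act}
EDGES: dict[str, Callable[[AgentState], str]] = {
    "plan": lambda _state: "act",  # unconditional edge
    "act": review,                 # conditional edge
}


def run(state: AgentState, entry: str = "plan", max_steps: int = 20) -> AgentState:
    """Walk the graph from the entry node until END, with a step limit as a safety net."""
    node = entry
    for _ in range(max_steps):
        if node == END:
            return state
        state = NODES[node](state)
        node = EDGES[node](state)
    raise RuntimeError("step limit reached")


final = run({"task": "find the weather", "steps": [], "done": False})
print(final["steps"])
```

In LangGraph itself the same shape is expressed with `StateGraph`, `add_node`, `add_edge`, `add_conditional_edges`, and `compile()`, which also handles state merging and checkpointing.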

Agent Types

1. Task Agent

A specialized automation agent for executing web-based tasks and workflows.

  • Custom action planning for multi-step tasks
  • Dynamic element interaction based on context
  • Real-time task progress monitoring
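Dynamic element interaction implies that the model emits an action which the backend maps onto a browser operation. A minimal sketch, assuming the model returns JSON such as `{"action": "click", "element": 3}` and that page elements have already been numbered; the action names and the `FakePage` class are hypothetical, though its `click` and `fill` methods mirror the names of Playwright's `Page.click` and `Page.fill`.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Stand-in for a browser page; records the operations it receives."""
    elements: dict[int, str]  # element number -> CSS selector
    log: list[str] = field(default_factory=list)

    def click(self, selector: str) -> None:
        self.log.append(f"click {selector}")

    def fill(self, selector: str, text: str) -> None:
        self.log.append(f"fill {selector} {text!r}")


def dispatch(page: FakePage, raw: str) -> str:
    """Validate one model-produced action and run it against the page."""
    action = json.loads(raw)
    name = action.get("action")
    if name == "finish":
        return "finished"
    element_id = action.get("element")
    if element_id not in page.elements:
        return f"error: unknown element {element_id}"
    selector = page.elements[element_id]
    if name == "click":
        page.click(selector)
    elif name == "type":
        page.fill(selector, action.get("text", ""))
    else:
        return f"error: unsupported action {name!r}"
    return "ok"


page = FakePage(elements={1: "#search", 2: "button[type=submit]"})
for raw in ['{"action": "type", "element": 1, "text": "weather"}',
            '{"action": "click", "element": 2}',
            '{"action": "finish"}']:
    print(dispatch(page, raw))
print(page.log)
```

Returning error strings instead of raising lets an agent loop feed the problem back to the model on its next step; whether WebRover handles errors this way is not stated here.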

2. Research Agent

An information-gathering specialist with smart content processing.

  • Intelligent source selection and validation
  • Adaptive search refinement
  • Single-pass comprehensive information gathering
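The README does not say how sources are selected. One simple heuristic, offered purely as an assumption, is to normalize result URLs, skip duplicates, and cap how many pages come from a single domain so the answer draws on several sources. `normalize`, `select_sources`, and the default limits are hypothetical.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> tuple[str, str]:
    """Return (host, key): host without 'www.', key without scheme, fragment, or trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    query = f"?{parts.query}" if parts.query else ""
    return host, host + parts.path.rstrip("/") + query


def select_sources(urls: list[str], per_domain: int = 1, limit: int = 5) -> list[str]:
    """Keep search results in order, skipping duplicates and over-represented domains."""
    seen: set[str] = set()
    per_host: dict[str, int] = {}
    chosen: list[str] = []
    for url in urls:
        host, key = normalize(url)
        if key in seen or per_host.get(host, 0) >= per_domain:
            continue
        seen.add(key)
        per_host[host] = per_host.get(host, 0) + 1
        chosen.append(url)
        if len(chosen) >= limit:
            break
    return chosen


results = [
    "https://www.example.com/a/",
    "https://example.com/a#intro",
    "https://example.com/b",
    "https://other.org/x",
]
print(select_sources(results))
```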

3. Deep Research Agent (New! 🎉)

An advanced research agent that produces academic-quality content through systematic topic exploration.

  • Automatic topic decomposition and structured research
  • Independent subtopic exploration
  • Academic paper generation with proper citations
  • Cross-referenced bibliography compilation
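Cross-referenced bibliography compilation can be sketched as a merge step, under assumed data shapes: each independently researched subtopic returns text with local markers like `[1]` pointing into its own source list, and the merge renumbers them against one global, deduplicated bibliography. The function and data layout are hypothetical, not WebRover's implementation.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")


def compile_bibliography(
    sections: list[tuple[str, list[str]]],
) -> tuple[list[str], list[str]]:
    """Merge per-section sources into one numbered list and rewrite citation markers.

    Each section is (text, sources), where "[n]" in text refers to sources[n - 1].
    """
    bibliography: list[str] = []
    global_number: dict[str, int] = {}  # source -> 1-based global number
    rewritten: list[str] = []
    for text, sources in sections:
        local_to_global: dict[int, int] = {}
        for local, source in enumerate(sources, start=1):
            if source not in global_number:
                bibliography.append(source)
                global_number[source] = len(bibliography)
            local_to_global[local] = global_number[source]

        def renumber(match: re.Match) -> str:
            n = int(match.group(1))
            # Leave markers that point outside this section's source list untouched.
            return f"[{local_to_global[n]}]" if n in local_to_global else match.group(0)

        rewritten.append(CITATION.sub(renumber, text))
    return rewritten, bibliography


sections = [
    ("Claim one [1]. Claim two [2].", ["Source A", "Source B"]),
    ("Claim three [1], see also [2].", ["Source B", "Source C"]),
]
texts, bib = compile_bibliography(sections)
print(texts)
print(bib)
```

Because a source shared by two subtopics gets a single global number, the final paper can cite it consistently across sections.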

Agent Architecture Diagrams

Deep Research Agent Flow

[Diagram: Deep Research Agent's workflow for comprehensive research and content generation]

Research Agent Flow

[Diagram: Research Agent's workflow for information gathering and synthesis]

Task Agent Flow

[Diagram: Task Agent's workflow for automating web interactions]

Architecture

All three agent types are built on a shared stack with four main components:

  1. State Management

    • LangGraph for maintaining agent state
    • Handles complex navigation flows and decision making
    • Structured workflow management
  2. Browser Automation

    • Playwright for reliable web interaction
    • Custom element detection and interaction system
    • Automated navigation and content extraction
  3. Content Processing

    • RAG (Retrieval-Augmented Generation) pipeline
    • Vector store integration for efficient information storage
    • PDF and webpage content extraction
    • Automatic content structuring and organization
  4. AI Decision Making

    • Multiple LLM integration (GPT-4o, o3-mini-high, Claude 3.5 Sonnet)
    • Context-aware navigation
    • Self-review mechanisms
    • Structured output generation
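Custom element detection is often implemented by collecting interactive elements, discarding invisible ones, and giving each a number the model can refer to (the same kind of numbers page annotations can display). The sketch below works on plain data objects rather than a live page; the `Element` fields and the sets of interactive tags and roles are assumptions, not WebRover's actual detector.

```python
from dataclasses import dataclass

INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
INTERACTIVE_ROLES = {"button", "link", "checkbox", "tab", "menuitem"}


@dataclass
class Element:
    tag: str
    text: str
    role: str = ""
    width: float = 0.0
    height: float = 0.0


def label_elements(elements: list[Element]) -> dict[int, Element]:
    """Keep visible, interactive elements and number them from 1 in page order."""
    labeled: dict[int, Element] = {}
    for el in elements:
        interactive = el.tag in INTERACTIVE_TAGS or el.role in INTERACTIVE_ROLES
        visible = el.width > 0 and el.height > 0
        if interactive and visible:
            labeled[len(labeled) + 1] = el
    return labeled


def describe(labeled: dict[int, Element]) -> str:
    """Compact listing, one line per element, suitable for an LLM prompt."""
    return "\n".join(f"[{n}] <{el.tag}> {el.text}".rstrip() for n, el in labeled.items())


page_elements = [
    Element("div", "Header", width=800, height=60),       # not interactive
    Element("a", "Home", width=40, height=20),
    Element("button", "Hidden", width=0, height=0),       # invisible
    Element("div", "Menu", role="button", width=30, height=30),
    Element("input", "", width=200, height=24),
]
print(describe(label_elements(page_elements)))
```

In a live browser the element list would come from the page itself, for example via Playwright's `page.evaluate` running a DOM query; that wiring is omitted here.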

Setup Instructions

Backend Setup

  1. Clone the repository and enter the backend folder:

    git clone https://github.com/hrithikkoduri/WebRover.git
    cd WebRover/backend
  2. Install Poetry (if not already installed)

    Mac/Linux:

    curl -sSL https://install.python-poetry.org | python3 -

    Windows:

    (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python -
  3. Tell Poetry to use Python 3.12 (it must already be installed):

    poetry env use python3.12
  4. Install dependencies using Poetry:

    poetry install
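  5. Activate the Poetry environment. poetry shell works out of the box on Poetry 1.x; on Poetry 2.x it requires the poetry-plugin-shell plugin, so use the manual command if it fails.

    For Unix/Linux/macOS:

    poetry shell
    # or manually
    source $(poetry env info --path)/bin/activate

    For Windows (PowerShell):

    poetry shell
    # or manually
    & ((poetry env info --path) + "\Scripts\activate.ps1")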
  6. Set up environment variables in .env:

    OPENAI_API_KEY="your_openai_api_key"
    LANGCHAIN_API_KEY="your_langchain_api_key"
    LANGCHAIN_TRACING_V2="true"
    LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
    LANGCHAIN_PROJECT="your_project_name"
    ANTHROPIC_API_KEY="your_anthropic_api_key"
  7. Run the backend (make sure you are in the backend folder):

    uvicorn app.main:app --reload --port 8000

    On Windows, run it without --reload:

    uvicorn app.main:app --port 8000
  8. Access the API at http://localhost:8000
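Before starting the backend, it can help to confirm that the .env file from step 6 defines every variable listed there. This is a stdlib-only sketch: `REQUIRED_KEYS` simply mirrors step 6 (which keys the backend strictly needs is an assumption), and the simple parser handles only the KEY="value" form shown above.

```python
from pathlib import Path

# Mirrors the variables listed in step 6; adjust if some are optional for you.
REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_ENDPOINT",
    "LANGCHAIN_PROJECT",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY="value" lines, ignoring blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Keys that are absent, empty, or still hold a placeholder such as your_..."""
    return [k for k in REQUIRED_KEYS if not values.get(k) or values[k].startswith("your_")]


if __name__ == "__main__":
    env_file = Path(".env")
    values = parse_env(env_file.read_text()) if env_file.exists() else {}
    missing = missing_keys(values)
    print("Missing or placeholder values:", ", ".join(missing) if missing else "none")
```

Run it from the folder that contains your .env file.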

Frontend Setup

  1. Open a new terminal, go to the root of the WebRover repository, and enter the frontend folder:

    cd frontend
  2. Install dependencies:

    npm install
  3. Run the frontend:

    npm run dev
  4. Access the frontend at http://localhost:3000

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.


Made with ❤️ by @hrithikkoduri
