A web-based interface allowing users to direct, observe, and control an AI agent performing browser automation tasks in real-time with interactive step approval.
🚀 Take control of browser automation like never before! This project provides a hands-on web UI where you can collaborate with an AI agent, guiding its browsing actions step-by-step. Watch it navigate, interact with elements, and complete tasks, all under your supervision.
- 👀 Real-time Observation: See exactly what the AI agent sees and does in the browser.
- ✅ Interactive Step Approval: Review and approve each action before the agent proceeds.
- ▶️ Direct Control: Guide the agent's direction and intervene when needed.
- 🌐 Web-Based Interface: Access and control the agent from your browser.
- 🧠 Powered by browser-use: Leverages the robust browser-use library (included in src/browser-use-src) for core agent capabilities.
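The interactive step approval listed above can be pictured as a gate between the agent proposing an action and the browser executing it. The following is a conceptual sketch only, not this project's actual code: `ProposedAction`, `run_with_approval`, and the `approve` and `execute` callbacks are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ProposedAction:
    """Hypothetical description of one step the agent wants to take."""
    kind: str    # e.g. "navigate", "click", "type"
    target: str  # e.g. a URL or a description of the element


def run_with_approval(
    actions: Iterable[ProposedAction],
    approve: Callable[[ProposedAction], bool],
    execute: Callable[[ProposedAction], None],
) -> List[ProposedAction]:
    """Execute each proposed action only after `approve` returns True.

    Stops at the first rejected action and returns the actions that ran.
    """
    executed: List[ProposedAction] = []
    for action in actions:
        if not approve(action):
            break  # the user rejected this step, so control goes back to them
        execute(action)
        executed.append(action)
    return executed
```

In a web UI, `approve` would presumably wait for the user to press an approve or reject button, while `execute` would hand the action to the browser automation layer.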
- Docker and Docker Compose
- Clone the repository and navigate to the project directory:
    git clone https://github.com/Cofounder-Labs/interactive-browser-use
    cd interactive-browser-use
- Copy the example environment file and add your API keys:
    cp .env.example .env
  Then edit .env and add your API keys:
    # Provide EITHER the OpenAI API Key OR the Azure configuration below
    # OpenAI API Key
    OPENAI_API_KEY=your_openai_api_key
    # --- OR ---
    # Azure OpenAI Configuration
    AZURE_ENDPOINT=https://your-resource-name.openai.azure.com/
    AZURE_OPENAI_API_KEY=your_azure_api_key
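Since the .env file accepts either the OpenAI key or the Azure pair, it can help to check which one is actually set before starting the services. The sketch below is illustrative and is not part of this repository: `detect_provider` is a hypothetical helper, and preferring OpenAI when both are configured is an assumption, not documented behavior.

```python
import os
from typing import Mapping, Optional


def detect_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "openai" or "azure" depending on which variables are set.

    Assumption: if both are configured, OpenAI takes precedence.
    """
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "openai"
    # The Azure configuration needs both the endpoint and the key.
    if env.get("AZURE_ENDPOINT") and env.get("AZURE_OPENAI_API_KEY"):
        return "azure"
    raise RuntimeError(
        "Set OPENAI_API_KEY, or both AZURE_ENDPOINT and AZURE_OPENAI_API_KEY"
    )
```

Calling `detect_provider()` with no arguments reads the current process environment, so the values from .env must already be exported or loaded before it runs.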
- Ensure Docker and Docker Compose are installed on your system.
- Build and run the services using Docker Compose:
    docker compose up --build
  This command builds the images if they don't exist and starts the backend server and the Chrome instance. As written it runs in the foreground; add the -d flag to run the services in detached mode.
- The application will be accessible at http://localhost:3000
- To stop the services:
    docker compose down
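The services can take a moment to come up after `docker compose up`, so a small script can poll the address given above until it answers. This is a minimal sketch using only the Python standard library: `wait_for_http` is a hypothetical helper name, and the default timeout and interval are arbitrary choices.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `url` until a server answers or `timeout` seconds elapse.

    Any HTTP response, including an error status, counts as "up".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # the server responded, just with an error status
        except OSError:
            # Connection refused, timed out, etc. (URLError is an OSError).
            time.sleep(interval)
    return False
```

For example, `wait_for_http("http://localhost:3000")` returns True once the frontend responds, or False if nothing answers within the timeout.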
interactive-browser-use/
├── .github/ # GitHub Actions workflows
├── frontend/ # Frontend application code (Next.js)
├── src/
│ ├── browser_agent/ # Backend FastAPI application and agent control logic
│ │ ├── web/ # FastAPI specific code (routes, models)
│ │ ├── utils/ # Utility functions
│ │ ├── agent.py # Core agent interaction logic
│ │ └── cli.py # Command-line interface (if applicable)
│ └── browser-use-src/ # Source code for the underlying browser-use library
├── .env.example # Example environment variables
├── .gitignore
├── docker-compose.yml # Docker Compose configuration
├── Dockerfile.backend # Dockerfile for the backend service
├── Dockerfile.frontend # Dockerfile for the frontend service
├── LICENSE
├── poetry.lock
├── pyproject.toml # Python project configuration (Poetry)
├── README.md # This file
└── run.py # Main entry point for running the backend locally
This project is licensed under the MIT License - see the LICENSE file for details.
This project is built upon the excellent work of the browser-use team. We utilize their powerful browser-use library for the core browser automation capabilities. Many thanks to them for providing such a valuable open-source tool!
- Ritanshu Dokania
- Re Solver
If you use Interactive Browser Use in your research or project, please cite:
@software{interactive_browser_use2025,
author = {Dokania, Ritanshu and Solver, Re},
title = {Interactive Browser Use: Make Browser Use Interactive},
year = {2025},
publisher = {GitHub},
url = {https://github.com/Cofounder-Labs/interactive-browser-use}
}