An intelligent browser automation system that allows users to control their web browser using natural language commands through a Chrome extension.
Built with a microservices architecture using Python, LangGraph, and Redis.
- Natural Language Control: Control your browser with commands like "Go to Amazon and search for laptops"
- Chrome Extension Interface: User-friendly popup for entering commands
- AI-Powered Decision Making: Uses Google's Gemini model to interpret commands and execute browser actions
- Distributed Architecture: Scalable microservices design with Redis message queuing
- Smart Browser Automation: Handles popups, CAPTCHAs detection, human-like interactions
- Session Management: Maintains conversation context across multiple commands
The system uses a distributed microservices architecture to ensure scalability and separation of concerns.
- Chrome Extension: User interface for natural language input
- Flask API Server: RESTful API that receives commands and returns results
- Redis Message Queue: Enables asynchronous communication between services
- Agent Worker: LangGraph agent that interprets commands and decides actions
- Browser Worker: Executes browser automation using Playwright