A research experiment and scaffolding for browser automation projects. Run agent tasks right in your Chrome browser.
Gemini Browser Agent is an automation agent that bridges a Chrome extension with Google’s Gemini Computer Use API. It observes the active tab, exchanges screenshots and events with the model, and performs actions directly in your own browser, with no sandbox or virtual machine required.
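
The observe, decide, act cycle described above can be sketched roughly as follows. This is an illustrative outline only, not this project's code and not the Gemini Computer Use API: `Action`, `run_agent_loop`, and the `fake_*` stand-ins are hypothetical names, and the real agent exchanges these messages between the extension and the Python bridge rather than through plain function calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    """Hypothetical action description; not the Gemini Computer Use schema."""
    name: str   # e.g. "click", "type", or "done"
    args: dict


def run_agent_loop(
    capture: Callable[[], bytes],
    decide: Callable[[str, bytes, list], Action],
    perform: Callable[[Action], str],
    goal: str,
    max_steps: int = 10,
) -> list:
    """Observe, ask the model, act; stop on "done" or after max_steps."""
    history = []
    for _ in range(max_steps):
        screenshot = capture()                      # observe the active tab
        action = decide(goal, screenshot, history)  # model picks the next action
        if action.name == "done":
            break
        result = perform(action)                    # browser side executes it
        history.append((action, result))
    return history


# Stand-ins so the sketch runs without a browser or an API key.
def fake_capture() -> bytes:
    return b"fake-png-bytes"


def fake_decide(goal: str, screenshot: bytes, history: list) -> Action:
    # Pretend the model clicks twice and then declares the goal reached.
    if len(history) < 2:
        return Action("click", {"x": 10 * len(history), "y": 20})
    return Action("done", {})


def fake_perform(action: Action) -> str:
    return f"performed {action.name}"


history = run_agent_loop(fake_capture, fake_decide, fake_perform, goal="open the docs")
```

Passing the three steps in as callables keeps the loop testable without Chrome; in a real setup, `capture` and `perform` would presumably be backed by the extension and `decide` by a call to the model.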
- Install Python 3.10+ and Chrome (or another Chromium-based browser).
- Clone this repository and open a terminal in the project directory.
- (Optional) Create and activate a virtual environment: `python -m venv venv`, then `source venv/bin/activate` (on Windows use `venv\Scripts\activate`).
- Run the setup helper to install dependencies and scaffold `.env`: `python setup.py`
- Visit https://aistudio.google.com/api-keys to create a Gemini API key, then place it in the generated `.env` file as `GEMINI_API_KEY=...`.
- Start the Python WebSocket bridge: `python websocket_agent.py`
- Open Chrome and navigate to `chrome://extensions`.
- Enable Developer mode, choose Load unpacked, and select the `widget/` directory from this project.
- Open the sidebar, click Connect to link the extension with the Python agent, and provide your automation goal.
- Press Start AI Agent to let Gemini plan, execute actions, and stream log updates directly in your browser.
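
If the agent fails to start, one thing worth checking is whether the `.env` file from the steps above actually defines `GEMINI_API_KEY`. The helper below is a minimal sketch using only the standard library. `read_env_key` is a hypothetical name, and the simple `KEY=VALUE` parsing is an assumption; the project itself may load the file differently.

```python
from pathlib import Path
from typing import Optional


def read_env_key(path: str, name: str = "GEMINI_API_KEY") -> Optional[str]:
    """Return the value of `name` from a simple KEY=VALUE file, or None."""
    env_file = Path(path)
    if not env_file.is_file():
        return None
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop surrounding whitespace and optional quotes.
            value = value.strip().strip('"').strip("'")
            return value or None
    return None


if __name__ == "__main__":
    key = read_env_key(".env")
    print("GEMINI_API_KEY found" if key else "GEMINI_API_KEY missing or empty")
```

Run it from the project directory; it reports only whether the key is present and never prints the key itself.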
For sandboxed automation, use this repo instead: https://github.com/pmbstyle/gemini-computer-use
