A browser agent, packaged with a Google Chrome extension, that works inside your own browser. Based on Google's Gemini 2.5 Computer Use model.

pmbstyle/gemini-browser-agent

Gemini Browser Agent

A research experiment and scaffolding for browser automation projects. Run agent tasks directly in your Chrome browser.

Overview

Gemini Browser Agent is an automation agent that bridges a Chrome extension with Google’s Gemini Computer Use API. It observes the active tab, exchanges screenshots and events with the model, and performs actions directly in your own browser. No sandbox or virtual machine is required.
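
The observe, decide, act cycle described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: `Action`, `run_agent`, `fake_capture`, and `fake_model` are hypothetical names, and the model call is replaced by a scripted stand-in so the example runs without a browser or an API key.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One step proposed by the model (hypothetical shape)."""
    name: str   # e.g. "click", "type", or "done"
    args: dict


def run_agent(
    goal: str,
    capture: Callable[[], bytes],
    ask_model: Callable[[str, bytes, List[Action]], Action],
    perform: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat screenshot -> model -> action until "done" or max_steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                         # observe the active tab
        action = ask_model(goal, screenshot, history)  # let the model decide
        history.append(action)
        if action.name == "done":
            break
        perform(action)                                # act in the browser
    return history


# Stand-ins (assumptions) so the sketch runs on its own.
def fake_capture() -> bytes:
    return b"fake-png-bytes"


def fake_model(goal: str, screenshot: bytes, history: List[Action]) -> Action:
    script = [Action("click", {"x": 10, "y": 20}), Action("type", {"text": goal})]
    if len(history) < len(script):
        return script[len(history)]
    return Action("done", {})


performed: List[Action] = []
steps = run_agent("search for cats", fake_capture, fake_model, performed.append)
print([a.name for a in steps])  # prints ['click', 'type', 'done']
```

The `max_steps` cap is a design choice for the sketch: because the agent acts in your real browser, bounding the loop keeps a confused model from running indefinitely.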

Setup

  1. Install Python 3.10+ and Chrome (or another Chromium-based browser).
  2. Clone this repository and open a terminal in the project directory.
  3. (Optional) Create a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows use: venv\Scripts\activate
  4. Run the setup helper to install dependencies and scaffold .env:
    python setup.py
  5. Visit https://aistudio.google.com/api-keys to create a Gemini API key, then place it in the generated .env file as GEMINI_API_KEY=....
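
To check that the key from step 5 is picked up, a small standard-library sketch like the one below can help. It assumes the `.env` file uses plain `KEY=VALUE` lines; `parse_env_text` and `read_api_key` are hypothetical helpers, and the project itself may load the file differently (for example through a dotenv package).

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def read_api_key(env_path: str = ".env",
                 environ: Optional[Mapping[str, str]] = None) -> str:
    """Return GEMINI_API_KEY from the environment, else from the .env file."""
    environ = os.environ if environ is None else environ
    key = environ.get("GEMINI_API_KEY", "")
    path = Path(env_path)
    if not key and path.exists():
        key = parse_env_text(path.read_text(encoding="utf-8")).get("GEMINI_API_KEY", "")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is missing; add it to the .env file")
    return key
```

Calling `read_api_key()` from the project directory before starting the agent raises a clear error if the key is absent, rather than failing later on the first model request.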

Usage

  1. Start the Python WebSocket bridge:
    python websocket_agent.py
  2. Open Chrome and navigate to chrome://extensions.
  3. Enable Developer mode, choose Load unpacked, and select the widget/ directory from this project.
  4. Open the sidebar, click Connect to link the extension with the Python agent, and provide your automation goal.
  5. Press Start AI Agent to let Gemini plan, execute actions, and stream log updates directly in your browser.
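
The Connect and Start steps imply a small message exchange between the extension and the Python bridge. The sketch below shows one way such messages could be handled. The JSON shape (`type`, `goal`, `text`, `detail`) and the `handle_message` function are assumptions made for illustration, not the actual protocol used by websocket_agent.py.

```python
import json
from typing import List


def _error(detail: str) -> List[str]:
    return [json.dumps({"type": "error", "detail": detail})]


def handle_message(raw: str, state: dict) -> List[str]:
    """Handle one JSON message from the extension; return JSON replies."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return _error("invalid JSON")
    if not isinstance(msg, dict):
        return _error("message must be a JSON object")

    kind = msg.get("type")
    if kind == "connect":
        state["connected"] = True
        return [json.dumps({"type": "log", "text": "connected"})]
    if kind == "start":
        if not state.get("connected"):
            return _error("connect first")
        goal = str(msg.get("goal", "")).strip()
        if not goal:
            return _error("goal is empty")
        state["goal"] = goal
        return [json.dumps({"type": "log", "text": f"starting: {goal}"})]
    return _error(f"unknown message type: {kind}")
```

Keeping the handler a pure function of the incoming text and a state dictionary means it can be exercised without opening a socket; a real bridge would call something like it from its WebSocket receive loop and send each reply back to the sidebar as a log update.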

For sandboxed automation, see this repository instead: https://github.com/pmbstyle/gemini-computer-use
