🤖 ChromeAI Agent - Universal AI Assistant Chrome Extension

The ultimate Chrome extension for AI-powered assistance with advanced browser automation and support for 10+ AI providers

🚀 Installation • ✨ Features • 🔧 Setup • 📖 Usage • 🤝 Contributing

🌟 Overview

ChromeAI Agent is a powerful Chrome extension that brings multiple AI assistants directly to your browser with advanced automation capabilities. With support for OpenAI, Claude, Gemini, OpenRouter, and many more providers, you can chat with AI while browsing any website, extract page context, maintain conversation logs, and automate complex browser interactions with intelligent action planning.

🎯 Why ChromeAI Agent?

🌐 Universal Access: Works on any website
🔄 Multi-Provider: Support for 10+ AI services
📝 Smart Context: Automatically includes page information
💾 Conversation Logs: Complete chat history with export
🎨 Markdown Support: Rich text formatting in responses
🔧 Easy Setup: Simple configuration and API key management
🤖 Advanced Automation: Intelligent browser automation with action planning
🎯 Enhanced Precision: Temperature 0 AI for consistent automation results

✨ Features

🆕 What's New in 2.3.3

User-Facing Step-by-Step Progress Comments: Every automation step now appears in the chat UI, providing real-time, transparent progress tracking for all browser automation tasks.
Robust Progress Tracking: Enhanced progress message delivery and UI integration, ensuring users always see what the agent is doing.
Improved Multi-URL Automation: More reliable enrollment and fallback logic for course automation, with clearer error handling and feedback.
Manifest and packaging updates: Version 2.3.3 is ready for Chrome Web Store and GitHub release, with all build artifacts and manifests synchronized.

🆕 What's New in 2.3.0

🧠 Strategic LLM Automation with Temperature 0 Precision

Revolutionary Strategic Analysis: LLM now provides strategic reasoning for every automation task with temperature 0 for deterministic results
Website Intelligence System: Automatic detection and analysis of website contexts (e-commerce, educational, social media, etc.)
Enhanced Multi-Step Automation: Intelligent wait actions with proper page load detection and 5-second stability delays
JSON Response Reliability: Improved parsing to handle malformed LLM responses with HTML tags and token markers
Advanced Action Planning: Strategic automation intents with comprehensive risk assessment and next-step recommendations

🎯 Enhanced Automation Capabilities

Wait Action Support: Proper wait action type handling for page loads and dynamic content
Page Load Stability: 5-second post-load delays ensure pages are fully interactive before actions
Better Error Recovery: Enhanced JSON parsing with fallback mechanisms for robust operation
Strategic Command Analysis: LLM analyzes user commands with website context to determine optimal automation approach

🔧 Technical Improvements

Temperature 0 LLM Configuration: All automation requests use temperature 0 for consistent, deterministic responses
Enhanced Token Limits: Increased maxOutputTokens to 8000 for comprehensive strategic analysis
Robust Response Parsing: Multiple fallback methods for extracting content from various LLM provider response formats
Optimized Content Processing: Reduced HTML content limits to 50KB for faster processing while maintaining effectiveness

🆕 What's New in 2.2.6

Enhanced build consistency: manual build edits preserved and synced across versions
Improved packaging script with better exclusion filters and automatic zip creation
Chrome Web Store package optimization: cleaner file structure, no development artifacts
Build pipeline stability: eliminated cyclic copy issues and streamlined deployment process
Tag versioning strategy: resolved remote conflicts with alternate tagging approach

🆕 What's New in 2.2.5

Source and runtime build fully synchronized; all automation and UI fixes mirrored into build/
Compound navigation + search flows hardened in background executor; reliable results landing
Promise-aware script injection across XPath and legacy pipelines with page readiness checks
Chat: Gemini response parsing stabilized; logs show more content with truncation hints
Restricted page handling: clearer side panel messaging, avoids unsupported injections/capture
Packaging: cleaned build script and exclude list for reproducible, smaller zip

🆕 What's New in 2.2.4

More reliable multi-step automation on real sites (search flows, consent dialogs, navigation waits)
Compound command parsing (e.g., “open a new tab and search …”) executes deterministically
Gemini response parsing corrected in chat
Chat logs show more content with a clear truncation indicator
UI gracefully handles restricted pages (Chrome Web Store, chrome://, etc.) with clearer messaging
Build synchronized: changes mirrored into build/ so the loaded extension reflects updates

🤖 Multi-AI Provider Support

OpenAI - GPT-3.5, GPT-4, GPT-4 Turbo
Anthropic Claude - Claude 3 Haiku, Sonnet, Opus
Google Gemini - Gemini Pro, Gemini Ultra
OpenRouter - Access to 50+ models from multiple providers
GitHub Models - Free AI models via GitHub
Groq - Ultra-fast inference
DeepSeek - Coding-focused models
Perplexity - Research and analysis
Azure OpenAI - Enterprise-grade AI
Local Models - Support for local AI servers

🔥 Core Capabilities

💬 Smart Chat Interface - Clean, intuitive side panel
🌐 Page Context Awareness - AI knows what page you're on
📊 Conversation Logging - Complete chat history with search
🤖 Advanced Browser Automation - Intelligent automation with action planning
🎯 Enhanced Element Detection - AI-powered element finding with temperature 0 precision
⚡ Multi-Step Action Plans - Every automation creates detailed 2-5 step execution plans
🎨 Markdown Rendering - Rich text formatting in responses
🔄 Model Switching - Change providers/models on the fly
⚙️ Default Provider - Set your preferred AI service
📤 Export Logs - Download conversation history
🆕 New Chat - Start fresh conversations easily

🤖 Browser Automation System

🚀 Action Planning Engine

Intelligent Planning: Every automation action generates 2-5 step execution plans
Step-by-Step Execution: Real-time progress tracking with detailed logging
Temperature 0 Precision: Deterministic AI responses for consistent automation
Enhanced Element Targeting: Sophisticated relevance scoring to find the right elements
Anti-Pattern Detection: Prevents common targeting errors (e.g., clicking avatars instead of navigation)

🖱️ Mouse & Click Actions

Smart Clicking: click the login button, click submit, click the back button
Advanced Targeting: AI finds elements by text, context, and semantic meaning
Navigation Intelligence: Distinguishes between navigation elements and profile/user elements
Double-Click & Right-Click: Full mouse interaction support
Drag & Drop: Move elements between locations

⌨️ Keyboard & Text Input

Intelligent Typing: type "hello world" in the search box
Form Field Detection: Automatically finds and focuses input fields
Smart Text Replacement: Clears existing content before typing new text
Keyboard Shortcuts: Execute complex key combinations
Special Keys: Tab, Enter, Arrow keys, Function keys

📝 Form Automation

Smart Form Filling: fill out the registration form
Field Recognition: Identifies form fields by labels, placeholders, and context
Multiple Input Types: Text inputs, dropdowns, checkboxes, radio buttons
File Uploads: Handle file selection dialogs
Form Submission: Automatic form validation and submission

🪟 Navigation & Page Control

Intelligent Navigation: go back, go forward, refresh the page
New Tab Management: Open links in new tabs
URL Navigation: Direct page navigation
Scroll Control: Smart scrolling to elements or directions
Wait Commands: Wait for elements, page loads, or specific conditions

🔍 Data Extraction & Analysis

Content Extraction: Get page text, element content, attributes
Smart Screenshots: Capture full pages or specific elements
Element Information: Extract URLs, images, form data
Page Analysis: Comprehensive page structure understanding

🎨 Visual & Advanced Actions

Element Highlighting: Visual feedback for automation actions
Custom CSS Injection: Style modifications on the fly
JavaScript Execution: Run custom scripts when needed
Alert Handling: Manage browser alerts and confirmations

🛡️ Privacy & Security

🔐 Local Storage - API keys stored securely in Chrome
🚫 No Data Collection - No telemetry or analytics
🎭 OAuth Support - Secure authentication for supported providers
🔒 HTTPS Only - All API communications encrypted

🚀 Installation

Method 1: Chrome Web Store

Visit the Chrome Web Store
Search for "ChromeAI Agent"
Click "Add to Chrome"
Pin the extension for easy access

Method 2: Developer Mode (Current)

Download the Extension

git clone https://github.com/guberm/ChromeAIAgent.git
cd ChromeAIAgent

Load in Chrome
- Open Chrome and go to chrome://extensions/
- Enable "Developer mode" (top right toggle)
- Click "Load unpacked"
- Select the ChromeAIAgent folder
Pin the Extension
- Click the puzzle piece icon in Chrome toolbar
- Pin "ChromeAI Agent" for easy access

🔧 Development Setup

For developers who want to test with local servers:

Rename manifest.json to manifest-prod.json
Rename manifest-dev.json to manifest.json
Load the extension in developer mode

The development manifest includes localhost permissions for testing local AI servers.

🔧 Setup

Step 1: Choose Your AI Provider

🤖 OpenAI (Recommended)

Visit OpenAI Platform
Create account and get API key
In extension settings, select "OpenAI"
Enter your API key
Choose model (GPT-3.5-turbo, GPT-4, etc.)

Cost: Pay-per-use, ~$0.002 per 1K tokens

🔄 OpenRouter (Best Value)

Visit OpenRouter.ai
Sign up and get API key
In extension settings, select "OpenRouter"
Enter your API key
Access 50+ models from multiple providers

Cost: Often 50-90% cheaper than direct APIs

🆓 GitHub Models (Free)

Visit GitHub Models
Sign in with GitHub account
In extension settings, select "GitHub Models"
Use GitHub authentication
Access models like GPT-4o mini for free

Cost: Free tier available

⚡ Groq (Fastest)

Visit Groq Console
Create account and get API key
In extension settings, select "Groq"
Enter your API key
Enjoy ultra-fast responses

Cost: Free tier + pay-per-use

Step 2: Configure Settings

Click the ChromeAI Agent icon
Go to Settings (gear icon)
Select your provider
Enter API key
Choose default model
Set as default provider (optional)

📖 Usage

Basic Chat

Open Side Panel: Click the ChromeAI Agent icon
Type Message: Enter your question or prompt
Send: Press Enter or click send button
Get Response: AI responds with context about current page

🤖 Browser Automation

ChromeAI Agent now features an advanced automation system with intelligent action planning:

🚀 Action Planning System

Every automation command creates a detailed 2-5 step execution plan:

Example: "click the back button"
📋 Action Plan:
1. Analyze page to locate "back button" element (1000ms)
2. Verify "back button" is clickable and visible (500ms)
3. Click on "back button" (500ms)
4. Validate click action was successful (500ms)

🚀 Executing Action Plan:
✅ Step 1 completed in 1200ms
✅ Step 2 completed in 450ms
✅ Step 3 completed in 680ms
✅ Step 4 completed in 320ms
🎯 Action Plan completed in 2650ms

🎯 Smart Automation Commands

Navigation: click the back button, go to settings, scroll down
Form Interaction: type "hello" in the search box, fill out the form
Element Interaction: click the submit button, select the dropdown option
Page Actions: take a screenshot, extract all links, refresh the page

🧠 Enhanced Precision

Temperature 0 AI: Consistent, deterministic responses for reliable automation
Advanced Element Scoring: Sophisticated algorithms to find the most relevant elements
Context-Aware Targeting: Understands the difference between navigation buttons and profile elements
Anti-Pattern Detection: Prevents common automation errors

Advanced Features

🌐 Page Context

The AI automatically knows:

Current page title and URL
Your browsing context
Can reference page content in responses

📊 Conversation Logs

View History: Click "Show Logs" to see all conversations
Search Logs: Filter by provider, model, or content
Export Data: Download logs as JSON
Show Details: View full request/response data

🔄 Model Switching

Change provider anytime during conversation
Switch models without losing context
Compare responses from different AIs

🆕 Start Fresh

Click "+" button to start new conversation
Previous chat automatically saved to logs
Clean slate for new topics

🔧 Development

Project Structure

ChromeAIAgent/
├── manifest.json              # Extension manifest (v3)
├── background.js              # Service worker & API handling
├── sidepanel.html            # Main UI structure
├── popup.html                # Extension popup
├── css/
│   └── sidepanel.css         # Styling & markdown formatting
├── js/
│   └── sidepanel.js          # Main application logic
├── icons/                    # Extension icons
└── test-*.html              # Testing pages

Key Components

ChatLogger: Handles conversation persistence
ChromeAiAgent: Main application class
Provider Management: Multi-AI service integration
Markdown Parser: Rich text rendering
OAuth Handler: Secure authentication

Adding New Providers

Add provider to providerDefaults in sidepanel.js
Add API endpoint to getModelsEndpoint() in background.js
Handle any special authentication in sendAIRequest()
Update HTML select options

🤝 Contributing

We welcome contributions! Here's how to help:

🐛 Bug Reports

Use GitHub Issues
Include Chrome version
Provide steps to reproduce
Include error logs if available

💡 Feature Requests

Open GitHub Issue with "enhancement" label
Describe the feature and use case
Consider implementation complexity

🔧 Pull Requests

Fork the repository
Create feature branch: git checkout -b feature-name
Make changes and test thoroughly
Submit PR with clear description

🧪 Testing

Test with multiple AI providers
Verify on different websites
Check conversation logging
Test markdown rendering

📋 Roadmap

🔜 Coming Soon

🚀 Future Plans

Team collaboration features
Plugin system for custom providers
Advanced conversation analytics
Mobile companion app
Integration with productivity tools

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Thanks to all AI provider APIs
Chrome Extension community
Open source contributors
Beta testers and early adopters

📞 Support

🐛 Issues: GitHub Issues
💬 Discussions: GitHub Discussions
⭐ Star this repo if you find it useful!

⭐ Star this repository if you find it useful!

Made with ❤️ by developers, for developers

🔝 Back to top

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
.github		.github
build		build
css		css
icons		icons
js		js
scripts		scripts
src/bg		src/bg
tests		tests
.gitignore		.gitignore
CHROME_STORE_LISTING.md		CHROME_STORE_LISTING.md
LICENSE		LICENSE
OPTIONS.md		OPTIONS.md
PRIVACY.md		PRIVACY.md
README.md		README.md
RELEASE_NOTES_v2.3.0.md		RELEASE_NOTES_v2.3.0.md
RELEASE_NOTES_v2.3.3.md		RELEASE_NOTES_v2.3.3.md
background.js		background.js
build-exclude.txt		build-exclude.txt
build-webstore.bat		build-webstore.bat
manifest-dev.json		manifest-dev.json
manifest.json		manifest.json
mcp-provider-interface.js		mcp-provider-interface.js
options.html		options.html
options.js		options.js
package.json		package.json
popup.html		popup.html
popup.js		popup.js
sidepanel.html		sidepanel.html

License

guberm/ChromeAIAgent

Folders and files

Latest commit

History

Repository files navigation