The ultimate Chrome extension for AI-powered assistance with advanced browser automation and support for 10+ AI providers
π Installation β’ β¨ Features β’ π§ Setup β’ π Usage β’ π€ Contributing
ChromeAI Agent is a powerful Chrome extension that brings multiple AI assistants directly to your browser with advanced automation capabilities. With support for OpenAI, Claude, Gemini, OpenRouter, and many more providers, you can chat with AI while browsing any website, extract page context, maintain conversation logs, and automate complex browser interactions with intelligent action planning.
- π Universal Access: Works on any website
- π Multi-Provider: Support for 10+ AI services
- π Smart Context: Automatically includes page information
- πΎ Conversation Logs: Complete chat history with export
- π¨ Markdown Support: Rich text formatting in responses
- π§ Easy Setup: Simple configuration and API key management
- π€ Advanced Automation: Intelligent browser automation with action planning
- π― Enhanced Precision: Temperature 0 AI for consistent automation results
- User-Facing Step-by-Step Progress Comments: Every automation step now appears in the chat UI, providing real-time, transparent progress tracking for all browser automation tasks.
- Robust Progress Tracking: Enhanced progress message delivery and UI integration, ensuring users always see what the agent is doing.
- Improved Multi-URL Automation: More reliable enrollment and fallback logic for course automation, with clearer error handling and feedback.
- Manifest and packaging updates: Version 2.3.3 is ready for Chrome Web Store and GitHub release, with all build artifacts and manifests synchronized.
- Revolutionary Strategic Analysis: LLM now provides strategic reasoning for every automation task with temperature 0 for deterministic results
- Website Intelligence System: Automatic detection and analysis of website contexts (e-commerce, educational, social media, etc.)
- Enhanced Multi-Step Automation: Intelligent wait actions with proper page load detection and 5-second stability delays
- JSON Response Reliability: Improved parsing to handle malformed LLM responses with HTML tags and token markers
- Advanced Action Planning: Strategic automation intents with comprehensive risk assessment and next-step recommendations
- Wait Action Support: Proper
wait
action type handling for page loads and dynamic content - Page Load Stability: 5-second post-load delays ensure pages are fully interactive before actions
- Better Error Recovery: Enhanced JSON parsing with fallback mechanisms for robust operation
- Strategic Command Analysis: LLM analyzes user commands with website context to determine optimal automation approach
- Temperature 0 LLM Configuration: All automation requests use temperature 0 for consistent, deterministic responses
- Enhanced Token Limits: Increased maxOutputTokens to 8000 for comprehensive strategic analysis
- Robust Response Parsing: Multiple fallback methods for extracting content from various LLM provider response formats
- Optimized Content Processing: Reduced HTML content limits to 50KB for faster processing while maintaining effectiveness
- Enhanced build consistency: manual build edits preserved and synced across versions
- Improved packaging script with better exclusion filters and automatic zip creation
- Chrome Web Store package optimization: cleaner file structure, no development artifacts
- Build pipeline stability: eliminated cyclic copy issues and streamlined deployment process
- Tag versioning strategy: resolved remote conflicts with alternate tagging approach
- Source and runtime build fully synchronized; all automation and UI fixes mirrored into
build/
- Compound navigation + search flows hardened in background executor; reliable results landing
- Promise-aware script injection across XPath and legacy pipelines with page readiness checks
- Chat: Gemini response parsing stabilized; logs show more content with truncation hints
- Restricted page handling: clearer side panel messaging, avoids unsupported injections/capture
- Packaging: cleaned build script and exclude list for reproducible, smaller zip
- More reliable multi-step automation on real sites (search flows, consent dialogs, navigation waits)
- Compound command parsing (e.g., βopen a new tab and search β¦β) executes deterministically
- Gemini response parsing corrected in chat
- Chat logs show more content with a clear truncation indicator
- UI gracefully handles restricted pages (Chrome Web Store, chrome://, etc.) with clearer messaging
- Build synchronized: changes mirrored into
build/
so the loaded extension reflects updates
- OpenAI - GPT-3.5, GPT-4, GPT-4 Turbo
- Anthropic Claude - Claude 3 Haiku, Sonnet, Opus
- Google Gemini - Gemini Pro, Gemini Ultra
- OpenRouter - Access to 50+ models from multiple providers
- GitHub Models - Free AI models via GitHub
- Groq - Ultra-fast inference
- DeepSeek - Coding-focused models
- Perplexity - Research and analysis
- Azure OpenAI - Enterprise-grade AI
- Local Models - Support for local AI servers
- π¬ Smart Chat Interface - Clean, intuitive side panel
- π Page Context Awareness - AI knows what page you're on
- π Conversation Logging - Complete chat history with search
- π€ Advanced Browser Automation - Intelligent automation with action planning
- π― Enhanced Element Detection - AI-powered element finding with temperature 0 precision
- β‘ Multi-Step Action Plans - Every automation creates detailed 2-5 step execution plans
- π¨ Markdown Rendering - Rich text formatting in responses
- π Model Switching - Change providers/models on the fly
- βοΈ Default Provider - Set your preferred AI service
- π€ Export Logs - Download conversation history
- π New Chat - Start fresh conversations easily
- Intelligent Planning: Every automation action generates 2-5 step execution plans
- Step-by-Step Execution: Real-time progress tracking with detailed logging
- Temperature 0 Precision: Deterministic AI responses for consistent automation
- Enhanced Element Targeting: Sophisticated relevance scoring to find the right elements
- Anti-Pattern Detection: Prevents common targeting errors (e.g., clicking avatars instead of navigation)
- Smart Clicking:
click the login button
,click submit
,click the back button
- Advanced Targeting: AI finds elements by text, context, and semantic meaning
- Navigation Intelligence: Distinguishes between navigation elements and profile/user elements
- Double-Click & Right-Click: Full mouse interaction support
- Drag & Drop: Move elements between locations
- Intelligent Typing:
type "hello world" in the search box
- Form Field Detection: Automatically finds and focuses input fields
- Smart Text Replacement: Clears existing content before typing new text
- Keyboard Shortcuts: Execute complex key combinations
- Special Keys: Tab, Enter, Arrow keys, Function keys
- Smart Form Filling:
fill out the registration form
- Field Recognition: Identifies form fields by labels, placeholders, and context
- Multiple Input Types: Text inputs, dropdowns, checkboxes, radio buttons
- File Uploads: Handle file selection dialogs
- Form Submission: Automatic form validation and submission
- Intelligent Navigation:
go back
,go forward
,refresh the page
- New Tab Management: Open links in new tabs
- URL Navigation: Direct page navigation
- Scroll Control: Smart scrolling to elements or directions
- Wait Commands: Wait for elements, page loads, or specific conditions
- Content Extraction: Get page text, element content, attributes
- Smart Screenshots: Capture full pages or specific elements
- Element Information: Extract URLs, images, form data
- Page Analysis: Comprehensive page structure understanding
- Element Highlighting: Visual feedback for automation actions
- Custom CSS Injection: Style modifications on the fly
- JavaScript Execution: Run custom scripts when needed
- Alert Handling: Manage browser alerts and confirmations
- π Local Storage - API keys stored securely in Chrome
- π« No Data Collection - No telemetry or analytics
- π OAuth Support - Secure authentication for supported providers
- π HTTPS Only - All API communications encrypted
- Visit the Chrome Web Store
- Search for "ChromeAI Agent"
- Click "Add to Chrome"
- Pin the extension for easy access
-
Download the Extension
git clone https://github.com/guberm/ChromeAIAgent.git cd ChromeAIAgent
-
Load in Chrome
- Open Chrome and go to
chrome://extensions/
- Enable "Developer mode" (top right toggle)
- Click "Load unpacked"
- Select the
ChromeAIAgent
folder
- Open Chrome and go to
-
Pin the Extension
- Click the puzzle piece icon in Chrome toolbar
- Pin "ChromeAI Agent" for easy access
For developers who want to test with local servers:
- Rename
manifest.json
tomanifest-prod.json
- Rename
manifest-dev.json
tomanifest.json
- Load the extension in developer mode
The development manifest includes localhost permissions for testing local AI servers.
π€ OpenAI (Recommended)
- Visit OpenAI Platform
- Create account and get API key
- In extension settings, select "OpenAI"
- Enter your API key
- Choose model (GPT-3.5-turbo, GPT-4, etc.)
Cost: Pay-per-use, ~$0.002 per 1K tokens
π OpenRouter (Best Value)
- Visit OpenRouter.ai
- Sign up and get API key
- In extension settings, select "OpenRouter"
- Enter your API key
- Access 50+ models from multiple providers
Cost: Often 50-90% cheaper than direct APIs
π GitHub Models (Free)
- Visit GitHub Models
- Sign in with GitHub account
- In extension settings, select "GitHub Models"
- Use GitHub authentication
- Access models like GPT-4o mini for free
Cost: Free tier available
β‘ Groq (Fastest)
- Visit Groq Console
- Create account and get API key
- In extension settings, select "Groq"
- Enter your API key
- Enjoy ultra-fast responses
Cost: Free tier + pay-per-use
- Click the ChromeAI Agent icon
- Go to Settings (gear icon)
- Select your provider
- Enter API key
- Choose default model
- Set as default provider (optional)
- Open Side Panel: Click the ChromeAI Agent icon
- Type Message: Enter your question or prompt
- Send: Press Enter or click send button
- Get Response: AI responds with context about current page
ChromeAI Agent now features an advanced automation system with intelligent action planning:
Every automation command creates a detailed 2-5 step execution plan:
Example: "click the back button"
π Action Plan:
1. Analyze page to locate "back button" element (1000ms)
2. Verify "back button" is clickable and visible (500ms)
3. Click on "back button" (500ms)
4. Validate click action was successful (500ms)
π Executing Action Plan:
β
Step 1 completed in 1200ms
β
Step 2 completed in 450ms
β
Step 3 completed in 680ms
β
Step 4 completed in 320ms
π― Action Plan completed in 2650ms
- Navigation:
click the back button
,go to settings
,scroll down
- Form Interaction:
type "hello" in the search box
,fill out the form
- Element Interaction:
click the submit button
,select the dropdown option
- Page Actions:
take a screenshot
,extract all links
,refresh the page
- Temperature 0 AI: Consistent, deterministic responses for reliable automation
- Advanced Element Scoring: Sophisticated algorithms to find the most relevant elements
- Context-Aware Targeting: Understands the difference between navigation buttons and profile elements
- Anti-Pattern Detection: Prevents common automation errors
The AI automatically knows:
- Current page title and URL
- Your browsing context
- Can reference page content in responses
- View History: Click "Show Logs" to see all conversations
- Search Logs: Filter by provider, model, or content
- Export Data: Download logs as JSON
- Show Details: View full request/response data
- Change provider anytime during conversation
- Switch models without losing context
- Compare responses from different AIs
- Click "+" button to start new conversation
- Previous chat automatically saved to logs
- Clean slate for new topics
ChromeAIAgent/
βββ manifest.json # Extension manifest (v3)
βββ background.js # Service worker & API handling
βββ sidepanel.html # Main UI structure
βββ popup.html # Extension popup
βββ css/
β βββ sidepanel.css # Styling & markdown formatting
βββ js/
β βββ sidepanel.js # Main application logic
βββ icons/ # Extension icons
βββ test-*.html # Testing pages
- ChatLogger: Handles conversation persistence
- ChromeAiAgent: Main application class
- Provider Management: Multi-AI service integration
- Markdown Parser: Rich text rendering
- OAuth Handler: Secure authentication
- Add provider to
providerDefaults
insidepanel.js
- Add API endpoint to
getModelsEndpoint()
inbackground.js
- Handle any special authentication in
sendAIRequest()
- Update HTML select options
We welcome contributions! Here's how to help:
- Use GitHub Issues
- Include Chrome version
- Provide steps to reproduce
- Include error logs if available
- Open GitHub Issue with "enhancement" label
- Describe the feature and use case
- Consider implementation complexity
- Fork the repository
- Create feature branch:
git checkout -b feature-name
- Make changes and test thoroughly
- Submit PR with clear description
- Test with multiple AI providers
- Verify on different websites
- Check conversation logging
- Test markdown rendering
- Chrome Web Store release
- Custom system prompts
- Conversation templates
- File upload support
- Voice input/output
- Team collaboration features
- Plugin system for custom providers
- Advanced conversation analytics
- Mobile companion app
- Integration with productivity tools
This project is licensed under the MIT License - see the LICENSE file for details.
- Thanks to all AI provider APIs
- Chrome Extension community
- Open source contributors
- Beta testers and early adopters
- π Issues: GitHub Issues
- π¬ Discussions: GitHub Discussions
- β Star this repo if you find it useful!
β Star this repository if you find it useful!
Made with β€οΈ by developers, for developers