This project allows you to control your browser using natural language. It integrates OpenAI's GPT-4 with the Playwright library, enabling seamless browser navigation. GPT-4 generates code snippets, which Playwright executes to carry out specified tasks.
npm installOPENAI_API_KEY=your_openai_api_key
Replace your_openai_api_key with your actual OpenAI API key.
npm run startUsage: node index.js -o code.spec.js
Options:
-a, --autogpt run with autogpt (default: false)
-m, --model <model> openai model to use (default: "gpt-4-1106-preview")
-o, --outputFilePath <outputFilePath> path to store test code
-u, --url <url> url to start on (default: "https://www.google.com")
-v, --viewport <viewport> viewport size to use (default: "1280,720")
-h, --help display help for command
The script opens a browser window.
In the terminal, you'll be prompted to enter a task.
Type your task using natural language (e.g., "Generate an interesting phrase and type it into Google") and press Enter.
GPT-4 can recognize buttons and text on the page and will navigate the browser to complete the specified task.
To stop the script, press Ctrl + C in the terminal.
Here are some example tasks you can input:
go to iFixitSearch for Iphone 11 Pro Max BatteryClick on the first iPhone 11 Pro Max Battery and wait for Page to LoadClick the second Add to Cart Button, then CheckoutFill out the Contact/Shipping forms with fake but believeable dataSelect Standard Shipping then Continue to PaymentWait for "Pay Now" button to be visible
With autogpt enabled, you can also input more complex tasks like:
go to hn and click on the first articleuse bing and find the abc article
This script serves as a demonstration of GPT-4 and Playwright integration, and may not perform flawlessly for every task or website. Generated code snippets could fail to execute, or the model might not comprehend specific inputs. Consider providing a more detailed task description or rephrasing your input in these situations. Some websites might be too large to fit in the prompt for smaller models like base gpt-4, hence we default to gpt-4-1106-preview with 125k tokens.
This project is licensed under the MIT License. See the LICENSE file for details.
