diff --git a/README.md b/README.md
index 95cdde0..56d7ae8 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,6 @@
 # Agent-E
 
+📚 [Cite paper](https://arxiv.org/abs/2407.13032)
+
 Agent-E is an agent based system that aims to automate actions on the user's computer. At the moment it focuses on automation within the browser. The system is based on on [AutoGen agent framework](https://github.com/microsoft/autogen).
 
@@ -29,7 +31,7 @@ While Agent-E is growing, it is already equipped to handle a versatile range of
 - If you do not have Google Chrome locally (and don't want to install it), install playwright drivers: `playwright install`
 - .env file in project root is needed with the following (sample `.env-example` is included for convience):
     - Follow the directions in the sample file
-    - You will need to set `AUTOGEN_MODEL_NAME` (for example `gpt-4-turbo-preview`) and `AUTOGEN_MODEL_API_KEY`
+    - You will need to set `AUTOGEN_MODEL_NAME` (We recommend using `gpt-4-turbo` for optimal performance) and `AUTOGEN_MODEL_API_KEY`.
     - If you are using a model other than OpenAI, you need to set `AUTOGEN_MODEL_BASE_URL` for example `https://api.groq.com/openai/v1` or `https://.openai.azure.com` on [Azure](https://azure.microsoft.com/).
     - For [Azure](https://azure.microsoft.com/), you'll also need to configure `AUTOGEN_MODEL_API_TYPE=azure` and `AUTOGEN_MODEL_API_VERSION` (for example `2023-03-15-preview`) variables.
     - If you want to use local chrome browser over playwright browser, go to chrome://version/ in chrome, find the path to your profile and set `BROWSER_STORAGE_DIR` to the path value
@@ -51,17 +53,17 @@ To personalize this agent, there is a need for Long Term Memory (LTM) that track
 ### Run the code:
 `python -m ae.main` (if you are on a Mac, `python -u -m ae.main` See blocking IO issues above)
 
-Once the program is running, you should see an icon on the browser. The icon expands to chat-like interface where you can enter natural language requests. For example, `open youtube`, `search youtube for funny cat videos`, `find Nothing Phone 2 on Amazon and sort the results by best seller`, etc.
+Once the program is running, you should see an icon on the browser. The icon expands to chat-like interface where you can enter natural language requests. For example, `open youtube`, `search youtube for funny cat videos`, `find Nothing Phone 2 on Amazon and sort the results by best seller`, etc.
 
 ## Demos
 
 | Video | Command | Description |
 |-----------|-------------|-------------|
-| [![Oppenheimer Video](docs/images/play-video-on-youtube-thumbnail.png)](https://www.youtube.com/embed/zjYeULZW4Ao) | There is an Oppenheimer video on youtube by Veritasium, can you find it and play it? | |
-| [![Example 2: Use information to fill forms](docs/images/form-filling-thumbnail.png)](https://www.youtube.com/embed/B5PWBNBbmQU) | Can you do this task? Wait for me to review before submitting. | Takes the highlighted text from the email as part of the instruction.