A demonstration agent that uses voice to perform automated web actions with the AGI SDK framework.
The project uses REAL Evals to evaluate and score your agent. The agent ranked #1 on the REAL leaderboard wins a $1000 prize.
- Python 3.11 or higher
- OpenAI API key (or compatible provider)
- Clone the repository:
    git clone https://github.com/agi-inc/real-audio
    cd real-audio
- Install the required dependencies:
    pip install -r requirements.txt
- Set up your API key:
    export OPENAI_API_KEY="your-api-key" # any supported provider key works
- Set up playwright:
    playwright install chromium --force
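Before the first run, it can help to confirm that the prerequisites above are in place. The following is a minimal sketch, not part of the repository: the function name `check_environment` and its parameters are hypothetical, and it only checks the Python version and the `OPENAI_API_KEY` variable mentioned in this README.

```python
import os
import sys

MIN_PYTHON = (3, 11)  # minimum version listed in the prerequisites


def check_environment(env=None, version=None):
    """Return a list of human-readable problems; an empty list means ready.

    `env` and `version` are hypothetical parameters added so the check can be
    exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version[:2])
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version[0]}.{version[1]}"
        )
    # Treat an empty or whitespace-only key as missing.
    if not env.get("OPENAI_API_KEY", "").strip():
        problems.append("OPENAI_API_KEY is not set")
    return problems


for problem in check_environment():
    print("Problem:", problem)
```

If the loop prints nothing, both checks passed. The sketch does not verify that the key is valid or has credits.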
Run the agent with default settings:
python run_agent.py
- --model: Model to use (default: gpt-4o)
- --task: Task to run (default: webclones.omnizon-1)
- --headless: Run in headless mode (default: False)
# Run with a specific task
python run_agent.py --task webclones.dashdish-1
# Run in headless mode
python run_agent.py --headless true
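The options above could be parsed along the following lines. This is a sketch under assumptions, not the actual contents of run_agent.py: `build_parser` and `str_to_bool` are hypothetical names, and the set of accepted boolean spellings is a guess based on the `--headless true` example.

```python
import argparse


def str_to_bool(value):
    """Interpret strings such as 'true' or 'false' as booleans."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser():
    """Build a parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(
        description="Hypothetical mirror of the run_agent.py options"
    )
    parser.add_argument("--model", default="gpt-4o", help="Model to use")
    parser.add_argument("--task", default="webclones.omnizon-1", help="Task to run")
    # --headless takes an explicit value, as in `--headless true`.
    parser.add_argument(
        "--headless", type=str_to_bool, default=False, help="Run in headless mode"
    )
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(["--task", "webclones.dashdish-1", "--headless", "true"])
print(args.model, args.task, args.headless)
```

Passing the value through a converter rather than using a bare flag matches the documented usage, where `--headless` is followed by `true`.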
The project includes audio tasks for various web applications:
- dashdish: Restaurant/food delivery platform tasks
- fly-unified: Flight booking platform tasks
- gocalendar: Calendar application tasks
- gomail: Email application tasks
- networkin: Professional networking platform tasks
- omnizon: E-commerce platform tasks
- opendining: Restaurant reservation platform tasks
- staynb: Accommodation booking platform tasks
- topwork: Job platform tasks
- udriver: Ride-sharing platform tasks
- zilloft: Real estate platform tasks
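The two task names shown in this README, webclones.omnizon-1 and webclones.dashdish-1, suggest a `webclones.<application>-<number>` pattern. Assuming that pattern holds for every application listed above (the README does not state this), a task name can be validated before a run with a small helper. `split_task_id` is a hypothetical name, not an AGI SDK function.

```python
# Application names copied from the list above.
APPS = {
    "dashdish", "fly-unified", "gocalendar", "gomail", "networkin", "omnizon",
    "opendining", "staynb", "topwork", "udriver", "zilloft",
}


def split_task_id(task_id):
    """Split a task name such as 'webclones.omnizon-1' into (app, number).

    Assumes the 'webclones.<app>-<number>' pattern inferred from the examples.
    """
    prefix, _, rest = task_id.partition(".")
    # rpartition keeps hyphenated app names such as 'fly-unified' intact.
    app, sep, number = rest.rpartition("-")
    if prefix != "webclones" or not sep or app not in APPS or not number.isdecimal():
        raise ValueError(f"unrecognized task name: {task_id!r}")
    return app, int(number)


print(split_task_id("webclones.dashdish-1"))
```

A check like this catches typos in `--task` before the browser is launched.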
Agent logs are stored in the /results directory. If your agent crashes, the crash is logged there, so check that directory first.
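To find the most recent logs quickly, the results directory can be listed by modification time. This is a minimal sketch: `newest_files` is a hypothetical helper, and it makes no assumption about the file names or layout the agent writes, only that the logs are regular files somewhere under the directory you pass in.

```python
from pathlib import Path


def newest_files(results_dir, limit=5):
    """Return up to `limit` files under results_dir, newest first."""
    root = Path(results_dir)
    if not root.is_dir():
        return []
    # Search subdirectories too, since the log layout is not documented here.
    files = [path for path in root.rglob("*") if path.is_file()]
    files.sort(key=lambda path: path.stat().st_mtime, reverse=True)
    return files[:limit]


# "results" is an assumed relative path; adjust it to your checkout.
for path in newest_files("results"):
    print(path)
```

If the directory does not exist yet, the helper returns an empty list rather than raising.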
- API Key Issues: Ensure your OPENAI_API_KEY is properly set and has sufficient credits
- Dependencies: Make sure all requirements are installed with pip install -r requirements.txt
- agisdk: AGI SDK framework
- playwright: Browser automation
- numpy: Numerical computing
- openai: OpenAI API client
- Pillow: Image processing
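Whether the packages listed above are importable in the current environment can be checked without actually importing them. The sketch below uses the standard library's `importlib.util.find_spec`. The mapping from package name to import name is partly an assumption: Pillow is imported as `PIL`, and the other entries are assumed to import under their package names. `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec

# pip package name -> top-level import name (assumed, except Pillow -> PIL).
IMPORT_NAMES = {
    "agisdk": "agisdk",
    "playwright": "playwright",
    "numpy": "numpy",
    "openai": "openai",
    "Pillow": "PIL",
}


def missing_packages(import_names=None, finder=find_spec):
    """Return the package names whose import module cannot be located."""
    names = IMPORT_NAMES if import_names is None else import_names
    return [pkg for pkg, module in names.items() if finder(module) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
```

Any names it reports can then be installed by rerunning pip install -r requirements.txt.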