BrowserGym: a Gym Environment for Web Task Automation

[Setup] [Usage] [Demo] [Citation]

This package provides browsergym, a gym environment for web task automation in the Chromium browser.

Demo video (4x4.grid.mp4): a GPT-4V agent executing open-ended tasks (top row, interactive chat), as well as WebArena and WorkArena tasks (bottom row).

BrowserGym includes the following benchmarks by default: MiniWoB++, WebArena, VisualWebArena, WorkArena, and AssistantBench.

Designing new web benchmarks with BrowserGym is easy: it only requires inheriting from the AbstractBrowserTask class.

Setup

To use browsergym, install one of the following packages:

pip install browsergym  # (recommended) everything below
pip install browsergym-experiments  # experiment utilities (agent, loop, benchmarks) + everything below
pip install browsergym-core  # core functionalities only (no benchmark, just the openended task)
pip install browsergym-miniwob  # core + miniwob
pip install browsergym-webarena  # core + webarena
pip install browsergym-visualwebarena  # core + visualwebarena
pip install browsergym-workarena  # core + workarena
pip install browsergym-assistantbench  # core + assistantbench

Then set up Playwright by running:

playwright install chromium

Finally, each benchmark has its own setup instructions, which require additional steps.

Development setup

To install browsergym locally for development, use the following commands:

git clone https://github.com/ServiceNow/BrowserGym.git
cd BrowserGym
make install

Usage

Open-ended example

Boilerplate code to run an agent on an interactive, open-ended task:

import gymnasium as gym
import browsergym.core  # register the openended task as a gym environment

env = gym.make(
    "browsergym/openended",
    task_kwargs={"start_url": "https://www.google.com/"},  # starting URL
    wait_for_user_message=True,  # wait for a user message after each agent message sent to the chat
)
obs, info = env.reset()
done = False
while not done:
    action = ...  # implement your agent here
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

MiniWoB++ example

import gymnasium as gym
import browsergym.miniwob  # register miniwob tasks as gym environments

env = gym.make("browsergym/miniwob.choose-list")
...

To list all the available MiniWoB++ environments run

env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/miniwob")]
print("\n".join(env_ids))

WorkArena example

import gymnasium as gym
import browsergym.workarena  # register workarena tasks as gym environments

env = gym.make("browsergym/workarena.servicenow.order-ipad-pro")
...

To list all the available WorkArena environments run

env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/workarena")]
print("\n".join(env_ids))

WebArena example

import gymnasium as gym
import browsergym.webarena  # register webarena tasks as gym environments

env = gym.make("browsergym/webarena.310")
...

To list all the available WebArena environments run

env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/webarena")]
print("\n".join(env_ids))

VisualWebArena example

import gymnasium as gym
import browsergym.visualwebarena  # register visualwebarena tasks as gym environments

env = gym.make("browsergym/visualwebarena.721")
...

To list all the available VisualWebArena environments run

env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/visualwebarena")]
print("\n".join(env_ids))

AssistantBench example

import gymnasium as gym
import browsergym.assistantbench  # register assistantbench tasks as gym environments

env = gym.make("browsergym/assistantbench.validation.3")
...

To list all the available AssistantBench environments run

env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/assistantbench")]
print("\n".join(env_ids))
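The per-benchmark listing snippets all filter the registry by an ID prefix. As a rough sketch of the same idea in one pass, the hypothetical helper below (`group_env_ids` is not a BrowserGym function) groups any iterable of environment IDs by benchmark name, assuming IDs follow the `browsergym/<benchmark>.<task>` pattern seen in the examples above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_env_ids(
    env_ids: Iterable[str], namespace: str = "browsergym/"
) -> Dict[str, List[str]]:
    """Group env IDs of the form '<namespace><benchmark>.<task>' by benchmark."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for env_id in env_ids:
        if not env_id.startswith(namespace):
            continue  # skip environments registered by other packages
        name = env_id[len(namespace):]
        benchmark = name.split(".", 1)[0]  # text before the first dot
        groups[benchmark].append(env_id)
    return {benchmark: sorted(ids) for benchmark, ids in groups.items()}


example_ids = [
    "browsergym/miniwob.choose-list",
    "browsergym/webarena.310",
    "browsergym/openended",
    "CartPole-v1",
    "browsergym/miniwob.click-test",
]
grouped = group_env_ids(example_ids)
```

After importing the benchmark packages you need, passing `gym.envs.registry.keys()` instead of `example_ids` should give the same grouping over the real registry.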

Demo

If you want to experiment with a demo agent in BrowserGym, follow these steps:

conda env create -f demo_agent/environment.yml
conda activate demo_agent
# or simply use `pip install -r requirements.txt`
playwright install chromium

Our demo agent uses OpenAI as a backend, so be sure to set your OPENAI_API_KEY environment variable.
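If you launch the demo from your own Python script, it can help to fail early when the key is missing. The helper below is a hypothetical sketch (`require_env_var` is not part of the demo agent); it only reads an environment mapping and raises if the variable is unset or empty.

```python
import os
from typing import Mapping


def require_env_var(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return the value of `name`, or raise if it is unset or blank."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before launching the demo agent.")
    return value
```

Calling `require_env_var("OPENAI_API_KEY")` at the top of a launcher script surfaces a clear error before any browser is started.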

Launch the demo agent on the open web:

python demo_agent/run_demo.py --task_name openended --start_url https://www.google.com

Or use it to solve a simple MiniWoB task:

python demo_agent/run_demo.py --task_name miniwob.click-test

A VisualWebArena task:

python demo_agent/run_demo.py --task_name visualwebarena.398

A WebArena task:

python demo_agent/run_demo.py --task_name webarena.4

A WorkArena task:

python demo_agent/run_demo.py --task_name workarena.servicenow.order-standard-laptop

You can customize your experience by changing model_name to your preferred LLM (gpt-4o-mini by default), enabling screenshots for your VLMs with use_screenshot, and much more (see python demo_agent/run_demo.py --help).
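To run the demo on several tasks in a row, the commands above can be built programmatically. This is a minimal sketch: `build_demo_command` is a hypothetical helper, and it uses only the `--task_name` and `--start_url` flags that appear in the commands above.

```python
from typing import List, Optional


def build_demo_command(task_name: str, start_url: Optional[str] = None) -> List[str]:
    """Build the argv list for one demo run, mirroring the commands above."""
    cmd = ["python", "demo_agent/run_demo.py", "--task_name", task_name]
    if start_url is not None:
        cmd += ["--start_url", start_url]  # only the openended example passes a URL
    return cmd


commands = [
    build_demo_command("openended", start_url="https://www.google.com"),
    build_demo_command("miniwob.click-test"),
]
```

Each entry can then be passed to `subprocess.run(cmd, check=True)` from the repository root, so that a failing run stops the batch.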

Citing This Work

Please use the following BibTeX to cite our work:

@inproceedings{workarena2024,
    title = {{W}ork{A}rena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?},
    author = {Drouin, Alexandre and Gasse, Maxime and Caccia, Massimo and Laradji, Issam H. and Del Verme, Manuel and Marty, Tom and Vazquez, David and Chapados, Nicolas and Lacoste, Alexandre},
    booktitle = {Proceedings of the 41st International Conference on Machine Learning},
    pages = {11642--11662},
    year = {2024},
    editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
    volume = {235},
    series = {Proceedings of Machine Learning Research},
    month = {21--27 Jul},
    publisher = {PMLR},
    url = {https://proceedings.mlr.press/v235/drouin24a.html},
}