Using AgentLab with a custom BG benchmark #99
A better solution would be to have a `@dataclass`:

```python
@dataclass
class Benchmark:
    name: str
    tasks: list[AbstractBrowserTask]
    max_steps: int
```

The `get_benchmark_env_args` function would then be lighter:

```python
def get_benchmark_env_args(
    benchmark: Benchmark, meta_seed=42, n_repeat=None
) -> list[EnvArgs]:
    return _make_env_args(benchmark.tasks, benchmark.max_steps, n_repeat, rng)
```

We can also imagine having a benchmark registry for all the benchmarks provided by browsergym (just a list or dict, stored somewhere, holding `Benchmark` objects).
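As a minimal sketch of that registry idea (reusing the `Benchmark` dataclass above; `BENCHMARK_REGISTRY` and the helpers below are hypothetical names, not existing browsergym or AgentLab API), it could be as simple as a module-level dict:

```python
# Hypothetical sketch of a benchmark registry, reusing the Benchmark
# dataclass proposed above. All names here are illustrative only.
BENCHMARK_REGISTRY: dict[str, Benchmark] = {}


def register_benchmark(benchmark: Benchmark) -> None:
    """Register a built-in or custom benchmark under its name."""
    BENCHMARK_REGISTRY[benchmark.name] = benchmark


def get_benchmark(name: str) -> Benchmark:
    """Look up a benchmark by name, with a clear error for unknown names."""
    try:
        return BENCHMARK_REGISTRY[name]
    except KeyError:
        raise ValueError(
            f"Unknown benchmark '{name}'. Registered: {sorted(BENCHMARK_REGISTRY)}"
        ) from None
```

A custom benchmark would then only need to be registered once, and `get_benchmark_env_args` could look it up by name without any per-benchmark if/else.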
Hi @imenelydiaker, those are very good points! Things have been moving fast on that side over the last few weeks. We now have something along these lines in AgentLab/src/agentlab/experiments/study.py (lines 59 to 60 at commit 6e18fb8).
When using AgentLab with a custom benchmark, I had to modify some of the library's source files. Ideally I would not have to touch them at all, and could instead just pass a few extra arguments to the functions I use. Here is what I did, along with my suggestions:
The script (main.py) calls `run_agents_on_benchmark()`, which in turn calls `get_benchmark_env_args()` from the task_collection.py module. That function fetches the tasks for a given benchmark id, so I had to edit the file manually to add the task list of my custom benchmark.

My suggestion is to add an additional argument `tasks_list: list[AbstractBrowserTask]` to both `run_agents_on_benchmark()` and `get_benchmark_env_args()`. Setting it would bypass the if/else conditions that fetch the tasks for a given benchmark name. This would also be valuable for running only a specific list of tasks, e.g. for testing or fast development. A rough sketch of the idea follows below.
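Here is a minimal, hypothetical sketch of that bypass with a simplified signature (this is not AgentLab's actual `get_benchmark_env_args()`; `_make_env_args` and `EnvArgs` refer to the existing internals mentioned above, and the `max_steps` default is illustrative):

```python
from typing import Optional

import numpy as np


def get_benchmark_env_args(
    benchmark_name: str,
    meta_seed: int = 42,
    n_repeat: Optional[int] = None,
    max_steps: int = 15,  # illustrative default only
    tasks_list: Optional[list] = None,  # e.g. AbstractBrowserTask subclasses
) -> list["EnvArgs"]:
    # Sketch only: _make_env_args is the existing helper referenced above.
    rng = np.random.RandomState(meta_seed)
    if tasks_list is not None:
        # Custom or hand-picked tasks: skip the benchmark-name if/else entirely.
        return _make_env_args(tasks_list, max_steps, n_repeat, rng)
    # ... the existing if/else over known benchmark names would stay here ...
    raise ValueError(f"Unknown benchmark: {benchmark_name}")
```

`run_agents_on_benchmark()` would then only need to forward its own `tasks_list` argument down to this function.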
Another function that needed to be updated is `_get_benchmark_version()` from reproducibility_utils.py, but I don't have a suggestion for that one.