Hi @alex-petrenko!
Thank you so much for the great project!
I am wondering whether there is a plan to include an evaluator component in the framework that is capable of periodically running the policy in other environments. Say I'd like to test the generalization capabilities of the model, or employ it in a continual or curriculum learning setting; it would then be necessary to periodically evaluate the policy on environments that are not actively being trained on.
The enjoy.py script is very useful, but it is rather inconvenient and costly to store numerous checkpoints throughout training and separately run the evaluation on each of them afterward. It would be very handy to have evaluation incorporated into the training run, with the results aggregated under one run.
I found an empty default_evaluator.py, which might have been initiated for that very purpose a while back.
I reckon that the appropriate way would be to integrate the evaluator in the connect_components() of Runner so that it picks up a signal from the learner_worker. For instance, after every n policy updates, the evaluator could obtain the most recent policy, run it in a set of environments, and report the results back to some other component. Perhaps you could give some high-level pointers on how to implement this properly, so that it follows the architecture and design paradigms of the project and avoids a hacky solution.
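To illustrate roughly what I have in mind, here is a sketch I made up; none of these hooks or names are actual Sample Factory API, it just shows the shape of the component:

```python
# Purely illustrative pseudocode of the evaluator I have in mind -- not real Sample Factory API.
class PeriodicEvaluator:
    def __init__(self, cfg, eval_env_names, eval_every_n_updates=100):
        self.cfg = cfg
        self.eval_env_names = eval_env_names
        self.eval_every_n_updates = eval_every_n_updates

    def on_policy_updated(self, policy_update_count, checkpoint_path):
        # Hypothetical slot, connected to a "policy updated" signal from the learner_worker.
        if policy_update_count % self.eval_every_n_updates != 0:
            return
        for env_name in self.eval_env_names:
            stats = self._evaluate(checkpoint_path, env_name)
            self._report(env_name, policy_update_count, stats)

    def _evaluate(self, checkpoint_path, env_name):
        # Load the checkpoint and roll out a few episodes in env_name, returning aggregate stats.
        raise NotImplementedError

    def _report(self, env_name, policy_update_count, stats):
        # Emit a signal / write to the run's summaries so everything is aggregated under one run.
        raise NotImplementedError
```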
Cheers,
Tristan Tomilin
Great question! Your intuition is pretty much on point!
I suppose the most straightforward way to implement the evaluator would be to add an "AlgoObserver". There's an example in train.py:
```python
runner = runner_cls(cfg)
if cfg.with_pbt:
    runner.register_observer(PopulationBasedTraining(cfg, runner))
return cfg, runner
```
If you can formulate your evaluator as an algo observer (e.g. just copy the most recent checkpoint and fire a process there once per N training iterations), that'd be the easiest way to go.
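Here's a very rough sketch of what such an observer could look like. Take it with a grain of salt: the hook name, the import path, and the checkpoint directory layout are assumptions on my part, so check the AlgoObserver base class and your train_dir for the real names and paths.

```python
import glob
import os
import shutil
import subprocess

# Assumed import path -- locate the actual AlgoObserver base class in the source tree.
from sample_factory.algo.runners.runner import AlgoObserver


class PeriodicEvalObserver(AlgoObserver):
    """Every `eval_every` training iterations, copy the newest checkpoint and launch
    a separate evaluation process on it (e.g. a thin wrapper around enjoy.py)."""

    def __init__(self, cfg, eval_every=1000):
        self.cfg = cfg
        self.eval_every = eval_every

    # The callback name/signature is an assumption -- use whichever per-iteration hook
    # AlgoObserver actually provides.
    def on_training_step(self, runner, training_iteration):
        if training_iteration == 0 or training_iteration % self.eval_every != 0:
            return

        # Assumed checkpoint layout: <train_dir>/<experiment>/checkpoint_p0/checkpoint_*.pth
        ckpt_dir = os.path.join(self.cfg.train_dir, self.cfg.experiment, "checkpoint_p0")
        checkpoints = sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*")))
        if not checkpoints:
            return

        # Copy the newest checkpoint so the learner can keep writing its own files undisturbed.
        eval_ckpt = os.path.join(self.cfg.train_dir, self.cfg.experiment, "eval_checkpoint.pth")
        shutil.copy2(checkpoints[-1], eval_ckpt)

        # Fire-and-forget evaluation; `my_project.evaluate_checkpoint` is a hypothetical script.
        subprocess.Popen(
            ["python", "-m", "my_project.evaluate_checkpoint", "--checkpoint", eval_ckpt]
        )
```

Registering it would look just like the PBT example above: runner.register_observer(PeriodicEvalObserver(cfg)).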
If you need something a bit more sophisticated, e.g. a continuously running parallel process that provides some kind of feedback to the main process, then indeed you might want to consider a combination of EventLoopProcess and EventLoopObject. Basically, you spawn a process, create an evaluator object that lives on that process's event loop, and connect some signals and slots so you can exchange messages between this process and the runner process. There's a bit of a learning curve to this, but it's totally doable!
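To make the overall shape concrete without going into the signal_slot specifics, here is the same message flow expressed with plain multiprocessing queues. This is deliberately not the EventLoopProcess/EventLoopObject API, just an illustration of the pattern you would reproduce with signals and slots:

```python
import multiprocessing as mp


def run_evaluation(checkpoint_path):
    # Placeholder: load the checkpoint, roll out episodes in the eval envs, aggregate returns.
    return {"checkpoint": checkpoint_path, "avg_return": 0.0}


def evaluator_proc(task_queue, result_queue):
    """Continuously running evaluator process.
    With signal_slot, this loop would be an evaluator EventLoopObject living on an
    EventLoopProcess, and the two queues would be replaced by connected signals/slots."""
    while True:
        checkpoint_path = task_queue.get()
        if checkpoint_path is None:  # shutdown sentinel
            break
        result_queue.put(run_evaluation(checkpoint_path))


if __name__ == "__main__":
    tasks, results = mp.Queue(), mp.Queue()
    proc = mp.Process(target=evaluator_proc, args=(tasks, results), daemon=True)
    proc.start()

    # Runner side: every N policy updates, hand the evaluator the latest checkpoint...
    tasks.put("/path/to/checkpoint.pth")
    # ...and collect results whenever convenient, logging them under the main run.
    print(results.get(timeout=60))

    tasks.put(None)
    proc.join()
```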