Evaluation during training #302

Open
TTomilin opened this issue Aug 7, 2024 · 2 comments
TTomilin commented Aug 7, 2024

Hi @alex-petrenko!

Thank you so much for the great project!

I am wondering whether there is a plan to add an evaluator component to the framework that is capable of periodically running the policy in other environments. For example, to test the generalization capabilities of the model, or to use it in a continual or curriculum learning setting, it would be necessary to periodically evaluate the policy on environments it is not actively being trained on.

The enjoy.py script is very useful, but it is rather inconvenient and costly to store numerous checkpoints throughout training and then run the evaluation on each of them separately afterward. It would be very handy to have evaluation incorporated into the training run, with the results aggregated under a single run.

I found an empty default_evaluator.py, which might have been created for that very purpose a while back.

I reckon the appropriate way is to integrate the evaluator into the connect_components() of the Runner so it can pick up a signal from the learner_worker. For instance, after every n policy updates, the evaluator could obtain the most recent policy, run it in a set of environments, and report the results back to some other component. Perhaps you could give some high-level pointers on how to implement this properly, so that it follows the architecture and design paradigms of the project and avoids a hacky solution.

Cheers,
Tristan Tomilin

@alex-petrenko (Owner)

Hi Tristan!

Great question! Your intuition is pretty much on point!

I suppose the most straightforward way to implement the evaluator would be to add an "AlgoObserver". There's an example in train.py:

    runner = runner_cls(cfg)

    # observers are registered on the runner before training starts
    if cfg.with_pbt:
        runner.register_observer(PopulationBasedTraining(cfg, runner))

    return cfg, runner

If you can formulate your evaluator as an algo observer (e.g. just copy the most recent checkpoint and fire off a separate process once per N training iterations), that'd be the easiest way to go.
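
Very roughly, such an observer could look something like the sketch below. Note that the import path, the on_training_step hook, and the sf_examples.my_enjoy eval script are assumptions/placeholders rather than the actual API, so double-check against the AlgoObserver base class:

    # Rough sketch of a periodic evaluator observer. The import path and the
    # on_training_step hook name are assumptions -- check the AlgoObserver base
    # class in the Sample Factory source for the actual callback names.
    import subprocess

    from sample_factory.algo.runners.runner import AlgoObserver


    class PeriodicEvaluator(AlgoObserver):
        def __init__(self, cfg, eval_every_n_iterations: int = 1000):
            self.cfg = cfg
            self.eval_every = eval_every_n_iterations
            self.last_eval_iteration = 0

        def on_training_step(self, runner, training_iteration_since_resume: int) -> None:
            # assumed hook: called by the runner once per training iteration
            if training_iteration_since_resume - self.last_eval_iteration < self.eval_every:
                return
            self.last_eval_iteration = training_iteration_since_resume

            # fire-and-forget: evaluate the latest checkpoint in a separate process
            # so training is not blocked; sf_examples.my_enjoy is a placeholder for
            # whatever enjoy-style script runs your held-out environments
            subprocess.Popen([
                "python", "-m", "sf_examples.my_enjoy",
                f"--experiment={self.cfg.experiment}",
            ])

Registration would then mirror the PBT example above:

    runner.register_observer(PeriodicEvaluator(cfg))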

If you need something a bit more sophisticated, e.g. a continuously running parallel process that provides some kind of feedback to the main process, you might want to consider a combination of EventLoopProcess and EventLoopObject. Basically, you spawn a process, create an evaluator object that lives on its event loop, and connect some signals and slots so you can exchange messages between this process and the runner process. There's a bit of a learning curve to this, but it's totally doable!
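
To get a feel for that pattern without the signal/slot machinery, here is a simplified stand-in that uses plain multiprocessing queues instead of the actual EventLoopProcess/EventLoopObject API; it only illustrates the idea of a long-lived evaluator process exchanging messages with the runner process, and all names are illustrative:

    # Simplified stand-in: a long-lived evaluator process receives checkpoint
    # paths from the main (runner) process and sends evaluation results back.
    # Plain multiprocessing queues are used here instead of Sample Factory's
    # signal/slot machinery; only the communication pattern is the same.
    import multiprocessing as mp


    def evaluator_loop(tasks, results):
        while True:
            checkpoint_path = tasks.get()
            if checkpoint_path is None:  # sentinel: shut the evaluator down
                break
            # here you would load the checkpoint and roll out the policy on the
            # held-out envs; this sketch just reports a dummy score
            results.put({"checkpoint": checkpoint_path, "eval_reward": 0.0})


    if __name__ == "__main__":
        tasks, results = mp.Queue(), mp.Queue()
        evaluator = mp.Process(target=evaluator_loop, args=(tasks, results), daemon=True)
        evaluator.start()

        # in the runner process: push the latest checkpoint every N iterations ...
        tasks.put("/path/to/latest/checkpoint.pth")
        print(results.get())  # ... and consume eval results whenever they arrive

        tasks.put(None)
        evaluator.join()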

@TTomilin (Author)

Thanks a lot for the suggestions! The AlgoObserver indeed seems suitable for what I was after.
