Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add evaluation worker #43

Open
alex-petrenko opened this issue Aug 6, 2020 · 15 comments
Open

Add evaluation worker #43

alex-petrenko opened this issue Aug 6, 2020 · 15 comments
Labels

Comments

@alex-petrenko
Copy link
Owner

Ideally, we want a worker that can render an episode every x minutes and post a gif animation or video on Tensorboard.

The best solution (I think) is to repurpose the existing ActorWorker for this. We can just add a regime where it also saves the environment frames (rendering them if necessary).

@neevparikh
Copy link

This would involve adding an evaluation/render TaskType?

@alex-petrenko
Copy link
Owner Author

The simplest thing I can think of is to just make one of the actor workers a "special" one. I.e. add some sort of flag that will also save environment frames to some memory location, or to disk, instead of just sending them to the learner.

It can do it every x minutes (instead of every trajectory) to avoid using too many resources.

For the environments that do not have pixel observations we can additionally do something like render(mode='rgb_array') to get the rendered frames to begin with.

Once the episode trajectory is collected, the actor worker can send it to the master process which currently writes all the summaries. Gifs can be generated with https://github.com/alex-petrenko/sample-factory/blob/master/utils/gifs.py

@signalprime
Copy link

signalprime commented Sep 9, 2020

Hi, I found this thread looking for any details on how to periodically evaluate the model on a separate evaluation environment during training, so we can compare training and validation on the same tensorboard. Would you comment on how you'd go about it and I could submit a PR?

@alex-petrenko
Copy link
Owner Author

@signalprime
In case the evaluation worker has a different set of environments and maybe an entirely different workflow to the regular actor worker, I'd say you need to create/spawn a separate worker process for this. However, it makes a lot of sense to reuse the existing actor-policy worker infrastructure for simulation and inference. This new evaluation worker can be a subclass of ActorWorker with a few methods overloaded, such as environment creation and the processing of trajectories (i.e. we should not send them to the learner).

An important requirement is that users should not pay for the evaluation worker if they don't use it. So if the option is disabled there should be an absolute minimum of overhead for supporting this feature.

The existing mechanism for sending summaries to the main process can also be reused (see report_queue). The reports from the evaluation worker can be marked with a special flag such that summaries can be written with separate keys (I'd just add eval_ prefix to everything to have a completely separate TB section).

An alternative option is to just turn one of the existing workers into an evaluation worker using a set of flags. This can probably be a lot more simple than adding another class and another process.
We can use the flags to control things like:

  1. how the environments are created (e.g. to enable some sort of evaluation environment)
  2. whether we train on the collected experience or not (probably should be False for the evaluation worker if the set of environments is different)
  3. visualization of experience. The evaluation worker can create gif animations of episodes and add them to tensorboard.

The above options can be switched on/off separately to allow for a more flexible configuration.

@signalprime
Copy link

signalprime commented Oct 26, 2020

hi @alex-petrenko, The second idea does sound better. I've not forgotten about this thread, there are a few elements I'm trying to understand for the changes. I've forked the project and have been reviewing what needs to be done.

Inside APPO.init():

        # Evaluation
        self.cfg.num_eval_workers = 1
        self.eval_traj_buffers = SharedBuffers(self.cfg, self.num_agents, self.obs_space, self.action_space)
        self.evaluation_workers = dict()
        self.eval_actor_workers = None
        self.eval_policy_workers = dict()
        self.eval_policy_queues = dict()        
        self.eval_policy_inputs = [[] for _ in range(self.cfg.num_eval_workers)]
        self.eval_policy_outputs = dict()
        self.eval_report_queue = MpQueue(20 * 1000 * 1000)
        ... ...
        # Evaluation
        if self.cfg.num_eval_workers > 0:
            for worker_idx in range(self.cfg.num_eval_workers):
                for split_idx in range(self.cfg.worker_num_splits):
                    self.eval_policy_outputs[(worker_idx, split_idx)] = dict()
        ... ...
        # Evaluation
        self.eval_writers = dict()
        eval_summary_dir = join(eval_summaries_dir(experiment_dir(cfg=self.cfg)), str(key))
        self.eval_writers[0] = SummaryWriter(eval_summary_dir, flush_secs=20)

So we have an additional summary dir that will overlap training summaries.

Creating the evaluation ActorWorker:

    def create_eval_worker(self, idx, actor_queue):
        learner_queues = {p: w.task_queue for p, w in self.evaluation_workers.items()}
        
        return ActorWorker( self.cfg, self.obs_space, self.action_space, 
                           num_agents=1, worker_idx=idx, shared_buffers=self.eval_traj_buffers, task_queue=actor_queue, policy_queues=self.eval_policy_queues,
            report_queue=self.eval_report_queue, learner_queues=learner_queues,
        )

Things get a little more complex after this. I'm questioning if we need to add logic inside the following definitions, or create new ones with the 'eval' prefix or suffix:

  • init_subset
  • init_workers
  • finish_initialization

It seems the following need modified copies:

  • process_report
  • report
  • print_stats
  • report_train_summaries
  • report_experiment_summaries

Could you talk about what items, other than cfg for environment variables, need changes inside the ActorWorker class?

Thanks!

@alex-petrenko
Copy link
Owner Author

Hi @signalprime !

Thank you for looking into this!

I guess if you like the second idea of just repurposing the ActorWorker class, maybe it makes sense to reuse the existing infrastracture for creating the ActorWorkers, as well as shared buffers, queues, etc.
Why not just agree that, let's say, the last ActorWorker with idx = num_workers-1 will be the evaluation worker?
Then you only need to pass a special flag to this worker, and that's it.

Again, if you do this, you probably don't need to worry about report, process_report and all this stuff (except for handling the new summaries).

init_subset logic is quite complicated indeed and was added to deal with difficult initialization of some types of environments. First of all, we support restarting the workers if initialization has failed. On top of that, some environments struggle to initialize (e.g. VizDoom) when more than 5-10 environment instances are created simulataneously (hence the init_subset).
I'd say, if you follow the second idea with the flag you just don't need to worry about all that

@signalprime
Copy link

Perfect, yes, so if the evaluation flag is enabled, we'll assume the last worker is for evaluation.
Thanks for the suggestions, I'll be working to get it functional.

@alex-petrenko
Copy link
Owner Author

alex-petrenko commented Jan 7, 2021 via email

@signalprime
Copy link

Thank you Alex. I deleted my post after finding the location as well. My greatest compliments go to you because Sample Factory is truly a work of art. I find it to be 10x faster than other implementations.

@alex-petrenko
Copy link
Owner Author

alex-petrenko commented Jan 7, 2021 via email

@github-actions
Copy link

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label May 14, 2021
@github-actions
Copy link

This issue was closed because it has been inactive for 14 days since being marked as stale.

@alex-petrenko
Copy link
Owner Author

Re opened to determine requirements

@aadityanema
Copy link

Hello, is this planned for implementation. This is a very good/important feature to have

@alex-petrenko
Copy link
Owner Author

Yes, I would like to have this in 2.1.0

Any feedback would be very helpful. I.e. what's your use case, how would you use this feature, what implementation details are important.

There is no definite timeline for implementing this, but most likely in the next couple of months.
Prior to that I'd gladly accept PRs/review code if people are willing to work on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants