Add evaluation worker #43

alex-petrenko · 2020-08-06T11:00:14Z

Ideally, we want a worker that can render an episode every x minutes and post a gif animation or video on Tensorboard.

The best solution (I think) is to repurpose the existing ActorWorker for this. We can just add a regime where it also saves the environment frames (rendering them if necessary).

neevparikh · 2020-08-26T23:29:52Z

This would involve adding an evaluation/render TaskType?

alex-petrenko · 2020-08-26T23:39:18Z

The simplest thing I can think of is to just make one of the actor workers a "special" one. I.e. add some sort of flag that will also save environment frames to some memory location, or to disk, instead of just sending them to the learner.

It can do it every x minutes (instead of every trajectory) to avoid using too many resources.

For the environments that do not have pixel observations we can additionally do something like render(mode='rgb_array') to get the rendered frames to begin with.

Once the episode trajectory is collected, the actor worker can send it to the master process which currently writes all the summaries. Gifs can be generated with https://github.com/alex-petrenko/sample-factory/blob/master/utils/gifs.py
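
A minimal sketch of the "record one episode every x minutes" idea described above. Everything here is hypothetical: PeriodicEpisodeRecorder and its hook names are not part of Sample Factory, and the clock is injectable only so the timing logic can be tested without waiting.

```python
import time
from typing import Callable, List, Optional


class PeriodicEpisodeRecorder:
    """Keeps the frames of one episode every `interval_sec` seconds and skips the rest."""

    def __init__(self, interval_sec: float, clock: Callable[[], float] = time.monotonic):
        self.interval_sec = interval_sec
        self.clock = clock
        self.last_recorded: Optional[float] = None  # None: nothing recorded yet
        self.recording = False
        self.frames: List[object] = []

    def on_episode_start(self) -> None:
        now = self.clock()
        due = self.last_recorded is None or now - self.last_recorded >= self.interval_sec
        self.recording = due
        if due:
            self.last_recorded = now
            self.frames = []

    def on_step(self, get_frame: Callable[[], object]) -> None:
        # get_frame is only called while recording, so skipped episodes
        # never pay for rendering or copying frames.
        if self.recording:
            self.frames.append(get_frame())

    def on_episode_end(self) -> Optional[List[object]]:
        """Returns the recorded frames, or None if this episode was skipped."""
        if not self.recording:
            return None
        self.recording = False
        frames, self.frames = self.frames, []
        return frames
```

For an environment without pixel observations, get_frame could be something like lambda: env.render(mode='rgb_array'); for pixel environments it could simply return the current observation. The returned frame list is what would be sent to the process that writes summaries.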

signalprime · 2020-09-09T20:49:55Z

Hi, I found this thread looking for any details on how to periodically evaluate the model on a separate evaluation environment during training, so we can compare training and validation on the same tensorboard. Would you comment on how you'd go about it and I could submit a PR?

alex-petrenko · 2020-09-10T03:54:26Z

@signalprime
If the evaluation worker has a different set of environments and maybe an entirely different workflow from the regular actor worker, I'd say you need to create/spawn a separate worker process for this. However, it makes a lot of sense to reuse the existing actor-policy worker infrastructure for simulation and inference. This new evaluation worker can be a subclass of ActorWorker with a few methods overridden, such as environment creation and the processing of trajectories (i.e. we should not send them to the learner).
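
A rough sketch of the subclassing idea, under heavy assumptions: BaseWorker below is a toy stand-in and does not mirror the real ActorWorker interface. The method names make_env and handle_trajectory are made up for illustration, and a trajectory is reduced to a list of rewards.

```python
from typing import List


class BaseWorker:
    """Hypothetical stand-in for ActorWorker; the real class looks different."""

    def __init__(self, learner_inbox: List[list]):
        self.learner_inbox = learner_inbox

    def make_env(self) -> str:
        return "train_env"

    def handle_trajectory(self, trajectory: List[float]) -> None:
        # Regular workers hand the collected experience over for training.
        self.learner_inbox.append(trajectory)


class EvalWorker(BaseWorker):
    """Overrides only environment creation and trajectory processing."""

    def __init__(self, learner_inbox: List[list], eval_returns: List[float]):
        super().__init__(learner_inbox)
        self.eval_returns = eval_returns

    def make_env(self) -> str:
        return "eval_env"

    def handle_trajectory(self, trajectory: List[float]) -> None:
        # Evaluation experience is never sent to the learner;
        # only the episode return is kept for reporting.
        self.eval_returns.append(sum(trajectory))
```

The point of the sketch is that simulation and inference stay in the base class, while the two behaviors that differ for evaluation are isolated in overrides.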

An important requirement is that users should not pay for the evaluation worker if they don't use it. So if the option is disabled there should be an absolute minimum of overhead for supporting this feature.

The existing mechanism for sending summaries to the main process can also be reused (see report_queue). The reports from the evaluation worker can be marked with a special flag such that summaries can be written with separate keys (I'd just add eval_ prefix to everything to have a completely separate TB section).
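
As an illustration of the eval_ prefix, here is a small sketch. The function name tag_eval_stats and the shape of the stats dict are assumptions, not the actual report_queue message format.

```python
from typing import Dict


def tag_eval_stats(stats: Dict[str, float], is_eval: bool, prefix: str = "eval_") -> Dict[str, float]:
    """Returns a copy of `stats`, with every key prefixed if the report came from evaluation."""
    if not is_eval:
        return dict(stats)
    return {f"{prefix}{key}": value for key, value in stats.items()}


train_stats = tag_eval_stats({"reward/mean": 1.5, "len/mean": 20.0}, is_eval=False)
eval_stats = tag_eval_stats({"reward/mean": 1.5, "len/mean": 20.0}, is_eval=True)
```

Since TensorBoard groups scalars by the part of the tag before the first slash, a key such as eval_reward/mean ends up in its own section, separate from reward/mean.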

An alternative option is to just turn one of the existing workers into an evaluation worker using a set of flags. This is probably much simpler than adding another class and another process.
We can use the flags to control things like:

how the environments are created (e.g. to enable some sort of evaluation environment)
whether we train on the collected experience or not (probably should be False for the evaluation worker if the set of environments is different)
visualization of experience. The evaluation worker can create gif animations of episodes and add them to tensorboard.

The above options can be switched on/off separately to allow for a more flexible configuration.
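
One way to picture independently switchable options is a small immutable flag set. This is only a sketch: WorkerFlags and its field names are hypothetical and do not correspond to existing Sample Factory config parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerFlags:
    use_eval_envs: bool = False        # create evaluation environments instead of training ones
    train_on_experience: bool = True   # send collected trajectories to the learner
    record_gifs: bool = False          # save episode frames and write gifs to tensorboard


# Defaults describe a regular training worker: nothing extra is enabled.
TRAINING = WorkerFlags()

# A full evaluation worker: different environments, no training, with visualization.
EVALUATION = WorkerFlags(use_eval_envs=True, train_on_experience=False, record_gifs=True)

# Flags combine freely, e.g. a worker that still trains but also records episodes.
RECORD_ONLY = WorkerFlags(record_gifs=True)
```

With all defaults left untouched the worker behaves as it does today, which fits the requirement that users should not pay for a feature they do not enable.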

signalprime · 2020-10-26T18:13:53Z

Hi @alex-petrenko, the second idea does sound better. I haven't forgotten about this thread; there are a few elements I'm still trying to understand before making the changes. I've forked the project and have been reviewing what needs to be done.

Inside APPO.__init__():

        # Evaluation
        self.cfg.num_eval_workers = 1
        self.eval_traj_buffers = SharedBuffers(self.cfg, self.num_agents, self.obs_space, self.action_space)
        self.evaluation_workers = dict()
        self.eval_actor_workers = None
        self.eval_policy_workers = dict()
        self.eval_policy_queues = dict()        
        self.eval_policy_inputs = [[] for _ in range(self.cfg.num_eval_workers)]
        self.eval_policy_outputs = dict()
        self.eval_report_queue = MpQueue(20 * 1000 * 1000)
        ... ...
        # Evaluation
        if self.cfg.num_eval_workers > 0:
            for worker_idx in range(self.cfg.num_eval_workers):
                for split_idx in range(self.cfg.worker_num_splits):
                    self.eval_policy_outputs[(worker_idx, split_idx)] = dict()
        ... ...
        # Evaluation
        self.eval_writers = dict()
        eval_summary_dir = join(eval_summaries_dir(experiment_dir(cfg=self.cfg)), str(key))
        self.eval_writers[0] = SummaryWriter(eval_summary_dir, flush_secs=20)

So we have an additional summary dir that will overlap training summaries.

Creating the evaluation ActorWorker:

    def create_eval_worker(self, idx, actor_queue):
        learner_queues = {p: w.task_queue for p, w in self.evaluation_workers.items()}

        return ActorWorker(
            self.cfg, self.obs_space, self.action_space,
            num_agents=1, worker_idx=idx, shared_buffers=self.eval_traj_buffers,
            task_queue=actor_queue, policy_queues=self.eval_policy_queues,
            report_queue=self.eval_report_queue, learner_queues=learner_queues,
        )

Things get a little more complex after this. I'm questioning if we need to add logic inside the following definitions, or create new ones with the 'eval' prefix or suffix:

init_subset
init_workers
finish_initialization

It seems the following need modified copies:

process_report
report
print_stats
report_train_summaries
report_experiment_summaries

Could you talk about what items, other than cfg for environment variables, need changes inside the ActorWorker class?

Thanks!

alex-petrenko · 2020-10-27T01:06:20Z

Hi @signalprime !

Thank you for looking into this!

I guess if you like the second idea of just repurposing the ActorWorker class, maybe it makes sense to reuse the existing infrastructure for creating the ActorWorkers, as well as shared buffers, queues, etc.
Why not just agree that, let's say, the last ActorWorker with idx = num_workers-1 will be the evaluation worker?
Then you only need to pass a special flag to this worker, and that's it.

Again, if you do this, you probably don't need to worry about report, process_report and all this stuff (except for handling the new summaries).

The init_subset logic is quite complicated indeed and was added to deal with difficult initialization of some types of environments. First of all, we support restarting the workers if initialization has failed. On top of that, some environments (e.g. VizDoom) struggle to initialize when more than 5-10 environment instances are created simultaneously (hence the init_subset).
I'd say, if you follow the second idea with the flag, you just don't need to worry about all that.
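
The "last worker is the evaluation worker" convention is small enough to sketch directly. The helper name is_eval_worker and the eval_enabled argument are hypothetical.

```python
from typing import List


def is_eval_worker(worker_idx: int, num_workers: int, eval_enabled: bool) -> bool:
    """True only for the last worker (idx = num_workers - 1), and only if evaluation is on."""
    return eval_enabled and worker_idx == num_workers - 1


def worker_roles(num_workers: int, eval_enabled: bool) -> List[str]:
    return [
        "eval" if is_eval_worker(idx, num_workers, eval_enabled) else "train"
        for idx in range(num_workers)
    ]


roles_on = worker_roles(4, eval_enabled=True)    # last of four workers evaluates
roles_off = worker_roles(4, eval_enabled=False)  # unchanged behavior when disabled
```

With this convention the worker creation code, shared buffers and queues stay as they are; only the flag passed to the last worker differs.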

signalprime · 2020-10-28T13:44:16Z

Perfect, yes, so if the evaluation flag is enabled, we'll assume the last worker is for evaluation.
Thanks for the suggestions, I'll be working to get it functional.

alex-petrenko · 2021-01-07T09:27:30Z

It's called on line 461 in appo.py, in the function init_workers():

    self.actor_workers = []
    max_parallel_init = int(1e9)  # might be useful to limit this for some envs
    worker_indices = list(range(self.cfg.num_workers))
    for i in range(0, self.cfg.num_workers, max_parallel_init):
        workers = self.init_subset(worker_indices[i:i + max_parallel_init], actor_queues)
        self.actor_workers.extend(workers)

init_workers is called in the run() function in the same file.

On Tue, Jan 5, 2021 at 12:02, signalprime wrote:

Hi @alex-petrenko, happy 2021! I have been studying the code for the changes needed, specifically inside APPO <https://github.com/alex-petrenko/sample-factory/blob/master/algorithms/appo/appo.py>, and I can't seem to find where init_subset <https://github.com/alex-petrenko/sample-factory/blob/bc8d1cc5b9733566d7919ea5979c8871abc5275d/algorithms/appo/appo.py#L328> is called externally. We see here <https://github.com/alex-petrenko/sample-factory/blob/bc8d1cc5b9733566d7919ea5979c8871abc5275d/algorithms/appo/appo.py#L461> it's referenced from inside itself, but that's the only reference listed. Can you explain how it's being called originally? Thanks!
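
The batching done by the init_workers loop quoted in the reply above can be isolated in a few lines. This is a standalone sketch: init_in_batches is a hypothetical helper, and the init_subset passed to it here is a dummy callback rather than the real APPO method.

```python
from typing import Callable, List, Sequence


def init_in_batches(
    worker_indices: Sequence[int],
    max_parallel_init: int,
    init_subset: Callable[[Sequence[int]], List[str]],
) -> List[str]:
    """Initializes workers at most `max_parallel_init` at a time, preserving order."""
    workers: List[str] = []
    for i in range(0, len(worker_indices), max_parallel_init):
        workers.extend(init_subset(worker_indices[i:i + max_parallel_init]))
    return workers


batches: List[List[int]] = []


def fake_init_subset(indices: Sequence[int]) -> List[str]:
    # Dummy stand-in: records the batch and returns one "worker" name per index.
    batches.append(list(indices))
    return [f"worker_{idx}" for idx in indices]


workers = init_in_batches(list(range(8)), 3, fake_init_subset)
```

With max_parallel_init = int(1e9), as in the quoted code, the loop body runs once and all workers are initialized in a single batch; a smaller value would limit how many environments start up at the same time.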

signalprime · 2021-01-07T16:38:48Z

Thank you Alex. I deleted my post after finding the location as well. My greatest compliments go to you because Sample Factory is truly a work of art. I find it to be 10x faster than other implementations.

alex-petrenko · 2021-01-07T23:30:26Z

No worries! Thank you, I'm glad you find SF helpful! I wish I had more time to work on it and turn it into a proper framework, but it is what it is now. I understand the code is convoluted and hard to read (the price you pay for optimization).

github-actions · 2021-05-14T02:35:47Z

This issue is stale because it has been open for 30 days with no activity.

github-actions · 2021-05-28T03:56:05Z

This issue was closed because it has been inactive for 14 days since being marked as stale.

alex-petrenko · 2023-01-08T23:47:14Z

Reopened to determine requirements.

aadityanema · 2023-01-11T10:41:57Z

Hello, is this planned for implementation? This is a very good and important feature to have.

alex-petrenko · 2023-01-16T04:39:01Z

Yes, I would like to have this in 2.1.0

Any feedback would be very helpful. I.e. what's your use case, how would you use this feature, what implementation details are important.

There is no definite timeline for implementing this, but most likely in the next couple of months.
Prior to that I'd gladly accept PRs/review code if people are willing to work on this!

github-actions bot added the stale label May 14, 2021

github-actions bot closed this as completed May 28, 2021

alex-petrenko reopened this Jan 8, 2023
