Multiple Trials for Reinforcement Learning Suggestion #416
/assign @hougangliu @YujiOshima
pkg/suggestion/nasrl_service.py
@@ -33,11 +35,13 @@ def __init__(self, request, logger):
self.search_space = None
self.opt_direction = None
self.objective_name = None
self.num_trials = 1
api.GetSuggestionsRequest.request_number already defines the number of trials, so you do not need to add a new num_trials in suggestionParameters.
Thanks!
Thank you for pointing that out! We will change it.
Fixed. I removed num_trials from the suggestion parameters and used requestNumber instead.
/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: hougangliu
/retest
Now the reinforcement learning suggestion supports spawning multiple trials in each iteration. Users can enable this by specifying the num_trials suggestion parameter in the StudyJob YAML file.

Please note that this multiple-trial support is only for extending exploration, not for asynchronous updates. The suggestion uses the average metrics of all the trials to calculate the policy gradient, which gives a more reliable evaluation of the LSTM cell's internal state than a single trial does. The suggestion will not generate new candidates until all the previous ones have finished.
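
To illustrate the behavior described here, the following is a minimal sketch, not the code in pkg/suggestion/nasrl_service.py. It assumes a hypothetical `Trial` record and a hypothetical helper `iteration_reward`; neither name comes from this PR. It only shows the two rules stated above: wait until every trial of the iteration has finished, then average their metrics into the single reward that would feed the policy gradient.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Trial:
    # Hypothetical record for one spawned trial (not a Katib type).
    name: str
    metric: Optional[float] = None  # stays None until the trial finishes


def iteration_reward(trials: List[Trial]) -> Optional[float]:
    """Return the averaged metric for one iteration, or None if any trial is unfinished."""
    if not trials:
        raise ValueError("an iteration needs at least one trial")
    # Synchronous update: no reward (and so no new candidates) until all trials are done.
    if any(t.metric is None for t in trials):
        return None
    # The average over all trials is the single reward used for the policy update.
    return mean(t.metric for t in trials)
```

For example, `iteration_reward([Trial("a", 0.5), Trial("b", 1.5)])` gives the mean of the two metrics, while the same call with one unfinished trial returns `None`, signalling that the suggestion should keep waiting.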
Fixes #396