Parallel acquisition function optimizer #438
Conversation
@hstojic, I had to use only a single restart for your active learning notebook. If we use multiple restarts and optimize the acquisition function properly, we seem to try locations of the (batch) search space that break your batch predictive variance acquisition function, i.e. Cholesky errors. You might want to look into this? Even increasing jitter to 1e-2 doesn't fix this!
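For context, "jitter" here refers to the standard trick of adding a small diagonal term to the covariance matrix before taking its Cholesky decomposition. A minimal TensorFlow sketch of that idea (not the Trieste code, just an illustration):

import tensorflow as tf

def cholesky_with_jitter(cov: tf.Tensor, jitter: float = 1e-6) -> tf.Tensor:
    # Standard workaround for numerically non-positive-definite covariance
    # matrices: add a small multiple of the identity to the diagonal. As noted
    # above, this can still fail when the batch contains near-duplicate points.
    n = tf.shape(cov)[-1]
    return tf.linalg.cholesky(cov + jitter * tf.eye(n, dtype=cov.dtype))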
Mostly LGTM.
I'm a bit concerned about tf.function. For speed you probably want to compile your code? It doesn't look like you do that anywhere, and it's not clear to me how/where you plan to do it? Maybe update test_random_search_optimizer_on_toy_problems to compile its function, and make sure everything works like you expect?
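For reference, a minimal sketch of what compiling an acquisition function with tf.function could look like (the acquisition function below is a toy stand-in, not the one in this PR):

import tensorflow as tf

def acquisition(x: tf.Tensor) -> tf.Tensor:
    # Toy acquisition function used purely for illustration.
    return -tf.reduce_sum((x - 0.5) ** 2, axis=-1)

# tf.function traces the Python function into a TensorFlow graph, which is
# usually much faster when it is called many times inside an optimizer loop.
compiled_acquisition = tf.function(acquisition)

x = tf.random.uniform([1000, 2])
values = compiled_acquisition(x)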
def test_continuous_optimizer_on_toy_problems(
    neg_function: Callable[[TensorType], TensorType],
    expected_maximizer: TensorType,
    expected_minimum: TensorType,
It doesn't look like you use this?
what do you mean? use what?
The parameter expected_minimum doesn't appear to be used below. I think you can remove it.
@satrialoka implemented that, we should probably ping him to check this out - @satrialoka do you mind investigating this?
@hstojic Sure!
looks quite good to me
two main points:
- perhaps it has to be done, but scaling the number of runs with dimensionality seems a bit much
- it would be useful to send some optimization metadata to TensorBoard (a minimal sketch of that is below)
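A minimal sketch of how optimization metadata could be written to TensorBoard with tf.summary (the log directory, metric names, and values below are made up for illustration, not anything in Trieste):

import tensorflow as tf

summary_writer = tf.summary.create_file_writer("logs/acquisition_optimizer")

with summary_writer.as_default():
    for step in range(100):
        # e.g. best acquisition value found so far, number of live restarts, ...
        tf.summary.scalar("acquisition/best_value", 0.1 * step, step=step)
        tf.summary.scalar("acquisition/num_live_runs", 1000 - step, step=step)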
looks good now :)
if greenlet.dead:  # Allow for crashed greenlets
    continue
Can they all die? What will we return in that case?
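For what it's worth, here is a small self-contained sketch (with assumed names, not the PR's code) of one way to guard against every run having crashed:

import greenlet as gr

def crashing_run() -> None:
    raise ValueError("this optimization run crashed")

child_greenlets = [gr.greenlet(crashing_run) for _ in range(3)]
child_results = []
for child in child_greenlets:
    try:
        child_results.append(child.switch())
    except ValueError:
        pass  # the crashed greenlet is now dead and contributes no result

# If every run crashed there is nothing sensible to return, so fail loudly
# rather than silently returning an empty list.
if all(child.dead for child in child_greenlets) and not child_results:
    raise RuntimeError("All optimization runs failed; no results to return.")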
return child_results


class ScipyLbfgsBGreenlet(gr.greenlet):
Can greenlets fail with exceptions? Do they catch them all and just show "dead == True"? If not, shall we be catching any outside this greenlet?
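As far as I understand the greenlet library (worth double-checking), an uncaught exception inside a greenlet propagates to the parent greenlet at its switch call and marks the greenlet as dead. A standalone sketch:

import greenlet as gr

def crashing_run() -> None:
    raise ValueError("optimization run failed")

child = gr.greenlet(crashing_run)
try:
    child.switch()  # the child's exception is re-raised here, in the parent
except ValueError as exc:
    print(f"caught in parent: {exc}")

print(child.dead)  # True: the greenlet is dead after raising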
Are you selecting duplicate points within one batch? Do you have an example batch where it fails? Which jitter value did you change?
@icouckuy you can run the notebook but remove the
I have converted @jesnie's parallel acquisition function optimizer for Trieste.
The resulting optimizer is significantly better than our old one, with very little overhead.
All the acquisition function evaluations required for each update step are done in parallel across the many optimization runs using TensorFlow, so the greenlet (micro-thread) implementation adds essentially no overhead while allowing us to run many restarts in parallel. I found no obvious slowdown using 1000 parallel runs compared to our old implementation performing a single run.
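For readers unfamiliar with the pattern, here is a simplified, self-contained sketch of the idea (toy objective and made-up names, not the actual Trieste implementation): each restart runs L-BFGS-B inside its own greenlet, hands the point it wants evaluated back to the parent, and the parent evaluates all pending points in one batched TensorFlow call before switching the values and gradients back.

import greenlet as gr
import numpy as np
import scipy.optimize
import tensorflow as tf


def batch_objective(x: tf.Tensor) -> tf.Tensor:
    # Toy stand-in for an acquisition function, evaluated on an [n, d] batch.
    return tf.reduce_sum((x - 0.3) ** 2, axis=-1)


class LbfgsGreenlet(gr.greenlet):
    """Runs one L-BFGS-B restart, switching to the parent whenever it needs f(x)."""

    def run(self, start: np.ndarray) -> scipy.optimize.OptimizeResult:
        def value_and_grad(x: np.ndarray) -> tuple:
            # Hand the query point to the parent and wait for (value, gradient).
            return self.parent.switch(x)

        return scipy.optimize.minimize(value_and_grad, start, jac=True, method="L-BFGS-B")


def run_parallel_restarts(starts: np.ndarray) -> list:
    children = [LbfgsGreenlet() for _ in starts]
    # The first switch passes the starting point; later switches pass (value, grad).
    pending = [child.switch(start) for child, start in zip(children, starts)]
    results = [None] * len(children)

    while any(not child.dead for child in children):
        alive = [i for i, child in enumerate(children) if not child.dead]
        # Evaluate every outstanding query point in one batched TensorFlow call.
        xs = tf.constant(np.stack([pending[i] for i in alive]), dtype=tf.float64)
        with tf.GradientTape() as tape:
            tape.watch(xs)
            values = batch_objective(xs)
        grads = tape.gradient(values, xs)

        for j, i in enumerate(alive):
            out = children[i].switch((values[j].numpy(), grads[j].numpy()))
            if children[i].dead:
                results[i] = out  # the OptimizeResult returned by run()
            else:
                pending[i] = out  # the next query point

    return results


if __name__ == "__main__":
    optima = run_parallel_restarts(np.random.uniform(size=(5, 2)))
    print([res.x for res in optima])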