
Parallel acquisition function optimizer #438

Merged: 25 commits into secondmind-labs:develop, Nov 29, 2021

Conversation

@henrymoss (Collaborator) commented Nov 25, 2021

I have converted @jesnie's parallel acquisition function optimizer for Trieste.

The resulting optimizer is significantly better than our old one with very little overhead.

All the acquisition function evaluations required for each update step are performed in parallel across the many optimization runs using TensorFlow, so the greenlet (micro-thread) implementation adds essentially no overhead while allowing us to run many restarts in parallel. I found no obvious slowdown using 1000 parallel runs compared to our old implementation performing a single run.

  • I have set the default number of parallel optimization runs to 10 * d, but this could potentially be set much larger.
  • I have also added some unit tests that check optimization performance on more challenging functions (rather than just the quadratic functions we had before). I found that the new optimizer using 1_000 initial points and 10 parallel multi-starts significantly outperforms the old optimizer using 1_000_000 initial points and just a single optimization run.
  • I have added unit tests to check that we use the correct number of multi-starts and recovery runs. These recovery runs are now also performed in parallel.
  • I also fixed the random search optimizer to work with Box search spaces that have their bounds specified as floats.
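
To illustrate the parallel scheme described above, here is a minimal sketch (not the PR's actual code; all names are illustrative) of how greenlets let many scipy L-BFGS-B runs share batched TensorFlow evaluations of the negated acquisition function:

```python
import greenlet as gr
import numpy as np
import scipy.optimize as spo
import tensorflow as tf


class OptimizerGreenlet(gr.greenlet):
    """One L-BFGS-B run; switches to the parent whenever it needs a value and gradient."""

    def run(self, start, bounds):
        def value_and_grad(x):
            value, grad = self.parent.switch(x)  # hand x to the parent, wait for the batched result
            return float(value), grad

        return spo.minimize(value_and_grad, start, jac=True, method="L-BFGS-B", bounds=bounds)


def run_parallel_restarts(neg_acquisition, starts, bounds):
    """Run one optimization per start point, batching acquisition evaluations across all runs."""
    runs = [OptimizerGreenlet() for _ in starts]
    messages = [run.switch(start, bounds) for run, start in zip(runs, starts)]

    while any(not run.dead for run in runs):
        live = [i for i, run in enumerate(runs) if not run.dead]
        # A single batched TensorFlow call serves every live optimization run.
        batch = tf.Variable(np.stack([messages[i] for i in live]))
        with tf.GradientTape() as tape:
            values = neg_acquisition(batch)
        grads = tape.gradient(values, batch)
        for i, value, grad in zip(live, values.numpy(), grads.numpy()):
            # Resume the run with its result; when a run finishes, switch() returns its OptimizeResult.
            messages[i] = runs[i].switch((value, grad))

    return messages  # one scipy OptimizeResult per start point


def neg_acq(x):
    # "negated acquisition": a quadratic bowl, so every run should recover (0.5, 0.5)
    return tf.reduce_sum((x - 0.5) ** 2, axis=-1)


results = run_parallel_restarts(neg_acq, np.random.rand(3, 2), spo.Bounds([0.0, 0.0], [1.0, 1.0]))
```

The point of the batching is that the number of acquisition-function calls is governed by the slowest run rather than by the number of restarts, which is why many parallel runs add little overhead.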

@henrymoss changed the title from "Henry/parallel opt" to "Parallel acquisition function optimizer" on Nov 27, 2021
@henrymoss (Collaborator, Author) commented Nov 27, 2021

@hstojic, I had to use only a single restart for your active learning notebook. If we use multiple restarts and optimize the acquisition function properly, we seem to try locations of the (batch) search space that break your batch predictive variance acquisition function, i.e. Cholesky errors. You might want to look into this? Even increasing jitter to 1e-2 doesn't fix this!
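
The jitter change referred to here isn't shown in the thread; if it was the global GPflow default jitter, it would look roughly like this (the mechanism is an assumption, not code from the PR or the notebook):

```python
import gpflow

# Assumed mechanism: raise the default jitter GPflow adds before taking Cholesky factors.
gpflow.config.set_default_jitter(1e-2)
```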

@henrymoss requested a review from @uri-granta on November 29, 2021, 08:17
@jesnie (Collaborator) left a comment


Mostly LGTM.

I'm a bit concerned about tf.function. For speed you probably want to compile your code? It doesn't look like you do that anywhere, and it's not clear to me how/where you plan to do it. Maybe update test_random_search_optimizer_on_toy_problems to compile its function, and make sure everything works as you expect?
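
For reference, "compiling" here would presumably mean wrapping the function under test in tf.function, along these lines (a sketch using a toy function, not the PR's test code):

```python
import tensorflow as tf


@tf.function  # trace once, then reuse the compiled graph for every batch of query points
def neg_quadratic(x: tf.Tensor) -> tf.Tensor:
    # toy stand-in for the functions used in the toy-problem tests: [..., d] -> [...]
    return -tf.reduce_sum((x - 0.5) ** 2, axis=-1)


print(neg_quadratic(tf.constant([[0.1, 0.2], [0.5, 0.5]])))  # runs through the compiled graph
```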

def test_continuous_optimizer_on_toy_problems(
    neg_function: Callable[[TensorType], TensorType],
    expected_maximizer: TensorType,
    expected_minimum: TensorType,
Collaborator:

It doesn't look like you use this?

Collaborator (Author):

what do you mean? use what?

Collaborator:

> what do you mean? use what?

The parameter expected_minimum doesn't appear to be used below. I think you can remove it.

@hstojic (Collaborator) commented Nov 29, 2021

> @hstojic, I had to use only a single restart for your active learning notebook. If we use multiple restarts and optimize the acquisition function properly, we seem to try locations of the (batch) search space that break your batch predictive variance acquisition function, i.e. Cholesky errors. You might want to look into this? Even increasing jitter to 1e-2 doesn't fix this!

@satrialoka implemented that, we should probably ping him to check this out - @satrialoka do you mind investigating this?

@satrialoka (Collaborator) commented Nov 29, 2021

> @hstojic, I had to use only a single restart for your active learning notebook. If we use multiple restarts and optimize the acquisition function properly, we seem to try locations of the (batch) search space that break your batch predictive variance acquisition function, i.e. Cholesky errors. You might want to look into this? Even increasing jitter to 1e-2 doesn't fix this!
>
> @satrialoka implemented that, we should probably ping him to check this out - @satrialoka do you mind investigating this?

@hstojic Sure!
I'll investigate this.

@henrymoss requested a review from @vpicheny on November 29, 2021, 10:57
@hstojic (Collaborator) left a comment


looks quite good to me

two main points

  1. Perhaps it has to be done, but scaling the number of runs with the dimensionality seems a bit much.
  2. It would be useful to send some optimization metadata to TensorBoard (see the sketch below).
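
On point 2, the kind of per-step metadata that could be sent to TensorBoard might look roughly like this (the quantities and tags are illustrative, not an existing Trieste hook):

```python
import tensorflow as tf

writer = tf.summary.create_file_writer("logs/acquisition_optimizer")


def log_optimization_step(step: int, best_value: float, num_recovery_runs: int) -> None:
    # Illustrative metadata recorded after each acquisition-function optimization.
    with writer.as_default():
        tf.summary.scalar("acquisition/best_value", best_value, step=step)
        tf.summary.scalar("acquisition/num_recovery_runs", num_recovery_runs, step=step)


log_optimization_step(step=0, best_value=1.23, num_recovery_runs=2)
```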

@hstojic (Collaborator) left a comment


looks good now :)

Comment on lines +327 to +328
if greenlet.dead:  # Allow for crashed greenlets
    continue
Collaborator:

Can they all die? What will we return in that case?

return child_results


class ScipyLbfgsBGreenlet(gr.greenlet):
Collaborator:

Can greenlets fail with exceptions? Do they catch them all and show dead == True? If not, should we be catching any outside this greenlet?
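
For context on the question: standard greenlet behaviour (a standalone sketch, independent of the PR's code) is that an uncaught exception inside a greenlet propagates to the parent at the switch call, and the greenlet is then reported as dead:

```python
import greenlet as gr


def crashing_run():
    raise ValueError("optimizer crashed")


child = gr.greenlet(crashing_run)
try:
    child.switch()   # the exception raised inside the greenlet surfaces here, in the parent
except ValueError:
    pass
assert child.dead    # the crashed greenlet is reported as dead
```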

@henrymoss merged commit e3020d9 into secondmind-labs:develop on Nov 29, 2021
@icouckuy commented Dec 3, 2021

> @hstojic, I had to use only a single restart for your active learning notebook. If we use multiple restarts and optimize the acquisition function properly, we seem to try locations of the (batch) search space that break your batch predictive variance acquisition function, i.e. Cholesky errors. You might want to look into this? Even increasing jitter to 1e-2 doesn't fix this!

Are you selecting duplicate points within one batch? Do you have an example batch where it fails? Which jitter value did you change?

@henrymoss (Collaborator, Author):
@icouckuy, you can run the notebook but remove the num_restarts=1 argument to the optimizer to get a concrete example.
