Use batched L-BFGS-B in `gen_candidates_scipy` and change the definition of `batch_limit` #2892
Conversation
This pull request was exported from Phabricator. Differential Revision: D77043251
Use batched L-BFGS-B in `gen_candidates_scipy` and change the definition of `batch_limit` (meta-pytorch#2892)

Summary: This diff changes our acquisition optimization so that, wherever we use L-BFGS-B, each batch element gets its own L-BFGS-B state/optimizer. Doing this without hyperparameter changes only yields a 1.8x gen-time speedup on average, which is why this PR also changes `batch_limit` where applicable. (Most of these improvements do not yet carry over to `optimize_mixed`; that is addressed in a later PR.)

This PR changes our optimization routine to accept a new option keyword, `max_optimization_problem_aggregation_size`, which takes over part of what `batch_limit` meant before. The new meanings are:

1. `batch_limit` is the maximum batch size you want to evaluate during optimization (with gradients), e.g. because you might otherwise run out of memory.
2. `max_optimization_problem_aggregation_size` is the maximum number of acquisition function optimizations you would like to treat as a single optimization problem. It is only used for non-parallel optimizers (i.e. not batched L-BFGS-B or `gen_candidates_torch`). A usage sketch follows this summary.

## Why does this PR speed things up?

1. We converge faster, as each subproblem is simpler. I expect this to be particularly pronounced for low-dimensional problems.
2. When a sub-problem has not converged, only that sub-problem is re-evaluated, saving total compute since the remaining iterations become quicker.

## Caveats?

None, besides step times increasing slightly due to the Python loop in our batched L-BFGS-B implementation.

## Points for discussion

1. I did change tests that otherwise led to problems with out-of-bounds values. This was previously not an issue, and if I see it right, it now breaks things in axolotl. I was working under the assumption that fixed features lie within bounds and that post_processing functions also ensure values lie within bounds after post-processing. Is this assumption right?
2. I added a check that the scipy version is within a specific range for which I have tested this code; batched L-BFGS-B is only enabled then. I used this rather unusual run-time version check because we can fall back to a code path that works across versions, and because I am using private functions from scipy, which might break in subtle ways that public interfaces wouldn't (e.g. the fourth input to the C code being interpreted as the opposite of before). I am happy to instead pin scipy in botorch to at most 1.15, as esantorella hinted in her previous review, even though I prefer the current solution because it avoids a rush on our end whenever scipy is updated.

## Notes on the state of gen_candidates_scipy before this PR (stack)

`gen_candidates_scipy` does not properly support fixed features set to None. They are simply ignored in some cases, e.g.:

```python
from botorch.generation.gen import gen_candidates_scipy
import torch

acquisition_function = lambda x: -x[:, :, :].sum(-1).sum(-1)
gen_candidates_scipy(
    initial_conditions=torch.zeros(1, 1, 3),
    inequality_constraints=[(torch.tensor([0, 1]), torch.tensor([1.0, 1.0]), 1.0)],
    # fixed_features={0: torch.tensor([0.8])},
    fixed_features={0: None},
    acquisition_function=acquisition_function,
    lower_bounds=torch.zeros(3),
    upper_bounds=torch.ones(3),
    options={"batch_limit": 1, "with_grad": False},
)
```

and they break inequality constraints in other cases:

```python
from botorch.generation.gen import gen_candidates_scipy
import torch

acquisition_function = lambda x: x[:, :, :1].sum(-1).sum(-1)
gen_candidates_scipy(
    initial_conditions=torch.zeros(1, 1, 3),
    inequality_constraints=[(torch.tensor([0, 1]), torch.tensor([1.0, 1.0]), 1.0)],
    # fixed_features={0: torch.tensor([0.8])},
    fixed_features={0: 1.0, 2: None},
    acquisition_function=acquisition_function,
    lower_bounds=torch.zeros(3),
    upper_bounds=torch.ones(3),
    options={"batch_limit": 1},
)
```

That is why I am dropping support for None values in `fixed_features` entirely.

Reviewed By: Balandat

Differential Revision: D77043251
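To make the two options concrete, here is a minimal sketch of how they might be combined after this change. It mirrors the call style of the examples above; the toy acquisition function and the specific option values are illustrative assumptions, not taken from the PR diff.

```python
from botorch.generation.gen import gen_candidates_scipy
import torch

# Toy acquisition function: maps a (b, q, d) batch to (b,) values.
acquisition_function = lambda x: -(x**2).sum(-1).sum(-1)

gen_candidates_scipy(
    initial_conditions=torch.rand(16, 1, 3),  # 16 restarts
    acquisition_function=acquisition_function,
    lower_bounds=torch.zeros(3),
    upper_bounds=torch.ones(3),
    options={
        # Evaluate at most 8 candidates per (gradient) forward pass,
        # e.g. to avoid OOMs.
        "batch_limit": 8,
        # For non-parallel optimizers, treat at most 1 acqf optimization
        # as a single optimization problem; batched L-BFGS-B ignores this.
        "max_optimization_problem_aggregation_size": 1,
    },
)
```

The run-time scipy version check discussed in the points above could look roughly like the following sketch. The function name and the exact bound are assumptions for illustration, not the PR's actual code, and `packaging` is assumed to be available.

```python
from importlib.metadata import version

from packaging.version import Version


def batched_lbfgsb_supported() -> bool:
    """Enable batched L-BFGS-B only for scipy versions the private-API
    code path has been tested against; otherwise fall back."""
    # Upper bound reflects the "at most 1.15" discussion; a lower bound
    # could be added the same way.
    return Version(version("scipy")) < Version("1.16")
```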
Force-pushed from 76e4479 to 4ded1b9
Force-pushed from 4ded1b9 to 8857945
Force-pushed from 8857945 to 4594bcf
Force-pushed from 4594bcf to c6aefa8
Force-pushed from c6aefa8 to f8292cf
Codecov Report: all modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff            @@
##              main     #2892   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files          212       212
  Lines        19717     19777     +60
=========================================
+ Hits         19717     19777     +60
=========================================
```

View full report in Codecov by Sentry.
Force-pushed from f8292cf to 491eac6
Force-pushed from 491eac6 to cac2620
Force-pushed from cac2620 to 53f8ba6
Force-pushed from 53f8ba6 to cae6935
Force-pushed from cae6935 to eb0d661
Force-pushed from eb0d661 to 13cf860
Force-pushed from 13cf860 to 826f565
826f565    to
    55cefbb      
    Compare
  
This pull request was exported from Phabricator. Differential Revision: D77043251
This pull request has been merged in 490bf8f.
Summary:
This diff changes our acq optimization in the cases where we use L-BFGS-B to use separate L-BFGS-B states/optimizers for different batch elements.
If we do this without HP changes, we only get 1.8x gen time speedups on average. That is why this PR also changes the `batch_limit` if applicable. (This still does not translate a lot of these improvements to `optimize_mixed`, which is done in a later PR.)
This PR changes our optimization routine to accept a new option keyword: `max_optimization_problem_aggregation_size`. This keyword takes over part of what `batch_limit` was used for before. The new meanings are as follows (a short usage sketch follows this list):
1. `batch_limit` describes the maximum batch you want to evaluate during optimization (with gradients), as you might otherwise expect OOMs.
2. `max_optimization_problem_aggregation_size` is the maximum number of acqf optimizations you would like to treat as a single optimization problem. This is only used for non-parallel optimizers (not batched L-BFGS-B or `gen_candidates_torch`).
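As a rough usage sketch (not from this PR's tests; the toy model and the concrete numbers are illustrative only), both options would be passed through the `options` dict of `optimize_acqf`:
```
# Illustrative only: a toy GP/acqf to show where the two options are passed.
import torch
from botorch.acquisition import qLogExpectedImprovement
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

train_X = torch.rand(10, 3, dtype=torch.double)
train_Y = train_X.sum(dim=-1, keepdim=True)
model = SingleTaskGP(train_X, train_Y)
fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))
acqf = qLogExpectedImprovement(model=model, best_f=train_Y.max())

candidate, value = optimize_acqf(
    acq_function=acqf,
    bounds=torch.tensor([[0.0] * 3, [1.0] * 3], dtype=torch.double),
    q=1,
    num_restarts=8,
    raw_samples=64,
    options={
        # Cap on how many restarts are evaluated jointly (memory concern).
        "batch_limit": 8,
        # Each restart is treated as its own optimization problem (new keyword).
        "max_optimization_problem_aggregation_size": 1,
    },
)
```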
Why does this PR speed things up?
1. We converge faster, as each subproblem is simpler. I could see this being particularly pronounced for problems with small dimensionality.
2. When a sub-problem has not converged, only that sub-problem is re-evaluated, saving total compute, as iterations are then quicker (a naive sketch of this per-element idea follows).
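For intuition only, here is a naive sketch of the per-element idea, not the PR's actual batched implementation: each batch element gets its own L-BFGS-B state, so elements that converge early stop consuming iterations.
```
# Naive per-element optimization: each row of x0_batch gets its own
# L-BFGS-B state; an element that converges early costs no further iterations.
import numpy as np
from scipy.optimize import minimize

def optimize_each_element(f_and_grad, x0_batch, bounds):
    # f_and_grad maps a (d,) point to (value, gradient).
    return np.stack(
        [
            minimize(f_and_grad, x0, jac=True, method="L-BFGS-B", bounds=bounds).x
            for x0 in x0_batch
        ]
    )

# Example: minimize a separable quadratic from four different starting points.
xs = optimize_each_element(
    lambda x: (float((x**2).sum()), 2 * x),
    x0_batch=np.linspace(-1.0, 1.0, 12).reshape(4, 3),
    bounds=[(-1.0, 1.0)] * 3,
)
```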
Caveats?
None, besides step times increasing slightly due to the Python loop in our batched L-BFGS-B implementation.
Points for discussion
1. I did change tests that otherwise led to problems with values being out of bounds. This was previously not an issue, and if I see it right, this now breaks things in axolotl. I was working under the assumption that fixed features will lie within bounds and that post_processing functions will also make sure that things lie within bounds after post-processing. Is this assumption right?
2. I have a check in here to test that the scipy version is within a specific range for which I have tested this code; batched L-BFGS-B will only be enabled then. I used this quite unusual run-time version check because we can fall back to a code path that works across versions, and because I am using private functions from scipy, which might break in subtle ways that public interfaces wouldn't (e.g., interpreting the fourth input to the C code as the opposite of before). I am happy to instead pin the scipy version in botorch to be at most 1.15, like esantorella hinted at in her previous review, even though I prefer the current solution, as it does not put us under time pressure when scipy is updated (a sketch of such a gate follows).
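A minimal sketch of what such a run-time gate could look like; the function name and the version bounds here are placeholders, not the PR's actual tested range:
```
# Placeholder version gate: enable the batched L-BFGS-B path only for scipy
# versions this code was validated against; otherwise use the fallback path.
from importlib.metadata import version

from packaging.version import Version

def use_batched_lbfgsb() -> bool:
    scipy_version = Version(version("scipy"))
    # Bounds are illustrative; "at most 1.15" would mean < 1.16 here.
    return Version("1.12") <= scipy_version < Version("1.16")
```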
Notes on the state of gen_candidates_scipy before this PR (stack)
`gen_candidates_scipy` does not properly support fixed features set to `None`.
It will just be ignored in some cases, e.g.:
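```
from botorch.generation.gen import gen_candidates_scipy
import torch

acquisition_function = lambda x: -x[:, :, :].sum(-1).sum(-1)
gen_candidates_scipy(
    initial_conditions=torch.zeros(1, 1, 3),
    inequality_constraints=[(torch.tensor([0, 1]), torch.tensor([1.0, 1.0]), 1.0)],
    # fixed_features={0: torch.tensor([.8])},
    fixed_features={0: None},
    acquisition_function=acquisition_function,
    lower_bounds=torch.zeros(3),
    upper_bounds=torch.ones(3),
    options={"batch_limit": 1, "with_grad": False},
)
```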
and it breaks inequality constraints in other cases:
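```
from botorch.generation.gen import gen_candidates_scipy
import torch

acquisition_function = lambda x: x[:, :, :1].sum(-1).sum(-1)
gen_candidates_scipy(
    initial_conditions=torch.zeros(1, 1, 3),
    inequality_constraints=[(torch.tensor([0, 1]), torch.tensor([1.0, 1.0]), 1.0)],
    # fixed_features={0: torch.tensor([.8])},
    fixed_features={0: 1., 2: None},
    acquisition_function=acquisition_function,
    lower_bounds=torch.zeros(3),
    upper_bounds=torch.ones(3),
    options={"batch_limit": 1},
)
```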
That is why I am just dropping support for `None` fixed_features generally.
Reviewed By: Balandat
Differential Revision: D77043251