
Conversation

@SamuelGabriel
Contributor

Summary:
This diff changes our acquisition optimization so that, in the cases where we use L-BFGS-B, separate L-BFGS-B states/optimizers are used for the different batch elements.

If we do this without hyperparameter changes, we only get a 1.8x generation-time speedup on average. That is why this PR also changes `batch_limit` where applicable.

(These improvements do not yet carry over to `optimize_mixed`; that is handled in a later PR.)

This PR changes our optimization routine to accept a new option keyword: `max_optimization_problem_aggregation_size`.

This keyword takes over part of what `batch_limit` was used for before.

The new meanings are:

  1. `batch_limit` is the maximum batch size you want to evaluate at once during optimization (with gradients), e.g. because larger batches might cause OOMs.
  2. `max_optimization_problem_aggregation_size` is the maximum number of acquisition-function optimizations you would like to treat as a single optimization problem. It only applies to non-parallel optimizers (i.e., not to batched L-BFGS-B or `gen_candidates_torch`); see the sketch below.
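
As context for how the two options fit together, here is a minimal sketch of passing both through the `options` dict of `optimize_acqf`; the model setup and the concrete values are illustrative, and the new key is assumed to be accepted alongside `batch_limit` as described above.

```
import torch
from botorch.acquisition import ExpectedImprovement
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

# Toy model on random data, just to have an acquisition function to optimize.
train_X = torch.rand(8, 2, dtype=torch.double)
train_Y = train_X.sum(dim=-1, keepdim=True)
model = SingleTaskGP(train_X, train_Y)
fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))
acqf = ExpectedImprovement(model, best_f=train_Y.max())

candidate, value = optimize_acqf(
    acq_function=acqf,
    bounds=torch.tensor([[0.0, 0.0], [1.0, 1.0]], dtype=torch.double),
    q=1,
    num_restarts=16,
    raw_samples=64,
    options={
        # cap on how many restarts are evaluated jointly (with gradients),
        # e.g. to avoid OOMs
        "batch_limit": 8,
        # cap on how many restarts are aggregated into one optimization problem
        "max_optimization_problem_aggregation_size": 1,
    },
)
```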

Why does this PR speed things up?

  1. We converge faster, as each subproblem is simpler. I could see this being particularly pronounced for low-dimensional problems.
  2. When a sub-problem has not converged, only that sub-problem is re-evaluated, saving total compute since the remaining iterations are then quicker (see the sketch below).
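
To make point 2 concrete, here is a toy sketch (not the PR's implementation) contrasting one aggregated L-BFGS-B run over all sub-problems with a separate run per sub-problem; `neg_acqf` is a hypothetical stand-in for a negated acquisition value.

```
import numpy as np
from scipy.optimize import minimize

def neg_acqf(x):  # toy placeholder: (d,) -> scalar
    return float(np.sum((x - 0.3) ** 2))

B, d = 4, 3
x0 = np.random.rand(B, d)

# Aggregated: a single L-BFGS-B state over the concatenated (B * d)-dim
# problem; one slow sub-problem keeps all B sub-problems iterating.
agg = minimize(
    lambda z: sum(neg_acqf(xi) for xi in z.reshape(B, d)),
    x0.ravel(),
    method="L-BFGS-B",
    bounds=[(0.0, 1.0)] * (B * d),
)

# Separate: one L-BFGS-B state per sub-problem; each run stops as soon
# as its own convergence criterion is met.
sep = [
    minimize(neg_acqf, x0[i], method="L-BFGS-B", bounds=[(0.0, 1.0)] * d)
    for i in range(B)
]
```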

Caveats?

None, besides step times increasing slightly due to the Python loop in our batched L-BFGS-B implementation.

Points for discussion

  1. I changed tests that otherwise led to problems with values being out of bounds. This was previously not an issue, and if I see it right, this now breaks things in axolotl. I was working under the assumption that fixed features lie within bounds and that post_processing functions also make sure that values lie within bounds after post-processing. Is this assumption right?

  2. I added a check that the scipy version lies within a specific range for which I have tested this code: batched L-BFGS-B will only be enabled then. I used this quite unusual run-time version check because we can fall back to a code path that works across versions, and because I am using private functions from scipy, which might break in subtle ways that public interfaces wouldn't (e.g., interpreting the fourth input to the C code as the opposite of what it meant before). I am happy to instead pin the scipy version in botorch to at most 1.15, as esantorella hinted in her previous review, even though I prefer the current solution because it avoids putting us under time pressure whenever scipy is updated. See the sketch below.
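
For reference, a minimal sketch of the kind of run-time gate described above (the version bounds and function name are illustrative, not the PR's actual code):

```
import scipy
from packaging import version

def batched_lbfgsb_enabled() -> bool:
    # Enable the batched path only on scipy versions the private-API usage
    # was tested against; otherwise fall back to plain L-BFGS-B.
    v = version.parse(scipy.__version__).release[:2]  # (major, minor)
    return (1, 12) <= v <= (1, 15)
```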

Notes on the state of gen_candidates_scipy before this PR (stack)

`gen_candidates_scipy` does not properly support fixed features set to `None`.

The `None` entry is simply ignored in some cases (the feature is optimized as if it were free), e.g.:

```
from botorch.generation.gen import gen_candidates_scipy
import torch

acquisition_function = lambda x: -x[:, :, :].sum(-1).sum(-1)
gen_candidates_scipy(
    initial_conditions=torch.zeros(1, 1, 3),
    inequality_constraints=[(torch.tensor([0, 1]), torch.tensor([1.0, 1.0]), 1.0)],
    # fixed_features={0: torch.tensor([.8])},
    fixed_features={0: None},
    acquisition_function=acquisition_function,
    lower_bounds=torch.zeros(3),
    upper_bounds=torch.ones(3),
    options={"batch_limit": 1, "with_grad": False},
)
```

and it breaks inequality constraints in other cases:

```
from botorch.generation.gen import gen_candidates_scipy
import torch

acquisition_function = lambda x: x[:, :, :1].sum(-1).sum(-1)
gen_candidates_scipy(
    initial_conditions=torch.zeros(1, 1, 3),
    inequality_constraints=[(torch.tensor([0, 1]), torch.tensor([1.0, 1.0]), 1.0)],
    # fixed_features={0: torch.tensor([.8])},
    fixed_features={0: 1.0, 2: None},
    acquisition_function=acquisition_function,
    lower_bounds=torch.zeros(3),
    upper_bounds=torch.ones(3),
    options={"batch_limit": 1},
)
```

That is why I am dropping support for `None` entries in `fixed_features` entirely.
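
For callers that previously passed `None`, the supported pattern is to pass a concrete value instead (illustrative):

```
# Fix feature 0 to a concrete value instead of None:
fixed_features = {0: 0.8}
```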

Reviewed By: Balandat

Differential Revision: D77043251

@facebook-github-bot added the CLA Signed and fb-exported labels on Jun 24, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D77043251

SamuelGabriel added a commit to SamuelGabriel/botorch that referenced this pull request Jun 24, 2025
…ion of `batch_limit` (meta-pytorch#2892)

@codecov

codecov bot commented Jun 25, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (da44751) to head (55cefbb).
Report is 2 commits behind head on main.

Additional details and impacted files
```
@@            Coverage Diff            @@
##              main     #2892   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files          212       212
  Lines        19717     19777   +60
=========================================
+ Hits         19717     19777   +60
```

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D77043251

SamuelGabriel added a commit to SamuelGabriel/botorch that referenced this pull request Jun 25, 2025
…ion of `batch_limit` (meta-pytorch#2892)

Summary:

This diff changes our acq optimization in the cases where we use L-BFGS-B to use separate L-BFGS-B states/optimizers for different batch elements.

If we do this without HP changes, we only get 1.8x gen time speedups on average. That is why this PR also changes the `batch_limit` if applicable.

(This still does not translate a lot of these improvements to `optimize_mixed`, which is done in a later PR.)

This PR changes our optimization routine to accept a new option keyword: `max_optimization_problem_aggregation_size`.

This keyword is used to mean part of what `batch_limit` was used for before.

The new meanings are:

1. `batch_limit` describes the maximum batch you want to evaluate during optimization (with gradients) as you might expect OOMs.
2. `max_optimization_problem_aggregation_size` is the maximum number of acqf. optimizations you would like to treat as a single optimization problem. This is only used for non-parallel optimizers (not batched L-BFGS-B or gen_candidates_torch).

## Why does this PR speedup things?

1. We converge faster, as each subproblem is simpler. I could see this be particularly pronounced for problems with small dimensionality.
2. When some sub-problem is not converged only that sub-problem is re-evaluated, saving total compute as iterations are quicker then.

## Caveats?

None, besides the step times slightly increasing, due to the loop in Python in our  batched L-BFGS-B implementation.

## Points for discussion

1. I did change tests that otherwise lead to problems for values being out-of-bounds. This was previously not an issue, and if I see it right this now breaks things in axolotl. I was working under the assumption that fixed features will lie within bounds and that post_processing functions will also make sure that things lie within bounds after post-processing. Is this assumption right?

2. I have a check in here to test that the version of scipy is within a specific range for which I have tested this code: batched lbfgsb will only be enabled then. I have used this quite unusual run-time version check, as we can fall back to a version that will work across versions and as I am using private functions from scipy, which might break in subtle ways that public interfaces wouldn't (e.g. interpreting the fourth input to the C code as the opposite from before). I am happy to instead pin the version in botorch to be at most 1.15, like esantorella hinted in her previous review, even though I prefer the current solution for not incurring too much rush on our end when scipy is updated.

## Notes on the state of gen_candidates_scipy before this PR (stack)

Gen candidates scipy does not properly support fixed features set to None.

It will just be ignored in some cases, e.g.:

```
from botorch.generation.gen import gen_candidates_scipy
import torch
acquisition_function = lambda x: -x[:,:,:].sum(-1).sum(-1)
gen_candidates_scipy(
    initial_conditions=torch.zeros(1, 1, 3),
    inequality_constraints=[(torch.tensor([0, 1]), torch.tensor([1., 1.]), 1.)],
    #fixed_features={0: torch.tensor([.8])},
    fixed_features={0: None,},
    acquisition_function=acquisition_function,
    lower_bounds=torch.zeros(3),
    upper_bounds=torch.ones(3),
    options={"batch_limit": 1, "with_grad": False},
)
```
and it breaks inequality constraints in other cases:

```
from botorch.generation.gen import gen_candidates_scipy
import torch
acquisition_function = lambda x: x[:,:,:1].sum(-1).sum(-1)
gen_candidates_scipy(
    initial_conditions=torch.zeros(1, 1, 3),
    inequality_constraints=[(torch.tensor([0, 1]), torch.tensor([1., 1.]), 1.)],
    #fixed_features={0: torch.tensor([.8])},
    fixed_features={0: 1., 2: None},
    acquisition_function=acquisition_function,
    lower_bounds=torch.zeros(3),
    upper_bounds=torch.ones(3),
    options={"batch_limit": 1},
)
```

That is why I am just dropping support for None feature_fixes generally.

Reviewed By: Balandat

Differential Revision: D77043251
SamuelGabriel added a commit to SamuelGabriel/botorch that referenced this pull request Jun 25, 2025
…ion of `batch_limit` (meta-pytorch#2892)

Summary:

This diff changes our acq optimization in the cases where we use L-BFGS-B to use separate L-BFGS-B states/optimizers for different batch elements.

If we do this without HP changes, we only get 1.8x gen time speedups on average. That is why this PR also changes the `batch_limit` if applicable.

(This still does not translate a lot of these improvements to `optimize_mixed`, which is done in a later PR.)

This PR changes our optimization routine to accept a new option keyword: `max_optimization_problem_aggregation_size`.

This keyword is used to mean part of what `batch_limit` was used for before.

The new meanings are:

1. `batch_limit` describes the maximum batch you want to evaluate during optimization (with gradients) as you might expect OOMs.
2. `max_optimization_problem_aggregation_size` is the maximum number of acqf. optimizations you would like to treat as a single optimization problem. This is only used for non-parallel optimizers (not batched L-BFGS-B or gen_candidates_torch).

## Why does this PR speedup things?

1. We converge faster, as each subproblem is simpler. I could see this be particularly pronounced for problems with small dimensionality.
2. When some sub-problem is not converged only that sub-problem is re-evaluated, saving total compute as iterations are quicker then.

## Caveats?

None, besides the step times slightly increasing, due to the loop in Python in our  batched L-BFGS-B implementation.

## Points for discussion

1. I did change tests that otherwise lead to problems for values being out-of-bounds. This was previously not an issue, and if I see it right this now breaks things in axolotl. I was working under the assumption that fixed features will lie within bounds and that post_processing functions will also make sure that things lie within bounds after post-processing. Is this assumption right?

2. I have a check in here to test that the version of scipy is within a specific range for which I have tested this code: batched lbfgsb will only be enabled then. I have used this quite unusual run-time version check, as we can fall back to a version that will work across versions and as I am using private functions from scipy, which might break in subtle ways that public interfaces wouldn't (e.g. interpreting the fourth input to the C code as the opposite from before). I am happy to instead pin the version in botorch to be at most 1.15, like esantorella hinted in her previous review, even though I prefer the current solution for not incurring too much rush on our end when scipy is updated.

## Notes on the state of gen_candidates_scipy before this PR (stack)

Gen candidates scipy does not properly support fixed features set to None.

It will just be ignored in some cases, e.g.:

```
from botorch.generation.gen import gen_candidates_scipy
import torch
acquisition_function = lambda x: -x[:,:,:].sum(-1).sum(-1)
gen_candidates_scipy(
    initial_conditions=torch.zeros(1, 1, 3),
    inequality_constraints=[(torch.tensor([0, 1]), torch.tensor([1., 1.]), 1.)],
    #fixed_features={0: torch.tensor([.8])},
    fixed_features={0: None,},
    acquisition_function=acquisition_function,
    lower_bounds=torch.zeros(3),
    upper_bounds=torch.ones(3),
    options={"batch_limit": 1, "with_grad": False},
)
```
and it breaks inequality constraints in other cases:

```
from botorch.generation.gen import gen_candidates_scipy
import torch
acquisition_function = lambda x: x[:,:,:1].sum(-1).sum(-1)
gen_candidates_scipy(
    initial_conditions=torch.zeros(1, 1, 3),
    inequality_constraints=[(torch.tensor([0, 1]), torch.tensor([1., 1.]), 1.)],
    #fixed_features={0: torch.tensor([.8])},
    fixed_features={0: 1., 2: None},
    acquisition_function=acquisition_function,
    lower_bounds=torch.zeros(3),
    upper_bounds=torch.ones(3),
    options={"batch_limit": 1},
)
```

That is why I am just dropping support for None feature_fixes generally.

Reviewed By: Balandat

Differential Revision: D77043251
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D77043251

1 similar comment
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D77043251

SamuelGabriel added a commit to SamuelGabriel/botorch that referenced this pull request Jun 25, 2025
…ion of `batch_limit` (meta-pytorch#2892)

Summary:
Pull Request resolved: meta-pytorch#2892

This diff changes our acq optimization in the cases where we use L-BFGS-B to use separate L-BFGS-B states/optimizers for different batch elements.

If we do this without HP changes, we only get 1.8x gen time speedups on average. That is why this PR also changes the `batch_limit` if applicable.

(This still does not translate a lot of these improvements to `optimize_mixed`, which is done in a later PR.)

This PR changes our optimization routine to accept a new option keyword: `max_optimization_problem_aggregation_size`.

This keyword is used to mean part of what `batch_limit` was used for before.

The new meanings are:

1. `batch_limit` describes the maximum batch you want to evaluate during optimization (with gradients) as you might expect OOMs.
2. `max_optimization_problem_aggregation_size` is the maximum number of acqf. optimizations you would like to treat as a single optimization problem. This is only used for non-parallel optimizers (not batched L-BFGS-B or gen_candidates_torch).

## Why does this PR speedup things?

1. We converge faster, as each subproblem is simpler. I could see this be particularly pronounced for problems with small dimensionality.
2. When some sub-problem is not converged only that sub-problem is re-evaluated, saving total compute as iterations are quicker then.

## Caveats?

None, besides the step times slightly increasing, due to the loop in Python in our  batched L-BFGS-B implementation.

## Points for discussion

1. I did change tests that otherwise lead to problems for values being out-of-bounds. This was previously not an issue, and if I see it right this now breaks things in axolotl. I was working under the assumption that fixed features will lie within bounds and that post_processing functions will also make sure that things lie within bounds after post-processing. Is this assumption right?

2. I have a check in here to test that the version of scipy is within a specific range for which I have tested this code: batched lbfgsb will only be enabled then. I have used this quite unusual run-time version check, as we can fall back to a version that will work across versions and as I am using private functions from scipy, which might break in subtle ways that public interfaces wouldn't (e.g. interpreting the fourth input to the C code as the opposite from before). I am happy to instead pin the version in botorch to be at most 1.15, like esantorella hinted in her previous review, even though I prefer the current solution for not incurring too much rush on our end when scipy is updated.

## Notes on the state of gen_candidates_scipy before this PR (stack)

Gen candidates scipy does not properly support fixed features set to None.

It will just be ignored in some cases, e.g.:

```
from botorch.generation.gen import gen_candidates_scipy
import torch
acquisition_function = lambda x: -x[:,:,:].sum(-1).sum(-1)
gen_candidates_scipy(
    initial_conditions=torch.zeros(1, 1, 3),
    inequality_constraints=[(torch.tensor([0, 1]), torch.tensor([1., 1.]), 1.)],
    #fixed_features={0: torch.tensor([.8])},
    fixed_features={0: None,},
    acquisition_function=acquisition_function,
    lower_bounds=torch.zeros(3),
    upper_bounds=torch.ones(3),
    options={"batch_limit": 1, "with_grad": False},
)
```
and it breaks inequality constraints in other cases:

```
from botorch.generation.gen import gen_candidates_scipy
import torch
acquisition_function = lambda x: x[:,:,:1].sum(-1).sum(-1)
gen_candidates_scipy(
    initial_conditions=torch.zeros(1, 1, 3),
    inequality_constraints=[(torch.tensor([0, 1]), torch.tensor([1., 1.]), 1.)],
    #fixed_features={0: torch.tensor([.8])},
    fixed_features={0: 1., 2: None},
    acquisition_function=acquisition_function,
    lower_bounds=torch.zeros(3),
    upper_bounds=torch.ones(3),
    options={"batch_limit": 1},
)
```

That is why I am just dropping support for None feature_fixes generally.

Reviewed By: Balandat

Differential Revision: D77043251
SamuelGabriel added a commit to SamuelGabriel/botorch that referenced this pull request Jun 25, 2025
…ion of `batch_limit` (meta-pytorch#2892)

Summary:

This diff changes our acq optimization in the cases where we use L-BFGS-B to use separate L-BFGS-B states/optimizers for different batch elements.

If we do this without HP changes, we only get 1.8x gen time speedups on average. That is why this PR also changes the `batch_limit` if applicable.

(This still does not translate a lot of these improvements to `optimize_mixed`, which is done in a later PR.)

This PR changes our optimization routine to accept a new option keyword: `max_optimization_problem_aggregation_size`.

This keyword is used to mean part of what `batch_limit` was used for before.

The new meanings are:

1. `batch_limit` describes the maximum batch you want to evaluate during optimization (with gradients) as you might expect OOMs.
2. `max_optimization_problem_aggregation_size` is the maximum number of acqf. optimizations you would like to treat as a single optimization problem. This is only used for non-parallel optimizers (not batched L-BFGS-B or gen_candidates_torch).

## Why does this PR speedup things?

1. We converge faster, as each subproblem is simpler. I could see this be particularly pronounced for problems with small dimensionality.
2. When some sub-problem is not converged only that sub-problem is re-evaluated, saving total compute as iterations are quicker then.

## Caveats?

None, besides the step times slightly increasing, due to the loop in Python in our  batched L-BFGS-B implementation.

## Points for discussion

1. I did change tests that otherwise lead to problems for values being out-of-bounds. This was previously not an issue, and if I see it right this now breaks things in axolotl. I was working under the assumption that fixed features will lie within bounds and that post_processing functions will also make sure that things lie within bounds after post-processing. Is this assumption right?

2. I have a check in here to test that the version of scipy is within a specific range for which I have tested this code: batched lbfgsb will only be enabled then. I have used this quite unusual run-time version check, as we can fall back to a version that will work across versions and as I am using private functions from scipy, which might break in subtle ways that public interfaces wouldn't (e.g. interpreting the fourth input to the C code as the opposite from before). I am happy to instead pin the version in botorch to be at most 1.15, like esantorella hinted in her previous review, even though I prefer the current solution for not incurring too much rush on our end when scipy is updated.

## Notes on the state of gen_candidates_scipy before this PR (stack)

Gen candidates scipy does not properly support fixed features set to None.

It will just be ignored in some cases, e.g.:

```
from botorch.generation.gen import gen_candidates_scipy
import torch
acquisition_function = lambda x: -x[:,:,:].sum(-1).sum(-1)
gen_candidates_scipy(
    initial_conditions=torch.zeros(1, 1, 3),
    inequality_constraints=[(torch.tensor([0, 1]), torch.tensor([1., 1.]), 1.)],
    #fixed_features={0: torch.tensor([.8])},
    fixed_features={0: None,},
    acquisition_function=acquisition_function,
    lower_bounds=torch.zeros(3),
    upper_bounds=torch.ones(3),
    options={"batch_limit": 1, "with_grad": False},
)
```
and it breaks inequality constraints in other cases:

```
from botorch.generation.gen import gen_candidates_scipy
import torch
acquisition_function = lambda x: x[:,:,:1].sum(-1).sum(-1)
gen_candidates_scipy(
    initial_conditions=torch.zeros(1, 1, 3),
    inequality_constraints=[(torch.tensor([0, 1]), torch.tensor([1., 1.]), 1.)],
    #fixed_features={0: torch.tensor([.8])},
    fixed_features={0: 1., 2: None},
    acquisition_function=acquisition_function,
    lower_bounds=torch.zeros(3),
    upper_bounds=torch.ones(3),
    options={"batch_limit": 1},
)
```

That is why I am just dropping support for None feature_fixes generally.

Reviewed By: Balandat

Differential Revision: D77043251
SamuelGabriel added a commit to SamuelGabriel/botorch that referenced this pull request Jun 25, 2025
…ion of `batch_limit` (meta-pytorch#2892)

Summary:

This diff changes our acq optimization in the cases where we use L-BFGS-B to use separate L-BFGS-B states/optimizers for different batch elements.

If we do this without HP changes, we only get 1.8x gen time speedups on average. That is why this PR also changes the `batch_limit` if applicable.

(This still does not translate a lot of these improvements to `optimize_mixed`, which is done in a later PR.)

This PR changes our optimization routine to accept a new option keyword: `max_optimization_problem_aggregation_size`.

This keyword is used to mean part of what `batch_limit` was used for before.

The new meanings are:

1. `batch_limit` describes the maximum batch you want to evaluate during optimization (with gradients) as you might expect OOMs.
2. `max_optimization_problem_aggregation_size` is the maximum number of acqf. optimizations you would like to treat as a single optimization problem. This is only used for non-parallel optimizers (not batched L-BFGS-B or gen_candidates_torch).

## Why does this PR speedup things?

1. We converge faster, as each subproblem is simpler. I could see this be particularly pronounced for problems with small dimensionality.
2. When some sub-problem is not converged only that sub-problem is re-evaluated, saving total compute as iterations are quicker then.

## Caveats?

None, besides the step times slightly increasing, due to the loop in Python in our  batched L-BFGS-B implementation.

## Points for discussion

1. I did change tests that otherwise lead to problems for values being out-of-bounds. This was previously not an issue, and if I see it right this now breaks things in axolotl. I was working under the assumption that fixed features will lie within bounds and that post_processing functions will also make sure that things lie within bounds after post-processing. Is this assumption right?

2. I have a check in here to test that the version of scipy is within a specific range for which I have tested this code: batched lbfgsb will only be enabled then. I have used this quite unusual run-time version check, as we can fall back to a version that will work across versions and as I am using private functions from scipy, which might break in subtle ways that public interfaces wouldn't (e.g. interpreting the fourth input to the C code as the opposite from before). I am happy to instead pin the version in botorch to be at most 1.15, like esantorella hinted in her previous review, even though I prefer the current solution for not incurring too much rush on our end when scipy is updated.

## Notes on the state of gen_candidates_scipy before this PR (stack)

Gen candidates scipy does not properly support fixed features set to None.

It will just be ignored in some cases, e.g.:

```
from botorch.generation.gen import gen_candidates_scipy
import torch
acquisition_function = lambda x: -x[:,:,:].sum(-1).sum(-1)
gen_candidates_scipy(
    initial_conditions=torch.zeros(1, 1, 3),
    inequality_constraints=[(torch.tensor([0, 1]), torch.tensor([1., 1.]), 1.)],
    #fixed_features={0: torch.tensor([.8])},
    fixed_features={0: None,},
    acquisition_function=acquisition_function,
    lower_bounds=torch.zeros(3),
    upper_bounds=torch.ones(3),
    options={"batch_limit": 1, "with_grad": False},
)
```
and it breaks inequality constraints in other cases:

```
from botorch.generation.gen import gen_candidates_scipy
import torch
acquisition_function = lambda x: x[:,:,:1].sum(-1).sum(-1)
gen_candidates_scipy(
    initial_conditions=torch.zeros(1, 1, 3),
    inequality_constraints=[(torch.tensor([0, 1]), torch.tensor([1., 1.]), 1.)],
    #fixed_features={0: torch.tensor([.8])},
    fixed_features={0: 1., 2: None},
    acquisition_function=acquisition_function,
    lower_bounds=torch.zeros(3),
    upper_bounds=torch.ones(3),
    options={"batch_limit": 1},
)
```

That is why I am just dropping support for None feature_fixes generally.

Reviewed By: Balandat

Differential Revision: D77043251
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D77043251

SamuelGabriel added a commit to SamuelGabriel/botorch that referenced this pull request Jun 25, 2025
…ion of `batch_limit` (meta-pytorch#2892)

Summary:
Pull Request resolved: meta-pytorch#2892

This diff changes our acq optimization in the cases where we use L-BFGS-B to use separate L-BFGS-B states/optimizers for different batch elements.

If we do this without HP changes, we only get 1.8x gen time speedups on average. That is why this PR also changes the `batch_limit` if applicable.

(This still does not translate a lot of these improvements to `optimize_mixed`, which is done in a later PR.)

This PR changes our optimization routine to accept a new option keyword: `max_optimization_problem_aggregation_size`.

This keyword is used to mean part of what `batch_limit` was used for before.

The new meanings are:

1. `batch_limit` describes the maximum batch you want to evaluate during optimization (with gradients) as you might expect OOMs.
2. `max_optimization_problem_aggregation_size` is the maximum number of acqf. optimizations you would like to treat as a single optimization problem. This is only used for non-parallel optimizers (not batched L-BFGS-B or gen_candidates_torch).

## Why does this PR speedup things?

1. We converge faster, as each subproblem is simpler. I could see this be particularly pronounced for problems with small dimensionality.
2. When some sub-problem is not converged only that sub-problem is re-evaluated, saving total compute as iterations are quicker then.

## Caveats?

None, besides the step times slightly increasing, due to the loop in Python in our  batched L-BFGS-B implementation.

## Points for discussion

1. I did change tests that otherwise lead to problems for values being out-of-bounds. This was previously not an issue, and if I see it right this now breaks things in axolotl. I was working under the assumption that fixed features will lie within bounds and that post_processing functions will also make sure that things lie within bounds after post-processing. Is this assumption right?

2. I have a check in here to test that the version of scipy is within a specific range for which I have tested this code: batched lbfgsb will only be enabled then. I have used this quite unusual run-time version check, as we can fall back to a version that will work across versions and as I am using private functions from scipy, which might break in subtle ways that public interfaces wouldn't (e.g. interpreting the fourth input to the C code as the opposite from before). I am happy to instead pin the version in botorch to be at most 1.15, like esantorella hinted in her previous review, even though I prefer the current solution for not incurring too much rush on our end when scipy is updated.

## Notes on the state of gen_candidates_scipy before this PR (stack)

Gen candidates scipy does not properly support fixed features set to None.

It will just be ignored in some cases, e.g.:

```
from botorch.generation.gen import gen_candidates_scipy
import torch
acquisition_function = lambda x: -x[:,:,:].sum(-1).sum(-1)
gen_candidates_scipy(
    initial_conditions=torch.zeros(1, 1, 3),
    inequality_constraints=[(torch.tensor([0, 1]), torch.tensor([1., 1.]), 1.)],
    #fixed_features={0: torch.tensor([.8])},
    fixed_features={0: None,},
    acquisition_function=acquisition_function,
    lower_bounds=torch.zeros(3),
    upper_bounds=torch.ones(3),
    options={"batch_limit": 1, "with_grad": False},
)
```
and it breaks inequality constraints in other cases:

```
from botorch.generation.gen import gen_candidates_scipy
import torch
acquisition_function = lambda x: x[:,:,:1].sum(-1).sum(-1)
gen_candidates_scipy(
    initial_conditions=torch.zeros(1, 1, 3),
    inequality_constraints=[(torch.tensor([0, 1]), torch.tensor([1., 1.]), 1.)],
    #fixed_features={0: torch.tensor([.8])},
    fixed_features={0: 1., 2: None},
    acquisition_function=acquisition_function,
    lower_bounds=torch.zeros(3),
    upper_bounds=torch.ones(3),
    options={"batch_limit": 1},
)
```

That is why I am just dropping support for None feature_fixes generally.

Reviewed By: Balandat

Differential Revision: D77043251
SamuelGabriel added a commit to SamuelGabriel/botorch that referenced this pull request Jun 25, 2025
…ion of `batch_limit` (meta-pytorch#2892)

Summary:

This diff changes our acq optimization in the cases where we use L-BFGS-B to use separate L-BFGS-B states/optimizers for different batch elements.

If we do this without HP changes, we only get 1.8x gen time speedups on average. That is why this PR also changes the `batch_limit` if applicable.

(This still does not translate a lot of these improvements to `optimize_mixed`, which is done in a later PR.)

This PR changes our optimization routine to accept a new option keyword: `max_optimization_problem_aggregation_size`.

This keyword is used to mean part of what `batch_limit` was used for before.

The new meanings are:

1. `batch_limit` describes the maximum batch you want to evaluate during optimization (with gradients) as you might expect OOMs.
2. `max_optimization_problem_aggregation_size` is the maximum number of acqf. optimizations you would like to treat as a single optimization problem. This is only used for non-parallel optimizers (not batched L-BFGS-B or gen_candidates_torch).

## Why does this PR speedup things?

1. We converge faster, as each subproblem is simpler. I could see this be particularly pronounced for problems with small dimensionality.
2. When some sub-problem is not converged only that sub-problem is re-evaluated, saving total compute as iterations are quicker then.

## Caveats?

None, besides the step times slightly increasing, due to the loop in Python in our  batched L-BFGS-B implementation.

## Points for discussion

1. I did change tests that otherwise lead to problems for values being out-of-bounds. This was previously not an issue, and if I see it right this now breaks things in axolotl. I was working under the assumption that fixed features will lie within bounds and that post_processing functions will also make sure that things lie within bounds after post-processing. Is this assumption right?

2. I have a check in here to test that the version of scipy is within a specific range for which I have tested this code: batched lbfgsb will only be enabled then. I have used this quite unusual run-time version check, as we can fall back to a version that will work across versions and as I am using private functions from scipy, which might break in subtle ways that public interfaces wouldn't (e.g. interpreting the fourth input to the C code as the opposite from before). I am happy to instead pin the version in botorch to be at most 1.15, like esantorella hinted in her previous review, even though I prefer the current solution for not incurring too much rush on our end when scipy is updated.

## Notes on the state of gen_candidates_scipy before this PR (stack)

Gen candidates scipy does not properly support fixed features set to None.

It will just be ignored in some cases, e.g.:

```
from botorch.generation.gen import gen_candidates_scipy
import torch
acquisition_function = lambda x: -x[:,:,:].sum(-1).sum(-1)
gen_candidates_scipy(
    initial_conditions=torch.zeros(1, 1, 3),
    inequality_constraints=[(torch.tensor([0, 1]), torch.tensor([1., 1.]), 1.)],
    #fixed_features={0: torch.tensor([.8])},
    fixed_features={0: None,},
    acquisition_function=acquisition_function,
    lower_bounds=torch.zeros(3),
    upper_bounds=torch.ones(3),
    options={"batch_limit": 1, "with_grad": False},
)
```
and it breaks inequality constraints in other cases:

```
from botorch.generation.gen import gen_candidates_scipy
import torch
acquisition_function = lambda x: x[:,:,:1].sum(-1).sum(-1)
gen_candidates_scipy(
    initial_conditions=torch.zeros(1, 1, 3),
    inequality_constraints=[(torch.tensor([0, 1]), torch.tensor([1., 1.]), 1.)],
    #fixed_features={0: torch.tensor([.8])},
    fixed_features={0: 1., 2: None},
    acquisition_function=acquisition_function,
    lower_bounds=torch.zeros(3),
    upper_bounds=torch.ones(3),
    options={"batch_limit": 1},
)
```

That is why I am just dropping support for None feature_fixes generally.

Reviewed By: Balandat

Differential Revision: D77043251
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D77043251

SamuelGabriel added a commit to SamuelGabriel/botorch that referenced this pull request Jun 25, 2025
…ion of `batch_limit` (meta-pytorch#2892)

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D77043251

SamuelGabriel added a commit to SamuelGabriel/botorch that referenced this pull request Jun 25, 2025
…ion of `batch_limit` (meta-pytorch#2892)

Summary:

This diff changes our acquisition optimization, in the cases where we use L-BFGS-B, to use separate L-BFGS-B states/optimizers for different batch elements.
This PR gives a good speedup for acquisition function optimization with L-BFGS-B; something like 4x can be expected.
This can be increased to closer to 10x by also changing the `batch_limit`, but we defer that to another PR, since it might lead to memory issues.

`optimize_mixed` is barely sped up by this PR, but we have a follow-up fixing that.

SamuelGabriel added a commit to SamuelGabriel/botorch that referenced this pull request Jun 25, 2025
…ion of `batch_limit` (meta-pytorch#2892)

SamuelGabriel added a commit to SamuelGabriel/botorch that referenced this pull request Jun 25, 2025
…ion of `batch_limit` (meta-pytorch#2892)

@facebook-github-bot
Contributor

This pull request has been merged in 490bf8f.
