Variable candidates in multi-objective, multi-fidelity setting #2833

EvanClaes · 2025-04-24T17:08:08Z

EvanClaes
Apr 24, 2025

Hi all,

I'm doing multi-objective, multi-fidelity bayesian optimization in botorch. I’m using qNEHVI as the acquisition function, and MultiTaskGP as the surrogate model. I'm tuning six parameters, and have two objectives. My high fidelity dataset has four observations, my low fidelity dataset has around 500. I'm generating four new candidates. My problem is that the four candidates that are being generated, are quite variable.

Is this due to the sparse high fidelity data?
How can I diagnose the problem, and what would be potential solutions?

Below you can find my code:

import pandas as pd
import numpy as np
import torch
import os
import matplotlib.pyplot as plt
from gpytorch.mlls.sum_marginal_log_likelihood import ExactMarginalLogLikelihood
from botorch import fit_gpytorch_mll
from botorch.utils.transforms import unnormalize, normalize
from botorch.optim import optimize_acqf
from scipy.optimize import minimize
from botorch.sampling.normal import SobolQMCNormalSampler
from botorch.models.transforms.outcome import Standardize
from botorch.models.model_list_gp_regression import ModelListGP
from gpytorch.mlls.sum_marginal_log_likelihood import SumMarginalLogLikelihood
from botorch.models import MultiTaskGP
from botorch.utils.multi_objective.box_decompositions.dominated import DominatedPartitioning
from botorch.acquisition.multi_objective.logei import qLogNoisyExpectedHypervolumeImprovement

def initialize_model(train_x, train_obj, bounds, train_noise):
    train_x_norm = normalize(train_x, bounds)
    models = []
    for i in range(train_obj.shape[-1]):
        train_y = train_obj[..., i : i + 1]
        train_yvar = train_noise[..., i : i + 1]
        models.append(
            MultiTaskGP(
                train_x_norm, train_y, task_feature=-1,  output_tasks = [1], outcome_transform=Standardize(m=1)
            )
        )
    model = ModelListGP(*models)
    mll = SumMarginalLogLikelihood(model.likelihood, model)
    return mll, model

tkwargs = {
    "dtype": torch.double,
    "device": torch.device("cuda" if torch.cuda.is_available() else "cpu"),
}
SMOKE_TEST = os.environ.get("SMOKE_TEST")
BATCH_SIZE = 4
NUM_RESTARTS = 50
RAW_SAMPLES = 4096
MC_SAMPLES = 128

#import data
highFdata = pd.read_excel('ANT-0434 - DMEM 766.xlsx')
lowFdata = pd.read_excel('LFdata.xlsx')

#define bounds, noise levels and reference point
actualBoundsMT = torch.tensor([[10,24,0.5,1,700,200,0],[60,192,6,6,2500,500,1]], dtype=torch.float64)
qnehviBounds = torch.tensor([[0,0,0,0,0,0,0],[1,1,1,1,1,1,1]], dtype=torch.float64)
NOISE_SE_highF = torch.tensor([0.56/np.sqrt(6), 2.86/np.sqrt(6)], **tkwargs)
NOISE_SE_lowF = torch.tensor([0, 0], **tkwargs)
refPoint = torch.tensor([0,0], dtype=torch.float64)

#define constraint on two of the input variables
inequality_constraint = [ #(800,200) as bottom left point, instead of (700,200)
(torch.tensor([4, 5], dtype=torch.long),  # Parameter indices
torch.tensor([(18/5), -1.0], dtype=torch.double),  # Coefficients
(1/5))  # Right-hand side. This is >= by default
]

#make the training data
train_x1 = torch.tensor(highFdata.iloc[:,1:7].values)
train_obj1 = torch.tensor(highFdata.iloc[:,7:9].values)
train_x2 = torch.tensor(lowFdata.iloc[:,0:6].values)
train_obj2 = torch.tensor(lowFdata.iloc[:,6:8].values)
train_x1 = torch.cat([train_x1, torch.ones(train_x1.shape[0], 1) ], dim=1)
train_x2 = torch.cat([train_x2, torch.zeros(train_x2.shape[0], 1) ], dim=1)

train_x = torch.cat([train_x1, train_x2], dim=0)
train_obj = torch.cat([train_obj1, train_obj2])
train_noise = torch.cat([NOISE_SE_highF.repeat(highFdata.shape[0],1)**2,NOISE_SE_lowF.repeat(lowFdata.shape[0],1)**2])

#fit model
mll, model = initialize_model(train_x, train_obj, actualBoundsMT, train_noise)
fit_gpytorch_mll(mll)

#compute hypervolume
bd = DominatedPartitioning(ref_point=refPoint, Y=train_obj)
volume = bd.compute_hypervolume().item()
qnehvi_sampler = SobolQMCNormalSampler(sample_shape=torch.Size([128]))

#partition non-dominated space into disjoint rectangles
acq_func = qLogNoisyExpectedHypervolumeImprovement(
    model=model,
    ref_point=refPoint,
    X_baseline=normalize(train_x1, actualBoundsMT),
    prune_baseline=True,
    sampler=qnehvi_sampler,
)

#optimize
candidates, _ = optimize_acqf(
    acq_function=acq_func,
    bounds=qnehviBounds,
    q=BATCH_SIZE,
    num_restarts=NUM_RESTARTS,
    raw_samples=RAW_SAMPLES,
    options={"batch_limit": 5, "maxiter": 200},
    sequential=True,
    fixed_features = {6: 1},
    inequality_constraints=inequality_constraint,
)

print(candidates)

HFdata.xlsx
LFdata.xlsx

Answered by Balandat

May 3, 2025

With validating LF data PF points on HF objectives, do you mean evaluating them through the HF process?

Yes. Unfortunately that's really the only way to know for sure whether the LF data can serve as an effective proxy.

I guess the fundamental problem/limitation in this case might be the sparsity of the HF data.

Indeed.

The approach I followed for now is to evaluate the HF design points and the suggested candidates with the LF process, and calculate the hypervolume improvement (resulting from adding the LF evaluations of the suggested candidates). Then I select the candidate set (out of the 5 shown in the figures) that has the highest hypervolume improvement. What do you think?

Give…

View full answer

eytan · 2025-04-25T03:07:03Z

eytan
Apr 25, 2025

Hi Evan, With only 4 observations you are going to have a good deal of uncertainty in your model, and if you are doing multi objective optimization in a 6d space, there are likely to be many possible solutions that increase your hyper volume, so I wouldn’t expect the optimizer to consistently pick the same set of promising points. Section 6 of https://jmlr.org/papers/volume20/18-225/18-225.pdf provides some intuitions about how the MTGP (ICM) model will behave, depending on the correlation between the tasks. Have you tried plotting the values at high fidelity against the values at low fidelity? You might need a few more than 4 data points to get a sense of how correlated your high fidelity and low fidelity tasks are, but if you are able to perform those evaluations, you may quickly find out whether the AF is identifying something useful.

…

On Thu, Apr 24, 2025 at 1:08 PM EvanClaes ***@***.***> wrote: Hi all, I'm doing multi-objective, multi-fidelity bayesian optimization in botorch. I’m using qNEHVI as the acquisition function, and MultiTaskGP as the surrogate model. I'm tuning six parameters, and have two objectives. My high fidelity dataset has four observations, my low fidelity dataset has around 500. I'm generating four new candidates. My problem is that the four candidates that are being generated, are quite variable. Is this due to the sparse high fidelity data? How can I diagnose the problem, and what would be potential solutions? Below you can find my code: import pandas as pd import numpy as np import torch import os import matplotlib.pyplot as plt from gpytorch.mlls.sum_marginal_log_likelihood import ExactMarginalLogLikelihood from botorch import fit_gpytorch_mll from botorch.utils.transforms import unnormalize, normalize from botorch.optim import optimize_acqf from scipy.optimize import minimize from botorch.sampling.normal import SobolQMCNormalSampler from botorch.models.transforms.outcome import Standardize from botorch.models.model_list_gp_regression import ModelListGP from gpytorch.mlls.sum_marginal_log_likelihood import SumMarginalLogLikelihood from botorch.models import MultiTaskGP from botorch.utils.multi_objective.box_decompositions.dominated import DominatedPartitioning from botorch.acquisition.multi_objective.logei import qLogNoisyExpectedHypervolumeImprovement def initialize_model(train_x, train_obj, bounds, train_noise): train_x_norm = normalize(train_x, bounds) models = [] for i in range(train_obj.shape[-1]): train_y = train_obj[..., i : i + 1] train_yvar = train_noise[..., i : i + 1] models.append( MultiTaskGP( train_x_norm, train_y, task_feature=-1, output_tasks = [1], outcome_transform=Standardize(m=1) ) ) model = ModelListGP(*models) mll = SumMarginalLogLikelihood(model.likelihood, model) return mll, model tkwargs = { "dtype": torch.double, "device": torch.device("cuda" if torch.cuda.is_available() else "cpu"), } SMOKE_TEST = os.environ.get("SMOKE_TEST") BATCH_SIZE = 4 NUM_RESTARTS = 50 RAW_SAMPLES = 4096 MC_SAMPLES = 128 #import data highFdata = pd.read_excel('ANT-0434 - DMEM 766.xlsx') lowFdata = pd.read_excel('LFdata.xlsx') #define bounds, noise levels and reference point actualBoundsMT = torch.tensor([[10,24,0.5,1,700,200,0],[60,192,6,6,2500,500,1]], dtype=torch.float64) qnehviBounds = torch.tensor([[0,0,0,0,0,0,0],[1,1,1,1,1,1,1]], dtype=torch.float64) NOISE_SE_highF = torch.tensor([0.56/np.sqrt(6), 2.86/np.sqrt(6)], **tkwargs) NOISE_SE_lowF = torch.tensor([0, 0], **tkwargs) refPoint = torch.tensor([0,0], dtype=torch.float64) #define constraint on two of the input variables inequality_constraint = [ #(800,200) as bottom left point, instead of (700,200) (torch.tensor([4, 5], dtype=torch.long), # Parameter indices torch.tensor([(18/5), -1.0], dtype=torch.double), # Coefficients (1/5)) # Right-hand side. This is >= by default ] #make the training data train_x1 = torch.tensor(highFdata.iloc[:,1:7].values) train_obj1 = torch.tensor(highFdata.iloc[:,7:9].values) train_x2 = torch.tensor(lowFdata.iloc[:,0:6].values) train_obj2 = torch.tensor(lowFdata.iloc[:,6:8].values) train_x1 = torch.cat([train_x1, torch.ones(train_x1.shape[0], 1) ], dim=1) train_x2 = torch.cat([train_x2, torch.zeros(train_x2.shape[0], 1) ], dim=1) train_x = torch.cat([train_x1, train_x2], dim=0) train_obj = torch.cat([train_obj1, train_obj2]) train_noise = torch.cat([NOISE_SE_highF.repeat(highFdata.shape[0],1)**2,NOISE_SE_lowF.repeat(lowFdata.shape[0],1)**2]) #fit model mll, model = initialize_model(train_x, train_obj, actualBoundsMT, train_noise) fit_gpytorch_mll(mll) #compute hypervolume bd = DominatedPartitioning(ref_point=refPoint, Y=train_obj) volume = bd.compute_hypervolume().item() qnehvi_sampler = SobolQMCNormalSampler(sample_shape=torch.Size([128])) #partition non-dominated space into disjoint rectangles acq_func = qLogNoisyExpectedHypervolumeImprovement( model=model, ref_point=refPoint, X_baseline=normalize(train_x1, actualBoundsMT), prune_baseline=True, sampler=qnehvi_sampler, ) #optimize candidates, _ = optimize_acqf( acq_function=acq_func, bounds=qnehviBounds, q=BATCH_SIZE, num_restarts=NUM_RESTARTS, raw_samples=RAW_SAMPLES, options={"batch_limit": 5, "maxiter": 200}, sequential=True, fixed_features = {6: 1}, inequality_constraints=inequality_constraint, ) print(candidates) HFdata.xlsx <https://github.com/user-attachments/files/19896203/HFdata.xlsx> LFdata.xlsx <https://github.com/user-attachments/files/19896204/LFdata.xlsx> — Reply to this email directly, view it on GitHub <#2833>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAW34NVWNMOHQOCZ3X7B3D23ELA5AVCNFSM6AAAAAB3ZSIIE2VHI2DSMVQWIX3LMV43ERDJONRXK43TNFXW4OZYGIZTSNZZGI> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

1 reply

EvanClaes Apr 25, 2025
Author

Hello Eytan,

Thanks for the feedback. the pearson correlations between the 4 high fidelity observations and the 4 low fidelity observations (with the same input values) are very high, around 0,99. Your results seem to indicate that a multi-fidelity approach should be beneficial in this case. I do have to add that, in the objective space, the points are not very well spaced. I have two towards the lower end, and two towards the higher end...

If my approach is correct and valid, then I guess we should just proceed with one particular set of generated candidates. Do you think it makes sense here to further increase the number of low-fidelity samples, or some of the optimizer parameters, before we generate these?

Enjoy your weekend!

eytan · 2025-04-25T18:11:53Z

eytan
Apr 25, 2025

Yes, I would recommend proceeding. With only 4 points it's hard to tell if the correlation is 0.99, but so far the correlation is quite strong. If the tasks are highly correlated, then any point on the PF in the low fidelity will also be on the PF for the high fidelity, so you can also consider just fitting a single-task GP to the low-fidelity and generating candidates by applying BO to that. Looking briefly at your code, it looks like you aren't normalizing your y's, which could be a big problem. You may also wish to set the reference point to some minimum values you care about (perhaps there is some minimum cost or yield). If you are more of an end-user of BO than a BO researcher, you may want to consider using Ax instead. We don't have any tutorials for MF-MOBO in Ax quite yet, but there is this tutorial which could be pretty handy for your setup. https://ax.dev/docs/tutorials/multi_task/

…

On Fri, Apr 25, 2025 at 11:07 AM EvanClaes ***@***.***> wrote: Hello Eytan, Thanks for the feedback. the pearson correlations between the 4 high fidelity observations and the 4 low fidelity observations (with the same input values) are very high, around 0,99. Your results seem to indicate that a multi-fidelity approach should be beneficial in this case. I do have to add that, in the objective space, the points are not very well spaced. I have two towards the lower end, and two towards the higher end... If my approach is correct and valid, then I guess we should just proceed with one particular set of generated candidates. Do you think it makes sense here to further increase the number of low-fidelity samples, or some of the optimizer parameters, before we generate these? Enjoy your weekend! — Reply to this email directly, view it on GitHub <#2833 (reply in thread)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAW34MJZPIZXD24CXPZKM323JFS5AVCNFSM6AAAAAB3ZSIIE2VHI2DSMVQWIX3LMV43URDJONRXK43TNFXW4Q3PNVWWK3TUHMYTEOJUHA2TONY> . You are receiving this because you commented.Message ID: ***@***.***>

9 replies

EvanClaes May 2, 2025
Author

Hello guys,

So I implemented the approach with the suggestions mentioned above. However, when I try to assess the potential performance of the new candidates by plotting their MF-GP posterior together with the MF-GP posterior of the high fidelity training data, the results are quite underwhelming. We're only finding solution that are 'in between' the existing non-dominated ones, and do not seem to improve the hypervolume much (if at all):

The different figures are different runs of the script (where the MF-GP was also re-trained each time). My intuition says that better solutions/candidates should exist. This is also corroborated by the low fidelity data which, when shown together with the LF estimate of the HF points (i.e. not the estimate from the LF GP, but the prediction by the mathematical model of the bioprocess that generates the LF data, although these are probably similar), suggests that quite big improvements in both objectives should be possible:

What is causing this problem? To me it feels like the MF-GP is unable to 'extrapolate' beyond the limited HF dataset. But to a certain extent, the MF approach should solve this, no? Should we maybe try to use more LF data? Because there is relatively little LF data that is non-dominated relative to the LF estimate of the (HF data points).

Balandat May 2, 2025
Collaborator

QQ: For the GP model estimate plots, the "High fidelity data" points are also the model predictions, right?

Also, what is the predictive variance at those candidate points? If it is large then these candidates could well contribute to sizable expected hypervolume improvements even if their predicted mean does not increase the hypervolume.

Have you validated some of the points on the PF of the LF data on the HF objectives? I'd be curious to see whether that also translates to pareto improvements for the HF objectives. If not then it may be that the LF model is not a great proxy in that outcome range (in the sense that even rank ordering is inconsistent).

EvanClaes May 2, 2025
Author

Hello Max,

The high fidelity data points are also GP predictions, indeed.

I haven't checked the variance of the candidates yet, but I imagine it is high. The thing that confuses me is that the candidates are so close together in objective space...

With validating LF data PF points on HF objectives, do you mean evaluating them through the HF process? I haven't done this and unfortunately, if done purely out of scientific curiosity, it is not an option. The HF process takes about 3 weeks and 5-10k€.

There are for sure regions in the feature/objective space where the LF model is not a great proxy, but I guess the GP doesn't know this, since the task correlation is so high. I guess the fundamental problem/limitation in this case might be the sparsity of the HF data.

The approach I followed for now is to evaluate the HF design points and the suggested candidates with the LF process, and calculate the hypervolume improvement (resulting from adding the LF evaluations of the suggested candidates). Then I select the candidate set (out of the 5 shown in the figures) that has the highest hypervolume improvement. What do you think?

Balandat May 3, 2025
Collaborator

With validating LF data PF points on HF objectives, do you mean evaluating them through the HF process?

Yes. Unfortunately that's really the only way to know for sure whether the LF data can serve as an effective proxy.

I guess the fundamental problem/limitation in this case might be the sparsity of the HF data.

Indeed.

The approach I followed for now is to evaluate the HF design points and the suggested candidates with the LF process, and calculate the hypervolume improvement (resulting from adding the LF evaluations of the suggested candidates). Then I select the candidate set (out of the 5 shown in the figures) that has the highest hypervolume improvement. What do you think?

Given the limitations on evaluating the HF candidates I think this approach makes a lot of sense!

Answer selected by Balandat

EvanClaes May 5, 2025
Author

Hello Max,

Thanks for the feedback. You can close this discussion.

Have a great week!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Variable candidates in multi-objective, multi-fidelity setting #2833

Uh oh!

{{title}}

Uh oh!

Replies: 2 comments 10 replies

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Variable candidates in multi-objective, multi-fidelity setting #2833

Uh oh!

EvanClaes Apr 24, 2025

Replies: 2 comments · 10 replies

Uh oh!

eytan Apr 25, 2025

Uh oh!

EvanClaes Apr 25, 2025 Author

Uh oh!

eytan Apr 25, 2025

Uh oh!

EvanClaes May 2, 2025 Author

Uh oh!

Balandat May 2, 2025 Collaborator

Uh oh!

Uh oh!

EvanClaes May 2, 2025 Author

Uh oh!

Balandat May 3, 2025 Collaborator

Uh oh!

EvanClaes May 5, 2025 Author

EvanClaes
Apr 24, 2025

Replies: 2 comments 10 replies

eytan
Apr 25, 2025

EvanClaes Apr 25, 2025
Author

eytan
Apr 25, 2025

EvanClaes May 2, 2025
Author

Balandat May 2, 2025
Collaborator

EvanClaes May 2, 2025
Author

Balandat May 3, 2025
Collaborator

EvanClaes May 5, 2025
Author