Curious behavior of model.set_data() and control_loop.get_next_points() #313

Open
@ekalosak

Hi Emukit team,

First, thank you for your work on this package - it's a joy to use.

I have a question about some curious behavior I've observed when using the Bayesian optimization control loop. When I call the IModel.set_data(X, Y) method to replace the model's data and then call OuterLoop.get_next_points(results), the model's data is reset to what it was before the set_data() call, with one extra row holding the contents of the results object.

I expected that, after the OuterLoop.get_next_points(results) call, the model data would consist of the X passed to set_data() concatenated with the contents of results.
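To make the difference concrete, here is a small NumPy-only sketch of the two outcomes. It uses the same values as the reproduction script and does not call Emukit at all; the names `x_new`, `expected_X`, and `observed_X` are introduced here purely for illustration.

```python
import numpy as np

# Same values as in the reproduction script.
X = np.array([[1, 1, 2], [2, 1, 2], [1, 1, 1]])  # initial inputs given to the model
X2 = np.array([[3, 3, 3], [3, 4, 3]])            # inputs passed to set_data()
x_new = np.array([[1, 2, 5]])                    # point reported through results

# Expected: the data from set_data() plus the new point.
expected_X = np.vstack([X2, x_new])

# Observed: the initial data plus the new point.
observed_X = np.vstack([X, x_new])

print(expected_X.shape)  # (3, 3)
print(observed_X.shape)  # (4, 3)
```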

Here's a minimal example that reproduces the behavior:

import numpy as np

from GPy.models import GPRegression
from GPy.kern import Matern52

from emukit.bayesian_optimization.acquisitions import ExpectedImprovement
from emukit.bayesian_optimization.loops import BayesianOptimizationLoop
from emukit.core import (
    ParameterSpace,
    DiscreteParameter,
)
from emukit.core.loop import UserFunctionWrapper
from emukit.model_wrappers import GPyModelWrapper

# Initial observations
X = np.array([[1,1,2],[2,1,2],[1,1,1]])
Y = np.array([[1],[2],[3]])

# Surrogate optimization components
kernel = Matern52(
    input_dim=X.shape[1],
    )
model_gpy = GPRegression(
    X=X,
    Y=Y,
    kernel=kernel,
    normalizer=True,
    )
model_emukit = GPyModelWrapper(
    gpy_model = model_gpy,
    )
parameters = [DiscreteParameter(f'param_{i}', range(10)) for i in range(X.shape[1])]
parameter_space = ParameterSpace(parameters)
acquisition_criterion = ExpectedImprovement(model = model_emukit)
f = lambda x_row: np.array([[sum(sum(x_row))]])
f_wrapped = UserFunctionWrapper(f)

control_loop = BayesianOptimizationLoop(
    model = model_emukit,
    space = parameter_space,
    acquisition = acquisition_criterion,
    )

# Just make sure that the data is actually represented in the model
assert model_emukit.model.X.shape[0] == X.shape[0]

# Try to set the data using other matrices
X2 = np.array([[3,3,3],[3,4,3]])
Y2 = np.array([[4],[5]])
model_emukit.set_data(X=X2, Y=Y2)

# The data is 'set' after running set_data()
assert model_emukit.model.X.shape[0] == X2.shape[0]

# Provide a result for some arbitrarily suggested point
X_arbitrary_suggestion = np.array([[1,2,5]])
results = f_wrapped(X_arbitrary_suggestion)
X_next = control_loop.get_next_points(
    results = results,
    )

# As a side effect of control_loop.get_next_points(), the model data is reset.
assert model_emukit.model.X.shape[0] == X.shape[0] + 1
for model_x_row, initial_x_row in zip(model_emukit.model.X, X):
    assert all(model_x_row == initial_x_row)
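
One possible explanation, offered as an assumption rather than a confirmed reading of the Emukit source: the loop may keep its own record of the observations, initialised from the model's data when the loop is constructed, and push that record back into the model on every update. The toy classes below are hypothetical stand-ins (not Emukit's API) that reproduce the same symptom under that assumption.

```python
class ToyModel:
    """Hypothetical stand-in for a model wrapper; not Emukit's API."""

    def __init__(self, X, Y):
        self.X, self.Y = list(X), list(Y)

    def set_data(self, X, Y):
        self.X, self.Y = list(X), list(Y)


class ToyLoop:
    """Hypothetical loop that keeps its own copy of the observations."""

    def __init__(self, model):
        self.model = model
        # Snapshot of the model's data at construction time.
        self.state_X = list(model.X)
        self.state_Y = list(model.Y)

    def get_next_points(self, x_new, y_new):
        # Record the new observation in the loop's own state ...
        self.state_X.append(x_new)
        self.state_Y.append(y_new)
        # ... then overwrite the model's data with that state,
        # discarding anything set on the model directly.
        self.model.set_data(self.state_X, self.state_Y)


model = ToyModel([[1, 1, 2], [2, 1, 2], [1, 1, 1]], [[1], [2], [3]])
loop = ToyLoop(model)

model.set_data([[3, 3, 3], [3, 4, 3]], [[4], [5]])  # bypasses the loop's state
loop.get_next_points([1, 2, 5], [8])

print(len(model.X))  # 4: the three initial rows plus the new one
```

If this is what happens, changing the data held by the loop itself, rather than the model's, would be the place to intervene.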
