-
Hi all, not sure if this is a bug or something else I'm doing wrong. In my workflow, I need to extract the mean and covariance matrices from the posterior distributions of fitted GP models and use them to construct new MultitaskMultivariateNormal instances, but doing so fails with an error about the covariance matrix not being positive semi-definite (PSD). To work around this, I add a jitter along the diagonal of the covariance matrix. But I am hoping someone here could explain why this happens when I'm supposedly taking the matrices from an instance of the same distribution and creating another. Thank you! See below a simple reproducer:

import torch
from botorch.models import MultiTaskGP
from botorch.fit import fit_gpytorch_mll
from gpytorch.mlls import ExactMarginalLogLikelihood
from gpytorch.distributions import MultitaskMultivariateNormal
from botorch.models.transforms.input import AffineInputTransform
# Generate synthetic data
X1 = torch.tensor([0.03, 0.1, 0.15, 0.17, 0.6, 0.68, 0.7, 0.8, 0.82, 0.95]).to(
    dtype=torch.float64)
X1 = torch.stack([X1, torch.zeros_like(X1)], dim=-1)
X2 = torch.tensor([0.03, 0.1, 0.14, 0.97]).to(dtype=torch.float64)
X2 = torch.stack([X2, torch.ones_like(X2)], dim=-1)
X = torch.cat([X1, X2])
y = 1e10 * (torch.sin(6 * X[:, 0:1]) - 0.6 * X[:, 1:])
# Normalize input/output data
X_input = X[:, 0].unsqueeze(-1)
X_task = X[:, 1]
input_transform = AffineInputTransform(
    1,  # Number of input features (just one column in this case)
    coefficient=X_input.std(dim=0),  # Standard deviation of input features
    offset=X_input.mean(dim=0),  # Mean of input features
)
output_transform = AffineInputTransform(
    1,
    coefficient=y.std(axis=0),
    offset=y.mean(axis=0),
)
norm_x = torch.cat([X_input, X_task.unsqueeze(-1)], dim=-1)
norm_y = y.clone().detach()
# Initialize MultiTaskGP
gp = MultiTaskGP(norm_x, norm_y, task_feature=-1)
mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
fit_gpytorch_mll(mll)
# Make predictions
x = torch.linspace(norm_x.min(), norm_x.max(), 200).reshape(-1, 1)
p = gp.posterior(x)
# Create distribution
mmvn = MultitaskMultivariateNormal(p.mean, p.distribution.covariance_matrix)

which fails with the PSD error described above.
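For reference, the jitter workaround I mentioned above looks roughly like the following (a minimal sketch only; the symmetrization step, the eigenvalue check, and the 1e-6 magnitude are illustrative choices, not necessarily what I use):

# Rough sketch of the workaround: inspect the smallest eigenvalue, then add a
# small jitter to the diagonal so the covariance is numerically positive definite.
cov = p.distribution.covariance_matrix
cov = 0.5 * (cov + cov.transpose(-1, -2))  # enforce exact symmetry
print("smallest eigenvalue:", torch.linalg.eigvalsh(cov).min().item())

jitter = 1e-6 * torch.eye(cov.shape[-1], dtype=cov.dtype)  # illustrative magnitude
mmvn = MultitaskMultivariateNormal(p.mean, cov + jitter)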
Replies: 2 comments
-
Hey, sorry for the delay, this slipped through.

So one thing to call out here is that your Ys are very large and you're not actually passing the standardized observations to the model, so your mean and covariance matrix end up having values of the order of 10^10 or even 10^16. This will cause all kinds of numerical problems, so make sure to actually standardize your inputs (the transforms you define don't do anything here since they're not passed to the model).

But even if you are normalizing the values, it can happen that the posterior covariance matrix is numerically not PSD if the test points are not very correlated. I ran your example with normalized Ys and the eigenvalues range up to …

One additional comment: You need to be careful with …
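On the standardization point, a minimal sketch of what that might look like (not the exact code I ran; using Normalize on only the non-task column and Standardize on the outcomes is just one reasonable way to do it):

# Sketch: pass the transforms to the model so the scaling is actually applied,
# instead of defining AffineInputTransform objects that are never used.
from botorch.models import MultiTaskGP
from botorch.models.transforms.input import Normalize
from botorch.models.transforms.outcome import Standardize

gp = MultiTaskGP(
    norm_x,
    norm_y,
    task_feature=-1,
    # normalize only column 0; column 1 is the task feature and should stay as-is
    input_transform=Normalize(d=2, indices=[0]),
    # standardize the ~1e10-scale outputs to zero mean / unit variance
    outcome_transform=Standardize(m=1),
)

With the outputs on a sane scale the posterior covariance entries stay around order one, which makes the numerical PSD issues much less likely, though as noted above a small jitter can still be needed.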
-
Thank you for your response @Balandat. I ended up adjusting the transforms and adding a jitter, and that works for our use case. And thank you for the additional note as well.