Skip to content

"Predicted covariance" is not always what the user thinks it is. Perhaps a better API? #400

@odunbar

Description

@odunbar

Issue

Currently the API for predictions can be done by predict

function predict(
emulator::Emulator{FT},
new_inputs::AM;
transform_to_real = false,
mlt_kwargs...,
) where {FT <: AbstractFloat, AM <: AbstractMatrix}

with transform_to_real=false by default, this is neatly how the MCMC can obtain both the posterior mean $$G(u)$$, and the estimated observational noise $$\Gamma(u)$$, for use in the MCMC objective (here a negative log likelihood):

$$ L(u, y) = (y-G(u))^\top \Gamma(u)^{-1} (y-G(u)) + \log\det(\Gamma(u)) $$

Thus y_pred, y_cov = predict(emulator, new_inputs) is not returning the predicted (GP mean and covariance $$C(u)$$) pair, but in fact the (GP mean, and the predicted observational covariance $$\Gamma(u)$$) pair.
These are typically related in the encoded space as $$\Gamma(u) = C(u) + I$$. But often users also call with transform_to_real=true to get a prediction, and here this harder to disaggregate the additional effect of our (linear encoder E), so $$C(u)$$ is only obtainable as y_cov - E*E' calling our data encoder.

This i think would result in users interpreting the obs covariances as the posterior covariances, and therefore will often see something overly broad in their interpretation.

Possible Solution - change in what predict returns.

I think perhaps we should default to predict always giving the the prediction covariance, with the observational noise variant now being called only internally. (used by MCMC).

Current fix until issue is resolved

If you have called the following

y_pred, y_cov = predict(emulator, new_inputs, transform_to_real=true,...)

then to get the "right" cov. This is exact if the output decoder is

y_cov_new = [similar(yc) for yc in y_cov] # create a new container
encoder_schedule = get_encoder_schedule(emulator)

for (yc_new, yc) in zip(y_cov_new, y_cov)
    yc_new .= Matrix(decode_with_schedule( 
        encoder_schedule, 
        Matrix(encode_with_schedule(encoder_schedule, yc, "out")) - I, 
        "out"),
    )
end

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions