Functional Laplace Updated #192

Merged
merged 129 commits into from Jul 16, 2024

Changes from 107 commits

Commits (129)
e3f0481
init commit
metodj Sep 15, 2021
bcb4bfb
FunctionalLaplace blueprint
metodmove Sep 18, 2021
b2a2ff4
added a test that will serve as a sanity check for correctness
metodmove Sep 20, 2021
a793a04
test small refactor
metodmove Sep 25, 2021
1b50f1d
init GP matrices and SoD dataloader
metodmove Sep 25, 2021
84bd1a7
initial full naive GP implementation
metodmove Sep 26, 2021
8050bff
fixed bugs in full naive GP inference
metodmove Sep 27, 2021
2ac2b81
test gp_equivalence (almost) passing
metodmove Sep 29, 2021
cdbadae
add GP to regression_example
metodmove Oct 2, 2021
3e4ffd2
float32 vs float64 magic
metodmove Oct 2, 2021
11943c1
Cholesky solves float32 vs float64 conundrum
metodmove Oct 5, 2021
e43eb0e
minor
metodmove Oct 11, 2021
8b51f62
merge remove-abc into functional-laplace
metodmove Oct 19, 2021
aab27a6
fix multivariate regression GP bug
metodmove Oct 21, 2021
583cef6
GP inference parameters
metodmove Oct 21, 2021
cf00987
GP classification with diagonal_L=True
metodmove Oct 25, 2021
0e28c4a
start independent_gp_kernels=True
metodmove Oct 25, 2021
d28cc6b
block-diagonal kernel analysis
metodmove Oct 28, 2021
4b568cb
warning for multivariate regression
metodmove Nov 4, 2021
4f58447
refactor kernel method
metodmove Nov 4, 2021
bff30f4
prior factor for SoD and refactor __call__
metodmove Nov 6, 2021
d61b4fa
log_marginal_likelihood GP start
metodmove Nov 6, 2021
3a5a1be
scatter for logp(f)
metodmove Nov 8, 2021
dec9514
fix gp marginal likelihood bugs
metodmove Nov 8, 2021
4dc7016
GP predictive_samples
metodmove Nov 11, 2021
7ea6328
start with tests
metodmove Nov 12, 2021
c91048e
minor
metodmove Nov 12, 2021
7376204
more tests and refactor
metodmove Nov 13, 2021
ab5058a
docstrings
metodmove Nov 13, 2021
0ef5ae3
add last layer functional laplace
metodmove Nov 13, 2021
81742bf
minor
metodmove Nov 13, 2021
6935dcb
remove files
metodmove Nov 13, 2021
9118733
classification gp lml
metodmove Nov 13, 2021
bce2603
minor
metodmove Nov 30, 2021
475ea96
start of BackPackGP class refactor
metodmove Nov 30, 2021
7b1ae50
remove BackPackGP class
metodmove Nov 30, 2021
d7f6dae
remove gp_jacobians
metodmove Dec 1, 2021
4718328
docstrings functional laplace and public methods renaming
metodmove Dec 2, 2021
4b84f5d
minor docstring fix
metodmove Dec 2, 2021
1ff1c46
merge main
metodmove Dec 8, 2021
ed2a439
merge functional-laplace
metodmove Dec 8, 2021
4eddeaf
marginal likelihood docs
metodmove Dec 8, 2021
c9145a1
remove diagonal_L
metodmove Dec 8, 2021
d417934
BackPackGGN default backend
metodmove Dec 10, 2021
19d3761
asdl for functional laplace
metodmove Dec 10, 2021
e60c2b4
merge conflicts
metodmove Dec 10, 2021
8c83630
log_marginal_likelihood in BaseLaplace refactor
metodmove Dec 10, 2021
6c3d886
map_estimate remove from ParametricLaplace
metodmove Dec 13, 2021
4a08a4b
remove _check_fit
metodmove Dec 13, 2021
fcadad3
address merge conflicts
metodmove Feb 7, 2022
cab9983
calibration example start
metodmove Feb 7, 2022
052c058
resolve merge conflicts
Dec 19, 2022
c716e4e
more merge conflicts
Dec 19, 2022
999b250
FunctionalLaplace CIFAR calibration experiment start
Dec 19, 2022
9fc00ea
minor
Dec 19, 2022
4ae0d1b
refactor so that refitting in log_marginal_likelihood for functional …
Dec 20, 2022
26ccf2b
gp marginal likelihood fix
Dec 20, 2022
f8e8455
isotropic priors check
Dec 20, 2022
a0e41f3
cleanup examples
Dec 21, 2022
827ae21
gp calibration notebook
Dec 21, 2022
7557615
minor
Dec 21, 2022
b6bde70
inducing points CIFAR experiment
Dec 21, 2022
88eda9d
transfer model from bnn-preds repo
Dec 21, 2022
25cd5d7
inducing points FMNIST CNN
Dec 21, 2022
73448be
gp calibration example
metodj Dec 23, 2022
566c367
gp continue
metodj Dec 29, 2022
9690e6b
subset_of_weights=all experiment
Dec 29, 2022
5f2efc6
fixed prior precision
Dec 29, 2022
da7bb18
fixed prior precision
Dec 29, 2022
d82c149
ensure that input is differentiable
Dec 29, 2022
6c7397e
further optimize delta experiment
Dec 30, 2022
92bfacf
run for larger delta
Jan 3, 2023
4b08edb
last-layer debug
Jan 6, 2023
87abeeb
inference speed-up
Jan 6, 2023
7747168
minor
Jan 6, 2023
09d410a
einsum memory investigation
Jan 6, 2023
47eebfe
CV working
Jan 8, 2023
648b3ed
rebuild on Sigma_inv
Jan 9, 2023
eef9a15
clean
Jan 10, 2023
8967f9d
validate no_grad
Jan 10, 2023
edf6285
Functional laplace gp calibration (#1)
metodj Jan 10, 2023
edd83b1
fix tests
Jan 10, 2023
88b44d7
Merge branch 'functional-laplace-gp-calibration' into functional-laplace
Jan 10, 2023
44fb5c2
clean calibration-gp example
Jan 10, 2023
c9c068d
minor
Jan 10, 2023
39000ae
minor2
Jan 10, 2023
5cb1354
markdown example start
Jan 11, 2023
d8ec55b
markdown wrapup
Jan 11, 2023
03f0989
minor
Jan 12, 2023
005c3c7
minor
Jan 12, 2023
e20ac6c
Functional laplace memory investigation (#2)
metodj Feb 21, 2023
35ec107
Functional laplace memory investigation (#3)
metodj Feb 21, 2023
77a32f4
increase batch size in the example
metodj Feb 21, 2023
5de3980
minor
metodj Feb 21, 2023
d58fb43
Initial Merge from metodj Functional Laplace
Ludvins May 16, 2024
bf58f0d
Push Branch
Ludvins May 16, 2024
dd21349
Fix merge and make it functional
Ludvins May 18, 2024
7be6dda
Add quotes to pip install with tests for zsh compatibility
Ludvins May 18, 2024
06d7b73
Fix and pass unit tests for Functional Laplace
Ludvins May 18, 2024
57ff2c3
Add some comments to functions
Ludvins May 19, 2024
43e3d47
Fix Calibration GP example
Ludvins May 20, 2024
6614e02
Update Calibration GP Example
Ludvins May 24, 2024
9e6ee1f
Delete README
Ludvins May 27, 2024
7b8933d
Merge branch 'aleximmer:main' into main
Ludvins May 31, 2024
ad23e86
Update README.md
Ludvins Jun 4, 2024
f1c13f1
Update README.md
Ludvins Jun 4, 2024
8920f24
Formatting
Ludvins Jun 4, 2024
aa662ac
Typelinting, unittests and FunctionalLLLaplace serialization fixed
Ludvins Jun 10, 2024
c3d8e15
enable_backprop unused for last layer curvature
Ludvins Jun 10, 2024
ad24df3
Merge branch 'main' into main
Ludvins Jun 10, 2024
6eb5655
Merge branch 'main' into main
Ludvins Jun 11, 2024
850d75b
Merge branch 'aleximmer:main' into main
Ludvins Jun 11, 2024
35c7571
Merge branch 'main' into main
Ludvins Jun 11, 2024
03f8609
Add download link for pre-trained model
Ludvins Jun 11, 2024
989c323
Merge branch 'main' into main
Ludvins Jun 12, 2024
864b902
Correct enabe_backprop
Ludvins Jun 12, 2024
fa3f80c
Ruff tests and add pytest-mock to action requirements
Ludvins Jun 12, 2024
772c616
Ruff check and MutableDict input
Ludvins Jun 12, 2024
1320b1d
Merge branch 'main' into main
Ludvins Jun 15, 2024
67d99ba
Fix dtype in unit test
Ludvins Jun 22, 2024
c3fc541
Fix enable_backprop in LastLayer
Ludvins Jun 22, 2024
13d3afb
Fix imports and types
Ludvins Jun 22, 2024
90afca4
Delete debug prints
Ludvins Jun 22, 2024
64b1aa2
Merge branch 'main' into main
Ludvins Jun 24, 2024
9a9f546
Small refactors
Ludvins Jun 24, 2024
ab56b52
Created _glm_forward_call and _glm_predictive_samples in BaseLaplace
Ludvins Jun 24, 2024
bcdf1b4
Merge branch 'main' into main
Ludvins Jul 8, 2024
5ceb1c8
Rename Variables and Fix _glm_predictive
Ludvins Jul 8, 2024
02c58ce
Warning Consistency
Ludvins Jul 12, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -134,3 +134,4 @@ data/
.DS_Store

state_dict.bin
/temp
5 changes: 3 additions & 2 deletions README.md
@@ -4,6 +4,7 @@

[![Main](https://travis-ci.com/AlexImmer/Laplace.svg?token=rpuRxEjQS6cCZi7ptL9y&branch=main)](https://travis-ci.com/AlexImmer/Laplace)


The laplace package facilitates the application of Laplace approximations for entire neural networks, subnetworks of neural networks, or just their last layer.
The package enables posterior approximations, marginal-likelihood estimation, and various posterior predictive computations.
The library documentation is available at [https://aleximmer.github.io/Laplace](https://aleximmer.github.io/Laplace).
@@ -35,7 +36,7 @@ For development purposes, clone the repository and then install:
# or after cloning the repository for development
pip install -e .
# run tests
pip install -e .[tests]
pip install -e '.[tests]'
pytest tests/
```

@@ -273,7 +274,7 @@ torch.load(..., map_location='cpu')
## Structure
The laplace package consists of two main components:

1. The subclasses of [`laplace.BaseLaplace`](https://github.com/AlexImmer/Laplace/blob/main/laplace/baselaplace.py) that implement different sparsity structures: different subsets of weights (`'all'`, `'subnetwork'` and `'last_layer'`) and different structures of the Hessian approximation (`'full'`, `'kron'`, `'lowrank'` and `'diag'`). This results in _nine_ currently available options: `laplace.FullLaplace`, `laplace.KronLaplace`, `laplace.DiagLaplace`, the corresponding last-layer variations `laplace.FullLLLaplace`, `laplace.KronLLLaplace`, and `laplace.DiagLLLaplace` (which are all subclasses of [`laplace.LLLaplace`](https://github.com/AlexImmer/Laplace/blob/main/laplace/lllaplace.py)), [`laplace.SubnetLaplace`](https://github.com/AlexImmer/Laplace/blob/main/laplace/subnetlaplace.py) (which only supports `'full'` and `'diag'` Hessian approximations) and `laplace.LowRankLaplace` (which only supports inference over `'all'` weights). All of these can be conveniently accessed via the [`laplace.Laplace`](https://github.com/AlexImmer/Laplace/blob/main/laplace/laplace.py) function.
1. The subclasses of [`laplace.BaseLaplace`](https://github.com/AlexImmer/Laplace/blob/main/laplace/baselaplace.py) that implement different sparsity structures: different subsets of weights (`'all'`, `'subnetwork'` and `'last_layer'`) and different structures of the Hessian approximation (`'full'`, `'kron'`, `'lowrank'`, `'diag'` and `'gp'`). This results in _ten_ currently available options: `laplace.FullLaplace`, `laplace.KronLaplace`, `laplace.DiagLaplace`, `laplace.FunctionalLaplace`, the corresponding last-layer variations `laplace.FullLLLaplace`, `laplace.KronLLLaplace`, `laplace.DiagLLLaplace` and `laplace.FunctionalLLLaplace` (which are all subclasses of [`laplace.LLLaplace`](https://github.com/AlexImmer/Laplace/blob/main/laplace/lllaplace.py)), [`laplace.SubnetLaplace`](https://github.com/AlexImmer/Laplace/blob/main/laplace/subnetlaplace.py) (which only supports `'full'` and `'diag'` Hessian approximations) and `laplace.LowRankLaplace` (which only supports inference over `'all'` weights). All of these can be conveniently accessed via the [`laplace.Laplace`](https://github.com/AlexImmer/Laplace/blob/main/laplace/laplace.py) function.
2. The backends in [`laplace.curvature`](https://github.com/AlexImmer/Laplace/blob/main/laplace/curvature/) which provide access to Hessian approximations of
the corresponding sparsity structures, for example, the diagonal GGN.
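
As a purely illustrative sketch (assuming a trained classification `model` and a `train_loader`; see the calibration GP example added in this PR for a complete script), the new functional variant is reached through the same unified interface by selecting the `'gp'` Hessian structure:

```python
from laplace import Laplace

# functional (GP) Laplace over all weights via the unified Laplace interface
la = Laplace(model, 'classification',
             subset_of_weights='all', hessian_structure='gp')
la.fit(train_loader)
```
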

136 changes: 136 additions & 0 deletions examples/calibration_gp_example.md
@@ -0,0 +1,136 @@
## Full example: Functional Laplace (GP) on FMNIST image classifier
Applying the generalized Gauss-Newton (GGN) approximation to the Hessian in the Laplace approximation (LA) of the BNN posterior
turns the underlying probabilistic model from a BNN into a generalized linear model (GLM).
This GLM is equivalent to a Gaussian process (GP) with a particular kernel [1, 2].
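
For intuition, this kernel is built from Jacobians of the network outputs with respect to the weights at the MAP estimate. The toy sketch below (not the `laplace` library's implementation; `toy_model`, `delta`, `f` and `flat_jacobian` are illustrative assumptions, and it requires a recent PyTorch with `torch.func`) computes one output-by-output block of such a kernel for two inputs, assuming an isotropic Gaussian prior with precision `delta`:

```python
import torch
from torch.func import functional_call, jacrev

torch.manual_seed(0)
toy_model = torch.nn.Sequential(
    torch.nn.Linear(3, 8), torch.nn.Tanh(), torch.nn.Linear(8, 2)
)
params = dict(toy_model.named_parameters())
n_out = 2
delta = 1.0  # prior precision (placeholder value)

def f(p, x):
    # network outputs for a single input, as a function of the parameters
    return functional_call(toy_model, p, (x.unsqueeze(0),)).squeeze(0)

def flat_jacobian(x):
    jac = jacrev(f)(params, x)  # dict: per-parameter Jacobians of shape (n_out, *param.shape)
    return torch.cat([j.reshape(n_out, -1) for j in jac.values()], dim=1)

x1, x2 = torch.randn(3), torch.randn(3)
# one (n_out x n_out) block of the GP kernel between inputs x1 and x2
kernel_block = flat_jacobian(x1) @ flat_jacobian(x2).T / delta
print(kernel_block)
```

The `laplace` library computes and stores these kernel blocks for you when `hessian_structure='gp'` is selected, as shown later in this example.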

In this notebook, we show how to use the `laplace` library to perform GP inference on top of a *pre-trained* neural network.

Note that a GPU with CUDA support is needed for this example. We recommend using a GPU with at least 24 GB of memory. If less memory is available, we suggest reducing `BATCH_SIZE` below.
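
If you are unsure how much memory your GPU has, a quick optional check (a sketch using `torch.cuda.mem_get_info`; not required for the example) is:

```python
import torch

# report free and total GPU memory in GB; lower BATCH_SIZE below if memory is tight
free_bytes, total_bytes = torch.cuda.mem_get_info()
print(f'GPU memory: {free_bytes / 1e9:.1f} GB free of {total_bytes / 1e9:.1f} GB total')
```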

#### Data loading

First, let us load the FMNIST dataset. The helper scripts for FMNIST and the pre-trained CNN are available in the `examples/helper` directory of the main repository.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader
import torch.distributions as dists
from netcal.metrics import ECE

from helper.util_gp import get_dataset, CIFAR10Net
from laplace import Laplace

np.random.seed(7777)
torch.manual_seed(7777)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = True

assert torch.cuda.is_available()

DATASET = 'FMNIST'
BATCH_SIZE = 256
ds_train, ds_test = get_dataset(DATASET, False, 'cuda')
train_loader = DataLoader(ds_train, batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(ds_test, batch_size=BATCH_SIZE, shuffle=False)
targets = torch.cat([y for x, y in test_loader], dim=0).cpu()
```

#### Load a pre-trained model

Next, we load a pre-trained CNN model. The code used to train the model can be found in the [BNN-predictions repo](https://github.com/AlexImmer/BNN-predictions).

``` python
MODEL_NAME = 'FMNIST_CNN_10_2.2e+02.pt'
model = CIFAR10Net(ds_train.channels, ds_train.K, use_tanh=True).to('cuda')
state = torch.load(f'helper/models/{MODEL_NAME}')
model.load_state_dict(state['model'])
model = model.cuda()
prior_precision = state['delta']
```

To simplify the downstream evaluation, we use the following helper function to make predictions. It simply iterates over all minibatches and collects the predictive probabilities over the FMNIST classes.

``` python
@torch.no_grad()
def predict(dataloader, model, laplace=False):
py = []

for x, _ in dataloader:
if laplace:
py.append(model(x.cuda()))
else:
py.append(torch.softmax(model(x.cuda()), dim=-1))

    return torch.cat(py).cpu()
```

#### The calibration of MAP

We are now ready to see how well calibrated the model is. The metrics we use are the expected calibration error (ECE; Naeini et al., AAAI 2015) and the negative (categorical) log-likelihood. Note that lower values are better for both metrics.

First, let us inspect the MAP model. We shall use the [`netcal`](https://github.com/fabiankueppers/calibration-framework) library to easily compute the ECE.

``` python
probs_map = predict(test_loader, model, laplace=False)
acc_map = (probs_map.argmax(-1) == targets).float().mean()
ece_map = ECE(bins=15).measure(probs_map.numpy(), targets.numpy())
nll_map = -dists.Categorical(probs_map).log_prob(targets).mean()

print(f'[MAP] Acc.: {acc_map:.1%}; ECE: {ece_map:.1%}; NLL: {nll_map:.3}')
```

Running this snippet, we would get:

```
[MAP] Acc.: 94.8%; ECE: 2.0%; NLL: 0.172
```
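
To make the ECE number above concrete, here is a manual sketch of what the metric computes, assuming 15 equal-width confidence bins (netcal's default binning scheme; this helper is not part of `netcal`): the average gap between confidence and accuracy, weighted by how many test points fall into each bin.

```python
import numpy as np

def ece_manual(probs, labels, n_bins=15):
    confs = probs.max(-1)                  # predicted confidence per example
    preds = probs.argmax(-1)
    correct = (preds == labels).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confs > lo) & (confs <= hi)
        if mask.any():
            # weight of the bin times |accuracy - confidence| within the bin
            ece += mask.mean() * abs(correct[mask].mean() - confs[mask].mean())
    return ece

# e.g. ece_manual(probs_map.numpy(), targets.numpy()) should roughly agree
# with the netcal ECE value reported above
```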

#### The calibration of Laplace

Next, we run Laplace-GP inference to calibrate the neural network's predictions. Since exact GP inference over the full training set is computationally infeasible, we use a Subset-of-Datapoints (SoD) approximation [3] here. In the fitting code further below, `m` denotes the number of datapoints used in the SoD posterior; the sketch that follows illustrates the subsampling step.
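
For intuition only, the subsampling behind SoD amounts to something like the following sketch (this is not the library's internal code; passing `M=m` to `Laplace` below takes care of it for you):

```python
from torch.utils.data import DataLoader, Subset

m = 200  # number of retained datapoints (one of the values used below)
sod_idx = torch.randperm(len(ds_train))[:m].tolist()
sod_loader = DataLoader(Subset(ds_train, sod_idx), batch_size=BATCH_SIZE, shuffle=True)
# GP inference restricted to these m points only needs an (m x m) kernel matrix
# per output class (with diagonal_kernel=True), instead of one over the full dataset
```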

Executing the fitting cell that follows can take up to 5 minutes, depending on the exact hardware used.

``` python
for m in [50, 200, 800, 1600]:
print(f'Fitting Laplace-GP for m={m}')
la = Laplace(model, 'classification',
subset_of_weights='all',
hessian_structure='gp',
diagonal_kernel=True, M=m,
prior_precision=prior_precision)
la.fit(train_loader)

probs_laplace = predict(test_loader, la, laplace=True)
acc_laplace = (probs_laplace.argmax(-1) == targets).float().mean()
ece_laplace = ECE(bins=15).measure(probs_laplace.numpy(), targets.numpy())
nll_laplace = -dists.Categorical(probs_laplace).log_prob(targets).mean()

print(f'[Laplace-GP, m={m}] Acc.: {acc_laplace:.1%}; ECE: {ece_laplace:.1%}; NLL: {nll_laplace:.3}')
```

```
Fitting Laplace-GP for m=50
[Laplace-GP, m=50] Acc.: 91.6%; ECE: 1.5%; NLL: 0.252
Fitting Laplace-GP for m=200
[Laplace-GP, m=200] Acc.: 91.5%; ECE: 1.1%; NLL: 0.252
Fitting Laplace-GP for m=800
[Laplace-GP, m=800] Acc.: 91.4%; ECE: 0.8%; NLL: 0.254
Fitting Laplace-GP for m=1600
[Laplace-GP, m=1600] Acc.: 91.3%; ECE: 0.7%; NLL: 0.257
```

Notice that the post-hoc Laplace-GP inference does not have a significant impact on the accuracy, yet it improves the calibration (in terms of ECE) of the MAP model substantially.

### References
[1] Khan, Mohammad Emtiyaz, et al. "Approximate inference turns deep networks into Gaussian processes." Advances in Neural Information Processing Systems 32 (2019).

[2] Immer, Alexander, Maciej Korzepa, and Matthias Bauer. "Improving predictions of Bayesian neural nets via local linearization." International Conference on Artificial Intelligence and Statistics. PMLR, 2021.

[3] Rasmussen, Carl Edward. "Gaussian processes in machine learning." Springer, 2004.

78 changes: 78 additions & 0 deletions examples/calibration_gp_example.py
@@ -0,0 +1,78 @@
import warnings

import numpy as np
import torch
import torch.distributions as dists
from helper.util_gp import CIFAR10Net, get_dataset
from netcal.metrics import ECE
from torch.utils.data import DataLoader

from laplace import Laplace

np.random.seed(7777)
torch.manual_seed(7777)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = True

warnings.simplefilter('ignore', UserWarning)


assert torch.cuda.is_available()

DATASET = 'FMNIST'
BATCH_SIZE = 25
ds_train, ds_test = get_dataset(DATASET, False, 'cuda')
train_loader = DataLoader(ds_train, batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(ds_test, batch_size=BATCH_SIZE, shuffle=False)
targets = torch.cat([y for x, y in test_loader], dim=0).cpu()

MODEL_NAME = 'FMNIST_CNN_10_2.2e+02.pt'
model = CIFAR10Net(ds_train.channels, ds_train.K, use_tanh=True).to('cuda')
state = torch.load(f'helper/models/{MODEL_NAME}')
model.load_state_dict(state['model'])
model = model.cuda()
prior_precision = state['delta']


@torch.no_grad()
def predict(dataloader, model, laplace=False):
py = []

for x, _ in dataloader:
if laplace:
py.append(model(x.cuda()))
else:
py.append(torch.softmax(model(x.cuda()), dim=-1))

return torch.cat(py).cpu()


probs_map = predict(test_loader, model, laplace=False)
acc_map = (probs_map.argmax(-1) == targets).float().mean()
ece_map = ECE(bins=15).measure(probs_map.numpy(), targets.numpy())
nll_map = -dists.Categorical(probs_map).log_prob(targets).mean()

print(f'[MAP] Acc.: {acc_map:.1%}; ECE: {ece_map:.1%}; NLL: {nll_map:.3}')

for m in [50, 200, 800, 1600]:
print(f'Fitting Laplace-GP for m={m}')
la = Laplace(
model,
'classification',
subset_of_weights='all',
hessian_structure='gp',
diagonal_kernel=True,
M=m,
prior_precision=prior_precision,
)
la.fit(train_loader)
la.optimize_prior_precision(method='marglik', progress_bar=True)

probs_laplace = predict(test_loader, la, laplace=True)
acc_laplace = (probs_laplace.argmax(-1) == targets).float().mean()
ece_laplace = ECE(bins=15).measure(probs_laplace.numpy(), targets.numpy())
nll_laplace = -dists.Categorical(probs_laplace).log_prob(targets).mean()

print(
f'[Laplace] Acc.: {acc_laplace:.1%}; ECE: {ece_laplace:.1%}; NLL: {nll_laplace:.3}'
)
Binary file added examples/helper/models/FMNIST_CNN_10_2.2e+02.pt
Binary file not shown.
119 changes: 119 additions & 0 deletions examples/helper/util_gp.py
@@ -0,0 +1,119 @@
import os

import torch
import torchvision.datasets as dset
import torchvision.transforms as transforms
from deepobs.pytorch.testproblems.testproblems_utils import (tfconv2d,
tfmaxpool2d)
from torch import nn
from torchvision.datasets import VisionDataset

PACKAGE_DIR = os.path.dirname(os.path.realpath(__file__))
ROOT = '/'.join(PACKAGE_DIR.split('/')[:-1])
DATA_DIR = ROOT + '/data'

MNIST_transform = transforms.ToTensor()


class QuickDS(VisionDataset):
def __init__(self, ds, device):
self.D = [
(ds[i][0].to(device).requires_grad_(), torch.tensor(ds[i][1]).to(device))
for i in range(len(ds))
]
self.K = ds.K
self.channels = ds.channels
self.pixels = ds.pixels

def __getitem__(self, index):
return self.D[index]

def __len__(self):
return len(self.D)


def get_dataset(dataset, double, device=None):
if dataset == 'FMNIST':
ds_train = FMNIST(train=True, double=double)
ds_test = FMNIST(train=False, double=double)
else:
raise ValueError('Invalid dataset argument')
if device is not None:
return QuickDS(ds_train, device), QuickDS(ds_test, device)
else:
return ds_train, ds_test


class FMNIST(dset.FashionMNIST):
def __init__(
self,
root=DATA_DIR,
train=True,
download=True,
transform=MNIST_transform,
double=False,
):
super().__init__(root=root, train=train, download=download, transform=transform)
self.K = 10
self.pixels = 28
self.channels = 1
if double:
self.data = self.data.double()
self.targets = self.targets.double()


class CIFAR10Net(nn.Sequential):
"""
Deepobs network with optional last sigmoid activation (instead of relu)
In Deepobs called `net_cifar10_3c3d`
"""

def __init__(self, in_channels=3, n_out=10, use_tanh=False):
super(CIFAR10Net, self).__init__()
self.output_size = n_out
activ = nn.Tanh if use_tanh else nn.ReLU

self.add_module(
'conv1', tfconv2d(in_channels=in_channels, out_channels=64, kernel_size=5)
)
self.add_module('relu1', nn.ReLU())
self.add_module(
'maxpool1', tfmaxpool2d(kernel_size=3, stride=2, tf_padding_type='same')
)

self.add_module(
'conv2', tfconv2d(in_channels=64, out_channels=96, kernel_size=3)
)
self.add_module('relu2', nn.ReLU())
self.add_module(
'maxpool2', tfmaxpool2d(kernel_size=3, stride=2, tf_padding_type='same')
)

self.add_module(
'conv3',
tfconv2d(
in_channels=96, out_channels=128, kernel_size=3, tf_padding_type='same'
),
)
self.add_module('relu3', nn.ReLU())
self.add_module(
'maxpool3', tfmaxpool2d(kernel_size=3, stride=2, tf_padding_type='same')
)

self.add_module('flatten', nn.Flatten())

self.add_module('dense1', nn.Linear(in_features=3 * 3 * 128, out_features=512))
self.add_module('relu4', activ())
self.add_module('dense2', nn.Linear(in_features=512, out_features=256))
self.add_module('relu5', activ())
self.add_module('dense3', nn.Linear(in_features=256, out_features=n_out))

# init the layers
for module in self.modules():
if isinstance(module, nn.Conv2d):
nn.init.constant_(module.bias, 0.0)
nn.init.xavier_normal_(module.weight)

if isinstance(module, nn.Linear):
nn.init.constant_(module.bias, 0.0)
nn.init.xavier_uniform_(module.weight)
3 changes: 2 additions & 1 deletion examples/requirements.txt
@@ -1,4 +1,5 @@
botorch==0.8.2
gpytorch==1.9.1
tqdm
netcal==1.1.3
netcal==1.3.5
deepobs==1.1.2
10 changes: 5 additions & 5 deletions laplace/__init__.py
@@ -7,17 +7,17 @@
REGRESSION = 'regression'
CLASSIFICATION = 'classification'

from laplace.baselaplace import BaseLaplace, ParametricLaplace, FullLaplace, KronLaplace, DiagLaplace, LowRankLaplace
from laplace.lllaplace import LLLaplace, FullLLLaplace, KronLLLaplace, DiagLLLaplace
from laplace.baselaplace import BaseLaplace, ParametricLaplace, FullLaplace, KronLaplace, DiagLaplace, LowRankLaplace, FunctionalLaplace
from laplace.lllaplace import LLLaplace, FullLLLaplace, KronLLLaplace, DiagLLLaplace, FunctionalLLLaplace
from laplace.subnetlaplace import SubnetLaplace, FullSubnetLaplace, DiagSubnetLaplace
from laplace.laplace import Laplace
from laplace.marglik_training import marglik_training

__all__ = ['Laplace', # direct access to all Laplace classes via unified interface
'BaseLaplace', 'ParametricLaplace', # base-class and its (first-level) subclasses
'BaseLaplace', 'ParametricLaplace', 'FunctionalLaplace', # base-class and its (first-level) subclasses
'FullLaplace', 'KronLaplace', 'DiagLaplace', 'LowRankLaplace', # all-weights
'LLLaplace', # base-class last-layer
'FullLLLaplace', 'KronLLLaplace', 'DiagLLLaplace', # last-layer
'SubnetLaplace', # base-class subnetwork
'FullLLLaplace', 'KronLLLaplace', 'DiagLLLaplace', 'FunctionalLLLaplace', # last-layer
'SubnetLaplace', # subnetwork
'FullSubnetLaplace', 'DiagSubnetLaplace', # subnetwork
'marglik_training'] # methods