This is a reading list on Bayesian neural networks. The list is quite opinionated: it focuses mainly on the setting where priors and posteriors are defined over the neural network weights and the predictive task is classification. In this setting, Bayesian neural networks might be preferable because they improve test accuracy, test calibration, or both. I first include a list of essential papers and then organize papers by subject. The aim is to give new researchers in Bayesian deep learning a guide that speeds up their entry into the field.
Interested in a more detailed discussion? Check out our recent review paper 🚨 "A Primer on Bayesian Neural Networks: Review and Debates" 🚨, joint work with Julyan Arbel (Inria Grenoble Rhône-Alpes), Mariia Vladimirova (Criteo), and Vincent Fortuin (Helmholtz AI).
- [Weight Uncertainty in Neural Networks]: A central challenge in Bayesian neural networks is how to obtain gradients with respect to the parameters of distributions such as the Gaussian. This is one of the first papers to apply variational inference with the reparametrization trick to realistic neural networks; the reparametrization trick makes it possible to estimate these gradients with Monte Carlo samples from the posterior (a minimal sketch follows this list).
- [Laplace Redux -- Effortless Bayesian Deep Learning]: The Laplace approximation is one of the few realistic options to perform approximate inference for Bayesian neural networks. Not only does it result in good uncertainty estimates, but it can also be used for model selection and invariance learning.
- [How Good is the Bayes Posterior in Deep Neural Networks Really?]: This paper describes a major criticism of Bayesian deep learning: in terms of accuracy and negative log-likelihood, a deterministic network is often better than a Bayesian one. At the same time, it describes two common tricks for efficient approximate MCMC inference, preconditioned stochastic gradient Langevin dynamics and cyclical step sizes.
- [What Are Bayesian Neural Network Posteriors Really Like?]: This paper implements Hamiltonian Monte Carlo (HMC) for approximate inference in Bayesian deep neural networks. HMC is considered the gold standard in approximate inference; however, it is very computationally intensive.
- [Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles]: This paper introduces deep neural network ensembles, a frequentist alternative to Bayesian neural networks. Deep ensembles are one of the most common baselines for Bayesian neural networks and frequently outperform them.
- [The Bayesian Learning Rule]: Many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide range of algorithms from fields such as optimization, deep learning, and graphical models.
- SAM as an Optimal Relaxation of Bayes
- Robustness to corruption in pre-trained Bayesian neural networks
- Beyond Deep Ensembles: A Large-Scale Evaluation of Bayesian Deep Learning under Distribution Shift
- Collapsed Inference for Bayesian Deep Learning
- The Memory Perturbation Equation: Understanding Model's Sensitivity to Data
- Learning Layer-wise Equivariances Automatically using Gradients
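To make the reparametrization trick mentioned above concrete, here is a minimal sketch of mean-field variational inference for a single Bayesian linear (softmax) layer. This is not the exact algorithm of any of the papers above; the toy data, layer size, prior scale, and learning rate are arbitrary assumptions.

```python
import torch
import torch.nn.functional as F

# Minimal mean-field Gaussian variational layer trained with the
# reparametrization trick (a sketch; sizes and hyperparameters are arbitrary).
torch.manual_seed(0)
in_dim, n_classes = 20, 10
x = torch.randn(128, in_dim)                 # toy inputs
y = torch.randint(0, n_classes, (128,))      # toy labels

# Variational parameters of q(w) = N(mu, sigma^2), with sigma = softplus(rho)
mu = torch.zeros(in_dim, n_classes, requires_grad=True)
rho = torch.full((in_dim, n_classes), -3.0, requires_grad=True)
prior_sigma = 1.0
opt = torch.optim.Adam([mu, rho], lr=1e-2)

for step in range(1000):
    sigma = F.softplus(rho)
    eps = torch.randn_like(mu)
    w = mu + sigma * eps                     # reparametrization: the sample is a differentiable function of (mu, rho)
    nll = F.cross_entropy(x @ w, y, reduction="sum")
    # KL(q || p) between diagonal Gaussians, with p = N(0, prior_sigma^2 I)
    kl = (torch.log(prior_sigma / sigma)
          + (sigma ** 2 + mu ** 2) / (2 * prior_sigma ** 2) - 0.5).sum()
    loss = nll + kl                          # single-sample negative ELBO
    opt.zero_grad()
    loss.backward()                          # gradients flow through the sampled weights
    opt.step()
```

Because `w` is a deterministic, differentiable function of `mu`, `rho`, and the noise `eps`, ordinary backpropagation yields unbiased single-sample gradients of the negative ELBO with respect to the variational parameters.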
The Bayesian paradigm consists of choosing a prior over the model parameters, evaluating the data likelihood, and then estimating the posterior over the model parameters. This can be done analytically only in simple cases (e.g., when the likelihood, prior, and posterior are all Gaussian). For the more complex and interesting cases, we have to resort to approximate inference.
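Concretely, for weights $w$ and data $\mathcal{D}$, Bayes' rule and the posterior predictive distribution for a new input $x^\ast$ read

```math
p(w \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid w)\, p(w)}{\int p(\mathcal{D} \mid w')\, p(w')\, dw'},
\qquad
p(y^\ast \mid x^\ast, \mathcal{D}) = \int p(y^\ast \mid x^\ast, w)\, p(w \mid \mathcal{D})\, dw .
```

Both integrals are intractable for neural networks; the approximate inference families below trade off how faithfully they capture the posterior against computational cost.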
Variational inference. ++ Computationally efficient. -- Explores a single mode of the loss landscape.
- Keeping the Neural Networks Simple by Minimizing the Description Length of the Weights
- Ensemble Learning in Bayesian Neural Networks
- Practical Variational Inference for Neural Networks
- Auto-Encoding Variational Bayes
- Weight Uncertainty in Neural Networks
- Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam
- Vprop: Variational Inference Using RMSprop
- Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors
- SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient
- Structured Second-Order Methods via Natural-Gradient Descent
Laplace approximation. ++ Computationally efficient. -- Explores a single mode of the loss landscape. A minimal sketch of a diagonal Laplace approximation follows the list below.
- A Practical Bayesian Framework for Backpropagation Networks
- A Scalable Laplace Approximation for Neural Networks
- Laplace Redux -- Effortless Bayesian Deep Learning
- Improving Predictions of Bayesian Neural Nets via Local Linearization
- Adapting the Linearised Laplace Model Evidence for Modern Deep Learning
- Bayesian Deep Learning via Subnetwork Inference
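As a concrete (and deliberately crude) illustration, the sketch below builds a diagonal Laplace approximation around a pre-trained MAP estimate, using batch-level squared gradients as a rough stand-in for the diagonal Fisher/Hessian. The objects `model`, `train_loader`, and `x_test`, as well as the prior precision and number of posterior samples, are assumptions; the papers above use more accurate curvature approximations.

```python
import torch
import torch.nn.functional as F

# Diagonal Laplace approximation around a pre-trained MAP classifier (sketch).
# `model`, `train_loader`, and `x_test` are assumed to exist.
prior_precision = 1.0
params = [p for p in model.parameters() if p.requires_grad]
curvature = [torch.zeros_like(p) for p in params]

model.eval()
for x, y in train_loader:
    model.zero_grad()
    log_lik = -F.cross_entropy(model(x), y, reduction="sum")
    log_lik.backward()
    for c, p in zip(curvature, params):
        # Batch-level squared gradients: a rough stand-in for the diagonal Fisher
        c += p.grad.detach() ** 2

# Gaussian posterior: mean = MAP weights, variance = 1 / (prior precision + curvature)
post_std = [(prior_precision + c).rsqrt() for c in curvature]
map_params = [p.detach().clone() for p in params]

# Monte Carlo predictive: sample weights around the MAP and average softmax outputs
n_samples, probs = 30, 0.0
with torch.no_grad():
    for _ in range(n_samples):
        for p, m, s in zip(params, map_params, post_std):
            p.copy_(m + s * torch.randn_like(m))
        probs = probs + F.softmax(model(x_test), dim=-1) / n_samples
    for p, m in zip(params, map_params):   # restore the MAP weights afterwards
        p.copy_(m)
```

In practice, the library released with the Laplace Redux paper automates these steps with better Hessian approximations (e.g., Kronecker-factored or last-layer variants) and also exposes the marginal likelihood for model selection.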
Markov chain Monte Carlo (MCMC). ++ Explores multiple modes. -- Computationally expensive. A minimal SGLD sketch follows the list below.
- What Are Bayesian Neural Network Posteriors Really Like?
- Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
- Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks
- Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning
- Bayesian Learning via Stochastic Gradient Langevin Dynamics
- A Complete Recipe for Stochastic Gradient MCMC
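To make the SGD-to-sampler connection concrete, here is a minimal sketch of vanilla (non-preconditioned, constant step size) stochastic gradient Langevin dynamics for a classifier. `model`, `train_loader`, and all hyperparameters are assumptions; the preconditioned and cyclical variants above change the step size schedule and the per-parameter scaling.

```python
import math
import torch
import torch.nn.functional as F

# Vanilla SGLD for a classifier (sketch); `model`, `train_loader`, and the
# hyperparameters below are assumptions.
n_train = 50_000              # dataset size, used to rescale the minibatch log-likelihood
lr = 1e-6                     # SGLD step size
prior_precision = 1.0
burn_in, thin = 1_000, 100
samples, step = [], 0

for epoch in range(10):
    for x, y in train_loader:
        model.zero_grad()
        # Minibatch estimate of the (unnormalized) log posterior
        log_lik = -F.cross_entropy(model(x), y, reduction="mean") * n_train
        log_prior = -0.5 * prior_precision * sum((p ** 2).sum() for p in model.parameters())
        (log_lik + log_prior).backward()
        with torch.no_grad():
            for p in model.parameters():
                # Gradient ascent on the log posterior plus Gaussian injected noise
                p += 0.5 * lr * p.grad + math.sqrt(lr) * torch.randn_like(p)
        step += 1
        if step > burn_in and step % thin == 0:
            samples.append([p.detach().clone() for p in model.parameters()])
# Predictions are then averaged over the collected weight samples.
```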
Deep ensembles. ++ Explore multiple modes. ++ Computationally competitive. -- Memory expensive. A minimal ensemble sketch follows the list below.
- Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
- Multi-Symmetry Ensembles: Improving Diversity and Generalization via Opposing Symmetries
- Hyperparameter Ensembles
- Agree to Disagree: Diversity through Disagreement for Better Transferability
- Feature Space Particle Inference for Neural Network Ensembles
- Deep Ensembles: A Loss Landscape Perspective
- DICE: Diversity in Deep Ensembles via Conditional Redundancy Adversarial Estimation
- Deep Ensembles Work, But Are They Necessary?
- Combining Diverse Feature Priors
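For comparison, the ensemble baseline referenced above is strikingly simple to implement. In the sketch below, `make_model`, `train`, and `x_test` are hypothetical placeholders for the usual architecture, (MAP) training loop, and test batch.

```python
import torch
import torch.nn.functional as F

# Deep ensemble sketch: train the same architecture from several random
# initializations and average the predicted class probabilities.
# `make_model`, `train`, and `x_test` are hypothetical placeholders.
n_members = 5
members = []
for seed in range(n_members):
    torch.manual_seed(seed)          # a different initialization per member
    model = make_model()
    train(model)                     # standard maximum likelihood / MAP training
    members.append(model)

with torch.no_grad():
    probs = torch.stack([F.softmax(m(x_test), dim=-1) for m in members]).mean(dim=0)
pred = probs.argmax(dim=-1)
```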
Bayesian neural networks are compatible with two approaches to model selection that eschew validation and test sets and can (in principle) give guarantees on out-of-sample performance simply by using the training set.
The marginal likelihood is a purely Bayesian approach to model selection.
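For a model (or hyperparameter setting) $\mathcal{M}$ with weights $w$, the quantity being compared is the evidence, i.e. the likelihood of the data with the weights integrated out under the prior:

```math
p(\mathcal{D} \mid \mathcal{M}) = \int p(\mathcal{D} \mid w, \mathcal{M})\, p(w \mid \mathcal{M})\, dw .
```

Settings with higher evidence are preferred; since the integral is intractable for neural networks, the papers below approximate it, for instance with the Laplace approximation, or estimate proxies for it such as training speed.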
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning
- Bayesian Model Selection, the Marginal Likelihood, and Generalization
- A Bayesian Perspective on Training Speed and Model Selection
- Speedy Performance Estimation for Neural Architecture Search
PAC-Bayes bounds give high-probability Frequentist guarantees on out-of-sample performance.
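A representative form (a McAllester-type bound) is the following: for a prior $P$ over weights chosen before seeing the data and any $\delta \in (0, 1)$, with probability at least $1-\delta$ over an i.i.d. training sample $S$ of size $n$, every "posterior" $Q$ satisfies

```math
\mathbb{E}_{w \sim Q}\big[L(w)\big] \;\le\; \mathbb{E}_{w \sim Q}\big[\hat{L}_S(w)\big]
\;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}},
```

where $L$ and $\hat{L}_S$ denote the population and empirical 0-1 risks. The papers below compute, tighten, or exploit bounds of this kind for deep networks.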
- Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data
- On the Role of Data in PAC-Bayes Bounds
- PAC-Bayesian Theory Meets Bayesian Inference
- PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
- Non-Vacuous Generalization Bounds at the ImageNet Scale: A PAC-Bayesian Compression Approach
- Sharpness-Aware Minimization for Efficiently Improving Generalization
Bayesian neural networks aim to improve test accuracy and, more commonly, test calibration. While Bayesian neural networks are often evaluated on standard computer vision datasets such as CIFAR-10/100 with test accuracy as the metric, dedicated datasets and metrics can also be used.
- Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks
- Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty under Dataset Shift
- Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning
- WILDS: A Benchmark of in-the-Wild Distribution Shifts
- Expected Calibration Error (ECE); a minimal sketch of the computation follows this list
- Thresholded Adaptive Calibration Error (TACE)
- Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning
- A New Vector Partition of the Probability Score (the Brier score and its decomposition)
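As referenced in the metrics list above, Expected Calibration Error is simple to compute; below is a minimal NumPy sketch with equal-width confidence bins (the bin count and the toy data are arbitrary assumptions).

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """ECE with equal-width confidence bins (sketch).

    probs: (N, C) predicted class probabilities, labels: (N,) integer labels.
    """
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # |accuracy - confidence| weighted by the fraction of points in the bin
            ece += in_bin.mean() * abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
    return ece

# Toy usage with random predictions (for illustration only)
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=1000)
labels = rng.integers(0, 10, size=1000)
print(expected_calibration_error(probs, labels))
```

Roughly speaking, TACE (also listed above) refines this recipe by thresholding small predicted probabilities and using adaptive bins that contain equal numbers of predictions.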
- Hands-on Bayesian Neural Networks -- A Tutorial for Deep Learning Users
- A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges
- Bayesian Neural Networks: An Introduction and Survey
Did you find this reading list helpful? Consider citing our review paper in your scientific publications using the following BibTeX entry:
@article{arbel2023primer,
title={A Primer on Bayesian Neural Networks: Review and Debates},
author={Arbel, Julyan and Pitas, Konstantinos and Vladimirova, Mariia and Fortuin, Vincent},
journal={arXiv preprint arXiv:2309.16314},
year={2023}
}
When citing this repository on any other medium, please use the following citation:
A Primer on Bayesian Neural Networks: Review and Debates by Julyan Arbel, Konstantinos Pitas, Mariia Vladimirova and Vincent Fortuin