This repository exists for two reasons:
- I wanted to explore neural networks with a single hidden layer of very few units with ReLU activations; ideally few enough to "see" each individual ReLU in prediction space (a sketch of such a network follows this list).
- I wanted to have a go at implementing Hamiltonian Monte Carlo (HMC) and Langevin Monte Carlo (LMC) MCMC schemes for inference in Bayesian neural networks (BNNs).
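For concreteness, here is a minimal NumPy sketch of the kind of network in the first point: one input, one output, and a handful of hidden ReLU units, each contributing one visible "corner" to the prediction. The names, shapes and random parameter values are illustrative only and are not taken from the notebooks.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def predict(x, params):
    """Forward pass of a 1D-input, 1D-output network with one hidden layer."""
    W1, b1, W2, b2 = params          # W1: (H, 1), b1: (H,), W2: (1, H), b2: (1,)
    h = relu(x @ W1.T + b1)          # each hidden unit is one ReLU "hinge"
    return h @ W2.T + b2

# Three hidden units, as in the first set of plots below (illustrative values).
rng = np.random.default_rng(0)
H = 3
params = (rng.normal(size=(H, 1)), rng.normal(size=H),
          rng.normal(size=(1, H)), rng.normal(size=(1,)))
x = np.linspace(-2.0, 2.0, 5).reshape(-1, 1)
print(predict(x, params))
```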
Both datasets were generated from Gaussian process prior samples with exponentiated negative quadratic covariance functions. In the notebooks you will see that, to reach the prediction samples shown below, each MCMC chain had its early samples discarded as burn-in, and the sample autocorrelation was inspected as a function of lag to decide how heavily to thin the remaining samples so that they can be treated as independent.

LMC is essentially gradient descent with a random walk superposed: the proposals are only a small step away from the current state, so to obtain a set of independent posterior samples we must simulate the Langevin diffusion for a great many steps and then thin the acquired chain quite heavily. HMC is more sophisticated: although each successive sample takes longer to obtain, successive samples can be almost entirely independent if the Hamiltonian dynamics simulation is long enough (i.e. uses enough leapfrog steps), so the posterior is explored much more quickly as a function of Markov chain length than with LMC.
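As a rough illustration of the two samplers (not the notebooks' actual implementation), the sketch below writes one Metropolis-adjusted Langevin step and one HMC step against an assumed generic log-posterior; `log_post`, `grad_log_post`, the flat parameter vector `theta`, the step size `eps` and the leapfrog count `L` are all placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def mala_step(theta, log_post, grad_log_post, eps):
    """One Metropolis-adjusted Langevin step: a gradient step plus Gaussian noise,
    corrected by an accept/reject decision with the asymmetric proposal density."""
    noise = rng.normal(size=theta.shape)
    prop = theta + 0.5 * eps**2 * grad_log_post(theta) + eps * noise

    def log_q(b, a):  # log q(b | a) up to an additive constant
        mean = a + 0.5 * eps**2 * grad_log_post(a)
        return -np.sum((b - mean) ** 2) / (2.0 * eps**2)

    log_alpha = (log_post(prop) + log_q(theta, prop)
                 - log_post(theta) - log_q(prop, theta))
    return prop if np.log(rng.uniform()) < log_alpha else theta

def hmc_step(theta, log_post, grad_log_post, eps, L):
    """One HMC step: L leapfrog steps of Hamiltonian dynamics, then accept/reject."""
    p0 = rng.normal(size=theta.shape)          # resample momentum
    q = theta.copy()
    p = p0 + 0.5 * eps * grad_log_post(q)      # initial half momentum step
    for _ in range(L - 1):
        q = q + eps * p
        p = p + eps * grad_log_post(q)
    q = q + eps * p
    p = p + 0.5 * eps * grad_log_post(q)       # final half momentum step
    # Accept with probability exp(-ΔH), H = -log_post + kinetic energy.
    log_alpha = (log_post(q) - 0.5 * np.sum(p**2)
                 - log_post(theta) + 0.5 * np.sum(p0**2))
    return q if np.log(rng.uniform()) < log_alpha else theta
```

With many small-`eps` `mala_step` calls the chain wanders slowly and must be thinned heavily, whereas a single `hmc_step` with a long leapfrog trajectory can land far from its starting point, which is the trade-off described above.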
Below are the predictions of a deterministic NN trained with Adam, a BNN with posterior samples obtained via Metropolis-adjusted Langevin Monte Carlo, and a BNN with posterior samples obtained via Hybrid/Hamiltonian Monte Carlo respectively. The networks all have one hidden layer of just three units, and ReLU activations are used throughout.
Notice how the characteristic "corners" of the separate ReLUs can be seen, and also how relatively tight the BNN uncertainty bounds are in this underparameterised regime: the abundance of relatively noiseless data leaves little uncertainty over the network weight values.
Below are the predictions of the same three models, but in a setting with much less data. The networks all have one hidden layer of 100 units, and Tanh activations are used here to give smoother prediction samples.