Implementations of various deep-learning components:
- Encoder-Decoder Transformer in PyTorch, built to align with Vaswani et al. 2017 (a minimal attention sketch follows this list)
- UNet-based diffusion model
- LeNet
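
As a taste of the Transformer piece, here is a minimal sketch of the scaled dot-product attention from Vaswani et al. 2017; the function name and tensor shapes are illustrative, not the repo's actual module:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, eq. (1) of Vaswani et al. 2017.
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Toy usage: batch of 2, sequence length 4, model dimension 8.
q = k = v = torch.randn(2, 4, 8)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 4, 8])
```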
These visualisations were created for this blog post: https://daniel-sinkin.github.io/2025/07/31/Bishop-Visualisation.html
Suppose we have natural data lying on a lower-dimensional manifold inside a higher-dimensional ambient space. In this example I trained a simple 5 -> 16 -> 16 -> 5 MLP with ReLU activations for the denoising.
Inspired by the Deep Learning Book (https://www.deeplearningbook.org), Chapter 14 (Autoencoders), p. 512.
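
A minimal PyTorch sketch of such a denoiser; the layer sizes follow the description above, while the data, noise scale, and training loop are illustrative assumptions:

```python
import torch
import torch.nn as nn

# 5 -> 16 -> 16 -> 5 MLP with ReLU activations, mapping noisy points back to clean ones.
denoiser = nn.Sequential(
    nn.Linear(5, 16), nn.ReLU(),
    nn.Linear(16, 16), nn.ReLU(),
    nn.Linear(16, 5),
)
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

clean = torch.randn(256, 5)                    # stand-in for the natural data
noisy = clean + 0.1 * torch.randn_like(clean)  # Gaussian corruption (scale is an assumption)
for _ in range(200):
    opt.zero_grad()
    loss = ((denoiser(noisy) - clean) ** 2).mean()  # reconstruction MSE
    loss.backward()
    opt.step()
```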
Suppose we have a current sample point $x_t$. We sample a direction and propose a step $x' = x_t + \epsilon$. As we want an acceptance probability, we clip the density ratio to be at most 1:

$$a = \min\!\left(1, \frac{p(x')}{p(x_t)}\right)$$

We then accept this step with probability $a$. If we accept, we define $x_{t+1} = x'$; otherwise we keep $x_{t+1} = x_t$.
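
A minimal sketch of this accept/reject loop, assuming an unnormalised one-dimensional target density and a Gaussian random-walk proposal (both illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(x):
    # Unnormalised target density (illustrative two-mode mixture).
    return np.exp(-0.5 * (x - 2.0) ** 2) + np.exp(-0.5 * (x + 2.0) ** 2)

x = 0.0
samples = []
for _ in range(10_000):
    x_new = x + rng.normal(scale=0.5)          # step in a random direction
    a = min(1.0, p_tilde(x_new) / p_tilde(x))  # acceptance probability, clipped at 1
    if rng.uniform() < a:                      # accept with probability a
        x = x_new                              # x_{t+1} = x'
    samples.append(x)                          # on rejection, x_{t+1} = x_t
```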
Suppose we want to sample from a complex distribution $p(x)$ for which no direct sampling routine is available.
Here I implemented the discussion in this stackexchange answer, https://stats.stackexchange.com/a/151351, which explicitly shows the "ridge" that we get with overfitting.
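
A minimal sketch of this kind of overfitting experiment, assuming a noisy sinusoid and polynomial fits of increasing degree (illustrative choices, not taken from the linked answer):

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = np.sort(rng.uniform(0, 1, 10))
y_train = np.sin(2 * np.pi * x_train) + 0.2 * rng.normal(size=10)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test) + 0.2 * rng.normal(size=200)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

With degree 9 the 10 training points are interpolated almost exactly, so the training error collapses while the test error blows up.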

Assuming we are fitting a curve with Gaussian noise, for any fixed $x$ the distribution of the target value $t$ is given by $p(t \mid x) = \mathcal{N}(t \mid y(x), \beta^{-1})$, a Gaussian centred on the curve $y(x)$ with noise precision $\beta$ (based on Figure 1.28 in Bishop, Pattern Recognition and Machine Learning, 2006).
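
A minimal sketch of this conditional density, assuming Bishop's running $\sin(2\pi x)$ example for the underlying curve and an illustrative noise precision $\beta$:

```python
import numpy as np

beta = 25.0                          # noise precision (illustrative value)

def y(x):
    return np.sin(2 * np.pi * x)     # underlying curve, Bishop's running example

def p_t_given_x(t, x):
    # p(t | x) = N(t | y(x), 1 / beta): a Gaussian centred on the curve at x.
    return np.sqrt(beta / (2 * np.pi)) * np.exp(-0.5 * beta * (t - y(x)) ** 2)

rng = np.random.default_rng(2)
x0 = 0.3
targets = y(x0) + rng.normal(scale=beta ** -0.5, size=5)  # sample targets at fixed x
print(targets)
print(p_t_given_x(targets, x0))      # density of each sampled target
```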

We take our samples
The first 
Suppose we have a covariance matrix $\Sigma$.
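
A minimal sketch, assuming $\Sigma$ is used to draw correlated Gaussian samples via its Cholesky factor $L$: since $\Sigma = LL^\top$, the vector $x = Lz$ with $z \sim \mathcal{N}(0, I)$ has covariance $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])        # example covariance matrix (illustrative values)
L = np.linalg.cholesky(sigma)         # lower-triangular factor with sigma = L @ L.T
z = rng.standard_normal((10_000, 2))  # rows are z ~ N(0, I)
x = z @ L.T                           # rows are x = L z, so Cov(x) = L L^T = sigma
print(np.cov(x, rowvar=False))        # empirical covariance approximates sigma
```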
A somewhat efficient algorithm for sampling standard normal random variables using a transform on uniformly distributed variables (the Box-Muller transform). Suppose we want to sample $z \sim \mathcal{N}(0, 1)$. First start with two independent uniforms $u_1, u_2 \sim \mathrm{Uniform}(0, 1)$. We then apply the transform

$$z_0 = \sqrt{-2 \ln u_1}\,\cos(2\pi u_2), \qquad z_1 = \sqrt{-2 \ln u_1}\,\sin(2\pi u_2),$$

which yields two independent standard normal samples.
This uses the rotational symmetry of the two-dimensional normal distribution. We can see more clearly what the transformation does by highlighting a particular slice.
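
A minimal NumPy sketch of the transform described above:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
u1 = 1.0 - rng.uniform(size=n)  # shift to (0, 1] so that log(u1) is finite
u2 = rng.uniform(size=n)

# Box-Muller: map two independent uniforms to two independent standard normals.
r = np.sqrt(-2.0 * np.log(u1))  # radius of a 2-D standard normal sample
theta = 2.0 * np.pi * u2        # angle, uniform by rotational symmetry
z0 = r * np.cos(theta)
z1 = r * np.sin(theta)
print(z0.mean(), z0.std())      # approximately 0 and 1
```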

When doing Bayesian inference, if

- Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.