René Chenard - June 2020
Generates animated plots of neural network classifiers during training in order to compare different models.
Figure 1 — Tanh networks with different numbers of neurons per layer (sequential, not shuffled).

Figure 2 — Tanh networks with different numbers of neurons per layer (batches of 32, shuffled).
| Directory | Description | Links |
|---|---|---|
| Activation functions | Comparison of the effect of using different activation functions. | ① ② |
| Amounts of layers | Comparison of the effect of using different numbers of layers. | ① ② |
| Amounts of neurons | Comparison of the effect of using different numbers of neurons per layer. | ① ② |
| Learning rate | Comparison of the effect of using different learning rates. | ① ② |
| Loss functions | Comparison of the effect of using different loss functions. | ① ② |
| Optimizers | Comparison of the effect of using different optimizers. | ① ② ③ ④ |
| Weight decay | Comparison of the effect of using different values of weight decay. | ① ② |
| Weight initialization | Comparison of the effect of using different weight initialization methods. | ① ② |
For simplicity, these experiments are done on small neural network classifiers with two classes. A negative output corresponds to one class while a positive output corresponds to the other class. The error function is applied to the output neuron to ensure that the output always lies between -1 and 1.
Unless specified otherwise, the neural networks are composed of four layers (two hidden layers) with 32 neurons per layer and use the stochastic gradient descent optimizer with a learning rate of 0.01, a momentum of 0.9, and a weight decay of 0.001. The default loss function is the mean squared error.
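As a concrete reference, here is a minimal sketch of this default setup, assuming PyTorch; the framework, the module names, and the 2-D input size are assumptions here, not the notebooks' exact code:

```python
import torch
import torch.nn as nn


class Erf(nn.Module):
    """Error-function squashing, keeps the single output in (-1, 1)."""
    def forward(self, x):
        return torch.erf(x)


# Default setup described above: four layers (two hidden) with 32 neurons each,
# tanh hidden activations (as in the figures), and the error function on the
# output neuron so that the sign of the output gives the predicted class.
# The 2-D input size is an assumption based on the plotted datasets.
model = nn.Sequential(
    nn.Linear(2, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 1), Erf(),
)

# Default optimizer and loss described above.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=0.001)
criterion = nn.MSELoss()
```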
The sampling methods used by default are the following (see the sketch after this list):
- Sequential learning over the whole dataset (the gradient is computed after feeding the whole dataset in the same sequential order).
- Shuffled batches of 32 samples.
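Both sampling regimes can be expressed as data loaders; this is a hedged sketch assuming PyTorch, where the toy dataset is a hypothetical stand-in for whichever two-class dataset a notebook builds:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical two-class toy dataset: 2-D points with targets in {-1, +1}.
features = torch.randn(256, 2)
targets = torch.sign(features[:, 0] * features[:, 1]).unsqueeze(1)
dataset = TensorDataset(features, targets)

# Sequential learning over the whole dataset: a single gradient step per epoch,
# computed after feeding every sample in the same fixed order.
full_batch_loader = DataLoader(dataset, batch_size=len(dataset), shuffle=False)

# Shuffled mini-batches of 32 samples: one gradient step per batch,
# with a new random ordering every epoch.
mini_batch_loader = DataLoader(dataset, batch_size=32, shuffle=True)
```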
The results obtained in these experiments may vary considerably with slightly different parameters. Always verify what you infer from them!
- Explore alternative learning methods.
- Use one frame per backpropagation step instead of one frame per epoch, for smoother animations.
- Implement dimensionality reduction (such as PCA, UMAP, or t-SNE) to analyze higher-dimensional datasets.
- Display more information about the setup: loss function, architecture, optimizer.
- Display more information about the learning: learning curve, weights & biases, metrics, etc.
- Make a module or an application out of the Jupyter Notebook experiments.
- Add different color themes.
- Add support for 0-1 classifiers and multi-class classifiers.