SMI Training Dynamics

This repo reproduces the MNIST MLP experiment from "Sliced Information Plane for Analysis of Deep Neural Networks" (Wongso, Ghosh, Motani; TechRxiv preprint, Jan 31, 2025), using 1‑SMI to study learning dynamics on the Sliced Information Plane.

Overview

The goal is to track how each layer’s representation T evolves during training in terms of:

S_1(X; T) (information retained about the input),
S_1(T; Y) (information about the label).

Plotting these quantities across epochs yields the Sliced Information Plane (SIP).

Definitions (Wongso et al., Eq. 1–3)

Mutual Information (MI, Eq. 1)

$$I(X;Y)=\int p(x,y)\log\frac{p(x,y)}{p(x)p(y)}\,dx\,dy$$

Sliced Mutual Information (1‑SMI, Eq. 2)

$$S_1(X;Y)=\mathbb{E}_{\theta,\phi}\left[I(\theta^\top X;\,\phi^\top Y)\right]$$

where $\theta$ and $\phi$ are random unit vectors (uniform on the sphere). For discrete Y, only X is projected.
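As a minimal sketch of the slicing step in Eq. 2, the snippet below draws a direction uniformly on the unit sphere by normalizing a Gaussian vector and projects each sample onto it. The function names `random_unit_vector` and `project` and the toy data are illustrative assumptions, not the repo's API, and only the standard library is used to keep the example dependency-free.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    # Normalizing an isotropic Gaussian vector gives a uniform direction.
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:  # guard against the (probability-zero) zero vector
            return [c / norm for c in v]


def project(samples, theta):
    """Return the scalar projection theta^T x for every row x of samples."""
    return [sum(t * c for t, c in zip(theta, x)) for x in samples]


rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]  # toy data
theta = random_unit_vector(4, rng)
sliced_X = project(X, theta)  # one scalar per sample
```

When Y is a discrete label, only this projection of X is needed; the labels are passed to the MI estimator unchanged.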

k‑Sliced Mutual Information (k‑SMI, Eq. 3)

$$S_k(X;Y)=\mathbb{E}_{A,B}\left[I(A^\top X;\,B^\top Y)\right]$$

where A and B are random orthonormal projection matrices (uniform on the Stiefel manifold).
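One way to draw such a matrix, sketched here under the assumption that k does not exceed the data dimension, is to orthonormalize k Gaussian vectors with Gram-Schmidt, which yields a uniformly distributed point on the Stiefel manifold. The helper name `random_orthonormal_columns` is hypothetical and is not taken from the repo.

```python
import math
import random


def random_orthonormal_columns(dim, k, rng):
    """Return k orthonormal vectors in R^dim (the columns of A), with k <= dim."""
    if k > dim:
        raise ValueError("k must not exceed dim")
    basis = []
    while len(basis) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Remove the components along the directions accepted so far.
        for b in basis:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:  # resample on numerical degeneracy
            basis.append([c / norm for c in v])
    return basis


rng = random.Random(0)
A = random_orthonormal_columns(dim=5, k=3, rng=rng)
x = [rng.gauss(0.0, 1.0) for _ in range(5)]
# A^T x: the k-dimensional slice of a single sample x.
sliced_x = [sum(a_i * x_i for a_i, x_i in zip(a, x)) for a in A]
```

Setting k = 1 reduces this to a single random unit vector, matching the 1‑SMI case.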

Method In This Repo (1‑SMI on MNIST)

Train an MLP and save ReLU activations from hidden layers on the test set at snapshot epochs.
Estimate S_1(X; T) and S_1(T; Y) for each saved layer and epoch.
Default estimator: KSG (k_neighbors = 5), averaged over m random projections.
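The estimation step above can be sketched as a Monte Carlo average over m random directions. To stay dependency-free, this sketch swaps the KSG estimator for a simple histogram (plug-in) estimator between a scalar projection and discrete labels, so its numbers will not match the repo's. The names `binned_mi`, `sliced_mi`, and `n_bins` and the toy data are assumptions for illustration only.

```python
import math
import random
from collections import Counter


def random_unit_vector(dim, rng):
    """Uniform direction on the unit sphere via a normalized Gaussian."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def binned_mi(values, labels, n_bins=8):
    """Plug-in MI (in nats) between a binned scalar and discrete labels."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant input
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    n = len(values)
    joint, p_bin, p_lab = Counter(zip(bins, labels)), Counter(bins), Counter(labels)
    # sum over cells of p(b, y) * log(p(b, y) / (p(b) * p(y)))
    return sum(
        (c / n) * math.log(c * n / (p_bin[b] * p_lab[y]))
        for (b, y), c in joint.items()
    )


def sliced_mi(X, labels, m=50, seed=0):
    """Average the per-slice MI estimate over m random projections of X."""
    rng = random.Random(seed)
    dim = len(X[0])
    total = 0.0
    for _ in range(m):
        theta = random_unit_vector(dim, rng)
        proj = [sum(t * c for t, c in zip(theta, x)) for x in X]
        total += binned_mi(proj, labels)
    return total / m


# Toy check: class-dependent means versus shuffled (uninformative) labels.
rng = random.Random(1)
labels = [i % 2 for i in range(400)]
X = [[rng.gauss(3.0 * y, 1.0) for _ in range(5)] for y in labels]
shuffled = labels[:]
rng.shuffle(shuffled)
informative = sliced_mi(X, labels)
independent = sliced_mi(X, shuffled)
```

On this toy data the estimate for the true labels should come out well above the one for shuffled labels, which stays slightly positive only because the plug-in estimator is biased upward.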

Results

The Sliced Information Plane traces S_1(X; T) and S_1(T; Y) across epochs for each layer, showing how representations evolve in terms of input retention and label relevance.

Figure: Sliced Information Plane for h32 (32 hidden units) and h64 (64 hidden units).

Training Curves. Loss and accuracy over epochs for train/test splits.

Figure: training curves for h32 (32 hidden units) and h64 (64 hidden units).

My contribution: Redundancy Comparison (h32 vs h64). Estimated redundancy dynamics side by side.

How To Run

Setup

uv sync
uv add --dev --editable .

Train and save activation snapshots

bash experiments/mnist_mlp/run_training.sh h64

Compute 1‑SMI metrics

bash experiments/mnist_mlp/run_analysis.sh h64

Compute Redundancy metric

bash experiments/mnist_mlp/run_pid_analysis.sh h64

Generate plots

bash experiments/mnist_mlp/generate_plots.sh h64

Variants:

h64 matches the paper (64 units per hidden layer).
h32 is the reduced‑capacity variant (32 units).

Outputs are written to experiments/mnist_mlp/runs/<variant>/ (configs, checkpoints, activations, analysis metrics, and plots).
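To run all four stages in order for one variant, a small driver along these lines may help. It only wraps the shell scripts listed above. The names `SCRIPTS`, `pipeline_commands`, and `run_pipeline` are hypothetical, and the sketch assumes it is launched from the repository root and that each script accepts the variant name as its only argument.

```python
import subprocess

# Stage scripts in the order given in "How To Run".
SCRIPTS = [
    "run_training.sh",
    "run_analysis.sh",
    "run_pid_analysis.sh",
    "generate_plots.sh",
]


def pipeline_commands(variant):
    """Build the four commands for one variant, in execution order."""
    return [["bash", f"experiments/mnist_mlp/{name}", variant] for name in SCRIPTS]


def run_pipeline(variant):
    """Run each stage in order, stopping at the first failing script."""
    for cmd in pipeline_commands(variant):
        subprocess.run(cmd, check=True)


# Example (assumed usage): run_pipeline("h64")
```

Using `check=True` stops the pipeline as soon as a stage fails, so later stages never read missing activations or metrics.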

Repo Structure

experiments/mnist_mlp/run.py: loads configs and launches training.
experiments/mnist_mlp/run_analysis.py: computes 1‑SMI from saved activations.
experiments/mnist_mlp/generate_plots.py: renders training curves and SIP plots.
src/smi_training_dynamics/neural_networks/: MLP and training loop.
src/smi_training_dynamics/measures/: MI, SMI, k‑SMI estimators (implementation based on Wongso et al., 2023).
src/smi_training_dynamics/visualizations/: plotting utilities.
