FynnGerding/SMI-training-dynamics

SMI Training Dynamics

This repo reproduces the MNIST MLP experiment from Sliced Information Plane for Analysis of Deep Neural Networks (Wongso, Ghosh, Motani, Jan 31, 2025) [TechRxiv preprint] using 1‑SMI to study learning dynamics on the Sliced Information Plane.

Overview

The goal is to track how each layer’s representation T evolves during training in terms of:

  • S_1(X; T) (information retained about the input),
  • S_1(T; Y) (information about the label).

Plotting these quantities across epochs yields the Sliced Information Plane (SIP).

Definitions (Wongso et al., Eq. 1–3)

Mutual Information (MI, Eq. 1)

$$I(X;Y)=\int p(x,y)\log\frac{p(x,y)}{p(x)p(y)}\,dx\,dy$$

Sliced Mutual Information (1‑SMI, Eq. 2)

$$S_1(X;Y)=\mathbb{E}_{\theta,\phi}\left[I(\theta^\top X;\,\phi^\top Y)\right]$$

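where $\theta$ and $\phi$ are independent random unit vectors, each uniform on the unit sphere of the matching dimension. For discrete $Y$, only $X$ is projected: $S_1(X;Y)=\mathbb{E}_{\theta}\left[I(\theta^\top X;\,Y)\right]$.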
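The expectation in the 1‑SMI definition can be approximated by Monte Carlo: draw m pairs of random directions, project both variables to scalars, estimate a scalar MI for each pair, and average. The sketch below is illustrative only and is not taken from this repo. The names `random_unit_vectors`, `gaussian_mi_1d`, and `sliced_mi` are hypothetical, and the scalar estimator is a simple Gaussian plug-in (exact only for jointly Gaussian projections) standing in for whatever estimator is actually used.

```python
import numpy as np

def random_unit_vectors(m, d, rng):
    """Draw m directions uniformly on the unit sphere in R^d (normalized Gaussians)."""
    g = rng.standard_normal((m, d))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

def gaussian_mi_1d(u, v):
    """Stand-in scalar MI estimator in nats; exact only if (u, v) is jointly Gaussian."""
    r = np.corrcoef(u, v)[0, 1]
    r2 = min(r * r, 1.0 - 1e-12)  # guard against log(0) for perfectly correlated inputs
    return -0.5 * np.log1p(-r2)

def sliced_mi(x, y, m=200, mi_1d=gaussian_mi_1d, seed=0):
    """Monte Carlo estimate of S_1(X; Y): mean scalar MI over m random projection pairs."""
    rng = np.random.default_rng(seed)
    thetas = random_unit_vectors(m, x.shape[1], rng)
    phis = random_unit_vectors(m, y.shape[1], rng)
    px = x @ thetas.T  # shape (n_samples, m): column j holds theta_j^T x
    py = y @ phis.T    # shape (n_samples, m): column j holds phi_j^T y
    return float(np.mean([mi_1d(px[:, j], py[:, j]) for j in range(m)]))

# Toy data (assumption): Y depends on X in one case and is independent in the other.
rng = np.random.default_rng(1)
x = rng.standard_normal((2000, 3))
y_dep = x + 0.5 * rng.standard_normal((2000, 3))
y_ind = rng.standard_normal((2000, 3))
smi_dep = sliced_mi(x, y_dep)
smi_ind = sliced_mi(x, y_ind)
```

Passing a different callable as `mi_1d` swaps the scalar estimator without touching the slicing logic.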

k‑Sliced Mutual Information (k‑SMI, Eq. 3)

$$S_k(X;Y)=\mathbb{E}_{A,B}\left[I(A^\top X;\,B^\top Y)\right]$$

where A and B are random matrices with k orthonormal columns (uniform on the Stiefel manifold); k = 1 recovers 1‑SMI.
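One common way to draw a matrix uniformly from the Stiefel manifold is to take the QR decomposition of a Gaussian matrix and fix the column signs. The sketch below assumes NumPy. The function name `random_stiefel` and the dimensions used are hypothetical and not taken from this repo.

```python
import numpy as np

def random_stiefel(d, k, rng):
    """Sample a d x k matrix with orthonormal columns, uniform on the Stiefel manifold."""
    g = rng.standard_normal((d, k))
    q, r = np.linalg.qr(g)  # reduced QR: q is d x k, r is k x k
    # Multiply each column by the sign of the matching diagonal entry of r,
    # which removes the sign ambiguity of QR and makes the draw uniform.
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0
    return q * signs

rng = np.random.default_rng(0)
A = random_stiefel(d=784, k=4, rng=rng)  # 784 = flattened MNIST image size
x = rng.standard_normal((10, 784))       # placeholder inputs, not real data
projected = x @ A                        # row i is A^T x_i, shape (10, 4)
```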

Method In This Repo (1‑SMI on MNIST)

  • Train an MLP and save ReLU activations from hidden layers on the test set at snapshot epochs.
  • Estimate S_1(X; T) and S_1(T; Y) for each saved layer and epoch.
  • Default estimator: KSG (the Kraskov-Stögbauer-Grassberger k‑nearest‑neighbor MI estimator, k_neighbors = 5), averaged over m random projections.
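For reference, the first KSG estimator for a pair of scalar variables can be sketched in a few lines of NumPy. This is a brute-force O(n²) illustration under the assumption of continuous data without ties. It is not the repo's implementation, and the name `ksg_mi` is hypothetical.

```python
import numpy as np

def ksg_mi(x, y, k=5):
    """KSG (algorithm 1) estimate of I(X; Y) in nats for two 1-D samples."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    n = x.size
    dx = np.abs(x[:, None] - x[None, :])  # pairwise distances in X
    dy = np.abs(y[:, None] - y[None, :])  # pairwise distances in Y
    dz = np.maximum(dx, dy)               # max-norm distance in the joint space
    np.fill_diagonal(dz, np.inf)          # a point is not its own neighbor
    eps = np.partition(dz, k - 1, axis=1)[:, k - 1]  # distance to the k-th neighbor
    # Count marginal neighbors strictly inside eps, excluding the point itself.
    nx = np.maximum(np.sum(dx < eps[:, None], axis=1) - 1, 0)
    ny = np.maximum(np.sum(dy < eps[:, None], axis=1) - 1, 0)
    # Digamma at positive integers: psi(j) = H_{j-1} - Euler-Mascheroni constant.
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, n + 1))))
    def psi(j):
        return harmonic[j - 1] - np.euler_gamma
    return float(psi(k) + psi(n) - np.mean(psi(nx + 1) + psi(ny + 1)))

# Toy check (assumption): correlated Gaussians with rho = 0.9 have
# true MI = -0.5 * log(1 - 0.81), about 0.83 nats.
rng = np.random.default_rng(0)
a = rng.standard_normal(1000)
b = 0.9 * a + np.sqrt(1 - 0.81) * rng.standard_normal(1000)
c = rng.standard_normal(1000)
mi_dep = ksg_mi(a, b)  # dependent pair
mi_ind = ksg_mi(a, c)  # independent pair
```

In a sliced setting, such a scalar estimator would be applied to each pair of 1‑D projections and the results averaged over the m projections.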

Results

The Sliced Information Plane plots S_1(T; Y) against S_1(X; T) for each layer across epochs, showing how representations evolve in terms of input retention and label relevance.

Figure: Sliced Information Plane for h32 (32 hidden units) and h64 (64 hidden units).

Training Curves. Loss and accuracy over epochs for train/test splits.

Figure: Training Curves for h32 (32 hidden units) and h64 (64 hidden units).

My contribution: Redundancy Comparison (h32 vs h64). Estimated redundancy dynamics, side by side.

Figure: PID Redundancy Comparison, h32 vs h64.

How To Run

Setup

uv sync
uv add --dev --editable .

Train and save activation snapshots

bash experiments/mnist_mlp/run_training.sh h64

Compute 1‑SMI metrics

bash experiments/mnist_mlp/run_analysis.sh h64

Compute redundancy metric

bash experiments/mnist_mlp/run_pid_analysis.sh h64

Generate plots

bash experiments/mnist_mlp/generate_plots.sh h64

Variants:

  • h64 matches the paper (64 units per hidden layer).
  • h32 is the reduced‑capacity variant (32 units).

Outputs are written to experiments/mnist_mlp/runs/<variant>/ (configs, checkpoints, activations, analysis metrics, and plots).

Repo Structure

  • experiments/mnist_mlp/run.py: loads configs and launches training.
  • experiments/mnist_mlp/run_analysis.py: computes 1‑SMI from saved activations.
  • experiments/mnist_mlp/generate_plots.py: renders training curves and SIP plots.
  • src/smi_training_dynamics/neural_networks/: MLP and training loop.
  • src/smi_training_dynamics/measures/: MI, SMI, k‑SMI estimators (implementation based on Wongso et al., 2023).
  • src/smi_training_dynamics/visualizations/: plotting utilities.

About

Reimplementation of the Sliced Information Plane (SIP) framework from Wongso, Ghosh, and Motani (2025) for analyzing deep neural network training dynamics. The repo uses Sliced Mutual Information (SMI) to obtain scalable, finite dependence estimates in high‑dimensional, deterministic settings, and applies them to MNIST MLP experiments.
