Diffusion Neuron

Conditional next-frame video prediction using live human brain cells (CL1, Cortical Labs) as a biological bottleneck. Frames are encoded into electrical stimulation, sent to real neurons over UDP, and the spike responses are decoded into a predicted next frame.

Why this problem is hard
Architecture
Same Frame Encoding (SFE)
Setup
Running
Training metrics
Ablations and tuning
Files

Why this problem is hard

In standard RL, the policy selects from a discrete or low-dimensional action space and the reward signal is relatively unambiguous. Here the "action" is a 64x64 RGB image (64 x 64 x 3 = 12,288 continuous values) and the reward (SSIM) is a weak pixel-level similarity score that happily rewards predicting a blurry mean frame.

The bottleneck is non-differentiable. Gradients cannot flow through the biological cells. The encoder is trained with REINFORCE, but it is not trying to elicit a fixed response. The cells are biological and their outputs change over time as they adapt to stimulation. The encoder's role is closer to a catalyst: continuously adapting its inputs to the cells' current state, providing structured stimulation that gives the cells useful signal to respond to. The cells and encoder co-adapt, but with no direct gradient signal the encoder can only guide this through reward-shaped trial and error, which is slow and high-variance.

The spike signal is low-dimensional and noisy. 177 spike counts (59 channels x 3 rounds) must carry enough information to reconstruct a 64x64 frame. Because the cells are biological, the same stimulation does not produce identical spikes each time. To the decoder, this noise is indistinguishable from meaningful variation, which makes a stable mapping hard to learn.

The decoder can cheat. MSE loss rewards predicting the mean of the training distribution. A decoder with enough capacity can achieve reasonable SSIM purely by memorizing per-class average frames, completely ignoring the spike input. The noise ablation (ablation_noise.py) quantifies exactly how much SSIM is available for free this way. Real CL1 training must beat that ceiling to prove the spikes are contributing.

The reward is delayed and indirect. Feedback to the cells (positive/negative stim channels) is based on SSIM from the previous step. The cells are not being told what to do; they are being shaped over many steps toward responses that correlate with better predictions, which is a much weaker signal than supervised learning.

The result is that progress is slow, noisy, and difficult to interpret. Small SSIM improvements may reflect the decoder overfitting rather than the cells learning. The ablation workflow exists to separate these two effects.

Architecture

frame_t + class
      |
      v
 StimEncoder     CNN + REINFORCE policy, SFE (3 rounds: full / spatial / color)
      |
      |  freqs + amps (64 channels, UDP)
      v
  CL1 Cells      Real neurons
      |
      |  spike counts (59 active channels x 3 rounds = 177 values)
      v
 SpikeDecoder    Spatial spike grid -> Conv2d -> 3x bilinear upsample
      |
      v
 predicted frame_t+1  (64x64 RGB)

Encoder

CNN that outputs stimulation parameters (frequency + amplitude) per active channel. Non-differentiable through the biological bottleneck, so the encoder is trained with REINFORCE using SSIM as the reward. Each frame is encoded using SFE (see below).
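A minimal sketch of the REINFORCE arithmetic for one stimulation parameter, assuming a Gaussian policy parameterized by (mu, sigma). The function names and the example numbers (a mean of 20.0, a reward of 0.55) are illustrative assumptions, not values taken from this repository.

```python
import math
import random

def gaussian_log_prob(x, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at x."""
    return (-0.5 * ((x - mu) / sigma) ** 2
            - math.log(sigma)
            - 0.5 * math.log(2 * math.pi))

def sample_action(mu, sigma, rng):
    """Draw one stimulation parameter from the policy, with its log-probability."""
    x = rng.gauss(mu, sigma)
    return x, gaussian_log_prob(x, mu, sigma)

def reinforce_loss(log_prob, reward, baseline):
    """Score-function loss: -log_prob * advantage, with advantage = reward - baseline."""
    advantage = reward - baseline
    return -log_prob * advantage

rng = random.Random(0)
# Hypothetical policy output for one channel's frequency.
freq, log_prob = sample_action(mu=20.0, sigma=2.0, rng=rng)
# SSIM of the decoded frame acts as the reward; the baseline reduces variance.
loss = reinforce_loss(log_prob, reward=0.55, baseline=0.50)
```

In a real training loop an autograd framework would differentiate this loss with respect to mu and sigma. The pure-Python version only shows how the reward enters the update without any gradient passing through the cells.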

Decoder

The 177 spike values (59 active channels x 3 rounds) are arranged into a (3, 8, 8) spatial grid with one 8x8 plane per round, processed through Conv2d layers, then upsampled 8->16->32->64 via bilinear interpolation. GroupNorm is used throughout to avoid grey collapse at batch size 1. There is no class conditioning; output is driven purely by spike patterns.
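One way 177 values could fill a (3, 8, 8) grid is sketched below. It assumes each 8x8 plane is indexed row-major by physical channel number and that the five dead channel positions are left at zero. Both assumptions, and the helper names, are illustrative; models/decoder.py is the authority on the actual mapping.

```python
DEAD_CHANNELS = {0, 4, 7, 56, 63}   # dead channels from the channel layout
GRID = 8                            # 8 x 8 = 64 physical channel positions
ACTIVE_PER_ROUND = GRID * GRID - len(DEAD_CHANNELS)   # 59

def spikes_to_plane(round_counts):
    """Place 59 active-channel counts into an 8x8 plane; dead slots stay 0."""
    if len(round_counts) != ACTIVE_PER_ROUND:
        raise ValueError("expected 59 spike counts per round")
    plane = [[0.0] * GRID for _ in range(GRID)]
    values = iter(round_counts)
    for ch in range(GRID * GRID):
        if ch in DEAD_CHANNELS:
            continue
        plane[ch // GRID][ch % GRID] = float(next(values))
    return plane

def build_decoder_input(spikes, rounds=3):
    """Split the concatenated rounds and stack them as a (3, 8, 8) nested list."""
    if len(spikes) != rounds * ACTIVE_PER_ROUND:
        raise ValueError("expected 177 spike counts")
    return [
        spikes_to_plane(spikes[r * ACTIVE_PER_ROUND:(r + 1) * ACTIVE_PER_ROUND])
        for r in range(rounds)
    ]

grid = build_decoder_input(list(range(1, 178)))
```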

Feedback

After each step, feedback is packed into the next stim message on dedicated channels:

SSIM >= 0.6: synchronous burst on 8 positive channels
SSIM <= 0.4: chaotic noise on 8 negative channels
Otherwise: minimum safe stimulus
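The thresholds above reduce to a three-way decision. The sketch below shows only that decision; the function name `feedback_mode` and the string labels are hypothetical, and the actual stimulation arrays are built by `compute_feedback()` in feedback.py.

```python
POSITIVE_THRESHOLD = 0.6   # SSIM at or above this triggers the positive burst
NEGATIVE_THRESHOLD = 0.4   # SSIM at or below this triggers the negative noise

def feedback_mode(ssim):
    """Map the previous step's SSIM to one of the three feedback regimes."""
    if ssim >= POSITIVE_THRESHOLD:
        return "positive"   # synchronous burst on the 8 positive channels
    if ssim <= NEGATIVE_THRESHOLD:
        return "negative"   # chaotic noise on the 8 negative channels
    return "neutral"        # minimum safe stimulus

modes = [feedback_mode(s) for s in (0.72, 0.50, 0.31)]
```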

Channel Layout

| Channels | Role |
|---|---|
| `[0, 4, 7, 56, 63]` | Dead (hardware) |
| Active[0:42] | Encoder (SFE full / spatial / color) |
| Active[42:50] | Positive feedback |
| Active[50:58] | Negative feedback |
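The layout can be reproduced in a few lines, assuming the active list is simply the 64 physical channels in ascending order with the dead ones removed. That ordering and the variable names are assumptions; `build_stim_arrays()` in cl1/channel_map.py defines the real mapping.

```python
TOTAL_CHANNELS = 64
DEAD_CHANNELS = [0, 4, 7, 56, 63]

# Active channels: every physical channel that is not dead (59 in total).
active = [ch for ch in range(TOTAL_CHANNELS) if ch not in DEAD_CHANNELS]

encoder_channels = active[0:42]     # driven by the encoder (SFE)
positive_channels = active[42:50]   # positive feedback
negative_channels = active[50:58]   # negative feedback
unassigned = active[58:]            # left over after the three slices
```

The three slices cover 58 of the 59 active channels, so under this reading one active channel has no assigned role.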

Same Frame Encoding (SFE)

SFE is the stimulation strategy used by the encoder. Rather than sending a single stimulation pattern per frame, the same frame is encoded three times in succession with different visual preprocessing, each round driving a dedicated set of physical channels:

| Round | Mode | Preprocessing | Channels |
|---|---|---|---|
| 1 | `full` | Raw RGB | 42 |
| 2 | `spatial` | Sobel edge magnitude (grayscale) | 21 |
| 3 | `color` | YUV chroma (U + V channels) | 21 |

Each round has its own policy head in the encoder, so the stim parameters are learned independently per mode. The decoder receives all three rounds concatenated (177 values total) and treats each round as a separate spatial channel in the (3, 8, 8) grid.

The motivation is that different preprocessing may drive different cell populations or elicit different temporal response patterns, giving the decoder richer and more structured spike information than a single round would provide. This is loosely analogous to how the visual cortex processes the same scene through parallel pathways (edges, color, motion) simultaneously.
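The three preprocessing modes can be sketched in plain Python on nested lists. The BT.601 luma and chroma coefficients, the zeroed border in the Sobel pass, and the function names are assumptions made for illustration; the repository's encoder may use different conventions.

```python
import math

def to_gray(rgb):
    """Luma (BT.601 weights) for an H x W image of (r, g, b) tuples in [0, 1]."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]

def sobel_magnitude(gray):
    """Sobel gradient magnitude; border pixels are left at 0 for simplicity."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            out[y][x] = math.hypot(gx, gy)
    return out

def yuv_chroma(rgb):
    """Per-pixel (U, V) chroma using BT.601 coefficients."""
    return [[(-0.14713 * r - 0.28886 * g + 0.436 * b,
              0.615 * r - 0.51499 * g - 0.10001 * b) for (r, g, b) in row]
            for row in rgb]

def preprocess(rgb, mode):
    """Return the view of the frame used for one SFE round."""
    if mode == "full":
        return rgb
    if mode == "spatial":
        return sobel_magnitude(to_gray(rgb))
    if mode == "color":
        return yuv_chroma(rgb)
    raise ValueError(f"unknown SFE mode: {mode}")

# 4x4 test frame: left half black, right half white (a vertical edge).
black, white = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
frame = [[black, black, white, white] for _ in range(4)]
views = {mode: preprocess(frame, mode) for mode in ("full", "spatial", "color")}
```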

Whether SFE produces meaningfully richer spike patterns than single-round encoding is an open empirical question that likely varies by task. A direct comparison (setting STIM_ROUNDS = 1 and retraining) against the noise ablation baseline would determine whether the additional rounds are contributing signal.

Setup

pip install -r requirements.txt

Python 3.10+, PyTorch 2.0+. CUDA optional.

Dataset

Run the download script to fetch and extract UCF101 automatically:

python download_ucf101.py

This downloads UCF101.rar (~6.5 GB), extracts the classes listed in UCF101_CLASSES, and deletes the RAR afterwards. To keep the RAR or extract different classes:

python download_ucf101.py --keep-rar --classes IceDancing Biking Basketball

On Windows, UnRAR must be on your PATH. Download from https://www.rarlab.com/rar_add.htm.

Active classes are set in config.py via UCF101_CLASSES.

Running

Hardware

On the CL1 device:

python cl1_neural_interface.py

On the training machine:

python train.py --cl1-host cl1-2544-122.corticalcloud

Training resumes from checkpoints/latest.pt automatically if it exists.

Inference

Output goes to inference_out/ by default. Override with --out.

# From dataset
python infer.py --class IceDancing --frames 30 --cl1-host cl1-2544-122.corticalcloud

# From a custom image
python infer.py --class Biking --image my_frame.png --cl1-host cl1-2544-122.corticalcloud

# Test set
python infer.py --test-set --cl1-host cl1-2544-122.corticalcloud

# Training set (capped)
python infer.py --train-set --class IceDancing --max-samples 30 --cl1-host cl1-2544-122.corticalcloud

Training metrics

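| Metric | Meaning |
|---|---|
| `ssim` | Structural similarity between predicted and actual next frame. Higher is better. |
| `mse` | Pixel-level mean squared error. Lower is better. |
| `enc_loss` | REINFORCE loss (-log_prob x advantage), where advantage = ssim - baseline. Its sign depends on both the advantage and the sign of log_prob, so read it together with `ssim` and `baseline` rather than as a direct indicator of beating the baseline. |
| `baseline` | Exponential moving average of SSIM (decay 0.99). Tracks rolling average reward. |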
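The baseline update is a standard exponential moving average. A minimal sketch follows, assuming the baseline starts at 0.0 and that the advantage is computed before the baseline is updated; both choices, and the function name, are assumptions rather than details confirmed by train.py.

```python
DECAY = 0.99   # decay stated for the baseline metric

def update_baseline(baseline, ssim, decay=DECAY):
    """One EMA step: keep 99% of the old baseline, blend in 1% of the new SSIM."""
    return decay * baseline + (1.0 - decay) * ssim

baseline = 0.0                 # assumed starting value
advantages = []
for ssim in [0.5] * 1000:      # a constant reward, to show convergence
    advantages.append(ssim - baseline)   # advantage uses the pre-update baseline
    baseline = update_baseline(baseline, ssim)
```

With a decay of 0.99 the baseline needs a few hundred steps to catch up with a change in average SSIM, so early advantages are dominated by the starting value.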

Ablations and tuning

Random noise ablation

ablation_noise.py trains the decoder with random noise in place of real CL1 spikes. This establishes a baseline: the SSIM a decoder can achieve purely by learning mean-frame statistics, with no biological signal.

python ablation_noise.py

Use --ablation in infer.py to run inference with the ablation checkpoint:

python infer.py --class IceDancing --frames 30 --ablation

If real CL1 training SSIM does not consistently exceed the noise ablation SSIM, the decoder is ignoring the spikes and learning from dataset statistics alone.

Decoder capacity

The decoder's effective capacity (set by LR_DECODER and the channel widths in models/decoder.py) controls how much the decoder can learn independently of the spike signal.

The core tradeoff:

Too large: the decoder has enough capacity to memorize mean frames per class, making spike content irrelevant. It will match or exceed the noise ablation SSIM without needing real spikes.
Too small: the decoder cannot express the spike-to-frame mapping even when spikes carry real information. Outputs will be blurry regardless of spike quality.

Tuning guide:

Run ablation_noise.py to get a noise baseline SSIM
Train with real CL1 spikes and compare SSIM curves
If the curves are similar, reduce decoder capacity (LR_DECODER, channel counts) and retrain
If real training SSIM drops significantly below the noise baseline, the decoder is too small; increase capacity
The target is the smallest decoder where real CL1 training measurably outperforms random noise
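The comparison in the tuning guide can be written as a small decision helper. The function name, the use of a simple mean over logged SSIM values, and the `tolerance` of 0.01 are all hypothetical; what counts as "similar" or "significantly below" should be chosen from the observed run-to-run variance.

```python
from statistics import mean

def capacity_advice(real_ssim, noise_ssim, tolerance=0.01):
    """Compare logged SSIM from real CL1 training against the noise ablation."""
    gap = mean(real_ssim) - mean(noise_ssim)
    if gap > tolerance:
        return "keep"       # real spikes measurably outperform random noise
    if gap < -tolerance:
        return "increase"   # real training falls below the noise baseline
    return "reduce"         # curves are similar: decoder may be ignoring spikes

advice = capacity_advice(real_ssim=[0.50, 0.52], noise_ssim=[0.50, 0.51])
```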

Current settings (tuned against noise ablation):

LR_DECODER = 1e-5 (10x lower than encoder)
Channel progression: 3->32->64->32->16->8->3

Class conditioning

The decoder does not receive the class label. This prevents it from learning per-class mean frames, which would let it produce plausible-looking outputs with no spike information. If outputs collapse to grey or become unrecognisable, re-adding class conditioning (class_emb + class_proj in models/decoder.py) gives the decoder a prior to work from while the encoder learns.

Files

Entry points

| File | Description |
|---|---|
| `train.py` | Main training loop. Encoder -> SFE stim -> spikes -> decoder -> REINFORCE + MSE update. Checkpoints every 500 steps. |
| `infer.py` | Inference. Supports single frame, multi-frame, custom image, and full train/test set sweeps. Outputs side-by-side GIF (input / predicted) to `inference_out/`. Supports `--ablation` mode. |
| `ablation_noise.py` | Trains decoder with random noise instead of CL1 spikes. Establishes SSIM baseline for comparison against real training. |
| `download_ucf101.py` | Downloads UCF101.rar (~6.5 GB) and extracts the configured classes to `data/train/`. |
| `cl1_neural_interface.py` | Runs on the CL1 device. Receives stim packets, runs `create_stim_plan()`, collects spikes, replies. |

Models

| File | Description |
|---|---|
| `models/encoder.py` | `StimEncoder` - CNN backbone with three SFE policy heads. Outputs (mu, sigma) per channel for frequency and amplitude. |
| `models/decoder.py` | `SpikeDecoder` - Spatial spike grid -> Conv2d -> bilinear upsample to 64x64. No class conditioning. |

Protocol layer

| File | Description |
|---|---|
| `cl1/protocol.py` | UDP packet pack/unpack. STIM = 520 bytes, SPIKE = 264 bytes. Channels with amp=0 skip validation. |
| `cl1/interface.py` | `CL1Interface` - async UDP socket pair. Handles artifact rejection window and spike receive with timeout. |
| `cl1/channel_map.py` | `build_stim_arrays()` - maps encoder and feedback params into full 64-channel arrays. |
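The packet sizes of 520 and 264 bytes come from the table above. The field layout is not documented here, so the sketch below shows only one layout that happens to match those sizes: an 8-byte header followed by little-endian float32 arrays of 64 channels. The header meaning (a timestamp), the byte order, and the function names are assumptions; cl1/protocol.py defines the real format.

```python
import struct

NUM_CHANNELS = 64
# Assumed layouts: uint64 header, then float32 per channel.
STIM_FORMAT = "<Q64f64f"    # header + 64 frequencies + 64 amplitudes
SPIKE_FORMAT = "<Q64f"      # header + 64 spike counts

def pack_stim(timestamp_us, freqs, amps):
    """Serialize one stimulation message under the assumed layout."""
    if len(freqs) != NUM_CHANNELS or len(amps) != NUM_CHANNELS:
        raise ValueError("expected 64 frequencies and 64 amplitudes")
    return struct.pack(STIM_FORMAT, timestamp_us, *freqs, *amps)

def unpack_spikes(packet):
    """Parse one spike message into (timestamp, list of 64 counts)."""
    fields = struct.unpack(SPIKE_FORMAT, packet)
    return fields[0], list(fields[1:])

stim_packet = pack_stim(1, [10.0] * NUM_CHANNELS, [1.0] * NUM_CHANNELS)
spike_packet = struct.pack(SPIKE_FORMAT, 2, *([3.0] * NUM_CHANNELS))
timestamp, counts = unpack_spikes(spike_packet)
```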

Data

| File | Description |
|---|---|
| `data/ucf101_subset.py` | UCF101 loader. Returns `(frame_t, frame_t+1, class_idx, is_new_clip)`. 80/20 train/test split. Clips are shuffled but intra-clip frame order is preserved so inter-clip pauses fire correctly. |
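The ordering rule (shuffle clips, keep frames in order within each clip) can be sketched as a generator. The function name, the `(class_idx, frames)` clip representation, and the fixed seed are hypothetical; the real loader reads frames from disk and also handles the train/test split.

```python
import random

def iter_frame_pairs(clips, seed=0):
    """Yield (frame_t, frame_t1, class_idx, is_new_clip) tuples.

    clips: list of (class_idx, [frame0, frame1, ...]) entries.
    Clip order is shuffled; frame order inside each clip is preserved.
    """
    order = list(range(len(clips)))
    random.Random(seed).shuffle(order)
    for clip_index in order:
        class_idx, frames = clips[clip_index]
        for t in range(len(frames) - 1):
            # The first pair of each clip is flagged so callers can pause between clips.
            yield frames[t], frames[t + 1], class_idx, t == 0

clips = [(0, ["a0", "a1", "a2"]), (1, ["b0", "b1"])]
samples = list(iter_frame_pairs(clips))
```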

Utilities

| File | Description |
|---|---|
| `config.py` | All constants. |
| `feedback.py` | `compute_feedback()` - positive/negative stim arrays from SSIM value. |
| `utils/ssim.py` | Differentiable SSIM used as the REINFORCE reward. |
| `utils/session.py` | `SessionManager` - enforces the CL1 rest cycle (2.5hr train / 1hr rest). |
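The rest cycle amounts to a repeating 3.5-hour schedule. A minimal sketch, assuming the cycle starts with training and repeats on a fixed clock; the function name is hypothetical and `SessionManager` may track elapsed time differently.

```python
TRAIN_SECONDS = int(2.5 * 3600)   # 2.5 hours of training
REST_SECONDS = 1 * 3600           # 1 hour of rest
CYCLE_SECONDS = TRAIN_SECONDS + REST_SECONDS

def session_phase(elapsed_seconds):
    """Return 'train' or 'rest' for a given time since the session started."""
    position = elapsed_seconds % CYCLE_SECONDS
    return "train" if position < TRAIN_SECONDS else "rest"

phases = [session_phase(t) for t in (0, 8999, 9000, 12599, 12600)]
```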
