prism_training

Example training scripts and models for the PRISM paper

Overview

E11 Bio recently released a new technology, PRISM (Protein Reconstruction and Identification through Multiplexing): a platform that combines viral barcoding, expansion microscopy, and iterative immunolabeling for large-scale neuronal reconstruction. Neurons are labeled with combinatorial “protein bits” that act as barcodes to distinguish individual cells and support error correction during reconstruction.

Read more about the approach in the paper and the accompanying blog post.

This is a simple tutorial for downloading and running the models used for neuron segmentation and synapse detection. It is currently fairly minimal, but will be extended and improved in the coming weeks. Additionally, all experimental code (including post-processing and analysis) will be released in a separate repository.

We uploaded the data to a publicly accessible S3 bucket via the AWS Open Data program. More details on the bucket contents can be found in this repository.

The reconstruction pipeline consists of five models:

  1. barcode signal enhancement
  2. affinities + LSDs
  3. uniform embedding
  4. barcode expression
  5. synapse detection

This tutorial uses several different libraries for training, prediction, and visualization, including:

  1. gunpowder
  2. dacapo
  3. volara
  4. neuroglancer
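
For a flavor of the gunpowder API that the training scripts build on, here is a minimal pipeline sketch that randomly samples patches from a zarr array. The container path and dataset name below are placeholders, not the actual ones used by the scripts:

    import gunpowder as gp

    raw = gp.ArrayKey("RAW")

    # source node reading from a zarr container (path/dataset are placeholders)
    source = gp.ZarrSource(
        "example_data.zarr",
        datasets={raw: "raw"},
        array_specs={raw: gp.ArraySpec(interpolatable=True)},
    )

    # nodes are chained with "+"; RandomLocation samples random patch locations
    pipeline = source + gp.RandomLocation()

    # request a patch; the region of interest is specified in world units
    request = gp.BatchRequest()
    request[raw] = gp.ArraySpec(roi=gp.Roi((0, 0, 0), (64, 64, 64)))

    with gp.build(pipeline):
        batch = pipeline.request_batch(request)
        print(batch[raw].data.shape)

The individual train.py scripts chain many more nodes (augmentation, target generation, batching) in the same way.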

Getting started

Pre-requisites

We highly recommend using a package manager; conda, virtualenv, and uv are all good options. These instructions assume uv. Here are the installation instructions.

Tested on Ubuntu 22.04 with an A6000 GPU. Basic Python and ML knowledge is assumed. For useful tutorials on affinity/LSD models, see this repo.


  1. clone this repo:

    git clone https://github.com/e11bio/prism_training.git
    cd prism_training/prism_training
    
  2. download and consolidate example data (see the sketch after this list to sanity-check the download)

    cd data  # from script directory
    uv run download_data.py
    uv run consolidate_data.py
    cd ../  # revert to script directory (optional)
    
  3. predict enhanced data (takes about 10 minutes on an NVIDIA RTX 6000 GPU)

    cd train/enhanced  # from script directory
    uv run predict.py
    cd ../../  # revert to script directory (optional)
    cp -r prism_training/data/instance/example_data.zarr/enhanced prism_training/data/semantic/example_data.zarr/enhanced
    
  4. run any of the other models via uv run train.py. Some models take arguments; please read the individual READMEs.
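
Once step 2 has run, you can quickly inspect the downloaded container with zarr-python. The path below assumes the default layout produced by the download scripts; adjust it if your layout differs:

    import zarr

    # open the consolidated example container read-only
    root = zarr.open("prism_training/data/instance/example_data.zarr", mode="r")

    # print the hierarchy of groups/arrays with shapes and dtypes
    print(root.tree())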

Enhancement

  • Example training from scratch for 10 iters: python train.py -i 10

Since, by default, we compute the difference between the average barcodes and the raw data as our target signal, a batch might look like:

You might have to tweak the shader a bit to see the target, since it can contain negative values. The black pixels around the object denote the sparsely masked label used for training (pixels outside of this label do not contribute to the loss). There is no need to visualize the predictions yet; since this is training from scratch, they will be uninformative.
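
To make the residual target concrete, here is a minimal numpy sketch of the relationship between the raw data, the average barcodes, and the enhanced output. Array names and shapes are illustrative, not the actual pipeline variables:

    import numpy as np

    # illustrative (channels, z, y, x) volumes
    raw = np.random.rand(4, 32, 32, 32).astype(np.float32)
    avg_barcodes = np.random.rand(4, 32, 32, 32).astype(np.float32)

    # default training target: residual between average barcodes and raw data
    target = avg_barcodes - raw  # can contain negative values

    # at inference, adding the predicted residual back to the raw data
    # recovers the enhanced (average barcode) signal
    predicted_residual = target  # stand-in for a model prediction
    enhanced = raw + predicted_residual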

  • Example training from scratch, learning the average barcodes directly rather than the residuals: python train.py -d false

A batch might then look like:

This is a bit more visually intuitive.

  • Example training from the downloaded checkpoint: python train.py -c model

Now we can visualize the predictions (the residual barcode) as well as the predicted average barcodes (obtained by simply adding the residual back to the raw data). A batch might then look like:

If we then run inference (i.e. python predict.py) and visualize the raw vs. enhanced data, we might see something like:

This uses a fancier custom shader in which each channel is first percentile-normalized.
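
The per-channel percentile normalization that the shader performs can be sketched in numpy as follows (the percentile bounds are illustrative):

    import numpy as np

    def percentile_normalize(volume, low=1.0, high=99.0):
        """Rescale each channel of a (C, Z, Y, X) volume to [0, 1]
        between its own low/high percentiles."""
        out = np.empty(volume.shape, dtype=np.float32)
        for c in range(volume.shape[0]):
            lo, hi = np.percentile(volume[c], [low, high])
            out[c] = np.clip((volume[c] - lo) / max(hi - lo, 1e-8), 0.0, 1.0)
        return out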

Affs/LSDs

Example training from the downloaded model using raw input: python train.py -d raw -c model

A batch might look like:

The predictions are somewhat noisy since the raw data is used as input.

Assuming we ran the enhancement inference above, example training using enhanced input: python train.py -d enhanced -c model

This might give us something cleaner, like:
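
For intuition on what the affinities target encodes, here is a minimal numpy sketch that derives nearest-neighbor affinities from a label volume. This is illustrative only, not the exact implementation used by the training scripts:

    import numpy as np

    def labels_to_affinities(labels, offsets=((1, 0, 0), (0, 1, 0), (0, 0, 1))):
        """Affinity is 1 where a voxel and its offset neighbor share a
        (non-background) label, else 0."""
        affs = np.zeros((len(offsets),) + labels.shape, dtype=np.float32)
        for i, offset in enumerate(offsets):
            src = tuple(slice(None, -o if o else None) for o in offset)
            dst = tuple(slice(o if o else None, None) for o in offset)
            same = (labels[src] == labels[dst]) & (labels[src] > 0)
            affs[(i,) + src] = same
        return affs

The LSDs (local shape descriptors) are a complementary auxiliary target; see the linked tutorial repo for details.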

Uniform embedding

Example training with uv run train.py. If the model weights have been downloaded and are available, they will be used; otherwise training will start from scratch. By default we only train a single iteration for illustration purposes, but feel free to increase the NUM_ITERATIONS variable as high as you like.

This is an example of what the first batch may look like when starting from the provided checkpoint:

Note that the purpose of the uniform embedding is to encode the barcodes in a space where computing the distance between two pixels is easy. The benefit of this embedding is largely hidden when visualizing the initial raw data with a PCA projection, since PCA is also a method for extracting the basis vectors of maximum variation and then displaying them. Without any appropriate normalization, the raw data can be very hard to see.
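
The PCA projection used for display can be sketched as follows, a minimal illustration assuming a channels-first volume. In the learned embedding space, the distance between two pixels is then just the norm of the difference of their embedding vectors:

    import numpy as np
    from sklearn.decomposition import PCA

    def pca_to_rgb(volume):
        """Project a (C, Z, Y, X) volume onto its first three principal
        components and rescale to [0, 1] for RGB display."""
        c = volume.shape[0]
        flat = volume.reshape(c, -1).T                  # (n_voxels, C)
        proj = PCA(n_components=3).fit_transform(flat)  # (n_voxels, 3)
        proj -= proj.min(axis=0)
        proj /= proj.max(axis=0) + 1e-8
        return proj.T.reshape((3,) + volume.shape[1:])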

Barcode expression

Example training with uv run train.py. This uses the same setup as the uniform model: it will start from the provided model if it is available, and otherwise train from scratch.

Here is an example of the first batch assuming the checkpoint is available.
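
As a rough illustration of what the expression predictions can be used for, the per-channel outputs can be binarized into a per-voxel bit code. The 0.5 threshold and the bit-packing below are assumptions for illustration, not the pipeline's actual post-processing:

    import numpy as np

    def expression_to_codes(pred, threshold=0.5):
        """Binarize per-channel expression probabilities (C, Z, Y, X)
        and pack each voxel's bits into a single integer code."""
        bits = pred > threshold
        weights = (1 << np.arange(bits.shape[0]))[:, None, None, None]
        return (bits * weights).sum(axis=0)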

Synapses

Example training from the downloaded model: python train.py -c model

A batch might look like:
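
As a rough sketch of how a predicted synapse probability map could be turned into point detections, one can threshold it and take centroids of the connected components with scipy. The threshold and size filter here are illustrative, and the actual post-processing will live in the separate experimental-code repository:

    import numpy as np
    from scipy import ndimage

    def detect_synapses(pred, threshold=0.5, min_size=10):
        """Threshold a probability map and return centroids of
        sufficiently large connected components."""
        labeled, n = ndimage.label(pred > threshold)
        sizes = ndimage.sum(np.ones_like(pred), labeled, index=np.arange(1, n + 1))
        keep = [i + 1 for i, s in enumerate(sizes) if s >= min_size]
        return ndimage.center_of_mass(pred, labeled, keep)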
