Example training scripts and models for PRISM paper
E11 Bio recently released a new technology, PRISM (Protein Reconstruction and Identification through Multiplexing): a platform that combines viral barcoding, expansion microscopy, and iterative immunolabeling for large-scale neuronal reconstruction. Neurons were labeled with combinatorial “protein bits” that act as barcodes to distinguish individual cells and support error correction during reconstruction.
Read more about the approach in the paper and the accompanying blog post.
This is a simple tutorial for downloading and running the models used for neuron segmentation and synapse detection. It is currently fairly minimal but will be extended and improved in the coming weeks. All experimental code (including post-processing and analysis) will be released in a separate repository.
We uploaded the data to a publicly accessible S3 bucket via AWS Open Data. More details on the bucket contents can be found in this repository.
The reconstruction pipeline consists of 5 models:
- barcode signal enhancement
- affinities + LSDs
- uniform embedding
- barcode expression
- synapse detection
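For readers new to affinity-based segmentation: an affinity model predicts, for each voxel and each neighbor offset, whether the voxel and its neighbor belong to the same object. The sketch below computes nearest-neighbor affinities from a ground-truth label volume with NumPy. The function name `compute_affinities`, the three default offsets, and the convention that label 0 is background are assumptions for illustration, not this repository's implementation.

```python
import numpy as np


def compute_affinities(labels, offsets=((-1, 0, 0), (0, -1, 0), (0, 0, -1))):
    """Hypothetical helper: nearest-neighbor affinities from a (z, y, x) label volume.

    Returns a uint8 array of shape (len(offsets), z, y, x). A value is 1 where a
    voxel and its neighbor at `offset` share the same non-zero label (0 is assumed
    to be background); neighbors outside the volume yield 0.
    """
    affs = np.zeros((len(offsets),) + labels.shape, dtype=np.uint8)
    for i, offset in enumerate(offsets):
        # neighbor[v] = labels[v + offset]; zero-filled where v + offset is out of bounds
        neighbor = np.zeros_like(labels)
        dst = tuple(slice(max(-o, 0), n - max(o, 0)) for o, n in zip(offset, labels.shape))
        src = tuple(slice(max(o, 0), n - max(-o, 0)) for o, n in zip(offset, labels.shape))
        neighbor[dst] = labels[src]
        # Same object and not background (zero-filled neighbors never match a non-zero label)
        affs[i] = (labels == neighbor) & (labels != 0)
    return affs


labels = np.array([[[1, 1, 2, 2]]])  # shape (1, 1, 4): two objects along x
affs = compute_affinities(labels)
print(affs[2, 0, 0])  # x-affinities: [0 1 0 1]
```

Local shape descriptors (LSDs) are an auxiliary target predicted alongside affinities; they are not shown here.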
This tutorial uses several different libraries for training/predicting/visualizing data including:
We highly recommend using a package manager.
conda, virtualenv, or uv are all good examples. The instructions assume you are using uv. Here are the installation instructions. Tested on Ubuntu 22.04 with an A6000 GPU. Assumes basic Python and ML knowledge. For a useful tutorial on affinity/LSD models, see this repo.
- Clone this repo:
  ```bash
  git clone https://github.com/e11bio/prism_training.git
  cd prism_training/prism_training
  ```
- Download and consolidate the example data:
  ```bash
  cd data  # from script directory
  uv run download_data.py
  uv run consolidate_data.py
  cd ../  # revert to script directory (optional)
  ```
- Predict the enhanced data (takes about 10 minutes on an NVIDIA RTX 6000 GPU):
  ```bash
  cd train/enhanced  # from script directory
  uv run predict.py
  cd ../../  # revert to script directory (optional)
  # from script directory: copy the enhanced predictions next to the semantic example data
  cp -r data/instance/example_data.zarr/enhanced data/semantic/example_data.zarr/enhanced
  ```
- Run any of the other models via uv run train.py. Some models take arguments, so please read the individual READMEs.
- Example training from scratch for 10 iterations:
python train.py -i 10
Since by default we use the difference between the average barcodes and the raw data as the target signal, a batch might look like:
You might have to tweak the shader a bit to see the target, since it can contain negative values. The black pixels around the object denote the sparsely masked label used for training (pixels outside this label do not contribute to the loss). There is no need to visualize the predictions yet: since the model is trained from scratch, they will be uninformative.
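The residual target and sparsely masked loss described above can be sketched as follows. This is a minimal NumPy illustration, assuming `raw` and `avg_barcode` are float arrays of shape (channels, z, y, x) and `mask` is a boolean (z, y, x) array marking the labeled object. The function names and the use of mean squared error are assumptions; train.py may compute its target and loss differently.

```python
import numpy as np


def make_target(raw, avg_barcode, residual=True):
    """Hypothetical helper: build the enhancement target.

    residual=True: difference between average barcode and raw signal (can be negative).
    residual=False: the average barcode itself (analogous to training with -d false).
    """
    return avg_barcode - raw if residual else avg_barcode


def masked_mse(prediction, target, mask):
    """Mean squared error over masked voxels only; voxels outside the label add nothing."""
    sq_err = (prediction - target) ** 2             # (c, z, y, x)
    weights = np.broadcast_to(mask, sq_err.shape)   # repeat the (z, y, x) mask per channel
    n = weights.sum()
    return float((sq_err * weights).sum() / n) if n > 0 else 0.0


def reconstruct_average(raw, predicted_residual):
    """Predicted average barcode = raw data + predicted residual."""
    return raw + predicted_residual


raw = np.zeros((2, 1, 1, 2))
avg = np.ones((2, 1, 1, 2))
target = make_target(raw, avg)          # all ones
mask = np.array([[[True, False]]])      # only the first voxel is labeled
pred = np.zeros_like(target)
pred[..., 1] = 5.0                      # large error, but on an unlabeled voxel
print(masked_mse(pred, target, mask))   # 1.0
```

Masking this way means unlabeled voxels, even with large errors, never influence the gradient.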
- Example training from scratch and learning the direct average barcodes rather than residuals:
python train.py -d false
A batch might then look like:
This is a bit more visually intuitive.
- Example training from the downloaded checkpoint:
python train.py -c model
Now we can visualize the predictions (the residual barcodes) as well as the predicted average barcodes (obtained by simply adding the residual to the raw data). A batch might then look like:
If we then run inference (i.e. python predict.py) and visualize the raw vs. enhanced data, we might see something like:
This uses a fancier custom shader in which each channel is first percentile-normalized.
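To inspect multichannel data outside the viewer, a similar per-channel percentile normalization can be sketched in NumPy. The 1st/99th percentile bounds and the function name are assumptions chosen for illustration; the custom shader may use different bounds.

```python
import numpy as np


def percentile_normalize(data, low=1.0, high=99.0):
    """Hypothetical helper: map each channel of a (channels, ...) array to [0, 1] independently."""
    out = np.empty(data.shape, dtype=np.float32)
    for c in range(data.shape[0]):
        lo, hi = np.percentile(data[c], [low, high])
        scale = hi - lo if hi > lo else 1.0   # avoid dividing by zero on flat channels
        out[c] = np.clip((data[c] - lo) / scale, 0.0, 1.0)
    return out


data = np.stack([np.arange(101, dtype=float), np.full(101, 7.0)])
norm = percentile_normalize(data)
print(norm[0, 50])  # 0.5
```

Normalizing each channel separately keeps a single bright barcode channel from washing out the dimmer ones.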
Example training from the model checkpoint using raw input: python train.py -d raw -c model
A batch might look like:
The predictions are somewhat noisy since the raw data is used as input.
Assuming you ran enhancement inference above, here is an example using the enhanced input: python train.py -d enhanced -c model
This might give something cleaner, like:
Example training with uv run train.py. If the model weights have been downloaded and are available, they will be used; otherwise training will start from scratch.
By default only a single iteration is trained, for illustration purposes, but feel free to increase the NUM_ITERATIONS variable as high as you want.
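The "use downloaded weights if present, otherwise train from scratch" behavior amounts to a simple file check. A minimal sketch, where the helper name and the checkpoint path are hypothetical; the real script may locate and load weights differently.

```python
from pathlib import Path
from typing import Optional

NUM_ITERATIONS = 1  # one iteration for illustration; increase for real training


def resolve_checkpoint(path) -> Optional[Path]:
    """Hypothetical helper: return the checkpoint path if the weights file exists, else None."""
    candidate = Path(path)
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    ckpt = resolve_checkpoint("model")  # hypothetical checkpoint location
    if ckpt is None:
        print("No checkpoint found, training from scratch")
    else:
        print(f"Starting from checkpoint {ckpt}")
```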
This is an example of what the first batch may look like when starting from the provided checkpoint:
Note that the purpose of the uniform embedding is to encode the barcodes in a space where the distance between two pixels is easy to compute. The benefit of this embedding is largely hidden here because the raw data is also visualized with a PCA projection, which likewise extracts the directions of maximum variation and displays them. Without appropriate normalization, the raw data can be very hard to see.
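A PCA visualization like the one mentioned above can be approximated by flattening the voxels, centering each channel, and projecting onto the top three principal components to obtain an RGB-like image. This NumPy sketch is an assumption about how such a projection might be computed, not the viewer's exact implementation; the per-component min-max rescaling to [0, 1] is added here only so the result is displayable.

```python
import numpy as np


def pca_to_rgb(data, n_components=3):
    """Hypothetical helper: project a (channels, ...) array onto its top principal components.

    Returns shape (k, ...) with k = min(n_components, available components),
    each component min-max scaled to [0, 1] for display.
    """
    c = data.shape[0]
    flat = data.reshape(c, -1).T.astype(np.float64)  # (voxels, channels), a copy
    flat -= flat.mean(axis=0)                         # center each channel
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    k = min(n_components, vt.shape[0])
    proj = flat @ vt[:k].T                            # (voxels, k)
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    proj = (proj - lo) / np.where(hi > lo, hi - lo, 1.0)
    return proj.T.reshape((k,) + data.shape[1:])


rng = np.random.default_rng(0)
barcodes = rng.normal(size=(8, 4, 5, 6))  # 8 channels over a small (z, y, x) volume
rgb = pca_to_rgb(barcodes)
print(rgb.shape)  # (3, 4, 5, 6)
```

Because PCA already picks out the dominant directions of variation across channels, a raw-data PCA view can look deceptively similar to a learned embedding.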
Example training with uv run train.py. This uses the same setup as the uniform model: it starts from the provided model if it is available, and otherwise trains from scratch.
Here is an example of the first batch, assuming the checkpoint is available:
Example training from the model checkpoint: python train.py -c model
A batch might look like:








