Edits made by Emily Jane Dennis (ejdennis@princeton) with significant help from Zahra (zmd@princeton). Forked from PrincetonUniversity/BrainPipe by Tom Pisano, Zahra Dhanerawala, and John D'Uva. It also includes modified scripts from ClearMapCluster and lightsheet_helper_scripts written by the same people. ClearMapCluster is T. Pisano's parallelization of C. Kirst's ClearMap software (https://idisco.info/clearmap/) for use on a slurm-based cluster. Written for Python 3.7+. Modifications by Zahra.
Windows 10 install notes added by Adrian Bondy.
Includes three-dimensional CNN with a U-Net architecture (Gornet et al., 2019; K. Lee, Zung, Li, Jain, & Sebastian Seung, 2017) with added packages developed by Kisuk Lee (Massachusetts Institute of Technology), Nick Turner (Princeton University), James Gornet (Columbia University), and Kannan Umadevi Venkatarju (Cold Spring Harbor Laboratories).
This is a work in progress. Emily has largely updated scripts and README, but some documentation is not fully fleshed out.
The CNN code depends on the following packages (set up under tools/conv_net; see the installation steps below):
- DataProvider3
- PyTorchUtils
- Augmentor
- DataTools
- Note that this has currently only been tested on Linux (Ubuntu 16 and 18). SimpleElastix installation has also been tested on Windows 10 by Adrian.
- If on a cluster, elastix needs to be compiled on the cluster. This was challenging for our IT department and will likely be for yours as well. If at Princeton, elastix is already available on spock.
- use anaconda
- when in the main rat_BrainPipe directory, in a terminal type
conda env create -f environment.yaml
If on the cluster and typing which terastitcher can't find terastitcher, try adding the following to your PATH:
export PATH="/usr/people/pnilsadmin/TeraStitcher-Qt4-standalone-1.10.11-Linux/bin:$PATH"
If on a local Ubuntu machine, also install elastix (see the section below), xvfb, and TeraStitcher, and make sure you have all boost libraries installed for DataTools:
$ sudo apt-get install xvfb
$ sudo apt-get install libboost-all-dev
For Windows, follow the instructions on Read the Docs. Use the command line step (step 3) and skip the IDE steps 4 and 5. If the Python wrapping fails, see this known issue.
To install elastix on Linux, follow the instructions in the manual under the "easy way", not the "super easy" way.
If you use the "easy way" but have a modern computer, your gcc version may be too high. In that case you'll need at least ITK 5.0, which means you need to use elastix version 5, not 4.8. The following worked on Ubuntu 18 with two GeForce RTX 2070 SUPERs:
- made two directories: ITK-build and elastix-build
- downloaded ITK 5 and elastix 5 (the ITK Software Guide, InsightSoftwareGuide-Book1-5.1.0.pdf, is the relevant documentation) and extracted the downloaded .tar.gz and .tar.bz2 files into those directories
- in ITK-build, typed cmake ITK-build, then sudo make install
- in elastix-build, cmake elastix-build failed, so I went into the elastix source folder (cd into the elastix folder, nano CMakeLists.txt) and added list( APPEND CMAKE_PREFIX_PATH "/home/emilyjanedennis/Desktop/Apps/ITK-build/" ) to CMakeLists.txt
- note: I had to remove line 76 from elastix-5.0.0/Components/Resamplers/MyStandardResampler/elxMyStandardResampler, which referred to an undefined type called PointType and was throwing an error during the make install process for elastix
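For orientation, here is a minimal consolidated sketch of the out-of-source CMake builds described above. The source/build directory layout, install prefix, and -j value are illustrative assumptions, not the exact commands used:

```bash
# assumed layout (illustrative): sources extracted alongside empty build dirs
#   ~/Apps/ITK-source/       ~/Apps/ITK-build/
#   ~/Apps/elastix-5.0.0/    ~/Apps/elastix-build/

# build and install ITK 5
cd ~/Apps/ITK-build
cmake ../ITK-source
make -j4
sudo make install

# build and install elastix 5, pointing CMake at the ITK build
cd ~/Apps/elastix-build
cmake -DCMAKE_PREFIX_PATH="$HOME/Apps/ITK-build/" ../elastix-5.0.0
make -j4
sudo make install
```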
Download TeraStitcher-installer. Move file to wherever you want Terastitcher to live, cd into that directory, and then:
$ bash TeraStitcher-Qt4-standalone-1.10.18-Linux
- Modify Path in ~/.bashrc:
export PATH="<path/to/software>/TeraStitcher-Qt4-standalone-1.10.18-Linux/bin:$PATH"
- Check to see if successful
$ which terastitcher
Navigate to tools/conv_net
and clone the necessary C++ extension scripts for working with DataProvider3:
$ git clone https://github.com/torms3/DataTools.git
Go to the dataprovider3, DataTools, and augmentor directories in tools/conv_net
and run (for each directory):
$ cd tools/conv_net/dataprovider3
$ python setup.py install
$ cd ../DataTools
$ python setup.py install
$ cd ../augmentor
$ python setup.py install
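Equivalently, the three installs above can be run in one pass from the repository root (a convenience sketch, not a script that ships with the repo):

```bash
# install the C++/Python extension packages used by the CNN code
for pkg in dataprovider3 DataTools augmentor; do
    (cd tools/conv_net/"$pkg" && python setup.py install)
done
```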
- If you are not using Princeton's cluster modules, change the .sh files in the main rat_BrainPipe folder, in /slurm_scripts, and in the ClearMapCluster folder (e.g. the sub_main_tracing.sh file), i.e. lines like:
module load anacondapy/5.3.1
module load elastix/4.8
. activate lightsheet
- Check/change the resource allocations and email alerts at the top of each .sh file based on your cluster and your run_tracing.py settings (an example header sketch follows below)
- Add your paths for BOTH the cluster and your local machinery
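For reference, a hedged example of the kind of slurm header these .sh files carry; the partition, time, memory, and email values below are placeholders to adapt, not the repository's defaults:

```bash
#!/usr/bin/env bash
#SBATCH -p all                   # partition: set to one that exists on your cluster
#SBATCH --time=08:00:00          # wall time per job
#SBATCH --mem=20000              # memory in MB
#SBATCH --mail-type=FAIL         # email alerts
#SBATCH --mail-user=you@yourdomain.edu

module load anacondapy/5.3.1     # adjust module names/versions to your cluster
module load elastix/4.8
. activate lightsheet
```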
1. Make test sets
   - Emily made a document with explicit instructions and examples, which can be found here
2. Find parameters to use
3. Train network
4. Validate and benchmark network
   - Detailed instructions on steps 1-3 can be found here
5. Run inference on whole brains
   - MAKE SURE YOU HAVE CHECKED that the model params in run_chnk_fwd are the same as they were when you trained the network!
   - on spock, make input and patched arrays:
     - in run_tracing.py, edit the inputdirectory and outputdirectory, and check that the atlas and annotation files are correct. MAKE SURE you have already changed directorydeterminer in tools/utils and run step0.
     - check the header of sub_main_tracing_cnn.sh and make sure that for step1 the array job number matches your zplanes. There are 50 jobs/array, so --array=0-12 means 13 sets of 50 (650) zplanes will be run.
     - submit the job to the cluster using sbatch --array=0 sub_main_tracing_cnn.sh
   - when finished, use Globus to transfer files to tigress. You only need to transfer:
     - cnn_param_dict.csv
     - the input_chnks folder
     - the lightsheet folder
     - LogFile.txt
     - output_chnks (an empty folder that will be filled on tigress)
     - param_dict.p
     - NOTE: you probably only need input_chnks and output_chnks, but the others are small and are used in a bunch of places.
   - on tigress, go into ratname/lightsheet and check that tools/conv_net/pytorchutils/run_chunked_fwd.py has the correct paths for tigress. Also check that tools/conv_net/pytorchutils/slurm_scripts/run_chnk_fwd.sh uses the correct model/checkpoint.
   - cd into tools/conv_net/ and run with sbatch --array=0 slurm_scripts/run_chnk_fwd.sh
   - when finished, use Globus to transfer back to spock, or:
     scp -r <username>@tigergpu.princeton.edu:/<folder-location> <username>@spock.princeton.edu:/<folder-location>
   - run cnn_postprocess.sh, which reconstructs and uses connected components to find cell measures
6. Visualize network outputs (documentation in progress)
7. Compare to ClearMap: see the next section, which has steps on how to run ClearMap 1, visualize, and compare outputs.
NOTE: there is a demo for the CNN to test whether your paths are correct and everything is installed properly; see the end of this document.
ClearMap2 is a git submodule. To learn more about submodules, see here.
To make that code 'appear' in the repo, use
git submodule update --init --recursive
To update it:
git submodule update --recursive --remote
The scripts you need are in rat_BrainPipe/cm2.
- if you're using volumes imaged on the LaVision, these instructions should work for you as-is. If you're using smartSPIM volumes, you'll also need to use rename_smartspimZ_for_cm2.py, editing lines 16-22 to reflect your paths as commented.
- edit line 5 in cell_detect.py to wherever your rat_BrainPipe is stored
  - e.g. sys.path.append('/scratch/ejdennis/rat_BrainPipe/ClearMap2')
- edit lines 12-14 of cell_detect.sh as commented there
- follow the example run comments at the end of cell_detect.sh, repeated here for ease of use:
# example run:
# use rename_for_clearmap.py on head node
# change folder name
# from rat_BrainPipe/ClearMap2/ClearMap/Scripts
# sbatch cell_detect.sh
NOTE: there is a new ClearMap! These instructions use the old ClearMap 1.

0. Make test sets
   - If you have already made test sets to train a CNN, use those.
1. Find parameters
   - See ClearMapCluster/parameter_sweep.ipynb
2. Run ClearMap on whole brains
   - ClearMap2 came out in the middle of this process. For us, it worked out of the box with really minor changes and, using the same parameters from the parameter sweep above, gave the same results. The script Emily used to analyze her brains can be found under ClearMapCluster > ClearMap2 > ClearMap > Scripts > CellDetect_custom.py (the complicated file structure just mimics the ClearMap2 repo, which can be downloaded here).
   - NOTE that you'll need to change the ClearMap2 settings file to reflect your local machine.
3. Visualize outputs (documentation in progress)
4. Compare to CNN, if desired (documentation in progress)
   - on training data
   - on whole brains
If there are errors in these steps, usually it's because:
1. the regex needs to be edited
2. elastix isn't installed properly (try which elastix) or is missing from your bashrc
3. terastitcher isn't installed properly (try which terastitcher) or is missing from your bashrc
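If either which check comes up empty, the usual fix is to append the relevant bin directory to PATH in your ~/.bashrc; the install locations below are illustrative, not the actual paths on your machine:

```bash
# add to ~/.bashrc, then open a new terminal or run: source ~/.bashrc
export PATH="/path/to/elastix/bin:$PATH"
export LD_LIBRARY_PATH="/path/to/elastix/lib:$LD_LIBRARY_PATH"   # typically needed for binary elastix releases
export PATH="/path/to/TeraStitcher-Qt4-standalone-1.10.18-Linux/bin:$PATH"
```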
things to do before running

- edit run_tracing.py to point to the appropriate directories and use the correct parameters for your imaging. Pay especially close attention to:
  - systemdirectory
    - if you haven't edited directorydeterminer() appropriately, nothing will work
  - inputdictionary
    - point to the raw image files; they should be in a format like 10-50-16_UltraII_raw_RawDataStack[00 x 00]_C00_xyz-Table Z0606_UltraII Filter0000.ome.tif
    - if the format of your images differs, you'll need to edit the regex expression called in run_tracing (find it in tools/imageprocessing/preprocessing under def regex_determiner)
    - if you have more than one channel imaged in this folder, add channels. Examples:
      - one channel in the folder: [["regch", "00"]]
      - two channels in the folder: [["regch", "00"], ["cellch","01"]]
  - under params, edit:
    - outputdirectory
      - the path where you want things saved. I usually make it the animal's name (e.g. E001) and store it temporarily in scratch (scratch/ejdennis/e001) or more permanently in /LightSheetData/brodyatlas/processed/e001
    - xyz_scale
      - usually either (5,5,10) for data collected with the 1.3x objective or (1.63,1.63,10) if collected with the 4x objective
    - stitchingmethod
      - I keep this as "terastitcher"
    - AtlasFile
      - point to the file you want to register the brain to; usually this will be either the Waxholm MRI atlas /jukebox/LightSheetData/brodyatlas/atlas/for_registration_to_lightsheet/WHS_SD_rat_T2star_v1.01_atlas.tif or our atlas
    - annotationfile
      - the annotation file that describes the above brain, e.g. /jukebox/LightSheetData/brodyatlas/atlas/for_registration_to_lightsheet/WHS_SD_rat_atlas_v3_annotation.tif
    - resizefactor
      - usually 5 for 4x images, 3 for 1.3x images
    - slurmjobfactor
      - we keep this at 50; it's the number of files processed in step1 of run_tracing and slurm_files/step1.sh
    - transfertype
      - MAKE SURE THIS IS "copy" or else your data may get deleted. There are backups, but recovering is really annoying; avoid this at all costs. You can always clean up and delete things later.
- you'll also want to check that in the main run the systemdirectory points to your use case. For example, my scripts check whether I'm running locally ("/home/emilyjanedennis/")
to run
if on spock, after editing run_tracing.py, go to your rat_BrainPipe folder, and run
sbatch slurm_scripts/step0.sh
then go to your outputdirectory (e.g. /scratch/ejdennis/e001) that you specified in run_tracing.py
cd /scratch/ejdennis/e001
there should now be a directory called lightsheet, go there
cd lightsheet
run the pipeline from this folder (this is useful because it allows you to keep track of the exact parameters you used to run, and also parallelize by launching jobs for different brains at once on the cluster)
sbatch sub_registration.sh
That's all! If everything runs successfully, you'll end up with a param_dict.p, a LogFile.txt, two new resized tiffstacks, a mostly empty folder called injections, a folder called full_sizedatafld with the individual stitched z planes for each channel, and an elastix folder with the brains warped to the atlas defined by AtlasFile in run_tracing.py.
If you have a group of stitched brains (e.g. e001/full_sizedatafld, e002/full_sizedatafld, and e003/full_sizedatafld), you can make an average brain for an atlas. Our rat atlas is for our lab, and therefore is made of only male (defined operationally by external genitalia) brains. However, we wanted to test our results and publish including female (similarly operationally defined) brains. Therefore we perfused, cleared, and imaged three female brains and created an atlas.
To make your own atlas, use the rat_atlas folder.
1. Edit mk_ls_atl.sh and cmpl_atl.sh to use your preferred slurm SBATCH headings
2. Edit step1_make_atlas_from_lightsheet_images
   - edit sys.path.append in the import section to reference your cloned rat_BrainPipe git repo
   - main section variables:
     - src should be where your folders (typically named by animal name, e.g. e001) live; these folders should each have full_sizedatafld folders in them
     - dst: where you want to save things. If you have a nested structure, make sure the parent folder exists (e.g. if you want to save in /jukebox/scratch/ejdennis/female_atlas/volumes, make sure /jukebox/scratch/ejdennis/female_atlas already exists)
     - brains should be the list of names of the brains you wish to use, corresponding to the names of the folders in dst that you want to average
3. Run sbatch --array=0-2 mk_ls_atl.sh for three brains, --array=0-9 for 10 brains, etc.
4. Edit step2_compile_atlas.py
   - edit sys.path.append in the import section to reference your cloned rat_BrainPipe git repo
   - edit main section variables:
     - src: should be the same as in step 2
     - brains: should be the same as in step 2
     - output_fld: should be a different folder than in step 2; I like to place them in the same parent folder
     - parameterfld: this should point to a folder containing the affine/bspline transforms you want to use
5. Run sbatch --array=0-2 cmpl_atl.sh for three brains, --array=0-9 for 10 brains, etc.
6. Edit step3_make_median.py
   - edit the sys.path.append in the import section
   - in main, edit variables to match step2_compile_atlas
7. Either locally or on the cluster head node (module load anacondapy/5.3.1), use export SLURM_ARRAY_TASK_ID=0, activate the lightsheet conda environment, and run step3_make_median.py (see the sketch below)
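For step 7, a minimal sketch of what that head-node invocation could look like; the activation style and the exact script path are assumptions to adapt to your setup:

```bash
# run the median step for a single array index on the head node
module load anacondapy/5.3.1
. activate lightsheet
export SLURM_ARRAY_TASK_ID=0    # the array index step3_make_median.py expects, per step 7 above
python step3_make_median.py
```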
If you have already "Made a stitched whole-brain", you may already have your brain in atlas space, depending on what you specified as the AtlasFile. If you have a tiff stack that you want to register to an atlas file, you can use elastix_to_pra.py:
- change mv to be your "moving image" (the brain tiffstack) and fx to be your "fixed image" (the atlas volume)
- change the output directory to where you want your elastix files and newly aligned tiff saved
- change the outputfilename; this will be a resized mv file that is 140% the size of fx and is what is actually used for the alignment
One use of this can be to take an existing atlas that has parcellations of the brain and put it into our atlas space. This type of 'layer' can be visualized in Neuroglancer (see below) and also can be used to, for example, ID cell centers in different brain regions after tracing.
To do this:
- get your atlas into sagittal, tiff format
- run step3

Still documentation in progress:
- visualize the warping procedure
- identify locations on images and transform them to bregma coordinates
These files are the main files used for lightsheet manipulations required for all volumes collected: things like stitching lightsheets and tiles.
- cell_detect.py: .py file used to manage the parallelization of CNN preprocessing on a SLURM cluster
  - params need to be changed per cohort
  - see the tutorial for more info
- run_tracing.py
- sub_main_tracing.sh: .sh file used to submit jobs to a slurm scheduler. This can change depending on the scheduler+cluster, but generally the batch structure requires 2 variables to pass to run_tracing.py AND cell_detect.py:
  - stepid = controls which 'step' to run
  - jobid = controls the jobid (iteration) of each step
  - Steps:
    - 0: set up dictionary and save; requires a single job (jobid=0)
    - 1: process (stitch, resize) zplns; ensure that 1000 > zplns/slurmfactor. Typically submit 80 jobs for LBVT (jobid=0-80).
    - 2: resample and combine; typically submit 3 jobs (requires 1 job/channel; jobid=0-3)
    - 3: registration via elastix
- cnn_preprocess.sh
- sub_registration.sh
- sub_registration_terastitcher.sh
- sub_update_registration.sh
slurm_files
- in all, check the headers, and make sure you have the correct array numbers, partition, etc.
parameter folder
- This contains the four parameter files for elastix warping: one affine and three bspline. This is different for mice (which only use two: one affine and one bspline).
tools
- __init__.py we use for initializing
- analysis: These are mostly for mouse work but could be useful in the future as guides for things to do, so keeping them
- expression_mask: Additional useful plotting tools, though made for mice, keeping for now
- imageprocessing: Useful plotting and processing tools for individual volumes, keeping for now
- objectdetection
- detection_qc.py is useful for plotting cell centers onto volumes for quality control and nice plots
- find_injection.py is useful for finding contours of a probe/fiber covered in CM DiI
- inj_detect.py Similar to find_injection, probably don’t need both, uses connected components analysis
- registration
- steps to register volumes, transform coordinates
- utils: lots of scripts live here that are used lots of places!
- tools: convert 3D STP stack to 2D representation based on colouring
- imageprocessing:
  - preprocessing.py: functions used to preprocess, stitch, 2D cell detect, and save light sheet images
- analysis:
  - allen_structure_json_to_pandas.py: simple function used to generate an atlas list of structures in coordinate space
  - other functions useful when comparing multiple brains that have been processed using the pipeline
conv_net: this houses all the important CNN files!
- augmentor
- dataprovider3
- DataTools
- pytorchutils
- slurm_scripts
- balance.py
- demo.py
- forward.py
- layers.py
- losee.py
- run_chnk_fwd.py
- run_exp.py
- run_fwd.py
- train.py
logs
- file
- clearmap_sweep_for_comp
- parameter_sweep.ipynb
- README.md
- run_clearmap_cluster.py
- run_parameter_sweep.py
- spot_detection_on_annotated_volumes
- sub_clearmap_cluster.sh
ClearMap
- copy of ClearMap with some edits as outlined in the ClearMapCluster README.
- parameter_file.py: this you will need to change
docs
logs
- this is where your slurm output logs will go. It also has a sub-folder called array_jobs where slurm array job err/out messages will live.
slurmfiles
- this is where most of the slurm files live.
- demo script to run training and large-scale inference
- useful to make sure the environment and modules are imported correctly
- if working with a slurm-based scheduler:
  1. run sbatch run_demo.sh within tools/conv_net
     - make sure you have the "lightsheet" environment set up before running
- else, navigate to tools/conv_net; in the terminal, in the lightsheet environment, run:
$ python setup_demo_script.py
$ cd pytorchutils/
$ python demo.py demo models/RSUNet.py samplers/demo_sampler.py augmentors/flip_rotate.py 10 --batch_sz 1 --nobn --noeval --tag demo
- output will be in a 'tools/conv_net/demo/cnn_output' subfolder (as a TIFF)