CGDesigner
is a python based tool for designing and analyzing conditional guide RNAs with NUPACK. Details about this work can be found in my thesis. This library and python executable is organized into the following submodules:
design
contains the commands to generate nupack design specifications which are used as input in the checkpoint module
checkpoint
contains the commands to run the nupack designs locally and generate cgRNA sequences
ftp
contains the commands to transfer files to and from the s3 file server
unpack
unpacks nupack design output and data fields into csv format
kube
contains the commands to send jobs to the kubernetes computing cluster
analysis
contains the commands to filter and perform test tube analysis on cgRNA sequences
oligos
contains commands to generate oligos for cloning
This package requires nupack
to be installed on the server. This package can be found at nupack.org.
Run the following commands to clone and install this tool.
git clone https://github.com/zchen15/CGDesigner
cd CGDesigner
pip3 install .
The tests/
directory contains many example scripts for generating conditional guide RNAs. The following are a few useful excerpts to help you start off.
design
is used to generate nupack design specifications which can be run locally or on the kubernetes cluster. The following are a few examples designs specifications that can be generated.
# Get help on design options
CGDesigner design -h
# Generate 4 orthogonal universal trigger RNAs of length 20nt
CGDesigner design -material rna -s rna_trig -N 4 -o rnatrig -d 20 -fstop 0.01
# Generate reverse toehold switch cgRNAs targeting trigger sequences in mtrig.csv
CGDesigner design -material rna -s reverse_toehold_switch -gin data/mtrig.csv -o ts45_mtrig -d 10 20 10 3 10 3 3 6 -fstop 0.01 -scan 50 10
# Generate terminator switch cgRNAs targeting trigger sequences in mtrig.csv
CGDesigner design -material rna -s terminator_switch -gin data/mtrig.csv -o ts32_mtrig -d 10 20 6 0 7 3 1 -fstop 0.01 -scan 100 100
-o
defines the output file prefix
-scan
generates designs targeting 50nt sequences at stride 10nt for the sequences in mtrig.csv
-d
defines the domain dimensions to use which vary from design to design
-gin
defines the trigger RNA input to design against, if none are provided then trigger sequenes are considered unconstrained
-fstop
defines the threshold ensemble defect to reach for design to stop
-material
defines the thermodynamic model to use
-s
defines the test_tube formula to use. These are functions listed in formula.py
Details about these designs can be found in my thesis. New design formulas can be added to CGDesigner/formula.py
and called natively from CGDesigner via -s
.
Design jobs can be run locally by providing design .spec
files as input. These are equivalent to design checkpoints that can be restarted anytime a job fails.
# Get help on this submodule
CGDesigner checkpoint -h
# Runs designs with trials = 4 and checkpoint interval = 30
CGDesigner -c data/npbop.json checkpoint -i *.spec -trials 4 -cint 30
-trials
defines number of jobs to running in parallel
-cint
defines the checkpoint interval in seconds
-c
defines the path to the config file which has other default run parameters
The following example shows how design jobs can be submitted to the kubernetes cluster using the kube
submodule. Make sure you are using the same version of NUPACK as that installed on the cluster otherwise the design .spec
files will fail to load. This submodule can also be used to submit and run bash scripts to the docker container.
# get help on this submodule
CGDesigner kube -h
# submits design jobs to the cluster
CGDesigner -c data/npbop.json kube -i *.spec
# submits bash script to run on the cluster
CGDesigner -c data/npbop.json kube -i test.sh
# lists current running jobs
CGDesigner -c data/npbop.json kube -ls '*'
# lists current running pods
CGDesigner -c data/npbop.json kube -ls '*' -pods
# removes jobs matching keyword ts45*
CGDesigner -c data/npbop.json kube -rm 'ts45*'
# removes pods matching keyword ts45*
CGDesigner -c data/npbop.json kube -i *.spec -pods
# clears completed or failed pods and jobs
CGDesigner -c data/npbop.json kube -clear
CGDesigner -c data/npbop.json kube -clear -pods
-c
defines the path to the config file. This files contains information about s3 and kubernetes cluster credentials.
The following shows how to download results from the s3 file server using the ftp
submodule.
# get help on this submodule
CGDesigner ftp -h
# get list of ts45 results
CGDesigner -c data/npbop.json ftp -ls 'ts45*.csv'
# Download ts45 results
CGDesigner -c data/npbop.json ftp -get 'ts45*.csv'
# Remove ts45 results
CGDesigner -c data/npbop.json ftp -rm 'ts45*.csv'
The following example shows how to filter and analyze cgRNA sequences with the analysis
submodule. These operations are run locally, but can be submitted to the cluster as a bash script.
# download results and filter by prediction
CGDesigner -c ../data/npbop.json ftp -get 'ts45_mRNA_1*.csv'
# note -noterm strips the terminator sequence
CGDesigner analysis -i ts45_mRNA_0*.csv -o strands.csv -m remove_homopolymers -noterm
# select designs targeting the last 400 bp of the PAX7 gene
# -s selects for Strands with keyword '*t1*'
CGDesigner analysis -i strands.csv -o strands.csv -m add_position -r ../data/PAX7_400.csv -s '*t1*'
CGDesigner analysis -i strands.csv -o strands.csv -m add_position -r ../data/PAX7.csv -s '*t1*'
# filter by test tube specifications to make sure they still work
echo 'running test tube analysis on designs'
CGDesigner -v analysis -material rna -i strands.csv -o strands.csv -m filter_ts45 -r ../data/PAX7.csv
# filter for best 100 designs based on external prediction score
echo 'filtering by external predictor'
CGDesigner analysis -i strands.csv -o strands.csv -m add_prediction -r ../data/prediction.csv
CGDesigner analysis -i strands.csv -o strands.csv -m get_n_best -d 100 -30 -r 'prediction'
# filter for best 100 designs based on design defect
echo 'filtering by design defect'
CGDesigner analysis -i strands.csv -o strands.csv -m parse_defect
CGDesigner analysis -i strands.csv -o strands.csv -m get_n_best -d 100 -20 -r 'defect'
# get the 20 best designs based on external prediction score
echo 'filter for best 20 designs based on external predictor'
CGDesigner analysis -material rna -i strands.csv -o strands.csv -m get_n_best -d 16 -20 -r 'prediction'
The following shows how to generate oligos with BsaI golden gate sites padded to 300nt. The output can be uploaded to Twist or IDT to obtain gene block or oligo pool order POs.
echo 'generating oligos for the designs'
CGDesigner oligos -noterm -m twist_outer -g "tatatagGGTCTCcCACA " " CTTTgGAGACCctatata" -s "*g1*0*" -pad 300 -i strands.csv -o oligos.csv
-o
defines output filename. Here it is written to oligos.csv
-g
defines the golden gate sites or flanking sequences
-pad
defines the padding length. Here all strands are made to be at least 300nt
This following shows how to perform test tube analysis on cgRNA sequences or sub-sequences of mRNAs to characterize off-target effects via thermodynamics. Here is pax7 sequences are provided on input and divided up as a hairpin of 10nt toehold, 10nt stem at a stride of 10nt. These are analyzed against PAX7 sequences cut up into 100nt sequences with a stride of 20nt. off_target_analysis
is used to perform test tube analysis. off_target_score
sums up the concentrations along their respective strand labels in PAX7.csv. This is used to generate bar graphs of off-targetedness.
CGDesigner analysis -material rna -m off_target_analysis -i data/PAX7.csv -r data/mRNA.csv -o mscan.csv -d 10 10 10 100 20
CGDesigner analysis -m off_target_score -i mscan.csv -o mscan_score.csv
The docker container can be rebuilt and added to the kubernetes cluster by running bash docker_build.sh
. This will update the container with new code you added to this directory.
If you experience any issues with the code, please post them on the issues section along with the log file. I will monitor this periodically and try to fix issues as they arise.
If you use CGDesigner
in a publication, please cite:
Hanewich-Hollatz MH, Chen Z, Hochrein LM, Huang J, Pierce NA. Conditional Guide RNAs: Programmable Conditional Regulation of CRISPR/Cas Function in Bacterial and Mammalian Cells via Dynamic RNA Nanotechnology. ACS Cent Sci. 2019 Jul 24;5(7):1241-1249. doi: 10.1021/acscentsci.9b00340.