Program for identifying exclusive endogenous gRNA sites and creating unique synthetic gRNA sites.
Basic Features:
- Supports both direct (1-step) and indirect (2-step) genome editing through CRISPR/Cas-induced homology-directed repair (HDR).
- Analyzes arbitrary genomic DNA (gDNA).
- Uses an intuitive syntax to locate RNA-guided nuclease (RGN) cut sites (Targets) within a locus of interest (Feature).
- Fully supports ambiguous bases (RYMKWSBDHVN).
- Accepts 3'-adjacent sequences, such as Cas9 (>NGG).
- Accepts 5'-adjacent sequences, such as Cas12a (TTTN<).
- Supports arbitrary length and composition constraints, such as for plant experiments (G{,2}N{19,20}).
- Supports arbitrary sequences (MAD7: YTTN<, Cas12d: TA<, BlCas9: >NGGNCNDD, etc.).
- Supports any number of stranded forward (/), reverse (\) and unstranded (|) cut sites.
- Supports sequences defined by complex nested logic, such as xCas9 (>(N{1,2}G,GAW,CAA)).
- Simultaneously calculates any number of on-target and off-target scores (see Algorithms).
- Searches for Targets using a selectable pairwise alignment program (see Aligners).
- Generates exogenous donor DNA (dDNA) sequences for modifying the same locus successively.
- Engineers a single set of verification PCR (vPCR) Primers for assessing genome editing.
- Performs in silico recombination between gDNA and dDNAs to predict the genome sequences after editing.
- The same Primers work for all genotypes (reference, intermediary, and add-back).
- Positive amplification shows if the Feature was edited correctly.
- A different, positive amplification shows if the Feature was edited incorrectly.
- Determines thermodynamic properties of Primer pairs (Tm, minimum ΔG, amplicon size, etc.).
- Uses a genetic algorithm to select Primer sequences that have compatible properties, so they can be run in parallel with the same thermal cycler conditions.
- Facilitates ploidy-aware editing (multi-allelic, allele-specific, and allele-agnostic).
- Contains the most-complete index of all known Target motifs for RGNs.
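To make the motif notation in the feature list more concrete, here is a minimal sketch of how IUPAC ambiguity codes can be expanded into a regular expression and used to scan the forward strand for a spacer followed by a 3'-adjacent sequence. This is an illustration only, not AddTag's parser: the helper names `iupac_to_regex` and `find_sites` are hypothetical, and the sketch ignores the strand symbols (/, \, |) and nested logic.

```python
import re

# IUPAC nucleotide ambiguity codes and the bases each one stands for
IUPAC = {
    'A': 'A', 'C': 'C', 'G': 'G', 'T': 'T',
    'R': 'AG', 'Y': 'CT', 'M': 'AC', 'K': 'GT', 'W': 'AT', 'S': 'CG',
    'B': 'CGT', 'D': 'AGT', 'H': 'ACT', 'V': 'ACG', 'N': 'ACGT',
}

def iupac_to_regex(pattern):
    """Translate IUPAC letters into regex character classes (hypothetical helper).

    Characters that are not IUPAC letters, such as the quantifier '{19,20}',
    pass through unchanged.
    """
    parts = []
    for ch in pattern:
        bases = IUPAC.get(ch)
        if bases is None:
            parts.append(ch)
        elif len(bases) == 1:
            parts.append(bases)
        else:
            parts.append('[' + bases + ']')
    return ''.join(parts)

def find_sites(sequence, spacer, pam):
    """Return (start, spacer_seq, pam_seq) for forward-strand matches of spacer + 3' PAM.

    A zero-width lookahead is used so that overlapping sites are all reported.
    """
    regex = re.compile(
        '(?=(' + iupac_to_regex(spacer) + ')(' + iupac_to_regex(pam) + '))'
    )
    return [(m.start(), m.group(1), m.group(2)) for m in regex.finditer(sequence)]

# Example: a 20-nt spacer followed by an NGG sequence, as for Cas9
sites = find_sites('ACGTACGTACGTACGTACGTAGGTT', 'N{20}', 'NGG')
```

A full implementation would also search the reverse complement and handle 5'-adjacent sequences such as TTTN<.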
Processor:
- ≥ 4 cores, ≥ 3 GHz
Computations scale fairly linearly, so the more computational cores you can assign to the task, the faster it will go.
Memory:
See Notes for tips on memory optimization.
Below are lists of AddTag requirements. Each entry is marked with a 🗹 or ☐, indicating whether or not an additional download/setup is required:
- 🗹 All requirements included in AddTag
- ☐ Additional download/setup required
For tips on setting up AddTag requirements, please review the commands in the .azure-pipelines.yml file.
Base operation of AddTag requires the following:
- Python ≥ 3.5.1 (source, binaries, documentation)
- regex Python module (source, whls, documentation)
Certain optional AddTag functionality (version information, and software updates) depends on the following:
- Git ≥ 1.7.1 (source, binaries, documentation)
One pairwise sequence aligner is required:
- BLAST+ ≥ 2.6.0 (source, binaries, documentation)
- Bowtie 2 ≥ 2.3.4.1 (source, binaries, documentation)
- BWA ≥ 0.7.12 (source, ugene binaries, bioconda binaries, documentation)
- Cas-OFFinder ≥ 2.4 (source, binaries, documentation)
For polymorphism-aware expansion (using the --homologs option), one multiple sequence aligner is required:
- MAFFT (source, binaries, documentation)
For oligo design, AddTag requires one of the following third-party thermodynamics solutions to be installed:
- UNAFold ≥ 3.8 (source, documentation) with patch440
- primer3-py Python module (source, whls, documentation)
- ViennaRNA Python module (source, official binaries, bioconda binaries, documentation)
The following scoring algorithms are subclasses of SingleSequenceAlgorithm.
- Azimuth (Doench, Fusi, et al. (2016))
  note: Either Azimuth 2 or Azimuth 3 can be used to calculate Azimuth scores. There is no need to have both installed.
  - Azimuth 3 Python module (source, documentation)
    note: requires specific versions of numpy, scikit-learn, and pandas. Other dependencies include click, biopython, scipy, GPy, hyperopt, paramz, theanets, glmnet_py, dill, matplotlib, pytz, python-dateutil, six, tqdm, future, networkx, pymongo, decorator, downhill, theano, nose-parameterized, joblib, kiwisolver, cycler, pyparsing, setuptools, glmnet-py.
  - Azimuth 2 Python module (source, documentation) on 2.7.10 ≤ Python < 3.0.0 (source, binaries, documentation)
    note: requires python-tk to be installed. Also requires specific versions of scipy, numpy, matplotlib, nose, scikit-learn, pandas, biopython, pyparsing, cycler, six, pytz, python-dateutil, functools32, subprocess32.
- CINDEL/DeepCpf1 (Kim, Song, et al. (2016), Kim, Song, et al. (2018))
  note: Requires both Keras and Theano Python modules.
  - Keras Python module (source, whls, documentation)
  - Theano Python module (source, whls, documentation)
- Doench-2014 (Doench, et al. (2014))
- Housden (Housden, et al. (2015))
- Moreno-Mateos (Moreno-Mateos, et al. (2015))
- CRISPRater (Labuhn, et al. (2018))
- GC (Wang, et al. (2014))
- Homopolymer (Hough, et al. (2017))
- ProximalG
- PolyT
- PAM Identity
- Position
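Several of the simpler entries in this list depend only on the composition of a single sequence. The following sketch shows the kind of quantity such scores are built on: GC percentage, longest homopolymer run, and the presence of a poly-T stretch. The function names, the poly-T length of 4, and the lack of any weighting are assumptions made for illustration; they are not taken from AddTag's SingleSequenceAlgorithm subclasses or from the cited papers.

```python
def gc_percent(seq):
    """Percentage of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count('G') + seq.count('C')) / len(seq)

def longest_homopolymer(seq):
    """Length of the longest run of one repeated base in seq."""
    longest = current = 0
    previous = None
    for base in seq.upper():
        # Extend the current run or start a new one
        current = current + 1 if base == previous else 1
        longest = max(longest, current)
        previous = base
    return longest

def has_poly_t(seq, length=4):
    """True if seq contains a run of at least `length` consecutive T bases.

    The default length of 4 is an assumption chosen for this sketch.
    """
    return 'T' * length in seq.upper()

# Example on a hypothetical 20-nt spacer
spacer = 'GACGTTTTGCATCGATCGGA'
summary = (gc_percent(spacer), longest_homopolymer(spacer), has_poly_t(spacer))
```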
The following scoring algorithms are subclasses of PairedSequenceAlgorithm.
- Substitutions, Insertions, Deletions, Errors (Needleman, Wunsch (1970))
- Hsu-Zhang (Hsu, et al. (2013))
- Linear
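As an illustration of how a paired-sequence comparison can yield substitution, insertion, and deletion counts, here is a minimal sketch of a global alignment in the style of Needleman and Wunsch, using unit costs and a traceback. It is a simplified stand-in, not AddTag's implementation; the function name `count_edits` and the unit-cost scheme are assumptions.

```python
def count_edits(a, b):
    """Globally align a to b with unit costs.

    Returns (substitutions, insertions, deletions) along one optimal
    alignment, where insertions and deletions are relative to a.
    """
    n, m = len(a), len(b)
    # cost[i][j] is the minimum number of edits turning a[:i] into b[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = cost[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            cost[i][j] = min(diagonal, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner, tallying each edit type
    subs = ins = dels = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            subs += a[i - 1] != b[j - 1]
            i -= 1
            j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, ins, dels

# Total errors can be taken as the sum of the three counts
subs, ins, dels = count_edits('ACGT', 'AGGT')
errors = subs + ins + dels
```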
There are several standard ways to make modules available to your Python installation. The easiest way to install a package is through pip.
For example, the following command will download and set up the regex package from PyPI into your default Python installation.
pip install regex
If you want to make the module available to a specific Python installation, use a command like this:
/path/to/python -m pip install regex
Often, the package is not available on PyPI, or you need a development version. In these cases, you can direct pip to download and set up a package, along with its dependencies, from a code repository, assuming git is available in the PATH environment variable. Here is how to install the Azimuth package from GitHub.
pip2.7 install git+https://github.com/MicrosoftResearch/Azimuth.git
Some Python packages are available through bioconda. To install viennarna using conda, use this command:
conda install -c bioconda viennarna
You can download the latest version of AddTag over HTTPS using git with the following command.
git clone https://github.com/tdseher/addtag-project.git
This will download AddTag into a folder called addtag-project/ in your current working directory. Change the working directory to the AddTag folder.
cd addtag-project/
git should automatically make the addtag program executable. If it does not, you can use the following command to do it.
chmod +x addtag
To make the AddTag executable accessible from any working directory, you can add the absolute path of the current working directory to the PATH variable.
On Windows, run:
set PATH=%PATH%;%CD%
On Linux or macOS, run:
export PATH=$PATH:$PWD
If you run AddTag with no parameters, you should get the following output:
usage: addtag [-h] [-v] action ...
One way to obtain AddTag is by downloading and extracting the code directly from GitHub:
wget https://github.com/tdseher/addtag-project/archive/master.zip
unzip master.zip
cd addtag-project-master/
If you try running addtag, you will get a message similar to the following:
./addtag
fatal: Not a git repository (or any parent up to mount point /media/sf_VirtualBox_share) Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
This message means that the AddTag directory isn't a valid git repository (it is missing the .git subfolder). As a consequence, the version information will not be accessible.
./addtag --version
addtag missing (revision missing)
To fix this, ensure git is installed and available in the PATH environment variable (see Software prerequisites), and run the following:
./addtag update
Now, when you run addtag, you should not receive the warnings, and the version field will be populated.
./addtag --version
addtag 9e8748b (revision 460)
The commands in this section assume the working directory is the AddTag folder.
cd addtag-project/
If you would like to update your local copy to the newest version available, use the following command from within the addtag-project/ directory.
./addtag update
If you want the newest version, but you made changes to the source code, then you can first discard your changes, and then update. Use the following command from inside the addtag-project/ folder.
./addtag update --discard_local_changes
Alternatively, if you want to keep the local modifications, you can use the --keep_local_changes option to stash, pull, then reapply them afterwards.
./addtag update --keep_local_changes
Each one of these commands assumes git is available on the PATH environment variable.
Because AddTag is being updated regularly, the most current feature set and usage can be viewed by running AddTag with the --help option. The following commands assume the current working directory is the AddTag folder.
./addtag --help
Additionally, you may view the included man page, which is probably not up-to-date.
man ./addtag.1
AddTag requires a FASTA genome of the organism you wish to manipulate.
FASTA files are plain text files that use newline characters as line delimiters. Typically, the DNA sequence information in FASTA files is listed as canonical nucleotide abbreviations (such as ACGT).
AddTag requires a GFF file containing annotations for the Features you wish to manipulate (technical specifications of GFF format).
GFF files describe the contig locations of important genomic Features. Empty lines and lines that begin with the pound (#) symbol are ignored. Typical AddTag analyses require at least one GFF file. AddTag can handle GFF files in two ways.
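For readers unfamiliar with these two input formats, the following sketch reads FASTA records and splits a GFF3 feature line into its nine tab-delimited columns. It reflects the general format conventions (header lines beginning with >, nine GFF columns, key=value attributes separated by semicolons) rather than AddTag's own parsers; the function names and the sample records are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # A new header closes the previous record, if there was one
            if header is not None:
                yield header, ''.join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, ''.join(chunks)

def parse_gff_line(line):
    """Split one GFF3 feature line into a dict; return None for blank or comment lines."""
    line = line.rstrip('\n')
    if not line or line.startswith('#'):
        return None
    fields = line.split('\t')
    if len(fields) != 9:
        raise ValueError('expected 9 tab-delimited columns, found %d' % len(fields))
    seqid, source, ftype, start, end, score, strand, phase, attributes = fields
    attrs = {}
    for pair in attributes.split(';'):
        if '=' in pair:
            key, value = pair.split('=', 1)
            attrs[key.strip()] = value.strip()
    return {
        'seqid': seqid, 'source': source, 'type': ftype,
        'start': int(start), 'end': int(end),  # 1-based, inclusive coordinates
        'score': score, 'strand': strand, 'phase': phase,
        'attributes': attrs,
    }

# Hypothetical records, made up for this example
records = list(read_fasta(['>contig1 example', 'ACGT', 'NNRY', '>contig2', 'GGCC']))
feature = parse_gff_line('contig1\texample\tgene\t1\t8\t.\t+\t.\tID=gene1;Name=ABC1')
```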
Often, you will have a GFF file with annotations for the entire genome. The find_feature subroutine selects the annotations that match a query:
addtag find_feature --gff genome.gff --query HSP90 --linked_tags Name Alias Parent Gene --header > features.gff
The Target motif is written from 5' to 3'. Use a greater than (>) symbol for a 3'-adjacent sequence (as in >NGG) and a less than (<) symbol for a 5'-adjacent sequence (as in TTTN<).
You can specify any number of Target motifs to be considered 'on-target' using the --motifs option.
To see an exhaustive list of all identified Target motifs for each known RGN, run the following command:
addtag list_motifs
Some researchers are lucky enough to work on organisms with phased genomes. This means that full haplotype information is known for each chromosome. AddTag can accommodate haploid, diploid, and polyploid genomes when homologous Features are linked by the addition of the --homologs option.
Each Feature identifier has its contig start and end position defined in the input GFF file. The 'homologs' file merely links them together. Columns in the homologs file are delimited by the TAB character.
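A small sketch of reading a homologs file may help here. It assumes a layout inferred from the grep and cut commands used elsewhere in this README: one group per line, tab-delimited, with a label in the first column followed by the linked Feature identifiers. The function name and the sample identifiers are hypothetical, and the real file may carry details this sketch does not handle.

```python
def parse_homologs(lines):
    """Map each group label to its list of linked Feature identifiers.

    Assumed layout (not confirmed by the AddTag documentation):
    label <TAB> feature_id_1 <TAB> feature_id_2 ...
    """
    groups = {}
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue
        label, *feature_ids = line.split('\t')
        groups[label] = feature_ids
    return groups

# Hypothetical two-allele entries, made up for this example
example = [
    'GENE1\tfeatureA_allele1\tfeatureA_allele2\n',
    'GENE2\tfeatureB_allele1\tfeatureB_allele2\n',
]
homologs = parse_homologs(example)
selection = ' '.join(homologs['GENE1'])  # space-separated list of linked Feature IDs
```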
AddTag outputs most of the experimental results you need. The final data are printed to STDOUT.
If the AddTag software fails for any reason, error messages will be printed to STDERR. Often, errors happen if required AddTag arguments are missing, or input data is improperly formatted.
AddTag also outputs intermediate calculations and computation status.
An example of an
These dDNAs each are predicted to recombine with contigs This file contains only the Target sequences that are contained within the Feature, but in This file is structured identically to the If you direct AddTag to find Primers to amplify the wild type Feature, then their amplicon sequences will be stored in the This example shows that polymorphisms at the Feature and its flanking sequences mean there are two possible dDNAs:
This file contains only the RGN Target sequences compatible with the In silico recombination will integrate the input dDNAs into their respective loci within the input genome. Contig names (primary identifiers) are modified with the incorporated dDNAs as well as the round. For example,
If the first round dDNA contains the following:
After the first round of in silico recombination,
The AddTag program contains a set of subroutines that can be run independently. There are four categories of subroutines.
To view which thermodynamics calculators are available on your system, use the following command:
addtag list_thermodynamics
These are instructions for using the current version of AddTag to re-design the experiments featured in the manuscript. The commands for the original design are in the methods.md file.
Download the Candida albicans reference genome and annotations used for this study.
wget http://www.candidagenome.org/download/sequence/C_albicans_SC5314/Assembly22/archive/C_albicans_SC5314_version_A22-s07-m01-r19_chromosomes.fasta.gz
gunzip C_albicans_SC5314_version_A22-s07-m01-r19_chromosomes.fasta.gz
wget http://www.candidagenome.org/download/gff/C_albicans_SC5314/archive/C_albicans_SC5314_version_A22-s07-m01-r19_features.gff
Set convenience variables for referencing these files.
GENOME_FASTA=C_albicans_SC5314_version_A22-s07-m01-r19_chromosomes.fasta
GENOME_GFF=C_albicans_SC5314_version_A22-s07-m01-r19_features.gff
GENOME_HOMOLOGS=C_albicans_SC5314_version_A22-s07-m01-r19_homologs.txt
Create the homologs file.
python3 gff2homologs.py ${GENOME_GFF} > ${GENOME_HOMOLOGS}
For simplicity, we use a variable to hold the label for this computational experiment.
GENE=ADE2
Create and enter the directory for this experiment.
mkdir ${GENE}_CDS
cd ${GENE}_CDS
Extract the feature IDs of the genes we want to remove from the homologs file.
SELECTION=$(grep ${GENE} ../${GENOME_HOMOLOGS} | cut -f 2- --output-delimiter ' ')
Identify the optimal Target sites and generate potential dDNAs.
addtag generate_all \
--fasta ../${GENOME_FASTA} \
--gff ../${GENOME_GFF} \
--homologs ../${GENOME_HOMOLOGS} \
--selection ${SELECTION} \
--features gene \
--tag ID \
--ko-gRNA \
--ko-dDNA mintag \
--ki-gRNA \
--ki-dDNA \
--motifs 'N{17}|N{3}>NGG' \
--off_target_motifs 'N{17}|N{3}>NAG' \
--excise_insert_lengths 0 4 \
--revert_amplification_primers \
--revert_homology_length 100 200 \
--folder ${GENE}ga > ${GENE}ga.out 2> ${GENE}ga.err
Select the best +Target and ΔTarget.
addtag find_header --fasta ${GENE}ga/excision-targets.fasta --query '\brank=0\b' > ko-target.fasta
addtag find_header --fasta ${GENE}ga/reversion-targets.fasta --query '\brank=0\b' > ki-target.fasta
Select an arbitrary ΔdDNA associated with the top-ranked ΔTarget, and select the AdDNA with the best AmpF/AmpR primer pair.
DONOR=$(grep '# reTarget results' -A 2 ${GENE}ga.out | tail -n +3 | cut -f 9 | cut -d ',' -f 1)
addtag find_header --fasta ${GENE}ga/excision-dDNAs.fasta --query "${DONOR}\b" > ko-dDNA.fasta
addtag find_header --fasta ${GENE}ga/reversion-dDNAs.fasta --query '\brank=0\b' > ki-dDNA.fasta
Calculate a decent Primer Design for validating each genome engineering step.
addtag generate_primers \
--fasta ../${GENOME_FASTA} \
--dDNAs ko-dDNA.fasta ki-dDNA.fasta \
--primer_scan_limit 600 \
--primer_pair_limit 300 \
--o_primers_required y n y \
--i_primers_required y n y \
--oligo ViennaRNA \
--specificity all \
--max_number_designs_reported 1000 \
--folder ${GENE}gp > ${GENE}gp.out 2> ${GENE}gp.err
Finally, change back to the parent folder.
cd ..
*~ Section incomplete ~*
In this simplest of examples, we will choose a Feature to delete from a genome, identify the optimal Target to design the gRNA against, create the necessary dDNA, and generate the set of Primers to validate the deletion.
The first step is to obtain input data. Let's download the sequences (FASTA) and annotations (GFF) for a haploid C. albicans assembly into the current working directory:
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/fungi/Candida_albicans/all_assembly_versions/GCF_000182965.3_ASM18296v3/GCF_000182965.3_ASM18296v3_genomic.fna.gz
gunzip GCF_000182965.3_ASM18296v3_genomic.fna.gz
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/fungi/Candida_albicans/all_assembly_versions/GCF_000182965.3_ASM18296v3/GCF_000182965.3_ASM18296v3_genomic.gff.gz
gunzip GCF_000182965.3_ASM18296v3_genomic.gff.gz
For convenience, let's use a variable to abbreviate these paths:
GENOME=GCF_000182965.3_ASM18296v3_genomic
The 1-step approach is appropriate when the Feature you wish to remove contains a high quality Target within it.
We will select a Feature from the GFF file using the find_feature subroutine. Let's pretend we are interested in the gene GCN20.
GENE=GCN20
If we know its gene ID, we can directly include it with the --selection option. For the purposes of this walkthrough, we search the GFF file for the gene name instead:
addtag find_feature --linked_tags --header --query ${GENE} --gff ${GENOME}.gff
We see there are 4 annotations associated with GCN20. Let's choose the Feature type gene.
We will use a Target motif, an on-target score, and an off-target score each appropriate for Cas9. We use default score weights for both scores. We will keep the rest of the AddTag default options. Our final command to identify the best Target sequences and generate the dDNA is the following:
addtag generate_all \
--features gene \
--selection gene-CAALFM_C100480CA \
--motifs 'N{17}|N{3}>NGG' \
--off_target_motifs 'N{17}|N{3}>NAG' \
--ontargetfilters Azimuth \
--offtargetfilters CFD \
--excise_insert_lengths 0 0 \
--ko-gRNA \
--ko-dDNA mintag \
--fasta ${GENOME}.fna \
--gff ${GENOME}.gff \
--folder ${GENE}g > ${GENE}g.out 2> ${GENE}g.err
This will output a single table, with the best Targets at the top of the output, and the worst toward the bottom.
head ${GENE}g.out
If you run this command again, but omit the --off_target_motifs option, you can compare the two sets of results. Notice that by including the additional off-target motif, we see generally lower off-target scores.
Next we will identify the best cPCR primers for verifying the 'GCN20' full CDS deletion.
We will delete a Feature that has no Target within it.
*~ Section incomplete ~*
We will edit a Feature.
*~ Section incomplete ~*
We will edit a Feature that has no Target within it.
*~ Section incomplete ~*
All Features in the input GFF file will be evaluated simultaneously.
*~ Section incomplete ~*
If you use the AddTag indirect genome editing method, please cite the paper with the initial proof-of-concept [1] as well as the full method description [2]. If you use the AddTag software for your research, please cite [2]. If you comment on, or further develop, AddTag's computational methods (such as Target identification, dDNA generation, or primer design, specifically the weight equations), please cite [3]:
[1] Namkha Nguyen, Morgan M. F. Quail, and Aaron D. Hernday. An efficient, rapid, and recyclable system for CRISPR-mediated genome editing in Candida albicans. mSphere, Volume 2, Number 2 (2017). doi: 10.1128/mSphereDirect.00149-17, PMID: 28497115, PMCID: PMC5422035.
[2] Thaddeus D. Seher, Namkha Nguyen, Diana Ramos, Priyanka Bapat, Clarissa J. Nobile, Suzanne S. Sindi, and Aaron D. Hernday. AddTag, a two-step approach with supporting software package that facilitates CRISPR/Cas-mediated precision genome editing. G3 Genes|Genomes|Genetics, Volume 11, Issue 9 (2021). doi: 10.1093/g3journal/jkab216, retrieved from: <https://github.com/tdseher/addtag-project>.
[3] Thaddeus D. Seher. A computational approach for microbial genome editing. eScholarship: UC Merced Electronic Theses and Dissertations (2021). item: uc/item/4rd9215f.
Who do I talk to?
- Aaron D. Hernday (🔬 PI leading the project)
- Thaddeus D. Seher (💻 programmer) (💬@tdseher)
See also the list of contributors who participated in this project.
We are always looking for ways to broaden the usability of the AddTag software. Here is a list of things that would be great contributions.
First, check to see if the problem you are having has already been added to the issue tracker. If not, then please submit a new issue.
Send a message to @tdseher.
Please submit a pull request.
AddTag comes with wrappers for several alignment programs. Depending on your experimental design and computing system, you may decide to use an aligner with no included wrapper. You can implement your own wrapper through subclassing. Share your code with us so we can make it available to all AddTag users.
Several wrappers to popular oligonucleotide conformation, free energy, and melting temperature calculation programs are included. You can add your own wrapper through subclassing. If you create your own wrapper, please submit a pull request.
Please see the LICENSE.md file.
Below are tips and descriptions of AddTag limitations that will help you make successful designs.