- Overview
- Requirements
- Execution
- Workflow
MIP_ORACLE is a tool for filtering and identifying Molecular Inversion Probes (MIPs) of diagnostic significance in antimicrobial resistance genes and various other pathogen genomes. MIPs are single-stranded DNA molecules containing two arms that are complementary to the sequences flanking the target DNA.
These molecules often carry a fluorophore, DNA barcode, or molecular tag for unique identification.
Rough design outline:
- Generate all possible MIPs by sliding along the sequence one base pair at a time.
- Design MIPs for both the forward and reverse strands to maximize the probability of binding, then filter them according to three user-specified criteria:
  a) Melting temperature
  b) GC content
  c) Nucleotide repeats
- Next, filter the remaining MIPs by BLASTing them against the host (human) genome.
- To further increase the probability that each MIP binds the correct target region, also BLAST them against the non-redundant nucleotide (nt) database and filter out any MIPs that match other organisms.
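The first two steps of the outline (enumerate every MIP with a one-base step, on both strands) can be sketched in Python, the language the requirements below point to. This is a minimal illustration under stated assumptions, not MIP_ORACLE's own code: the `Mip` record, `enumerate_mips`, and the default arm and target lengths are hypothetical, and the reverse strand is taken to be the reverse complement of the input sequence.

```python
from typing import Iterator, NamedTuple

# Hypothetical lengths for illustration only; MIP_ORACLE's actual arm and
# target lengths are not stated in this README.
ARM_LEN = 20
TARGET_LEN = 40

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


class Mip(NamedTuple):
    strand: str   # "+" = forward strand, "-" = reverse strand
    start: int    # 0-based position of arm1 on that strand
    arm1: str
    target: str
    arm2: str


def enumerate_mips(seq: str, arm_len: int = ARM_LEN,
                   target_len: int = TARGET_LEN) -> Iterator[Mip]:
    """Yield every arm1+target+arm2 window, stepping one base at a time, on both strands."""
    window = 2 * arm_len + target_len
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        # A sequence shorter than one window yields no MIPs on that strand.
        for i in range(len(s) - window + 1):
            yield Mip(strand, i,
                      s[i:i + arm_len],
                      s[i + arm_len:i + arm_len + target_len],
                      s[i + arm_len + target_len:i + window])
```

For example, `list(enumerate_mips("ACGTACGTAC", arm_len=2, target_len=3))` yields four windows per strand, eight in total. Concatenating `arm1 + target + arm2` recovers each window, which is the same arm1+target+arm2 form used for the BLAST input. Generating every candidate first and filtering afterwards keeps the enumeration simple and lets the Eliminated MIPs be reported alongside the Passable ones.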
- Nucleotide BLAST (BLAST+ 2.12.0) with the nt database
- Python 3.6 and the following Python packages:
  - pandas=1.1.5
  - biopython=1.70
  - configparser
  - regex
  - xlsxwriter
  - openpyxl
Users can install the required packages through conda using the following command:

```bash
conda create -n mip_oracle --file mip_oracle_env.txt
```
To create a BLAST database specific to the host (human), the following commands can be used:

```bash
### Extract human sequences from the nt database
blastdbcmd -db $parameterJ/nt -taxids 9606 -out human_sequences.fasta
### Create a new BLAST database specific to humans
makeblastdb -in human_sequences.fasta -dbtype nucl -parse_seqids -out nt_human
```
- Obtain the sequences of interest in FASTA format, and make sure the organism name is present in the definition line of each sequence.
- Next, download all the program files and store them in the same directory as the FASTA file.
- Fill in the filtering requirements in the provided config file. MIPs whose values fall within the given ranges are accepted; for example, a temperature range of 45 to 70 accepts all MIPs with 45 < temp < 70.
- Run the provided shell script as follows:

```bash
bash MIP_ORACLE.sh -i AAC-nucleotide -o AAC-nucleotide_results -l mip_oracle -j /DATA/databases/blast/nt/ -n /DATA/databases/blast/Nt_Human/
```

- `nohup` can also be used, so the run continues after the terminal session ends:

```bash
nohup bash MIP_ORACLE.sh -i AAC-nucleotide -o AAC-nucleotide_results -l mip_oracle -j /DATA/databases/blast/nt -n /DATA/databases/blast/Nt_Human/ > AAC-nucleotide_log.out &
```
where:
- `-i` = Name of the input FASTA file (there is no need to add the file extension)
- `-o` = Name of the output file (there is no need to add the file extension)
- `-l` = Name of the conda environment containing all the packages
- `-j` = Location of the nt BLAST database
- `-n` = Location of the human-specific BLAST database
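The config-driven filtering step, where MIPs whose values fall inside the configured ranges become Passable and the rest are Eliminated, can be sketched with `configparser` and `re` from the Python standard library. Everything here is an assumption for illustration: the `[filters]` section and its keys, the `max_repeat` homopolymer limit, scoring each arm separately, and the Wallace rule (Tm = 2(A+T) + 4(G+C)), which is only a crude stand-in; Biopython's `Bio.SeqUtils.MeltingTemp` module offers nearest-neighbor models that a real pipeline would more likely use.

```python
import configparser
import re
from typing import Iterable, List, Tuple

# Hypothetical config layout: MIP_ORACLE's real config file may use
# different section and key names.
EXAMPLE_CONFIG = """
[filters]
temp_min = 45
temp_max = 70
gc_min = 40
gc_max = 60
max_repeat = 4
"""

MipTuple = Tuple[str, str, str]  # (arm1, target, arm2)


def wallace_tm(seq: str) -> int:
    """Wallace rule, Tm = 2*(A+T) + 4*(G+C): a rough stand-in, not MIP_ORACLE's model."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s) if s else 0.0


def longest_run(seq: str) -> int:
    """Length of the longest single-nucleotide run, e.g. 'AAAA' -> 4."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq.upper())), default=0)


def arm_passes(arm: str, cfg: configparser.SectionProxy) -> bool:
    """Accept an arm if Tm and GC lie strictly inside their ranges and no run exceeds max_repeat."""
    return (cfg.getfloat("temp_min") < wallace_tm(arm) < cfg.getfloat("temp_max")
            and cfg.getfloat("gc_min") < gc_percent(arm) < cfg.getfloat("gc_max")
            and longest_run(arm) <= cfg.getint("max_repeat"))


def split_mips(mips: Iterable[MipTuple],
               cfg: configparser.SectionProxy) -> Tuple[List[MipTuple], List[MipTuple]]:
    """Partition MIPs into (passable, eliminated); both arms must pass."""
    passable, eliminated = [], []
    for mip in mips:
        arm1, _target, arm2 = mip
        ok = arm_passes(arm1, cfg) and arm_passes(arm2, cfg)
        (passable if ok else eliminated).append(mip)
    return passable, eliminated


# Load the example ranges; a real run would use config.read("<config file>").
config = configparser.ConfigParser()
config.read_string(EXAMPLE_CONFIG)
filters = config["filters"]
```

With these example ranges, a 20-nt arm such as `ACGTACGTACGTACGTACGT` (Wallace Tm 60, 50% GC, no repeats) passes, while a poly-A arm (Tm 40) is eliminated. Keeping the eliminated list, rather than discarding it, mirrors the separate Passable and Eliminated output files.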
- The following files are generated (these files are stored in a folder called LOG_FILES):
  - The first file contains all possible MIPs for the sequences provided.
  - The second and third files contain the Passable MIPs (MIPs that met the user requirements in the config file) and the Eliminated MIPs (MIPs that were filtered out).
  - The fourth file is the BLAST input containing the arm1+target+arm2 sequences.
  - The fifth and sixth files are the BLAST result files in .xml format.
  - The seventh file contains the parsed BLAST results for each MIP, and the eighth file contains the filtered results.
  - Lastly, the final result file is generated in Excel format.
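The BLAST stages (the .xml results, the parsed per-MIP results, and the filtered results) can be illustrated with a short sketch. It uses the standard library's `xml.etree.ElementTree` rather than Biopython's `Bio.Blast.NCBIXML` only to stay dependency-free. `collect_hits`, `load_hits`, and `keep_mips` are hypothetical names; treating any human hit as disqualifying and matching the organism name against nt hit definition lines are simplifying assumptions, inferred from the requirement that each input definition line contain the organism name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

Hits = Dict[str, List[str]]  # query definition line -> hit definition lines


def collect_hits(root: ET.Element) -> Hits:
    """Read BLAST XML (-outfmt 5): one <Iteration> per query, one <Hit> per subject."""
    hits: Hits = {}
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def", default="")
        defs = hits.setdefault(query, [])  # queries with no hits map to []
        for hit in iteration.iter("Hit"):
            defs.append(hit.findtext("Hit_def", default=""))
    return hits


def load_hits(path: str) -> Hits:
    """Parse a BLAST XML result file from disk."""
    return collect_hits(ET.parse(path).getroot())


def keep_mips(nt_hits: Hits, human_hits: Hits, organism: str) -> List[str]:
    """Keep MIPs with no human hit whose nt hits all name the source organism.

    Simplifying assumption: every hit counts; the real tool may apply
    e-value or identity thresholds before deciding.
    """
    organism = organism.lower()
    kept = []
    for query, defs in nt_hits.items():
        if human_hits.get(query):
            continue  # binds the host genome: eliminate
        if all(organism in d.lower() for d in defs):
            kept.append(query)  # every nt hit belongs to the target organism
    return kept
```

A hypothetical call would look like `keep_mips(load_hits("nt_results.xml"), load_hits("human_results.xml"), "Escherichia coli")`, where both file names are placeholders. The resulting list of MIP identifiers could then be written to the final Excel file with pandas, which the requirements already include.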