-
Notifications
You must be signed in to change notification settings - Fork 4
homocomp
The script homocomp implements homologous searching and visualization of the comparison. With a query gene sequence, homologous sequences can be identified in a genome. The homologous gene could be found if a related GTF is supplied. The alignment between two homologous regions are plotted.
- a query sequence
- a blast database
Notes
The blast database can be produced by the following script.
makeblastdb -in <seq> --dbtype nucl
If the query sequence is from a gene, geneseq can help retrieve data. In this case, the Zm00001eb001720 sequence extracted is used to find a matched region in A188.
query=wikiexample/2_geneseq/1o_geneseq/1o_geneseq.1.Zm00001eb001720.fasta
ref_db=wikiexample/1_data/blastDB/A188.fasta
prefix=1o_homocomp
perl ../../homocomp \
--prefix $prefix \
--query $query \
--db $ref_db
1o_homocomp.5b.delta.txt
1o_homocomp.5c.filt.delta.txt
1o_homocomp.5f.uniq.filt.delta.txt
1o_homocomp.o1.alnplot.pdf
1o_homocomp.o2.filt.alnplot.pdf
1o_homocomp.o3.uniq.filt.alnplot.pdf
1o_homocomp.o4.dotplot.pdf
1o_homocomp.o5.filt.dotplot.pdf
1o_homocomp.o6.uniq.filt.dotplot.pdf
Files suffixed with txt are outputs of NUCmer alignments between two sequences. xxx.5b.delta.txt includes raw alignments; xxx.5c.filt.delta.txt includes alignments after filtering based on the input criteria (identity and coverage); xxx.5f.uniq.filt.delta.txt includes unique alignments after filtering based on the input criteria. These alignments are used to produced alnplots and dotplots.
Here is the alnplot of 1o_homocomp.o2.filt.alnplot.pdf:

We can add a BED file to highlight the query regions by using --qryadd.
The parameter --dbacc can be used to add the accession/genotype of the reference/target sequence.
We can supply a GTF gene annotation file by using --tgtf to allow to find the best matched gene in ref/target.
query=wikiexample/2_geneseq/1o_geneseq/1o_geneseq.1.Zm00001eb001720.fasta
query_bed=wikiexample/2_geneseq/1o_geneseq/1o_geneseq.4.pos.adjusted.gtf.bed/Zm00001eb001720_T001.adjusted.bed
ref_db=wikiexample/1_data/blastDB/A188.fasta
ref_name=A188
target_gtf=wikiexample/1_data/gtf/A188.gtf
prefix=2o_homocomp
perl ../../homocomp \
--prefix $prefix \
--query $query \
--qryadd $query_bed \
--db $ref_db \
--dbacc $ref_name \
--tgtf $target_gtf
Here is the alnplot of 2o_homocomp.o2.filt.alnplot.pdf:

Usage: homocomp --query <fasta> --db <blast_db> [options]
[Options]
--query <file> fasta file containing a sequence as the query; required
--qryadd <file> bed file to highlight regions in query; optional
[format, 7 columns separated by Tab]:
chr start(0-based) end(1-based) label height(0.01-0.1) strand(+/-) color(R compatible)
--db <db_name> blast database; required
--dbacc <str> accession name of reference database; optional
--ref <file> fasta file of the reference; optional
if not supplied, reference sequences will be extracted from --db
--tchr <str> targeted chromosome or contig; optional
--tstart <num> bp position for the region start; optional
if specified, the value will be used as the left end
--tend <num> bp position for the region end; optional
if specified, the value will be used as the right end
--tgtf <file> the gene annotation GTF file of the target; optional
--strand <str> plus (or +) or minus (or -) (plus)
--evalue <str> maximal E-value (1e-10)
--identity <num> minimal percentage of identity from 0 to 100 (80)
--match <num> minimal bp match of an alignment (100)
--coverage <num> minimal coverage of the query (0)
--repeatdust remove repetitive blastn alignments if specified
--lext <num> extended bp from the left side (0);
--rext <num> extended bp from the right side (0);
--expand <num> expand scale relative to the query length (10)
e.g., 1kb query, 1kb x 10 extened from both sides will be scanned for the hit range
--prefix <str> the output directory and the prefix for output files (hpout)
--threads <num> number of cpus (1)
--noblastn skip blastn if specified
--bandcol <str> a valid R color name (bisque3)
--version version information
--help: help information.
Three alignment plots and two dotplots between two sequences are output.
Alignment plot without alignment filtering : <prefix>.o1.alnplot.pdf
Alignment plot with alignment filtering : <prefix>.o2.filt.alnplot.pdf
Alignment plot with filtered unique alignment: <prefix>.o2.filt.uniq.alnplot.pdf
Dotplot without alignment filtering : <prefix>.o3.dotplot.pdf
Dotplot with alignment filtering : <prefix>.o4.filt.dotplot.pdf