
homostack

Sanzhen Liu edited this page Aug 29, 2025 · 11 revisions

The script homostack visualizes alignments among multiple homologous sequences.

required data

  1. multiple query sequences
  2. BED files with highlight regions (optional) [format, 7 columns separated by Tab]:
    1. chr
    2. start(0-based)
    3. end(1-based)
    4. label
    5. height(e.g., 0.1)
    6. strand(+/-)
    7. color(R compatible)
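
As a minimal sketch of the 7-column format listed above, the following writes a small highlight file and checks that every line has exactly 7 tab-separated fields. The file name, sequence name, coordinates, labels, and colors are made up for illustration and do not come from the example data.

```shell
# Hypothetical example file; all values below are illustrative placeholders.
bed=example.highlight.bed

# Columns: chr, start(0-based), end(1-based), label, height, strand, color
printf 'seq1\t100\t250\texon1\t0.1\t+\tblue\n'  > "$bed"
printf 'seq1\t400\t620\texon2\t0.1\t+\tblue\n' >> "$bed"

# Count lines that do not have exactly 7 tab-separated columns.
bad_lines=$(awk -F'\t' 'NF != 7' "$bed" | wc -l | tr -d ' ')
echo "malformed lines: $bad_lines"
```

A count of 0 means every line has the expected number of columns; the check does not validate the values themselves (for example, whether the color is one R accepts).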

example

running script

The --seq parameter takes a FASTA sequence file and the --annot parameter takes the BED highlight file for that sequence. The two are paired by order: once --annot is used for one sequence, every other sequence needs its own --annot input. If a sequence has no BED file, specify "--annot none".

seq01=/wikiexample/1_data/mads69/B73/MAD69/MAD69.3.Zm00001eb143080.fasta
bed01=/wikiexample/1_data/mads69/B73/MAD69/MAD69.2.transcripts/Zm00001eb143080_T001.adjusted.bed
seq02=/wikiexample/1_data/mads69/B97/MAD69/MAD69.3.Zm00018ab147610.fasta
bed02=/wikiexample/1_data/mads69/B97/MAD69/MAD69.2.transcripts/Zm00018ab147610_T001.adjusted.bed
seq03=/wikiexample/1_data/mads69/Ms71/MAD69/MAD69.3.Zm00035ab147480.fasta
bed03=/wikiexample/1_data/mads69/Ms71/MAD69/MAD69.2.transcripts/Zm00035ab147480_T001.adjusted.bed

perl ../../homostack \
    --seq $seq01 --annot $bed01 --plotname Zm00001eb143080 \
    --seq $seq02 --annot $bed02 --plotname Zm00018ab147610 \
    --seq $seq03 --annot $bed03 --plotname Zm00035ab147480
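
The order-based pairing of --seq and --annot, including the "none" placeholder, can be sketched as below. The snippet only builds and prints the command rather than running it. The file names are hypothetical, and it assumes paths contain no spaces.

```shell
# Placeholder inputs; "none" marks a sequence without a highlight file.
seqs="a.fasta b.fasta c.fasta"
annots="a.bed none c.bed"

set --                      # start with an empty argument list
i=1
for s in $seqs; do
    a=$(echo $annots | cut -d' ' -f"$i")   # i-th annotation, paired by order
    set -- "$@" --seq "$s" --annot "$a"
    i=$((i + 1))
done

# Assemble and print the command instead of executing it.
cmd="perl ../../homostack $*"
echo "$cmd"
```

Here the second sequence is paired with "none", so the number of --annot inputs still matches the number of --seq inputs.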

output plot

stackplot

full usage

Usage: perl ../homostack --seq <fasta> --annot <annot_file> [options]
    [Options]
    --seq <file>     fasta file containing a sequence as the query; required
                     multiple sequences are needed by using --seq multiple times
    --annotskip      skip annotation if specified; NO skipping by default
    --annot <file>   bed file to highlight regions in query; if --annotskip is specified, --annot will be ignored; required otherwise
                     [format]: 7 columns separated by Tab
                               chr start(0-based) end(1-based) label height(e.g., 0.1) strand(+/-) color(R compatible)
                     [NOTE 1]: without --annotskip, the number of --annot inputs needs to match the number of --seq inputs;
                               they will be paired by their order, i.e., 1st --seq paired with 1st --annot;
                               if some --annot has no data, input "none".
                     [NOTE 2]: "height" is the ratio of height of highlighted bars to height of each alignment unit
                               a highlighted bar fills the specified region if the height equals the --seqheight value
    --plotname       sequence names to be used in the plot; can be used multiple times; if specified, the number of --plotname inputs should equal the number of --seq inputs
    --alnskip        skip alignments if specified; NO skipping by default
    --identity <int> minimal percentage of identity from 0 to 100 (80)
    --match <int>    minimal bp match of an alignment (100)
    --prefix <str>   the output directory and the prefix for output files (hsout)
    --title <str>    the title of the plot (ALNStack)
    --minident <int> lowest identity for plotting color scaling, 0-100 or auto (auto)
    --maxident <int> highest identity for plotting color scaling, 0-100 or auto (auto)
    --threads <int>  number of cpus (1)
    --seqheight <float> ratio of height of a sequence to height of each alignment unit (0.1)
    --bandcol <str>  a valid R color name (bisque3)
    --cleanup        clean up outputs if specified; NO cleanup by default
    --version        version information
    --help           help information.

Output from homostack

Plot of sequential alignments of multiple sequences : <prefix>.3.alnstack.pdf