
How otb.sh works

David Molik edited this page Feb 16, 2022 · 1 revision

otb.sh is the script that the user calls to run an instance of otb. The open secret is that otb.sh is just a wrapper script for run.nf, with some callouts to prefetch_containers and check_env (these are in the scr directory). Under the hood, otb.sh checks the user's flags (getopts), checks the user's compute environment, pre-downloads all the required software containers, sets up busco, and then runs a nextflow script (run.nf).

otb runs the check environment script, the prefetch containers script, the check containers script, and the nextflow script in the same way: it builds a command in a bash variable, appends flags to it, and then evaluates it. The following snippet from otb.sh shows how this is done:

prefetch_container="./scr/prefetch_containers.sh"
[ -n "$YAHS" ] && prefetch_container+=" -y"
[ -n "$BUSCO" ] && prefetch_container+=" -b"
[ -n "$POLISHTYPE" ] && prefetch_container+=" -p $POLISHTYPE"
[ -n "$NXF_SINGULARITY_CACHEDIR" ] || ( mkdir -p "./work/singularity"; prefetch_container+=" -l ./work/singularity" )
eval $prefetch_container

For example, when the $YAHS variable is set, " -y" is appended to the prefetch_container variable.
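The build-then-eval pattern can be tried in isolation with the minimal sketch below. It is not taken from otb.sh: `fake_prefetch` is a hypothetical stand-in for ./scr/prefetch_containers.sh that only echoes its arguments, `build_cmd` is an illustrative helper, and the polish type value `simple` is a placeholder. The `mkdir` step is left out so the sketch has no side effects.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for ./scr/prefetch_containers.sh: it only echoes its arguments.
fake_prefetch() { echo "args: $*"; }

# Illustrative helper (not part of otb.sh): build the command string from three inputs.
#   $1 = YAHS, $2 = POLISHTYPE, $3 = singularity cache dir (any of them may be empty)
build_cmd() {
  local yahs="$1" polishtype="$2" cachedir="$3"
  local cmd="fake_prefetch"
  [ -n "$yahs" ] && cmd+=" -y"
  [ -n "$polishtype" ] && cmd+=" -p $polishtype"
  # Braces group commands in the current shell, so this append is kept.
  [ -n "$cachedir" ] || { cmd+=" -l ./work/singularity"; }
  echo "$cmd"
}

cmd="$(build_cmd "true" "simple" "")"
echo "$cmd"   # the assembled command line
eval "$cmd"   # runs fake_prefetch with the assembled flags
```

One detail worth knowing when extending this pattern: commands grouped with `( ... )` run in a subshell, so a `+=` made inside parentheses does not reach the variable in the parent shell. The sketch uses `{ ...; }` for the grouped append for that reason.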

If a developer wanted to add a getopt to otb.sh, they'd add a line to the getopts while loop near the beginning (remembering to also update the help() function at the beginning of otb.sh). That line would set a variable, which is then used to add a flag to the corresponding eval.

For instance, if a developer wanted to add a --bar flag, with a corresponding -b short form, to the getopts of otb, and this flag took an integer for the prefetch containers script, the following changes would be made:

In the documentation (the help() text), this:

    -r or --reverse
       another fastq or fastq.gz file for the pipeline, order does not matter
    -in or --reads
       path to reads (generally from pacbio), may include a wildcard for multiple files, can be fastq or bam
  suggested:
    -m or --mode
       mode to use, must be one of \"phasing\",\"homozygous\",\"heterozygous\",\"trio\", default: homozygous
    -t or --threads
       number of threads to use, clusters sometimes use this as number of cores, default: 20
    -n or --name
       a name for the assembly 
    -y or --yahs
       run yahs as well

Would become:

    -r or --reverse
       another fastq or fastq.gz file for the pipeline, order does not matter
    -in or --reads
       path to reads (generally from pacbio), may include a wildcard for multiple files, can be fastq or bam
  suggested:
    -m or --mode
       mode to use, must be one of \"phasing\",\"homozygous\",\"heterozygous\",\"trio\", default: homozygous
    -t or --threads
       number of threads to use, clusters sometimes use this as number of cores, default: 20
    -b or --bar
       the bar setting for prefetch_containers operation, default 20
    -n or --name
       a name for the assembly 
    -y or --yahs
       run yahs as well

and then the getopts while loop would be modified:

while [ $# -gt 0 ] ; do
  case $1 in
    -h | --help) help ;;
    -v | --version) version ;;
    -s | --supress) SUPRESS="true";;
    -c | --check) TEST="true";;
    -f | --forward) R1="$2" ;;
    -r | --reverse) R2="$2" ;;
    -in | --reads) READS="$2" ;;
    -m | --mode) MODE="$2";;
    -t | --threads) THREADS="$2";;
    -n | --name) NAME="$2";;
    -y | --yahs) YAHS="true";;
    --sge) RUNNER="sge";;
    --slurm) RUNNER="slurm";;
    --slurm-usda) RUNNER="slurm_usda";;
    --slurm-atlas) RUNNER="slurm_atlas";;
    --none) RUNNER="none";;
    --busco) BUSCO="--busco ";;
    --polish-type) POLISHTYPE="$2";;
    --auto-lineage) LINEAGE="auto-lineage";;
    --auto-lineage-prok) LINEAGE="auto-lineage-prok";;
    --auto-lineage-euk) LINEAGE="auto-lineage-euk";;
    -l | --lineage) LINEAGE="$2";;
    -p | --busco-path) BUSCOPATH="$2";;
  esac
  shift
done

becoming:

while [ $# -gt 0 ] ; do
  case $1 in
    -h | --help) help ;;
    -v | --version) version ;;
    -s | --supress) SUPRESS="true";;
    -c | --check) TEST="true";;
    -f | --forward) R1="$2" ;;
    -r | --reverse) R2="$2" ;;
    -in | --reads) READS="$2" ;;
    -m | --mode) MODE="$2";;
    -t | --threads) THREADS="$2";;
    -b | --bar) BAR="$2";;
    -n | --name) NAME="$2";;
    -y | --yahs) YAHS="true";;
    --sge) RUNNER="sge";;
    --slurm) RUNNER="slurm";;
    --slurm-usda) RUNNER="slurm_usda";;
    --slurm-atlas) RUNNER="slurm_atlas";;
    --none) RUNNER="none";;
    --busco) BUSCO="--busco ";;
    --polish-type) POLISHTYPE="$2";;
    --auto-lineage) LINEAGE="auto-lineage";;
    --auto-lineage-prok) LINEAGE="auto-lineage-prok";;
    --auto-lineage-euk) LINEAGE="auto-lineage-euk";;
    -l | --lineage) LINEAGE="$2";;
    -p | --busco-path) BUSCOPATH="$2";;
  esac
  shift
done
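To see how the new case line behaves, the loop can be reduced to a few options and wrapped in a function. This is a sketch for experimentation only: `parse_args` is a hypothetical wrapper (otb.sh runs the loop at top level), and only three of the options are kept.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around a trimmed-down copy of the option loop.
parse_args() {
  # Reset the variables so repeated calls do not leak values.
  BAR="" ; THREADS="" ; YAHS=""
  while [ $# -gt 0 ] ; do
    case $1 in
      -t | --threads) THREADS="$2";;
      -b | --bar) BAR="$2";;
      -y | --yahs) YAHS="true";;
    esac
    # A single shift per iteration: the value of an option is visited on
    # the next pass, matches no pattern, and is skipped.
    shift
  done
}

parse_args --threads 8 -b 5 --yahs
echo "THREADS=$THREADS BAR=$BAR YAHS=$YAHS"
```

Because the loop shifts once per iteration, option values are also run through the case statement. That is harmless for an integer such as 5, but a value that happens to look like a flag would be interpreted as one.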

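Finally, the BAR variable could either be processed further in otb.sh or passed directly to prefetch_containers.sh. Note that in this snippet -b is already passed to prefetch_containers.sh for BUSCO, so in a real change the new option would likely need a different letter on the prefetch_containers.sh side: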

prefetch_container="./scr/prefetch_containers.sh"
[ -n "$YAHS" ] && prefetch_container+=" -y"
[ -n "$BUSCO" ] && prefetch_container+=" -b"
[ -n "$POLISHTYPE" ] && prefetch_container+=" -p $POLISHTYPE"
[ -n "$BAR" ] && prefetch_container+=" -b $BAR"
[ -n "$NXF_SINGULARITY_CACHEDIR" ] || ( mkdir -p "./work/singularity"; prefetch_container+=" -l ./work/singularity" )
eval $prefetch_container
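Since the example flag is meant to take an integer, one option is to check the value before appending it to the command string. The sketch below is an assumption about how that could look, not code from otb.sh: `is_integer` and `bar_flag` are hypothetical helpers, and the `-b` letter simply follows the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only for a non-empty string of digits.
is_integer() {
  case "$1" in
    '' | *[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Hypothetical helper: prints the flag fragment for a valid BAR,
# prints nothing when BAR is unset, and fails on a non-integer.
bar_flag() {
  local bar="$1"
  [ -z "$bar" ] && return 0
  if is_integer "$bar"; then
    echo " -b $bar"
  else
    echo "error: --bar expects an integer, got '$bar'" >&2
    return 1
  fi
}

prefetch_container="./scr/prefetch_containers.sh"
BAR="20"
# Append the fragment only when validation succeeded.
if flag="$(bar_flag "$BAR")"; then
  prefetch_container+="$flag"
fi
echo "$prefetch_container"
```

Validating before the eval keeps a malformed value from being spliced into the command line that is later executed.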