
Quick start test

Matt Holt edited this page Feb 9, 2019 · 2 revisions

Quick Start test - DEPRECATED

The steps on this page still work, but they are no longer recommended because they depend on msbwt convert. We recommend the built-in fmlrc-convert method instead. For more information, refer to the full example README.

This document describes how to run fmlrc on an E. coli test dataset.

Prerequisites

  1. Python 2.7 - tested on 2.7.6; assumes pip is installed as well
  2. C++ compiler - tested with Apple LLVM version 8.1.0 (clang-802.0.42); should work with most up-to-date compilers
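
Before running the script, it can help to confirm that the command-line tools it calls are on the PATH. The sketch below is not part of fmlrc; the helper name missing_tools and the list of commands are assumptions drawn from the commands that appear in the script. It does not check versions, and it leaves out the C++ compiler because compiler command names vary (for example g++ or clang++).

```shell
# Print the name of every given command that is not found on the PATH,
# one per line. Prints nothing if all commands are available.
# (missing_tools is a hypothetical helper, not part of fmlrc.)
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Assumed tool list, taken from the commands used in the quick-start script.
missing=$(missing_tools python pip git make wget tar gunzip awk sort tr)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
else
    echo "All required tools found."
fi
```

If any tool is reported as missing, install it before running the script; otherwise the script may fail partway through a download or build step.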

Shell script

To use the script, create an empty directory for the test, copy the script into a file in that directory (e.g. "test.sh"), cd to the directory, and run the script. The inline comments and the section below the script describe what it does.

#!/bin/bash
#download and install msbwt
pip install msbwt
 
#download and build ropebwt2
if [ ! -f ./ropebwt2/ropebwt2 ]; then
    git clone https://github.com/lh3/ropebwt2.git
    (cd ropebwt2 && make)
fi

#download and build fmlrc
if [ ! -f ./fmlrc/fmlrc ]; then
    git clone https://github.com/holtjma/fmlrc.git
    (cd fmlrc && make)
fi

#download short-read ecoli
if [ ! -f s_6_1.fastq.gz ]; then
    wget http://spades.bioinf.spbau.ru/spades_test_datasets/ecoli_mc/s_6_1.fastq.gz
fi
if [ ! -f s_6_2.fastq.gz ]; then
    wget http://spades.bioinf.spbau.ru/spades_test_datasets/ecoli_mc/s_6_2.fastq.gz
fi

#download long-read ecoli and convert to fasta format
if [ ! -f PacBioCLR/PacBio_10kb_CLR.fasta ]; then
    wget http://files.pacb.com/datasets/secondary-analysis/e-coli-k12-de-novo/1.3.0/Ecoli_MG1655_pacBioToCA.tgz
    tar -xvzf Ecoli_MG1655_pacBioToCA.tgz
    awk 'NR%4==1||NR%4==2' ./PacBioCLR/PacBio_10kb_CLR.fastq | tr "@" ">" > ./PacBioCLR/PacBio_10kb_CLR.fasta
fi

#build the bwt
if [ ! -f ./ecoli_mc_msbwt/comp_msbwt.npy ]; then
    mkdir -p temp
    gunzip -c s_6_?.fastq.gz | awk "NR % 4 == 2" | sort -T ./temp | tr NT TN | ./ropebwt2/ropebwt2 -LR | tr NT TN | msbwt convert ./ecoli_mc_msbwt
fi

#run fmlrc
NUM_PROCS=4
./fmlrc/fmlrc -p $NUM_PROCS -V -e 400 ./ecoli_mc_msbwt/comp_msbwt.npy ./PacBioCLR/PacBio_10kb_CLR.fasta ./corrected_final.fa

What is it doing?

  1. Installs the prerequisite software: fmlrc, ropebwt2, and msbwt. ropebwt2 and msbwt are used together to construct the BWT of the short reads that feeds into fmlrc. fmlrc is then used to perform the read correction.
  2. Downloads test data: short Illumina reads and long PacBio reads for E. coli.
  3. Creates a BWT using ropebwt2 and msbwt.
  4. Runs fmlrc on the first 400 reads in the PacBio dataset using 4 processes. Note: set NUM_PROCS to suit the number of cores available on the test machine. For a full test, remove "-e 400" from the last command in the shell script.
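
The FASTQ-to-FASTA conversion in step 2 and a quick check of the final output can be tried on toy data without downloading anything. The sketch below reuses the same awk and tr commands as the script; the function names fastq_to_fasta and count_fasta_records and the toy reads are assumptions made for illustration and are not part of fmlrc.

```shell
# Convert FASTQ to FASTA the same way the script does: keep lines 1 and 2
# of every 4-line record (header and sequence), then turn "@" into ">".
fastq_to_fasta() {
    awk 'NR%4==1||NR%4==2' "$1" | tr "@" ">"
}

# Count FASTA records by counting header lines that start with ">".
count_fasta_records() {
    grep -c '^>' "$1"
}

# Toy input: two 4-line FASTQ records (made-up reads, not real data).
demo_dir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > "$demo_dir/toy.fastq"

fastq_to_fasta "$demo_dir/toy.fastq" > "$demo_dir/toy.fasta"
cat "$demo_dir/toy.fasta"
count_fasta_records "$demo_dir/toy.fasta"   # prints 2

rm -rf "$demo_dir"
```

After a real run, the same count_fasta_records function can be pointed at ./corrected_final.fa. With "-e 400" in place, a count far from 400 would suggest that the run did not complete as expected.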