Quick start test
Matt Holt edited this page Feb 9, 2019
This content will still work, but is not recommended due to dependency on msbwt convert. Instead, we recommend using the built-in fmlrc-convert method. For more information, refer to the full example README.
This document contains the necessary information to run fmlrc on a test set for E. coli.
- Python 2.7 - tested on 2.7.6; assumes pip is installed as well
- C++ compiler - tested with Apple LLVM version 8.1.0 (clang-802.0.42); should work with most up-to-date compilers
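Before running anything, it can help to confirm that the command-line tools the script calls are actually on the PATH. The following is a minimal sketch, not part of fmlrc: `missing_tools` is a hypothetical helper, and the tool list is an assumption drawn from the commands used in the script below. The C++ compiler is left out because its binary name varies between systems (`g++`, `clang++`, `c++`), and the check does not verify that `python` is version 2.7.

```shell
#!/bin/bash

# Hypothetical helper: print each command from the argument list that is
# not found on PATH (one per line). Returns non-zero if any are missing.
missing_tools() {
    local rc=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return $rc
}

# Assumed tool list, taken from the commands the quick-start script invokes.
if missing=$(missing_tools python pip git make wget tar awk gunzip); then
    echo "all listed tools were found"
else
    # Join the newline-separated names onto one line for display.
    echo "missing tools: ${missing//$'\n'/ }"
fi
```

Any name it reports would need to be installed (or the script adapted) before the quick-start test can complete.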
To use the script, create an empty directory to run your tests in. Then copy the script into a file (e.g. "test.sh"). cd to the directory and run the script. A description of what the script does can be found in inline comments or in the section below the script.
#!/bin/bash
# download and install msbwt
pip install msbwt
# download and build ropebwt2 (skipped if the binary already exists)
if [ ! -f ./ropebwt2/ropebwt2 ]; then
    git clone https://github.com/lh3/ropebwt2.git
    cd ropebwt2; make; cd ..
fi
# download and build fmlrc (skipped if the binary already exists)
if [ ! -f ./fmlrc/fmlrc ]; then
    git clone https://github.com/holtjma/fmlrc.git
    cd fmlrc; make; cd ..
fi
# download short-read ecoli
if [ ! -f s_6_1.fastq.gz ]; then
    wget http://spades.bioinf.spbau.ru/spades_test_datasets/ecoli_mc/s_6_1.fastq.gz
fi
if [ ! -f s_6_2.fastq.gz ]; then
    wget http://spades.bioinf.spbau.ru/spades_test_datasets/ecoli_mc/s_6_2.fastq.gz
fi
# download long-read ecoli and convert to fasta format
# (keep the header and sequence line of each 4-line FASTQ record)
if [ ! -f PacBioCLR/PacBio_10kb_CLR.fasta ]; then
    wget http://files.pacb.com/datasets/secondary-analysis/e-coli-k12-de-novo/1.3.0/Ecoli_MG1655_pacBioToCA.tgz
    tar -xvzf Ecoli_MG1655_pacBioToCA.tgz
    awk 'NR%4==1||NR%4==2' ./PacBioCLR/PacBio_10kb_CLR.fastq | tr "@" ">" > ./PacBioCLR/PacBio_10kb_CLR.fasta
fi
# build the bwt from the sequence lines of the short reads
if [ ! -f ./ecoli_mc_msbwt/comp_msbwt.npy ]; then
    # -p avoids an error if temp is left over from an earlier run
    mkdir -p temp
    gunzip -c s_6_?.fastq.gz | awk "NR % 4 == 2" | sort -T ./temp | tr NT TN | ./ropebwt2/ropebwt2 -LR | tr NT TN | msbwt convert ./ecoli_mc_msbwt
fi
# run fmlrc
NUM_PROCS=4
./fmlrc/fmlrc -p "$NUM_PROCS" -V -e 400 ./ecoli_mc_msbwt/comp_msbwt.npy ./PacBioCLR/PacBio_10kb_CLR.fasta ./corrected_final.fa
The script performs the following steps:
- Installs the prerequisite software: fmlrc, ropebwt2, and msbwt. ropebwt2 and msbwt are used together to construct the BWT of the short reads that feeds into fmlrc. fmlrc is then used to perform the read correction.
- Downloads test data: short Illumina reads and long PacBio reads for E. coli.
- Creates a BWT using ropebwt2 and msbwt.
- Runs fmlrc on the first 400 reads in the PacBio dataset using 4 processes. Note: NUM_PROCS should be set according to the number of cores available on the test machine. For a full test, remove "-e 400" from the last command in the shell script.
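Rather than hard-coding NUM_PROCS, the value can be derived from the machine. The sketch below is one possible approach, not something the fmlrc repository provides: `detect_num_procs` is a hypothetical helper that tries `nproc` (GNU coreutils), then `getconf _NPROCESSORS_ONLN`, then `sysctl -n hw.ncpu` (macOS/BSD), and falls back to 1 if none of them yields a number.

```shell
#!/bin/bash

# Hypothetical helper: print the number of available CPU cores,
# or 1 if no detection method works.
detect_num_procs() {
    local n
    n=$(nproc 2>/dev/null) \
        || n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) \
        || n=$(sysctl -n hw.ncpu 2>/dev/null) \
        || n=1
    # Guard against empty or non-numeric output from any of the tools.
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    echo "$n"
}

NUM_PROCS=$(detect_num_procs)
echo "NUM_PROCS=$NUM_PROCS"
```

The resulting value can then be passed to fmlrc through `-p "$NUM_PROCS"` as in the script above. On a shared machine, a smaller value than the detected core count may be more appropriate.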