git clone https://github.com/RabbitBio/RabbitUniq
cd RabbitUniq
make
usage: RabbitUniq.py [-h] [--workspace WORKSPACE] --infile_list INFILE_LIST --outfile OUTFILE [--gu_worker GU_WORKER] [--kmer_len KMER_LEN] [--bin_num BIN_NUM]
[--exclude_last] [--output_char]
RabbitUniq
optional arguments:
-h, --help show this help message and exit
--workspace WORKSPACE, -w WORKSPACE
workspace directory the bin files stored [default: workspace]
--infile_list INFILE_LIST, -l INFILE_LIST
input file list, one line per file
--outfile OUTFILE, -o OUTFILE
out put file
--gu_worker GU_WORKER, -n GU_WORKER
The number of worker thread when generate unique kmer [default: 20]
--kmer_len KMER_LEN, -k KMER_LEN
Unique k-mer length [default: 25]
--bin_num BIN_NUM, -b BIN_NUM
Number of bin files to be store, from 64 to 2000[default: 512]
--exclude_last, -e Exclude the last element in infile_list when output
--uniq_ref_num UNIQ_REF_NUM, -u UNIQ_REF_NUM
Threshold considered as unique kmer, default is 1
--output_char, -c Output the unique k-mer collection in character-based file instead of binary file (slower, so not recommended)
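The help text describes `--infile_list` as a file with one input file per line. A minimal sketch of building such a list is shown here. It assumes the inputs are FASTA files with a `.fa` extension in a single directory; the directory, the file names, and the `infile.list` name are placeholders for illustration and do not come from RabbitUniq.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical layout: a scratch directory holding empty stand-in FASTA files.
genome_dir="$(mktemp -d)"
touch "${genome_dir}/species_a.fa" "${genome_dir}/species_b.fa" "${genome_dir}/species_c.fa"

# Write one path per line, sorted so the order is reproducible between runs.
infile_list="${genome_dir}/infile.list"
find "${genome_dir}" -maxdepth 1 -type f -name '*.fa' | sort > "${infile_list}"

# Show the resulting list.
cat "${infile_list}"
```

The resulting file can then be passed as `--infile_list ${infile_list}`.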
cat $REF_FILE_PATH >> ${infile_list}
time ${RabbitUniq_PATH}/RabbitUniq.py \
--infile_list ${infile_list} \
--outfile ${outname}.bin \
-n 20 -k 25 -b 2000
Add the reference file to the end of the input file list, and run RabbitUniq with the --exclude_last parameter.
cat $REF_FILE_PATH >> ${infile_list}
time ${RabbitUniq_PATH}/RabbitUniq.py \
--infile_list ${infile_list} \
--outfile ${outname}.bin \
-n 20 -k 25 -b 512 \
--exclude_last
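Because `--exclude_last` acts on the last element of the input list, it may be worth confirming that the reference really is the final entry before starting a long run. The following is a sketch of such a check, not part of RabbitUniq. It assumes the list holds one path per line, and the file names and the `ref_path` variable are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder files standing in for real genomes (hypothetical names).
work_dir="$(mktemp -d)"
touch "${work_dir}/species_a.fa" "${work_dir}/species_b.fa" "${work_dir}/reference.fa"
ref_path="${work_dir}/reference.fa"

# Build the list with the reference deliberately written last.
infile_list="${work_dir}/infile.list"
printf '%s\n' "${work_dir}/species_a.fa" "${work_dir}/species_b.fa" "${ref_path}" > "${infile_list}"

# Sanity check: the final non-empty line must be the reference path.
last_entry="$(grep -v '^[[:space:]]*$' "${infile_list}" | tail -n 1)"
if [ "${last_entry}" = "${ref_path}" ]; then
  check_result="ok"
else
  check_result="reference is not last"
fi
echo "${check_result}"
```

Ignoring blank lines guards against a trailing empty line being mistaken for the last entry.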