Home
Welcome to the MetaFast wiki!
Fast metagenome analysis toolkit, version 0.1.0.
Authors:
- Software: Sergey Kazakov and Vladimir Ulyantsev, ITMO University, Saint-Petersburg.
- Testing: Veronika Dubinkina and Alexandr Tyakht, SRI of Physical-Chemical Medicine, Moscow.
- Idea: Dmitry Alexeev, SRI of Physical-Chemical Medicine, Moscow.
MetaFast (FAST METAgenome analysis toolkit) is a software package for computing various statistics of metagenome sequences and building a distance matrix between them.
The latest stable release can be downloaded from http://github.com/ulyantsev/metafast/releases. You need only metafast.sh to run metafast on Linux (don't forget chmod a+x metafast.sh), or only metafast.bat to run it on Windows.
MetaFast accepts input sequence files in FASTQ and FASTA formats. Input files can also be compressed with gzip or bzip2.
metafast.sh -i sample_1.fastq sample_2.fastq
Runs metafast with default parameters on Linux on two samples, with reads in sample_1.fastq and sample_2.fastq.
metafast.sh -i data/*.fastq
Runs metafast on Linux on all read files with the .fastq extension in the data directory.
metafast.sh -m 4G -k 7 -b 0 -l 8 -b1 3 -i test_data/tinytest_A.fastq test_data/tinytest_B.fastq
Runs metafast on Linux on two samples, with reads in tinytest_A.fastq and tinytest_B.fastq, using the following options:
-m 4G: use 4 GB of memory;
-k 7: use a k-mer size of 7 nucleotides;
-b 0: the maximal frequency for a k-mer to be assumed erroneous is 0 (all k-mers are considered good);
-l 8: only sequences of at least 8 nucleotides are added to a component;
-b1 3: the minimum component size is 3 distinct k-mers.
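To make the roles of -k and -b more concrete, here is a minimal Python sketch of k-mer counting followed by a frequency cut-off. It is an illustration only, not MetaFast's implementation: the function names count_kmers and drop_erroneous and the two toy reads are hypothetical, and the sketch assumes plain substring k-mers (it does not merge a k-mer with its reverse complement, which a real tool may do).

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every substring of length k across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def drop_erroneous(counts, max_bad_frequency):
    """Keep only k-mers seen more than max_bad_frequency times.

    Assumption: a k-mer whose count is <= the threshold is treated as
    erroneous, so a threshold of 0 keeps every k-mer.
    """
    return {kmer: n for kmer, n in counts.items() if n > max_bad_frequency}


# Two made-up reads, used only for illustration.
reads = ["ACGTACGTAC", "ACGTACGTTT"]
counts = count_kmers(reads, k=7)

kept_b0 = drop_erroneous(counts, 0)  # like -b 0: nothing is filtered out
kept_b1 = drop_erroneous(counts, 1)  # like -b 1: k-mers seen once are dropped

print(len(counts), len(kept_b0), len(kept_b1))
```

With a threshold of 0 every counted k-mer survives, which matches the "all k-mers are good" remark for -b 0; raising the threshold to 1 removes the k-mers that occur only once.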
Once metafast has finished, the working directory will contain the following results:
- workDir/matrices/dist_matrix_<date>_<time>.txt: distance matrix between samples (based on Bray–Curtis dissimilarity);
- workDir/kmer-counter-many/stats/<in_file>.stat.txt: k-mer frequency statistics;
- workDir/components-cutter/components-stat-<b1>-<b2>.txt: statistics of the extracted components;
- workDir/features-calculator/vectors/<in_file>.vec: characteristic vector for the input file in_file (number of k-mers in the input file from each extracted component).
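The Bray–Curtis dissimilarity behind the distance matrix can be sketched in Python as follows. This is a sketch under stated assumptions rather than MetaFast's own code: the names bray_curtis and distance_matrix and the sample vectors are hypothetical, the vectors are given as in-memory lists because the .vec file format is not described here, and the sketch uses raw counts without any normalization (MetaFast may preprocess the vectors differently).

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum(|u_i - v_i|) / sum(u_i + v_i)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def distance_matrix(vectors):
    """Return sample names and the pairwise dissimilarity matrix."""
    names = sorted(vectors)
    matrix = [[bray_curtis(vectors[a], vectors[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical characteristic vectors: k-mer counts per extracted component.
vectors = {
    "sample_1": [10, 0, 5],
    "sample_2": [6, 4, 5],
}

names, matrix = distance_matrix(vectors)
for name, row in zip(names, matrix):
    print(name, " ".join(f"{value:.4f}" for value in row))
```

The value is 0 for identical vectors and 1 for vectors that share no component, so the resulting matrix is symmetric with zeros on the diagonal.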
Usage: metafast [<Launch options>] [<Input parameters>]
Input parameters:
- -i, --reads <args>
  List of read files from a single environment. FASTQ, BINQ, and FASTA files are accepted; gzip- and bzip2-compressed files are allowed too.
- -k, --k <arg>
  K-mer size (in nucleotides, maximum 31 due to implementation details). The default value is 31 nucleotides.
- -b, --maximal-bad-frequency <arg>
  Maximal frequency for a k-mer to be assumed erroneous. The default value is 1.
- -l, --min-seq-len <arg>
  Minimum sequence length to be added to a component (in nucleotides). The default value is 100 nucleotides.
- --matrix-file <arg>
  Resulting distance matrix file. The default value is <work_dir>/matrices/dist_matrix_<date>_<time>.txt.
- --stats-dir <arg>
  Directory with k-mer statistics. The default value is <work_dir>/kmer-counter-many/stats.
- -bp, --bottom-cut-percent <arg>
  Percentage of k-mers to be assumed erroneous while building sequences in seq-builder. If specified, --maximal-bad-frequency is not used in the sequence builder.
- -b1, --min-component-size <arg>
  Minimum component size in component-cutter (in k-mers). The default value is 1000 k-mers.
- -b2, --max-component-size <arg>
  Maximum component size in component-cutter (in k-mers). The default value is 10000 k-mers.
- -wn, --without-names
  Do not print matrix row and column names as given file names.
Launch options:
- -ts, --tools
  Print available tools.
- -t, --tool <arg>
  Set a certain tool to run (by specifying its name). The default tool is matrix-builder.
- -m, --memory <arg>
  Memory to use (a value with a suffix, for example 1500M or 4G). By default metafast uses 95% of free memory on Linux and 90% of free memory on Windows.
- -p, --available-processors <arg>
  Number of available processors. By default metafast uses all processors of the computer.
- -w, --work-dir <arg>
  Working directory. The default working directory is workDir/ in the current directory.
- -c, --continue
  Continue the previous run in the working directory (there is no need to set any other input parameters except the working directory itself).
- -s, --start <arg>
  First stage to force-run (old results are overwritten).
- -f, --finish <arg>
  Finishing stage.
- -ea, --enable-assertions
  Enable assertions. If metafast behaves strangely, you can use this flag for additional checks during the run. By default assertions are disabled.
- -v, --verbose
  Enable debug output.
- -h, --help
  Print a short help message.
- -ha, --help-all
  Print the full help message.
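When metafast is driven from a script, the command line can be assembled programmatically. The Python sketch below only builds and prints the argument list, using the -m, -w, -k and -i flags documented above; the helper name build_metafast_command and the ./metafast.sh script path are assumptions made for illustration.

```python
import shlex


def build_metafast_command(reads, script="./metafast.sh",
                           memory=None, work_dir=None, k=None):
    """Assemble an argument list: launch options first, then input parameters."""
    if not reads:
        raise ValueError("at least one reads file is required")
    if k is not None and k > 31:
        raise ValueError("k-mer size must not exceed 31")

    cmd = [script]
    if memory is not None:
        cmd += ["-m", memory]        # for example "4G" or "1500M"
    if work_dir is not None:
        cmd += ["-w", work_dir]
    if k is not None:
        cmd += ["-k", str(k)]
    cmd += ["-i", *reads]            # the reads files go last
    return cmd


cmd = build_metafast_command(
    ["sample_1.fastq", "sample_2.fastq"], memory="4G", k=7
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the list to subprocess.run(cmd, check=True) would then start the run; keeping the arguments as a list avoids shell quoting problems with unusual file names.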
- RAM: metafast requires 2-2.5 times as much memory as the size of the largest uncompressed fastq file being processed.
- Hard disk space: metafast requires 25-30% of the total size of the uncompressed fastq files being processed.
- Software: Java Runtime Environment 1.6 or higher is required to run metafast (almost any computer already has it installed).
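The RAM and disk figures above can be turned into a rough estimate before a run. The Python sketch below is a back-of-the-envelope calculation only: the function name estimate_resources and the example file sizes are hypothetical, and it assumes that "2-2.5 times" means a factor of 2 to 2.5 applied to the largest uncompressed file.

```python
def estimate_resources(uncompressed_sizes):
    """Return (low, high) byte estimates for RAM and disk usage.

    uncompressed_sizes: sizes in bytes of the uncompressed fastq files.
    """
    if not uncompressed_sizes:
        raise ValueError("at least one file size is required")
    largest = max(uncompressed_sizes)
    total = sum(uncompressed_sizes)
    return {
        # RAM: 2 to 2.5 times the largest uncompressed file
        "ram_bytes": (largest * 2, largest * 5 // 2),
        # Disk: 25% to 30% of the total uncompressed size
        "disk_bytes": (total * 25 // 100, total * 30 // 100),
    }


# Hypothetical example: one 4 GB file and one 2 GB file.
estimate = estimate_resources([4_000_000_000, 2_000_000_000])
for resource, (low, high) in estimate.items():
    print(f"{resource}: {low / 1e9:.1f} to {high / 1e9:.1f} GB")
```

For gzip- or bzip2-compressed inputs the uncompressed sizes have to be measured or estimated first, since the requirements are stated in terms of uncompressed fastq data.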