Currently, only vpf-class is implemented, but we have plans to include more tools
in this framework.
vpf-class attempts to classify viruses using Viral Protein Families.
Usage example: Given a .fna file, obtain the proteins of each virus with
prodigal, then perform a hmmsearch against the given hmms (VPFs) file to
obtain a classification. This requires a working installation of
HMMER (version 3.2+) and
Prodigal (version 2.6.X). Both should be
either available in your $PATH or specified using the --hmmer-prefix and
the --prodigal flags.

    stack exec -- vpf-class --data-index ../data/index.yaml -i ../data/test.fna -o test-classified

This will output a directory with a .tsv file for each classification level
specified in the index.yaml file. Using the provided files, one thus obtains:

    test-classified/baltimore.tsv
    test-classified/family.tsv
    test-classified/genus.tsv
    test-classified/host_domain.tsv
    test-classified/host_family.tsv
    test-classified/host_genus.tsv

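
As a quick sanity check after a run, the sketch below lists each .tsv file in the output directory together with its line count. The helper name `summarize_outputs` is hypothetical, and the count is a raw line count only: whether the files carry a header row is not stated here, so it should not be read as a number of classified genomes.

```shell
# Hypothetical helper: summarize_outputs [OUTPUT_DIR]
# Prints "<file name><TAB><line count>" for every .tsv file in OUTPUT_DIR.
summarize_outputs() {
    dir=${1:-test-classified}
    for f in "$dir"/*.tsv; do
        # Skip the unexpanded pattern when the directory holds no .tsv files.
        [ -e "$f" ] || continue
        # tr strips the padding that some wc implementations add.
        printf '%s\t%s\n' "$(basename "$f")" "$(wc -l < "$f" | tr -d ' ')"
    done
}
```

For the example above, this would be called as `summarize_outputs test-classified`.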
Please read to the end to find out where to obtain all the required data files.
The --data-index option may be skipped if the VPF_CLASS_DATA_INDEX
environment variable is set to the path of index.yaml.
Concurrency options can be specified with --workers (number of
parallel workers running prodigal or hmmsearch) and --chunk-size (max
number of genomes for each prodigal/hmmsearch process).
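
To illustrate how the environment variable and the concurrency flags fit together, here is a minimal sketch of a wrapper. The function name `classify_fna`, the default values (4 workers, chunks of 50 genomes) and the paths are illustrative assumptions rather than recommendations; it also assumes that both flags take their value as a separate argument (`--workers N`, `--chunk-size N`).

```shell
# Assumed location of the index; adjust to wherever index.yaml actually lives.
export VPF_CLASS_DATA_INDEX="${VPF_CLASS_DATA_INDEX:-../data/index.yaml}"

# Hypothetical helper: classify_fna INPUT.fna OUTPUT_DIR [WORKERS] [CHUNK_SIZE]
classify_fna() {
    input=$1
    output=$2
    workers=${3:-4}       # illustrative default, not a tuned value
    chunk_size=${4:-50}   # illustrative default, not a tuned value
    # No index option is passed here: the exported variable above supplies the path.
    stack exec -- vpf-class -i "$input" -o "$output" \
        --workers "$workers" --chunk-size "$chunk_size"
}
```

For example, `classify_fna ../data/test.fna test-classified 8 100` would run the classification with 8 parallel workers and at most 100 genomes per prodigal/hmmsearch process.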
We now provide a pre-configured docker image which contains all the required dependencies and automatically downloads supplementary data. See the detailed instructions here.
Since there are no release binaries available, you will need to install
stack and compile vpf-tools yourself. The instructions
are the same for both Mac OS and Linux; the tool has not been tested on
Windows.
First, install stack using

    curl -sSL https://get.haskellstack.org/ | sh

Then run

    git clone https://github.com/biocom-uib/vpf-tools
    cd vpf-tools
    stack build

to clone the repository and compile all targets. The first build can take a
while, as stack also needs to install GHC and compile all the dependencies.
Once it has finished, you should be able to run any of the tools from this
directory by prefixing them with stack exec --, for instance,

    stack exec -- vpf-class --help

There is experimental support for OpenMPI. Add --flag vpf-class:+mpi when
building and then run the tool normally as any other program with mpirun.
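
A rough sketch of what an MPI build and launch could look like follows. The helper names `build_with_mpi` and `run_with_mpi` are hypothetical, and launching through `stack exec` under `mpirun` is an assumption about how the two combine, not a documented invocation; `-n` is the usual mpirun option for the number of processes.

```shell
# Build with the MPI flag (requires an OpenMPI installation).
build_with_mpi() {
    stack build --flag vpf-class:+mpi
}

# Hypothetical launcher: run_with_mpi NPROCS [vpf-class arguments...]
run_with_mpi() {
    nprocs=$1
    shift
    # -n sets the number of MPI processes; the remaining arguments go to vpf-class.
    mpirun -n "$nprocs" stack exec -- vpf-class "$@"
}
```

A call such as `run_with_mpi 4 -i ../data/test.fna -o test-classified` would then start four processes, assuming the data index is configured as described above.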
You can find our classification of VPFs as a
compressed package (including index.yaml)
here.
Alternatively, you can download individual data files
here, at the "VPF classification"
tab. The data files that vpf-class requires are in the rows "Full data"
(modelClassesFile) and "UViG Score samples" (scoreSamplesFile). This VPF
classification has been obtained as described in the paper, but the tool is
designed to work with any user-provided classification files.
The most recent hmms file containing the HMMER models of VPFs (vpfsFile in
index.yaml) can be downloaded from
IMG/VR (UPDATE: the link
appears to be broken; you can find a copy
here).
To use it with the provided index.yaml, extract final_list.hmms into the
data directory, next to index.yaml.
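
Before running the tool, it can help to confirm that the layout matches what is described here. The sketch below only checks the two file names mentioned in this section, index.yaml and final_list.hmms; the helper name `check_data_dir` and the default directory `../data` are assumptions, and the other files referenced from index.yaml are not checked because their names depend on the download.

```shell
# Hypothetical helper: check_data_dir [DATA_DIR]
# Returns 0 if index.yaml and final_list.hmms sit side by side in DATA_DIR.
check_data_dir() {
    dir=${1:-../data}
    status=0
    for f in index.yaml final_list.hmms; do
        if [ ! -f "$dir/$f" ]; then
            printf 'missing: %s\n' "$dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}
```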
- HMMSearchNotFound: First, make sure that you have a working installation of
  HMMER. If it is not accessible from your $PATH, you can specify the path to
  the installation (the directory that contains bin and share) using the
  --hmmer-prefix flag.
- ProdigalNotFound: Make sure that you have Prodigal installed. If it is not
  accessible from your $PATH, you can specify the location of the executable
  using the --prodigal flag.
- The first step (curl -sSL https://get.haskellstack.org/ | sh) requires root
  access: The default configuration in the Stack installer uses /usr/local/ as
  the default prefix. Stack can also be installed in $HOME/.local/ following
  their manual installation method.
- Stack build reports errors either while installing GHC or downloading package
  indices: If you have any issues during the installation, please check out the
  Stack documentation to verify that all dependencies are satisfied.
- I have issues with conda: Some users have reported issues with Stack and
  Conda. Thus, installing it in a Conda-polluted environment is discouraged and
  unsupported.
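
For the two NotFound errors, a quick way to see what your shell can actually reach is sketched below. The helper name `check_tool` is hypothetical; it relies only on the standard `command -v` lookup and on the usual executable names, hmmsearch (shipped with HMMER) and prodigal.

```shell
# Hypothetical helper: check_tool NAME
# Prints the resolved location and returns 0 if NAME is reachable through $PATH;
# prints a message to stderr and returns 1 otherwise.
check_tool() {
    if resolved=$(command -v "$1"); then
        printf '%s: %s\n' "$1" "$resolved"
    else
        printf '%s: not found in PATH\n' "$1" >&2
        return 1
    fi
}

# Check both external dependencies without aborting on the first failure.
for tool in hmmsearch prodigal; do
    check_tool "$tool" || true
done
```

If either tool is reported as missing but is installed elsewhere, pass its location explicitly, for instance `--hmmer-prefix /opt/hmmer --prodigal /opt/prodigal/prodigal` (both paths are placeholders for your own installation).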