- Introduction (Wiki)
- Installation
- PanMAN Construction
- panmanUtils functionalities
- Contribute
- Citing PanMAN
Here we provide an overview of PanMAN, panmanUtils, and its installation methods and usage. For more information please see our Wiki.
PanMAN, or Pangenome Mutation-Annotated Network, is a novel data representation for pangenomes that provides massive leaps in both representative power and storage efficiency. Specifically, PanMANs are composed of mutation-annotated trees, called PanMATs, which, in addition to substitutions, also annotate inferred indels (Fig. 1b) and even structural mutations (Fig. 1a) on the different branches. Multiple PanMATs are connected by edges into a network to form a PanMAN (Fig. 1c). PanMAN's representative power is compared against existing pangenomic formats in Fig. 1d. PanMANs are the most compressible pangenomic format for the different microbial datasets (SARS-CoV-2, RSV, HIV, Mycobacterium tuberculosis, E. coli, and Klebsiella pneumoniae), providing 2.9 to 559-fold compression over standard pangenomic formats.
panmanUtils includes multiple algorithms to construct PanMANs and to support various functionalities to modify and extract useful information from PanMANs (Fig. 2).
panmanUtils software can be installed using four different methods:
- Conda (Recommended)
- Docker Image
- Dockerfile
- Installation scripts
Users can install panmanUtils by installing the panman conda package, which is compatible with linux-64 and osx-64. On modern Macs with Apple silicon (arm64), you need to install Rosetta 2.
# Create and activate a new environment for panman
conda create -n panman-env python=3.11 -y
conda activate panman-env
# Set up channels
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
# On macOS ARM:
# conda config --env --set subdir osx-64
# Install the panman package
conda install panman -y
panmanUtils --help
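After installation, it can help to confirm that the executable is visible from the active environment before moving on. The following is a minimal sketch; the helper name `have_cmd` is hypothetical and not part of panmanUtils.

```shell
# have_cmd NAME
# Returns 0 if NAME resolves to a command on PATH, non-zero otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd panmanUtils; then
    echo "panmanUtils found at: $(command -v panmanUtils)"
else
    echo "panmanUtils not found; is the panman-env environment active?"
fi
```

If the second message appears, re-run `conda activate panman-env` and try again.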
To use panmanUtils in a Docker container, create a container from the Docker image by following these steps (compatible with linux-64 and osx-64).
## Note: If the Docker image already exists locally, make sure to pull the latest version using
## docker pull swalia14/panman:latest
## If the Docker image does not exist locally, the following command will pull and run the latest version
docker run -it swalia14/panman:latest
# Inside the Docker container
panmanUtils --help
A Docker container with panmanUtils preinstalled can also be built from the Dockerfile by following these steps (compatible with linux-64 and osx-64).
git clone https://github.com/TurakhiaLab/panman.git
cd panman/docker
docker build -t panman .
docker run -it panman
# Inside the Docker container
panmanUtils --help
We provide scripts to install panmanUtils from source code (requires sudo access, compatible with Linux only). Mac users can use the macOS-specific installation script, which uses conda to install panmanUtils.
git clone https://github.com/TurakhiaLab/panman.git
cd panman
chmod +x install/installationUbuntu.sh
./install/installationUbuntu.sh
cd build
./panmanUtils --help
Once the package is installed, PanMANs can be constructed from a PanGraph (or GFA or MSA) and a tree topology (Newick format) using panmanUtils. Here we provide examples for constructing PanMANs from a PanGraph (JSON) and from a custom dataset. Alternatively, users can follow the instructions provided in the wiki for other methods.
Step 1: Check that the sars_20.json and sars_20.nwk files exist in the test directory.
Step 2: Run panmanUtils with the following command to build a panman from PanGraph:
panmanUtils -P $PANMAN_HOME/test/sars_20.json -N $PANMAN_HOME/test/sars_20.nwk -o sars_20
The above command runs panmanUtils and builds sars_20.panman in the $PANMAN_HOME/build/panman directory.
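The file check in Step 1 and the build in Step 2 can be combined in a small script. The sketch below uses only the flags shown above (`-P`, `-N`, `-o`). The helper name `check_inputs` is hypothetical, and falling back to the current directory when `PANMAN_HOME` is unset is an assumption made for illustration.

```shell
# check_inputs DIR PREFIX
# Succeeds only if DIR/PREFIX.json and DIR/PREFIX.nwk both exist;
# reports each missing file on stderr.
check_inputs() {
    dir=$1
    prefix=$2
    status=0
    for ext in json nwk; do
        if [ ! -f "$dir/$prefix.$ext" ]; then
            echo "missing: $dir/$prefix.$ext" >&2
            status=1
        fi
    done
    return "$status"
}

# Assumption: fall back to the current directory if PANMAN_HOME is unset.
panman_home=${PANMAN_HOME:-.}

# Build only when both inputs exist and panmanUtils is installed.
if check_inputs "$panman_home/test" sars_20 \
    && command -v panmanUtils >/dev/null 2>&1; then
    panmanUtils -P "$panman_home/test/sars_20.json" \
                -N "$panman_home/test/sars_20.nwk" -o sars_20
else
    echo "skipping build: inputs or panmanUtils not available"
fi
```

Checking the inputs first gives a clearer message than letting the build fail on a missing file.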
We provide a Snakemake workflow to construct PanMANs from raw sequences (FASTA format) or from fragment assemblies.
!!!Note The Snakemake workflow uses PanGraph, PGGB, MAFFT, and MashTree to build the input PanGraph, GFA, MSA, and tree topology files, respectively. It is designed to be run in the Docker container built from either the provided Docker image or the Dockerfile (instructions provided here).
Step 1: Run the following command to construct a panman from raw sequences.
- Usage
cd $PANMAN_HOME/workflows
snakemake --use-conda --cores 8 --config RUNTYPE="pangraph/gfa/msa" FASTA="[user_input]" SEQ_COUNT="Number of sequences" ASSEM="NONE" REF="NONE" TARGET="NONE"
- Example
cd $PANMAN_HOME/workflows
snakemake --use-conda --cores 8 --config RUNTYPE="pangraph" FASTA="$PANMAN_HOME/test/sars_20.fa" SEQ_COUNT="20" ASSEM="NONE" REF="NONE" TARGET="NONE"
Step 1: Run the following command to construct a panman from fragment assemblies.
cd $PANMAN_HOME/workflows
snakemake --use-conda --cores 8 --config RUNTYPE="pangraph/gfa/msa" FASTA="None" SEQ_COUNT="Number of sequences" ASSEM="frag" REF="reference_file" TARGET="target.txt"
Here, target.txt includes a list of files that contain the fragmented assemblies.
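The text above does not spell out the layout of target.txt. As a minimal sketch, assuming the file holds one assembly path per line, a list can be generated from a directory of assemblies as follows. The directory name `assemblies`, the `.fa` extension, and the helper name `write_target_list` are all hypothetical placeholders; consult the wiki for the exact format the workflow expects.

```shell
# write_target_list DIR OUT
# Write the path of every *.fa file in DIR to OUT, one per line.
# (One path per line is an assumption about the target.txt format.)
write_target_list() {
    dir=$1
    out=$2
    : > "$out"                      # create or truncate the list
    for f in "$dir"/*.fa; do
        [ -f "$f" ] || continue     # skip the literal pattern when nothing matches
        printf '%s\n' "$f" >> "$out"
    done
}

# Example: runs only if the hypothetical assemblies/ directory exists.
if [ -d assemblies ]; then
    write_target_list assemblies target.txt
fi
```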
panmanUtils provides various functionalities such as summary, extraction (raw sequence, MSA, VCF, GFA), sub-network pruning, and many more. Please refer to the wiki for detailed information. Here we provide usage syntax and examples for summary and VCF extraction.
The summary feature extracts node-level and tree-level statistics of a PanMAN, summarizing its geometric and parsimony information.
- Usage Syntax
panmanUtils -I <path to PanMAN file> --summary --output-file=<prefix of output file> (optional)
- Example
panmanUtils -I panman/sars_20.panman --summary --output-file=sars_20
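When several PanMANs sit in one directory, the same command can be generated for each file. The sketch below is a dry run that only prints the commands, using the `-I`, `--summary`, and `--output-file` flags from the example above. The helper name `summary_cmd` is hypothetical, and using each file's base name as the output prefix is an assumption that mirrors the sars_20 example.

```shell
# summary_cmd FILE
# Print the panmanUtils summary command for FILE, using the file's
# base name (without the .panman suffix) as the output prefix.
summary_cmd() {
    prefix=$(basename "$1" .panman)
    printf 'panmanUtils -I %s --summary --output-file=%s\n' "$1" "$prefix"
}

# Dry run: print one command per PanMAN file in the panman/ directory.
for f in panman/*.panman; do
    [ -f "$f" ] || continue         # skip the literal pattern when nothing matches
    summary_cmd "$f"
done
```

Review the printed commands before running them. Paths containing spaces would need quoting, which this sketch does not handle.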
Extract variations of all sequences from any PanMAT in a PanMAN in the form of a VCF file with respect to any reference sequence (ref) in the PanMAT.
- Usage syntax
panmanUtils -I <path to PanMAN file> --vcf -reference=ref --output-file=<prefix of output file> (optional)
- Example
panmanUtils -I panman/sars_20.panman --vcf -reference="Switzerland/SO-ETHZ-500145/2020|OU000199.2|2020-11-12" --output-file=sars_20
We welcome contributions from the community to enhance the capabilities of PanMAN and panmanUtils. If you encounter any issues or have suggestions for improvement, please open an issue on the PanMAN GitHub page. For general inquiries and support, reach out to our team.
If you use PanMANs or panmanUtils in your research or publications, we kindly request that you cite the following paper:
- Sumit Walia, Harsh Motwani, Kyle Smith, Russell Corbett-Detig, Yatish Turakhia, "Compressive Pangenomics Using Mutation-Annotated Networks", bioRxiv 2024.07.02.601807; doi: 10.1101/2024.07.02.601807