Pangenome Mutation-Annotated Network (PanMAN)

Introduction

This README gives an overview of PanMAN and panmanUtils, including installation methods and usage. For more information, please see our Wiki.

What is a PanMAN?

PanMAN, or Pangenome Mutation-Annotated Network, is a novel data representation for pangenomes that improves substantially on existing formats in both representative power and storage efficiency. Specifically, PanMANs are composed of mutation-annotated trees, called PanMATs, which annotate not only substitutions but also inferred indels (Fig. 1b) and even structural mutations (Fig. 1a) on their branches. Multiple PanMATs are connected by edges to form a network, the PanMAN (Fig. 1c). PanMAN's representative power is compared against existing pangenomic formats in Fig. 1d. PanMANs are the most compressible pangenomic format for the microbial datasets evaluated (SARS-CoV-2, RSV, HIV, Mycobacterium tuberculosis, E. coli, and Klebsiella pneumoniae), providing 2.9- to 559-fold compression over standard pangenomic formats.

Figure 1: Overview of the PanMAN data structure

panmanUtils

panmanUtils includes multiple algorithms to construct PanMANs and to support various functionalities to modify and extract useful information from PanMANs (Fig. 2).

Figure 2: Overview of panmanUtils' functionalities

Installation

panmanUtils software can be installed using four different methods:

Conda (Recommended)
Docker Image
Dockerfile
Installation scripts

1. Using conda (recommended)

Users can install panmanUtils by installing the panman conda package, which is compatible with linux-64 and osx-64. On modern Macs with Apple silicon (arm64), you also need to install Rosetta 2.

i. Dependencies

Conda

ii. Install panman conda package

# Create and activate a new environment for panman
conda create -n panman-env python=3.11 -y
conda activate panman-env

# Set up channels
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

# On macOS ARM:
# conda config --env --set subdir osx-64

# Install the panman package
conda install panman -y
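
The platform note above can be checked with a small script before installing. This is a minimal sketch, assuming that `uname -s` and `uname -m` report `Darwin` and `arm64` on Apple silicon (a shell already running under Rosetta 2 may report `x86_64` instead). The helper name `needs_osx64_subdir` is hypothetical and not part of panman.

```shell
# Hypothetical helper: decide whether the osx-64 subdir override is needed.
# Arguments: kernel name and machine architecture, as printed by uname.
needs_osx64_subdir() {
    [ "$1" = "Darwin" ] && [ "$2" = "arm64" ]
}

# Report what to do on the current machine; nothing is changed automatically.
if needs_osx64_subdir "$(uname -s)" "$(uname -m)"; then
    echo "Apple silicon detected: run 'conda config --env --set subdir osx-64' before installing"
else
    echo "No subdir override needed on this platform"
fi
```

The script only prints a recommendation, so it is safe to run inside or outside the conda environment.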

iii. Run panmanUtils

panmanUtils --help

2. Using Docker Image

To use panmanUtils in a Docker container, create a container from the published Docker image by following these steps (compatible with linux-64 and osx-64).

i. Dependencies

Docker

ii. Pull and run the PanMAN Docker image from DockerHub

## Note: If the Docker image already exists locally, make sure to pull the latest version using
## docker pull swalia14/panman:latest

## If the Docker image does not exist locally, the following command will pull and run the latest version
docker run -it swalia14/panman:latest

iii. Run panmanUtils

# Inside the Docker container
panmanUtils --help

3. Using DockerFile

A Docker container with panmanUtils preinstalled can also be built from the DockerFile by following these steps (compatible with linux-64 and osx-64).

i. Dependencies

Docker
Git

ii. Clone the repository and build a docker image

git clone https://github.com/TurakhiaLab/panman.git
cd panman/docker
docker build -t panman .

iii. Build and run the docker container

docker run -it panman

iv. Run panmanUtils

# Inside the Docker container
panmanUtils --help

4. Using installation script (Least recommended)

We provide scripts to install panmanUtils from source code (requires sudo access; compatible with Linux only). Mac users can use the macOS-specific installation script, which uses conda to install panmanUtils.

i. Dependencies

Git

ii. Clone the repository

git clone https://github.com/TurakhiaLab/panman.git
cd panman

iii. Run the installation script

chmod +x install/installationUbuntu.sh
./install/installationUbuntu.sh

iv. Run panmanUtils

cd build
./panmanUtils --help

PanMAN Construction

Once the package is installed, PanMANs can be constructed from a PanGraph (or GFA or MSA) and a tree topology (Newick format) using panmanUtils. Here we provide examples for constructing PanMANs from a PanGraph (JSON) and from a custom dataset. Alternatively, users can follow the instructions in the wiki for other methods.

Building PanMAN from PanGraph

Step 1: Check that the sars_20.json and sars_20.nwk files exist in the test directory.

Step 2: Run panmanUtils with the following command to build a PanMAN from PanGraph:

panmanUtils -P $PANMAN_HOME/test/sars_20.json -N $PANMAN_HOME/test/sars_20.nwk -o sars_20

The above command runs panmanUtils and builds sars_20.panman in the $PANMAN_HOME/build/panman directory.
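
Step 1 can be automated before invoking panmanUtils. The following is a minimal sketch, assuming `PANMAN_HOME` points at the repository root as in the command above; the helper name `require_files` is hypothetical and not part of panmanUtils.

```shell
# Hypothetical helper: succeed only if every listed file exists and is non-empty.
require_files() {
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            return 1
        fi
    done
}

# PANMAN_HOME is assumed to be the repository root; falls back to the current directory.
if require_files "${PANMAN_HOME:-.}/test/sars_20.json" "${PANMAN_HOME:-.}/test/sars_20.nwk"; then
    echo "inputs found; ready to run panmanUtils"
fi
```

Checking with `-s` rather than `-e` also catches empty files, for example from an interrupted download.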

Building PanMAN from raw sequences or fragment assemblies using Snakemake Workflow

We provide a Snakemake workflow to construct PanMANs from raw sequences (FASTA format) or from fragment assemblies.

Note: The Snakemake workflow uses several tools (PanGraph, PGGB, MAFFT, and MashTree) to build the input PanGraph, GFA, MSA, and tree topology files, respectively. It is designed to be run in the Docker container built from either the provided Docker image or the DockerFile (see the Docker installation instructions above).

Building PanMAN from raw genome sequences

Step 1: Run the following command to construct a PanMAN from raw sequences.

Usage

cd $PANMAN_HOME/workflows
snakemake --use-conda --cores 8 --config RUNTYPE="pangraph/gfa/msa" FASTA="[user_input]" SEQ_COUNT="Number of sequences" ASSEM="NONE" REF="NONE" TARGET="NONE"

Example

cd $PANMAN_HOME/workflows
snakemake --use-conda --cores 8 --config RUNTYPE="pangraph" FASTA="$PANMAN_HOME/test/sars_20.fa" SEQ_COUNT="20" ASSEM="NONE" REF="NONE" TARGET="NONE"
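
The value for SEQ_COUNT can be derived from the input rather than typed by hand. This is a minimal sketch, assuming the input is a standard FASTA file in which every record starts with a `>` header line; the helper name `count_fasta_records` is hypothetical.

```shell
# Count FASTA records: every record starts with a '>' header line.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Example input path from the command above; skipped if the file is not present.
fasta="${PANMAN_HOME:-.}/test/sars_20.fa"
if [ -f "$fasta" ]; then
    echo "SEQ_COUNT=$(count_fasta_records "$fasta")"
fi
```

The printed value can then be passed to the workflow as `SEQ_COUNT="..."`.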

Building PanMAN from fragment assemblies

Step 1: Run the following command to construct a PanMAN from fragment assemblies.

cd $PANMAN_HOME/workflows
snakemake --use-conda --cores 8 --config RUNTYPE="pangraph/gfa/msa" FASTA="None" SEQ_COUNT="Number of sequences" ASSEM="frag" REF="reference_file" TARGET="target.txt"

Here, target.txt includes a list of files that contain the fragmented assemblies.
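
A list file like this can be generated from a directory of assemblies. The exact layout the workflow expects is not stated here, so the sketch below assumes one file path per line and FASTA files ending in `.fa` or `.fasta`; the helper name `write_target_list` is hypothetical.

```shell
# Hypothetical helper: write the paths of all FASTA files under a directory,
# one per line and sorted, to a list file.
# The one-path-per-line layout is an assumption, not a documented format.
write_target_list() {
    assembly_dir=$1
    list_file=$2
    find "$assembly_dir" -type f \( -name '*.fa' -o -name '*.fasta' \) | sort > "$list_file"
}

# Usage (paths are placeholders):
#   write_target_list /path/to/assemblies target.txt
```

Sorting keeps the list stable across runs, which makes repeated workflow invocations easier to compare.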

panmanUtils functionalities

panmanUtils provides various functionalities, such as summary, extraction (raw sequences, MSA, VCF, GFA), sub-network pruning, and many more. Please refer to the wiki for detailed information. Here we provide usage syntax and examples for summary and VCF extract.

Summary extract

The summary feature extracts node- and tree-level statistics of a PanMAN, summarizing its geometric and parsimony information.

Usage Syntax

panmanUtils -I <path to PanMAN file> --summary --output-file=<prefix of output file> (optional)

Example

panmanUtils -I panman/sars_20.panman --summary --output-file=sars_20

Variant Call Format (VCF) extract

Extracts the variants of all sequences in any PanMAT of a PanMAN as a VCF file, relative to any reference sequence (ref) in that PanMAT.

Usage syntax

panmanUtils -I <path to PanMAN file> --vcf --reference=ref --output-file=<prefix of output file> (optional)

Example

panmanUtils -I panman/sars_20.panman --vcf --reference="Switzerland/SO-ETHZ-500145/2020|OU000199.2|2020-11-12" --output-file=sars_20
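
A quick sanity check on the extracted file is to count its variant records. This is a minimal sketch that relies only on the VCF convention that header lines start with `#`. The output file name `sars_20.vcf` is an assumption based on the prefix above, and the helper name `count_vcf_records` is hypothetical.

```shell
# Count variant records in a VCF file: header lines start with '#',
# every other non-empty line is one variant record.
count_vcf_records() {
    grep -v '^#' "$1" | grep -c .
}

# Assumed output name; adjust to the file panmanUtils actually writes.
vcf="sars_20.vcf"
if [ -f "$vcf" ]; then
    echo "variant records: $(count_vcf_records "$vcf")"
fi
```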

Contribute

We welcome contributions from the community to enhance the capabilities of PanMAN and panmanUtils. If you encounter any issues or have suggestions for improvement, please open an issue on the PanMAN GitHub page. For general inquiries and support, reach out to our team.

Citing PanMAN

If you use PanMANs or panmanUtils in your research or publications, we kindly request that you cite the following paper:

Sumit Walia, Harsh Motwani, Kyle Smith, Russell Corbett-Detig, Yatish Turakhia, "Compressive Pangenomics Using Mutation-Annotated Networks", bioRxiv 2024.07.02.601807; doi: 10.1101/2024.07.02.601807
