Asm4pg is an automatic and reproducible genome assembly workflow designed for pangenomic applications using PacBio HiFi data.
The official Asm4pg repository is hosted on the INRAE forge.
A GitHub mirror is available at INRAE GitHub.
This workflow leverages Snakemake for efficient genome assembly and generates an HTML report summarizing key assembly statistics.
Asm4pg can assemble in the following modes:
- HiFi mode (default): Performs primary genome assembly using high-fidelity long reads.
- hi-c mode: Uses Hi-C data to scaffold the assembled contigs into chromosome-scale scaffolds.
- trio mode: Uses parental short reads to partition long reads by haplotype before assembly.
- ont mode: Uses ultra long reads (in fastq or bam format, no fasta) to perform the assembly.
├── README.md
├── asm4pg                   # <- The running script
├── doc
├── workflow
│   ├── scripts
│   └── Snakefile
└── .config
    ├── snakemake_profile
    └── masterconfig.yaml    # <- Your configuration
- Miniforge/conda (for Snakemake>=8.4.7 and the SLURM plugin)
- Singularity/Apptainer (for containerized execution)
Note: All external tools are automatically managed by Snakemake and will be downloaded as Singularity/Apptainer images (~6GB total).
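Before installing, it can help to confirm that the requirements above are visible on the current node. The following is a minimal sketch, not part of asm4pg: the function names `missing_tools` and `any_tool` are hypothetical, and the check assumes the tools are exposed on `PATH` (on many clusters Singularity/Apptainer only appears after a `module load`). It does not verify the Snakemake version, which is handled by the conda environment.

```bash
# Print the name of every listed command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Succeed if at least one of the listed commands is found on PATH.
any_tool() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            return 0
        fi
    done
    return 1
}

# conda installs Snakemake; sbatch is needed to submit the workflow to SLURM.
missing=$(missing_tools conda sbatch)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
fi

# Either container runtime is enough.
if ! any_tool singularity apptainer; then
    echo "Neither singularity nor apptainer found (a module may need to be loaded)"
fi
```

The script only reports what it finds and never aborts, so it can be run safely on a login node.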
- Clone the Git repository:
    git clone https://forge.inrae.fr/asm4pg/GenomAsm4pg/ && cd GenomAsm4pg && mkdir slurm_logs
- Create an environment for Snakemake (from the provided envfile):
    conda env create -n wf_env -f .config/wf_env.yaml
  Use Miniforge with the conda-forge channel, see why here (in French).
- Update the asm4pg file with the correct paths to the Singularity/Apptainer modules (lines 45-46):
    nano asm4pg
  You can configure this file for multiple servers using the case statement (see the example for the genotoul HPC, line 33).
- Edit the masterconfig.yaml file in the .config/ directory with your sample information:
    nano .config/masterconfig.yaml
  - Here you can add the path to your long reads file (fasta.gz, fasta, fastq.gz, fastq, or bam).
  - Update the path to the parent directory of the output directory.
  - We advise keeping the default options for the first run.
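Since the reads path is entered by hand, a quick check of its extension before launching can catch typos early. This is a hedged sketch rather than anything shipped with asm4pg: `reads_format` is a hypothetical helper that only recognises the extensions listed above, and the workflow itself may be more or less permissive.

```bash
# Print the recognised long-read format of a file name,
# or return a non-zero status if the extension is not in the documented list.
reads_format() {
    case "$1" in
        *.fasta.gz) echo "fasta.gz" ;;
        *.fastq.gz) echo "fastq.gz" ;;
        *.fasta)    echo "fasta" ;;
        *.fastq)    echo "fastq" ;;
        *.bam)      echo "bam" ;;
        *)          return 1 ;;
    esac
}

# File name used for illustration only.
reads="example-1.fasta.gz"
if fmt=$(reads_format "$reads"); then
    echo "$reads: $fmt"
else
    echo "$reads: unsupported extension" >&2
fi
```

Keep in mind that ont mode is documented as accepting fastq or bam only, so a fasta result from this helper would still be unsuitable there.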
Example config:
samples:
  example1: # <- First indent = Name of the assembly
    reads: example-1.fasta.gz # <- Second indent = All options
    busco_lineage: insecta_odb10
  example2:
    reads: example-2.fasta.gz
    busco_lineage: eudicots_odb10 # Options only affect current assembly
- Run the workflow:
    sbatch asm4pg dry # Check for warnings
    sbatch asm4pg run # Then launch the workflow
NB: If your account name cannot be determined automatically, add it to the .config/snakemake/profiles/slurm/config.yaml file.
NB: Use the following command to see job names:
    squeue --format="%.10i %.9P %.6j %.10k %.8u %.2t %.10M %.6D %.20R" -A $user
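The two submissions above can also be chained with a SLURM job dependency, so the real run is queued only behind a dry run that exits successfully. The sketch below is an assumption about how one might do this, not a documented asm4pg feature: `submit_dry_then_run` and the `SBATCH` variable are hypothetical names, while `--parsable` and `--dependency=afterok:` are standard `sbatch` options. It only guards against a failing dry run; warnings in the dry-run log still need to be read by hand.

```bash
# Command used to submit jobs; overridable so the logic can be tried without SLURM.
SBATCH=${SBATCH:-sbatch}

# Submit the dry run, then queue the real run so that it starts
# only if the dry-run job finishes with exit status 0.
submit_dry_then_run() {
    dry_id=$("$SBATCH" --parsable asm4pg dry) || return 1
    dry_id=${dry_id%%;*}   # --parsable may print "jobid;cluster"
    "$SBATCH" --parsable --dependency="afterok:${dry_id}" asm4pg run
}
```

Calling `submit_dry_then_run` from the repository root prints the job ID of the real run. Depending on site configuration, a run whose dependency failed may stay pending and need to be cancelled with `scancel`.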
asm4pg [dry|run|local-run|dag|rulegraph|unlock|touch] [additional snakemake args]
dry - run in dry-run mode
run - run the workflow with SLURM
local-run - run the workflow locally (on a single node)
dag - generate the directed acyclic graph for the workflow
rulegraph - generate the rulegraph for the workflow
unlock - Unlock the directory if snakemake crashed
touch - Tell snakemake that all files are up to date (use with caution)
[additional snakemake args] - for any snakemake arg, like --until hifiasm
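Because a mistyped mode is only discovered once the job starts, a small guard can validate the sub-command before anything is submitted. This is an illustrative sketch under stated assumptions: `is_asm4pg_mode` and `asm4pg_cmdline` are hypothetical helpers, the mode list is copied from the usage text above, and the function only prints the command line instead of running it.

```bash
# Sub-commands taken from the usage text above.
ASM4PG_MODES="dry run local-run dag rulegraph unlock touch"

# Succeed only if $1 is one of the documented sub-commands.
is_asm4pg_mode() {
    for known in $ASM4PG_MODES; do
        if [ "$known" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# Print the command line that would be used, without executing it.
# Everything after the mode is passed through as Snakemake arguments.
asm4pg_cmdline() {
    mode=$1
    if ! is_asm4pg_mode "$mode"; then
        echo "unknown mode: $mode (expected one of: $ASM4PG_MODES)" >&2
        return 1
    fi
    shift
    echo "asm4pg $mode${*:+ $*}"
}

asm4pg_cmdline run --until hifiasm
```

The printed line can then be prefixed with `sbatch` as in the run step, once it looks right.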
Asm4pg has many options. To change the default values or learn more about the workflow, please refer to the documentation.
└── sample
    └── results
        ├── 00_converted_input
        ├── 01_raw_assembly
        │   ├── sample.fasta.gz
        │   └── sample.gfa
        ├── 02_final_assembly
        │   ├── hap1/hap2
        │   │   ├── sample.fasta.gz    # <- The final assembly
        │   │   └── ragtag_scafold
        ├── 03_raw_data_qc
        │   ├── genometools
        │   ├── genomescope
        │   └── jellyfish
        ├── 04_assembly_qc
        │   ├── hap1/hap2
        │   │   ├── genometools
        │   │   ├── busco
        │   │   ├── katplot
        │   │   ├── LTR/LAI
        │   │   └── telomeres
        │   ├── merqury
        │   │   ├── ...
        │   │   └── meryl_database.meryl
        │   └── quast
        ├── final_report.html          # <- The final report
        ├── benchmark
        └── logs
While awaiting publication, you can cite asm4pg as follows:
Denni S*, Piat L*, Bouallegue S, Tran J, Smith K, Wu C, Klopp C, Bui QT, Duvaux L. Asm4pg: a workflow for efficient long-read genome assembly for pangenomics (In prep.). https://forge.inrae.fr/asm4pg/GenomAsm4pg/
* These authors contributed equally to this work.
The content of this repository is licensed under the GNU GPLv3.
For troubleshooting, issues, or feature suggestions, please use the issue tab of this repository. For any other question, or if you want to help develop asm4pg, please contact Ludovic Duvaux at ludovic.duvaux@inrae.fr.