Asm4pg

Asm4pg is an automatic and reproducible genome assembly workflow designed for pangenomic applications using PacBio HiFi data.

The official Asm4pg repository is hosted on the INRAE forge.
A GitHub mirror is available at INRAE GitHub.

This workflow leverages Snakemake for efficient genome assembly and generates an HTML report summarizing key assembly statistics.

Asm4pg can assemble in the following modes:

HiFi mode (default)
Performs primary genome assembly using high-fidelity long reads.
hi-c mode
Uses Hi-C data to scaffold the assembled contigs into chromosome-scale scaffolds.
trio mode
Uses parental short reads to partition long reads by haplotype before assembly.
ont mode
Uses ultra-long reads (in fastq or bam format; fasta is not supported) to perform the assembly.

📂 Repository Structure

├── README.md
├── asm4pg  # <- The running script
├── doc
├── workflow
│   ├── scripts
│   └── Snakefile
└── .config
    ├── snakemake_profile
    └── masterconfig.yaml # <- Your configuration

✅ Requirements

Miniforge/conda (for Snakemake>=8.4.7 and the SLURM plugin)
Singularity/Apptainer (for containerized execution)

Note: All external tools are automatically managed by Snakemake and will be downloaded as Singularity/Apptainer images (~6GB total).

🚀 How to Use (quick guide)

1. Set up

Clone the Git repository

git clone https://forge.inrae.fr/asm4pg/GenomAsm4pg/ && cd GenomAsm4pg && mkdir slurm_logs

Create an environment for Snakemake (from the provided envfile):

conda env create -n wf_env -f .config/wf_env.yaml

Use Miniforge with the conda-forge channel; see why here (in French).
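Once the environment exists, it can help to confirm that the Snakemake it provides meets the minimum version listed in the Requirements (8.4.7). The sketch below is only an illustration: the helper names `version_ge` and `check_snakemake` are hypothetical, it assumes you have activated the environment yourself with `conda activate wf_env` (the asm4pg script may already handle activation), and it relies on a `sort` that supports `-V`.

```shell
# version_ge A B: succeed if version A is greater than or equal to version B.
# Relies on "sort -V" (version sort), available in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum Snakemake version stated in the Requirements section.
MIN_SNAKEMAKE="8.4.7"

# check_snakemake: report whether the snakemake found on PATH is recent enough.
# Hypothetical helper, not part of the asm4pg repository.
check_snakemake() {
    if ! command -v snakemake >/dev/null 2>&1; then
        echo "snakemake not found; is the wf_env environment active?" >&2
        return 1
    fi
    found="$(snakemake --version)"
    if version_ge "$found" "$MIN_SNAKEMAKE"; then
        echo "snakemake $found is recent enough"
    else
        echo "snakemake $found is older than $MIN_SNAKEMAKE" >&2
        return 1
    fi
}
```

After `conda activate wf_env`, calling `check_snakemake` prints a one-line verdict and returns a non-zero status if the version is too old.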

Update lines 45-46 of the asm4pg file with the correct paths to the Singularity/Apptainer modules:

nano asm4pg

You can configure this file for multiple servers using the case statement (see the example for the Genotoul HPC at line 33).

2. Configure the pipeline for your data

Edit the masterconfig file in the .config/ directory with your sample information.

nano .config/masterconfig.yaml

Here you can add the path to your long-read file (fasta.gz, fasta, fastq.gz, fastq, or bam).
Update the path to the parent directory of the output directory.
We advise keeping the default options for the first run.

Example config:

samples:       
  example1:              # <- First indent = Name of the assembly
    reads: example-1.fasta.gz    # <- Second indent = All options
    busco_lineage: insecta_odb10
  example2: 
    reads: example-2.fasta.gz
    busco_lineage: eudicots_odb10 # Options only affect current assembly
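Before launching anything, it can be useful to check that every `reads:` path in the config points to an existing file. The sketch below is a rough illustration, not part of the workflow: `list_reads` and `missing_reads` are hypothetical helpers, the parsing is plain `sed`/`awk` rather than a real YAML parser, and it assumes paths contain no spaces or `#` characters and are resolved relative to the current directory.

```shell
# list_reads CONFIG: print the value of every "reads:" key in the config.
# Trailing "# ..." comments are stripped first.
list_reads() {
    sed -e 's/[[:space:]]*#.*$//' "$1" |
        awk '$1 == "reads:" { print $2 }'
}

# missing_reads CONFIG: print the read files that do not exist.
# Returns 1 if at least one file is missing, 0 otherwise.
missing_reads() {
    missing="$(list_reads "$1" | while IFS= read -r path; do
        [ -e "$path" ] || printf '%s\n' "$path"
    done)"
    if [ -n "$missing" ]; then
        printf '%s\n' "$missing"
        return 1
    fi
    return 0
}
```

For example, `missing_reads .config/masterconfig.yaml` prints nothing and succeeds when all listed read files are present.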

3. Run the workflow

Run the workflow:

sbatch asm4pg dry # Check for warnings first
sbatch asm4pg run # Then launch the workflow

Note: If your account name can't be automatically determined, add it in the .config/snakemake/profiles/slurm/config.yaml file.

Note: Use the command squeue --format="%.10i %.9P %.6j %.10k %.8u %.2t %.10M %.6D %.20R" -A $user to see job names.
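The two submissions above can be chained so that the real run is only submitted when the dry run exits cleanly. This is a sketch under assumptions: `submit_dry_then_run` and the `SUBMIT` variable are hypothetical names introduced here, the sketch relies on `sbatch --wait` (which blocks until the job ends and returns the job's exit code), and it assumes the dry run exits non-zero on errors. Warnings in the log still need to be read by hand.

```shell
# submit_dry_then_run: submit the dry run, wait for it, and submit the
# real run only if the dry run succeeded.
# SUBMIT defaults to sbatch; it can be overridden (for example for testing).
submit_dry_then_run() {
    if ! ${SUBMIT:-sbatch} --wait asm4pg dry; then
        echo "Dry run failed; check the SLURM log before running." >&2
        return 1
    fi
    ${SUBMIT:-sbatch} asm4pg run
}
```

Run `submit_dry_then_run` from the repository root, where the asm4pg script lives.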

⚙️ Other running options

asm4pg [dry|run|local-run|dag|rulegraph|unlock|touch] [additional snakemake args]
    dry - run in dry-run mode
    run - run the workflow with SLURM
    local-run - run the workflow locally (on a single node)
    dag - generate the directed acyclic graph for the workflow
    rulegraph - generate the rulegraph for the workflow
    unlock - Unlock the directory if snakemake crashed
    touch - Tell snakemake that all files are up to date (use with caution)
    [additional snakemake args] - for any snakemake arg, like --until hifiasm
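Several of these options correspond to standard Snakemake command-line flags. The mapping below is a guess based on the option descriptions, not a copy of the asm4pg script: `snakemake_flag_for` is a hypothetical helper, and the executor or profile settings that `run` and `local-run` would need are deliberately left out.

```shell
# snakemake_flag_for OPTION: print the standard Snakemake flag that most
# plausibly corresponds to an asm4pg option (assumed mapping).
snakemake_flag_for() {
    case "$1" in
        dry)           echo "--dry-run" ;;
        dag)           echo "--dag" ;;
        rulegraph)     echo "--rulegraph" ;;
        unlock)        echo "--unlock" ;;
        touch)         echo "--touch" ;;
        run|local-run) : ;;  # no extra flag sketched; executor setup omitted
        *)
            echo "unknown option: $1" >&2
            return 1
            ;;
    esac
}

# Example of passing an additional Snakemake argument, as described above:
#   sbatch asm4pg run --until hifiasm
```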

🔧 Using the full potential of the workflow :

Asm4pg has many options. To change the default values or learn more about the workflow, please refer to the documentation.

Output of the workflow :

└── sample
    └── results
        ├── 00_converted_input
        ├── 01_raw_assembly
        │   ├── sample.fasta.gz
        │   └── sample.gfa
        ├── 02_final_assembly
        │   ├── hap1/hap2 
        │   │   ├── sample.fasta.gz # <- The final assembly
        │   │   └── ragtag_scafold
        ├── 03_raw_data_qc
        │   ├── genometools
        │   ├── genomescope
        │   └── jellyfish
        ├── 04_assembly_qc
        │   ├── hap1/hap2
        │   │   ├── genometools
        │   │   ├── busco
        │   │   ├── katplot
        │   │   ├── LTR/LAI
        │   │   └── telomeres
        │   ├── merqury
        │   │   ├── ...
        │   │   └── meryl_database.meryl
        │   └── quast
        ├── final_report.html # <- The final report
        ├── benchmark
        └── logs
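Given the layout above, the final assemblies of a sample can be located with a small helper. This is a sketch under assumptions: `final_assembly_path` and `list_final_assemblies` are hypothetical names, the sample directory is assumed to sit directly under the output parent directory set in the config, and the fasta file is assumed to be named after the sample.

```shell
# final_assembly_path OUTDIR SAMPLE HAP: print the expected path of the
# final assembly for one haplotype (hap1 or hap2).
final_assembly_path() {
    printf '%s/%s/results/02_final_assembly/%s/%s.fasta.gz\n' "$1" "$2" "$3" "$2"
}

# list_final_assemblies OUTDIR SAMPLE: print the final assemblies that
# actually exist on disk for this sample.
list_final_assemblies() {
    for hap in hap1 hap2; do
        f="$(final_assembly_path "$1" "$2" "$hap")"
        [ -f "$f" ] && echo "$f"
    done
    return 0
}
```

For instance, `list_final_assemblies /path/to/output example1` lists whichever of the hap1 and hap2 assemblies have been produced so far.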

📜 How to cite asm4pg?

Pending publication, you can cite asm4pg as follows:

Denni S*, Piat L*, Bouallegue S, Tran J, Smith K, Wu C, Klopp C, Bui QT, Duvaux L. Asm4pg: a workflow for efficient long-read genome assembly for pangenomics (In prep.). https://forge.inrae.fr/asm4pg/GenomAsm4pg/

* These authors contributed equally to this work.

License

The content of this repository is licensed under the GNU GPLv3.

✉️ Contacts

For troubleshooting, issues, or feature suggestions, please use the issue tab of this repository. For any other question, or if you want to help develop asm4pg, please contact Ludovic Duvaux at ludovic.duvaux@inrae.fr.
