nf-core/pacsomatic: comprehensive somatic analysis using PacBio HiFi read data #63

@wzhang42

Pipeline title/name

pacsomatic

Keywords

cancer genomics, somatic analysis, HiFi reads, variant (SNV, SV, CNV) calling, mutation signature, CpG methylation, tumor purity, HRD

What is it about?

nf-core/pacsomatic is a Nextflow bioinformatics pipeline that uses paired tumor/normal PacBio HiFi read data for comprehensive long-read somatic analysis. Its key functions are HiFi read alignment, BAM coverage and quality control (QC), somatic SNV/SV/CNV detection and annotation, tumor purity/ploidy inference, homologous recombination deficiency (HRD) estimation, mutation signature analysis, CpG methylation profiling, and identification of differentially methylated regions (DMRs).
Key features:
1.) Uses Nextflow DSL2 as the framework and Singularity containers to run each functional module.
2.) Takes a simple sample sheet as the sole input.
3.) Automatically pairs tumor and normal BAM/VCF files for analysis.
4.) Supports customizable runs of the entire pipeline or of selected parts.
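
To illustrate features 2 and 3, here is a minimal Python sketch of how tumor and normal entries from a sample sheet could be paired. It is not the pipeline's actual implementation (which is written in Nextflow DSL2). The column names (`patient`, `sample`, `type`, `bam`), the file paths, and the function `pair_tumor_normal` are all hypothetical, and the sketch assumes at most one tumor and one normal sample per patient.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample sheet layout; the real pipeline's columns may differ.
SAMPLE_SHEET = """\
patient,sample,type,bam
P1,P1_T,tumor,/data/P1_T.bam
P1,P1_N,normal,/data/P1_N.bam
P2,P2_T,tumor,/data/P2_T.bam
"""


def pair_tumor_normal(sheet_text):
    """Group rows by patient; return (pairs, unpaired_patients).

    Assumes at most one tumor and one normal row per patient;
    a later row of the same type would overwrite an earlier one.
    """
    by_patient = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(sheet_text)):
        by_patient[row["patient"]][row["type"]] = row["bam"]

    pairs, unpaired = [], []
    for patient, bams in sorted(by_patient.items()):
        if "tumor" in bams and "normal" in bams:
            pairs.append((patient, bams["tumor"], bams["normal"]))
        else:
            # Patients missing either sample cannot be analyzed as a pair.
            unpaired.append(patient)
    return pairs, unpaired


pairs, unpaired = pair_tumor_normal(SAMPLE_SHEET)
print(pairs)
print(unpaired)
```

In this sketch, patients lacking either a tumor or a normal sample are reported separately rather than silently dropped, so a malformed sample sheet is easy to spot before any analysis starts.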

Please provide a schematic diagram of the proposed pipeline

https://github.com/wzhang42/HiFi-Somatic-Nextflow/blob/main/Diagram.png

I confirm my proposed pipeline will follow nf-core guidelines. Most importantly, my pipeline will:

  • be built with Nextflow.
  • pass nf-core lint tests and use standardized parameters.
  • be community-owned and developed within the nf-core organization.
  • be open source under the MIT license with proper credits and acknowledgments.
  • have a descriptive name that is all lowercase and without punctuation.
  • use the nf-core pipeline template and predominantly use official nf-core modules.
  • focus on a specific data/analysis type with appropriate scope.
  • have properly maintained documentation.
  • be bundled using versioned Docker/Singularity containers.

Why do we need a new pipeline?

PacBio long-read sequencing, a third-generation sequencing (TGS) technology, offers several advantages over short-read sequencing. Somatic analysis is in high demand in cancer/tumor studies. Currently, there is no Nextflow pipeline that uses PacBio HiFi reads for comprehensive somatic analysis.

Who would be interested?

Tumor/cancer researchers.
Bioinformatics scientists working on PacBio long-read sequencing data analysis.

What has been done so far

We have developed most of the Nextflow DSL2 code. Initial tests on an HPC LSF cluster ran well. I would like more people to get involved and help improve it.

URL to existing work (if applicable)

https://github.com/wzhang42/HiFi-Somatic-Nextflow

Are there any similar existing nf-core pipelines?

none

Status: accepted
