This Nextflow pipeline is designed to process Oxford Nanopore raw signal data (POD5 files) through basecalling and optional demultiplexing steps. It supports both simplex and duplex basecalling modes using Dorado.
The pipeline consists of a single workflow that processes Nanopore POD5 files through several phases:
- A basecalling phase using Dorado in either simplex or duplex mode
- An optional demultiplexing phase for barcoded samples
- A final conversion phase to generate FASTQ files from BAM output
The workflow produces the following outputs:
raw/
: Directory containing the final FASTQ files
unclassified/
: Directory containing unclassified FASTQ files (only relevant for demultiplexing)
Setup:
- Install Nextflow (version 23.04.0 or later)
- Install Docker
- Set up AWS Batch
- Clone this repository
Basic usage:
Create a new directory named after the delivery, copy basecall.config into it as nextflow.config, and set the following parameters:
- duplex
  - Whether to run duplex basecalling. Duplex cannot be combined with demux.
- demux
  - Whether to demultiplex the basecalling output.
- nanopore_run
  - Name of the run/delivery.
- kit
  - Name of the ONT kit; required for demultiplexing.
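As a rough illustration, the parameters above might look like this once set in nextflow.config. This is only a sketch: it assumes the parameters live in a standard Nextflow `params` block and that `duplex` and `demux` are booleans. The run name and kit name are placeholder values, not pipeline defaults, and the real basecall.config may be laid out differently or contain further settings.

```groovy
// nextflow.config (sketch): every value below is a placeholder, not a pipeline default
params {
    duplex       = false             // assumed boolean; must stay false when demux is true
    demux        = true              // assumed boolean; demultiplex the basecalling output
    nanopore_run = 'my_delivery'     // hypothetical run/delivery name
    kit          = 'SQK-NBD114-24'   // example ONT kit name; only needed for demultiplexing
}
```

Since duplex and demux cannot be combined, at most one of the two should be enabled in any given run.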
Additionally, add a barcodes.txt file to the directory, containing the barcodes to be demultiplexed, in the format:
01
02
12
...
Once that is done, change into the new directory and run the pipeline from its parent directory:
nextflow run .. -resume