NestLink-pipeline processes NestLink libraries sequenced by nanopore sequencing. Reads are binned according to their flycodes (UMIs), and accurate consensus sequences are calculated using Medaka. The pipeline then calls variants, producing a flycode assignment table that links each protein variant to its set of flycodes.
- Nextflow (Installation guide). On the cluster, it has to be installed in a mamba/conda environment called `nextflow`.
- Mamba/Conda (https://conda-forge.org/)
- mini_align (`mini_align.sh` placed in `./bin/`)
- Podman (https://podman.io/)
- Slurm workload manager
- Singularity
- Clone the repository.
- Edit the `params.json` file to specify the nanopore reads (BAM) and the reference sequence.
- Run the pipeline: `sbatch run_NL-pipeline.slurm`
- Prepare the pipeline as described above.
- Run the pipeline: `bash run_NL-pipeline.sh`
| Parameter | Type | Description |
|---|---|---|
| `data` | String | Path to input BAM file. |
| `reference` | String | Path to reference FASTA file. |
| `filter_min_length` | Integer | Read filtering minimum length threshold. |
| `filter_max_length` | Integer | Read filtering maximum length threshold. |
| `extract_seq_adapter` | String | Linked adapter for sequence trimming. |
| `extract_seq_min_length` | Integer | Sequence trimming minimum length threshold. |
| `extract_seq_max_length` | Integer | Sequence trimming maximum length threshold. |
| `extract_flycode_adapter` | String | Linked adapter for flycode extraction. |
| `medaka_dorado_model` | String | Dorado model used for basecalling. |
| `flycode_pattern` | List(String, String) | Sequences flanking flycodes. |
| `orf1_name` | String | Name of ORF 1. |
| `orf1_pattern` | List(String, String) | Sequences flanking ORF 1. |
| `orf2_name` | String | Name of ORF 2 (optional). |
| `orf2_pattern` | List(String, String) | Sequences flanking ORF 2 (optional). |
| `outdir` | String | Output directory for results. |
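
As a rough illustration, a `params.json` covering the parameters above might look like the sketch below. It assumes the file is a flat JSON object keyed by parameter name, with each `List(String, String)` written as a two-element array. Every value is a placeholder: the paths, the length thresholds, and the angle-bracket strings are invented for illustration and are not defaults of the pipeline. Substitute the values for your own library and check them against the `params.json` shipped with the repository.

```json
{
  "data": "/path/to/reads.bam",
  "reference": "/path/to/reference.fasta",
  "filter_min_length": 1000,
  "filter_max_length": 3000,
  "extract_seq_adapter": "<linked adapter for sequence trimming>",
  "extract_seq_min_length": 800,
  "extract_seq_max_length": 2500,
  "extract_flycode_adapter": "<linked adapter for flycode extraction>",
  "medaka_dorado_model": "<dorado model used for basecalling>",
  "flycode_pattern": ["<5' flanking sequence>", "<3' flanking sequence>"],
  "orf1_name": "<ORF 1 name>",
  "orf1_pattern": ["<5' flanking sequence>", "<3' flanking sequence>"],
  "orf2_name": "<ORF 2 name>",
  "orf2_pattern": ["<5' flanking sequence>", "<3' flanking sequence>"],
  "outdir": "results"
}
```

Since `orf2_name` and `orf2_pattern` are marked optional, they can presumably be left out for libraries with a single ORF.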