The Snakemake pipeline integrates NTSM and VerifyBamID to verify sample identity and detect potential contamination in sequencing data.
- Snakemake 7+
- Singularity (or Apptainer) for running the ntsm Docker image
- Python libraries (for plotting step):
  - pandas
  - numpy
  - seaborn
  - matplotlib
The manifest must include:
- `ID`: Unique identifier for each dataset
- `FOFN`: File of filenames (list of FASTQ/FASTA files)
- `TYPE`: Sequencing platform or data type. Choose one of the following:
  - PacBio
  - ONT
  - Illumina
Example:
ID FOFN TYPE
SampleA fofn/SampleA.fofn PacBio
SampleB fofn/SampleB.fofn ONT
SampleC fofn/SampleC.fofn Illumina
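A quick sanity check of the manifest before launching the pipeline can catch typos early. The sketch below is not part of the pipeline. It assumes the manifest is tab-separated (suggested by the `manifest.tab` name) and that `TYPE` values are matched case-sensitively. The function name `validate_manifest` is hypothetical.

```python
import csv
import io

# Columns and TYPE values listed in this README.
REQUIRED_COLUMNS = ("ID", "FOFN", "TYPE")
ALLOWED_TYPES = {"PacBio", "ONT", "Illumina"}


def validate_manifest(handle):
    """Return a list of problems found in a tab-separated manifest (empty if none).

    Assumption: tab-delimited with a header row; the pipeline's own parser may differ.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    seen_ids = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        sample_id = (row["ID"] or "").strip()
        if not sample_id:
            problems.append(f"line {line_no}: empty ID")
        elif sample_id in seen_ids:
            problems.append(f"line {line_no}: duplicate ID {sample_id!r}")
        seen_ids.add(sample_id)

        data_type = (row["TYPE"] or "").strip()
        if data_type not in ALLOWED_TYPES:
            problems.append(f"line {line_no}: unknown TYPE {data_type!r}")

        if not (row["FOFN"] or "").strip():
            problems.append(f"line {line_no}: empty FOFN path")
    return problems


if __name__ == "__main__":
    example = "ID\tFOFN\tTYPE\nSampleA\tfofn/SampleA.fofn\tPacBio\n"
    for problem in validate_manifest(io.StringIO(example)):
        print(problem)
```

To check a real file, pass an open handle, for example `validate_manifest(open("manifest.tab", newline=""))`.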
Important keys:
- `MANIFEST`: Path to the manifest file
- `REF_SITE`: Reference sites FASTA (default: `db_source/human_sites_n10.fa`, relative to the Snakefile directory)
- `EXTERNAL_COUNTS_DIR`: Directory containing external count files (optional)
- `COUNT_FILE_EXP`: File extension for the external count files (default: `count`)
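To make the defaults concrete, here is a minimal sketch of how the keys above could be resolved. It is an illustration, not the pipeline's actual code. The function name `resolve_config` is hypothetical. Treating `MANIFEST` as required and leaving a user-supplied `REF_SITE` untouched are assumptions.

```python
from pathlib import Path

# Defaults as documented in this README.
DEFAULT_REF_SITE = "db_source/human_sites_n10.fa"
DEFAULT_COUNT_FILE_EXP = "count"


def resolve_config(config, snakefile_dir):
    """Return a copy of `config` with the documented defaults filled in.

    Assumptions: MANIFEST is required; only the default REF_SITE is
    resolved against the Snakefile directory.
    """
    if "MANIFEST" not in config:
        raise KeyError("MANIFEST must be set in the config")

    resolved = dict(config)
    if "REF_SITE" not in resolved:
        # The default path is relative to the Snakefile directory.
        resolved["REF_SITE"] = str(Path(snakefile_dir) / DEFAULT_REF_SITE)
    resolved.setdefault("COUNT_FILE_EXP", DEFAULT_COUNT_FILE_EXP)
    # EXTERNAL_COUNTS_DIR is optional, so it stays absent unless provided.
    return resolved


if __name__ == "__main__":
    print(resolve_config({"MANIFEST": "manifest.tab"}, "/path/to/pipeline"))
```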
- Edit `config.yaml` and `manifest.tab` to reflect your datasets.
- Run Snakemake:
    ln -s /net/eichler/vol28/software/pipelines/ntsm_smk/runcluster .
    ./runcluster 30

- NTSM: https://github.com/JustinChu/ntsm
- VerifyBamID: https://github.com/Griffan/VerifyBamID.git