nf_CRACpipeline is a Nextflow workflow for processing and analyzing CRAC data. It is a Nextflow DSL2 rewrite of an earlier Python script, CRAC_pipeline_SE_demult_dedup.py by Edward Wallace, which was used in the paper:
Rosemary A Bayne, Uma Jayachandran, Aleksandra Kasprowicz, Stefan Bresson, David Tollervey, Edward W J Wallace, Atlanta G Cook, Yeast Ssd1 is a non-enzymatic member of the RNase II family with an alternative RNA recognition site, Nucleic Acids Research, 2021, gkab615, doi: 10.1093/nar/gkab615
This workflow makes use of various third-party tools, especially tools from the pyCRAC software by Sander Granneman. The tools used in this pipeline and their functions are summarized below:
- Flexbar: Trims the forward reads
- pyBarcodeFilter.py: Demultiplexes the trimmed reads
- FastQC: Evaluates the quality of the demultiplexed, trimmed reads
- pyFastqDuplicateRemover.py: Collapses duplicated reads
- Novoalign: Maps the collapsed reads to a reference genome using an index file
- samtools: Converts SAM files to BAM files, then sorts and indexes the BAM files
- BamQC: Evaluates the quality of the BAM/SAM files
- bedtools genomecov: Generates bedGraph files from the sorted, indexed BAM files
- bedtools multicov: Counts reads that overlap transcript features
- pyReadcounter.py: Quantifies the number of reads that overlap genomic features
- pyGTF2bedGraph.py: Generates bedGraph files from GTF files
- pyPileup.py: Makes pileup tables of reads and deletions/mutations for a given gene list
- pyCalculateFDR.py: Finds peaks with false discovery rates on protein-coding genes
Before running this pipeline, install Nextflow by following the official Nextflow installation instructions. Also make sure that all of the tools listed above are installed.
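A quick pre-flight check can catch a missing dependency before a run starts. The sketch below is not part of the pipeline. It assumes the tools are installed under the executable names shown (for example `flexbar`, `fastqc`, `novoalign`, `bamqc`), which may differ on your system, and it only checks that each name resolves on `PATH`, not that the installed version is compatible.

```python
import shutil

# Assumed executable names for the tools listed above; adjust them to
# match how the tools are installed on your system.
ASSUMED_EXECUTABLES = [
    "nextflow",
    "flexbar",
    "pyBarcodeFilter.py",
    "fastqc",
    "pyFastqDuplicateRemover.py",
    "novoalign",
    "samtools",
    "bamqc",
    "bedtools",
    "pyReadcounter.py",
    "pyGTF2bedGraph.py",
    "pyPileup.py",
    "pyCalculateFDR.py",
]


def missing_tools(executables):
    """Return the names that cannot be found on PATH, in input order."""
    return [name for name in executables if shutil.which(name) is None]


def format_report(missing):
    """Build a one-line, human-readable summary of the check."""
    if not missing:
        return "All listed tools were found on PATH."
    return "Not found on PATH: " + ", ".join(missing)
```

Running `print(format_report(missing_tools(ASSUMED_EXECUTABLES)))` reports any names that still need to be installed or added to `PATH`.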
To run this pipeline, use the command below:
nextflow run main.nf \
--reads reads.fastq \
--adapterfile adapter.fasta \
--barcode Barcodes.txt \
--novoindex myindexfile.novoindex \
--transcriptgff myfile.gff \
--gtf mygtffile.gtf \
--chromosome mychromosome.lengths \
--genelist mygenelist.txt \
--genometab mygenome.tab
For more information on the arguments of this pipeline, simply run the command:
nextflow run main.nf --help
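When the same run is launched repeatedly, it can help to assemble the run command from a dictionary and confirm that each input file exists first. The following is a minimal sketch, not part of the pipeline: the helper names `build_command` and `missing_inputs` are hypothetical, the file names are the placeholders from the example command, and it assumes every parameter value is a path to a file.

```python
from pathlib import Path

# Placeholder values copied from the example command; replace with real paths.
PARAMS = {
    "reads": "reads.fastq",
    "adapterfile": "adapter.fasta",
    "barcode": "Barcodes.txt",
    "novoindex": "myindexfile.novoindex",
    "transcriptgff": "myfile.gff",
    "gtf": "mygtffile.gtf",
    "chromosome": "mychromosome.lengths",
    "genelist": "mygenelist.txt",
    "genometab": "mygenome.tab",
}


def build_command(params, script="main.nf"):
    """Return the nextflow invocation as an argument list."""
    command = ["nextflow", "run", script]
    for name, value in params.items():
        command += [f"--{name}", str(value)]
    return command


def missing_inputs(params):
    """Return the parameter names whose files do not exist."""
    return [name for name, value in params.items() if not Path(value).is_file()]
```

If `missing_inputs(PARAMS)` returns an empty list, the list from `build_command(PARAMS)` can be passed to `subprocess.run(..., check=True)` to start the pipeline.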