RNA-seq Analysis Pipeline (Nextflow)

This repository contains an RNA-seq analysis pipeline I developed using Nextflow. It takes paired-end FASTQ files as input and processes them through quality control, trimming, alignment, and statistical analysis.

I created this as part of a project to better understand RNA-seq data analysis and workflow management with Nextflow, and it can serve as a starting point or educational reference for similar projects. The reference genome (FASTA) and annotation (GTF) files I used come from GENCODE; you can swap in your own by changing the lines marked `# EDIT`.

Pipeline Steps

The workflow consists of the following main steps:

- FastQC – checks read quality before and after trimming.
- Trim Galore – trims adapter sequences and low-quality bases from the reads.
- STAR Indexing – builds a STAR index of the reference genome.
- STAR Alignment – aligns the trimmed reads to the genome.
- R Analysis – an R script (`counts_and_tests.R`) that:
  - Generates count tables
  - Performs differential expression analysis with edgeR
  - Runs enrichment analysis with clusterProfiler
  - Adds gene annotations from the GTF (via rtracklayer) and from org.Hs.eg.db
  - Generates volcano plots and heatmaps

The workflow is written in Nextflow DSL2 and uses Conda for reproducible environments.
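For orientation, the stages above roughly correspond to the standalone commands printed by the sketch below. This is not the contents of `mainuzay.nf`: the helper name `print_sample_commands`, the directory names and the thread count are assumptions, and the real processes may use different options. The function only prints the commands, so it runs without any of the tools installed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (not run) commands roughly equivalent to the
# pipeline stages for one sample. Directory names are illustrative assumptions.
print_sample_commands() {
  local sample="$1" fasta="$2" gtf="$3" threads="${4:-16}"
  local r1="${sample}_1.fastq.gz" r2="${sample}_2.fastq.gz"

  # 1. FastQC on the raw read pair
  echo "fastqc -t ${threads} -o qc_raw ${r1} ${r2}"
  # 2. Trim Galore in paired mode; --fastqc re-checks quality after trimming
  echo "trim_galore --paired --fastqc -o trimmed ${r1} ${r2}"
  # 3. STAR index (needed once per reference, not once per sample)
  echo "STAR --runMode genomeGenerate --runThreadN ${threads} --genomeDir star_index --genomeFastaFiles ${fasta} --sjdbGTFfile ${gtf}"
  # 4. STAR alignment of the trimmed pair (Trim Galore names them *_val_1 / *_val_2)
  echo "STAR --runThreadN ${threads} --genomeDir star_index --readFilesIn trimmed/${sample}_1_val_1.fq.gz trimmed/${sample}_2_val_2.fq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outFileNamePrefix aligned/${sample}_"
}

print_sample_commands sampleA genome.fa annotation.gtf 8
```

With these options STAR would write the coordinate-sorted BAM as `aligned/sampleA_Aligned.sortedByCoord.out.bam`, which is the kind of file the R analysis step consumes.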

Installation and Setup

Before running the pipeline:

Download the Repository

```bash
git clone https://github.com/uzay-citimoglu/RNAseq-Analysis.git
cd RNAseq-Analysis
```

Install Nextflow & Conda
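Install Nextflow following its official installation guide, for example:

```bash
curl -s https://get.nextflow.io | bash
mkdir -p ~/bin        # make sure the target directory exists
mv nextflow ~/bin/    # or another directory in your PATH
```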

If you don't have Conda, install Miniconda:

```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
```
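After installing, a quick way to confirm the tools are reachable is to check your PATH. This is a minimal sketch: `check_tools` is a hypothetical helper, and the tool list is an assumption you can extend (`java` is included because Nextflow needs a Java runtime).

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands are missing from PATH.
# Prints one "missing: <tool>" line per absent command; returns 1 if any are missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

check_tools nextflow conda java && echo "all tools found"
```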

Set Up the Conda Environment
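Edit `envsetup.slurm` and update the path to the `.yml` environment file (`nfuzay.yml`). Then submit it with SLURM or run it directly:

```bash
sbatch envsetup.slurm   # on a SLURM cluster
# or
bash envsetup.slurm     # without SLURM
```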

Prepare Input Files

- Paired-end FASTQ files named `*_1.fastq.gz` and `*_2.fastq.gz` (the pipeline expects this naming)
- Reference genome FASTA
- Annotation GTF
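Because the pipeline relies on this naming, it can save a failed run to confirm that every `_1` file has a `_2` mate before submitting. A minimal sketch: `check_pairs` is a hypothetical helper, and it assumes all FASTQ files sit in a single directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list paired samples in a directory and flag any
# read-1 file without a read-2 mate (and vice versa). Returns 1 on a mismatch.
check_pairs() {
  local dir="$1" status=0 r1 r2 sample
  for r1 in "$dir"/*_1.fastq.gz; do
    [ -e "$r1" ] || continue                 # glob matched nothing
    sample="$(basename "$r1" _1.fastq.gz)"   # sampleA_1.fastq.gz -> sampleA
    if [ -e "$dir/${sample}_2.fastq.gz" ]; then
      echo "paired: $sample"
    else
      echo "missing mate: ${sample}_2.fastq.gz"
      status=1
    fi
  done
  for r2 in "$dir"/*_2.fastq.gz; do
    [ -e "$r2" ] || continue
    sample="$(basename "$r2" _2.fastq.gz)"
    if [ ! -e "$dir/${sample}_1.fastq.gz" ]; then
      echo "missing mate: ${sample}_1.fastq.gz"
      status=1
    fi
  done
  return "$status"
}

# Usage: check_pairs /PATH/TO/FASTQ
```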

Running the Pipeline

1. Edit `nfrun.slurm`

Open the file, set your SLURM partition, and update the four parameters in the `nextflow run` command.

```bash
#!/bin/bash -l
#SBATCH --job-name=testnf
#SBATCH --output=testnf_%j.out
#SBATCH --error=testnf_%j.err
#SBATCH --cpus-per-task=16
#SBATCH --mem=72G
# EDIT: set your partition name on the next line
#SBATCH -p your_partition_name

# --- USER INPUTS (EDIT --reads, --fasta, --gtf AND --outdir BELOW) ---
# A backslash only continues the line when it is the very last character,
# so keep comments off these lines or the command is split in two.
nextflow run mainuzay.nf \
  --reads "/PATH/TO/FASTQ/*_{1,2}.fastq.gz" \
  --fasta "/PATH/TO/REFERENCE/GENOME.fa" \
  --gtf "/PATH/TO/ANNOTATION/GENCODE.gtf" \
  --outdir "/PATH/TO/OUTPUT/DIRECTORY" \
  -with-report "/PATH/TO/OUTPUT/DIRECTORY/summary_$(date +%F_%H-%M-%S).html" \
  -resume
```
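Since every input path starts out as a `/PATH/TO/...` placeholder, a quick grep before `sbatch` catches lines you forgot to edit. A small sketch: `check_placeholders` is a hypothetical helper that assumes the placeholder text `/PATH/TO/` and `your_partition_name` stays in the file until you replace it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (with line numbers) any line of a job script that
# still contains a placeholder; return 1 if one is found so it can gate sbatch.
check_placeholders() {
  local script="$1"
  if grep -nE '/PATH/TO/|your_partition_name' "$script"; then
    echo "edit the placeholders above in $script before submitting" >&2
    return 1
  fi
  return 0
}

# Usage: check_placeholders nfrun.slurm && sbatch nfrun.slurm
```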

2. Nextflow Script (`mainuzay.nf`)

The Nextflow workflow defines the full RNA-seq analysis process.
The four required parameters (`--reads`, `--fasta`, `--gtf`, `--outdir`) must point to your data. You can set them:

- In the SLURM script (`nfrun.slurm`)
- Directly on the Nextflow command line
- As the defaults inside `mainuzay.nf` (see below)
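When passing `--reads` on the command line, keep the pattern in quotes so that Nextflow, not your shell, expands it. Unquoted, the shell replaces the glob with the matching file names before Nextflow runs, so the parameter no longer holds the pattern. The sketch below shows the difference in a scratch directory; `count_args` is a hypothetical helper that just counts the arguments it receives.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print how many arguments a command receives.
count_args() { echo "$#"; }

# Scratch directory with two samples (four files); removed at the end.
demo="$(mktemp -d)"
touch "$demo"/s1_1.fastq.gz "$demo"/s1_2.fastq.gz "$demo"/s2_1.fastq.gz "$demo"/s2_2.fastq.gz

count_args --reads "$demo"/*_{1,2}.fastq.gz   # unquoted glob: shell expands it, prints 5
count_args --reads "$demo/*_{1,2}.fastq.gz"   # quoted: pattern passed intact, prints 2

rm -rf "$demo"
```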

Parameters in the Script

These are the defaults inside `mainuzay.nf`; you can edit them here or override them on the command line (command-line values take precedence).

```groovy
// --- USER INPUTS (EDIT THESE PATHS OR OVERRIDE VIA CLI) ---
params.reads  = "/PATH/TO/FASTQ/*_{1,2}.fastq.gz"                            // EDIT
params.fasta  = "/PATH/TO/REFERENCE/GENOME.fa"                               // EDIT
params.gtf    = "/PATH/TO/ANNOTATION/GENCODE.gtf"                            // EDIT
params.outdir = "/PATH/TO/OUTPUT/DIRECTORY"                                  // EDIT
```

3. R Analysis Script (`counts_and_tests.R`)

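This R script performs the downstream RNA-seq analysis on the aligned BAM files: it counts reads per gene with `featureCounts` and then runs the normalization, statistical tests, and enrichment analyses.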

User Inputs to Edit

| Section / Variable | Description | Example |
|---|---|---|
| `bams <- c(...)` | Paths to sorted BAM files | `/path/to/sample1.sorted.bam` |
| `annot.ext` in `featureCounts` | Path to the GTF annotation | `/refs/gencode.v48.annotation.gtf` |
| Output file paths | Where the counts and FPKM TSVs are written | `/project/results/counts.tsv` |
| `nthreads` | Number of CPU threads for counting | `20` |
| `group <- factor(...)` | Experimental groups for DE analysis | `c("Control", "Control", "Treatment", "Treatment")` |
| Filtering thresholds | Expression cutoffs and DE cutoffs | Adjust as needed |
| PCA color mapping | Colors assigned to samples in the PCA plot | `"Sample1" = "red", "Sample2" = "blue"` |
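Typing the BAM paths by hand is error-prone, so a one-off shell helper can print the `bams <- c(...)` line for you to paste into `counts_and_tests.R`. A sketch assuming the BAM files end in `.sorted.bam`, as in the example above; `bams_r_vector` is a hypothetical name. Files are listed in alphabetical order, and the `group` labels must follow the same order because each label describes the BAM in the same position.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print an R assignment listing every *.sorted.bam in a
# directory, in glob (alphabetical) order. Assumes paths contain no quotes.
bams_r_vector() {
  local dir="$1" sep="" f
  printf 'bams <- c('
  for f in "$dir"/*.sorted.bam; do
    [ -e "$f" ] || continue      # no BAMs found: leave the vector empty
    printf '%s"%s"' "$sep" "$f"
    sep=", "
  done
  printf ')\n'
}

# Usage: bams_r_vector /PATH/TO/OUTPUT/DIRECTORY
```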

Main Analysis Steps

- Counting & FPKM calculation
- Annotation merge with GTF
- Filtering low-expression genes
- PCA plot
- Differential expression analysis
- Volcano plots
- Heatmap
- KEGG & GO enrichment analysis
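For reference, FPKM scales each gene's count by gene length and library size: FPKM = count × 10^9 / (gene length in bp × total mapped reads). The awk sketch below applies that standard formula to a hypothetical headerless `gene<TAB>length<TAB>count` table and takes the library size as the sum of the counts in the file. It illustrates the formula only; the normalization inside `counts_and_tests.R` may differ (for example in how library size is defined).

```bash
#!/usr/bin/env bash
# Hypothetical helper: compute FPKM from a headerless TSV of gene, length (bp), count.
# Library size = sum of all counts in the file (an assumption of this sketch).
fpkm_from_counts() {
  awk -F'\t' '
    NR == FNR { total += $3; next }          # pass 1: library size
    {                                        # pass 2: FPKM per gene
      v = (total > 0 && $2 > 0) ? $3 * 1e9 / ($2 * total) : 0
      printf "%s\t%.2f\n", $1, v
    }
  ' "$1" "$1"
}

# Usage: fpkm_from_counts gene_length_counts.tsv
```

For two genes of 1,000 bp and 2,000 bp with 500 reads each (1,000 reads in total), this gives 500000.00 and 250000.00: equal counts, but the longer gene gets half the FPKM.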

Output Files

| File | Description |
|---|---|
| `counts.tsv` | Raw gene counts |
| `fpkm_values.tsv` | FPKM-normalized values |
| `edgeR_glm_DEG.tsv` | DE results from glmLRT |
| `edgeR_exact_DEG.tsv` | DE results from exactTest |
| `kegg_enrichment.tsv` | KEGG enrichment results |
| `go_enrichment.tsv` | GO enrichment results |
| `kegg_plots.pdf` | KEGG enrichment plots |
| `go_plots.pdf` | GO enrichment plots |
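To pull significant genes out of the DE tables without reopening R, you can filter them by column name. A sketch assuming the table is tab-separated with a header row that has one name per column and includes edgeR's `logFC` and `FDR` columns (the names `topTags()` uses); `filter_deg` and the default cutoffs (|logFC| ≥ 1, FDR < 0.05) are illustrative, not the script's own thresholds.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep the header plus rows with |logFC| >= lfc and FDR < fdr.
# Columns are located by header name (assumes one header name per column).
filter_deg() {
  local table="$1" lfc="${2:-1}" fdr="${3:-0.05}"
  awk -F'\t' -v lfc="$lfc" -v fdr="$fdr" '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "logFC") lc = i
        if ($i == "FDR") fc = i
      }
      if (!lc || !fc) { print "logFC/FDR columns not found" > "/dev/stderr"; exit 1 }
      print
      next
    }
    { a = ($lc < 0) ? -$lc : $lc }           # absolute log fold change
    a >= lfc && $fc < fdr                    # print rows passing both cutoffs
  ' "$table"
}

# Usage: filter_deg edgeR_glm_DEG.tsv 1 0.05 > significant_genes.tsv
```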

File Structure

- `scripts/`: Contains an R script for the downstream statistical analysis.
- `envsetup.slurm`: Contains the bash code for setting up the Conda environment.
- `mainuzay.nf`: The Nextflow pipeline.
- `nextflow.config`: Additional settings for the pipeline.
- `nfrun.slurm`: Contains the command and inputs for running the pipeline.
- `nfuzay.yml`: Conda environment file listing the packages and tools the pipeline needs.
