BactSeq

Introduction

BactSeq is a Nextflow pipeline for performing bacterial RNA-Seq analysis. The pipeline supports both BWA alignment and Kallisto pseudo-alignment approaches, with optional differential expression and functional enrichment analysis.

Quick Start

# Run with test data
nextflow run BactSeq -profile test,docker

# Run with your own data
nextflow run BactSeq \
  --data_dir /path/to/fastq/files \
  --sample_file samples.tsv \
  --ref_genome genome.fasta \
  --ref_ann genome.gff3 \
  -profile docker

Pipeline Summary

The pipeline performs the following steps:

Core Analysis

Quality Control & Trimming
- Trim adapters from reads (Trim Galore!)
- Read QC (FastQC)
Read Alignment (choose one)
- BWA: Align reads to reference genome (BWA-MEM)
- Kallisto: Pseudo-align reads to coding sequences (Kallisto)
Expression Quantification & Normalization
- Size-factor scaling using DESeq2
- TMM normalization using edgeR
- RPKM scaling for gene length normalization
Exploratory Analysis
- Principal component analysis (PCA) of normalized expression values
- Sample clustering and visualization
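
To illustrate the size-factor step, the sketch below implements the median-of-ratios method that DESeq2 uses by default. It is a plain-Python illustration, not the pipeline's own code (BactSeq calls DESeq2 for this); the function name `size_factors` and the toy counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping gene ID -> list of raw counts, one per sample.
    Returns a list with one size factor per sample.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        # A gene with a zero count in any sample has a geometric mean of 0,
        # so it is left out of the calculation.
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Each size factor is the median ratio of a sample's counts
    # to the per-gene geometric means.
    return [math.exp(median(r)) for r in log_ratios]


# Toy data (hypothetical): sample 2 was sequenced twice as deeply as sample 1.
toy_counts = {
    "geneA": [10, 20],
    "geneB": [100, 200],
    "geneC": [5, 10],
    "geneD": [0, 3],  # skipped: contains a zero
}
factors = size_factors(toy_counts)
normalized = {
    gene: [c / f for c, f in zip(row, factors)]
    for gene, row in toy_counts.items()
}
```

Dividing each sample's counts by its size factor puts the samples on a common scale, so `geneA` ends up with the same normalized value in both toy samples.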

Optional Analysis

Differential Expression (DESeq2)
- Pairwise comparisons based on provided contrasts
- Volcano plots and summary statistics
Functional Enrichment (topGO)
- GO term enrichment analysis of differentially expressed genes
- Enrichment plots and gene lists

Installation

BactSeq requires Nextflow (version 21.10.3 or later), plus Conda, Docker, or Singularity, depending on the `-profile` you run with.

# Install Nextflow
curl -s https://get.nextflow.io | bash

# Or via conda
conda install -c bioconda nextflow

Usage

Basic Usage

nextflow run BactSeq \
  --data_dir /path/to/fastq/files \
  --sample_file samples.tsv \
  --ref_genome genome.fasta \
  --ref_ann genome.gff3 \
  -profile docker

Parameters

Required Parameters

| Parameter | Description |
|---|---|
| `--data_dir` | Path to directory containing FastQ files |
| `--sample_file` | Path to file containing sample information |
| `--ref_genome` | Path to FASTA file containing reference genome sequence (BWA) or coding gene sequences (Kallisto) |
| `-profile` | Configuration profile to use: `conda`, `docker`, `singularity` (prefix with `test,` to run on the test data, as in Quick Start) |

Optional Parameters

| Parameter | Default | Description |
|---|---|---|
| `--aligner` | `bwa` | Aligner to use: `bwa`, `kallisto` |
| `--ref_ann` | - | Path to GFF file containing reference genome annotation (required for BWA) |
| `--contrast_file` | - | Path to TSV file containing contrasts for differential expression |
| `--func_file` | - | Path to functional annotation file for enrichment analysis |
| `--strandedness` | `reverse` | Data strandedness: `unstranded`, `forward`, `reverse` |
| `--p_thresh` | `0.05` | Adjusted p-value threshold for differential expression |
| `--l2fc_thresh` | `1` | Absolute log2(FoldChange) threshold for differential expression |
| `--fragment_len` | `150` | Average fragment length for Kallisto (single-end only) |
| `--fragment_sd` | `20` | Fragment length standard deviation for Kallisto (single-end only) |
| `--skip_trimming` | `false` | Skip adapter trimming |
| `--outdir` | `./results` | Output directory for results |
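
As a rough illustration of how `--p_thresh` and `--l2fc_thresh` combine, the sketch below flags a gene only when it passes both cut-offs. The strict inequalities, the function name, and the `padj` / `log2FoldChange` field names (DESeq2's standard result columns) are assumptions; the pipeline's own comparison and output columns may differ.

```python
def is_differentially_expressed(padj, log2_fold_change,
                                p_thresh=0.05, l2fc_thresh=1.0):
    """Return True when a gene passes both thresholds (assumed strict)."""
    # DESeq2 can report NA for the adjusted p-value; such genes are
    # treated here as not significant.
    if padj is None or log2_fold_change is None:
        return False
    return padj < p_thresh and abs(log2_fold_change) > l2fc_thresh


# Hypothetical result rows, for illustration only.
rows = [
    {"gene": "gene1", "log2FoldChange": 2.3, "padj": 0.001},
    {"gene": "gene2", "log2FoldChange": -1.8, "padj": 0.02},
    {"gene": "gene3", "log2FoldChange": 0.4, "padj": 0.0001},  # effect too small
    {"gene": "gene4", "log2FoldChange": 3.1, "padj": 0.2},     # not significant
    {"gene": "gene5", "log2FoldChange": 1.5, "padj": None},    # padj is NA
]
hits = [
    r["gene"]
    for r in rows
    if is_differentially_expressed(r["padj"], r["log2FoldChange"])
]
```

Because the fold-change threshold applies to the absolute value, both up-regulated (`gene1`) and down-regulated (`gene2`) genes are kept.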

Standard Nextflow Parameters

| Parameter | Description |
|---|---|
| `-resume` | Resume a previous run |
| `-name` | Name for the pipeline run |
| `-work-dir` | Work directory for temporary files |

Input Files

Note: See the `test_data` folder in the repository for example inputs.

Required Files

Sample Sheet (samples.tsv)
- TSV file containing sample information with the following columns:
  - sample: Sample ID
  - file1: Name of R1 FastQ file
  - file2: Name of R2 FastQ file (leave blank for single-end)
  - group: Grouping factor for differential expression
  - rep_no: Replicate number
Example:
```
sample	file1	file2	group	rep_no
AS_1	SRX1607051_T1.fastq.gz		Artificial_Sputum	1
AS_2	SRX1607052_T1.fastq.gz		Artificial_Sputum	2
MB_1	SRX1607054_T1.fastq.gz		Middlebrook	1
MB_2	SRX1607055_T1.fastq.gz		Middlebrook	2
```
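
Checking a sample sheet before launching a run can catch formatting mistakes early. The sketch below is one way to do that, assuming only the column layout described above; the function name `read_sample_sheet` and the specific checks (non-empty `sample` and `file1`, unique sample IDs) are illustrative choices, not part of BactSeq.

```python
import csv
import io

REQUIRED_COLUMNS = ["sample", "file1", "file2", "group", "rep_no"]


def read_sample_sheet(handle):
    """Parse a tab-separated sample sheet and run basic sanity checks."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")
    rows, seen = [], set()
    for line_no, row in enumerate(reader, start=2):
        sample = (row["sample"] or "").strip()
        if not sample or not (row["file1"] or "").strip():
            raise ValueError(f"line {line_no}: sample and file1 must be filled in")
        if sample in seen:
            raise ValueError(f"line {line_no}: duplicate sample ID {sample!r}")
        seen.add(sample)
        # A blank file2 column marks the sample as single-end.
        row["paired"] = bool((row["file2"] or "").strip())
        rows.append(row)
    return rows


# Two rows taken from the example sheet above (tabs written as \t).
example = "\n".join([
    "sample\tfile1\tfile2\tgroup\trep_no",
    "AS_1\tSRX1607051_T1.fastq.gz\t\tArtificial_Sputum\t1",
    "MB_1\tSRX1607054_T1.fastq.gz\t\tMiddlebrook\t1",
])
rows = read_sample_sheet(io.StringIO(example))
groups = sorted({row["group"] for row in rows})
```

For a file on disk, the same function can be called with `open("samples.tsv", newline="")` in place of the `io.StringIO` wrapper.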
Reference Genome (genome.fasta)
- FASTA file containing the reference genome sequence (BWA) or coding gene sequences (Kallisto)
- Can be downloaded from NCBI RefSeq
Gene Annotation (genome.gff3) - Required for BWA only
- GFF3 file containing gene annotations
- Can be downloaded from NCBI RefSeq

Optional Files

Contrasts Table (contrasts.tsv) - For differential expression
- TSV file with 2 columns defining comparisons to perform
- Column names: Condition1, Condition2
Example:
```
Condition1	Condition2
Artificial_Sputum	Middlebrook
Artificial_Sputum	Kanamycin
Middlebrook	Kanamycin
```
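
Each name in the contrasts table presumably needs to match a value in the `group` column of the sample sheet. The sketch below reports any that do not; the function name `unknown_groups` is hypothetical, and the check itself is an assumption about what the pipeline expects rather than something it is documented to enforce.

```python
import csv
import io


def unknown_groups(contrasts_handle, sample_groups):
    """Return contrast names that are absent from the sample sheet groups."""
    reader = csv.DictReader(contrasts_handle, delimiter="\t")
    known = set(sample_groups)
    unknown = set()
    for row in reader:
        for column in ("Condition1", "Condition2"):
            if row[column] not in known:
                unknown.add(row[column])
    return sorted(unknown)


# The contrasts example above, checked against only the two groups that
# appear in the sample sheet excerpt.
contrasts = "\n".join([
    "Condition1\tCondition2",
    "Artificial_Sputum\tMiddlebrook",
    "Artificial_Sputum\tKanamycin",
    "Middlebrook\tKanamycin",
])
not_found = unknown_groups(
    io.StringIO(contrasts), ["Artificial_Sputum", "Middlebrook"]
)
```

Here `not_found` contains `Kanamycin`, because the sample sheet excerpt lists no samples for that group; a full sheet that includes Kanamycin samples would give an empty list.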
Functional Annotation File (functional_annotation.csv) - For enrichment analysis
- CSV file containing GO terms for genes
- Column 1: Gene ID (must match locus_tag in GFF)
- Column 2: GO terms (comma-separated)
Example:
```
Gene,GO_terms
MAB_0001,"GO:0005737,GO:0008150"
MAB_0002,"GO:0016020,GO:0006810"
```
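
Because the GO terms for a gene are themselves comma-separated, the second column has to be quoted, as in the example. The sketch below shows how such a file parses into a gene-to-terms mapping with Python's `csv` module; the function name `read_go_annotation` is hypothetical and this is not how the pipeline itself reads the file.

```python
import csv
import io


def read_go_annotation(handle):
    """Map each gene ID to its list of GO terms."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    gene_to_go = {}
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or incomplete lines
        gene, terms = row[0].strip(), row[1]
        # The quoted second field holds comma-separated GO identifiers.
        gene_to_go[gene] = [t.strip() for t in terms.split(",") if t.strip()]
    return gene_to_go


example = (
    "Gene,GO_terms\n"
    'MAB_0001,"GO:0005737,GO:0008150"\n'
    'MAB_0002,"GO:0016020,GO:0006810"\n'
)
annotation = read_go_annotation(io.StringIO(example))
```

Without the quotes, a CSV parser would treat each GO term as a separate column, so only the first term would land in column 2.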

Output

The pipeline generates the following output directories:

| Directory | Contents |
|---|---|
| `trim_galore/` | Adapter-trimmed FastQ files and FastQC reports |
| `read_counts/` | Raw and normalized gene count matrices |
| `PCA_samples/` | Principal component analysis plots and coordinates |
| `diff_expr/` | Differential expression results and volcano plots |
| `func_enrich/` | Functional enrichment analysis results |
| `pipeline_info/` | Pipeline execution reports and metadata |

Key Output Files

Count Matrices:

- `gene_counts.tsv`: Raw read counts per gene
- `deseq_counts.tsv`: DESeq2-normalized counts (log2-transformed)
- `cpm_counts.tsv`: Counts per million (CPM)
- `rpkm_counts.tsv`: RPKM-normalized counts
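
For reference, the textbook definitions behind the CPM and RPKM matrices can be sketched as follows. This is an illustration on hypothetical toy values, not the pipeline's code: the function names are made up, and the pipeline's CPM values may also include the edgeR TMM scaling mentioned in the Pipeline Summary, which is omitted here.

```python
def cpm(counts):
    """Counts per million: scale each count by the sample's library size."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene per million mapped reads."""
    total = sum(counts.values())
    # count / (length in kb) / (library size in millions)
    return {
        gene: c * 1e9 / (lengths_bp[gene] * total)
        for gene, c in counts.items()
    }


# Toy sample (hypothetical): geneB is three times longer than geneA and
# collects three times as many reads.
counts = {"geneA": 500, "geneB": 1500}
lengths_bp = {"geneA": 1000, "geneB": 3000}

cpm_values = cpm(counts)
rpkm_values = rpkm(counts, lengths_bp)
```

In this toy case the CPM values differ threefold, while the RPKM values are equal, which is the effect of the gene-length correction.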

Analysis Results:

- `DGE_*.tsv`: Differential expression results for each contrast
- `volcano_plot_*.png`: Volcano plots for each contrast
- `pca_grouped.png`: PCA plot colored by sample groups
- `*_enrich.tsv`: GO enrichment results (if functional annotation provided)

Citation

If you use BactSeq in your research, please cite:

@software{bactseq,
  author = {Adam Dinan},
  title = {BactSeq: A Nextflow pipeline for bacterial RNA-seq analysis},
  url = {https://github.com/adamd3/BactSeq},
  version = {dev},
  year = {2024}
}

Changelog

Recent Updates

- 2025-07-19: Docker image is now compatible with Nextflow 25.04.0+
- 2023-10-06: Docker and Singularity support restored
- 2023-09-28: Added example contrasts table and functional enrichment file
