Author: Varun Umesh Gowda
Platform: 10x Genomics Visium
Species: Human
Tissue: Invasive Ductal Carcinoma (IDC), Fresh Frozen
Status: Actively developed (core analyses validated)
This project implements a reproducible spatial transcriptomics pipeline for analyzing 10x Genomics Visium breast cancer data, with a focus on:
- Spatial clustering and localization
- Data-driven and literature-supported cell type annotation
- Identification and validation of TLS-like (Tertiary Lymphoid Structure–like) lymphoid aggregates
- Spatial autocorrelation and neighborhood enrichment analyses
The pipeline emphasizes biological validation, transparent assumptions, and interpretable spatial results, rather than black-box annotation.
Dataset
- Source: 10x Genomics / BioIVT
- Sample: Human Invasive Ductal Carcinoma (IDC)
- Preparation: Fresh frozen tissue, cryosectioned following Visium Spatial Protocols
- Data Type: Spot-level spatial gene expression with histology-aligned coordinates
Data access:
- 10x Genomics Visium Spatial Gene Expression datasets
- https://www.10xgenomics.com/datasets/human-breast-cancer-visium-fresh-frozen-whole-transcriptome-1-standard
Due to file size constraints, raw Visium output files are not included in this repository. Users should download the dataset directly from 10x Genomics and provide the path to the expression matrix when running the pipeline.
Objectives
- Learn and implement a full spatial transcriptomics analysis workflow
- Go beyond clustering to validate biological meaning using spatial evidence
- Identify and characterize tumor-associated immune niches, including TLS-like structures
- Build a portfolio-ready, modular, and extensible pipeline
Implemented and validated in this version:
- Spatial clustering (Leiden)
- Marker-based and mean-expression–based annotation
- Module scoring for immune and epithelial programs
- Spatial localization plots (spots overlaid on the tissue image vs spot-only dot plots)
- Moran’s I spatial autocorrelation (targeted genes)
- Spatial neighborhood enrichment (cluster adjacency)
- Quantitative summaries for cluster-level interpretation
Code exists but results are under review:
- Pathway enrichment (MSigDB-based)
- Spatial domain composition
- Reference-based deconvolution
These components will be added in a future release after biological verification.
Repository Structure
.
├── spatial_preprocessing_and_annotation.py
│ └─ Core preprocessing, clustering, annotation, and spatial analysis logic
│
├── spatial_validation_and_visualization.py
│ └─ Helper utilities for spatial visualization, module scoring,
│ Moran’s I, neighborhood enrichment, and summary exports
│
├── environment.yml
│ └─ Conda environment for reproducibility
│
├── README.md
│ └─ Project documentation (this file)
Spatial Clustering
- Leiden clustering on spatial transcriptomics data
- Tissue-aligned visualization of clusters
- Cluster-specific spatial localization plots
Outputs
- spatial_clusters_overview.png
- spatial_cluster_0_localization.png
Cell Type Annotation
- Canonical immune and epithelial marker panels
- Mean-expression–based evaluation across clusters
- Manual curation supported by literature
Key annotated programs
- B cells
- T cells
- Plasma cells
- Epithelial cells
- TLS-associated chemokines
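The mean-expression evaluation above can be illustrated with a small NumPy sketch. It assumes a dense, normalized and log-transformed spots × genes matrix plus one cluster label per spot (for example Leiden IDs). The function name `annotate_by_marker_means` and the marker panels in `EXAMPLE_PANELS` are illustrative assumptions, not the curated panels or code in `spatial_preprocessing_and_annotation.py`; the argmax label is only a starting point for the manual, literature-supported curation.

```python
import numpy as np

# Illustrative marker panels (assumed for this sketch, not the pipeline's curated lists)
EXAMPLE_PANELS = {
    "B cells": ["MS4A1", "CD79A", "CD19"],
    "T cells": ["CD3D", "CD3E", "CD2"],
    "Plasma cells": ["JCHAIN", "MZB1"],
    "Epithelial cells": ["EPCAM", "KRT8", "KRT18"],
}


def annotate_by_marker_means(expr, genes, clusters, panels=EXAMPLE_PANELS):
    """Score each cluster by the mean expression of each marker panel.

    expr     : (n_spots, n_genes) normalized, log-transformed expression
    genes    : gene names matching the columns of expr
    clusters : (n_spots,) cluster label per spot (e.g. Leiden IDs)
    panels   : program name -> list of marker genes
    Returns (means, labels): means[c, p] is the average panel-p score in cluster c,
    and labels maps each cluster to the panel with the highest mean.
    """
    expr = np.asarray(expr, dtype=float)
    clusters = np.asarray(clusters)
    gene_index = {g: i for i, g in enumerate(genes)}
    cluster_ids = np.unique(clusters)
    names = list(panels)
    means = np.full((len(cluster_ids), len(names)), np.nan)
    for p, name in enumerate(names):
        cols = [gene_index[g] for g in panels[name] if g in gene_index]
        if not cols:
            continue  # no marker of this panel is present; column stays NaN
        panel_score = expr[:, cols].mean(axis=1)  # per-spot panel average
        for c, cid in enumerate(cluster_ids):
            means[c, p] = panel_score[clusters == cid].mean()
    best = np.nanargmax(means, axis=1)  # raises if a cluster has no usable panel
    labels = {cid: names[b] for cid, b in zip(cluster_ids.tolist(), best)}
    return means, labels
```

Near-ties between panels are common at Visium resolution, where each spot can contain several cells, so they are a cue to inspect the cluster manually rather than accept the argmax.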
TLS-like Niche Identification
TLS-like lymphoid aggregates were identified based on:
- Co-expression of CCL19, CCL21, CXCL13, LTB
- Enrichment of B, T, and plasma cell markers
- Spatial co-localization on tissue
- Significant spatial autocorrelation (Moran’s I)
- Neighborhood enrichment patterns
Note: Clusters are labeled as “TLS-like” unless canonical TLS histological organization is confirmed.
Outputs
- spatial_module_TLS_enrichment.png
- moranI_tls_genes.csv
- cluster_0_score_summary.csv
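As a rough illustration of scoring TLS chemokine co-expression per spot and summarizing one target cluster, the sketch below averages z-scored expression of CCL19, CCL21, CXCL13, and LTB. It is a simplified stand-in: Scanpy’s `sc.tl.score_genes` instead subtracts the mean of expression-matched control genes, and the pipeline’s own module scoring may differ. The function names and the dictionary fields returned by `summarize_target_cluster` are hypothetical and do not describe the columns of `cluster_0_score_summary.csv`.

```python
import numpy as np

TLS_GENES = ["CCL19", "CCL21", "CXCL13", "LTB"]


def tls_module_score(expr, genes, gene_set=TLS_GENES):
    """Per-spot module score: mean of z-scored expression over gene_set.

    Simplified stand-in for sc.tl.score_genes (no expression-matched controls).
    """
    expr = np.asarray(expr, dtype=float)
    genes = list(genes)
    idx = [genes.index(g) for g in gene_set if g in genes]
    if not idx:
        raise ValueError("none of the module genes are present in the data")
    sub = expr[:, idx]
    sd = sub.std(axis=0)
    sd[sd == 0] = 1.0  # constant genes contribute 0 instead of dividing by zero
    z = (sub - sub.mean(axis=0)) / sd
    return z.mean(axis=1)


def summarize_target_cluster(scores, clusters, target):
    """Compare module scores inside one cluster against all remaining spots."""
    scores = np.asarray(scores, dtype=float)
    in_target = np.asarray(clusters) == target
    inside = scores[in_target].mean()
    outside = scores[~in_target].mean()
    return {
        "cluster": target,
        "n_spots": int(in_target.sum()),
        "mean_in_cluster": float(inside),
        "mean_elsewhere": float(outside),
        "difference": float(inside - outside),
    }
```

A positive `difference` only says the chemokine program is higher in the target cluster; it still needs the spatial evidence (co-localization, Moran’s I, neighborhood enrichment) before a cluster is called TLS-like.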
Spatial Autocorrelation (Moran’s I)
- Targeted Moran’s I analysis for immune and TLS-related genes
- Identifies genes with non-random spatial expression patterns
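Global Moran’s I for one gene is I = (N / S0) · Σij wij (xi - x̄)(xj - x̄) / Σi (xi - x̄)², where S0 is the sum of all spatial weights; positive values indicate that similar expression levels cluster in space and negative values a dispersed, checkerboard-like pattern. The NumPy sketch below computes the statistic with a dense binary weight matrix built from Visium `array_row`/`array_col` coordinates, assuming the standard hexagonal layout in which a spot’s six neighbours sit at (r, c ± 2) and (r ± 1, c ± 1). Squidpy provides an optimized, sparse implementation with p-values via `squidpy.gr.spatial_autocorr(..., mode="moran")`; the function names here are illustrative, not the pipeline’s code.

```python
import numpy as np

# Neighbour offsets in Visium's hexagonal array coordinates (assumed layout):
# same row at column +/-2, adjacent rows at column +/-1.
HEX_OFFSETS = [(0, -2), (0, 2), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def visium_hex_weights(array_row, array_col):
    """Dense binary spatial weights connecting each spot to its hex neighbours."""
    n = len(array_row)
    pos = {(int(r), int(c)): i for i, (r, c) in enumerate(zip(array_row, array_col))}
    W = np.zeros((n, n))
    for (r, c), i in pos.items():
        for dr, dc in HEX_OFFSETS:
            j = pos.get((r + dr, c + dc))
            if j is not None:
                W[i, j] = 1.0
    return W


def morans_i(x, W):
    """Global Moran's I of values x (one gene across spots) under weights W."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    num = dev @ W @ dev        # sum_ij w_ij (x_i - xbar)(x_j - xbar)
    den = np.sum(dev ** 2)     # sum_i (x_i - xbar)^2; zero for constant genes
    return (len(x) / W.sum()) * num / den
```

A dense matrix is workable for a toy example, but a full section with a few thousand spots is better served by a sparse graph. This sketch returns only the statistic; deciding that a gene is significantly autocorrelated also requires a null distribution (analytic or permutation-based).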
Neighborhood Enrichment
- Evaluates spatial adjacency between clusters
- Highlights immune–tumor and immune–immune spatial co-occurrence patterns
Outputs
- nhood_enrichment.png
- nhood_enrichment_zscores.csv (if supported by the installed Squidpy version)
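Neighborhood enrichment asks whether spots from two clusters are adjacent more (or less) often than expected if the cluster labels were randomly shuffled over the same spatial graph. A minimal permutation sketch in NumPy follows, assuming a symmetric binary adjacency matrix and integer cluster codes 0..k-1. `squidpy.gr.nhood_enrichment` implements a permutation test of this kind; the function and parameter names below are illustrative, not Squidpy’s API, and its z-scores may differ in detail.

```python
import numpy as np


def nhood_enrichment(W, labels, n_perms=1000, seed=0):
    """Permutation z-scores for how often clusters are spatial neighbours.

    W      : (n, n) symmetric binary adjacency between spots
    labels : (n,) integer cluster codes 0..k-1
    Returns a (k, k) matrix; positive = more adjacent than expected by chance.
    """
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    k = labels.max() + 1
    rng = np.random.default_rng(seed)

    def counts(lab):
        onehot = np.eye(k)[lab]          # (n, k) cluster membership
        return onehot.T @ W @ onehot     # (k, k) edge counts between clusters

    observed = counts(labels)
    perms = np.stack([counts(rng.permutation(labels)) for _ in range(n_perms)])
    mean, sd = perms.mean(axis=0), perms.std(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        z = (observed - mean) / sd       # inf/nan where a count never varies
    return z
```

Positive diagonal z-scores indicate spatially compact clusters; a negative off-diagonal z-score between, for example, an immune and an epithelial cluster suggests the two tend to be segregated rather than interleaved.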
Installation
conda env create -f environment.yml
conda activate spatial
Step 1: Preprocessing and annotation (spatial_preprocessing_and_annotation.py)
Purpose: Preprocess raw 10x Visium data, perform clustering, and generate a processed AnnData object for downstream spatial validation.
Key output:
- adata_spatial_processed.h5ad (saved to the directory given by --out_dir)
python spatial_preprocessing_and_annotation.py \
--data_path path/to/filtered_feature_bc_matrix.h5 \
--out_dir results/
Step 2: Spatial validation and visualization (spatial_validation_and_visualization.py)
Purpose: Validate cluster annotations using spatial localization, module enrichment, Moran’s I, and neighborhood enrichment.
Key outputs:
- Spatial gene & module localization plots
- Moran’s I tables
- Neighborhood enrichment heatmaps
python spatial_validation_and_visualization.py \
--adata_path results/adata_spatial_processed.h5ad \
--outdir results/spatial_validation \
--cluster_key leiden \
--target_cluster 0
Dependencies
- Scanpy
- Squidpy
- NumPy / Pandas
- Matplotlib
- Python 3.10+
- Conda
Design Principles
- Modular scripts (core logic vs plotting helpers)
- Explicit biological assumptions
- Clear separation between validated and experimental analyses
- Outputs designed for manual inspection and interpretation
✔ Spatial clustering and annotation validated
✔ TLS-like niche identification supported by multiple spatial metrics
⏳ Pathway enrichment under biological review
⏳ Deconvolution pending reference validation
Future updates will be versioned and documented.
This repository is intended for:
- Educational purposes
- Methodological demonstration
- Portfolio and reproducibility examples
If you use or adapt this pipeline, please cite appropriately.
Varun Umesh Gowda
MS Bioinformatics, Northeastern University
Bioinformatics Research Assistant, Brigham & Women’s Hospital
📧 gowda.var@northeastern.edu
🔗 LinkedIn: www.linkedin.com/in/varun-u-gowda