Unsupervised Clustering and Supervised Labeling of Single-Cell Data

This project implements both unsupervised clustering and supervised labeling of single-cell RNA sequencing data. The pipeline selects highly variable genes (HVGs), reduces dimensionality with PCA, computes UMAP embeddings, clusters cells with K-Means, and labels cells with a Random Forest classifier.

Components

Unsupervised clustering using PCA, UMAP, and K-Means.
Supervised classification using Random Forest trained on labeled data.
Efficient preprocessing pipeline that normalizes, log-transforms, and selects highly variable genes.
Scalable and modular implementation using Python and Scanpy.
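
The components above can be sketched end to end on synthetic data. This is a minimal illustration, not the code in best_cleaned.py: it uses NumPy and scikit-learn instead of Scanpy, ranks genes by plain variance as a simplified stand-in for HVG selection, and skips the UMAP step. The function names, parameter values, and the synthetic count matrix are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier


def make_cells(rng, n_per_group, n_genes=100):
    """Synthetic counts (hypothetical data): two cell groups with different marker genes."""
    labels = np.repeat([0, 1], n_per_group)
    rates = np.full((labels.size, n_genes), 5.0)
    rates[labels == 0, :10] = 50.0    # group 0 over-expresses genes 0-9
    rates[labels == 1, 10:20] = 50.0  # group 1 over-expresses genes 10-19
    return rng.poisson(lam=rates).astype(float), labels


def log_normalize(counts, target_sum=1e4):
    """Scale every cell to the same total count, then apply log(1 + x)."""
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # guard against empty cells
    return np.log1p(counts / totals * target_sum)


def top_variable_genes(logged, n_top_genes=20):
    """Simplified HVG selection: indices of the genes with the largest variance."""
    order = np.argsort(logged.var(axis=0))[::-1]
    return np.sort(order[:n_top_genes])


def cluster_cells(features, n_clusters, n_components=10, seed=0):
    """Unsupervised path: PCA followed by K-Means on the principal components."""
    pcs = PCA(n_components=n_components, random_state=seed).fit_transform(features)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(pcs)


def label_cells(train_features, train_labels, test_features, seed=0):
    """Supervised path: Random Forest trained on labeled cells."""
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(train_features, train_labels)
    return clf.predict(test_features)


rng = np.random.default_rng(0)
train_counts, train_labels = make_cells(rng, n_per_group=100)
test_counts, test_labels = make_cells(rng, n_per_group=50)

# Select genes on the labeled data and reuse the same genes for the test cells.
train_logged = log_normalize(train_counts)
genes = top_variable_genes(train_logged)

clusters = cluster_cells(train_logged[:, genes], n_clusters=2)
predicted = label_cells(train_logged[:, genes], train_labels,
                        log_normalize(test_counts)[:, genes])

print("cluster sizes:", np.bincount(clusters))
print("labeling accuracy:", (predicted == test_labels).mean())
```

Reusing the gene indices chosen on the labeled data keeps the training and test feature spaces aligned, which the classifier requires.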

Files

best_cleaned.py - Main script for both clustering and classification.
data_exp.py - Exploratory data analysis script.
clusters.npy - Output file containing the predicted cluster labels.
requirements.txt - List of required Python dependencies.

Usage

python3 best_cleaned.py -t mouse_spatial_brain_section0.h5ad -d mouse_spatial_brain_section1_modified.h5ad -o clusters.npy # supervised labeling
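
The command above passes three flags: -t, -d, and -o. The argument parser inside best_cleaned.py is not shown here, so the following is only a guess at how such an interface could be wired up with argparse. The destination names, the help texts, and the idea that omitting -t switches to unsupervised clustering are assumptions, not documented behavior.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the command above."""
    parser = argparse.ArgumentParser(
        description="Cluster or label single-cell data (illustrative sketch).")
    # Assumption: -t is the labeled training file and may be omitted.
    parser.add_argument("-t", dest="train", default=None,
                        help="labeled .h5ad file used for training")
    # Assumption: -d is the dataset whose cells receive cluster labels.
    parser.add_argument("-d", dest="data", required=True,
                        help=".h5ad file to cluster or label")
    parser.add_argument("-o", dest="output", default="clusters.npy",
                        help="path of the .npy file with predicted labels")
    return parser


args = build_parser().parse_args([
    "-t", "mouse_spatial_brain_section0.h5ad",
    "-d", "mouse_spatial_brain_section1_modified.h5ad",
    "-o", "clusters.npy",
])
# Assumed convention: a training file selects the supervised path.
mode = "supervised" if args.train else "unsupervised"
print(mode, args.data, args.output)
```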

The pipeline was evaluated using Adjusted Mutual Information (AMI) and achieved the highest score in the class (nickname: oceanman 🙂).
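
For readers unfamiliar with AMI, the short example below shows how the score behaves, using scikit-learn's adjusted_mutual_info_score. The label arrays are made up for illustration and are not results from this project.

```python
import numpy as np
from sklearn.metrics import adjusted_mutual_info_score

# Hypothetical ground-truth cell types for nine cells.
true_labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

# Same grouping with different cluster IDs: AMI ignores how clusters are named.
renamed = np.array([2, 2, 2, 0, 0, 0, 1, 1, 1])

# One cell assigned to the wrong cluster.
one_error = np.array([2, 2, 0, 0, 0, 0, 1, 1, 1])

ami_renamed = adjusted_mutual_info_score(true_labels, renamed)
ami_one_error = adjusted_mutual_info_score(true_labels, one_error)

print("AMI, renamed clusters:", ami_renamed)
print("AMI, one misassigned cell:", ami_one_error)
```

Because AMI is invariant to relabeling, K-Means cluster IDs can be compared directly against reference annotations without first matching cluster numbers to cell types.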

Acknowledgments: This project was completed as part of the MSCBIO2066 coursework at the University of Pittsburgh.

Dataset source: https://bits.csb.pitt.edu/mscbio2066/assign3/data/
