Single-nuclei transcriptomics of mammalian prion diseases identifies dynamic gene signatures shared between species
This repository contains the code for the analysis of our prion disease single-cell RNA sequencing data. Our preprint is available on bioRxiv: https://www.biorxiv.org/content/10.1101/2022.09.13.507650v1
More information on the contents of each file can be found below. In summary, the analysis starts by parsing the single-cell files generated by the split-seq-tools pipeline into Seurat. The datasets are then integrated with an annotated reference dataset (Rosenberg et al., 2018) and label transfer is carried out. The main analysis proceeds with quality control, visualisation, and exploration of the datasets, including differential expression analyses. Additional differential expression analyses were carried out on pseudobulk data. Gene Set Enrichment and Over-representation analyses were used to identify perturbed biological pathways. Finally, the versions of R and of the packages used are listed in R_session_information.txt.
The SPLiT-seq analysis pipeline generates three files containing the feature names, the counts matrix, and the cell identities. These files are first loaded into Seurat to create the single-cell experiments. Additional information and metadata are also added at this step.
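To illustrate how the three files fit together, here is a minimal sketch in plain Python. It is not the repository's code, which does this step in R with Seurat. The file layout is an assumption: one name per line in the feature and cell files, and a Matrix Market coordinate matrix with features as rows and cells as columns. The function name `read_counts` and the toy names are hypothetical.

```python
import io


def read_counts(features_fh, cells_fh, matrix_fh):
    """Return {cell: {feature: count}} from three text streams.

    Assumed layout (hypothetical, check against the real files): one name
    per line in the feature and cell files, and a Matrix Market coordinate
    matrix with features as rows and cells as columns, 1-based indices.
    """
    features = [ln.strip() for ln in features_fh if ln.strip()]
    cells = [ln.strip() for ln in cells_fh if ln.strip()]
    counts = {cell: {} for cell in cells}

    size_line_seen = False
    for ln in matrix_fh:
        ln = ln.strip()
        if not ln or ln.startswith("%"):  # skip blanks, banner and comments
            continue
        if not size_line_seen:  # first data line: n_rows n_cols n_entries
            n_rows, n_cols, _ = (int(x) for x in ln.split())
            if (n_rows, n_cols) != (len(features), len(cells)):
                raise ValueError("matrix shape does not match the name files")
            size_line_seen = True
            continue
        row, col, value = ln.split()
        counts[cells[int(col) - 1]][features[int(row) - 1]] = int(value)
    return counts


# Toy input standing in for the three pipeline outputs.
features = io.StringIO("gene_a\ngene_b\ngene_c\n")
cells = io.StringIO("cell_1\ncell_2\n")
matrix = io.StringIO(
    "%%MatrixMarket matrix coordinate integer general\n"
    "3 2 3\n"
    "1 1 5\n"
    "3 1 2\n"
    "2 2 7\n"
)
counts = read_counts(features, cells, matrix)
```

The shape check catches the most common mistake at this stage, which is pairing a matrix with name files from a different run or with rows and columns swapped.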
The annotated dataset from Rosenberg et al. (2018) was downloaded, loaded into Seurat, and pre-processed before integration with our dataset for label transfer.
The main analysis includes feature renaming to replace ENSEMBL ids with gene symbols, data quality control, and integration of our dataset with the reference dataset for label transfer. Common quality assurance visualisations are also generated by this script. The analysis continues by testing for differences in cell type proportions between control and disease samples, followed by a differential gene expression analysis.
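The feature renaming step can be sketched as follows, in plain Python rather than the R used by the script. This is an illustration under assumptions, not the repository's implementation: it assumes that ids without a known symbol keep their ENSEMBL id and that repeated symbols are made unique with a numeric suffix. The function name `rename_features` and all ids and symbols below are placeholders.

```python
def rename_features(ensembl_ids, id_to_symbol):
    """Map ENSEMBL ids to unique gene symbols.

    Assumed behaviour: ids without a symbol keep their ENSEMBL id, and
    repeated symbols get a numeric suffix so feature names stay unique.
    """
    seen = {}
    renamed = []
    for ens_id in ensembl_ids:
        symbol = id_to_symbol.get(ens_id) or ens_id  # fall back to the id
        n = seen.get(symbol, 0)
        seen[symbol] = n + 1
        renamed.append(symbol if n == 0 else f"{symbol}.{n}")
    return renamed


# Placeholder ids and symbols, for illustration only.
ids = ["ENS_TOY_1", "ENS_TOY_2", "ENS_TOY_3", "ENS_TOY_4"]
mapping = {"ENS_TOY_1": "SymA", "ENS_TOY_2": "SymB", "ENS_TOY_3": "SymB"}
new_names = rename_features(ids, mapping)
```

Keeping names unique matters because downstream tools index the counts matrix by feature name, so two rows sharing a symbol would otherwise be ambiguous.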
We also aggregated our dataset by feature to create a pseudo-bulk expression experiment and applied traditional methodologies using DESeq2 for differential gene expression analyses. The script contains all the relevant code for data aggregation, DE testing, and visualisations.
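As a rough illustration of what pseudo-bulk aggregation does, the sketch below sums per-cell counts into one count vector per sample. It is written in plain Python and is not the repository's R code. Grouping by sample is an assumption about how the aggregation is set up, and `pseudobulk`, the cell names, and the sample names are hypothetical.

```python
from collections import defaultdict


def pseudobulk(cell_counts, cell_to_sample):
    """Sum per-cell counts into one {gene: count} vector per sample."""
    bulk = defaultdict(lambda: defaultdict(int))
    for cell, gene_counts in cell_counts.items():
        sample = cell_to_sample[cell]
        for gene, count in gene_counts.items():
            bulk[sample][gene] += count
    # Convert nested defaultdicts to plain dicts for a clean result.
    return {sample: dict(genes) for sample, genes in bulk.items()}


# Toy data: three cells from two hypothetical samples.
cell_counts = {
    "c1": {"g1": 2, "g2": 1},
    "c2": {"g1": 3},
    "c3": {"g2": 4},
}
cell_to_sample = {"c1": "ctrl_1", "c2": "ctrl_1", "c3": "prion_1"}
bulk = pseudobulk(cell_counts, cell_to_sample)
```

The resulting table has one column per sample rather than per cell, which is the input shape that bulk methods such as DESeq2 expect.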
We employed Gene Set Enrichment Analyses and Over-representation Analyses to identify perturbed biological networks based on the Gene Ontology gene sets.
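For readers unfamiliar with over-representation analysis, the underlying test can be sketched in a few lines of plain Python. This is a generic one-sided hypergeometric test, shown for illustration only; the repository's R scripts may use a different package, gene universe, or multiple-testing correction. The function names and toy gene names are hypothetical.

```python
from math import comb


def ora_p_value(universe_size, set_size, n_selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap)."""
    max_k = min(set_size, n_selected)
    total = comb(universe_size, n_selected)
    hits = sum(
        comb(set_size, k) * comb(universe_size - set_size, n_selected - k)
        for k in range(overlap, max_k + 1)
    )
    return hits / total


def over_representation(selected, gene_set, universe):
    """Test whether `gene_set` is enriched among the `selected` genes."""
    universe = set(universe)
    selected = set(selected) & universe  # ignore genes outside the universe
    gene_set = set(gene_set) & universe
    overlap = len(selected & gene_set)
    return ora_p_value(len(universe), len(gene_set), len(selected), overlap)


# Toy example: 10 genes, a 4-gene set, 3 selected genes, 2 of them in the set.
universe = [f"g{i}" for i in range(1, 11)]
gene_set = ["g1", "g2", "g3", "g4"]
selected = ["g1", "g2", "g9"]
p = over_representation(selected, gene_set, universe)
```

In practice one p-value is computed per gene set and the results are corrected for multiple testing. Gene Set Enrichment Analysis differs in that it uses a ranking of all genes instead of a fixed list of selected genes.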
Information regarding the version of R and packages used for the entirety of the analysis can be found in this file.