- GSE42872_main: Demonstrates the standard six steps for mining the GEO database.
- GSE11121_survival: Demonstrates batch survival analysis based on gene expression grouping.
- Demonstrates the performance of specific genes of interest across multiple GSE datasets.
- Demonstrates how to construct a network.
- Demonstrates downstream analysis when sample grouping information is complex.
- Demonstrates the standard six steps for mining the GEO database, similar to GSE42872_main.
- Demonstrates a novel algorithm beyond the standard six steps for GEO database mining shown in GSE42872_main.
- Demonstrates how to integrate the TCGA database.
- Demonstrates the analysis of expression matrices obtained from RNA-seq, highlighting similarities and differences compared to traditional microarray expression matrices.
- Demonstrates how to perform meta-analysis.
# Step 1: Download Data
# Data is the soul!
# Downloading from GEO can be slow or unreliable in China, so the file GSE42872_raw_exprSet.Rdata is also uploaded; you can load it directly and skip this step.
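if (FALSE) {  # Change to TRUE to download from GEO instead of using the saved file
  library(GEOquery)
  eSet <- getGEO('GSE42872', destdir = ".",
                 AnnotGPL = FALSE,
                 getGPL = FALSE)
  # save() requires the file= argument; the object and file names match the load() call below
  save(eSet, file = 'GSE42872_eSet.Rdata')
}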
library(Biobase) # Provides exprs(); GEOquery is only loaded inside the skipped block above
load('GSE42872_eSet.Rdata')
b <- eSet[[1]] # Please note that some GSE datasets have multiple platforms. Pay attention to the selection.
raw_exprSet <- exprs(b)
group_list <- c(rep('control', 3), rep('case', 3))
save(raw_exprSet, group_list,
file = 'GSE42872_raw_exprSet.Rdata')
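
Before moving on, it can help to confirm that the grouping vector lines up with the columns of the expression matrix. The following is a minimal sketch that uses a simulated matrix in place of `raw_exprSet`; the dimensions, probe names, and sample names are made up for illustration. With the real data, the control/case order should be checked against the sample annotation (for example `pData(b)`) rather than assumed.

```r
# Simulated stand-in for raw_exprSet: 100 probes x 6 samples (random values, not real data)
set.seed(1)
sim_exprSet <- matrix(rnorm(600, mean = 8, sd = 2), nrow = 100,
                      dimnames = list(paste0("probe_", 1:100),
                                      paste0("sample_", 1:6)))
group_list <- c(rep('control', 3), rep('case', 3))

# There must be exactly one group label per sample column, in column order
stopifnot(length(group_list) == ncol(sim_exprSet))

# Pair each sample with its label so the assignment can be checked by eye
sample_info <- data.frame(sample = colnames(sim_exprSet),
                          group = group_list)
print(sample_info)
print(table(group_list))
```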
# Step 2: Check the Expression Matrix
# High-quality data is crucial!
# Here I filter the probes based on the microarray annotation and check the group information of the samples in each experiment.
# The checks include PCA and hierarchical clustering plots.
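
The checks described in this step can be sketched with base R alone. The example below runs on a simulated matrix and a hypothetical probe-to-gene table (`probe2gene`), not on the real GSE42872 data. Keeping the probe with the highest mean expression for each gene is one common convention and is an assumption here, not necessarily the filter used in this repository.

```r
set.seed(42)
# Simulated log2 expression: 200 probes x 6 samples (not real data)
expr <- matrix(rnorm(1200, mean = 8, sd = 1), nrow = 200,
               dimnames = list(paste0("probe_", 1:200),
                               paste0("sample_", 1:6)))
group_list <- c(rep('control', 3), rep('case', 3))
# Shift the first 50 probes upward in the case samples to mimic a group effect
expr[1:50, group_list == 'case'] <- expr[1:50, group_list == 'case'] + 3

# Hypothetical annotation table: two probes map to each gene symbol
probe2gene <- data.frame(probe_id = rownames(expr),
                         symbol = rep(paste0("gene_", 1:100), each = 2),
                         stringsAsFactors = FALSE)

# Keep one probe per gene: the probe with the highest mean expression
probe2gene$mean_expr <- rowMeans(expr)[probe2gene$probe_id]
probe2gene <- probe2gene[order(probe2gene$symbol, -probe2gene$mean_expr), ]
keep <- probe2gene[!duplicated(probe2gene$symbol), ]
filtered <- expr[keep$probe_id, ]
rownames(filtered) <- keep$symbol

# Cluster figure: hierarchical clustering of samples on Euclidean distance
hc <- hclust(dist(t(filtered)))
plot(hc, labels = paste(group_list, colnames(filtered), sep = "_"))

# PCA figure: samples as rows, colored by group
pca <- prcomp(t(filtered), scale. = TRUE)
plot(pca$x[, 1:2], pch = 19,
     col = ifelse(group_list == 'case', 'red', 'blue'))
```

If the samples of one group do not cluster together or do not separate on the first principal components, that is a cue to recheck `group_list` and the sample annotation before running any differential analysis.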