Some GEO data sets don't tell you explicitly what the batch identifier is (for example GSE46691, which we've discussed), but you can use the scan date recorded in the CEL files. In R's oligo package, you can extract this with protocolData() after reading the files with read.celfiles(), but that is annoyingly slow because read.celfiles() loads the entire expression matrix. I wonder if it's easy to write an R function that grabs the protocol data without loading everything else. The following loop is code I'm working on right now, and it takes way too long to run (~1.5 hours) for such a simple task.
```r
setwd("/scratch/lfs/tgerke/geoData")
library(oligo)
library(pd.huex.1.0.st.v2)
# gather the scan date of each sample to use as a batch identifier
celnames <- list.celfiles("GSE46691", full.names = TRUE)
# pre-allocate a Date vector of NAs, one slot per CEL file
dates <- as.Date(rep(NA_character_, length(celnames)))
for (i in seq_along(celnames)) {
  # read.celfiles() loads the full intensity matrix, which is the slow part
  x <- read.celfiles(celnames[i])
  # protocolData() holds the scan timestamp in its "dates" column
  dates[i] <- as.Date(protocolData(x)$dates)
}
table(dates)
```
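A header-only alternative might look like the minimal sketch below. It assumes the Bioconductor package affyio is installed and relies on its read.celfile.header() function, which with info = "full" returns a list that includes a ScanDate entry when the CEL file records one (whether it is populated can depend on the CEL file version). The helper names cel_scan_dates() and parse_scan_date() are hypothetical, and the two date layouts parsed here are assumptions about how scan dates typically appear in CEL headers, so it is worth checking a few raw values by hand before trusting the batches.

```r
# Minimal sketch, assuming the Bioconductor package affyio is installed.
# cel_scan_dates() and parse_scan_date() are hypothetical helper names.

# Parse scan-date strings into Dates. The two layouts handled here are
# assumptions: ISO-style "2011-05-03T10:23:45Z" and US-style
# "05/03/11 10:23:45". Strings matching neither should come back as NA.
parse_scan_date <- function(x) {
  x <- trimws(as.character(x))
  # strptime() ignores trailing characters, so the time part is dropped
  out <- as.Date(x, format = "%Y-%m-%d")
  mdy <- is.na(out)
  out[mdy] <- as.Date(x[mdy], format = "%m/%d/%y")
  out
}

# Read only the header of each CEL file, never the intensity matrix.
cel_scan_dates <- function(files) {
  raw <- vapply(files, function(f) {
    hdr <- affyio::read.celfile.header(f, info = "full")
    d <- hdr[["ScanDate"]]
    # ScanDate may be missing or empty depending on the CEL file version
    if (is.null(d) || length(d) == 0) NA_character_ else as.character(d[1])
  }, character(1), USE.NAMES = FALSE)
  dates <- parse_scan_date(raw)
  names(dates) <- basename(files)
  dates
}
```

If the headers parse as expected, `dates <- cel_scan_dates(celnames)` followed by `table(dates)` would replace the loop, and neither the pd.huex.1.0.st.v2 annotation package nor the intensity data should need to be loaded.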