Grab protocol date from CEL file #13

@smithjessk

Description

Some GEO data sets don't tell you explicitly what the batch identifier is (for example GSE46691, which we've discussed), but you can use the date value found in the CEL files instead. In R's oligo package, you can extract this information with the protocolData() function, but it's annoyingly slow to run because it requires loading the entire expression matrix first. I wonder if it would be easy to write an R function that grabs the protocol data without loading everything else. The following loop is the code I'm working on right now, and it takes way too long to run (~1.5 hours) for such a simple task.

setwd("/scratch/lfs/tgerke/geoData")

library(oligo)
library(pd.huex.1.0.st.v2)

# this section will gather the dates of each sample run as a batch identifier
celnames <- list.celfiles("GSE46691", full.names = TRUE)
dates <- as.Date(rep(NA, length(celnames)))
for (i in seq_along(celnames)) {
    # read.celfiles() loads the whole file, which is the slow step
    x <- read.celfiles(celnames[i])
    dates[i] <- as.Date(protocolData(x)$dates)
}
table(dates)
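
One possible direction, as a rough sketch rather than a tested solution: read only the header of each CEL file instead of the whole thing. This assumes the Bioconductor package affyio is installed, that its read.celfile.header() can read these particular files, and that the header it returns carries a ScanDate field. The helper names parse_scan_date() and cel_scan_dates() are made up for illustration, and the two date formats handled are assumptions about what the header might contain.

```r
# Hypothetical helper: convert a ScanDate string to a Date.
# Assumes one of two formats: ISO 8601 ("2010-03-04T18:32:05Z")
# or US style with a two-digit year ("03/04/10 18:32:05").
# Anything else is returned as NA.
parse_scan_date <- function(x) {
  x <- as.character(x)
  out <- as.Date(rep(NA_character_, length(x)))
  iso <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}", x)
  us <- grepl("^[0-9]{2}/[0-9]{2}/[0-9]{2}", x)
  out[iso] <- as.Date(substr(x[iso], 1, 10), format = "%Y-%m-%d")
  out[us] <- as.Date(substr(x[us], 1, 8), format = "%m/%d/%y")
  out
}

# Hypothetical helper: scan dates for a vector of CEL file paths.
# Only the header of each file is read (assumption: affyio supports
# the file format and exposes ScanDate when info = "full").
cel_scan_dates <- function(files) {
  raw <- vapply(files, function(f) {
    d <- affyio::read.celfile.header(f, info = "full")[["ScanDate"]]
    if (is.null(d) || length(d) == 0) NA_character_ else as.character(d)[1]
  }, character(1))
  parse_scan_date(raw)
}

# Intended usage (not run here):
# celnames <- oligo::list.celfiles("GSE46691", full.names = TRUE)
# table(cel_scan_dates(celnames))
```

If the header route works for these files, the result should match what protocolData() reports, so it would be worth comparing the two on a handful of samples before trusting it for the whole series.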
