Description
Hi, my package uses GenomeInfoDb, and we use the seqlevelsStyle function to clean up user-supplied data and ensure consistent chromosome names (in our case, we go with NCBI style, which means stripping chr prefixes). I can see that what seems like a simple task gets complicated under the hood by the need to download the latest info from NCBI, Ensembl, and UCSC.
I found that .UCSC_cached_chrom_info and .NCBI_cached_chrom_info store the information seqlevelsStyle needs for the duration of a session, but an internet connection is still required at the start of every new session. This is a problem for offline users and for users on networks that, for whatever reason, block traffic to NCBI, UCSC, or Ensembl (yes, this is really happening). Since seqinfo is such a small amount of data, is there a plan to take advantage of R's support for caching user data to save this information and allow seqlevelsStyle to run offline? Or is there a safe workaround to supply the necessary seqinfo?
Here is my current workaround, but I'm concerned it could break with new GenomeInfoDb releases or if anything changes on the NCBI/UCSC/Ensembl server side.
# Get information for local caching (run once, with internet access)
library(GenomeInfoDb)
library(BSgenome)  # provides getBSgenome()
bsg = getBSgenome("hg19")
seqlevelsStyle(bsg) = "NCBI"  # fills the internal caches as a side effect
ucsc_info = getFromNamespace(".UCSC_cached_chrom_info", "GenomeInfoDb")[["hg19"]]
ucsc_info = GenomeInfoDb:::.add_ensembl_column(ucsc_info, "hg19")
ncbi_info = getFromNamespace(".NCBI_cached_chrom_info", "GenomeInfoDb")[["GCF_000001405.25"]]
saveRDS(ncbi_info, "hg19_ncbi_seqinfo_for_GenomeInfoDb.rds")
saveRDS(ucsc_info, "hg19_ucsc_seqinfo_for_GenomeInfoDb.rds")
# Later, in new (offline) R session
ucsc_info = readRDS("hg19_ucsc_seqinfo_for_GenomeInfoDb.rds")
ncbi_info = readRDS("hg19_ncbi_seqinfo_for_GenomeInfoDb.rds")
assign('hg19', ucsc_info, envir = get(".UCSC_cached_chrom_info", envir = asNamespace('GenomeInfoDb')))
assign('GCF_000001405.25', ncbi_info, envir = get(".NCBI_cached_chrom_info", envir = asNamespace('GenomeInfoDb')))
# seqlevelsStyle now works offline
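For reference, the kind of persistent caching I have in mind could look roughly like the sketch below. It relies on tools::R_user_dir() (available in R >= 4.0) to pick a per-user cache directory. The helper names and the "myPackage" label are hypothetical placeholders, and none of this is part of GenomeInfoDb.

```r
# Minimal sketch of a persistent per-user cache for chromosome info.
# Assumptions: "myPackage" is a placeholder package name, and
# cache_file(), save_chrom_info(), load_chrom_info() are hypothetical helpers.

# Build the path of a cache file, creating the cache directory if needed.
cache_file <- function(name, pkg = "myPackage") {
  dir <- tools::R_user_dir(pkg, which = "cache")
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  file.path(dir, paste0(name, ".rds"))
}

# Persist an object (e.g. the chrom info data frame) under a given name.
save_chrom_info <- function(info, name) {
  saveRDS(info, cache_file(name))
}

# Return the cached object, or NULL if nothing has been cached yet.
load_chrom_info <- function(name) {
  path <- cache_file(name)
  if (file.exists(path)) readRDS(path) else NULL
}
```

With something like this, the online session would call save_chrom_info(ucsc_info, "hg19_ucsc"), and an offline session would call load_chrom_info("hg19_ucsc") and fall back to downloading only when it returns NULL.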