This repository has been archived by the owner on Jun 21, 2023. It is now read-only.
Inconsistent merging in RNA-seq RDS files #221
Closed
Description
File(s)
What data file(s) does this issue pertain to?
pbta-gene-expression-rsem-fpkm.polya.rds
pbta-gene-expression-rsem-fpkm.stranded.rds
Release
What release are you using?
release-v5-20190924
Question/issue
There appears to be a problem with how the RSEM files were merged. In GENCODE v27, the actual gene ID-to-symbol mappings for these two genes are as follows (note that both gene names contain a colon):
ENSG00000168824.14 HGNC:18790
ENSG00000170091.10 HGNC:24955
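In the merged files, each of these genes ends up in two rows: one with the full ID and one with the ID cut off at the colon (e.g. ENSG00000168824.14_HGNC), with the expression values split between them. Examples of the inconsistent merging: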
stranded:
input.dat <- '~/Projects/OpenPBTA-analysis/data/pbta-gene-expression-rsem-fpkm.stranded.rds'
expr <- readRDS(input.dat)
head(expr[grep("HGNC", expr$gene_id),1:5])
gene_id BS_QWNBZ9RJ BS_FEPRNEXX BS_P39SQPTS BS_M8WP5T16
12451 ENSG00000168824.14_HGNC 86.73 20.43 79.11 2.28
12750 ENSG00000170091.10_HGNC 105.13 36.66 71.74 8.33
58348 ENSG00000168824.14_HGNC:18790 NA NA NA NA
58349 ENSG00000170091.10_HGNC:24955 NA NA NA NA
And for poly-A:
input.dat <- '~/Projects/OpenPBTA-analysis/data/pbta-gene-expression-rsem-fpkm.polya.rds'
expr <- readRDS(input.dat)
head(expr[grep("HGNC", expr$gene_id),1:5])
gene_id BS_R7NTZR4C BS_C83TK159 BS_PXCPK5XS BS_JB43XBCQ
12451 ENSG00000168824.14_HGNC NA NA 17.08 NA
12750 ENSG00000170091.10_HGNC NA NA 27.80 NA
58348 ENSG00000168824.14_HGNC:18790 1.80 56.39 NA 10.26
58349 ENSG00000170091.10_HGNC:24955 41.26 61.57 NA 25.85
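In the rows shown above, the two copies of each gene look complementary: a given sample has a value in one row and NA in the other. Assuming that holds across the whole matrix, and that the shorter ID is simply the full ID truncated at the colon, the duplicates could be collapsed downstream with something like the base-R sketch below. The function name coalesce_truncated_ids is hypothetical, and this only patches the merged RDS objects; the merge step that produced them would still need fixing. Rows that share an Ensembl ID but are not a truncated prefix of one another are left untouched, and a warning is raised if a sample has two different non-NA values for the same gene.

```r
# Hypothetical workaround: merge rows whose gene_id is a truncated copy of
# another gene_id with the same Ensembl ID (the part before the first "_").
coalesce_truncated_ids <- function(expr) {
  samples <- setdiff(names(expr), "gene_id")
  ensembl <- sub("_.*$", "", expr$gene_id)
  pieces <- lapply(split(seq_len(nrow(expr)), ensembl), function(idx) {
    ids <- expr$gene_id[idx]
    shortest <- ids[which.min(nchar(ids))]
    # Singletons, or IDs that are not truncations of each other: keep as-is
    if (length(idx) == 1 || !all(startsWith(ids, shortest))) {
      return(expr[idx, , drop = FALSE])
    }
    merged <- expr[idx[1], , drop = FALSE]
    merged$gene_id <- ids[which.max(nchar(ids))]  # keep the untruncated ID
    for (s in samples) {
      v <- expr[[s]][idx]
      v <- unique(v[!is.na(v)])
      if (length(v) > 1) {
        warning(sprintf("%s: conflicting values in %s", merged$gene_id, s))
      }
      merged[[s]] <- if (length(v) == 0) NA_real_ else v[1]
    }
    merged
  })
  out <- do.call(rbind, unname(pieces))
  rownames(out) <- NULL
  out
}

# Toy input built from two genes and two samples of the poly-A output above
expr <- data.frame(
  gene_id = c("ENSG00000168824.14_HGNC", "ENSG00000170091.10_HGNC",
              "ENSG00000168824.14_HGNC:18790", "ENSG00000170091.10_HGNC:24955"),
  BS_R7NTZR4C = c(NA, NA, 1.80, 41.26),
  BS_PXCPK5XS = c(17.08, 27.80, NA, NA),
  stringsAsFactors = FALSE
)
fixed <- coalesce_truncated_ids(expr)
print(fixed)
```

On the real objects this would be called as coalesce_truncated_ids(readRDS(input.dat)); the prefix check is there so that genes legitimately sharing an Ensembl ID with different suffixes are not collapsed by accident.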