
how to get log(TPM+1) values #44

Closed
@sunta3iouxos


Thank you for this tool.
I am new to TCGA data, but I want to do some analysis and would like to download TPM-normalized values so that I can combine them with my own RNA-seq data. For what I need (gene set scoring with GSVA or singscore), I think TPM is more appropriate than percentile ranking.
Following some tutorials, I got values that look scaled rather than TPM-normalized.
Is there a way to get log(TPM+1) values with UCSCXenaTools?
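For reference, log2(TPM + 1) is normally computed from raw counts per sample: divide each gene's count by its length in kilobases, rescale each sample so these values sum to 10^6, then take log2(x + 1). A minimal R sketch, assuming a raw count matrix (genes x samples) and matching gene lengths are available; `counts_to_log2_tpm1` is a hypothetical helper, not part of UCSCXenaTools:

```r
# Hypothetical helper: raw counts (genes x samples) -> log2(TPM + 1).
# Assumes `gene_length` is in base pairs, in the same order as rows of `counts`.
counts_to_log2_tpm1 <- function(counts, gene_length) {
  stopifnot(nrow(counts) == length(gene_length))
  # Reads per kilobase: each row i is divided by gene_length[i] / 1000
  rpk <- counts / (gene_length / 1000)
  # Rescale every sample (column) so its values sum to one million
  tpm <- sweep(rpk, 2, colSums(rpk), "/") * 1e6
  log2(tpm + 1)
}

# Toy example with made-up counts and lengths
counts <- matrix(c(10, 20, 30, 5, 5, 90), nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))
gene_len <- c(1000, 2000, 500)
log_tpm <- counts_to_log2_tpm1(counts, gene_len)
```

Note that this only applies to raw counts; already normalized or centered matrices cannot be turned into TPM this way.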
This is the code I used (taken from https://github.com/XSLiuLab/tumor-immunogenicity-score):

```r
library(UCSCXenaTools)
library(dplyr)

# Select datasets hosted on the TCGA hub
xe <- XenaGenerate(subset = XenaHostNames == "tcgaHub")
xe %>% XenaFilter(filterDatasets = "clinical") -> xe_clinical
xe %>% XenaFilter(filterDatasets = "HiSeqV2_PANCAN$") -> xe_rna_pancan

# Create data queries and download them.
# (R Markdown chunk "download_xena_pancan", eval=FALSE: shown but not run)
xe_clinical.query <- XenaQuery(xe_clinical)
xe_clinical.download <- XenaDownload(xe_clinical.query,
  destdir = "UCSC_Xena/TCGA/Clinical", trans_slash = TRUE, force = TRUE
)

xe_rna_pancan.query <- XenaQuery(xe_rna_pancan)
xe_rna_pancan.download <- XenaDownload(xe_rna_pancan.query,
  destdir = "UCSC_Xena/TCGA/RNAseq_Pancan", trans_slash = TRUE
)

# (R Markdown chunk "hide_download_pancan", include=FALSE: run silently,
#  downloading only if the UCSC_Xena folder does not exist yet)
if (!dir.exists("UCSC_Xena")) {
  xe_clinical.query <- XenaQuery(xe_clinical)
  xe_clinical.download <- XenaDownload(xe_clinical.query,
    destdir = "UCSC_Xena/TCGA/Clinical", trans_slash = TRUE
  )

  xe_rna_pancan.query <- XenaQuery(xe_rna_pancan)
  xe_rna_pancan.download <- XenaDownload(xe_rna_pancan.query,
    destdir = "UCSC_Xena/TCGA/RNAseq_Pancan", trans_slash = TRUE
  )
}
```
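Some expression matrices are stored on a log2 scale with a pseudocount other than 1 (for example, a unit listed as log2(tpm+0.001); check the unit shown for the dataset in the Xena browser). Assuming the original pseudocount is known, such values can be moved onto the log2(TPM + 1) scale. A minimal sketch with a hypothetical helper `change_log2_offset`; it only changes the pseudocount and does not turn non-TPM values (such as pan-cancer normalized data) into TPM:

```r
# Hypothetical helper: re-express log2(x + offset_in) values as log2(x + offset_out).
# Assumes offset_in is known from the dataset's metadata (0.001 is only an example).
change_log2_offset <- function(log_values, offset_in = 0.001, offset_out = 1) {
  linear <- 2^log_values - offset_in  # undo the log and the input pseudocount
  linear[linear < 0] <- 0             # clip tiny negatives caused by rounding
  log2(linear + offset_out)
}

# Toy example: TPM values 0, 1 and 15 stored as log2(TPM + 0.001)
stored <- log2(c(0, 1, 15) + 0.001)
converted <- change_log2_offset(stored)
```

The function works element-wise, so it can be applied to a whole genes x samples matrix.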

The author of the code mentions:
The RNASeq data we downloaded are pancan normalized. For comparing data within an independent cohort (like TCGA-LUAD), we recommend using the "gene expression RNAseq" dataset. For questions regarding the gene expression of this particular cohort in relation to other tumor types, you can use the pancan normalized version of the "gene expression RNAseq" data. For comparing with data outside TCGA, we recommend using the percentile version if the non-TCGA data is normalized by percentile ranking. For more information, please see our Data FAQ: [here](https://docs.google.com/document/d/1q-7Tkzd7pci4Rz-_IswASRMRzYrbgx1FTTfAWOyHbmk/edit?usp=sharing)
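For the percentile option mentioned in the note, here is a minimal sketch of within-sample percentile ranking. `to_percentile_rank` is a hypothetical helper; scaling ranks to (0, 100] and averaging ties are assumptions, not necessarily how Xena builds its percentile datasets:

```r
# Hypothetical helper: convert each sample (column) of a genes x samples matrix
# to within-sample percentile ranks in (0, 100]; tied values share their average rank.
to_percentile_rank <- function(expr) {
  apply(expr, 2, function(x) 100 * rank(x, ties.method = "average") / length(x))
}

# Toy example: 3 genes x 2 samples
expr <- matrix(c(5, 1, 3, 2, 8, 8), nrow = 3)
pr <- to_percentile_rank(expr)
```

Because this is a monotone transform within each sample, it keeps the gene order inside every sample, which is what rank-based scores such as singscore rely on.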

Do you have any recommendations on this?
Theodoros
