Generally high tumor proportion from TCGA data #28
Thank you for your feedback. A few potential reasons are as follows.

First, the fraction inferred by BayesPrism represents the fraction of reads (rather than the cell count) of each cell type.

Second, when non-tumor cells in the reference are too few, they will have a sparser representation than tumor cells, so reads in the bulk will be assigned to tumor for those genes with zero expression in non-tumor cells. We also observed similar effects in T cells of GBM (see Supplementary Fig. 1e of our BayesPrism paper). Under such circumstances, although the absolute fraction will be underestimated for cell types with too few cells, the relative fractions are still accurate. We recommend that users represent each cell type with a sufficient number of cells, say >20 or even >50.

Third, the cause might be the high granularity of the cell type definitions in your reference. In one spatial transcriptomics dataset we tested, when the reference cell types were too similar/co-linear, the quality of the reference, e.g. the number of cells representing each cell type, had a larger impact, causing some cell types to be estimated close to zero (due to the weak/sparse prior). In fact, high co-linearity will also make linear regression unstable (higher standard errors in the regression coefficients). If that is the case, users may merge the cell types to a granularity of higher confidence, or simply treat them as cell states, which will be summed up by BayesPrism.

I hope this clarifies things. Let me know if you have any other questions.

Best,
Tinyi
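The two practical suggestions in the reply (checking that each reference cell type has enough cells, and merging overly similar cell types or states by summing their fractions) can be sketched outside BayesPrism in a few lines of Python. This is only an illustration under stated assumptions: the function names `sparse_cell_types` and `merge_fractions`, the labels, and all numbers are hypothetical and are not part of the BayesPrism API. The threshold of 20 is taken from the ">20 or even >50" guidance above.

```python
from collections import Counter

def sparse_cell_types(labels, min_cells=20):
    """Return {cell_type: n_cells} for reference cell types with fewer than min_cells cells."""
    counts = Counter(labels)
    return {ct: n for ct, n in counts.items() if n < min_cells}

def merge_fractions(fractions, state_to_type):
    """Sum the fractions of fine-grained states into coarser cell types.

    States missing from state_to_type are kept under their own name.
    """
    merged = {}
    for state, frac in fractions.items():
        cell_type = state_to_type.get(state, state)
        merged[cell_type] = merged.get(cell_type, 0.0) + frac
    return merged

# Hypothetical per-cell labels from a single-cell reference.
labels = (["Tumor"] * 500 + ["CD8 T naive"] * 12
          + ["CD8 T exhausted"] * 15 + ["Fibroblast"] * 60)
too_few = sparse_cell_types(labels, min_cells=20)
print(too_few)

# Hypothetical deconvolution output for one bulk sample.
sample = {"Tumor": 0.80, "CD8 T naive": 0.01,
          "CD8 T exhausted": 0.03, "Fibroblast": 0.16}
state_to_type = {"CD8 T naive": "CD8 T", "CD8 T exhausted": "CD8 T"}
merged = merge_fractions(sample, state_to_type)
print(merged)
```

Summing after the fact only mirrors the "treat them as cell states" idea; within BayesPrism itself the merging would be done by relabeling the reference before running the deconvolution.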
Hi, again.
I was able to solve issues with running BayesPrism thanks to your help.
I have now been using both CIBERSORTx and BayesPrism to analyze various TCGA datasets with a single-cell reference matrix of my own.
The most striking difference between the tools is that BayesPrism returns a very high proportion of tumor cells (70-90%), while CIBERSORTx usually gives 20-30% for the same samples and the same single-cell reference.
I have tried to re-scale the non-tumor cells by removing the tumor proportion and rescaling each sample's remaining proportions to sum to 1. However, with other CD45- cell types such as fibroblasts and endothelial cells present, I was unable to retrieve immune cell proportions: most immune cell types came out at around 1e-6 to 1e-3. I could have removed all CD45- cell types, but with such low proportions of CD45+ cell types there was too much fluctuation between samples.
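For reference, the re-scaling described above amounts to dropping the unwanted cell types and dividing the remaining fractions by their sum. A minimal sketch follows; the function name `renormalize_without` and the example numbers are hypothetical, chosen only to illustrate the arithmetic, and are not actual BayesPrism or CIBERSORTx output.

```python
def renormalize_without(fractions, drop):
    """Drop the cell types in `drop` and rescale the rest to sum to 1."""
    kept = {ct: f for ct, f in fractions.items() if ct not in drop}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("remaining fractions sum to zero; cannot rescale")
    return {ct: f / total for ct, f in kept.items()}

# Hypothetical fractions for one bulk sample.
sample = {"Tumor": 0.80, "Fibroblast": 0.12, "Endothelial": 0.05,
          "T cell": 0.02, "B cell": 0.01}

# Remove tumor only: immune types stay small next to the CD45- stroma.
non_tumor = renormalize_without(sample, {"Tumor"})

# Remove tumor and CD45- types: immune fractions are rescaled among themselves.
immune = renormalize_without(sample, {"Tumor", "Fibroblast", "Endothelial"})

print(non_tumor)
print(immune)
```

Because the divisor is the small sum of the remaining fractions (0.03 in the second call), any noise in those estimates is scaled up by the same factor, which may be one contributor to the sample-to-sample fluctuation mentioned above.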
While the actual tumor cell proportion may vary between samples and tumor types, I would expect it to be neither as high as ~80% nor as low as ~25%. In your paper I observed a similar pattern of high tumor cell proportions. I am curious how you would interpret such a wide range of tumor cell proportion estimates across different deconvolution tools.