Choice of gene sets for cell cycle scoring #78

Open
fbnrst opened this issue Oct 8, 2021 · 1 comment

Comments

fbnrst commented Oct 8, 2021

I recently realized that the gene sets used in the tutorial differ quite a bit from the gene sets in Seurat's cc.genes (https://satijalab.org/seurat/reference/cc.genes.html). For a dataset that I cannot show because it is not published yet, I get very different results depending on the gene set, and the results obtained with the cc.genes gene set look much more convincing.

Unfortunately, I do not have the time to test whether there is a difference in the dataset that you looked at. But maybe there should be some words on the importance of the choice of the gene sets in the tutorial.
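One way to put a number on how much two gene lists differ is to compare them as sets. The following is a minimal sketch, not code from the tutorial: the helper name `compare_gene_sets` and the gene names are placeholders, and matching names case-insensitively is an assumption that only roughly bridges mouse-style and human-style symbols.

```python
def compare_gene_sets(genes_a, genes_b):
    """Summarise the overlap between two gene lists.

    Names are upper-cased before comparison (an assumption made here so that
    e.g. 'Pcna' and 'PCNA' count as the same gene).
    """
    a = {g.upper() for g in genes_a}
    b = {g.upper() for g in genes_b}
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Jaccard index: size of the intersection divided by size of the union
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Placeholder gene names, not taken from either published gene set
list_a = ["GeneA", "GeneB", "GeneC"]
list_b = ["geneb", "GeneC", "GeneD"]

result = compare_gene_sets(list_a, list_b)
print(result["shared"], result["jaccard"])
```

A low Jaccard index between the S-phase lists (or between the G2/M lists) of two sources would be a first hint that the phase assignments may diverge as well.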

Contributor

LuckyMD commented Oct 8, 2021

Hey @fabianrost84,
Thanks for highlighting this. We're just looking into a large update of this tutorial and will make sure to include this note.

For context: there are two widely used gene sets for CC scoring, one from Macosko et al. (used here) and one from Tirosh et al. (used in the Seurat and Scanpy tutorials). I've used both, and since publishing I have noticed that the Tirosh gene set works better in some cases. I have also found that introducing an offset between the S and G2M scores can make the Macosko et al. gene set work better again, but that's not done anywhere yet, so the Tirosh gene set might just be more useful.
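The comment does not say how such an offset would be applied, so the following is only one possible reading, sketched under stated assumptions: the function `assign_phase` and its `s_offset` parameter are hypothetical, the base rule (G1 when both scores are negative, otherwise the phase with the higher score) follows the usual score-based assignment, and the offset is added to the S score only for the S versus G2M comparison.

```python
def assign_phase(s_score, g2m_score, s_offset=0.0):
    """Assign a cell-cycle phase from an S score and a G2M score.

    s_offset is a hypothetical shift applied to the S score when it is
    compared against the G2M score; 0.0 gives the plain rule.
    """
    # Cells with both scores below zero are treated as not cycling (G1)
    if s_score < 0 and g2m_score < 0:
        return "G1"
    # Otherwise pick the phase with the higher (offset-adjusted) score
    return "G2M" if g2m_score > s_score + s_offset else "S"


# Made-up (S score, G2M score) pairs for illustration only
scores = [(0.30, 0.10), (0.10, 0.15), (-0.20, -0.10)]

plain = [assign_phase(s, g) for s, g in scores]
shifted = [assign_phase(s, g, s_offset=0.1) for s, g in scores]
print(plain)    # ['S', 'G2M', 'G1']
print(shifted)  # ['S', 'S', 'G1']
```

In this toy input the second cell moves from G2M to S once the offset is applied, which illustrates how a systematic bias between the two scores could shift phase calls. A suitable offset value would have to be chosen per dataset and gene set.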

In case someone comes here looking for the Tirosh gene set (formatted to mouse gene names), here is the file:
Tirosh_cell_cycle_genes_mouse.txt

The notebook code would just have to be replaced by the relevant code in the scanpy tutorial:

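# cell_cycle_genes is assumed to be the gene list read from the file above, S-phase genes first
s_genes = cell_cycle_genes[:43]
g2m_genes = cell_cycle_genes[43:]
# keep only the genes that are present in the dataset
cell_cycle_genes = [x for x in cell_cycle_genes if x in adata.var_names]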
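The snippet above can be wrapped into a small helper that also restricts the S and G2/M lists themselves to genes found in the data. This is a sketch, not the tutorial's code: `split_cell_cycle_genes` is a hypothetical name, the gene names are placeholders, and the split position (43 in the snippet above) is assumed to match the ordering of the gene file.

```python
def split_cell_cycle_genes(genes, var_names, n_s_genes=43):
    """Split an ordered gene list into S and G2/M genes present in the data.

    Assumes the first n_s_genes entries of `genes` are S-phase genes and the
    remainder are G2/M genes, as in the slicing shown above.
    """
    present = set(var_names)
    s_genes = [g for g in genes[:n_s_genes] if g in present]
    g2m_genes = [g for g in genes[n_s_genes:] if g in present]
    return s_genes, g2m_genes


# Toy input with placeholder names; the first three entries play the S-phase role
genes = ["S1", "S2", "S3", "M1", "M2"]
var_names = ["S1", "S3", "M2", "Other"]

s_genes, g2m_genes = split_cell_cycle_genes(genes, var_names, n_s_genes=3)
print(s_genes, g2m_genes)  # ['S1', 'S3'] ['M2']
```

With a real dataset, `var_names` would be `adata.var_names`, and the two lists can then be passed to Scanpy's `sc.tl.score_genes_cell_cycle(adata, s_genes=s_genes, g2m_genes=g2m_genes)`. Filtering beforehand makes it explicit how many genes of each set actually contribute to the scores.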
