Markers different in BPCells than in Seurat #45
Hi @mihem, We could also think about using the function you mention, but first we want to ensure that FindMarkers returns the same results on the same data with the same method, whether the data is a BPCells matrix or a dgCMatrix.
Thanks for the reproducible example here @mihem! A couple notes, mostly regarding your calculation of adjusted p-values and log2FC values based on the BPCells outputs
As for @Gesmira's questions, BPCells does have a (currently-unexported) rank_transform function (documented here), which returns offset ranks to preserve sparsity, so I'm not sure if that would work properly with
Hi Gesmira and Ben, thanks for your quick response and all your great work! Here's the updated version: https://osmzhlab.uni-muenster.de/nfs/bpcells/bpcells_debug.html
@Gesmira I am sorry, you were right: I hadn't updated for a few weeks and didn't include the session info. I updated to the most recent GitHub version, and now it works, or rather, as you say, the log2FC values work (and they are the same as when calculated with the sparse matrix!), but there are no p-values (which are essential imo). Also it's still worlds slower than
@bnprks Sorry, you are right, the unadjusted p-values are the same, and you are right that I used BH correction (because that's what it said in the tutorial). Since we are essentially double dipping (https://arxiv.org/abs/2207.00554), we should probably be stricter and use Bonferroni (or even better, use countsplit, but that's out of scope here). The difference we can still see is probably because in the BPCells version the correction is applied over ALL markers, and in the Seurat version only over the ones in cluster 13 (above the threshold). Which version do you think is correct?
Concerning logFC, there is a lot of debate about whether it's done correctly in Seurat @bnprks so the correct way seems to be first log, then mean. Is that the way BPCells does it?
based on the If yes, then a Seurat user could just use the BPCells function (since BPCells is a dependency anyway), or, as you say @Gesmira, integrate the rank_transform function into Seurat v5 and use FoldChange (double-checking that it's first log, then mean). Thanks!
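To make the adjusted p-value point above concrete, here is a minimal sketch of how the same raw p-values give different adjusted values depending on how many tests the correction covers. It is plain Python with made-up numbers, not the R code from the linked example. The helper names `bonferroni` and `benjamini_hochberg` are hypothetical, and it makes no claim about which set of tests Seurat or BPCells actually uses.

```python
def bonferroni(pvals):
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    n = len(pvals)
    return [min(1.0, p * n) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg (BH) adjusted p-values, returned in input order."""
    n = len(pvals)
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values stay monotone in the raw p-values.
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for steps_from_top, i in enumerate(order):
        rank = n - steps_from_top  # 1-based rank among ascending p-values
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five features (made-up numbers).
raw = [0.0001, 0.004, 0.03, 0.2, 0.7]

# Case A: correct over all five features.
all_bonf = bonferroni(raw)
all_bh = benjamini_hochberg(raw)

# Case B: keep only features that passed some filter (here the first three),
# then correct over that smaller set.
subset_bonf = bonferroni(raw[:3])
subset_bh = benjamini_hochberg(raw[:3])

for i in range(3):
    print(raw[i], all_bonf[i], subset_bonf[i], all_bh[i], subset_bh[i])
```

With either method, the adjusted values for the three shared features are larger when the correction runs over all five features than when it runs over the filtered three, even though the raw p-values are identical.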
Answering briefly since I think our discussion might be starting to drift from discussing bugs to discussing opinions on analysis strategies
I'm happy to answer any further questions about what BPCells actually calculates in
Thanks. Sorry, don't want to talk about analysis strategies.
I'm glad the performance is handy for you mihem -- I'll also mention the presto library which predates BPCells and uses much the same strategy to do fast Wilcoxon rank-sum tests with similarly good performance. Thanks for the references about mean-log ordering -- I'll have to think if there's a good way I can add a note to the docs to point towards helpful resources for end-users. I'll likely never be able to return a logFC column directly since BPCells doesn't have the metadata access to know if the data has already been log-normalized, but at least I can point people in the right direction for how to calculate it themselves.
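For readers following the mean-log ordering question, here is a small sketch of why the order matters. It is plain Python with made-up numbers. It assumes the inputs are normalized but not yet log-transformed, and that a pseudocount of 1 is used. The function names are hypothetical, and the sketch does not claim to reproduce what Seurat or BPCells compute.

```python
import math


def log2fc_mean_then_log(group, rest, pseudocount=1.0):
    """Average within each group first, then take log2 of each mean."""
    mean_group = sum(group) / len(group)
    mean_rest = sum(rest) / len(rest)
    return math.log2(mean_group + pseudocount) - math.log2(mean_rest + pseudocount)


def log2fc_log_then_mean(group, rest, pseudocount=1.0):
    """Take log2 of each cell's value first, then average within each group."""
    mean_log_group = sum(math.log2(x + pseudocount) for x in group) / len(group)
    mean_log_rest = sum(math.log2(x + pseudocount) for x in rest) / len(rest)
    return mean_log_group - mean_log_rest


# Hypothetical expression of one gene (made-up numbers): half of the cells in
# the cluster express it strongly, the others not at all.
cluster_cells = [0.0, 0.0, 15.0, 15.0]
other_cells = [1.0, 1.0, 1.0, 1.0]

fc_mean_first = log2fc_mean_then_log(cluster_cells, other_cells)
fc_log_first = log2fc_log_then_mean(cluster_cells, other_cells)

print(f"mean then log: {fc_mean_first:.3f}")
print(f"log then mean: {fc_log_first:.3f}")
```

The two orderings give clearly different values on this skewed toy gene, so it is worth confirming which ordering a tool uses, and whether its input is already log-normalized, before comparing log2FC columns across tools.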
Thanks. Yes, presto has been my preferred choice for identifying markers for the last few years. Sorry, I still don't really get why This is now mostly a question for the Seurat team again, I guess @Gesmira
would also be a great and super easy solution if it were mathematically correct and ideally gave the same results as the Seurat @samuel-marsh may also be interested in that since
Hi, just chiming in here on the scCustomize portion. scCustomize doesn't depend on presto (although it is compatible with presto outputs). The only functions which interact with DE results are I use presto as an example in the vignettes here, but that is just an example of a non-Seurat DE data.frame. So long as you can supply a column name to rank markers by, gene_column (if present; and Best,
Hi all,
In terms of FoldChange, our implementation of the BPCells marker_features ends up giving the same fold change results as running FindMarkers on an object with a default matrix type. Although we recognize there is some debate here, we are currently keeping the way we do FoldChange so as to be consistent with our prior code. However, our team is working on a new implementation of FindMarkers that should be faster for all data types, ideally by the Seurat 5 release.
Great. Thanks Gesmira and Ben, I think from a user perspective this is ideal! @Gesmira Unfortunately I cannot run my previous example because of issues with
Fixed now!
@bnprks Thank you so much for this awesome package. Using BPCells is worlds faster and worlds more memory efficient, which allows analyzing huge datasets. The integration by @Gesmira and others into Seurat is also great.
I struggle with finding the top markers. Unfortunately, so far it's not possible to use the BPCells Matrix in Seurat (satijalab/seurat#7516).
However, I saw that you also offer the marker_features function, which works directly on the BPCells Matrix and can be easily integrated into the Seurat workflow. It's worlds faster, however the results differ. P values may be different because of Bonferroni, but log2FC should be the same (or is there a mistake in my code?) and the order of the genes should be the same.
I created a reproducible example with parts of my own published dataset.
https://osmzhlab.uni-muenster.de/nfs/bpcells/bpcells_debug.html
Maybe @Gesmira can also help? I think this would be a very easy way for Seurat to integrate a very fast FindMarkers alternative that works on a BPCells Matrix. Thanks!