Skip to content

Fix t-tests when variance is zero #621

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 28, 2019
Merged

Conversation

ivirshup
Copy link
Member

Just a quick fix, should probably replace most of this code with scipy.stats.ttest_ind/ scipy.stats.stats.ttest_ind_from_stats with equal_var=False.

@ivirshup
Copy link
Member Author

#620

@ivirshup
Copy link
Member Author

Someone should review this before it's merged. I think this won't cause any problems, but I'm also not too familiar with the DE code.

@LuckyMD
Copy link
Contributor

LuckyMD commented Apr 26, 2019

I wonder why this wasn't done in the first place. Is scipy not already a dependency of scanpy? Or is this slower than the initial implementation?

@falexwolf
Copy link
Member

Looks great! I wasn't aware of this high-dimensional version of a t-test in Scipy, which seems to be as efficient as the current implementation. I only investigated thoroughly for Wilcoxon rank and found that Scipy doesn't have a scalable version to offer.

But yes, this will get merged after 1.4.1.

@falexwolf
Copy link
Member

1.4.1 is out, are we sure this is as scalable as it was before and not a backwards breaking change. If yes, we can merge immediately.

@ivirshup
Copy link
Member Author

ivirshup commented Apr 27, 2019

I'm not completely sure this doesn't break anything, but the regression tests pass. The internal code is very similar, so I'm not too worried about these changes.

It does look like it's (very) slightly slower. Running this a thousand times for pbmc68k dataset took ~2.3% longer (about 1.4 ms per run) than the previous version. That said, we're very inefficient about mean and variance calculation, so I think that's a better place to optimize.

Edit: I've force pushed to fix some minor formatting issues (trailing white space, blank line, typo) that I didn't think deserved it's own commit.

@falexwolf
Copy link
Member

Great! 2.3% is nothing...

@falexwolf falexwolf merged commit 0712168 into scverse:master Apr 28, 2019
awnimo pushed a commit to dpeerlab/scanpy that referenced this pull request Dec 17, 2019
Fix t-tests when variance is zero
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants