@kschuerholt
#225
Implemented the confidence interval (CI) from 'Calculating confidence intervals for some non-parametric analyses', Campbell and Gardner 1988. The CI style is adapted from ttest. The same publication also offers a solution for wilcoxon, which is not yet implemented but could be added fairly easily.

@raphaelvallat raphaelvallat linked an issue Jan 22, 2022 that may be closed by this pull request
@raphaelvallat raphaelvallat self-requested a review January 22, 2022 01:55
@raphaelvallat raphaelvallat added the feature request 🚧 New feature or request label Jan 22, 2022
@codecov
codecov bot commented Jan 22, 2022

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.00%. Comparing base (b1c334d) to head (f31b2e5).
⚠️ Report is 74 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #226   +/-   ##
=======================================
  Coverage   98.99%   99.00%           
=======================================
  Files          19       19           
  Lines        3290     3304   +14     
  Branches      527      531    +4     
=======================================
+ Hits         3257     3271   +14     
  Misses         17       17           
  Partials       16       16           

conf = confidence
N = scipy.stats.norm.ppf(conf)
ct1, ct2 = len(x),len(y) # count samples
diffs = sorted([i-j for i in x for j in y]) # get ct1xct2 difference
Owner

@kschuerholt could we use a numpy function / numpy broadcasting here to avoid the nested for loop?

Author

Sure, that's easy enough. I'll add it in a new commit promptly.
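
For reference, a minimal sketch of what the broadcasting version could look like. The sample values are hypothetical, and this is only one possible way to replace the nested loop, not the commit that was eventually pushed.

```python
import numpy as np

x = np.array([1.0, 4.0, 6.0])  # hypothetical sample values
y = np.array([2.0, 3.0])       # hypothetical sample values

# x[:, None] has shape (3, 1); subtracting y (shape (2,)) broadcasts to
# shape (3, 2), where entry [i, j] holds x[i] - y[j].
diffs = np.sort((x[:, None] - y).ravel())

# The nested list comprehension from the diff, for comparison.
loop_diffs = sorted([i - j for i in x for j in y])
```

Both approaches produce the same sorted differences; the broadcasting version builds the full `len(x)` by `len(y)` array in one vectorized step.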

MWU   97.0   two-sided  0.00556  0.515  0.2425
>>> pg.mwu(x, y, alternative='two-sided',confidence=0.95)
     U-val alternative    p-val    RBC    CLES                                        CI95%
MWU   97.0   two-sided  0.00556  0.515  0.2425  [-0.39290395101879694, -0.09400270319896187]
Owner

Is this the actual output that you get? The CI should normally be rounded to two decimals by the _postprocess_dataframe function.

Author

That's the actual output I get. I was wondering about that, too. But then again, the t-test also gives me full floats (at least when confidence!=0.95), so I thought that was intentional.
I can of course round it in MWU, or do you want to address that elsewhere?

Author

Here's an example of the t-test showing that behavior.
[Image attachment: grafik]
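
As an illustration of the rounding being discussed, a small sketch using the CI values from the MWU output quoted earlier in this thread. It only shows two-decimal rounding with NumPy; it does not reproduce what `_postprocess_dataframe` does internally, which the thread does not show.

```python
import numpy as np

# CI limits copied from the MWU output quoted in this thread.
ci = np.array([-0.39290395101879694, -0.09400270319896187])

# Round both limits to two decimals.
ci_rounded = np.round(ci, 2)
print(ci_rounded)  # [-0.39 -0.09]
```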

Association and the American Statistical Association, 25(2),
101–132. https://doi.org/10.2307/1165329
.. [5] Campbell, M. J. & Gardner, M. J. (1988). Calculating confidence
Owner

Could you add a one-line explanation of the CI method to the "Notes" section?

Author

Sure, I'll give that a go. Like I said, I'm not a statistician, so that'll have to be proofread by someone.
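
To make the method concrete, here is a minimal sketch of the rank-based interval that the snippet under review appears to implement. Several things are assumptions and not taken from the PR: the function name `mwu_ci_sketch` is hypothetical, the limits are taken as the K-th smallest and K-th largest pairwise differences (my reading of the cited paper, not something this thread shows), and the normal quantile is evaluated at `1 - (1 - confidence) / 2` for a two-sided interval, whereas the quoted snippet uses `norm.ppf(conf)`. The standard-library `NormalDist` stands in for `scipy.stats.norm.ppf` to keep the sketch light on dependencies.

```python
import math
from statistics import NormalDist

import numpy as np


def mwu_ci_sketch(x, y, confidence=0.95):
    """Approximate CI from the sorted pairwise differences x_i - y_j.

    Hypothetical helper for illustration only, not Pingouin's API.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_x, n_y = x.size, y.size

    # Assumption: two-sided interval, so use the 1 - alpha / 2 quantile.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

    # All n_x * n_y pairwise differences, sorted in ascending order.
    diffs = np.sort((x[:, None] - y).ravel())

    # Rank K from the normal approximation, rounded to the nearest integer.
    spread = math.sqrt(n_x * n_y * (n_x + n_y + 1) / 12)
    k = int(round(n_x * n_y / 2 - z * spread))
    if k < 1:
        raise ValueError("samples too small for this approximation")

    # K-th smallest and K-th largest differences (1-based ranks).
    return diffs[k - 1], diffs[n_x * n_y - k]
```

Calling `mwu_ci_sketch(x, y, confidence=0.95)` returns a `(lower, upper)` pair. Someone with a statistics background should still check the quantile choice against the original publication before anything like this goes into the docstring.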

N = scipy.stats.norm.ppf(conf)
ct1, ct2 = len(x),len(y) # count samples
diffs = sorted([i-j for i in x for j in y]) # get ct1xct2 difference
k = int(round(ct1*ct2/2 - (N * (ct1*ct2*(ct1+ct2+1)/12)**0.5)))
Owner

Please make sure that the code follows the flake8 guidelines, i.e. there must be whitespace around arithmetic operators.

Author

Sorry about that. I was editing the file on the fly directly in GitHub, and unfortunately there is no auto linting/formatting there yet. The next commit will be formatted accordingly.

@raphaelvallat raphaelvallat mentioned this pull request Feb 20, 2022
18 tasks
@raphaelvallat
Owner

Hi @kschuerholt,

FYI, I have just published a minor release of Pingouin (https://github.com/raphaelvallat/pingouin/releases/tag/v0.5.1) to fix some urgent dependency bugs. Could you please make sure to update the PR to the new master and resolve any conflicts that may arise?

Thank you,
Raphael

@kschuerholt
Author

Hi @raphaelvallat

Thanks for the heads-up. It's still on the todo list, but currently other things have to come first. I'm trying to get hold of an original source for CI computation of nonparametric tests. Or did you find something?

Cheers,
Konstantin

@raphaelvallat raphaelvallat mentioned this pull request Jun 18, 2022
11 tasks
@raphaelvallat
Owner

Hi,

This PR has been inactive for several years so I'll go ahead and close it. Feel free to re-open.

Raphael

Labels

feature request 🚧 New feature or request

Development

Successfully merging this pull request may close these issues.

Return Confidence Interval for nonparametric Mann Whitney U Test

2 participants