Implemented credibility_interval() #188
base: master
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

@@ Coverage Diff @@
##            master      #188   +/-  ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           36        36
  Lines         3058      3127    +69
=========================================
+ Hits          3058      3127    +69

☔ View full report in Codecov by Sentry.
Excellent work @Stefan-Heimersheim. The changes are only cosmetic/refactoring, so this should be good to go soon. Very thought provoking.

The sorting breaks with merged data sets, as in #189. Edit: This does not seem to be an issue anymore, it works.
Re CI/lint flake8: not really sure what's causing flake8 to report an error here. Otherwise this should be good to go; only the quantile treatment isn't merged between `quantile` and `credibility_interval`.
I've unified the CDF functions between quantile and credibility_interval. My weighted ECDF was slightly different from yours (defined so that the min sample has CDF 0, the max sample has CDF 1, and the intermediate points take a symmetric average of the weights on either side). This coincides with the unweighted form and is symmetric, but I also have some ideas on how to extend it so that the bounds are appropriately slightly larger than the min and max.
I also added some tests so that the coverage is 100%.
If you're happy with these changes please squash and merge.
Best,
Will
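For concreteness, here is a minimal sketch of the symmetric weighted ECDF described above (the function name and structure are illustrative, not the exact code merged into anesthetic):

```python
import numpy as np
from scipy.interpolate import interp1d


def symmetric_wecdf(data, weights):
    """Weighted ECDF with CDF(min)=0 and CDF(max)=1; intermediate points
    accumulate the symmetric average of the weights on either side."""
    i = np.argsort(data)
    data, weights = data[i], weights[i]
    cdf = np.concatenate([[0.], weights[1:] + weights[:-1]]).cumsum()
    cdf /= cdf[-1]                      # normalise so the last point is exactly 1
    return interp1d(data, cdf)
```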
Excellent plot! I've done some investigating, and these really are second order effects. What we should be most concerned about is whether cdf(b)-cdf(a) approximates the true probability mass between a and b.

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.stats import norm
import matplotlib.pyplot as plt

# Generate a dataset
np.random.seed(3)
data = np.random.randn(30)
weights = np.random.rand(30)
i = np.argsort(data)
data = data[i]
weights = weights[i]/weights.sum()

# Symmetric empirical CDF
w_ = np.concatenate([[0.], weights[1:] + weights[:-1]]).cumsum()
w_ /= w_[-1]
sym = interp1d(data, w_)

# Left hand empirical CDF
lh = interp1d(data, np.cumsum(weights))

# Right hand empirical CDF
rh = interp1d(data, 1 - np.cumsum(weights[::-1])[::-1])

plt.plot(data, sym(data), label='Symmetric')
plt.plot(data, lh(data), label='Left Hand')
plt.plot(data, rh(data), label='Right Hand')
plt.plot(data, norm.cdf(data), label='True CDF')
plt.xlabel('x')
plt.ylabel('CDF')
plt.tight_layout()
plt.savefig('CDF.png')
plt.close()
```

So we see that all perform pretty equivalently relative to the correct answer. Increasing the number of samples improves performance, but the three empirical curves lie on top of one another; it is the difference from the true answer that enters at lower order. Examining histograms of the performance relative to the true answer shows them all to perform in an unbiased way:

```python
W, SYM, LH, RH, T = [], [], [], [], []
for _ in range(10000):
    a, b = np.sort(np.random.uniform(data.min(), data.max(), size=2))
    w = weights[(data < b) & (data > a)].sum()
    W.append(w)
    T.append(norm.cdf(b) - norm.cdf(a))
    SYM.append(sym(b) - sym(a))
    LH.append(lh(b) - lh(a))
    RH.append(rh(b) - rh(a))
W, SYM, LH, RH, T = np.array([W, SYM, LH, RH, T])

plt.hist(W - T, bins=100)
plt.hist(SYM - T, bins=100)
plt.hist(LH - T, bins=100, alpha=0.8)
plt.hist(RH - T, bins=100, alpha=0.8)
plt.tight_layout()
plt.savefig('hist.png')
```

Increasing to 1000 points just reduces the spread, but not the difference between the methods. Unless you can find a stress test which shows any of these to be better or worse than the others, I'd vote for the symmetric one.
…`method` in `credibility_interval` method
Thanks @lukashergt for implementing the MultiIndex! That's everything done; as you suggested I'll take a look at the code tonight and we can get this merged.
I think this is fantastic, thanks for implementing the extension! The only unintuitive case is calling it for a single variable: you have to do
rather than
but our other functions like
@williamjameshandley, since @Stefan-Heimersheim and I have both added functionality here, would you like to review?
anesthetic/utils.py (outdated):

```python
# Set the last element (tested to be approx 1)
# to exactly 1 to avoid interpolation errors
cdf[-1] = 1
```
I would strongly prefer for there not to be an assert statement (which the normalisation would fix). This kind of thing would be infuriating as part of a large automated pipeline where floating point errors derail a larger workflow.
I agree, and we do not need to check this. This really just checks whether the NumPy dirichlet function is correct, and we don't need to be checking NumPy here.
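As a rough sketch of the suggested alternative (illustrative only, not necessarily the code that was merged), normalising the CDF pins its final value to exactly 1, so no assert is needed to guard against floating-point drift:

```python
import numpy as np

weights = np.random.rand(100)   # stand-in weights, purely illustrative
cdf = np.cumsum(weights)
cdf /= cdf[-1]                  # cdf[-1] is now exactly 1.0; no assert required
```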
Anything else for this PR, @williamjameshandley?
Were there any other changes needed to merge this?
Description

Added a `credibility_interval` method to `MCMCSamples`/`NestedSamples` which computes the credibility / confidence interval.

Fixes #178 and replaces #179

Notes:

- Using `np.cumsum`, the value of the CDF at the first data point is `== its weight > 0`, but at the last data point it is `== 1`. This should be negligible for `weights << 1`, but for significant weights, i.e. small sample sizes, this can have an effect.
- `confidence_interval` … iso-probability intervals, but rely on the same code
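For illustration, a hypothetical usage sketch of the new method (the toy data, column names, and the exact call signature are assumptions, not necessarily the final API):

```python
import numpy as np
from anesthetic import MCMCSamples

# Toy, unweighted samples purely for illustration
data = np.random.randn(1000, 2)
samples = MCMCSamples(data=data, columns=['x0', 'x1'])

# Assumed call: a 68% credibility interval for parameter 'x0'
interval = samples.credibility_interval('x0', 0.68)
print(interval)
```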
Checklist:

- `flake8 anesthetic tests`
- `pydocstyle --convention=numpy anesthetic`
- `python -m pytest`
- feature works