
Implemented credibility_interval() #188

Open · wants to merge 89 commits into master

Changes from 77 commits

Commits (89)
00c61e5
Implemented Samples.credibility_interval
Mar 22, 2022
8098bfd
Improve documentation
Mar 22, 2022
f6afe0a
Formatting / make flake8 happy
Mar 22, 2022
6c816eb
pydocstyle compliance
Mar 22, 2022
e3c1f21
Fix typo made while flake8-ing
Mar 22, 2022
c22e304
Merge branch 'williamjameshandley:master' into credibility-interval
Stefan-Heimersheim Jun 8, 2022
3f8f2cb
Merge branch 'master' into credibility-interval
williamjameshandley Jul 12, 2022
a39c038
That will teach me to try and use github to manually merge
williamjameshandley Jul 12, 2022
be0eb50
Move credibility_interval to utils
Jul 19, 2022
3b20f1c
Formatting
Jul 19, 2022
4a66d74
flake8 wants def rather than lambda
Jul 19, 2022
0d53bce
Remove unneeded variable
Jul 19, 2022
d6b8bb9
Remove imports no longer needed
Jul 19, 2022
da7ebaf
docstring format
Jul 19, 2022
92f24be
Updated CI checks to be in line with github requirements
williamjameshandley Jul 27, 2022
19b3e1a
Moving to full coverage
williamjameshandley Jul 27, 2022
e97526e
Added a link to the fastCL repo
williamjameshandley Jul 27, 2022
12a6bc8
Unified quantiles, cdfs and credibility-intervals
williamjameshandley Jul 27, 2022
bcf2ae1
Added some CDF tests
williamjameshandley Jul 28, 2022
095a64a
Merge branch 'master' into credibility-interval
williamjameshandley Aug 3, 2022
cf95f46
Merge branch 'master' into credibility-interval
williamjameshandley Aug 5, 2022
7f56e78
Merge branch 'master' into credibility-interval
lukashergt Aug 6, 2022
d9c4076
Merge branch 'master' into credibility-interval
lukashergt Aug 9, 2022
25e3843
Merge branch 'master' into credibility-interval
lukashergt Aug 10, 2022
693abef
Implemented compress_weights-based credibility_interval with uncertai…
Nov 3, 2022
f6e9b40
Merge remote-tracking branch 'upstream/master' into credibility-interval
Apr 6, 2023
7d634b5
Remove cdf() function and quantile changes
Apr 6, 2023
f33decf
Implement covariance and clean up code
Apr 6, 2023
0b938e1
Fix indexing
Apr 10, 2023
3d26a41
Update tests
Apr 10, 2023
e5cb4b6
flake8 compliance
Apr 10, 2023
30c70bf
Merge branch 'master' into credibility-interval
Apr 10, 2023
36c58d2
Merge branch 'master' into credibility-interval
lukashergt Apr 11, 2023
4f79a09
version bump from 2.0.0-beta.28 to 2.0.0-beta.29
lukashergt Apr 11, 2023
dccfe4b
fix docstring formatting of `utils.credibility_interval`
lukashergt Apr 12, 2023
84ed5b6
Add tests for other methods
Apr 26, 2023
d1aaf02
Added tests for compress_samples
Apr 26, 2023
2527895
Updated docstring
Apr 26, 2023
b466d1a
Removed u kw argument
Apr 26, 2023
0d59b7c
n_iter to nsamples
Apr 26, 2023
2a0d3b8
Added CDF fill_value; added tests
Apr 26, 2023
ed4607f
Typo
Apr 26, 2023
b4b78e6
Renamed methods to full names
Apr 26, 2023
9911dd2
flake8 compatibility
Apr 26, 2023
5ac5fca
Implemented Samples.credibility_interval
Apr 26, 2023
86d3836
flake8 compatibility
Apr 26, 2023
dbd1303
Implemented suggestions
May 15, 2023
7ce8b59
Improved tests with Lukas' suggestions
May 15, 2023
f016fa5
Improve tests
May 15, 2023
fb42c59
bump version number to 2.0.0b30
lukashergt May 15, 2023
5c14599
Merge branch 'master' into credibility-interval
lukashergt May 15, 2023
1997ffd
Update _version.py
lukashergt May 15, 2023
b0862e1
Merge branch 'master' into credibility-interval
lukashergt May 25, 2023
a92f822
Update _version.py
lukashergt May 27, 2023
4d16340
Update README.rst
lukashergt May 27, 2023
1e78eb7
Merge branch 'master' into credibility-interval
lukashergt Jun 7, 2023
7fdd99a
remove `verbose` kwarg which is now unused
lukashergt Jun 7, 2023
8a44f17
remove also docstring of `verbose` kwarg
lukashergt Jun 7, 2023
1327055
rewrite `credibility_interval` _method_ to automatically do all DataF…
lukashergt Jun 9, 2023
16a0d42
version bump to 2.0.0-beta.35
lukashergt Jun 14, 2023
49a3d1b
version bump to 2.0.0-beta.36
lukashergt Jun 14, 2023
48444c4
Merge branch 'master' into credibility-interval
lukashergt Jun 14, 2023
d1cdd8c
fix flake8: blank line contains whitespace
lukashergt Jun 14, 2023
f641a40
Merge branch 'master' into credibility-interval
lukashergt Jun 14, 2023
5fb72f1
version bump to 2.0.0b37
lukashergt Jun 14, 2023
bad654c
update from `ncompress=-1` to `ncompress='equal'`
lukashergt Jun 14, 2023
26ce56a
Merge branch 'master' into credibility-interval
lukashergt Jun 14, 2023
39337dc
update another instance of `ncompress=-1` to `ncompress='equal'`
lukashergt Jun 14, 2023
010a02e
implement `return_covariance` option for lower-limit and upper-limit …
lukashergt Jun 15, 2023
f05f3fd
update `test_credibility_interval` to new dataframe return values
lukashergt Jun 15, 2023
f1e4d11
Merge branch 'master' into credibility-interval
williamjameshandley Jun 29, 2023
707376b
Merge branch 'master' into credibility-interval
lukashergt Jul 19, 2023
463ae88
Merge branch 'master' into credibility-interval
williamjameshandley Jul 26, 2023
49863cc
newline
williamjameshandley Jul 26, 2023
1ec5b8e
move `credibility_interval` method from `samples.py` to `weighted_pan…
lukashergt Jul 27, 2023
0c63e6e
add tests for Series method of `credibility_interval`
lukashergt Jul 27, 2023
54bf848
Merge branch 'master' into credibility-interval
lukashergt Jul 27, 2023
292aba5
Merge branch 'master' into credibility-interval
lukashergt Jul 31, 2023
b85b84d
Merge branch 'master' into credibility-interval
williamjameshandley Aug 1, 2023
525381c
Merge branch 'master' into credibility-interval
williamjameshandley Aug 4, 2023
9fabe8d
Remove unnecessary assertion
Aug 4, 2023
3da3115
Merge branch 'master' into credibility-interval
lukashergt Aug 15, 2023
56a0c6c
version bump to 2.3.0
lukashergt Aug 15, 2023
8aa2bb8
Merge branch 'master' into credibility-interval
lukashergt Aug 16, 2023
5832781
Merge branch 'master' into credibility-interval
lukashergt Aug 16, 2023
48b1a8b
version bump to 2.4.0
lukashergt Aug 16, 2023
202d754
Merge branch 'master' into credibility-interval
lukashergt Sep 30, 2023
fe25aeb
version bump to 2.5.0
lukashergt Sep 30, 2023
3e6bd4b
Merge branch 'master' into credibility-interval
williamjameshandley May 15, 2024
2 changes: 1 addition & 1 deletion README.rst
@@ -2,7 +2,7 @@
anesthetic: nested sampling post-processing
===========================================
:Authors: Will Handley and Lukas Hergt
:Version: 2.1.2
:Version: 2.2.0
:Homepage: https://github.com/handley-lab/anesthetic
:Documentation: http://anesthetic.readthedocs.io/

2 changes: 1 addition & 1 deletion anesthetic/_version.py
@@ -1 +1 @@
__version__ = '2.1.2'
__version__ = '2.2.0'
127 changes: 127 additions & 0 deletions anesthetic/utils.py
@@ -3,6 +3,7 @@
import pandas
from scipy import special
from scipy.interpolate import interp1d
from scipy.optimize import minimize_scalar
from scipy.stats import kstwobign, entropy
from matplotlib.tri import Triangulation
import contextlib
@@ -151,6 +152,132 @@ def quantile(a, q, w=None, interpolation='linear'):
return quant


def sample_cdf(samples, inverse=False, interpolation='linear'):
"""Sample the empirical cdf for a 1d array."""
samples = np.sort(samples)
ngaps = len(samples)-1
gaps = np.random.dirichlet(np.ones(ngaps))
cdf = np.array([0, *np.cumsum(gaps)])
assert np.isclose(cdf[-1], 1, atol=1e-9, rtol=1e-9), \
"Error: CDF does not reach 1 but "+str(cdf[-1])
# Set the last element (tested to be approx 1)
# to exactly 1 to avoid interpolation errors
cdf[-1] = 1
[Review comment — Collaborator]
Would it be more appropriate to normalise as follows?

    cdf /= cdf[-1]

Or would it be better to handle bound errors within interp1d? (see below)

[Reply — Stefan-Heimersheim (Author), Apr 26, 2023]
Honestly it doesn't matter: with atol=1e-9 this is just dealing with Python float precision. So let's stick to the simpler version, to avoid looking like we're doing some actual normalisation.

[Review comment — Collaborator]
I would strongly prefer there not to be an assert statement (which the normalisation would fix). This kind of thing would be infuriating as part of a large automated pipeline, where floating-point errors derail a larger workflow.
if inverse:
return interp1d(cdf, samples, kind=interpolation)
else:
return interp1d(samples, cdf, kind=interpolation,
fill_value=(0, 1), bounds_error=False)
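As a standalone sketch of the Dirichlet-based CDF sampling above (hypothetical helper name, and using the `cdf /= cdf[-1]` normalisation suggested in the review thread rather than the assert), assuming only NumPy and SciPy:

```python
import numpy as np
from scipy.interpolate import interp1d


def sample_cdf_sketch(samples, inverse=False, rng=None):
    """Draw one random realisation of the empirical CDF of `samples`
    by assigning Dirichlet-distributed gaps between the sorted points."""
    rng = np.random.default_rng(rng)
    samples = np.sort(np.asarray(samples, dtype=float))
    gaps = rng.dirichlet(np.ones(len(samples) - 1))
    cdf = np.concatenate([[0.0], np.cumsum(gaps)])
    cdf /= cdf[-1]  # absorb float drift instead of asserting cdf[-1] == 1
    if inverse:
        return interp1d(cdf, samples)
    return interp1d(samples, cdf, fill_value=(0, 1), bounds_error=False)


inv_cdf = sample_cdf_sketch(np.random.default_rng(0).normal(size=1000),
                            inverse=True)
median = float(inv_cdf(0.5))  # close to 0 for a standard normal
```

Each call draws a fresh CDF realisation, which is what lets `credibility_interval` below estimate error bars on the interval boundaries by repetition.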


def credibility_interval(samples, weights=None, level=0.68, method="iso-pdf",
return_covariance=False, nsamples=12):
"""Compute the credibility interval of weighted samples.

Based on linear interpolation of the cumulative distribution function, so
expect discretisation errors on the scale of the distances between samples.

https://github.com/Stefan-Heimersheim/fastCI#readme

Parameters
----------
samples : array
Samples to compute the credibility interval of.
weights : array, default=np.ones_like(samples)
Weights corresponding to samples.
level : float, default=0.68
Credibility level (probability, <1).
method : str, default='iso-pdf'
Which definition of interval to use:

* ``'iso-pdf'``: Calculate iso probability density interval with the
same probability density at each end. Also known as
waterline-interval or highest average posterior density interval.
This is only accurate if the distribution is sufficiently unimodal.
* ``'lower-limit'``/``'upper-limit'``: Lower/upper limit. One-sided
limits for which ``level`` fraction of the (equally weighted) samples
lie above/below the limit.
* ``'equal-tailed'``: Equal-tailed interval with the same fraction of
(equally weighted) samples below and above the interval region.

return_covariance : bool, default=False
Return the covariance of the sampled limits, in addition to the mean.
nsamples : int, default=12
Number of CDF samples to improve `mean` and `std` estimate.

Returns
-------
limit(s) : float, array, or tuple of floats or arrays
Returns the credibility interval boundary or boundaries. By default,
returns the mean over ``nsamples`` samples, which is either
two numbers (``method='iso-pdf'``/``'equal-tailed'``) or one number
(``method='lower-limit'``/``'upper-limit'``). If
``return_covariance=True``, returns a tuple (mean(s), covariance)
where covariance is the covariance over the sampled limits.
"""
if level >= 1:
raise ValueError('level must be <1, got {0:.2f}'.format(level))
if len(np.shape(samples)) != 1:
raise ValueError('Only 1D arrays of samples are supported')
if weights is not None and np.shape(samples) != np.shape(weights):
raise ValueError('Shape of samples and weights differs')

# Convert to numpy to unify indexing
samples = np.array(samples.copy())
if weights is None:
weights = np.ones(len(samples))
else:
weights = np.array(weights.copy())

# Convert samples to unit weights if that is not already the case
if not np.all(np.logical_or(weights == 0, weights == 1)):
# compress_weights with ncompress='equal' assures weights \in 0,1
# Note that this must be done, we cannot handle weights != 1
# see this discussion for details:
# https://github.com/williamjameshandley/anesthetic/pull/188#issuecomment-1274980982
weights = compress_weights(weights, ncompress='equal')

indices = np.where(weights)[0]
x = samples[indices]

# Sample the confidence interval multiple times
# to get errorbars on confidence interval boundaries
ci_samples = []
for i in range(nsamples):
invCDF = sample_cdf(x, inverse=True)
if method == 'iso-pdf':
# Find smallest interval
def distance(Y, level=level):
return invCDF(Y+level)-invCDF(Y)
res = minimize_scalar(distance, bounds=(0, 1-level),
method="Bounded")
ci_samples.append(np.array([invCDF(res.x),
invCDF(res.x+level)]))
elif method == 'lower-limit':
# Get value from which we reach the desired level
ci_samples.append(invCDF(1-level))
elif method == 'upper-limit':
# Get value to which we reach the desired level
ci_samples.append(invCDF(level))
elif method == 'equal-tailed':
ci_samples.append(np.array([invCDF((1-level)/2),
invCDF((1+level)/2)]))
else:
raise ValueError("Method '{0:}' unknown".format(method))
ci_samples = np.array(ci_samples)
if np.shape(ci_samples) == (nsamples, ):
if return_covariance:
return np.mean(ci_samples), np.cov(ci_samples)
else:
return np.mean(ci_samples)
else:
if return_covariance:
return np.mean(ci_samples, axis=0), \
np.cov(ci_samples, rowvar=False)
else:
return np.mean(ci_samples, axis=0)
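The iso-pdf search above can be exercised end-to-end with a minimal, self-contained sketch (plain empirical CDF, no Dirichlet resampling — an assumption made here for brevity): the shortest interval containing `level` of the mass is found by minimising `inv_cdf(y + level) - inv_cdf(y)` over `y`.

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(42)
samples = np.sort(rng.normal(size=5000))
cdf = np.linspace(0, 1, len(samples))  # plain empirical CDF
inv_cdf = interp1d(cdf, samples, kind='linear')

level = 0.68


def width(y):
    """Width of the interval covering `level` of the mass, starting at y."""
    return inv_cdf(y + level) - inv_cdf(y)


res = minimize_scalar(width, bounds=(0, 1 - level), method='bounded')
lo, hi = float(inv_cdf(res.x)), float(inv_cdf(res.x + level))
# For a standard normal the 68% iso-pdf interval is roughly (-1, 1).
```

For a symmetric unimodal distribution the iso-pdf and equal-tailed intervals coincide; they differ once the distribution is skewed.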


def mirror_1d(d, xmin=None, xmax=None):
"""If necessary apply reflecting boundary conditions."""
if xmin is not None and xmax is not None:
128 changes: 125 additions & 3 deletions anesthetic/weighted_pandas.py
@@ -3,16 +3,17 @@
import warnings
from inspect import signature
import numpy as np
from numpy.ma import masked_array
from pandas import Series, DataFrame, concat, MultiIndex
from pandas.core.groupby import GroupBy, SeriesGroupBy, DataFrameGroupBy, ops
from pandas._libs import lib
from pandas._libs.lib import no_default
from pandas.util._exceptions import find_stack_level
from pandas.util import hash_pandas_object
from numpy.ma import masked_array
from anesthetic.utils import (compress_weights, neff, quantile,
temporary_seed, adjust_docstrings)
from pandas.core.dtypes.missing import notna
from anesthetic.utils import (compress_weights, neff, quantile,
temporary_seed, adjust_docstrings,
credibility_interval)


class WeightedGroupBy(GroupBy):
@@ -345,6 +346,55 @@ def compress(self, ncompress=True):
def sample(self, *args, **kwargs): # noqa: D102
return super().sample(weights=self.get_weights(), *args, **kwargs)

def credibility_interval(self, level=0.68, method="iso-pdf",
return_covariance=False, nsamples=12):
"""Compute the credibility interval of the weighted samples.

Based on linear interpolation of the cumulative distribution function, so
expect discretisation errors on the scale of the distances between samples.

https://github.com/Stefan-Heimersheim/fastCI#readme

Parameters
----------
level : float, default=0.68
Credibility level (probability, <1).
method : str, default='iso-pdf'
Which definition of interval to use:

* ``'iso-pdf'``: Calculate iso probability density interval with
the same probability density at each end. Also known as
waterline-interval or highest average posterior density interval.
This is only accurate if the distribution is sufficiently
unimodal.
* ``'lower-limit'``/``'upper-limit'``: Lower/upper limit. One-sided
limits for which ``level`` fraction of the (equally weighted)
samples lie above/below the limit.
* ``'equal-tailed'``: Equal-tailed interval with the same fraction
of (equally weighted) samples below and above the interval
region.

return_covariance : bool, default=False
Return the covariance of the sampled limits, in addition to the mean.
nsamples : int, default=12
Number of CDF samples to improve `mean` and `std` estimate.

Returns
-------
limit(s) : float, array, or tuple of floats or arrays
Returns the credibility interval boundaries of the Series.
By default, returns the mean over ``nsamples`` samples, which is
either two numbers (``method='iso-pdf'``/``'equal-tailed'``) or
one number (``method='lower-limit'``/``'upper-limit'``). If
``return_covariance=True``, returns a tuple (mean(s), covariance)
where covariance is the covariance over the sampled limits.
"""
return credibility_interval(self, weights=self.get_weights(),
level=level, method=method,
return_covariance=return_covariance,
nsamples=nsamples)

@property
def _constructor(self):
return WeightedSeries
@@ -600,6 +650,78 @@ def sample(self, *args, **kwargs): # noqa: D102
else:
return super().sample(*args, **kwargs)

def credibility_interval(self, level=0.68, method="iso-pdf",
return_covariance=False, nsamples=12):
"""Compute the credibility interval of the weighted samples.

Based on linear interpolation of the cumulative distribution function, so
expect discretisation errors on the scale of the distances between samples.

https://github.com/Stefan-Heimersheim/fastCI#readme

Parameters
----------
level : float, default=0.68
Credibility level (probability, <1).
method : str, default='iso-pdf'
Which definition of interval to use:

* ``'iso-pdf'``: Calculate iso probability density interval with
the same probability density at each end. Also known as
waterline-interval or highest average posterior density interval.
This is only accurate if the distribution is sufficiently
unimodal.
* ``'lower-limit'``/``'upper-limit'``: Lower/upper limit. One-sided
limits for which ``level`` fraction of the (equally weighted)
samples lie above/below the limit.
* ``'equal-tailed'``: Equal-tailed interval with the same fraction
of (equally weighted) samples below and above the interval
region.

return_covariance : bool, default=False
Return the covariance of the sampled limits, in addition to the mean.
nsamples : int, default=12
Number of CDF samples to improve `mean` and `std` estimate.

Returns
-------
limit(s) : float, array, or tuple of floats or arrays
Returns the credibility interval boundaries for each column.
By default, returns the mean over ``nsamples`` samples, which is
either two numbers (``method='iso-pdf'``/``'equal-tailed'``) or
one number (``method='lower-limit'``/``'upper-limit'``). If
``return_covariance=True``, returns a tuple (means, covariances)
where covariances are the covariance over the sampled limits for
each column.
"""
if 'lower' in method:
limits = ['lower']
elif 'upper' in method:
limits = ['upper']
else:
limits = ['lower', 'upper']
cis = [credibility_interval(self[col], weights=self.get_weights(),
level=level, method=method,
return_covariance=return_covariance,
nsamples=nsamples) for col in self.columns]
if return_covariance:
cis, covs = zip(*cis)
mulidx = MultiIndex.from_product([
self.columns.get_level_values(level=0),
limits
])
ncol = len(self.columns)
nlim = len(limits)
covs = np.asarray(covs).reshape(nlim*ncol, nlim).T
covs = DataFrame(covs, index=limits, columns=mulidx)
cis = np.atleast_2d(cis) if 'limit' in method else np.asarray(cis).T
cis = DataFrame(data=cis, index=limits, columns=self.columns)
if return_covariance:
return cis, covs
else:
return cis
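The covariance reshaping in the method above can be illustrated with toy values (the fill values and column names here are hypothetical): per-column `nlim × nlim` covariance matrices are stacked into a single `(nlim, nlim * ncol)` frame whose columns carry a `(parameter, limit)` MultiIndex.

```python
import numpy as np
import pandas as pd

ncol, nlim = 2, 2
# One nlim x nlim covariance matrix per parameter column
covs = [np.full((nlim, nlim), fill) for fill in (1.0, 2.0)]
columns = ['x0', 'x1']
mulidx = pd.MultiIndex.from_product([columns, ['lower', 'upper']])
# Stack and transpose so each parameter's block sits under its own label
flat = np.asarray(covs).reshape(nlim * ncol, nlim).T
frame = pd.DataFrame(flat, index=['lower', 'upper'], columns=mulidx)
```

Selecting `frame['x0']` then recovers the first parameter's 2×2 covariance block, matching the `(2, 2 * len(params))` shape checked in the tests below.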

@property
def _constructor_sliced(self):
return WeightedSeries
33 changes: 33 additions & 0 deletions tests/test_samples.py
@@ -1780,3 +1780,36 @@ def test_axes_limits_2d(kind, kwargs):
assert 3 < xmax < 3.9
assert -3.9 < ymin < -3
assert 3 < ymax < 3.9


def test_credibility_interval():
np.random.seed(0)
pc = read_chains('./tests/example_data/pc')

ci, cov = pc.x0.credibility_interval(level=0.68,
method="iso-pdf",
return_covariance=True)
assert ci[0] == pytest.approx(-0.1, rel=0.01, abs=0.01)
assert ci[1] == pytest.approx(+0.1, rel=0.01, abs=0.01)
assert np.all(np.abs(cov) < 1e-3)
assert np.shape(ci) == (2,)
assert np.shape(cov) == (2, 2)

params = ['x0', 'x1']
ci, cov = pc[params].credibility_interval(level=0.68,
method="iso-pdf",
return_covariance=True)
assert_allclose(ci.loc['lower'], -0.1, rtol=0.01, atol=0.01)
assert_allclose(ci.loc['upper'], +0.1, rtol=0.01, atol=0.01)
assert np.all(np.abs(cov) < 1e-3)
assert ci.shape == (2, len(params))
assert cov.shape == (2, 2 * len(params))
ci, cov = pc[params].credibility_interval(level=0.95+0.025,
method='lower-limit',
return_covariance=True)
assert_allclose(ci, -0.2, rtol=0.01, atol=0.025)
assert np.all(np.abs(cov) < 1e-3)
assert cov.shape == (1, len(params))
ci = pc[params].credibility_interval(level=0.95+0.025,
method='upper-limit')
assert_allclose(ci, +0.2, rtol=0.01, atol=0.025)
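Beyond the symmetric test cases above, a hedged sketch on a skewed (exponential) distribution shows where the `'equal-tailed'` and `'iso-pdf'` definitions actually differ — the shortest 68% interval hugs zero, while the equal-tailed one excludes mass at both ends (plain empirical CDF assumed for simplicity):

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
samples = np.sort(rng.exponential(size=20000))
cdf = np.linspace(0, 1, len(samples))
inv_cdf = interp1d(cdf, samples)

level = 0.68
# Equal-tailed: cut (1 - level)/2 of the mass from each tail
equal_tailed = (float(inv_cdf((1 - level) / 2)),
                float(inv_cdf((1 + level) / 2)))
# Iso-pdf: shortest interval containing `level` of the mass
res = minimize_scalar(lambda y: inv_cdf(y + level) - inv_cdf(y),
                      bounds=(0, 1 - level), method='bounded')
iso_pdf = (float(inv_cdf(res.x)), float(inv_cdf(res.x + level)))
# iso_pdf starts near 0 and is shorter than the equal-tailed interval
```

This mirrors the docstring's caveat: for monotonically decreasing densities the iso-pdf interval collapses onto a one-sided limit, which is why the `'lower-limit'`/`'upper-limit'` methods exist.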