ENH: Add CDF and SF of the Wilcoxon distribution by fbourgey · Pull Request #124 · scipy/xsf

fbourgey · 2026-04-13T20:55:21Z

Reference issue

Toward #98

What does this implement/fix?

Implement the CDF and SF of the Wilcoxon signed-rank distribution.
Port the exact distribution logic from scipy.stats._wilcoxon.py
Precompute a CDF table from the PMF so repeated CDF/SF queries for a given n are efficient.

fbourgey · 2026-04-13T20:56:56Z

+inline std::vector<double> wilcoxon_pmf(int n) {
+    // PMF of the Wilcoxon signed-rank statistic.
+    // The returned vector has length n*(n+1)/2 + 1, and the k-th element contains
+    // the probability of observing a test statistic equal to k for n pairs of
+    // observations.
+    std::vector<double> pmf = {1.0};
+    for (int k = 1; k < n + 1; k++) {
+        std::vector<double> tmp(k * (k + 1) / 2 + 1, 0.0);
+        int m = static_cast<int>(pmf.size());
+        for (int i = 0; i < m; i++) {
+            tmp[i] += 0.5 * pmf[i];
+            tmp[i + k] += 0.5 * pmf[i];
+        }
+        pmf = std::move(tmp);
+    }
+    return pmf;
+}


From https://github.com/scipy/scipy/blob/main/scipy/stats/_hypotests.py#L697-L711

dschmitz89 · 2026-05-20T06:24:57Z

+inline double wilcoxon_cdf(double k, const std::vector<double> &cdf_table, int n) {
+    // CDF of the Wilcoxon signed-rank statistic for real k and n pairs of observations.
+    if (std::isnan(k) || n < 0) {
+        return std::numeric_limits<double>::quiet_NaN();
+    }
+    return wilcoxon_cdf(static_cast<int>(k), cdf_table, n);
+}


Maybe I am blind but I do not understand how this function would be called inside SciPy. Should we first generate the CDF table and then call this method? That would be highly unregular pattern for our special functions. I would expect there to be a simple wilcoxon_cdf function. Is this difficult here for performance reasons?

@dschmitz89, I've worked out a pattern where the results of computations which are based on only some of a ufunc's args can be reused across iterations over the other args. See for instance this stateful kernel for for the poisson binomial pmf in SciPy

https://github.com/scipy/scipy/blob/851124cf2893187a07ade72d757938cc145b79fd/scipy/special/_gufuncs.cpp#L72-L107

/* gufunc kernels which use internal caches. * * Such caches live only during the course of a single call to the ufunc * in order to eliminate redundant calculations. This does not concern * user-facing caching across ufunc calls. * * In order for such caches to serve their purpose, the iteration order over the * course of the ufunc loops must be chosen so that elements (or core slices) of * the inputs that go into the cached computation vary most slowly and elements * of the inputs for which the cached result can be reused vary most quickly. * This can be accomplished by wrapping the gufunc with `_with_cache_optimization` * from ./_ufunc_tools.py. See the docstring of `_with_cache_optimization` for * more information. */ template <typename T_1d> struct _poisson_binom_pmf_kernel { using T = typename T_1d::value_type; std::vector<T> dist; T* last_p_ptr = nullptr; T operator()(long long int k, T_1d p) { if (!last_p_ptr) { dist.resize(p.extent(0) + 1); } if (p.data_handle() != last_p_ptr) { /* If p is stepped through in the outer loops and k in the inner loops, * this will yield a cache hit for each inner iteration. The cache is * overwritten unconditionally whenever this steps to a new slice of p. */ xsf::poisson_binom_pmf_all(p, _as_mdspan(dist)); last_p_ptr = p.data_handle(); } return xsf::take_from_pmf(_as_mdspan(dist), k); } };

what @fbourgey has here is not quite right though. To ensure that these xsf kernels will also work in CuPy, they should be using template arguments meant to be instantiated with mdspans, rather than using std::vector. The kernels for generating the pmf and cdf tables should follow the same pattern used for the corresponding Poisson Binom kernels

xsf/include/xsf/stats.h

Lines 538 to 578 in f7b85f5

template <typename InputMat, typename OutputMat>

XSF_HOST_DEVICE inline void poisson_binom_pmf_all(InputMat p, OutputMat res) {

/* Outputs the full pmf of a Poisson-Binomial distribution

*

* p should be an std::mdspan view of a 1d array of length n containing

* the success probabilities for the n Bernoulli trials. res should be

* a std::mdspan view of a 1d array of of length n + 1. It is up to

* the caller to pass valid values for p and res.

*

* Upon completion, res(k) will contain the probability of observing k

* successes for k from 0 to n.

*/

auto n = p.extent(0);

auto out_size = res.extent(0);

if (out_size != n + 1) {

set_error("_poisson_binom_pmf_all", SF_ERROR_MEMORY, "out.shape[-1] must be p.shape[-1] + 1");

return;

}

detail::poisson_binom_pmf_all_impl(p, res);

}

template <typename InputMat, typename OutputMat>

XSF_HOST_DEVICE inline void poisson_binom_cdf_all(InputMat p, OutputMat res) {

/* Outputs the full cdf of a Poisson-Binomial distribution */

using T = typename OutputMat::value_type;

auto n = p.extent(0);

auto out_size = res.extent(0);

if (out_size != n + 1) {

set_error("_poisson_binom_cdf_all", SF_ERROR_MEMORY, "out.shape[-1] must be p.shape[-1] + 1");

return;

}

detail::poisson_binom_pmf_all_impl(p, res);

for (decltype(n) i = 1; i < n; i++) {

res(i) = std::min(res(i) + res(i - 1), T(1.0));

}

res(n) = T(1.0);

}

Also @fbourgey, you should be using xsf::take_from_discrete_cdf and xsf::take_from_pmf

xsf/include/xsf/stats.h

Lines 580 to 601 in f7b85f5

template <typename InputMat>

XSF_HOST_DEVICE inline typename InputMat::value_type take_from_pmf(InputMat pmf, long long int k) {

using T = typename InputMat::value_type;

auto size = pmf.extent(0);

if ((k < 0) || (k >= static_cast<long long int>(size))) {

return T(0.0);

}

return pmf(k);

}

template <typename InputMat>

XSF_HOST_DEVICE inline typename InputMat::value_type take_from_discrete_cdf(InputMat cdf, long long int k) {

using T = typename InputMat::value_type;

auto size = cdf.extent(0);

if (k < 0) {

return T(0.0);

}

if (k >= static_cast<long long int>(size) - 1) {

return T(1.0);

}

return cdf(k);

}

not writing distribution specific functions to do this work.

Thanks all. I need to update this PR following what @steppi implemented for poisson_binom_pmf. Will try to do it asap. This PR was submitted before we added this pattern. Changing to draft for now.

fbourgey added 2 commits April 13, 2026 16:46

ENH: Add Wilcoxon signed-rank statistic PMF and CDF functions

9b80fff

TST: Add unit tests for Wilcoxon CDF and SF functions

9f8a0a3

fbourgey commented Apr 13, 2026

View reviewed changes

STYLE: clang-format

4128852

fbourgey mentioned this pull request Apr 13, 2026

ENH: Port null-distribution special functions from scipy.stats to xsf #98

Open

18 tasks

fbourgey added the enhancement New feature or request label Apr 14, 2026

dschmitz89 reviewed May 20, 2026

View reviewed changes

fbourgey marked this pull request as draft May 20, 2026 13:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ENH: Add CDF and SF of the Wilcoxon distribution#124

ENH: Add CDF and SF of the Wilcoxon distribution#124
fbourgey wants to merge 3 commits into
scipy:mainfrom
fbourgey:wilcoxon

fbourgey commented Apr 13, 2026

Uh oh!

fbourgey Apr 13, 2026

Uh oh!

dschmitz89 May 20, 2026

Uh oh!

steppi May 20, 2026 •

edited

Loading

Uh oh!

fbourgey May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	template <typename InputMat, typename OutputMat>
	XSF_HOST_DEVICE inline void poisson_binom_pmf_all(InputMat p, OutputMat res) {
	/* Outputs the full pmf of a Poisson-Binomial distribution
	*
	* p should be an std::mdspan view of a 1d array of length n containing
	* the success probabilities for the n Bernoulli trials. res should be
	* a std::mdspan view of a 1d array of of length n + 1. It is up to
	* the caller to pass valid values for p and res.
	*
	* Upon completion, res(k) will contain the probability of observing k
	* successes for k from 0 to n.
	*/
	auto n = p.extent(0);
	auto out_size = res.extent(0);

	if (out_size != n + 1) {
	set_error("_poisson_binom_pmf_all", SF_ERROR_MEMORY, "out.shape[-1] must be p.shape[-1] + 1");
	return;
	}

	detail::poisson_binom_pmf_all_impl(p, res);
	}

	template <typename InputMat, typename OutputMat>
	XSF_HOST_DEVICE inline void poisson_binom_cdf_all(InputMat p, OutputMat res) {
	/* Outputs the full cdf of a Poisson-Binomial distribution */
	using T = typename OutputMat::value_type;
	auto n = p.extent(0);
	auto out_size = res.extent(0);

	if (out_size != n + 1) {
	set_error("_poisson_binom_cdf_all", SF_ERROR_MEMORY, "out.shape[-1] must be p.shape[-1] + 1");
	return;
	}

	detail::poisson_binom_pmf_all_impl(p, res);
	for (decltype(n) i = 1; i < n; i++) {
	res(i) = std::min(res(i) + res(i - 1), T(1.0));
	}
	res(n) = T(1.0);
	}

	template <typename InputMat>
	XSF_HOST_DEVICE inline typename InputMat::value_type take_from_pmf(InputMat pmf, long long int k) {
	using T = typename InputMat::value_type;
	auto size = pmf.extent(0);
	if ((k < 0) \|\| (k >= static_cast<long long int>(size))) {
	return T(0.0);
	}
	return pmf(k);
	}

	template <typename InputMat>
	XSF_HOST_DEVICE inline typename InputMat::value_type take_from_discrete_cdf(InputMat cdf, long long int k) {
	using T = typename InputMat::value_type;
	auto size = cdf.extent(0);
	if (k < 0) {
	return T(0.0);
	}
	if (k >= static_cast<long long int>(size) - 1) {
	return T(1.0);
	}
	return cdf(k);
	}

Uh oh!

Conversation

fbourgey commented Apr 13, 2026

Reference issue

What does this implement/fix?

Uh oh!

fbourgey Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

dschmitz89 May 20, 2026

Choose a reason for hiding this comment

Uh oh!

steppi May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fbourgey May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

steppi May 20, 2026 •

edited

Loading