Description
Describe the bug
I have identified a discrepancy in the p-values produced by 'scikit_posthocs.posthoc_conover_friedman' when compared to results obtained through manual calculation using the provided references ([1] Conover & Iman, 1979; [2] Conover, 1999). Upon closer examination of '_posthocs.py', I believe the issue lies in the formula implemented on line 731 (the calculation of T2), which is inconsistent with the formula given in references [1] and [2]:
# (...)
# Below is the critical part of scikit_posthocs.posthoc_conover_friedman function:
x['mat'] = x.groupby(_block_col)[_y_col].rank()
R = x.groupby(_group_col)['mat'].sum()
A1 = (x['mat'] ** 2).sum()
m = 1
S2 = m/(m*k - 1.) * (A1 - m*k*n*(m*k + 1.)**2./4.)
T2 = 1. / S2 * (np.sum(R) - n * m * ((m * k + 1.) / 2.)**2.)
A = S2 * (2. * n * (m * k - 1.)) / (m * n * k - k - n + 1.)
B = 1. - T2 / (n * (m * k - 1.))
# (...)
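For discussion, below is a sketch of how the pairwise computation could look if the T2 line summed the squared deviations of the rank sums from their expectation, which is my reading of the formula in [1] and [2]. The function name and surrounding structure are my own, not the library's; this is not an official fix:

```python
import numpy as np
import scipy.stats as ss

def conover_friedman_fixed(a):
    """Pairwise Conover post-hoc p-values after a Friedman test.

    Sketch only: identical to the library's computation except for the
    proposed T2 correction. No p-value adjustment is applied.
    """
    a = np.asarray(a, dtype=float)
    n, k = a.shape   # n blocks, k treatments
    m = 1            # one observation per block/treatment cell
    ranks = np.apply_along_axis(ss.rankdata, 1, a)  # within-block ranks
    R = ranks.sum(axis=0)                           # treatment rank sums
    A1 = (ranks ** 2).sum()
    S2 = m / (m * k - 1.) * (A1 - m * k * n * (m * k + 1.) ** 2. / 4.)
    # Proposed correction: sum of squared deviations of R from its
    # expectation n*m*(m*k+1)/2, instead of np.sum(R) as on line 731:
    T2 = 1. / S2 * np.sum((R - n * m * (m * k + 1.) / 2.) ** 2.)
    A = S2 * (2. * n * (m * k - 1.)) / (m * n * k - k - n + 1.)
    B = 1. - T2 / (n * (m * k - 1.))
    tval = np.abs(R[:, None] - R[None, :]) / np.sqrt(A * B)
    return 2. * ss.t.sf(tval, df=(m * n * k - k - n + 1))
```

With this one change, the pairwise statistics reduce algebraically to the critical-difference formula in Conover (1999), so the function reproduces the expected p-values shown further below.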
Dataset
See the code below.
To Reproduce
from scikit_posthocs import posthoc_conover_friedman

data = [
    [3.88, 30.58, 25.24, 4.44, 29.41, 38.87],
    [5.64, 30.14, 33.52, 7.94, 30.72, 33.12],
    [5.76, 16.92, 25.45, 4.04, 32.92, 39.15],
    [4.25, 23.19, 18.85, 4.40, 28.23, 28.06],
    [5.91, 26.74, 20.45, 4.23, 23.35, 38.23],
    [4.33, 10.91, 26.67, 4.36, 12.00, 26.65],
]
pval = posthoc_conover_friedman(data, p_adjust=None)
The script yields the following p-values:
0 1 2 3 4 5
0 1.000000 0.036732 0.019296 0.771000 0.009817 0.001127
1 0.036732 1.000000 0.771000 0.067323 0.561483 0.153707
2 0.019296 0.771000 1.000000 0.036732 0.771000 0.250288
3 0.771000 0.067323 0.036732 1.000000 0.019296 0.002360
4 0.009817 0.561483 0.771000 0.019296 1.000000 0.385789
5 0.001127 0.153707 0.250288 0.002360 0.385789 1.000000
Expected behavior
The correct results (as I believe) for the test data are reported on page 18 of "The Pairwise Multiple Comparison of Mean Ranks Package (PMCMR)" by Thorsten Pohlert (2016).
I have verified these results with the script below and independently by manual calculation, following the methodology outlined in [2] Conover, 1999 (page 371). The verification shows substantial discrepancies with the p-values produced by scikit_posthocs.
import numpy as np
import scipy.stats as ss

# Data
data = np.array([
    # T1     T2     T3     T4    T5     T6  = Treatment (k-times)
    [3.88, 30.58, 25.24, 4.44, 29.41, 38.87],  # Block 1
    [5.64, 30.14, 33.52, 7.94, 30.72, 33.12],  # Block 2
    [5.76, 16.92, 25.45, 4.04, 32.92, 39.15],  # Block 3
    [4.25, 23.19, 18.85, 4.40, 28.23, 28.06],  # Block 4
    [5.91, 26.74, 20.45, 4.23, 23.35, 38.23],  # Block 5
    [4.33, 10.91, 26.67, 4.36, 12.00, 26.65],  # Block 6
])

# Rank data within each block (row)
data_ranked = np.array([ss.rankdata(row) for row in data])

# Calculate Conover statistics (tval)
b, k = data.shape
A1 = np.sum(data_ranked**2)
C1 = b*k*(k+1)**2/4
R = np.sum(data_ranked, axis=0)
T1 = (k-1)*(np.sum(R**2) - b*C1)/(A1 - C1)
SE = ((A1-C1)*2*b/((b-1)*(k-1))*(1 - T1/(b*(k-1))))**0.5
difR = np.zeros((k, k))
for i in range(k):
    for j in range(k):
        difR[i, j] = abs(R[i] - R[j])
tval = difR / SE

# Calculate p-values (two-sided, Student's t with (b-1)*(k-1) degrees of freedom)
pval = 2 * ss.t.sf(tval, df=(b*k - k - b + 1))
This yields the following p-values:
[[1.000000 0.000143 0.000030 0.555472 0.000007 0.000000]
[0.000143 1.000000 0.555472 0.000666 0.243213 0.006214]
[0.000030 0.555472 1.000000 0.000143 0.555472 0.024680]
[0.555472 0.000666 0.000143 1.000000 0.000030 0.000000]
[0.000007 0.243213 0.555472 0.000030 1.000000 0.085107]
[0.000000 0.006214 0.024680 0.000000 0.085107 1.000000]]
These values are consistent with Pohlert's results.
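As a further cross-check (my own addition, not part of Pohlert's write-up): in the absence of ties, the T1 statistic computed above coincides with Friedman's chi-square, so scipy.stats.friedmanchisquare can be used to validate the intermediate statistic:

```python
import numpy as np
import scipy.stats as ss

data = np.array([
    [3.88, 30.58, 25.24, 4.44, 29.41, 38.87],
    [5.64, 30.14, 33.52, 7.94, 30.72, 33.12],
    [5.76, 16.92, 25.45, 4.04, 32.92, 39.15],
    [4.25, 23.19, 18.85, 4.40, 28.23, 28.06],
    [5.91, 26.74, 20.45, 4.23, 23.35, 38.23],
    [4.33, 10.91, 26.67, 4.36, 12.00, 26.65],
])
b, k = data.shape

# Conover's T1, computed as in the verification script
ranked = np.array([ss.rankdata(row) for row in data])
A1 = np.sum(ranked**2)
C1 = b * k * (k + 1)**2 / 4
R = np.sum(ranked, axis=0)
T1 = (k - 1) * (np.sum(R**2) - b * C1) / (A1 - C1)

# Friedman's chi-square: friedmanchisquare takes one sample per
# treatment, i.e. the columns of the data matrix
stat, p = ss.friedmanchisquare(*data.T)
print(T1, stat)  # both ≈ 23.333 for this dataset (no ties present)
```

Agreement here confirms that the omnibus statistic is computed correctly and isolates the discrepancy to the pairwise T2 step.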
System and package information (please complete the following information):
- OS: Windows 10 Pro
- Package version:
- scipy 1.11.1
- scikit_posthocs 0.8.0
Additional context
No p-value adjustment was used in the tested runs.