
Calculation of the AD statistics #20

@he-linyun


Hi,

Thanks for creating the package!

I'm running some tests with the AD test and have a question about how the AD statistic is calculated.
According to README.md and ad_test.Rd, the AD statistic is calculated as

$AD = \sum_{x \in k} \left({|E(x)-F(x)| \over \sqrt{2G(x)(1-G(x))/n} }\right)^p$
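To make the comparison concrete, here is a minimal Python sketch (not the package's R code) that evaluates the README formula literally. It assumes that $n$ is the pooled sample size, that the sum runs over the points of the pooled sample, and that the largest pooled point, where $G(x)(1-G(x)) = 0$, is skipped. `ecdf` and `readme_ad` are hypothetical helper names.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of `sample` as a function x -> P(X <= x)."""
    s = sorted(sample)
    return lambda x: bisect_right(s, x) / len(s)


def readme_ad(a, b, p=2):
    """AD statistic following the README formula literally (a sketch).

    Assumptions: n is the pooled sample size, the sum runs over every
    point of the pooled sample, and points with G(x)(1 - G(x)) == 0
    (the largest pooled value) are skipped to avoid division by zero.
    """
    E, F = ecdf(a), ecdf(b)
    pooled = list(a) + list(b)
    G = ecdf(pooled)
    n = len(pooled)
    total = 0.0
    for x in pooled:
        g = G(x)
        if 0 < g < 1:
            scale = (2 * g * (1 - g) / n) ** 0.5
            total += (abs(E(x) - F(x)) / scale) ** p
    return total
```

For example, `readme_ad([1, 2], [3, 4])` evaluates to about 13.333 (that is, $40/3$).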

It seems to me that there may be two issues: (1) the formula assumes the two sample sizes are equal, and (2) the integral is not approximated correctly.

Let the sample sizes be $n_1$ and $n_2$, with corresponding ECDFs $E$ and $F$ in your notation, let $n = n_1 + n_2$, and let $G$ be the ECDF of the pooled sample. Then, for $p = 2$,
$AD = \frac{n_1 n_2}{n} \int \frac{(E(x)-F(x))^2}{G(x)(1-G(x))}\, dG(x)$
(see F. W. Scholz and M. A. Stephens (1987), K-Sample Anderson-Darling Tests).
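The constant $\frac{n_1 n_2}{n}$ can be traced back to the $k$-sample statistic of Scholz and Stephens. A short derivation, using the issue's notation ($E$, $F$ for the two sample ECDFs and $G$ for the pooled ECDF) rather than the paper's:

$$
A^2 = \sum_{i=1}^{k} n_i \int \frac{\bigl(F_i(x) - G(x)\bigr)^2}{G(x)\bigl(1 - G(x)\bigr)}\, dG(x).
$$

For $k = 2$ with $F_1 = E$ and $F_2 = F$, the pooled ECDF is $G = (n_1 E + n_2 F)/n$, so

$$
E - G = \frac{n_2}{n}(E - F), \qquad F - G = -\frac{n_1}{n}(E - F),
$$

and therefore

$$
n_1 (E - G)^2 + n_2 (F - G)^2 = \frac{n_1 n_2^2 + n_2 n_1^2}{n^2}(E - F)^2 = \frac{n_1 n_2}{n}(E - F)^2,
$$

which gives the two-sample form above.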

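Let $x_1, \dots, x_n$ denote the pooled sample. Since $dG$ puts mass $1/n$ on each $x_i$ (assuming no ties), the integral should be approximated by
$\frac{1}{n} \sum_{i \in [n]} \frac{(E(x_i)-F(x_i))^2}{G(x_i)(1-G(x_i))},$
omitting the term at the largest observation, where $G = 1$ and $E = F = 1$.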
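A minimal Python sketch of this approximation multiplied by $n_1 n_2 / n$, i.e. the two-sample statistic above for $p = 2$. It assumes no ties and skips the $0/0$ term at the largest pooled value. `ecdf` and `two_sample_ad` are hypothetical names, not functions from the package.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of `sample` as a function x -> P(X <= x)."""
    s = sorted(sample)
    return lambda x: bisect_right(s, x) / len(s)


def two_sample_ad(a, b):
    """Two-sample Anderson-Darling statistic for p = 2 (a sketch).

    Approximates (n1*n2/n) * integral (E - F)^2 / (G(1 - G)) dG by giving
    each pooled observation mass 1/n. Assumes no ties; the term at the
    largest pooled value (where G = 1 and E = F = 1) is omitted.
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    E, F = ecdf(a), ecdf(b)
    pooled = list(a) + list(b)
    G = ecdf(pooled)
    integral = 0.0
    for x in pooled:
        g = G(x)  # g > 0 because x is itself in the pooled sample
        if g < 1:  # skip the 0/0 term at the maximum
            integral += (E(x) - F(x)) ** 2 / (g * (1 - g)) / n
    return n1 * n2 / n * integral
```

For example, `two_sample_ad([1, 2], [3, 4])` evaluates to about 1.667 ($5/3$), and the result does not change if the two samples are swapped.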
Recall the extra factor $n_1 n_2 / n$; if you set $n_1 = n_2 = n/2$, then
$AD = \frac{1}{4} \sum_{i \in [n]} \frac{(E(x_i)-F(x_i))^2}{G(x_i)(1-G(x_i))},$
which differs from your formula: with $p = 2$, the README version is larger by a factor of $2n$.
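The size of the discrepancy can be checked term by term, assuming that $n$ in the README formula is the pooled sample size and that both sums run over the pooled sample. With $p = 2$,

$$
\sum_{i} \left(\frac{|E(x_i)-F(x_i)|}{\sqrt{2G(x_i)(1-G(x_i))/n}}\right)^2 = \frac{n}{2} \sum_{i} \frac{(E(x_i)-F(x_i))^2}{G(x_i)(1-G(x_i))},
$$

and comparing with $\frac{1}{4}\sum_{i} \frac{(E(x_i)-F(x_i))^2}{G(x_i)(1-G(x_i))}$ gives a ratio of $\frac{n/2}{1/4} = 2n$.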

In addition, when I tried some simple datasets, the test statistic returned by `ad_test` appeared to depend on the total sample size.

Please let me know if this makes sense, or if I am wrong.

Thanks!
