Calculation of the AD statistics

Hi,

Thanks for creating the package!

I'm running some tests on the AD test and have a question about how the AD statistic is calculated.
According to the `README.md` or `ad_test.Rd` file, the AD statistic is calculated as follows:

> AD = \sum_{x \in k} \left({|E(x)-F(x)| \over \sqrt{2G(x)(1-G(x))/n} }\right)^p

It seems to me that there may be two issues: 1) the formula assumes the two sample sizes are the same; and 2) the integral is not approximated correctly.

Let the sample sizes be n1 and n2, with corresponding ECDFs E and F in your notation; let n = n1 + n2, and let G be the ECDF of the pooled sample. Then, for p = 2,
$AD = \frac{n_1 n_2}{n} \int \frac{(E(x)-F(x))^2}{G(x)(1-G(x))} \, dG(x)$,
see [F. W. Scholz, M. A. Stephens, (1987) K-Sample Anderson-Darling Tests](https://www.jstor.org/stable/2288805?origin=crossref)

Let $x_i$ denote the observations in the pooled sample. Since $dG$ puts mass $1/n$ on each $x_i$, the integral should be approximated by
$\frac{1}{n} \sum_{i \in [n]} \frac{(E(x_i)-F(x_i))^2}{G(x_i)(1-G(x_i))}.$
Recall that there is an extra factor of $n_1 n_2/n$. If you set $n_1 = n_2 = n/2$, that factor is $n/4$, so
$AD = \frac{1}{4} \sum_{i \in [n]} \frac{(E(x_i)-F(x_i))^2}{G(x_i)(1-G(x_i))},$
which is different from your formula (yours is multiplied by an extra $n$).
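
To make the comparison concrete, here is a minimal Python sketch of both expressions. It is not the package's code. It rests on a few assumptions of mine: the README sum runs over the pooled sample, the README's `n` is the total sample size, and the term at the pooled maximum (where G = 1 and the ratio is 0/0) is skipped. The function names `ecdf`, `ad_two_sample` and `ad_readme_p2` are made up for illustration.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function x -> fraction of `sample` that is <= x."""
    ordered = sorted(sample)
    size = len(ordered)
    return lambda x: bisect_right(ordered, x) / size


def ad_two_sample(a, b):
    """(n1*n2/n) * (1/n) * sum over pooled x_i of (E-F)^2 / (G*(1-G))."""
    a, b = list(a), list(b)
    n1, n2 = len(a), len(b)
    n = n1 + n2
    E, F, G = ecdf(a), ecdf(b), ecdf(a + b)
    total = 0.0
    for x in a + b:
        g = G(x)
        if g < 1.0:  # assumption: skip the 0/0 term at the pooled maximum
            total += (E(x) - F(x)) ** 2 / (g * (1.0 - g))
    return (n1 * n2 / n) * total / n


def ad_readme_p2(a, b):
    """README expression with p = 2, reading n as the total sample size
    and k as the pooled sample (both are assumptions)."""
    a, b = list(a), list(b)
    n = len(a) + len(b)
    E, F, G = ecdf(a), ecdf(b), ecdf(a + b)
    total = 0.0
    for x in a + b:
        g = G(x)
        if g < 1.0:  # same convention as above
            total += (E(x) - F(x)) ** 2 / (2.0 * g * (1.0 - g) / n)
    return total


if __name__ == "__main__":
    a, b = [1, 3, 5, 7], [2, 4, 6, 8]
    print(ad_two_sample(a, b), ad_readme_p2(a, b))
    # Duplicating every observation leaves E, F and G unchanged.
    print(ad_two_sample(a * 2, b * 2), ad_readme_p2(a * 2, b * 2))
```

Under these assumptions and with equal sample sizes, the README expression works out to 2n times the first one. Duplicating every observation doubles the first statistic (only the $n_1 n_2/n$ factor changes) but quadruples the README expression.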

Also, when I tried some simple datasets, the `Test Stat` returned from `ad_test` appears to depend on the total sample size.

Please let me know if this makes sense, or if I am wrong.

Thanks!




Calculation of the AD statistics #20
