Kuiper's test

One-sample Kuiper test

The one-sample test statistic, $V_{n}$, for Kuiper's test is defined as follows. Let $F$ be the continuous cumulative distribution function specified by the null hypothesis. Denote by $F_{n}$ the empirical distribution function for $n$ independent and identically distributed (i.i.d.) observations $X_{i}$, which is defined as

F_{n}(x)={\frac {{\text{number of elements in the sample}}\leq x}{n}}={\frac {1}{n}}\sum _{i=1}^{n}1_{(-\infty ,x]}(X_{i}),

where $1_{(-\infty ,x]}(X_{i})$ is the indicator function, equal to 1 if $X_{i}\leq x$ and equal to 0 otherwise.

Then the one-sided Kolmogorov–Smirnov statistics for the given cumulative distribution function $F(x)$ are

D_{n}^{+}=\sup _{x}[F_{n}(x)-F(x)],

D_{n}^{-}=\sup _{x}[F(x)-F_{n}(x)],

where $\sup$ is the supremum function. Finally, the one-sample Kuiper statistic is defined as

V_{n}=D_{n}^{+}+D_{n}^{-},

or equivalently

V_{n}=\sup _{x}[F_{n}(x)-F(x)]-\inf _{x}[F_{n}(x)-F(x)],

where $\inf$ is the infimum function.
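
As an illustration, the suprema above can be evaluated from the sorted sample, because $F_{n}$ changes only at the observations. The following Python sketch assumes the null CDF is supplied as a callable; the names `kuiper_statistic` and `uniform_cdf` are illustrative and not taken from any library.

```python
def kuiper_statistic(sample, cdf):
    """Return (D_plus, D_minus, V_n) for a sample against a continuous CDF.

    `cdf` is any callable giving F(x) under the null hypothesis.
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    # F_n jumps from i/n to (i+1)/n at the i-th sorted point (0-based), so
    # sup[F_n - F] is reached at a sample point and sup[F - F_n] just before one.
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return d_plus, d_minus, d_plus + d_minus


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1], used here as an example null."""
    return min(max(x, 0.0), 1.0)


d_plus, d_minus, v_n = kuiper_statistic([0.1, 0.5, 0.9], uniform_cdf)
print(d_plus, d_minus, v_n)
```

Returning $D_{n}^{+}$ and $D_{n}^{-}$ alongside $V_{n}$ keeps the relation $V_{n}=D_{n}^{+}+D_{n}^{-}$ visible to the caller.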

Tables for the critical points of the test statistic $V_{n}$ are available,^[2] and these include certain cases where the distribution being tested is not fully known, so that parameters of the family of distributions are estimated.

The asymptotic distribution of the statistic ${\sqrt {n}}V_{n}$ is given by,^[1]

{\begin{aligned}\operatorname {Pr} ({\sqrt {n}}V_{n}\leq x)=&1-2\sum _{k=1}^{\infty }(-1)^{k-1}(4k^{2}x^{2}-1)e^{-2k^{2}x^{2}}\\&+{\frac {8}{3{\sqrt {n}}}}x\sum _{k=1}^{\infty }k^{2}(4k^{2}x^{2}-3)e^{-2k^{2}x^{2}}+o({\frac {1}{n}}).\end{aligned}}

For $x>{\frac {6}{5}}$, a reasonable approximation is obtained by keeping only the first term of each series:

1-2(4x^{2}-1)e^{-2x^{2}}+{\frac {8x}{3{\sqrt {n}}}}(4x^{2}-3)e^{-2x^{2}}.
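
The first-term approximation is simple enough to evaluate directly. The sketch below is a minimal illustration, assuming the approximation is only used for $x>{\frac {6}{5}}$; the function names are hypothetical, and the p-value helper simply takes one minus the approximate distribution function at $x={\sqrt {n}}V_{n}$.

```python
import math


def kuiper_cdf_first_term(x, n):
    """First-term approximation to Pr(sqrt(n) * V_n <= x); intended for x > 6/5."""
    e = math.exp(-2.0 * x * x)
    leading = 1.0 - 2.0 * (4.0 * x * x - 1.0) * e
    # Finite-sample correction of order 1/sqrt(n).
    correction = (8.0 * x) / (3.0 * math.sqrt(n)) * (4.0 * x * x - 3.0) * e
    return leading + correction


def kuiper_pvalue_first_term(v_n, n):
    """Approximate upper-tail probability for an observed statistic v_n."""
    x = math.sqrt(n) * v_n
    return 1.0 - kuiper_cdf_first_term(x, n)


print(kuiper_cdf_first_term(2.0, 100))
print(kuiper_pvalue_first_term(0.2, 100))
```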

Two-sample Kuiper test

The Kuiper test may also be used to test whether a pair of random samples, either on the real line or on the circle, come from a common but unknown distribution. In this case, the Kuiper statistic is

V_{n,m}=\sup _{x}[F_{1,n}(x)-F_{2,m}(x)]-\inf _{x}[F_{1,n}(x)-F_{2,m}(x)],

where $F_{1,n}$ and $F_{2,m}$ are the empirical distribution functions of the first and the second sample respectively, $\sup$ is the supremum function, and $\inf$ is the infimum function.
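
The two-sample statistic can be sketched in the same way. The difference $F_{1,n}-F_{2,m}$ is a step function that changes only at the pooled observations, so it suffices to evaluate it there. The function name `kuiper_two_sample` is illustrative, and the sketch uses only the Python standard library.

```python
from bisect import bisect_right


def kuiper_two_sample(sample1, sample2):
    """Return V_{n,m} = sup[F1 - F2] - inf[F1 - F2] for two samples."""
    a, b = sorted(sample1), sorted(sample2)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    # bisect_right(a, x) counts the elements of a that are <= x, so
    # bisect_right(a, x) / n is the empirical distribution function at x.
    diffs = [bisect_right(a, x) / n - bisect_right(b, x) / m for x in a + b]
    # The difference is 0 to the left of all data and at the largest pooled
    # point, so 0 is already covered by the values in `diffs`.
    return max(diffs) - min(diffs)


print(kuiper_two_sample([0, 10], [4, 5, 6]))
```

In the usage line, the first sample brackets the second, so the difference of the empirical distribution functions swings both above and below zero and both extremes contribute to $V_{n,m}$.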

Example

We could test the hypothesis that computers fail more during some times of the year than others. To test this, we would collect the dates on which the test set of computers had failed and build an empirical distribution function. The null hypothesis is that the failures are uniformly distributed. Kuiper's statistic does not change if we change the beginning of the year and does not require that we bin failures into months or the like.^[1]^[3] Another test statistic having this property is the Watson statistic,^[3]^[4] which is related to the Cramér–von Mises test.

However, if failures occur mostly on weekends, many uniform-distribution tests such as K-S and Kuiper, applied to the dates within the year, would miss this, since weekends are spread throughout the year. This inability to distinguish distributions with a comb-like shape from continuous uniform distributions is a key problem with all statistics based on a variant of the K-S test. Kuiper's test, applied to the event times modulo one week, is able to detect such a pattern. Applying the K-S test to event times reduced modulo one week can give different results depending on how the data are phased. In this example, the K-S test may detect the non-uniformity if the week is taken to start on Saturday, but fail to detect it if the week starts on Wednesday.
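
The dependence on phase can be illustrated with synthetic data. The sketch below is not taken from the cited sources: it builds a made-up set of 100 failure times (in days, with 0 meaning Saturday 00:00), of which 80 fall on the weekend, reduces them modulo one week for two choices of week start, and compares the two-sided K-S statistic $\max(D_{n}^{+},D_{n}^{-})$ with Kuiper's $V_{n}$ against the uniform distribution. All names and the data are assumptions made for illustration.

```python
def ks_and_kuiper_uniform(fractions):
    """Return (D_n, V_n) against the uniform distribution on [0, 1)."""
    u = sorted(fractions)
    n = len(u)
    d_plus = max((i + 1) / n - x for i, x in enumerate(u))
    d_minus = max(x - i / n for i, x in enumerate(u))
    return max(d_plus, d_minus), d_plus + d_minus


def week_fraction(t_days, week_start):
    """Position within the week as a fraction, for a week starting at `week_start` days."""
    return ((t_days - week_start) % 7.0) / 7.0


def synthetic_failures():
    """Hypothetical data: 80 evenly spaced weekend failures, 20 on weekdays."""
    weekend = [2.0 * (j + 0.5) / 80 for j in range(80)]         # Saturday, Sunday
    weekdays = [2.0 + 5.0 * (j + 0.5) / 20 for j in range(20)]  # Monday to Friday
    return weekend + weekdays


times = synthetic_failures()
for label, start in (("Saturday", 0.0), ("Wednesday", 4.0)):
    d_n, v_n = ks_and_kuiper_uniform([week_fraction(t, start) for t in times])
    print(label, d_n, v_n)
```

Under these assumptions, $V_{n}$ is the same for both week starts up to rounding, while the K-S statistic is noticeably smaller when the week starts on Wednesday, because the weekend cluster then sits in the middle of the interval and the deviation is split between $D_{n}^{+}$ and $D_{n}^{-}$.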
