Join count statistic

Binary data

Given binary data $x_{i}\in \{0,1\}$ distributed over $N$ spatial sites, where the neighbour relations between sites $i$ and $j$ are encoded in the spatial weight matrix

w_{ij}={\begin{cases}1\qquad &i{\text{ neighbour of }}j\\0&{\text{otherwise}}\end{cases}}

the join count statistics are defined as ^[8]^[4]

J=J_{BB}+J_{BW}+J_{WW}

where

J_{BB}={\frac {1}{2}}\sum _{ij,i\neq j}w_{ij}x_{i}x_{j}

J_{BW}={\frac {1}{2}}\sum _{ij,i\neq j}w_{ij}(x_{i}-x_{j})^{2}

J_{WW}={\frac {1}{2}}\sum _{ij,i\neq j}w_{ij}(1-x_{i})(1-x_{j})

J={\frac {1}{2}}\sum _{ij,i\neq j}w_{ij}

The $B,W$ subscripts refer to 'black' ($x_{i}=1$) and 'white' ($x_{i}=0$) sites. Because $J=J_{BB}+J_{BW}+J_{WW}$, only three of the four counts are independent. Generally speaking, large values of $J_{BB}$ and $J_{WW}$ relative to $J_{BW}$ indicate positive spatial autocorrelation (like values cluster together), while relatively large values of $J_{BW}$ indicate negative spatial autocorrelation (unlike values tend to be neighbours).
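As an illustration, the three counts can be computed directly from the double-sum definitions. The following is a minimal Python sketch, assuming $w$ is a symmetric 0/1 matrix given as nested lists with zeros on the diagonal; the function name `join_counts` and the four-site example are hypothetical.

```python
from typing import Dict, Sequence


def join_counts(w: Sequence[Sequence[float]], x: Sequence[int]) -> Dict[str, float]:
    """Compute J_BB, J_BW, J_WW and J from the double-sum definitions.

    Each sum runs over ordered pairs (i, j) with i != j and is halved,
    so every join between two neighbouring sites is counted once.
    """
    n = len(x)
    bb = bw = ww = total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            wij = w[i][j]
            bb += wij * x[i] * x[j]
            bw += wij * (x[i] - x[j]) ** 2
            ww += wij * (1 - x[i]) * (1 - x[j])
            total += wij
    return {"BB": bb / 2, "BW": bw / 2, "WW": ww / 2, "J": total / 2}


# Hypothetical example: four sites in a row, 0 - 1 - 2 - 3, each joined to its neighbours.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
x = [1, 1, 0, 0]
print(join_counts(w, x))  # {'BB': 1.0, 'BW': 1.0, 'WW': 1.0, 'J': 3.0}
```

The identity $J=J_{BB}+J_{BW}+J_{WW}$ holds for any binary $x$ because, for $x_{i},x_{j}\in\{0,1\}$, $x_{i}x_{j}+(x_{i}-x_{j})^{2}+(1-x_{i})(1-x_{j})=1$, so every join is counted in exactly one of the three sums.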

To assess the statistical significance of these statistics, their expectations and variances under various null models have been computed.^[9] For example, if the null hypothesis is that each site is independently coloured black according to a Bernoulli process with probability

p={\frac {\text{number of black cells}}{N}}={\frac {N_{1}}{N}}

then Cliff and Ord ^[8] show that

E(J_{BB})={\frac {1}{2}}S_{0}p^{2}

var(J_{BB})={\frac {p^{2}(1-p)}{4}}[S_{1}(1-p)+S_{2}p]

E(J_{BW})=S_{0}p(1-p)

var(J_{BW})={\frac {p(1-p)}{4}}[4S_{1}p(1-p)+S_{2}(1-4p(1-p))]

where

S_{0}=\sum _{ij}w_{ij}

S_{1}={\frac {1}{2}}\sum _{ij}(w_{ji}+w_{ij})^{2}

S_{2}=\sum _{i}(\sum _{j}w_{ji}+\sum _{j}w_{ij})^{2}
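Under this free-sampling null model the moments depend on $w$ only through $S_{0}$, $S_{1}$ and $S_{2}$, so they are straightforward to evaluate. The sketch below is an illustration rather than a reference implementation: the helper names and the example are hypothetical, and `z_score` assumes the usual normal approximation $(J-E(J))/\sqrt{\operatorname{var}(J)}$. The variance of $J_{BW}$ is coded as $\frac{p(1-p)}{4}[4S_{1}p(1-p)+S_{2}(1-4p(1-p))]$, which for a single join reduces to the Bernoulli variance $2p(1-p)(1-2p(1-p))$.

```python
import math
from typing import Dict, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def weight_sums(w: Matrix) -> Tuple[float, float, float]:
    """Return (S0, S1, S2) for a weight matrix w (symmetry is not required)."""
    n = len(w)
    s0 = sum(w[i][j] for i in range(n) for j in range(n))
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i in range(n) for j in range(n))
    s2 = sum(
        (sum(w[j][i] for j in range(n)) + sum(w[i][j] for j in range(n))) ** 2
        for i in range(n)
    )
    return s0, s1, s2


def free_sampling_moments(w: Matrix, p: float) -> Dict[str, float]:
    """Mean and variance of J_BB and J_BW when every site is black independently with probability p."""
    s0, s1, s2 = weight_sums(w)
    q = 1 - p
    return {
        "E_BB": 0.5 * s0 * p**2,
        "var_BB": p**2 * q / 4 * (s1 * q + s2 * p),
        "E_BW": s0 * p * q,
        "var_BW": p * q / 4 * (4 * s1 * p * q + s2 * (1 - 4 * p * q)),
    }


def z_score(observed: float, mean: float, var: float) -> float:
    """Standardised join count, assuming a normal approximation is adequate."""
    return (observed - mean) / math.sqrt(var)


# Hypothetical four-site row 0 - 1 - 2 - 3, with p = N1 / N = 0.5.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
x = [1, 1, 0, 0]
p = sum(x) / len(x)
moments = free_sampling_moments(w, p)
print(moments)  # {'E_BB': 0.75, 'var_BB': 0.8125, 'E_BW': 1.5, 'var_BW': 0.75}
print(z_score(1.0, moments["E_BW"], moments["var_BW"]))  # observed J_BW is 1 for this x
```

Passing $p$ explicitly, rather than recomputing it inside the function, makes it easy to check the formulas against exact enumeration over all $2^{N}$ colourings for small $N$.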

However, in practice^[10] a permutation approach, in which the observed values are randomly reassigned to the sites, is preferred, since it requires fewer distributional assumptions.
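A minimal sketch of such a permutation test for $J_{BB}$, assuming a one-sided alternative of positive autocorrelation (large $J_{BB}$) and the common pseudo p-value $(k+1)/(M+1)$, where $k$ of $M$ random permutations give a value at least as large as the observed one. Shuffling the observed vector keeps $N_{1}$ fixed and only reassigns the values to sites. The function names and the ten-site example are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def j_bb(w: Sequence[Sequence[float]], x: Sequence[int]) -> float:
    """J_BB = 1/2 * sum_{i != j} w_ij x_i x_j."""
    n = len(x)
    return sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(n) if i != j) / 2


def permutation_test_bb(w, x, n_perm: int = 999,
                        seed: Optional[int] = None) -> Tuple[float, float]:
    """Return (observed J_BB, one-sided pseudo p-value) from random relabellings of the sites."""
    rng = random.Random(seed)
    observed = j_bb(w, x)
    perm = list(x)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # same values, new arrangement over the sites
        if j_bb(w, perm) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)


def row_weights(n: int) -> List[List[int]]:
    """Hypothetical helper: sites 0..n-1 in a row, each joined to its immediate neighbours."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = row_weights(10)
x = [1] * 5 + [0] * 5  # all black sites form one contiguous block
print(permutation_test_bb(w, x, seed=42))
```

A test aimed at negative autocorrelation would instead count permutations giving large values of $J_{BW}$.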

Local join count statistic

Anselin and Li introduced^[11]^[12] the local join count statistic, following Anselin's general framework of Local Indicators of Spatial Association (LISA).^[13] For the $BB$ case, the local join count at site $i$ is defined as

J_{BBi}=x_{i}\sum _{j}w_{ij}x_{j}

with similar definitions for $BW$ and $WW$. This is equivalent to the Getis–Ord statistics computed with binary data. Some analytic results for the expectation of the local statistics are available based on the hypergeometric distribution,^[11] but because of the multiple comparisons problem a permutation-based approach is again preferred in practice.^[12]
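The local statistic and one possible permutation test can be sketched as follows. The following choices are assumptions, not taken from the text: a conditional permutation scheme in which $x_{i}$ is held fixed at site $i$ while the remaining values are shuffled over the other sites, a one-sided pseudo p-value, and no correction for multiple comparisons. Only sites with $x_{i}=1$ are tested, since $J_{BBi}=0$ whenever $x_{i}=0$. Function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def local_bb(w: Sequence[Sequence[float]], x: Sequence[int]) -> List[float]:
    """J_BBi = x_i * sum_j w_ij x_j for every site i (diagonal of w ignored)."""
    n = len(x)
    return [x[i] * sum(w[i][j] * x[j] for j in range(n) if j != i) for i in range(n)]


def local_bb_pvalues(w, x, n_perm: int = 999, seed: Optional[int] = None
                     ) -> Tuple[List[float], List[Optional[float]]]:
    """Conditional permutation pseudo p-values for J_BBi (None where x_i = 0)."""
    rng = random.Random(seed)
    n = len(x)
    observed = local_bb(w, x)
    pvalues: List[Optional[float]] = [None] * n
    for i in range(n):
        if x[i] == 0:
            continue  # J_BBi is identically 0, nothing to test
        others = [j for j in range(n) if j != i]
        values = [x[j] for j in others]
        count = 0
        for _ in range(n_perm):
            rng.shuffle(values)  # x_i stays fixed; the rest are reassigned
            if sum(w[i][j] * v for j, v in zip(others, values)) >= observed[i]:
                count += 1
        pvalues[i] = (count + 1) / (n_perm + 1)
    return observed, pvalues


w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
x = [1, 1, 0, 0]
print(local_bb(w, x))  # [1, 1, 0, 0]
```

With a zero diagonal, $\sum_{i}J_{BBi}=2J_{BB}$, because the global statistic carries the factor $\frac{1}{2}$ and the local one does not; here the local values sum to 2 while $J_{BB}=1$.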

Extension to multiple categories

When there are $k\geq 2$ categories, join count statistics have been generalised^[4]^[8]^[9] as

J_{rs}={\frac {1}{2}}\sum _{ij,i\neq j}w_{ij}I_{r}(x_{i})I_{s}(x_{j})

where $I_{r}(x_{i})=\delta _{r,x_{i}}$ is an indicator function equal to 1 when $x_{i}$ belongs to category $r$ and 0 otherwise. Analytic results are available,^[14] or a permutation approach can be used to test for significance, as in the binary case.
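A direct sketch of $J_{rs}$ for all ordered category pairs, assuming categorical labels stored in a list and a symmetric 0/1 weight matrix given as nested lists; the function name is hypothetical. Only the pair $(r,s)=(x_{i},x_{j})$ has a nonzero indicator product for a given join, so each ordered pair of neighbouring sites contributes $w_{ij}/2$ to exactly one count.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def join_counts_multi(w: Sequence[Sequence[float]],
                      x: Sequence[Hashable]) -> Dict[Tuple[Hashable, Hashable], float]:
    """J_rs = 1/2 * sum_{i != j} w_ij I_r(x_i) I_s(x_j) for every ordered category pair (r, s).

    Pairs with no joins are omitted from the result.
    """
    n = len(x)
    counts: Dict[Tuple[Hashable, Hashable], float] = defaultdict(float)
    for i in range(n):
        for j in range(n):
            if i != j and w[i][j]:
                # the indicator product is 1 only for r = x_i, s = x_j
                counts[(x[i], x[j])] += w[i][j] / 2
    return dict(counts)


# Hypothetical four-site row 0 - 1 - 2 - 3 with three categories.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(join_counts_multi(w, ["a", "b", "b", "c"]))
# {('a', 'b'): 0.5, ('b', 'a'): 0.5, ('b', 'b'): 1.0, ('b', 'c'): 0.5, ('c', 'b'): 0.5}
```

With two categories $\{0,1\}$, $J_{11}$ reproduces $J_{BB}$, $J_{00}$ reproduces $J_{WW}$, and $J_{10}+J_{01}$ reproduces $J_{BW}$, since $(x_{i}-x_{j})^{2}=x_{i}(1-x_{j})+(1-x_{i})x_{j}$ for binary values. For $r\neq s$ and symmetric $w$, $J_{rs}=J_{sr}$, so the number of joins between categories $r$ and $s$ is $J_{rs}+J_{sr}$.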
