Join count statistic

Statistics of spatial association
From Wikipedia, the free encyclopedia


Join count statistics are a method of spatial analysis used to assess the degree of association, in particular the autocorrelation, of categorical variables distributed over a spatial map. They were originally introduced by Australian statistician P. A. P. Moran.[1] Join count statistics have found widespread use in econometrics,[2] remote sensing[3] and ecology.[4] Join count statistics can be computed in a number of software packages, including PASSaGE,[5] GeoDa, PySAL[6] and spdep.[7]

Binary data

Figure: Join counts for binary data on a grid using 'rook' (north, south, east, west) neighbours. Left: black is never next to black, nor white to white, resulting in zero values of $J_{BB}$ and $J_{WW}$. Centre: a random pattern shows no bias for pairing colours, resulting in approximately equal values for all join count statistics. Right: a solid patch of black in a white background results in high values for $J_{BB}$ and $J_{WW}$ and low values of $J_{BW}$, since black is only next to white along the patch boundary.

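Given $N$ binary data $X_i \in \{0, 1\}$, $i = 1, \dots, N$, distributed over $N$ spatial sites, where the neighbour relations between regions $i$ and $j$ are encoded in the spatial weight matrix $w_{ij}$, the join count statistics are defined as[8][4]

$$J_{BB} = \frac{1}{2} \sum_{i,j} w_{ij} X_i X_j$$
$$J_{WW} = \frac{1}{2} \sum_{i,j} w_{ij} (1 - X_i)(1 - X_j)$$
$$J_{BW} = \frac{1}{2} \sum_{i,j} w_{ij} (X_i - X_j)^2$$

where

$$J_{TOT} = J_{BB} + J_{WW} + J_{BW} = \frac{1}{2} \sum_{i,j} w_{ij}.$$

The subscripts refer to 'black' = 1 and 'white' = 0 sites. The relation $J_{TOT} = J_{BB} + J_{WW} + J_{BW}$ implies that only three of the four numbers are independent. Generally speaking, large values of $J_{BB}$ and $J_{WW}$ relative to $J_{BW}$ imply autocorrelation, and relatively large values of $J_{BW}$ imply anti-correlation.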
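The definitions can be checked by evaluating the double sums directly on a small grid. The Python sketch below is illustrative only. It assumes a symmetric binary 'rook' weight matrix ($w_{ij} = 1$ when sites $i$ and $j$ share an edge, 0 otherwise), and the helper names `rook_neighbours` and `join_counts` are hypothetical. It applies the factor $\frac{1}{2}$ literally, because the double sum visits each join twice, once as $(i, j)$ and once as $(j, i)$.

```python
def rook_neighbours(n_rows, n_cols):
    """Symmetric binary weight matrix w[i][j] for rook adjacency on a grid.

    Sites are numbered row by row: site i sits at (i // n_cols, i % n_cols).
    """
    n = n_rows * n_cols
    w = [[0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            # Link east and south neighbours; symmetry covers west and north.
            if c + 1 < n_cols:
                w[i][i + 1] = w[i + 1][i] = 1
            if r + 1 < n_rows:
                w[i][i + n_cols] = w[i + n_cols][i] = 1
    return w


def join_counts(x, w):
    """Return (J_BB, J_WW, J_BW, J_TOT) for binary values x (1 = black, 0 = white)."""
    n = len(x)
    j_bb = j_ww = j_bw = j_tot = 0.0
    for i in range(n):
        for j in range(n):
            if w[i][j] == 0:
                continue
            j_bb += w[i][j] * x[i] * x[j]
            j_ww += w[i][j] * (1 - x[i]) * (1 - x[j])
            j_bw += w[i][j] * (x[i] - x[j]) ** 2
            j_tot += w[i][j]
    # Every join was visited twice, as (i, j) and (j, i), hence the factor 1/2.
    return j_bb / 2, j_ww / 2, j_bw / 2, j_tot / 2


w = rook_neighbours(4, 4)
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]
left_half = [1 if c < 2 else 0 for r in range(4) for c in range(4)]
print(join_counts(checkerboard, w))  # (0.0, 0.0, 24.0, 24.0)
print(join_counts(left_half, w))     # (10.0, 10.0, 4.0, 24.0)
```

A 4 × 4 rook grid has 24 joins, so $J_{TOT} = 24$ in both cases. The checkerboard puts every join into $J_{BW}$. Splitting the grid into a black left half and a white right half leaves only the 4 joins along the boundary as black–white.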

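To assess the statistical significance of these statistics, their expectations under various null models have been computed.[9] For example, suppose the null hypothesis is that each sample is chosen at random according to a Bernoulli process with probability $p$, that is, independently for each site,

$$P(X_i = 1) = p.$$

Cliff and Ord[8] then show that

$$E[J_{BB}] = \frac{1}{2} S_0 p^2$$
$$E[J_{WW}] = \frac{1}{2} S_0 (1 - p)^2$$
$$E[J_{BW}] = S_0 p (1 - p)$$

where the diagonal weights $w_{ii}$ are taken to be zero and

$$S_0 = \sum_{i,j} w_{ij}.$$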
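The free-sampling expectations can be checked by simulation. The sketch below assumes binary, symmetric rook weights with no self-joins, so $S_0$ is twice the number of joins. It evaluates $E[J_{BB}] = \frac{1}{2} S_0 p^2$, $E[J_{WW}] = \frac{1}{2} S_0 (1 - p)^2$ and $E[J_{BW}] = S_0 p (1 - p)$, and compares them with Monte Carlo averages over maps whose sites are independent Bernoulli($p$) draws. The function names, the grid size and $p = 0.3$ are illustrative choices and do not come from any package.

```python
import random


def rook_edges(n_rows, n_cols):
    """List each rook join (i, j) with i < j exactly once; sites are numbered row by row."""
    edges = []
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if c + 1 < n_cols:
                edges.append((i, i + 1))
            if r + 1 < n_rows:
                edges.append((i, i + n_cols))
    return edges


def expected_join_counts(s0, p):
    """Free-sampling expectations (E[J_BB], E[J_WW], E[J_BW])."""
    q = 1 - p
    return 0.5 * s0 * p * p, 0.5 * s0 * q * q, s0 * p * q


def simulated_join_counts(edges, n_sites, p, n_sims, seed=0):
    """Monte Carlo averages of (J_BB, J_WW, J_BW) when each site is black with probability p."""
    rng = random.Random(seed)
    totals = [0, 0, 0]
    for _ in range(n_sims):
        x = [1 if rng.random() < p else 0 for _ in range(n_sites)]
        for i, j in edges:
            if x[i] and x[j]:
                totals[0] += 1  # black-black join
            elif not x[i] and not x[j]:
                totals[1] += 1  # white-white join
            else:
                totals[2] += 1  # black-white join
    return tuple(t / n_sims for t in totals)


edges = rook_edges(4, 4)
s0 = 2 * len(edges)  # each join contributes w_ij = w_ji = 1
print(expected_join_counts(s0, 0.3))  # approximately (2.16, 11.76, 10.08)
print(simulated_join_counts(edges, 16, 0.3, n_sims=20000))
```

The three expectations sum to $S_0 / 2 = J_{TOT}$, as they must. The simulated averages should agree with them to within Monte Carlo error.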

In practice,[10] however, an approach based on random permutations is preferred, since it requires fewer assumptions.
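Below is a minimal sketch of such a permutation test for $J_{BB}$. It assumes binary rook weights on a small grid, the helper names are hypothetical, and the one-sided alternative (more black–black joins than chance would give) is a choice made for illustration. Shuffling the observed values keeps the number of black sites fixed, so this null model differs from the Bernoulli one. The pseudo p-value counts the observed arrangement as one of the permutations.

```python
import random


def rook_edges(n_rows, n_cols):
    """List each rook join (i, j) with i < j exactly once; sites are numbered row by row."""
    edges = []
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if c + 1 < n_cols:
                edges.append((i, i + 1))
            if r + 1 < n_rows:
                edges.append((i, i + n_cols))
    return edges


def count_bb(x, edges):
    """J_BB for binary weights: the number of joins with both ends black."""
    return sum(1 for i, j in edges if x[i] == 1 and x[j] == 1)


def permutation_test_bb(x, edges, n_perm=999, seed=0):
    """Return (observed J_BB, one-sided pseudo p-value) from random relabellings of x."""
    rng = random.Random(seed)
    observed = count_bb(x, edges)
    shuffled = list(x)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # same number of black sites, new locations
        if count_bb(shuffled, edges) >= observed:
            at_least_as_extreme += 1
    # Count the observed arrangement itself as one of the permutations.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


edges = rook_edges(4, 4)
left_half = [1 if c < 2 else 0 for r in range(4) for c in range(4)]
observed, p_value = permutation_test_bb(left_half, edges)
print(observed, p_value)
```

A black left half on a 4 × 4 grid gives $J_{BB} = 10$, the largest value possible for 8 black sites, so very few shuffles reach it and the pseudo p-value is small. A checkerboard gives $J_{BB} = 0$, every shuffle is at least as large, and the p-value is 1.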

Local join count statistic

Anselin and Li introduced[11][12] the idea of the local join count statistic, following Anselin's general idea of a Local Indicator of Spatial Association (LISA).[13] The local join count is defined, for example, as

$$BB_i = X_i \sum_j w_{ij} X_j,$$

with similar definitions for $WW_i$ and $BW_i$. This is equivalent to the Getis–Ord statistics computed with binary data. Some analytic results for the expectation of the local statistics are available, based on the hypergeometric distribution,[11] but because of the multiple comparisons problem a permutation-based approach is again preferred in practice.[12]
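The sketch below computes $BB_i = X_i \sum_j w_{ij} X_j$ for every site and a pseudo p-value for one site. It uses conditional permutation: $X_i$ is held fixed and the remaining values are shuffled. That scheme is commonly used for LISA statistics, but applying it here is an assumption, not a reproduction of any particular package. Binary rook weights and all function names are also illustrative.

```python
import random


def rook_neighbour_lists(n_rows, n_cols):
    """Return nbrs[i], the rook neighbours of site i (sites numbered row by row)."""
    nbrs = [[] for _ in range(n_rows * n_cols)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if c + 1 < n_cols:  # east neighbour; i is in turn its west neighbour
                nbrs[i].append(i + 1)
                nbrs[i + 1].append(i)
            if r + 1 < n_rows:  # south neighbour; i is in turn its north neighbour
                nbrs[i].append(i + n_cols)
                nbrs[i + n_cols].append(i)
    return nbrs


def local_bb(x, nbrs):
    """Local join counts BB_i = X_i * sum_j w_ij X_j with binary rook weights."""
    return [x[i] * sum(x[j] for j in nbrs[i]) for i in range(len(x))]


def conditional_permutation_p(x, nbrs, i, n_perm=999, seed=0):
    """One-sided pseudo p-value for BB_i: X_i is held fixed, all other values are shuffled."""
    rng = random.Random(seed)
    observed = x[i] * sum(x[j] for j in nbrs[i])
    positions = [k for k in range(len(x)) if k != i]
    others = [x[k] for k in positions]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(others)
        y = list(x)
        for k, v in zip(positions, others):
            y[k] = v
        if x[i] * sum(y[j] for j in nbrs[i]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


nbrs = rook_neighbour_lists(4, 4)
left_half = [1 if c < 2 else 0 for r in range(4) for c in range(4)]
print(local_bb(left_half, nbrs))
print(conditional_permutation_p(left_half, nbrs, 5))
```

Summing $BB_i$ over all sites gives $2 J_{BB}$ (20 here), because each black–black join is counted once from each end. Site 5 has three black neighbours out of four, with 7 black values among the other 15 sites. Under this conditional scheme its exact tail probability is hypergeometric, $315/1365 \approx 0.23$, so even a site inside the black patch is not individually significant.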

Extension to multiple categories

Figure: Join counts for 3-category data on a grid using 'rook' (north, south, east, west) neighbours. Left: each category never has a neighbour of its own type, resulting in zeros on the diagonal. Centre: a random pattern shows no bias for pairing colours, resulting in approximately equal values for all join count statistics. Right: since different types are only adjacent on the edges of the patches, this results in small values for $J_{rs}$, $r \neq s$.

When there are $k$ categories, join count statistics have been generalised[4][8][9] to

$$J_{rs} = \frac{1}{2} \sum_{i,j} w_{ij} \, \delta_{ir} \, \delta_{js},$$

where $\delta_{ir}$ is an indicator function for the variable $X_i$ belonging to category $r$. Analytic results are available,[14] or a permutation approach can be used to test for significance, as in the binary case.
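Below is a sketch of the $k \times k$ matrix of generalised join counts, using the indicator form $J_{rs} = \frac{1}{2} \sum_{i,j} w_{ij} \delta_{ir} \delta_{js}$. It assumes binary rook weights, categories coded as integers $0, \dots, k - 1$, and hypothetical helper names. The edge list stores each join once, and the code splits it evenly between $J_{rs}$ and $J_{sr}$. This reproduces the double sum, which visits every join in both orders, together with its factor $\frac{1}{2}$.

```python
def rook_edges(n_rows, n_cols):
    """List each rook join (i, j) with i < j exactly once; sites are numbered row by row."""
    edges = []
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if c + 1 < n_cols:
                edges.append((i, i + 1))
            if r + 1 < n_rows:
                edges.append((i, i + n_cols))
    return edges


def join_count_matrix(labels, edges, k):
    """k x k matrix J[r][s] = 1/2 * sum_{i,j} w_ij [label_i = r][label_j = s]."""
    J = [[0.0] * k for _ in range(k)]
    for i, j in edges:
        # The double sum visits this join as (i, j) and as (j, i); each visit adds 1/2.
        J[labels[i]][labels[j]] += 0.5
        J[labels[j]][labels[i]] += 0.5
    return J


edges = rook_edges(3, 3)
no_self_neighbours = [(r + c) % 3 for r in range(3) for c in range(3)]
by_column = [c for r in range(3) for c in range(3)]
print(join_count_matrix(no_self_neighbours, edges, 3))  # zeros on the diagonal
print(join_count_matrix(by_column, edges, 3))           # three vertical patches
```

With this convention, the diagonal entry $J_{rr}$ counts the joins within category $r$, just as $J_{BB}$ and $J_{WW}$ do. For $r \neq s$, the sum $J_{rs} + J_{sr}$ counts the joins between the two categories, which matches $J_{BW}$ in the binary case.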

References
