Cramér's V

Usage and interpretation

Cramér's V, also written φ_c, is the intercorrelation of two discrete variables^[2] and may be used with variables having two or more levels. It is a symmetrical measure: it does not matter which variable is placed in the columns and which in the rows. The order of the rows and columns does not matter either, so φ_c may be used with nominal data types or higher (notably, ordered or numerical).

Cramér's V varies from 0 (corresponding to no association between the variables) to 1 (complete association) and can reach 1 only when each variable is completely determined by the other. It may be viewed as the association between two variables as a percentage of their maximum possible variation.

φ_c² is the mean square canonical correlation between the variables.^{[citation needed]}

In the case of a 2 × 2 contingency table, Cramér's V is equal to the absolute value of the phi coefficient.
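
A short derivation shows why. Write the cell counts of the 2 × 2 table as $a, b$ (first row) and $c, d$ (second row), with $n=a+b+c+d$; this lettering is an assumption made here for illustration and is not notation from the cited sources. Pearson's chi-squared statistic for such a table reduces to

\chi ^{2}={\frac {n(ad-bc)^{2}}{(a+b)(c+d)(a+c)(b+d)}}\;,

and because the divisor $\min(k-1,r-1)$ in the definition of V equals 1 when both variables have two levels,

V={\sqrt {\chi ^{2}/n}}={\frac {|ad-bc|}{\sqrt {(a+b)(c+d)(a+c)(b+d)}}}=|\varphi |\;.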

Calculation

Let a sample of size n of the simultaneously distributed variables $A$ and $B$ for $i=1,\ldots ,r;j=1,\ldots ,k$ be given by the frequencies

$n_{ij}=$ number of times the values $(A_{i},B_{j})$ were observed.

The chi-squared statistic then is:

\chi ^{2}=\sum _{i,j}{\frac {(n_{ij}-{\frac {n_{i.}n_{.j}}{n}})^{2}}{\frac {n_{i.}n_{.j}}{n}}}\;,

where $n_{i.}=\sum _{j}n_{ij}$ is the number of times the value $A_{i}$ is observed and $n_{.j}=\sum _{i}n_{ij}$ is the number of times the value $B_{j}$ is observed.

Cramér's V is computed by taking the square root of the chi-squared statistic divided by the sample size and the minimum dimension minus 1:

V={\sqrt {\frac {\varphi ^{2}}{\min(k-1,r-1)}}}={\sqrt {\frac {\chi ^{2}/n}{\min(k-1,r-1)}}}\;,

where:

$\varphi$ is the phi coefficient,
$\chi ^{2}$ is the statistic from Pearson's chi-squared test,
$n$ is the grand total of observations,
$k$ is the number of columns, and
$r$ is the number of rows.

The p-value for the significance of V is the same one that is calculated using Pearson's chi-squared test.^{[citation needed]}
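
For a 2 × 2 table the chi-squared statistic has one degree of freedom, and its upper-tail p-value has a closed form that needs only the standard library. The sketch below covers that special case only; the function name is illustrative, and tables with more rows or columns need a general chi-squared survival function with $(r-1)(k-1)$ degrees of freedom, such as one provided by a statistics library.

```python
import math


def chi2_p_value_1dof(chi2):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom.

    Assumption: the table is 2 x 2, so dof = (r - 1) * (k - 1) = 1.
    For 1 dof, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)),
    where Z is a standard normal variable.
    """
    if chi2 < 0:
        raise ValueError("chi-squared statistic must be non-negative")
    return math.erfc(math.sqrt(chi2 / 2.0))


# The 5% critical value for 1 degree of freedom is about 3.8415.
p_at_critical_value = chi2_p_value_1dof(3.841458820694124)
```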

The formula for the variance of V=φ_c is known.^[3]

In R, the function cramerV() from the package rcompanion^[4] calculates V using the chisq.test function from the stats package. In contrast to the function cramersV() from the lsr^[5] package, cramerV() also offers an option to correct for bias. It applies the correction described in the following section.

Bias correction

Cramér's V can be a heavily biased estimator of its population counterpart and will tend to overestimate the strength of association. A bias correction, using the above notation, is given by^[6]

{\tilde {V}}={\sqrt {\frac {{\tilde {\varphi }}^{2}}{\min({\tilde {k}}-1,{\tilde {r}}-1)}}}

where

{\tilde {\varphi }}^{2}=\max \left(0,\varphi ^{2}-{\frac {(k-1)(r-1)}{n-1}}\right)

and

{\tilde {k}}=k-{\frac {(k-1)^{2}}{n-1}}

{\tilde {r}}=r-{\frac {(r-1)^{2}}{n-1}}

Then ${\tilde {V}}$ estimates the same population quantity as Cramér's V but with typically much smaller mean squared error. The rationale for the correction is that under independence, $E[\varphi ^{2}]={\frac {(k-1)(r-1)}{n-1}}$.^[7]
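
The corrected estimator can be sketched in Python as follows. This is an illustrative transcription of the formulas in this section, not the code of any particular package: the function name is hypothetical, the table is assumed to be a list of rows of observed counts with nonzero row and column totals, and the sample is assumed to be large enough that $\min({\tilde {k}}-1,{\tilde {r}}-1)$ is positive.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table (list of rows).

    Assumes nonzero row/column totals and n large enough that
    min(k_tilde - 1, r_tilde - 1) > 0.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]        # n_{i.}
    col_totals = [sum(col) for col in zip(*table)]  # n_{.j}
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    r = len(table)     # number of rows
    k = len(table[0])  # number of columns
    phi2 = chi2 / n

    # Subtract the expected value of phi^2 under independence, floored at 0.
    phi2_tilde = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_tilde = k - (k - 1) ** 2 / (n - 1)
    r_tilde = r - (r - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_tilde / min(k_tilde - 1, r_tilde - 1))


# A weak sample association that falls below the independence baseline
# is shrunk all the way to zero by the correction.
v_weak = cramers_v_corrected([[12, 8], [9, 11]])
```

In the weak-association table above, the uncorrected $\varphi ^{2}$ is smaller than $(k-1)(r-1)/(n-1)=1/39$, so ${\tilde {\varphi }}^{2}$ is clipped to 0 and the corrected value is 0, whereas the uncorrected V is about 0.15.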
