
Cramér's V

Statistical measure of association

From Wikipedia, the free encyclopedia


In statistics, Cramér's V (sometimes referred to as Cramér's phi and denoted as φc) is a measure of association between two nominal variables, giving a value between 0 and +1 (inclusive). It is based on Pearson's chi-squared statistic and was published by Harald Cramér in 1946.[1]

Usage and interpretation

φc is the intercorrelation of two discrete variables[2] and may be used with variables having two or more levels. φc is a symmetrical measure: it does not matter which variable we place in the columns and which in the rows. Also, the order of rows/columns does not matter, so φc may be used with nominal data types or higher (notably, ordered or numerical).
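The symmetry and order-invariance can be checked numerically. Below is a minimal Python sketch. It assumes the table is given as a list of rows of counts with no empty row or column. `cramers_v` is a hypothetical helper written for this illustration, not a library function. The sketch evaluates V for a table, for its transpose (rows and columns swapped) and for a copy with both rows and columns reordered.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts.

    Hypothetical helper for illustration; assumes all row and column
    totals are positive so that no expected count is zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    r, k = len(table), len(table[0])
    return math.sqrt(chi2 / n / min(k - 1, r - 1))


table = [[10, 20, 30],
         [20, 20, 10]]
# Swap the roles of the two variables (rows become columns).
transposed = [list(col) for col in zip(*table)]
# Reverse the order of both the rows and the columns.
reordered = [row[::-1] for row in reversed(table)]

for t in (table, transposed, reordered):
    print(round(cramers_v(t), 6))
```

All three lines print the same value. Transposing or reordering the table only permutes the terms of the chi-squared sum and leaves the minimum dimension unchanged.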

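Cramér's V varies from 0 (corresponding to no association between the variables) to 1 (complete association). It can reach 1 only when one variable is completely determined by the other. In a square table, each variable is then completely determined by the other. In a non-square table, the variable with more categories determines the one with fewer. V may be viewed as the association between two variables as a percentage of their maximum possible variation.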
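A minimal Python sketch of the two extremes follows, using a hypothetical helper `cramers_v` that takes a list of rows of counts with positive totals. The example tables are made up for illustration:

- a table with proportional rows (no association) gives 0;
- a square table with one non-zero cell per row and column gives 1;
- a 2 × 3 table with one non-zero cell per column gives 1.

```python
import math


def cramers_v(table):
    """Cramér's V from a list of rows of counts (illustrative helper;
    assumes all row and column totals are positive)."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = sum(
        (obs - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )
    # min(k - 1, r - 1) equals min(r, k) - 1
    return math.sqrt(chi2 / n / (min(len(table), len(table[0])) - 1))


# Proportional rows: knowing one variable says nothing about the other.
independent = [[1, 2, 3],
               [2, 4, 6],
               [3, 6, 9]]
# Square table, one non-zero cell per row and column:
# each variable completely determines the other.
perfect_square = [[5, 0, 0],
                  [0, 0, 7],
                  [0, 3, 0]]
# 2 x 3 table, one non-zero cell per column: the three-level
# column variable determines the two-level row variable.
perfect_rect = [[5, 3, 0],
                [0, 0, 4]]
# Partial association, strictly between 0 and 1.
partial = [[10, 20, 30],
           [20, 20, 10]]

for name, t in [("independent", independent),
                ("perfect_square", perfect_square),
                ("perfect_rect", perfect_rect),
                ("partial", partial)]:
    print(name, round(cramers_v(t), 4))
```

This prints 0.0, 1.0, 1.0 and 0.3375 for the four tables.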

φc² is the mean square canonical correlation between the variables.[citation needed]

In the case of a 2 × 2 contingency table, Cramér's V is equal to the absolute value of the phi coefficient.
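The 2 × 2 identity can be confirmed with a short Python sketch. The helpers `cramers_v` and `phi_coefficient` are hypothetical names used only here. The phi coefficient uses the usual formula (ad − bc) / √((a + b)(c + d)(a + c)(b + d)) for the table [[a, b], [c, d]]. The two example tables have the same strength of association but opposite signs.

```python
import math


def cramers_v(table):
    """Cramér's V from a list of rows of counts (illustrative helper;
    assumes positive row and column totals)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / n / (min(len(table), len(table[0])) - 1))


def phi_coefficient(a, b, c, d):
    """Signed phi coefficient of the 2 x 2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Same strength of association, opposite direction.
for a, b, c, d in [(20, 5, 10, 15), (5, 20, 15, 10)]:
    v = cramers_v([[a, b], [c, d]])
    phi = phi_coefficient(a, b, c, d)
    print(f"V = {v:.4f}, phi = {phi:+.4f}")
```

Both tables give V ≈ 0.4082, while phi is +0.4082 for the first and −0.4082 for the second. V keeps the strength of a 2 × 2 association but discards its sign.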


Calculation


Let a sample of size $n$ of the simultaneously distributed variables $A$ and $B$, for $i = 1, \ldots, r$ and $j = 1, \ldots, k$, be given by the frequencies

$$n_{ij} = \text{number of times the values } (A_i, B_j) \text{ were observed.}$$
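The frequencies can be tallied from raw paired observations. Below is a minimal Python sketch. It assumes the two samples are given as equal-length lists and that each variable's values can be sorted. `contingency_table` is a hypothetical helper name.

```python
from collections import Counter


def contingency_table(a_values, b_values):
    """Cross-tabulate paired observations of A and B.

    Returns the sorted levels of A (rows), the sorted levels of B
    (columns) and a table whose cell [i][j] counts how often the pair
    (A_i, B_j) occurred. Illustrative helper; assumes the values of
    each variable are sortable.
    """
    if len(a_values) != len(b_values):
        raise ValueError("A and B must have the same number of observations")
    a_levels = sorted(set(a_values))
    b_levels = sorted(set(b_values))
    pair_counts = Counter(zip(a_values, b_values))  # missing pairs count as 0
    table = [[pair_counts[(a, b)] for b in b_levels] for a in a_levels]
    return a_levels, b_levels, table


a = ["x", "y", "x", "x", "y"]
b = ["p", "q", "q", "p", "q"]
rows, cols, table = contingency_table(a, b)
print(rows, cols, table)  # ['x', 'y'] ['p', 'q'] [[2, 1], [0, 2]]
```

Only values that actually occur become rows or columns. Every row and column total is therefore positive, which the chi-squared statistic requires.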

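The chi-squared statistic then is:

$$\chi^2 = \sum_{i,j} \frac{\left(n_{ij} - \frac{n_{i.}\, n_{.j}}{n}\right)^2}{\frac{n_{i.}\, n_{.j}}{n}},$$

where $n_{i.} = \sum_j n_{ij}$ is the number of times the value $A_i$ is observed and $n_{.j} = \sum_i n_{ij}$ is the number of times the value $B_j$ is observed.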
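The statistic compares each observed frequency with the count expected under independence: the row total times the column total divided by n. Below is a direct Python transcription. It assumes a list of rows of counts with positive totals. `chi_squared` is a hypothetical name.

```python
def chi_squared(table):
    """Pearson's chi-squared statistic for a table of counts n_ij.

    row_totals[i] plays the role of n_i. and col_totals[j] of n_.j;
    all totals are assumed positive (illustrative helper).
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # n_i. * n_.j / n
            total += (n_ij - expected) ** 2 / expected
    return total


print(chi_squared([[20, 5], [10, 15]]))  # about 8.333 (exactly 25/3)
```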

Cramér's V is computed by taking the square root of the chi-squared statistic divided by the sample size and by the minimum dimension minus 1:

$$V = \sqrt{\frac{\varphi^2}{\min(k - 1,\, r - 1)}} = \sqrt{\frac{\chi^2 / n}{\min(k - 1,\, r - 1)}},$$

where:

  • $\varphi$ is the phi coefficient;
  • $\chi^2$ is derived from Pearson's chi-squared test;
  • $n$ is the grand total of observations;
  • $k$ is the number of columns;
  • $r$ is the number of rows.
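The steps can be combined into one computation. V divides φ² = χ²/n by the smaller of (number of columns − 1) and (number of rows − 1), then takes the square root. Below is a minimal Python sketch. It assumes a plain list of rows with positive totals and at least two rows and two columns. `cramers_v` is a hypothetical helper.

```python
import math


def cramers_v(table):
    """Cramér's V = sqrt((chi2 / n) / min(k - 1, r - 1)) for a list of rows.

    Illustrative helper; assumes at least two rows and two columns and
    positive row and column totals.
    """
    r, k = len(table), len(table[0])          # rows, columns
    n = sum(sum(row) for row in table)        # grand total
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = sum(
        (n_ij - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, n_ij in enumerate(row)
    )
    phi2 = chi2 / n                            # squared phi coefficient
    return math.sqrt(phi2 / min(k - 1, r - 1))


table = [[10, 20, 30],
         [20, 20, 10]]
print(cramers_v(table))  # sqrt(41/360), about 0.3375
```

For this 2 × 3 table, φ² works out to exactly 41/360. Since min(k − 1, r − 1) = 1 here, V = √(41/360) ≈ 0.3375.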

The p-value for the significance of V is the same one that is calculated using Pearson's chi-squared test.[citation needed]
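For a fixed sample size and table shape, V is an increasing function of χ². Testing V > 0 is therefore equivalent to Pearson's test of independence, which has (rows − 1) × (columns − 1) degrees of freedom.

The standard-library Python sketch below computes that p-value. The helpers `chi2_sf` and `chi2_test_pvalue` are hypothetical names. The incomplete-gamma power series is one common way to evaluate the chi-squared upper tail; it is not the method of any particular package. In practice a statistics library would normally be used instead.

```python
import math


def chi2_sf(x, df, tol=1e-15, max_terms=100_000):
    """Upper-tail probability P(X >= x) of a chi-squared distribution.

    Uses the power series for the regularized lower incomplete gamma
    function P(df / 2, x / 2). Illustrative sketch: the final subtraction
    loses relative precision when the p-value is extremely small.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    series = term
    for m in range(1, max_terms):
        term *= z / (a + m)
        series += term
        if term < tol * series:
            break
    lower = series * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def chi2_test_pvalue(table):
    """p-value of Pearson's test of independence for a list of rows of
    counts; it is also the p-value for Cramér's V of that table.
    Assumes all row and column totals are positive."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2_sf(chi2, df)


print(chi2_test_pvalue([[20, 5], [10, 15]]))
```

Two special cases give closed forms that can serve as checks. With 2 degrees of freedom, the tail probability is exp(−x/2). With 1 degree of freedom, it is erfc(√(x/2)).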

The formula for the variance of Vc is known.[3]

In R, the function cramerV() from the package rcompanion[4] calculates V using the chisq.test function from the stats package. In contrast to the function cramersV() from the lsr[5] package, cramerV() also offers an option to correct for bias. It applies the correction described in the following section.


Bias correction


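Cramér's V can be a heavily biased estimator of its population counterpart and will tend to overestimate the strength of association. A bias correction, using the above notation, is given by[6]

$$\tilde{V} = \sqrt{\frac{\tilde{\varphi}^2}{\min(\tilde{k} - 1,\, \tilde{r} - 1)}},$$

where

$$\tilde{\varphi}^2 = \max\left(0,\; \varphi^2 - \frac{(k - 1)(r - 1)}{n - 1}\right)$$

and

$$\tilde{k} = k - \frac{(k - 1)^2}{n - 1}, \qquad \tilde{r} = r - \frac{(r - 1)^2}{n - 1}.$$

Then $\tilde{V}$ estimates the same population quantity as Cramér's V but with typically much smaller mean squared error. The rationale for the correction is that under independence, $E[\varphi^2] = \frac{(k - 1)(r - 1)}{n - 1}$.[7]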
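The bias correction can be sketched in Python as follows. `cramers_v_with_correction` is a hypothetical helper that returns both the plain and the corrected value. It assumes positive row and column totals and a sample size larger than both the number of rows and the number of columns, so the shrunken category counts stay above 1.

```python
import math


def cramers_v_with_correction(table):
    """Return (V, bias-corrected V) for a list of rows of counts.

    Illustrative helper; assumes positive row and column totals and a
    sample size n larger than both the number of rows r and columns k.
    """
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    phi2 = chi2 / n
    v = math.sqrt(phi2 / min(k - 1, r - 1))

    # Subtract the value phi^2 is expected to take under independence,
    # clipping at zero, and shrink the effective numbers of categories.
    phi2_tilde = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_tilde = k - (k - 1) ** 2 / (n - 1)
    r_tilde = r - (r - 1) ** 2 / (n - 1)
    v_tilde = math.sqrt(phi2_tilde / min(k_tilde - 1, r_tilde - 1))
    return v, v_tilde


for t in ([[3, 2], [2, 3]], [[10, 20, 30], [20, 20, 10]]):
    v, v_tilde = cramers_v_with_correction(t)
    print(f"V = {v:.4f}, corrected V = {v_tilde:.4f}")
```

The small, nearly balanced 2 × 2 table has V = 0.2 but a corrected value of 0. Its φ² (0.04) is below 1/9, the value that independence alone would produce on average for n = 10. For the larger 2 × 3 table, the corrected value is smaller than V but stays positive.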


See also

Other measures of correlation for nominal data:

Other related articles:

References
