In survey research, the design effect is a number that shows how well a sample of people may represent a larger group of people for a specific measure of interest (such as the mean). This is important when the sample comes from a sampling method that is different from simply picking people using a simple random sample.
The design effect is a positive real number, represented by the symbol Deff. If Deff = 1, then the sample was selected in a way that is just as good as if people were picked randomly. When Deff > 1, then inference from the data collected is not as accurate as it could have been if people were picked randomly.
When researchers use complicated methods to pick their sample, they use the design effect to check and adjust their results. It may also be used when planning a study in order to determine the sample size.
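For example (an invented illustration, not from the source): if a simple random sample of 400 people would be precise enough for the researcher's purpose, and the planned design is expected to have a design effect of about 1.5, then roughly 1.5 × 400 = 600 people would need to be sampled under that design to reach the same precision.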
In survey methodology, the design effect (generally denoted as Deff, deff, or D_eff) is a measure of the expected impact of a sampling design on the variance of an estimator for some parameter of a population. It is calculated as the ratio of the variance of an estimator based on a sample from an (often) complex sampling design, to the variance of an alternative estimator based on a simple random sample (SRS) of the same number of elements.[1]: 258 The Deff (be it estimated, or known a priori) can be used to evaluate the variance of an estimator in cases where the sample is not drawn using simple random sampling. It may also be useful in sample size calculations[2] and for quantifying the representativeness of samples collected with various sampling designs.
The design effect is a positive real number that indicates an inflation (Deff > 1) or deflation (Deff < 1) in the variance of an estimator for some parameter, that is due to the study not using SRS (with Deff = 1 when the variances are identical).[3]: 53, 54 Intuitively, we can get Deff < 1 when we have some a-priori knowledge we can exploit during the sampling process (which is somewhat rare). In contrast, we often get Deff > 1 when we need to compensate for some limitation in our ability to collect data (which is more common). Some sampling designs that could introduce a Deff generally greater than 1 include: cluster sampling (such as when there is correlation between observations), stratified sampling (with disproportionate allocation to the strata sizes), cluster randomized controlled trials, disproportional (unequal probability) samples (e.g. Poisson sampling), statistical adjustments of the data for non-coverage or non-response, and many others. Stratified sampling can yield a Deff smaller than 1 when using proportionate allocation to strata sizes (when these are known a priori, and correlated to the outcome of interest) or optimum allocation (when the variance differs between strata and is known a priori).[citation needed]
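As a rough illustration of the first case above, the variance inflation from one-stage cluster sampling with roughly equal cluster sizes is commonly approximated by 1 + (m̄ − 1)ρ, where m̄ is the average cluster size and ρ the intraclass correlation (this is Kish's cluster-sampling approximation discussed below). A minimal sketch, with invented numbers:

```python
# Minimal sketch: the usual approximation for the design effect of
# one-stage cluster sampling with (roughly) equal cluster sizes.
# Numbers below are invented for illustration.

def deff_cluster(avg_cluster_size: float, icc: float) -> float:
    """Deff ~= 1 + (m_bar - 1) * rho  (Kish's cluster-sampling approximation)."""
    return 1.0 + (avg_cluster_size - 1.0) * icc

# Even a small intraclass correlation inflates the variance noticeably
# when clusters are large:
print(deff_cluster(avg_cluster_size=20, icc=0.05))  # 1.95 -> variance nearly doubles
print(deff_cluster(avg_cluster_size=20, icc=0.0))   # 1.0  -> no penalty when observations within clusters are uncorrelated
```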
Many calculations (and estimators) have been proposed in the literature for how a known sampling design influences the variance of estimators of interest, either increasing or decreasing it. Generally, the design effect varies among different statistics of interest, such as the total or the (ratio) mean. It also matters whether the sampling design is correlated with the outcome of interest. For example, a possible sampling design might be such that each element in the sample has a different probability of being selected. In such cases, the level of correlation between the probability of selection for an element and its measured outcome can have a direct influence on the subsequent design effect. Lastly, the design effect can be influenced by the distribution of the outcome itself. All of these factors should be considered when estimating and using design effect in practice.[4]: 13
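The influence of such a correlation can be illustrated with a small Monte-Carlo sketch (all names, distributions and numbers below are assumptions for illustration, not from the source): the same set of unequal selection probabilities produces a very different design effect for the weighted mean depending on whether the outcome is related to those probabilities.

```python
# Monte-Carlo sketch (illustrative assumptions): design effect of the
# weighted (Hajek) mean under Poisson sampling, with an outcome that is
# either unrelated or related to the selection probabilities.
import numpy as np

rng = np.random.default_rng(0)
N, n, reps = 20_000, 500, 2_000                  # population size, expected sample size, replications

x = rng.lognormal(mean=0.0, sigma=0.7, size=N)   # size measure driving selection
p = np.clip(n * x / x.sum(), 1e-9, 1.0)          # Poisson-sampling inclusion probabilities

def deff_weighted_mean(y: np.ndarray) -> float:
    """Simulated Var(Hajek mean under the design) / Var(mean under SRS of size n)."""
    estimates = []
    for _ in range(reps):
        sampled = rng.random(N) < p                  # draw one Poisson sample
        w = 1.0 / p[sampled]                         # inverse-probability weights
        estimates.append(np.sum(w * y[sampled]) / np.sum(w))
    var_design = np.var(estimates, ddof=1)
    var_srs = np.var(y, ddof=1) / n * (1 - n / N)    # variance of the SRS mean, same n
    return var_design / var_srs

y_unrelated = rng.normal(loc=10, scale=2, size=N)                      # outcome unrelated to p
y_related = 10 + 3 * (x - x.mean()) / x.std() + rng.normal(0, 0.5, N)  # outcome strongly related to p

print(round(deff_weighted_mean(y_unrelated), 2))  # roughly 1 + relvariance of the weights (> 1)
print(round(deff_weighted_mean(y_related), 2))    # much closer to 1 in this particular setup
```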
The term "design effect" was coined by Leslie Kish in his 1965 book "Survey Sampling."[1]: 88, 258 In it, Kish proposed the general definition for the design effect,[lower-alpha 1] as well as formulas for the design effect of cluster sampling (with intraclass correlation);[1]: 162 and the famous design effect formula for unequal probability sampling.[1]: 427 These are often known as "Kish's design effect", and were later combined into a single formula.
In a 1995 paper,[5]: 73 Kish mentions that a similar concept, termed "Lexis ratio", was described at the end of the 19th century. The closely related intraclass correlation was described by Fisher in 1950, while computations of ratios of variances were already published by Kish and others from the late 1940s to the 1950s. One of the precursors to Kish's definition was work done by Cornfield in 1951.[6][4]
In his 1995 paper, Kish proposed that considering the design effect is necessary when averaging the same measured quantity from multiple surveys conducted over a period of time.[5]: 57–62 He also suggested that the design effect should be considered when extrapolating from the error of simple statistics (e.g. the mean) to more complex ones (e.g. regression coefficients). However, when analyzing data (e.g., using survey data to fit models), Deff values are less useful nowadays due to the availability of specialized software for analyzing survey data. Prior to the development of software that computes standard errors for many types of designs and estimates, analysts would adjust standard errors produced by software that assumed all records in a dataset were i.i.d. by multiplying them by Deft (see the Deft definition below).[citation needed]
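For instance (illustrative numbers, not from the source): if software that treats the records as i.i.d. reports a standard error of 0.020 for an estimated mean, and the design effect is taken to be Deff = 2.25, then Deft = √2.25 = 1.5 and the design-adjusted standard error is approximately 0.020 × 1.5 = 0.030.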
Symbol | Description |
---|---|
Var_design(θ̂) | Variance of an estimator under a given sampling design |
Var_SRSWOR(θ̂) | Variance of an estimator under simple random sampling without replacement (SRSWOR) |
Var_SRSWR(θ̂) | Variance of an estimator under simple random sampling with replacement (SRSWR) |
Deff, D_eff | Design effect, a measure of the impact of a sampling design on the variance of an estimator compared to simple random sampling without replacement (SRSWOR), Deff = Var_design(θ̂) / Var_SRSWOR(θ̂) |
Deft, D_eft | Design effect factor, the square root of the ratio of variances under a given sampling design and SRS with replacement (SRSWR), Deft = √(Var_design(θ̂) / Var_SRSWR(θ̂)) |
n | Sample size |
N | Population size |
n_eff | Effective sample size, the sample size under SRS needed to achieve the same variance as the given sampling design, n_eff = n / Deff |
w_i | Weight for the i-th unit |
n_h | Sample size for stratum h |
N_h | Population size for stratum h |
W_h | Weight for stratum h |
H | Total number of strata |
n̄, m̄ | Average cluster size |
K | Total number of clusters |
n_k | Sample size for cluster k |
ρ | Intraclass correlation coefficient (ICC) for cluster sampling |
CV², L, relvar(w) | Measures of variation in weights using the coefficient of variation (CV) squared (relvariance) |
ρ̂ | Estimated correlation between the outcome variable and the selection probabilities |
α̂ | Estimated intercept in the linear regression of the outcome variable on the selection probabilities |
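Tying a few of the symbols above together (a minimal sketch with invented weights, not from the source): using Kish's weighting design effect, the effective sample size n_eff = n / Deff can be computed directly from the weights as (Σw_i)² / Σw_i².

```python
# Minimal sketch (invented weights): effective sample size implied by
# unequal weights, n_eff = n / Deff = (sum w_i)^2 / sum(w_i^2).
import numpy as np

w = np.array([0.5, 0.8, 1.0, 1.0, 1.2, 1.5, 2.0, 3.0])   # illustrative survey weights
n = len(w)

deff = n * np.sum(w**2) / np.sum(w)**2     # Kish's weighting design effect
n_eff = np.sum(w)**2 / np.sum(w**2)        # equivalently n / deff

# The n weighted respondents carry roughly the information of n_eff SRS respondents.
print(round(deff, 2), round(n_eff, 1))
```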