Double descent

Double descent in statistics and machine learning is the phenomenon where a model with a small number of parameters and a model with an extremely large number of parameters both have a small test error, but a model whose number of parameters is about the same as the number of data points used to train it has a much greater test error than one with a much larger number of parameters.^[2] This phenomenon has been considered surprising, as it contradicts assumptions about overfitting in classical machine learning.^[3]

History

Early observations of what would later be called double descent in specific models date back to 1989.^[4]^[5]

The term "double descent" was coined by Belkin et al.^[6] in 2019,^[3] when the phenomenon gained popularity as a broader concept exhibited by many models.^[7]^[8] This development was prompted by a perceived contradiction between the conventional wisdom that too many parameters in a model result in significant overfitting error (an extrapolation of the bias–variance tradeoff),^[9] and empirical observations in the 2010s that some modern machine learning techniques tend to perform better with larger models.^[6]^[10]

Theoretical models

Double descent occurs in linear regression with isotropic Gaussian covariates and isotropic Gaussian noise.^[11]
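The setting above can be simulated in a few lines. The following is a minimal sketch rather than the construction used in the cited work: it assumes a hypothetical ground-truth weight vector spread evenly over 200 features, fits minimum-norm least squares on only the first p features, and uses illustrative values for the sample size, noise level, and number of trials. Under these assumptions, the excess test risk spikes when p is close to the number of training points and falls again as p grows past it.

```python
import numpy as np


def excess_risk(n_train, n_params, n_features=200, noise_std=0.5,
                n_trials=50, seed=0):
    """Average excess test risk of least squares fitted on the first
    n_params features (all sizes and the noise level are illustrative)."""
    rng = np.random.default_rng(seed)
    # Hypothetical ground truth: unit-norm weights spread over all features.
    beta = np.ones(n_features) / np.sqrt(n_features)
    risks = []
    for _ in range(n_trials):
        # Isotropic Gaussian covariates and isotropic Gaussian noise.
        X = rng.standard_normal((n_train, n_features))
        y = X @ beta + noise_std * rng.standard_normal(n_train)
        beta_hat = np.zeros(n_features)
        # The pseudoinverse gives ordinary least squares when
        # n_params <= n_train and the minimum-norm interpolating
        # solution when n_params > n_train.
        beta_hat[:n_params] = np.linalg.pinv(X[:, :n_params]) @ y
        # For test inputs x ~ N(0, I), the expected squared prediction
        # gap E[(x @ beta_hat - x @ beta)**2] equals ||beta_hat - beta||^2.
        risks.append(np.sum((beta_hat - beta) ** 2))
    return float(np.mean(risks))


n_train = 40
risk_by_params = {p: excess_risk(n_train, p) for p in (5, 20, 40, 80, 200)}
for p, risk in risk_by_params.items():
    print(f"p={p:4d}  excess risk={risk:.3f}")
```

Because the same seed is reused for every p, each model size is fitted on the same simulated datasets, so the differences between the printed values come from the parameter count alone.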

A model of double descent at the thermodynamic limit has been analyzed using the replica trick, and the result has been confirmed numerically.^[12]

Several works^[13]^[14] have suggested that double descent can be explained using the concept of effective dimension: while a network may have a large number of parameters, in practice only a subset of those parameters is relevant for generalization performance, as measured by the local Hessian curvature. This explanation is formalized through PAC-Bayes compression-based generalization bounds,^[15] which show that less complex models are expected to generalize better under a Solomonoff prior.
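As a rough illustration of the idea, the sketch below computes one common definition of effective dimension, the sum of λ_i / (λ_i + α) over the Hessian eigenvalues λ_i. This formula and the regularization constant α are assumptions made here for illustration; the text above does not state which definition the cited works use. The example uses the Hessian of a linear least-squares loss, which has at most as many nonzero eigenvalues as there are training points, so the effective dimension stays below the sample size however many parameters the model has.

```python
import numpy as np


def effective_dimension(hessian, alpha=1.0):
    """Sum of lam / (lam + alpha) over the Hessian eigenvalues.

    alpha is a hypothetical regularization scale: directions whose
    curvature is much smaller than alpha contribute almost nothing.
    """
    eigvals = np.linalg.eigvalsh(hessian)
    # Clip tiny negative values caused by floating-point round-off.
    eigvals = np.clip(eigvals, 0.0, None)
    return float(np.sum(eigvals / (eigvals + alpha)))


rng = np.random.default_rng(0)
n_train, n_params = 40, 200  # illustrative overparameterized sizes
X = rng.standard_normal((n_train, n_params))
# Hessian of the loss 0.5 * mean((X @ w - y) ** 2) with respect to w.
hessian = X.T @ X / n_train
n_eff = effective_dimension(hessian)
print(f"parameters: {n_params}, effective dimension: {n_eff:.1f}")
```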
