Statistical model validation

From Wikipedia, the free encyclopedia

In statistics, model validation is the task of evaluating whether a chosen statistical model is appropriate or not. Oftentimes in statistical inference, inferences from models that appear to fit their data may be flukes, resulting in a misunderstanding by researchers of the actual relevance of their model. To combat this, model validation is used to test whether a statistical model can hold up to permutations in the data. This topic is not to be confused with the closely related task of model selection, the process of discriminating between multiple candidate models: model validation does not concern so much the conceptual design of models as it tests only the consistency between a chosen model and its stated outputs.

There are many ways to validate a model. Residual plots plot the difference between the actual data and the model's predictions: correlations in the residual plots may indicate a flaw in the model. Cross validation is a method of model validation that iteratively refits the model, each time leaving out just a small sample and comparing whether the samples left out are predicted by the model: there are many kinds of cross validation. Predictive simulation is used to compare simulated data to actual data. External validation involves fitting the model to new data. Akaike information criterion estimates the quality of a model.