Coefficient of determination
Indicator for how well data points fit a line or curve / From Wikipedia, the free encyclopedia
In statistics, the coefficient of determination, denoted R^{2} or r^{2} and pronounced "R squared", is the proportion of the variation in the dependent variable that is predictable from the independent variable(s).

It is a statistic used in the context of statistical models whose main purpose is either the prediction of future outcomes or the testing of hypotheses, on the basis of other related information. It provides a measure of how well observed outcomes are replicated by the model, based on the proportion of total variation of outcomes explained by the model.^{[1]}^{[2]}^{[3]}
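As a minimal sketch of "proportion of total variation explained", one widely used definition computes R^{2} as 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the sample values below are illustrative assumptions, not taken from the article.

```python
def r_squared(observed, predicted):
    """Return 1 - SS_res / SS_tot for paired observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: variation left unexplained by the predictions.
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    # Total sum of squares: variation of the outcomes around their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical data, chosen only for illustration.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r_squared(observed, predicted)
print(round(score, 4))
```

Here the predictions leave 0.10 of the total variation of 5.0 unexplained, so the score is about 0.98.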
There are several definitions of R^{2} that are only sometimes equivalent. One case in which they coincide is simple linear regression, where r^{2} is conventionally written instead of R^{2}. When the model contains an intercept and a single regressor, r^{2} is simply the square of the sample correlation coefficient (i.e., r) between the observed outcomes and the observed predictor values.^{[4]} If additional regressors are included, R^{2} is the square of the coefficient of multiple correlation. In both such cases, the coefficient of determination normally ranges from 0 to 1.
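The equivalence for simple linear regression can be checked numerically. The sketch below fits an ordinary least squares line with an intercept, computes R^{2} as 1 - SS_res / SS_tot (an assumed definition), and compares it with the squared sample correlation. The helper names and the data are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Ordinary least squares slope and intercept for a single regressor."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(x, y):
    """Sample correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
fitted = [intercept + slope * a for a in x]

my = mean(y)
ss_res = sum((b - f) ** 2 for b, f in zip(y, fitted))
ss_tot = sum((b - my) ** 2 for b in y)

r2_from_fit = 1.0 - ss_res / ss_tot
r2_from_corr = pearson_r(x, y) ** 2
print(r2_from_fit, r2_from_corr)
```

The two printed values agree up to floating-point rounding, and both lie between 0 and 1.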
There are cases where R^{2} can yield negative values. This can arise when the predictions that are being compared to the corresponding outcomes have not been derived from a model-fitting procedure using those data. Even if a model-fitting procedure has been used, R^{2} may still be negative, for example when linear regression is conducted without including an intercept,^{[5]} or when a nonlinear function is used to fit the data.^{[6]} In cases where negative values arise, the mean of the data provides a better fit to the outcomes than do the fitted function values, according to this particular criterion.
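A small numerical illustration of the no-intercept case, assuming the 1 - SS_res / SS_tot definition of R^{2}: the hypothetical data below have a large offset, so a least squares line forced through the origin fits worse than simply predicting the mean.

```python
def r_squared(observed, predicted):
    """Return 1 - SS_res / SS_tot for paired observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical data with a large offset from the origin.
x = [1.0, 2.0, 3.0]
y = [10.0, 11.0, 12.0]

# Least squares slope for a line through the origin (no intercept term).
slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
fitted_origin = [slope * a for a in x]

# Baseline: predict the mean of the outcomes for every observation.
y_mean = sum(y) / len(y)
fitted_mean = [y_mean] * len(y)

r2_origin = r_squared(y, fitted_origin)
r2_mean = r_squared(y, fitted_mean)
print(r2_origin, r2_mean)
```

The through-origin fit yields a strongly negative value, while the mean baseline scores exactly 0, which matches the interpretation that the mean is the better fit under this criterion.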
The coefficient of determination can be more (intuitively) informative than MAE, MAPE, MSE, and RMSE in regression analysis evaluation, as the former can be expressed as a percentage, whereas the latter measures have arbitrary ranges. In the cited study, it also proved more robust than SMAPE for poor fits on the test datasets.^{[7]}
When evaluating the goodness-of-fit of simulated (Y_{pred}) vs. measured (Y_{obs}) values, it is not appropriate to base this on the R^{2} of the linear regression (i.e., Y_{obs} = m·Y_{pred} + b).^{[citation needed]} The R^{2} quantifies the degree of any linear correlation between Y_{obs} and Y_{pred}, while for the goodness-of-fit evaluation only one specific linear correlation should be taken into consideration: Y_{obs} = 1·Y_{pred} + 0 (i.e., the 1:1 line).^{[8]}^{[9]}
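The difference between the two evaluations can be sketched with hypothetical simulated values that are systematically biased (here Y_{obs} = 2·Y_{pred} + 1). The R^{2} of a regression of Y_{obs} on Y_{pred} is perfect, while the R^{2} measured against the 1:1 line, assuming the 1 - SS_res / SS_tot definition, is negative. All names and numbers are illustrative.

```python
def r_squared(observed, predicted):
    """Return 1 - SS_res / SS_tot for paired observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least squares slope and intercept for a single regressor."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical simulated and measured values: y_obs = 2 * y_pred + 1.
y_pred = [1.0, 2.0, 3.0, 4.0]
y_obs = [3.0, 5.0, 7.0, 9.0]

# R^2 of the regression y_obs = m * y_pred + b: rewards any linear relation.
m, b = fit_line(y_pred, y_obs)
r2_regression = r_squared(y_obs, [m * p + b for p in y_pred])

# R^2 against the 1:1 line: the simulated values are used as they are.
r2_one_to_one = r_squared(y_obs, y_pred)

print(r2_regression, r2_one_to_one)
```

The regression absorbs the bias into m and b and reports 1.0, whereas the comparison against the 1:1 line gives -1.7, showing that the simulated values themselves reproduce the measurements poorly.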