This article is rated B-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects:
Hi all, I recently made a small change to the AICc formula to add the simplified version of the AICc (i.e. not in terms of the AIC equation above, but the direct formula) so that readers could see both the AICc's relation to the AIC (as a "correction") and as the formula recommended by Burnham and Anderson for general use [edit number 618051554].
An unnamed user reverted the edit, citing "invalid simplification." Does anyone (the reverting editor from ip 2.25.180.99 included) have any reason to not want the edit I made to stand? It adds the second line to the equation below,
The main point I have in favor of the change is that Burnham and Anderson point out in multiple locations that the AIC should be thought of as the asymptotic version of the AICc, rather than thinking of the AICc as a correction to the AIC. This second equation shows easily (by comparison with the AIC formula) how that asymptotic relationship holds, and it shows how to compute the AICc itself directly.
The point against it, as far as I can see, is just that there is now a second line of math (which I can imagine some people being opposed to...)
Any opinions? If no, I'll change my edit back in a few days/weeks. Dgianotti (talk) 18:03, 31 July 2014 (UTC)
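For what it's worth, the two forms really are algebraically identical; here is a quick Python check (toy log-likelihood value and my own helper names, not from Burnham and Anderson):

```python
# Check that the "correction" form of AICc equals the direct form.
# aicc_correction and aicc_direct are my own helper names.

def aic(log_lik, k):
    """AIC = -2 ln(L) + 2k."""
    return -2.0 * log_lik + 2.0 * k

def aicc_correction(log_lik, k, n):
    """AICc written as AIC plus a small-sample correction term."""
    return aic(log_lik, k) + 2.0 * k * (k + 1) / (n - k - 1)

def aicc_direct(log_lik, k, n):
    """AICc written directly: -2 ln(L) + 2kn / (n - k - 1)."""
    return -2.0 * log_lik + 2.0 * k * n / (n - k - 1)

# The two agree for any log-likelihood, k, and n > k + 1:
for k in range(1, 6):
    for n in range(k + 2, 50):
        assert abs(aicc_correction(-12.3, k, n) - aicc_direct(-12.3, k, n)) < 1e-9

# As n grows, the gap between AICc and AIC shrinks toward zero,
# which is the asymptotic relationship Burnham and Anderson emphasize.
print(round(aicc_direct(-12.3, 3, 10) - aic(-12.3, 3), 6))     # sizable for small n
print(round(aicc_direct(-12.3, 3, 10000) - aic(-12.3, 3), 6))  # near zero for large n
```

The loop just verifies the algebra 2k + 2k(k+1)/(n-k-1) = 2kn/(n-k-1) numerically.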
When analyzing data, a general question of both absolute and relative goodness-of-fit of a given model arises. In the general case, we are fitting a model of K parameters to the observed data x_1, x_2 ... x_N. In the case of fitting AR, MA, ARMA, or ARIMA models, the question we are concerned with is what K is, i.e. how many parameters to include in the model.
The parameters are routinely estimated by minimizing the residual sum of squares, or by maximizing log likelihood of the data. For normal distributions, the least sum of squares method and the log likelihood method yield identical results.
These techniques, however, cannot be used to estimate the optimal K. For that we use information criteria, which also justify the use of the log likelihood above.
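To illustrate the claim that least squares and log likelihood agree for normal errors, here is a small numerical sketch (toy data and my own variable names): for fixed σ the Gaussian log-likelihood equals a constant minus RSS/(2σ²), so the minimizer of one is the maximizer of the other.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5, x.size)  # toy data, true slope 3

def rss(b1):
    """Residual sum of squares for slope b1 (intercept fixed at 2 for simplicity)."""
    r = y - (2.0 + b1 * x)
    return float(r @ r)

def gauss_loglik(b1, sigma=0.5):
    """Gaussian log-likelihood; note it is a constant minus RSS/(2*sigma^2)."""
    n = x.size
    return -0.5 * n * np.log(2.0 * np.pi * sigma**2) - rss(b1) / (2.0 * sigma**2)

# Scan a grid of candidate slopes: least squares and ML pick the same one.
slopes = np.linspace(2.0, 4.0, 401)
ls_best = slopes[np.argmin([rss(b) for b in slopes])]
ml_best = slopes[np.argmax([gauss_loglik(b) for b in slopes])]
assert ls_best == ml_best
```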
More TBA... Note: The entropy link should be changed to Information entropy
I've written a short article on DIC, please look it over and edit. Bill Jefferys 22:55, 7 December 2005 (UTC)
Is AIC *derived* from anything, or is it just a hack? BIC is at least derivable from some postulate. Why would you ever use AIC over BIC or, better, cross-validation?
There is a link on the page () which shows a proof that AIC can be derived from the same postulate as BIC, and vice versa. Cross-validation is good but computationally expensive compared to AIC/BIC - a problem for large-scale optimisations. The actual discussion over BIC/AIC as a weapon of choice seems to be long, immensely technical/theoretical, and not a little boring. 128.240.229.7 12:37, 28 February 2007 (UTC)
Does the definition of AIC make sense with respect to dimension? That is...why would the log of the likelihood function have the same dimension as the number of parameters, so that subtracting them would make sense? Cazort 20:00, 14 November 2007 (UTC)
AIC is not a "hack" but it is not a general method either. It relates to Shannon entropy, which is self-information, and as such it can only compare various uses of single data objects. Physically, entropy has units of energy divided by temperature. Temperature relates to the relative information content of two different data objects, and that relative information content is the basis for comparison between data objects. The reference in this Wikipedia article to this effect is obtuse. That AIC is related to "information theory" is vague enough to qualify as what physicists call "hand-waving." Namely, when one runs out of logical explanation the "waving of hands in the air" takes over. The relationship is only to Shannon entropy, which in turn may have some historical relevance, but is only a small part of information theory. Thus, it is like saying AIC is related to "statistics." It is too vague a statement to be of any use, and it does not explain anything. CarlWesolowski (talk) 21:40, 4 October 2016 (UTC)
Akaike said that AIC stands for "An Information Criterion", not "Akaike Information Criterion". Yoderj 19:39, 16 February 2007 (UTC)
This criterion is alternately called the WAIC: Watanabe-Akaike Information Criterion, or the widely-applicable information criterion . — Preceding unsigned comment added by 167.220.148.12 (talk) 10:48, 29 April 2016 (UTC)
I have sent an e-mail to this Mr. Gee who is cited as a possible reference, with the following text:
However, he has not answered. I will remove the reference. Classical geographer 12:18, 2 April 2007 (UTC)
That measurement ( R^2_{AIC} = 1 - \frac{AIC_0}{AIC_i} ) doesn't make sense to me. R^2 values range from 0 to 1. If the model is better than the null model, its AIC should be smaller, so the numerator is larger than the denominator and R^2_{AIC} comes out negative. This is saying that better models will generate a negative R^2_{AIC}.
It would make sense if the model were: R^2_{AIC}= 1 - \frac{AIC_i}{AIC_0}
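A two-line numerical check (made-up AIC values) supports this: with the formula as written, the better model gets a negative "R²", while the flipped ratio lands in [0, 1).

```python
aic_null = 100.0   # AIC of the null model (made-up value)
aic_model = 60.0   # AIC of a better model, i.e. lower AIC (made-up value)

as_written = 1 - aic_null / aic_model   # negative for the better model
as_proposed = 1 - aic_model / aic_null  # between 0 and 1, as an R^2-like index

assert as_written < 0
assert 0 < as_proposed < 1
```

(Of course, neither ratio behaves sensibly when AIC values are negative, which they routinely are, so the whole construction is fragile either way.)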
Please write the pronunciation using the International phonetic alphabet, as specified in Wikipedia:Manual of style (pronunciation). -Pgan002 05:09, 10 May 2007 (UTC)
Query - should this page not also link to Bayes' Factor?
I'm not an expert in model selection, but in my field (molecular phylogenetics) model selection is an increasingly important problem in methods involving Bayesian inference (e.g. MrBayes, BEAST), and AIC is apparently 'not appropriate' for these models.
Any thoughts anyone? I've also posted this on the model selection page. Thanks.--Comrade jo (talk) 12:19, 19 December 2007 (UTC)
I agree. The opening statement: "Hence, AIC provides a means for model selection." should read "Hence, AIC provides a means for model selection, in certain circumstances." Circumstances in which it is not appropriate abound, see introduction in [1] — Preceding unsigned comment added by CarlWesolowski (talk • contribs) 21:49, 12 November 2016 (UTC)
The RSS in the definition is not a likelihood function! However, it turns out that the log likelihood looks similar to RSS. —Preceding unsigned comment added by 203.185.215.144 (talk) 23:12, 7 January 2008 (UTC)
I have contributed a modified AIC, valid only for models with the same number of data points. It is quite useful though. Velocidex (talk) 09:17, 8 July 2008 (UTC)
EverGreg (talk) 11:22, 8 July 2008 (UTC)
I am unhappy with this section. It says "where C is a constant independent of the model used, and dependent only on the use of particular data points, i.e. it does not change if the data do not change."
But this is only true if the σ_i's are the same for the two models. And under "Equal-variances case" it explicitly says that σ is unknown, and hence is estimated by the models. For instance, if we compare two nested linear models, then the larger model will estimate σ to a smaller value than the smaller model. In this case it is the converse: the "constant" C will differ between models, whereas the term with the exponentials will cancel out (they will both be exp(−1)).
The formula with RSS is correct, but the derivation is wrong for the above reason.
All this needs to be fixed. (Harald Lang, 9/12/2015) — Preceding unsigned comment added by 46.39.98.125 (talk) 11:36, 9 December 2015 (UTC)
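For reference, when σ is profiled out (σ̂² = RSS/n) the maximized Gaussian log-likelihood collapses to a closed form, which is where the AIC = n ln(RSS/n) + 2k + C expression comes from. A numerical sketch with toy data (my own variable names):

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n = 40
x = np.linspace(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.3, n)  # toy data

# Ordinary least-squares fit of a straight line.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
RSS = float(resid @ resid)
sigma2_hat = RSS / n  # ML estimate of the error variance

# Gaussian log-likelihood evaluated at (beta_hat, sigma2_hat):
loglik = -0.5 * n * math.log(2.0 * math.pi * sigma2_hat) - RSS / (2.0 * sigma2_hat)

# Closed form after profiling out sigma: -(n/2) * (ln(2*pi*RSS/n) + 1)
closed = -0.5 * n * (math.log(2.0 * math.pi * RSS / n) + 1.0)
assert abs(loglik - closed) < 1e-9

# Hence AIC = -2*loglik + 2k = n*ln(RSS/n) + 2k + n*(ln(2*pi) + 1); the "constant"
# C = n*(ln(2*pi) + 1) is the same for every model fit to the same data *provided*
# sigma is profiled out the same way in each model -- which is Harald Lang's point.
k = 3  # two regression coefficients plus the error variance
aic = -2.0 * loglik + 2.0 * k
assert abs(aic - (n * math.log(RSS / n) + 2 * k + n * (math.log(2 * math.pi) + 1))) < 1e-9
```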
I think the link that appeared at the bottom "A tool for fitting distributions, times series and copulas using AIC with Excel by Vose Software" is not too relevant and only one of many tools that may incorporate AIC. I am not certain enough to remove it myself. Dirkjot (talk) 16:36, 17 November 2008 (UTC)
The equation given here for determining AIC when error terms are normally distributed does not match the equation given by Burnham and Anderson on page 63 of their 2002 book. Burnham and Anderson's equation is identical except that it does not include a term with pi. Anyone know why this is? Tcadam (talk) 03:13, 17 December 2008 (UTC)Tcadam (talk) 03:14, 17 December 2008 (UTC)
Further confusion: is there a discrepancy between the AIC defined from the χ² and the version defined from the RSS? Don't they differ by an extra constant term? —Preceding unsigned comment added by 152.78.192.25 (talk) 15:27, 13 May 2011 (UTC)
I suspect that the whole derivation concerning chi-square is wrong, since it uses the likelihood function instead of the maximum of the likelihood function in the AIC. — Preceding unsigned comment added by 141.14.232.254 (talk) 19:22, 14 February 2012 (UTC)
References
What on earth is this section? It should be properly explained, with real references, or permanently deleted! I would like to see a book on model selection which describes AIC in detail, but also points out these supposed controversies! True bugman (talk) 11:50, 7 September 2010 (UTC)
I removed the part on Takeuchi information criterion (based on matrix trace), because this seemed to give credit to Claeskens & Hjort. There could be a new section on TIC, if someone wanted to write one; for now, I included a reference to the 1976 paper. Note that Burnham & Anderson (2002) discuss TIC at length, and a section on TIC should cite their discussion. TIC is rarely useful in practice; rather, it is an important intermediate step in the most-general derivation of AIC and AICc. 86.170.206.175 (talk) 16:24, 14 April 2011 (UTC)
I made a few minor edits in the BIC section to try to keep it a *little* more neutral, but it still reads with a very biased tone. I imagine a bunch of AIC proponents had a huge argument with BIC proponents and then decided to write that section as pro-AIC propaganda. You can find just as many papers in the literature that unjustifiably argue that BIC is "better" than AIC, as you can find papers that unjustifiably argue AIC is "better" than BIC. Furthermore, if AIC can be derived from the BIC formalism by just taking a different prior, then one might argue AIC is essentially contained within "generalized BIC", so how can BIC, in general, be "worse" than AIC if AIC can be derived through the BIC framework?
The truth is that neither AIC nor BIC is inherently "better" or "worse" than the other until you define a specific application (and by AIC, I include AICc and minor variants, and by BIC I include variants also to be fair). You can find applications where AIC fails miserably and BIC works wonderfully, and vice versa. To argue that this or that method is better in practice, because of asymptotic results or because of a handful of research papers, is flawed since, for most applications, you never get close to the fantasy world of "asymptopia" where asymptotic results can actually be used for justification, and you can almost always find a handful of research papers that argue method A is better than method B when, in truth, method A is only better than method B for the specific application they were working on. — Preceding unsigned comment added by 173.3.109.197 (talk) 17:44, 15 April 2012 (UTC)
The difference between AIC and BIC is not explored in this biased article. To see some of these differences discussed by rather more knowledgeable people, see, for example, Rob Hyndman, who relates:
AIC is best for prediction as it is asymptotically equivalent to cross-validation. BIC is best for explanation as it allows consistent estimation of the underlying data generating process.
Please follow the link http://stats.stackexchange.com/questions/577/is-there-any-reason-to-prefer-the-aic-or-bic-over-the-other CarlWesolowski (talk) 17:39, 1 October 2016 (UTC)
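To make the AIC/cross-validation connection concrete, here is a small simulation (my own toy setup, not from Hyndman). For linear models the leave-one-out residuals have the closed form e_i / (1 − h_ii), so LOO-CV can be computed without refitting:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
x = np.linspace(0.0, 1.0, n)
y = 1.0 + x - 2.0 * x**2 + rng.normal(0.0, 0.1, n)  # true model is quadratic

def fit_stats(deg):
    """Return (AIC, LOO-CV MSE) for a polynomial fit of the given degree."""
    X = np.vander(x, deg + 1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    RSS = float(resid @ resid)
    k = deg + 2  # polynomial coefficients plus the error variance
    aic = n * np.log(RSS / n) + 2 * k  # Gaussian AIC, up to an additive constant
    h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))  # hat-matrix diagonal
    loo_mse = float(np.mean((resid / (1.0 - h)) ** 2))  # closed-form LOO residuals
    return aic, loo_mse

stats = {deg: fit_stats(deg) for deg in range(1, 6)}
# Both criteria should strongly prefer the quadratic over the underfit line:
assert stats[2][0] < stats[1][0]  # AIC
assert stats[2][1] < stats[1][1]  # LOO-CV
```

Whether AIC and LOO-CV agree on the exact winning degree varies from sample to sample; the equivalence (Stone 1977) is asymptotic, about their behaviour as n grows.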
The article states that AIC is applicable for nested and non-nested models, with a reference to Anderson (2008). However, looking up the source, there's no explicit indication that the AIC should be used for nested models. Instead, the indicated reference just states that the AIC can be valuable for non-nested models. Are there other sources that might be more explicit? — Preceding unsigned comment added by Redsilk09 (talk • contribs) 10:11, 18 July 2012 (UTC)
I agree with the above comment. I've tried using the AIC for nested models as specified by the article, and the results were nonsensical. — Preceding unsigned comment added by 152.160.76.249 (talk) 20:14, 1 August 2012 (UTC)
I agree, and provide a counter example https://stats.stackexchange.com/q/369850/99274 CarlWesolowski (talk) 06:30, 31 March 2020 (UTC)
The BIC section claims Akaike derived BIC independently and credits him as much as anyone else in discovering BIC. However, I have always read in the history books that Akaike was very excited when he first saw a BIC derivation (Schwarz's?), and that seeing it inspired him to develop his own Bayesian version of AIC. I thought it was well documented historically that this was the case, and that he was a very graceful man who didn't think of BIC as a competitor to him, but thought of it as just yet another very useful and interesting result. His only disappointment, many accounts do claim, was that he didn't think of it himself earlier. Isn't that the standard way that all the historical accounts read?
I removed the following sentence: "This form is often convenient, because most model-fitting programs produce χ² as a statistic for the fit." The statistic produced by many model-fitting programs is in fact the RSS (e.g. Origin). But the RSS cannot simply replace χ² in these equations. Either the σ_i have to be known, or the following formula should be used: AIC = n ln(RSS/n) + 2k + C. — Preceding unsigned comment added by 129.67.70.165 (talk) 14:34, 21 February 2013 (UTC)
The example from U. Georgia is no longer found; so I deleted it. It was:
I added the best example I could find with a Google-search: [Akaike example filetype:pdf] Done — Charles Edwin Shipp (talk) 13:31, 11 September 2013 (UTC)
@SolidPhase: Explain yourself. The "relative quality of a model" is ungrammatical - relative to what? This must be "models" plural. As for measuring "quality" - this sounds like higher values of AIC mean greater quality, but the reverse is true, so this should be made clear up front in the lead. Why remove that? Thirdly, WP:HEADINGS states that "Headings should not refer redundantly to the subject of the article". Lastly, it seems (to me) better to talk about how AIC works before discussing its limitations. Tayste (edits) 18:07, 18 June 2015 (UTC)
As an interim measure (only), I have restored the body to my last edit, but kept your lead section. SolidPhase (talk) 19:47, 18 June 2015 (UTC)
It has now been over four days.
Regarding the sentence "Lower values of AIC indicate higher quality and therefore better models", as above I think that including this is clutter, which will be especially distracting for people who only read the lead. Additionally, there are many activities where the minimum is the optimum, e.g. golf. Moreover, in the field of Optimization, the canonical examples are minimization. I definitely believe that the sentence should be removed; so I have now done that.
Regarding the grammatical changes that you made, I do not agree. Back in March, though, you found a grammatical error: and you were correct, of course. Hence I get the impression that you have a really good grammatical knowledge. I do not understand what you find grammatically wrong about the previous version, though, or why your version is correct. Simply put, I am confused about this(!). Your edits to the grammar remain as you made them, but I would really appreciate it if we could discuss this issue further. Will you explain the reasons for your grammatical change more?
SolidPhase (talk) 19:19, 22 June 2015 (UTC)
Are you quite sure that AIC is ranked best-is-lowest? Here are rankings from a Mathematica case of the 5 best models for a problem that uses BIC for ranking:
BIC | AIC | HQIC |
---|---|---|
3.841 | 3.857 | 3.845 |
3.815 | 3.825 | 3.818 |
3.735 | 3.746 | 3.738 |
3.732 | 3.742 | 3.735 |
3.458 | 3.468 | 3.461 |
Note that they go from highest as best to lowest as worst. I checked on this and it seems that some programs output -AIC, not AIC. However, the word is "index." In addition to being accurate, it is in common usage, and people with no higher mathematical training understand it. Please change this or post an objection to the change; otherwise I will change it. If you then change it back without discussion, which is typical, we will have a dispute, as I will keep changing it back until there is a dispute settlement. CarlWesolowski (talk) 14:21, 11 July 2016 (UTC) CarlWesolowski (talk) 18:37, 14 July 2016 (UTC)
No surprise that you object to the word "index." However, not only is AIC an index, but as it is based on Shannon entropy, it is a data specific index namely Self-information. So, most indices would smoke it, and they tend to be at least somewhat comparable between data sets. I am just asking for a more objective presentation. Moreover, I am convinced that you cannot take this article to the next level. I have found some of the advocates for AIC use to be unusually partisan, which given AIC's very limited applicability due to the restrictive assumptions not being met as a common occurrence, is somewhat difficult for me to reconcile with objectivity. CarlWesolowski (talk) 22:36, 12 September 2016 (UTC)
The comment(s) below were originally left at Talk:Akaike information criterion/Comments, and are posted here for posterity. Following several discussions in past years, these subpages are now deprecated. The comments may be irrelevant or outdated; if so, please feel free to remove this section.
Hello,
I am not a statistician and therefore can only remark on things I was unable to understand. My concern is about "k": in the paragraph "Definition" it is said that k is the number of parameters in the statistical model; in the paragraph "AICc and AICu" it is said that k denotes the number of model parameters + 1. If these two k's are different, then why give them the same name? If they are the same, there is a problem of definition somewhere?
Last edited at 13:27, 7 October 2009 (UTC). Substituted at 19:44, 1 May 2016 (UTC)
Some of the edits are ungrammatical, e.g. "AIC use as one means of model selection". Some of the edits introduce technical invalidity, e.g. "each candidate model has residuals that are normal distributions". I have undone the edits.
CarlWesolowski has been sporadically making edits to this article since at least 20 March 2015. Each time, those edits have been undone. My suggestion is this: if CarlWesolowski wants to make changes to the article, then he should discuss those changes on this Talk page, and get a consensus of editors to agree to the changes.
SolidPhase (talk) 06:18, 8 July 2016 (UTC)
That my edits can be and have been reversed is not surprising. That SolidPhase takes exception to my person is inexcusable, calling me "ignorant" on my talk page. This article is misleading. The premise "Given a collection of models for the data, AIC estimates the quality of each model, relative to each of the other models. Hence, AIC provides a means for model selection." is not logical. For example, BIC will yield better quality of fit than AIC. BIC is more appropriate for model selection than AIC. When a model is already selected, AIC will provide better estimates of that model's parameter values than BIC.
AIC is only one of many criteria for model selection, and often suggested for use when it is inappropriate. "The AIC is not a consistent model selection method"-point 10 of Facts and fallacies of the AIC by Rob J Hyndman [1]. The introduction reads like a commercial for cigarettes. Mathematica uses BIC, as one of several tests whose combined score ranks models, and, there are lots of good folks who may compute AIC but not use it to rank. In addition to AIC, other methods used include BIC, step-wise partial probability ANOVA, HQIC, log likelihood, complexity error, factor analysis, and goodness of fit testing with Pearson Chi-squared, Cramer Von Mises probabilities and others. And, without looking at those other measurements, any pronouncements made with respect to model selection using AIC should be ignored. It is said that AIC does not assume that there is a true model, but BIC does. BIC is also more self-consistent. Neither of these maximum likelihood approaches is appropriate for model selection when the objective is extrapolation, not interpolation, as the goodness-of-extrapolation makes goodness-of-fit irrelevant.
In the section that says in rather poor quality English "Sometimes, each candidate model assumes that the residuals are distributed according to independent identical normal distributions (with zero mean). That gives rise to least squares model fitting." Let us take this one statement at a time.
Candidate models do not "assume." People assume. The requirement for normally distributed residuals is unnecessary, that happens approximately 10% of the time. Normally distributed residuals are not a requirement for AIC any more than they are for maximum likelihood. Again quoting Hyndman-point 3-"The AIC does not assume the residuals are Gaussian. It is just that the Gaussian likelihood is most frequently used. But if you want to use some other distribution, go ahead. The AIC is the penalized likelihood, whichever likelihood you choose to use." Again with inanimate objects making assumptions, tisk, tisk.
"That gives rise to least squares model fitting." Well, no it doesn't. Other assumptions for OLS can include homoscedasticity, and fixed intervals on the x-axis. Otherwise, OLS fit parameters are biased and only approximate. Summarizing, I really think that one should consider pulling back from the claims herein and injecting some perspective into this sloppy article. You will not let me fix this article, so fix it yourselves. — (talk • contribs) 03:08, 9 July 2016 (UTC) CarlWesolowski (talk) 07:51, 29 January 2017 (UTC)
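On Hyndman's point 3 quoted above: nothing stops you from plugging a non-Gaussian likelihood into AIC. A toy Poisson example (made-up counts and my own helper names):

```python
import math

counts = [2, 3, 1, 4, 2, 9, 8, 11, 10, 9]  # made-up count data with two apparent regimes

def pois_loglik(rates, xs):
    """Poisson log-likelihood with a (possibly different) rate per observation."""
    return sum(x * math.log(r) - r - math.lgamma(x + 1) for r, x in zip(rates, xs))

# Model A: a single rate for all observations (k = 1)
rate_a = sum(counts) / len(counts)
aic_a = -2.0 * pois_loglik([rate_a] * len(counts), counts) + 2 * 1

# Model B: separate rates for the first and second halves (k = 2)
half = len(counts) // 2
rate_1 = sum(counts[:half]) / half
rate_2 = sum(counts[half:]) / half
aic_b = -2.0 * pois_loglik([rate_1] * half + [rate_2] * half, counts) + 2 * 2

# The two-rate model wins despite its extra parameter: AIC is just the
# penalized likelihood, whichever likelihood you choose to use.
assert aic_b < aic_a
```

No residual normality is invoked anywhere; the likelihood here is Poisson throughout.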
A more general treatment of the fit problem that may be worth mentioning is QML, Quasi-Maximum Likelihood, based upon [2]. It is currently totally unclear what the statistical use of AIC is in the article, so fix it. The current article is dangerous: it promotes AIC without sufficient insight into appropriate usage. CarlWesolowski (talk) 17:09, 10 July 2016 (UTC) CarlWesolowski (talk) 07:51, 29 January 2017 (UTC)
Thank you for responding. However, the "a" is too soft. In the matter of implication, hinting at something is not as good as saying it. This article has that problem throughout, and it is less useful in that form than it would be if it were more clearly written. For example, let us take the infamous sentence "Sometimes, each candidate model assumes that the residuals are distributed according to independent identical normal distributions (with zero mean). That gives rise to least squares model fitting." It took me a very long time to figure out what you are trying to say and, BTW, you do not actually say it. Consider "that gives rise to least squares...": it is unclear that it does, and most people who have studied least squares would still not know what you are getting on about. Consider saying something relevant rather than making the reader study the phrase to make any sense out of it; namely, note [3] that "There are several different frameworks in which the linear regression model can be cast in order to make the OLS technique applicable. Each of these settings produces the same formulas and same results. The only difference is the interpretation and the assumptions which have to be imposed in order for the method to give meaningful results. The choice of the applicable framework depends mostly on the nature of data in hand, and on the inference task which has to be performed." You do not say what you are assuming, and that is not a problem for the editors, but it is a big problem for the readers.
When you use the "colloquialism", as you call it, you not only depreciate the language but mask the fact that you have imposed an assumption, which does not help the reader understand what you are saying. The phrase "the residuals are distributed according to independent identical normal distributions (with zero mean)" is so inaccurate that it is nearly unintelligible. I think perhaps you are obliquely referring to ML with normally distributed residuals, or some such. Take a look at [4]. It is much more clearly written than this Wikipedia entry. It is not misleading, it is not oversold, and it gives a much better indication of where AIC is in the universe of methods. Try to emulate that level of clarity, please. What happens when the residuals are not ND? Surely you realize that that is most of the time. AIC, BIC, and maximum likelihood can be, and should be, defined in that broader context, in which case there is no direct relationship to OLS, such that the relationship to OLS for normal residuals is an aside that does more to confuse than to clarify. CarlWesolowski (talk) 23:50, 10 July 2016 (UTC) CarlWesolowski (talk) 00:54, 11 July 2016 (UTC) CarlWesolowski (talk) 18:51, 14 July 2016 (UTC) CarlWesolowski (talk) 21:23, 6 February 2017 (UTC)
@SolidPhase The sentence "Sometimes, each candidate model assumes that the residuals are distributed according to independent identical normal distributions (with zero mean). That gives rise to least squares model fitting." is incorrect, because 1) models do not make assumptions, people do, and when you write as if they do you confuse not only the reader but also yourself. To wit, 2) only when the residuals are actually normally distributed are OLS and AIC, as both are applied to those residuals, the same. However, 3) AIC does not assume normal residuals, because A) AIC can be applied to non-normal residual structures, and B) the assumption of normal residuals is clearly yours, as it is not a requirement for AIC. CarlWesolowski (talk) 22:52, 10 November 2016 (UTC)
This section is created to discuss the recent edits by anonymous IPs. SolidPhase (talk) 10:28, 31 October 2016 (UTC)
In the article, it is written "If the model under consideration is a linear regression, k is the number of regressors, including the intercept". This is wrong, isn't it? What about the error variance? Shouldn't the error variance also count as a parameter? — Preceding unsigned comment added by 193.174.15.2 (talk) 09:30, 3 January 2017 (UTC)
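One practical note on the counting question: for Gaussian models, including the error variance in k adds a constant 2 to every model's plain AIC, so rankings within one convention are unaffected; the danger is comparing AIC values produced by software that counts differently. A sketch with made-up log-likelihoods:

```python
# Made-up maximized log-likelihoods and regressor counts for three candidate models.
loglik = {"small": -52.1, "medium": -48.7, "large": -48.2}
p = {"small": 2, "medium": 3, "large": 5}  # regressors, including the intercept

# Convention 1: k = number of regressors only.
aic_without_sigma = {m: -2 * ll + 2 * p[m] for m, ll in loglik.items()}
# Convention 2: k = regressors + 1, counting the error variance.
aic_with_sigma = {m: -2 * ll + 2 * (p[m] + 1) for m, ll in loglik.items()}

# Every AIC shifts by exactly 2, so the model ranking is identical:
assert all(abs(aic_with_sigma[m] - aic_without_sigma[m] - 2) < 1e-12 for m in loglik)
assert (sorted(aic_without_sigma, key=aic_without_sigma.get)
        == sorted(aic_with_sigma, key=aic_with_sigma.get))
```

AICc is a different story: its correction term is nonlinear in k, so the two counting conventions can in principle rank models differently there.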
Hello fellow Wikipedians,
I have just modified one external link on Akaike information criterion. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
Added a {{dead link}} tag to http://www.mun.ca/biology/quant/BurnhamBES11.pdf

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.
This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}}
(last update: 5 June 2024).
Cheers.—InternetArchiveBot (Report bug) 00:45, 29 June 2017 (UTC)
@BetterMath:, could you please explain why the sentence “Asymptotically, AIC selects the model that minimizes the mean squared error of (out-of-sample) prediction,” is wrong and/or misleading? By definition, an asymptotically efficient (information) criterion chooses the (candidate) model that minimises the mean squared error of prediction, and by Stone (1977) AIC is asymptotically efficient. --bender235 (talk) 23:56, 16 January 2018 (UTC)
This section is created pursuant to WP:BRD, to discuss recent proposed changes to the lead. (Consider also WP:Lead.)
Having the lead use the word "score" in this context seems wrong, because score has a technical meaning in statistics that does not apply in this context. Having the lead mention that a lower AIC is better seems inappropriate to me, because it is a technical detail—a detail, moreover, that is well discussed in the Definition section; it seems much more appropriate to have the lead tell what AIC does, rather than tell how AIC does things. Having the lead claim "Thus, AIC provides a means for model selection that deals with the trade-off between the goodness of fit of the model and the simplicity of the model" seems wrong, because the "Thus" is not logically supported by the context. Having the first paragraph of the lead worded as proposed does not even suggest that AIC can only evaluate relative quality, which is surely inappropriate and confusing.
For the above reasons, I have reverted to the prior version. SolidPhase (talk) 16:19, 29 June 2018 (UTC)
Let me know what you think!
Sietse (talk)'s request for a third opinion on the following (see discussion above): of the two proposed lede paragraphs below, which do you recommend we use (with or without changes)? (SolidPhase, I think that is also what the dispute is about from your point of view. But if you'd also like the third opinion to address a different question, feel free to add that here.)
3O Response: I feel that I understand the topic better after reading both the current and proposed leads. I agree that:
The 3O asks which I recommend, and my choice would be the current version. But that's the sort of revert/no-revert choice that the requester doesn't like. I feel that there is room to build. The lead is short and some information could be added. I feel the 'lower is better' concept could be added as well as summarizing other parts of the article like advantages/disadvantages of using AIC and its history (mentioning Hirotugu Akaike and the year he published). This information should probably not go in the MOS:LEADPARAGRAPH. I suspect that given a little more room outside that first paragraph, it'd be easier to state it without that awkwardness.
For what it's worth, that's my non-binding opinion. I'll try to keep an eye open for any follow-up. – Reidgreg (talk) 21:36, 17 July 2018 (UTC)
A proposed revision is below. The first paragraph is the same as before; the second paragraph is expanded; the third paragraph is the same as before; the fourth paragraph is new (and it mentions Hirotugu Akaike, as recommended by User:Reidgreg).
Perhaps the third paragraph should be moved to the Definition section.
@Glrx:
SolidPhase (talk) 09:45, 19 July 2018 (UTC)
The prior section was for discussion of the wording of the lead. This section is for further discussion. It was created pursuant to a comment from User:Glrx that "the [second] paragraph is difficult and should be rewritten". As per the prior section, recommendations for rewording the lead are solicited. (Note that, in the prior section, I twice wikilinked Glrx.)
Additionally, Glrx has now changed a word in the lead: "representation" was changed to "model". The context is this clause: "When a statistical model is used to represent the process that generated the data, the representation will almost never be exact" versus "When a statistical model is used to represent the process that generated the data, the model will almost never be exact". My preference is for "representation".
SolidPhase (talk) 23:08, 2 December 2018 (UTC)
There is a long equation in the section "Replicating Student's t-test". The equation is so long that it is split over two lines. Should the equals sign go on the first line (at the end) or on the second line (at the beginning)?
I did not find a relevant policy or guideline on this. I prefer that the equals sign be on the first line, for two reasons. First, because it makes the first line clearer: when someone reads the first line, they know immediately that the next line is the other side of an equation, and that eases reading of the whole equation. Second, because it lessens the overall display width.
BetterMath (talk) 10:43, 27 July 2019 (UTC)
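As a hypothetical LaTeX sketch (a generic placeholder, not the article's actual equation), the two conventions being discussed look like this:

```latex
% Convention A: equals sign ends the first line,
% signalling that the equation continues below.
\begin{align*}
  \text{(long left-hand side)} ={} & \\
  & \text{(long right-hand side)}
\end{align*}

% Convention B: equals sign begins the second line.
\begin{align*}
  & \text{(long left-hand side)} \\
  ={} & \text{(long right-hand side)}
\end{align*}
```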
@BetterMath: I'm not going to revert again because I'm tired of your one-sentence explanations for removing my contribution. I gave two reliable sources that state verbatim what I added to the article; here are three more sources confirming that this definition applies to time series models as well (which are regressions too, anyway, but let's leave that discussion aside). So after I've done my due diligence, how about you do your part: (i) please show me a reliable source that contradicts any of the five that I've found, and, more importantly, (ii) explain to me what "relative quality of statistical models" is supposed to mean: what "quality" are we talking about? Best looking? Fastest converging? --bender235 (talk) 14:07, 18 September 2019 (UTC)
@Bender235: I thought that I should come back to this. I’ve used statistical deviance very little in my own studies, and didn’t previously notice an error in your reference.
The reference McElreath (2015) does say that AIC is an “estimate of the average out-of-sample deviance”. McElreath, however, is wrong. Indeed, we could easily reduce the deviance to 0, by increasing the number of parameters—but that would lead to overfitting. This issue is discussed in the Definition section.
The error by McElreath points to a more general issue. In most science-related fields, it is reasonable to cite undergraduate-level references with the assumption that the reference is valid. In some fields, however, that assumption is not reasonable. Thermodynamics is one such field. Statistics is another. Many statistics references contain serious errors. So if the references are cited blindly, those errors will corrupt Wikipedia.
That is what almost happened here. You made an edit in good faith (I assume), and cited a reference that easily meets WP:RS. If your edit had not been undone, then the article would’ve contained an error. The way to avoid such errors is to ensure that the editor (in this case, you) has a deep comprehension of the topic and the reference.
Something similar happened before. You edited this article in January 2018, including a citation of Stone (1977). That edit too was based on your comprehension of the reference not being deep enough (as I explained on this Talk page at the time).
Just now, I was looking at the article Parametric model. There was a problem in the definition given there. I fixed the problem, and looked at the article history. The prior edit to the article was made by you. I checked and discovered that it was your edit that introduced the problem.
The above are three illustrations of a general issue. It is important to make textual edits to a statistical article only if the editor has a deep comprehension of the topic.
BetterMath (talk) 22:36, 5 October 2019 (UTC)
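The overfitting point above (that in-sample deviance can always be driven down by adding parameters, so AIC cannot simply be "the deviance") can be illustrated numerically. The following is a hypothetical sketch, not taken from McElreath's book: it fits nested polynomial models by least squares and shows that the in-sample deviance never increases with model size, which is exactly why AIC adds the 2k penalty.

```python
import numpy as np

# Hypothetical illustration: data generated by a simple linear model,
# then fit with polynomials of increasing degree.
rng = np.random.default_rng(0)
n = 50
x = np.linspace(0.0, 1.0, n)
y = 2.0 * x + rng.normal(scale=0.3, size=n)

def deviance_and_aic(deg):
    """Fit a degree-`deg` polynomial; return in-sample deviance and AIC."""
    coeffs = np.polyfit(x, y, deg)
    resid = y - np.polyval(coeffs, x)
    sigma2 = np.mean(resid ** 2)  # Gaussian MLE of the error variance
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    k = deg + 2  # polynomial coefficients plus the variance parameter
    return -2.0 * loglik, 2.0 * k - 2.0 * loglik

results = [deviance_and_aic(d) for d in range(1, 7)]
deviances = [dev for dev, _ in results]
aics = [aic for _, aic in results]

# In-sample deviance is (weakly) monotone decreasing in model size,
# so minimizing deviance alone would always favour the biggest model;
# AIC's 2k term is what pushes back against that.
assert all(a >= b - 1e-9 for a, b in zip(deviances, deviances[1:]))
```

Under these assumptions, the deviance column decreases monotonically while AIC typically bottoms out near the true (linear) model size.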
Maybe this discussion might additionally consider WP:Lead? TheSeven (talk) 03:57, 12 October 2019 (UTC)
I do not know how to deal with someone who exhibits your level of reasoning capabilities. Your choice is to select a method of Dispute Resolution or have me report you. BetterMath (talk) 18:47, 19 October 2019 (UTC)
Thank you for providing these sources. I quickly read the context of the quotes you provided, and I think they would be pertinent additions to the definition and lede. They also answer my initial questions of what AIC is and how it differs from other information criteria. Here is what caught my eye. From McElreath:
From Taddy:
In Taddy there is also a paragraph about the intuition behind the "corrected AIC", which I think is interesting and would be a pertinent addition to the AICc section.
BetterMath, could you please provide reliable sources and concise quotes for your point too? Thank you in advance --Signimu (talk) 17:37, 22 October 2019 (UTC)
There is a misleading statement in the section on "Foundations of Statistics," where one editor claims that the Akaike Information Criterion (AIC) can form a foundation of statistics that is distinct from both frequentism and Bayesianism. Upon reviewing the cited reference (Burnham & Anderson, 2002 [p. 99]), it appears that the statement is not adequately supported. The relevant statement in the reference mentions that information criteria can be computed and interpreted without subjective judgment or the use of significance levels or Bayesian priors, but it does not suggest that information criteria constitute a separate statistical paradigm.
Additionally, after examining the book "Philosophy of Statistics," it is evident that no claim is made regarding AIC or information criteria forming a distinct paradigm within statistics. In the chapter specifically discussing AIC, BIC, and model selection, the authors treat AIC as a model selection rule that aids in statistical inference.
Based on these observations, it seems that the editor who wrote that particular part of the section may have aggregated and overextended the statements in the references. I recommend that the section be removed entirely, but gathering input from other editors would be valuable before making a final decision. Jourdy345 (talk) 15:04, 10 May 2023 (UTC)