Additive smoothing
Statistical technique for smoothing categorical data / From Wikipedia, the free encyclopedia
In statistics, additive smoothing, also called Laplace smoothing[1] or Lidstone smoothing, is a technique used to smooth categorical data. Given a set of observation counts $x = (x_1, \ldots, x_d)$ from a $d$-dimensional multinomial distribution with $N$ trials, a "smoothed" version of the counts gives the estimator

$$\hat\theta_i = \frac{x_i + \alpha}{N + \alpha d} \qquad (i = 1, \ldots, d),$$
where the smoothed count is $\hat{x}_i = N \hat\theta_i$, and the "pseudocount" α > 0 is a smoothing parameter; α = 0 corresponds to no smoothing. (This parameter is explained in § Pseudocount below.) Additive smoothing is a type of shrinkage estimator, as the resulting estimate will lie between the empirical probability (relative frequency) $x_i / N$ and the uniform probability $1/d$. Invoking Laplace's rule of succession, some authors have argued that α should be 1 (in which case the term add-one smoothing[2][3] is also used), though in practice a smaller value is typically chosen.
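The estimator above can be sketched in a few lines of Python; the function name and example counts are illustrative, not from the article:

```python
# Sketch of additive (Laplace/Lidstone) smoothing for multinomial counts.

def additive_smoothing(counts, alpha=1.0):
    """Return smoothed probability estimates (x_i + alpha) / (N + alpha * d)."""
    n = sum(counts)   # N: total number of trials
    d = len(counts)   # d: number of categories
    return [(x + alpha) / (n + alpha * d) for x in counts]

counts = [3, 0, 1]    # note the zero count for the second category
print(additive_smoothing(counts, alpha=1.0))
# With alpha = 1 (add-one smoothing): (3+1)/7, (0+1)/7, (1+1)/7
```

Note how the zero-count category receives a nonzero probability (1/7 here), while with alpha = 0 the function returns the raw relative frequencies, including an estimated probability of exactly zero.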
From a Bayesian point of view, this corresponds to the expected value of the posterior distribution, using a symmetric Dirichlet distribution with parameter α as a prior distribution. In the special case where the number of categories is 2, this is equivalent to using a beta distribution as the conjugate prior for the parameters of the binomial distribution.
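The Bayesian correspondence can be made explicit with the standard Dirichlet–multinomial conjugacy result: a symmetric Dirichlet(α) prior combined with the multinomial likelihood yields a Dirichlet posterior whose mean is exactly the smoothed estimator.

```latex
% Symmetric Dirichlet prior and multinomial likelihood
p(\theta) \propto \prod_{i=1}^{d} \theta_i^{\alpha - 1},
\qquad
p(x \mid \theta) \propto \prod_{i=1}^{d} \theta_i^{x_i}
% Posterior is Dirichlet(x_1 + \alpha, \ldots, x_d + \alpha)
p(\theta \mid x) \propto \prod_{i=1}^{d} \theta_i^{x_i + \alpha - 1}
\quad\Longrightarrow\quad
\mathbb{E}[\theta_i \mid x] = \frac{x_i + \alpha}{N + \alpha d}
```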