# Additive smoothing

## Statistical technique for smoothing categorical data / From Wikipedia, the free encyclopedia


In statistics, **additive smoothing**, also called **Laplace smoothing**[1] or **Lidstone smoothing**, is a technique used to smooth categorical data. Given a set of observation counts $\mathbf{x} = \langle x_{1},\,x_{2},\,\ldots,\,x_{d}\rangle$ from a $d$-dimensional multinomial distribution with $N$ trials, a "smoothed" version of the counts gives the estimator:

$${\hat{\theta}}_{i} = \frac{x_{i}+\alpha}{N+\alpha d} \qquad (i=1,\ldots,d),$$

where the smoothed count ${\hat{x}}_{i} = N{\hat{\theta}}_{i}$ and the "pseudocount" *α* > 0 is a smoothing parameter, with *α* = 0 corresponding to no smoothing. (This parameter is explained in § Pseudocount below.) Additive smoothing is a type of shrinkage estimator, as the resulting estimate lies between the empirical probability (relative frequency) $x_{i}/N$ and the uniform probability $1/d$. Invoking Laplace's rule of succession, some authors have argued^{[citation needed]} that *α* should be 1 (in which case the term **add-one smoothing**[2][3] is also used)^{[further explanation needed]}, though in practice a smaller value is typically chosen.
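The estimator above is straightforward to compute. The following sketch (function and variable names are ours, not from the article) applies the formula to a small count vector; note how the smoothed estimates are pulled from the raw relative frequencies toward the uniform distribution $1/d$:

```python
def additive_smoothing(counts, alpha=1.0):
    """Smoothed estimates theta_i = (x_i + alpha) / (N + alpha * d)."""
    N = sum(counts)   # total number of trials
    d = len(counts)   # number of categories
    return [(x + alpha) / (N + alpha * d) for x in counts]

counts = [3, 0, 1]  # N = 4 trials over d = 3 categories

# Empirical (unsmoothed) probabilities: [0.75, 0.0, 0.25]
# With alpha = 1 (add-one smoothing): [4/7, 1/7, 2/7]
print(additive_smoothing(counts, alpha=1.0))
```

The zero count receives a nonzero probability (1/7), which is the usual motivation for smoothing in applications such as naive Bayes text classification.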

From a Bayesian point of view, this corresponds to the expected value of the posterior distribution, using a symmetric Dirichlet distribution with parameter *α* as a prior distribution. In the special case where the number of categories is 2, this is equivalent to using a beta distribution as the conjugate prior for the parameters of the binomial distribution.
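The two-category equivalence can be checked numerically. In this sketch (an illustration we supply, not code from the article), a symmetric Beta(*α*, *α*) prior is updated with *x* successes in *N* trials; its posterior mean coincides with the additive-smoothing estimate for *d* = 2:

```python
def beta_posterior_mean(successes, trials, alpha=1.0):
    # Posterior is Beta(alpha + successes, alpha + trials - successes),
    # whose mean is (successes + alpha) / (trials + 2 * alpha).
    return (successes + alpha) / (trials + 2 * alpha)

def additive_smoothing(counts, alpha=1.0):
    N, d = sum(counts), len(counts)
    return [(x + alpha) / (N + alpha * d) for x in counts]

x, N = 3, 4  # 3 successes in 4 trials
# Both expressions give (3 + 1) / (4 + 2) = 2/3.
print(beta_posterior_mean(x, N, alpha=1.0))
print(additive_smoothing([x, N - x], alpha=1.0)[0])
```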
