Zipf–Mandelbrot law

Parameters: ${\displaystyle N\in \{1,2,3,\ldots \}}$ (integer), ${\displaystyle q\in [0;\infty )}$ (real), ${\displaystyle s>0\,}$ (real)
Support: ${\displaystyle k\in \{1,2,\ldots ,N\}}$
PMF: ${\displaystyle {\frac {1/(k+q)^{s}}{H_{N,q,s}}}}$
CDF: ${\displaystyle {\frac {H_{k,q,s}}{H_{N,q,s}}}}$
Mean: ${\displaystyle {\frac {H_{N,q,s-1}}{H_{N,q,s}}}-q}$
Mode: ${\displaystyle 1\,}$
Entropy: ${\displaystyle {\frac {s}{H_{N,q,s}}}\sum _{k=1}^{N}{\frac {\ln(k+q)}{(k+q)^{s}}}+\ln(H_{N,q,s})}$

In probability theory and statistics, the Zipf–Mandelbrot law is a discrete probability distribution. Also known as the Pareto–Zipf law, it is a power-law distribution on ranked data, named after the linguist George Kingsley Zipf, who suggested a simpler distribution called Zipf's law, and the mathematician Benoit Mandelbrot, who subsequently generalized it.

The probability mass function is given by:

${\displaystyle f(k;N,q,s)={\frac {1/(k+q)^{s}}{H_{N,q,s}}}}$

where ${\displaystyle H_{N,q,s}}$ is given by:

${\displaystyle H_{N,q,s}=\sum _{i=1}^{N}{\frac {1}{(i+q)^{s}}}}$

which may be thought of as a generalization of a harmonic number. In the formula, ${\displaystyle k}$ is the rank of the data, and ${\displaystyle q}$ and ${\displaystyle s}$ are parameters of the distribution. In the limit as ${\displaystyle N}$ approaches infinity, which requires ${\displaystyle s>1}$ for the sum to converge, the normalizing sum becomes the Hurwitz zeta function ${\displaystyle \zeta (s,q+1)}$, using the convention ${\displaystyle \zeta (s,a)=\sum _{n=0}^{\infty }{\frac {1}{(n+a)^{s}}}}$. For finite ${\displaystyle N}$ and ${\displaystyle q=0}$ the Zipf–Mandelbrot law becomes Zipf's law. For infinite ${\displaystyle N}$ and ${\displaystyle q=0}$ it becomes a zeta distribution.
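As a minimal sketch, the probability mass function and its normalizing sum can be evaluated directly from the definitions above. The function names `generalized_harmonic` and `zipf_mandelbrot_pmf` and the parameter values are illustrative choices, not part of any particular library.

```python
from math import isclose


def generalized_harmonic(N: int, q: float, s: float) -> float:
    """Return H_{N,q,s} = sum_{i=1}^{N} 1 / (i + q)^s."""
    return sum(1.0 / (i + q) ** s for i in range(1, N + 1))


def zipf_mandelbrot_pmf(N: int, q: float, s: float) -> list[float]:
    """Return [f(1), ..., f(N)], where f(k) = (1 / (k + q)^s) / H_{N,q,s}."""
    h = generalized_harmonic(N, q, s)
    return [1.0 / ((k + q) ** s * h) for k in range(1, N + 1)]


# Parameter values picked arbitrarily for illustration.
N, q, s = 1000, 2.7, 1.1
probs = zipf_mandelbrot_pmf(N, q, s)

# The probabilities over the support {1, ..., N} sum to 1.
assert isclose(sum(probs), 1.0, rel_tol=1e-9)

# With q = 0 the law reduces to Zipf's law, so f(1) / f(2) = 2^s.
zipf = zipf_mandelbrot_pmf(N, 0.0, s)
assert isclose(zipf[0] / zipf[1], 2.0 ** s, rel_tol=1e-9)
```

Computing the normalizing sum once and reusing it for every rank keeps the cost linear in ${\displaystyle N}$.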

Applications

The distribution of words ranked by their frequency in a random text corpus is approximated by a power-law distribution, known as Zipf's law.
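A rank-frequency table of the kind described here can be built with a few lines of Python. This is only a sketch: the toy sentence is invented for illustration, and splitting on whitespace stands in for real tokenization.

```python
from collections import Counter

# Toy corpus, invented for illustration only.
text = "the cat saw the dog and the dog saw the bird"

counts = Counter(text.split())
total = sum(counts.values())

# most_common() sorts by descending count, so enumerate() yields the rank.
ranked = [
    (rank, word, count, count / total)
    for rank, (word, count) in enumerate(counts.most_common(), start=1)
]

for rank, word, count, freq in ranked:
    print(f"{rank:>2}  {word:<5} {count:>2}  {freq:.3f}")
```

A corpus this small cannot show a power law; the point is only the mapping from raw counts to ranks and relative frequencies.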

If one plots the frequency rank of words in a moderately sized text corpus against their number of occurrences, one obtains a power-law distribution with exponent close to one (but see Powers, 1998 and Gelbukh & Sidorov, 2001). Zipf's law implicitly assumes a fixed vocabulary size: the harmonic series with s=1 does not converge, whereas the Zipf–Mandelbrot generalization with s>1 does. Furthermore, there is evidence that the closed class of functional words that define a language obeys a Zipf–Mandelbrot distribution with different parameters from the open classes of contentive words that vary by topic, field and register.[1]
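The convergence point can be checked numerically. The sketch below compares partial sums of the normalizing constant for s=1 and s=2 with q=0; these values are chosen only because their limiting behaviour is well known (growth like ln N, and the limit π²/6, respectively). The helper name `generalized_harmonic` is an illustrative choice.

```python
from math import log, pi


def generalized_harmonic(N: int, q: float, s: float) -> float:
    """Return H_{N,q,s} = sum_{i=1}^{N} 1 / (i + q)^s."""
    return sum(1.0 / (i + q) ** s for i in range(1, N + 1))


for N in (10**2, 10**4, 10**6):
    h_s1 = generalized_harmonic(N, 0.0, 1.0)  # keeps growing, roughly like ln(N)
    h_s2 = generalized_harmonic(N, 0.0, 2.0)  # levels off near pi^2 / 6
    print(f"N={N:>7}  s=1: {h_s1:8.4f} (ln N = {log(N):7.4f})  s=2: {h_s2:.6f}")

print(f"pi^2 / 6 = {pi**2 / 6:.6f}")
```

With s=1 the normalizing constant depends on where the vocabulary is cut off, which is why a fixed vocabulary size is needed; with s>1 the sum settles to a finite limit as N grows.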

In ecological field studies, the relative abundance distribution (i.e. the graph of the number of species observed as a function of their abundance) is often found to conform to a Zipf–Mandelbrot law.[2]

In music, many metrics used to measure "pleasing" music conform to Zipf–Mandelbrot distributions.[3]

Notes

1. ^ Powers, David M W (1998). "Applications and explanations of Zipf's law". Association for Computational Linguistics: 151–160.
2. ^ Mouillot, D; Lepretre, A (2000). "Introduction of relative abundance distribution (RAD) indices, estimated from the rank-frequency diagrams (RFD), to assess changes in community diversity". Environmental Monitoring and Assessment. Springer. 63 (2): 279–295. doi:10.1023/A:1006297211561. Retrieved 24 Dec 2008.
3. ^ Manaris, B; Vaughan, D; Wagner, CS; Romero, J; Davis, RB. "Evolutionary Music and the Zipf-Mandelbrot Law: Developing Fitness Functions for Pleasant Music". Proceedings of 1st European Workshop on Evolutionary Music and Art (EvoMUSART2003). 611.

References

• Mandelbrot, Benoît (1965). "Information Theory and Psycholinguistics". In B.B. Wolman and E. Nagel (ed.). Scientific psychology. Basic Books. Reprinted as
• Mandelbrot, Benoît (1968) [1965]. "Information Theory and Psycholinguistics". In R.C. Oldfield and J.C. Marshall (ed.). Language. Penguin Books.
• Powers, David M W (1998). "Applications and explanations of Zipf's law". Association for Computational Linguistics: 151–160.
• Zipf, George Kingsley (1932). Selected Studies of the Principle of Relative Frequency in Language. Cambridge, MA: Harvard University Press.
• Van Droogenbroeck, F.J. (2019). "An essential rephrasing of the Zipf-Mandelbrot law to solve authorship attribution applications by Gaussian statistics".