# Entropy (information theory)

## Expected amount of information needed to specify the output of a stochastic data source / From Wikipedia, the free encyclopedia

#### Dear Wikiwand AI, let's keep it short by simply answering these key questions:

Can you list the top facts and stats about Information entropy?

Summarize this article for a 10 year old

SHOW ALL QUESTIONS

In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes. Given a discrete random variable ${\displaystyle X}$, which takes values in the alphabet ${\displaystyle {\mathcal {X}}}$ and is distributed according to ${\displaystyle p\colon {\mathcal {X}}\to [0,1]}$, the entropy is

${\displaystyle \mathrm {H} (X):=-\sum _{x\in {\mathcal {X}}}p(x)\log p(x),}$

where ${\displaystyle \Sigma }$ denotes the sum over the variable's possible values. The choice of base for ${\displaystyle \log }$, the logarithm, varies for different applications. Base 2 gives the unit of bits (or "shannons"), while base e gives "natural units" nat, and base 10 gives units of "dits", "bans", or "hartleys". An equivalent definition of entropy is the expected value of the self-information of a variable.[1]

The concept of information entropy was introduced by Claude Shannon in his 1948 paper "A Mathematical Theory of Communication",[2][3] and is also referred to as Shannon entropy. Shannon's theory defines a data communication system composed of three elements: a source of data, a communication channel, and a receiver. The "fundamental problem of communication" – as expressed by Shannon – is for the receiver to be able to identify what data was generated by the source, based on the signal it receives through the channel.[2][3] Shannon considered various ways to encode, compress, and transmit messages from a data source, and proved in his famous source coding theorem that the entropy represents an absolute mathematical limit on how well data from the source can be losslessly compressed onto a perfectly noiseless channel. Shannon strengthened this result considerably for noisy channels in his noisy-channel coding theorem.

Entropy in information theory is directly analogous to the entropy in statistical thermodynamics. The analogy results when the values of the random variable designate energies of microstates, so Gibbs formula for the entropy is formally identical to Shannon's formula. Entropy has relevance to other areas of mathematics such as combinatorics and machine learning. The definition can be derived from a set of axioms establishing that entropy should be a measure of how informative the average outcome of a variable is. For a continuous random variable, differential entropy is analogous to entropy. The definition ${\displaystyle \mathbb {E} [-\log p(X)]}$ generalizes the above.

Oops something went wrong: