![cover image](https://wikiwandv2-19431.kxcdn.com/_next/image?url=https://upload.wikimedia.org/wikipedia/commons/thumb/8/82/Huffman_tree_2.svg/langsimple-640px-Huffman_tree_2.svg.png&w=640&q=50)
Huffman coding
entropy encoding algorithm used for lossless data compression / From Wikipedia, the free encyclopedia
Huffman coding is a way of encoding data. The method was developed in 1952, by David A. Huffman, at MIT. It was first published as A Method for the Construction of Minimum-Redundancy Codes, in 1952.[1] The method is a prefix code, and a way of lossless data compression. The idea behind it is very simple: There's a dictionary that contains words, and how often they occur in the text. Each word is replaced by a variable-length code. Words that occur more often get shorter codes. The codes must not overlap and be unique.
![](http://upload.wikimedia.org/wikipedia/commons/thumb/8/82/Huffman_tree_2.svg/320px-Huffman_tree_2.svg.png)
Char | Freq | Code |
---|---|---|
space | 7 | 111 |
a | 4 | 010 |
e | 4 | 000 |
f | 3 | 1101 |
h | 2 | 1010 |
i | 2 | 1000 |
m | 2 | 0111 |
n | 2 | 0010 |
s | 2 | 1011 |
t | 2 | 0110 |
l | 1 | 11001 |
o | 1 | 00110 |
p | 1 | 10011 |
r | 1 | 11000 |
u | 1 | 00111 |
x | 1 | 10010 |
The output from Huffman's algorithm is a variable-length code table for encoding a source symbol (such as a character in a file). The algorithm constructs this table based on the estimated probability or frequency of occurrence (weight) for each possible value of the source symbol. As in other entropy encoding methods, more common symbols are generally represented using fewer bits than less common symbols. Huffman's method can be efficiently implemented, finding a code in time linear to the number of input weights if these weights are sorted.[2] Even though Huffman's method is optimal among methods encoding symbols separately, it is not always optimal among all compression methods - it is replaced with arithmetic coding[3] or asymmetric numeral systems[4] if a better compression ratio is required.