Huffman coding

entropy encoding algorithm used for lossless data compression From Wikipedia, the free encyclopedia

Huffman coding
Remove ads
Remove ads

Huffman coding is a way of encoding data. The method was developed in 1952, by David A. Huffman, at MIT. It was first published as A Method for the Construction of Minimum-Redundancy Codes, in 1952.[1] The method is a prefix code, and a way of lossless data compression. The idea behind it is very simple: There's a dictionary that contains words, and how often they occur in the text. Each word is replaced by a variable-length code. Words that occur more often get shorter codes. The codes must not overlap and be unique.

Thumb
Huffman tree generated from the exact frequencies of the text "this is an example of a huffman tree". The frequencies and codes of each character are below. Encoding the sentence with this code requires 135 (or 147) bits, as opposed to 288 (or 180) bits if 36 characters of 8 (or 5) bits were used. (This assumes that the code tree structure is known to the decoder and thus does not need to be counted as part of the transmitted information.)
More information Char, Freq ...

The output from Huffman's algorithm is a variable-length code table for encoding a source symbol (such as a character in a file). The algorithm constructs this table based on the estimated probability or frequency of occurrence (weight) for each possible value of the source symbol. As in other entropy encoding methods, more common symbols are generally represented using fewer bits than less common symbols. Huffman's method can be efficiently implemented, finding a code in time linear to the number of input weights if these weights are sorted.[2] Even though Huffman's method is optimal among methods encoding symbols separately, it is not always optimal among all compression methods - it is replaced with arithmetic coding[3] or asymmetric numeral systems[4] if a better compression ratio is required.

Remove ads

History

In 1951, David A. Huffman and his MIT information theory classmates were given the choice of a term paper or a final exam. The professor, Robert M. Fano, assigned a term paper on the problem of finding the most efficient binary code. Huffman, unable to prove any codes were the most efficient, was about to give up and start studying for the final when he hit upon the idea of using a frequency-sorted binary tree and quickly proved this method the most efficient.[5]

In doing so, Huffman outdid Fano, who had worked with Claude Shannon to develop a similar code. Building the tree from the bottom up guaranteed optimality, unlike the top-down approach of Shannon–Fano coding.

Remove ads

References

Loading content...
Loading related searches...

Wikiwand - on

Seamless Wikipedia browsing. On steroids.

Remove ads