Huffman coding
entropy encoding algorithm used for lossless data compression From Wikipedia, the free encyclopedia
Remove ads
Remove ads
Huffman coding is a way of encoding data. The method was developed in 1952, by David A. Huffman, at MIT. It was first published as A Method for the Construction of Minimum-Redundancy Codes, in 1952.[1] The method is a prefix code, and a way of lossless data compression. The idea behind it is very simple: There's a dictionary that contains words, and how often they occur in the text. Each word is replaced by a variable-length code. Words that occur more often get shorter codes. The codes must not overlap and be unique.

The output from Huffman's algorithm is a variable-length code table for encoding a source symbol (such as a character in a file). The algorithm constructs this table based on the estimated probability or frequency of occurrence (weight) for each possible value of the source symbol. As in other entropy encoding methods, more common symbols are generally represented using fewer bits than less common symbols. Huffman's method can be efficiently implemented, finding a code in time linear to the number of input weights if these weights are sorted.[2] Even though Huffman's method is optimal among methods encoding symbols separately, it is not always optimal among all compression methods - it is replaced with arithmetic coding[3] or asymmetric numeral systems[4] if a better compression ratio is required.
Remove ads
History
In 1951, David A. Huffman and his MIT information theory classmates were given the choice of a term paper or a final exam. The professor, Robert M. Fano, assigned a term paper on the problem of finding the most efficient binary code. Huffman, unable to prove any codes were the most efficient, was about to give up and start studying for the final when he hit upon the idea of using a frequency-sorted binary tree and quickly proved this method the most efficient.[5]
In doing so, Huffman outdid Fano, who had worked with Claude Shannon to develop a similar code. Building the tree from the bottom up guaranteed optimality, unlike the top-down approach of Shannon–Fano coding.
Remove ads
References
Wikiwand - on
Seamless Wikipedia browsing. On steroids.
Remove ads