Huffman Coding

Coding
- is the process of assigning codes to a set of symbols (these may be the
numbers themselves), often with the goal of compression (encryption is
another application). The idea here is that replacing the symbols by their
codes will result in a smaller data set.
Huffman Coding
- This is a coding procedure that uses a tree data structure to assign the
codes.
- It also uses the probability of occurrence of each symbol to determine
whether it receives a long or a short code: the higher the probability that
a symbol will occur, the shorter the code it is assigned.
- Each code consists of a series of 1's and 0's.
1. Initialization: Put all the symbols in an OPEN list and keep it sorted at
all times (e.g., ABCDE). In our case the symbols will not be letters but
numbers representing the grey-level or color values of the pixels in an
image. OPEN = {A B C D E}
2. Create the bottom of a tree structure and assign each element in OPEN to a
node at this level of the tree.
3. Repeat until the OPEN list has only one node left (a Python sketch of
steps 1-3 follows this list):
(a) From OPEN pick the two nodes having the lowest frequencies/probabilities
and create a parent node for them.
(b) Assign the sum of the children's frequencies/probabilities to the parent
node and insert it into OPEN.
(c) Assign codes 0 and 1 to the two branches of the tree, and delete the
children from OPEN.
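A minimal sketch of these steps in Python (illustrative only: the name
huffman_codes and the {symbol: count} input format are assumptions, and a
heap plays the role of the sorted OPEN list):

import heapq
from itertools import count

def huffman_codes(freqs):
    # The heap is the sorted OPEN list; the tiebreaker keeps tuple
    # comparison well-defined when two frequencies are equal.
    tick = count()
    heap = [(f, next(tick), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:                   # step 3: until one node is left
        f1, _, left = heapq.heappop(heap)  # (a) two lowest frequencies
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tick), (left, right)))  # (b)
    codes = {}
    def walk(node, prefix):                # (c) 0/1 on the two branches
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"    # lone-symbol edge case
    _, _, root = heap[0]
    walk(root, "")
    return codes

Running huffman_codes({'A': 15, 'B': 7, 'C': 6, 'D': 6, 'E': 5}) reproduces
the code lengths of the example below (1 bit for A, 3 bits for the rest);
the exact 0/1 patterns depend on how ties are broken.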
The resulting Tree Structure
The resulting Codebook
(assignment of codes to input symbols)
Symbol   Count   log2(1/p)   Code   Subtotal (# of bits)
------   -----   ---------   ----   --------------------
A        15      1.38        0      15
B        7       2.48        100    21
C        6       2.70        101    18
D        6       2.70        110    18
E        5       2.96        111    15
                                    TOTAL (# of bits): 87
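The subtotal column is simply count x code length; a quick check of the
total in Python (counts and codes taken from the table above):

counts = {'A': 15, 'B': 7, 'C': 6, 'D': 6, 'E': 5}
codes  = {'A': '0', 'B': '100', 'C': '101', 'D': '110', 'E': '111'}
total_bits = sum(counts[s] * len(codes[s]) for s in counts)
# 15*1 + 7*3 + 6*3 + 6*3 + 5*3 = 87 bits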
Discussions:
- Decoding (given a code, determine the original symbol) for the above
algorithm is trivial as long as the coding table (the statistics) is sent
before the data. (There is a small overhead for sending this table,
negligible if the data file is big.)
- Unique Prefix Property: no code is a prefix of any other code (all
symbols are at the leaf nodes)
--> great for the decoder, unambiguous (a decoding sketch follows this
section).
- If prior statistics are available and accurate, then Huffman coding is
very good.
In the above example:
entropy = (15 x 1.38 + 7 x 2.48 + 6 x 2.7 + 6 x 2.7 + 5 x 2.96) / 39
= 85.26 / 39 = 2.19
(Entropy is a measure of information, in bits per symbol.)
Average number of bits per symbol needed for Huffman coding: 87 / 39 = 2.23
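The same computation in Python (a sketch; math.log2 is the base-2 logarithm
used in the log2(1/p) column, since log2(1/p) = log2(total/count)):

import math

counts = {'A': 15, 'B': 7, 'C': 6, 'D': 6, 'E': 5}
total = sum(counts.values())                    # 39 symbols in all
entropy = sum(c * math.log2(total / c) for c in counts.values()) / total
# entropy ~= 2.19 bits/symbol, vs. 87/39 ~= 2.23 bits/symbol for Huffman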
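And the decoding promised above: because no code is a prefix of any other,
the decoder can read bits greedily until they match a complete code. A
sketch (assuming the codebook has already been received; inverting the
table is one simple approach, walking the tree bit by bit is another):

def decode(bits, codes):
    inverse = {code: sym for sym, code in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:            # a full code: unambiguous thanks to
            out.append(inverse[buf])  # the unique prefix property
            buf = ""
    return "".join(out)

# decode("0100101", codes) -> "ABC"  (0 = A, 100 = B, 101 = C)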
Adaptive Huffman Coding

Motivations:
(a) The previous algorithms require statistical knowledge, which is often
not available (e.g., live audio, video).
(b) Even when it is available, it can be a heavy overhead, especially when
many tables have to be sent, i.e., when a non-order-0 model is used that
takes into account the influence of the previous symbol on the probability
of the current symbol (e.g., "q" and "u" often come together, ...).
The solution is to use adaptive algorithms. As an example, Adaptive Huffman
Coding is examined below. The idea, however, is applicable to other
adaptive compression algorithms.
ENCODER DECODER
------- -------
Initialize_model(); Initialize_model();
while ((c = getc (input)) != eof) while ((c = decode (input)) != eof)
{ {
encode (c, output); putc (c, output);
update_model (c); update_model (c);
} }
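A minimal sketch of the shared-model idea (illustrative Python; the class
name AdaptiveModel and the rebuild-per-symbol strategy are assumptions made
for clarity, not how real adaptive Huffman coders such as FGK or Vitter
work, which update the tree incrementally instead of rebuilding it):

class AdaptiveModel:
    """One copy lives in the encoder, one in the decoder; both see the
    same symbol sequence, so their code tables never diverge."""
    def __init__(self, alphabet):
        # Start every count at 1 so each symbol is encodable up front.
        self.counts = {s: 1 for s in alphabet}
        self.codes = huffman_codes(self.counts)  # from the earlier sketch

    def update(self, symbol):                    # the update_model() step
        self.counts[symbol] += 1
        self.codes = huffman_codes(self.counts)  # naive full rebuild

# Encoder side: emit model.codes[c], then call model.update(c).
# Decoder side: decode one symbol via model.codes, then model.update(c).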
Summary
- Huffman coding maps fixed-length symbols to variable-length codes. It is
optimal only when the symbol probabilities are integral powers of 1/2
(e.g., 1/2, 1/4, 1/8).
- Lempel-Ziv-Welch is a dictionary-based compression method. It maps a
variable number of symbols to a fixed-length code.
- Adaptive algorithms do not need an a priori estimation of probabilities,
which makes them more useful in real applications.