Huffman Coding
 Huffman Coding, also called Huffman Encoding, is a famous greedy algorithm used for the lossless compression of data.
 It uses variable-length encoding, where variable-length codes are assigned to the characters depending on how frequently they occur in the given text.
 The character that occurs most frequently gets the shortest code and the character that occurs least frequently gets the longest code.
Prefix Rule
 To prevent ambiguity while decoding, Huffman coding follows a rule known as the prefix rule, which ensures that the code assigned to any character is not a prefix of the code assigned to any other character.
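As an illustration (my own addition, not part of the original notes), the prefix rule can be checked mechanically: sort the codes and compare adjacent entries, since any code that is a prefix of another sorts immediately before its extensions. The codes below are made-up examples.

```python
# A minimal sketch (my own addition) of a prefix-rule check.
# Sorting works because any code that is a prefix of another
# sorts immediately before its extensions.

def is_prefix_free(codes):
    """Return True if no code is a prefix of any other code."""
    codes = sorted(codes)
    return all(not longer.startswith(shorter)
               for shorter, longer in zip(codes, codes[1:]))

print(is_prefix_free(["0", "10", "110", "111"]))  # True  (a valid prefix code)
print(is_prefix_free(["0", "01", "10"]))          # False ("0" prefixes "01")
```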
Major Steps in Huffman Coding
There are two major steps in Huffman coding
 Building a Huffman tree from the input characters
 Assigning codes to the characters by traversing the Huffman tree
Steps to construct Huffman Tree
Step01:
Create a leaf node for each of the given characters, storing the frequency of occurrence of that character.
Step02:
Arrange all the nodes in the increasing order of frequency value contained in the nodes.
Step03:
Take the two nodes having minimum frequency and create a new internal node whose frequency equals the sum of the two nodes' frequencies. Make the first node the left child and the other node the right child of the newly created node.
Step04:
Keep repeating Step02 and Step03 until all the nodes form a single tree.
After following all the above steps, our desired Huffman tree will be constructed.
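The construction steps above can be sketched in Python, assuming heapq as the min-priority queue. The node layout (a dict of character to partial code) and the function name are my own choices; tie-breaking between equal frequencies may yield codes that differ from a hand-drawn tree, although the code lengths are unaffected.

```python
# A minimal sketch of the construction steps, assuming Python's heapq as the
# min-priority queue. The node layout (dict of char -> partial code) and the
# function name are my own choices, not from the original notes.
import heapq
from itertools import count

def huffman_codes(freq):
    """freq: dict of character -> frequency. Returns dict of character -> code."""
    tie = count()  # tie-breaker so equal frequencies never compare the dicts
    heap = [(f, next(tie), {ch: ""}) for ch, f in freq.items()]
    heapq.heapify(heap)                    # Steps 1-2: leaf nodes, ordered
    while len(heap) > 1:                   # Step 4: repeat until one tree
        f1, _, left = heapq.heappop(heap)  # Step 3: two minimum-frequency nodes
        f2, _, right = heapq.heappop(heap)
        merged = {c: "0" + code for c, code in left.items()}         # left edge = 0
        merged.update({c: "1" + code for c, code in right.items()})  # right edge = 1
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

print(huffman_codes({"a": 1, "b": 2, "c": 4}))
# → {'a': '00', 'b': '01', 'c': '1'}
```

Note that when two nodes have equal frequency, different tie-breaking choices give different but equally optimal codes.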
Important Formulas
Formula01:
Average code length per character
= ∑ ( frequency_{i} x code length_{i} ) / ∑ ( frequency_{i} )
Formula02:
Total number of bits in Huffman encoded message
= Total number of characters in the message x Average code length per character
= ∑ ( frequency_{i} x Code length_{i })
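Both formulas can be checked numerically. The frequencies and code lengths below are made-up illustration values, not taken from any particular tree.

```python
# Checking both formulas numerically. The frequencies and code lengths
# below are made-up illustration values, not from any particular tree.
freqs   = {"a": 5, "b": 2, "c": 1}
lengths = {"a": 1, "b": 2, "c": 2}

total_chars = sum(freqs.values())                        # 8
total_bits  = sum(freqs[c] * lengths[c] for c in freqs)  # 11
avg_len     = total_bits / total_chars                   # 1.375

# Formula02 both ways: chars x average length == ∑ frequency x code length
print(total_chars * avg_len == total_bits)  # True
```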
Time Complexity
 extractMin( ) is called 2 x (n - 1) times if there are n nodes.
 As extractMin( ) calls minHeapify( ), each call takes O(log n) time.
 Thus, the overall complexity becomes O(n log n).
Time Complexity of Huffman Coding = O(n log n)
where n is the number of unique characters in the text
PRACTICE PROBLEMS BASED ON HUFFMAN CODING
Problem01:
A file contains the following characters with the frequencies as shown. If Huffman coding is used for data compression, determine
 Huffman code for each character
 Average code length
 Length of Huffman encoded message (in bits)
Characters  Frequencies 
a  10 
e  15 
i  12 
o  3 
u  4 
s  13 
t  1 
Solution
First, let us construct the Huffman tree using the steps we have learnt above.
Step01: Arrange the nodes in increasing order of frequency: t(1), o(3), u(4), a(10), i(12), s(13), e(15).
Step02: Merge the two minimum nodes t(1) and o(3) into a new node of frequency 4.
Step03: Merge u(4) and the new node of frequency 4 into a new node of frequency 8.
Step04: Merge the node of frequency 8 and a(10) into a new node of frequency 18.
Step05: Merge i(12) and s(13) into a new node of frequency 25.
Step06: Merge e(15) and the node of frequency 18 into a new node of frequency 33.
Step07: Merge the nodes of frequency 25 and 33 into the root node of frequency 58.
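The merge sequence in the construction above can be traced with a small script (my own addition): repeatedly extract the two minimum frequencies and push back their sum.

```python
# A sketch (my own addition) tracing the merge sequence for this problem:
# repeatedly extract the two minimum frequencies and push back their sum.
import heapq

heap = [10, 15, 12, 3, 4, 13, 1]  # frequencies of a, e, i, o, u, s, t
heapq.heapify(heap)
while len(heap) > 1:
    x, y = heapq.heappop(heap), heapq.heappop(heap)
    heapq.heappush(heap, x + y)
    print(f"{x} + {y} = {x + y}")
# 1 + 3 = 4
# 4 + 4 = 8
# 8 + 10 = 18
# 12 + 13 = 25
# 15 + 18 = 33
# 25 + 33 = 58   (the root frequency equals the total character count)
```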
After we have constructed the Huffman tree, we will assign weights to all the edges. Let us assign weight ‘0’ to the left edges and weight ‘1’ to the right edges.
Note
The choice of '0' for left edges and '1' for right edges is only a convention. Reversing it produces a different but equally valid set of codes with the same code lengths.
After assigning weight ‘0’ to the left edges and weight ‘1’ to the right edges, we get
1. Huffman code for the characters
We will traverse the Huffman tree from the root node to all the leaf nodes one by one and write the Huffman code for each character
 a = 111
 e = 10
 i = 00
 o = 11001
 u = 1101
 s = 01
 t = 11000
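As a check (my own addition), the codes above satisfy the prefix rule, so any encoded bit string decodes unambiguously by extending a buffer until it matches a code:

```python
# A check (my own addition): the codes above are prefix-free, so an encoded
# bit string decodes unambiguously by growing a buffer until it matches a code.
codes = {"a": "111", "e": "10", "i": "00", "o": "11001",
         "u": "1101", "s": "01", "t": "11000"}
decode_map = {code: ch for ch, code in codes.items()}

def decode(bits):
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in decode_map:  # a match is final thanks to the prefix rule
            out.append(decode_map[buf])
            buf = ""
    return "".join(out)

encoded = "".join(codes[c] for c in "suit")
print(encoded)          # 0111010011000
print(decode(encoded))  # suit
```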
From here, we can observe
 Characters occurring less frequently in the text are assigned longer codes.
 Characters occurring more frequently in the text are assigned shorter codes.
2. Average code length
We know,
Average code length
= ∑ ( frequency_{i} x code length_{i} ) / ∑ ( frequency_{i} )
= { (10 x 3) + (15 x 2) + (12 x 2) + (3 x 5) + (4 x 4) + (13 x 2) + (1 x 5) } / (10 + 15 + 12 + 3 + 4 + 13 + 1)
= 146 / 58
≈ 2.52
3. Length of Huffman encoded message
We know,
Total number of bits in Huffman encoded message
= ∑ ( frequency_{i} x code length_{i} )
= (10 x 3) + (15 x 2) + (12 x 2) + (3 x 5) + (4 x 4) + (13 x 2) + (1 x 5)
= 146 bits
Note that multiplying the character count by the rounded average (58 x 2.52 = 146.16) slightly overestimates; the exact sum of frequency x code length gives 146 bits.
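A quick script (my own addition) computes the exact sum of frequency x code length directly, which avoids the error introduced by rounding the average to two decimals first. The frequencies and code lengths are taken from the solution above.

```python
# A quick check (my own addition): computing the exact bit count as
# ∑ frequency x code length avoids the rounding introduced by using the
# two-decimal average. Frequencies and code lengths are from the solution above.
freqs   = {"a": 10, "e": 15, "i": 12, "o": 3, "u": 4, "s": 13, "t": 1}
lengths = {"a": 3, "e": 2, "i": 2, "o": 5, "u": 4, "s": 2, "t": 5}

total_bits = sum(freqs[c] * lengths[c] for c in freqs)
print(total_bits)                                  # 146
print(round(total_bits / sum(freqs.values()), 2))  # 2.52
```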