Huffman Coding | Huffman Encoding

Huffman Coding-

 

  • Huffman Coding also called as Huffman Encoding is a famous greedy algorithm that is used for the lossless compression of data.
  • It uses variable length encoding where variable length codes are assigned to all the characters depending on how frequently they occur in the given text.
  • The character which occurs most frequently gets the smallest code and the character which occurs least frequently gets the largest code.

 

Prefix Rule-

 

  • To prevent ambiguities while decoding, Huffman coding implements a rule known as a prefix rule which ensures that the code assigned to any character is not a prefix of the code assigned to any other character.

 

Major Steps in Huffman Coding-

 

There are two major steps in Huffman coding-

 

  1. Building a Huffman tree from the input characters
  2. Assigning codes to the characters by traversing the Huffman tree

 

Steps to construct Huffman Tree-

 

Step-01:

 

Create a leaf node for all the given characters containing the occurring frequency of characters.

 

Step-02:

 

Arrange all the nodes in the increasing order of frequency value contained in the nodes.

 

Step-03:

 

Considering the first two nodes having minimum frequency, create a new internal node having frequency equal to the sum of the two nodes frequencies and make the first node as a left child and the other node as a right child of the newly created node.

 

Step-04:

 

Keep repeating Step-02 and Step-03 until all the nodes form a single tree.

 

After following all the above steps, our desired Huffman tree will be constructed.

 

Important Formulas-

 

Formula-01:

 

 

Formula-02:

 

Total number of bits in Huffman encoded message

= Total number of characters in the message x Average code length per character

= ∑ ( frequencyi x Code length)

 

Time Complexity-

 

  • extractMin( ) is called 2 x (n-1) times if there are n nodes.
  • As extractMin( ) calls minHeapify( ), it takes O(logn) time.
  • Thus, Overall complexity becomes O(nlogn).

 

Time Complexity of Huffman Coding = O (nlogn)

where n is the number of unique characters in the text

 

PRACTICE PROBLEMS BASED ON HUFFMAN CODING-

 

Problem-01:

 

A file contains the following characters with the frequencies as shown. If Huffman coding is used for data compression, determine-

 

  1. Huffman code for each character
  2. Average code length
  3. Length of Huffman encoded message (in bits)

 

CharactersFrequencies
a10
e15
i12
o3
u4
s13
t1

 

Solution-

 

First let us construct the Huffman tree using the steps we have learnt above-

 

Step-01:

 

 

Step-02:

 

 

Step-03:

 

 

Step-04:

 

 

Step-05:

 

 

Step-06:

 

 

Step-07:

 

 

After we have constructed the Huffman tree, we will assign weights to all the edges. Let us assign weight ‘0’ to the left edges and weight ‘1’ to the right edges.

 

Note

  • We can also assign weight ‘1’ to the left edges and weight ‘0’ to the right edges.
  • The only thing to keep in mind is that we must follow the same convention at the time of decoding which we adopted at the time of encoding.

 

After assigning weight ‘0’ to the left edges and weight ‘1’ to the right edges, we get-

 

 

1. Huffman code for the characters-

 

We will traverse the Huffman tree from the root node to all the leaf nodes one by one and and will write the Huffman code for all the characters-

 

  • a = 111
  • e = 10
  • i = 00
  • o = 11001
  • u = 1101
  • s = 01
  • t = 11000

 

From here, we can observe-

  • Characters occurring less frequently in the text are assigned the larger codes.
  • Characters occurring more frequently in the text are assigned the smaller codes.

 

2. Average code length-

 

We know,

Average code length

= ∑ ( frequencyi x code lengthi ) / ∑ ( frequencyi )

= { (10 x 3) + (15 x 2) + (12 x 2) + (3 x 5) + (4 x 4) + (13 x 2) + (1 x 5) } / (10 + 15 + 12 + 3 + 4 + 13 + 1)

= 2.52

 

3. Length of Huffman encoded message-

 

We know-

Total number of bits in Huffman encoded message

= Total number of characters in the message x Average code length per character

= 58 x 2.52

= 146.16

≅ 147 bits

 

To gain better understanding of the concepts and to practice more problems of Huffman Coding,

Watch this video

 

To practice previous Years GATE Problems on Huffman Coding, Watch this video.

Get more notes and other study material of Design and Analysis of Algorithms (DAA).

Watch video lectures by visiting our YouTube channel LearnVidFun.

Summary
Huffman Coding | Huffman Encoding
Article Name
Huffman Coding | Huffman Encoding
Description
Huffman Coding or Huffman Encoding is a greedy algorithm used for lossless compression of data. Huffman Coding Steps- Building a Huffman tree and Assigning codes to the characters. Steps for constructing Huffman Tree.
Author
Publisher Name
Gate Vidyalay
Publisher Logo
Liked this article? Share it with your friends and classmates now-