Tag: What is Huffman Encoding

Huffman Coding | Huffman Coding Example | Time Complexity

Huffman Coding-

 

  • Huffman Coding is a famous Greedy Algorithm.
  • It is used for the lossless compression of data.
  • It uses variable length encoding.
  • It assigns variable length code to all the characters.
  • The code length of a character depends on how frequently it occurs in the given text.
  • The character which occurs most frequently gets the smallest code.
  • The character which occurs least frequently gets the largest code.
  • It is also known as Huffman Encoding.

 

Prefix Rule-

 

  • Huffman Coding implements a rule known as a prefix rule.
  • This is to prevent the ambiguities while decoding.
  • It ensures that the code assigned to any character is not a prefix of the code assigned to any other character.

 

Major Steps in Huffman Coding-

 

There are two major steps in Huffman Coding-

  1. Building a Huffman Tree from the input characters.
  2. Assigning code to the characters by traversing the Huffman Tree.

 

Huffman Tree-

 

The steps involved in the construction of Huffman Tree are as follows-

 

Step-01:

 

  • Create a leaf node for each character of the text.
  • Leaf node of a character contains the occurring frequency of that character.

 

Step-02:

 

  • Arrange all the nodes in increasing order of their frequency value.

 

Step-03:

 

Considering the first two nodes having minimum frequency,

  • Create a new internal node.
  • The frequency of this new node is the sum of frequency of those two nodes.
  • Make the first node as a left child and the other node as a right child of the newly created node.

 

Step-04:

 

  • Keep repeating Step-02 and Step-03 until all the nodes form a single tree.
  • The tree finally obtained is the desired Huffman Tree.

 

Time Complexity-

 

The time complexity analysis of Huffman Coding is as follows-

  • extractMin( ) is called 2 x (n-1) times if there are n nodes.
  • As extractMin( ) calls minHeapify( ), it takes O(logn) time.

 

Thus, Overall time complexity of Huffman Coding becomes O(nlogn).

Here, n is the number of unique characters in the given text.

 

Important Formulas-

 

The following 2 formulas are important to solve the problems based on Huffman Coding-

 

Formula-01:

 

 

Formula-02:

 

Total number of bits in Huffman encoded message

= Total number of characters in the message x Average code length per character

= ∑ ( frequencyi x Code length)

 

PRACTICE PROBLEM BASED ON HUFFMAN CODING-

 

Problem-

 

A file contains the following characters with the frequencies as shown. If Huffman Coding is used for data compression, determine-

  1. Huffman Code for each character
  2. Average code length
  3. Length of Huffman encoded message (in bits)

 

Characters Frequencies
a 10
e 15
i 12
o 3
u 4
s 13
t 1

 

Solution-

 

First let us construct the Huffman Tree.

Huffman Tree is constructed in the following steps-

 

Step-01:

 

 

Step-02:

 

 

Step-03:

 

 

Step-04:

 

 

Step-05:

 

 

Step-06:

 

 

Step-07:

 

 

Now,

  • We assign weight to all the edges of the constructed Huffman Tree.
  • Let us assign weight ‘0’ to the left edges and weight ‘1’ to the right edges.

 

Rule

  • If you assign weight ‘0’ to the left edges, then assign weight ‘1’ to the right edges.
  • If you assign weight ‘1’ to the left edges, then assign weight ‘0’ to the right edges.
  • Any of the above two conventions may be followed.
  • But follow the same convention at the time of decoding that is adopted at the time of encoding.

 

After assigning weight to all the edges, the modified Huffman Tree is-

 

 

Now, let us answer each part of the given problem one by one-

 

1. Huffman Code For Characters-

 

To write Huffman Code for any character, traverse the Huffman Tree from root node to the leaf node of that character.

Following this rule, the Huffman Code for each character is-

  • a = 111
  • e = 10
  • i = 00
  • o = 11001
  • u = 1101
  • s = 01
  • t = 11000

 

From here, we can observe-

  • Characters occurring less frequently in the text are assigned the larger code.
  • Characters occurring more frequently in the text are assigned the smaller code.

 

2. Average Code Length-

 

Using formula-01, we have-

Average code length

= ∑ ( frequencyi x code lengthi ) / ∑ ( frequencyi )

= { (10 x 3) + (15 x 2) + (12 x 2) + (3 x 5) + (4 x 4) + (13 x 2) + (1 x 5) } / (10 + 15 + 12 + 3 + 4 + 13 + 1)

= 2.52

 

3. Length of Huffman Encoded Message-

 

Using formula-02, we have-

Total number of bits in Huffman encoded message

= Total number of characters in the message x Average code length per character

= 58 x 2.52

= 146.16

≅ 147 bits

 

To gain better understanding about Huffman Coding,

Watch this Video Lecture

 

To practice previous years GATE problems on Huffman Coding,

Watch this Video Lecture

 

Next Article- 0/1 Knapsack Problem

 

Get more notes and other study material of Design and Analysis of Algorithms.

Watch video lectures by visiting our YouTube channel LearnVidFun.