What is HuffmanEncoding?
Ø Huffman Encoding is a lossless data compression
algorithm.
Ø It assigns shorter codes to more frequent characters
and longer codes to less frequent ones.
Ø It was developed by David A. Huffman in 1952.
3.
Ø Prefix Code:No code is a prefix of another (e.g., 10
and 101 are invalid).
Ø Binary Tree: The algorithm builds a binary tree
where each character is a leaf node.
Ø Variable-Length Encoding: Frequently used
characters get shorter binary representations.
Key Concepts
4.
�
️
Steps to PerformHuffman Encoding
Ø Count the frequency of each character.
Ø Build a priority queue (min-heap) with all characters and their frequencies.
Ø While there is more than one node in the queue:
Ø Remove the two nodes with the lowest frequency.Create a new internal
node with frequency = sum of the two nodes.Add this new node back to
the queue.The remaining node is the root of the Huffman Tree.
Ø Assign codes:
ØTraverse left: assign 0
ØTraverse right: assign 1
ØEncode the data by replacing each character with its Huffman code.
5.
�Example
String : AA B C C C D D D D
Step 1: Frequency Count
Character Frequency
A 2
B 1
C 3
D 4
�Example
[10]
/
D(4) [6]
/
BA(3) C(3)
/
B(1) A(2)
Step 3: Assign Codes from Tree
Let’s assign binary codes (left = 0, right = 1):
From root traverse to the leaf:
ü D = 0
ü C = 11
ü B = 100
ü A = 101
1
1
1
0
0
0
8.
Step 4: Encodethe String
Original string: A A B C C C D D D D
Huffman codes: A = 101, B = 100, C = 11, D = 0
Encoded binary string:101 101 100 11 11 11 0 0 0 0
Key Concepts
9.
✅ Advantages ofHuffman Encoding
Lossless compression.
Reduces file size.
Efficient when character frequency varies.
❗ Limitations:
Needs the frequency table to decode.
Not ideal if all characters have similar frequency.