Huffman
Encoding
Dr.Mary Jacob
Department of Computer Science
Kristu Jayanti College(Autonomous),Bengaluru
What is Huffman Encoding?
Ø Huffman Encoding is a lossless data compression
algorithm.
Ø It assigns shorter codes to more frequent characters
and longer codes to less frequent ones.
Ø It was developed by David A. Huffman in 1952.
Ø Prefix Code: No code is a prefix of another (e.g., 10
and 101 are invalid).
Ø Binary Tree: The algorithm builds a binary tree
where each character is a leaf node.
Ø Variable-Length Encoding: Frequently used
characters get shorter binary representations.
Key Concepts
�
️
Steps to Perform Huffman Encoding
Ø Count the frequency of each character.
Ø Build a priority queue (min-heap) with all characters and their frequencies.
Ø While there is more than one node in the queue:
Ø Remove the two nodes with the lowest frequency.Create a new internal
node with frequency = sum of the two nodes.Add this new node back to
the queue.The remaining node is the root of the Huffman Tree.
Ø Assign codes:
ØTraverse left: assign 0
ØTraverse right: assign 1
ØEncode the data by replacing each character with its Huffman code.
�Example
String : A A B C C C D D D D
Step 1: Frequency Count
Character Frequency
A 2
B 1
C 3
D 4
�Example
Step 2: Create Nodes and Build Huffman Tree
1. Insert nodes:
A(2), B(1), C(3), D(4)
2. Combine smallest two:
B(1) + A(2) = BA(3)
3. New queue: BA(3), C(3), D(4)
4. Combine two smallest:
BA(3) + C(3) = BAC(6)
5. New queue: BAC(6), D(4)
6.Combine final two: D(4) + BAC(6) = Root(10)
[10]
/ 
D(4) [6]
/ 
BA(3) C(3)
/ 
B(1) A(2)
�Example
[10]
/ 
D(4) [6]
/ 
BA(3) C(3)
/ 
B(1) A(2)
Step 3: Assign Codes from Tree
Let’s assign binary codes (left = 0, right = 1):
From root traverse to the leaf:
ü D = 0
ü C = 11
ü B = 100
ü A = 101
1
1
1
0
0
0
Step 4: Encode the String
Original string: A A B C C C D D D D
Huffman codes: A = 101, B = 100, C = 11, D = 0
Encoded binary string:101 101 100 11 11 11 0 0 0 0
Key Concepts
✅ Advantages of Huffman Encoding
Lossless compression.
Reduces file size.
Efficient when character frequency varies.
❗ Limitations:
Needs the frequency table to decode.
Not ideal if all characters have similar frequency.
Thank You

Huffman Encoding Algorithm - Concepts and Example

  • 1.
    Huffman Encoding Dr.Mary Jacob Department ofComputer Science Kristu Jayanti College(Autonomous),Bengaluru
  • 2.
    What is HuffmanEncoding? Ø Huffman Encoding is a lossless data compression algorithm. Ø It assigns shorter codes to more frequent characters and longer codes to less frequent ones. Ø It was developed by David A. Huffman in 1952.
  • 3.
    Ø Prefix Code:No code is a prefix of another (e.g., 10 and 101 are invalid). Ø Binary Tree: The algorithm builds a binary tree where each character is a leaf node. Ø Variable-Length Encoding: Frequently used characters get shorter binary representations. Key Concepts
  • 4.
    � ️ Steps to PerformHuffman Encoding Ø Count the frequency of each character. Ø Build a priority queue (min-heap) with all characters and their frequencies. Ø While there is more than one node in the queue: Ø Remove the two nodes with the lowest frequency.Create a new internal node with frequency = sum of the two nodes.Add this new node back to the queue.The remaining node is the root of the Huffman Tree. Ø Assign codes: ØTraverse left: assign 0 ØTraverse right: assign 1 ØEncode the data by replacing each character with its Huffman code.
  • 5.
    �Example String : AA B C C C D D D D Step 1: Frequency Count Character Frequency A 2 B 1 C 3 D 4
  • 6.
    �Example Step 2: CreateNodes and Build Huffman Tree 1. Insert nodes: A(2), B(1), C(3), D(4) 2. Combine smallest two: B(1) + A(2) = BA(3) 3. New queue: BA(3), C(3), D(4) 4. Combine two smallest: BA(3) + C(3) = BAC(6) 5. New queue: BAC(6), D(4) 6.Combine final two: D(4) + BAC(6) = Root(10) [10] / D(4) [6] / BA(3) C(3) / B(1) A(2)
  • 7.
    �Example [10] / D(4) [6] / BA(3) C(3) / B(1) A(2) Step 3: Assign Codes from Tree Let’s assign binary codes (left = 0, right = 1): From root traverse to the leaf: ü D = 0 ü C = 11 ü B = 100 ü A = 101 1 1 1 0 0 0
  • 8.
    Step 4: Encodethe String Original string: A A B C C C D D D D Huffman codes: A = 101, B = 100, C = 11, D = 0 Encoded binary string:101 101 100 11 11 11 0 0 0 0 Key Concepts
  • 9.
    ✅ Advantages ofHuffman Encoding Lossless compression. Reduces file size. Efficient when character frequency varies. ❗ Limitations: Needs the frequency table to decode. Not ideal if all characters have similar frequency.
  • 10.