HARAMAYA UNIVERSITY
HARAMAYA INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING
Introduction to Information Theory and Coding
Course Coordinator: Dr. Mulugeta Atlabachew (Ass. Professor), Guest Lecturer
INFORMATION
 Information can be described as one or more statements or facts that are
received by a human and that have some form of worth to the recipient.
 The working definition for information therefore must:
1. be something, although the exact nature (substance, energy, or
abstract concept) isn't clear;
2. provide “new” information: a repetition of previously received
messages isn't informative;
3. be "true": a lie or false or counterfactual information is misinformation, not information itself;
4. be "about" something.
 Information can be thought of as the resolution of
uncertainty;
 It answers the question of "What an entity is" and thus
defines both its essence and the nature of its characteristics.
 The concept of information has different meanings in different
contexts.
 Thus the concept becomes related to notions of constraint,
communication, control, data, form, education, knowledge,
meaning, understanding, mental stimuli, pattern, perception,
representation, and entropy.
 Information is a stimulus that has meaning in some context
for its receiver.
 When information is entered into and stored in a
computer, it is generally referred to as data.
 After processing (such as formatting and printing), output
data can again be perceived as information.
 When information is packaged or used for understanding
or doing something, it is known as knowledge.
Information
How to measure Information?
 The uncertainty of an event is measured by its probability of
occurrence: the less probable the event, the more uncertain it is.
 More uncertain events require more information to resolve
that uncertainty.
 The mathematical theory of information is based on
probability theory and statistics, and measures information
with several quantities of information.
Information
How to measure Information?
 The choice of logarithmic base in the following formulae determines
the unit of information entropy that is used.
 The most common unit of information is the bit, based on the binary
logarithm.
 Other units include the nat, based on the natural logarithm or base e,
and the hartley, based on the base 10 or common logarithm.
 The bit is a typical unit of information. It is 'that which reduces
uncertainty by half'.
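 For example, learning the outcome of a fair coin toss (two equally likely possibilities) reduces the uncertainty by half and therefore conveys exactly \log_2 2 = 1 bit of information.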
Why logarithm to measure Information?
 We want our information measure I(p) to have several
properties
1) Information is a non-negative quantity: I(p) ≥ 0.
2) If an event has probability 1, we get no information from the occurrence of
the event: I(1) = 0.
3) If two independent events occur, then the information we get from observing
both is the sum of the two individual informations: I(p1 · p2) = I(p1) + I(p2).
4) We want our information measure to be a continuous (and, in fact,
monotonic) function of the probability (slight changes in probability should
result in slight changes in information).
Why logarithm to measure Information?
We can therefore derive the following:
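 A sketch of the standard argument, using only the four properties above: from property (3), I(p1 · p2) = I(p1) + I(p2), so repeated application gives I(p^n) = n · I(p) and, for rational exponents, I(p^{m/n}) = (m/n) · I(p); continuity (property 4) extends this to all real exponents.
 Now write any p in (0, 1] as p = (1/2)^a with a = \log_2 (1/p). Then

  I(p) = a \cdot I(1/2) = k \log_2 \frac{1}{p},  where k = I(1/2) \ge 0.

 So, up to the choice of a positive constant (equivalently, the choice of logarithmic base), the only measure satisfying the properties is the logarithm of 1/p.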
Why logarithm to measure Information?
 From this, we can derive the nice relationship between the logarithm and the
self-information content:

  I(x) = \log_2 \frac{1}{p(x)} = -\log_2 p(x)
 This formula is called the surprise function; it represents the
amount of surprise, or the amount of self-information, that a particular
symbol x of a distribution holds.
 Intuitively, the definition can be understood as follows: we would be
surprised if a rare symbol (p(x) small) is observed. Thus, the lower the
p(x), the higher the surprise, which is exactly what the definition above gives.
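 A quick worked example: a symbol with p(x) = 1/1024 carries I(x) = \log_2 1024 = 10 bits of surprise, while a symbol with p(x) = 1/2 carries only 1 bit.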
Why logarithm to measure Information?
 Thus, using different bases for the logarithm results in
information measures which are just constant multiples of
each other, corresponding with measurements in different
units:
1. log2 units are bits (from 'binary') - typical in Information Theory
2. log3 units are trits (from 'trinary')
3. loge units are nats (from 'natural logarithm') (we'll use ln(x)
for loge(x))
4. log10 units are hartleys, after Ralph Hartley, an early worker in the field.
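 Since \log_b x = \ln x / \ln b, these units differ only by constant factors: 1 trit = \log_2 3 \approx 1.585 bits, 1 nat = \log_2 e \approx 1.443 bits, and 1 hartley = \log_2 10 \approx 3.322 bits.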
Why logarithm to measure Information?
In general,
 The Shannon information content (self-information content) of an
outcome x is defined to be

  h(x) = \log_2 \frac{1}{p(x)}

and it is measured in bits.
 The entropy of an ensemble X is defined to be the average Shannon
information content of an outcome:

  H(X) = \sum_{x} p(x) \log_2 \frac{1}{p(x)}
Entropy
How to measure Information?
 In information theory, the entropy of a random variable is the average level
of "information", "surprise", or "uncertainty" inherent in the variable's
possible outcomes.
 The concept of information entropy was introduced by Claude Shannon in his
work published in 1948.
 Entropy should be a measure of how "surprising" the average outcome of a
variable is.
 Entropy is the average self-information content of the random variable.
 For a continuous random variable, differential entropy is analogous to
entropy.
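 For a continuous random variable X with probability density f(x), the analogous quantity (the differential entropy) is

  h(X) = -\int f(x) \log_2 f(x) \, dx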
Entropy
How to measure Information?
 Suppose now that we have n symbols {a1, a2, …, an}, and some
source is providing us with a stream of these symbols.
 Suppose further that the source emits the symbols with probabilities
{p1, p2, …, pn}, respectively.
 Assume that the symbols are emitted independently.
 What is the average amount of information we get from each symbol
we see in the stream from the source?
Entropy
How to measure Information?
 What we really want here is a weighted average.
 If we observe the symbol ai, we get \log_2(1/p_i) bits of information
from that particular observation.
 In a long run of (say) N observations, we will see (approximately) N·pi
occurrences of the symbol ai.
 Thus, over the N (independent) observations, we will get a total information
I of approximately

  I = \sum_{i=1}^{n} N p_i \log_2 \frac{1}{p_i}
Entropy
How to measure Information?
 Then, the average information we get per symbol observed over the N
observations will be

  \frac{I}{N} = \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i}

 We define p \log_2 (1/p) to be zero when p = 0.
Entropy
 For generalization, let us suppose that we have a set of probabilities
(a probability distribution) P = {p1, p2, …, pn}.
 We define the entropy of the distribution P by:

  H(P) = \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i} = -\sum_{i=1}^{n} p_i \log_2 p_i

 If we have a continuous rather than a discrete probability distribution
P(x):

  H(P) = -\int P(x) \log_2 P(x) \, dx
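 As a check on this definition: a fair coin, P = {1/2, 1/2}, gives H(P) = 1 bit, while a heavily biased coin, P = {0.1, 0.9}, gives

  H(P) = -0.1 \log_2 0.1 - 0.9 \log_2 0.9 \approx 0.469 bits,

so the less uncertain source has the lower entropy.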
Entropy
 Another way to think about entropy is in terms of expected value.
 Given a discrete probability distribution P = {p1, p2, …, pn} with

  \sum_{i=1}^{n} p_i = 1  and  0 \le p_i \le 1,

or a continuous distribution P(x) with

  \int P(x) \, dx = 1  and  P(x) \ge 0,

 we can define the expected value of an associated discrete set F = {f1,
f2, …, fn} or function F(x) by:

  E[F] = \sum_{i=1}^{n} p_i f_i

 or

  E[F] = \int P(x) F(x) \, dx

 From this we can conclude that

  H(P) = E\left[ \log_2 \frac{1}{p} \right]
Entropy
 The entropy of a probability distribution is just the expected value of
the information of the distribution.
 Intuitively, the greater the expected surprise (i.e., the entropy) of a
distribution, the harder it is to represent, i.e., the more bits it takes, on
average, to encode an outcome.
Therefore, entropy represents the expected value of the surprise that a distribution
holds.
Information Theory
 Information theory is the scientific study of the
quantification, storage, and communication of information.
 The field was fundamentally established by the works of
Harry Nyquist and Ralph Hartley, in the 1920s, and Claude
Shannon in the 1940s.
 The field is at the intersection of probability theory,
statistics, computer science, statistical mechanics,
information engineering, and electrical engineering.
Shannon Definition of Communication
 "The fundamental problem of communication is that of reproducing at
one point either exactly or approximately a message selected at
another point."
 "Frequently the messages have meaning"
 "... [which is] irrelevant to the engineering problem."
Shannon Definition of Communication
 Shannon wants to find a way for "reliably" transmitting data
through the channel at the "maximal" possible rate.
[Block diagram: Information Source → Coding → Communication Channel → Decoding → Destination]
Model of a Digital Communication System
[Block diagram: Information Source → Message (e.g. English symbols) → Encoder / Coding (e.g. English to 0,1 sequence) → Communication Channel (can have noise or distortion) → Decoder / Decoding (e.g. 0,1 sequence to English) → Destination]
Information
 Given a discrete random variable X, with possible outcomes
x1, x2, …, xn, which occur with probabilities p(x1), p(x2), …, p(xn), the
entropy of X is formally defined as:
  H(X) = -\sum_{x} p(x) \log_2 p(x)

 This function is called the entropy function.
Information
Example
 Tossing a fair die:
 Outcomes are 1, 2, 3, 4, 5, 6
 Each occurs with probability 1/6
 The information provided by tossing the die is
  H = -\sum_{i=1}^{6} p(i) \log_2 p(i) = -\sum_{i=1}^{6} \frac{1}{6} \log_2 \frac{1}{6} = \log_2 6 = 2.585 bits
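 A quick numerical check of this result (a minimal Python sketch; the helper name entropy_bits is just illustrative):

    import math

    def entropy_bits(probs):
        """Entropy H = -sum(p * log2(p)), in bits; terms with p == 0 contribute nothing."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy_bits([1/6] * 6))   # fair die: ~2.585 bits, matching the slide
    print(entropy_bits([0.5, 0.5]))  # fair coin: 1.0 bit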
ENTROPY-Properties
NOTATIONS
 We use capital letters X and Y to name random variables, and
lower-case letters x and y to refer to their respective outcomes.
 These are drawn from particular alphabets (sets) A and B: x ∈ A and y ∈ B.
 The probability of a particular outcome x is denoted p(x).
ENTROPY-Properties
JOINT ENSEMBLE
 A joint ensemble XY is an ensemble whose outcomes are
ordered pairs (x, y) with x ∈ A and y ∈ B.
 The joint ensemble XY defines a probability distribution p(x, y)
over all possible joint outcomes.
MARGINAL PROBABILITY
 From the Sum Rule, we can see that the probability of X taking on a particular
value x = ai is the sum of the joint probabilities of this outcome for X and
all possible outcomes for Y:

  p(x = a_i) = \sum_{j} p(x = a_i, y = b_j)
ENTROPY-Properties
 In a simplified way this can be rewritten as

  p(x) = \sum_{y} p(x, y)

 and similarly for Y:

  p(y) = \sum_{x} p(x, y)
ENTROPY-Properties
Conditional probability
 From the Product Rule, we can easily see that the conditional
probability that x = ai, given y = bj, is

  p(x = a_i \mid y = b_j) = \frac{p(x = a_i, y = b_j)}{p(y = b_j)}

 In a simplified way:

  p(x \mid y) = \frac{p(x, y)}{p(y)}

 Similarly:

  p(y \mid x) = \frac{p(x, y)}{p(x)}
ENTROPY-Properties
JOINT ENTROPY OF ENSEMBLES

  H(X, Y) = -\sum_{x, y} p(x, y) \log_2 p(x, y)

CONDITIONAL ENTROPY
 The conditional entropy of an ensemble X, given that y = bj, measures the
uncertainty remaining about the random variable X after specifying that the random
variable Y has taken on the particular value y = bj.
 It is defined naturally as the entropy of the probability distribution p(x | y = bj):

  H(X \mid y = b_j) = -\sum_{x} p(x \mid y = b_j) \log_2 p(x \mid y = b_j)
ENTROPY-Properties
CONDITIONAL ENTROPY
 The conditional entropy of an ensemble X, given an ensemble Y, is given by

  H(X \mid Y) = \sum_{y} p(y) H(X \mid y) = -\sum_{x, y} p(x, y) \log_2 p(x \mid y)

CHAIN RULE FOR ENTROPY
 The joint entropy, conditional entropy, and marginal entropy for two
ensembles X and Y are related by:

  H(X, Y) = H(X) + H(Y \mid X) = H(Y) + H(X \mid Y)
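 A quick sanity check of the chain rule: let X be a fair bit and let Y = X (perfectly correlated). Then H(X) = 1 bit and H(Y | X) = 0, and indeed H(X, Y) = H(X) + H(Y | X) = 1 bit, since the pair (X, Y) has only two equally likely outcomes, (0,0) and (1,1).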
ENTROPY-Properties
 It should seem natural and intuitive that the joint entropy of a pair of
random variables is the entropy of one plus the conditional entropy of
the other (the uncertainty that it adds once its dependence on the first
one has been discounted by conditionalizing on it).
 If we have three random variables X, Y, Z, the conditionalizing of the
joint distribution of any two of them upon the third is also expressed
by a Chain Rule:

  H(X, Y \mid Z) = H(X \mid Z) + H(Y \mid X, Z)
ENTROPY-Properties
INDEPENDENCE BOUND ON ENTROPY
 A consequence of the Chain Rule for Entropy is that if we have many
different random variables X1, X2, …, Xn, then the sum of all their
individual entropies is an upper bound on their joint entropy:

  H(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} H(X_i)
 Their joint entropy only reaches this upper bound if all of the random
variables are independent.
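 For example, if X and Y are two fair bits, then H(X) + H(Y) = 2 bits. If they are independent, H(X, Y) = 2 bits and the bound is met with equality; if instead Y = X, then H(X, Y) = 1 bit, strictly below the bound.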
ENTROPY-Properties
MUTUAL INFORMATION
 Mutual information is a quantity that measures a relationship between
two random variables that are sampled simultaneously.
 In particular, it measures how much information is communicated, on
average, in one random variable about another.
 Intuitively, one might ask: how much does one random variable tell me
about another? Formally, the mutual information between X and Y is defined as

  I(X; Y) = \sum_{x, y} p(x, y) \log_2 \frac{p(x, y)}{p(x)\,p(y)}

 In this definition, p(x) and p(y) are the marginal distributions of X and
Y obtained through the marginalization process.
ENTROPY-Properties
MUTUAL INFORMATION
 Note that in case X and Y are independent random variables, then the
numerator inside the logarithm equals the denominator.
 Then the log term vanishes, and the mutual information equals zero, as
one should expect.
 In the event that the two random variables are perfectly correlated,
then their mutual information is the entropy of either one alone.
 The mutual information of a random variable with itself is just its
entropy, or self-information: I(X; X) = H(X).
ENTROPY-Properties
MUTUAL INFORMATION
 The mutual information I(X; Y) can be pictured as the intersection (overlap) between H(X) and H(Y).
 These properties are reflected in three equivalent definitions for the
mutual information between X and Y:

  I(X; Y) = H(X) - H(X \mid Y)
  I(X; Y) = H(Y) - H(Y \mid X)
  I(X; Y) = H(X) + H(Y) - H(X, Y)
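 A minimal Python sketch that checks these identities numerically; the joint table below is purely an illustrative assumption, not taken from the slides:

    import math

    def H(probs):
        """Entropy in bits of a collection of probabilities (zero terms contribute 0)."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Illustrative joint distribution p(x, y) over x in {0, 1}, y in {0, 1}
    joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

    # Marginals p(x) and p(y) via the Sum Rule
    px = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
    py = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (0, 1)}

    H_X  = H(px.values())
    H_Y  = H(py.values())
    H_XY = H(joint.values())

    # Mutual information directly from the definition ...
    I_def = sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in joint.items())
    # ... and from the identity I(X;Y) = H(X) + H(Y) - H(X,Y)
    I_alt = H_X + H_Y - H_XY

    print(I_def, I_alt)   # both ~0.278 bits for this table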
Information
Assignment-1
Using probability theory, show that there is only one way to measure
information, which is in terms of the number of bits.
Shannon’s 1st Source Coding Theorem
 Shannon showed that:
“To reliably store the information generated by some
random source X, you need no more/less than, on the
average, H(X) bits for each outcome.”
Shannon’s 1st Source Coding Theorem
 If I toss a die 1,000,000 times and record the value from each
trial:
1,3,4,6,2,5,2,4,5,2,4,5,6,1,….
 In principle, I need 3 bits for storing each outcome, since 3 bits can
represent 8 distinct values (enough for the outcomes 1-6). So I need
3,000,000 bits for storing the information.
 Using an ASCII representation, a computer needs 8 bits = 1 byte
for storing each outcome,
 so the resulting file has size 8,000,000 bits.
Shannon’s 1st Source Coding Theorem
 According to the entropy computed above, you only need, on average, 2.585
bits for storing each outcome.
 So, the file can be compressed to a size of
2.585 × 1,000,000 = 2,585,000 bits
 Optimal Compression Ratio is:
  2,585,000 / 8,000,000 = 0.3231 = 32.31 %