信息论中信源熵之间关系的证明
信息论中有关各种熵之间关系的证明
⒈基本定义
1.1信息就是对事物动态(或它的存在方式)的不确定性的一种描述. 不确定性及随机性,可以用研究随机现象的数学教具—概率论与随机过程来描述信息.
1.2自信息量:一个随机事件发生某一结果后所带来的信息量称为自信息量,简称自信息. 用I (a i ) 来表示.
1.3联合自信息量:自信息量是二维联合集XY 上元素a i b j 的联合概率
p (a i b j ) 数的负值,称为联合自信息量. 用I (a i b j ) 来表示.
1.4条件自信息量:为条件概率对数的负值. 用I (a i /b j ) 来表示.
1.5交互信息量:a i 后验概率与先验概率比值的对数为b j 对a i 的互信息量, 也称交互信息量(简称互信息). 用I (a i ; b j ) 来表示.
1.6信源熵:信源各个离散消息的自信息量的数学期望(即概率加权的统计平均值)为信源的平均自信息量,一般称为信源的信息熵,也叫信源熵或香农熵,记为H (X ) .
1.7条件熵:在联合符号集合XY 上的条件自信息量的数学期望. 可以用
H (X /Y ) 表示.
1.8联合熵:也叫共熵,是联和离散符号XY 上的每的元素a i b j 的联合自信息量的数学期望,用H (XY ) 表示. 2.基本公式
2.1自信息量:I (a i ) =−log 2p (a i )
2.2联合的自信息量:I (a i b j ) =−log 2p (a i b j ) 当X 和Y 相互独立时,p (a i b j ) =p (a i ) p (b j ) ;则有:
I (a i b j ) =−log 2p (a i b j ) =−log 2p (a i ) p (b j ) =−log 2p (a i ) −log 2p (b j ) =I (a i ) +I (b j ) 2.3条件自信息量:I (a i /b j ) =−log 2p (a i /b j ) 或I (b j /a i ) =−log 2p (b j /a i )
2.4互信息量:I (a i ; b j ) =log 2
p (a i /b j )
(i =1, 2, ⋯, n ; j =1, 2, ⋯, m )
p (a i )
n
1
2.5信源熵:H (X ) =E [I (a i )]=E [log2]=−∑p (a i ) log 2p (a i )
p (a i ) i =1
2.6条件熵:ⅰ:在已知随机变量Y 的条件下,随机变量X 的条件熵H (X /Y ) 为:
H (X /Y ) =E [I (a i /b j )]=∑∑p (a i b j ) I (a i /b j )
j =1i =1
m n
=−∑∑p (a i b j ) log 2p (a i /b j ) .
j =1i =1
m n
ⅱ:在已知随机变量X 的条件下,随机变量Y 的条件熵H (Y /X ) 为:
H (Y /X ) =E [I (b j /a i )]=∑∑p (a i b j ) I (b j /a i )
j =1i =1m
m n
=−∑∑p (a i b j ) log 2p (b j /a i ) .
j =1i =1
n
2.7联合熵:H (XY ) =∑∑p (a i b j ) I (a i b j ) =−∑∑p (a i b j ) log 2p (a i b j ) .
i =1j =1
j =1i =1
n m m n
2.8有关概率的基本公式:∑p (a i ) =1,∑p (b j ) =1,∑p (a i /b j ) =1,
i =1
j =1
i =1
n m n
∑p (b
j =1
m
j
/a i ) =1,
∑∑p (a b ) =1,∑p (a b ) =p (b ) , ∑p (a b ) =p (a )
i j
i j
j
i j
i
i =1
j =1
i =1
j =1
n m n m
,
p (a i b j ) =p (a i ) p (b j /a i ) =p (b j ) p (a i /b j ) . 3.各种熵之间的关系
3.1无条件熵
3.1.2H (X ) =H (X /Y ) +I (X ; Y ) ≥H (X /Y ) . 证明:①H (X ) =−∑p (a i ) log 2p (a i )
i =1n
=−∑∑p (a i b j ) log 2
j =1i =1
m n
p (a i )
p (a i /b j )
p (a i /b j )
p (a i /b j ) m n
=∑∑p (a i b j ) log 2−∑∑p (a i b j ) log 2p (a i /b j )
p (a ) j =1i =1j =1i =1i
m
n
=I (X ; Y ) +H (X /Y ) .
②H (X /Y ) =−∑∑p (b j ) p (a i /b j ) log 2p (a i /b j )
j
i
=−∑p (b j ) [∑p (a i /b j ) log 2p (a i /b j ) ].
j
i
由熵的极值性知:
H (X /Y ) ≤−∑p (b j ) [∑p (a i /b j ) log 2p (a i ) ]
j
i
=−∑[∑p (b j ) p (a i /b j )]log 2p (a i )
i
j
=H (X ) ,
其中
p (b j ) p (a i /b j ) =∑p (a i b j ) =p (a i ) . ∑j j
同理:H (Y ) =H (Y /X ) +I (X ; Y ) ≥H (Y /X ) . 3.1.2. H (X ) =H (XY ) −H (Y /X ) . 证明:H (X ) =−∑p (a i ) log 2p (a i )
i
=−∑
i
p (a i b j ) [∑p (b j ) p (a i /b j )]log 2
p (b j /a i ) j
j
i
j
=−∑∑p (a i b j ) log 2p (a i b j ) −[−∑∑p (a i b j ) log 2p (b j /a i )]
i
=H (XY ) −H (Y /X ) ,
同理:H (Y ) =H (XY ) −H (X /Y ) . 3.2条件熵
H (X /Y ) =H (XY ) −H (Y ) =H (X ) −I (X ; Y ) .
3.2.1H (X /Y ) =H (XY ) −H (Y ) . 证明:H (X /Y ) =−
p (a i b j ) log ∑∑i j
=1
=1
n
m
n m
2
p (a i /b j )
=−∑∑p (a i b j ) log 2p (a i b j ) +
i =1j =1
∑[∑p (a b ) ]log
i j
j =1
i =1
m n
2
p (b j )
=−∑∑p (a i b j ) log 2p (a i b j ) +∑p (b j ) log 2p (b j )
i =1j =1
j =1
n m m
=H (XY ) −H (Y ) ,
3.2.2H (X /Y ) =H (X ) −I (X ; Y ) .
其中:∑p (a i b j ) =p (b j ) .
i =1
n
证明:H (X /Y ) =−∑∑p (a i b j ) log 2p (a i /b j )
i =1j =1
m
n m
p (a i b j )
=−∑∑p (a i b j ) log 2p (a i )
p (a i ) i =1j =1
n
=−∑[∑p (a i b j ) ]log 2p (a i ) −∑
i =1
j =1
i =1m
n m n
∑p (a i b j ) log 2
j =1
i j
i
m
p (a i /b j )
p (a i )
=H (X ) −I (X ; Y ) , 其中:
∑p (a b ) =p (a ) .
j =1
同理:H (Y /X ) =H (XY ) −H (X ) =H (Y ) −I (X ; Y ) . 3.3联合熵
H (XY ) =H (YX )
H (XY ) =H (X ) +H (Y /X ) =H (Y ) +H (X /Y )
=H (X ) +H (Y ) −I (X ; Y )
=H (X /Y ) +H (Y /X ) +I (X ; Y ) .
3.3.1H (XY ) =H (X ) +H (Y /X ) =H (Y ) +H (X /Y ) . 证明:H (XY ) =−∑∑p (a i b j ) log 2p (a i b j )
i =1j =1n
m n
m
=−∑∑p (a i b j ) log 2p (a i ) p (b j /a i )
i =1j =1n
m
=−∑[∑p (a i b j ) ]log 2p (a i ) −∑∑p (a i b j ) p (b j /a i )
i =1
j =1
i =1j =1
m
n m
=H (X ) +H (Y /X ) ,
其中:∑p (a i b j ) =p (a i ) .
j =1
同理:H (XY ) =H (Y ) +H (X /Y ) .
.
3.3.2H (XY ) =H (X ) +H (Y ) −I (X ; Y )
n
m
证明:H (XY ) =−∑∑p (a i b j ) log 2p (a i ) p (b j /a i )
i =1j =1n
m
=−∑∑p (a i b j ) log 2p (a i ) p (b j )
i =1j =1n
m
p (b j /a i ) p (b j )
n
2
=−∑[∑p (a i b j ) ]log
i =1
j =1m n
2
p (a i ) −∑[∑p (a i b j ) ]log
j =1
i =1
m
p (b )
j
−∑∑p (a i b j ) log 2
i =1j =1
p (b j /a i ) p (b j )
=H (X ) +H (Y ) −I (X ; Y ) .
3.3.3
H (XY ) =H (X /Y ) +H (Y /X ) +I (X ; Y ) .
n
m
证明:H (XY ) =−∑∑p (a i b j ) log 2p (a i ) p (b j /a i )
i =1j =1n
m
=−∑∑p (a i b j ) log 2p (a i /b j ) p (b j /a i )
i =1j =1n
m
n
m
p (a i ) p (a i /b j )
=−∑∑p (a i b j ) log 2p (a i /b j ) −∑∑p (a i b j ) log 2p (b j /a i )
i =1j =1n
i =1j =1
+∑∑p (a i b j ) log 2
i =1j =1
m
p (a i /b j ) p (a i )
=H (X /Y ) +H (Y /X ) +I (X ; Y )
3.4交互熵
I (X ; Y ) =I (Y ; X )
I (X ; Y ) =H (X ) −H (X /Y ) =H (Y ) −H (Y /X )
=H (XY ) −H (X /Y ) −H (Y /X ) =H (X ) +H (Y ) −H (XY ) .
3.4.1I (X ; Y ) =H (X ) −H (X /Y ) =H (Y ) −H (Y /X ) 证明:I (X ; Y ) =∑∑p (a i b j ) log 2
i =1j =1n
m
p (a i /b j ) p (a i )
=−∑[∑p (a i b j ) ]log 2p (a i ) +∑∑p (a i b j ) log 2p (a i /b j )
i =1
j =1
i =1j =1m
n m n m
=H (X ) −H (X /Y ) ,
同理:I (X ; Y ) =H (Y ) −H (Y /X ) . 3.4.2证明:I (X ; Y ) =∑∑p (a i b j ) log 2
i =1j =1n
m n
m
其中:∑p (a i b j ) =p (a i ) .
j =1
p (a i /b j ) p (a i )
1p (a i b j )
=∑∑p (a i b j ) log 2p (a i /b j ) p (b j /a i )
i =1j =1n
m
n
m
=−∑∑p (a i b j ) log 2p (a i b j ) +∑∑p (a i b j ) log 2p (a i /b j )
i =1j =1m
i =1j =1
+∑∑p (a i b j ) log 2p (b j /a i )
i =1j =1
n
=H (XY ) −H (X /Y ) −H (Y /X ) .
3.4.3证明:I (X ; Y ) =∑∑p (a i b j ) log 2
i =1j =1n
m n
m
p (a i /b j ) p (a i )
p (a i b j )
=∑∑p (a i b j ) log 2
p (a i ) p (b j ) i =1j =1
=−∑[∑p (a i b j ) ]log 2p (a i ) −∑[∑p (a i b j ) ]log 2p (b j )
i =1
j =1m
j =1
i =1
n
m
m
n
+∑∑p (a i b j ) log 2p (a i b j )
i =1j =1
n
=H (X ) +H (Y ) −H (XY ) .
其中:∑p (a i b j ) =p (a i ) ,∑p (a i b j ) =p (b j ) .
j =1
i =1
m
n