Mutual Information
- Entropy
- Conditional Entropy
- Joint Entropy
- Mutual Information
- Cross Entropy
- Kullback–Leibler divergence
- Connections
Entropy
$$ H(x) = -\sum_i{p(x_i)\log(p(x_i))} $$
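As a quick sketch of this formula (using natural log, as in the numpy example at the end of this post, on a made-up two-outcome distribution):

import numpy as np

p = np.array([0.3, 0.7])          # a made-up distribution over two outcomes
H = -np.sum(p * np.log(p))        # entropy in nats, ~0.611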
Conditional Entropy
$$ \begin{align} H(x|y) &= -\sum_{i,j}{p(x_i, y_j)\log{p(x_i|y_j)}} \\ &= -\sum_{i,j}{p(x_i, y_j)\log{\frac{p(x_i,y_j)}{p(y_j)}}} \end{align} $$
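A sketch of the same formula, using the joint distribution p(x, y) that the worked example at the end of this post constructs (rows indexed by x, columns by y):

import numpy as np

p_xy = np.array([[0.2, 0.18],
                 [0.2, 0.42]])                       # joint p(x, y); rows = x, cols = y
p_y = p_xy.sum(axis=0)                               # marginal p(y)
H_x_given_y = -np.sum(p_xy * np.log(p_xy / p_y))     # H(x|y), ~0.644 nats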
Joint Entropy
$$ H(x, y) = -\sum_{i,j}{p(x_i, y_j)\log(p(x_i, y_j))} $$
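Continuing in the same style on the same illustrative joint table, the joint entropy is a single sum over all cells:

import numpy as np

p_xy = np.array([[0.2, 0.18],
                 [0.2, 0.42]])                # same joint p(x, y) as above
H_xy = -np.sum(p_xy * np.log(p_xy))           # H(x, y), ~1.317 nats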
Mutual Information
$$ MI(x, y) = -\sum_{i,j}{p(x_i, y_j)\log(\frac{p(x_i)p(y_j)}{p(x_i, y_j)})} $$
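A sketch on the same joint table; the marginals are recovered by summing over rows and columns:

import numpy as np

p_xy = np.array([[0.2, 0.18],
                 [0.2, 0.42]])                              # same joint p(x, y) as above
p_x = p_xy.sum(axis=1)                                      # marginal p(x)
p_y = p_xy.sum(axis=0)                                      # marginal p(y)
MI = -np.sum(p_xy * np.log(np.outer(p_x, p_y) / p_xy))      # MI(x, y), ~0.02 nats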
Cross Entropy
$$ H(p, q) = -\sum_x{p(x)\log(q(x))} $$
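A sketch with a made-up pair of distributions, where p plays the role of the true distribution and q the approximation:

import numpy as np

p = np.array([0.3, 0.7])           # hypothetical true distribution
q = np.array([0.5, 0.5])           # hypothetical approximation
H_pq = -np.sum(p * np.log(q))      # cross entropy H(p, q), = ln(2) here since q is uniform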
Kullback–Leibler divergence
Here $p$ is the true distribution while $q$ is the approximation.
$$ D_{KL}(p \| q) = -\sum_x p(x) \log(\frac{q(x)}{p(x)}) $$
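With the same made-up p and q as above, the divergence is small but strictly positive:

import numpy as np

p = np.array([0.3, 0.7])           # hypothetical true distribution
q = np.array([0.5, 0.5])           # hypothetical approximation
D_kl = -np.sum(p * np.log(q / p))  # D_KL(p || q), ~0.082 nats; 0 only when p == q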
Connections
$$ H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y) $$
$$ MI(X,Y) = H(X) + H(Y) - H(X,Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) $$
$$ MI(X,Y) = D_{KL}(p(x,y) \| p(x)p(y)) $$
$$ H(p, q) = H(p) + D_{KL}(p \| q) $$
# Some examples
import numpy as np

# marginal probabilities of y
p_y0 = 0.4
p_y1 = 0.6
# conditional probabilities p(x|y)
p_x0_y0 = 0.5
p_x1_y0 = 0.5
p_x0_y1 = 0.3
p_x1_y1 = 0.7
# joint probabilities p(x, y) = p(y) * p(x|y)
p_x0y0 = p_y0 * p_x0_y0
p_x1y0 = p_y0 * p_x1_y0
p_x0y1 = p_y1 * p_x0_y1
p_x1y1 = p_y1 * p_x1_y1
# marginal probabilities of x, derived from the joint so the example is self-consistent
p_x0 = p_x0y0 + p_x0y1
p_x1 = p_x1y0 + p_x1y1
# marginal entropies
H_x = -p_x0 * np.log(p_x0) - p_x1 * np.log(p_x1)
H_y = -p_y0 * np.log(p_y0) - p_y1 * np.log(p_y1)
# conditional entropy H(x|y)
H_x_y = -p_x0y0*np.log(p_x0_y0) - p_x1y0*np.log(p_x1_y0) - p_x0y1*np.log(p_x0_y1) - p_x1y1*np.log(p_x1_y1)
# joint entropy H(x, y)
H_xy = -p_x0y0*np.log(p_x0y0) - p_x1y0*np.log(p_x1y0) - p_x0y1*np.log(p_x0y1) - p_x1y1*np.log(p_x1y1)
# H(x, y) = H(y) + H(x|y); compare with np.isclose to allow for floating-point error
assert np.isclose(H_xy, H_y + H_x_y)
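As a further sketch, the remaining connections can be checked by continuing the cell above (reusing its variables; the p/q pair in the cross-entropy check is made up):

# mutual information: MI(x, y) = -sum_ij p(x_i, y_j) log(p(x_i) p(y_j) / p(x_i, y_j))
MI = -p_x0y0*np.log(p_x0*p_y0/p_x0y0) - p_x1y0*np.log(p_x1*p_y0/p_x1y0) \
     - p_x0y1*np.log(p_x0*p_y1/p_x0y1) - p_x1y1*np.log(p_x1*p_y1/p_x1y1)
# MI(x, y) = H(x) + H(y) - H(x, y) = H(x) - H(x|y)
assert np.isclose(MI, H_x + H_y - H_xy)
assert np.isclose(MI, H_x - H_x_y)
# H(p, q) = H(p) + D_KL(p || q), on a made-up pair p, q
p = np.array([p_x0, p_x1])
q = np.array([0.5, 0.5])           # hypothetical approximating distribution
H_p = -np.sum(p * np.log(p))
H_pq = -np.sum(p * np.log(q))
D_kl = -np.sum(p * np.log(q / p))
assert np.isclose(H_pq, H_p + D_kl)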
This post was converted from mutual-information.ipynb
Written on June 3, 2021