Covariance Visualization
We have a data matrix A of dimension m×n. Each row is a data sample with a feature vector f ∈ R^n. From it we can build a covariance matrix that captures the correlations among the different features (we omit the \frac{1}{m} normalization here; the code below divides by the number of points):
M_{n\times n} = \begin{bmatrix}(f_1-\bar{f})^T & (f_2-\bar{f})^T & \cdots\end{bmatrix}_{n\times m}\begin{bmatrix}f_1-\bar{f} \\ f_2-\bar{f} \\ \vdots\end{bmatrix}_{m\times n}
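As a quick numpy sanity check (toy data; the shapes are illustrative only), building M from the centered rows this way matches np.cov:
# Sanity check: covariance from centered rows vs. np.cov
import numpy as np
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 3))      # m = 1000 samples, n = 3 features (toy data)
A_c = A - A.mean(axis=0)            # subtract the mean feature vector f_bar
M = A_c.T @ A_c / len(A)            # (n x m) @ (m x n) -> n x n covariance
print(np.allclose(M, np.cov(A, rowvar=False, bias=True)))   # True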
Assume the data samples follow a zero-mean Gaussian distribution:
x \sim N(0, M) \\
p(x) = \frac{e^{-\frac{1}{2}x^T M^{-1} x}}{\sqrt{(2\pi)^n|M|}}
With the eigendecomposition M = Q\Sigma Q^T (Q is orthogonal, so |M| = |\Sigma| and (Q\Sigma Q^T)^{-1} = Q\Sigma^{-1}Q^T), we get
\begin{align} p(x) &= \frac{e^{-\frac{1}{2}x^T(Q \Sigma Q^T)^{-1}x}}{\sqrt{(2\pi)^n|\Sigma|}} \\ &= \frac{e^{-\frac{1}{2}x^T Q \Sigma^{-1} Q^Tx}}{\sqrt{(2\pi)^n|\Sigma|}} \end{align}
Let y = Q^Tx; then:
\begin{align} p(x) &= \frac{e^{-\frac{1}{2}x^T Q \Sigma^{-1} Q^Tx}}{\sqrt{(2\pi)^n|\Sigma|}} \\ &= \frac{e^{-\frac{1}{2}y^T \Sigma^{-1} y}}{\sqrt{(2\pi)^n|\Sigma|}} \end{align}
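A small numeric check of this change of variables (the covariance below is made up for illustration): evaluating the density with M directly agrees with evaluating it in the eigenbasis via y = Q^Tx:
# Check: density via M equals density via y = Q^T x in the eigenbasis
import numpy as np
M = np.array([[5.0, 4.0], [4.0, 5.0]])        # toy 2x2 covariance
S, Q = np.linalg.eigh(M)                      # M = Q diag(S) Q^T
x = np.array([1.0, -2.0])
y = Q.T @ x
norm = np.sqrt((2 * np.pi) ** 2 * np.prod(S)) # sqrt((2*pi)^n |Sigma|), n = 2
p_x = np.exp(-0.5 * x @ np.linalg.inv(M) @ x) / norm
p_y = np.exp(-0.5 * (y / S) @ y) / norm       # y^T Sigma^{-1} y, Sigma diagonal
print(np.isclose(p_x, p_y))                   # True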
y^T\Sigma^{-1}y follows a \chi^2 distribution. For example, suppose y \in \mathbb{R}^2 and \Sigma's diagonal elements are \sigma_1^2 and \sigma_2^2. If we choose a p-value of 0.05 with 2 degrees of freedom, the \chi^2 table gives y^T\Sigma^{-1}y = 5.99 \\ \frac{y_1^2}{\sigma_1^2} + \frac{y_2^2}{\sigma_2^2} = 5.99
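The 5.99 value can also be reproduced in code instead of looking it up; a one-liner assuming SciPy is available:
# The chi-square quantile behind the 5.99 threshold (p = 0.05, 2 dof)
from scipy.stats import chi2
print(chi2.ppf(0.95, df=2))   # ~5.991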
This equation is just an ellipse! But wait, it is an equation in y while our original data are x. What should we do? Remember y = Q^Tx? Then x = Qy, so the axes of y's frame, expressed in x's frame, are the columns q_1 and q_2 of Q.
# Covariance visualization
import numpy as np
import math
import matplotlib.pyplot as plt
npoints = 1000
mean = [0, 0]
s1 = 9   # variance along the first axis in frame O
s2 = 1   # variance along the second axis in frame O
cov = [[s1, 0], [0, s2]]
# x^2 / s1 + y^2 / s2 = 5.99
# x^2 / (s1 * 5.99) + y^2 / (s2 * 5.99) = 1
# x = cos(t) * sqrt(s1 * 5.99)
# y = sin(t) * sqrt(s2 * 5.99)
x_o, y_o = np.random.multivariate_normal(mean, cov, npoints).T
#plt.plot(x_o, y_o, 'b.')
#plt.plot([math.cos(t) * math.sqrt(s1*5.99) for t in np.linspace(0, 2*math.pi, 100)],
# [math.sin(t) * math.sqrt(s2*5.99) for t in np.linspace(0, 2*math.pi, 100)],
# 'g-')
#plt.axis('equal')
c = math.cos(math.pi / 4)
s = math.sin(math.pi / 4)
# [c -s]
# [s c]
print('Rotation matrix that transforms points in O to P')
T_po = np.array([[c, -s],[s,c]])
print(T_po)
x_p = [e[0]*c-e[1]*s for e in zip(x_o, y_o)]
y_p = [e[0]*s+e[1]*c for e in zip(x_o, y_o)]
plt.plot(x_p, y_p, 'b.', label='Data points')
plt.axis('equal')
# Estimate the covariance by accumulating second moments (data are zero-mean)
M = np.zeros((2,2))
for i, j in zip(x_p, y_p):
    M[0, 0] += i*i
    M[0, 1] += i*j
    M[1, 0] += j*i
    M[1, 1] += j*j
S, Q = np.linalg.eig(M / npoints)
# np.linalg.eig does not sort eigenvalues; order them descending so that
# Q[:, 0] is the dominant principal axis used for dimension reduction below
order = np.argsort(S)[::-1]
S, Q = S[order], Q[:, order]
print('Rotation matrix that transforms points in O to P (should match T_po up to column signs)')
print(Q)
# Ellipse in the eigenbasis, then rotated into the data frame via x = Q y
ellipse_x_p = [math.cos(t) * math.sqrt(S[0]*5.99) for t in np.linspace(0, 2*math.pi, 100)]
ellipse_y_p = [math.sin(t) * math.sqrt(S[1]*5.99) for t in np.linspace(0, 2*math.pi, 100)]
ellipse_x_o = [e[0]*Q[0,0]+e[1]*Q[0,1] for e in zip(ellipse_x_p, ellipse_y_p)]
ellipse_y_o = [e[0]*Q[1,0]+e[1]*Q[1,1] for e in zip(ellipse_x_p, ellipse_y_p)]
plt.plot(ellipse_x_o, ellipse_y_o, 'g-', label='Covariance ellipse')
# Project every point onto the first eigenvector q1 (i.e. A @ q1)
reduced_f_p = [e[0]*Q[0,0] + e[1]*Q[1,0] for e in zip(x_p, y_p)]
# Map back to the data frame, setting the discarded second coordinate to 0
fx_o = [e*Q[0,0]+0*Q[0,1] for e in reduced_f_p]
fy_o = [e*Q[1,0]+0*Q[1,1] for e in reduced_f_p]
plt.plot(fx_o, fy_o, 'rx', label='Reduced dimension features')
plt.legend()
plt.show()
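As a sanity check on the threshold, roughly 95% of the samples should fall inside the ellipse; a small sketch reusing the variables from the script above:
# Fraction of points inside the 95% ellipse (expect ~0.95)
pts = np.array([x_p, y_p]).T
Minv = np.linalg.inv(M / npoints)
d2 = np.einsum('ij,jk,ik->i', pts, Minv, pts)   # squared Mahalanobis distances
print((d2 <= 5.99).mean())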
\Sigma contains the variance information of y. If we want to reduce the feature dimension, we can keep the first k dimensions with large \sigma. We still need to transform the data from x to y using y = Q^Tx. In matrix form: \tilde{A}^T = Q^TA^T \\ \tilde{A}^T \approx Q[:, 0:k]^TA^T \\ \tilde{A}_{(m\times k)} \approx AQ[:, 0:k]
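In numpy this projection is a single matrix product; a minimal sketch (the helper name reduce_dim is made up for illustration), sorting the eigenpairs so the large-variance directions come first:
# Reduce m x n data to m x k using the top-k eigenvectors of the covariance
import numpy as np

def reduce_dim(A, k):
    A_c = A - A.mean(axis=0)                    # center the samples
    S, Q = np.linalg.eigh(np.cov(A_c, rowvar=False))
    order = np.argsort(S)[::-1]                 # eigh returns ascending order
    Q = Q[:, order]
    return A_c @ Q[:, :k]                       # A_tilde ~= A Q[:, 0:k]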
Another way to view the dimension reduction is through the density itself: P(x) = \frac{e^{-\frac{1}{2}x^TQ\Sigma^{-1}Q^Tx}}{\sqrt{(2\pi)^n|\Sigma|}} \\ = \frac{e^{-\frac{1}{2}x^T[q_1, q_2, \cdots, q_n]\Sigma^{-1}[q_1, q_2, \cdots, q_n]^Tx}}{\sqrt{(2\pi)^n|\Sigma|}}
Setting q_{k+1}, \cdots, q_n and \sigma_{k+1}, \cdots, \sigma_n to zero (similar to using the Singular Value Decomposition to approximate a matrix by a low-rank one) yields the same reduced representation.
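The SVD analogy can be made concrete; a short sketch (toy matrix, illustrative only) where zeroing the trailing singular values gives the usual low-rank approximation:
# Low-rank approximation by zeroing trailing singular values
import numpy as np
rng = np.random.default_rng(1)
A = rng.normal(size=(6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # keep only the top-k components
print(np.linalg.norm(A - A_k))                # approximation error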