Gaussian Distribution
Gaussian processes ¶
Gaussian processes are distributions over functions $f(x)$:
$$f(x) \sim \mathcal{GP}(m(x), k(x, x'))$$

For any finite subset $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ of the domain of $x$, the marginal distribution is

$$f(X) \sim \mathcal{N}(m(X), k(X, X))$$

with mean vector $\boldsymbol{\mu} = m(X)$ and covariance matrix $\Sigma = k(X, X)$.
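To make this concrete, here is a minimal sketch (numpy only; it assumes a zero-mean prior and an RBF covariance with unit length scale, mirroring the rbf_kernel defined later) that draws a few function samples from the GP prior at 50 input points:

import numpy as np
# Evaluate the prior at a finite set of inputs
X = np.linspace(-6, 6, 50).reshape(-1, 1)
Sigma = np.exp(-0.5 * (X - X.T) ** 2)          # k(X, X): RBF, unit length scale
Sigma += 1e-8 * np.eye(50)                     # jitter for numerical stability
mu = np.zeros(50)                              # zero-mean prior, m(X) = 0
samples = np.random.multivariate_normal(mu, Sigma, size=3)  # three draws of f(X)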
Gaussian processes for regression ¶
Given $n_1$ noisy training observations $\mathbf{y}_1 = f(X_1) + \epsilon$, with $\epsilon \sim \mathcal{N}(0, \sigma_\epsilon^2 I)$, and $n_2$ test function values $\mathbf{y}_2 = f(X_2)$, the joint distribution is

$$ \left[\begin{array}{c} \mathbf{y}_{1} \\ \mathbf{y}_{2} \end{array}\right] \sim \mathcal{N} \left( \left[\begin{array}{c} \mu_{1} \\ \mu_{2} \end{array}\right], \left[ \begin{array}{cc} \Sigma_{11} + \sigma_\epsilon^2 I & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{array} \right] \right) $$

where

$$\begin{split} \mu_{1} & = m(X_1) \quad (n_1 \times 1) \\ \mu_{2} & = m(X_2) \quad (n_2 \times 1) \\ \Sigma_{11} & = k(X_1, X_1) \quad (n_1 \times n_1) \\ \Sigma_{22} & = k(X_2, X_2) \quad (n_2 \times n_2) \\ \Sigma_{12} & = k(X_1, X_2) = \Sigma_{21}^\top \quad (n_1 \times n_2) \end{split}$$
Note that $\Sigma_{11}$ depends only on the training inputs $X_1$ and $\Sigma_{22}$ only on the test inputs $X_2$; the noise term $\sigma_\epsilon^2 I$ appears only in the $\Sigma_{11}$ block because only the training targets $\mathbf{y}_1$ are observed with noise.
Conditioning the joint on the observations $\mathbf{y}_1$ gives the posterior predictive distribution:

$$\begin{split} p(\mathbf{y}_2 \mid \mathbf{y}_1, X_1, X_2) & = \mathcal{N}(\mu_{2|1}, \Sigma_{2|1}) \\ \mu_{2|1} & = \mu_2 + \Sigma_{21} \Sigma_{11}^{-1} (\mathbf{y}_1 - \mu_1) \\ & = \Sigma_{21} \Sigma_{11}^{-1} \mathbf{y}_1 \quad (\text{assuming a zero-mean prior, } \mu = 0) \\ & = (\Sigma_{11}^{-1} \Sigma_{12})^{\top} \mathbf{y}_1 \\ \Sigma_{2|1} & = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12} \\ & = \Sigma_{22} - (\Sigma_{11}^{-1} \Sigma_{12})^{\top} \Sigma_{12} \end{split}$$

It is then possible to predict $\mathbf{y}_2$ for the input samples $X_2$ by using the mean $\mu_{2|1}$ of the resulting distribution as the prediction. Notice that the posterior mean $\mu_{2|1}$ of a Gaussian process is a weighted average of the observed values $\mathbf{y}_1$, where the weights are determined by the covariance function $k$. The variance $\sigma_2^2$ of these predictions is the diagonal of the covariance matrix $\Sigma_{2|1}$.
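Before wrapping this in a reusable class, here is a minimal numpy sketch of these conditioning formulas on toy values (the inputs and targets below are made up for illustration; a zero-mean prior is assumed, and np.linalg.solve stands in for the explicit inverse):

import numpy as np

def k(a, b):
    # RBF kernel on column vectors of 1-D inputs
    return np.exp(-0.5 * (a - b.T) ** 2)

X1 = np.array([[-4.0], [0.0], [3.0]])       # observed inputs (toy values)
y1 = np.array([[0.3], [1.0], [0.6]])        # observed targets (toy values)
X2 = np.linspace(-6, 6, 5).reshape(-1, 1)   # prediction inputs

S11 = k(X1, X1) + 1e-8 * np.eye(3)          # jitter in place of noise
S12 = k(X1, X2)
S21 = S12.T
mu_2_1 = S21 @ np.linalg.solve(S11, y1)                  # posterior mean
Sigma_2_1 = k(X2, X2) - S21 @ np.linalg.solve(S11, S12)  # posterior covariance
sigma_2 = np.sqrt(np.diag(Sigma_2_1))                    # predictive std dev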
Intuitively, the kernel regression estimate at any point $x$ can be thought of as a weighted sum of the training labels $y_i$, weighted by the similarity between the corresponding $x_i$ and $x$:
$$ f(x) = \sum_{i=1}^{n} k(x_i, x)\,[K^{-1}\mathbf{y}]_i $$
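This sum is just the matrix product $k(x, X)\,K^{-1}\mathbf{y}$ written element-wise. A quick sketch checking that the two forms agree (toy training values, RBF kernel as above):

import numpy as np

def k(a, b):
    return np.exp(-0.5 * (a - b.T) ** 2)

X = np.array([[-2.0], [0.0], [1.5]])    # training inputs (toy values)
y = np.array([1.0, 0.5, -0.3])          # training labels (toy values)
x = np.array([[0.7]])                   # query point

invK_y = np.linalg.solve(k(X, X) + 1e-8 * np.eye(3), y)          # K^{-1} y
f_sum = sum(k(X[i:i+1], x)[0, 0] * invK_y[i] for i in range(3))  # element-wise sum
f_mat = (k(x, X) @ invK_y)[0]                                    # matrix form
assert np.isclose(f_sum, f_mat)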
# Imports
import numpy as np
import scipy
import scipy.spatial
import matplotlib.pyplot as plt
class GpRegressor:
    def __init__(self, X, y, kernel_func, noise):
        '''
        @param X: Training inputs (m, n): each row is an input vector
        @param y: Training outputs (m,): y = f(X) + noise
        @param kernel_func: covariance function k(X, X')
        @param noise: standard deviation of the observation noise
        '''
        self.X = X
        self.y = y
        self.kernel_func = kernel_func
        self.noise = noise
        # Covariance matrix K: (m, m), with the noise variance on the diagonal
        self.m = X.shape[0]
        self.n = X.shape[1]
        self.K = self.kernel_func(self.X, self.X) + ((self.noise ** 2) * np.eye(self.m))
        assert np.linalg.matrix_rank(self.K) == self.m
        self.invK = np.linalg.inv(self.K)
        self.invK_y = self.invK.dot(self.y)
    def predict(self, X):
        '''
        @param X: Test inputs (p, n): each row is an input vector
        @return: posterior mean (p, 1) and posterior covariance (p, p)
        '''
        K_21 = self.kernel_func(X, self.X)
        # mu_{2|1} = K_21 K^{-1} y,  Sigma_{2|1} = K_22 - K_21 K^{-1} K_12
        return K_21.dot(self.invK_y), self.kernel_func(X, X) - K_21.dot(self.invK.T).dot(K_21.T)
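As an aside, forming invK explicitly is fine for small problems but numerically fragile; a common alternative (sketched here as a hypothetical standalone helper, not a method of the class above) factorizes K once with a Cholesky decomposition and solves against it:

from scipy.linalg import cho_factor, cho_solve

def predict_chol(K, K_21, K_22, y):
    # Same posterior as predict(), but via Cholesky solves instead of np.linalg.inv
    c = cho_factor(K)                          # K = L L^T
    mean = K_21 @ cho_solve(c, y)              # K_21 K^{-1} y
    cov = K_22 - K_21 @ cho_solve(c, K_21.T)   # K_22 - K_21 K^{-1} K_12
    return mean, cov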
def rbf_kernel(xa, xb, sigma=1):
    '''Exponentiated quadratic (RBF) kernel; sigma scales the squared distance.'''
    sq_norm = -0.5 * scipy.spatial.distance.cdist(xa, xb, 'sqeuclidean') / sigma
    return np.exp(sq_norm)
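Note that sigma divides the squared distance here, so it acts as a squared length scale. A quick shape check on made-up inputs:

xa_demo = np.arange(6, dtype=float).reshape(3, 2)   # 3 points in 2-D (toy values)
xb_demo = np.arange(4, dtype=float).reshape(2, 2)   # 2 points in 2-D (toy values)
K_demo = rbf_kernel(xa_demo, xb_demo)               # (3, 2) pairwise similarities
assert K_demo.shape == (3, 2) and K_demo.max() <= 1.0  # RBF similarity peaks at 1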
# Define the true function that we want to regress on
def ground_height_func(x):
    # Truncated parabola: positive on |x| < 5, clipped to zero elsewhere
    ground_height = 1 - x ** 2 / 25
    ground_height[ground_height < 0] = 0
    return ground_height
n1 = 6 # Number of training points
n2 = 50 # Number of test points
domain = (-6, 6)
# Sample observations (X1, y1) on the function
X1 = np.linspace(domain[0], domain[1], n1).reshape(-1,1)
noise = 0  # The standard deviation of the observation noise
y1 = ground_height_func(X1) + noise * np.random.randn(n1).reshape(-1, 1)
gp_regressor = GpRegressor(X1, y1, rbf_kernel, noise)
X2 = np.linspace(domain[0], domain[1], n2).reshape(-1,1)
y2, K2 = gp_regressor.predict(X2)
sigma_pred = np.sqrt(np.diag(K2))
# Plot the posterior distribution and the training samples
fig, ax1 = plt.subplots(
nrows=1, ncols=1, figsize=(6,6), dpi=100)
# Plot the distribution of the function (mean, covariance)
ax1.plot(X2, ground_height_func(X2), 'b--', label='Ground height (GT)')
ax1.fill_between(X2.flat, (y2 - sigma_pred.reshape(-1, 1)).flat, (y2 + sigma_pred.reshape(-1, 1)).flat,
                 color='red', alpha=0.15, label=r'$\sigma_{pred|train}$')
ax1.plot(X2, y2, 'r.-', lw=2, label='Ground height (Pred)')
ax1.plot(X1, y1, 'ko', linewidth=2, label='Ground height (Training samples)')
ax1.set_xlabel('$x$', fontsize=13)
ax1.set_ylabel('$y$', fontsize=13)
ax1.set_title('Posterior distribution and training data')
ax1.axis([domain[0], domain[1], -3, 3])
ax1.legend()