Gaussian Distribution

Gaussian processes

Gaussian processes are distributions over functions $f(x)$:

$$f(x) \sim \mathcal{GP}(m(x),k(x,x'))$$

For any finite subset $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_n \}$ of the domain of $x$, the marginal distribution is:

$$f(X) \sim \mathcal{N}(m(X), k(X, X))$$

with mean vector $\mathbf{\mu} = m(X)$ and covariance matrix $\Sigma = k(X, X)$.
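In practice one always works with this finite-dimensional marginal. Below is a minimal sketch (assuming a squared-exponential kernel and a zero mean function, neither of which is specified above) that builds the mean vector and covariance matrix on a grid of inputs and draws a few function samples from the prior:

```python
import numpy as np

def exponentiated_quadratic(xa, xb, length_scale=1.0):
    """Squared-exponential (RBF) kernel k(x, x') between two sets of inputs (rows)."""
    sq_dists = ((xa[:, None, :] - xb[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)

# Finite set of input locations X = {x_1, ..., x_n}.
X = np.linspace(-4, 4, 50)[:, None]            # shape (n, 1)
mu = np.zeros(len(X))                          # mean vector m(X), here zero
Sigma = exponentiated_quadratic(X, X)          # covariance matrix k(X, X)

# Draw three function samples from the marginal N(m(X), k(X, X)).
# A tiny jitter keeps the covariance numerically positive semi-definite.
samples = np.random.multivariate_normal(mu, Sigma + 1e-8 * np.eye(len(X)), size=3)
print(samples.shape)                           # (3, 50): three functions at 50 inputs
```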

Gaussian processes for regression

Suppose we observe $n_1$ noisy training outputs $\mathbf{y}_1$ at inputs $X_1$ (with observation noise of variance $\sigma_\epsilon^2$) and want to predict the outputs $\mathbf{y}_2$ at $n_2$ test inputs $X_2$. Under the GP prior, the joint distribution is:

$$ \left[\begin{array}{c} \mathbf{y}_{1} \\ \mathbf{y}_{2} \end{array}\right] \sim \mathcal{N} \left( \left[\begin{array}{c} \mu_{1} \\ \mu_{2} \end{array}\right], \left[ \begin{array}{cc} \Sigma_{11} + \sigma_\epsilon^2I & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{array} \right] \right) $$

where: $$\begin{split} \mu_{1} & = m(X_1) \quad (n_1 \times 1) \\ \mu_{2} & = m(X_2) \quad (n_2 \times 1) \\ \Sigma_{11} & = k(X_1,X_1) \quad (n_1 \times n_1) \\ \Sigma_{22} & = k(X_2,X_2) \quad (n_2 \times n_2) \\ \Sigma_{12} & = k(X_1,X_2) = \Sigma_{21}^\top \quad (n_1 \times n_2) \end{split}$$

Note that $\Sigma_{11}$ depends only on the training inputs $X_1$ and $\Sigma_{22}$ only on the test inputs $X_2$; the two sets are coupled only through the cross-covariance $\Sigma_{12}$.

$$\begin{split} p(\mathbf{y}_2 \mid \mathbf{y}_1, X_1, X_2) & = \mathcal{N}(\mu_{2|1}, \Sigma_{2|1}) \\ \mu_{2|1} & = \mu_2 + \Sigma_{21} \Sigma_{11}^{-1} (\mathbf{y}_1 - \mu_1) \\ & = \Sigma_{21} \Sigma_{11}^{-1} \mathbf{y}_1 \quad (\text{assuming a zero-mean prior, } m(x) = 0 ) \\ & = (\Sigma_{11}^{-1} \Sigma_{12})^{\top} \mathbf{y}_1\\ \Sigma_{2|1} & = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1}\Sigma_{12} \\ & = \Sigma_{22} - (\Sigma_{11}^{-1} \Sigma_{12})^{\top} \Sigma_{12} \end{split}$$

With noisy observations, as in the joint distribution above, $\Sigma_{11}$ in these formulas is replaced by $\Sigma_{11} + \sigma_\epsilon^2 I$.

It is then possible to predict $\mathbf{y}_2$ at the input locations $X_2$ by using the mean $\mu_{2|1}$ of the resulting distribution as the prediction. Notice that the posterior mean $\mu_{2|1}$ of a Gaussian process is a weighted average of the observed values $\mathbf{y}_1$, where the weights are determined by the covariance function $k$. The variance of these predictions is given by the diagonal of the covariance matrix $\Sigma_{2|1}$.
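As an illustration, here is a minimal sketch of these conditioning formulas (assuming, as above, a squared-exponential kernel, a zero-mean prior, and made-up noisy observations of $\sin(x)$; none of these choices comes from the text). It uses a linear solve in place of the explicit inverse $\Sigma_{11}^{-1}$:

```python
import numpy as np

def exponentiated_quadratic(xa, xb, length_scale=1.0):
    """Squared-exponential (RBF) kernel k(x, x')."""
    sq_dists = ((xa[:, None, :] - xb[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)

def gp_posterior(X1, y1, X2, noise_var=0.05):
    """Return mu_{2|1} and Sigma_{2|1} given noisy observations (X1, y1)."""
    Sigma11 = exponentiated_quadratic(X1, X1) + noise_var * np.eye(len(X1))
    Sigma12 = exponentiated_quadratic(X1, X2)
    Sigma22 = exponentiated_quadratic(X2, X2)
    # Solve Sigma11 @ A = Sigma12 rather than forming Sigma11^{-1} explicitly.
    solved = np.linalg.solve(Sigma11, Sigma12)   # = Sigma11^{-1} Sigma12
    mu_2_1 = solved.T @ y1                       # posterior mean (zero-mean prior)
    Sigma_2_1 = Sigma22 - solved.T @ Sigma12     # posterior covariance
    return mu_2_1, Sigma_2_1

# Hypothetical training data: noisy observations of sin(x).
X1 = np.random.uniform(-4, 4, size=(8, 1))
y1 = np.sin(X1).ravel() + 0.1 * np.random.randn(8)
X2 = np.linspace(-5, 5, 100)[:, None]

mu, cov = gp_posterior(X1, y1, X2)
std = np.sqrt(np.diag(cov))                      # per-point predictive standard deviation
```

Solving the linear system avoids forming the inverse explicitly; a Cholesky factorization of $\Sigma_{11}$ would be the usual further refinement.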

Intuitively, the kernel regression estimate at any point $x$ can be thought of as a weighted sum of the training labels $y_i$, using the similarity between the corresponding $x_i$ and $x$:

$$ f(x) = \sum_{i=1}^{n} k(x_i, x)\,[K^{-1}\mathbf{y}]_i $$

where $K = k(X_1, X_1)$ (plus the noise term $\sigma_\epsilon^2 I$ in the noisy setting) is the Gram matrix over the training inputs and $\mathbf{y} = \mathbf{y}_1$ are the training labels.
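As a quick sanity check, the snippet below (with a hypothetical squared-exponential kernel and a handful of made-up training points) evaluates this weighted sum directly and compares it against the equivalent matrix form $k(x, X_1)\,K^{-1}\mathbf{y}$:

```python
import numpy as np

def k(xa, xb, length_scale=1.0):
    """Squared-exponential kernel on 1-D inputs (an assumed choice, as above)."""
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / length_scale**2)

X_train = np.array([-2.0, -0.5, 1.0, 2.5])
y_train = np.sin(X_train)                                # made-up training labels
x_test = np.array([0.3])

K = k(X_train, X_train) + 1e-8 * np.eye(len(X_train))    # Gram matrix (tiny jitter)
alpha = np.linalg.solve(K, y_train)                      # K^{-1} y

# Weighted-sum form: f(x) = sum_i k(x_i, x) [K^{-1} y]_i
f_weighted_sum = sum(k(X_train, x_test)[i, 0] * alpha[i] for i in range(len(X_train)))

# Matrix form of the posterior mean: k(x, X_1) K^{-1} y
f_matrix = (k(x_test, X_train) @ alpha)[0]

print(np.isclose(f_weighted_sum, f_matrix))              # True
```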
This blog is converted from gaussian_process.ipynb
Written on April 1, 2022