Machine Learning Notes - Loss functions

Loss Introduction

Given an input $X$ and a model parameterized by $\theta$, we would like to minimize a loss function $L(X; \theta)$ by adjusting the model parameters $\theta$. The optimization can usually be done using gradient descent, Gauss-Newton, etc.
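
As a quick illustration of the gradient descent option, here is a minimal sketch on a toy quadratic loss (the loss $L(\theta) = (\theta - 3)^2$, the learning rate, and the number of steps are all assumptions for illustration):

```python
import torch

# Toy loss L(theta) = (theta - 3)^2, minimized at theta = 3 (assumed example)
theta = torch.tensor(0.0, requires_grad=True)
lr = 0.1  # assumed learning rate

for step in range(100):
    loss = (theta - 3.0) ** 2
    loss.backward()                # compute dL/dtheta
    with torch.no_grad():
        theta -= lr * theta.grad   # gradient descent update
    theta.grad.zero_()

print(f"theta after gradient descent: {theta.item():.3f}")  # close to 3
```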

Mean Square Loss

Let's start with a very simple linear regression problem. To visualize the loss versus the parameter in a 2D plot, we parameterize the line as $y = ax$.

$$ L = \frac{1}{N}\sum_{i}(\hat{y}_i - y_i)^2 $$

where $\hat{y}_i$ is the predicted value and $y_i$ is the ground truth.

Pytorch Loss
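
One way this could be computed with PyTorch (a minimal sketch; the synthetic data, the grid of candidate values for $a$, and the use of `nn.MSELoss` are assumptions, not the original notebook code):

```python
import torch

# Assumed synthetic data: y is roughly x plus a little Gaussian noise
torch.manual_seed(0)
x = torch.linspace(-1.0, 1.0, 100)
y = x + 0.05 * torch.randn(100)

mse = torch.nn.MSELoss()

# Sweep the single parameter a in y_hat = a * x and record the loss
a_grid = torch.linspace(0.0, 2.0, 201)
losses = torch.stack([mse(a * x, y) for a in a_grid])

best_a = a_grid[losses.argmin()].item()
print(f"Best a with least loss is {best_a:.2f}")
```

Plotting `losses` against `a_grid` gives the loss-versus-parameter curve, whose minimum sits near $a = 1$.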

Best $a$ with least loss is 1.01

[Plot: loss vs. parameter $a$]

Hinge Loss

For classification, we want the predicted score of the correct class to be as large as possible. The hinge loss is shown below, where $s_c$ is the score of the correct class and $s_i$ is the score of an incorrect class; the sum runs over the incorrect classes. $$ L = \sum_{i \neq c}{\max(0, s_i - s_c + 1)} $$

Pytorch Loss
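
A minimal sketch of the hand computation (the class scores are taken, rounded to two decimals, from the printed output below, so the final number differs slightly in the last digits):

```python
import torch

# Scores rounded to two decimals from the printed output below
scores = torch.tensor([1.05, 0.82, 0.90])
correct = 1  # the correct class label

# Multiclass hinge loss: sum over incorrect classes of max(0, s_i - s_c + 1)
margins = scores - scores[correct] + 1.0
margins[correct] = 0.0                      # exclude the correct class term
loss = torch.clamp(margins, min=0.0).sum()
print(f"Hinge loss is {loss.item():.4f}")   # ~2.31 with these rounded scores
```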

The correct label is 1

 class 0 score: 1.05
 class 1 score: 0.82
 class 2 score: 0.90

Hinge loss is 2.3063
Pytorch Hinge loss is 2.9947
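
PyTorch ships a built-in margin loss, `nn.MultiMarginLoss`, which is presumably what produced the value above. It divides the per-sample sum by the number of classes, and the exact inputs behind that cell are not shown, so the sketch below (reusing the rounded scores from before, an assumption) is not expected to reproduce 2.9947:

```python
import torch

# Same rounded scores as above, with a batch dimension, since
# nn.MultiMarginLoss expects (batch, num_classes) inputs and class targets
scores = torch.tensor([[1.05, 0.82, 0.90]])
target = torch.tensor([1])

# MultiMarginLoss sums max(0, 1 + s_i - s_c) over incorrect classes and then
# divides by the number of classes, so it equals the hand-computed sum / 3 here
criterion = torch.nn.MultiMarginLoss(margin=1.0)
loss = criterion(scores, target)
print(f"Multi-margin loss is {loss.item():.4f}")  # ~2.31 / 3 ≈ 0.77
```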

Cross Entropy Loss

Another common loss function used for classification is Cross Entropy loss. The cross-entropy of the distribution $q$ relative to a distribution $p$ over a given set is defined as: $$ H(p,q) = -\mathbb{E}_p[\log(q)] $$

The KL divergence is defined as $KL(p \| q) = H(p,q) - H(p)$, also known as the relative entropy of the true distribution $p$ with respect to the predicted distribution $q$. Since $H(p)$ does not depend on the model, minimizing the cross entropy is equivalent to minimizing the KL divergence.

For discrete distributions, $$ H(p, q) = -\sum_{x}p(x)\log(q(x)) \\ H(p) = -\sum_{x}p(x)\log(p(x)) \\ KL(p \| q) = -\sum_{x}p(x)\log\left(\frac{q(x)}{p(x)}\right) $$

We can use the Softmax function to convert the output scores into a distribution: $$ q_k = \frac{e^{s_k}}{\sum_i{e^{s_i}}} $$

When the true distribution $p$ is one-hot on the correct class, the cross entropy for sample $i$ reduces to $-\log(q_{ic})$, so the loss over the dataset is $$ L = \sum_i{-\log(q_{ic})} $$ where $q_{ic}$ is the predicted probability of the correct class for sample $i$.

Pytorch Loss
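
A minimal sketch of the by-hand computation (the class scores are rounded to two decimals from the printed output below, so the loss differs slightly in the last digits):

```python
import torch

# Scores rounded from the printed output below; the correct label is 1
scores = torch.tensor([1.27, 1.80, 1.46])
correct = 1

# Softmax converts the raw scores into a probability distribution q
q = torch.softmax(scores, dim=0)
for i, p in enumerate(q):
    print(f" class {i} p: {p.item():.2f}")

# Cross entropy for one sample with a one-hot target: -log(q of correct class)
loss = -torch.log(q[correct])
print(f"Cross entropy loss is {loss.item():.4f}")
```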

The correct label is 1

 class 0 score: 1.27
 class 1 score: 1.80
 class 2 score: 1.46

 class 0 p: 0.26
 class 1 p: 0.43
 class 2 p: 0.31

Cross entropy loss is 0.8341
Pytorch Cross entropy loss is 0.8341
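
The value above presumably comes from PyTorch's built-in `nn.CrossEntropyLoss`, which applies a log-softmax to the raw scores internally and expects a batch dimension and integer class targets. A minimal sketch of that call (again with the rounded scores, so the last digits may differ):

```python
import torch

# Raw (unnormalized) scores with a batch dimension, and an integer class target
scores = torch.tensor([[1.27, 1.80, 1.46]])
target = torch.tensor([1])

# nn.CrossEntropyLoss combines log-softmax and negative log-likelihood
criterion = torch.nn.CrossEntropyLoss()
loss = criterion(scores, target)
print(f"Pytorch Cross entropy loss is {loss.item():.4f}")
```
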
This blog is converted from machine-learning-loss.ipynb
Written on May 7, 2021