Camera Intrinsics and Extrinsics ¶

Basics of Pinhole Camera Model ¶

The projection is shown below, where $S$ is the sensor frame, $I$ is the image frame, $C$ is the camera frame, and $W$ is the world frame. The $I$ frame normally differs from $C$ only by a constant offset, the so-called camera constant. The equality holds up to scale, since the left-hand side is a homogeneous vector.

$$ \begin{align} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = T_{SC} T_{IC} T_{CW} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \end{align} $$

The extrinsic parameters are $T_{CW}$, which has 6 DoF: 3 for position and 3 for orientation. $O^W_C$ is the world origin expressed in the camera frame $C$, and $O^C_W$ is the camera origin expressed in the world frame $W$.

$$ \begin{align} T_{CW} = \begin{bmatrix} R_{CW},& O^W_{C} \\ 0,& 1 \end{bmatrix} \end{align} $$

or, written with the camera origin in the world frame,

$$ \begin{align} T_{CW} = \begin{bmatrix} R_{CW},& -R_{CW} O^C_W \\ 0,& 1 \end{bmatrix} \end{align} $$
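As a minimal sketch of the two equivalent forms above, the snippet below builds $T_{CW}$ from a rotation and a camera position and checks that the translation column equals $-R_{CW} O^C_W$. The helper name `make_T_cw` and the pose values are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def make_T_cw(R_cw, cam_origin_w):
    """Build the 4x4 world-to-camera transform from R_CW and the camera origin in W."""
    T = np.eye(4)
    T[:3, :3] = R_cw
    T[:3, 3] = -R_cw @ cam_origin_w  # world origin expressed in the camera frame
    return T

# Hypothetical pose: a rotation about the z axis and a camera located at (1, 2, 3) in W.
theta = np.pi / 2
R_cw = np.array([[np.cos(theta), np.sin(theta), 0.0],
                 [-np.sin(theta), np.cos(theta), 0.0],
                 [0.0, 0.0, 1.0]])
cam_origin_w = np.array([1.0, 2.0, 3.0])
T_cw = make_T_cw(R_cw, cam_origin_w)

# The camera origin must map to (0, 0, 0) in the camera frame.
origin_in_c = T_cw @ np.append(cam_origin_w, 1.0)

# The camera origin in W is recovered from T_CW as -R^T t.
recovered = -T_cw[:3, :3].T @ T_cw[:3, 3]
```

Inverting the relation this way avoids a full $4 \times 4$ matrix inverse, because $R_{CW}^{-1} = R_{CW}^T$ for a rotation.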

The intrinsic parameters are $T_{SC} T_{IC}$ when non-linear distortions are ignored. If we take the image plane to lie in the negative $z$ direction, the camera constant $c$ is negative. $x_H$ and $y_H$ are the translation from the image frame center to the sensor frame center.

$$ \begin{align} T_{SC} T_{IC} & = \begin{bmatrix} 1,& 0, & x_H \\ 0,& 1, & y_H \\ 0,& 0, & 1 \end{bmatrix}\begin{bmatrix} c,& 0, & 0 \\ 0,& c, & 0 \\ 0,& 0, & 1 \end{bmatrix} \end{align} $$

Adding a shear $s$ and a scale difference $m$ between the two sensor axes generalizes this to

$$ \begin{align} T_{SC} T_{IC} & = \begin{bmatrix} 1,& s, & x_H \\ 0,& 1+m, & y_H \\ 0,& 0, & 1 \end{bmatrix}\begin{bmatrix} c,& 0, & 0 \\ 0,& c, & 0 \\ 0,& 0, & 1 \end{bmatrix} \end{align} $$

$$ \begin{align} K = T_{SC} T_{IC} & = \begin{bmatrix} c_x,& s_{xy}, & x_H \\ 0,& c_y, & y_H \\ 0,& 0, & 1 \end{bmatrix} \end{align} $$

where $c_x = c$, $s_{xy} = cs$, and $c_y = c(1+m)$.
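The product above can be checked numerically. This is a small sketch, assuming illustrative values for $c$, $s$, $m$, $x_H$ and $y_H$; the function name `make_K` is hypothetical.

```python
import numpy as np

def make_K(c, x_h, y_h, s=0.0, m=0.0):
    """Compose K from camera constant c, principal point (x_h, y_h), shear s, scale difference m."""
    sensor_from_image = np.array([[1.0, s, x_h],
                                  [0.0, 1.0 + m, y_h],
                                  [0.0, 0.0, 1.0]])
    image_from_camera = np.diag([c, c, 1.0])
    return sensor_from_image @ image_from_camera

# Negative camera constant: image plane in the negative z direction.
K = make_K(c=-500.0, x_h=200.0, y_h=200.0, s=0.01, m=0.02)
# K[0, 0] = c, K[0, 1] = c * s, K[1, 1] = c * (1 + m)

x_n = np.array([-0.4, 0.0, 1.0])  # a point on the normalized image plane
uv = K @ x_n
```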

Non-linear distortion can be caused by an imperfect lens, so that each point projected onto the sensor plane is shifted slightly depending on its position. $x_u$ and $y_u$ are the undistorted projected points on the normalized image plane at depth 1 ($x_u = \frac{X_C}{Z_C}, y_u = \frac{Y_C}{Z_C}$, where $Z_C$ is the depth). $q$ are the parameters of the distortion model, such as barrel distortion, tangential distortion, etc.

$$ \begin{align} x_d =& x_u + \Delta(x_u, q) \\ y_d =& y_u + \Delta(y_u, q) \end{align} $$

The distortion can also be written as a position-dependent factor $H$:

$$ \begin{align} x_d = H(x_u)x_u \end{align} $$

The distorted point is then mapped to the sensor frame:

$$ \begin{align} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = T_{SC} T_{IC} \begin{bmatrix} x_d \\ y_d \\ 1 \end{bmatrix} \end{align} $$
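To make the form $x_d = H(x_u)x_u$ concrete, here is a sketch that assumes one particular distortion model: a purely radial polynomial with two coefficients, $H = 1 + k_1 r^2 + k_2 r^4$. The notes do not fix a model, so the model choice, the coefficient values and the intrinsics below are all assumptions for illustration.

```python
import numpy as np

def distort_radial(xy_u, k1, k2):
    """Apply x_d = H(x_u) x_u with the scalar factor H = 1 + k1 r^2 + k2 r^4."""
    r2 = xy_u[0] ** 2 + xy_u[1] ** 2
    return (1.0 + k1 * r2 + k2 * r2 ** 2) * xy_u

K = np.array([[500.0, 0.0, 200.0],
              [0.0, 500.0, 200.0],
              [0.0, 0.0, 1.0]])   # illustrative intrinsics
k1, k2 = -0.1, 0.01               # illustrative coefficients (k1 < 0 gives barrel distortion)

xy_u = np.array([0.3, -0.2])      # undistorted point on the normalized image plane
xy_d = distort_radial(xy_u, k1, k2)
uv = K @ np.append(xy_d, 1.0)     # distorted point mapped to the sensor frame
```

With a negative $k_1$ the point moves toward the image center, and the principal point itself is left unchanged.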

Mapping ¶

Inverse map from $uv$ to $x_d$ ¶

$$ \begin{align} x_d = K^{-1}\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \end{align} $$

Inverse map from $x_d$ to $x_u$ ¶

$$ \begin{align} [x_{u}]_{i+1}= [H([x_{u}]_i)]^{-1}x_d \end{align} $$

This is a fixed-point iteration; a natural starting value is $[x_u]_0 = x_d$.
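The iteration can be sketched as follows, again assuming a radial polynomial model $H = 1 + k_1 r^2 + k_2 r^4$ (an assumption for illustration; any model of the form $x_d = H(x_u)x_u$ fits the same loop). The function names and the fixed iteration count are hypothetical choices.

```python
import numpy as np

def radial_factor(xy, k1, k2):
    """Scalar distortion factor H evaluated at the point xy."""
    r2 = np.sum(xy ** 2)
    return 1.0 + k1 * r2 + k2 * r2 ** 2

def undistort(xy_d, k1, k2, iters=20):
    """Fixed-point iteration x_u <- x_d / H(x_u), started at x_u = x_d."""
    xy_u = xy_d.copy()
    for _ in range(iters):
        xy_u = xy_d / radial_factor(xy_u, k1, k2)
    return xy_u

k1, k2 = -0.1, 0.01
xy_true = np.array([0.3, -0.2])
xy_d = radial_factor(xy_true, k1, k2) * xy_true   # forward distortion
xy_rec = undistort(xy_d, k1, k2)                  # inverse by iteration
```

The loop converges quickly when the distortion is mild; strong distortion far from the image center may need more iterations or a different solver.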

Inverse map from $x_u$ to $X_C$ ¶

$$ \begin{align} X_C = \lambda x_u \end{align} $$

$$ \begin{align} X_W = O_W^C + R_{CW}^{-1} X_C \end{align} $$

where $\lambda$ is the depth.

In [3]:

# Mapping and inverse mapping

import numpy as np

print('Map world points to sensor frame')
p_w = np.array([1, 0, -5, 1])  # homogeneous world point
print(f'p_w: {p_w}')
# World-to-camera transform T_CW = [R_CW, t; 0, 1]
T_cw = np.array([[1, 0, 0, 1],
                 [0, 1, 0, 0],
                 [0, 0, 1, 0],
                 [0, 0, 0, 1]])
R_cw = T_cw[0:3, 0:3]
# Camera origin expressed in the world frame: O_w = -R_CW^T t
O_w = -R_cw.T.dot(T_cw[0:3, 3])

p_c = T_cw.dot(p_w)

depth = p_c[2]
p_c = p_c[0:3]
print(f'p_c: {p_c}', f'depth: {depth}')

# Negative camera constant: the image plane is in the negative z direction
K = np.array([[-500, 0, 200], [0, -500, 200], [0, 0, 1]])

# Normalize to the image plane at depth 1
p_nc = p_c / depth

uv = K.dot(p_nc)
print(f'p_s: {uv}')
print('\nMap sensor points to world frame')
ray = np.linalg.inv(K).dot(uv)
print(f'ray: {ray}')

p_c = depth * ray
print(f'p_c: {p_c}')

# Recover the world point and print the recovered value, not the input
p_w_rec = O_w + depth * R_cw.T.dot(ray)
print(f'p_w: {p_w_rec}')

Map world points to sensor frame
p_w: [ 1  0 -5  1]
p_c: [ 2  0 -5] depth: -5
p_s: [400. 200.   1.]

Map sensor points to world frame
ray: [-0.4  0.   1. ]
p_c: [ 2. -0. -5.]
p_w: [ 1.  0. -5.]

Calibration ¶

DLT: Direct Linear Transform ¶

Key ideas:

- The projection matrix $P = T_{SC} T_{IC} T_{CW}$ is a $3 \times 4$ matrix defined up to scale, so it has 11 DoF.
- Each 3D-to-2D correspondence gives two constraints on $P$.
- We need at least 6 correspondences, which give 12 constraints, to solve for the 11 DoF of $P$. The 3D points must not all be coplanar.
- Use SVD to find the solution: the right singular vector with the smallest singular value.
- $P = [KR | -KRO_W^C]$, so once we have $P$, we can get $KR$ and $O_W^C$.
- Use QR decomposition on $(KR)^{-1} = R^{-1}K^{-1}$ to find $K$ and $R$: the orthogonal factor is $R^{-1}$ and the upper-triangular factor is $K^{-1}$.
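The steps above can be sketched on synthetic, noise-free data. The function names, the test camera and the random points are assumptions for illustration. The decomposition additionally assumes that the diagonal of $K$ is positive (a positive camera constant), which is one way to resolve the sign ambiguity of the QR factors.

```python
import numpy as np

def dlt_projection(X_w, uv):
    """Estimate the 3x4 matrix P (up to scale) from n >= 6 correspondences."""
    rows = []
    for X, (u, v) in zip(X_w, uv):
        Xh = np.append(X, 1.0)
        rows.append(np.concatenate([Xh, np.zeros(4), -u * Xh]))
        rows.append(np.concatenate([np.zeros(4), Xh, -v * Xh]))
    # Right singular vector with the smallest singular value
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(3, 4)

def decompose_projection(P):
    """Split P ~ [KR | -KR O] into K, R, O, assuming a positive diagonal of K."""
    A, a = P[:, :3], P[:, 3]
    O = -np.linalg.solve(A, a)
    Q, U = np.linalg.qr(np.linalg.inv(A))   # A^-1 = R^T K^-1 up to scale
    D = np.diag(np.sign(np.diag(U)))
    Q, U = Q @ D, D @ U                     # make the diagonal of U positive
    if np.linalg.det(Q) < 0:                # P was estimated with a negative scale
        Q = -Q
    K = np.linalg.inv(U)
    return K / K[2, 2], Q.T, O

# Hypothetical ground-truth camera
ax, ay = -0.2, 0.1
Rx = np.array([[1, 0, 0], [0, np.cos(ax), -np.sin(ax)], [0, np.sin(ax), np.cos(ax)]])
Ry = np.array([[np.cos(ay), 0, np.sin(ay)], [0, 1, 0], [-np.sin(ay), 0, np.cos(ay)]])
R_true = Rx @ Ry
K_true = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
O_true = np.array([0.5, -0.3, -5.0])

rng = np.random.default_rng(0)
X_w = rng.uniform(-1.0, 1.0, size=(10, 3))          # non-coplanar 3D points
x_h = (K_true @ R_true @ (X_w - O_true).T).T
uv = x_h[:, :2] / x_h[:, 2:3]

K_est, R_est, O_est = decompose_projection(dlt_projection(X_w, uv))
```

With noisy measurements the recovered values are only approximate, and normalizing the coordinates before the SVD is usually advisable.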

Zhang's homography approach ¶

Key ideas:

- Use a planar object so that $Z_W$ is always 0.
- Instead of estimating $P$, we solve for a homography matrix $H$ per image.
- We need at least 4 points in each image to solve for $H$.
- Each homography gives 2 constraints on $B = K^{-T}K^{-1}$.
- $B$ is a symmetric matrix with 6 distinct entries, so we need at least 3 images, each giving 2 constraints on $B$.
- Find $B$, then use Cholesky decomposition to find $K$.
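A sketch of the last three steps, assuming the homographies are already known. Here they are synthesized directly as $H = K[r_1, r_2, t]$ from three hypothetical poses instead of being estimated from points. The constraints used are $h_1^T B h_2 = 0$ and $h_1^T B h_1 = h_2^T B h_2$ with $B = K^{-T}K^{-1}$. The pre-scaling matrix `N` is an added assumption to keep the linear system well conditioned, and the code assumes a positive diagonal of $K$.

```python
import numpy as np

def v_ij(H, i, j):
    """Vector with h_i^T B h_j = v_ij . b, where b = (B11, B12, B22, B13, B23, B33)."""
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0] * hj[0],
                     hi[0] * hj[1] + hi[1] * hj[0],
                     hi[1] * hj[1],
                     hi[2] * hj[0] + hi[0] * hj[2],
                     hi[2] * hj[1] + hi[1] * hj[2],
                     hi[2] * hj[2]])

def intrinsics_from_homographies(Hs):
    """Solve for B from 2 constraints per homography, then K via Cholesky."""
    V = []
    for H in Hs:
        V.append(v_ij(H, 0, 1))                  # h1^T B h2 = 0
        V.append(v_ij(H, 0, 0) - v_ij(H, 1, 1))  # h1^T B h1 = h2^T B h2
    _, _, Vt = np.linalg.svd(np.asarray(V))
    b = Vt[-1]
    B = np.array([[b[0], b[1], b[3]],
                  [b[1], b[2], b[4]],
                  [b[3], b[4], b[5]]])
    if B[0, 0] < 0:                              # fix the arbitrary sign of the null vector
        B = -B
    L = np.linalg.cholesky(B)                    # B = L L^T with L = K^-T up to scale
    K = np.linalg.inv(L.T)
    return K / K[2, 2]

def rot(ax, ay):
    """Rotation composed of a rotation about x and a rotation about y."""
    cx, sx, cy, sy = np.cos(ax), np.sin(ax), np.cos(ay), np.sin(ay)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    return Rx @ Ry

K_true = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
N = np.diag([1 / 640.0, 1 / 640.0, 1.0])         # hypothetical pre-scaling of pixel coordinates
poses = [(rot(0.3, 0.1), [0.1, -0.2, 5.0]),
         (rot(-0.2, 0.4), [-0.3, 0.1, 4.0]),
         (rot(0.1, -0.5), [0.2, 0.3, 6.0])]
Hs = [N @ K_true @ np.column_stack([R[:, 0], R[:, 1], t]) for R, t in poses]
K_est = np.linalg.inv(N) @ intrinsics_from_homographies(Hs)
```

The three planes must have different orientations; views of parallel planes add no new constraints on $B$.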

Non-linear Optimization with Gauss-Newton or LM ¶

Key ideas:

- Use Zhang's approach for initialization.
- $K, q, R_i, t_i = \underset{K, q, R_i, t_i}{\operatorname{argmin}} {\sum_n\sum_i}\|x_{ni} - \hat{x}(K, q, R_i, t_i, X_{ni})\|^2$.
- Note that we can set a corner of the planar object as the world origin (0, 0, 0).
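The following is a deliberately reduced Gauss-Newton sketch of this refinement. It optimizes only a focal length, a principal point and a single radial coefficient, and it treats the poses as known by working directly with normalized camera coordinates; the full problem above also optimizes $R_i$ and $t_i$. The parameterization, the forward-difference Jacobian and all numeric values are assumptions for illustration.

```python
import numpy as np

def project(params, xy_n):
    """Pinhole projection with one radial coefficient; params = (f, cx, cy, k1)."""
    f, cx, cy, k1 = params
    r2 = np.sum(xy_n ** 2, axis=1, keepdims=True)
    xy_d = xy_n * (1.0 + k1 * r2)
    return f * xy_d + np.array([cx, cy])

def residuals(params, xy_n, uv_obs):
    """Stacked reprojection errors x_hat - x."""
    return (project(params, xy_n) - uv_obs).ravel()

def gauss_newton(params, xy_n, uv_obs, iters=10, eps=1e-6):
    params = np.asarray(params, dtype=float)
    for _ in range(iters):
        r = residuals(params, xy_n, uv_obs)
        # Forward-difference Jacobian, one column per parameter
        J = np.column_stack([
            (residuals(params + eps * e, xy_n, uv_obs) - r) / eps
            for e in np.eye(len(params))])
        # Gauss-Newton step: least-squares solution of J * delta = -r
        delta = np.linalg.lstsq(J, -r, rcond=None)[0]
        params = params + delta
    return params

rng = np.random.default_rng(1)
xy_n = rng.uniform(-0.5, 0.5, size=(30, 2))      # normalized image points
true_params = np.array([800.0, 320.0, 240.0, -0.1])
uv_obs = project(true_params, xy_n)              # noise-free observations

initial = [700.0, 300.0, 220.0, 0.0]             # stands in for the Zhang initialization
estimate = gauss_newton(initial, xy_n, uv_obs)
```

Levenberg-Marquardt adds a damping term to the same normal equations, which helps when the initialization is further from the solution.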

Perspective-n-Points ¶

Given $n$ points in the 3D world and their corresponding 2D image points, find the extrinsic pose of a calibrated camera.

P3P ¶

Key ideas:

- P3P yields up to 4 solutions in front of the camera. We need one additional point to resolve the ambiguity.

Fundamental Matrix and Essential Matrix ¶

Fundamental Matrix ¶

The baseline $O_1O_2$ and the rays $O_1X$ and $O_2X$ lie in one plane, i.e. $O_1X \cdot (O_1O_2 \times O_2X) = 0$. From this coplanarity constraint we get

$$ \begin{align} x_1^T (R_1^{-1}K_1^{-1})^T S_{b_{12}} R_2^{-1}K_2^{-1}x_2 = 0 \end{align} $$

where $S_{b_{12}}$ is the skew-symmetric matrix of the vector $b_{12} = O_1O_2$.

$$ \begin{align} x_1^TF_{12}x_2 &= 0 \\ F_{12} &= K_1^{-T}R_1^{-T} S_{b_{12}} R_2^{-1}K_2^{-1} \end{align} $$

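Suppose the projection matrices are $P_i = [A_i | a_i] = [K_iR_i | -K_iR_iO_i]$. Then $O_i = -A_i^{-1}a_i$, so $b_{12} = O_1O_2 = O_2 - O_1 = A_1^{-1}a_1 - A_2^{-1}a_2$, and

$$ \begin{align} F = A_1^{-T}S_{b_{12}}A_2^{-1} \end{align} $$

Since $F$ is only defined up to scale, the sign of $b_{12}$ does not affect the constraint.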
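A short numerical check of $F = A_1^{-T}S_{b_{12}}A_2^{-1}$, assuming two hypothetical cameras that share the same $K$. The helper names and all numeric values are illustrative.

```python
import numpy as np

def skew(b):
    """Skew-symmetric matrix S_b with S_b @ x == np.cross(b, x)."""
    return np.array([[0.0, -b[2], b[1]],
                     [b[2], 0.0, -b[0]],
                     [-b[1], b[0], 0.0]])

def projection(K, R, O):
    """P = [KR | -KR O]."""
    return np.hstack([K @ R, (-K @ R @ O).reshape(3, 1)])

def fundamental_from_projections(P1, P2):
    A1, a1 = P1[:, :3], P1[:, 3]
    A2, a2 = P2[:, :3], P2[:, 3]
    b12 = np.linalg.solve(A1, a1) - np.linalg.solve(A2, a2)   # O_2 - O_1
    return np.linalg.inv(A1).T @ skew(b12) @ np.linalg.inv(A2)

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.1
R2 = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
P1 = projection(K, np.eye(3), np.zeros(3))
P2 = projection(K, R2, np.array([1.0, 0.2, 0.5]))
F = fundamental_from_projections(P1, P2)

# Project one 3D point into both images and evaluate the constraint.
X = np.array([0.3, -0.4, 6.0, 1.0])
x1, x2 = P1 @ X, P2 @ X
x1, x2 = x1 / x1[2], x2 / x2[2]
residual = x1 @ F @ x2          # should vanish up to rounding
singular_values = np.linalg.svd(F, compute_uv=False)
```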

Essential Matrix ¶

$$ \begin{align} x_1^TK_1^{-T}E_{12}K_2^{-1}x_2 &= 0 \\ F &= K_1^{-T}E_{12}K_2^{-1} \\ E_{12} &= R_1^{-T} S_{b_{12}} R_2^{-1} \end{align} $$
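The relation between $E$ and $F$ can be checked in a few lines. This sketch uses $R^{-T} = R$ for rotation matrices, so $E_{12} = R_1 S_{b_{12}} R_2^T$. The camera poses and the shared intrinsics are hypothetical values.

```python
import numpy as np

def skew(b):
    """Skew-symmetric matrix S_b with S_b @ x == np.cross(b, x)."""
    return np.array([[0.0, -b[2], b[1]],
                     [b[2], 0.0, -b[0]],
                     [-b[1], b[0], 0.0]])

def essential(R1, R2, O1, O2):
    """E_12 = R_1^{-T} S_b R_2^{-1}, which equals R_1 S_b R_2^T for rotations."""
    return R1 @ skew(O2 - O1) @ R2.T

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.1
R1 = np.eye(3)
R2 = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
O1, O2 = np.zeros(3), np.array([1.0, 0.2, 0.5])

E = essential(R1, R2, O1, O2)
F = np.linalg.inv(K).T @ E @ np.linalg.inv(K)   # F = K_1^{-T} E K_2^{-1}

X = np.array([0.3, -0.4, 6.0])
x1 = K @ R1 @ (X - O1)
x2 = K @ R2 @ (X - O2)
x1, x2 = x1 / x1[2], x2 / x2[2]
residual = x1 @ F @ x2

# E has two equal singular values (the baseline length) and one zero.
singular_values = np.linalg.svd(E, compute_uv=False)
```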

Epipolar Constraint ¶

Important elements:

- Epipolar axis (baseline): $b_{12} = O_1O_2$
- Epipolar plane: $O_1O_2X$
- Epipoles: for image 1, the epipole is the projection of camera center $O_2$ onto image 1.
  - It is also the intersection of the line $O_1O_2$ with the image 1 plane.
  - The epipole $e_1$ in image 1 lies in the null space of $F_{12}^T$, because $e_1^TF_{12}x_2 = 0$ holds for every $x_2$.
    - It can be found with an eigendecomposition of $F_{12}^T$, as the eigenvector with eigenvalue 0, or with SVD.
    - Because this null space is non-trivial, the fundamental matrix $F$ is singular; it has rank 2.
- Epipolar line: for image 1, the epipolar line is the projection of the ray $O_2X$ onto image 1.
  - It is also the intersection of the epipolar plane $O_1O_2X$ with the image 1 plane.
  - $F_{12}x_2$ is the epipolar line in image 1 because $x_1$ lies on it, i.e. $x_1^TF_{12}x_2 = 0$.
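The elements above can be verified numerically. This sketch builds $F_{12}$ for two hypothetical cameras, takes the epipole as the null vector of $F_{12}^T$ via SVD (the left singular vector of $F_{12}$ with the smallest singular value), and compares it with the projection of $O_2$ into image 1. All poses and intrinsics are assumed values.

```python
import numpy as np

def skew(b):
    """Skew-symmetric matrix S_b with S_b @ x == np.cross(b, x)."""
    return np.array([[0.0, -b[2], b[1]],
                     [b[2], 0.0, -b[0]],
                     [-b[1], b[0], 0.0]])

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.1
R1 = np.eye(3)
R2 = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
O1, O2 = np.zeros(3), np.array([1.0, 0.2, 0.5])

A1, A2 = K @ R1, K @ R2
F = np.linalg.inv(A1).T @ skew(O2 - O1) @ np.linalg.inv(A2)

# Epipole in image 1: null vector of F^T
U, _, _ = np.linalg.svd(F)
e1 = U[:, -1]
e1_expected = A1 @ (O2 - O1)                    # projection of O_2 into image 1
e1_expected = e1_expected / np.linalg.norm(e1_expected)

# Epipolar line in image 1 for a point observed in image 2
X = np.array([0.3, -0.4, 6.0])
x1 = A1 @ (X - O1)
x2 = A2 @ (X - O2)
x1, x2 = x1 / x1[2], x2 / x2[2]
line1 = F @ x2                                  # both x1 and e1 lie on this line
```

The cross product of `e1` and `e1_expected` vanishes when the two homogeneous vectors describe the same image point, which avoids dividing by a third coordinate that may be close to zero.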

Direct Solution for Estimating Fundamental Matrix and Essential Matrix ¶

Key ideas:

- Use DLT to solve for $F$, similar to estimating a homography matrix $H$, from $x_1^TF_{12}x_2 = 0$.
  - $F$ has 9 entries and is defined up to scale, so there are 8 unknowns. Each correspondence gives one constraint, so we need at least 8 correspondences.
  - Enforce $\operatorname{rank}(F) = 2$:
    - $F = USV^T$, then $F \approx U\hat{S}V^T$, where $\hat{S}$ is obtained by setting the smallest singular value to 0.
  - In practice, normalizing the pixel coordinates makes the solution more numerically stable.
- Use DLT to solve for $E$ in the same way, from ${x_k}_1^TE_{12}{x_k}_2 = 0$, where $x_k = K^{-1}x$ are the calibrated image coordinates.
  - There are 8 unknowns in $E$, so we need at least 8 correspondences.
  - Enforce $\operatorname{rank}(E) = 2$:
    - $E = USV^T$, then $E \approx U\hat{S}V^T$, where $\hat{S}$ is obtained by setting the smallest singular value to 0.
    - The two non-zero singular values of $E$ must also be equal, because $E$ is a skew-symmetric matrix multiplied by rotation matrices. Since $E$ is only defined up to scale, we can set both to 1.
- 5-point algorithm for finding $E$
  - RANSAC idea:
    - Minimal number of points to fit a model: $s$
    - Outlier ratio: $e$
    - Probability of drawing a single inlier: $1-e$
    - Probability of drawing $s$ inliers: $(1-e)^s$
    - Probability that one draw fails: $1-(1-e)^s$
    - Probability that all $T$ draws fail: $p_f = (1-(1-e)^s)^T$
    - If we want a success probability of at least $p$, we need $1-p_f \geq p$.
    - $T \geq \frac{\log(1-p)}{\log(1-(1-e)^s)}$
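The direct solution for $F$ and the RANSAC iteration count can be sketched as below, on synthetic noise-free correspondences. The normalization shown (zero mean, mean distance $\sqrt{2}$) is one common choice and is an assumption here, as are the function names and the test scene.

```python
import numpy as np

def normalize(x):
    """Shift to zero mean and scale to mean distance sqrt(2); x has shape (n, 2)."""
    mean = x.mean(axis=0)
    scale = np.sqrt(2.0) / np.mean(np.linalg.norm(x - mean, axis=1))
    T = np.array([[scale, 0.0, -scale * mean[0]],
                  [0.0, scale, -scale * mean[1]],
                  [0.0, 0.0, 1.0]])
    x_h = np.column_stack([x, np.ones(len(x))])
    return (T @ x_h.T).T, T

def eight_point(x1, x2):
    """Estimate F with x1^T F x2 = 0 from n >= 8 pixel correspondences of shape (n, 2)."""
    x1n, T1 = normalize(x1)
    x2n, T2 = normalize(x2)
    # One row per correspondence: coefficients of the row-major entries of F
    A = np.array([np.kron(p1, p2) for p1, p2 in zip(x1n, x2n)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt   # enforce rank 2
    F = T1.T @ F @ T2                         # undo the normalization
    return F / np.linalg.norm(F)

def ransac_iterations(p, e, s):
    """Number of draws T so that at least one all-inlier sample occurs with probability p."""
    return int(np.ceil(np.log(1.0 - p) / np.log(1.0 - (1.0 - e) ** s)))

def project(K, R, O, X):
    x = (K @ R @ (X - O).T).T
    return x[:, :2] / x[:, 2:3]

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(12, 3)) + np.array([0.0, 0.0, 6.0])
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.1
R2 = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
x1 = project(K, np.eye(3), np.zeros(3), X)
x2 = project(K, R2, np.array([1.0, 0.2, 0.5]), X)

F = eight_point(x1, x2)
x1_h = np.column_stack([x1, np.ones(len(x1))])
x2_h = np.column_stack([x2, np.ones(len(x2))])
errors = np.abs(np.einsum('ni,ij,nj->n', x1_h, F, x2_h))

T_8 = ransac_iterations(p=0.99, e=0.5, s=8)   # 1177 draws with 8-point samples
T_5 = ransac_iterations(p=0.99, e=0.5, s=5)   # 146 draws with 5-point samples
```

The two iteration counts illustrate why a smaller minimal sample matters: at a 50% outlier ratio, the 5-point sample needs roughly an eighth of the draws.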

Find $R$ and $t$ from $E$ ¶

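- $E_{12} = R_{21} [t_{12}]_\times$, which assumes camera 2 is at the world origin.
- $E_{12} = [t_{12}]_\times R_{12}^T$, which assumes camera 1 is at the world origin.
- Decomposing $E$ gives 4 candidate solutions for $(R, t)$; only one of them places the triangulated points in front of both cameras.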
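The 4 candidates can be enumerated with the usual SVD-based factorization of $E$ into a skew-symmetric matrix times a rotation. The sketch below uses the generic form $E = [t]_\times R$; in the second convention above, the rotation factor plays the role of $R_{12}^T$. The function name and the test values are assumptions, and $t$ is recovered only up to scale (unit norm here).

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]_x with [t]_x @ x == np.cross(t, x)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def decompose_essential(E):
    """Return the 4 candidate (R, t) pairs with E ~ [t]_x R, t of unit norm."""
    U, _, Vt = np.linalg.svd(E)
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    candidates = []
    for R in (U @ W @ Vt, U @ W.T @ Vt):
        if np.linalg.det(R) < 0:          # keep a proper rotation (det = +1)
            R = -R
        for t in (U[:, 2], -U[:, 2]):     # t spans the left null space of E
            candidates.append((R, t))
    return candidates

a = 0.1
R_true = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t_true = np.array([1.0, 0.2, 0.5])
t_true = t_true / np.linalg.norm(t_true)
E = skew(t_true) @ R_true

candidates = decompose_essential(E)
matches = [np.allclose(R, R_true) and np.allclose(t, t_true) for R, t in candidates]
```

Selecting the physically valid candidate requires triangulating at least one point and checking that its depth is positive in both cameras, which is not shown here.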

Camera Models ¶

Pinhole Camera Model ¶

Unified Camera Model ¶

Extended Unified Camera Model ¶

Kannala-Brandt Camera Model ¶

Double Sphere Camera Model ¶

Reference ¶

Distortion models
Usenko, Demmel, and Cremers. The Double Sphere Camera Model

Optical Flow ¶

The goal is to find $ \Delta u, \Delta v = \underset{\Delta u, \Delta v}{\operatorname{argmin}} \|I_1(u, v) - I_2(u + \Delta u, v + \Delta v)\|^2$.

We can minimize the error iteratively, accumulating each update into the current estimate $(u^*, v^*)$:

$$ \Delta u, \Delta v = \underset{\Delta u, \Delta v}{\operatorname{argmin}} \|I_1(u, v) - I_2(u + u^* + \Delta u, v + v^* + \Delta v)\|^2 $$

$$ u^* \leftarrow u^* + \Delta u, \quad v^* \leftarrow v^* + \Delta v $$
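A minimal sketch of this iteration, assuming a single displacement shared by all pixels of a patch and solving each inner minimization with one Gauss-Newton step. To keep the example short, the two images are smooth analytic functions that can be sampled at non-integer positions; real images would need interpolation. The image functions, patch and shift are made-up test data.

```python
import numpy as np

TRUE_SHIFT = (1.5, -0.8)

def image1(u, v):
    """Synthetic smooth image, defined at continuous coordinates."""
    return np.sin(0.3 * u) + np.cos(0.2 * v)

def image2(u, v):
    """image1 shifted so that image1(u, v) == image2(u + du, v + dv)."""
    return image1(u - TRUE_SHIFT[0], v - TRUE_SHIFT[1])

def estimate_flow(I1, I2, u, v, iters=20, h=1e-3):
    """Iteratively estimate one (du, dv) for all pixels in the patch (u, v)."""
    shift = np.zeros(2)                      # current estimate (u*, v*)
    target = I1(u, v).ravel()
    for _ in range(iters):
        uu, vv = u + shift[0], v + shift[1]
        r = I2(uu, vv).ravel() - target      # residual at the current estimate
        # Central-difference gradients of I2 at the current estimate
        gu = (I2(uu + h, vv) - I2(uu - h, vv)).ravel() / (2 * h)
        gv = (I2(uu, vv + h) - I2(uu, vv - h)).ravel() / (2 * h)
        J = np.column_stack([gu, gv])
        delta = np.linalg.lstsq(J, -r, rcond=None)[0]
        shift += delta                       # u* += du, v* += dv
    return shift

u, v = np.meshgrid(np.arange(20.0, 35.0), np.arange(10.0, 25.0))
flow = estimate_flow(image1, image2, u, v)
```

The linearization only holds for displacements that are small relative to the image structure, which is why practical implementations often add an image pyramid.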

Xipeng Wang

Multi-view Geometry
