Multi-view Geometry
- Camera Intrinsics and Extrinsics ¶
- Optical Flow ¶
Camera Intrinsics and Extrinsics ¶
Basics of Pinhole Camera Model ¶
The projection is shown below, where $S$ is the sensor frame, $I$ is the image frame, $C$ is the camera frame, and $W$ is the world frame. The $I$ frame normally differs from $C$ only by a constant offset, the so-called camera constant.
$$ \begin{align} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = T_{SC} T_{IC} T_{CW} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \end{align} $$
The extrinsic parameters are $T_{CW}$, which typically has 6 DoF: 3 for position and 3 for orientation. $O^W_C$ is the world origin expressed in the camera frame $C$. $O^C_W$ is the camera origin expressed in the world frame $W$.
$$ \begin{align} T_{CW} = \begin{bmatrix} R_{CW},& O^W_{C} \\ 0,& 1 \end{bmatrix} \end{align} $$
$$ \begin{align} T_{CW} = \begin{bmatrix} R_{CW},& -R_{CW} O^C_W \\ 0,& 1 \end{bmatrix} \end{align} $$
The intrinsic parameters are $T_{SC} T_{IC}$ when non-linear distortion is ignored. If the image plane is placed in the negative $z$ direction, the camera constant $c$ is negative. $x_H$ and $y_H$ are the translation from the image frame center to the sensor frame center.
$$ \begin{align} T_{SC} T_{IC} = \begin{bmatrix} 1,& 0, & x_H \\ 0,& 1, & y_H \\ 0,& 0, & 1 \end{bmatrix}\begin{bmatrix} c,& 0, & 0 \\ 0,& c, & 0 \\ 0,& 0, & 1 \end{bmatrix} \end{align} $$
Adding a shear $s$ and a scale difference $m$ between the two axes generalizes this to
$$ \begin{align} T_{SC} T_{IC} = \begin{bmatrix} 1,& s, & x_H \\ 0,& 1+m, & y_H \\ 0,& 0, & 1 \end{bmatrix}\begin{bmatrix} c,& 0, & 0 \\ 0,& c, & 0 \\ 0,& 0, & 1 \end{bmatrix} \end{align} $$
$$ \begin{align} K = T_{SC} T_{IC} & = \begin{bmatrix} c_x,& s_{xy}, & x_H \\ 0,& c_y, & y_H \\ 0,& 0, & 1 \end{bmatrix} \end{align} $$
where $c_x = c$, $s_{xy} = cs$, and $c_y = c(1+m)$.
Non-linear distortion can be caused by an imperfect lens, so that each point projected onto the sensor plane is shifted slightly depending on its position. $x_u$ and $y_u$ are the undistorted projected points on the normalized image plane where $depth = 1$ ($x_u = \frac{X_C}{Z_C}, y_u = \frac{Y_C}{Z_C}$, $Z_C$ is the depth). $q$ are the parameters of the distortion model, e.g. barrel distortion or tangential distortion.
$$ \begin{align} x_d =& x_u + \Delta(x_u, q) \\ y_d =& y_u + \Delta(y_u, q) \end{align} $$
$$ \begin{align} x_d = H(x_u)x_u \end{align} $$
$$ \begin{align} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = T_{SC} T_{IC} \begin{bmatrix} x_d \\ y_d \\ 1 \end{bmatrix} \end{align} $$
Mapping ¶
Inverse map from $uv$ to $x_d$ ¶
$$ \begin{align} x_d = K^{-1}\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \end{align} $$
Inverse map from $x_d$ to $x_u$ ¶
$$ \begin{align} [x_{u}]_{i+1}= [H([x_{u}]_i)]^{-1}x_d \end{align} $$
Inverse map from $x_u$ to $X_C$ ¶
$$ \begin{align} X_C = \lambda x_u \end{align} $$
$$ \begin{align} X_W = O_W^C + R_{CW}^{-1} X_C \end{align} $$
where $\lambda$ is the depth.
# Mapping and inverse mapping
import numpy as np
print('Map world points to sensor frame')
p_w = np.array([1, 0, -5, 1])
print(f'p_w: {p_w}')
T_cw = np.array([[1, 0, 0, 1],
                 [0, 1, 0, 0],
                 [0, 0, 1, 0],
                 [0, 0, 0, 1]])
R_cw = T_cw[0:3, 0:3]
O_w = -R_cw.T.dot(T_cw[0:3, 3])
p_c = T_cw.dot(p_w)
depth = p_c[2]
p_c = p_c[0:3]
print(f'p_c: {p_c}', f'depth: {depth}')
K = np.array([[-500, 0, 200], [0, -500, 200], [0, 0, 1]])
# Normalize onto the plane at depth = 1
p_nc = p_c / depth
uv = K.dot(p_nc)
print(f'p_s: {uv}')
print('\nMap sensor points to world frame')
ray = np.linalg.inv(K).dot(uv)
print(f'ray: {ray}')
p_c = depth * ray
print(f'p_c: {p_c}')
# Recovered world point (3-vector); should match the first three entries of p_w
p_w_rec = O_w + depth * R_cw.T.dot(ray)
print(f'p_w: {p_w_rec}')
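The iterative inverse map from $x_d$ to $x_u$ above can be sketched as a fixed-point iteration. The example below is only an illustration: it assumes a single-coefficient radial model $H(x_u) = 1 + k_1 r^2$, and the coefficient `k1` and the test point are made-up values, not taken from any calibration.

```python
import numpy as np

def distort(x_u, k1):
    """Assumed radial model: x_d = H(x_u) * x_u with H = 1 + k1 * r^2."""
    r2 = np.sum(x_u ** 2)
    return (1.0 + k1 * r2) * x_u

def undistort(x_d, k1, num_iters=20):
    """Fixed-point iteration [x_u]_{i+1} = H([x_u]_i)^{-1} x_d, started at x_d."""
    x_u = x_d.copy()
    for _ in range(num_iters):
        r2 = np.sum(x_u ** 2)
        x_u = x_d / (1.0 + k1 * r2)
    return x_u

k1 = -0.2                          # hypothetical distortion coefficient
x_u_true = np.array([0.3, -0.2])   # point on the normalized image plane
x_d = distort(x_u_true, k1)
x_u_est = undistort(x_d, k1)
print('x_d:', x_d)
print('x_u recovered:', x_u_est)
```

For mild distortion the iteration contracts quickly; strong distortion far from the image center may need more iterations or a different solver.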
Calibration ¶
DLT: Direct Linear Transform ¶
Key ideas:
- Projection matrix $P = T_{SC} T_{IC} T_{CW}$ has 11 DoF.
- Each 2D-to-3D correspondence gives two constraints on $P$.
- We need at least 6 correspondences so that we will have 12 constraints to solve $P$ with 11 DoF.
- Use SVD to find the solution: the right singular vector with the smallest singular value.
- $P = [KR| -KRO_W^C]$, so once we have $P$, we can get $KR$ and $O_W^C$.
- Use QR decomposition on $(KR)^{-1} = R^{-1}K^{-1}$ to find $K$ and $R$: the orthogonal factor is $R^{-1}$ and the upper-triangular factor is $K^{-1}$.
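The DLT steps above can be sketched with synthetic, noise-free data. Everything numeric here (`K_true`, `R_true`, `O_true`, the random points) is invented for the demonstration, and the sign handling after the QR step is one possible convention, not the only one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up ground-truth camera
K_true = np.array([[500.0, 0.0, 320.0],
                   [0.0, 500.0, 240.0],
                   [0.0, 0.0, 1.0]])
a = 0.3
R_true = np.array([[np.cos(a), 0.0, np.sin(a)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(a), 0.0, np.cos(a)]])
O_true = np.array([0.5, -0.2, -4.0])   # camera origin in the world frame
P_true = K_true @ R_true @ np.hstack([np.eye(3), -O_true[:, None]])

# Non-coplanar 3D points and their pixel projections
n = 12
X_h = np.hstack([rng.uniform(-1.0, 1.0, size=(n, 3)), np.ones((n, 1))])
x = (P_true @ X_h.T).T
x = x[:, :2] / x[:, 2:3]

# Two rows of A p = 0 per correspondence
rows = []
for (u, v), Xh in zip(x, X_h):
    rows.append(np.hstack([Xh, np.zeros(4), -u * Xh]))
    rows.append(np.hstack([np.zeros(4), Xh, -v * Xh]))
A = np.array(rows)

# Solution: right singular vector with the smallest singular value
_, _, Vt = np.linalg.svd(A)
P = Vt[-1].reshape(3, 4)

# Decompose P = [KR | -KR O]
M = P[:, :3]
if np.linalg.det(M) < 0:            # fix the global sign so that det(R) = +1
    P, M = -P, -M
O = -np.linalg.solve(M, P[:, 3])
Q, U = np.linalg.qr(np.linalg.inv(M))   # inv(KR) = R^T K^{-1}
D = np.diag(np.sign(np.diag(U)))        # force a positive diagonal on K^{-1}
R = (Q @ D).T
K = np.linalg.inv(D @ U)
K = K / K[2, 2]                         # remove the arbitrary scale of P

print('K:\n', K)
print('O:', O)
```

With real, noisy measurements the recovered $P$ is only a linear estimate and is usually refined afterwards.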
Zhang's homography approach ¶
Key ideas:
- Use a planar object so that $Z_W$ is always 0.
- Instead of $P$, we solve for the homography matrix $H$ of each image.
- We need at least 4 points in each image to solve for $H$.
- Each homography gives 2 constraints on $B = K^{-T}K^{-1}$.
- $B$ is a symmetric matrix with 6 distinct entries, so we need at least 3 images, each giving us 2 constraints on $B$.
- Find $B$, then use Cholesky decomposition to find $K$.
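A minimal sketch of the last three steps, assuming the matrix being solved for is $B = K^{-T}K^{-1}$ as in Zhang's formulation. To keep it short, the homographies are synthesized directly as $H = K[r_1, r_2, t]$ instead of being estimated from points; `K_true` and the three poses are made-up values.

```python
import numpy as np

def rotation(ax, ay, az):
    """Rotation matrix from angles about the x, y and z axes (radians)."""
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def v_ij(H, i, j):
    """Vector with v_ij . b = h_i^T B h_j for b = (B11, B12, B22, B13, B23, B33)."""
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0] * hj[0],
                     hi[0] * hj[1] + hi[1] * hj[0],
                     hi[1] * hj[1],
                     hi[2] * hj[0] + hi[0] * hj[2],
                     hi[2] * hj[1] + hi[1] * hj[2],
                     hi[2] * hj[2]])

K_true = np.array([[520.0, 1.5, 310.0],
                   [0.0, 500.0, 245.0],
                   [0.0, 0.0, 1.0]])
# (rotation angles, translation) of the planar target in three images
poses = [((0.3, -0.2, 0.1), (0.1, -0.2, 3.0)),
         ((-0.25, 0.35, -0.15), (-0.3, 0.1, 2.5)),
         ((0.15, 0.4, 0.5), (0.2, 0.3, 3.5))]

V = []
for angles, t in poses:
    R = rotation(*angles)
    H = K_true @ np.column_stack([R[:, 0], R[:, 1], t])
    V.append(v_ij(H, 0, 1))                    # h1^T B h2 = 0
    V.append(v_ij(H, 0, 0) - v_ij(H, 1, 1))    # h1^T B h1 = h2^T B h2
V = np.array(V)

# b is the right singular vector with the smallest singular value
_, _, Vt = np.linalg.svd(V)
b = Vt[-1]
B = np.array([[b[0], b[1], b[3]],
              [b[1], b[2], b[4]],
              [b[3], b[4], b[5]]])
if B[0, 0] < 0:          # b is only defined up to sign; make B positive definite
    B = -B

L = np.linalg.cholesky(B)    # B = L L^T, with L proportional to K^{-T}
K = np.linalg.inv(L).T
K = K / K[2, 2]
print('K:\n', K)
```

With noisy homographies the estimated $B$ may fail to be positive definite, which is one reason the result is normally used only as an initial guess.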
Non-linear Optimization with Gauss-Newton or LM ¶
Key ideas:
- Use Zhang's approach for initialization.
- $K, q, R_i, t_i = \underset{K, q, R_i, t_i}{\operatorname{argmin}} {\sum_n\sum_i}\|x_{ni} - \hat{x}(K, q, R_i, t_i, X_{ni})\|^2$.
- Note that we can set the planar object corner as world origin (0, 0, 0).
Fundamental Matrix and Essential Matrix ¶
Fundamental Matrix ¶
Since $O_1O_2$, $O_1X$ and $O_2X$ lie in one plane, i.e. $O_1X \cdot (O_1O_2 \times O_2X) = 0$, we get
$$ \begin{align} x_1^T (R_1^{-1}K_1^{-1})^T S_{b_{12}} R_2^{-1}K_2^{-1}x_2 = 0 \end{align} $$
where $S_{b_{12}}$ is the skew-symmetric matrix of the vector $b_{12} = O_1O_2$.
$$ \begin{align} x_1^TF_{12}x_2 &= 0 \\ F_{12} &= K_1^{-T}R_1^{-T} S_{b_{12}} R_2^{-1}K_2^{-1} \end{align} $$
Now suppose we have projection matrices $P_i = [A_i | a_i] = [K_iR_i | -K_iR_iO_i]$. Then $O_i = -A_i^{-1}a_i$, so $b_{12} = O_1O_2 = A_1^{-1}a_1 - A_2^{-1}a_2$, and $$ \begin{align} F = A_1^{-T}S_{b_{12}}A_2^{-1} \end{align} $$
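As a quick numerical check of the last formula, the sketch below builds $F$ from two projection matrices and verifies the constraint on one projected point. The intrinsics, rotations, camera centers and the 3D point are all invented values, and $b_{12}$ is taken as the vector from $O_1$ to $O_2$.

```python
import numpy as np

def skew(b):
    """Skew-symmetric matrix S_b such that S_b @ x equals np.cross(b, x)."""
    return np.array([[0.0, -b[2], b[1]],
                     [b[2], 0.0, -b[0]],
                     [-b[1], b[0], 0.0]])

def rot_y(a):
    return np.array([[np.cos(a), 0.0, np.sin(a)],
                     [0.0, 1.0, 0.0],
                     [-np.sin(a), 0.0, np.cos(a)]])

K1 = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[450.0, 0.0, 300.0], [0.0, 460.0, 250.0], [0.0, 0.0, 1.0]])
O1, O2 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.1, 0.2])

# Projection matrices P_i = [A_i | a_i] = [K_i R_i | -K_i R_i O_i]
A1, A2 = K1 @ rot_y(0.1), K2 @ rot_y(-0.2)
a1, a2 = -A1 @ O1, -A2 @ O2

# Baseline from O_1 to O_2, recovered from the projection matrices alone
b12 = np.linalg.solve(A1, a1) - np.linalg.solve(A2, a2)
F = np.linalg.inv(A1).T @ skew(b12) @ np.linalg.inv(A2)

# Project one world point into both images (homogeneous pixel coordinates)
X = np.array([0.3, -0.4, 5.0])
x1 = A1 @ X + a1
x2 = A2 @ X + a2
x1, x2 = x1 / x1[2], x2 / x2[2]

residual = x1 @ F @ x2
print('b12:', b12)
print('x1^T F x2:', residual)
```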
Essential Matrix ¶
$$ \begin{align} x_1^TK_1^{-T}E_{12}K_2^{-1}x_2 &= 0 \\ F &= K_1^{-T}E_{12}K_2^{-1} \\ E_{12} &= R_1^{-T} S_{b_{12}} R_2^{-1} \end{align} $$
Epipolar Constraint ¶
Important elements:
- Epipolar axis: $b_{12} = O_1O_2$
- Epipolar plane: $O_1O_2X$
- Epipoles: For image 1, the epipole is the projection of the camera center $O_2$ onto image 1.
  - It is also the intersection of $O_1O_2$ with the image 1 plane.
- The epipole in image 1 lies in the null space of $F_{12}^T$ because $e_1^TF_{12}x_2 = 0$ holds for every $x_2$.
  - It can be found by eigendecomposition of $F_{12}^T$, as the eigenvector with eigenvalue 0.
  - This is why the fundamental matrix $F$ has rank 2.
- Epipolar line: For image 1, the epipolar line is the projection of the ray $O_2X$ onto image 1.
  - It is also the intersection of the epipolar plane $O_1O_2X$ with the image 1 plane.
  - $F_{12}x_2$ is the epipolar line in image 1 because $x_1$ lies on it, i.e. $x_1^TF_{12}x_2 = 0$.
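The epipole and epipolar line can be checked numerically. The sketch below uses an invented two-camera setup and finds the null vector of $F_{12}^T$ with an SVD instead of an eigendecomposition (the left singular vector of $F_{12}$ with singular value 0 spans the same null space); that substitution is a choice made here for numerical convenience.

```python
import numpy as np

def skew(b):
    return np.array([[0.0, -b[2], b[1]],
                     [b[2], 0.0, -b[0]],
                     [-b[1], b[0], 0.0]])

def rot_y(a):
    return np.array([[np.cos(a), 0.0, np.sin(a)],
                     [0.0, 1.0, 0.0],
                     [-np.sin(a), 0.0, np.cos(a)]])

# Made-up cameras sharing one intrinsic matrix
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
O1, O2 = np.zeros(3), np.array([1.0, 0.1, 0.5])
A1, A2 = K @ rot_y(0.1), K @ rot_y(-0.2)
F12 = np.linalg.inv(A1).T @ skew(O2 - O1) @ np.linalg.inv(A2)

# Epipole in image 1: F12^T e1 = 0
U, S, Vt = np.linalg.svd(F12)
e1 = U[:, 2]
e1_expected = A1 @ (O2 - O1)        # projection of O_2 into image 1
cosine = abs(e1 @ e1_expected) / (np.linalg.norm(e1) * np.linalg.norm(e1_expected))

# Epipolar line in image 1 for a point x2 observed in image 2
X = np.array([0.3, -0.4, 5.0])
x1 = A1 @ (X - O1)
x2 = A2 @ (X - O2)
x1, x2 = x1 / x1[2], x2 / x2[2]
line1 = F12 @ x2                    # line coefficients (a, b, c): a*u + b*v + c = 0

print('singular values of F12:', S)
print('epipole (pixels):', e1[:2] / e1[2])
print('x1 on epipolar line:', x1 @ line1)
print('epipole on epipolar line:', e1 @ line1)
```

Every epipolar line in image 1 passes through the epipole, which is what the last printed value illustrates.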
Direct Solution for Estimating Fundamental Matrix and Essential Matrix ¶
Key ideas:
- Use DLT to solve $F$, similar to estimating the homography matrix $H$: $x_1^TF_{12}x_2 = 0$
  - $F$ has 9 entries and is defined up to scale, so there are 8 unknowns and we need at least 8 correspondences.
- $Rank(F) = 2$
  - $F = USV^T$, then $F \approx U\hat{S}V^T$ where $\hat{S}$ is obtained by setting the smallest singular value to 0.
  - In practice, using normalized pixel coordinates makes the solution more numerically stable.
- Use DLT to solve $E$, similar to estimating the homography matrix $H$: ${x_k}_1^TE_{12}{x_k}_2 = 0$, where ${x_k}_i = K_i^{-1}x_i$
  - In this linear formulation $E$ also has 8 unknowns, so we need at least 8 correspondences.
- $Rank(E) = 2$
  - $E = USV^T$, then $E \approx U\hat{S}V^T$ where $\hat{S}$ is obtained by setting the smallest singular value to 0.
  - The first two singular values must also be equal, so we can set both to 1. This holds because $E$ is the product of a skew-symmetric matrix and rotation matrices.
- 5-point algorithm for finding $E$
- RANSAC idea:
  - Minimal number of points to fit a model: $s$
  - Outlier ratio: $e$
  - Probability of drawing a single inlier: $1-e$
  - Probability of drawing $s$ inliers: $(1-e)^s$
  - Probability of failing once: $p_f = 1-(1-e)^s$
  - Probability of failing $T$ times: $p_f = (1-(1-e)^s)^T$
  - If we want a success probability of at least $p$, then $1-p_f \geq p$.
  - $T = \frac{\log(1-p)}{\log(1-(1-e)^s)}$
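The direct solution for $F$ and the RANSAC iteration count can be sketched as follows. The normalization used here (zero mean, mean distance $\sqrt{2}$, often attributed to Hartley) is one common choice and an assumption of this example; the cameras and points are synthetic and noise-free.

```python
import numpy as np

def ransac_iterations(p, e, s):
    """Number of draws T so that at least one all-inlier sample occurs with probability p."""
    return int(np.ceil(np.log(1 - p) / np.log(1 - (1 - e) ** s)))

def normalize(pts):
    """Shift to zero mean and scale to mean distance sqrt(2); returns homogeneous points and T."""
    mean = pts.mean(axis=0)
    scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
    T = np.array([[scale, 0.0, -scale * mean[0]],
                  [0.0, scale, -scale * mean[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return (T @ pts_h.T).T, T

def eight_point(x1, x2):
    """Estimate F with x1^T F x2 = 0 from N >= 8 pixel correspondences."""
    n1, T1 = normalize(x1)
    n2, T2 = normalize(x2)
    # kron(a, b) . vec(F) = a^T F b for row-major vec(F)
    A = np.array([np.kron(a, b) for a, b in zip(n1, n2)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0                       # enforce rank 2
    F = U @ np.diag(S) @ Vt
    return T1.T @ F @ T2             # undo the normalization

# Synthetic two-view data (made-up cameras and points)
rng = np.random.default_rng(1)
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
a = -0.2
R2 = np.array([[np.cos(a), 0.0, np.sin(a)],
               [0.0, 1.0, 0.0],
               [-np.sin(a), 0.0, np.cos(a)]])
O2 = np.array([1.0, 0.1, 0.0])
n = 12
X = np.column_stack([rng.uniform(-1, 1, n), rng.uniform(-1, 1, n), rng.uniform(4, 8, n)])
p1 = (K @ X.T).T                     # camera 1 at the world origin, R1 = I
p2 = (K @ R2 @ (X - O2).T).T
x1 = p1[:, :2] / p1[:, 2:3]
x2 = p2[:, :2] / p2[:, 2:3]

F = eight_point(x1, x2)
F = F / np.linalg.norm(F)
x1_h = np.column_stack([x1, np.ones(n)])
x2_h = np.column_stack([x2, np.ones(n)])
residuals = np.einsum('ni,ij,nj->n', x1_h, F, x2_h)
print('max |x1^T F x2|:', np.abs(residuals).max())
print('RANSAC draws for p=0.99, e=0.5, s=8:', ransac_iterations(0.99, 0.5, 8))
```

Comparing `ransac_iterations(0.99, 0.5, 8)` with `ransac_iterations(0.99, 0.5, 5)` shows why a minimal 5-point solver is attractive inside RANSAC.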
Find $R$ and $t$ from $E$ ¶
- $E_{12} = R_{21} [t_{12}]_\times $ which assumes camera 2 is at the world origin.
- $E_{12} = [t_{12}]_\times R_{12}^T $ which assumes camera 1 is at the world origin.
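- Decomposing $E$ gives 4 candidate solutions for $(R, t)$; only one of them places the triangulated points in front of both cameras.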
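The four candidates can be enumerated with the standard SVD-based decomposition. The sketch below assumes the convention $E = [t]_\times R$ with unit-length $t$ (matching the second form above with $R = R_{12}^T$); `R_true` and `t_true` are made-up values, and the final selection by triangulation is not shown.

```python
import numpy as np

def skew(t):
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def decompose_essential(E):
    """Return the four (R, t) candidates with E = [t]_x R up to sign and scale."""
    U, _, Vt = np.linalg.svd(E)
    # Flip signs so that both factors are proper rotations (E is only defined up to sign)
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    Ra, Rb = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]                      # left null vector of E, unit length
    return [(Ra, t), (Ra, -t), (Rb, t), (Rb, -t)]

a = 0.3
R_true = np.array([[np.cos(a), 0.0, np.sin(a)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(a), 0.0, np.cos(a)]])
t_true = np.array([1.0, 0.2, -0.1])
t_true = t_true / np.linalg.norm(t_true)
E = skew(t_true) @ R_true

candidates = decompose_essential(E)
matches = [np.allclose(R, R_true, atol=1e-8) and np.allclose(t, t_true, atol=1e-8)
           for R, t in candidates]
print('singular values of E:', np.linalg.svd(E, compute_uv=False))
print('candidate matching the ground truth:', matches)
```

The translation is recovered only up to scale, which is why `t_true` is normalized before the comparison.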
Camera Models ¶
Pinhole Camera Model ¶
Unified Camera Model ¶
Extended Unified Camera Model ¶
Kannala-Brandt Camera Model ¶
Double Sphere Camera Model ¶
Reference ¶
- Distortion models
- Cremers et al., The Double Sphere Camera Model
Optical Flow ¶
The goal is to find $ \Delta u, \Delta v = \underset{\Delta u, \Delta v}{\operatorname{argmin}} \|I_1(u, v) - I_2(u + \Delta u, v + \Delta v)\|^2$.
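We can minimize the error iteratively, keeping a running estimate $(u^*, v^*)$ and solving for a small update at each step: $$ \begin{align} \Delta u, \Delta v &= \underset{\Delta u, \Delta v}{\operatorname{argmin}} \|I_1(u, v) - I_2(u + u^* + \Delta u, v + v^* + \Delta v)\|^2 \\ u^* &\leftarrow u^* + \Delta u \\ v^* &\leftarrow v^* + \Delta v \end{align} $$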
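A minimal sketch of the iterative scheme, assuming each update is computed with a Gauss-Newton step on a linearized $I_2$ (in the spirit of Lucas-Kanade; the notes above do not prescribe a particular solver). The images are analytic functions invented for the demo so that $I_2$ can be sampled at sub-pixel positions without interpolation.

```python
import numpy as np

def image(u, v):
    """Smooth synthetic intensity pattern (made up for this example)."""
    return np.sin(0.3 * u) + np.cos(0.2 * v) + 0.5 * np.sin(0.1 * (u + v))

true_flow = np.array([1.3, -0.7])

def I1(u, v):
    return image(u, v)

def I2(u, v):
    # I2 is I1 shifted by true_flow, so I1(u, v) = I2(u + du, v + dv)
    return image(u - true_flow[0], v - true_flow[1])

# Patch of pixel coordinates around the tracked point
u, v = np.meshgrid(np.arange(10.0, 25.0), np.arange(10.0, 25.0))

flow = np.zeros(2)      # running estimate (u*, v*)
h = 1e-4                # step for central-difference image gradients
for _ in range(20):
    us, vs = u + flow[0], v + flow[1]
    r = (I2(us, vs) - I1(u, v)).ravel()
    gu = ((I2(us + h, vs) - I2(us - h, vs)) / (2 * h)).ravel()
    gv = ((I2(us, vs + h) - I2(us, vs - h)) / (2 * h)).ravel()
    J = np.column_stack([gu, gv])
    # Gauss-Newton step: minimize ||r + J @ delta||^2
    delta, *_ = np.linalg.lstsq(J, -r, rcond=None)
    flow += delta       # u* += du, v* += dv
    if np.linalg.norm(delta) < 1e-10:
        break

print('estimated flow:', flow)
print('true flow:', true_flow)
```

For displacements that are large relative to the image structure, this kind of local linearization is typically combined with a coarse-to-fine image pyramid.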