Multi-view Geometry

Camera Intrinsics and Extrinsics

Basics of Pinhole Camera Model

The projection is shown below, where $S$ is the sensor frame, $I$ is the image frame, $C$ is the camera frame, and $W$ is the world frame. The $I$ frame normally has only a constant offset from $C$ along the optical axis, the so-called camera constant. The equality holds in homogeneous coordinates, i.e. up to a scale factor (the depth).

$$ \begin{align} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = T_{SC} T_{IC} T_{CW} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \end{align} $$

The extrinsic parameters are given by $T_{CW}$, which has 6 DoF: 3 for position and 3 for orientation. $O^W_C$ is the world origin expressed in the camera frame $C$, and $O^C_W$ is the camera origin expressed in the world frame $W$.

$$ \begin{align} T_{CW} = \begin{bmatrix} R_{CW} & O^W_{C} \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} R_{CW} & -R_{CW} O^C_W \\ 0 & 1 \end{bmatrix} \end{align} $$

The intrinsic parameters are $T_{SC} T_{IC}$, ignoring non-linear distortions. If we take the image plane to lie in the negative $z$ direction, then the camera constant $c$ is negative. $x_H$ and $y_H$ are the translation from the image frame origin to the sensor frame origin. A non-ideal sensor additionally introduces a shear $s$ and a differential scale $m$ between the axes; these are absorbed into the calibration matrix $K$ below, with $c_x = c$, $c_y = (1+m)c$, and $s_{xy} = sc$.

$$ \begin{align} T_{SC} T_{IC} & = \begin{bmatrix} 1 & 0 & x_H \\ 0 & 1 & y_H \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} c & 0 & 0 \\ 0 & c & 0 \\ 0 & 0 & 1 \end{bmatrix} \\ & = \begin{bmatrix} 1 & s & x_H \\ 0 & 1+m & y_H \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} c & 0 & 0 \\ 0 & c & 0 \\ 0 & 0 & 1 \end{bmatrix} \end{align} $$

$$ \begin{align} K = T_{SC} T_{IC} = \begin{bmatrix} c_x & s_{xy} & x_H \\ 0 & c_y & y_H \\ 0 & 0 & 1 \end{bmatrix} \end{align} $$

Non-linear distortion can be caused by an imperfect lens, so that each point projected onto the sensor plane is shifted slightly depending on its position. $x_u$ and $y_u$ are the undistorted projected coordinates on the normalized image plane at $depth = 1$ ($x_u = \frac{X_C}{Z_C}, y_u = \frac{Y_C}{Z_C}$, where $Z_C$ is the depth). $q$ collects the parameters of the distortion model, such as radial (barrel) distortion, tangential distortion, etc.

$$ \begin{align} x_d =& x_u + \Delta(x_u, q) \\ y_d =& y_u + \Delta(y_u, q) \end{align} $$

The distortion can equivalently be written as a position-dependent multiplicative factor, which is the form used for the iterative inversion below:

$$ \begin{align} x_d = H(x_u)x_u \end{align} $$

$$ \begin{align} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = T_{SC} T_{IC} \begin{bmatrix} x_d \\ y_d \\ 1 \end{bmatrix} \end{align} $$
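
Below is a small sketch of the distortion step, assuming a purely radial polynomial model with made-up coefficients $q = (k_1, k_2)$; it illustrates the idea rather than any specific model assumed in these notes.

```python
def distort(x_u, y_u, k1, k2):
    """Apply Delta(x, q) = (k1*r^2 + k2*r^4) * x to normalized coordinates."""
    r2 = x_u ** 2 + y_u ** 2
    factor = 1.0 + k1 * r2 + k2 * r2 ** 2   # assumed radial-only model
    return factor * x_u, factor * y_u

# For k1 < 0 (barrel distortion) points are pulled toward the image center.
print(distort(-0.4, 0.0, k1=-0.1, k2=0.01))
```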

Mapping

Inverse map from $uv$ to $x_d$

$$ \begin{align} x_d = K^{-1}\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \end{align} $$

Inverse map from $x_d$ to $x_u$

$$ \begin{align} [x_{u}]_{i+1}= [H([x_{u}]_i)]^{-1}x_d \end{align} $$

Inverse map from $x_u$ to $X_C$

$$ \begin{align} X_C = \lambda x_u \end{align} $$

$$ \begin{align} X_W = O^C_W + R_{CW}^{-1} X_C \end{align} $$

Where $\lambda$ is the depth.
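
The snippet below is a reconstruction of the projection / back-projection round trip whose printout is shown after it; the intrinsics ($c = -500$, $x_H = y_H = 200$) and the extrinsic translation are assumptions chosen to be consistent with those numbers (up to print formatting).

```python
import numpy as np

# Assumed intrinsics: negative camera constant (image plane on the negative z side).
K = np.array([[-500., 0., 200.],
              [0., -500., 200.],
              [0., 0., 1.]])
# Assumed extrinsics: identity rotation, world origin at [1, 0, 0] in the camera frame.
T_cw = np.eye(4)
T_cw[:3, 3] = [1., 0., 0.]

print("Map world points to sensor frame")
p_w = np.array([1, 0, -5, 1])
p_c = (T_cw @ p_w)[:3]
depth = p_c[2]
p_s = K @ (p_c / depth)              # normalize by depth, then apply K
print("p_w:", p_w)
print("p_c:", p_c, "depth:", depth)
print("p_s:", p_s)

print("\nMap sensor points to world frame")
ray = np.linalg.inv(K) @ p_s         # point on the normalized plane (depth = 1)
p_c = depth * ray                    # scale by the (known) depth lambda
p_w = np.linalg.inv(T_cw) @ np.append(p_c, 1.0)
print("ray:", ray)
print("p_c:", p_c)
print("p_w:", p_w)
```
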

Map world points to sensor frame
p_w: [ 1  0 -5  1]
p_c: [ 2  0 -5] depth: -5
p_s: [400. 200.   1.]

Map sensor points to world frame
ray: [-0.4  0.   1. ]
p_c: [ 2. -0. -5.]
p_w: [ 1  0 -5  1]

Calibration

DLT: Direct Linear Transform

Key ideas:

  • Projection matrix $P = T_{SC} T_{IC} T_{CW}$ has 11 DoF.
  • Given a 2D-to-3D correspondence, we can get two constraints for solving P.
  • We need at least 6 correspondences so that we will have 12 constraints to solve $P$ with 11 DoF.
  • Use SVD to find the solution: the right singular vector corresponding to the smallest singular value (see the sketch after this list).
  • $P = [KR | -KRO^C_W]$, so once we have $P$, we can recover $KR$ and the camera center $O^C_W$.
  • Use QR decomposition on $(KR)^{-1}$ to find $K$ and $R$.
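
A minimal sketch of these two steps, assuming X (N x 3 world points) and x (N x 2 pixels) as inputs; the sign handling at the end is only the minimal fix needed to obtain a positive-diagonal $K$.

```python
import numpy as np

def dlt_projection_matrix(X, x):
    rows = []
    for (Xw, Yw, Zw), (u, v) in zip(X, x):
        Xh = [Xw, Yw, Zw, 1.0]
        # Two constraints per correspondence: p1.Xh - u*(p3.Xh) = 0 and
        # p2.Xh - v*(p3.Xh) = 0, with P = [p1; p2; p3] flattened row-major.
        rows.append(np.r_[Xh, np.zeros(4), [-u * c for c in Xh]])
        rows.append(np.r_[np.zeros(4), Xh, [-v * c for c in Xh]])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    # Right singular vector with the smallest singular value.
    return Vt[-1].reshape(3, 4)

def decompose_projection(P):
    A, a = P[:, :3], P[:, 3]
    O = -np.linalg.solve(A, a)              # camera center: a = -A O
    # A = KR, so QR of A^{-1} = R^T K^{-1} gives an orthogonal factor (R^T)
    # and an upper-triangular factor (K^{-1}).
    Q, U = np.linalg.qr(np.linalg.inv(A))
    S = np.diag(np.sign(np.diag(U)))        # force K to have a positive diagonal
    K, R = np.linalg.inv(S @ U), S @ Q.T
    if np.linalg.det(R) < 0:                # P is only defined up to sign
        R = -R
    return K / K[2, 2], R, O
```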

Zhang's homography approach

Key ideas:

  • Use a planar object so that $Z_W$ is always 0.
  • Instead of getting $P$, we are solving equations to get homography matrix $H$.
  • We need at least 4 points on each image to solve $H$.
  • Each homography gives 2 constraints on the symmetric matrix $B = K^{-T}K^{-1}$.
  • $B$ has 6 unknowns (defined up to scale), so we need at least 3 images, each giving 2 constraints on $B$.
  • Find $B$, then recover $K$ from its Cholesky factorization, as in the sketch after this list.
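
A compact sketch of this intrinsic estimation step; the homographies are assumed to be already estimated, and here they are synthesized from a made-up K_true and three made-up plane poses so that the snippet runs on its own.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def v_ij(H, i, j):
    # Row such that v_ij . b = h_i^T B h_j with b = [B11, B12, B22, B13, B23, B33].
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0]*hj[0],
                     hi[0]*hj[1] + hi[1]*hj[0],
                     hi[1]*hj[1],
                     hi[2]*hj[0] + hi[0]*hj[2],
                     hi[2]*hj[1] + hi[1]*hj[2],
                     hi[2]*hj[2]])

def intrinsics_from_homographies(Hs):
    rows = []
    for H in Hs:
        rows.append(v_ij(H, 0, 1))                  # h1^T B h2 = 0
        rows.append(v_ij(H, 0, 0) - v_ij(H, 1, 1))  # h1^T B h1 = h2^T B h2
    _, _, Vt = np.linalg.svd(np.stack(rows))
    b = Vt[-1]
    B = np.array([[b[0], b[1], b[3]],
                  [b[1], b[2], b[4]],
                  [b[3], b[4], b[5]]])
    if B[0, 0] < 0:             # B = K^{-T} K^{-1} is only defined up to scale/sign
        B = -B
    L = np.linalg.cholesky(B)   # B = L L^T  =>  K = L^{-T} up to scale
    K = np.linalg.inv(L).T
    return K / K[2, 2]

# Synthetic check with an assumed K_true and three plane poses.
K_true = np.array([[500., 2., 320.], [0., 520., 240.], [0., 0., 1.]])
Hs = []
for rv, t in [([0.1, 0.2, 0.0], [0.1, 0.0, 3.0]),
              ([-0.3, 0.1, 0.2], [0.0, 0.2, 4.0]),
              ([0.2, -0.4, 0.1], [-0.1, 0.1, 3.5])]:
    R = Rotation.from_rotvec(rv).as_matrix()
    Hs.append(K_true @ np.column_stack([R[:, 0], R[:, 1], t]))  # H = K [r1 r2 t]
print(intrinsics_from_homographies(Hs).round(2))                # ~K_true
```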

Non-linear Optimization with Gauss-Newton or LM

Key ideas:

  • Use Zhang's approach for initialization.
  • $K, q, R_i, t_i = \underset{K, q, R_i, t_i}{\operatorname{argmin}} {\sum_n\sum_i}\|x_{ni} - \hat{x}(K, q, R_i, t_i, X_{ni})\|^2$.
  • Note that we can set a corner of the planar target as the world origin $(0, 0, 0)$. A minimal sketch of this refinement step is given below.
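
A minimal sketch of the refinement, assuming a distortion-free pinhole model and a single view of a synthetic non-planar point cloud for brevity; a real calibration would sum the reprojection error over all images of the planar target and also refine $q$. The parameter layout and all values are made up.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(params, X_w):
    # Assumed parameter layout: [fx, fy, cx, cy, rotvec (3), t (3)].
    fx, fy, cx, cy = params[:4]
    R = Rotation.from_rotvec(params[4:7]).as_matrix()
    X_c = X_w @ R.T + params[7:10]
    x = X_c[:, :2] / X_c[:, 2:3]                      # normalized image plane
    return np.column_stack([fx * x[:, 0] + cx, fy * x[:, 1] + cy])

def residuals(params, X_w, x_obs):
    return (project(params, X_w) - x_obs).ravel()     # reprojection error

# Synthetic ground truth and observations.
rng = np.random.default_rng(1)
X_w = rng.uniform(-0.5, 0.5, (25, 3))
true = np.r_[500.0, 500.0, 320.0, 240.0, 0.1, -0.2, 0.05, 0.0, 0.1, 2.0]
x_obs = project(true, X_w)

# Perturbed initialization (in practice this comes from Zhang's approach).
init = np.r_[480.0, 490.0, 300.0, 230.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.8]
sol = least_squares(residuals, init, args=(X_w, x_obs), method="lm")
print(np.round(sol.x, 3))                             # should be close to `true`
```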

Perspective-n-Points

Given $n$ points in the 3D world and their corresponding observations in the 2D image, find the extrinsic pose of a calibrated camera.

P3P

Key ideas:

  • There are up to 4 solutions in front of the camera, so we need one additional point to resolve the ambiguity (see the usage sketch below).
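
A hedged usage sketch with OpenCV's solvePnP; the 3D points, intrinsics, and ground-truth pose are made-up values, and the P3P flag expects exactly 4 correspondences (the 4th one resolves the ambiguity mentioned above).

```python
import numpy as np
import cv2

obj = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
rvec_gt = np.array([0.1, -0.2, 0.05])   # made-up ground-truth pose
tvec_gt = np.array([0.2, -0.1, 4.0])

# Synthesize the 2D observations by projecting with the ground-truth pose.
img_pts, _ = cv2.projectPoints(obj, rvec_gt, tvec_gt, K, None)

ok, rvec, tvec = cv2.solvePnP(obj, img_pts, K, None, flags=cv2.SOLVEPNP_P3P)
print(ok, rvec.ravel(), tvec.ravel())   # should reproduce rvec_gt, tvec_gt
```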

Fundamental Matrix and Essential Matrix

Fundamental Matrix

Since $O_1O_2$, $O_1X$, and $O_2X$ are coplanar, i.e. $O_1X \cdot (O_1O_2 \times O_2X) = 0$, we can get

$$ \begin{align} x_1^T (R_1^{-1}K_1^{-1})^T S_{b_{12}} R_2^{-1}K_2^{-1}x_2 = 0 \end{align} $$

where $S_{b_{12}}$ is the skew-symmetric matrix of the baseline vector $b_{12} = O_1O_2$.

$$ \begin{align} x_1^TF_{12}x_2 &= 0 \\ F_{12} &= K_1^{-T}R_1^{-T} S_{b_{12}} R_2^{-1}K_2^{-1} \end{align} $$

Now suppose we have projection matrices $P_i = [A_i | a_i] = [K_iR_i | -K_iR_iO_i]$. Since $O_i = -A_i^{-1}a_i$, the baseline is $b_{12} = O_1O_2 = A_1^{-1}a_1 - A_2^{-1}a_2$ (its overall sign only flips the sign of $F$, which does not affect the constraint). Then $$ \begin{align} F_{12} = A_1^{-T}S_{b_{12}}A_2^{-1} \end{align} $$
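
A numeric sketch of this formula; P1 and P2 are assumed to be full-rank 3x4 projection matrices, and the intrinsics and check point below are made up.

```python
import numpy as np

def skew(b):
    return np.array([[0, -b[2], b[1]],
                     [b[2], 0, -b[0]],
                     [-b[1], b[0], 0]])

def fundamental_from_projections(P1, P2):
    A1, a1 = P1[:, :3], P1[:, 3]
    A2, a2 = P2[:, :3], P2[:, 3]
    b12 = np.linalg.solve(A1, a1) - np.linalg.solve(A2, a2)   # O_2 - O_1 (up to sign)
    return np.linalg.inv(A1).T @ skew(b12) @ np.linalg.inv(A2)

# Check x1^T F x2 = 0 on a point seen by two simple synthetic cameras.
K = np.diag([500., 500., 1.]); K[:2, 2] = [320., 240.]
P1 = K @ np.c_[np.eye(3), np.zeros(3)]               # camera 1 at the world origin
P2 = K @ np.c_[np.eye(3), np.array([-1., 0., 0.])]   # camera 2 shifted along x
F = fundamental_from_projections(P1, P2)
X = np.array([0.3, -0.2, 5.0, 1.0])
x1, x2 = P1 @ X, P2 @ X
print(x1 @ F @ x2)                                   # ~0 up to numerical precision
```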

Essential Matrix

$$ \begin{align} x_1^TK_1^{-T}E_{12}K_2^{-1}x_2 &= 0 \\ F &= K_1^{-T}E_{12}K_2^{-1} \\ E_{12} &= R_1^{-T} S_{b_{12}} R_2^{-1} \end{align} $$

Epipolar Constraint

Important elements:

  • Epipolar axis: $b_{12} = O_1O_2$
  • Epipolar plane: $O_1O_2X$
  • Epipoles: For image 1, the epipole is the projection of the camera center $O_2$ onto image 1.
    • It is also the intersection of the baseline $O_1O_2$ with the image 1 plane.
    • The epipole in image 1 lies in the null space of $F_{12}^T$, because $e_1^TF_{12}x_2 = 0$ holds for every $x_2$.
      • It can be obtained from the eigen decomposition of $F_{12}^T$ (or the SVD of $F_{12}$) as the vector with zero eigenvalue / singular value, as in the sketch after this list.
      • It follows that the fundamental matrix $F$ has rank 2.
  • Epipolar line: For image 1, the epipolar line is the projection of the ray $O_2X$ onto image 1.
    • It is also the intersection of the epipolar plane $O_1O_2X$ with the image 1 plane.
    • $F_{12}x_2$ is the epipolar line in image 1, since $x_1$ lies on it: $x_1^TF_{12}x_2 = 0$.
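
A small sketch of recovering the epipoles and epipolar lines; here the SVD of $F_{12}$ is used instead of an eigen decomposition, which is equivalent for the null vectors of a rank-2 matrix and numerically more convenient.

```python
import numpy as np

def epipoles(F12):
    U, _, Vt = np.linalg.svd(F12)
    e1 = U[:, -1]     # left null vector:  F12^T e1 = 0 -> epipole in image 1
    e2 = Vt[-1]       # right null vector: F12 e2 = 0   -> epipole in image 2
    return e1 / e1[2], e2 / e2[2]

def epipolar_line_in_image1(F12, x2):
    return F12 @ x2   # all matching x1 satisfy x1^T (F12 x2) = 0
```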

Direct Solution for Estimating Fundamental Matrix and Essential Matrix

Key ideas:

  • Use DLT to solve $F$, similarly to estimating the homography matrix $H$, from $x_1^TF_{12}x_2 = 0$.
    • $F$ has 8 unknowns (9 entries up to scale), so we need at least 8 correspondences.
    • $Rank(F) = 2$
      • $F = USV^T$, then $F \approx U\hat{S}V^T$ where $\hat{S}$ is obtained by setting the smallest singular value to 0.
    • In practice, normalizing the pixel coordinates makes the solution numerically more stable (see the sketch after this list).
  • Use DLT to solve $E$ in the same way, but with calibrated coordinates $\bar{x} = K^{-1}x$: $\bar{x}_1^TE_{12}\bar{x}_2 = 0$.
    • $E$ also has 8 unknowns up to scale, so we need at least 8 correspondences.
    • $Rank(E) = 2$
      • $E = USV^T$, then $E \approx U\hat{S}V^T$ where $\hat{S}$ is obtained by setting the smallest singular value to 0.
      • We also need to make the first two singular values equal; they are usually both set to 1 (or to their mean), since a valid essential matrix has two equal non-zero singular values.
  • 5-point algorithm for finding $E$, typically used inside RANSAC.
    • RANSAC idea:
      • Minimal number of points to fit a model: $s$
      • Outlier ratio: $e$
      • Probability of drawing a single inlier: $1-e$
      • Probability of drawing $s$ inliers: $(1-e)^s$
      • Probability that one sample fails (contains at least one outlier): $1-(1-e)^s$
      • Probability that all $T$ samples fail: $(1-(1-e)^s)^T$
      • If we want the success probability to be at least $p$, we need $(1-(1-e)^s)^T \leq 1-p$.
      • $T = \frac{\log(1-p)}{\log(1-(1-e)^s)}$ (computed in the sketch after this list)
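
A sketch of the normalized eight-point estimate of $F$ under the convention $x_1^TF_{12}x_2 = 0$ used above, together with the RANSAC iteration count; x1 and x2 are assumed to be N x 2 arrays of matched pixel coordinates.

```python
import numpy as np

def normalize(pts):
    """Similarity transform: zero mean, average distance sqrt(2) from the origin."""
    mean = pts.mean(axis=0)
    scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
    T = np.array([[scale, 0, -scale * mean[0]],
                  [0, scale, -scale * mean[1]],
                  [0, 0, 1.0]])
    return np.column_stack([pts, np.ones(len(pts))]) @ T.T, T

def eight_point(x1, x2):
    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    # Each correspondence gives one row of A f = 0 with f = vec(F) (row-major).
    A = np.stack([np.kron(a, b) for a, b in zip(p1, p2)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank(F) = 2 by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    # Undo the normalization: (T1 x1)^T F' (T2 x2) = x1^T (T1^T F' T2) x2.
    F = T1.T @ F @ T2
    return F / np.linalg.norm(F)

def ransac_iterations(p=0.99, e=0.5, s=8):
    """Number of samples T so that P(at least one all-inlier sample) >= p."""
    return int(np.ceil(np.log(1 - p) / np.log(1 - (1 - e) ** s)))
```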

Find $R$ and $t$ from $E$

  • $E_{12} = R_{21} [t_{12}]_\times $ which assumes camera 2 is at the world origin.
  • $E_{12} = [t_{12}]_\times R_{12}^T $ which assumes camera 1 is at the world origin.
  • There are 4 candidate solutions; the valid one puts triangulated points in front of both cameras (cheirality check), as in the sketch below.
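
A sketch of the standard SVD-based recovery of the four $(R, t)$ candidates; the cheirality check that selects the valid candidate is only indicated in a comment.

```python
import numpy as np

def decompose_essential(E):
    U, _, Vt = np.linalg.svd(E)
    # Keep proper rotations (flipping U or Vt only changes the sign of E).
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0., -1., 0.],
                  [1., 0., 0.],
                  [0., 0., 1.]])
    R_a, R_b = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]                     # baseline direction, up to sign and scale
    # Triangulate one correspondence with each candidate and keep the one that
    # puts the point in front of both cameras (positive depth).
    return [(R_a, t), (R_a, -t), (R_b, t), (R_b, -t)]
```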

Camera Models

Pinhole Camera Model

Unified Camera Model

Extended Unified Camera Model

Kannala-Brandt Camera Model

Double Sphere Camera Model

Reference

Optical Flow

The goal is to find $ \Delta u, \Delta v = \underset{\Delta u, \Delta v}{\operatorname{argmin}} \|I_1(u, v) - I_2(u + \Delta u, v + \Delta v)\|^2$.

We can minimize the error iteratively, updating the current estimate $(u^*, v^*)$ at each step: $$ \Delta u, \Delta v = \underset{\Delta u, \Delta v}{\operatorname{argmin}} \|I_1(u, v) - I_2(u + u^* + \Delta u, v + v^* + \Delta v)\|^2, \quad u^* \leftarrow u^* + \Delta u, \quad v^* \leftarrow v^* + \Delta v $$
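
A single-patch, single-level Lucas-Kanade sketch of this iteration, assuming pure translation; the images, window size, and the synthetic shift below are made-up values.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def lucas_kanade_patch(I1, I2, u, v, win=15, iters=20):
    """Estimate (du, dv) such that I1(u, v) ~= I2(u + du, v + dv) over a window."""
    half = win // 2
    vv, uu = np.mgrid[v - half:v + half + 1, u - half:u + half + 1].astype(float)
    T = map_coordinates(I1, [vv, uu], order=1)        # template patch in image 1
    du = dv = 0.0
    for _ in range(iters):
        warped = map_coordinates(I2, [vv + dv, uu + du], order=1)
        gy, gx = np.gradient(warped)                  # image gradients of the patch
        A = np.column_stack([gx.ravel(), gy.ravel()])
        b = (T - warped).ravel()                      # residual I1 - I2(. + d)
        step, *_ = np.linalg.lstsq(A, b, rcond=None)  # Gauss-Newton update
        du, dv = du + step[0], dv + step[1]
        if np.hypot(*step) < 1e-4:
            break
    return du, dv

# Synthetic check: I2 is I1 shifted by 2 pixels in u and 1 pixel in v.
rng = np.random.default_rng(0)
I1 = gaussian_filter(rng.random((64, 64)), sigma=3)
I2 = np.roll(I1, shift=(1, 2), axis=(0, 1))
print(lucas_kanade_patch(I1, I2, u=32, v=32))         # roughly (2.0, 1.0)
```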

This blog is converted from camera.ipynb
Written on April 26, 2022