Hand-Eye Calibration Problem
- Frame Definitions and Observed Points ¶
- Hand-Eye Calibration Goal ¶
- Install packages (on Google Colab) and import packages ¶
- References: ¶
- Simulated Problem Definition ¶
- Camera Intrinsic and Extrinsic Estimation ¶
- Hand-Eye Calibration ¶
- Factor Graph based Hand-Eye Calibration Optimization ¶
Frame Definitions and Observed Points ¶
In the hand-eye calibration problem, we define the following coordinate frames:
- Base Frame : The fixed coordinate frame of the robot arm, denoted as $ \mathbf{B} $
- End-Effector Frame : The frame attached to the robot's end-effector, denoted as $ \mathbf{E} $
- Camera Frame : The coordinate frame of the camera, which is rigidly mounted to the end-effector, denoted as $ \mathbf{C} $
- World Frame : A global reference frame in which known points are defined, denoted as $ \mathbf{W} $
We assume four known points defined in the world frame:
$$ \mathbf{p}_W = \begin{bmatrix} -1 & -1 & 0 \\ 1 & -1 & 0 \\ 1 & 1 & 0 \\ -1 & 1 & 0 \end{bmatrix} $$
where each row gives the $(x, y, z)$ coordinates of a point in the world frame.
These points serve as reference points that can be used to estimate the transformation between different coordinate frames.
Hand-Eye Calibration Goal ¶
In the hand-eye calibration problem, our goal is to estimate the transformation between the camera and the end-effector, as well as the camera's intrinsic parameters.
We define the transformation from the camera frame $\mathbf{C}$ to the end-effector frame $\mathbf{E}$ as: $\mathbf{T}_{EC} \in SE(3)$. This transformation allows us to convert a point $\mathbf{p}_C$ in the camera frame to the end-effector frame using: $\mathbf{p}_E = \mathbf{T}_{EC} \mathbf{p}_C$
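To make this convention concrete, here is a minimal numpy/pytransform3d sketch (with arbitrary illustrative transforms, not the values used later in this notebook) that carries a homogeneous camera-frame point into the end-effector frame and on into the base frame:
# Illustrative only: arbitrary example transforms, not the simulation values used below
import numpy as np
from pytransform3d import rotations as pr
from pytransform3d import transformations as pt
T_E_C = pt.transform_from(
    R=pr.matrix_from_axis_angle([0, 1, 0, np.radians(90)]), p=np.array([0.1, 0.0, 0.05])
)  # camera pose expressed in the end-effector frame
T_B_E = pt.transform_from(R=np.eye(3), p=np.array([0.5, 0.2, 0.8]))  # end-effector pose in the base frame
p_C = np.array([0.0, 0.0, 1.0, 1.0])  # homogeneous point 1 m in front of the camera
p_E = T_E_C @ p_C  # the same point expressed in the end-effector frame
p_B = T_B_E @ p_E  # ... and in the robot base frame
print(p_E[:3], p_B[:3])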
In addition to the hand-eye transformation, we may also need to estimate the camera intrinsics , which describe how 3D points in the camera frame are projected onto the image plane. The camera intrinsics matrix is given by $ \mathbf{K} = \begin{bmatrix} f_x & \gamma & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$.
where:
- $f_x, f_y$ are the focal lengths in pixels,
- $c_x, c_y$ are the principal point coordinates.
- $\gamma$ is a skew parameter.
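As a quick illustration of what $\mathbf{K}$ does (a minimal numeric sketch, not part of the calibration below), projecting a camera-frame point is a matrix multiply followed by a perspective division:
# Illustrative only: example intrinsics with zero skew
import numpy as np
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
p_C = np.array([0.2, -0.1, 2.0])  # a point 2 m in front of the camera
uv_h = K @ p_C  # homogeneous pixel coordinates
u, v = uv_h[:2] / uv_h[2]  # perspective division
print(u, v)  # -> 400.0 200.0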
Note:
- In this tutorial, we assume there is no tangential or radial distortion.
- See Optimal Hand-Eye Calibration by Klaus H. Strobl and Gerd Hirzinger for more details.
Install packages (on Google Colab) and import packages ¶
# Run this first if you use Google Colab
!pip install pytransform3d
# You don't need this if you don't want to run the factor-graph based optimization.
!pip install symforce
#
# Import packages
import numpy as np
from pytransform3d import transformations as pt
from pytransform3d import rotations as pr
import random
import copy
np.set_printoptions(precision=4, suppress=True)
#
Steps to Solve the Hand-Eye Calibration Problem ¶
References: ¶
- Robot Sensor Calibration: Solving AX = XB on the Euclidean Group, Frank C. Park and Bryan J. Martin
- Optimal Hand-Eye Calibration, Klaus H. Strobl and Gerd Hirzinger
- A flexible new technique for camera calibration, Zhengyou Zhang
Simulated Problem Definition ¶
To validate the hand-eye calibration method, we define a simulated environment where a camera observes known 3D points in the world. This setup allows us to generate synthetic camera poses and project world points onto the image plane.
Defining 3D Points in the World Frame ¶
We define four 3D points in the world frame $\mathbf{W}$ that lie on the $z=0$ plane:
$$ \mathbf{p}_W = \begin{bmatrix} -1 & -1 & 0 \\ 1 & -1 & 0 \\ 1 & 1 & 0 \\ -1 & 1 & 0 \end{bmatrix} $$
To work with homogeneous coordinates, we append a row of ones:
$$ \tilde{\mathbf{p}}_W = \begin{bmatrix} -1 & 1 & 1 & -1 \\ -1 & -1 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix} $$
Camera Intrinsic Matrix ¶
The camera intrinsics matrix $\mathbf{K}$ models how 3D points are projected onto the image plane:
$$ \mathbf{K} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} $$
where:
- $f_x = 800, f_y = 800$ are the focal lengths (in pixels),
- $c_x = 320, c_y = 240$ are the principal point coordinates.
- The skew parameter $\gamma$ is 0 in this case.
Generating Camera Poses ¶
To simulate multiple views, we generate camera poses in spherical coordinates, where:
- The camera moves along a sphere of varying radii ($r \in [10, 20]$),
- The azimuth angle varies from $0^\circ$ to $360^\circ$,
- The elevation angle varies from $30^\circ$ to $60^\circ$.
For a given azimuth $\theta$ and elevation $\phi$, the camera position in Cartesian coordinates is:
$$ \mathbf{t} = \begin{bmatrix} r \cos\phi \cos\theta \\ r \cos\phi \sin\theta \\ r \sin\phi \end{bmatrix} $$
The camera orientation is determined using a look-at transformation , where:
- The forward vector points from the camera to the world origin,
- The right vector is computed as the cross-product of the world $z$-axis and the forward vector,
- The up vector is computed as the cross-product of the forward and right vectors.
The rotation matrix $\mathbf{R}_{WC}$ is then constructed as:
$$ \mathbf{R}_{WC} = \begin{bmatrix} \mathbf{r} & \mathbf{u} & \mathbf{f} \end{bmatrix} $$
where $\mathbf{r}, \mathbf{u}, \mathbf{f}$ are the right, up, and forward unit vectors, respectively.
The final transformation from the world to the camera frame is:
$$ \mathbf{T}_{WC} = \begin{bmatrix} \mathbf{R}_{WC} & \mathbf{t} \\ 0 & 1 \end{bmatrix} $$
This process generates a set of camera poses that will be used to simulate observations and test the hand-eye calibration process.
# Generate a simulated problem
# Define the 3D points in the world coordinate system
world_points = np.array([[-1, -1, 0], [1, -1, 0], [1, 1, 0], [-1, 1, 0]])
homogeneous_world_points = np.vstack((world_points.T, np.ones(world_points.shape[0])))
# Define the camera intrinsic matrix (K)
fx, fy = 800, 800 # Focal lengths
cx, cy = 320, 240 # Principal point
K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]])
def generate_camera_poses(num_r=2, num_z=4, num_e=4):
# def generate_camera_poses(num_r=4, num_z=10, num_e=10):
camera_poses = []
for r in np.linspace(10, 20, num=num_r):
for azimuth_deg in np.linspace(0, 360, num=num_z):
for elevation_deg in np.linspace(30, 60, num=num_e):
azimuth = np.radians(azimuth_deg)
elevation = np.radians(elevation_deg)
# Convert spherical to Cartesian coordinates
x = r * np.cos(elevation) * np.cos(azimuth)
y = r * np.cos(elevation) * np.sin(azimuth)
z = r * np.sin(elevation)
# Camera position
t = np.array([x, y, z])
# Camera rotation matrix
                # Note: the camera's +z axis points from the camera toward the world origin (look-at)
forward = -t / np.linalg.norm(t) # z
right = np.cross([0, 0, 1], forward) # x
right /= np.linalg.norm(right)
up = np.cross(forward, right) # y
R = np.vstack([right, up, forward]).T
# Construct transformation matrix
T_W_C = pt.transform_from(R, t)
camera_poses.append(T_W_C)
return camera_poses
#
Camera Intrinsic and Extrinsic Estimation ¶
To estimate both the camera intrinsics and extrinsics, we apply Zhang's method [A flexible new technique for camera calibration], which involves the following steps:
- Estimating Homographies : We compute the transformation between the world and image planes for multiple camera views.
- Calibrating Camera Intrinsics : Using homographies, we solve for the camera intrinsic matrix $\mathbf{K}$.
- Estimating Camera Extrinsics : Given $\mathbf{K}$ and the homographies, we recover the camera pose for each view.
Homography Estimation ¶
A homography is a $3 \times 3$ matrix $\mathbf{H}$ that describes the transformation between two planes in projective space. In the context of camera calibration, it maps 3D world points that lie on a known plane (e.g., the $Z = 0$ plane in our simulated problem) to their corresponding 2D image points :
$$ \tilde{\mathbf{p}}_I = \mathbf{H} \tilde{\mathbf{p}}_W $$
where:
- $\tilde{\mathbf{p}}_W = (x, y, 1)^\top$ is the homogeneous coordinate of a point in the world frame.
- $\tilde{\mathbf{p}}_I = (u, v, 1)^\top$ is the corresponding image point in homogeneous coordinates.
- $\mathbf{H}$ is the $3 \times 3$ homography matrix.
Since all world points lie on the $Z=0$ plane , their 3D coordinates simplify to:
$$ \mathbf{p}_W = (x, y, 0)^\top $$
The camera projection equation is given by:
$$ \tilde{\mathbf{p}}_I = \mathbf{K} [\mathbf{R} \ | \ \mathbf{t}] \tilde{\mathbf{p}}_W $$
where:
- $\mathbf{K}$ is the intrinsic matrix , which defines how 3D points are projected onto the image plane.
- $[\mathbf{R} \ | \ \mathbf{t}]$ is the extrinsic transformation , with $\mathbf{R} \in SO(3)$ as the rotation matrix and $\mathbf{t} \in \mathbb{R}^3$ as the translation vector.
- Since world points are on the $Z=0$ plane, the third column of $\mathbf{R}$ is irrelevant in this mapping.
Thus, we can rewrite the projection equation by extracting only the first two columns of $\mathbf{R}$:
$$ \mathbf{H} = \mathbf{K} [\mathbf{r}_1 \quad \mathbf{r}_2 \quad \mathbf{t}] $$
where:
- $\mathbf{r}_1$ and $\mathbf{r}_2$ are the first two columns of the rotation matrix $\mathbf{R}$.
- $\mathbf{t}$ is the translation vector.
- The homography $\mathbf{H}$ encodes both the intrinsic ($\mathbf{K}$) and extrinsic ($\mathbf{R}, \mathbf{t}$) parameters of the camera.
Normalization ¶
To improve numerical stability, we normalize the points to have zero mean and a spread of roughly $\sqrt{2}$ before estimating the homography. This ensures that the coordinate values are well-conditioned for numerical computations.
Given a set of points $\mathbf{p}$ with centroid $(\bar{x}, \bar{y})$ and a scale factor $s$, we compute a normalization transformation matrix $\mathbf{T}$:
$$ \mathbf{T} = \begin{bmatrix} s & 0 & -s \bar{x} \\ 0 & s & -s \bar{y} \\ 0 & 0 & 1 \end{bmatrix} $$
Before estimating the homography, we apply this normalization process separately to the world points and image points , yielding two transformation matrices:
The transformation matrix $\mathbf{T}_{\text{world}}$ normalizes the 2D world points $(x, y)$:
$$ \mathbf{T}_{\text{world}} = \begin{bmatrix} s_W & 0 & -s_W \bar{x}_W \\ 0 & s_W & -s_W \bar{y}_W \\ 0 & 0 & 1 \end{bmatrix} $$
Applying this transformation:
$$ \tilde{\mathbf{p}}_W' = \mathbf{T}_{\text{world}} \tilde{\mathbf{p}}_W $$
Similarly, the transformation matrix $\mathbf{T}_{\text{pixel}}$ normalizes the 2D image points $(u, v)$:
$$ \mathbf{T}_{\text{pixel}} = \begin{bmatrix} s_I & 0 & -s_I \bar{u} \\ 0 & s_I & -s_I \bar{v} \\ 0 & 0 & 1 \end{bmatrix} $$
Applying this transformation:
$$ \tilde{\mathbf{p}}_I' = \mathbf{T}_{\text{pixel}} \tilde{\mathbf{p}}_I $$
After applying these transformations, the normalized world points $\tilde{\mathbf{p}}_W'$ and image points $\tilde{\mathbf{p}}_I'$ are used to compute the homography $\mathbf{H}_{\text{norm}}$. The final homography is then denormalized :
$$ \mathbf{H} = \mathbf{T}_{\text{pixel}}^{-1} \mathbf{H}_{\text{norm}} \mathbf{T}_{\text{world}} $$
This process ensures better numerical stability in solving for the homography using singular value decomposition (SVD) .
# Point Normalization
def normalize_points(points):
"""Normalize points to have zero mean and unit variance."""
mean = np.mean(points, axis=0)
std_dev = np.std(points)
scale = np.sqrt(2) / std_dev
T = np.array(
[[scale, 0, -scale * mean[0]], [0, scale, -scale * mean[1]], [0, 0, 1]]
)
normalized_points = (T @ np.vstack((points.T, np.ones(points.shape[0]))))[:2].T
return normalized_points, T
#
DLT for homography estimation ¶
Using normalized world points $\mathbf{p}_W$ and image points $\mathbf{p}_I$, we construct a direct linear transformation (DLT) system.
$$ \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} $$
$$ u = \frac{h_{11} x + h_{12} y + h_{13}}{h_{31} x + h_{32} y + h_{33}} $$
$$ v = \frac{h_{21} x + h_{22} y + h_{23}}{h_{31} x + h_{32} y + h_{33}} $$
$$ h_{11} x + h_{12} y + h_{13} - u h_{31} x - u h_{32} y - u h_{33} = 0 $$
$$ h_{21} x + h_{22} y + h_{23} - v h_{31} x - v h_{32} y - v h_{33} = 0 $$
Each point correspondence $(x, y) \rightarrow (u, v)$ contributes two equations. To solve for all elements of $\mathbf{H}$, we stack these equations for multiple points into a linear system :
$$ A \mathbf{h} = 0 $$
where $\mathbf{h}$ is the 9-element vector formed by stacking the rows of $\mathbf{H}$.
The system matrix $A$ for each point correspondence $(x, y) \rightarrow (u, v)$ is:
$$ A = \begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -u x & -u y & -u \\ 0 & 0 & 0 & x & y & 1 & -v x & -v y & -v \end{bmatrix} $$
By stacking these equations for multiple points and solving the homogeneous linear system using singular value decomposition (SVD) , we obtain the best estimate for $\mathbf{H}$.
Note: since $\mathbf{H}$ has 8 degrees of freedom and each point correspondence provides 2 constraints, at least 4 correspondences are required. The points must be in general position: no three of them may be collinear, otherwise the system becomes degenerate.
# Homography estimation
def estimate_homography(world_points, pixel_points):
"""Estimate homography using normalized points."""
world_norm, T_world = normalize_points(world_points[:, :2])
pixel_norm, T_pixel = normalize_points(pixel_points.T)
A = []
for (x, y), (u, v) in zip(world_norm, pixel_norm):
A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
A = np.array(A)
# Perform SVD and take the last row of V^T
_, _, Vt = np.linalg.svd(A)
H_norm = Vt[-1].reshape(3, 3)
# Denormalize the homography matrix
H = np.linalg.inv(T_pixel) @ H_norm @ T_world
# Normalize so that H[2,2] = 1
H /= H[2, 2]
return H
#
Camera Intrinsic Calibration ¶
Once multiple homographies $\mathbf{H}$ are computed, we estimate the camera intrinsic matrix $\mathbf{K}$.
The homography relates to intrinsics and extrinsics:
$$ \mathbf{H} = \mathbf{K} [ \mathbf{r}_1 \quad \mathbf{r}_2 \quad \mathbf{t} ] $$
where $\mathbf{r}_1, \mathbf{r}_2$ are the first two columns of the rotation matrix, and $\mathbf{t}$ is the translation vector.
- Constraint 1: Orthogonality of $\mathbf{r}_1$ and $\mathbf{r}_2$
The rotation columns must be orthogonal, meaning:
$$ \mathbf{r}_1^\top \mathbf{r}_2 = 0 $$
Expanding in terms of $\mathbf{H}$:
$$ (\mathbf{K}^{-1} \mathbf{h}_1)^\top (\mathbf{K}^{-1} \mathbf{h}_2) = 0 $$
Rearranging:
$$ \mathbf{h}_1^\top \mathbf{K}^{-\top} \mathbf{K}^{-1} \mathbf{h}_2 = 0 $$
- Constraint 2: Equality of Norms of $\mathbf{r}_1$ and $\mathbf{r}_2$
Since the columns of a rotation matrix have unit norms:
$$ \mathbf{r}_1^\top \mathbf{r}_1 = \mathbf{r}_2^\top \mathbf{r}_2 $$
Expanding in terms of $\mathbf{H}$:
$$ \mathbf{h}_1^\top \mathbf{K}^{-\top} \mathbf{K}^{-1} \mathbf{h}_1 = \mathbf{h}_2^\top \mathbf{K}^{-\top} \mathbf{K}^{-1} \mathbf{h}_2 $$
These two constraints hold for each homography and will allow us to estimate $\mathbf{K}$.
To simplify notation, we define:
$$ \mathbf{B} = \mathbf{K}^{-\top} \mathbf{K}^{-1} $$
Since $\mathbf{B}$ is symmetric ($B = B^T$) , it has the form:
$$ \mathbf{B} = \begin{bmatrix} B_{00} & B_{01} & B_{02} \\ B_{01} & B_{11} & B_{12} \\ B_{02} & B_{12} & B_{22} \end{bmatrix} $$
Substituting $\mathbf{B}$ into our constraints:
- Orthogonality Condition:
$$ \mathbf{h}_1^\top \mathbf{B} \mathbf{h}_2 = 0 $$
- Equality of Norms Condition:
$$ \mathbf{h}_1^\top \mathbf{B} \mathbf{h}_1 = \mathbf{h}_2^\top \mathbf{B} \mathbf{h}_2 $$
These provide two linear equations for each homography $\mathbf{H}$.
Let the vector $\mathbf{b}$ represent the six unknown elements of the intrinsic constraint matrix $\mathbf{B}$:
$$ \mathbf{b} = \begin{bmatrix} B_{00} \\ B_{01} \\ B_{11} \\ B_{02} \\ B_{12} \\ B_{22} \end{bmatrix} $$
The term $v_{12}$ is derived from the orthogonality constraint of the rotation columns:
$$ \mathbf{h}_1^\top \mathbf{B} \mathbf{h}_2 = 0 $$
Expanding this gives the constraint vector:
$$ v_{12} = \begin{bmatrix} h_{11} h_{12} \\ h_{11} h_{22} + h_{21} h_{12} \\ h_{21} h_{22} \\ h_{11} h_{32} + h_{31} h_{12} \\ h_{21} h_{32} + h_{31} h_{22} \\ h_{31} h_{32} \end{bmatrix} $$
which contributes one row in the constraint matrix $V$ as $v_{12}^T b = 0$.
The terms $v_{11}$ and $v_{22}$ are derived from the equal-norm constraint:
$$ \mathbf{h}_1^\top \mathbf{B} \mathbf{h}_1 = \mathbf{h}_2^\top \mathbf{B} \mathbf{h}_2 $$
Expanding these gives:
$$ v_{11} = \begin{bmatrix} h_{11}^2 \\ 2 h_{11} h_{21} \\ h_{21}^2 \\ 2 h_{11} h_{31} \\ 2 h_{21} h_{31} \\ h_{31}^2 \end{bmatrix} v_{22} = \begin{bmatrix} h_{12}^2 \\ 2 h_{12} h_{22} \\ h_{22}^2 \\ 2 h_{12} h_{32} \\ 2 h_{22} h_{32} \\ h_{32}^2 \end{bmatrix} $$
which provides another row in the constraint matrix $V$ as $(v_{11} - v_{22})^\top \mathbf{b} = 0$
We solve for $V \mathbf{b} = 0$ using SVD . The solution is obtained as the singular vector corresponding to the smallest singular value .
Finally, we extract the intrinsic parameters:
- $v_0 = \frac{B_{01} B_{02} - B_{00} B_{12}}{B_{00} B_{11} - B_{01}^2}$ (principal point $y$-coordinate)
- $\lambda = B_{22} - \frac{B_{02}^2 + v_0 (B_{01} B_{02} - B_{00} B_{12})}{B_{00}}$
- $\alpha = \sqrt{\lambda / B_{00}}$ (focal length in $x$)
- $\beta = \sqrt{\lambda B_{00} / (B_{00} B_{11} - B_{01}^2)}$ (focal length in $y$)
- $\gamma = -B_{01} \alpha^2 \beta / \lambda$ (skew factor)
- $u_0 = \gamma v_0 / \beta - B_{02} \alpha^2 / \lambda$ (principal point $x$-coordinate)
The intrinsic matrix is then:
$$ \mathbf{K} = \begin{bmatrix} \alpha & \gamma & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix} $$
Note: there may be degenerate configurations, which is why we check the condition number of the system when solving for $\mathbf{B}$.
# Calibrate intrinsics
def calibrate_intrinsics(homographies, debug=False):
# Number of homographies
n = len(homographies)
assert n >= 2, "At least two homographies are required for calibration"
# Construct the constraint matrix V
V = []
for H in homographies:
h1, h2 = H[:, 0], H[:, 1]
# Compute the constraint vectors
v_12 = np.array(
[
h1[0] * h2[0],
h1[0] * h2[1] + h1[1] * h2[0],
h1[1] * h2[1],
h1[2] * h2[0] + h1[0] * h2[2],
h1[2] * h2[1] + h1[1] * h2[2],
h1[2] * h2[2],
]
)
v_11 = np.array(
[
h1[0] ** 2,
2 * h1[0] * h1[1],
h1[1] ** 2,
2 * h1[0] * h1[2],
2 * h1[1] * h1[2],
h1[2] ** 2,
]
)
v_22 = np.array(
[
h2[0] ** 2,
2 * h2[0] * h2[1],
h2[1] ** 2,
2 * h2[0] * h2[2],
2 * h2[1] * h2[2],
h2[2] ** 2,
]
)
V.append(v_12)
V.append(v_11 - v_22)
V = np.vstack(V)
# Solve for the null space of V to get b (which defines B)
_, S, Vt = np.linalg.svd(V)
if debug:
rank_V = np.linalg.matrix_rank(V)
print(f"Rank of V: {rank_V}")
print("Singular values of V:", S)
print("Condition number of V:", S[0] / S[-1])
b = Vt[-1]
# Construct the B matrix
B00, B01, B11, B02, B12, B22 = b
B = np.array([[B00, B01, B02], [B01, B11, B12], [B02, B12, B22]])
# Compute intrinsic parameters
v0 = (B01 * B02 - B00 * B12) / (B00 * B11 - B01**2)
lambda_ = B22 - (B02**2 + v0 * (B01 * B02 - B00 * B12)) / B00
alpha = np.sqrt(lambda_ / B00)
beta = np.sqrt(lambda_ * B00 / (B00 * B11 - B01**2))
gamma = -B01 * alpha**2 * beta / lambda_
u0 = gamma * v0 / beta - B02 * alpha**2 / lambda_
# Intrinsic matrix K
K = np.array([[alpha, gamma, u0], [0, beta, v0], [0, 0, 1]])
return K
#
Camera Extrinsic Estimation ¶
Given $\mathbf{K}$ and a homography $\mathbf{H}$, the extrinsic parameters (rotation $\mathbf{R}$ and translation $\mathbf{t}$) are computed as:
$$ \mathbf{M} = \mathbf{K}^{-1} \mathbf{H} $$
where:
- $\lambda$ is a scale factor, taken in the code below as the average norm of the first two columns of $\mathbf{M}$,
- $\mathbf{r}_1 = \frac{\mathbf{M}_{:,1}}{\lambda}$,
- $\mathbf{r}_2 = \frac{\mathbf{M}_{:,2}}{\lambda}$,
- $\mathbf{r}_3 = \mathbf{r}_1 \times \mathbf{r}_2$ (ensuring orthogonality),
- $\mathbf{t} = \frac{\mathbf{M}_{:,3}}{\lambda}$.
The orthonormalization step ensures $\mathbf{R}$ is a valid rotation matrix.
# Compute extrinsic parameters
def compute_extrinsics(H, K):
K_inv = np.linalg.inv(K)
M = K_inv @ H
r1 = M[:, 0]
r2 = M[:, 1]
t = M[:, 2]
# r1 and r2 should be unit vectors
lambda_ = (np.linalg.norm(r1) + np.linalg.norm(r2)) / 2
r1 = r1 / lambda_
r2 = r2 / lambda_
r3 = np.cross(r1, r2)
R = np.column_stack((r1, r2, r3))
U, _, Vt = np.linalg.svd(R)
R_ortho = U @ Vt
if np.linalg.det(R_ortho) < 0:
Vt[-1, :] *= -1 # Flip the last row of Vt
R_ortho = U @ Vt
t = t / lambda_
    # Use the orthonormalized rotation; the result is T_C_W, so invert to return the camera pose T_W_C
    return np.linalg.inv(pt.transform_from(R_ortho, t))
#
Running the Camera Calibration Simulation ¶
Using the simulated world points and camera poses, we generate image observations:
- Project world points into the image plane using $\mathbf{K}$ and camera poses $\mathbf{T}_{WC}$.
- Compute homographies from world-image correspondences.
- Estimate camera intrinsics using multiple homographies.
- Recover camera extrinsics for each view.
To validate the calibration:
- Intrinsic matrix $\mathbf{K}_{\text{tested}}$ should match the ground truth $\mathbf{K}$.
- Estimated camera poses $\mathbf{T}_{WC}^{\text{tested}}$ should align with the expected ones.
# Camera Calibration Simulation
camera_poses_expected = generate_camera_poses()
pixel_obs_list = []
for i, T_W_C in enumerate(camera_poses_expected):
# Each column is a point
points_in_C = (np.linalg.inv(T_W_C) @ homogeneous_world_points)[:3, :]
uv = K @ points_in_C
uv_normalized = uv[:2, :] / uv[2, :]
pixel_obs_list.append(uv_normalized)
# Compute homographies with normalization
homographies = []
for T_W_C, pixel_obs in zip(camera_poses_expected, pixel_obs_list):
h = estimate_homography(world_points, pixel_points=pixel_obs)
# Expected homography from intrinsic & extrinsic parameters
T_C_W = np.linalg.inv(T_W_C)
h_expected = K.dot(T_C_W[:3, [0, 1, 3]])
h_expected /= h_expected[2, 2]
# Check if estimated homography is close to expected
assert np.allclose(h, h_expected), f"{h=}, {h_expected=}"
homographies.append(h)
# Compute intrinsic parameters
K_tested = calibrate_intrinsics(homographies=homographies, debug=True)
assert np.allclose(K, K_tested), f"{K=}, {K_tested=}"
# Camera poses are expressed in world frame.
camera_poses_tested = []
for T_W_C_expected, H in zip(camera_poses_expected, homographies):
T_W_C_tested = compute_extrinsics(H=H, K=K_tested)
camera_poses_tested.append(T_W_C_tested)
assert np.allclose(T_W_C_expected, T_W_C_tested)
#
Hand-Eye Calibration ¶
Hand-Eye Calibration using $AX = XB$ ¶
I primarily follow Robot Sensor Calibration: Solving AX = XB on the Euclidean Group, except that I use SVD to solve for the rotation, similar to the approach used in point cloud alignment.
Hand-eye calibration is the process of estimating the transformation between a camera and a robot end-effector when the camera is rigidly mounted on the robot. This calibration helps in accurately relating observations from the camera to the robot's movement.
In this section, we implement the Park & Martin method to solve the classic hand-eye calibration equation :
$$ AX = XB $$
where:
- $A_i$ represents the relative transformation between two camera poses $T_{C_iC_j}$
- $B_i$ represents the relative transformation between two end-effector poses $T_{E_iE_j}$.
- $X$ is the unknown hand-eye transformation, written $T_{CE}$ here (with the convention $p_C = T_{CE}\, p_E$, matching the code below).
Estimating Rotation $\mathbf{R_{CE}}$ and $\mathbf{t_{CE}}$ ¶
Extracting the rotation component from $T_{C_i C_j} X = X T_{E_i E_j}$, we obtain: $$ R_{C_i C_j} R_{CE} = R_{CE} R_{E_i E_j} $$
Taking the logarithm map of the rotation parts (writing $R \equiv R_{CE}$):
$$ \log( R_{C_i C_j} ) = R \log( R_{E_i E_j} ) R^{-1} $$
Since the Lie algebra of rotations follows:
$$ \log( R_{C_i C_j} ) = [\alpha] $$
$$ \log( R_{E_i E_j} ) = [\beta] $$
We obtain the relationship:
$$ [\alpha] = R [\beta] R^{-1} $$
Using vector form:
$$ \alpha = R \beta $$
Given multiple transformation pairs $\alpha_i, \beta_i$, we solve for $R$ by minimizing:
$$ \sum \| \alpha_i - R \beta_i \|^2 $$
which is solved via SVD (an orthogonal Procrustes step, sketched below).
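A minimal sketch of that SVD-based solve, assuming we already have the paired log-rotation vectors $\alpha_i$ and $\beta_i$ (the full calibrate_hand_eye implementation below folds this step into the complete pipeline):
# Sketch: orthogonal Procrustes / Kabsch solve for R given paired vectors
import numpy as np
def solve_rotation(alphas, betas):
    """Find R minimizing sum_i ||alpha_i - R @ beta_i||^2."""
    M = np.zeros((3, 3))
    for a, b in zip(alphas, betas):
        M += np.outer(b, a)  # accumulate beta_i @ alpha_i^T
    U, _, Vt = np.linalg.svd(M)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:
        Vt[-1, :] *= -1  # guard against a reflection in degenerate cases
        R = Vt.T @ U.T
    return R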
Once $R_{CE}$ is known (written $R$ below), the translation part of $AX = XB$ gives:
$$ R_{C_i C_j}\, t_{CE} + t_{C_i C_j} = R\, t_{E_i E_j} + t_{CE} $$
Rearranging:
$$ (I - R_{C_i C_j})\, t_{CE} = t_{C_i C_j} - R\, t_{E_i E_j} $$
Stacking this equation over all pairs forms an overdetermined linear system, which is solved using least squares.
# calibrate hand eye
from scipy.linalg import logm
def logm_vector(T):
R = T[:3, :3]
log_R = logm(R)
return np.array([log_R[2, 1], log_R[0, 2], log_R[1, 0]])
def calibrate_hand_eye(T_Ci_Cj_list, T_Ei_Ej_list):
assert len(T_Ci_Cj_list) > 1
n_data = len(T_Ci_Cj_list)
M = np.zeros((3, 3))
C = np.zeros((3 * n_data, 3))
d = np.zeros((3 * n_data, 1))
# This is from the example in the paper
if n_data == 2:
alpha = logm_vector(T_Ci_Cj_list[0])
beta = logm_vector(T_Ei_Ej_list[0])
alpha2 = logm_vector(T_Ci_Cj_list[1])
beta2 = logm_vector(T_Ei_Ej_list[1])
alpha3 = np.cross(alpha, alpha2)
beta3 = np.cross(beta, beta2)
A_ = np.vstack([alpha, alpha2, alpha3]).T
B_ = np.vstack([beta, beta2, beta3]).T
R = A_.dot(np.linalg.inv(B_))
else:
for A, B in zip(T_Ci_Cj_list, T_Ei_Ej_list):
alpha = logm_vector(A)
beta = logm_vector(B)
M += beta.reshape(3, 1) @ alpha.reshape(1, 3)
        # Using SVD (orthogonal Procrustes) instead of the square-root formula in the paper.
        # R = scipy.linalg.sqrtm(np.linalg.inv(M.T @ M)) @ M.T
        U, _, Vt = np.linalg.svd(M)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:
            # Guard against a reflection in degenerate configurations
            Vt[-1, :] *= -1
            R = Vt.T @ U.T
for i in range(n_data):
Ra = T_Ci_Cj_list[i][0:3, 0:3]
ta = T_Ci_Cj_list[i][0:3, 3]
tb = T_Ei_Ej_list[i][0:3, 3]
C[3 * i : 3 * i + 3, :] = np.eye(3) - Ra
d[3 * i : 3 * i + 3, 0] = ta - R @ tb
    # Least squares via the normal equations; np.linalg.lstsq(C, d) is a more stable alternative
    t = np.linalg.inv(C.T @ C) @ C.T @ d
return pt.transform_from(R, t.reshape(3))
#
Run Hand-Eye Calibration Simulation ¶
To validate the hand-eye calibration method, we generate a simulation where a camera is rigidly attached to a robot end-effector . The simulation involves:
- Generating ground-truth transformations for both the camera and end-effector.
- Computing end-effector poses based on the expected transformations.
- Introducing noise to simulate real-world conditions.
- Extracting transformation pairs to solve for hand-eye calibration.
- Testing the estimated transformation against the ground truth.
# Run Hand-eye calibration Simulation
T_E_C_expected = pt.transform_from(
R=pr.matrix_from_axis_angle([0, 1, 0, np.radians(88)]),
p=np.array([0.1, 0.05, 0.05]),
)
T_C_E_expected = np.linalg.inv(T_E_C_expected)
T_W_B_expected = pt.transform_from(
R=pr.matrix_from_axis_angle([0, 0, 1, np.radians(90)]), p=np.array([10, 10, 0.05])
)
T_B_W_expected = np.linalg.inv(T_W_B_expected)
def generate_end_effector_poses(
camera_poses_expected, noise_factor=0.0, error_low_bound_m=3, error_high_bound_m=7
):
random.seed(1024)
np.random.seed(1024)
errors_expected = np.random.uniform(
error_low_bound_m * noise_factor,
error_high_bound_m * noise_factor,
(len(camera_poses_expected), 3),
)
# These are in Base frame
end_effector_poses = []
end_effector_poses_noisy = []
for i, T_W_C in enumerate(camera_poses_expected):
T_B_E = T_B_W_expected @ T_W_C @ T_C_E_expected
T_B_E_noisy = copy.deepcopy(T_B_E)
T_B_E_noisy[0:3, -1] += errors_expected[i, :]
end_effector_poses.append(T_B_E)
end_effector_poses_noisy.append(T_B_E_noisy)
return (
end_effector_poses,
end_effector_poses_noisy,
[np.array(error) for error in errors_expected],
)
def generate_AB_pairs(camera_poses, end_effector_poses, n_pairs=2, T_C_E_expected=None):
random.seed(1024)
np.random.seed(1024)
# pick n random pairs
T_Ci_Cj_list = []
T_Ei_Ej_list = []
    indices = range(len(end_effector_poses))
selected_pairs = []
while len(selected_pairs) < n_pairs:
        pair = random.sample(indices, 2)
sorted_pair = tuple(sorted(pair)) # Ensure pairs are ordered
if sorted_pair not in selected_pairs:
selected_pairs.append(sorted_pair)
for pair in selected_pairs:
T_W_Ci = camera_poses[pair[0]]
T_W_Cj = camera_poses[pair[1]]
T_Ci_Cj = np.linalg.inv(T_W_Ci) @ T_W_Cj
T_B_Ei = end_effector_poses[pair[0]]
T_B_Ej = end_effector_poses[pair[1]]
T_Ei_Ej = np.linalg.inv(T_B_Ei) @ T_B_Ej
T_Ci_Cj_list.append(T_Ci_Cj)
T_Ei_Ej_list.append(T_Ei_Ej)
if T_C_E_expected is not None:
            # When we have perfect inputs, we can check AX = XB
assert np.allclose(T_Ci_Cj @ T_C_E_expected, T_C_E_expected @ T_Ei_Ej)
return T_Ci_Cj_list, T_Ei_Ej_list
noise_factor = 1e-3
n_pairs = 2
end_effector_poses, end_effector_poses_noisy, errors_expected = (
generate_end_effector_poses(camera_poses_expected, noise_factor=noise_factor)
)
T_Ci_Cj_list, T_Ei_Ej_list = generate_AB_pairs(
camera_poses_expected,
end_effector_poses,
n_pairs=n_pairs,
T_C_E_expected=T_C_E_expected,
)
T_C_E_tested = calibrate_hand_eye(T_Ci_Cj_list, T_Ei_Ej_list)
assert np.allclose(T_C_E_tested, T_C_E_expected)
# AX = XB -> T_Ci_Cj @ T_C_E_expected = T_C_E_expected @ T_Ei_Ej
# We can also have T_Wi_Wj @ T_W_B_expected = T_W_B_expected @ T_Bi_Bj
T_Wi_Wj_list, T_Bi_Bj_list = generate_AB_pairs(
[np.linalg.inv(T_W_C) for T_W_C in camera_poses_expected],
[np.linalg.inv(T_B_E) for T_B_E in end_effector_poses],
n_pairs=2,
T_C_E_expected=T_W_B_expected,
)
T_W_B_tested = calibrate_hand_eye(T_Wi_Wj_list, T_Bi_Bj_list)
assert np.allclose(T_W_B_tested, T_W_B_expected)
T_Ci_Cj_list, T_Ei_Ej_list = generate_AB_pairs(
camera_poses_tested, end_effector_poses, n_pairs=n_pairs
)
T_C_E_tested = calibrate_hand_eye(T_Ci_Cj_list, T_Ei_Ej_list)
# This suggests that our camera_poses_tested estimates are quite accurate...
assert np.allclose(T_C_E_tested, T_C_E_expected)
# Finally, try use noisy estimation
T_Ci_Cj_list, T_Ei_Ej_list = generate_AB_pairs(
camera_poses_tested, end_effector_poses_noisy, n_pairs=n_pairs
)
T_C_E_tested = calibrate_hand_eye(T_Ci_Cj_list, T_Ei_Ej_list)
T_Wi_Wj_list, T_Bi_Bj_list = generate_AB_pairs(
[np.linalg.inv(T_W_C) for T_W_C in camera_poses_tested],
[np.linalg.inv(T_B_E) for T_B_E in end_effector_poses],
n_pairs=2,
T_C_E_expected=T_W_B_expected,
)
# or T_W_B_tested = camera_poses_tested[0] @ T_C_E_tested @ np.linalg.inv(end_effector_poses_noisy[0])
T_W_B_tested = calibrate_hand_eye(T_Wi_Wj_list, T_Bi_Bj_list)
T_B_W_tested = np.linalg.inv(T_W_B_tested)
# assert np.allclose(T_C_E_tested, T_C_E_expected)
# assert np.allclose(T_W_B_tested, T_W_B_expected)
#
Factor Graph based Hand-Eye Calibration Optimization ¶
Several degenerate cases can arise in the previous processes, such as insufficient views to estimate $H$ or $K$ and inadequate movement to determine $T_{CE}$. Additionally, real-world measurements are imperfect, with errors in pixel observations, robot arm end-effector poses, and other factors.
When performing hand-eye calibration, it is ideal to capture a diverse range of calibration board views and execute various robot arm movements. Given the limited field of view of a single camera with a single calibration board, using multiple boards placed throughout the room can improve accuracy.
To address these issues, once we have gathered enough data, we formulate the problem as an optimization task to estimate the variables of interest.
Note: In this tutorial, we will not simulate the optimization of camera intrinsics.
Import Libraries ¶
# Import symforce
import symforce
symforce.set_epsilon_to_symbol()
import numpy as np
from symforce.values import Values
import symforce.symbolic as sf
from symforce.opt.factor import Factor
from symforce.opt.optimizer import Optimizer
import sym
def symforce_from_projection_matrix(T):
    # Convert a 4x4 homogeneous transformation matrix into a sym.Pose3
    return sym.Pose3(R=sym.Rot3.from_rotation_matrix(T[0:3, 0:3]), t=T[0:3, 3])
#
Define Residual Functions ¶
The function reprojection_residual_from_camera computes the reprojection error of a landmark observed by a camera. It assumes that the camera pose is being optimized in the world frame.
- $T_{W_C}^{-1} \cdot W_t^{landmark}$ transforms the landmark position into the camera frame.
- $K \cdot (...)$ projects the transformed point into image coordinates.
- The residual is the difference between the observed pixel location $uv_{obs}$ and the projected landmark position.
To enforce consistency in transformations, the function pose_loop_residual ensures that the transformation chain forms a valid loop: $T_{BB} = T_{BW} \cdot T_{WC} \cdot T_{CE} \cdot (T_{BE_n} \cdot T_{E_nE})^{-1}$.
NOTE: We might also want to have a prior constraint on error correction term $T_{E_nE}$.
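One possible form of such a prior (a sketch only; it is not wired into the optimizations below, and the residual name and scalar weighting are choices of this example) is a factor that pulls each position-error correction toward a prior value such as zero:
# Sketch: prior factor on the end-effector position-error correction (not used below)
def position_error_prior_residual(
    T_B_E_position_error: sf.V3,
    prior: sf.V3,
    sqrt_info: sf.Scalar,
) -> sf.V3:
    # Penalize deviation of the correction term from its prior (e.g. zero)
    return sqrt_info * (T_B_E_position_error - prior)
A corresponding Factor would then reference keys like f"T_B_E_position_errors[{i}]", a shared prior, and a weight.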
# Residual functions
# This is purely optimizing camera poses in world frame
def reprojection_residual_from_camera(
T_W_C: sf.Pose3,
W_t_landmark: sf.V3,
uv_obs: sf.V2,
K: sf.Matrix33,
epsilon: sf.Scalar,
) -> sf.V2:
uv = K * (T_W_C.inverse() * W_t_landmark)
# TODO: This is not safe, should use epsilon. And obs could go out of view.
return sf.V2(uv_obs[0] - uv[0] / uv[2], uv_obs[1] - uv[1] / uv[2])
# This residual does not use the camera poses at all. Combining it with the one above,
# it seems we can effectively get the pose loop constraint as well:
# T_B_W * T_W_C * T_C_E * T_B_E_correction.inverse()
def reprojection_residual_from_arm(
T_B_W: sf.Pose3,
T_B_E: sf.Pose3,
T_B_E_position_error: sf.V3,
T_C_E: sf.Pose3,
W_t_landmark: sf.V3,
uv_obs: sf.V2,
K: sf.Matrix33,
epsilon: sf.Scalar,
) -> sf.V2:
    # Corrected pose: subtract the estimated position error from the (noisy) measured pose
    T_B_E_correction = sf.Pose3(
        R=T_B_E.rotation(), t=T_B_E.position() - T_B_E_position_error
    )
T_C_B = T_C_E * T_B_E_correction.inverse()
uv = K * (T_C_B * T_B_W * W_t_landmark)
# TODO: This is not safe, should use epsilon
return sf.V2(uv_obs[0] - uv[0] / uv[2], uv_obs[1] - uv[1] / uv[2])
def pose_loop_residual(
T_W_C: sf.Pose3,
T_B_W: sf.Pose3,
T_B_E: sf.Pose3,
T_B_E_position_error: sf.V3,
T_C_E: sf.Pose3,
epsilon: sf.Scalar,
) -> sf.V6:
T_B_E_correction = sf.Pose3(
R=T_B_E.rotation(), t=T_B_E.position() - T_B_E_position_error
)
T_B_B = T_B_W * T_W_C * T_C_E * T_B_E_correction.inverse()
return T_B_B.to_tangent(epsilon=epsilon)
#
Generate Factors ¶
# Generate different factors
def generate_factors(
n_camera_poses,
n_world_points,
use_reprojection_residual_from_camera=False,
use_reprojection_residual_from_arm=False,
use_pose_residual=False,
):
factors = []
if use_reprojection_residual_from_camera:
for i in range(n_camera_poses):
for j in range(n_world_points):
factors.append(
Factor(
residual=reprojection_residual_from_camera,
keys=[
f"T_W_Cs[{i}]",
f"W_t_landmarks[{j}]",
f"uv_obs[{i}][{j}]",
"K",
"epsilon",
],
)
)
if use_reprojection_residual_from_arm:
for i in range(n_camera_poses):
for j in range(n_world_points):
factors.append(
Factor(
                        residual=reprojection_residual_from_arm,
keys=[
"T_B_W",
f"T_B_Es[{i}]",
f"T_B_E_position_errors[{i}]",
"T_C_E",
f"W_t_landmarks[{j}]",
f"uv_obs[{i}][{j}]",
"K",
"epsilon",
],
)
)
if use_pose_residual:
for i in range(n_camera_poses):
factors.append(
Factor(
residual=pose_loop_residual,
keys=[
f"T_W_Cs[{i}]",
"T_B_W",
f"T_B_Es[{i}]",
f"T_B_E_position_errors[{i}]",
"T_C_E",
"epsilon",
],
)
)
assert len(factors) > 0
return factors
#
Optimize Camera Poses ¶
The pose estimation from camera calibration appears to be quite accurate, as the residual reduction is small.
# Only optimize camera poses
uv_obs = []
for pixel_obs in pixel_obs_list:
obs_per_camera = []
for obs in pixel_obs.T:
obs_per_camera.append(sf.V2(obs[0], obs[1]))
uv_obs.append(obs_per_camera)
W_t_landmarks = []
for world_point in world_points:
W_t_landmarks.append(sf.V3(world_point[0], world_point[1], world_point[2]))
inputs = Values(
T_W_Cs=[symforce_from_projection_matrix(T_W_C) for T_W_C in camera_poses_tested],
W_t_landmarks=W_t_landmarks,
uv_obs=uv_obs,
K=K,
epsilon=sf.numeric_epsilon,
)
factors = generate_factors(
n_camera_poses=len(camera_poses_expected),
n_world_points=len(world_points),
use_reprojection_residual_from_camera=True,
use_reprojection_residual_from_arm=False,
use_pose_residual=False,
)
optimized_keys = []
optimized_keys.extend([f"T_W_Cs[{i}]" for i in range(len(camera_poses_tested))])
optimizer = Optimizer(
factors=factors,
optimized_keys=optimized_keys,
    # So that we save more information about each iteration, to visualize later:
debug_stats=True,
)
result = optimizer.optimize(inputs)
assert result.status == Optimizer.Status.SUCCESS
print(f"{len(result.iterations)=}")
initial_linearization = optimizer.linearize(result.initial_values)
print(f"Initial error: {initial_linearization.error()}")
optimized_linearization = optimizer.linearize(result.optimized_values)
print(f"Final error: {optimized_linearization.error()}")
#
Optimize End Effector Pose Corrections and $T_{CE}$ ¶
# Optimize T_B_E_position_errors
uv_obs = []
for pixel_obs in pixel_obs_list:
obs_per_camera = []
for obs in pixel_obs.T:
obs_per_camera.append(sf.V2(obs[0], obs[1]))
uv_obs.append(obs_per_camera)
W_t_landmarks = []
for world_point in world_points:
W_t_landmarks.append(sf.V3(world_point[0], world_point[1], world_point[2]))
inputs = Values(
T_W_Cs=[symforce_from_projection_matrix(T_W_C) for T_W_C in camera_poses_tested],
T_B_W=symforce_from_projection_matrix(T_B_W_tested),
T_B_Es=[
symforce_from_projection_matrix(T_B_E) for T_B_E in end_effector_poses_noisy
],
T_B_E_position_errors=[sf.V3(0, 0, 0)] * len(end_effector_poses_noisy),
T_C_E=symforce_from_projection_matrix(T_C_E_tested),
W_t_landmarks=W_t_landmarks,
uv_obs=uv_obs,
K=K,
epsilon=sf.numeric_epsilon,
)
# Let's just try to optimize T_C_E, and T_B_E_position_errors
factors = generate_factors(
n_camera_poses=len(camera_poses_expected),
n_world_points=len(world_points),
use_reprojection_residual_from_camera=False,
use_reprojection_residual_from_arm=False,
use_pose_residual=True,
)
optimized_keys = []
optimized_keys.extend(
[f"T_B_E_position_errors[{i}]" for i in range(len(end_effector_poses_noisy))]
)
optimized_keys.extend(["T_C_E"])
optimizer = Optimizer(
factors=factors,
optimized_keys=optimized_keys,
    # So that we save more information about each iteration, to visualize later:
debug_stats=True,
params=Optimizer.Params(
verbose=False,
iterations=1000,
early_exit_min_reduction=1e-6,
),
)
result = optimizer.optimize(inputs)
assert result.status == Optimizer.Status.SUCCESS
print(f"{len(result.iterations)=}")
initial_linearization = optimizer.linearize(result.initial_values)
print(f"Initial error: {initial_linearization.error()}")
optimized_linearization = optimizer.linearize(result.optimized_values)
print(f"Final error: {optimized_linearization.error()}")
if True:
print("========")
print("T_CE(initial) : ", result.initial_values.attr.T_C_E.position())
print("T_CE(optimized): ", result.optimized_values.attr.T_C_E.position())
print("T_CE(expected) : ", T_C_E_expected[:, -1][0:3])
print("========")
np.set_printoptions(precision=4, suppress=False)
print("expected_errors :", [np.array(error) for error in errors_expected])
print("optimized_errors:", result.optimized_values.attr.T_B_E_position_errors)
print("========")
previous_result = result
#
Optimize End Effector Pose Corrections ¶
Now, let's focus on optimizing only the end-effector pose corrections $T_{E_nE}$. The relative error reduction is significant, even though the absolute error at the initial values is already quite small (the pixel observations themselves are noise-free).
# Optimize T_B_E_position_errors
uv_obs = []
for pixel_obs in pixel_obs_list:
obs_per_camera = []
for obs in pixel_obs.T:
obs_per_camera.append(sf.V2(obs[0], obs[1]))
uv_obs.append(obs_per_camera)
W_t_landmarks = []
for world_point in world_points:
W_t_landmarks.append(sf.V3(world_point[0], world_point[1], world_point[2]))
inputs = Values(
T_W_Cs=[symforce_from_projection_matrix(T_W_C) for T_W_C in camera_poses_tested],
T_B_W=symforce_from_projection_matrix(T_B_W_tested),
T_B_Es=[
symforce_from_projection_matrix(T_B_E) for T_B_E in end_effector_poses_noisy
],
T_B_E_position_errors=[sf.V3(0, 0, 0)] * len(end_effector_poses_noisy),
T_C_E=symforce_from_projection_matrix(T_C_E_tested),
W_t_landmarks=W_t_landmarks,
uv_obs=uv_obs,
K=K,
epsilon=sf.numeric_epsilon,
)
factors = generate_factors(
n_camera_poses=len(camera_poses_expected),
n_world_points=len(world_points),
use_reprojection_residual_from_camera=False,
use_reprojection_residual_from_arm=False,
use_pose_residual=True,
)
optimized_keys = []
optimized_keys.extend(
[f"T_B_E_position_errors[{i}]" for i in range(len(end_effector_poses_noisy))]
)
optimizer = Optimizer(
factors=factors,
optimized_keys=optimized_keys,
    # So that we save more information about each iteration, to visualize later:
debug_stats=True,
)
result = optimizer.optimize(inputs)
assert result.status == Optimizer.Status.SUCCESS
print(f"{len(result.iterations)=}")
initial_linearization = optimizer.linearize(result.initial_values)
print(f"Initial error: {initial_linearization.error()}")
optimized_linearization = optimizer.linearize(result.optimized_values)
print(f"Final error: {optimized_linearization.error()}")
if True:
print("========")
np.set_printoptions(precision=4, suppress=False)
print("expected_errors :", [np.array(error) for error in errors_expected])
print("optimized_errors:", result.optimized_values.attr.T_B_E_position_errors)
print("========")
previous_result = result
#
Optimize End Effector Pose Corrections and $T_{CE}$ again ¶
After obtaining the end-effector pose corrections $T_{E_nE}$, I try to optimize both $T_{E_nE}$ and $T_{CE}$ jointly. Interestingly, $T_{CE}$ barely moves; only $T_{E_nE}$ changes.
# Let's just try to optimize T_C_E and T_B_E_position_errors
T_B_E_position_errors = [
sf.V3(error[0], error[1], error[2])
for error in previous_result.optimized_values.attr.T_B_E_position_errors
]
if False:
T_B_E_position_errors = [
sf.V3(error[0], error[1], error[2]) for error in errors_expected
]
inputs = Values(
T_W_Cs=[symforce_from_projection_matrix(T_W_C) for T_W_C in camera_poses_tested],
T_B_W=symforce_from_projection_matrix(T_B_W_tested),
T_B_Es=[
symforce_from_projection_matrix(T_B_E) for T_B_E in end_effector_poses_noisy
],
T_B_E_position_errors=T_B_E_position_errors,
T_C_E=symforce_from_projection_matrix(T_C_E_tested),
W_t_landmarks=W_t_landmarks,
uv_obs=uv_obs,
K=K,
epsilon=sf.numeric_epsilon,
)
factors = generate_factors(
n_camera_poses=len(camera_poses_expected),
n_world_points=len(world_points),
use_reprojection_residual_from_camera=False,
use_reprojection_residual_from_arm=False,
use_pose_residual=True,
)
optimized_keys = []
optimized_keys.extend(
[f"T_B_E_position_errors[{i}]" for i in range(len(end_effector_poses_noisy))]
)
optimized_keys.extend(["T_C_E"])
optimizer = Optimizer(
factors=factors,
optimized_keys=optimized_keys,
    # So that we save more information about each iteration, to visualize later:
debug_stats=True,
)
result = optimizer.optimize(inputs)
assert result.status == Optimizer.Status.SUCCESS
print(f"{len(result.iterations)=}")
initial_linearization = optimizer.linearize(result.initial_values)
print(f"Initial error: {initial_linearization.error()}")
optimized_linearization = optimizer.linearize(result.optimized_values)
print(f"Final error: {optimized_linearization.error()}")
if True:
print("========")
print("T_CE(initial) : ", result.initial_values.attr.T_C_E.position())
print("T_CE(optimized): ", result.optimized_values.attr.T_C_E.position())
print("T_CE(expected) : ", T_C_E_expected[:, -1][0:3])
print("========")
np.set_printoptions(precision=4, suppress=False)
print("expected_errors :", [np.array(error) for error in errors_expected])
print("optimized_errors:", result.optimized_values.attr.T_B_E_position_errors)
print("========")
previous_result = result
#
# Let's just try to optimize T_C_E and T_B_E_position_errors (Cheat on initial guesses)
T_B_E_position_errors = [
sf.V3(error[0], error[1], error[2])
for error in previous_result.optimized_values.attr.T_B_E_position_errors
]
if True: # Cheating... :-)
T_B_E_position_errors = [
sf.V3(error[0], error[1], error[2]) for error in errors_expected
]
inputs = Values(
T_W_Cs=[symforce_from_projection_matrix(T_W_C) for T_W_C in camera_poses_tested],
T_B_W=symforce_from_projection_matrix(T_B_W_tested),
T_B_Es=[
symforce_from_projection_matrix(T_B_E) for T_B_E in end_effector_poses_noisy
],
T_B_E_position_errors=T_B_E_position_errors,
T_C_E=symforce_from_projection_matrix(T_C_E_tested),
W_t_landmarks=W_t_landmarks,
uv_obs=uv_obs,
K=K,
epsilon=sf.numeric_epsilon,
)
factors = generate_factors(
n_camera_poses=len(camera_poses_expected),
n_world_points=len(world_points),
use_reprojection_residual_from_camera=False,
use_reprojection_residual_from_arm=False,
use_pose_residual=True,
)
optimized_keys = []
optimized_keys.extend(
[f"T_B_E_position_errors[{i}]" for i in range(len(end_effector_poses_noisy))]
)
optimized_keys.extend(["T_C_E"])
optimizer = Optimizer(
factors=factors,
optimized_keys=optimized_keys,
    # So that we save more information about each iteration, to visualize later:
debug_stats=True,
)
result = optimizer.optimize(inputs)
assert result.status == Optimizer.Status.SUCCESS
print(f"{len(result.iterations)=}")
initial_linearization = optimizer.linearize(result.initial_values)
print(f"Initial error: {initial_linearization.error()}")
optimized_linearization = optimizer.linearize(result.optimized_values)
print(f"Final error: {optimized_linearization.error()}")
if True:
print("========")
print("T_CE(initial) : ", result.initial_values.attr.T_C_E.position())
print("T_CE(optimized): ", result.optimized_values.attr.T_C_E.position())
print("T_CE(expected) : ", T_C_E_expected[:, -1][0:3])
print("========")
np.set_printoptions(precision=4, suppress=False)
print("expected_errors :", [np.array(error) for error in errors_expected])
print("optimized_errors:", result.optimized_values.attr.T_B_E_position_errors)
print("========")
#