Transformation between the camera reference frame and the sensor reference frame (retinal plane)

The second transformation, shown on Figure 2, relates the camera reference frame \(R_{c}\) to the sensor reference frame \(R_{r}\) (retinal plane). It is a perspective projection (a \(3 \times 4\) matrix, denoted \([P]\)) that maps a 3D point \(\left( \begin{array}{ccc} X_{c} & Y_{c} & Z_{c} \end{array} \right)\) to an image point \((x,y)\) (in metric units).

\(s \cdot \left[ \begin{array}{c} x \\ y \\ 1 \end{array} \right] = \left[ \begin{array}{cccc} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{array} \right]  \left[ \begin{array}{c} X_{c} \\ Y_{c} \\ Z_{c} \\ 1 \end{array} \right] = \left[ \mathbf{P} \right] \left[ \begin{array}{c} X_{c} \\ Y_{c} \\ Z_{c} \\ 1 \end{array} \right]\)

where \(f\) is the focal length of the lens used and \(s\) is a scale factor; the third row of the matrix gives \(s = Z_{c}\).

Remark

Written out component by component, the perspective projection of equation (2) reads:

\(x = f \frac{X_{c}}{Z_{c}}\)

\(y = f \frac{Y_{c}}{Z_{c}}\)

These equations are non-linear in the camera coordinates, because of the division by \(Z_{c}\).

Using homogeneous coordinates makes it possible to write the perspective projection (and the complete pinhole camera model) in linear form (see equation (2)).
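The two forms of the projection can be compared numerically. The sketch below is only an illustration in plain Python: the function names `projection_matrix` and `project`, the focal length and the test point are all hypothetical values chosen for the example, not part of the original model description. It applies the linear matrix product in homogeneous coordinates, then divides by the scale factor to recover \((x, y)\).

```python
def projection_matrix(f):
    """Return the 3x4 perspective projection matrix [P] for focal length f."""
    return [
        [f, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
    ]


def project(f, point_c):
    """Project a 3D point (X_c, Y_c, Z_c), given in the camera frame,
    onto the retinal plane. Returns (x, y) in the same units as f."""
    X_c, Y_c, Z_c = point_c
    if Z_c == 0:
        # The scale factor s equals Z_c, so the division below would fail.
        raise ValueError("Z_c must be non-zero")
    homogeneous = [X_c, Y_c, Z_c, 1.0]
    P = projection_matrix(f)
    # Linear step: s * (x, y, 1) = [P] * (X_c, Y_c, Z_c, 1)
    sx, sy, s = (sum(p * h for p, h in zip(row, homogeneous)) for row in P)
    # Non-linear step: divide by the scale factor s (= Z_c)
    return sx / s, sy / s


# Hypothetical values, all in millimetres: a 50 mm lens and a point 2 m away.
f = 50.0
point_c = (200.0, -100.0, 2000.0)
x, y = project(f, point_c)
print(x, y)  # 5.0 -2.5
```

The result agrees with the non-linear form, \(x = f X_{c} / Z_{c} = 50 \times 200 / 2000 = 5\) and \(y = f Y_{c} / Z_{c} = -2.5\). The matrix product itself stays linear; the only non-linear operation is the final division by \(s\).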