Use of homogeneous coordinates

In computer vision, homogeneous coordinates are often used [1, 5, 3, 4]:

in 2D:

\(m = \underbrace{ \left[ \begin{array}{c} x \\ y \end{array} \right] }_{Euclidean ~coordinates} \Rightarrow \tilde{m} = \underbrace{ \left[ \begin{array}{c} x \\ y \\ 1 \end{array} \right] }_{Homogeneous ~coordinates}\)

in 3D:

\(M = \underbrace{ \left[ \begin{array}{c} X \\ Y \\ Z \end{array} \right] }_{Euclidean ~coordinates} \Rightarrow \tilde{M} = \underbrace{ \left[ \begin{array}{c} X \\ Y \\ Z \\ 1 \end{array} \right] }_{Homogeneous ~coordinates}\)
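
The conversion above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original text: the helper names `to_homogeneous` and `from_homogeneous` are hypothetical. The reverse conversion (dividing by the last coordinate) relies on the standard property that homogeneous coordinates are defined up to a non-zero scale factor.

```python
def to_homogeneous(point):
    """Append a 1 to a Euclidean point (works for 2D and 3D)."""
    return [*point, 1.0]


def from_homogeneous(point_h):
    """Divide by the last coordinate, then drop it.

    A last coordinate equal to 0 denotes a point at infinity,
    which has no Euclidean counterpart.
    """
    *coords, w = point_h
    if w == 0:
        raise ValueError("point at infinity has no Euclidean coordinates")
    return [c / w for c in coords]


m = [2.0, 3.0]                        # Euclidean 2D point
m_tilde = to_homogeneous(m)           # homogeneous version of m
scaled = [4.0 * c for c in m_tilde]   # same point, up to a scale factor
m_back = from_homogeneous(scaled)     # back to Euclidean coordinates
```

Here `m_tilde` and `scaled` represent the same 2D point, which is why `from_homogeneous` normalizes by the last coordinate before dropping it.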

Homogeneous coordinates have several advantages. For instance, as the “Transformation between the camera reference frame and the sensor reference frame (retinal plane)” section shows, they allow the pinhole model to be expressed as a linear relation.
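
To give an idea of what "linear" means here, the sketch below assumes an ideal pinhole camera with a hypothetical focal length `f` and no principal-point offset; the notation in the referenced section may differ. In Euclidean coordinates the projection divides by the depth, \(x = fX/Z\) and \(y = fY/Z\), which is not linear. In homogeneous coordinates the same projection is a single matrix product, and the division is deferred to the final normalization.

```python
def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


f = 2.0  # hypothetical focal length, chosen for illustration only

# 3x4 projection matrix of an ideal pinhole camera (assumed form)
P = [
    [f, 0.0, 0.0, 0.0],
    [0.0, f, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

M_tilde = [1.0, 2.0, 4.0, 1.0]   # 3D point (X, Y, Z) = (1, 2, 4), homogeneous
m_tilde = matvec(P, M_tilde)     # homogeneous image point (f*X, f*Y, Z)

# Normalizing by the last coordinate recovers the Euclidean image point
x = m_tilde[0] / m_tilde[2]
y = m_tilde[1] / m_tilde[2]
```

The result matches the non-linear formulas, since `x` equals `f * X / Z` and `y` equals `f * Y / Z`, but the projection step itself is now a matrix product that can be chained with other linear transformations.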