Taking the distortions into account

Reminder

The pinhole camera model describes an ideal camera (simple perspective projection) and does not account for the geometric distortions that may be induced by the optical system. Several authors [6, 7] have shown that, in dimensional metrology applications, it is essential to take these distortions into account in order to correct them.

Usual parametric approach

The usual parametric approach consists in modeling the distortion by augmenting the pinhole camera model with additional terms (the model thus becomes non-linear). This approach draws its inspiration from the theory of geometric aberrations of a centered (lens) system, adding corrective terms corresponding to the different types of distortion: radial distortion, prismatic distortion and decentering distortion [8, 9, 10].

Starting from the pinhole camera model, the effects of distortion can be taken into account by a fourth transformation \(D\), linking the “ideal” retinal coordinates \(m_{r} = \left( \begin{array}{cc} x & y \end{array} \right)\) to the “real” retinal coordinates \(\breve{m_{r}} = \left( \begin{array}{cc} \breve{x} & \breve{y} \end{array} \right)\):

\(\left( \begin{array}{ccc} X & Y & Z \end{array} \right) \overset{T}{\longrightarrow} \left( \begin{array}{ccc} X_{c} & Y_{c} & Z_{c} \end{array} \right) \overset{P}{\longrightarrow} \boxed{ \left( \begin{array}{cc} x & y \end{array} \right) \overset{D}{\longrightarrow} \left( \begin{array}{cc} \breve{x} & \breve{y} \end{array} \right) } \overset{A}{\longrightarrow} \left( \begin{array}{cc} \breve{u} & \breve{v} \end{array} \right)\)

\(\begin{align*} \breve{m_{r}} &= D(m_{r}) = m_{r} + \delta (m_{r}) \\ &= m_{r} + \underbrace{\delta _{r}(m_{r})}_{radial} + \underbrace{\delta _{d}(m_{r})}_{decentering} + \underbrace{\delta _{p}(m_{r})}_{prismatic} \end{align*}\)

Several authors have shown that the following model, often called R3D1P1, is sufficient for most lenses with a focal length greater than \(5\,mm\):

\(D(m_{r}) = m_{r}(1+r_{1}(x^{2}+y^{2})+r_{2}(x^{2}+y^{2})^{2}+r_{3}(x^{2}+y^{2})^{3}) + \left( \begin{array}{c} d_{1}(3x^{2}+y^{2})+2d_{2}xy+p_{1}(x^{2}+y^{2}) \\ 2d_{1}xy+d_{2}(x^{2}+3y^{2})+p_{2}(x^{2}+y^{2}) \end{array} \right)\)

where \(d = \left( \begin{array}{ccccccc} r_{1} & r_{2} & r_{3} & d_{1} & d_{2} & p_{1} & p_{2} \end{array} \right)\) is the vector of the distortion parameters.
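As a small sketch, the R3D1P1 model above translates directly into code (the function name and interface are ours, not from the text):

```python
def distort_r3d1p1(x, y, d):
    """Apply the R3D1P1 model: radial (r1..r3), decentering (d1, d2)
    and prismatic (p1, p2) terms on ideal retinal coordinates (x, y)."""
    r1, r2, r3, d1, d2, p1, p2 = d
    rho2 = x * x + y * y                                  # rho^2 = x^2 + y^2
    radial = 1 + r1 * rho2 + r2 * rho2**2 + r3 * rho2**3  # radial factor
    dx = d1 * (3 * x * x + y * y) + 2 * d2 * x * y + p1 * rho2
    dy = 2 * d1 * x * y + d2 * (x * x + 3 * y * y) + p2 * rho2
    return x * radial + dx, y * radial + dy
```

With a zero parameter vector the model reduces to the identity, as expected from the equation above.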

Remark

It is often sufficient to use a purely radial model (of order 1 to 3). Writing \(\rho = \sqrt{x^{2}+y^{2}}\), the R3 model is often written as:

\(D(m_{r}) = m_{r}(1+r_{1} \rho ^{2}+r_{2} \rho ^{4}+r_{3} \rho ^{6})\)

We denote by \(\mathbf{k}\) the vector of intrinsic parameters defined by the matrix \(K\), and by \(\mathbf{d}\) the vector of distortion coefficients (which are also intrinsic to the camera):

\(\mathbf{k} = \left( \begin{array}{ccccc} c_{x} & c_{y} & f_{x} & f_{y} & \theta \end{array} \right)\)

\(\mathbf{d} = \left( \begin{array}{ccccccc} r_{1} & r_{2} & r_{3} & d_{1} & d_{2} & p_{1} & p_{2} \end{array} \right)\)

The camera model is non-linear and can be written as a vector function \(F\):

\(\breve{m} = \mathbf{F} (\mathbf{k}, \mathbf{d}, \mathbf{R}, \mathbf{t}, M )\)

For instance, Figure 5 shows a distortion map (amplitude of the distortion at each pixel of the image) obtained during the calibration of a camera equipped with a \(25\,mm\) lens. The figure clearly shows that the main component is the radial distortion (the farther from the center of the image, the larger the distortion). In this example the distortion is fairly small (on the order of 1 pixel at the corners of the image), but it can reach several pixels (even around 10 pixels) for lenses with shorter focal lengths.

Correction of the distortion

It is sometimes essential to know the ideal, i.e. undistorted, pixel coordinates \(\left( \begin{array}{cc} u & v \end{array} \right)\) corresponding to the distorted ones \(\left( \begin{array}{cc} \breve{u} & \breve{v} \end{array} \right)\).

\(\begin{array} {cc} u &= c_{x} + f_{x} x \\ v &= c_{y} + f_{y} y \end{array} \quad \text{ i.e. } \quad \begin{array}{cc} x &= \frac{u-c_{x}}{f_{x}} \\ y &= \frac{v-c_{y}}{f_{y}} \end{array}\)
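These affine relations between pixel and retinal coordinates can be sketched as two small helpers (the names are ours):

```python
def retinal_to_pixel(x, y, cx, cy, fx, fy):
    """Retinal (x, y) -> pixel (u, v): u = cx + fx*x, v = cy + fy*y."""
    return cx + fx * x, cy + fy * y

def pixel_to_retinal(u, v, cx, cy, fx, fy):
    """Inverse mapping: x = (u - cx)/fx, y = (v - cy)/fy."""
    return (u - cx) / fx, (v - cy) / fy
```

The two functions are exact inverses of each other, which the derivation below relies on.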

Hence, we can deduce that:

\(\breve{u} = c_{x}+f_{x} \breve{x} = c_{x}+f_{x}(x+ \delta _{x}(x,y)) = c_{x}+(u-c_{x})+f_{x} \delta _{x}(x,y)\)

\(\breve{v} = c_{y}+f_{y} \breve{y} = c_{y}+f_{y}(y+ \delta _{y}(x,y)) = c_{y}+(v-c_{y})+f_{y} \delta _{y}(x,y)\)

From equations (12) and (13), we see that \(\left( \begin{array}{cc} \breve{u} & \breve{v} \end{array} \right)\) can be expressed as a function of \(\left( \begin{array}{cc} u & v \end{array} \right)\) and the intrinsic parameters \(\mathbf{k}\) and \(\mathbf{d}\) of the camera:

\(\breve{u} = f_{u}(u,v,\mathbf{k},\mathbf{d})\)

\(\breve{v} = f_{v}(u,v,\mathbf{k},\mathbf{d})\)

In the general case, the distortion model given by equation (14) cannot be inverted analytically, so a numerical method is needed to estimate the ideal coordinates \(\left( \begin{array}{cc} u & v \end{array} \right)\) of the image point that would have been obtained with a distortion-free camera.

Let \(\breve{m} = \left( \begin{array}{cc} \breve{u} & \breve{v} \end{array} \right)\) be a pixel of the distorted image. We seek the corresponding undistorted pixel \(m = \left( \begin{array}{cc} u & v \end{array} \right)\).

The pixel \(\left( \begin{array}{cc} \breve{u} & \breve{v} \end{array} \right)\) corresponds to point \(\breve{m_{r}} = \left( \begin{array}{cc} \breve{x} & \breve{y} \end{array} \right)\) in the retinal plane:

\(\breve{x} = \frac{\breve{u}-c_{x}}{f_{x}}\)

\(\breve{y} = \frac{\breve{v}-c_{y}}{f_{y}}\)

We seek the point \(m_{r} = \left( \begin{array}{cc} x & y \end{array} \right)\) such that \(D(m_{r}) = \breve{m_{r}}\), i.e.:

\(\breve{m_{r}}-D(m_{r}) = 0\)

Equation (15) can be solved with Newton's method applied to the function \(f(m_{r}) = \breve{m_{r}}-D(m_{r})\), initialized with \(m_{r}^{(0)} = \breve{m_{r}}\).

The current iteration is given by the formula: \(m_{r}^{(i)} = m_{r}^{(i-1)}-d_{N}^{(i-1)}\), where \(d_{N}^{(i-1)}\) is the solution of \(\mathbf{J_{f}}\, d = f(m_{r}^{(i-1)})\).

Namely:

\(m_{r}^{(i)} = m_{r}^{(i-1)} + \mathbf{J_{D}}(m_{r}^{(i-1)})^{-1}( \breve{m_{r}}-D(m_{r}^{(i-1)}))\)

with:

\(\mathbf{J_{D}} = \left( \begin{array}{cc} \frac{\partial \breve{x}}{\partial x} & \frac{\partial \breve{x}}{\partial y} \\ \frac{\partial \breve{y}}{\partial x} & \frac{\partial \breve{y}}{\partial y}\end{array} \right)_{\vert (x,y) = m_{r}^{(i-1)}}\)

The stopping criterion may be based on the norm of the error \(\varepsilon\) after each iteration. In practice, convergence is reached after only a few iterations \((\approx 4)\).

Once the retinal coordinates \(\left( \begin{array}{cc} x & y \end{array} \right)\) have been obtained, equation (12) is used to compute the sought pixel coordinates \(\left( \begin{array}{cc} u & v \end{array} \right)\).
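The Newton iteration above can be sketched as follows. For brevity this sketch uses a finite-difference Jacobian instead of the analytic \(\mathbf{J_{D}}\), and the function names and the example model in the usage note are illustrative:

```python
import numpy as np

def undistort_point(m_breve, D, tol=1e-12, max_iter=20):
    """Solve D(m) = m_breve by Newton's method, initialized at m_breve.
    D maps a retinal 2-vector to its distorted 2-vector."""
    m_breve = np.asarray(m_breve, dtype=float)
    m = m_breve.copy()                          # m^(0) = breve(m_r)
    for _ in range(max_iter):
        r = m_breve - D(m)                      # residual f(m_r)
        if np.hypot(*r) < tol:                  # stop on error magnitude
            break
        h = 1e-7                                # finite-difference Jacobian J_D
        J = np.column_stack([(D(m + h * e) - D(m - h * e)) / (2 * h)
                             for e in np.eye(2)])
        m = m + np.linalg.solve(J, r)           # Newton step
    return m
```

For example, with a first-order radial model `D`, a few iterations suffice to make `D(undistort_point(m_breve, D))` coincide with `m_breve` to machine precision, in line with the \(\approx 4\) iterations observed in practice.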

Remark

To compute a corrected image from a distorted image (a different problem from correcting a single point), there is no need for a numerical method to invert the distortion model. It suffices to use the direct model given by equation (14) and to fill the corrected image by sweeping \(u\) and \(v\) over the destination image.

For a given pixel \(\left( \begin{array}{cc} u & v \end{array} \right)\) (with integer coordinates) in the destination image, equation (14) gives the coordinates \(\left( \begin{array}{cc} \breve{u} & \breve{v} \end{array} \right)\) of the corresponding point in the source image. In general these coordinates are not integers, and an interpolation is required to compute the intensity value (gray level) to be copied into the destination image at position \(\left( \begin{array}{cc} u & v \end{array} \right)\) (cf. [11]).
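A minimal sketch of this backward mapping with bilinear interpolation, for a grayscale image; `distort_uv` stands for the direct pixel-level model of equation (14) and is a placeholder to be supplied by the caller:

```python
import numpy as np

def undistort_image(img, distort_uv):
    """Fill the corrected image by sweeping its destination pixels (u, v):
    each one is mapped through the direct model distort_uv(u, v) ->
    (u_breve, v_breve), then the source image is sampled bilinearly."""
    H, W = img.shape
    v, u = np.mgrid[0:H, 0:W].astype(float)   # destination pixel grid
    ub, vb = distort_uv(u, v)                 # source coords, eq. (14)
    # bilinear interpolation at the (generally non-integer) source coords
    u0 = np.clip(np.floor(ub).astype(int), 0, W - 2)
    v0 = np.clip(np.floor(vb).astype(int), 0, H - 2)
    a, b = ub - u0, vb - v0                   # fractional offsets
    return ((1 - a) * (1 - b) * img[v0, u0] + a * (1 - b) * img[v0, u0 + 1]
            + (1 - a) * b * img[v0 + 1, u0] + a * b * img[v0 + 1, u0 + 1])
```

This sketch assumes the distorted coordinates stay within the source image; a real implementation would also mask or pad pixels that map outside it.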

Other distortion-correction methods have been proposed in the literature. For a review, see in particular the “perspective camera inverse model” section of [12].

For instance, Figure 6 shows a distorted image (left) and the corrected image (right).

Non-parametric approach

In the case of complex optical systems, some authors [13] have shown that it is preferable to model the distortion non-parametrically, using spline functions [14].

In that case the modeling is purely mathematical (a kind of “black box” approach), aiming to determine the distortion function that best describes how the ideal image is distorted [15, 16].

Since the modeling is purely mathematical, nothing prevents us from adopting the layout of Figure 7 and seeking the distortion function \(D\) that links the ideal image coordinates \(\left( \begin{array}{cc} u & v \end{array} \right)\) of point \(m\) to the image coordinates \(\left( \begin{array}{cc} \breve{u} & \breve{v} \end{array} \right)\) of point \(\breve{m}\).

In this approach, in order to correct the distortion easily, it is better to choose the model schematized in Figure 8, in which the inverse correction function \(C\) is used instead of the distortion function \(D\). Indeed, the distortion can then be corrected directly using \(C\), whereas inverting \(D\) can be computationally expensive, particularly when its inverse \(C\) cannot be determined analytically. Moreover, the domain of definition of the spline function for the correction \(C\) is known a priori, being determined by the image dimensions, whereas the spline function for the distortion \(D\) has an a priori unknown domain, since it is expressed in the retinal plane, which is only defined by the calibration.

The equation of the distortion correction becomes:

\(m = D^{-1}(\breve{m}) = C(\breve{m})\)

Estimating the distortion-correction function \(C\) consists in approximating the horizontal (along the \(x\) axis) and vertical (along the \(y\) axis) components of the correction field by two spline surfaces \(S_{x}\) and \(S_{y}\) [13, 17].
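As a sketch of this idea, the correction field can be sampled on a regular grid over the distorted image (its domain is known a priori, as noted above) and approximated by two spline surfaces with SciPy. The data layout is hypothetical: real calibration points are scattered, and would call for a scattered-data spline instead of `RectBivariateSpline`.

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

def fit_correction_splines(ub, vb, du, dv):
    """Fit the two spline surfaces S_x, S_y of the correction field,
    sampled on a regular grid of the distorted image:
    du[i, j] = u - ub[i], dv[i, j] = v - vb[j] (hypothetical layout)."""
    Sx = RectBivariateSpline(ub, vb, du)      # horizontal component
    Sy = RectBivariateSpline(ub, vb, dv)      # vertical component
    def C(uq, vq):
        # corrected pixel = distorted pixel + interpolated correction
        return uq + Sx.ev(uq, vq), vq + Sy.ev(uq, vq)
    return C
```

The returned function `C` plays the role of the correction function of equation above: it maps distorted pixel coordinates directly to corrected ones.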

The correction function makes it possible to correct points (or a whole image) for distortion. The corrected points are related to the 3D input points by a classical pinhole model, whose parameters are easy to estimate.

Having established the linear (5) and non-linear (11) camera models, we now turn to the calibration methods that allow the parameters of these models to be estimated.