Calibration of a stereoscopic vision sensor

When we calibrate a single camera, we mainly focus on the intrinsic parameters defined by the matrix \(K\) and, if desired, on the extrinsic parameters defined by the rigid transformation \(T\) (the pose of the camera with respect to the world reference frame). When we calibrate a stereoscopic vision sensor, we are interested in both sets of intrinsic parameters, defined by the matrices \(K\) and \(K'\), and in the relative position and orientation of the two cameras, defined by the rigid transformation \(T_{s}\).

The aim of this sensor calibration is to allow the reconstruction of the three-dimensional points observed by both cameras; it is therefore essential for anyone who needs precise three-dimensional measurements.

In practice, the process for calibrating a stereoscopic vision sensor is similar to the one described in section 3 (Calibration of a camera) for a single camera. A target is placed in the field of view common to both cameras, and each camera takes a series of images of that target viewed under different orientations.

For instance, Figure 13 shows a series of 9 pairs of images of a target that were used to calibrate a stereovision sensor.

We denote by \(T_{i}\) and \(T'_{i}\) the rigid transformations for the left and right camera respectively:

\(T_{i} = \left[ \begin{array}{cc} R_{i} & t_{i} \\  0^{T} & 1 \end{array} \right] \mbox{ and } T'_{i} = \left[ \begin{array}{cc} R'_{i} & t'_{i} \\  0^{T} & 1 \end{array} \right]\)

They relate the \(i\)-th view of the target to the left camera reference frame and to the right camera reference frame, respectively. For each position of the target, we have the following relation (see Figure 14) according to (23):

\(T_{s}T_{i} = T'_{i}\)

Several methods can be used to estimate the transformation \(T_{s}\).

The most common method consists in calibrating each camera independently, using the method described in section 3 (Calibration of a camera), in order to determine the intrinsic parameters and the distortion coefficients of both cameras. These two calibrations also provide the two sets \(\{ T_{i} \} \mbox{ and } \{ T'_{i} \}\) of extrinsic parameter matrices. The stereo parameters \(R_{s} \mbox{ and } t_{s}\) can then be calculated from any pair \(k \in \{ 1 ... n \}\) of extrinsic parameter matrices using equation (23):

\(T_{s} = T'_{k} T_{k}^{-1}\)
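As a minimal sketch of this computation, the following NumPy code composes \(T'_{k}\) with the inverse of \(T_{k}\). The helper names (`make_rigid`, `invert_rigid`, `stereo_from_pair`, `rot_z`) and all numerical values are hypothetical and chosen only for illustration. It relies on the fact that the inverse of a rigid transformation is \([R^{T}, -R^{T}t]\), so that \(R_{s} = R'_{k} R_{k}^{T}\) and \(t_{s} = t'_{k} - R_{s} t_{k}\).

```python
import numpy as np

def make_rigid(R, t):
    """Build the 4x4 homogeneous matrix [[R, t], [0^T, 1]]."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def invert_rigid(T):
    """Invert a rigid transformation using R^T instead of a general inverse."""
    R, t = T[:3, :3], T[:3, 3]
    return make_rigid(R.T, -R.T @ t)

def stereo_from_pair(T_left, T_right):
    """Return T_s = T'_k T_k^{-1} for one pair of extrinsic matrices."""
    return T_right @ invert_rigid(T_left)

def rot_z(angle):
    """Rotation of `angle` radians about the z axis (illustration only)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Hypothetical ground-truth stereo transformation and left-camera pose.
T_s_true = make_rigid(rot_z(0.05), [-0.12, 0.0, 0.01])
T_k = make_rigid(rot_z(0.3), [0.1, -0.05, 1.0])

# The matching right-camera pose satisfies T_s T_k = T'_k.
T_k_prime = T_s_true @ T_k

# Recover T_s from the pair (T_k, T'_k).
T_s = stereo_from_pair(T_k, T_k_prime)
R_s, t_s = T_s[:3, :3], T_s[:3, 3]
```

With noise-free poses, as here, every pair gives the same \(T_{s}\). With real calibration data each pair gives a slightly different estimate, which is why the choice of \(k\) matters.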

Choosing the pair of extrinsic parameter matrices \(T_{k} \mbox{ and } T'_{k}\) is a delicate matter, and several heuristics are possible, such as:

  • Always choosing the same arbitrary \(k\)-th pair of matrices of the experiment, for instance the first pair \(T_{1} \mbox{ and } T'_{1}\);

  • Taking the pair of matrices that corresponds to the lowest global error of reprojection of the target points in both images.
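The second heuristic can be sketched as follows. The text does not specify how the "global error" combines the two images; this sketch assumes it is the root of the summed squares of the per-view RMS reprojection errors of the left and right cameras. The function name `best_pair_index` and the error values are hypothetical.

```python
import math

def best_pair_index(err_left, err_right):
    """Return the index k of the pair with the lowest combined reprojection error.

    err_left[i] and err_right[i] are the RMS reprojection errors (in pixels)
    of view i in the left and right image. Combining them with a root of
    summed squares is an assumption of this sketch.
    """
    if not err_left or len(err_left) != len(err_right):
        raise ValueError("expected two non-empty lists of equal length")
    combined = [math.hypot(a, b) for a, b in zip(err_left, err_right)]
    return min(range(len(combined)), key=combined.__getitem__)

# Hypothetical per-view RMS errors for three pairs of images.
err_left = [0.30, 0.18, 0.35]
err_right = [0.25, 0.20, 0.19]

k = best_pair_index(err_left, err_right)  # 0-based index of the selected pair
```

The selected index would then be used to compute \(T_{s}\) from \(T_{k}\) and \(T'_{k}\).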

All these heuristics have the drawback of not exploiting the redundancy provided by the simultaneous use of all the pairs of extrinsic parameter matrices to estimate the transformation \(T_{s}\).

Dorian Garcia [10] proposed a method that estimates \(R_{s} \mbox{ and } t_{s}\) using all the extrinsic parameter matrices \(\{ T_{i} \} \mbox{ and } \{ T'_{i} \}\), and showed that it yields a more precise calibration.

His method consists in computing \(R_{s} \mbox{ and } t_{s}\) directly by minimizing a functional of the form:

\(\theta = \arg \underset{\theta}{\min} \overset{n}{\underset{i=1}{\sum}} \overset{p}{\underset{j=1}{\sum}} || \breve{m}_{i}^{j} - F(k,k',d,d',R_{i},t_{i},R_{s},t_{s} ;M_{j}) ||^{2}\)

with:

\(\begin{array}{ll} \theta &= (k,k',d,d',R_{1...n},t_{1...n},R_{s},t_{s},M_{1...p} ) \\  \breve{m} &= ( \breve{m} ~~ \breve{m}') \\ ~~ &= \mbox{ vector containing the group of measurements provided by both cameras} \end{array}\)

This non-linear optimization problem is solved using the Levenberg-Marquardt algorithm.
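To illustrate the idea of using all the views at once, here is a deliberately reduced sketch based on SciPy's `least_squares` with `method="lm"`. It is not the full functional above: it assumes the intrinsics, the left-camera poses \(R_{i}, t_{i}\) and the target points \(M_{j}\) are known and fixed, ignores distortion, and refines only \(R_{s}\) (as a rotation vector) and \(t_{s}\) from the right-image measurements. All names and numerical values are hypothetical, and the data are synthetic and noise-free.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(K, R, t, M):
    """Pinhole projection (no distortion) of Nx3 points M; returns Nx2 pixels."""
    Xc = M @ R.T + t          # points expressed in the camera frame
    uv = Xc @ K.T             # homogeneous pixel coordinates
    return uv[:, :2] / uv[:, 2:3]

def residuals(params, K_right, left_poses, M, observed_right):
    """Stack the right-image reprojection errors over all views."""
    R_s = Rotation.from_rotvec(params[:3]).as_matrix()
    t_s = params[3:]
    res = []
    for (R_i, t_i), m_obs in zip(left_poses, observed_right):
        # Right-camera pose of view i follows from T'_i = T_s T_i.
        R_right = R_s @ R_i
        t_right = R_s @ t_i + t_s
        res.append(project(K_right, R_right, t_right, M) - m_obs)
    return np.concatenate(res).ravel()

# Hypothetical right-camera intrinsics and a planar 4x3 target (3 cm pitch).
K_right = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
gx, gy = np.meshgrid(np.arange(4), np.arange(3))
M = np.column_stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)]) * 0.03

# Hypothetical ground truth used only to synthesize the measurements.
true_rotvec = np.array([0.0, 0.05, 0.0])
true_t = np.array([-0.12, 0.0, 0.0])
R_s_true = Rotation.from_rotvec(true_rotvec).as_matrix()

# Three hypothetical left-camera poses (rotation vector, translation).
pose_specs = [
    ([0.1, 0.0, 0.0], [0.0, 0.0, 0.6]),
    ([0.0, 0.2, 0.1], [0.02, -0.01, 0.7]),
    ([-0.15, 0.1, 0.0], [-0.03, 0.02, 0.8]),
]
left_poses = [(Rotation.from_rotvec(rv).as_matrix(), np.array(t)) for rv, t in pose_specs]
observed_right = [
    project(K_right, R_s_true @ R_i, R_s_true @ t_i + true_t, M)
    for R_i, t_i in left_poses
]

# Levenberg-Marquardt refinement starting from the identity transformation.
x0 = np.zeros(6)
result = least_squares(
    residuals, x0, method="lm", args=(K_right, left_poses, M, observed_right)
)
rotvec_est, t_est = result.x[:3], result.x[3:]
```

In a complete implementation of the functional, the parameter vector would also contain the intrinsics, the distortion coefficients, the poses \(R_{i}, t_{i}\) and the target points, and the residual would stack the measurements of both cameras.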