`
`Peter Hansen
`Computer Science Department
`Carnegie Mellon University in Qatar
`Doha, Qatar
`phansen@qatar.cmu.edu
`
`Hatem Alismail, Peter Rander, Brett Browning
`National Robotics Engineering Center
`Robotics Institute, Carnegie Mellon University
`Pittsburgh PA, USA
`{halismai,rander,brettb}@cs.cmu.edu
`
`Abstract
`
`Stereo visual odometry and dense scene reconstruction
`depend critically on accurate calibration of the extrinsic
`(relative) stereo camera poses. We present an algorithm
`for continuous, online stereo extrinsic re-calibration oper-
`ating only on sparse stereo correspondences on a per-frame
`basis. We obtain the 5 degree of freedom extrinsic pose
`for each frame, with a fixed baseline, making it possible to
`model time-dependent variations. The initial extrinsic es-
`timates are found by minimizing epipolar errors, and are
refined via a Kalman Filter (KF). Observation covariances
are derived from the Cramér-Rao lower bound of the solution
uncertainty. The algorithm operates at frame rate with
`unoptimized Matlab code with over 1000 correspondences
`per frame. We validate its performance using a variety of
`real stereo datasets and simulations.
`
`1. Introduction
`Stereo vision is core to many 3D vision methods in-
`cluding visual odometry and dense scene reconstruction.
`Good calibration, both intrinsic and extrinsic, is essential
`to achieving high accuracy as it impacts image rectifica-
`tion, stereo correspondence search, and triangulation. In-
`trinsic calibration models image formation for each cam-
`era (e.g. [3]), while extrinsic calibration models the 6 de-
`gree of freedom (DOF) pose between the cameras. For real
`systems, extrinsic calibration errors occur more frequently
`due to larger exposure to shock, vibration, thermal variation
`and cycling. For visual odometry in particular, such errors
`lead to biased results. We propose a method to recalibrate
`extrinsic parameters online to correct drift or bias. Fig. 1
shows epipolar errors for a range of stereo heads. For 1b
and 1c there is a near constant bias, while 1a drifts, possibly
due to thermal expansion from the lighting assembly.
`Online calibration remains an active area of research.
Online intrinsic calibration (auto or self calibration) estimates
intrinsic parameters using scene point correspondences from
multiple views (e.g. [18, 17, 8, 11]). However, the results are
generally less accurate than offline methods [8] using known
relative Euclidean control points (e.g. [16]). Here, we focus on
correcting drifting extrinsic calibration. Carrera et al. [2]
calibrated multi-camera
`extrinsics using monocular visual SLAM maps for each
`camera [6], not necessarily with overlapping fields of view.
`However, the extrinsic estimates were assumed to be sta-
`ble over time and monocular SLAM limits real-time perfor-
`mance in large environments. In contrast, continuous meth-
`ods output a unique extrinsic pose for each stereo pair (per
`time step). In [1], a linear essential matrix estimate is used
`to find relative pose, followed by non-linear refinement in-
`corporating depth ordering constraints. Some constraints
`were placed on the extrinsic pose DOF, and experimental
`testing was restricted to small indoor sequences with a sta-
`tionary camera.
`Dang et al. [5, 4] developed an approach that estimates
`the extrinsics using three error metrics incorporated into
an iterative Extended Kalman Filter (EKF). The error metrics
are derived from bundle adjustment (BA), epipolar constraints,
and trilinear constraints. Comparisons were made via scene
reconstruction accuracy, and they found that using epipolar
constraints (epipolar reprojection errors) alone was inferior
to using all three metrics. The number of correspondences
was limited (< 50), and using more is likely
`to significantly impact real-time performance. Interestingly,
`there were several advantages to using epipolar errors only.
`These include the ability to obtain strictly per-frame esti-
`mates without needing temporal correspondences and the
`invariance to non-rigid scenes, which is important for oper-
`ations in dynamic environments.
`In this paper, we contribute a continuous, online, extrin-
`sic re-calibration algorithm that operates in real-time using
`only sparse stereo correspondences and no temporal con-
`straints. The initial extrinsic estimates are obtained by min-
`imizing epipolar errors, and a Kalman Filter (KF) is used
to limit over-fitting. The unique extrinsics estimated for
each stereo pair enable temporal drift to be modeled, and we
show that with enough correspondences (e.g. 1000), epipolar
errors alone are sufficient for good re-calibration. Moreover,
the approach is trivial to extend to multiple frames by
combining correspondences. We validate the approach in
simulation and on real stereo datasets by comparing visual
odometry estimates with and without re-calibration, and
reconstruction errors compared to offline calibration with a
known target. We show the limitations of re-calibrating the
baseline length, and suggest methods to partially address
these.

978-1-4673-1228-8/12/$31.00 ©2012 IEEE

APPL-1019 / Page 1 of 8
Apple v. Corephotonics

(c) Outdoor dataset 2: f = 1781 pix, 1024 × 768 pix² images.

Figure 1: Mean (blue) and ±3σ standard deviation (red)
epipolar errors for sparse correspondences for different
stereo data. The supplied calibration (Pointgrey Bumblebee2)
was used for rectification in (b).

2. Stereo Geometry and Error Metric

2.1. Stereo Pose and Epipolar Constraints

The stereo extrinsics $S = [R \mid t]$ are composed of a rotation
$R \in SO(3)$ and translation $t \in \mathbb{R}^3$. They define the
projection of a scene point $X_l = (X, Y, Z)^T$ in the left camera
to $X_r$ in the right: $X_r = R X_l + t$.
Our re-calibration algorithm uses image coordinates and
errors in the left and right stereo rectified images. Let
$\tilde{u}_l \leftrightarrow \tilde{u}_r$ be a set of homogeneous scene
point correspondences in a pair of rectified images, which are
related to the scene point coordinates $X_l$, $X_r$ by

$$\tilde{u}_l \simeq K_l R_l X_l = K_l \tilde{X}_l \qquad (1)$$

$$\tilde{u}_r \simeq K_r R_r X_r = K_r \tilde{X}_r \qquad (2)$$

where $R_l$, $R_r$ are rotations applied to each camera, and
$K_l$, $K_r$ are pinhole projection matrices with zero skew and
equal focal lengths f. For convenience we assume that

$$K_l = K_r = \begin{bmatrix} f & 0 & u_0 \\ 0 & f & v_0 \\ 0 & 0 & 1 \end{bmatrix}. \qquad (3)$$

$R_l$ and $R_r$ are selected to produce a rectified extrinsic pose
$[I_{3\times3} \mid (-b, 0, 0)^T]$, where $b = \|t\|$ is the original
baseline, such that $\tilde{X}_r = \tilde{X}_l + (-b, 0, 0)^T$ (e.g. [12]).
The rectified coordinates are related by

$$(\tilde{u}_r, \tilde{v}_r)^T = (\tilde{u}_l + d, \tilde{v}_l)^T, \qquad d = -\frac{f b}{Z}, \qquad (4)$$

where d is the disparity and Z the depth of a scene point.
The stereo rectified epipolar constraint is simply
$\tilde{v}_l = \tilde{v}_r$, which is independent of the depth and
baseline. This can also be derived from the monocular essential
matrix $\tilde{u}_l^T E \tilde{u}_r = 0$ [12].
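The rectified geometry of Section 2.1 can be checked numerically. The following is a minimal sketch, assuming a point already expressed in the rectified left camera frame; the function name and the camera values (f = 1000 pix, principal point (320, 240), b = 0.15 m) are illustrative choices, not values from the experiments.

```python
def project_rectified(point, f, u0, v0, b):
    """Project a 3D point (rectified left-camera frame) into both rectified images."""
    X, Y, Z = point
    # Left image: standard pinhole projection with zero skew.
    ul, vl = f * X / Z + u0, f * Y / Z + v0
    # Right image: the same point in the right frame, X_r = X_l + (-b, 0, 0).
    ur, vr = f * (X - b) / Z + u0, f * Y / Z + v0
    return (ul, vl), (ur, vr)


# Hypothetical camera and scene point (units: pixels and metres).
left, right = project_rectified((0.5, -0.2, 4.0), f=1000.0, u0=320.0, v0=240.0, b=0.15)
disparity = right[0] - left[0]        # equals -f * b / Z
row_difference = right[1] - left[1]   # zero: the rectified epipolar constraint
```

The row difference vanishes for any depth Z and any baseline b, which is why the constraint alone cannot reveal an error in b.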
`
`2.2. Calibration Error Metric
`
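For re-calibration, we decompose each rotation, $R_l$ and
$R_r$, as the product of two independent rotations:

$$R_l = \bar{R}_l^T R'_l, \qquad R_r = \bar{R}_r^T R'_r. \qquad (5)$$

They are the rotations $R'_l$ and $R'_r$ from the original stereo
extrinsics S, and a rotation correction $\bar{R}_l$ and $\bar{R}_r$.
We start with a set of correspondences $u'_l \leftrightarrow u'_r$
detected in imagery rectified with $R'_l$ and $R'_r$. They are
related to the correct rectified coordinates
$\tilde{u}_l \leftrightarrow \tilde{u}_r$, which satisfy the epipolar
constraint, by

$$\tilde{u}_l \simeq K_l \bar{R}_l^T K_l^{-1} u'_l, \qquad \tilde{u}_r \simeq K_r \bar{R}_r^T K_r^{-1} u'_r. \qquad (6)$$

For an estimate of $\bar{R}_l$ and $\bar{R}_r$, the epipolar error $e_i$ is

$$e_i = f \left( \frac{\bar{R}_l^T[2]\, K_l^{-1} u'_{l,i}}{\bar{R}_l^T[3]\, K_l^{-1} u'_{l,i}} - \frac{\bar{R}_r^T[2]\, K_r^{-1} u'_{r,i}}{\bar{R}_r^T[3]\, K_r^{-1} u'_{r,i}} \right), \qquad (7)$$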
`
Figure 2: The parameterization of the rotation angles $\Phi$. In
the rectified pose, the cameras' principal axes are parallel,
and lie in the plane $\Pi$. The rectified coordinates are
projected to a plane orthogonal to $\Pi$.
`
where $\bar{R}^T[b]$ means row b of matrix $\bar{R}^T$. The
re-calibration objective function is the sum of squared epipolar
errors e:

$$\operatorname*{argmin}_{\bar{R}_l, \bar{R}_r} \sum_{i=1}^{N} e_i^2, \qquad (8)$$

giving the maximum likelihood estimate of $\bar{R}_l$ and
$\bar{R}_r$, from which the new stereo extrinsics S can be recovered:

$$S = (Q_r, Q_l), \qquad (9)$$

$$Q_l^{-1} = [(\bar{R}_l^T R'_l)^T \mid 0] \;\rightarrow\; Q_l = [\bar{R}_l^T R'_l \mid 0], \qquad (10)$$

$$Q_r = [(\bar{R}_r^T R'_r)^T \mid (\bar{R}_r^T R'_r)^T (-b, 0, 0)^T], \qquad (11)$$

where $(Q_r, Q_l)$ is the projection $Q_l$ followed by $Q_r$.
As we can only use epipolar constraints, there is no
means for correcting the stereo baseline estimate b. We
introduce a method to partially address this in section 4.3.
We restrict the optimized extrinsic pose by 1 DOF as a result
and instead optimize the 5 DOF vector of Euler angles
$\Phi = (\alpha_l, \beta_l, \alpha_r, \beta_r, \gamma)^T$ by minimizing (8).
Referring to Fig. 2, the rotations $\bar{R}_l$ and $\bar{R}_r$ are

$$\bar{R}_l = R_X(\gamma/2)\, R_Z(\beta_l)\, R_Y(\alpha_l) \qquad (12)$$

$$\bar{R}_r = R_X(-\gamma/2)\, R_Z(\beta_r)\, R_Y(\alpha_r), \qquad (13)$$

where $R_A$ is the right-handed rotation about the axis A. Euler
angles are a suitable parameterization as the initial extrinsic
estimate is assumed to be near the solution, and the
expected changes in angles are small.
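The error metric and angle parameterization of this section can be sketched in a few lines. This is an illustrative reading of equations (7), (12) and (13), assuming right-handed rotation matrices, angles in radians, and a single shared K; the helper names (`warp`, `correction_rotations`, `epipolar_error`) and the test values are hypothetical, and the non-linear minimization of (8) is not shown.

```python
import math


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]


def correction_rotations(phi):
    """Equations (12)-(13); phi = (alpha_l, beta_l, alpha_r, beta_r, gamma)."""
    al, bl, ar, br, g = phi
    R_l = matmul(rot_x(g / 2), matmul(rot_z(bl), rot_y(al)))
    R_r = matmul(rot_x(-g / 2), matmul(rot_z(br), rot_y(ar)))
    return R_l, R_r


def warp(R, u, f, u0, v0):
    """Apply the homography K R K^-1 to the pixel u = (u, v) and return a pixel."""
    x = [(u[0] - u0) / f, (u[1] - v0) / f, 1.0]
    p = [sum(R[i][k] * x[k] for k in range(3)) for i in range(3)]
    return (f * p[0] / p[2] + u0, f * p[1] / p[2] + v0)


def epipolar_error(phi, u_l, u_r, f, u0, v0):
    """Row difference in pixels after applying the transposed corrections, as in (7)."""
    R_l, R_r = correction_rotations(phi)
    v_l = warp(transpose(R_l), u_l, f, u0, v0)[1]
    v_r = warp(transpose(R_r), u_r, f, u0, v0)[1]
    return v_l - v_r


f, u0, v0 = 1000.0, 320.0, 240.0
phi_true = (0.002, -0.001, 0.001, 0.0015, 0.003)
R_l, R_r = correction_rotations(phi_true)
# Start from a perfectly rectified pair (same row), then simulate the
# mis-rectified observations u' that the original calibration would produce.
obs_l = warp(R_l, (400.0, 300.0), f, u0, v0)
obs_r = warp(R_r, (380.0, 300.0), f, u0, v0)
err_true = epipolar_error(phi_true, obs_l, obs_r, f, u0, v0)    # vanishes
err_zero = epipolar_error((0.0,) * 5, obs_l, obs_r, f, u0, v0)  # nonzero
```

Summing the squared errors over all correspondences gives the objective in (8); any general-purpose non-linear least-squares routine could then be used to minimize it over the five angles.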
`
`3. Solution Covariance and Over Fitting
`
In practice, the correspondences $u'_l \leftrightarrow u'_r$ will be
corrupted with noise and the ability to accurately estimate $\Phi$
from these is dependent on many factors. These include: the
focal lengths, baseline, number of correspondences, spatial
distribution of correspondences, and the depth of the scene
points. Small rotation angles $\Phi$ make over-fitting a concern.

Figure 3: Ground truth simulated change in angles $\Phi'$ (black
line), and the initial optimized estimates $\Phi$ (red dots).

To test this, we simulated a time dependent change in the
extrinsic pose of a b = 150mm baseline, 640 × 480 resolution
(f = 1000 pix) stereo camera. For each stereo pair,
1000 random correspondences were generated, and uncorrelated
Gaussian noise ($\sigma$ = 0.5 pix) added. The disparity
values ranged between 1 and 25 pix, or equivalently depths
Z between 3 and 150m. Fig. 3 shows the simulated angular
changes (black), and the noisy estimates of $\Phi$ (red).
`
`3.1. Solution Covariance
`
Assuming that $\Phi$ is an unbiased estimate of the
solution $\Phi'$, with expected error covariance
$C = E[(\Phi - \Phi')(\Phi - \Phi')^T]$, the Cramér-Rao lower bound
states that C is greater than or equal to the inverse of the
Fisher information matrix F, which is the score variance at the
solution [15]:

$$C = E\left[(\Phi - \Phi')(\Phi - \Phi')^T\right] \ge F^{-1} \qquad (14)$$

$$F = E\left[\left(\frac{\partial \ln p(e \mid \Phi)}{\partial \Phi}\right)^T \left(\frac{\partial \ln p(e \mid \Phi)}{\partial \Phi}\right)\right], \qquad (15)$$

where $p(e \mid \Phi)$ is the conditional error probability. If the
measurement errors of the imaged points are zero-mean
Gaussian, then we can assume that $e \sim N(0, \sigma)$ at the
solution, and (15) can be written as

$$F = \sigma^{-2} \sum_{i=1}^{n} J_i^T J_i. \qquad (16)$$

The summation in (16) is taken over all n correspondences,
and the Jacobian $J_i$ is the change in error with respect to
the change in parameters $\Phi$ at the solution:

$$J_i = \frac{\partial e_i}{\partial \Phi}, \qquad (17)$$
`
which, for the simple case where $\Phi = 0^T$, is

$$J_i = \left[ -\frac{x_l y_l}{f},\; -x_l,\; \frac{x_r y_r}{f},\; x_r,\; \frac{y_l^2 + y_r^2 + 2 f^2}{2 f} \right]. \qquad (18)$$
`
C        α_l       β_l       α_r       β_r       γ
α_l      0.040     0.031    -0.010     0.030     0.019
β_l      0.031     3.142     0.070     3.127     1.969
α_r     -0.010     0.070     0.041     0.070     0.044
β_r      0.030     3.127     0.070     3.117     1.961
γ        0.019     1.969     0.044     1.961     1.235

(a) Pipe dataset (see Fig. 1a). All scene points are within
300mm of the camera. det(C) = 8.452 × 10^-42.

C        α_l       β_l       α_r       β_r       γ
α_l    178.414     2.562   178.884     2.737    -0.013
β_l      2.562     0.967     2.710     0.979     0.007
α_r    178.884     2.710   180.958     2.979    -0.013
β_r      2.737     0.979     2.979     1.002     0.007
γ       -0.013     0.007    -0.013     0.007     0.001

(b) Outdoor dataset 1 (see Fig. 1b). Many scene points are > 10m
from the camera. det(C) = 4.602 × 10^-37.

Table 1: Covariance matrices for the correspondences in (a)
Fig. 1a and (b) Fig. 1b. The units are deg²/pix², and all
values have been scaled by 1.0 × 10³ for display purposes.
`
From (6), $(x_l, y_l)^T = (\tilde{u}_l - u_0, \tilde{v}_l - v_0)^T$ and
$(x_r, y_r)^T = (\tilde{u}_r - u_0, \tilde{v}_r - v_0)^T$. Due to its
complexity we omit here the full Jacobian. For most perspective
cameras with average fields of view the component
$\partial e / \partial \gamma$ dominates the magnitude of J, suggesting
that $\gamma$ will be the most reliable estimate.
Table 1 shows the covariance matrices for the sets of
correspondences in Fig. 1a and Fig. 1b. The variances of the
angles (leading diagonal) differ significantly in the examples,
and although the number of correspondences used was
similar, the determinant of C for the pipe example is several
orders of magnitude smaller than for the outdoor 1 example.
For the outdoor 1 example, the majority of the scene points
are distant, and there is a large covariance between the $\alpha$
angles ($\alpha_l$ and $\alpha_r$, highlighted in blue), as well as the
$\beta$ angles ($\beta_l$ and $\beta_r$, highlighted in red)¹. This shows
that it is primarily the relative angles
$\delta\alpha = \alpha_l - \alpha_r$ and $\delta\beta = \beta_l - \beta_r$
being estimated (see Fig. 5). For example, if points at an
infinite distance are observed in a perfectly rectified stereo
pair, such that $u'_l = u'_r$, the epipolar errors $\sum e_i^2$ will be
zero for any rotations where $\beta_l = \beta_r$ ($\delta\beta = 0$). In effect
this is attempting to estimate a small translation using points
at infinity (Fig. 4). It is only when $\beta_l \ne \beta_r$ that
$\sum e_i^2 > 0$.
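The coupling between the left and right angles for distant points can be illustrated by accumulating the Fisher information of (16) directly. The sketch below is not the paper's implementation: it assumes the small-angle Jacobian at $\Phi = 0$ (right-handed rotations, error defined as left minus right), uses synthetic correspondences on a regular grid, and introduces a hypothetical `coupling` measure, the normalized off-diagonal entry of F. A value near 1 means the two Jacobian columns are nearly parallel, so only the difference of the two angles is constrained.

```python
import math


def jacobian_at_zero(x_l, y_l, x_r, y_r, f):
    """Small-angle Jacobian of the epipolar error w.r.t.
    (alpha_l, beta_l, alpha_r, beta_r, gamma), evaluated at Phi = 0."""
    return [-x_l * y_l / f, -x_l, x_r * y_r / f, x_r,
            (y_l ** 2 + y_r ** 2 + 2 * f ** 2) / (2 * f)]


def fisher_information(points, f, sigma):
    """Equation (16): F = sigma^-2 * sum_i J_i^T J_i over all correspondences."""
    F = [[0.0] * 5 for _ in range(5)]
    for x_l, y_l, x_r, y_r in points:
        J = jacobian_at_zero(x_l, y_l, x_r, y_r, f)
        for a in range(5):
            for c in range(5):
                F[a][c] += J[a] * J[c] / sigma ** 2
    return F


def coupling(F, a, c):
    """Normalized off-diagonal term of F for parameters a and c."""
    return abs(F[a][c]) / math.sqrt(F[a][a] * F[c][c])


# Synthetic image coordinates relative to the principal point (pixels).
grid = [(x, y) for x in range(-300, 301, 100) for y in range(-200, 201, 100)]
# Distant scene: one pixel of disparity everywhere.
far = [(x, y, x - 1.0, y) for x, y in grid]
# Nearby scene: large disparities that vary across the image.
near = [(x, y, x - (100.0 + abs(y)), y) for x, y in grid]

F_far = fisher_information(far, f=1000.0, sigma=0.5)
F_near = fisher_information(near, f=1000.0, sigma=0.5)
beta_far = coupling(F_far, 1, 3)    # beta_l versus beta_r, distant points
beta_near = coupling(F_near, 1, 3)  # beta_l versus beta_r, nearby points
```

Inverting F (for example with a linear algebra library) gives the lower bound on C; for the distant scene F is close to singular, which is the situation the outdoor covariance matrix reflects.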
`
`4. Kalman Filter Re-Calibration
`
Given the noisy estimates $\Phi$ of the extrinsic pose obtained
from the non-linear minimization of the epipolar errors,
we use a KF [13] to produce a smoothed estimate $\hat{\Phi}$.
We use a stationary process model so that we have at time k
$\Phi_k = \Phi_{k-1}$, although more complex models could be used.
¹ For any point at infinity $u'_l = u'_r$, so
$\partial e_i / \partial \alpha_l = -\partial e_i / \partial \alpha_r$ and
$\partial e_i / \partial \beta_l = -\partial e_i / \partial \beta_r$.

Figure 4: For a point at infinity, only relative angles can be
estimated, for example $\delta\beta = \beta_l - \beta_r$. Rotating the
cameras by the same angle $\beta_l = \beta_r$ ($\delta\beta = 0$) is
approximately equivalent to adding a small translation change
$\delta t$, and estimating small translations with distant points is
problematic.
`
The lower bound $C_k$ evaluated at time k is used as the
measurement noise covariance. The process noise covariance Q
is set to

$$Q = \left(\frac{\pi}{180}\right)^2 \left(\frac{\tau}{60\,\mathrm{fps}}\right)^2 \mathrm{Diag}(1, 1, 1, 1, 0.25), \qquad (19)$$

where fps is frames per second, and $\tau$ is the selected angular
rate of the process noise with units of degrees per minute.
`
`4.1. Update Equations
`
The time update predictions for the camera state $\hat{\Phi}_k^-$,
error covariance $P_k^-$, and Kalman gain $K_k$ are

$$\hat{\Phi}_k^- = \hat{\Phi}_{k-1} \qquad (20)$$

$$P_k^- = P_{k-1} + Q \qquad (21)$$

$$K_k = P_k^- \left(P_k^- + C_k\right)^{-1}, \qquad (22)$$

from which the updated estimate of the camera state $\hat{\Phi}_k$ and
error covariance $P_k$ are evaluated as

$$\hat{\Phi}_k = \hat{\Phi}_k^- + K_k \left(\Phi_k - \hat{\Phi}_k^-\right) \qquad (23)$$

$$P_k = \left(I_{5\times5} - K_k\right) P_k^-. \qquad (24)$$
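The update equations reduce to a few lines when every covariance is diagonal. The sketch below makes that simplifying assumption, so each angle is filtered independently; the method described here uses full 5 × 5 matrices, where the off-diagonal terms of $C_k$ matter. The function names are hypothetical, and `process_noise_diag` is one reading of equation (19) with $\tau$ in degrees per minute.

```python
import math


def process_noise_diag(tau_deg_per_min, fps):
    """Diagonal of Q following equation (19)."""
    per_frame = (math.pi / 180.0) * tau_deg_per_min / (60.0 * fps)
    return [per_frame ** 2 * w for w in (1.0, 1.0, 1.0, 1.0, 0.25)]


def kf_update(state, P, measurement, Q, C):
    """One filter step; state, P, measurement, Q and C are lists of 5 floats
    (covariances are given by their diagonals only)."""
    new_state, new_P = [], []
    for x, p, z, q, c in zip(state, P, measurement, Q, C):
        x_pred = x                                   # (20) stationary process model
        p_pred = p + q                               # (21)
        k = p_pred / (p_pred + c)                    # (22) Kalman gain
        new_state.append(x_pred + k * (z - x_pred))  # (23)
        new_P.append((1.0 - k) * p_pred)             # (24)
    return new_state, new_P


Q = process_noise_diag(tau_deg_per_min=0.001, fps=7.5)
# Equal prior and measurement variance: the update lands halfway between them.
state, P = kf_update([0.0] * 5, [1.0] * 5, [2.0] * 5, [0.0] * 5, [1.0] * 5)
```

A small process noise makes the gain shrink over time, so noisy per-frame estimates are averaged heavily, while a large $C_k$ (poorly constrained frame) is down-weighted automatically.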
`
`4.2. Initializing the State Covariance
`
We estimate the initial state covariance $P_{k=0}$ by generating
50 perfectly rectified frames of checkerboard scene
points (120 points per frame). Random poses of the cameras
with respect to the checkerboard target are simulated.
Gaussian noise is then added to each image coordinate with
$\sigma$ = 0.25 pix. The reprojection errors are defined as a
function of the Euler angles (6); the y error component is
(7). The initial estimate $P_{k=0}$ is calculated from the lower
bound of the solution uncertainty.
Figure 5 shows the KF results $\hat{\Phi}$ obtained from the
original optimized estimates $\Phi$ in the example in Fig. 3 using
the process noise rate $\tau$ = 1e-3. It is clear from Fig. 5 that
the KF estimates of the individual angles
$\alpha_l, \alpha_r, \beta_l, \beta_r$ do not accurately estimate the
simulated angles. However, the differential angles
$\delta\alpha = \alpha_l - \alpha_r$ and $\delta\beta = \beta_l - \beta_r$ shown
in the same figure are close approximations of the simulated
differential angles. Note that $\gamma$ is also a differential angle,
and its filter estimate is very close to the simulated values.
`
Figure 5: Ground truth angles $\Phi'$ (black) and KF estimates
$\hat{\Phi}$ (red); original estimates $\Phi$ shown in Fig. 3. The
differential angles $\delta\alpha = \alpha_l - \alpha_r$ and
$\delta\beta = \beta_l - \beta_r$ are also shown.
`
`4.3. Baseline Estimation
`
The true baseline distance cannot be measured from
stereo correspondences; however, it may be estimated using
additional information. Examples include inertial or wheel
odometry, fixed reference fiducial markers, or structured
light measurement observable in both images. Here, we
used the following per-frame method to obtain the results in
section 5. We assume that the triangulated distance to a scene
point $X_i$ should be the same using both the original and
re-calibrated extrinsics. We denote these $l_i$ and $\hat{l}_i$,
respectively. Since distances are proportional to the
triangulated depths (see (4)), we estimate the new baseline
$\hat{b}$ as

$$\hat{b} = \frac{b}{n} \sum_{i=1}^{n} \frac{l_i}{\hat{l}_i}. \qquad (25)$$

The summation is only taken over the nearest n = 5 stereo
correspondences each frame as the nearest points are the
most suitable for resolving translation magnitudes.
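The baseline rescaling of (25) is a mean of distance ratios. The following is a minimal sketch, assuming the distances to each point have already been triangulated with both sets of extrinsics and that "nearest" is judged by the original-calibration distance (an assumption; the text does not say which distance is used for the selection). The function name and the sample values are hypothetical.

```python
def rescale_baseline(b, dist_original, dist_recalibrated, n=5):
    """Equation (25): b times the mean ratio of original to re-calibrated
    distances, taken over the n nearest correspondences."""
    # Sort pairs by the original distance and keep the n nearest (assumption).
    pairs = sorted(zip(dist_original, dist_recalibrated))[:n]
    return b * sum(l / l_hat for l, l_hat in pairs) / len(pairs)


orig = [2.0, 3.0, 4.0, 5.0, 6.0, 50.0]
recal = [d / 1.02 for d in orig]   # re-calibrated distances come out 2% short
b_new = rescale_baseline(0.342, orig, recal)
```

Because triangulated depth scales linearly with the baseline, lengthening b by the mean ratio restores the original distances for the nearest points.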
`
`5. Experiments and Results
`
`To evaluate the approach, we present a range of experi-
`mental online re-calibration results including visual odom-
`etry for the datasets in Fig. 1 (see table 2), and scene recon-
`struction using the dataset described in Sect. 5.
For all datasets, Harris corners [10] were detected in
image pairs rectified using the original extrinsics. Sparse
stereo correspondences were found by thresholding the
cosine similarity between SIFT descriptors [14] for each
feature. Although sub-pixel accuracy Harris corners were
found, Zero-Normalized Cross Correlation (ZNCC) was
used to refine the correspondences and improve accuracy.
`
              Pipe         Outdoor 1     Outdoor 2
Type          Assembled    Commercial    Assembled
Resolution                               1024×768
# images                                 7567
fps                                      7.5
f (pix)                                  1781
b (mm)                                   342
# stereo                                 885
length (m)                               6247

Table 2: Summary of the visual odometry datasets (see also
Fig. 1). The notation # stereo is the mean number of stereo
correspondences found per frame. The camera parameters
are given for the stereo rectified images.
`
Importantly, we constrain the right stereo feature to an
epipolar box and not a line, since a miscalibrated pair no
longer satisfies the epipolar constraint exactly.
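The two similarity measures used for stereo matching can be sketched as follows. This is an illustrative, pure-Python version: descriptors are plain lists of floats, patches are lists of rows, the threshold of 0.9 is a hypothetical value, and neither the epipolar box test nor the sub-pixel refinement search is shown.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def match_descriptors(left, right, threshold=0.9):
    """For each left descriptor, keep the most similar right descriptor
    if its cosine similarity passes the threshold."""
    matches = []
    for i, d_l in enumerate(left):
        scores = [cosine_similarity(d_l, d_r) for d_r in right]
        j = max(range(len(right)), key=scores.__getitem__)
        if scores[j] >= threshold:
            matches.append((i, j))
    return matches


def zncc(patch_a, patch_b):
    """Zero-normalized cross correlation of two equally sized, non-constant patches."""
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return num / den


left = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
right = [[0.0, 2.0, 2.1], [0.9, 0.1, 0.0], [1.0, 1.0, 1.0]]
pairs = match_descriptors(left, right)

patch = [[10, 20], [30, 50]]
brighter = [[2 * v + 5 for v in row] for row in patch]
score = zncc(patch, brighter)   # invariant to gain and offset changes
```

ZNCC is insensitive to affine brightness changes between the two cameras, which is one reason it is a common choice for refining matches found with descriptors.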
For the visual odometry results, temporal correspondences
between adjacent stereo pairs were found by thresholding
the ambiguity ratio [14] between SIFT descriptors.
Visual odometry estimates were computed using
both the original and the re-calibrated stereo extrinsic pose.
The 6 DOF change in pose Q between the left camera
frames was estimated using Perspective-n-Points (PnP) and
RANSAC [7], followed by non-linear minimization of the
image reprojection errors. The KF process noise was set to
$\tau$ = 0.001 for each dataset, and $P_{k=0}$ estimated using the
method in Sect. 4.2.
`
Pipe Dataset The stereo camera, original epipolar errors,
and sample rectified imagery for the pipe dataset are shown
in Fig. 1a. As described in [9], the camera observed the upper
surface of a 400mm diameter steel pipe as it moved
forwards and then in reverse through the pipe. Lighting
via nine LEDs was mounted to the camera housing,
which raised the temperature of the camera housing from
25-30°C ambient at the start to 27-38°C at the end.
We attribute the time dependent change in epipolar errors to
thermal expansion.
The KF estimates of the camera rotation angles, visual
odometry estimates, and 3D point clouds with original and
re-calibrated extrinsics are shown in Fig. 6a, 6b, and 6c.
Although GPS ground truth is unavailable, all scene points
belong to the same curved surface, so the reconstructions in
both directions should align. There is a large misalignment
using the original extrinsic calibration, which is improved
significantly using the online re-calibration estimates.
`
Figure 6: Results for the pipe dataset. (a) KF estimates of the
rotation angles. (b) VO result with pipe axis in X direction:
original (top) and re-calibrated (bottom). (c) VO result at the
start/end: original (left) and re-calibrated (right). The points
all belong to the same surface. The black line near
the surface points in (c) connects the same ground truth
marker, reconstructed at the start and end of the dataset.
The Euclidean errors in the reconstructed coordinate are:
100.1mm for original calibration, 15.1mm for re-calibrated.

Figure 7: Results for 5.48km outdoor dataset 1 (commercial
stereo camera). (a) KF estimates of the rotation angles.
(b) VO (red), and the 5Hz GPS (blue). Left column is the
original calibration, and right column the KF re-calibration.
There are a total of 4 anti-clockwise loops.

Outdoor Dataset (Camera 1) The first outdoor dataset
(Fig. 1b) includes imagery from a short baseline Pointgrey
Bumblebee2 stereo camera. The rectified imagery was created
using the supplied calibration data. The KF estimates
of the extrinsics are provided in Fig. 7a, and the comparison
of the visual odometry estimates using the original and
re-calibrated extrinsic pose is shown in Fig. 7b. The 5Hz
GPS (non-RTK) measurements collected are included as
ground truth. The visual odometry position estimates were
linearly interpolated at the time stamps for each of the 1671
GPS readings², and then aligned with the GPS by minimizing
the sum of squared distances. The average absolute distance
errors were: 0.781m using the original calibration,
and 0.485m using online recalibration.
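The evaluation against GPS involves two steps: interpolating the visual odometry path at the GPS time stamps, and aligning the two paths in the least-squares sense. The sketch below assumes a planar problem and a rigid alignment (rotation plus translation); the alignment model is an assumption, since the text only states that the sum of squared distances was minimized. All function names and the synthetic path are hypothetical.

```python
import bisect
import math


def interpolate(times, positions, t):
    """Linearly interpolate 2D positions (sorted by time) at time t."""
    i = min(max(bisect.bisect_right(times, t) - 1, 0), len(times) - 2)
    w = (t - times[i]) / (times[i + 1] - times[i])
    (x0, y0), (x1, y1) = positions[i], positions[i + 1]
    return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))


def align_rigid_2d(src, dst):
    """Rotate and translate src so the sum of squared distances to dst is minimal."""
    n = len(src)
    cs = (sum(p[0] for p in src) / n, sum(p[1] for p in src) / n)
    cd = (sum(p[0] for p in dst) / n, sum(p[1] for p in dst) / n)
    a = [(p[0] - cs[0], p[1] - cs[1]) for p in src]
    b = [(p[0] - cd[0], p[1] - cd[1]) for p in dst]
    # Closed-form optimal rotation angle for centred 2D point sets.
    theta = math.atan2(sum(ax * by - ay * bx for (ax, ay), (bx, by) in zip(a, b)),
                       sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b)))
    c, s = math.cos(theta), math.sin(theta)
    return [(c * ax - s * ay + cd[0], s * ax + c * ay + cd[1]) for ax, ay in a]


def mean_distance(p, q):
    return sum(math.hypot(x0 - x1, y0 - y1) for (x0, y0), (x1, y1) in zip(p, q)) / len(p)


vo_times = [0.0, 1.0, 2.0, 3.0]
vo_xy = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (3.0, 2.0)]
gps_times = [0.5, 1.5, 2.5]
vo_at_gps = [interpolate(vo_times, vo_xy, t) for t in gps_times]

# Synthetic "GPS": the same path rotated by 30 degrees and shifted.
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
gps_xy = [(c * x - s * y + 10.0, s * x + c * y - 4.0) for x, y in vo_at_gps]
error = mean_distance(align_rigid_2d(vo_at_gps, gps_xy), gps_xy)
```

With real data the residual after alignment is the average absolute distance error reported for each calibration.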
`
Outdoor Dataset (Camera 2) The second outdoor
dataset (see Fig. 1c) uses a custom 342mm baseline stereo
camera. Intrinsic and extrinsic parameters were calibrated
offline and then we manually flexed the camera to alter the
extrinsics. The KF estimates of the angles and visual odometry
results are provided in Fig. 8. GPS (3045 points at
5Hz) formed the ground truth using the same techniques
described previously. The absolute average distance errors
were: 1.632m using the original calibration, and 0.700m
using online recalibration. As was the case with the first
outdoor dataset, recalibration reduced the rotational drift.
`
² The GPS z-component was set to zero as the 3D solution was
unreliable; the operating environment was approximately planar.

Figure 8: Results for 6.25km outdoor dataset 2. (a) KF estimates
of the rotation angles. (b) VO (red), and the 5Hz GPS (blue).
Left column is the original calibration, and right column the
KF recalibration.

Indoor Scene Fig. 9a shows the stereo camera and a sample
image from the left camera used for the indoor controlled
test. The stereo head uses the same cameras as in the
previous experiment, but with a baseline of 220mm and a
configurable right camera pose. We collected three datasets
(1, 2 and 3) observing the same indoor scene, each with a
different right camera pose. Ground truth estimates of the
extrinsic pose for each set were obtained using a checkerboard
target. Dataset 1 was chosen as the reference calibration.
The stereo correspondences for each set were found
in rectified imagery using this reference calibration. The
online KF recalibration was used to estimate the changes
from the reference calibration, as shown in Fig. 9b. The
final KF results are compared to the ground truth in table 3.
As expected, the performance degrades with large
changes from the reference calibration. Although the errors
for $\alpha_l$ and $\alpha_r$ appear large for set 1, the resulting change in
the stereo disparity and scene reconstruction remained
relatively small (see table 4). The standard deviation of the
disparity (pix) is similar to the checkerboard calibration
reprojection values of $(\sigma_u, \sigma_v)$ = (0.231, 0.212) pix, which is
itself only an estimate of the true extrinsic pose.
To better visualize the performance of the re-calibration,
the overhead views of the scene reconstruction for each
set are shown in Fig. 9c: the first row uses the reference
calibration for each set; the second row uses the checkerboard
calibration; and the third row uses the online
re-calibration. These reconstructions were produced using the
exact same stereo correspondences detected in a single image
pair from each set, and are all in the left camera coordinate
frame. The results using online re-calibration are
significantly more consistent than those using the reference
calibration for each set. Observe that there are some
inconsistencies in the reconstructions for each set using the
checkerboard calibration. Again, it too is only an estimate
of the true extrinsic pose.

Figure 9: Hardware and results for the indoor dataset.
(a) The stereo camera and sample image. (b) The raw online
calibration angle estimates (red points) and KF estimates
(solid lines). Each row shows the differential angle estimates
for each of the 3 datasets (changing right camera pose).
(c) The top view of the scene reconstructions for set 1 (blue),
set 2 (red) and set 3 (green) using: original calibration (top
row); checkerboard calibration (middle row); online KF
recalibration (bottom row).

        calib1     opt1   calib2     opt2   calib3     opt3
α_l       0.00   -0.294   -0.362   -0.546   -1.137    0.829
α_r       0.00   -0.328    0.456    0.216    1.613    3.842
δα        0.00    0.033   -0.818   -0.762   -2.750   -3.014
β_l       0.00    0.051   -0.127   -0.108   -0.367   -2.481
β_r       0.00    0.050    0.588    0.600    1.369   -0.940
δβ        0.00    0.001   -0.716   -0.708   -1.736   -1.541
γ         0.00    0.002   -0.565   -0.566   -1.123   -1.179

Table 3: The changes in angles from the reference calibration
using: offline checkerboard calibration (calib); online
re-calibration (opt). All values have units of degrees. The
subscripts in calib_n and opt_n refer to the image set.

                              mean     std. dev.
Euclidean Error (mm)          24.80    22.67
Euclidean Error (%)           0.436    0.329
Disparity Difference (pix)    1.076    0.212
Disparity Difference (%)      1.281    0.502

Table 4: Statistics for the Euclidean reconstruction and
disparity differences between the checkerboard calibration and
online re-calibration for set 1.
`
`6. Conclusions
`We presented an algorithm for online continuous stereo
`extrinsic re-calibration that estimates a separate extrinsic
`pose for each image pair using sparse stereo correspon-
`dences. An initial 5 DOF extrinsic pose estimate (relative
`camera orientations/fixed baseline) is found by minimiz-
`ing stereo epipolar errors, and then refined using a Kalman
`Filter (KF). The KF measurement covariance is the lower
`bound of the per-frame solution uncertainty, which is de-
`pendent on the number and distribution of the scene point
`correspondences, as well as the camera focal length and
`stereo baseline. If only a small number of stereo correspon-
`dences can be found, they simply can be combined over
`multiple frames before estimating the extrinsic pose as no
`temporal constraints are used. Our results for visual odom-
`etry using a range of real datasets in different environments
`show that accuracy is improved using our technique com-
`pared to the original extrinsic calibration. Our future work
`will explore improved methods for estimating the change in
`baseline length.
`
`7. Acknowledgements
`This paper was made possible by the support of NPRP
`grants (# NPRP 08-589-2-245 and 09-980-2-380) from the
`
`Qatar National Research Fund. The statements made herein
`are solely the responsibility of the authors.
`References
[1] M. Björkman and J. Eklundh. Real-time epipolar geometry
estimation of binocular stereo heads. IEEE Trans. Pattern
Analysis and Machine Intelligence, 24(3):425–432, 2002.
[2] G. Carrera, A. Angeli, and A. Davison. SLAM-based
automatic extrinsic calibration of a multi-camera rig. In Int.
Conference on Intelligent Robots and Systems, 2011.
[3] T. Clarke and J. Fryer. The development of camera
calibration methods and models. Photogrammetric Record,
16(91):51–66, April 1998.
[4] T. Dang, C. Hoffmann, and C. Stiller. Continuous stereo
self-calibration by camera parameter tracking. IEEE
Transactions on Image Processing, 18(7):1536–1550, July 2009.
[5] T. Dang and C. Hoffmann. Tracking camera parameters of
an active stereo rig. In Pattern Recognition, volume 4174,
pages 627–636. Springer Berlin / Heidelberg, 2006.
[6] A. Davison, I. Reid, N. Molton, and O. Stasse. MonoSLAM:
real-time single camera SLAM. IEEE Trans. Pattern Analysis
and Machine Intelligence, 29(6):1–16, June 2007.
[7] M. A. Fischler and R. C. Bolles. Random sample consensus:
A paradigm for model fitting with applications to image
analysis and automated cartography. Comms. of the ACM,
pages 381–395, 1981.
[8] A. Fitzgibbon. Simultaneous linear estimation of multiple
view geometry and lens distortion. In IEEE CVPR, 2001.
[9] P. Hansen, H. Alismail, B. Browning, and P. Rander. Stereo
visual odometry for pipe mapping. In IROS, 2011.
[10] C. Harris and M. Stephens. A combined corner and edge
detector. In Proceedings Fourth Alvey Vision Conference,
pages 147–151, 1988.
[11] R. Hartley and S. B. Kang. Parameter free radial distortion
correction with centre of distortion estimation. In
International Conference on Computer Vision, 2005.
[12] R. Hartley and A. Zisserman. Multiple View Geometry in
Computer Vision. Cambridge Univ. Press, 2003.
[13] R. E. Kalman. A new approach to linear filtering and
prediction problems. Transactions of the ASME–Journal of Basic
Engineering, 82(Series D):35–45, 1960.
[14] D. Lowe. Distinctive image features from scale-invariant
keypoints. IJCV, 60(2):91–110, 2004.
[15] J. Shao. Mathematical Statistics. Springer Texts in Statistics.
Springer-Verlag, second edition, 2003.
[16] R. Tsai. A versatile camera calibration technique for
high-accuracy 3D machine vision metrology using off-the-shelf
TV cameras and lenses. IEEE Journal of Robotics and
Automation, RA-3(4):323–344, August 1987.
[17] Z. Zhang. On the epipolar geometry between two images
with lens distortion. In Proceedings International Conference
on Pattern Recognition, pages 407–411, 1996.
[18] Z. Zhang. Flexible camera calibration by viewing a plane
from unknown orientations. In ICCV, 1999.
`
`