`
`319
`
Analysis of Head Pose Accuracy in Augmented Reality
`
`William Hoff, Member, IEEE, and Tyrone Vincent, Member, IEEE
`
`Abstract—A method is developed to analyze the accuracy of the relative head-to-object position and orientation (pose) in augmented
`reality systems with head-mounted displays. From probabilistic estimates of the errors in optical tracking sensors, the uncertainty in
`head-to-object pose can be computed in the form of a covariance matrix. The positional uncertainty can be visualized as a 3D ellipsoid.
`One useful benefit of having an explicit representation of uncertainty is that we can fuse sensor data from a combination of fixed and
`head-mounted sensors in order to improve the overall registration accuracy. The method was applied to the analysis of an
`experimental augmented reality system, incorporating an optical see-through head-mounted display, a head-mounted CCD camera,
`and a fixed optical tracking sensor. The uncertainty of the pose of a movable object with respect to the head-mounted display was
analyzed. By using both fixed and head-mounted sensors, we produced a pose estimate that is significantly more accurate than that
`produced by either sensor acting alone.
`
`Index Terms—Augmented reality, pose estimation, registration, uncertainty analysis, error propagation, calibration.
`
`
`1 INTRODUCTION
`
AUGMENTED reality is a term used to describe systems in
which computer-generated information is superimposed
on top of the real world [1]. One form of enhancement
is to use computer-generated graphics to add virtual
`objects (such as labels or wire-frame models) to the existing
`real world scene. Typically, the user views the graphics
`with a head-mounted display (HMD), although some
`systems have been developed that use a fixed monitor
`(e.g., [2], [3], [4], [5]). The combining of computer-generated
`graphics with real-world images may be accomplished with
`either optical [6], [7], [8] or video technologies [9], [10].
`A basic requirement for an AR system is to accurately
`align virtual and real-world objects so that they appear to
`coexist in the same space and merge together seamlessly.
`This requires that the system accurately sense the position
and orientation (pose) of the real-world object with respect
`to the user’s head. If the estimated pose of the object is
`inaccurate, the real and virtual objects may not be registered
`correctly. For example, a virtual wire-frame model could
`appear to float some distance away from the real object.
`This is clearly unacceptable in applications where the user
`is trying to understand the relationship between real and
`virtual objects. Registration inaccuracy is one of the most
`important problems limiting augmented reality applica-
`tions today [11].
`This paper shows how one can estimate the registration
`accuracy in an augmented reality system, based on the
`characteristics of the sensors used in the system. Only
`quasi-static registration is considered in this paper; that is,
objects are stationary when viewed, but can freely be
moved. We develop an analytical model and show how the
model can be used to properly combine data from multiple
sensors to improve registration accuracy and gain insight
into the effects of object and sensor geometry and
configuration. A preliminary version of this paper was
presented at the First International Workshop on Augmented
Reality [12].

The authors are with the Engineering Division, Colorado School of Mines,
1500 Illinois St., Golden, CO 80401.
E-mail: {whoff, tvincent}@mines.edu.

Manuscript received 1 Feb. 1999; revised 6 July 2000; accepted 10 July 2000.
For information on obtaining reprints of this article, please send e-mail to:
tvcg@computer.org, and reference IEEECS Log Number 109094.
`
`1.1 Registration Techniques in Augmented Reality
`To determine the pose of an object with respect to the user’s
`head, tracking sensors are necessary. Sensor technologies
that have been used in the past include mechanical,
`magnetic, acoustic, and optical [13]. We concentrate on
`optical sensors (such as cameras and photo-effect sensors)
`since they have the best overall combination of speed,
`accuracy, and range [7], [14], [15].
`There has been much work in the past in the photo-
`grammetry and computer vision fields on methods for
`object recognition and pose estimation from images. Some
`difficult problems (which are not addressed here) include
`how to extract features from the images and determine the
`correspondence between extracted image features and
`features on the object. In many practical applications, these
`problems can be alleviated by preplacing distinctive optical
`targets, such as light emitting diodes (LEDs) or passive
`fiducial markings, in known positions on the object. The 3D
`locations of the target points on the object must be carefully
`measured, in some coordinate frame attached to the object.
`In this paper, we will assume that point features have been
`extracted and the correspondences known so that the only
`remaining problem is to determine the pose of the object
`with respect to the HMD.
`One issue is whether the measured points are two-
`dimensional (2D) or three-dimensional (3D). Simple passive
`optical sensors, such as video cameras and photo-effect
`sensors, can only sense the direction to a target point and
not its range. The measured data points are 2D, i.e., they
represent the locations of the target points projected onto
`the image plane. On the other hand, active sensors, such as
`laser range finders, can directly measure direction and
`range, yielding fully 3D target points. Another way to
`obtain 3D data is to use triangulation; for example, by using
`two or more passive sensors (stereo vision). The accuracy of
`locating the point is improved by increasing the separation
`(baseline) between the sensors.
Once the locations of the target points have been
`determined (either 2D or 3D), the next step is to determine
`the full six degree-of-freedom (DOF) pose of the object with
`respect to the sensor. Again, we assume that we know the
`correspondence of the measured points to the known 3D
`points on the object model. If one has 3D point data, this
`procedure is known as the “absolute orientation” problem
`in the photogrammetry literature. If one has 2D target
`points, this procedure is known as the “exterior orientation”
`problem [16].
`Another issue is where to locate the sensor and target.
`One possibility is to mount the sensor at a fixed known
`location in the environment and put targets on both the
`HMD and on the object of interest (a configuration called
`“outside-in” [14]). We measure the pose of the HMD with
`respect to the sensor, and the pose of the object with respect
`to the sensor, and derive the relative pose of the object with
`respect to the HMD. Another possibility is to mount the
`sensor on the HMD and the target on the object of interest (a
`configuration called “inside-out”). We measure the pose of
`the object with respect to the sensor and use the known
`sensor-to-HMD pose to derive the relative pose of the object
`with respect to the HMD. Both approaches have been tried
`in the past and each has advantages and disadvantages.
`With a fixed sensor (outside-in approach), there is no
limitation on size and weight of the sensor. Multiple
`cameras can be used, with a large baseline, to achieve
`highly accurate 3D measurements via triangulation. For
`example, commercial optical measurement systems, such as
`Northern Digital’s Optotrak, have baselines of approxi-
`mately 1 meter and are able to measure the 3D positions of
`LED markers to an accuracy of approximately 0.15 mm. The
`orientation and position of a target pattern is then derived
`from the individual point positions. A disadvantage with
`this approach is that head orientation must be inferred
`indirectly from the point positions.
`The inside-out approach has good registration accuracy
`because a slight rotation of a head-mounted camera causes
`a large shift of a fixed target in the image. However, a
`disadvantage of this approach is that large translation
`errors occur along the line of sight of the camera. To avoid
`this, additional cameras could be added with lines of sight
`orthogonal to each other.
`
`1.2 Need for Accuracy Analysis and Fusion
`In order to design an augmented reality system that meets
`the registration requirements for a given application, we
`would like to be able to estimate the registration accuracy
`for a given sensor configuration. For example, we would
`like to estimate the probability distribution of the 3D error
`distance between a generated virtual point and a corre-
`sponding real object point. Another measure of interest is
the overlay error; that is, the 2D distance between the
projected virtual point and the projected real point on the
`HMD image plane, which is similar to the image alignment
`error metrics that appear in other work [7], [9], [17].
`Another reason to have an analytical representation of
`uncertainty is for fusing data from multiple sensors. For
`example, data from head-mounted and fixed sensors might
`be combined to derive a more accurate estimate of object-to-
`HMD pose. The uncertainties of these two sensors might be
`complementary so that, by combining them, we can derive a
`pose that is much more accurate than that from each sensor
`used alone. In order to do this, a mathematical analysis is
`required of uncertainties associated with the measurements
`and derived poses. Effectively, we can create a hybrid
`system that combines the “inside-out” and “outside-in”
`approaches.
`
`1.3 Relationship to Past Work and Specific
`Contributions
`Augmented reality is a relatively new field, but the problem
`of registration has received ample attention, with a number
`of authors taking an optical approach. Some researchers
`have used photocells or photo-effect sensors which track
`light-emitting diodes (LEDs) placed on the head, object of
`interest, or both [7], [14], [15]. Other researchers have used
`cameras and computer vision techniques to detect LEDs or
`passive fiducial markings [5], [8], [18], [19], [20], [21]. The
`resulting detected features, however they are obtained, are
`used to determine the relative pose of the object to the
`HMD. A number of researchers have evaluated their
`registration accuracy experimentally [17], [7], with Monte-
`Carlo simulations [19], or both [18]. However, no one has
`studied the effect of sensor-to-target configuration on
`registration accuracy.
`In this paper, we develop an
`analytical model to show how sensor errors propagate
`through to registration errors, given a statistical distribution
`of the sensor errors and the sensor-to-target configuration.
`Some researchers avoid the problem of determining pose
`altogether and instead concentrate on aligning the 2D image
`points using affine projections [22], [23]. Although this
`approach works well for video-based augmented reality
`systems, in optical see-through HMD systems, it would not
`work as well because the image as seen by the head-
`mounted camera may be different than the image seen by
`the user directly through the optical combiner.
`A number of researchers have developed error models
`for HMD-based augmented reality systems. Some research-
`ers have looked at the optical characteristics of HMDs in
`order to calculate viewing transformations and calibration
`techniques [24], [25]. Holloway [17] analyzed the causes of
`registration error in a see-through HMD system, due to the
`effects of misalignment, delay, and tracker error. However,
`he did not analyze the causes of tracker error, merely its
`effect on the overall registration accuracy. This work, on the
`other hand, focuses specifically on the tracker error and
`does not look at the errors in other parts of the system, or
`attempt to derive an overall end-to-end error model.
`In the computer vision field, the problem of determining
`the position and orientation from a set of given point or line
`correspondences has been well-studied. Some researchers
`have developed analytical expressions for the uncertainty of
a 3D feature position as derived from image data [26]. Other
`researchers have evaluated the accuracy of pose estimation
`algorithms using Monte Carlo simulations [27], [28], [29],
`[30]. Few researchers have addressed the issue of error
`propagation in pose estimation. We follow the method
`suggested by Haralick and Shapiro [16], who outline how to
`derive the uncertainty of an estimated quantity (such as a
`pose) from the given uncertainties in the measured data.
`Kalman filtering [31] is a standard technique for optimal
`estimation. It has been used to estimate head pose in
`augmented and virtual reality applications [7], [32], [33].
`From a sequence of sensor measurements, these techniques
`also estimate the uncertainty of the head pose. This is
`similar to the work described in this paper in the sense that
`a Kalman filter can be interpreted as a method for obtaining
`a maximum likelihood estimate of the state in a dynamic
`system, given input-output data [34]. Our system is static
`and so we do not have a model of the state dynamics. We
`fuse data from two measurements, rather than data from a
`measurement and a prediction from past data.
`In this work, a method is developed to explicitly
`compute uncertainties of pose estimates, propagate these
`uncertainties from one coordinate system to another, and
`fuse pose estimates from multiple sensors. The contribution
`of this work is the application of this method to the
`registration problem in augmented reality. Specifically:
`
`.
`
`The method shows how to estimate the uncertainty
`of object-to-HMD pose from the geometric config-
`uration of the optical sensors and the pose estima-
`tion algorithms used. To help illustrate the method,
`we describe its application to a specific augmented
`reality system.
`. We show how data from multiple different sensors
`can be fused, taking into account the uncertainties
`associated with each, to yield an improved object-to-
`HMD pose. In particular, it is shown that a hybrid
`sensing system combining both head-mounted and
`fixed sensors can improve registration accuracy over
`that from either sensor used alone.
`. We demonstrate mathematically some insights re-
`garding the characteristics of registration sensors. In
`particular, we show that the directions of greatest
`uncertainty for a head-mounted and fixed sensor are
`nearly orthogonal and that these can be fused in a
`simple way to improve the overall accuracy.
`The remainder of this paper is organized as follows:
`Section 2 provides a background on pose estimation, with a
`description of the terminology used in the paper. Section 3
`develops the method for estimating the uncertainty of a
`pose, transforming it from one coordinate frame to another,
`and fusing two pose estimates. Section 4 describes the
`particular experimental augmented reality system that was
`used to test the registration method—that of a surgical aid.
`Section 5 illustrates the application of the method to the
`surgical aid system. A typical configuration is analyzed and
`the predicted accuracy of the combined (hybrid) pose
`estimate is found to be much improved over that obtained
`by either sensor alone. Finally, Section 6 provides a
discussion.

2 BACKGROUND ON POSE ESTIMATION
2.1 Representation of Pose

The pose of a rigid body {A} with respect to another
coordinate system {B} can be represented by a six-element
vector ${}^B x_A = ({}^B x_{Aorg}, {}^B y_{Aorg}, {}^B z_{Aorg}, \alpha, \beta, \gamma)^T$, where
${}^B p_{Aorg} = ({}^B x_{Aorg}, {}^B y_{Aorg}, {}^B z_{Aorg})^T$ is the origin of frame {A}
in frame {B}, and $(\alpha, \beta, \gamma)$ are the angles of rotation of {A}
about the (z, y, x) axes of {B}. An alternative representation
of orientation is to use three elements of a quaternion; the
conversion between Euler angles and quaternions is
straightforward [35].
Equivalently, pose can be represented by a $4 \times 4$
homogeneous transformation matrix [35]:

$$ {}^B_A H = \begin{pmatrix} {}^B_A R & {}^B p_{Aorg} \\ 0 & 1 \end{pmatrix}, \qquad (1) $$

where ${}^B_A R$ is the $3 \times 3$ rotation matrix corresponding to the
angles $(\alpha, \beta, \gamma)$. In this paper, we shall use the letter x to
designate a six-element pose vector and the letter H to
designate the equivalent $4 \times 4$ homogeneous transformation
matrix.
Homogeneous transformations are a convenient and
elegant representation. Given a homogeneous point
${}^A p = ({}^A x_P, {}^A y_P, {}^A z_P, 1)^T$, represented in coordinate system
{A}, it may be transformed to coordinate system {B} with a
simple matrix multiplication ${}^B p = {}^B_A H \, {}^A p$. The homogeneous
matrix representing the pose of frame {B} with respect to
frame {A} is just the inverse of the pose of {A} with respect
to {B}, i.e., ${}^A_B H = ({}^B_A H)^{-1}$. Finally, if we know the pose of
{A} with respect to {B} and the pose of {B} with respect to
{C}, then the pose of {A} with respect to {C} is easily given
by the matrix multiplication ${}^C_A H = {}^C_B H \, {}^B_A H$.
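For concreteness, these operations can be written out in a few lines of code. The following Python sketch is our illustration, not part of the original paper; the helper names pose_to_matrix and invert_pose are ours, and we assume the rotation convention R = Rz(alpha) Ry(beta) Rx(gamma), one reading of the (z, y, x) convention above.

```python
import numpy as np

def pose_to_matrix(x):
    """Build a 4x4 homogeneous matrix from a six-element pose vector
    (tx, ty, tz, alpha, beta, gamma). Assumed convention (ours):
    R = Rz(alpha) @ Ry(beta) @ Rx(gamma)."""
    tx, ty, tz, a, b, g = x
    ca, sa = np.cos(a), np.sin(a)
    cb, sb = np.cos(b), np.sin(b)
    cg, sg = np.cos(g), np.sin(g)
    Rz = np.array([[ca, -sa, 0], [sa, ca, 0], [0, 0, 1]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rx = np.array([[1, 0, 0], [0, cg, -sg], [0, sg, cg]])
    H = np.eye(4)
    H[:3, :3] = Rz @ Ry @ Rx     # rotation block
    H[:3, 3] = (tx, ty, tz)      # translation block (origin of {A} in {B})
    return H

def invert_pose(H):
    """Inverse of a rigid transform: [R t; 0 1]^-1 = [R^T -R^T t; 0 1]."""
    Hinv = np.eye(4)
    R, t = H[:3, :3], H[:3, 3]
    Hinv[:3, :3], Hinv[:3, 3] = R.T, -R.T @ t
    return Hinv

# Chaining: if B_A_H is the pose of {A} in {B} and C_B_H the pose of {B}
# in {C}, then C_A_H = C_B_H @ B_A_H, and a homogeneous point transforms
# by a single matrix multiplication:
B_A_H = pose_to_matrix([0.1, 0.0, 0.5, 0.2, 0.0, 0.0])
C_B_H = pose_to_matrix([0.0, 0.3, 0.0, 0.0, 0.1, 0.0])
C_A_H = C_B_H @ B_A_H
p_A = np.array([1.0, 2.0, 3.0, 1.0])  # homogeneous point in frame {A}
p_C = C_A_H @ p_A                     # the same point in frame {C}
```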
`
`2.2 Pose Estimation Algorithms
`The 2D-to-3D pose estimation problem is to determine the
`pose of a rigid body, given an image from a single camera
`(this is also called the “exterior orientation” problem in
`photogrammetry). Specifically, we are given a set of 3D
`known points on the object (in the coordinate frame of the
`object) and the corresponding set of 2D measured image
`points from the camera, which are the perspective projec-
`tions of the 3D points. The internal parameters of the
`camera (focal length, principal point, etc.) are known. The
`goal is to find the pose of the object with respect to the
camera, ${}^{cam}_{obj}x$. There are many solutions to the problem; in
`this work, we used the algorithm described by Haralick and
`Shapiro [16], which uses an iterative nonlinear least squares
`method. The algorithm effectively minimizes the squared
`error between the measured 2D point locations and the
`predicted 2D point locations.
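To make the iterative scheme concrete, here is a minimal Gauss-Newton sketch for the 2D-to-3D problem. It is our illustration of the general technique, not the specific implementation of [16]: it assumes a unit-focal-length camera with principal point at the origin, reuses pose_to_matrix from the earlier sketch, and requires a reasonable initial estimate x0.

```python
import numpy as np

def project(x, q):
    """Perspective projection of object points q (n x 3) under pose x,
    assuming unit focal length and principal point at the origin."""
    H = pose_to_matrix(x)                    # from the earlier sketch
    pc = q @ H[:3, :3].T + H[:3, 3]          # points in the camera frame
    return pc[:, :2] / pc[:, 2:3]            # (u, v) = (X/Z, Y/Z)

def estimate_pose_2d_3d(q, p_meas, x0, iters=20, eps=1e-6):
    """Gauss-Newton minimization of the squared 2D reprojection error."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        r = (p_meas - project(x, q)).ravel()   # stacked residual vector
        M = np.zeros((r.size, 6))              # numerical Jacobian of g
        for j in range(6):
            d = np.zeros(6)
            d[j] = eps
            M[:, j] = (project(x + d, q) - project(x - d, q)).ravel() / (2 * eps)
        x += np.linalg.lstsq(M, r, rcond=None)[0]   # least squares update
    return x, M    # M is the stacked Jacobian reused in Section 3.1
```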
`The 3D-to-3D pose estimation problem is to determine
`the pose of a rigid body, given a set of 3D point
`measurements1 (this is also called the “absolute orientation”
`problem in photogrammetry). Specifically, we are given a
set of 3D known points on the object $\{{}^{obj}p_i\}$ and the
corresponding set of 3D measured points from the sensor
$\{{}^{sen}p_i\}$. The goal is to find the pose of the object with
respect to the sensor, ${}^{sen}_{obj}x$. There are many solutions to the
problem; in this work, we used the solution by Horn [36],
which uses a quaternion-based method.2 The algorithm
effectively minimizes the squared error between the measured
3D point locations and the predicted 3D point locations.

1. These 3D point measurements may have been obtained from a
previous triangulation process using a stereo vision sensor.
2. This is the algorithm used in the Northern Digital Optotrak sensor,
described in Section 4.
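For reference, a compact solution to the absolute orientation problem can be written with an SVD (the Kabsch construction), which minimizes the same squared error and yields the same optimal rotation as Horn's quaternion method for this problem. The sketch below is ours, not the Optotrak implementation:

```python
import numpy as np

def absolute_orientation(q_obj, p_sen):
    """Find R, t minimizing sum_i ||p_sen_i - (R q_obj_i + t)||^2,
    where q_obj and p_sen are corresponding n x 3 point sets."""
    cq, cp = q_obj.mean(axis=0), p_sen.mean(axis=0)   # centroids
    W = (q_obj - cq).T @ (p_sen - cp)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(W)
    # Guard against a reflection (det = -1) solution:
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cp - R @ cq
    return R, t
```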
`
`3 DETERMINATION AND MANIPULATION OF POSE
`UNCERTAINTY
`
`Given that we have estimated the pose of an object using
`one of the methods above, what is the uncertainty of the
pose estimate? We can represent the uncertainty of a six-
element pose vector x by a $6 \times 6$ covariance matrix
$C_x = E(\Delta x \, \Delta x^T)$, which is the expectation of the square of the
difference between the estimate and the true vector.
`This section describes methods to estimate the covar-
`iance matrix of a pose, given the estimated uncertainties in
`the measurements, transform the covariance matrix from
`one coordinate frame to another, and combine two pose
`estimates.
`
`3.1 Computation of Covariance
`Assume that we have n measured data points from the
`sensor {pi} and the corresponding points on the object
`{qi}. The object points qi are 3D; the data points pi are
`either 3D (in the case of 3D-to-3D pose estimation) or 2D
`(in the case of 2D-to-3D pose estimation). We assume that
`the noise in each measured data point is independent and
`that the noise distribution of each point is given by a
`covariance matrix Cp.
`Let pi = g(qi, x) be the function which transforms object
`points into measured data points for a hypothesized pose x.
`In the case of 3D-to-3D pose estimation, this is just a
`multiplication of qi by the corresponding homogeneous
transformation matrix. In the case of 2D-to-3D pose
estimation, the function is composed of a transformation
followed by a perspective projection. The pose estimation
algorithms described above solve for xest by minimizing the
sum of the squared errors. Assume that we have solved for
`xest using the appropriate algorithm (i.e., 2D-to-3D or 3D-to-
`3D). We then linearize the equation about the estimated
`solution xest:
`
$$ p_i + \Delta p_i = g(q_i, x_{est} + \Delta x) \approx g(q_i, x_{est}) + \left[ \frac{\partial g}{\partial x} \right]^T_{q_i, x_{est}} \Delta x. \qquad (2) $$

Since $p_i \approx g(q_i, x_{est})$, the equation reduces to

$$ \Delta p_i = \left[ \frac{\partial g}{\partial x} \right]^T_{q_i, x_{est}} \Delta x = M_i \Delta x, \qquad (3) $$

where $M_i$ is the Jacobian of g, evaluated at $(q_i, x_{est})$.
Combining all the measurement equations:

$$ \begin{pmatrix} \Delta p_1 \\ \vdots \\ \Delta p_n \end{pmatrix} = \begin{pmatrix} M_1 \\ \vdots \\ M_n \end{pmatrix} \Delta x \;\Rightarrow\; \Delta P = M \Delta x. \qquad (4) $$

Solving for $\Delta x$ in a least squares sense, we get
$\Delta x = (M^T M)^{-1} M^T \Delta P$. The covariance matrix of x is given
by the expectation of the outer product:

$$ \begin{aligned} C_x = E(\Delta x \, \Delta x^T) &= E\left[ (M^T M)^{-1} M^T \Delta P \, \Delta P^T \left( (M^T M)^{-1} M^T \right)^T \right] \\ &= (M^T M)^{-1} M^T E(\Delta P \, \Delta P^T) \left( (M^T M)^{-1} M^T \right)^T \\ &= (M^T M)^{-1} M^T \begin{pmatrix} C_p & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & C_p \end{pmatrix} \left( (M^T M)^{-1} M^T \right)^T. \end{aligned} \qquad (5) $$
`
`Note that we have assumed that the errors in the data
points are independent, i.e., $E(\Delta p_i \Delta p_j^T) = 0$ for $i \neq j$. If the
`errors in different data points are actually correlated, our
`simplified assumption could result in an underestimate of
`the actual covariance matrix. Also, the above analysis was
`derived assuming that the noise is small. However, we
`computed the covariance matrices for the configuration
described in Section 4, using both (5) and a Monte
`Carlo simulation, and found (5) is fairly accurate even for
`noise levels much larger than in our application. For
`example, using input noise with variance 225 mm2
`(compared to the actual 0.0225 mm2 in our application)
`the largest deviation between the variances of the transla-
`tional dimensions was 5.5 mm2 (out of 83 mm2).
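Both (5) and the Monte Carlo check reduce to a few lines of linear algebra. In the sketch below, the stacked Jacobian M and the noise level are arbitrary illustrative values (ours, not the paper's experimental ones); note that when $C_p = \sigma^2 I$, (5) collapses to $C_x = \sigma^2 (M^T M)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)

n, sigma = 10, 0.15                       # illustrative values only
M = rng.standard_normal((3 * n, 6))       # stacked Jacobian (3D-to-3D case)
Cp = sigma**2 * np.eye(3)                 # per-point measurement covariance

# Equation (5): Cx = (M^T M)^-1 M^T blkdiag(Cp,...,Cp) ((M^T M)^-1 M^T)^T
A = np.linalg.solve(M.T @ M, M.T)         # least squares operator
CP = np.kron(np.eye(n), Cp)               # block-diagonal covariance of dP
Cx = A @ CP @ A.T

# Monte Carlo check: sample dP, map it through the estimator, compare.
dP = rng.multivariate_normal(np.zeros(3 * n), CP, size=20000)
Cx_mc = np.cov((dP @ A.T).T)
print(np.abs(Cx - Cx_mc).max())           # small relative to the entries of Cx
```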
`
`3.2 Transformation of Covariance
`We can transform a covariance matrix from one coordinate
`frame to another. Assume that we have a six-element pose
`vector x and its associated covariance matrix Cx. Assume
`that we apply a transformation, represented by a six-
`element vector w, to x to create a new pose y. Denote y =
g(x, w). A Taylor series expansion yields $\Delta y = J \Delta x$, where
$J = (\partial g / \partial x)$. The covariance matrix $C_y$ is found by:

$$ C_y = E(\Delta y \, \Delta y^T) = E\left[ (J \Delta x)(J \Delta x)^T \right] = J \, E(\Delta x \, \Delta x^T) \, J^T = J C_x J^T. \qquad (6) $$

A variation on this method is to assume that the
transformation w also has an associated covariance matrix
$C_w$. In this case, the covariance matrix $C_y$ is:

$$ C_y = J_x C_x J_x^T + J_w C_w J_w^T, \qquad (7) $$

where $J_x = (\partial g / \partial x)$ and $J_w = (\partial g / \partial w)$. The above analysis
was verified with Monte Carlo simulations, using both the
3D-to-3D algorithm and the 2D-to-3D algorithm.
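Applied numerically, (6) and (7) only require the Jacobians of the composition function, which can be approximated by finite differences. The helper names below are ours; g(x, w) stands for any differentiable function composing a pose x with a transformation w:

```python
import numpy as np

def numerical_jacobian(f, v, eps=1e-6):
    """Central-difference Jacobian of f at v."""
    v = np.asarray(v, dtype=float)
    f0 = np.asarray(f(v))
    J = np.zeros((f0.size, v.size))
    for j in range(v.size):
        d = np.zeros_like(v)
        d[j] = eps
        J[:, j] = (np.asarray(f(v + d)) - np.asarray(f(v - d))) / (2 * eps)
    return J

def transform_covariance(g, x, w, Cx, Cw=None):
    """Equation (6) when Cw is None; equation (7) otherwise."""
    Jx = numerical_jacobian(lambda xv: g(xv, w), x)
    Cy = Jx @ Cx @ Jx.T
    if Cw is not None:
        Jw = numerical_jacobian(lambda wv: g(x, wv), w)
        Cy = Cy + Jw @ Cw @ Jw.T
    return Cy
```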
`
`3.3 Interpretation of Covariance
A useful interpretation of the covariance matrix is
obtained by assuming that the errors are jointly Gaussian.
The joint probability density for the n-dimensional error
vector $\Delta x$ is [37]:
`
$$ p(\Delta x) = \left( (2\pi)^{N/2} |C_x|^{1/2} \right)^{-1} \exp\left( -\tfrac{1}{2} \Delta x^T C_x^{-1} \Delta x \right). \qquad (8) $$
`
`Fig. 1. A visualization of the uncertainty of the pose of a coordinate
`frame. The ellipsoid, shown centered at the origin of the coordinate
`frame, represents the uncertainty in the translational component of the
`pose. The rotational uncertainty is depicted as elongated cones about
`each axis.
`
`
If we look at surfaces of constant probability, the
argument of the exponent is a constant, given by the
relation $\Delta x^T C_x^{-1} \Delta x = z^2$. This is the equation of an
ellipsoid in n dimensions. For a given value of z, the
cumulative probability of an error vector being inside the
ellipsoid is P. For n = 3 dimensions, the ellipsoid defined by
z = 3 corresponds to a cumulative probability P of
approximately 97 percent.3
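This figure is easy to verify, since for jointly Gaussian errors the quadratic form $\Delta x^T C_x^{-1} \Delta x$ follows a chi-square distribution with n degrees of freedom (a one-line check, assuming SciPy is available):

```python
from scipy.stats import chi2

# P(error lies inside the z = 3 ellipsoid) for n = 3 dimensions:
print(chi2.cdf(3**2, df=3))   # ~0.9707
```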
For a six-dimensional pose x, the covariance matrix $C_x$ is
$6 \times 6$ and the corresponding ellipsoid is six-dimensional
(which is difficult to visualize). However, we can select only
the 3D translational component of the pose and look at the
covariance matrix corresponding to it. Specifically, let
$z = (x, y, z)^T$ be the translational portion of the pose vector
$x = (x, y, z, \alpha, \beta, \gamma)^T$. We obtain z from x using the
equation $z = M x$, where M is the matrix

$$ M = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{pmatrix}. \qquad (9) $$

The covariance matrix for z is given by $C_z = M C_x M^T$
(which is just the upper left $3 \times 3$ submatrix of $C_x$). We
can then visualize the uncertainty in position using the
three-dimensional ellipsoid corresponding to the set
$\{ z \mid (z - \bar{z})^T C_z^{-1} (z - \bar{z}) \leq 9 \}$.
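Rendering this ellipsoid reduces to an eigendecomposition of $C_z$: the eigenvectors give the principal directions, and z times the square root of each eigenvalue gives the corresponding semi-axis length. A minimal sketch (the function name is ours):

```python
import numpy as np

def position_error_ellipsoid(Cx, z=3.0):
    """Principal axes of {dz | dz^T Cz^-1 dz <= z^2}, where Cz is the
    upper-left 3x3 block of the 6x6 pose covariance (Cz = M Cx M^T)."""
    Cz = Cx[:3, :3]
    evals, evecs = np.linalg.eigh(Cz)    # Cz = V diag(evals) V^T
    semi_axes = z * np.sqrt(evals)       # semi-axis lengths
    return semi_axes, evecs              # columns of evecs are directions
```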
We can visualize the uncertainty in the rotational
component of the pose by finding the uncertainties in the
directions of the x, y, z axes of the coordinate frame
relative to the world frame. The orientation of a particular
axis a of the coordinate frame is found using
$a = R(\alpha, \beta, \gamma) e$, where $R(\alpha, \beta, \gamma)$ is the rotation matrix of
the coordinate frame in the world and e is the relevant
unit vector in the world frame. Using the results of the
previous section, the covariance of a is given by

$$ C_a = \frac{\partial}{\partial \hat{e}} [R(\alpha, \beta, \gamma) e] \; C_{\hat{e}} \; \left( \frac{\partial}{\partial \hat{e}} [R(\alpha, \beta, \gamma) e] \right)^T, $$

where $C_{\hat{e}}$ is the $3 \times 3$ lower right submatrix of $C_x$
corresponding to the angular uncertainty and
$\frac{\partial}{\partial \hat{e}} [R(\alpha, \beta, \gamma) e]$ is the Jacobian of $R(\alpha, \beta, \gamma) e$ with
respect to $\hat{e} = (\alpha, \beta, \gamma)$. $C_a$ is of rank 2 and the
ellipsoid associated with it will be "flat" in the direction
perpendicular to a. For visualization, these ellipsoids define
the bases of cones drawn about each axis and show how the
ends of the axis would move given the variation in the Euler
angles.
To illustrate these concepts, a simulation of a pose
estimation process was performed. A simulated target
pattern was created, attached to a coordinate frame A.
The pose of coordinate frame A with respect to a sensor S,
${}^S_A H$, was estimated using a 3D-to-3D algorithm. The
covariance matrix of the resulting pose, $C_A$, was computed
using (5). Fig. 1 shows a rendering of the ellipsoid
corresponding to the uncertainty of the translational
component of the pose. The ellipsoid is shown centered
at the origin of frame A. The rotational uncertainty is
depicted as elongated cones about each axis. Note that,
although the ellipsoid (representing the translational
uncertainty) is almost spherical, the cones (representing
the rotational uncertainty) are asymmetrical. The uncer-
tainty is greatest for rotations about the long axis of the
target pattern and, so, the cones perpendicular to that
axis are elongated. This is because the shorter dimension
of the target pattern provides less orientation constraint
than the longer dimension.

To illustrate the effect of transformations on covariance
matrices, another simulation of a pose estimation process
was performed. The target pattern used in Fig. 1 was
attached to coordinate frame A and the uncertainty of the
pose of A with respect to sensor S was computed. As shown
in Fig. 2, the translational component of the uncertainty is
represented by a small ellipsoid centered at A and the
rotational component of the uncertainty is represented by
elongated cones about each axis of A. Next, two other
objects with coordinate frames B and C were rigidly
attached to A, at known poses with respect to A. The
poses of B and C with respect to S were derived via
${}^S_B H = {}^S_A H \, {}^A_B H$ and ${}^S_C H = {}^S_A H \, {}^A_C H$, respectively. The covar-
iance matrices of these poses, $C_B$ and $C_C$, were then
estimated using (6). The uncertainties of the translational
components of $C_B$ and $C_C$ are shown by the ellipsoids
centered at B and C, respectively.

Note that the ellipsoids for $C_B$ and $C_C$ are much larger
than the ellipsoid for $C_A$, even though the relative poses of
B and C with respect to A are known exactly. This is due to
the orientation uncertainty in the pose of A with respect to
S, which gives rise to an uncertainty in the location of B and
C. The uncertainty is greatest in the plane perpendicular to
the line to object A—hence, the flattened shapes of the
ellipsoids associated with $C_B$ and $C_C$. Note that the shape of
the flattened ellipsoids corresponds to the shape of the
cones about the axes perpendicular to the flattened parts.

3. The exact formula for the cumulative probability in N dimensions is
$1 - P = \frac{N}{2^{N/2} \, \Gamma(N/2 + 1)} \int_z^{\infty} X^{N-1} e^{-X^2/2} \, dX$ [37].
`
Fig. 2. The orientation uncertainty in the pose of a coordinate frame A
(with respect to sensor S) gives rise to translational uncertainties in the
poses of coordinate frames B and C. Two views are shown, taken from
slightly different viewpoints. Note the highly flattened shapes of the
ellipsoids for frames B and C.

In general, the component of translational uncertainty in
a frame B that is caused by the orientation error in A can be
estimated by $\Delta P = d \, \Delta\theta$, where $\Delta\theta$ is the orientation error
and d is the distance between A and B. Thus, the
uncertainty in the derived location of B grows with the
orientation uncertainty in A and also with the distance
between A and B. If one needs to track an object using a
sensor and attaches a target pattern to the object in order to
do this, then the target pattern should be placed as close as
possible to the object to minimize the derived pose error.
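As a representative numeric illustration (the values are ours, not taken from the paper's experiments): an orientation error of $\Delta\theta = 0.2$ degrees (about 3.5 mrad) in frame A, acting over a lever arm of d = 500 mm, already produces a derived positional error of roughly $500 \times 0.0035 \approx 1.7$ mm at B.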
`Holloway [17] also noted a similar effect in his analysis
`of registration errors in augmented reality systems. He
`found that small errors in measuring the orientation of a
`(fixed) tracking sensor in a world coordinate system could
`lead to large errors in the derived position of a point in the
`scene, due to the large “moment arm” of the tracker-to-
`point distance.
`
`3.4 Combining Pose Estimates
`In this section, we develop a formula to combine two pose
`estimates, weighted by their covariance matrices, following
`the approach outlined by Bevington [38]. Let x1, x2 be two
`n-dimensiona