`
Orientation Tracking for Outdoor Augmented Reality Registration

Suya You and Ulrich Neumann
University of Southern California

Ronald Azuma
HRL Laboratories

A hybrid approach to orientation tracking integrates inertial and vision-based sensing. Analysis and experimental results demonstrate the effectiveness of this approach.

The key technological challenge to creating an augmented reality lies in maintaining accurate registration between real and computer-generated objects. As augmented reality users move their viewpoints, the graphic virtual elements must remain aligned with the observed positions and orientations of real objects. The perceived alignment depends on accurately tracking the viewing pose, relative to either the environment or the annotated object(s).1,2 The tracked viewing pose defines the virtual camera used to project 3D graphics onto the real-world image, so tracking accuracy directly determines the visually perceived accuracy of augmented reality alignment and registration.1
`Several augmented reality track-
`ing technologies have been devel-
`oped for indoor applications, yet
`none migrate easily to outdoor set-
`tings. Indoors, we can often cali-
`brate
`the environment, add
`landmarks, control lighting, and limit the operating
`range to facilitate tracking. To calibrate, control, or mod-
`ify outdoor environments, however, is unrealistic.
`Our work stems from a program focused on developing
`tracking technologies for wide-area augmented realities
`in unprepared outdoor environments. Other participants
`in the Defense Advanced Research Projects Agency
`(Darpa) funded Geospatial Registration of Information
`for Dismounted Soldiers (Grids) program included Uni-
`versity of North Carolina at Chapel Hill and Raytheon.
`We describe a hybrid orientation tracking system
`combining inertial sensors and computer vision. We
`exploit the complementary nature of these two sensing
`technologies to compensate for their respective weak-
`nesses. Our multiple-sensor fusion is novel in aug-
`mented reality tracking systems, and the results
`demonstrate its utility.
`
`
`Background
`A wealth of research, employing a variety of sensing
`technologies, deals with motion tracking and registra-
`tion as required for augmented reality. Each technology
`has unique strengths and weaknesses. Existing systems
`can be grouped into two categories: active target and
`passive target (Table 1). Active-target systems incorpo-
`rate powered signal emitters, sensors, and/or landmarks
`(fiducials) placed in a prepared and calibrated envi-
`ronment. Demonstrated active-target systems use mag-
`netic, optical, radio, and acoustic signals.3 Passive-target
`systems are completely self-contained, sensing ambient
`or naturally occurring signals or physical phenomena.
`Examples include compasses sensing the Earth’s mag-
`netic field, inertial sensors measuring linear accelera-
`tion and angular motion, and vision systems sensing
`natural scene features.
`Vision is commonly used for augmented reality track-
`ing.1,2 Unlike other active and passive technologies,
`vision methods can estimate a camera pose directly from
`the same imagery the user observes. The pose estimate
`often relates to the object(s) of interest, not a sensor or
`emitter attached to the environment. This has several
`advantages:
`
- tracking may occur relative to moving objects,
- tracking measurements made from the viewing position often minimize the visual alignment error, and
- tracking accuracy varies in proportion to the visual size (or range) of the object(s) in the image.
`
`The ability to track pose and measure residual errors is
`unique to vision. However, vision suffers from a notori-
`ous lack of robustness and high computational expense.
`Combining vision with other technologies offers the
`prospect of overcoming these problems.
`All tracking sensors have limitations. The signal-sens-
`ing range as well as man-made and natural sources of
`interference limit active-target systems. Passive-target
`systems are also subject to signal degradation. For exam-
ple, poor lighting degrades vision, and proximity to fer-
`rous material distorts compass measurements. Inertial
`sensors measure acceleration or angular rates, so their
`signals must be integrated to produce position or ori-
`entation. Noise, calibration error, and gravity accelera-
`tion impart errors on these signals, producing
`accumulated position and orientation drift. Obtaining
`position from double integration of linear acceleration
`means the accumulation of position drift grows as the
`square of elapsed time. Getting orientation from a sin-
`gle integration of angular rate accumulates drift linear-
`ly with time.
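To see why the two growth rates differ, the toy integration below (purely illustrative; the bias magnitudes and duration are assumed, not measured values from any system discussed here) accumulates a constant rate-gyro bias once and a constant accelerometer bias twice.

dt = 0.001                      # 1-kHz sampling (assumed)
gyro_bias = 0.01                # deg/s of uncorrected rate bias (assumed)
accel_bias = 0.02               # m/s^2 of uncorrected acceleration bias (assumed)

angle_drift = 0.0               # degrees
velocity_err = 0.0              # m/s
position_drift = 0.0            # meters

for step in range(int(60.0 / dt)):           # one minute of operation
    angle_drift += gyro_bias * dt            # single integration: grows linearly
    velocity_err += accel_bias * dt          # first integration of acceleration
    position_drift += velocity_err * dt      # second integration: grows quadratically

print(f"orientation drift after 60 s: {angle_drift:.2f} deg")
print(f"position drift after 60 s: {position_drift:.1f} m")

After a minute the singly integrated orientation error is still modest, while the doubly integrated position error has grown to tens of meters, which is why inertial-only position estimates degrade so much faster than inertial-only orientation estimates.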
`Hybrid systems attempt to compensate for the short-
`comings of a single technology by using multiple sensor
`types to produce robust results. For example, State
`et al.4 combined active-target magnetic and vision sens-
`ing. Azuma and Bishop5 developed a hybrid of inertial
`sensors and active-target vision to create an indoor aug-
`mented reality system. Passive-target vision and iner-
`tial sensors create a hybrid tracker for mobile robotic
`navigation and range estimation.10,11 Table 1 presents
`these and other examples. A more complete overview
`of tracking technologies can be found elsewhere.1
`
`Approach
`Our approach combines prior work in natural feature
`tracking8,12 with inertial and compass sensors7 to pro-
`duce a hybrid orientation tracking system. By exploit-
`ing the complementary nature of these sensors, the
`hybrid system achieves performance that exceeds any
`of the components.9 Our approach rests on two basic
`tenets:
`
`1. Inertial gyroscope data can increase the robustness
`and computing efficiency of a vision system by pro-
`viding a relative frame-to-frame estimate of camera
`orientation.
`2. A vision system can correct for the accumulated drift
`of an inertial system.
`
`Here we consider the case when the scene range is
`many multiples of the camera focal length. Under this
`condition, the perceived motion of scene features is
`more sensitive to camera rotation than camera transla-
`tion. The vision system tracks 2D image motions. Since
`these largely result from rotations, the gyroscope sen-
`sors provide a good estimate of these motions. Vision
`tracking, in turn, corrects the error and drift of the iner-
`tial estimates.
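A rough back-of-the-envelope comparison, sketched below with assumed but representative numbers (not measurements from our system), shows how strongly rotation dominates the image motion at outdoor ranges.

import math

f = 614.0             # focal length in pixels (representative value, assumed)
scene_range = 100.0   # meters to the scene (assumed)

omega = math.radians(10.0)   # 10 deg/s rotation rate (assumed)
v = 1.5                      # 1.5 m/s translation, roughly walking speed (assumed)

# Near the image center, rotation about an axis perpendicular to the view
# moves features by about f * omega pixels per second, while translation
# parallel to the image plane moves them by about f * v / range.
flow_rotation = f * omega
flow_translation = f * v / scene_range

print(f"rotational image motion:    {flow_rotation:6.1f} pixels/s")
print(f"translational image motion: {flow_translation:6.1f} pixels/s")

With these numbers the rotational flow is roughly an order of magnitude larger than the translational flow, so treating the tracked 2D image motion as rotation-induced is a reasonable approximation.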
`
`System overview
`Figure 1 shows the system hardware configuration:
`
- A compass and tilt sensor module (Precision Navigation TCM2) provides the user's heading and two tilt angles in the local motion frame. The module is specified to achieve approximately ±0.5 degree of error in yaw, at a 16-Hz update rate.
- Three gyroscopes (Systron Donner GyroChip II QRS14-500-103) in an orthogonal configuration sense angular rates of rotation along three perpendicular axes. The maximum sense range is ±500 degrees per second, sampled at 1 kHz.
- A video camera (Sony XC-999 CCD color camera) provides visual streams for a vision-based tracker and augmented reality display.

1 The system configuration consists of a compass and tilt sensor module, three gyroscopes, and a video camera. (The pictured hardware comprises the Sony XC-999 video camera, the TCM2 orientation sensor, three GyroChip II rate gyros feeding notch filters and a 16-bit A/D converter, a 200-MHz PC, a V-Cap optical see-through HMD, and a video recorder.)

Table 1. Examples of hybrid tracking approaches.

Approaches        Examples
Active-Active     Magnetic-vision4
Active-Passive    Vision-inertial,5 acoustic-inertial6
Passive-Passive   Compass-inertial,7 vision-inertial8-11
`
`The system fuses the outputs of these sensors to deter-
`mine a user’s orientation. To predict angular motion,
`the system filters and fuses the compass module and
`gyro sensors.7 From a static location under moderate
`rotation rates, the fusion algorithm achieves about two
`degrees of peak registration error. Typical errors are less
`than one degree while operating in real time.7 For rapid
`motions or long tracking periods, the errors become
`larger due to accumulated gyroscope drift and compass
`errors. These are corrected by the vision measurements.
`Since our vision tracking software doesn’t run in real
`time, our experiments used both the inertial data and
`video images for offline processing and fusion.
`
`Inertial tracking
`The basic principles behind inertial sensors rest on
`Newton’s laws. We use gyroscopes that sense rotation
`rate. This lets us integrate the gyroscope data over time
`so that we can compute relative changes of orientation
within the reference frame. Integrating the signal also integrates its error, giving rise to orientation drift that grows approximately linearly with time.
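A minimal sketch of that integration step appears below; it assumes small inter-sample rotations so the scaled rate samples can simply be accumulated per axis, whereas a production tracker would compose incremental rotations (for example, with quaternions).

import numpy as np

def integrate_gyro(rates, dt=0.001, decimate=33):
    """Integrate body angular rates (deg/s, shape N x 3) sampled at 1/dt Hz.

    Returns orientation offsets (degrees, relative to the starting pose),
    decimated to roughly the 30-Hz video rate. Small-angle accumulation is
    assumed; it is only an illustration of the integration principle.
    """
    orientation = np.cumsum(np.asarray(rates) * dt, axis=0)
    return orientation[::decimate]

# Example: a constant 5 deg/s rate about the first axis for two seconds.
rates = np.zeros((2000, 3))
rates[:, 0] = 5.0
print(integrate_gyro(rates)[-1])   # ~[10, 0, 0] degrees after 2 s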
`
`Error sensitivity
`We analyzed our gyroscope system’s error sensitivi-
`ty. We sampled the angular rate at 1 kHz and output the
`integrated orientation at 30 Hz to match the imaging
frame rate. Integrating the angular rates and applying a coordinate transformation produces three orientation mea-
`surements (yaw, pitch, and roll) of the tracker with
`respect to the initial orientation.
`
`
`
`
A vision system can measure the dynamic gyroscope accuracy, so we first determined the relationship between angular rate and image motion. Let (f_x, f_y) be the effective horizontal and vertical focal lengths of a video camera (in pixels), (L_x, L_y) represent the horizontal and vertical image resolutions, and (θ_x, θ_y) be the field of view (FOV) of the camera, respectively. If we approximate pixels as sampling the rotation angles (yaw and pitch) uniformly, the ratio of image pixel motion to the rotation angles (pixels per degree) is

    L_x / θ_x = L_x / (2 arctan(L_x / 2f_x))
    L_y / θ_y = L_y / (2 arctan(L_y / 2f_y))        (1)

As a concrete example of this relationship, consider the Sony XC-999 video camera with an F 1:1.4, 6-mm lens. Through calibration, we determined the effective horizontal and vertical focal lengths as f_x = 614.059 pixels and f_y = 608.094 pixels, with a 640 × 480 image resolution. The ratios are L_x/θ_x = 11.625 pixels per degree and L_y/θ_y = 11.143 pixels per degree. That is, each degree of orientation-angle error results in about 11 pixels of alignment error in the image plane. Increasing the camera's FOV with a wide-angle lens reduces the pixel error proportionately; however, wide-angle lenses produce significant radial distortions that contribute error.8
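Equation 1 is easy to check numerically; the snippet below (the helper function is ours, written only for illustration) reproduces the ratios quoted above from the calibrated focal lengths and image resolution.

import math

def pixels_per_degree(resolution, focal_length_px):
    """Pixels of image motion per degree of rotation (Equation 1)."""
    fov_deg = math.degrees(2.0 * math.atan(resolution / (2.0 * focal_length_px)))
    return resolution / fov_deg

print(pixels_per_degree(640, 614.059))   # ~11.6 pixels per degree
print(pixels_per_degree(480, 608.094))   # ~11.1 pixels per degree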
`Figure 2 illustrates the dynamic gyroscope accuracy
`we measured experimentally. The 3DOF gyro sensor is
`rigidly attached to the video camera and continually
`reports the camera orientation. Rather than attempting
`to measure the ground-truth absolute orientation of the
`sensors, we track visual feature motions to evaluate the
`gyroscope’s accuracy. We manually select image fea-
`tures (~ 5) while the camera and gyroscope are at rest.
`Then during motion we track these features by our
`vision method and compare their observed positions to
`their projected positions derived from the 3D orienta-
`tion changes that the gyroscopes report. Pixel distances
`are proportional to the errors accu-
`mulated by the inertial system (as
`described in Equation 1). Figure 2
`plots the average pixel errors mea-
`sured for the selected features while
`rotating the sensors in an outdoor
`setting. It clearly shows the dynam-
`ic variations between the gyroscope
`data and observed feature motions.
`
`
`Gyroscope stabilization by compass
`We can estimate the head’s angular position and rota-
`tion rate from the outputs of the compass module
`(TCM2) and the three gyroscopes. The system extrapo-
`lates this data one frame into the future to estimate the
`head orientation at the time the image appears on the
`see-through display (Figure 3). Space limitations pro-
`hibit a full explanation of the gyro-compass fusion
`method; please read Azuma et al.7 for the details. This
`section will provide an overview of the fusion method
`and the results.
`Sensor calibration is crucial to system performance.
`The gyroscopes required an estimate of their bias and
analog notch filters to remove high-frequency noise.
`The compass encountered significant distortions from
`our environment and the system equipment. The dis-
`tortions remained relatively constant at a single loca-
`tion over time (30 minutes), so heading (yaw)
`calibration was possible with a special nonmagnetic
`turntable (made of Delrin).
`The fusion method compensates for the difference in
`time delays between the two sensors. The gyroscopes
`are sampled by an analog/digital converter at 1 kHz,
`with minimal latency. The system reads the compass at
`16 Hz through a serial line. We captured several data
`runs and determined the average difference in latencies
`was 92 ms. Therefore, the fusion method incorporates
`compass measurements by comparing them to gyro-
`scope estimates 92-ms old.
`Figure 4 shows the filter’s dynamic behavior. The raw
`compass input (blue line) leads the filter output (red
`line). The filter compensates for the lagging compass
`measurements. The filter output retains the smoothness
`of the gyroscope data and is much smoother than the
`raw compass output. When the user stops moving, the
`filter output settles to the compass value, since it pro-
`vides an absolute heading. Clearly, this absolute head-
`ing accuracy limits the registration accuracy. Visual
`measurements can compensate for compass errors.
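The exact filter is described by Azuma et al.;7 the fragment below is only a generic complementary-filter sketch of the two ideas outlined above, blending a fast gyro integral with the slow absolute compass heading and comparing each compass reading against the gyro estimate from 92 ms earlier to account for the measured latency difference. The class name, gain, and structure are illustrative assumptions, not the published design.

from collections import deque

GYRO_DT = 0.001          # gyros sampled at 1 kHz
COMPASS_LAG_S = 0.092    # measured compass latency relative to the gyros
BLEND = 0.02             # how strongly the compass pulls the estimate (assumed gain)

class HeadingFilter:
    """Generic complementary filter, shown for illustration only."""

    def __init__(self, initial_heading_deg):
        self.heading = initial_heading_deg
        # Keep ~92 ms of past gyro-based headings for latency matching.
        self.history = deque(maxlen=int(COMPASS_LAG_S / GYRO_DT))

    def gyro_update(self, yaw_rate_deg_s):
        self.heading += yaw_rate_deg_s * GYRO_DT   # smooth, but drifts slowly
        self.history.append(self.heading)
        return self.heading

    def compass_update(self, compass_heading_deg):
        # Compare the lagging compass reading against what the gyros said
        # 92 ms ago, then nudge the current estimate by a fraction of the gap.
        delayed = self.history[0] if self.history else self.heading
        self.heading += BLEND * (compass_heading_deg - delayed)
        return self.heading

When the motion stops, the compass term slowly pulls the estimate toward the absolute heading, matching the settling behavior visible in Figure 4.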
`
`100
`
`500
`400
`300
`200
`Number of frames
`
`600
`
`700
`
`70
`
`50
`
`30
`
`10
`-10
`-30
`-50
`-70
`
`0
`
`Error (pixels)
`
`Virtual Reality
`
2 Average pixel differences between tracked features and features projected by gyro measurements.
`
3 Schematic for the gyro-compass fusion. (Blocks: the gyros, compass, and tilt sensor feed an estimation stage, followed by prediction of the future head orientation.)
`
`375
`
`370
`
`365
`
`360
`
`355
`
`Compass
`Filter
`
`350
`29.5
`
`30
`
`31.5
`31
`30.5
`Time (seconds)
`
`32
`
`32.5
`
`Heading (degrees)
`
4 Sequence of heading data as the system pauses.
`
`
`38
`
`November/December 1999
`
`META 1014
`META V. THALES
`
`
`
5 Camera model and related coordinate systems for the hybrid system.

Hybrid inertial-vision tracking

The hybrid tracker fuses gyroscope orientation (3D) and vision-feature motion (2D) to derive a robust orientation measure. We structure the fusion as predictor-corrector image stabilization. First, the system estimates approximate 2D feature motion from the inertial data (prediction). Then the vision feature tracking corrects and refines the estimate in the image domain (2D correction). Finally, the system converts the estimated 2D-motion residual to a 3D-orientation correction for the gyroscope (3D correction). During this process, an added benefit is realized. The inertial estimate increases the vision tracking's efficiency by reducing the image search space and providing tolerance to blur and other image distortions.

Camera model and coordinates

Our system includes a charge-coupled device (CCD) video camera with a rigidly mounted 3DOF inertial sensor. Figure 5 shows the four principal coordinate systems: world, W : (x_w, y_w, z_w); camera-centered, C : (x_c, y_c, z_c); inertial-centered, I : (x_I, y_I, z_I); and 2D image coordinates, U : (x_u, y_u).

A pinhole camera models the imaging process. The origin of C lies at the camera's projection center. The transformation from W to C is

    W → C:  [x_c, y_c, z_c]^T = [R_wc  T_wc] [x_w, y_w, z_w, 1]^T        (2)

where the rotation matrix R_wc and the translation vector T_wc characterize the camera's orientation and position with respect to the world coordinate frame. Under perspective projection, the transformation from W to U is

    W → U:  [x_u, y_u, 1]^T = K [R_wc  T_wc] [x_w, y_w, z_w, 1]^T        (3)

where the matrix K

    K = | f/α_x    0        u_0 |
        | 0        f/α_y    v_0 |        (4)
        | 0        0        1   |

represents the intrinsic parameters of the camera, f is the focal length of the camera, α_x, α_y are the horizontal and vertical pixel sizes on the imaging plane, and (u_0, v_0) is the projection of the camera's center (principal point) on the image plane. (For simplicity we omitted the lens distortion parameters from the equation.)
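For readers who want to trace Equations 2 through 4 numerically, the short sketch below applies them to one point; the intrinsic values are placeholders rather than our calibration results.

import numpy as np

def project(K, R_wc, T_wc, point_w):
    """Project a 3D world point into pixel coordinates (Equations 2-4).

    K is the 3x3 intrinsic matrix, R_wc/T_wc the world-to-camera rotation
    and translation. Lens distortion is omitted, as in the text.
    """
    p_c = R_wc @ point_w + T_wc               # Equation 2: world -> camera
    uvw = K @ p_c                             # Equation 3: camera -> image
    return uvw[:2] / uvw[2]                   # perspective divide

# Placeholder intrinsics (illustrative, not the calibrated XC-999 values).
K = np.array([[614.0,   0.0, 320.0],
              [  0.0, 608.0, 240.0],
              [  0.0,   0.0,   1.0]])
R_wc = np.eye(3)                              # camera aligned with the world
T_wc = np.zeros(3)
print(project(K, R_wc, T_wc, np.array([1.0, 0.5, 10.0])))   # ~[381.4, 270.4]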
The inertial tracker reports camera orientation changes, so the transformation between C and I is needed to relate inertial and camera motion. For rotation R_Ic and translation T_Ic we obtain

    I → C:  [x_c, y_c, z_c]^T = [R_Ic] [x_I, y_I, z_I]^T + [T_Ic]        (5)

Since we only measure 3D-orientation motion, we only need to determine the rotation transformation.

Static calibration

Static calibration requires two steps—estimating intrinsic camera parameters and establishing the transformation between inertial and camera coordinates.

Camera parameters. Camera calibration determines the intrinsic parameters K and the lens distortion parameters. We use the method described elsewhere.8 A planar target with a known grid pattern is imaged at measured offsets along the viewing direction. An iterative least-squares estimation computes the intrinsic parameters and coefficients of the radial lens distortion. For our experiments we assumed these parameters were constant.

Transformation between inertial and camera frames. The transformation between the inertial and the camera coordinate systems relates the measured inertial motion to camera motion and image-feature motion. Measuring this transformation is difficult, especially with optical see-through display systems.1 In this article we describe a motion-based calibration, as opposed to the boresight methods previously presented.5

Equation 5 relates the inertial tracker frame and the camera coordinate frame. The rotation relationship between the two coordinates is

    ω_C = [R_Ic] ω_I        (6)

where ω_C and ω_I denote the angular velocity of scene points relative to the camera coordinate frame and the inertial coordinate frame, respectively.

We obtained the angular motion ω_I relative to the inertial coordinate system from the inertial data. We need to compute the camera's angular velocity in some way, in order to determine the transformation matrix from Equation 6.

General camera motion can be decomposed into a linear translation and an angular motion. Under perspective projection, the 2D image motion resulting from camera motion can be written as

    ẋ_u = (x_u V_Cz - f V_Cx) / z_C + (x_u y_u / f) ω_Cx - f (1 + x_u^2/f^2) ω_Cy + y_u ω_Cz
    ẏ_u = (y_u V_Cz - f V_Cy) / z_C + f (1 + y_u^2/f^2) ω_Cx - (x_u y_u / f) ω_Cy - x_u ω_Cz        (7)

where (ẋ_u, ẏ_u) denotes the image velocity of the point (x_u, y_u) in the image plane, z_C is the range to that point, and f is the focal length of the camera. Eliminating the translation term and substituting from Equation 6, we have

    ẋ_u = L [R_Ic] ω_I        (8)

where

    L = |  x_u y_u / f           -f (1 + x_u^2/f^2)    y_u  |
        |  f (1 + y_u^2/f^2)     -x_u y_u / f         -x_u  |

In words, given knowledge of the internal camera parameters, the inertial tracking data ω_I, and the related 2D motions [ẋ_u, ẏ_u] of a set of image features, the transformation R_Ic between the camera and the inertial coordinate systems can be determined from Equation 8. We can also use this approach to calibrate the translation component between position tracking sensors.
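As a concrete illustration of Equation 8, the helper below (our own, with an assumed focal length) evaluates the rotational flow matrix L for one feature and predicts how far that feature moves for a given inter-frame rotation, which is the same prediction reused later in Equation 9.

import numpy as np

def rotational_flow_matrix(x_u, y_u, f):
    """The 2x3 matrix L of Equation 8, mapping camera angular velocity
    (omega_Cx, omega_Cy, omega_Cz) to image velocity at pixel (x_u, y_u),
    measured relative to the principal point."""
    return np.array([
        [x_u * y_u / f, -f * (1.0 + x_u**2 / f**2),  y_u],
        [f * (1.0 + y_u**2 / f**2), -x_u * y_u / f, -x_u],
    ])

# Assumed numbers: a feature 100 pixels right of center, f = 614 pixels,
# and a 1-degree yaw between frames.
L = rotational_flow_matrix(100.0, 0.0, 614.0)
omega_c = np.radians([0.0, 1.0, 0.0])       # inter-frame rotation (radians)
print(L @ omega_c)   # ~[-11.0, 0.0] pixels: about 11 pixels per degree, consistent with Equation 1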
`
Dynamic registration

The static registration procedure described above establishes a good initial calibration. However, the gyroscope accumulates drift over time and produces errors with motion. The distribution of drift and error is difficult to model for analytic correction. Our strategy for dynamic registration minimizes the tracking error in the perceived image.

Tracking prediction. Suppose the system detects N features in a scene. Our goal is to automatically track these features as the camera moves in the following frames. Let ω_C be the camera rotation from frame I(x, t - 1) to frame I(x, t). For the scene points O_i, their 2D positions in the image frame t - 1 are x_i^{t-1} = [x_i^{t-1}, y_i^{t-1}]^T. The positions of these points in the frame t, due to the related motion (rotation) between the camera and the scene, can be estimated as

    x_i^t = x_i^{t-1} + Δx_i^t,    Δx_i^t = L ω_C        (9)

where L is given by Equation 8.

2D tracking correction. Inertial data predicts the motion of image features. The correction refines these predicted positions by local image searches for the true features. Our robust motion-tracking approach integrates three motion analysis functions (feature selection, tracking, and verification) in a closed-loop cooperative manner to cope with complex imaging conditions.12 First, in the feature selection module, the system selects 0D (points) and 2D (regions) tracking features for their suitability for tracking and motion estimation. The selection process also uses data from a tracking evaluation function that measures the confidence of the prior tracking estimations.

Once selected, the system ranks the features according to their evaluations and feeds them into the tracking module. A differential-based local optical-flow calculation uses normal motions in local neighborhoods to perform a least-squares minimization to find the best affine motion estimate for each region. Unlike traditional single-stage implementations, the approach adopts a multistage robust estimation strategy. For every estimated result, a verification and evaluation metric assesses the estimation's confidence. If the estimation confidence is low, the result is refined iteratively until the estimation error converges. See Neumann and You12 for details.

3D tracking correction. Let ω_I = ω_C + Δω be the orientation from the inertial sensor, in which ω_C is the real camera motion, and Δω is the gyroscope drift that we want to estimate and correct. From Equations 7 and 8, we derive the relationship between the gyro error Δω and the resulting 2D error of image velocity as

    ẋ_I - ẋ_u = L · Δω        (10)

The left-hand side of Equation 10, ẋ_I - ẋ_u, is the image velocity difference between the inertial sensor and the real camera motion (or 2D-motion residual). The problem of 3D correction is reduced to finding the inertial drift Δω that minimizes the motion residual, ||ẋ_I - ẋ_u|| → min. Then the inertial drift to be corrected is

    Δω = L⁻¹ (ẋ_I - ẋ_u)        (11)
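When several features are tracked, Equations 10 and 11 become a small least-squares problem, since each feature contributes two equations in the three unknown drift components. The sketch below illustrates that step generically by stacking per-feature L matrices and residuals and solving with a pseudo-inverse; it is not the exact numerical procedure of our implementation.

import numpy as np

def estimate_gyro_drift(features, residuals, f):
    """Estimate the 3D orientation drift (Equation 11) from 2D motion residuals.

    features  : list of (x_u, y_u) pixel positions relative to the principal point
    residuals : list of (dx, dy) differences between inertially predicted and
                vision-tracked feature motion (the left-hand side of Equation 10)
    f         : focal length in pixels
    """
    rows, rhs = [], []
    for (x_u, y_u), (dx, dy) in zip(features, residuals):
        rows.append([x_u * y_u / f, -f * (1.0 + x_u**2 / f**2),  y_u])
        rows.append([f * (1.0 + y_u**2 / f**2), -x_u * y_u / f, -x_u])
        rhs.extend([dx, dy])
    # Least-squares solution of L * delta_omega = residual over all features.
    delta_omega, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return delta_omega   # radians; subtract from the integrated gyro orientation

# Toy usage with assumed values: three features and their measured residuals.
features = [(100.0, 0.0), (-50.0, 80.0), (0.0, -120.0)]
residuals = [(-1.0, 0.1), (-0.9, -0.2), (-1.1, 0.0)]
print(estimate_gyro_drift(features, residuals, 614.0))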
`
`Results and evaluation
`We experimentally tested our approach. Figure 6a
`shows a sample frame from a 30-Hz video sequence cap-
`tured at an outdoor location with moderate rotation rates.
`In this frame, black dots identify the feature targets that
`we want to track and annotate. The blue labels are posi-
`tioned only by inertial data (fused gyro and compass
`data), while the red labels show the vision-corrected posi-
tions. The resolution of the images is 640 × 480.
`
`˙
`x
`
`u
`
`
`
`[R
`
`Ic
`
`] w
`
`I
`
`(8)
`
`- -غŒŒŒŒŒ øßœœœœœ
`
`)
`
`y
`
`u
`
`2 2
`x f
`
`u
`
`x
`
`u
`
`+
`(1
`
`
`
`f
`
`u
`
`x y
`u
`f
`
`)
`
`2 2
`y f
`
`u
`
`u
`
`x y
`u
`f
`
`+
`
`f
`
`
`
`(1
`
`where
`
`L =
`
`In words, given knowledge of the internal camera
`parameters, the inertial tracking data w I, and the relat-
`ed 2D motions [
`u] of a set of image features, the
`u,
`transformation RIc between the camera and the inertial
`coordinate systems can be determined from Equation
`8. We can also use this approach to calibrate the trans-
`lation component between position tracking sensors.
`
`.x
`
`.y
`
`Dynamic registration
`The static registration procedure described above
`establishes a good initial calibration. However, the gyro-
`scope accumulates drift over time and produces errors
`with motion. The distribution of drift and error is diffi-
`cult to model for analytic correction. Our strategy for
`dynamic registration minimizes the tracking error in the
`perceived image.
`
`Tracking prediction. Suppose the system detects
`N features in a scene. Our goal is to automatically track
`these features as the camera moves in the following
`frames. Let w C be the camera rotation from frame
`I(x, t - 1) to frame I(x, t). For the scene points Oi, their
`2D positions in the image frame t - 1 are xit- 1 = [xit- 1,
`yit- 1]T. The positions of these points in the frame t, due
`
`40
`
`November/December 1999
`
`META 1014
`META V. THALES
`
`D
`-
`-
`-
`-
`-
`-
`-
`
`
`70
`
`50
`
`30
`
`Inertial only
`Hybrid tracker
`
`Error (pixels)
`
`10
`- 10
`- 30
`- 50
`- 70
`0
`
`Phillips tower
`
`Church
`Landon comm. ctr.
`
`Church
`
`Phillips tower
`
`Landon comm. ctr.
`
`100
`
`200
`
`400
`300
`Number of frames
`
`500
`
`600
`
`700
`
`(b)
`(a)
`6 The tracking result of an outdoor natural scene. (a) Virtual labels annotated over landmarks for video sequences showing vision-
`corrected (red labels), and inertial only (blue labels) tracking results. (b) Hybrid alignment errors for Figure 6a showing inertial only
`(blue line) and vision-corrected (red line) errors.
`
`Figure 6b illustrates the average pixel errors for iner-
`tial-only tracking (blue line) and hybrid inertial-vision
`tracking (red line), respectively. To obtain these quanti-
`tative results, we manually select 10 distinct features in
`initial frames to establish visual reference points. The
`selected features are back-projected in each frame based
`on the camera orientation reported by the tracking sys-
`tem. The average differences between the back-project-
`ed image positions and the observed (vision-tracked)
`feature positions are the measure of tracking accuracy
`in each frame. The inertial tracking errors are effective-
`ly corrected, reducing the average registration error over
`the image sequence to 4.27 pixels (corresponding to ~ 0.4
`degree of rotation). These results illustrate the value of
`hybrid tracking.
`To obtain these results, our hybrid system ran at about
`two to four frames per second on an SGI O2. Our cur-
`rent version runs over 10 frames per second on an SGI
Onyx2 and multiprocessor PC. Since the 2D-vision correction operates on each feature, the system speed depends on the number of tracked features.

As mentioned before, we assume that scene objects are distant to minimize the effect of position errors. Although this condition is often met in outdoor applications, orientation tracking is insufficient when tracking and annotation features are close to the tracker. In this case, the translation term can't be ignored in the motion model. Additional data is needed to provide position information. Accelerometers and global positioning system (GPS) sensors are important data sources that we'll investigate in our future work.

Acknowledgments

This work was largely supported by the Darpa Geospatial Registration of Information for Dismounted Soldiers program. We also thank Intel, SGI, and the Integrated Media Systems Center for their support.

References

1. R. Azuma, "A Survey of Augmented Reality," Presence: Teleoperators and Virtual Environments, Vol. 6, No. 4, 1997, pp. 355-385.
2. U. Neumann and A. Majoros, "Cognitive, Performance and Systems Issues for Augmented Reality Applications in Manufacturing and Maintenance," Proc. IEEE Virtual Reality Annual Int'l Symp., IEEE CS Press, Los Alamitos, Calif., 1998, pp. 4-11.
3. K. Meyer, H.L. Applewhite, and F.A. Biocca, "A Survey of Position Trackers," Presence: Teleoperators and Virtual Environments, Vol. 1, No. 2, 1992, pp. 173-200.
4. A. State et al., "Superior Augmented Reality Registration by Integrating Landmark Tracking and Magnetic Tracking," Proc. Siggraph 96, ACM Press, New York, 1996, pp. 429-438.
5. R. Azuma and G. Bishop, "Improving Static and Dynamic Registration in an Optical See-Through HMD," Proc. Siggraph 95, ACM Press, New York, 1995, pp. 197-204.
6. E. Foxlin, M. Harrington, and G. Pfeifer, "Constellation: A Wide-Range Wireless Motion-Tracking System for Augmented Reality and Virtual Set Applications," Proc. Siggraph 98, ACM Press, New York, 1998, pp. 371-378.
7. R. Azuma et al., "A Motion-Stabilized Outdoor Augmented Reality System," Proc. IEEE Virtual Reality Conf. 99, IEEE CS Press, Los Alamitos, Calif., 1999, pp. 252-259.
8. S. You, U. Neumann, and R. Azuma, "Hybrid Inertial and Vision Tracking for Augmented Reality Registration," Proc. IEEE Virtual Reality 99, IEEE CS Press, Los Alamitos, Calif., 1999, pp. 260-267.
9. G. Welch, Hybrid Self-Tracker: An Inertial/Optical Hybrid Three-Dimensional Tracking System, Tech. Report TR95-048, University of North Carolina at Chapel Hill, Dept. of Computer Science, 1995.
10. R.S. Suorsa and B. Sridhar, "A Parallel Implementation of a Multisensor Feature-Based Range-Estimation Model," IEEE Trans. on Robotics and Automation, Vol. 10, No. 6, 1994, pp. 155-168.
11. J. Lobo et al., "Inertial Navigation System for Mobile Land Vehicles," Proc. IEEE Int'l Symp. on Industrial Electronics (ISIE 95), IEEE Press, Piscataway, N.J., 1995, pp. 843-848.
12. U. Neumann and S. You, "Natural Feature Tracking for Augmented Reality," IEEE Trans. on Multimedia, Vol. 1, No. 1, 1999, pp. 53-64.
`IEEE Computer Graphics and Applications
`
`41
`
`META 1014
`META V. THALES
`
`
`
`Virtual Reality
`
`
`
`Suya You is a research staff mem-
`ber at the Computer Science Depart-
`ment and Integrated Media Systems
Center at the University of Southern
`California. His research interests are
`in computer vision and 3D comput-
`er graphics and related applications
such as visual tracking, augmented reality, virtual envi-
`ronments, and advanced human-computer interfaces. He
`received his PhD in electrical engineering in 1994 from the
`Huazhong University of Science and Technology, China.
`
`Ronald Azuma is a research staff
`member at HRL Laboratories in Mal-
`ibu, California,
`the corporate
`research laboratories for Hughes
`Electronics and Raytheon. His
`research interests are in augmented
`reality, virtual environments, and
`visualization. He received a BS in electrical engineering
`and computer science from the University of California at
`Berkeley, and an MS and PhD in computer science from
`the University of North Carolina at Chapel Hill.
`
Ulrich Neumann is an assistant professor of computer science at the University of Southern California and a research associate director for computer interfaces at the USC Integrated Media Systems Center. He also directs the Computer Graphics and Immersive Technologies (CGIT) Laboratory at USC. He has an MS in electrical engineering from the State University of New York at Buffalo (1980) and a PhD in computer science from the University of North Carolina at Chapel Hill (1993). His research relates to interactive visual media, including augmented-reality tracking systems, 3D modeling, and facial animation.
`
`Readers may contact You and Neumann at the Inte-
`grated Media Systems Center, University of Southern Cal-
`ifornia, Los Angeles, CA 90089-0781, e-mail {suyay,
`uneumann}@graphics.usc.edu.
`Contact Azuma at HRL Laboratories, 3011 Malibu
`Canyon Rd., MS RL96, Malibu, CA 90265, e-mail
`azuma@HRL.com.
`
`8th Int’l Conf. in Central Europe on
`Computer Graphics, Visualization, and
`Digital Interactive Media (WSCG 2000)
`7-11 February 2000
`Plzen, Czech Republic
`In cooperation with Eurographics, IFIP working group 5.10, and the Computer Graphics Society, WSCG 2000
`will cover topics in algorithms, rendering and visualization, virtual reality, animation and multimedia, medical
`imaging, geometric modeling and fractals, graphical interaction, object-oriented graphics, World Wide Web
`technologies, standards, computer vision, parallel and distributed graphics, computational geometry, CAD/CAM,
`DTP and GIS systems, educational aspects of related fields, and use of graphics within mathematical software.
`Planned keynote speakers include Carl Machover, Machover Associates; Ben Delaney, Cyberedge Information
Services; Philip J. Willis, University of Bath; and Andrej Iones, University of St. Petersburg. For more information
`contact the organizer and conference secretariat:
`
`Vaclav Skala
`Computer Science Dept., University of West Bohemia
`Univerzitni 8, Box 314, 306 14 Plzen, Czech Republic
`e-mail skala@kiv.zcu.cz http://wscg.zcu.cz
`
`42
`
`November/December 1999
`
`META 1014
`META V. THALES
`
`