Motion Tracking: No Silver Bullet, but a Respectable Arsenal

If you read the surveys of motion tracking systems,1-5 one thing that will immediately strike you is the number of technologies and approaches—a bewildering array of systems operating on entirely different physical principles, exhibiting different performance characteristics, and designed for different purposes. So why does the world need so many different tracking products and research projects to do essentially the same thing?

Just as Brooks argued in his famous article on software engineering6 that there is no single technique likely to improve software engineering productivity an order of magnitude in a decade, we'll attempt to show why no one tracking technique is likely to emerge to solve the problems of every technology and application.

But this isn't an article of doom and gloom. We'll introduce you to some elegant trackers designed for specific applications, explain the arsenal of physical principles used in trackers, get you started on your way to understanding the other articles in this special issue, and perhaps put you on track to choose the type of system you need for your own computer graphics application. We hope this article will be accessible and interesting to experts and novices alike.

This article introduces the physical principles underlying the variety of approaches to motion tracking. Although no single technology will work for all purposes, certain methods work quite well for specific applications.

Greg Welch
University of North Carolina at Chapel Hill

Eric Foxlin
InterSense

What is motion tracking?
If you work with computer graphics—or watch television, play video games, or go to the movies—you are sure to have seen effects produced using motion tracking. Computer graphics systems use motion trackers for five primary purposes:

■ View control. Motion trackers can provide position and orientation control of a virtual camera for rendering computer graphics in a head-mounted display (HMD) or on a projection screen. In immersive systems, head trackers provide view control to make the computer graphics scenery simulate a first-person viewpoint, but animations or other nonimmersive applications might use handheld trackers.
■ Navigation. Tracked devices help a user navigate through a computer graphics virtual world. The user might point a tracked wand to fly in a particular direction; sensors could detect walking-in-place motion for virtual strolling.
■ Object selection or manipulation. Tracked handheld devices let users grab physical surrogates for virtual objects and manipulate them intuitively. Tracked gloves, acting as virtual surrogates for a user's hands, let the user manipulate virtual objects directly.
■ Instrument tracking. Tracked tools and instruments let you match virtual computer graphics representations with their physical counterparts—for example, for computer-aided surgery or mechanical assembly.
■ Avatar animation. Perhaps the most conspicuous and familiar use of trackers has been for generating realistically moving animated characters through full-body motion capture (MoCap) on human actors, animals, and even cars.

No silver bullet
Our experience is that even when presented with motion tracking systems that offer relatively impressive performance under some circumstances, users often long for a system that overcomes the shortcomings related to their particular circumstances. Typical desires are reduced infrastructure, improved robustness, and reduced latency (see the sidebar, "Tracking Latency"). The only thing that would satisfy everyone is a magical device we might call a "tracker-on-a-chip." This ToC would be all of the following:

■ Tiny—the size of an 8-pin DIP (dual in-line package) or even a transistor;
■ Self-contained—with no other parts to be mounted in the environment or on the user;
■ Complete—tracking all six degrees of freedom (position and orientation);
■ Accurate—with resolution better than 1 mm in position and 0.1 degree in orientation;
■ Fast—running at 1,000 Hz with latency less than 1 ms, no matter how many ToCs are deployed;
■ Immune to occlusions—needing no clear line of sight to anything else;
■ Robust—resisting performance degradation from light, sound, heat, magnetic fields, radio waves, and other ToCs in the environment;
■ Tenacious—tracking its target no matter how far or fast it goes;
■ Wireless—running without wires for three years on a coin-size battery; and
■ Cheap—costing $1 each in quantity.

Tracking Latency
Have you seen those so-called "gourmet" cookie stands in convenience stores or fast-food restaurants? They usually include a sign that boasts "Made fresh daily!" Unfortunately, while cookie baking might indeed take place daily, the signs don't actually give you the date on which the specific cookies being sold were baked!

We've found a related common misperception about delay or latency in interactive computer graphics in general, and in tracking in particular. While the inverse of the estimate rate (the period of the estimates) contributes to the latency, it doesn't tell the entire story. Consider our imaginary tracker-on-a-chip. If you send its 1,000-Hz estimates halfway around the world over the Internet, they will arrive at a rate of 1,000 Hz, but quite some time later. Similarly, within a tracking system, a person moves, the sensors are sampled at some rate, some computation is done on each sample, and eventually estimates pop out of the tracker. To get the entire story, you must consider not only the rate of estimates, but also the length of the pipeline through which the sensor measurements and subsequent pose estimates travel.

As Figure A illustrates, throughout the pipeline there are both fixed latencies, associated with well-defined tasks such as sampling the sensors and executing a function to estimate the pose, and variable latencies, associated with buffer operations, network transfers, and synchronization between well-defined but asynchronous tasks. The variable latencies introduce what's called latency jitter.

[Figure A: A typical tracker pipeline. User motion is sampled by the sensors, a pose estimate is computed, and the result passes through write and read buffers and a network transfer from server to client over time.]

Here again there's no silver bullet. In 1995 Azuma showed that motion prediction can help considerably, to a point.1,2 The most basic approach is to estimate or measure the pose derivatives and to use them to extrapolate forward from the most recent estimate—which is already old by the time you get to see it—to the present time. The problem is that it's difficult to predict what the user will choose (has chosen) to do very far in the future.

Azuma pointed out that the task is like trying to drive a car by looking only in the rear-view mirror. The driver must predict where the road will go, based solely on the view of the past and knowledge of roads in general. The difficulty of this task depends on how fast the car is going and on the shape of the road. If the road is straight and remains so, the task is easy. If the road twists and turns unpredictably, the task is impossible.

References
1. R. Azuma, Predictive Tracking for Augmented Reality, PhD dissertation, tech. report TR95-007, Univ. North Carolina, Chapel Hill, Dept. Computer Science, 1995.
2. R. Azuma and G. Bishop, "A Frequency-Domain Analysis of Head-Motion Prediction," Proc. Ann. Conf. Computer Graphics and Interactive Techniques (Proc. Siggraph 95), ACM Press, New York, 1995, pp. 401-408.
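
To make the sidebar's extrapolation idea concrete, here is a minimal Python sketch (our own illustration, not code from any particular tracker), assuming a constant-acceleration motion model and derivatives already estimated by the tracker:

import numpy as np

def predict_pose(p, v, a, dt):
    # Extrapolate the most recent position estimate forward by dt
    # seconds using estimated velocity and acceleration (a constant-
    # acceleration model). The same idea, applied to angular rates,
    # extends to orientation.
    return p + v * dt + 0.5 * a * dt**2

# Example: compensate for an estimate that is 30 ms old by display time.
p = np.array([0.10, 0.00, 1.50])    # last position estimate (m)
v = np.array([0.50, 0.00, 0.00])    # estimated velocity (m/s)
a = np.array([0.00, 0.00, -0.20])   # estimated acceleration (m/s^2)
print(predict_pose(p, v, a, dt=0.030))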

If this magic ToC existed, we would use it for everything. The reality is that every tracker today falls short on at least seven of these 10 characteristics, and that number is unlikely to shrink much in the foreseeable future.

But all is not lost! Researchers and developers have pragmatically and cleverly exploited every available physical principle to achieve impressive results for specific applications. We'll start with an overview of some of the available ammunition and the strengths and weaknesses of each and then look at some specific applications and the tracking technologies that have been employed successfully in each.

Available ammunition
Although designers have many pose estimation algorithms to choose among, they have relatively few sensing technologies at their disposal. In general, the technologies sense and interpret electromagnetic fields or waves, acoustic waves, or physical forces. Specifically, motion tracking systems most often derive pose estimates from electrical measurements of mechanical, inertial, acoustic, magnetic, optical, and radio frequency sensors.

[Figure 1: (a) Stable-platform (gimbaled) INS, with the gyroscopes and accelerometers mounted on a motor-stabilized gimbaled platform. (b) Strapdown INS, with the gyroscopes and accelerometers fixed to the moving body.]

Each approach has advantages and limitations. The limitations include modality-specific limitations related to the physical medium, measurement-specific limitations imposed by the devices and associated signal-processing electronics, and circumstantial limitations that arise in a specific application. For example, electromagnetic energy decreases with distance, analog-to-digital converters have limited resolution and accuracy, and body-worn components must be as small and lightweight as possible. Although alternative classifications are possible, we discuss the available ammunition using a traditional medium-based classification.

Mechanical sensing
Arguably the simplest approach conceptually, mechanical sensing typically involves some form of a direct physical linkage between the target and the environment. The typical approach involves an articulated series of two or more rigid mechanical pieces interconnected with electromechanical transducers such as potentiometers or shaft encoders. As the target moves, the articulated series changes shape and the transducers move accordingly. Using a priori knowledge about the rigid mechanical pieces and online measurements of the transducers, you can estimate the target's position (one end of the link) with respect to the environment (the opposite end).
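
As an illustration of that estimation step, here is a short Python sketch for a hypothetical two-link planar arm; the link lengths and encoder readings are invented for the example, and a real six-degree-of-freedom linkage would chain 3D transforms in the same way:

import math

def end_position(theta1, theta2, l1=0.5, l2=0.5):
    # Forward kinematics for a two-link planar arm: combine the a
    # priori link lengths (meters) with the joint angles reported by
    # two shaft encoders (radians) to locate the far end of the
    # linkage relative to its base.
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

print(end_position(math.radians(30), math.radians(45)))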

This approach can provide very precise and accurate pose estimates for a single target, but only over a relatively small range of motion—typically one cubic meter. In his pioneering HMD work in 1968, Sutherland built a mechanical tracker composed of a telescoping section with a universal joint at either end. While Sutherland and his colleagues found the system too cumbersome in practice, they relied on it as a "sure method" of determining head pose. The most common uses of mechanical sensing today are for boom-type tracked displays that use counterweights to balance the load and for precision 3D digitization over a small area. Commercial examples include the Boom 3C by FakeSpace and the FaroArm by Faro Technologies.

Articulated haptic devices such as the Phantom by SensAble Technologies inherently include mechanical tracking of the force-feedback tip. These devices need to know the tip position to apply appropriate forces, and the electromechanical devices typically used to provide the forces can also be used to sense the position.

Inertial sensing
Inertial navigation systems (INSs) became widespread for ships, submarines, and airplanes in the 1950s, before virtual reality or computer graphics were even conceived, but they were the last of the six ammunition technologies to be introduced for computer graphics input devices. The reason is straightforward: an INS contains gyroscopes, and early high-accuracy spinning-wheel gyroscopes weighed far too much to be attached to a person's body. Not until the advent of MEMS (microelectromechanical systems) inertial sensors in the 1990s did the development of inertial input devices begin.

Originally, inertial navigation systems were built with a gimbaled platform (see Figure 1a) stabilized to a particular navigation reference frame (such as north-east-down) by using gyroscopes on the platform to drive the gimbal motors in a feedback loop. The platform-mounted accelerometers could then be individually double-integrated to obtain position updating in each direction, after compensating for the effect of gravity on the vertical accelerometer. Most recent systems are of a different type, called strapdown INS (see Figure 1b), which eliminates mechanical gimbals and measures a craft's orientation by integrating three orthogonal angular-rate gyroscopes strapped down to the craft's frame. To get position, three linear accelerometers, also affixed to the moving body, measure the acceleration vector in the body frame, which is then rotated into navigation coordinates using the current rotation matrix as determined by the gyroscopes. The result is a navigation-frame acceleration triad just like that measured by the accelerometers in the stable-platform INS, which can be gravity-compensated and double-integrated in the same way. Figure 2 illustrates this flow of information.
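
Here is a deliberately simplified Python sketch of one cycle of this loop (our illustration: it uses a small-angle orientation update and Euler integration, whereas production systems use more careful quaternion or rotation-matrix updates with renormalization):

import numpy as np

def strapdown_step(R, p, v, omega, f_body, dt, g=9.81):
    # One update of a simplified strapdown INS.
    # R       3x3 body-to-navigation rotation matrix
    # p, v    position (m) and velocity (m/s) in the navigation frame
    # omega   gyroscope angular rates (rad/s, body frame)
    # f_body  accelerometer specific force (m/s^2, body frame)
    wx, wy, wz = omega * dt
    # Integrate the gyros: small-angle update of the orientation.
    dR = np.array([[1.0, -wz,  wy],
                   [ wz, 1.0, -wx],
                   [-wy,  wx, 1.0]])
    R = R @ dR
    # Rotate the accelerometer triad into the navigation frame.
    f_nav = R @ f_body
    # Remove the effect of gravity (z axis points up in this frame).
    f_nav[2] -= g
    # Double-integrate to update velocity, then position.
    v = v + f_nav * dt
    p = p + v * dt
    return R, p, v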

Inertial trackers might appear to be the closest thing to a silver bullet of all the ammunition technologies we describe here. Gyroscopes and accelerometers are already available in chip form, and within the next decade we expect to see a single-chip six-axis strapdown inertial measurement unit—that is, with three gyroscopes and three accelerometers. Inertial sensors are completely self-contained, so they have no line-of-sight requirements, no emitters to install, and no sensitivity to interfering electromagnetic fields or ambient noise. They also have very low latency (typically a couple of milliseconds or less), can be measured at relatively high rates (thousands of samples per second), and measured velocity and acceleration can generally be used to predict the pose of a head or a hand 40 or 50 ms into the future. Good inertial sensors also offer extremely low jitter (see the sidebar, "Tracking Performance Specifications and Requirements").

[Figure 2: Basic strapdown inertial navigation algorithm. Gyroscope rates are integrated to maintain orientation; accelerometer readings are rotated into the locally level navigation frame, the effect of gravity is removed from the vertical accelerometer, and the result is double-integrated to yield position.]

Tracking Performance Specifications and Requirements
In deciding the quality of tracking required for an application involving visual simulation such as virtual reality, there are several possible goals:

■ The user feels presence in the virtual world.
■ Fixed virtual objects appear stationary, even during head motion (perceptual stability).
■ No simulator sickness occurs.
■ Tracking artifacts don't affect task performance.
■ Tracking artifacts remain below the detection threshold of a user looking for them.

Several types of tracking errors can contribute in varying degrees to destroying the sense of presence or perceptual stability, causing sickness, or degrading task performance. Various authors and manufacturers have focused on different specifications or defined them differently, and every type of tracker has its own complicated idiosyncrasies that would require a thick document to characterize in complete detail. However, Table A presents six specifications that can capture the essential aspects of tracking performance that affect human perception of a virtual environment while a tracked object is still (static) or moving (dynamic).

Table A. Tracking performance specifications.

Static
■ Spatial distortion. Repeatable errors at different poses in the working volume, including effects of all sensor scale factors, misalignments, and nonlinearity calibration residuals, and repeatable environmental distortions.
■ Spatial jitter. Noise in the tracker output that causes the perception of the image shaking when the tracker is actually still.
■ Stability or creep. Slow but steady changes in tracker output may appear over time. The cause might be temperature drift or repeatability errors if the tracker is power-cycled or moved and returned to the same pose.

Dynamic
■ Latency. The mean time delay after a motion until corresponding data is transmitted. It's possible to specify the latency of the tracker and other subsystems separately, but they don't simply add up.
■ Latency jitter. Any cycle-to-cycle variations in the latency. When moving, this will cause stepping, twitching, multiple image formation, or spatial jitter along the direction the image is moving.
■ Dynamic error (other than latency). This error type includes any inaccuracies that occur during tracker motion that can't be accounted for by latency or static inaccuracy (creep and spatial distortion). This might include overshoots generated by prediction algorithms or any additional sensor error sources that are excited by motion.

There's no clearly defined distinction between spatial jitter and creep, as they could be thought of as representing the high- and low-frequency portions of a continuous noise spectrum. A reasonable cutoff might be to designate as creep any motion slower than a minute hand in orientation (0.1 degree per second) and slower than 1 mm per second in translation, with everything else called jitter.

The weakness that prevents inertial trackers from being a silver bullet is drift. If one of the accelerometers has a bias error of just 1 milli-g, the reported position output would diverge from the true position with an acceleration of 0.0098 m/s^2. After a mere 30 seconds, the estimates would have drifted by 4.5 meters! If you look closely at Figure 2, you can see that an orientation error of 1 milliradian coming from the gyroscopes would produce a gravity compensation error of 1 milli-g on one of the horizontal accelerometers, causing just this calamity.
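
The arithmetic is easy to verify by double-integrating the constant bias (using 1 milli-g of roughly 9.8 m/s^2):

bias = 0.0098             # 1 milli-g accelerometer bias, in m/s^2
t = 30.0                  # seconds of unaided integration
print(0.5 * bias * t**2)  # -> 4.41 m of drift, the ~4.5 m quoted above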

Even very good gyroscopes (the kind you wouldn't want to wear on your head) drift by a milliradian within a short time. Nevertheless, given the advantages we've enumerated, inertial sensors can prove very valuable when combined with one or more other sensing technologies, such as those we describe next. Inertial sensors have provided the basis for several successful hybrid systems.

Acoustic sensing
Acoustic systems use the transmission and sensing of sound waves. All known commercial acoustic ranging systems operate by timing the flight duration of a brief ultrasonic pulse.

In contrast, in 1968 Sutherland built a continuous carrier-phase acoustic tracking system to supplement his mechanical system.7 This system used a continuous-wave source and determined range by measuring the phase shift between the transmitted signal and the signal detected at a microphone. Meyer and colleagues point out that this "phase-coherent" method enables continuous measurement without latency but can only measure relative distance changes within a cycle.3 To measure absolute distance, you need to know the starting distance and then keep track of the number of accumulated cycles. Another problem, which could be the reason no successful implementation of the phase-coherent approach has been developed, is the effect of multipath reflections. Multipath, a term also associated with radio transmission, indicates that the signal received is often the sum of the direct path signal and one or more reflected signals of longer path lengths. Because walls and objects in a room are extremely reflective of acoustic signals, the amplitude and phase of the signal received from a continuous-wave acoustic emitter in a room will vary drastically and unpredictably with changes in the receiver's position.

An outstanding feature of pulsed time-of-flight acoustic systems is that you can overcome most multipath reflection problems by waiting until the first pulse arrives, which is guaranteed to have arrived via the direct path unless the signal is blocked. The reason this method works for acoustic systems but not for radio frequency and optical systems is that sound travels relatively slowly, allowing a significant time difference between the arrival of the direct path pulse and the first reflection.
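
As a concrete sketch of how pulsed time-of-flight ranging becomes a position estimate, here is a small, idealized Python example of our own (it assumes a known speed of sound, four receivers at known positions, and noiseless pulse timings):

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def tof_to_range(t_flight):
    # A pulse's time of flight (s), taken from the first arrival to
    # dodge multipath, converts directly to a distance (m).
    return SPEED_OF_SOUND * t_flight

def trilaterate(receivers, ranges):
    # Least-squares emitter position from ranges to four or more known
    # receiver positions, linearized by subtracting the first range
    # equation from the others.
    r0, d0 = receivers[0], ranges[0]
    A = 2.0 * (receivers[1:] - r0)
    b = (np.sum(receivers[1:]**2, axis=1) - np.sum(r0**2)
         - ranges[1:]**2 + d0**2)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

receivers = np.array([[0., 0., 0.], [2., 0., 0.],
                      [0., 2., 0.], [0., 0., 2.]])
true_pos = np.array([0.5, 0.8, 1.1])
ranges = np.linalg.norm(receivers - true_pos, axis=1)
print(trilaterate(receivers, ranges))  # recovers ~[0.5, 0.8, 1.1]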

Point-to-point ranging for unconstrained 3D tracking applications requires transducers that are as omnidirectional as possible, so that the signal can be detected no matter how the emitter is positioned or oriented in the tracking volume. To achieve a wide beam width, you must use small speakers and microphones with active surfaces a few millimeters in diameter. This is convenient for integration into human motion tracking devices and helps reduce off-axis ranging errors, but the efficiency of an acoustic transducer is proportional to the active surface area, so these small devices can't offer as much range as larger ones.

To improve the range, most systems use highly resonant transducers and drive them with a train of electrical cycles right at the resonant frequency to achieve high amplitude. This results in a received waveform that "rings up" gradually for about 10 cycles to a peak amplitude, then gradually rings down. For a typical envelope-peak detection circuit, this means the point of detection is delayed about 10 cycles—about 90 mm—from the beginning of the waveform. By detecting on the second or third cycle instead of the 10th, you can greatly reduce the risk of multipath reflection.

In our experience, this is one of the most important issues for accurate ultrasonic tracking outside of controlled laboratory settings, and it is the crux of how InterSense's ultrasonic ranging technology remains accurate at longer ranges than others.

The physics of ultrasonic waves in air and transducer design dictate other design trade-offs and considerations as well. Most ambient noise sources fall off rapidly with increasing frequency, so operating at a higher frequency is beneficial for avoiding interference, and the shorter wavelengths offer higher resolution. However, selecting a higher frequency reduces the range because of problems with transducer size and frequency-dependent attenuation of sound in air, which starts to play a significant role by 40 kHz and becomes the dominant factor in limiting range by 80 kHz, depending on humidity.

Ultrasonic trackers typically offer a larger range than mechanical trackers, but they're not a silver bullet. Their accuracy can be affected by wind (in outdoor environments) and uncertainty in the speed of sound, which depends significantly on temperature, humidity, and air currents. A rule of thumb is that the speed of sound changes about 0.1 percent per degree Fahrenheit of temperature differential. This corresponds to about a one-millimeter error per degree Fahrenheit at one meter.
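
The rule of thumb is easy to apply; for example, assuming (hypothetically) that the tracker's temperature estimate is off by 5 degrees Fahrenheit:

range_m = 1.0                  # distance to the emitter (m)
dT = 5.0                       # temperature error (degrees F)
err = range_m * 0.001 * dT     # 0.1 percent of range per degree F
print(f"{err * 1000:.1f} mm")  # -> 5.0 mm of range error at 1 m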

Acoustic systems' update rate is limited by reverberation. Depending on room acoustics and tracking volume, it may be necessary for the system to wait anywhere from 5 to 100 ms to allow echoes from the previous measurement to die out before initiating a new one, resulting in update rates as slow as 10 Hz. The latency to complete a given acoustic position measurement is the time for the sound to travel from the emitter to the receivers, or about one millisecond per foot of range. This is unaffected by room reverberation and is usually well under 15 ms in the worst case. However, in a purely acoustic system with a slow update rate, the need to wait for the next measurement also affects system latency.

Acoustic systems require a line of sight between the emitters and the receivers, but they're somewhat more tolerant of occlusions than optical trackers (which we discuss later) because sound can find its way through and around obstacles more easily. Finally, we have yet to see a purely acoustic tracker that doesn't go berserk when you jingle your keys.

You can address most of the shortcomings we've mentioned by building a hybrid system that combines acoustic sensors with others that have complementary characteristics—inertial sensors, for example.

[Figure 3: A position sensing detector (PSD). Courtesy of UDT Sensors.]

Magnetic sensing
Magnetic systems8 rely on measurements of the local magnetic field vector at the sensor, using magnetometers (for quasi-static direct current fields) or the current induced in an electromagnetic coil when a changing magnetic field passes through the coil (for active-source alternating current systems). Three orthogonally oriented magnetic sensors in a single sensor unit can provide a 3D vector indicating the unit's orientation with respect to the excitation.

You can use the earth's magnetic field as a naturally occurring, widely available DC source to estimate heading. The shape of the earth's magnetic field varies to some extent over the planet's surface, but you can use a look-up table to correct for local field anomalies.
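
For a level sensor, the heading computation reduces to a two-argument arctangent over the horizontal field components. Here is a minimal Python sketch (our illustration; it omits the tilt compensation and look-up-table corrections a real system needs):

import math

def heading_deg(bx, by):
    # Heading in degrees clockwise from magnetic north, from the
    # horizontal components of a level three-axis magnetometer
    # (x forward, y right). A tilted sensor must be tilt-compensated
    # first, and local field anomalies corrected afterward.
    return math.degrees(math.atan2(-by, bx)) % 360.0

print(heading_deg(0.0, -20.0))  # field along -y: facing east, 90 degrees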

Alternatively, you can actively induce excitations with a multicoil source unit. This has been a popular means of tracking for interactive graphics for many years. You can energize each of the source unit's coils in sequence and measure the corresponding magnetic field vector in the sensor unit. With three such excitations, you can estimate the position and orientation of the sensor unit with respect to the source unit.
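
A rough sense of how this works comes from the idealized model below (our sketch, which treats the source coils as perfect magnetic dipoles; real systems calibrate against measured fields and invert the model numerically). Each excitation yields one 3-vector at the sensor, so three excitations give nine numbers from which an estimator can recover the six pose parameters:

import numpy as np

def dipole_field(r, m):
    # Field of an ideal magnetic dipole with moment m, observed at
    # offset r from the source (constant physical factors dropped).
    d = np.linalg.norm(r)
    rhat = r / d
    return (3.0 * np.dot(rhat, m) * rhat - m) / d**3

def excitation_measurements(p, R):
    # Energize the x, y, and z source coils in sequence; the triaxial
    # sensor at position p with orientation R reports each resulting
    # field vector in its own frame.
    return np.array([R.T @ dipole_field(p, m) for m in np.eye(3)])

# The tracker's job is the inverse problem: recover p and R from
# these three measured vectors.
print(excitation_measurements(np.array([0.3, 0.2, 0.5]), np.eye(3)))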

However, ferromagnetic and conductive material in the environment can affect a magnetic field's shape. A significant component of the resulting field distortion results from unintended fields that appear around nearby conductive objects as the source induces eddy currents in them. These small fields act in effect as small unwanted source units. The most common approach to addressing these distortions is to ensure that the working volume contains no offending objects. This is why, for example, you might see a projector-based display system built out of wood or plastic. If you can't eliminate the offending objects (perhaps because they're an integral part of the application), you can try to model and correct for the resulting distortions.

You can use alternating or direct current signals to excite the source unit's coils. The use of AC was initially popular, but precisely because of the transient distortions we just mentioned, manufacturers introduced the use of DC fields. Even with DC fields, you must wait for the initial transient of each excitation to subside. Furthermore, you must make an additional excitation-free measurement of the ambient magnetic field to remove its effect.

With both AC and DC active-source systems, the useful range of operation is severely limited by the inverse cubic falloff of the magnetic fields as a function of distance from the source. Position resolution in the radial direction from source to sensor depends on the gradient of the magnetic field strength, and thus the positional jitter grows as the fourth power of the separation distance.
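
The fourth-power behavior follows directly from the field model: if field strength falls as 1/r^3, its radial gradient falls as 1/r^4, and positional jitter is roughly the sensor noise divided by that gradient. A quick check:

for r in (0.5, 1.0, 2.0):
    # jitter ~ sigma / |dB/dr| ~ r**4; normalized to the 0.5 m case
    print(r, (r / 0.5)**4)  # doubling the range multiplies jitter by 16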

Despite magnetic field strength and distortion problems, there are three noteworthy advantages to a magnetic approach to tracking humans. First, the size of the user-worn component can be quite small. Second, magnetic fields pass right through the human body, eliminating line-of-sight requirements. Third, you can use a single source unit to simultaneously excite (and thus track) multiple sensor units.

Optical sensing
Optical systems rely on measurements of reflected or emitted light. These systems inevitably have two components: light sources and optical sensors. The light sources might be passive objects that reflect ambient light or active devices that emit internally generated light. Examples of passive light sources include distinguishable colored fiducials and even the natural surfaces in the environment. Examples of active light sources include light-emitting diodes (LEDs), lasers, or simple light bulbs.

Optical sensors can be either analog or digital devices. Analog sensors offer continuous voltages indicating the overall intensity or centroid position of the aggregate light reaching the sensor. Digital sensors offer a discrete image of the scene projected onto the sensor. Both types of devices can be 1D or 2D. One-dimensional sensors can typically be sampled and processed at a higher rate than 2D sensors, but 2D sensors offer more information per (complete) sample. (Later, we'll describe some systems that use 1D optical sensors and some that use 2D sensors.)

Lenses and apertures can be used to project images onto the sensor, indicating the angle to the source. You can also use the intensity of light reaching the sensor to estimate the distance to the source. Filters can be added to selectively admit or reject certain wavelengths of light. For example, a sensor system might use infrared light sources in conjunction with filters that only admit infrared light, effectively providing a light "channel" separate from the ambient visible light.

The simplest analog sensor is a photosensor, a device that simply changes resistance as a function of the quantity of light reaching it. While individual photosensors offer relatively little information, relative or ratiometric amplitudes within a set of sensors can offer position information. Photosensors have the advantage of simplicity and speed.

An analog position sensing detector (PSD) is a 1D or 2D semiconductor device that produces a set of currents that indicate the position of the centroid of the light reaching the sensor (see the example in Figure 3). Like photosensors, PSDs offer measurements based on the total light reaching the device. As such, the target light source amplitude is typically under program control, so that the system can use differential signaling to distinguish the target from the ambient light.
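
The centroid computation for a 1D lateral-effect PSD is a simple current ratio; here is a sketch (the two-electrode formula is the standard one, and the numbers are invented):

def psd_position(i1, i2, length):
    # Centroid of the light spot on a 1D PSD of active length `length`
    # (m), from the photocurrents at its two electrodes. The ratio
    # cancels the overall light level.
    return 0.5 * length * (i2 - i1) / (i1 + i2)

print(psd_position(0.8e-6, 1.2e-6, 0.01))  # spot 1 mm from center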

[Figure 4: Simplified diagram of some cells from an image-forming charge-coupled device (CCD), showing light entering through the glass, past the metal gates, into the substrate.]

Outside-In or Inside-Out?
When using optical emitters and sensors for tracking, we must consider whether to put the light sources on the moving target and the sensors in the environment, or vice versa. The first of these two alternatives is often described as outside-looking-in, the second as inside-looking-out.

However, these terms can be misleading. For example, the Vulcan Measurement System from Arc Second (http://www.arcsecond.com) employs multiple optical sensors mounted on the target and two or more spinning light sources mounted in the environment. The spinning light sources sweep out distinct planes of light that periodically hit the optical sensors, and the system uses the timing of the hits to derive the sensors' positions. While the target-mounted optical sensors do indeed "look outward" toward the environment, the system actually has the orientation sensitivity characteristics of what is typically called an outside-looking-in system. Thus, the typical inside-looking-out characterization would be misleading.

The actual distinguishing factor is whether bearing angles to reference points are measured from the outside or the inside.

The more familiar digital image-forming devices such as charge-coupled devices (CCDs) typically use a dense 1D or 2D array of pixel sensors that convert light energy (photons) into an electrical charge. These systems use the array of pixel sensors to produce a discretely sampled image of a scene by simultaneously opening the pixel sensors to collect light energy over a short time interval. Electronics surrounding the pixels then transfer the array of charges off the chip. Figure 4 is a simplified diagram of a CCD.

Although a large set of pixel sensors can be triggered simultaneously, measuring and transferring the per-pixel charge into a computer can be relatively time-consuming. The result is that image-forming devices are typically limited to relatively few measurements per unit of time when compared to the simpler analog optical PSD described earlier.

Of course, 1D or 2D images typically offer more constraints on a pose estimate—for example, letting you extract shape, shading, or motion of multiple image features. However, you must interpret the image to obtain those constraints, a process that can be computationally costly. Special-purpose processing can help, but interpretation is still difficult because of variations in lighting and surface properties, occlusions, and independent (confounding) object motion in the images.

As with other types of sensors, you can combine measurements from two or more optical sensor u