`
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 10, NO. 3, MAY 1988
`
`VITS-A Vision System for Autonomous Land
`Vehicle Navigation
`
`MATTHEW A. TURK, MEMBER, IEEE, DAVID G. MORGENTHALER, KEITH D. GREMBAN,
`AND MARTIN MARRA
`
Abstract-In order to adequately navigate through its environment, a mobile robot must sense and perceive the structure of that environment, modeling world features relevant to navigation. The primary vision (or perception) task is to provide a description of the world rich enough to facilitate such behaviors as road-following, obstacle avoidance, landmark recognition, and cross-country navigation. We describe VITS, the vision system for Alvin, the Autonomous Land Vehicle, addressing in particular the task of road-following. The ALV, equipped with an RGB video camera with pan/tilt control and a laser range scanner, has performed public road-following demonstrations, traveling distances of up to 4.5 km at speeds up to 10 km/hr along a paved road. The ALV vision system builds symbolic descriptions of road and obstacle boundaries using both video and range sensors. We describe various road segmentation methods for video-based road-following, along with approaches to boundary extraction and transformation of boundaries in the image plane into a vehicle-centered three-dimensional scene model.
`
Index Terms-Autonomous navigation, computer vision, mobile robot vision, road-following.
`
`I. INTRODUCTION
`
To achieve goal-directed autonomous behavior, the vision system for a mobile robot must locate and model the relevant aspects of the world so that an intelligent navigation system can plan appropriate action. For an outdoor autonomous vehicle, typical goal-directed behaviors include road-following, obstacle avoidance, cross-country navigation, landmark detection, map building and updating, and position estimation. The basic vision task is to provide a description of the world rich enough to facilitate such behaviors. The vision system must then interpret raw sensor data, perhaps from a multiplicity of sensors and sensor types, and produce consistent symbolic descriptions of the pertinent world features.
In May of 1985, "Alvin," the Autonomous Land Vehicle at Martin Marietta Denver Aerospace, performed its first public road-following demonstration.
`
Manuscript received December 15, 1986; revised May 15, 1987. This work was performed under the Autonomous Land Vehicle Program supported by the Defense Advanced Research Projects Agency under Contract DACA 76-84-C-0005.
M. A. Turk was with Martin Marietta Denver Aerospace, P.O. Box 179, M.S. H0427, Denver, CO 80201. He is now with the Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139.
D. G. Morgenthaler and M. Marra are with Martin Marietta Denver Aerospace, P.O. Box 179, M.S. H0427, Denver, CO 80201.
K. D. Gremban is with the Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA 15213, on leave from Martin Marietta Denver Aerospace, P.O. Box 179, M.S. H0427, Denver, CO 80201.
`IEEE Log Number 8820101.
`
In the few months leading up to that performance, a basic vision system was developed to locate roads in video imagery and send three-dimensional road centerpoints to Alvin's navigation system. Since that first demonstration, VITS (for Vision Task Sequencer) has matured into a more general framework for a mobile robot vision system, incorporating both video and range sensors and extending its road-following capabilities. A second public demonstration in June 1986 showed the improved road-following ability of the system, allowing the ALV to travel a distance of 4.2 km at speeds up to 10 km/hr, handle variations in road surface, and navigate a sharp, almost hairpin, curve. In October 1986 the initial obstacle avoidance capabilities were demonstrated, as Alvin steered around obstacles while remaining on the road, and speeds up to 20 km/hr were achieved on a straight, obstacle-free road. This paper describes Alvin's vision system and addresses the particular task of video road-following. Other tasks such as obstacle detection and avoidance and range-based road-following are discussed elsewhere [10], [11], [36].
`
`A. A Brief Review of Mobile Robot Vision
`
SRI's Shakey was the first mobile robot with a functional, albeit very limited, vision system. Shakey was primarily an experiment in problem solving methods, and its blocks-world vision system ran very slowly. The JPL robot [32] used visual input to form polygonal terrain models for optimal path construction. Unfortunately, the project halted before the complete system was finished.

The Stanford Cart [25], [26] used a single camera to take nine pictures, spaced along a 50 cm track, and used the Moravec interest operator to pick out distinctive features in the images. These features were correlated between images, and their three-dimensional positions were found using a stereo algorithm. Running with a remote, time-shared computer as its "brain," the Stanford Cart took about five hours to navigate a 20 meter course, with 20 percent accuracy at best, lurching about one meter every ten to fifteen minutes before stopping again to take pictures, think, and plan a new path. The Cart's "sliding stereo" system chose features generally good enough for navigation in a cluttered environment, but it did not provide a meaningful model of the environment.
Tsugawa et al. [34] describe an autonomous car driven at up to 30 km/hr using a vertical stereo pair of cameras to detect expected obstacles, but its perception of the world was very minimal, limiting its application to a highly constrained environment. The "intelligent car" identified obstacles in an expected range very quickly by comparing edges in vertically displaced images. A continuous "obstacle avoidance" mode was in effect, and a model of the world was not needed.
A vision system for a mobile robot intended for the factory floor was presented by Inigo et al. [18]. This system used edge detection, perspective inversion, and line fitting (via a Hough transform) to find the path; an a priori road model of straight lines; and another stereo technique using vertical cameras, called motion-driven scene correlation, to detect obstacles. The Fido vision system [33] uses stereo vision to locate obstacles by a hierarchical correlation of points chosen by an interest operator. Its model of the world consists of only the 3-D points it tracks, and it has successfully navigated through a cluttered environment and along a sidewalk. Current work in multisensory perception for the mobile robot Hilare is presented by de Saint Vincent [7], describing a scene acquisition module, using stereo cameras and a laser range finder, and a "dynamic vision" module for robot position correction and for tracking world features. Another stereo vision system based on matching vertical edges and inferring surfaces is described by Tsuji et al. [35].
The goal of a mobile robot project in West Germany is to perform autonomous vehicle guidance on a German Autobahn at high speeds [8], [22], [29]. The current emphasis is on control aspects of the problem, incorporating a high-speed vision algorithm to track road border lines. The system has performed both road-following and vehicle-following in real time.

Other mobile robots have been or are being developed that use sensors particularly suited to an indoor environment (e.g., [4], [19]). The project headed by Brooks [2] implements a novel approach to a mobile robot architecture, emphasizing levels of behavior rather than functional modules; much of the current vision work may be incorporated into such a framework.
`
`B. ALV Background
The Autonomous Land Vehicle project, part of DARPA's Strategic Computing Program, is intended to advance and demonstrate the state of the art in image understanding, artificial intelligence, advanced architectures, and autonomous navigation. A description of the project and the initial system configuration is found in [24]. Related vision research is being pursued concurrently by a number of industrial and academic groups, as is work in route and path planning, as well as object modeling and knowledge representation. The ALV project is driven by a series of successively more ambitious demonstrations. The ultimate success of the project depends on coordination among the different groups involved to enable rapid technology transfer from the research domain to the application domain. As the ALV is intended to be a national testbed for autonomous vehicle research, various vision systems and algorithms will eventually be implemented. Some of the current work is briefly described in the remainder of this section.
Vision research areas currently being pursued in relation to the ALV program include object modeling, stereo, texture, motion detection and analysis, and object recognition. An architecture that uses model-driven schema instantiation for terrain recognition is presented by Lawton et al. [23]. Such representations for terrain models will be important for future cross-country navigation. Waxman et al. [40], [41] present a visual navigation system that incorporates rule-based reasoning with image processing and geometry modules. The system, developed at the University of Maryland, finds dominant linear features in the image and reasons about these features to describe the road, using bootstrap and feed-forward image processing phases. In the feed-forward phase, previous results are used to predict the location of the road in successive images. A subset of this system has been used to autonomously drive the ALV for short distances. DeMenthon [6] describes an alternative geometry module for the above visual navigation system.
Significant ALV-related work is proceeding at Carnegie-Mellon University (CMU). A review of recent results from the CMU program is presented by Goto and Stentz [13]. Outdoor scene analysis using range data from a laser range scanner is presented by Hebert and Kanade [16], describing methods for preprocessing range data, extracting three-dimensional features, scene interpretation, map building, and object recognition. Fusion of video and range data is also discussed. Range data processing has been used on the CMU Navlab to demonstrate obstacle avoidance capabilities. Vision algorithms used for successful outdoor navigation of the CMU Terregator are described by Wallace et al. [37]-[39]. The Terregator has achieved continuous-motion navigation using both edge-based and color-based sidewalk finding algorithms.
Hughes Artificial Intelligence Center is developing knowledge-based vision techniques for obstacle detection and avoidance using the concept of a virtual sensor, which blends raw sensor data with specialized processing in response to a request from the planning system [5], [30]. Work at SRI International is focused on object modeling and recognition, and on modeling uncertainty in multiple representations [1].

FMC Corporation and General Dynamics have demonstrated successful transfer of ALV technology to mission-oriented scenarios of mixed teleoperation and autonomous navigation, performed at the Martin Marietta test site in 1986. Kuan et al. [20], [21] describe FMC's research in vision-guided road-following. Other university and industrial laboratories engaged in vision research related to the ALV include Advanced Decision Systems, Columbia University, General Electric, Honeywell Research Center, MIT, University of Massachusetts at Amherst, University of Rochester, and USC.
The Proceedings of the February 1987 Image Understanding Workshop, sponsored by DARPA, contains descriptions and status reports of many of these projects.

Fig. 1. The ALV system configuration.

The vision system described in this paper (VITS) is the system meeting the perception requirements for testing and formal demonstrations of the ALV through 1986. Section II gives a system overview, briefly describing the various subsystems; it is important to understand the vision system in its context. Video-based road-following is discussed in Section III, describing sensor control, road segmentation, road boundary extraction, and geometric transformation to three-dimensional world coordinates.
`
`II. ALV SYSTEM OVERVIEW
It is important to view Alvin's vision subsystem as an integral part of a larger system, which can affect and be affected by the performance of the system as a whole. Fig. 1 illustrates the basic system configuration of the ALV, including the interfaces to the major modules. In the paragraphs below, each of Alvin's major components is discussed in the context of its interaction with the complete system.
`
`A. Hardware Components
The primary consideration behind the selection of the hardware components was that Alvin is intended to be a testbed for research in autonomous mobility systems. Consequently, it was necessary to provide Alvin with an undercarriage and body capable of maneuvering both on-road and off-road, while carrying on board all the power, sensors, and computers needed for autonomous operation. In addition, the requirements of autonomous operation directed the selection of sensors and processing hardware.
1) Vehicle: Fig. 2 is a photograph of Alvin. The overall vehicle dimensions are 2.7 m wide by 4.2 m long; the suspension system allows the height of the vehicle to be varied, but it is nominally 3.1 m.

Alvin weighs approximately 16 000 pounds fully loaded with equipment, yet is capable of traveling both on-road and off-road. The all-terrain undercarriage was built by Standard Manufacturing, Inc. The basic vehicle is eight-wheel drive, diesel-powered, and hydrostatically driven. Alvin is steered like a tracked vehicle by providing differential power to the two sets of wheels.

Alvin's fiberglass shell protects the interior from dust and inclement weather, and insulates the equipment inside. The shell provides space for six full-size equipment racks, as well as room for service access. The electronics within the ALV are powered by an auxiliary power unit.
`
`Fig. 2. Alvin.
`
`An environmental control unit cools the interior of the
`shell.
2) Sensors: In order to function in a natural environment, an autonomous vehicle must be able to sense the terrain around it, as well as keep track of heading and distance traveled. The ALV hosts a number of sensors to accomplish these tasks.

Alvin's sense of direction and distance traveled is provided by odometers on the wheels coupled to a Bendix Land Navigation System (LNS). These sensors enable Alvin to follow a trajectory derived from visual data or read from a prestored map. The LNS provides direction as an angle from true North, while distance traveled is provided in terms of horizontal distance (Northings and Eastings) and altitude.
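As a rough illustration of the bookkeeping behind these quantities, the sketch below converts a traveled distance and a heading measured clockwise from true North into Northing and Easting increments. The planar model and the example values are assumptions for illustration only; the actual computation is internal to the Bendix unit.

program DeadReckoningSketch;
{ Illustrative only: planar dead reckoning from odometer distance and
  heading (clockwise from true North).  The values are arbitrary; the
  actual LNS computation is internal to the Bendix unit. }
const
  DegToRad = 0.017453293;
var
  northing, easting: real;   { meters }
  headingDeg, d: real;
begin
  northing := 0.0;
  easting := 0.0;
  headingDeg := 30.0;        { heading, degrees clockwise from true North }
  d := 5.0;                  { distance traveled since the last update, meters }
  northing := northing + d * cos(headingDeg * DegToRad);
  easting := easting + d * sin(headingDeg * DegToRad);
  writeln('Northing: ', northing:8:2, '  Easting: ', easting:8:2)
end.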
Two imaging sensors are currently available on the ALV for use by VITS. The primary vision sensor is an RCA color video CCD camera, which provides 480 x 512 red, green, and blue images, with eight bits of intensity per image. The field of view (38° vertical and 50° horizontal) and focus of the camera are kept fixed. The camera is mounted on a pan/tilt unit that is under direct control of the vision subsystem.

The other vision sensor is a laser range scanner, developed by the Environmental Research Institute of Michigan (ERIM). This sensor determines range by measuring the phase shift of a reflected modulated laser beam. The laser is continuously scanned over a field of view that is 30° vertical and 80° horizontal. The output of the scanner is a digital image consisting of a 64 x 256 array of pixels with 8 bits of range resolution.
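For reference, the usual relation between measured phase shift and range for an amplitude-modulated continuous-wave scanner of this kind is sketched below; the modulation frequency shown is an arbitrary assumption, not an ERIM specification.

program PhaseToRangeSketch;
{ Illustrative only: range from the phase shift of an amplitude-modulated
  continuous-wave beam.  One full phase cycle corresponds to half the
  modulation wavelength.  The modulation frequency is an assumed value,
  not an ERIM specification. }
const
  c = 2.99792458e8;          { speed of light, m/s }
  fMod = 5.0e6;              { assumed modulation frequency, Hz }
var
  phase, range: real;
begin
  phase := 3.1416;           { example measured phase shift, radians }
  range := (phase / (2.0 * Pi)) * (c / (2.0 * fMod));
  writeln('range (m): ', range:8:2)
end.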
`
`
Fig. 3. The first-generation ALV processor configuration. (Block diagram; labeled components include the camera, videotape recorder, time-code generator, left and right odometers, Vicom image processor, vehicle control and status interfaces, laser scanner processor, laser range scanner, time-code generator interface, digital control, Land Navigation System interface, A/D and D/A converters, Master Processor (80286/80287, 512-kB RAM), Navigation Processor (80816, 128-kB RAM), Vehicle Control Processor (8086, 128-kB RAM), Multichannel Controller (8089), disk controller, and a 4-MBps channel.)
`
3) Computer Hardware: Alvin currently uses a variety of computers, reflecting the range of processing requirements of the different software subsystems. The diverse processing requirements were met by designing a modular multiprocessor architecture. VITS is hosted on a Vicom image processor, while the other software subsystems are hosted on an Intel multiprocessor system. VITS communicates with the other subsystems across a dedicated communication channel, while the other subsystems communicate across a common bus. Fig. 3 depicts the processor configuration.

The special capabilities of the Vicom hardware were important to the development of the ALV vision subsystem (VITS). The Vicom contains video digitizers, and can perform many standard image processing operations at near video frame rate (1/30 second). For example, 3 x 3 convolution, point mapping operations (such as thresholding, or addition and subtraction of constants), and image algebra (such as addition or subtraction of two images) are all frame-rate operations. The Vicom also contains a general purpose microcomputer for additional, user-defined operations.
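For illustration, the sketch below spells out one such point-mapping operation (thresholding) as an ordinary software loop; the image dimensions and threshold are arbitrary assumptions, and the Vicom performs the equivalent operation in hardware at close to frame rate.

program ThresholdSketch;
{ Illustrative only: a software rendering of a Vicom-style point-mapping
  operation (thresholding).  The image size and threshold are arbitrary
  assumptions; on the Vicom this is a single near-frame-rate operation. }
const
  Rows = 480;
  Cols = 512;
type
  GrayImage = array[1..Rows, 1..Cols] of byte;
var
  src, dst: GrayImage;
  r, c: integer;
  t: byte;
begin
  t := 128;                      { assumed threshold }
  { src would be filled from the digitizer; left uninitialized here }
  for r := 1 to Rows do
    for c := 1 to Cols do
      if src[r, c] >= t then
        dst[r, c] := 255         { above threshold }
      else
        dst[r, c] := 0;          { below threshold }
end.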
As stated above, Alvin is intended to be a testbed for autonomous systems. In fulfilling this charter, plans have been made to integrate a number of advanced experimental computer architectures in future generations of the ALV system. This will begin with a new architecture in early 1987.
`
`B. Vision
The vision subsystem is composed of three basic modules: VITS, the vision executive, which handles initialization, sets up communication channels, and "oversees" the processing; VIVO, the video data processing unit; and VIRD, the range data processing unit. Range data processing has been implemented on the ALV, and results of range-based road-following and obstacle avoidance are presented in [10], [11].

The vision system software resides entirely on the Vicom image processor, which also houses a board dedicated to camera pan/tilt control and a board to enable communication with the Intel system. Nearly all of the application code is written in Pascal and uses the Vicom-supplied libraries for accessing high-speed image operations. Some low-level control routines have been implemented in Motorola 68000 assembly language.
The responsibility of the vision subsystem in road-following is to process data in the form of video or range images to produce a description of the road in front of the vehicle. This description is passed to the reasoning subsystem, which uses additional data such as current position, speed, and heading to generate a trajectory for Alvin to follow. Communication between the vision subsystem and Reasoning takes place in three different forms: the scene model, the position update, and visual cues. A special communication control processor, part of the utilities subsystem, mediates communication between VITS and the other subsystems. The control processor shares memory with VITS, and handles communication by examining the content of key memory locations every 100 ms and modifying them as appropriate.
1) Scene Model: The scene model, a description of the observed road, is the output of the vision subsystem after each frame of images is processed. The scene model contains a record of Alvin's position and heading at the time of image acquisition; a description of the road found in the imagery, consisting of lists of vehicle-centered 3-D points denoting the left and right road edges; and an optional list of points surrounding an obstacle. The reasoning subsystem must then transform the road description into a fixed, world coordinate system for navigation. VITS may optionally specify the scene model in world coordinates;
`this is more efficient when data acquired from multiple
`sensors or at different times is used to create the scene
`model.
Since the time needed to compute a scene model is nondeterministic, VITS sets a "scene model ready" flag indicating that a new scene model is ready to be processed. The communication controller examines this flag and, when it is set, transfers the scene model to the reasoning subsystem and clears the flag.
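A minimal sketch of this handshake from the controller's side is given below; the flag and buffer names and the SharedArea record are hypothetical, since the paper does not describe the actual shared-memory layout.

program CommPollSketch;
{ Illustrative only: the communication controller's view of the
  "scene model ready" handshake.  The SharedArea layout, its field
  names, and TransferToReasoning are hypothetical. }
type
  SceneModel = record
    num: word;                 { scene model #; other fields as in Fig. 4 }
  end;
  SharedArea = record
    sceneModelReady: boolean;  { set by VITS, cleared by the controller }
    sceneModel: SceneModel;
  end;
var
  shared: SharedArea;

procedure TransferToReasoning(var sm: SceneModel);
begin
  { forward the scene model to the reasoning subsystem (stub) }
end;

procedure PollOnce;
{ executed by the communication controller every 100 ms }
begin
  if shared.sceneModelReady then
  begin
    TransferToReasoning(shared.sceneModel);
    shared.sceneModelReady := false    { clear the flag }
  end
end;

begin
  PollOnce
end.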
Fig. 4 illustrates the format of a scene model. Fig. 5 is an example of a hypothetical road scene and the corresponding scene model.
2) Position Update: VITS must know the position and heading of the vehicle at the time of image acquisition in order to integrate sensor information acquired at different times, and to transform vehicle-centered data into world coordinates. In addition, VITS must be able to predict the location of the road in an image, given its location in the preceding image (see Section III-B-1-d).

Communication of vehicle motion and position information is effected by means of a position update message passed from Reasoning to the vision subsystem. The position update specifies the current vehicle speed, position, and heading. Synchronization of position updates and image acquisition is mediated by a position update request: when VITS digitizes an image, the "position update request" flag is set. When the communication controller finds the flag set, it sends a message to Reasoning, which immediately (within 100 ms) generates the required information, builds a position update message, and sends it to VITS.
3) Visual Cues: The reasoning subsystem interfaces to a knowledge base which contains information about the test area. Some of this information can be used by VITS to specify behavior (find road, locate obstacles, pause, resume) or to optimize processing, much as the information on a road map can guide a driver. When Reasoning determines that a visually identifiable feature should be within the field of view, a visual cue is sent to VITS, enabling vision processing to be modified. In the future, when Alvin's domain becomes more complex, these cues will be used to guide the transition from one road surface to another, from on-road to off-road and vice versa, or to guide the search for a landmark. In the current version of the system, stored knowledge about the shape of the road shoulder has been used to guide a transition between range-based and video-based road-following. Apart from this, the cue facility has been used to date only to notify VITS that the vehicle is approaching a curve (which causes the camera panning mechanism to be enabled) and to send pause and resume commands to VITS.
`
`C. Reasoning
`
The Reasoning subsystem is the executive controller of the ALV; Vision is a resource of Reasoning. At the highest level, Reasoning is responsible for receiving a plan script from a human test conductor and coordinating the other subsystems on Alvin in order to accomplish the goals specified in the script.
`
type scene_model = record
  time: array[1..4] of word;             { time stamp }
  count: word;                           { # of road edge records }
  x, y, psi: real;                       { vehicle position }
  SM_rec: array[1..10] of record
    tag: string[2];                      { left or right }
    numpts: word;                        { # of points }
    pts: array[1..10] of array[1..3] of real;
  end;
  version: string[10];                   { current SW version }
  num: word;                             { scene model # }
end;

Fig. 4. The scene model format.
`
Fig. 5. Road scene and corresponding scene model. (The example scene model contains a time stamp, a position stamp, the number of edge records, the software version "VITS 12.2", a scene model number, and left ("LL") and right ("RR") edge records, each listing seven 3-D edge points.)
`
Because the processing involved in creating a visual description of the environment is beyond the real-time capability of present computers, the scene model is not used directly in the vehicle's control servo loop. Instead, the Navigator (part of the reasoning subsystem) pieces together scene models from the vision system and builds a reference trajectory that is sent to the Pilot for control. The reasoning subsystem accepts a position update request from VITS, generates the appropriate data, and sends back a position update. Upon receipt of a scene model, Reasoning evaluates it and plots a smooth trajectory if the data is acceptable. The new trajectory is computed to fit the previous trajectory smoothly.

Evaluation of scene models is a powerful capability of the reasoning subsystem. Small environmental changes, such as dirt on the road or the sudden appearance of a cloud, can significantly affect the output of the vision subsystem. Reasoning uses assumptions about the smoothness and continuity of roads to verify data from VITS. Every scene model is evaluated based on the smoothness of the road edges and on how well they agree with previous edges. A scene model evaluated as "bad" is discarded.
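The checks below are one schematic reading of this evaluation; the particular smoothness measure (bounded second differences), the continuity test, and both thresholds are assumptions, since the paper gives only the criteria, not the formulas.

program EdgeEvalSketch;
{ Illustrative only: accept or reject a road edge based on smoothness and
  agreement with the previous edge.  The measures and thresholds are
  assumed; the paper states only that smoothness and continuity are used. }
const
  NumPts = 7;
  MaxBend = 1.0;     { assumed bound on lateral second difference, meters }
  MaxJump = 1.5;     { assumed bound on offset from the previous edge, meters }
type
  Point3 = record
    x, y, z: real;   { vehicle-centered coordinates, meters }
  end;
  Edge = array[1..NumPts] of Point3;

function EdgeAcceptable(const cur, prev: Edge): boolean;
var
  i: integer;
  ok: boolean;
begin
  ok := true;
  { smoothness: lateral second differences must stay small }
  for i := 2 to NumPts - 1 do
    if abs(cur[i + 1].y - 2.0 * cur[i].y + cur[i - 1].y) > MaxBend then
      ok := false;
  { continuity: the near end must lie close to the previous edge }
  if sqrt(sqr(cur[1].x - prev[1].x) + sqr(cur[1].y - prev[1].y)) > MaxJump then
    ok := false;
  EdgeAcceptable := ok
end;

var
  current, previous: Edge;
begin
  { current and previous would come from successive scene models }
  if EdgeAcceptable(current, previous) then
    writeln('edge accepted')
  else
    writeln('edge discarded')
end.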
`Reasoning creates a new trajectory by minimizing a cost
`function based on current heading, curvature of the scene
`model, attraction to a goal, and road edge repulsion. The
`final trajectory is a sequence of points that lie near the
`center of the road. Each point is tagged with a reference
`speed. The reference speeds are computed so that, if no
`new scene models are received, the vehicle will stop at
`the end of the trajectory. The trajectory is then sent to the
`Pilot.
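The stopping behavior can be illustrated with a simple constant-deceleration speed profile, as sketched below; the deceleration value, the speed cap, and the point spacing are assumptions, not ALV parameters.

program RefSpeedSketch;
{ Illustrative only: assign each trajectory point a reference speed low
  enough that a constant deceleration stops the vehicle by the last
  point.  Deceleration, speed cap, and spacing are assumed values. }
const
  NumPts = 10;
  Decel = 0.5;                        { assumed deceleration, m/s^2 }
  VMax = 2.8;                         { assumed speed cap, m/s (about 10 km/hr) }
var
  dist: array[1..NumPts] of real;     { cumulative distance along the trajectory, m }
  speed: array[1..NumPts] of real;    { reference speed at each point, m/s }
  i: integer;
  remaining: real;
begin
  for i := 1 to NumPts do
    dist[i] := 2.0 * i;               { example: a point every 2 m }
  for i := 1 to NumPts do
  begin
    remaining := dist[NumPts] - dist[i];
    speed[i] := sqrt(2.0 * Decel * remaining);   { v^2 = 2 a d }
    if speed[i] > VMax then
      speed[i] := VMax;
    writeln('point ', i:2, '  speed (m/s): ', speed[i]:5:2)
  end
end.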
The reasoning subsystem also interacts with the knowledge base to locate features significant for vision processing. As each new trajectory is generated, the knowledge base is searched to determine whether any features are within the field of view of the vehicle. Features that are both within a maximum distance and within a maximum angle from the current heading are incorporated into a Visual Cue which is passed to VITS.
`
`D. Knowledge Base
The knowledge base consists of a priori map data and a set of routines for accessing the data. Currently, the map data contains information describing the road network being used as the ALV test track. The map data contains coordinates which specify the location of the roadway, as well as various significant features along the road, such as intersections, sharp curves, and several local road features.
`At present, the vision subsystem communicates with the
`knowledge base through Reasoning.
`
`E. Pilot
`The Pilot performs the actual driving of the vehicle.
`Given a trajectory from Reasoning, the Pilot computes the
`error values of lateral position, heading and speed by
`comparing LNS data with the target values specified in
`the trajectory. The Pilot uses a table of experimentally
`obtained control gains to determine commands needed to
`drive the errors toward zero; these commands are output
`to the vehicle controllers.
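A gain-scheduled proportional correction of the general kind described above is sketched below; the gain values, the speed-band indexing, and the control law itself are assumptions for illustration only.

program PilotSketch;
{ Illustrative only: gain-scheduled proportional commands that drive the
  lateral, heading, and speed errors toward zero.  The gain table, its
  speed bands, and the control law are assumed, not the ALV's. }
type
  Gains = record
    kLat, kHead, kSpeed: real;
  end;
var
  gainTable: array[1..3] of Gains;    { assumed: one entry per speed band }

function BandForSpeed(v: real): integer;
begin
  if v < 1.5 then
    BandForSpeed := 1
  else if v < 4.0 then
    BandForSpeed := 2
  else
    BandForSpeed := 3
end;

procedure ComputeCommands(latErr, headErr, speedErr, v: real;
                          var steerCmd, throttleCmd: real);
var
  g: Gains;
begin
  g := gainTable[BandForSpeed(v)];
  steerCmd := -(g.kLat * latErr + g.kHead * headErr);
  throttleCmd := -g.kSpeed * speedErr
end;

var
  steer, throttle: real;
begin
  gainTable[1].kLat := 0.8;  gainTable[1].kHead := 1.2;  gainTable[1].kSpeed := 0.5;
  gainTable[2].kLat := 0.5;  gainTable[2].kHead := 0.9;  gainTable[2].kSpeed := 0.5;
  gainTable[3].kLat := 0.3;  gainTable[3].kHead := 0.6;  gainTable[3].kSpeed := 0.5;
  ComputeCommands(0.4, 0.05, -0.3, 2.5, steer, throttle);
  writeln('steer: ', steer:6:3, '  throttle: ', throttle:6:3)
end.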
`The vision subsystem has no direct communication with
`the Pilot.
`
`III. VIDEO-BASED ROAD-FOLLOWING
The task of the vision system in a road-following scenario is to provide a description of the road for navigation. Roads may be described in a variety of ways, e.g., by sets of road edges, a centerline with associated road width, or planar patches. We have chosen to represent a road by its edges, or more precisely, by points in three-space that, when connected, form a polygonal approximation of the road edge. Road edges are intuitively the most natural representation, since they are usually obvious (to humans, at least) in road images. Often, however, the dominant linear features in road images are the shoulder/vegetation boundaries rather than the road/shoulder boundaries. The difficulty of extracting the real road boundary from the image led us to adopt a segmentation algorithm to first extract the road in the image, then track the road/nonroad boundary, and finally calculate three-dimensional road edge points.
The current video data processing unit (VIVO) uses a clustering algorithm to segment the image into road and nonroad regions. A detailed description of image segmentation by clustering can be found in [3]. After a binary road image is produced, the road boundaries are traced and selected image points are transformed into three-dimensional road boundary points. The complete cycle time, from digitization to producing a symbolic description of the road, is currently just over 2 seconds. The algorithm is summarized in the following steps, which are discussed in detail in the following sections: 1) digitize the video images; 2) segment road/nonroad regions; 3) extract road boundaries by tracing the binary road edges; and 4) transform 2-D road edge points to 3-D coordinates and build the scene model. Fig. 6 depicts the flow of control in a complete scene model cycle; a skeleton of the cycle is sketched below.
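The skeleton below simply labels the four steps; all of the procedure and type names are hypothetical, and each stage is detailed in the sections that follow.

program SceneModelCycleSketch;
{ Illustrative only: a skeleton of one scene-model cycle.  All names are
  hypothetical labels for the four stages summarized above. }
const
  Rows = 480;
  Cols = 512;
type
  GrayImage = array[1..Rows, 1..Cols] of byte;
  BinaryImage = array[1..Rows, 1..Cols] of boolean;
var
  red, green, blue: GrayImage;
  road: BinaryImage;

procedure DigitizeRGB(var r, g, b: GrayImage);
begin
  { step 1: grab the red, green, and blue images from the camera (stub) }
end;

procedure SegmentRoad(var r, g, b: GrayImage; var binary: BinaryImage);
begin
  { step 2: label each pixel road or nonroad, e.g., by clustering (stub) }
end;

procedure TraceBoundaries(var binary: BinaryImage);
begin
  { step 3: trace the left and right road/nonroad boundaries (stub) }
end;

procedure BuildSceneModel;
begin
  { step 4: transform selected 2-D edge points to 3-D and fill in the
    scene model of Fig. 4 (stub) }
end;

begin
  DigitizeRGB(red, green, blue);
  SegmentRoad(red, green, blue, road);
  TraceBoundaries(road);
  BuildSceneModel
end.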
`
`A. Sensor Control and Image Acquisition
1) Camera Panning: The position of the road with respect to the vehicle may change due to a curving road, vehicle oscillation, or a sudden path correction. Consequently, the position of the road within the field of view of a fixed camera may change. Because the video segmentation algorithm requires sampling a population of road pixels, two methods were developed to maintain knowledge of the road position from frame to frame: camera panning and power windowing. Power windowing, a "software panning" technique, is described in Section III-B-1-d.

Control of the pan/tilt mechanism is a function of vehicle orientation and desired viewing direction. During road-following, we would like the camera to point "down the road," regardless of the vehicle orientation, keeping the road approximately centered in the image. This requires the vision system to know global position information, relate the vehicle-centered road description to the present vehicle location and orientation, and then calculate and command the desired pan angle. If only one road boundary is detected, then VITS will attempt to pan the camera to the right or left to bring both road edges into view in the next image. The activation of panning is also controlled by cues from the reasoning subsystem that indicate when panning would be useful (e.g., going around a sharp corner) and when it would not be helpful (e.g., passing a parking lot).
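The pan-angle calculation can be illustrated as below: aim the camera at a look-ahead point on the road centerline expressed in vehicle-centered coordinates. The look-ahead distance and the coordinate convention (x forward, y to the left) are assumptions for illustration only.

program PanAngleSketch;
{ Illustrative only: a desired pan angle that points the camera at a
  look-ahead road centerpoint in vehicle-centered coordinates
  (x forward, y to the left).  The example values are arbitrary. }
const
  RadToDeg = 57.29578;
var
  lookX, lookY, panDeg: real;
begin
  lookX := 20.0;       { example look-ahead point, 20 m ahead }
  lookY := 3.5;        { and 3.5 m to the left }
  { x is forward and positive here, so a single-quadrant arctan suffices }
  panDeg := arctan(lookY / lookX) * RadToDeg;
  writeln('commanded pan angle (degrees): ', panDeg:6:2)
end.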
`In the initial implementation of camera panning, the
`camera was allowed to assume only three positions, left,
`mid, and right, with simple rules for switching from one
`to another based on road location in