VITS-A Vision System for Autonomous Land Vehicle Navigation

MATTHEW A. TURK, MEMBER, IEEE, DAVID G. MORGENTHALER, KEITH D. GREMBAN, AND MARTIN MARRA

Abstract-In order to adequately navigate through its environment, a mobile robot must sense and perceive the structure of that environment, modeling world features relevant to navigation. The primary vision (or perception) task is to provide a description of the world rich enough to facilitate such behaviors as road-following, obstacle avoidance, landmark recognition, and cross-country navigation. We describe VITS, the vision system for Alvin, the Autonomous Land Vehicle, addressing in particular the task of road-following. The ALV has performed public road-following demonstrations, traveling distances up to 4.5 km at speeds up to 10 km/hr along a paved road, equipped with an RGB video camera with pan/tilt control and a laser range scanner. The ALV vision system builds symbolic descriptions of road and obstacle boundaries using both video and range sensors. We describe various road segmentation methods for video-based road-following, along with approaches to boundary extraction and transformation of boundaries in the image plane into a vehicle-centered three-dimensional scene model.

Index Terms-Autonomous navigation, computer vision, mobile robot vision, road-following.

I. INTRODUCTION

To achieve goal-directed autonomous behavior, the vision system for a mobile robot must locate and model the relevant aspects of the world so that an intelligent navigation system can plan appropriate action. For an outdoor autonomous vehicle, typical goal-directed behaviors include road-following, obstacle avoidance, cross-country navigation, landmark detection, map building and updating, and position estimation. The basic vision task is to provide a description of the world rich enough to facilitate such behaviors. The vision system must then interpret raw sensor data, perhaps from a multiplicity of sensors and sensor types, and produce consistent symbolic descriptions of the pertinent world features.
In May of 1985, "Alvin," the Autonomous Land Vehicle at Martin Marietta Denver Aerospace, performed its first public road-following demonstration. In the few months leading up to that performance, a basic vision system was developed to locate roads in video imagery and send three-dimensional road centerpoints to Alvin's navigation system. Since that first demonstration, VITS (for Vision Task Sequencer) has matured into a more general framework for a mobile robot vision system, incorporating both video and range sensors and extending its road-following capabilities. A second public demonstration in June 1986 showed the improved road-following ability of the system, allowing the ALV to travel a distance of 4.2 km at speeds up to 10 km/hr, handle variations in road surface, and navigate a sharp, almost hairpin, curve. In October 1986 the initial obstacle avoidance capabilities were demonstrated, as Alvin steered around obstacles while remaining on the road, and speeds up to 20 km/hr were achieved on a straight, obstacle-free road. This paper describes Alvin's vision system and addresses the particular task of video road-following. Other tasks such as obstacle detection and avoidance and range-based road-following are discussed elsewhere [10], [11], [36].

Manuscript received December 15, 1986; revised May 15, 1987. This work was performed under the Autonomous Land Vehicle Program supported by the Defense Advanced Research Projects Agency under Contract DACA 76-84-C-0005.
M. A. Turk was with Martin Marietta Denver Aerospace, P.O. Box 179, M.S. H0427, Denver, CO 80201. He is now with the Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139.
D. G. Morgenthaler and M. Marra are with Martin Marietta Denver Aerospace, P.O. Box 179, M.S. H0427, Denver, CO 80201.
K. D. Gremban is with the Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA 15213, on leave from Martin Marietta Denver Aerospace, P.O. Box 179, M.S. H0427, Denver, CO 80201.
IEEE Log Number 8820101.

A. A Brief Review of Mobile Robot Vision

SRI's Shakey was the first mobile robot with a functional, albeit very limited, vision system. Shakey was primarily an experiment in problem solving methods, and its blocks world vision system ran very slowly. The JPL robot [32] used visual input to form polygonal terrain models for optimal path construction. Unfortunately, the project halted before the complete system was finished.

The Stanford Cart [25], [26] used a single camera to take nine pictures, spaced along a 50 cm track, and used the Moravec interest operator to pick out distinctive features in the images. These features were correlated between images and their three-dimensional positions were found using a stereo algorithm. Running with a remote, time-shared computer as its "brain," the Stanford Cart took about five hours to navigate a 20 meter course, with 20 percent accuracy at best, lurching about one meter every ten to fifteen minutes before stopping again to take pictures, think, and plan a new path. The Cart's "sliding stereo" system chose features generally good enough for navigation in a cluttered environment, but it did not provide a meaningful model of the environment.
Tsugawa et al. [34] describe an autonomous car driven up to 30 km/hr using a vertical stereo pair of cameras to detect expected obstacles, but its perception of the world was very minimal, limiting its application to a highly constrained environment. The "intelligent car" identified obstacles in an expected range very quickly by comparing edges in vertically displaced images. A continuous "obstacle avoidance" mode was in effect, and a model of the world was not needed.
A vision system for a mobile robot intended for the factory floor was presented by Inigo et al. [18]. This system used edge detection, perspective inversion, and line fitting (via a Hough transform) to find the path, an a priori road model of straight lines, and another stereo technique using vertical cameras, called motion driven scene correlation, to detect obstacles. The Fido vision system [33] uses stereo vision to locate obstacles by a hierarchical correlation of points chosen by an interest operator. Its model of the world consists of only the 3-D points it tracks, and it has successfully navigated through a cluttered environment and along a sidewalk. Current work in multisensory perception for the mobile robot Hilare is presented by de Saint Vincent [7], describing a scene acquisition module, using stereo cameras and a laser range finder, and a "dynamic vision" module for robot position correction and tracking world features. Another stereo vision system based on matching vertical edges and inferring surfaces is described by Tsuji et al. [35].
The goal of a mobile robot project in West Germany is to perform autonomous vehicle guidance on a German Autobahn at high speeds [8], [22], [29]. The current emphasis is on control aspects of the problem, incorporating a high-speed vision algorithm to track road border lines. The system has performed both road-following and vehicle-following in real-time.

Other mobile robots have been or are being developed that use sensors particularly suited to an indoor environment (e.g., [4], [19]). The project headed by Brooks [2] implements a novel approach to a mobile robot architecture, emphasizing levels of behavior rather than functional modules; much of the current vision work may be incorporated into such a framework.
B. ALV Background

The Autonomous Land Vehicle project, part of DARPA's Strategic Computing Program, is intended to advance and demonstrate the state of the art in image understanding, artificial intelligence, advanced architectures, and autonomous navigation. A description of the project and the initial system configuration is found in [24]. Related vision research is proceeding concurrently by a number of industrial and academic groups, as is work in route and path planning, as well as object modeling and knowledge representation. The ALV project is driven by a series of successively more ambitious demonstrations. The ultimate success of the project depends on coordination among the different groups involved to enable rapid technology transfer from the research domain to the application domain. As the ALV is intended to be a national testbed for autonomous vehicle research, various vision systems and algorithms will eventually be implemented. Some of the current work is briefly described in the remainder of this section.
Vision research areas currently being pursued in relation to the ALV program include object modeling, stereo, texture, motion detection and analysis, and object recognition. An architecture which uses model-driven schema instantiation for terrain recognition is presented by Lawton et al. [23]. Such representations for terrain models will be important for future cross-country navigation. Waxman et al. [40], [41] present a visual navigation system that incorporates rule-based reasoning with image processing and geometry modules. The system, developed at the University of Maryland, finds dominant linear features in the image and reasons about these features to describe the road, using bootstrap and feed-forward image processing phases. In the feed-forward phase, previous results are used to predict the location of the road in successive images. A subset of this system has been used to autonomously drive the ALV for short distances. DeMenthon [6] describes an alternative geometry module for the above visual navigation system.
Significant ALV-related work is proceeding at Carnegie-Mellon University (CMU). A review of recent results from the CMU program is presented by Goto and Stentz [13]. Outdoor scene analysis using range data from a laser range scanner is presented by Hebert and Kanade [16], describing methods for preprocessing range data, extracting three-dimensional features, scene interpretation, map building, and object recognition. Fusion of video and range data is also discussed. Range data processing has been used on the CMU Navlab to demonstrate obstacle avoidance capabilities. Vision algorithms used for successful outdoor navigation of the CMU Terregator are described by Wallace et al. [37]-[39]. The Terregator has achieved continuous motion navigation using both edge-based and color-based sidewalk finding algorithms.

Hughes Artificial Intelligence Center is developing knowledge-based vision techniques for obstacle detection and avoidance using the concept of a virtual sensor which blends raw sensor data with specialized processing in response to a request from the planning system [5], [30]. Work at SRI International is focused on object modeling and recognition, and on modeling uncertainty in multiple representations [1].
FMC Corporation and General Dynamics have demonstrated successful transfer of ALV technology to mission-oriented scenarios of mixed teleoperation and autonomous navigation, performed at the Martin Marietta test site in 1986. Kuan et al. [20], [21] describe FMC's research in vision-guided road-following. Other university and industrial laboratories which are engaged in vision research related to ALV include Advanced Decision Systems, Columbia University, General Electric, Honeywell Research Center, MIT, University of Massachusetts at Amherst, University of Rochester, and USC. The Proceedings of the February 1987 Image Understanding Workshop, sponsored by DARPA, contains descriptions and status reports of many of these projects.

Fig. 1. The ALV system configuration.
The vision system described in this paper (VITS) is the system meeting the perception requirements for testing and formal demonstrations of the ALV through 1986. Section II gives a system overview, briefly describing the various subsystems; it is important to understand the vision system in its context. Video-based road-following is discussed in Section III, describing sensor control, road segmentation, road boundary extraction, and geometric transformation to three-dimensional world coordinates.

II. ALV SYSTEM OVERVIEW

It is important to view Alvin's vision subsystem as an integral part of a larger system, which can affect and be affected by the performance of the system as a whole. Fig. 1 illustrates the basic system configuration of the ALV, including the interfaces to the major modules. In the paragraphs below, each of Alvin's major components is discussed in the context of its interaction with the complete system.

A. Hardware Components

The primary consideration behind selection of the hardware components was that Alvin is intended to be a testbed for research in autonomous mobility systems. Consequently, it was necessary to provide Alvin with an undercarriage and body capable of maneuvering both on-road and off-road, while carrying on board all the power, sensors, and computers needed for autonomous operation. In addition, the requirements of autonomous operation directed the selection of sensors and processing hardware.
1) Vehicle: Fig. 2 is a photograph of Alvin. The overall vehicle dimensions are 2.7 m wide by 4.2 m long; the suspension system allows the height of the vehicle to be varied, but it is nominally 3.1 m.

Alvin weighs approximately 16 000 pounds fully loaded with equipment, yet is capable of traveling both on-road and off-road. The undercarriage is an all-terrain vehicle built by Standard Manufacturing, Inc. The basic vehicle is eight-wheel drive, diesel-powered, and hydrostatically driven. Alvin is steered like a tracked vehicle by providing differential power to the two sets of wheels.
Alvin's fiberglass shell protects the interior from dust and inclement weather, and insulates the equipment inside. The shell provides space for six full-size equipment racks, as well as room for service access. The electronics within the ALV are powered by an auxiliary power unit. An environmental control unit cools the interior of the shell.

Fig. 2. Alvin.
2) Sensors: In order to function in a natural environment, an autonomous vehicle must be able to sense the terrain around it, as well as keep track of heading and distance traveled. The ALV hosts a number of sensors to accomplish these tasks.

Alvin's sense of direction and distance traveled is provided by odometers on the wheels coupled to a Bendix Land Navigation System (LNS). These sensors enable Alvin to follow a trajectory derived from visual data or read from a prestored map. The LNS provides direction as an angle from true North, while distance traveled is provided in terms of horizontal distance (Northings and Eastings) and altitude.
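The way such heading and odometry data combine into a position estimate can be sketched as a simple dead-reckoning step. The sketch below is illustrative only, not the Bendix LNS computation; the procedure name and the convention that heading is measured clockwise from true North are assumptions.

procedure UpdatePosition(var northing, easting, altitude: real;
                         headingDeg, distance, climb: real);
{ Hypothetical dead-reckoning step: advance a (Northing, Easting, altitude)
  estimate by a traveled distance along a heading measured clockwise from
  true North.  Illustrative only; not the LNS algorithm itself. }
const
  DegToRad = 3.1415926536 / 180.0;
var
  heading: real;
begin
  heading  := headingDeg * DegToRad;
  northing := northing + distance * cos(heading);
  easting  := easting + distance * sin(heading);
  altitude := altitude + climb;
end;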
Two imaging sensors are currently available on the ALV for use by VITS. The primary vision sensor is an RCA color video CCD camera, which provides 480 x 512 red, green, and blue images, with eight bits of intensity per image. The field of view (38° vertical and 50° horizontal) and focus of the camera are kept fixed. The camera is mounted on a pan/tilt unit that is under direct control of the vision subsystem.

The other vision sensor is a laser range scanner, developed by the Environmental Research Institute of Michigan (ERIM). This sensor determines range by measuring the phase shift of a reflected modulated laser beam. The laser is continuously scanned over a field of view that is 30° vertical and 80° horizontal. The output of the scanner is a digital image consisting of a 64 x 256 array of pixels with 8 bits of range resolution.
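Because the scanner encodes range as a phase shift, an 8-bit range value can be read as a fraction of the scanner's ambiguity interval, c / (2 f_mod), for a modulation frequency f_mod. The sketch below shows that conversion; the modulation frequency used is purely illustrative (the paper does not give one here), and the function name is an assumption.

function PixelToRange(pixel: byte): real;
{ Hypothetical conversion of an 8-bit phase-shift range value to meters.
  A continuous-wave scanner is ambiguous beyond c / (2 * fMod); the 256
  range levels are assumed to span that interval.  The modulation
  frequency below is illustrative only. }
const
  SpeedOfLight = 2.998e8;   { m/s }
  ModFrequency = 7.5e6;     { Hz, assumed for illustration }
begin
  PixelToRange := (pixel / 256.0) * (SpeedOfLight / (2.0 * ModFrequency));
end;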
Fig. 3. The first-generation ALV processor configuration.

3) Computer Hardware: Alvin currently uses a variety of computers, resulting from the range of processing requirements of the different software subsystems. The diverse processing requirements were met by designing a modular multiprocessor architecture. VITS is hosted on a Vicom image processor, while the other software subsystems are hosted on an Intel multiprocessor system. VITS communicates with the other subsystems across a dedicated communication channel, while the other subsystems communicate across a common bus. Fig. 3 depicts the processor configuration.

The special capabilities of the Vicom hardware were important to the development of the ALV vision subsystem (VITS). The Vicom contains video digitizers, and can perform many standard image processing operations at near video frame rate (1/30 second). For example, 3 x 3 convolution, point mapping operations (such as thresholding, or addition and subtraction of constants), and image algebra (such as addition or subtraction of two images) are all frame-rate operations. The Vicom also contains a general purpose microcomputer for additional, user-defined operations.
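As a concrete illustration of how these primitives compose, the scalar sketch below forms a weighted combination of two color bands and then thresholds it, in the spirit of the image-algebra and point-mapping operations listed above; on the Vicom each step would be a single frame-rate operation, whereas here it is spelled out per pixel. The weights and threshold are illustrative and are not the ALV's actual segmentation parameters.

const
  Rows = 480;
  Cols = 512;
type
  Image = array[1..Rows, 1..Cols] of byte;

procedure CombineAndThreshold(var red, blue, roadMask: Image;
                              wRed, wBlue: real; thresh: integer);
{ Per-pixel version of one image-algebra step plus one point mapping:
  roadMask becomes 1 where (wRed*red + wBlue*blue) exceeds thresh.
  Parameter values are illustrative, not VITS constants. }
var
  r, c, v: integer;
begin
  for r := 1 to Rows do
    for c := 1 to Cols do
    begin
      v := round(wRed * red[r, c] + wBlue * blue[r, c]);
      if v > thresh then
        roadMask[r, c] := 1
      else
        roadMask[r, c] := 0;
    end;
end;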
As stated above, Alvin is intended to be a testbed for autonomous systems. In fulfilling this charter, plans have been made to integrate a number of advanced experimental computer architectures in future generations of the ALV system. This will begin with a new architecture in early 1987.

B. Vision

The vision subsystem is composed of three basic modules: VITS, the vision executive, which handles initialization, sets up communication channels, and "oversees" the processing; VIVO, the video data processing unit; and VIRD, the range data processing unit. Range data processing has been implemented on the ALV, and results of range-based road-following and obstacle avoidance are presented in [10], [11].

The vision system software resides entirely on the Vicom image processor, which also houses a board dedicated to camera pan/tilt control and a board to enable communication with the Intel system. Nearly all of the application code is written in Pascal and uses the Vicom-supplied libraries for accessing high-speed image operations. Some low level control routines have been implemented in Motorola 68000 assembly language.
The responsibility of the vision subsystem in road-following is to process data in the form of video or range images to produce a description of the road in front of the vehicle. This description is passed to the reasoning subsystem, which uses additional data such as current position, speed, and heading to generate a trajectory for Alvin to follow. Communication between the vision subsystem and Reasoning takes place in three different forms: the scene model, the position update, and visual cues. A special communication control processor, part of the utilities subsystem, mediates communication between VITS and the other subsystems. The control processor shares memory with VITS, and handles communication by examining the content of key memory locations every 100 ms and modifying them as appropriate.
1) Scene Model: The scene model, a description of the observed road, is the output of the vision subsystem after each frame of images is processed. The scene model contains a record of Alvin's position and heading at the time of image acquisition, a description of the road found in the imagery, consisting of lists of vehicle-centered 3-D points denoting left and right road edges, and an optional list of points surrounding an obstacle. The reasoning subsystem must then transform the road description into a fixed, world coordinate system for navigation. VITS may optionally specify the scene model in world coordinates; this is more efficient when data acquired from multiple sensors or at different times is used to create the scene model.

Since the time needed to compute a scene model is nondeterministic, VITS sets a "scene model ready" flag indicating that a new scene model is ready to be processed. The communication controller examines this flag, and, when set, transfers the scene model to the reasoning subsystem and clears the flag.
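A minimal sketch of this handshake, assuming the scene_model record of Fig. 4, is given below; the polling routine stands for the check the communication controller makes on each 100 ms cycle, and the transfer procedure is only a stub for the dedicated channel to the Intel system.

var
  sceneModelReady: boolean;     { a key shared-memory location }
  currentModel: scene_model;    { record format of Fig. 4 }

procedure TransferToReasoning(var m: scene_model);
begin
  { stub: copy the scene model across the dedicated channel }
end;

procedure PollSceneModelFlag;   { executed every 100 ms }
begin
  if sceneModelReady then
  begin
    TransferToReasoning(currentModel);
    sceneModelReady := false;   { clear the flag after the transfer }
  end;
end;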
Fig. 4 illustrates the format of a scene model. Fig. 5 is an example of a hypothetical road scene and the corresponding scene model.
2) Position Update: VITS must know the position and heading of the vehicle at the time of image acquisition to integrate sensor information acquired at different times, and to transform vehicle-centered data into world coordinates. In addition, VITS must be able to predict the location of the road in an image, given its location in the preceding image (see Section III-B-1-d).

Communication of vehicle motion and position information is effected by means of a position update message passed from Reasoning to the vision subsystem. The position update specifies the current vehicle speed, position, and heading. Synchronization of position update and image acquisition is mediated by a position update request. At the time VITS digitizes an image, the "position update request" flag is set. When the communication controller finds the flag set, it sends a message to Reasoning, which immediately (within 100 ms) generates the required information, builds a position update message, and sends it to VITS.
3) Visual Cues: The reasoning subsystem interfaces to a knowledge base which contains information about the test area. Some of this information can be used by VITS to specify behavior (find road, locate obstacles, pause, resume) or to optimize processing, much as the information on a road map can guide a driver. When Reasoning determines that a visually identifiable feature should be within the field of view, a visual cue is sent to VITS enabling vision processing to be modified. In the future, when Alvin's domain becomes more complex, these cues will be used to guide the transition from one road surface to another, from on-road to off-road and vice versa, or to guide the search for a landmark. In the current version of the system, stored knowledge about the shape of the road shoulder has been used to guide a transition between range-based and video-based road-following. Apart from this, the cue facility has been used to date only to notify VITS that the vehicle is approaching a curve (which causes the camera panning mechanism to be enabled) and to send pause and resume commands to VITS.

type
  scene_model = record
    time: array[1..4] of word;            { time stamp }
    count: word;                          { # of road edge records }
    x, y, psi: real;                      { vehicle position }
    SM_rec: array[1..10] of record
      tag: string[2];                     { left or right }
      numpts: word;                       { # of points }
      pts: array[1..10] of array[1..3] of real;
    end;
    version: string[10];                  { current SW version }
    num: word;                            { scene model # }
  end;

Fig. 4. The scene model format.

Fig. 5. Road scene and corresponding scene model.

C. Reasoning

The Reasoning subsystem is the executive controller of the ALV; Vision is a resource of Reasoning. At the highest level, Reasoning is responsible for receiving a plan script from a human test conductor and coordinating the other subsystems on Alvin in order to accomplish the goals specified in the script.
Because the processing involved in creating a visual description of the environment is beyond the real-time capability of present computers, the scene model is not used directly in the vehicle's control servo loop. Instead, the Navigator (part of the reasoning subsystem) pieces together scene models from the vision system and builds a reference trajectory that is sent to the Pilot for control. The reasoning subsystem accepts a position update request from VITS, generates the appropriate data, and sends back a position update. Upon receipt of a scene model, Reasoning evaluates it and plots a smooth trajectory if the data is acceptable. The new trajectory is computed to smoothly fit the previous trajectory.
Evaluation of scene models is a powerful capability of the reasoning subsystem. Small environmental changes, such as dirt on the road, or the sudden appearance of a cloud, can significantly affect the output of the vision subsystem. Reasoning uses assumptions about the smoothness and continuity of roads to verify data from VITS. Every scene model is evaluated based on the smoothness of the road edges, and on how well they agree with previous edges. A scene model evaluated as "bad" is discarded.
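One plausible form of such a smoothness test, sketched below rather than Reasoning's actual criterion, is to reject a road edge whose consecutive segments bend by more than a fixed angle; the threshold is expressed as a minimum cosine to avoid an arc-cosine call.

type
  EdgePoint = record
    x, y: real;                 { road edge point in a common frame }
  end;
  Edge = array[1..10] of EdgePoint;

function EdgeIsSmooth(var e: Edge; n: integer; minCosBend: real): boolean;
{ Hypothetical smoothness check: the edge is accepted only if the cosine of
  the angle between successive segments stays above minCosBend
  (e.g., 0.87 for roughly 30 degrees). }
var
  i: integer;
  ux, uy, vx, vy, du, dv: real;
begin
  EdgeIsSmooth := true;
  for i := 1 to n - 2 do
  begin
    ux := e[i + 1].x - e[i].x;
    uy := e[i + 1].y - e[i].y;
    vx := e[i + 2].x - e[i + 1].x;
    vy := e[i + 2].y - e[i + 1].y;
    du := sqrt(ux * ux + uy * uy);
    dv := sqrt(vx * vx + vy * vy);
    if (du > 0.0) and (dv > 0.0) then
      if (ux * vx + uy * vy) / (du * dv) < minCosBend then
        EdgeIsSmooth := false;
  end;
end;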
Reasoning creates a new trajectory by minimizing a cost function based on current heading, curvature of the scene model, attraction to a goal, and road edge repulsion. The final trajectory is a sequence of points that lie near the center of the road. Each point is tagged with a reference speed. The reference speeds are computed so that, if no new scene models are received, the vehicle will stop at the end of the trajectory. The trajectory is then sent to the Pilot.
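One simple assignment consistent with that stopping requirement is sketched below, under the assumption of a constant allowable deceleration: each point's speed is capped by what that deceleration permits over the distance remaining to the end of the trajectory. The formula actually used by Reasoning is not given here, and the names and array bounds are assumptions.

type
  TrajPoint = record
    x, y: real;                 { trajectory point near the road center }
    speed: real;                { reference speed assigned to the point }
  end;
  Trajectory = array[1..20] of TrajPoint;

procedure AssignReferenceSpeeds(var t: Trajectory; n: integer;
                                cruise, decel: real);
{ Hypothetical scheme: the last point gets speed zero, and every earlier
  point is limited to sqrt(2 * decel * remaining distance), so the vehicle
  can stop by the end of the trajectory if no new scene model arrives. }
var
  i: integer;
  remaining, dx, dy, vmax: real;
begin
  t[n].speed := 0.0;
  remaining := 0.0;
  for i := n - 1 downto 1 do
  begin
    dx := t[i + 1].x - t[i].x;
    dy := t[i + 1].y - t[i].y;
    remaining := remaining + sqrt(dx * dx + dy * dy);
    vmax := sqrt(2.0 * decel * remaining);
    if vmax < cruise then
      t[i].speed := vmax
    else
      t[i].speed := cruise;
  end;
end;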
The reasoning subsystem also interacts with the knowledge base to locate features significant for vision processing. As each new trajectory is generated, the knowledge base is searched to determine if any features are within the field of view of the vehicle. Features that are both within a maximum distance and a maximum angle from the current heading are incorporated into a Visual Cue which is passed to VITS.

D. Knowledge Base

The knowledge base consists of a priori map data, and a set of routines for accessing the data. Currently, the map data contains information describing the road network being used as the ALV test track. The map data contains coordinates which specify the location of the roadway, as well as various significant features along the road, such as intersections, sharp curves, and several local road features.

At present, the vision subsystem communicates with the knowledge base through Reasoning.

E. Pilot

The Pilot performs the actual driving of the vehicle. Given a trajectory from Reasoning, the Pilot computes the error values of lateral position, heading, and speed by comparing LNS data with the target values specified in the trajectory. The Pilot uses a table of experimentally obtained control gains to determine commands needed to drive the errors toward zero; these commands are output to the vehicle controllers.
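The structure of that servo step can be sketched as a proportional correction, with the caveat that the gain values, the command form, and the routine name are assumptions rather than the ALV's tabulated gains.

procedure PilotStep(lateralErr, headingErr, speedErr: real;
                    kLat, kHead, kSpeed: real;
                    var steerCmd, throttleCmd: real);
{ Hypothetical proportional servo step: errors between LNS data and the
  trajectory targets are scaled by control gains to form commands that
  drive the errors toward zero.  Signs assume a positive steer command
  turns the vehicle so as to reduce a positive lateral or heading error. }
begin
  steerCmd := -(kLat * lateralErr + kHead * headingErr);
  throttleCmd := -(kSpeed * speedErr);
end;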
The vision subsystem has no direct communication with the Pilot.

III. VIDEO-BASED ROAD-FOLLOWING

The task of the vision system in a road-following scenario is to provide a description of the road for navigation. Roads may be described in a variety of ways, e.g., by sets of road edges, a centerline with associated road width, or planar patches. We have chosen to represent a road by its edges, or more precisely, points in three-space that, when connected, form a polygonal approximation of the road edge. Road edges are intuitively the most natural representation, since they are usually obvious (to humans, at least) in road images. Often, however, the dominant linear features in road images are the shoulder/vegetation boundaries rather than the road/shoulder boundaries. The difficulties in extracting the real road boundary from the image led us to adopt a segmentation algorithm to first extract the road in the image, track the road/nonroad boundary, and then calculate three-dimensional road edge points.
The current video data processing unit (VIVO) uses a clustering algorithm to segment the image into road and nonroad regions. A detailed description of image segmentation by clustering can be found in [3]. After producing a binary road image, the road boundaries are traced and select image points are transformed into three-dimensional road boundary points. The complete cycle time, from digitization to producing a symbolic description of the road, is currently just over 2 seconds. The algorithm is summarized in the following steps, which are discussed in detail in the following sections: 1) digitize the video images; 2) segment road/nonroad regions; 3) extract road boundaries by tracing the binary road edges; and 4) transform 2-D road edge points to 3-D coordinates and build the scene model. Fig. 6 depicts the flow of control in a complete scene model cycle.
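The data handed from stage to stage in that cycle can be summarized by the skeleton below; the procedure bodies are stubs, the image dimensions follow the sensor description of Section II, and the names and point-list sizes are assumptions rather than VIVO's.

const
  Rows = 480;
  Cols = 512;
  MaxPts = 100;
type
  Image = array[1..Rows, 1..Cols] of byte;
  PixelPoint = record r, c: integer; end;
  WorldPoint = record x, y, z: real; end;
  Boundary2D = record n: integer; p: array[1..MaxPts] of PixelPoint; end;
  Boundary3D = record n: integer; p: array[1..MaxPts] of WorldPoint; end;

procedure DigitizeImages(var red, green, blue: Image);
begin
  { 1) digitize the red, green, and blue video images }
end;

procedure SegmentRoad(var red, green, blue: Image; var roadMask: Image);
begin
  { 2) label each pixel road or nonroad, producing a binary road image }
end;

procedure TraceBoundaries(var roadMask: Image;
                          var left, right: Boundary2D);
begin
  { 3) trace the binary road edges in the image plane }
end;

procedure BuildSceneModel(var left, right: Boundary2D;
                          var leftEdge, rightEdge: Boundary3D);
begin
  { 4) transform selected 2-D edge points to vehicle-centered 3-D points }
end;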
A. Sensor Control and Image Acquisition

1) Camera Panning: The position of the road with respect to the vehicle may change due to a curving road, vehicle oscillation, or a sudden path correction. Consequently, the position of the road within the field of view of a fixed camera may change. Because the video segmentation algorithm requires sampling a population of road pixels, two methods were developed to maintain knowledge of the road position from frame to frame: camera panning and power windowing. Power windowing, a "software panning" technique, is described in Section III-B-1-d.
Control of the pan/tilt mechanism is a function of vehicle orientation and desired viewing direction. During road-following, we would like the camera to point "down the road," regardless of the vehicle orientation, keeping the road approximately centered in the image. This requires the vision system to know global position information and relate the vehicle-centered road description to present vehicle location and orientation, and then to calculate and command the desired pan angle. If only one road boundary is detected, then VITS will attempt to pan the camera to the right or left to bring both road edges into view in the next image. The activation of panning is also controlled by cues from the reasoning subsystem that indicate when panning would be useful (e.g., going around a sharp corner), and when it would not be helpful (e.g., passing a parking lot).
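A minimal version of that pan-angle calculation is sketched below: aim the camera at a look-ahead point on the road, expressed in a vehicle frame with x forward and y to the left, and clamp the result to the pan unit's travel. The frame convention, the clamping, and the function name are assumptions, and arctan suffices only because the look-ahead point is assumed to lie ahead of the vehicle.

function DesiredPanAngle(xAhead, yAhead, maxPan: real): real;
{ Hypothetical pan command: the angle (in radians) from the vehicle's
  heading to a look-ahead point (xAhead, yAhead), with x forward and y to
  the left, limited to +/- maxPan.  Assumes xAhead > 0. }
var
  a: real;
begin
  a := arctan(yAhead / xAhead);
  if a > maxPan then
    a := maxPan;
  if a < -maxPan then
    a := -maxPan;
  DesiredPanAngle := a;
end;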
In the initial implementation of camera panning, the camera was allowed to assume only three positions, left, mid, and right, with simple rules for switching from one to another based on road location in
