`
`•
`
`i .
`
`I
`
`r: l
`
`J
`
`·-.
`AD-A249 972
`lllllllllllllllllllllllllllllllllllll //IIIII!
`
`Neural Network Perception for Mobile Robot
`Guidance
`
`Dean A. Pomerleau
`February 16, 1992
`CMU-CS-92-115
`
`School of Computer Science
`Carnegie Mellon University
`Pittsburgh, PA 15213
`
Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy.
`
`© 1992 by Dean A. Pomerleau
`
Support for this work has come from DARPA, under contracts DACA76-85-C-0019,
DACA76-85-C-0003, DACA76-85-C-0002, DACA76-89-C-0014 and DAAE07-90-C-R059.
These contracts were monitored by the Topographic Engineering Center and by TACOM. This re-
search was also funded in part by grants from the Fujitsu Corporation and the Shimizu Corporation.
`
`92-12205
`
`111111111111111111111111111 111111111111111111
`
`V1 d-.
`
`I V
`
`·' •
`
`AVS EXHIBIT 2004
`Toyota Inc. v. American Vehicular Sciences LLC
`IPR2013-00419
`
`
`
`
`
Carnegie Mellon
`
`School of Computer Science
`
`DOCTORAL THESIS
`in the field of
`Computer Science
`
`Neural Network Perception for Mobile Robot Guidance
`
`DEAN POMERLEAU
`
`Submitted in Partial Fulfillment of the Requirements
`for the Degree of Doctor of Philosophy
`
ACCEPTED:

ADVISOR          DATE

DEAN             DATE

APPROVED:

PROVOST          DATE
`
`
`
`Dedicated to Terry, Glen and Phyllis
`
`
`
`Abstract
`
`Vision based mobile robot guidance has proven difficult for classical machine
`vision methods because of the diversity and real time constraints inherent in the
`task. This thesis describes a connectionist system called ALVINN (Autonomous
`Land Vehicle In a Neural Network) that overcomes these difficulties. ALVINN
`learns to guide mobile robots using the back-propagation training algorithm. Be-
`cause of its ability to learn from example, ALVINN can adapt to new situations
`and therefore cope with the diversity of the autonomous navigation task.
But real world problems like vision based mobile robot guidance present a
different set of challenges for the connectionist paradigm. Among them are:
`
* How to develop a general representation from a limited amount of real
training data,

* How to understand the internal representations developed by artificial neural
networks,

* How to estimate the reliability of individual networks,

* How to combine multiple networks trained for different situations into a
single system,

* How to combine connectionist perception with symbolic reasoning.
`
`This thesis presents novel solutions to each of these problems. Using these
`techniques, the ALVINN system can learn to control an autonomous van in under 5
`minutes by watching a person drive. Once trained, individual ALVINN networks
`can drive in a variety of circumstances, including single-lane paved and unpaved
`roads, and multi-lane lined and unlined roads, at speeds of up to 55 miles per hour.
The techniques are also shown to generalize to the task of controlling the precise
`foot placement of a walking robot.
`
`
`
`Acknowledgements
`I wish to thank my advisor, Dr. David Touretzky for his support and technical
`advice during my graduate studies. Dave has not only provided invaluable feed-
`back, he has also given me the latitude I've needed to develop as a researcher. I am
`also grateful to Dr. Charles Thorpe for the opportunities, resources and expertise
`he has provided me. Without Chuck's support, this research would not have been
`possible. I also wish to thank the other members of my committee, Dr. Takeo
`Kanade and Dr. Terrence Sejnowski, for their insightful analysis and valuable
`comments concerning my work.
`
I owe much to all the members of the ALV/UGV project. Their technical
support and companionship throughout the development of the ALVINN system
has made my work both possible and enjoyable. I would like to specifically
thank Jay Gowdy and Omead Amidi, whose support software underlies much of
the ALVINN system. James Frazier also deserves thanks for his patience during
many hours of test runs on the Navlab.
`
`Interaction with members of the Boltzmann group at Carnegie Mellon has
`also been indispensable. From them I have not only learned about all aspects of
`connectionism, but also how to communicate my thoughts and ideas. In particular,
`the insights and feedback provided by discussions with John Hampshire form the
`basis for much of this thesis. I am grateful to Dave Plaut, whose helpful suggestions
`in the early stages of this work put me on the right track.
`Other people who have contributed to the success of this thesis are the members
of the SM² group. In particular, Ben Brown and Hiroshi Ueno have given me the
`opportunity, incentive and support I have needed to explore an alternative domain
`for connectionist mobile robot guidance.
`I am also in debt to my office mates, Spiro Michaylov and Nevin Heintze.
`They have helped me throughout our time as graduate students, with everything
from explaining LaTeX peculiarities to feeding my fish. I would like to thank my
`parents, Glen and Phyllis, for encouraging my pursuit of higher education, and
`for all the sacrifices they've made to provide me with a world of opportunities.
Finally, I am especially grateful to my fiancée, Terry Jessie, for her constant love,
`support and patience during the difficult months spent preparing this dissertation.
`Her presence in my life has helped me keep everything in perspective.
`
`Dean A. Pomerleau
`
`February 16, 1992
`
`
`
Contents

1 Introduction
  1.1 Problem Description
  1.2 Robot Testbed Description
  1.3 Dissertation Overview

2 Network Architecture
  2.1 Architecture Overview
  2.2 Input Representations
      2.2.1 Preprocessing Practice
      2.2.2 Justification of Preprocessing
  2.3 Output Representation
      2.3.1 1-of-N Output Representation
      2.3.2 Single Graded Unit Output Representation
      2.3.3 Gaussian Output Representation
      2.3.4 Comparing Output Representations
  2.4 Internal Network Structures

3 Training Networks "On-The-Fly"
  3.1 Training with Simulated Data
  3.2 Training "on-the-fly" with Real Data
      3.2.1 Potential Problems
      3.2.2 Solution - Transform the Sensor Image
      3.2.3 Transforming the Steering Direction
      3.2.4 Adding Diversity Through Buffering
  3.3 Performance Improvement Using Transformations
  3.4 Discussion

4 Training Networks With Structured Noise
  4.1 Transitory Feature Problem
  4.2 Training with Gaussian Noise
  4.3 Characteristics of Structured Noise
  4.4 Training with Structured Noise
  4.5 Improvement from Structured Noise Training
  4.6 Discussion

5 Driving Results and Performance
  5.1 Situations Encountered
      5.1.1 Single Lane Paved Road Driving
      5.1.2 Single Lane Dirt Road Driving
      5.1.3 Two-Lane Neighborhood Street Driving
      5.1.4 Railroad Track Following
      5.1.5 Driving in Reverse
      5.1.6 Multi-lane Highway Driving
  5.2 Driving with Alternative Sensors
      5.2.1 Night Driving Using Laser Reflectance Images
      5.2.2 Training with a Laser Range Sensor
      5.2.3 Contour Following Using Laser Range Images
      5.2.4 Obstacle Avoidance Using Laser Range Images
  5.3 Quantitative Performance Analysis
  5.4 Discussion

6 Analysis of Network Representations
  6.1 Weight Diagram Interpretation
  6.2 Sensitivity Analysis
      6.2.1 Single Unit Sensitivity Analysis
      6.2.2 Whole Network Sensitivity Analysis
  6.3 Discussion

7 Rule-Based Multi-network Arbitration
  7.1 Symbolic Knowledge and Reasoning
  7.2 Rule-based Driving Module Integration
  7.3 Analysis and Discussion

8 Output Appearance Reliability Estimation
  8.1 Review of Previous Arbitration Techniques
  8.2 OARE Details
  8.3 Results Using OARE
      8.3.1 When and Why OARE Works
  8.4 Shortcomings of OARE

9 Input Reconstruction Reliability Estimation
  9.1 The IRRE Idea
  9.2 Network Inversion
  9.3 Backdriving the Hidden Units
  9.4 Autoencoding the Input
  9.5 Discussion

10 Other Applications: The SM²
  10.1 The Task
  10.2 Network Architecture
  10.3 Network Training and Performance
  10.4 Discussion

11 Other Vision-based Robot Guidance Methods
  11.1 Non-learning Autonomous Driving Systems
      11.1.1 Examples
      11.1.2 Comparison with ALVINN
  11.2 Other Connectionist Navigation Systems
  11.3 Other Potential Connectionist Methods
  11.4 Other Machine Learning Techniques
  11.5 Discussion

12 Conclusion
  12.1 Contributions
  12.2 Future Work
`
`
`Chapter 1
`
`Introduction
`
`A truly autonomous robot must sense its environment and react appropriately.
`Previous mobile robot perception systems have relied on hand-coded algorithms
`for processing sensor information. In this dissertation I develop techniques which
`enable artificial neural networks (ANNs) to learn the visual processing required
`for mobile robot guidance. The power and flexibility of these techniques are
`demonstrated in two domains, wheeled vehicle navigation, and legged robot foot
`positioning.
`The central claims of this dissertation are:
`
* By appropriately constraining the problem, the network architecture and the
training algorithm, ANNs can quickly learn to perform many of the complex
perception tasks required for mobile robot navigation.

* A neural network-based mobile robot perception system is able to robustly
handle a wider variety of situations than hand-programmed systems because
of the ability of ANNs to adapt to new sensors and situations.

* Artificial neural networks are not just black boxes. Their internal represen-
tations can be analyzed and understood.

* The reliability of ANNs can be estimated with a relatively high precision.
These reliability estimates can be employed to arbitrate between multiple
expert networks, and hence facilitate the modular construction of connec-
tionist systems.
`
`
`
`2
`
`CHAPTER 1. INTRODUCTION
`
`Sensors
`
`Low-Level
`] lPmept!!l Perceptual Motor
`-Processing -E=m, Controller
`Input
`
`Control
`siwm
`
`-[Actuators
`
`Feedback
`
`Figure 1.1: Block diagram of sensor based mobile robot guidance
`
* By combining neural network-based perception with symbolic reasoning,
an autonomous navigation system can achieve accurate low-level control
and exhibit intelligent high-level behavior.
`
`1.1 Problem Description
`
`To function effectively, an autonomous mobile robot must first be capable of pur-
poseful movement in its environment. This dissertation focuses on the problem of
`how to employ neural network based perception to guide the movement of such
`a robot. To navigate effectively in a complex environment, a mobile robot must
`be equipped with sensors for gathering information about its surroundings. In this
`work, I have chosen imaging sensors as the primary source of information be-
`cause of their ability to quickly provide dense representations of the environment.
`The imaging sensors actually employed include color and black-and-white video
cameras, a scanning laser rangefinder and a scanning laser reflectance sensor.¹
`The imaging sensors provide input to the component of an autonomous mobile
`robot which will be the primary focus of this dissertation, the perceptual process-
`ing module (See Figure 1.1). The job of the perceptual processing module is to
`transform the information about the environment provided by one or more imag-
`ing sensors into an appropriate high level motor command. The motor command
`appropriate for a given situation depends both on the current state of the world as
`reported by the sensors, and the perception module's knowledge of appropriate re-
`sponses for particular situations. The motor responses produced by the perception
`module take the form of elementary movement directives, such as "drive the robot
`
¹Work is also underway in using the same techniques to interpret the output from a sonar array
and an infrared camera.
`
`
`
`
`along an arc with a 30m radius" or "move the robot's foot 2.5cm to the right".
`The elementary movement directives are carried out by a controller which
`manipulates the robot's actuators. The determination of the correct motor torques
`required to smoothly and accurately perform the elementary movement directives
`is not addressed in this dissertation. While it is possible to use connectionist tech-
niques for low level control [Jordan & Jacobs, 1990, Katayama & Kawato, 1991],
this aspect of the problem is implemented using classical PID control in each of
the systems described in this work.
`The two mobile robot domains used to develop and demonstrate the techniques
`of this thesis are autonomous outdoor driving and precise foot positioning for a
`robot designed to walk on the exterior of a space station. Because it embodies
`most of the difficulties inherent in any mobile robot task, autonomous outdoor
`driving is the primary focus of this work.
`In autonomous outdoor driving, the goal is to safely guide the robot through
`the environment. Most commonly, the environment will consist of a network of
`roads with varying characteristics and markings. In this situation, the goal is to
`keep the robot on the road, and in the correct lane when appropriate. There is
`frequently the added constraint that a particular route should be followed through
`the environment, requiring the system to make decisions such as which way to
`turn at intersections. An additional desired behavior for the autonomous robot is
`to avoid obstacles, such as other cars, when they appear in its path.
The difficulty of outdoor autonomous driving stems from four factors. They
are:

* Task variations due to changing road type

* Appearance variations due to lighting and weather conditions

* Real time processing constraints

* High level reasoning requirements
`
`A general autonomous driving system must be capable of navigating in a
`wide variety of situations. Consider some of the many driving scenarios people
`encounter every day: There are multi-lane roads with a variety of lane markers.
`There are two-lane roads without lane markers. There are situations, such as
`city or parking lot driving, where the primary guidance comes not from the r-ad
`delineations, but from the need to avoid other cars and pedestrians.
`
`
`
`4
`
`CHAPTER 1. INTRODUCTION
`
`The second factor making autonomous driving difficult is the variation in
`appearance that results from environmental factors. Lighting changes, and deep
`shadows make it difficult for perception systems to consistently pick out important
`features during daytime driving. The low light conditions encountered at night
`make it almost impossible for a video-based system to drive reliably. In addition,
`missing or obscured lane markers make driving difficult for an autonomous system
even under favorable lighting conditions.
`Given enough time, a sophisticated image processing system might be able to
`overcome these difficulties. However the third challenging aspect of autonomous
`driving is that there is a limited amount of time available for processing sensor
`information. To blend in with traffic, an autonomous system must drive at a
`relatively high speed. To drive quickly, the system must react quickly. For
`example, at 50 miles per hour a vehicle is traveling nearly 75 feet per second. A
`lot can happen in 75 feet, including straying a significant distance from the road,
`if the system isn't reacting quickly or accurately enough.
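The reaction-distance arithmetic above is easy to verify. A small sketch (the one-second latency used below is a hypothetical illustration, not a measured processing time):

```python
# Check the reaction-distance arithmetic: convert miles per hour to feet
# per second, then see how far the vehicle travels during a given
# perception latency.
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600

def feet_per_second(mph: float) -> float:
    """Convert a speed in miles per hour to feet per second."""
    return mph * FEET_PER_MILE / SECONDS_PER_HOUR

speed = feet_per_second(50)
print(round(speed, 1))        # 73.3 ft/s, i.e. "nearly 75 feet per second"

# Distance covered during a hypothetical 1-second perception cycle.
latency_s = 1.0
print(round(speed * latency_s, 1))
```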
`Finally, an autonomous driving system not only must perform sophisticated
`perceptual processing, it also must make high level symbolic decisions such as
`which way to turn at intersections. This dissertation shows that the first three
`factors making mobile robot guidance difficult can be overcome using artificial
`neural networks for perception, and the fourth can be handled by combining
`artificial neural networks with symbolic processing.
`
`1.2 Robot Testbed Description
`
`The primary testbed for demonstrating the applicability of the ideas developed
`in this dissertation to autonomous outdoor driving is the CMU Navlab, shown
`in Figure 1.2. The Navlab is a modified Chevy van equipped with forward and
`backward facing color cameras and a scanning laser rangefinder for sensing the
`environment. These are the primary sensory inputs the system receives. The
`Navlab also contains an inertial navigation system (INS) which can maintain the
`vehicle's position relative to its starting location. The Navlab is outfitted with
`three Sun Sparcstations, which are used for perceptual processing and other high
`level computation. The Navlab also has a 68020-based processor for controlling
`the steering wheel and accelerator and for monitoring the vehicle's status.
`The Navlab can be controlled by computer or driven by a person just like a
`normal car. This human controllability is useful for getting the Navlab to a test site,
`
`
`
`
`Figure 1.2: The CMU Navlab Autonomous Navigation Testbed
`
`and as will be seen in Chapter 3, for teaching an artificial neural network to drive by
`example. I have used two other robots to demonstrate the power of connectionist
`robot guidance, a ruggedized version of the Navlab called Navlab II, and a walking
robot called the Self Mobile Space Manipulator (SM²). These additional testbeds
will be described in more detail in Chapters 5 and 10, respectively.
`
`1.3 Dissertation Overview
`
`The goal of this thesis is to develop techniques that enable artificial neural networks
to guide mobile robots using visual input. In Chapter 2, I present the simple neural
`network architecture that serves as the basis for the connectionist mobile robot
`guidance system I develop called ALVINN (Autonomous Land Vehicle In a Neural
`Network). The architecture consists of a single hidden layer, feedforward network
`(see Figure 1.3). The input layer is a two dimensional retina which receives input
`from an imaging sensor such as a video camera or scanning laser rangefinder. The
`output layer is a vector of units representing different steering responses, ranging
`from a sharp left to a sharp right turn. The network receives as input an image of
`the road ahead, and produces as output the steering command that will keep the
`vehicle on the road.
`
`
`
`6
`
`CHAPTER 1. INTRODUCTION
`
[Figure: feedforward network with a 30x32 sensor input retina, a hidden layer, and 30 output units spanning sharp left through straight ahead to sharp right.]
`Figure 1.3: ALVINN driving network architecture
`
`Although some aspects of ALVINN's architecture and I/O representation are
`unique, the network structure is not the primary reason for ALVINN's success.
`Instead, much of its success can be attributed to the training methods presented
`in Chapters 3 and 4. Using the training "on-the-fly" techniques described in
`Chapter 3, individual three-layered ALVINN networks can quickly learn to drive
`by watching a person steer. These methods allow ALVINN to learn about new
`situations first hand, as a person drives the vehicle. The ability to augment the
`limited amount of live training data available from the sensors with artificial images
`depicting rare situations is shown to be crucial for reliable network performance
`in both Chapters 3 and 4.
`Using the architecture described in Chapter 2, and the training techniques from
`Chapters 3 and 4, ALVINN is able to drive in a wide variety of situations, described
`in Chapter 5. As a preview, some of ALVINN's capabilities include driving on
`single-lane paved and unpaved roads, and multi-lane lined and unlined roads, at
`speeds of up to 55 miles per hour.
But developing networks that can drive is not enough. It is also important to
understand how the networks perform their functions. In order to quantitatively
understand the internal representation developed by individual driving networks, I
`
`
`
`
`develop a technique called sensitivity analysis in Chapter 6. Sensitivity analysis
`is a graphical technique which provides insight into the processing performed by
`a network's individual hidden units, and into the cooperation between multiple
`hidden units to carry out a task. The analysis techniques in Chapter 6 illustrate
`that ALVINN's internal representation varies widely depending on the situations
`for which it is trained. In short, ALVINN develops filters for detecting image
`features that correlate with the correct steering direction.
`A typical filter developed by ALVINN is shown in Figure 1.4.
`It depicts
`the connections projecting to and from a single hidden unit in a network trained
`on video images of a single-lane, fixed width road. This hidden unit receives
`excitatory connections (shown as white spots) from a road shaped region on the
`left of the input retina (see the schematic). It makes excitatory connections to the
`output units representing a sharp left turn. This hidden unit is stimulated when a
`road appears on the left, and suggests a left turn in order to steer the vehicle back
`to the road center. Road-shaped region detectors such as this are the most common
`type of feature filters developed by networks trained on single-lane unlined roads.
`In contrast, when trained for highway driving ALVINN develops feature detectors
`that determine the position of the lane markers painted on the road.
`This situation specificity allows individual networks to learn quickly and drive
`reliably in limited domains. However it also severely limits the generality of
`individual driving networks. Chapters 7, 8 and 9 focus on techniques for combining
`multiple simple driving networks into a single system capable of driving in a wide
`variety of situations. Chapter 7 describes rule-based techniques for integrating
`multiple networks and a symbolic mapping system. The idea is to use a map of
`the environment to determine which situation-specific network is appropriate for
`the current circumstances. The symbolic mapping module is also able to provide
`ALVINN with something the networks lack, namely the ability to make high level
decisions such as which way to turn at intersections.
`However rule-based arbitration is shown to have significant shortcomings.
`Foremost among them is that it requires detailed symbolic knowledge of the
`environment, which is often difficult to obtain. In Chapters 8 and 9, I develop
`connectionist multi-network arbitration techniques to complement the rule-based
`methods of Chapter 7. These techniques allow individual networks to estimate
`their own reliability in the current situation. These reliability estimates can be
`used to weight the responses from multiple networks and to determine when a new
`network needs to be trained.
`Chapter 10 illustrates the flexibility of connectionist mobile robot guidance
`
`
`
`8
`
`CHAPTER 1. INTRODUCTION
`
[Figure: weight diagram showing a hidden unit's weights to the output units and from the input retina, with a schematic legend distinguishing Road from Non-Road regions.]

Figure 1.4: Diagram of weights projecting to and from a typical hidden unit in a
network trained on roads with a fixed width. This hidden unit acts as a filter for a
road on the left side of the visual field as illustrated in the schematic.
`
`
`
`
`by demonstrating its use in a very different domain, the control of a two-legged
`walking robot designed to inspect the space station exterior. The crucial task in this
`domain is to precisely position the foot of the robot in order to anchor it without
`damaging either the space station or the robot. The same methods developed to
`steer an autonomous vehicle are employed to safely guide the foot placement of
`this walking robot.
`In Chapter 11, the neural network approach to autonomous robot guidance
`is compared with other techniques, including hand-programmed algorithms and
`other machine learning methods. Because of its ability to adapt to new situations,
`ALVINN is shown to be more flexible than previous hand-programmed systems for
`mobile robot guidance. The connectionist approach employed in ALVINN is also
`demonstrated to have distinct advantages over other machine learning techniques
`such as nearest neighbor matching, decision trees and genetic algorithms.
`Finally, Chapter 12 summarizes the results and discusses the contributions of
`this dissertation. It concludes by presenting areas for future work.
`
`
`
`Chapter 2
`
`Network Architecture
`
`The first steps in applying artificial neural networks to a problem involve choosing
`a training algorithm and a network architecture. The two decisions are intimately
`related, since certain training algorithms require, or are best suited to, specific net-
work architectures. For this work, I chose a multi-layered perceptron (MLP) and
the back-propagation training algorithm [Rumelhart, Hinton & Williams, 1986]
`for the following reasons:
`
`* The task requires supervised learning from examples (i.e. given sensor
`input, the network should respond with a specific motor response). This
`rules out unsupervised/competitive learning algorithms like Kohonen's self-
`organizing feature maps [Kohonen, 1990] which learn to classify inputs on
`the basis of statistically significant features, but not to produce particular
`desired responses.
`
`* The system should learn relatively quickly, since one of the goals is to rapidly
adapt to new driving situations. This rules out certain supervised training
algorithms/architectures such as Boltzmann Machines [Hopfield, 1982],
`which are notoriously slow at learning.
`
`* The task of determining the correct motor response from sensor input was
`not expected to require substantial, run-time knowledge about recent inputs.
`Thus, it was decided that the extensions of the back-propagation algorithm
to fully recurrent networks [Pineda, 1987, Pearlmutter, 1988] were not
necessary.
`
`10
`
`
`
`
`The decision to use artificial neural networks in the first place, and to use back-
`propagation over other closely related neural network training algorithms like
`quickprop [Fahlman, 1988] and radial basis functions [Poggio & Girosi, 1990]
`can be better understood after presentation of the architecture and training scheme
`actually employed in this work, and hence will be discussed in Chapter 11.
`Once the decision is made to use a feedforward multi-layered perceptron
`as the underlying network architecture, the question then becomes "what form
`should the MLP take?" This question can be divided into three components:
the input representation, the output representation, and the network's internal
structure. I will discuss each of the three components separately, theoretically
and/or empirically justifying the choices made.
`
`2.1 Architecture Overview
`
`The architecture of the perception networks chosen for mobile robot guidance
`consists of a multi-layer perceptron with a single hidden layer (See Figure 2.1).
The input layer of the network consists of a 30x32 unit "retina" which receives
`images from a sensor. Each of the 960 units in the input retina is fully connected
`to the hidden layer of 4 units, which in turn is fully connected to the output layer.
`The output layer represents the motor response the network deems appropriate for
`the current situation. In the case of a network trained for autonomous driving, the
`output layer consists of 30 units and is a linear representation of the direction the
`vehicle should steer in the current situation. The middle output unit represents
`the "travel straight ahead" condition, while units to the left and right of center
`represent successively sharper left and right turns.
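One simple way to turn such a 30-unit vector into a scalar steering command is to locate the most active unit and map its index linearly onto a turn-curvature range. This is only an illustrative sketch: the argmax decoding and the ±1.0 curvature bounds are assumptions made here, not the network's actual output decoding (the output representations actually used are the subject of Section 2.3).

```python
import numpy as np

def decode_steering(output: np.ndarray, max_curvature: float = 1.0) -> float:
    """Map a vector of steering-unit activations to a scalar turn command.

    Index 0 means the sharpest left turn, the last index the sharpest
    right turn, and the center of the vector "travel straight ahead".
    Returns a value in [-max_curvature, +max_curvature] (negative = left).
    """
    center = (output.size - 1) / 2.0       # "straight ahead" index
    best = int(np.argmax(output))          # most active steering unit
    return (best - center) / center * max_curvature

left = np.zeros(30)
left[0] = 1.0                              # activation peaked at far left
print(decode_steering(left))               # -1.0 (hard left)
```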
`To control a mobile robot using this architecture, input from one of the robot's
`sensors is reduced to a low-resolution 30x32 pixel image and projected onto
`the input retina. After completing a forward pass through the network, a motor
`response is derived from the output activation levels and performed by the low level
`controller. In the next sections, I will expand this high level description, giving
`more details of how the processing actually proceeds and why this architecture
`was chosen.
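The forward pass just described can be sketched directly from the layer sizes given in the text (30x32 retina, 4 hidden units, 30 outputs). The random weights and the sigmoid activation below are placeholder assumptions standing in for a trained back-propagation network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the text: 30x32 input retina, 4 hidden units, 30 outputs.
N_IN, N_HID, N_OUT = 30 * 32, 4, 30

# Randomly initialized weights stand in for a trained network.
W_hid = rng.normal(0.0, 0.1, (N_HID, N_IN))
b_hid = np.zeros(N_HID)
W_out = rng.normal(0.0, 0.1, (N_OUT, N_HID))
b_out = np.zeros(N_OUT)

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def forward(retina: np.ndarray) -> np.ndarray:
    """Map a 30x32 sensor image to a 30-unit steering response vector."""
    x = retina.reshape(-1)                  # flatten the 2-D retina to 960 inputs
    hidden = sigmoid(W_hid @ x + b_hid)     # fully connected hidden layer
    return sigmoid(W_out @ hidden + b_out)  # fully connected output layer

image = rng.random((30, 32))                # stand-in for a reduced sensor image
response = forward(image)
print(response.shape)                       # (30,)
```

A motor response would then be derived from this 30-unit activation vector and handed to the low level controller.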
`
`
`
`12
`
`CHAPTER 2. NETWORK ARCHITECTURE
`
[Figure: 30x32 sensor input retina fully connected to 4 hidden units, fully connected to 30 output units.]
`
`Figure 2.1: ALVINN driving network architecture
`
`2.2
`
`Input Representations
`
`Perhaps the most important factor determining the performance of a particular
`neural network architecture on a task is the input representation. In determining
`the input representation, there are two schools of thought. The first believes that the
`best input representation is one which has been extensively preprocessed to make
`"important" features prominent and therefore easy for the network to incorporate
`in its processing. The second school of thought contends that it is best to give
`the network the "raw" input and let it learn from experience what features are
`important.
`Four factors to consider when deciding the extent of preprocessing to perform
`on the input are the existence of a preprocessing algorithm, its necessity, its com-
`plexity and its generality. If straightforward algorithms are known to perform
`crucial initial processing steps applicable in a wide variety of situations, then it
`is advisable to use them. Such is the case in speech recognition where the raw
`speech input, as represented as amplitude of sound waves over time, is converted
`into coefficients representing amplitudes at various frequencies over time using
a fast Fourier transform preprocessing step [Waibel et al., 1987]. This FFT
preprocessing is known to be a useful and widely applicable first step in automatic
`processing of speech data [O'Shaughnessy, 1987]. Furthermore, algorithms to
`compute a signal's Fourier transform, while complex, are well understood and
`have efficient implementations on a variety of computer architectures. Finally,
`not only is the information contained in the Fourier transform proven useful in
`previous speech recognition systems, the Fourier transform also has the property
`that little information in the original signal is lost in the transformat