AD-A249 972
`
`Neural Network Perception for Mobile Robot
`Guidance
`
`Dean A. Pomerleau
`February 16, 1992
`CMU-CS-92-115
`
`School of Computer Science
`Carnegie Mellon University
Pittsburgh, PA 15213
`
`
`Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy.
`·• .. .{:.'!!:?!:J:::,., "\
`-·~·-'a·~"~-"· .. •' ·-·· .
`CLEARED;. :· · , . '- :-- ~. _
`• ''P ')-,
`... ['
`.
`l' ~ 1" PL!Bt !('ATIOfJ
`~ ....
`~.. ... .o.....:..
`~,..
`
`APR 2 8 1992
`
`Q
`t.
`
`© 1992 by Dean A. Pomerleau
`
`Support for this work has come from DARPA, under contracts DACA76-85-C-0019,
`DACA76-85-C-0003, DACA76-85-C-0002, DACA76-89-C-0014 and DAAE07-90-C-R059.
These contracts were monitored by the Topographic Engineering Center and by TACOM. This research was also funded in part by grants from the Fujitsu Corporation and the Shimizu Corporation.
`
`92-12205
`lllllllllllllllllllllllllllllllllllllllllllll
`
`\_ !
`
`.-.1
`
`IPR2013-00424 - Ex. 1005
`Toyota Motor Corp., Petitioner
`1
`
`
`
`
`
`
Carnegie Mellon
`
`School of Computer Science
`
`DOCTORAL THESIS
`in the field of
`Computer Science
`
`Neural Network Perception for Mobile Robot Guidance
`
`DEAN POMERLEAU
`
`Submitted in Partial Fulfillment of the Requirements
`for the Degree of Doctor of Philosophy
`
ACCEPTED:

                                        DATE

APPROVED:

DEAN                                    DATE

PROVOST                                 DATE
`
`
`
`Dedicated to Terry, Glen and Phyllis
`
`
`
`
`Abstract
`
`Vision based mobile robot guidance has proven difficult for classical machine
`vision methods because of the diversity and real time constraints inherent in the
`task. This thesis describes a connectionist system called ALVINN (Autonomous
`Land Vehicle In a Neural Network) that overcomes these difficulties. ALVINN
learns to guide mobile robots using the back-propagation training algorithm. Because of its ability to learn from example, ALVINN can adapt to new situations and therefore cope with the diversity of the autonomous navigation task.
`
But real world problems like vision based mobile robot guidance present a different set of challenges for the connectionist paradigm. Among them are:
`
`• How to develop a general representation from a limited amount of real
`training data,
`
`• How to understand the internal representations developed by artificial neural
`networks,
`
`• How to estimate the reliability of individual networks,
`
`• How to combine multiple networks trained for different situations into a
`single system,
`
`• How to combine connectionist perception with symbolic reasoning.
`
`This thesis presents novel solutions to each of these problems. Using these
`techniques, the ALVINN system can learn to control an autonomous van in under 5
`minutes by watching a person drive. Once trained, individual ALVINN networks
`can drive in a variety of circumstances, including single-lane paved and unpaved
`roads, and multi-lane lined and unlined roads, at speeds of up to 55 miles per hour.
The techniques are also shown to generalize to the task of controlling the precise foot placement of a walking robot.
`
`5
`
`
`
`Acknowledgements
`I wish to thank my advisor, Dr. David Touretzky for his support and technical
advice during my graduate studies. Dave has not only provided invaluable feedback, he has also given me the latitude I've needed to develop as a researcher. I am
`also grateful to Dr. Charles Thorpe for the opportunities, resources and expertise
`he has provided me. Without Chuck's support, this research would not have been
`possible. I also wish to thank the other members of my committee, Dr. Takeo
`Kanade and Dr. Terrence Sejnowski, for their insightful analysis and valuable
`comments concerning my work.
`
`I owe much to all the members of the ALV/UGV project. Their technical
`support and companionship throughout the development of the ALVINN system
`has made my work both possible and enjoyable. I would like to specifically
`thank Jay Gowdy and Omead Amidi, whose support software underlies much of
`the ALVINN system. James Frazier also deserves thanks for his patience during
`many hours of test runs on the Navlab.
`
`Interaction with members of the Boltzmann group at Carnegie Mellon has
`also been indispensable. From them I have not only learned about all aspects of
connectionism, but also how to communicate my thoughts and ideas. In particular,
`the insights and feedback provided by discussions with John Hampshire form the
`basis for much of this thesis. I am grateful to Dave Plaut, whose helpful suggestions
`in the early stages of this work put me on the right track.
`
Other people who have contributed to the success of this thesis are the members
`of the SM2 group. In particular, Ben Brown and Hiroshi Ueno have given me the
`opportunity, incentive and support I have needed to explore an alternative domain
`for connectionist mobile robot guidance.
`
`I am also in debt to my office mates, Spiro Michaylov and Nevin Heintze.
`They have helped me throughout our time as graduate students, with everything
from explaining LaTeX peculiarities to feeding my fish. I would like to thank my
`parents, Glen and Phyllis, for encouraging my pursuit of higher education, and
`for all the sacrifices they've made to provide me with a world of opportunities.
`Finally, I am especially grateful to my fiancee, Terry Jessie, for her constant love,
`support and patience during the difficult months spent preparing this dissertation.
`Her presence in my life has helped me keep everything in perspective.
`
`Dean A. Pomerleau
`
`February 16, 1992
`
`-~----- - - - - - -
`
`6
`
`
`
`Contents
`
`1
`
`Introduction
`1.1 Problem Description . . . . . . . . . . . . . . . . . . . . . . .
`1.2 Robot Testbed Description . . . . . . . . . . . . .
`13 Dissenation Overview . . . . . . . . . . . . . . . . . . . . . .
`
`2 Network Architecture
`2.1 Architecture Overview . . . . . . . . . . . . . . . . . . . . . .
`2.2
`Input Representations
`. . . . . . . . . . . . . . . . . . . . . .
`2.2.1 Preprocessing Practice . . . . . . . . . . . .
`2.2.2
`Justification of Preprocessing
`. . . . . . . .
`2.3 Output Representation . . . . . . . . . . . . . . . .
`2.3.1
`1-of-N Output Representation
`. . . . . . . . . . .
`2.3.2 Single Graded Unit Output Representation . . . . .
`2.3.3 Gaussian Output Representation . . . . . . . . . . . . .
`2.3.4 Comparing Output Representations
`Internal Network Structures
`. . . . . . . . . . . . . . . . . . .
`
`2.4
`
`3 Training Networks "On· The-Fly"
`. . . . . . . . .
`3.1 Training with Simulated Data
`3.2 Training "on-the-fly" with Real Data . . . . . .
`3.2.1 Potential Problems . . . . . . . . . . .
`. . . . . .
`3.2.2 Solution - Transform the Sensor Image
`3.2.3 Transforming the Steering Direction . . . . . . . .
`3.2.4 Adding Diversity Through Buffering
`. . .
`3.3 Performance Improvement Using Transformations .
`3.4 Discussion . . . . . . . . . . . . . . . . . . . . .
`
`1
`2
`4
`5
`
`10
`11
`12
`13
`18
`20
`22
`24
`28
`33
`35
`
`37
`38
`41
`41
`42
`48
`52
`54
`56
`
`v
`
`7
`
`
`
`4 Training Networks With Structured Noise
`4.1 Transitory Feature Problem
`. . . .
`4.2 Training with Gaussian Noise
`. . . . .
`4.3 Characteristics of Structured Noise . . .
`4.4 Training with Structured Noise . . . . . . . .
`Improvement from Structured Noise Training
`4.5
`4.6 Discussion . . . . . . . . . . . . . . . . . .
`
`5 Driving Results and Performance
`5.1 Situations Encountered . . . . . . . . . .
`5.1.1 Single Lane Paved Road Driving.
`5.1.2 Single Lane Din Road Driving. .
`5.1.3 Two-Lane Neighborhood Street Driving .
`5.1.4 Railroad Track Following
`. .
`5.1.5 Driving in Reverse . . . . . .
`. . . . .
`5.1.6 Multi-lane Highway Driving .
`. . . . . . . . . . .
`5.2 Driving with Alternative Sensors , . .
`5.2.1 Night Driving Using Laser Reflectance Images
`5.2.2 Training with a Laser Range Sensor . . . . . .
`5.2.3 Contour Following Using Laser Range Images
`5.2.4 Obstacle Avoidance Using Laser Range Images
`5.3 Quantitative Perfonnance Analysis . . .
`. . . . .
`5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . .
`
`6 Analysis of Network Representations
`6.1 Weight Diagram Interpretation . . . . . . . .
`6.2 Sensitivity Analysis
`. . . . . . . . . . . . .
`6.2.1 Single Unit Sensitivity Analysis . . .
`6.2.2 Whole Network Sensitivity Analysis . . . . . . . . .
`6.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . .
`
`7 Rule-Based Multi-network Arbitration
`7.1 Symbolic Knowledge and Reasoning.
`7.2 Rule-based Driving Module Integration
`7.3 Analysis and Discussion . . . . . . . .
`
`58
`58
`64
`68
`69
`75
`77
`
`80
`80
`81
`83
`84
`84
`85
`85
`86
`88
`88
`90
`90
`91
`94
`
`96
`97
`101
`103
`108
`118
`
`120
`121
`125
`128
`
`8
`
`
`
`8 Output Appearance Reliability Estimation
`8.1 Review of Previous Arbitration Techniques
`8.2 OARE Details
`. . . . . . . . . . . .
`8.3 Results Using OARE . . . . . . . . .
`8.3.1 When and Why OARE Works
`. . . . . . .
`8.4 Shortcomings of OARE
`
`9
`
`Input Reconstruction Reliability Estimation
`9.1 The IRRE Idea . . . . . . . .
`9.2 Network Inversion . . . . . .
`9.3 Backdriving the Hidden Units
`9.4 Autoencoding the Input ..
`9.5 Discussion . . . . . . . .
`
`10 Other Applications • The SM2
`10.1 The Task . . . . . . . . .
`10.2 Network Architecture
`. .
`10.3 Network Training and Performance
`10.4 Discussion . . . . . . . . . . . . . .
`
`11 Other Vision-based Robot Guidance Methods
`11.1 Non-learning Autonomous Driving Systems
`11.1.1 Examples
`. . . . . . . . . . . .
`11.1.2 Comparison with ALVINN ...
`11.2 Other Connectionist Navigation Systems .
`11.3 Other Potential Connectionist Methods .
`11.4 Other Machine Learning Techniques .
`11.5 Discussion . . . . . . . . . . . . . . .
`
`12 Conclusion
`12.1 Contributions .
`12.2 Future Work
`.
`
`131
`132
`135
`137
`142
`145
`
`147
`147
`148
`152
`159
`163
`
`168
`168
`171
`171
`176
`
`178
`179
`179
`181
`182
`184
`186
`190
`
`192
`192
`196
`
`9
`
`
`
`Chapter 1
`
`Introduction
`
`A truly autonomous robot must sense its environment and react appropriately.
`Previous mobile robot perception systems have relied on hand-coded algorithms
`for processing sensor information. In this dissertation I develop techniques which
`enable artificial neural networks (ANNs) to learn the visual processing required
`for mobile robot guidance. The power and flexibility of these techniques are
`demonstrated in two domains, wheeled vehicle navigation, and legged robot foot
`positioning.
`The central claims of this dissertation are:
`
`• By appropriately constraining the problem, the network architecture and the
`training algorithm, ANNs can quickly learn to perform many of the complex
`perception tasks required for mobile robot navigation.
`
`• A neural network-based mobile robot perception system is able to robustly
`handle a wider variety of situations than hand-programmed systems because
`of the ability of ANNs to adapt to new sensors and situations.
`
• Artificial neural networks are not just black boxes. Their internal representations can be analyzed and understood.
`
• The reliability of ANNs can be estimated with a relatively high precision. These reliability estimates can be employed to arbitrate between multiple expert networks, and hence facilitate the modular construction of connectionist systems.
`
`1
`
`10
`
`
`
`2
`
`CHMTERJ. INTRODUCTION
`
`Sensors
`
`Pm:eptual Perceptual Motor
`Low-Level Control Actuators
`Processing Comm.OO Controller
`lnpul
`Signal
`t
`I
`
`Control
`Feedback
`
`Figure 1.1: Block diagram of sensor based mobile robot guidance
`
`l i I
`
`• By combining neural network-based perception with symbolic reasoning,
`an autonomous navigation system can achieve accurate low-level control
`and exhibit intelligent high-level behavior.
`
`1.1 Problem Description
`
To function effectively, an autonomous mobile robot must first be capable of purposeful movement in its environment. This dissertation focuses on the problem of
`how to employ neural network based perception to guide the movement of such
`a robot. To navigate effectively in a complex environment, a mobile robot must
`be equipped with sensors for gathering information about its surroundings. In this
work, I have chosen imaging sensors as the primary source of information because of their ability to quickly provide dense representations of the environment. The imaging sensors actually employed include color and black-and-white video cameras, a scanning laser rangefinder and a scanning laser reflectance sensor.¹
The imaging sensors provide input to the component of an autonomous mobile robot which will be the primary focus of this dissertation, the perceptual processing module (see Figure 1.1). The job of the perceptual processing module is to transform the information about the environment provided by one or more imaging sensors into an appropriate high level motor command. The motor command appropriate for a given situation depends both on the current state of the world as reported by the sensors, and the perception module's knowledge of appropriate responses for particular situations. The motor responses produced by the perception module take the form of elementary movement directives, such as "drive the robot
`
¹ Work is also underway using the same techniques to interpret the output from a sonar array and an infrared camera.
`
`11
`
`
`
`
`along an arc with a 30m radius" or "move the robot's foot 2.5cm to the right".
`The elementary movement directives are carried out by a controller which
manipulates the robot's actuators. The determination of the correct motor torques required to smoothly and accurately perform the elementary movement directives is not addressed in this dissertation. While it is possible to use connectionist techniques for low level control [Jordan & Jacobs, 1990, Katayama & Kawato, 1991], this aspect of the problem is implemented using classical PID control in each of the systems described in this work.
`The two mobile robot domains used to develop and demonstrate the techniques
`of this thesis are autonomous outdoor driving and precise foot positioning for a
`robot designed to walk on the exterior of a space station. Because it embodies
`most of the difficulties inherent in any mobile robot task, autonomous outdoor
`driving is the primary focus of this work.
`In autonomous outdoor driving, the goal is to safely guide the robot through
`the environment. Most commonly, the environment will consist of a network of
`roads with varying characteristics and markings. In this situation, the goal is to
`keep the robot on the road, and in the correct lane when appropriate. There is
`frequently the added constraint that a particular route should be followed through
`the environment, requiring the system to make decisions such as which way to
`turn at intersections. An additional desired behavior for the autonomous robot is
`to avoid obstacles, such as other cars, when they appear in its path.
The difficulty of outdoor autonomous driving stems from four factors. They are:
`
`• Task variations due to changing road type
`
`• Appearance variations due to lighting and weather conditions
`
`• Real time processing constraints
`
`• High level reasoning requirements
`
`A general autonomous driving system must be capable of navigating in a
`wide variety of situations. Consider some of the many driving scenarios people
`encounter every day: There are multi-lane roads with a variety of lane markers.
`There are two-lane roads without lane markers. There are situations, such as
city or parking lot driving, where the primary guidance comes not from the road delineations, but from the need to avoid other cars and pedestrians.
`
`12
`
`
`
`4
`
`CHMTERJ. INTRODUCTION
`
`The second factor making autonomous driving difficult is the variation in
`appearance that results from environmental factors. Lighting changes, and deep
`shadows make it difficult for perception systems to consistently pick out important
`features during daytime driving. The low light conditions encountered at night
`make it almost impossible for a video-based system to drive reliably. In addition,
`missing or obscured lane markers make driving difficult for an autonomous system
even under favorable lighting conditions.
`Given enough time, a sophisticated image processing system might be able to
`overcome these difficulties. However the third challenging aspect of autonomous
`driving is that there is a limited amount of time available for processing sensor
`information. To blend in with traffic, an autonomous system must drive at a
`relatively high speed. To drive quickly, the system must react quickly. For
`example, at 50 miles per hour a vehicle is traveling nearly 75 feet per second. A
`lot can happen in 75 feet, including straying a significant distance from the road,
`if the system isn't reacting quickly or accurately enough.
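The arithmetic behind this timing constraint is easy to check. A short conversion, given here purely as an illustration and not as part of any system described in this dissertation:

```python
def mph_to_fps(mph):
    """Convert miles per hour to feet per second (5280 ft per mile, 3600 s per hour)."""
    return mph * 5280.0 / 3600.0

print(round(mph_to_fps(50), 1))  # 73.3 -- "nearly 75 feet per second"
print(round(mph_to_fps(55), 1))  # 80.7 -- the top speed reported later in this thesis
```

A perception cycle that takes even a few hundred milliseconds therefore costs tens of feet of uncorrected travel.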
`Finally, an autonomous driving system not only must perform sophisticated
`perceptual processing, it also must make high level symbolic decisions such as
`which way to turn at intersections. This dissertation shows that the first three
`factors making mobile robot guidance difficult can be overcome using artificial
`neural networks for perception, and the fourth can be handled by combining
`artificial neural networks with symbolic processing.
`
`1.2 Robot Testbed Description
`
`The primary testbed for demonstrating the applicability of the ideas developed
`in this dissertation to autonomous outdoor driving is the CMU Navlab, shown
`in Figure 1.2. The Navlab is a modified Chevy van equipped with forward and
`backward facing color cameras and a scanning laser rangefinder for sensing the
`environment. These are the primary sensory inputs the system receives. The
`Navlab also contains an inertial navigation system (INS) which can maintain the
`vehicle's position relative to its starting location. The Navlab is outfitted with
`three Sun Sparcstations, which are used for perceptual processing and other high
`level computation. The Navlab also has a 68020-based processor for controlling
`the steering wheel and accelerator and for monitoring the vehicle's status.
`The Navlab can be controlled by computer or driven by a person just like a
`normal car. This human controllability is useful for getting the Navlab to a test site,
`
`13
`
`
`
`
`Figure 1.2: The CMU Navlab Autonomous Navigation Testbed
`
`and as will be seen in Chapter 3, for teaching an artificial neural network to drive by
`example. I have used two other robots to demonstrate the power of connectionist
`robot guidance, a ruggedized version of the Navlab called Navlab II, and a walking
`robot called the Self Mobile Space Manipulator (SM2). These additional testbeds
will be described in more detail in Chapters 5 and 10, respectively.
`
`1.3 Dissertation Overview
`
`The goal of this thesis is to develop techniques that enable artificial neural networks
`to guide mobile robots using visual input. In chapter 2, I present the simple neural
`network architecture that serves as the basis for the connectionist mobile robot
`guidance system I develop called ALVINN (Autonomous Land Vehicle In a Neural
`Network). The architecture consists of a single hidden layer, feedforward network
`(see Figure 1.3). The input layer is a two dimensional retina which receives input
`from an imaging sensor such as a video camera or scanning laser rangefinder. The
`output layer is a vector of units representing different steering responses, ranging
`from a sharp left to a sharp right turn. The network receives as input an image of
`the road ahead, and produces as output the steering command that will keep the
`vehicle on the road.
`
`14
`
`
`
`6
`
`CHMTERJ. INTRODUCTION
`
`Figure 1.3: ALVINN driving network architecture
`
Although some aspects of ALVINN's architecture and I/O representation are unique, the network structure is not the primary reason for ALVINN's success.
`Instead, much of its success can be attributed to the training methods presented
`in Chapters 3 and 4. Using the training "on-the-fly" techniques described in
`Chapter 3, individual three-layered ALVINN networks can quickly learn to drive
`by watching a person steer. These methods allow ALVINN to learn about new
`situations first hand, as a person drives the vehicle. The ability to augment the
`limited amount of live training data available from the sensors with artificial images
`depicting rare situations is shown to be crucial for reliable network performance
`in both Chapters 3 and 4.
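The augmentation idea can be sketched roughly as follows. This is a heavily simplified illustration: the helper names, the edge-fill strategy, and the steering-correction gain are all assumptions made for the sketch, whereas Chapter 3 describes the actual perspective-correct transformation.

```python
import numpy as np

def shift_image(image, cols):
    """Shift a sensor image laterally by `cols` pixels, replicating the
    edge column to fill the vacated region (a crude stand-in for the
    geometrically correct transformation of Chapter 3)."""
    shifted = np.roll(image, cols, axis=1)
    if cols > 0:
        shifted[:, :cols] = image[:, :1]
    elif cols < 0:
        shifted[:, cols:] = image[:, -1:]
    return shifted

def augment(image, steering, shifts=(-8, -4, 4, 8), gain=0.02):
    """Turn one live (image, steering) example into several, pairing each
    lateral shift with a compensating steering correction. The sign
    convention and `gain` are placeholders, not ALVINN's values."""
    pairs = [(image, steering)]
    for s in shifts:
        pairs.append((shift_image(image, s), steering - gain * s))
    return pairs
```

Each live example thus yields a family of training pairs depicting the vehicle at slightly different lateral positions, which is the essence of learning rare situations without ever driving them.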
`Using the architecture described in Chapter 2, and the training techniques from
`Chapters 3 and 4, ALVINN is able to drive in a wide variety of situations, described
`in Chapter 5. As a preview, some of ALVINN's capabilities include driving on
`single-lane paved and unpaved roads, and multi-lane lined and unlined roads, at
`speeds of up to 55 miles per hour.
`But developing networks that can drive is not enough. It is also important to
understand how the networks perform their functions. In order to quantitatively understand the internal representations developed by individual driving networks, I
`
`15
`
`
`
`
`develop a technique called sensitivity analysis in Chapter 6. Sensitivity analysis
`is a graphical technique which provides insight into the processing performed by
`a network's individual hidden units, and into the cooperation between multiple
`hidden units to carry out a task. The analysis techniques in Chapter 6 illustrate
`that ALVINN's internal representation varies widely depending on the situations
for which it is trained. In short, ALVINN develops filters for detecting image
`features that correlate with the correct steering direction.
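The flavor of such an analysis can be reproduced numerically: perturb each input pixel in turn and measure the change in an output unit's activation. The finite-difference sketch below is generic numpy code assuming some `forward` function, not the visualization technique of Chapter 6 itself.

```python
import numpy as np

def sensitivity_map(forward, image, out_idx, eps=1e-3):
    """Finite-difference sensitivity of output unit `out_idx` with
    respect to every input pixel. Pixels with large magnitude are the
    ones the chosen output most strongly depends on."""
    base = forward(image)[out_idx]
    sens = np.zeros_like(image, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            bumped = image.copy()
            bumped[r, c] += eps
            sens[r, c] = (forward(bumped)[out_idx] - base) / eps
    return sens
```

Plotting such a map over the input retina highlights which image regions, such as road edges or lane markings, drive a given steering unit.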
`A typical filter developed by ALVINN is shown in Figure 1.4. It depicts
`the connections projecting to and from a single hidden unit in a network trained
`on video images of a single-lane, fixed width road. This hidden unit receives
`excitatory connections (shown as white spots) from a road shaped region on the
`left of the input retina (see the schematic). It makes excitatory connections to the
output units representing a sharp left turn. This hidden unit is stimulated when a road appears on the left, and suggests a left turn in order to steer the vehicle back
`to the road center. Road-shaped region detectors such as this are the most common
`type of feature filters developed by networks trained on single-lane unlined roads.
`In contrast, when trained for highway driving ALVINN develops feature detectors
`that determine the position of the lane markers painted on the road.
`This situation specificity allows individual networks to learn quickly and drive
`reliably in limited domains. However it also severely limits the generality of
`individual driving networks. Chapters 7, 8 and 9 focus on techniques for combining
`multiple simple driving networks into a single system capable of driving in a wide
`variety of situations. Chapter 7 describes rule-based techniques for integrating
`multiple networks and a symbolic mapping system. The idea is to use a map of
`the environment to determine which situation-specific network is appropriate for
`the current circumstances. The symbolic mapping module is also able to provide
ALVINN with something the networks lack, namely the ability to make high level decisions such as which way to turn at intersections.
However, rule-based arbitration is shown to have significant shortcomings.
`Foremost among them is that it requires detailed symbolic knowledge of the
`environment, which is often difficult to obtain. In Chapters 8 and 9, I develop
`connectionist multi-network arbitration techniques to complement the rule-based
`methods of Chapter 7. These techniques allow individual networks to estimate
`their own reliability in the current situation. These reliability estimates can be
`used to weight the responses from multiple networks and to determine when a new
`network needs to be trained.
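In spirit, a reliability-weighted combination works like the sketch below. The shapes and normalization here are illustrative assumptions; Chapters 8 and 9 define how the reliabilities themselves are estimated.

```python
import numpy as np

def arbitrate(outputs, reliabilities):
    """Blend the steering-output vectors of several expert networks,
    weighting each network by its self-estimated reliability.
    `outputs`: (n_networks, n_steering_units); returns one blended vector."""
    w = np.asarray(reliabilities, dtype=float)
    w = w / w.sum()                       # normalize weights to sum to 1
    return w @ np.asarray(outputs, dtype=float)
```

A network that reports low reliability, for example because it was trained for a different road type than the one currently in view, then contributes little to the final steering decision.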
`Chapter 10 illustrates the flexibility of connectionist mobile robot guidance
`
`16
`
`
`
`8
`
`CHM7ERJ. INTRODUCTION
`
`Weight to Output Units
`••• • ••••
`• ••
`Weight from Input Retina
`
`0Road
`[illJ Non-Road
`
`Figure 1.4: Diagram of weights projecting to and from a typical hidden unit in a
`network trained on roads with a fixed width. This hidden unit acts as a filter for a
`road on the left side of the visual field as illustrated in the schematic.
`
`17
`
`
`
`
`by demonstrating its use in a very different domain, the control of a two-legged
`walking robot designed to inspect the space station exterior. The crucial task in this
`domain is to precisely position the foot of the robot in order to anchor it without
`damaging either the space station or the robot. The same methods developed to
`steer an autonomous vehicle are employed to safely guide the foot placement of
`this walking robot.
`In Chapter 11, the neural network approach to autonomous robot guidance
`is compared with other techniques, including hand-programmed algorithms and
`other machine learning methods. Because of its ability to adapt to new situations,
`ALVINN is shown to be more flexible than previous hand-programmed systems for
`mobile robot guidance. The connectionist approach employed in ALVINN is also
`demonstrated to have distinct advantages over other machine learning techniques
`such as nearest neighbor matching, decision trees and genetic algorithms.
`Finally, Chapter 12 summarizes the results and discusses the contributions of
`this dissertation. It concludes by presenting areas for future work.
`
`18
`
`
`
`Chapter 2
`
`Network Architecture
`
The first steps in applying artificial neural networks to a problem involve choosing a training algorithm and a network architecture. The two decisions are intimately related, since certain training algorithms require, or are best suited to, specific network architectures. For this work, I chose a multi-layered perceptron (MLP) and the back-propagation training algorithm [Rumelhart, Hinton & Williams, 1986] for the following reasons:
`
• The task requires supervised learning from examples (i.e. given sensor input, the network should respond with a specific motor response). This rules out unsupervised/competitive learning algorithms like Kohonen's self-organizing feature maps [Kohonen, 1990] which learn to classify inputs on the basis of statistically significant features, but not to produce particular desired responses.
`
• The system should learn relatively quickly, since one of the goals is to rapidly adapt to new driving situations. This rules out certain supervised training algorithms/architectures such as Boltzmann Machines [Hopfield, 1982], which are notoriously slow at learning.
`
• The task of determining the correct motor response from sensor input was not expected to require substantial, run-time knowledge about recent inputs. Thus, it was decided that the extensions of the back-propagation algorithm to fully recurrent networks [Pineda, 1987, Pearlmutter, 1988] were not necessary.
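For concreteness, one supervised back-propagation update for a single-hidden-layer perceptron looks roughly like the following. This is a textbook sketch with squared-error loss and sigmoid units, not ALVINN's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x, target, W1, W2, lr=0.5):
    """One gradient-descent update. W1: (n_hidden, n_in), W2: (n_out, n_hidden).
    Returns the updated weights and the squared error before the update."""
    h = sigmoid(W1 @ x)                 # hidden activations
    y = sigmoid(W2 @ h)                 # output activations
    err = float(np.sum((y - target) ** 2))
    dy = (y - target) * y * (1 - y)     # output error signal
    dh = (W2.T @ dy) * h * (1 - h)      # back-propagated hidden error
    return W1 - lr * np.outer(dh, x), W2 - lr * np.outer(dy, h), err
```

Repeated over many (input, desired-response) pairs, this update drives the network toward producing the demonstrated motor response for each sensor input.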
`
`10
`
`19
`
`
`
`
The decision to use artificial neural networks in the first place, and to use back-propagation over other closely related neural network training algorithms like
`quickprop [Fahlman, 1988] and radial basis functions [Poggio & Girosi, 1990]
`can be better understood after presentation of the architecture and training scheme
`actually employed in this work, and hence will be discussed in Chapter 11.
`
`Once the decision is made to use a feedforward multi-layered perceptron
`as the underlying network architecture, the question then becomes "what form
`should the MLP take?" This question can be divided into three components:
`the input representation, the output representation, and the network's internal
`structure. I will discuss each of the three components separately, theoretically
`and/or empirically justifying the choices made.
`
`2.1 Architecture Overview
`
`The architecture of the perception networks chosen for mobile robot guidance
`consists of a multi-layer perceptron with a single hidden layer (See Figure 2.1).
The input layer of the network consists of a 30x32 unit "retina" which receives
`images from a sensor. Each of the 960 units in the input retina is fully connected
`to the hidden layer of 4 units, which in turn is fully connected to the output layer.
`The output layer represents the motor response the network deems appropriate for
`the current situation. In the case of a network trained for autonomous driving, the
`output layer consists of 30 units and is a linear representation of the direction the
`vehicle should steer in the current situation. The middle output unit represents
`the "travel straight ahead" condition, while units to the left and right of center
`represent successively sharper left and right turns.
`
`To control a mobile robot using this architecture, input from one of the robot's
`sensors is reduced to a low-resolution 30x32 pixel image and projected onto
`the input retina. After completing a forward pass through the network, a motor
`response is derived from the output activation levels and performed by the low level
`controller. In the next sections, I will expand this high level description, giving
`more details of how the processing actually proceeds and why this architecture
`was chosen.
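The complete forward pass through this 960-4-30 network is small enough to sketch directly. The weights here are placeholders, and the argmax decoding of the steering units is a simplification of the output-interpretation scheme described later in this chapter.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def drive_step(retina, W_hidden, W_out):
    """One perception cycle: a 30x32 sensor image in, a steering choice out.
    W_hidden: (4, 960), W_out: (30, 4). Index 0 corresponds to a sharp left
    turn, the middle unit to straight ahead, index 29 to a sharp right turn."""
    x = retina.reshape(960)            # flatten the 30x32 input retina
    hidden = sigmoid(W_hidden @ x)     # 4 hidden units
    steer = sigmoid(W_out @ hidden)    # 30 steering output units
    return int(np.argmax(steer))       # simplest decoding: most active unit
```

In the actual system the activation pattern across the 30 output units, rather than a single winning unit, determines the steering arc sent to the controller.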
`
`20
`
`
`
`12
`
`CHAPTER 2. NEIWORK ARCHITECTURE
`
`lOO.t,.t
`Unlu
`
`Figure 2.1: ALVINN driving network architecture
`
`2.2
`
`Input Representations
`
Perhaps the most important factor determining the performance of a particular
`neural network architecture on a task is the input representation. In determining
`the input representation, there are two schools of thought. The first believes that the
`best input representation is one which has been extensively preprocessed to make
`"important" features prominent and therefore easy for the network to incorporate
`in its processing. The second school of thought contends that it is best to give
`the network the "raw" input and let it learn from experience what features are
`important.
Four factors to consider when deciding the extent of preprocessing to perform on the input are the existence of a preprocessing algorithm, its necessity, its complexity and its generality. If straightforward algorithms are known to perform
`crucial initial processing steps applicable in a wide variety of situations, then it
`is advisable to use them. Such is the case in speech recognition where the raw
`speech input, as represented as amplitude of sound waves over time, is converted
`into coefficients representing amplitudes at various frequencies over time using
a fast Fourier transform preprocessing step [Waibel et al., 1987]. This FFT preprocessing is known to be a useful and widely applicable first step in automatic
`processing of speech data [O'Shaughnessy, 1987]. Furthermore, algorithms to
`compute a signal's Fourier transform, while complex, are well understood and
`have efficient implementations on a variety of computer architectures. Finally,
`not only is the information contained in the Fourier transform proven useful in
`previous speech recognition systems, the Fourier transform also has the property
that little information in the original signal is lost in the transformation. This ensures that important input features are not lost as a result of the FFT.
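The property is easy to see in a toy example, using numpy here as a stand-in for whatever FFT implementation a speech front end would actually use: a pure 1 kHz tone sampled at 8 kHz shows up as a single dominant frequency-domain coefficient at 1000 Hz.

```python
import numpy as np

fs = 8000                                 # sampling rate, Hz
t = np.arange(fs) / fs                    # one second of samples
signal = np.sin(2 * np.pi * 1000 * t)     # 1 kHz tone
spectrum = np.abs(np.fft.rfft(signal))    # magnitudes of FFT coefficients
freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
print(freqs[np.argmax(spectrum)])         # 1000.0
```

Because the transform is invertible, the frequency-domain representation carries essentially the same information as the raw waveform, just in a form where the relevant features are explicit.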
`On the other hand, if appropriate preprocessing algorithms are not known
`for the task, or if the known algorithms are too complex to be practical, the
`solution is to give the n