`
`
`A survey of video processing techniques for traffic applications
`
`V. Kastrinaki, M. Zervakis*, K. Kalaitzakis
`
`Digital Image and Signal Processing Laboratory, Department of Electronics and Computer Engineering, Technical University of Crete, Chania 73100, Greece
`
`Received 29 October 2001; received in revised form 18 December 2002; accepted 15 January 2003
`
`Abstract
`
`Video sensors become particularly important in traffic applications mainly due to their fast response, easy installation, operation and
`maintenance, and their ability to monitor wide areas. Research in several fields of traffic applications has resulted in a wealth of video
`processing and analysis methods. Two of the most demanding and widely studied applications relate to traffic monitoring and automatic
`vehicle guidance. In general, systems developed for these areas must integrate, amongst their other tasks, the analysis of their static
`environment (automatic lane finding) and the detection of static or moving obstacles (object detection) within their space of interest. In this
`paper we present an overview of image processing and analysis tools used in these applications and we relate these tools with complete
`systems developed for specific traffic applications. More specifically, we categorize processing methods based on the intrinsic organization of
`their input data (feature-driven, area-driven, or model-based) and the domain of processing (spatial/frame or temporal/video). Furthermore,
`we discriminate between the cases of static and mobile camera. Based on this categorization of processing tools, we present representative
`systems that have been deployed for operation. Thus, the purpose of the paper is threefold. First, to classify image-processing methods used
`in traffic applications. Second, to provide the advantages and disadvantages of these algorithms. Third, from this integrated consideration, to
`attempt an evaluation of shortcomings and general needs in this field of active research.
© 2003 Elsevier Science B.V. All rights reserved.
`
`Keywords: Traffic monitoring; Automatic vehicle guidance; Automatic lane finding; Object detection; Dynamic scene analysis
`
`1. Introduction
`
`The application of image processing and computer
`vision techniques to the analysis of video sequences of
`traffic flow offers considerable improvements over the
`existing methods of traffic data collection and road traffic
monitoring. Other methods, including inductive loops and sonar and microwave detectors, suffer from serious drawbacks: they are expensive to install and maintain, and they are unable to detect slow or stationary vehicles. Video sensors offer a relatively low installation cost with little traffic disruption during maintenance. Furthermore, they provide wide-area monitoring, allowing analysis of traffic flows and turning movements (important to junction design), speed measurement, multiple-point vehicle counts, vehicle classification and highway state assessment (e.g. congestion or incident detection) [1].
`
* Corresponding author. Tel.: +30-28210-37206; fax: +30-28210-37542.
`E-mail address: michalis@systems.tuc.gr (M. Zervakis).
`
`Image processing also finds extensive applications in the
`related field of autonomous vehicle guidance, mainly for
`determining the vehicle’s relative position in the lane and
`for obstacle detection. The problem of autonomous vehicle
`guidance involves solving different problems at different
`abstraction levels. The vision system can aid the accurate
`localization of the vehicle with respect to its environment,
`which is composed of the appropriate lane and obstacles or
`other moving vehicles. Both lane and obstacle detection are
`based on estimation procedures for recognizing the borders
`of the lane and determining the path of the vehicle. The
`estimation is often performed by matching the observations
`(images) to an assumed road and/or vehicle model.
`Video systems for either traffic monitoring or auton-
`omous vehicle guidance normally involve two major tasks
`of perception: (a) estimation of road geometry and (b)
`vehicle and obstacle detection. Road traffic monitoring aims
`at the acquisition and analysis of traffic figures, such as
`presence and numbers of vehicles, speed distribution data,
`turning traffic flows at intersections, queue-lengths, space
`and time occupancy rates, etc. Thus, for traffic monitoring it
`is essential to detect the lane of the road and then sense
`and identify presence and/or motion parameters of a vehicle.
`Similarly, in autonomous vehicle guidance, the knowledge
`about road geometry allows a vehicle to follow its route and
`the detection of road obstacles becomes a necessary and
`important task for avoiding other vehicles present on the
`road.
`In this paper we focus on video systems considering both
`areas of road traffic monitoring and automatic vehicle
guidance. We attempt a state-of-the-art survey of algorithms
`and tools for the two major subtasks involved in traffic
`applications, i.e. the automatic lane finding (estimation of
`lane and/or central line) and vehicle detection (moving or
stationary object/obstacle). Given the progress of research in computer vision, one might expect these tasks to be trivial. The reality is not so simple: a vision-based system for such
`traffic applications must have the features of a short
`processing time, low processing cost and high reliability
`[2]. Moreover, the techniques employed must be robust
`enough to tolerate inaccuracies in the 3D reconstruction of
`the scene, noise caused by vehicle movement and
`calibration drifts in the acquisition system. The image
`acquisition process can be regarded as a perspective
`transform from the 3D world space to the 2D image
`space. The inverse transform, which represents a 3D
`reconstruction of the world from a 2D image, is usually
`indeterminate (ill-posed problem) because information is
`lost in the acquisition mapping. Thus, an important task of
`video systems is to remove the inherent perspective effect
`from acquired images [3,4]. This task requires additional
`spatio-temporal information by means of additional sensors
`(stereo vision or other type sensors) or the analysis of
`temporal information from a sequence of images. Stereo
`vision and optical flow methods aid the regularization of the
inversion process and help recover scene depth. Some of the lane and object detection problems have already been solved, as presented in the next sections. Others, such as the handling of uncertainty and the fusion of information from different sensors, remain open problems, as discussed in Section 4, which traces future trends.
`In our analysis of video systems we distinguish between
`two situations. The first one is the case in which a static
`camera observes a dynamic road scene for the purpose of
`traffic surveillance. In this case, the static camera generally
`has a good view of the road objects because of the high
`position of the camera. Therefore, 2D intensity images may
`contain enough information for the model-based recognition
`of road objects. The second situation is the case in which
`one or more vision sensors are mounted on a mobile vehicle
`that moves in a dynamic road scene. In this case, the vision
`sensors may not be in the best position for observing a road
`scene. Then, it is necessary to correlate video information
`with sensors that provide the actual state of the vehicle, or to
`combine multisensory data in order to detect road obstacles
`efficiently [2].
`Both lane and object detection become quite different in
`the cases of stationary (traffic monitoring) and moving
`
`camera (automatic vehicle guidance), conceptually and
algorithmically. In traffic monitoring, the lane and the objects (vehicles) have to be detected on the image plane, in camera coordinates. In vehicle guidance, by contrast, the lane and object (obstacle) positions must be located in the actual 3D space. Hence, the two cases, i.e. stationary and moving cameras, require different processing approaches, as illustrated in Sections 2 and 3 of the paper. The techniques used for moving cameras can also be used for stationary cameras. Nevertheless, due to their complexity and computational cost, they are not well suited to the relatively simple applications of stationary video analysis.
Research in the field started as early as the 1970s with the
`advent of computers and the development of efficient image
`processing techniques. There is a wealth of methods for
`either traffic monitoring or terrain monitoring for vehicle
`guidance. Some of them share common characteristics and
`some originate from quite diverse approaches. The purpose
`of this paper is threefold. First, to classify image-processing
`methods used in traffic applications. Second, to provide the
`advantages and disadvantages of these algorithms. Third,
`from this integrated consideration, to attempt an evaluation
`of shortcomings and general needs in this field of active
`research. The paper proceeds by considering the problem of
`automatic lane finding in Section 2 and that of vehicle
`detection in Section 3, respectively. In Section 4 we provide
`a critical comparison and relate processing algorithms with
`complete systems developed for specific traffic applications.
`The paper concludes by projecting future trends and
`developments motivated by the demands of the field and
`the shortcomings of the available tools.
`
`2. Automatic lane finding
`
`2.1. Stationary camera
`
`A critical objective in the development of a road
`monitoring system based upon image analysis is adapta-
`bility. The ability of the system to react to a changing scene
`while carrying out a variety of goals is a key issue in
`designing replacements to the existing methods of traffic
`data collection. This adaptability can only be brought about
`by a generalized approach to the problem which incorpor-
`ates little or no a priori knowledge of the analyzed scene.
Such a system will be able to adapt to 'changing circumstances', which may include the following: changing light levels, i.e. night-day or sunny-cloudy transitions; a deliberately altered camera scene, perhaps altered remotely by an operator; an accidentally altered camera position, e.g. buffeting by the wind or knocks from foreign bodies; and changing analysis goals, e.g. from traffic flow to counting or occupancy measurement. Moreover, an adaptive system would ease
`installation of the equipment due to its ability for self-
`initialization [1]. Automatic lane finding (ALF) is an
`important task for an adaptive traffic monitoring system.
ALF can assist and simplify the installation of a detection system. It enables the system to adapt to different environmental conditions and camera viewing positions. It also enables applications in active vision systems, where the camera viewing angle and the focal length of the camera lens may be controlled by the system operator to find an optimum view [5].
`The aspects that characterize a traffic lane are its visual
`difference from the environment and the relatively dense
`motion of vehicles along the lane. Thus, features that can be
`easily inferred are the lane characteristics themselves (lane
`markings and/or road edges) and the continuous change of
`the scene along the lane area. Based on these features, we
`can distinguish two classes of approaches in lane detection,
`namely lane-region detection and lane-border detection
`(lane markings and road edges). The first class relates the
`detection of the lane with the changing intensity distribution
`along the region of a lane, whereas the second class
`considers directly the spatial detection of lane character-
`istics. It should be emphasized that the first class considers
`just changes in the gray-scale values within an image
`sequence, without addressing the problem of motion
`estimation. The second class can be further separated,
`based on the method of describing the lane characteristics.
`Two general subclasses involve model-driven approaches in
`which deformable templates are iteratively modified to
`match the road edges, and feature-driven approaches in
`which lane features are extracted, localized and combined to
`meaningful characteristics. The latter approach limits the
`computation-intensive processing of images to simply
`extracting features of interest.
`
`2.2. Moving camera
`
In the case of automatic vehicle guidance, the lane detection process is designed to (a) provide estimates for the
`position and orientation of the car within the lane and (b)
`infer a reference system for locating other vehicles or
`obstacles in the path of that vehicle. In general, both tasks
`require two major estimation procedures, one regarding the
`recognition of the borders of the lane and the second for the
`prediction of the path of the vehicle. The derivation of the
`path of the vehicle requires temporal information concern-
`ing the vehicle motion, as well as modeling of the state of
`the car (dynamics and kinematics). Alternatively, the lane
`recognition task can be based on spatial visual information,
`at least for the short-range estimation of the lane position.
`Although some systems have been designed to work on
`completely unstructured roads and terrain, lane detection
`has generally been reduced to the localization of specific
`features, such as lane markings painted on the road surface.
`Certain assumptions facilitate the lane detection task and/or
speed up the processing [6]:

• Instead of processing entire images, a computer vision system can analyze specific regions (the 'focus of attention') to identify and extract the features of interest.
• The system can assume a fixed or smoothly varying lane width and thereby limit its search to almost-parallel lane markings.
• A system can exploit its knowledge of the camera and an assumption of a precise 3D road model (for example, a flat road without bumps) to localize features more easily and simplify the mapping between image pixels and their corresponding world coordinates.
`
`Real-time road segmentation is complicated by the great
variability of vehicle and environmental conditions. Changing seasons or weather conditions, the time of day, dirt on the road, shadows, specular reflection when the sun is at a low angle and man-made changes (tarmac patches used to repair road segments) all complicate the segmentation process.
`Because of these combined effects, robust segmentation is
`very demanding. Several features of structured roads, such
`as color and texture, have been used to distinguish between
`road and non-road regions in each individual frame.
`Furthermore, road tracking can facilitate road segmentation
`based on previous information. This process, however,
`requires knowledge of the vehicle dynamics, vehicle
`suspension, performance of the navigation and control
`systems, etc.
`Single-frame analysis has been extensively considered
`not only in monocular but also in stereo vision systems. The
approaches used in stereo vision often involve independent processing of the left and right images and projection of the result onto the ground plane through the Helmholtz shear equation, assuming a flat road and using piecewise road geometry models (such as clothoids) [7,8].
`Furthermore, the inverse perspective mapping can be used
`to simplify the process of lane detection, similar to the
`process of object detection considered in Section 3 [4]. The
`inverse perspective mapping essentially re-projects the two
`images onto a common plane (the road plane) and provides
`a single image with common lane structure.
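For illustration, the following is a minimal sketch of such an inverse perspective mapping using OpenCV's homography utilities. It assumes a flat road and an offline calibration that pairs four image points with their bird's-eye (road-plane) positions; the point coordinates and output size below are hypothetical placeholders, not values from any surveyed system.

```python
# A minimal sketch of inverse perspective mapping (IPM) with OpenCV.
# The source/destination points stand in for an offline calibration.
import cv2
import numpy as np

def inverse_perspective(frame, src_pts, dst_pts, out_size):
    """Warp a camera frame onto the (assumed flat) road plane.

    src_pts: 4 pixel coordinates of a rectangle known on the road.
    dst_pts: the same 4 corners in bird's-eye (road-plane) pixels.
    """
    H = cv2.getPerspectiveTransform(np.float32(src_pts),
                                    np.float32(dst_pts))
    return cv2.warpPerspective(frame, H, out_size)

# Hypothetical calibration: a lane imaged as a trapezoid on the frame.
src = [(420, 380), (860, 380), (1100, 700), (180, 700)]  # image trapezoid
dst = [(300, 0), (500, 0), (500, 600), (300, 600)]       # bird's-eye rectangle
# birdseye = inverse_perspective(frame, src, dst, (800, 600))
```

After the warp, parallel lane borders appear parallel in the output image, which is what makes the common lane structure directly observable.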
`In the case of a moving vehicle, the lane recognition
`process must be repeated continuously on a sequence of
`frames. In order to accelerate the lane detection process,
`there is a need to restrict the computation to a reduced
`region of interest (ROI). There are two general approaches
`in this direction. The first restricts the search on the
`predicted path of the vehicle by defining a search region
`within a trapezoid on the image plane, which is located
`through the perspective transform. The second approach
defines small search windows located at the expected position of the lane, separated by short spatial distances. A rough prediction of the lane position at subsequent video frames can greatly accelerate the lane detection process. In
`one scheme, the estimated lane borders at the previous
`frame can be expanded, making the lane virtually wider, so
`that the actual lane borders at the next frame are searched
`for within this expanded ROI [9]. In a different scheme,
`a least squares linear fit is used to extrapolate lane markings
`and locate the new search windows at the next frames [10].
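A minimal sketch of this window-placement scheme follows. It is only an illustration of the idea, not the implementation of Ref. [10]; the linear model col = a*row + b, the window half-width and the example coordinates are assumptions.

```python
# Sketch: fit a line to lane-marking detections from earlier rows,
# then extrapolate it to place search windows on later rows.
import numpy as np

def place_search_windows(rows, cols, next_rows, half_width=20):
    """Least-squares fit col = a*row + b, then extrapolate."""
    a, b = np.polyfit(np.asarray(rows, float), np.asarray(cols, float), 1)
    centers = a * np.asarray(next_rows, float) + b
    # Each window spans [center - half_width, center + half_width].
    return [(r, int(c - half_width), int(c + half_width))
            for r, c in zip(next_rows, centers)]

# e.g. detections drifting right between rows 400 and 440 predict
# window positions further down the image:
windows = place_search_windows([400, 420, 440], [310, 318, 327],
                               next_rows=[460, 480, 500])
```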
`Following the process of lane detection on the image
`plane, the result must be mapped on the road (world)
`coordinate system for navigation purposes. By assuming a
`flat road model, the distance of a 3D-scene point on the road
`plane can be readily computed if we know the transform-
`ation matrix between the camera and the vehicle coordinate
`systems. In more general cases, the road geometry has to be
`estimated in order to derive the transformation matrix
`between the vehicle and the road coordinate systems. The
`aspects of relative position estimation are further considered
`in Section 3, along with the object detection process.
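Under the flat-road assumption, this mapping reduces to intersecting each pixel's viewing ray with the ground plane. The sketch below illustrates the geometry for a hypothetical pinhole camera with known intrinsics (fx, fy, cx, cy), mounting height and downward pitch; the coordinate conventions are assumptions made for this illustration.

```python
# Sketch: back-project a pixel to the flat road plane Y = 0.
import numpy as np

def pixel_to_road(u, v, fx, fy, cx, cy, cam_height, pitch):
    """World frame: X right, Y up, Z ahead along the road; the camera
    sits at (0, cam_height, 0) and is pitched down by `pitch` rad."""
    # Viewing ray in camera coordinates (x right, y down, z optical axis).
    d_cam = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    c, s = np.cos(pitch), np.sin(pitch)
    # Camera-to-world rotation: flip y (down -> up), then pitch about X.
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, c, -s],
                  [0.0, s, c]]) @ np.diag([1.0, -1.0, 1.0])
    d = R @ d_cam
    if d[1] >= 0.0:
        return None                 # ray does not hit the ground ahead
    t = cam_height / -d[1]          # solve cam_height + t * d_Y = 0
    X, _, Z = t * d
    return X, Z                     # lateral offset, distance ahead
```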
`
`2.3. Automatic lane finding approaches
`
`The fundamental aspects of ALF approaches are
`considered and reviewed in this section. These approaches
are classified into lane-region detection, feature-driven and model-driven approaches.
`
`2.3.1. Lane-region detection
`One method of automatic lane finding with stationary
`camera can be based upon accumulating a map of significant
scene change [5]. The so-called activity map distinguishes
`between active areas of the scene where motion is occurring
`(the road) and inactive areas of no significant motion (e.g.
`verges, central reservation). To prevent saturation and allow
`adaptation to changes of the scene, the map generation also
`incorporates a simple decay mechanism through which
`previously active areas slowly fade from the map. Once
`formed, the activity map can be used by a lane finding
`algorithm to extract the lane positions [1].
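A minimal sketch of such an activity map is given below, assuming a fixed camera and 8-bit grayscale frames; the gain, decay rate and thresholds are illustrative assumptions rather than values from Refs. [1,5].

```python
# Sketch of an activity map with a simple decay mechanism: frame
# differencing reinforces active (road) pixels, decay fades old ones.
import numpy as np

class ActivityMap:
    def __init__(self, shape, gain=0.05, decay=0.995, diff_thresh=15):
        self.map = np.zeros(shape, dtype=np.float32)
        self.prev = None
        self.gain, self.decay, self.diff_thresh = gain, decay, diff_thresh

    def update(self, frame):
        frame = frame.astype(np.float32)
        if self.prev is not None:
            active = np.abs(frame - self.prev) > self.diff_thresh
            self.map *= self.decay           # inactive areas slowly fade
            self.map += self.gain * active   # motion reinforces the map
            np.clip(self.map, 0.0, 1.0, out=self.map)  # avoid saturation
        self.prev = frame
        return self.map > 0.5                # mask of active (lane) pixels
```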
`The lane-region analysis can be also modeled as a
`classification problem, which labels image pixels into road
`and non-road classes based on particular features. A typical
classification problem involves the steps of feature extraction, feature decorrelation and reduction, clustering
`and segmentation. For road segmentation applications, two
`particular features have been used, namely color and
texture [11,12]. In the case of color, the features are defined by the spectral response of the illumination in the red, green and blue bands. At each pixel, the (R,G,B) value defines the feature vector, and the classification can be performed directly on the (R,G,B) scatter diagram of the image [12]. The green band contributes very little to the separation of classes in natural scenes, and on the (R,B) plane classification can be performed through a linear discriminant function [12], since road pixels cluster nicely, distinct from non-road pixels. The classification process can be based on piecewise linear discriminant functions, in order to account for varying color conditions on the road (shading, reflections, etc.) [12]. The road segmentation can also be performed using stochastic pattern recognition approaches. One can define many classes representing road and/or non-road segments. Each class is represented by its mean and variance of (R,G,B) values and its a priori likelihood based on the expected number of pixels in each class. Gaussian distributions have been used to model the color classes [11].
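As an illustration of this stochastic formulation, the sketch below labels pixels using per-class Gaussian (R,G,B) models weighted by a priori likelihoods. The class statistics here are placeholder assumptions; in practice they are estimated from training data and adapted online.

```python
# Sketch: per-class Gaussian color models for road/non-road labelling.
import numpy as np
from scipy.stats import multivariate_normal

classes = [  # (name, mean RGB, covariance, a priori likelihood)
    ("road",     np.array([90.0, 90.0, 95.0]),  np.eye(3) * 180.0, 0.6),
    ("non-road", np.array([70.0, 110.0, 60.0]), np.eye(3) * 400.0, 0.4),
]

def classify_pixels(rgb):           # rgb: (N, 3) array of pixel values
    scores = np.stack([
        prior * multivariate_normal.pdf(rgb, mean=mu, cov=cov)
        for _, mu, cov, prior in classes], axis=-1)
    return scores.argmax(axis=-1)   # index of the most likely class
```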
The apparent color of an object is not consistent, due to several factors. It depends on the illuminant color, the reflectivity of the object, the illumination and viewing geometry, and the sensor parameters. The color of a scene may vary with time, cloud cover and other atmospheric conditions, as well as with the camera position and orientation. Thus, color as a feature for classification requires special treatment and normalization to ensure consistency of the classification results. Once the road has been localized in an image, the color statistics of the road and off-road models need to be updated for each class, adapting the process to changing conditions [13]. The hue, saturation, value (HSV) space has also been used as more effective for classification [14].
`Besides color, the local texture of the image has been
`used as a feature for classification [11,15]. The texture of the
`road is normally smoother than that of the environment,
`allowing for region separation in its feature space. The
`texture calculation can be based on the amplitude of the
gradient operator at each image area. Ref. [11] uses a normalized gradient measure based on a high-resolution and a low-resolution (smoothed) image, in order to handle shadow interiors and boundaries. Texture classification is performed through stochastic pattern recognition techniques and unsupervised clustering. Since the road surface is poorly textured and differs significantly from objects (vehicles) and background, grey-level segmentation is likely to discriminate the road surface area from other areas of interest. Unsupervised clustering on the basis of the C-means algorithm or Kohonen self-organizing maps can be employed on a 3D input space of features. Two of these features signify the position and the third the grey level of each pixel under consideration. Thus, the classifier groups together neighboring pixels of similar intensities [16].
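The following sketch illustrates such clustering with a plain C-means (k-means) loop on the (row, column, grey-level) feature space; the feature scaling, cluster count and iteration budget are assumptions chosen for illustration.

```python
# Sketch: C-means grouping on a 3D feature space of pixel position
# (row, column) and grey level, so that neighboring pixels of similar
# intensities end up in the same cluster.
import numpy as np

def cmeans_segment(gray, k=4, iters=20, intensity_weight=2.0, seed=0):
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.stack([ys.ravel() / h, xs.ravel() / w,
                      intensity_weight * gray.ravel() / 255.0], axis=1)
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(
            ((feats[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):            # recompute each cluster mean
            if np.any(labels == j):
                centers[j] = feats[labels == j].mean(axis=0)
    return labels.reshape(h, w)       # per-pixel cluster labels
```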
The classification step must be followed by a region merging procedure, so as to combine similar small regions under a single label. Region merging may utilize other sources of information, such as motion. In essence, a map of static regions obtained by simple frame differencing can provide information about the motion activity of neighboring patches that are candidates for merging [16]. Texture classification can also be effectively combined with color classification, based on the confidence of the two classification schemes [11].
`
`2.3.2. Feature-driven approaches
`This class of approaches is based on the detection of
`edges in the image and the organization of edges into
`meaningful structures (lanes or lane markings) [17]. This
`class involves, in general, two levels of processing, i.e.
`feature detection and feature aggregation. The feature
`detection part aims at extracting intensity discontinuities.
`To make the detection more effective, a first step of image
`enhancement is performed followed by a gradient operator.
`The dominant edges are extracted based on thresholding of
`the gradient magnitude and they are refined through
thinning operators. At this stage, the direction of edges at each pixel can be computed from the phase of the gradient, and the curvature of line segments can be estimated from neighborhood relations.
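A sketch of this feature-detection stage is given below using standard smoothing and Sobel operators (the thinning step is omitted for brevity); the magnitude threshold is an illustrative assumption.

```python
# Sketch: enhancement (smoothing), gradient operator, magnitude
# thresholding, and per-pixel edge direction from the gradient phase.
import cv2
import numpy as np

def detect_edges(gray, mag_thresh=60.0):
    smoothed = cv2.GaussianBlur(gray, (5, 5), 1.5)   # enhancement step
    gx = cv2.Sobel(smoothed, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(smoothed, cv2.CV_32F, 0, 1, ksize=3)
    magnitude = np.hypot(gx, gy)
    direction = np.arctan2(gy, gx)                   # gradient phase
    edges = magnitude > mag_thresh                   # dominant edges
    return edges, direction
```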
`Feature aggregation organizes edge segments into
`meaningful structures (lane markings) based on short-
`range or long-range attributes of the lane. Short-range
aggregation considers local lane fitting into the edge
`structure of the image. A realistic assumption that is often
`used requires that the lane (or the lane marking) width does
`not change drastically. Hence, meaningful edges of the
`video image are located at a certain distance apart, in order
to fit the lane-width model. Long-range aggregation relies on a line intersection model, under the assumption of smooth road curvature. Thus gross road boundaries and
`markings must be directed towards a specific point in the
`image, the focus of expansion (FOE) of the camera system.
`Along these directions, Ref. [4] detects lane markings
`through a horizontal (linear) edge detector and enhances
`vertical edges via a morphological operator. For each
`horizontal line, it then forms correspondences of edge points
`to a two-lane road model (three lane markings) and
`identifies the most frequent lane width along the image,
`through a histogram analysis. All pairs of edge pixels (along
`each horizontal line) that fall within some limits around this
`width are considered as lane markings and corresponding
`points on different scan lines are aggregated together as
`lines of the road. A similar approach is used in Ref. [18] for
`auto-calibration of the camera module. The Road Markings
`Analysis (ROMA) system is based on aggregation of the
`gradient direction at edge pixels in real-time [19]. To detect
`edges that are possible markings or road boundaries, it
`employs a contour following algorithm based on the range
`of acceptable gradient directions. This range is adapted in
`real-time to the current state variables of the road model.
`The system can cope with discontinuities of the road borders
`and can track road intersections.
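A compact sketch of the width-histogram aggregation described above follows. The pairing of successive edge columns per scan line and the width limits are simplifying assumptions, not the exact procedure of Ref. [4].

```python
# Sketch: accumulate candidate widths between edge pixels along each
# scan line, find the modal lane-marking width via a histogram, and
# keep only the edge pairs close to that width.
import numpy as np

def modal_width_pairs(edge_rows, max_width=60, tol=3):
    """edge_rows: list of sorted arrays of edge columns, one per line."""
    widths = [b - a for cols in edge_rows
              for a, b in zip(cols[:-1], cols[1:]) if b - a <= max_width]
    hist, bins = np.histogram(widths, bins=range(max_width + 2))
    modal = bins[np.argmax(hist)]             # most frequent width
    kept = [[(a, b) for a, b in zip(cols[:-1], cols[1:])
             if abs((b - a) - modal) <= tol] for cols in edge_rows]
    return modal, kept
```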
Ref. [20] detects brightness discontinuities and retains only long straight lines that point toward the FOE. For each edge point, it preserves the edge direction and the neighboring line curvature, and performs a first elimination of edges based on thresholding of the direction and curvature. This is done to preserve only straight lines that point towards the specific direction of the FOE. The feature aggregation is performed through correlation with a synthetic image that encodes the road structure for the specific FOE. The edge detection can be performed efficiently through morphological operators [21–23].
`The approach in Ref. [10] operates on search windows
`located along the estimated position of the lane markings.
For each search window, the edges of the lane marking are determined as the locations of maximum positive and negative horizontal changes in illumination. These edge points are then aggregated as boundaries of the lane marking (paint stripe) based on their spacing, which should approximate the lane-marking width. The detected lanes at near range are extrapolated to far range via a linear least-squares fit, to provide an estimated lane-marking location for placing the subsequent search windows. The locations of the road markings, along with the state of the vehicle, are used in two different Kalman filters to estimate the near- and far-range road geometry ahead of the vehicle [10]. Prior knowledge of the road geometry imposes strong constraints on the likely location and orientation of the lanes.
`Alternatively, other features have been proposed that
`capture information about the orientation of the edges, but
`are not affected drastically by extraneous edges. Along these
`lines, the LANA algorithm [24] uses frequency-domain
`features rather than features directly related to the detected
`edges. These feature vectors are used along with a
`deformable-template model of the lane markers in a
`Bayesian estimation setting. The deformable template
`introduces a priori information, whereas the feature vectors
`are used to compute the likelihood probability. The
`parameters of the deformable template are estimated by
`optimizing the resulting maximum a posteriori objective
function [24]. Simpler linear models are used in Ref. [14] for road boundaries and lane markings, with their parameters estimated via a recursive least squares (RLS) filter fit on candidate edge points.
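The sketch below shows an RLS update of this kind for a linear border model col = a*row + b, processed one candidate edge point at a time. The forgetting factor and initialization are illustrative assumptions rather than the settings of Ref. [14].

```python
# Sketch: recursive least squares (RLS) fit of col = a*row + b.
import numpy as np

class RLSLine:
    def __init__(self, forgetting=0.98):
        self.theta = np.zeros(2)            # parameters [a, b]
        self.P = np.eye(2) * 1e4            # parameter covariance
        self.lam = forgetting

    def update(self, row, col):
        phi = np.array([row, 1.0])          # regressor
        k = self.P @ phi / (self.lam + phi @ self.P @ phi)   # gain
        self.theta += k * (col - phi @ self.theta)           # innovation
        self.P = (self.P - np.outer(k, phi @ self.P)) / self.lam
        return self.theta                   # current (slope, intercept)
```

Because old measurements are discounted by the forgetting factor, the fit tracks slowly drifting borders without refitting from scratch at every frame.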
In general, feature-driven approaches are highly dependent on the methods used to extract features, and they suffer from noise effects and irrelevant feature structures. Often in practice the strongest edges are not the road edges, so the detected edges do not necessarily fit a straight-line or smoothly varying model. Shadow edges can appear quite strong, severely affecting the line tracking approach.
`
`2.3.3. Model-driven approaches
`In model-driven approaches the aim is to match a
`deformable template defining some scene characteristic to
`the observed image, so as to derive the parameters of the
`model that match the observations. The pavement edges and
`lane markings are often approximated by circular arcs on a
`flat-ground plane. More flexible approaches have been
`considered in Refs. [25,26] using snakes and splines to
`model road segments. In contrast to other deformable line
models, Ref. [26] uses a spline-based model that describes the perspective effect of parallel lines, considering the borders on both sides of the road lane simultaneously. For small to
`moderate curvatures, a circular arc is approximated by a
`second-order parabola, whose parameters must be esti-
`mated. The estimation can be performed on the image plane
`[27] or on the ground plane [24] after the appropriate
`perspective mapping. Bayesian optimization procedures are
`often used for the estimation of these parameters.
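As a worked illustration of the parabolic approximation, the sketch below fits col = c2*row^2 + c1*row + c0 to candidate border points by least squares. The coordinate convention is an assumption; in practice the fit may be carried out on the ground plane after the perspective mapping.

```python
# Sketch: least-squares fit of the second-order parabola that
# approximates a circular-arc lane border.
import numpy as np

def fit_parabola(rows, cols):
    """Fit col = c2*row**2 + c1*row + c0 to candidate edge points."""
    c2, c1, c0 = np.polyfit(np.asarray(rows, float),
                            np.asarray(cols, float), 2)
    return c2, c1, c0

# For small curvatures, the arc curvature is proportional to 2*c2.
```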
`
`Model-based approaches for lane finding have been
`extensively employed in stereo vision systems, where the
`estimation of the 3D structure is also possible. Such
approaches assume a parametric model of the lane geometry, and a tracking algorithm estimates the parameters
`of this model from feature measurements in the left and
`right images [28]. In Ref. [28] the lane tracker predicts
`where the lane markers should appear in the current image
`based on its previous estimates of the lane position. It then
`extracts possible lane markers from the left and right
`images. These feature measurements are passed to a robust
`estimation procedure, which recovers the parameters of the
`lane along with the orientation and height of the stereo rig
`with respect to the ground plane. The Helmholtz shear
`equation is used to verify that candidate lane markers
`actually lie on the ground plane [28]. The lane markers are
`modeled as white bars of a particular width against a darker
`background. Regions in the image that satisfy this intensity
`profile can be identified through a template matching
`procedure. In this form, the width of the lane markers in
`the image changes linearly as a function of the distance from
`the camera, or the location of the image row considered.
Thus, different templates are used at different image locations along the length of the road, in both the left and
`right images. Once a set of candidate lane markers has been
`recovered, the lane tracker applies a robust fitting procedure
using the Hough transform to find the set of model parameters which best match the observed data [28]. A robust fitting strategy is absolutely essential in traffic
`applications, because on real highway traffic scenes the
`feature extraction procedure almost always returns a
`number of extraneous features that are not part of the lane
structure. These extra features can come from a variety of sources: other vehicles on the highway, shadows, or cracks in the roadway, etc.
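A minimal sketch of such a Hough-based robust fit is given below, using OpenCV's standard Hough transform on a binary mask of candidate lane-marker pixels. The vote threshold is an illustrative assumption, and the full system of Ref. [28] additionally constrains the recovered lines with the stereo geometry.

```python
# Sketch: robust line fitting with the Hough transform. Extraneous
# features (shadows, cracks, other vehicles) rarely accumulate enough
# votes, so the dominant lane-marker lines survive.
import cv2
import numpy as np

def dominant_lines(marker_mask, min_votes=80):
    """marker_mask: uint8 binary image of candidate lane-marker pixels."""
    lines = cv2.HoughLines(marker_mask, 1, np.pi / 180, min_votes)
    # Each entry is a (rho, theta) line in normal form.
    return [] if lines is None else [tuple(l[0]) for l in lines]
```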
`Another class of model-driven approaches involves the
`stochastic modeling of lane parameters and the use of
`Bayesian inference to match a road model to the observed
`scene. The position and configuration of the road, for
`instance, can be considered as variables to be inferred from
`the observation and the a posteriori probability conditioned
`on this observation [25,29]. This requires the description of
`the road using small segments and the derivation of
`probability distributions for the relative positions of these
`segments on regular road scenes (prior distribution on road
geometry). Moreover, it requires the specification of
`probability distributions for observed segments, obtained
`using an edge detector on the observed image, conditioned
`on the possible positions of the road segments (a posteriori
`distribution of segments). Such distributions can be derived
`from test data [29].
`The 3D model of the road can also be used in modeling
`the road parameters through differential equations that relate
`motion with spatial changes. Such approaches using state-
`variable estimation (Kalman filtering) are developed in
`Refs. [30,31]. The road model consists of skeletal lines
`
`pieced together from clothoids (i.e. arcs with constant
`curvature change over their run length). The road assump-
`tions define a general highway scene, where the ground
`plane is flat, the road boundaries are parallel with constant
`width, the horizontal road curvature changes slowly (almost
`linearly) and the vertical curvature is insignificant. Assum-
`ing slow speed changes, or piecewise constant speed, the
`temporal change of curvature is linearly related to the speed
`of the vehicle. Thus, the curvature parameters and their
`association with the ego-motion of the camera