`
`Sylvia Gil1
`Ruggero Milanese
`Thierry Pun
`
`Computer Science Department
`University of Geneva, Switzerland
`E-mail: gil@cui.unige.ch
`
`ABSTRACT
`
This paper describes a motion-analysis system applied to the problem of vehicle tracking in real-world highway scenes.
The system is structured in two stages. In the first one, a motion-detection algorithm performs a figure/ground segmentation,
providing binary masks of the moving objects. In the second stage, vehicles are tracked for the rest of the sequence, by using
Kalman filters on two state vectors, which represent each target's position and velocity. A vehicle's motion is represented
by an affine model, taking into account translations and scale changes. Three types of features have been used for the
vehicle's description state vectors. Two of them are contour-based: the bounding box and the centroid of the convex polygon
approximating the vehicle's contour. The third one is region-based and consists of the 2-D pattern of the vehicle in the
image. For each of these features, the performance of the tracking algorithm has been tested, in terms of the position error,
stability of the estimated motion parameters, trace of the motion model's covariance matrix, as well as computing time. A
comparison of these results appears in favor of the use of the bounding box features.
`
`Keywords: traffic scenes, motion detection, Kalman filter, tracking, feature comparison.
`
`1. INTRODUCTION
`
Computer vision techniques can be useful in traffic control in order to increase safety and obtain road-state information for
monitored areas. For instance, the ability to extract complex, high-level road information, such as congestion, accidents
or free-flowing traffic, makes it possible to plan a path through the road network efficiently, to bring rescue quickly where
needed, or to divert the traffic. In order to extract this type of information it is first necessary to segment moving objects
from the scene. In this way, vehicles can be counted, and their trajectory, as well as their velocity and acceleration, can be
determined. Moreover, statistics can be collected from kinematic parameters in order to classify the state of traffic as safe,
fluid, congested or dangerous.
`
One of the major difficulties of monitoring traffic scenes, along with the real-time requirement, is the variety of lighting
conditions of outdoor scenes. Indeed, the system should be reliable day and night, even though at night only vehicle lights
are visible. Weather conditions also bring additional difficulties, such as the presence of vehicle shadows on sunny days
(shadows can prevent nearby vehicles from being segmented correctly) or a change in the contrast between the road and the
vehicles when raining (a wet road is darker and generally dries irregularly). Thus, it is necessary to have a system able to
adapt to these different lighting conditions by exploiting different visual features according to their reliability under such
conditions. This paper presents a comparison of the ability of different features to be recovered and tracked in an image
sequence.
`
1. Part of this research was conducted while the first author was a visiting scholar at the International Computer Science Institute,
Berkeley, California, thanks to a grant from the Swiss National Fund for Scientific Research (4023-027036).
`
0-8194-1677-0/95/$6.00
`
SPIE Vol. 2344 Intelligent Vehicle Highway Systems (1994) / 253
`
`
`SAMSUNG EXHIBIT 1019
`Samsung v. Image Processing Techs.
`
`
`
Surveillance of urban and highway scenes has been widely studied in the past five years, thus providing a large amount
of literature. One of the most popular methods, called model-based tracking, uses a 3-D model of a vehicle and is structured
in two steps: (i) computation of scale, position and 3-D orientation of the modeled vehicle, also called pose recovery, and
(ii) tracking of the vehicle by fitting the model in subsequent frames by means of maximum-a-posteriori (MAP) techniques1
or Kalman filters2,3. The vehicle model being quite detailed (a 3-D model including the shadow), model-based tracking
provides an accurate estimate (or recovery) of the vehicle's 3-D position, which might not be needed for most applications.
A simplified model of the vehicle has also been proposed, in which the vehicle is represented through a polygon, with a
fixed number of vertices, enclosing the convex hull of some vehicle features. This model dramatically reduces the vehicle
model complexity. In that approach, Kalman filters are used in order to track the vehicle's position as well as its motion,
using an affine model which allows for translation and rotation. The fixed number of polygon vertices, however, allows
little variation in the object's shape. Some improvements on this point have been proposed through the use of dynamic
contours instead of polygons with a fixed number of vertices. Cubic B-splines are fitted on a set of control points (vertices)
belonging to the target, thus providing a smooth parametric curve approximating its contour. In this case, a Kalman filter is
used in order to track the curve in subsequent frames with a search strategy guided by the local contrast of the target in the
image, i.e. with no use of the motion information. In the context of traffic scenes, especially in the case of highways, the
vehicle's motion should be a powerful cue in order to direct the search for the target position in subsequent frames. Another
system that combines an active-contour model with Kalman filtering has been presented in6. In this case, the use of separate
filters for the vehicle position and the other motion parameters (affine model: translation and scale) has been shown to
provide better results.
`
In consideration of this previous work, the approach described in this paper is based on the following points. First,
advantage is taken of the simplicity of the targets' profiles (man-made vehicles), which can be well approximated by simple
geometric models such as convex polygons; no restriction on the number of vertices should be needed. Motion information
in terms of an affine model (translation and scale) is used, as well as local contrast, in order to locate the vehicle in
subsequent frames, by means of two separate Kalman filters. Finally, multiple features are tracked in the same image
sequence and their performances are compared in terms of robustness, CPU time, and error measures. The rest of this paper
is organized in the following way: Section 2 presents a motion detection system which discriminates between static
background and dynamic objects and provides a set of binary masks coarsely representing the moving objects. Once moving
objects are isolated, their mask shape is refined until their boundary accurately matches their contour (Section 3). After the
mask refinement is accomplished, a set of features, such as the mask contour, the pattern describing the target itself, and its
center of gravity, is computed for each vehicle, in order to be tracked in subsequent frames (Section 4). In Section 5, the
tracking procedure is described. Results are presented in Section 6, followed by a discussion. Finally, conclusions are
presented in Section 7.
`
`2. THE MOTION DETECTION SYSTEM
`
The goal of the motion detection module is to perform a segmentation between static and dynamic regions in an image
sequence by providing a set of binary masks which coarsely represent the shape and the position of the moving objects. The
method is required to be fast, since it represents a preprocessing step for motion computation and tracking. For this purpose,
it operates on low-level data such as spatio-temporal derivatives or image differences rather than on optical flow
information.
`
2.1 Related work
`
Motion detection has been studied in different contexts such as video coding, surveillance, or traffic control. Differential
methods are based on the subtraction of subsequent frames in order to get rid of the constant background and process only
the moving regions of the image. An example of this method is described in7: after performing the difference between
successive frames, a 2-D median filter is applied on the difference image in order to smooth the mask boundaries; finally,
small regions are eliminated. This strategy is strongly affected by the aperture problem when moving objects contain large
regions of uniform gray level. In this case, parts of these objects are considered static and the resulting masks, despite the
median regularization, appear oversegmented. A related approach, called the background method, aims at reconstructing the
background using the spatial and temporal derivatives. When an accurate approximation of the background is available, it is
subtracted from each frame in order to enhance moving objects. The background image has to be updated to account for
`
`
`
`
changing external conditions (e.g. clouds). An example of this method is given in8, in which a Kalman filter is used to
update the background image. This method requires a certain number of frames until a reliable background is available, but
its adaptability is a very attractive feature.
`
Other methods such as9 exploit motion coherence through MAP techniques in order to separate objects undergoing
different motions. This method minimizes, through a deterministic relaxation procedure, an energy function which combines
a regularization term and a measure of match between spatio-temporal derivatives and the motion assigned to each region.
This method, however, requires the computation of motion parameters and therefore does not meet our requirements
described above. MAP techniques have also been used in order to compute global thresholds that segment images into static
and dynamic areas10. Global thresholds are first computed according to the noise probability density function (pdf) of the
difference images. Since segmentation through global thresholding does not provide well-segmented masks, local
refinements are then applied on this preliminary data, based on the MAP criterion. However, this method still leads to
oversegmentation, i.e. to many isolated masks which are actually parts of the same moving object. Also, a major drawback
of MAP techniques is the large processing power they require.
`
`2.2 Motion detection with multiscale relaxation
`
In contrast to the previous approaches, our method is based on the simple difference of subsequent frames and requires only
two frames in order to provide satisfactory results (see11 for more details). The aperture problem, at the basis of the
oversegmentation artifacts, is solved by the use of a multiresolution pyramid $I^l_{x,y}(t)$ for each frame $I_{x,y}(t)$ of the
input sequence. At each level $l$ of the pyramid ($l = 0, \ldots, \log_2(\mathrm{image\_size})$), first estimates of motion are
obtained by computing temporal image differences:

$$D^l_{x,y}(t) = I^l_{x,y}(t) - I^l_{x,y}(t-1). \qquad (1)$$

Local differences $D^l_{x,y}(t)$ provide two motion contributions: through their magnitude, and through the locations of sign
changes. These two factors are locally combined to form the first motion estimates $E^l_{x,y}(t)$ (see Figure 1.b). High-
resolution levels of $E^l_{x,y}(t)$ have a better spatial localization, but may only yield information at the object boundaries.
Lower-resolution levels help solve the aperture problem, by filling in the interior of moving objects having constant grey
level.
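As an illustration, the per-level temporal differencing of equation (1) can be sketched as follows. The 2x2 block-averaging pyramid is an assumption made for illustration (the paper does not specify its pyramid kernel), and the function names are ours:

```python
import numpy as np

def build_pyramid(img, levels):
    """Multiresolution pyramid by 2x2 block averaging (assumed kernel)."""
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        h, w = pyr[-1].shape
        # Crop to even dimensions, then average each 2x2 block.
        pyr.append(pyr[-1][:h - h % 2, :w - w % 2]
                   .reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return pyr

def temporal_differences(frame_t, frame_prev, levels=3):
    """Per-level differences D^l(t) = I^l(t) - I^l(t-1), as in Eq. (1)."""
    return [a - b for a, b in zip(build_pyramid(frame_t, levels),
                                  build_pyramid(frame_prev, levels))]
```

Each returned array is the difference image of one pyramid level; the coarsest levels fill in the interior of uniform moving regions, as described above.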
`
Multiple-resolution motion estimates $E^l_{x,y}(t)$ are combined through a coarse-to-fine pyramidal relaxation process. Its
goal is to locally propagate the pixel values horizontally within each level, as well as vertically, across contiguous levels of
the pyramid. The "horizontal" component consists of a diffusion process within each pyramid level, to fill in gaps and
reduce noise. The "vertical" component of the relaxation process combines information at location $(x, y)$ of level $l$ with
that at locations $(2x+i, 2y+j)$, $i, j \in \{0, 1\}$, at the higher resolution level $l-1$ of the pyramid. The updating rule of
the vertical component is defined by a multiplicative factor $\gamma_{x,y} \cdot \Delta_{x,y}$, in which $\gamma_{x,y}$ is a
scaling coefficient. The increment $\Delta_{x,y}$ is defined as a function of the difference image $D^{l+1}(t)$. That is, if the
value of $D^{l+1}_{x,y}(t)$ is smaller than a threshold (proportional to the estimated image noise), then $\Delta_{x,y}$ is the
quadratic term $f_1$, and otherwise it is given by $f_2$:

$$f_1 = -k_1 \cdot (D^{l+1}_{x,y})^2, \qquad (2)$$

$$f_2 = g(D^{l+1}_{x,y} - k_2), \qquad (3)$$

where $g(a)$ is a sigmoidal function of the type $1/(1 + e^{-a})$, and $k_1$, $k_2$ are positive constants. This algorithm
corresponds to pushing the values of the estimates $E^l_{x,y}$ further towards either 0 or 1.

After the application of this algorithm, the full-resolution image at the bottom of the pyramid contains a binary mask of
the moving objects $M(x, y)$. Due to the diffusion component of the relaxation process, the shape of these regions tends to
be "convex", and to adapt to the shape of the underlying objects. Figure 1 presents the results on the sequence "walking".
Despite the shadows and other reflecting surfaces, the resulting masks correspond well to the shape of the moving object.
`
`
`
`
Figure 1: Results of the motion detection module on the "walking" sequence: (a) first frame of the sequence
$I_{x,y}(t=0)$; (b) estimate $E^0_{x,y}$; (c) resulting binary mask superimposed on the original image.
`
`3. MASK REFINEMENT AND PROPAGATION
`
3.1 Mask refinement
`
The masks provided by the motion detection module represent a coarse description of the moving object that must be
refined. The strategy proposed in this section refines the initial mask boundaries according to the magnitude of spatio-
temporal derivatives in the proximity of the initial mask. A similar approach has been adopted in the field of medical
imaging for tracking contours of moving organs and cells, and several solutions of different complexity have been proposed.
Geiger and Vlontzos12 present a method to match the inner and outer boundaries of a moving heart wall, by defining a cost
function to be minimized. In this context the two contours are known in advance. The cost function thus only takes into
account a smoothness constraint on the motion field and a penalty factor for large unmatched arcs of boundaries, while
ignoring the image intensity or gradient. Although this approach is not applicable in our context (the final contour is not
known in advance), the regularization terms apply in both situations. Leymarie and Levine13 describe a tracking mechanism
for moving cells by means of active contours (snakes). In their case, an initial snake is matched in the following image using
a potential surface which takes into account the image intensity. Low-pass and band-pass pyramids of the potential surface
are constructed in order to let the snake evolve while avoiding local minima. Snakes are a suitable representation for objects
that undergo non-rigid motion, and in the context of vehicles, some simplifications can easily be applied.
`
In the present work, advantage is taken of the geometric simplicity of vehicles by approximating their profile with a
convex polygon, in a similar way to6. The convexity assumption is not restrictive in the case of vehicles because in most
projections their profiles are fairly compact and are thus well approximated by a convex polygon. This assumption
considerably simplifies the matching step required by the tracking procedure, since it allows problems such as contour
regularization to be bypassed. Furthermore, an extensive literature exists on the topic of convex hull computation14. For
each resolution level $l$, the binary mask $M^l(x, y)$ contains a number of regions $R_1, \ldots, R_N$ representing moving
targets. Due to the properties of the relaxation process, these regions tend to be slightly larger than the underlying objects.
For this reason, the refining process uses each region $R_i$ as a search window for the smallest convex polygon $P_i$
containing the set of key points $\{(x_1, y_1), \ldots, (x_J, y_J)\}$. Each key point $(x_j, y_j)$ within a region $R_i$ is
defined as a point where the spatio-temporal derivatives $D^l(x_j, y_j)$ exceed a certain threshold. These operations are
actually limited to low-resolution images in order to save computation time. Figure 2 shows the coarse initial masks issued
from the motion detection algorithm (2.a) versus the contours of the final mask after the refinement process (2.b).
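The convex polygon of each region can be computed with any standard convex-hull routine from the literature cited above; as a minimal sketch over the thresholded key points, Andrew's monotone-chain algorithm:

```python
def convex_hull(points):
    """Andrew's monotone-chain convex hull of 2-D key points.
    Returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o): > 0 for a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:                       # build lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):             # build upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]      # drop duplicated endpoints
```

Note that, unlike a fixed-vertex polygon model, the number of hull vertices adapts freely to the target's shape, which is the property exploited here.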
`
`3.2 Mask propagation to higher resolutions
`
`The propagation of the refined mask to higher resolutions is performed by iteratively repeating the projection procedure
`
`
`
`
from the top of the pyramid down to its bottom, until the full-resolution image is reached. A straightforward method for this
projection procedure is the search for the maximum spatio-temporal gradient in a window defined by the 2x2 neighborhood
$(2x+i, 2y+j)$, $i, j \in \{0, 1\}$, where $(x, y)$ is a polygon vertex at the coarse level. However, this method is not
satisfactory, since the resulting contour at the lower level tends to slide to neighboring and even static contours of the
image, thus deteriorating the quality of the mask shape. In order to avoid this problem, the search is limited to the window
obtained by scaling the refined mask of level $l$ to the higher resolution $l-1$. The search window thus will contain the
spatio-temporal gradient limited to the regions of the refined mask of the higher level. Figure 2 (b) and (c) shows the results
of the contour propagation through different resolution levels for different image sequences.
`
`4. FEATURES TO TRACK
`
Once the target has been accurately isolated from the background, some features must be chosen, in order to be tracked over
successive image frames. Several choices are possible for target features, such as the target's color, its contour, or a pattern
defining its spatial layout. Two tendencies appear to emerge in the existing literature, corresponding to a representation of
the target based on its contour and one based on its description as a region. Both representations have advantages and
drawbacks. Contour-based approaches15 are fast, since they are based on the (efficient) detection of spatio-temporal
gradients. Their major drawback is that the contours of an object in an image do not always have a physical meaning.
Indeed, contour extraction depends on the local intensity variation between an object and the background, so that changes in
their relative intensity may cause a contour to disappear. This type of feature is thus reliable only when the contrast between
the target and the background is sufficiently constant.
`
Figure 2: Mask refinement and propagation through the pyramid levels: (a) mask provided by the motion
detection algorithm superimposed on one frame of the sequence; propagation of the refined mask (b) to middle
(c) and high (d) resolution images.
`
`On the other hand, region-based approaches16 represent the target through a 2D-pattern; they are quite accurate and do
`not depend on the background. Their drawbacks are the computing time required for their manipulation (such as pattern
`matching) and the sensitivity of pattern matching techniques to changes in scale and rotation. Their use is most appropriate
`when the target size is small, when a low resolution approximation of the target is available or when other representations
`
`
`
`
`fail. Indeed, if the target size is small, contour-based approaches tend to fail, since the spatio-temporal gradients of the target
`become low, and the target displacement gets to the sub-pixel level. Contour- and region-based approaches thus appear to be
`complementary.
`
In our work, both types of representations will be considered. The contour features are based on the convex polygon
approximation of the target's profile. Two features are taken into account: the bounding box of the convex polygon and its
center of gravity. Both features are derived from approximating convex polygons using a variable number of vertices, but
are represented by a fixed number of parameters (bounding box: two pairs of coordinates; center of gravity: one pair of
coordinates). The region-based feature is the spatial pattern of the target, which is stored in a rectangular window containing
either the target or zeros.
`
`5. THE KALMAN FILTERS
`
Kalman filters are recursive filters which provide an unbiased, minimum-variance and consistent estimate $\hat{x}_k$ of a
state vector $x_k$. The index $k$ represents the discrete time. Kalman filtering consists of a three-step strategy named
prediction, measurement and update. The prediction computes a first estimate of the state vector $\hat{x}_{k+1}(-)$ and of
the covariance matrix $P_k$, defined as $P_k = E[\tilde{x}_k \tilde{x}_k^T]$, where $\tilde{x}_k = \hat{x}_k - x_k$ (capital
letters are used to denote matrices). According to standard dynamic-systems notation, $\hat{x}_k(-)$ denotes the prediction
vector before measurement and $\hat{x}_k(+)$ refers to the updated vector after the measurement. The prediction equations
are based on the previous realizations of the updated vector $\hat{x}_k(+)$ and the updated matrix $P_k(+)$:

$$\hat{x}_{k+1}(-) = \Phi_k \, \hat{x}_k(+), \qquad (4)$$

$$P_{k+1}(-) = \Phi_k \, P_k(+) \, \Phi_k^T + Q_k, \qquad (5)$$

where $Q_k$ is the covariance matrix of the model noise $w_k$: $Q_k = E[w_k w_k^T]$. $Q_k$ reflects the adequacy of the
model to the studied physical system. The measurement step consists of the computation, through image processing
routines, of visual features named the measurements $z_k$. Measurements are related to the state vector through the
observation equation:

$$z_k = H \, x_k + v_k, \qquad (6)$$

where $H$ is the observation matrix and $v_k$ is a measurement error modeled through an uncorrelated noise. The goal of
the observation matrix $H$ is to relate the measurement to the state vector. The final update step modifies the state vector
according to the measurement $z_k$, thus providing an updated estimate $\hat{x}_k(+)$. The update step modifies the state
vector and the covariance matrix through the following equations:

$$K_k = P_k(-) \, H^T \, [H \, P_k(-) \, H^T + R_k]^{-1}, \qquad (7)$$

$$P_k(+) = [I - K_k \, H] \, P_k(-), \qquad (8)$$

$$\hat{x}_k(+) = \hat{x}_k(-) + K_k \, [z_k - H \, \hat{x}_k(-)], \qquad (9)$$

where $R_k$ represents the covariance matrix of the measurement noise $v_k$: $R_k = E[v_k v_k^T]$. The matrix $K_k$
described in equation (7) is also called the Kalman gain, and has the role of modulating the update of the state vector
$\hat{x}_k(-)$ into $\hat{x}_k(+)$ by appropriately weighting the measurement error. In order to qualitatively describe
equation (9), let us consider a matrix norm such as the trace. If the traces of $R_k$ and $P_k$ are respectively small and
large, then $K_k$ becomes large according to equation (7), allowing the update of $\hat{x}_k(+)$. This happens when there
is a small amount of measurement noise, thus justifying the update. Conversely, if the trace of $R_k$ is large and that of
$P_k$ is close to zero, then the elements of $K_k$ converge to zero too, preventing changes to $\hat{x}_k(+)$ and locking
its value to previous estimations which were not corrupted by noise.
`
`
`
`
5.1 Position and velocity filters

The Kalman filters are used to track a set of $N$ visual features of a moving target. For each feature $n = 1, \ldots, N$ we
thus use two state vectors, namely $\hat{x}_{1k}$ and $\hat{x}_{2k}$, to represent respectively their position and
instantaneous velocity. For each of these state vectors, a set of equations similar to equations 4-9 is defined. In the case of
the position, the state vector refers to the estimated 2-D coordinates of the $N$ features in the image, that is:
$\hat{x}_{1k} = (x^1_k, y^1_k, \ldots, x^N_k, y^N_k)^T$. The measurements $z_k$ are the positions of the features, as
computed from the $k$-th image frame. Therefore, there is a straightforward correspondence between the measurements and
the position state vector:

$$z_k = \hat{x}_{1k}(+) + v_{1k}, \qquad (10)$$

where $v_{1k}$ is the measurement error in the position vector. The use of a velocity vector in conjunction with the position
vector allows the definition of an affine motion model, which takes into account the translations along the x and y axes, as
well as the scaling factor $s_k$ representing the shrinking of the target as it moves away from the camera:
$\hat{x}_{2k} = (u_k, v_k, s_k)^T$. The observation matrix $H_{2k}$ for the velocity state vector for a given feature is
defined as:

$$H_{2k} = \begin{pmatrix} 1 & 0 & x_k(-) - x_{c_k} \\ 0 & 1 & y_k(-) - y_{c_k} \end{pmatrix}. \qquad (11)$$

The last column of the matrix $H_{2k}$ represents the vector joining the center of gravity of the target
$c_k = (x_{c_k}, y_{c_k})$ to one of the corners of the bounding box containing the target. A change in this vector indicates
a change in the scale of the target, and allows the estimation of the scale factor $s_k$, the parameter which is responsible
for it.

Since the measurements $z_k$ are the feature positions computed from the $k$-th frame, the position and velocity state
vectors are inter-related in the measurement equation:

$$z_k = H_{2k} \, \hat{x}_{2k}(+) + v_{2k}, \qquad (12)$$

which is another way of writing equation (10), by substituting $\hat{x}_{1k}(+)$ by its value given in equation (13). The
prediction equation for the position vector now takes into account the velocity state vector
$\hat{x}_{2k} = (u_k, v_k, s_k)^T$, yielding the following expression:

$$\hat{x}_{1,k+1}(-) = \hat{x}_{1k}(+) + s_k \, (\hat{x}_{1k}(+) - c_k) + (u_k, v_k)^T + \eta_{1k}. \qquad (13)$$

The prediction equation for the velocity vector $\hat{x}_{2,k+1}(-)$ incorporates a constant deceleration factor $a$, which
for the image sequence analyzed in this paper is fixed to a value smaller than 1. This coefficient allows the system to
compensate for the change in apparent motion of the vehicles while they cross the camera's field of view:

$$\hat{x}_{2,k+1}(-) = a \cdot \hat{x}_{2k}(+) + \eta_{2k}. \qquad (14)$$

This change is mainly due to the relative angle between the road and the projective plane, and may also be amplified by the
presence of curves in the road, and by the use of wide-angle lenses.

To conclude, the update equations are given below. For the position state vector, they are simply obtained from
equations 7-9 by substituting the observation matrix by $Id_2$. For the velocity vector, they are again obtained from
equations 7-9 with some slight modifications:

$$K_k = P_{2k}(-) \, H^T_{2k} \, [H_{2k} \, P_{2k}(-) \, H^T_{2k} + R_k]^{-1}, \qquad (15)$$
`
`
`
`
$$P_{2k}(+) = [Id_3 - K_k \, H_{2k}] \, P_{2k}(-), \qquad (16)$$

$$\hat{x}_{2k}(+) = \hat{x}_{2k}(-) + K_k \, [z_k - H_{2k} \, \hat{x}_{2k}(-)]. \qquad (17)$$
`
`5.2 Measurement step
`
The measurement step provides new information on the state vectors, by detecting the position of the $N$ target features in
the new frame. As described in section 4, the tracking process has been evaluated on three types of features. The first type of
feature is contour-based, in that it is obtained from the smallest convex polygon enclosing high spatio-temporal gradients,
within a search window (see the algorithm introduced in section 3.1 for the mask-refinement step). At each step $k$, the
search window is obtained by translating the previous bounding rectangle of the target's convex hull, according to the
predicted motion. A tolerance margin is added to the window size for safety. This can compensate for possible errors, for
instance those introduced by an incorrect estimation of the predicted motion. Another source of errors degrading the
measurement step is the image parity change (resulting from an error in the image video sampling). This error can produce
artificial temporal gradients at the location of a high spatial gradient, causing a deterioration in the shape of the convex hull
(see section 6 for some examples). Given the search window, a new convex hull is obtained from the thresholded spatio-
temporal gradients. From the resulting polygon, only two parameters are extracted and used for tracking ($N = 2$): the
upper-left and the lower-right corners of its bounding rectangle. This approach leads to a considerable information
compression, and avoids the problem of tracking feature vectors of varying size (the number of vertices of the convex
polygon is generally not constant through successive frames).
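Extracting the two tracked parameters from the refined polygon is straightforward; a minimal sketch (function name is ours):

```python
def bounding_box(hull):
    """Upper-left and lower-right corners of the convex hull's
    bounding rectangle -- the two tracked position features (N = 2)."""
    xs = [p[0] for p in hull]
    ys = [p[1] for p in hull]
    return (min(xs), min(ys)), (max(xs), max(ys))
```

Whatever the number of hull vertices, the feature vector always has the same fixed size, which is what makes the filter state dimensions constant across frames.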
`
The second contour-based method is based on a similar algorithm, although only the centroid of the convex hull
approximation is retained as the feature to track ($N = 1$). This approach represents a further reduction of information,
leading to an even lower spatial localization. A priori, it presents the advantage of being more robust for tracking, since it is
the result of an averaging process.
`
The third class of features is region-based, since they correspond to the 2-D pattern of the target. The measurement step
estimates the target's position in the new image through a normalized correlation technique. The maximum peak in the
correlation surface is selected as the position feature vector in the new image ($N = 1$). In order to improve the results of
this method, an adaptive approach has been used. At each step $k$, the pattern used in the correlation is not fixed, but
selected from the image $k - 1$. This allows the method to account for the slow changes in illumination and in the target's
size (which typically gets smaller for an outgoing car flow). A considerable improvement in the system's performance, in
terms of both computing time and error rate, is obtained by selecting a window enclosing the target, rather than using the
whole image. To this end, a template containing the target is defined at each step. Correlation operations are thus limited to
the smallest rectangular window enclosing this template (surrounded by zeroes). Once the peak $z_k = (x_k, y_k)$ has been
selected, a new template is computed, to be used as the pattern for the next frame $k+1$. This is done by placing the
previous mask at location $z_k$ and by computing the convex hull of the resulting points, according to the algorithm
introduced in section 3.1.
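The normalized correlation search can be sketched as a brute-force scan (an illustrative stand-in for the paper's routine, which restricts the scan to the predicted search window rather than the whole image):

```python
import numpy as np

def normalized_correlation(image, template):
    """Return the location and score of the best normalized-correlation
    match of `template` inside `image` (zero-mean, unit-norm windows)."""
    th, tw = template.shape
    t = template - template.mean()
    tn = np.sqrt((t * t).sum())
    best, best_pos = -2.0, (0, 0)
    for i in range(image.shape[0] - th + 1):
        for j in range(image.shape[1] - tw + 1):
            w = image[i:i + th, j:j + tw]
            wz = w - w.mean()
            denom = tn * np.sqrt((wz * wz).sum())
            if denom == 0:
                continue  # flat window: correlation undefined
            score = (t * wz).sum() / denom
            if score > best:
                best, best_pos = score, (i, j)
    return best_pos, best
```

The zero-mean normalization makes the score insensitive to slow, uniform illumination changes, which complements the adaptive template selection described above.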
`
`5.3 Initial conditions
`
The dynamic equations defining the tracking process require an initialization step. This is obtained by analyzing the binary
masks issued by the motion-detection subsystem (see sections 2-3) on two subsequent frames, as soon as the moving object
enters the image (typically from the lower end, where figure-ground separation is easier). The correspondence between the
masks in the two frames is obtained by simply comparing their spatial coordinates. Given a pair of corresponding masks
$m'$, $m''$, the center of gravity of $m'$ (the most recent one) is used as the initial value of the position vector
$\hat{x}_{1,0}(+)$ for the region-based method as well as for the second contour-based approach. For the remaining
contour-based method, the bounding rectangle of $m''$ is used to initialize the window enclosing the moving object's
convex hull.

For all methods, the displacement between the centers of gravity of $m'$, $m''$ is employed as the initial velocity vector
$\hat{x}_{2,0}(+)$. The third parameter defining the velocity vector is the scaling factor $s_k$, defining a
magnification/reduction factor of the target between two successive frames. In the initialization step, the value of $s_0$ is
defined according to the ratio between the areas of $m'$, $m''$.
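The initialization step can be sketched as follows. The square root on the area ratio is an assumption (the paper only states that s0 is derived from the ratio of the mask areas), and the subtraction of 1 makes s0 a scale *change*, matching the role of s in the prediction equation (13):

```python
def initialize_state(mask_prev, mask_curr):
    """Initial position and velocity state (u, v, s0) from two
    corresponding binary masks m'' (previous) and m' (current)."""
    def centroid_area(mask):
        pts = [(x, y) for y, row in enumerate(mask)
                      for x, v in enumerate(row) if v]
        n = len(pts)
        cx = sum(p[0] for p in pts) / n
        cy = sum(p[1] for p in pts) / n
        return (cx, cy), n

    (cx0, cy0), a0 = centroid_area(mask_prev)
    (cx1, cy1), a1 = centroid_area(mask_curr)
    position = (cx1, cy1)                  # center of gravity of m'
    u, v = cx1 - cx0, cy1 - cy0            # displacement of the centroids
    s0 = (a1 / a0) ** 0.5 - 1.0            # scale change from area ratio (assumed form)
    return position, (u, v, s0)
```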
`
`
`
`
`6. EXPERIMENTAL RESULTS
`
In this section, experimental results obtained by the tracking system are reported for a highway sequence acquired during
daylight. Each frame is 8 bits, 128x256 pixels. The results of all contour- and region-based methods described in the
previous section are presented and compared. For all methods, the same initial conditions are applied, using the results of
the motion-detection algorithm (see section 5.3).

A common step for all methods is the detection of the convex hull from the spatio-temporal gradients, providing the
measurement vec