IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 50, NO. 5, SEPTEMBER 2001

A Novel Method for Tracking and Counting Pedestrians in Real-Time Using a Single Camera

Osama Masoud and Nikolaos P. Papanikolopoulos, Senior Member, IEEE
Abstract—This paper presents a real-time system for pedestrian tracking in sequences of grayscale images acquired by a stationary camera. The objective is to integrate this system with a traffic control application such as a pedestrian control scheme at intersections. The proposed approach can also be used to detect and track humans in front of vehicles. Furthermore, the proposed schemes can be employed for the detection of several diverse traffic objects of interest (vehicles, bicycles, etc.). The system outputs the spatio-temporal coordinates of each pedestrian during the period the pedestrian is in the scene. Processing is done at three levels: raw images, blobs, and pedestrians. Blob tracking is modeled as a graph optimization problem. Pedestrians are modeled as rectangular patches with a certain dynamic behavior. Kalman filtering is used to estimate pedestrian parameters. The system was implemented on a Datacube MaxVideo 20 equipped with a Datacube Max860 and was able to achieve a peak performance of over 30 frames per second. Experimental results based on indoor and outdoor scenes demonstrated the system's robustness under many difficult situations such as partial or full occlusions of pedestrians.

Index Terms—Applications, image sequence analysis, intelligent transportation systems, pedestrian control at intersections, pedestrian tracking, real-time tracking.
I. INTRODUCTION

THERE is a wealth of potential applications of pedestrian tracking. Different applications, however, have different requirements. Tracking systems suitable for virtual reality applications and for those measuring athletic performance require that specific body parts be robustly tracked. In contrast, security monitoring, event recognition, pedestrian counting, traffic control, and traffic-flow pattern identification applications emphasize tracking on a coarser level. Here, all individuals in the scene can be considered as single indivisible units. Of course, a system that can perform simultaneous tracking on all different scales is highly desirable, but until now no such system exists. A few systems that track body parts of one person [24], [13], [16] and two persons [6] have been developed. It remains to be seen
Manuscript received April 22, 1997; revised January 28, 2001. This work was supported by the Minnesota Department of Transportation through Contracts #71789-72983-169 and #71789-72447-159, the Center for Transportation Studies through Contract #USDOT/DTRS 93-G-0017-01, the National Science Foundation through Contracts #IRI-9410003 and #IRI-9502245, the Department of Energy (Sandia National Laboratories) through Contracts #AC-3752D and #AL-3021, and the McKnight Land-Grant Professorship Program at the University of Minnesota.
The authors are with the Artificial Intelligence, Robotics, and Vision Laboratory, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: masoud@cs.umn.edu; npapas@cs.umn.edu).
Publisher Item Identifier S 0018-9545(01)08257-3.
whether these systems can be generalized to track an arbitrary number of pedestrians.
The work described in this paper targets the second category of applications [12], [10]. The proposed approach has a large number of potential applications which extend beyond pedestrians. For example, it can not only detect and track humans in front of or around vehicles, but it can also be employed to track several diverse traffic objects of interest (vehicles in weaving sections, bicycles, rollerbladers, etc.). One should note that the reliable detection and tracking of traffic objects is important in several vehicular applications (e.g., parking sensory aids, lane departure avoidance systems, etc.). In this paper, we are mainly interested in applications related to traffic control, with the goal of increasing both the safety and efficiency of existing roadways. Information about pedestrians crossing the streets would allow for automatic control of traffic lights at an intersection, for example. Pedestrian tracking also allows the use of a warning system, which can warn drivers and workers at a work zone of possible collision risks.
Several attempts have been made to track pedestrians as single units. Baumberg and Hogg [3] used deformable templates to track the silhouette of a walking pedestrian. The advantage of their system is that it is able to identify the pose of the pedestrian. Tracking results were shown for one pedestrian in the scene, and the system assumed that overlaps and occlusions are minimal [2]. Another use of the silhouette was made by Segen and Pingali [19]. In their case, features on the pedestrian silhouette were tracked and their paths were clustered. The system ran in real-time but was not able to deal well with temporary occlusions. Occlusions and overlaps seem to be a primary source of instability for many systems. Rossi and Bozzoli [17] avoided the problem by mounting the camera vertically in their system, which aimed mainly to count passing pedestrians in a corridor. Such a camera configuration, however, may not be feasible in some cases. Occlusions and overlaps occur very commonly in pedestrian scenes and cannot be ignored by a pedestrian tracking system. The use of multiple cameras can alleviate the occlusion problem. Cai and Aggarwal [4] tracked pedestrians with multiple cameras. The system, however, did not address the occlusion problem in particular, but rather how to match the pedestrian across different camera views. The switching among cameras was done manually. Smith et al. [22] performed pedestrian detection in real-time. The system used several simplistic criteria to judge whether the detected object is a pedestrian or not, but did not actually track pedestrians.
Shio and Sklansky [20] presented a method for segmenting people in motion with the use of correlation techniques over successive frames to recover the motion field. Iterative merging was then used to recover regions with similar motion. The method

0018–9545/01$10.00 © 2001 IEEE
Fig. 1. (a) Background image. (b) Foreground. (c) Difference image showing that a blob does not always correspond to one pedestrian.
was able to deal with partial occlusions. The assumption was that pedestrians do not change direction as they move. A disadvantage of this method is the high computational cost of the correlation and the iterative merging steps. An interesting approach presented by Heisele et al. [9] is based on their earlier work on color cluster flow [8]. An image is clustered into regions of similar color. In the subsequent images, the clusters are updated using a k-means clustering algorithm. Assuming that the pedestrian legs form one cluster, a step to recognize legs enables the system to recognize and track pedestrians. This was done by checking the periodicity of the cluster shape and by feeding the grayscale images of the legs into a time delay neural network. The advantage of this approach is that it works in the case of a moving camera. Unfortunately, due to several costly steps, real-time implementation was not possible. Lipton et al. [14] performed classification and tracking of vehicles and pedestrians. They used a simple criterion for classification and template matching, guided by motion detection, for tracking. The system was able to track multiple isolated pedestrians and vehicles robustly. The classification step is critical since the template that is used for tracking is decided based on it. More recently, Haritaoglu et al. [7] successfully tracked multiple pedestrians as well as their body parts. The system made use of silhouette analysis in finding body parts, which can be sensitive to occlusions.
In developing our method, robustness in arbitrary input scenes with arbitrary environmental conditions, without compromising real-time performance, was the primary motivation. Our approach does not have a restriction on the camera position. More importantly, we do not make any assumptions about occlusions and overlaps. Our system uses a single fixed camera mounted at an arbitrary position. We use simple rectangular patches with a certain dynamic behavior to model pedestrians.
Overlaps and occlusions are dealt with by allowing pedestrian models to overlap in the image space and by maintaining their existence in spite of the disappearance of some cues. The cues that we use are blobs obtained by thresholding the result of subtracting the image from the background.
Our choice of using blobs obtained after background subtraction is motivated by the efficiency of this preprocessing step, even though some information is permanently lost. In a typical scene, a blob obtained this way does not always correspond to a single pedestrian. An example is shown in Fig. 1. This is the main source of weakness in many of the systems mentioned above, which assume a clean one-to-one correspondence between blobs and pedestrians. In our system, we allow maximum flexibility by allowing this relation to be many-to-many. This relation is updated iteratively depending on the observed blob behavior and predictions of pedestrian behavior. Fig. 2 gives an overview of the system. Three levels of abstraction are used. The lowest level deals with raw images. In the second level, blobs are computed and subsequently tracked. Tracked blobs are passed on to the pedestrians level, where relations between pedestrians and blobs, as well as information about pedestrians, are inferred using previous information about pedestrians at that level.
At the images level, we perform background subtraction and thresholding to produce difference images. Background subtraction has been used by many to extract moving objects in the scene [2], [4], [19], [22]. Change detection algorithms that compare successive frames [11], [21] can also be used to extract motion. However, these algorithms also output background regions that get uncovered by the moving object, which is undesirable in our case. The choice of the threshold is critical. Many thresholding techniques [15], [18] work very well when there is an object in the scene. This is because these techniques assume that the image to be thresholded contains two categories
of pixel values and they try to separate the two. However, when there is only one category, the results become unpredictable. In our case, this happens often since the scene may not have any objects at one point in time. Instead, we used a fixed threshold value. The value was obtained by examining an empty background for a short while and measuring the maximum fluctuation of pixel values during this training period. The threshold is set to be slightly above that value. This technique worked sufficiently well for our purpose. Several measures were taken to further reduce the effect of noise. A single step of erosion followed by a step of dilation is performed on the resulting image, and small clusters are totally removed. Also, the background image is updated using a very slow recursive function to capture slow changes in the background (e.g., changes in lighting conditions due to a passing cloud).
It should be noted that even with these precautions, in a real-world sequence, the feature image may still capture some undesirable features (e.g., shadows, excessive noise, sudden changes in lighting conditions, trees moved by the wind (see Fig. 7, frame 104), etc.). It also may miss parts of moving pedestrians due to occlusions (see Fig. 8, frame 32) or color similarity between the pedestrian's clothes and the background (see Fig. 8, frame 44). Our system handles the majority of these situations, as will be explained in the subsequent sections.
The next section describes the processing done at the blobs level. The pedestrians level is presented in Section III. Experimental results follow in Section IV. Finally, conclusions are presented in Section V.

Fig. 2. The three levels of abstraction and data flows among them.

II. BLOBS LEVEL

In this level, we present a novel approach to track blobs regardless of what they represent. The tracking scheme attempts to describe changes in the difference image in terms of motion of blobs and by allowing blobs to merge, split, appear, and vanish. Robust blob tracking was necessary since the pedestrians level relies solely on information passed from this level. The first step is to extract blobs. Connected regions are extracted efficiently using boundary following [5]. Another way to extract connected components is to use the raster scan algorithm [5]. The advantage of the latter method is that it extracts holes inside the blobs while boundary following does not. However, for the purpose of our system, holes do not constitute a major issue. Moving pedestrians usually form solid blobs in the difference image, and if these blobs have holes, they may still be considered part of the pedestrian. Boundary following has the extra advantage of being more efficient since the blob interior is not considered. The following parameters are computed for each blob:
1) area—A: the number of pixels inside the blob;
2) bounding box—the smallest rectangle surrounding the blob;
3) density—A/(bounding box area);
4) velocity—(vx, vy), calculated in pixels per second in the horizontal and vertical directions.

Fig. 3. (a) Blobs in frame (i − 1). (b) Blobs in frame i. (c) Relationship among blobs.

A. Blob Tracking

When a new set of blobs is computed for frame i, an association with frame (i − 1)'s set of blobs is sought. Ideally, this association can be an unrestricted relation. With each new frame, blobs can split, merge, appear, or disappear. Examples of blob behavior can be seen in the blob images in Figs. 7 and 8. The relation among blobs can be represented by an undirected bipartite graph, G = (V, E), where V = V_{i−1} ∪ V_i. V_{i−1} and V_i are the sets of vertices associated with the blobs in frames (i − 1) and i, respectively. We will refer to this graph as a blob graph. Since there is a one-to-one correspondence between the blobs in frame i and the elements of V_i, we will use the terms blob and vertex interchangeably. Fig. 3 shows how the blobs in two consecutive frames are associated. The blob graph in the figure expresses the fact that blob 1 split into blobs 4 and 5, blob 2 and part of blob 1 merged to form blob 4, blob 3 disappeared, and blob 6 appeared.
The process of blob tracking is equivalent to computing G for i = 2, ..., n, where n is the total number of frames. Let N(v) denote the set of neighbors of vertex v.
Fig. 4. Overlap area. Pedestrians p1 and p2 share blob b1, while b2 is only part of p2 (see Section III-B3 for the overlap area computation).
To simplify the graph computation, we restrict the generality of the graph to those graphs which do not have more than one vertex of degree more than one in every connected component of the graph. This is equivalent to saying that, from one frame to the next, a blob may not participate in a splitting and a merging at the same time. We refer to this as the parent structure constraint. According to this constraint, the graph in Fig. 3(c) is invalid. If, however, we eliminate the arc between 1 and 5 or the arc between 2 and 4, it will be a valid graph. This restriction is reasonable assuming a high frame rate, where such simultaneous split and merge occurrences are rare. To further reduce the number of possible graphs, we use another constraint, which we call the locality constraint. With this constraint, vertices can be connected only if their corresponding blobs have a bounding box overlap area which is at least half the size of the bounding box of the smaller blob. This constraint, which significantly reduces the number of possible graphs, relies on the assumption that a blob is not expected to be too far from where it is predicted to be, taking into consideration its speed and location in the previous frame. This is also reasonable to assume if we have a relatively high frame rate. We refer to a graph which satisfies both the parent structure and locality constraints as a valid graph.
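The two constraints can be checked directly on a candidate blob graph. The sketch below is illustrative, not the paper's implementation: it assumes edge lists of (previous-frame, current-frame) blob ids and corner-format bounding boxes (x0, y0, x1, y1).

```python
from collections import defaultdict

def bbox_area(a):
    return (a[2] - a[0]) * (a[3] - a[1])

def bbox_overlap_area(a, b):
    # Intersection area of two axis-aligned boxes (0 if disjoint).
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)

def satisfies_locality(box_prev, box_cur):
    # Vertices may be connected only if the bounding boxes overlap
    # by at least half the area of the smaller box.
    smaller = min(bbox_area(box_prev), bbox_area(box_cur))
    return bbox_overlap_area(box_prev, box_cur) >= 0.5 * smaller

def satisfies_parent_structure(edges):
    # Parent structure constraint: every connected component of the
    # bipartite graph has at most one vertex of degree more than one
    # (a blob may not split and merge at the same time).
    deg = defaultdict(int)
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        a, b = ("p", u), ("c", v)   # disjoint vertex sets
        deg[a] += 1
        deg[b] += 1
        parent[find(a)] = find(b)   # union the two components
    heavy = defaultdict(int)        # vertices of degree > 1 per component
    for vtx, d in deg.items():
        if d > 1:
            heavy[find(vtx)] += 1
    return all(n <= 1 for n in heavy.values())
```

On the example of Fig. 3(c), the edge set {(1, 4), (1, 5), (2, 4)} fails the parent structure check, while dropping either (1, 5) or (2, 4) makes it pass.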
To find the optimum graph G, we define a cost function C(G) so that different graphs can be compared. A graph with no edges, i.e., E = ∅, is one extreme solution in which all blobs in V_{i−1} disappear and all blobs in V_i appear. This solution has no association among blobs and should, therefore, have a high cost. In order to proceed with our formulation of the cost function, we define two disjoint sets, which we call parents, P, and descendents, D, whose union is V. P can be easily constructed by selecting from V all vertices of degree more than one, all vertices of degree zero, and all vertices of degree one that are only in V_{i−1}. Furthermore, let a_v be the area of blob v and A(v) be the total area occupied by the neighbors of v.
The cost function that we use penalizes graphs in which blobs change significantly in size. A perfect match would be one in which blob sizes remain constant (e.g., the size of a blob that splits equals the sum of the sizes of the blobs it split into). We now write the formula for the cost function as

C(G) = Σ_{p ∈ P} |a_p − A(p)| / a_p.    (1)

This function is a summation of ratios of size change over all parent blobs.
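A cost of this form, a sum over parent blobs of the relative difference between each parent's area and the total area of its neighbors, could be evaluated along these lines (the exact formula is our reconstruction from the surrounding description, and the dictionary layout is illustrative):

```python
def graph_cost(parents, areas, neighbors):
    # parents:   iterable of parent vertex ids
    # areas:     dict vertex id -> blob area in pixels
    # neighbors: dict vertex id -> list of neighbor vertex ids
    # A parent with no neighbors (an appearing or disappearing blob)
    # contributes a full relative size change of 1.
    cost = 0.0
    for p in parents:
        neighbor_area = sum(areas[n] for n in neighbors.get(p, []))
        cost += abs(areas[p] - neighbor_area) / areas[p]
    return cost
```

For instance, a blob of area 100 splitting into blobs of areas 60 and 40 contributes zero cost, while an isolated blob contributes 1, which is why the edgeless graph scores highest.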
Using this cost function, we can proceed to compute the optimum graph. First, we notice that given a valid graph G = (V, E) and two vertices u and v such that (u, v) ∉ E, the graph G' = (V, E ∪ {(u, v)}) has a lower cost than G, provided that G' is a valid graph. If it is not possible to find such a G', we call G dense. Using this property, we can avoid some useless enumeration of graphs that are not dense. In fact, this observation is the basis of our algorithm to compute the optimum graph.
Our algorithm to compute the optimum graph works as follows: A graph is constructed such that the addition of any edge makes it violate the locality constraint. There can be only one such graph. Note that this graph may violate the parent structure constraint at this moment. The next step in our algorithm systematically eliminates just enough edges to make it satisfy the parent structure constraint. The resulting graph is valid and also dense. The process is repeated so that all possible dense graphs are generated. The optimum graph is the one with the minimum cost. By systematically eliminating edges, we are effectively enumerating valid graphs. The computational complexity of this step is highly dependent on the graph being considered. If the graph already satisfies the parent structure constraint, no enumeration is needed. On the other hand, if we have a fully connected graph, the complexity is exponential in the number of vertices. Fortunately, because of the locality constraint and the high frame rate, the majority of graphs considered already satisfy the parent structure constraint. Occasionally, a small cluster of the graph may not satisfy the parent structure constraint and the algorithm will need to enumerate a few graphs. In practice, the algorithm never took more than a few milliseconds to execute, even in the most cluttered scenes. Other techniques to find the optimum (or near-optimum) graph (e.g., stochastic relaxation using simulated annealing) can also be used. The main concern, however, would be their efficiency, which may not be appropriate for this real-time application due to their iterative nature.
Fig. 5. (a) A number of snapshots from the input sequence overlaid with pedestrian boxes shown in white and blob boxes shown in black.
At the end of this stage, we use a simple method to calculate the velocity of each blob b, based on the velocities of the blobs at the previous stage and the computed blob graph. The blob velocity will be used to initialize pedestrian models as described later. If b is the outcome of a splitting operation, it will be assigned the same velocity as the parent blob. If b is the outcome of a merging operation, it will be assigned the velocity of the largest child blob. If b is a new blob, it will be assigned zero velocity. Finally, if there is only one blob, a, related to b, the velocity is computed as

v_b = α (c_b − c_a) / Δt + (1 − α) v_a    (2)

where c_a and c_b are the centers of the bounding boxes of a and b, respectively; α is a weight factor set to 0.5 (found empirically); and Δt is the sampling interval since the last stage.
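For the one-to-one case, this velocity update is a recursive blend of the instantaneous bounding-box-center displacement with the previous estimate. A minimal sketch (function and argument names are our own):

```python
def blob_velocity(center_prev, center_cur, v_prev, dt, alpha=0.5):
    # Blend the displacement of the bounding-box center over the
    # sampling interval dt with the previous velocity estimate.
    # alpha = 0.5 is the empirically chosen weight from the paper.
    vx = alpha * (center_cur[0] - center_prev[0]) / dt + (1 - alpha) * v_prev[0]
    vy = alpha * (center_cur[1] - center_prev[1]) / dt + (1 - alpha) * v_prev[1]
    return (vx, vy)
```

The recursion smooths out frame-to-frame jitter in the blob centers while still reacting to genuine changes in motion.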
III. PEDESTRIANS LEVEL

Fig. 5. (Continued.) (b) A number of snapshots from the input sequence overlaid with pedestrian boxes shown in white and blob boxes shown in black.

The input to this level is tracked blobs and the output is the spatio-temporal coordinates of each pedestrian. The relationship between pedestrians and blobs in the image is not necessarily one to one. A pedestrian wearing clothes that are close in color to the background may show up as more than one blob. Partially occluded pedestrians may also result in more than one blob, or
even in no blobs at all if the pedestrian is fully occluded. Two or more pedestrians walking close to each other may give rise to a single blob. For this reason, it was necessary to make the pedestrians level capable of handling all the above cases. We do this by modeling the pedestrian as a rectangular patch with a certain dynamic behavior. We found that, for the purpose of tracking, this simple model adequately resembles the pedestrian shape and motion dynamics. We now present this model in more detail and then describe how tracking is performed.

A. Pedestrian Model

Pedestrians usually walk at a constant speed. Moreover, the speed of a pedestrian usually changes gradually when the pedestrian desires to stop or start walking. Our approach assumes that motion in the scene is constrained to a plane (also called the ground-plane constraint). With this assumption, back projection from the scene to the image plane can be performed (with knowledge of the camera geometry) to determine the expected dimensions and dynamic behavior of the pedestrian in the image coordinate system. Small variations in ground elevation will still be tolerated, especially in distant areas. This restriction can be removed if the scene topology can be determined a priori. Camera geometry can be obtained using calibration techniques. A suitable technique for traffic scenes which makes use of the ground-plane constraint is given in [23].
Fig. 6. A number of snapshots from the input sequence on a snowy afternoon, overlaid with pedestrian boxes shown in black.
The pedestrian is modeled as a rectangular patch whose dimensions depend on its location in the image. The dimensions are equal to the projection of the dimensions of an average-size pedestrian at the corresponding location in the scene. The patch is assumed to move with constant velocity in the scene coordinate system. The patch acceleration is modeled as zero-mean, Gaussian noise to accommodate changes in velocity. Given a sampling interval ΔT, the discrete-time dynamic system for the pedestrian model can be described by the following equation:

x_{k+1} = Φ x_k + w_k    (3)

where x_k is the state vector consisting of the pedestrian location, (p_x, p_y), and velocity, (v_x, v_y), where (p_x, p_y) denotes the location of the pedestrian in the scene; Φ is the transition matrix of the system, given by

Φ = | 1 0 ΔT 0  |
    | 0 1 0  ΔT |
    | 0 0 1  0  |
    | 0 0 0  1  |

and w_k is a sequence of zero-mean, white, Gaussian process noise with covariance matrix Q, computed as in [1] (p. 84) to become

Q = σ² | ΔT^4/4   0      ΔT^3/2   0      |
       | 0        ΔT^4/4  0       ΔT^3/2 |
       | ΔT^3/2   0      ΔT^2     0      |
       | 0        ΔT^3/2  0       ΔT^2   |

where σ² represents the variance of the acceleration.

Fig. 7. Tracking sequence of a pedestrian in different occlusion situations.
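The constant-velocity model and its prediction step can be sketched with NumPy. The process-noise matrix below is the standard discrete white-noise-acceleration covariance, which we assume matches the form the paper takes from [1]; function names are our own:

```python
import numpy as np

def make_model(dt, accel_var):
    # State x = [px, py, vx, vy]^T; constant-velocity transition.
    Phi = np.array([[1, 0, dt, 0],
                    [0, 1, 0, dt],
                    [0, 0, 1, 0],
                    [0, 0, 0, 1]], dtype=float)
    # Process-noise covariance for white-noise acceleration with
    # variance accel_var (standard discrete form).
    q2, q3, q4 = dt ** 2, dt ** 3 / 2, dt ** 4 / 4
    Q = accel_var * np.array([[q4, 0, q3, 0],
                              [0, q4, 0, q3],
                              [q3, 0, q2, 0],
                              [0, q3, 0, q2]])
    return Phi, Q

def predict(x, P, Phi, Q):
    # Kalman prediction step: propagate the state estimate and
    # inflate its covariance by the process noise.
    return Phi @ x, Phi @ P @ Phi.T + Q
```

With dt = 0.1 s, a pedestrian at the origin moving at (1, 2) m/s is predicted at (0.1, 0.2) with unchanged velocity.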
B. Pedestrian Tracking

Tracking pedestrians depends on the current state of the pedestrians as well as the input to the pedestrians level, which is the tracked blobs. In our system, we use extended Kalman filtering (EKF) to estimate pedestrian parameters. We maintain a many-to-many relationship between pedestrians and blobs and

Fig. 8. Tracking sequence demonstrating occlusions and pedestrian overlap.
then use it to provide measurements to the filter. The next five sections describe one tracking cycle.
1) Relating Pedestrians to Blobs: We represent the relationship between pedestrians and blobs as a directed bipartite graph. We relate pedestrians to blobs using a simple rule: if a pedestrian was related to a blob in frame i − 1 and that blob is related to another blob in the ith frame (through a split, merge, etc.), then the pedestrian is also related to the latter blob.
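This rule amounts to taking, for each pedestrian, the union of its blobs' successors in the blob graph. A small sketch with assumed dictionary layouts (not the paper's data structures):

```python
def propagate_relations(ped_blobs, blob_graph):
    # ped_blobs:  dict pedestrian id -> set of blob ids in frame i-1
    # blob_graph: dict blob id (frame i-1) -> set of related blob ids
    #             in frame i (continuations, splits, merges)
    # Each pedestrian inherits every frame-i blob related to any of
    # its frame-(i-1) blobs.
    return {ped: set().union(*[blob_graph.get(b, set()) for b in blobs])
            for ped, blobs in ped_blobs.items()}
```

A pedestrian whose blob split into blobs 4 and 5, for example, becomes related to both.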
2) Prediction: Given the system equation as in Section III-A, the prediction phase of the Kalman filter is given by the following equations:

x̂_k^− = Φ x̂_{k−1}
P_k^− = Φ P_{k−1} Φ^T + Q    (4)

Here, x̂_k^− and P_k^− are the predicted state vector and state error covariance matrix, respectively. x̂_{k−1} and P_{k−1} are the previously estimated state vector and state error covariance matrix, respectively.
3) Calculating Pedestrian Positions: In this step, we use the predicted pedestrian locations as starting positions and we employ a two-dimensional (2-D) search to locate the pedestrian. The search attempts to find the best overlap between the pedestrian patch and the blobs assigned to the pedestrian. The overlap computation takes into consideration the density of the blobs as well as the possibility that a blob is shared by more than one pedestrian: each pedestrian's overlap area combines its density-weighted intersections with its blobs, with shared blobs divided among the pedestrians sharing them. Fig. 4 illustrates this computation. To perform the search, we employ a heuristic solution using relaxation. First, a large step size is chosen. Then, each pedestrian is moved in all possible directions by the step size, and the location (including the original location) which maximizes the overlap area is recorded. All pedestrian locations are then updated according to the recorded locations. This completes one iteration. In each following iteration, the step size is decreased. In our implementation, we start with a step of 64 pixels and halve the step size in each iteration until a 1-pixel step size is reached.
The resulting locations form the measurements that will be fed back into the EKF to produce the new state estimates. Moreover, we use the overlap area to provide feedback about the measurement confidence by setting the measurement error standard deviation, which is described below, to be inversely proportional to the ratio of the overlap area to the pedestrian area. That is, the smaller the overlap area, the less confident the measurement is considered.
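The coarse-to-fine relaxation can be sketched for a single pedestrian as follows (the paper updates all pedestrians jointly in each iteration; here `overlap_area` is a caller-supplied scoring function, and the single-patch form is our simplification):

```python
def refine_position(start, overlap_area, max_step=64):
    # Coarse-to-fine local search: try moving the patch by the step
    # in each of the 8 directions (plus staying put), keep the best
    # location, then halve the step until it reaches 1 pixel.
    x, y = start
    step = max_step
    while step >= 1:
        best = (overlap_area(x, y), x, y)
        for dx in (-step, 0, step):
            for dy in (-step, 0, step):
                a = overlap_area(x + dx, y + dy)
                if a > best[0]:
                    best = (a, x + dx, y + dy)
        _, x, y = best
        step //= 2
    return x, y
```

Starting at 64 pixels and halving down to 1 covers a wide search window with only a handful of score evaluations per pedestrian.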
4) Estimation: A measurement is a location in the image coordinate system, z_k, as computed in the previous section. Measurements are related to the state vector by

z_k = h(x_k) + v_k    (5)

where h is the measurement function and v_k is a sequence of zero-mean, white, Gaussian measurement noise with covariance given by R_k. The measurement error standard deviation depends on the overlap area computed in the previous section. Pedestrian locations are expressed in world coordinates, resulting in h being a nonlinear function which performs projection into image coordinates. We let H_k be the Jacobian of h. The EKF state estimation equations become

K_k = P_k^− H_k^T (H_k P_k^− H_k^T + R_k)^{−1}
x̂_k = x̂_k^− + K_k (z_k − h(x̂_k^−))
P_k = (I − K_k H_k) P_k^−    (6)

where K_k is the Kalman gain at time k.
The estimated state vector x̂_k is the outcome of the pedestrians level. The dimensions of the pedestrian patch are computed based on the estimated pedestrian location.

Fig. 9. Computed rms errors for a pedestrian in the sequence of Fig. 5.
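The update in (6) can be sketched with NumPy; `h` and its Jacobian `H` are supplied by the caller (in the paper, h is the world-to-image projection), and the isotropic R built from a single standard deviation is our simplification:

```python
import numpy as np

def ekf_update(x_pred, P_pred, z, h, H, sigma_m):
    # EKF measurement update. sigma_m grows as the overlap ratio
    # shrinks, so low-confidence measurements are down-weighted.
    R = (sigma_m ** 2) * np.eye(len(z))
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x = x_pred + K @ (z - h(x_pred))      # corrected state
    P = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x, P
```

With an identity measurement function and unit noise, the filter splits the difference between prediction and measurement, which matches the intuition that equal uncertainties deserve equal weight.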
5) Refinement: At the end of this stage, we perform some checks to refine the pedestrian-blob relationships, since pedestrians have been relocated. These can be summarized as follows.
1) If the overlap area between a pedestrian and one of its blobs becomes less than a certain threshold (a percentage of the size of both), the blob will no longer be considered as belonging to this pedestrian. This serves as the splitting procedure when two pedestrians walk past each other. This threshold determines the degree of stickiness between blobs and pedestrians. In our experiments, a threshold of 10% gave the best tracking performance.
2) If the overlap area between a pedestrian and a blob that does not belong to any pedestrian becomes more than the threshold mentioned above, the blob will be added to the pedestrian's blobs. This makes the pedestrian reacquire some blobs that may have disappeared due to occlusion.
3) Look for a cluster of blobs which are not related to any pedestrians and whose age is larger than a threshold (i.e., which have been successfully tracked for a certain number of frames). A new pedestrian may be initialized only if it will become sufficiently covered by the blob cluster. The pedestrian is given an initial velocity equal to the average of the blobs' velocities. This serves as the initialization step. The requirement on the age helps in reducing the chances of unstable blobs being used to initialize pedestrians (for example, blobs resulting from tree motion caused by wind and blobs due to noise). A threshold of one second was sufficient in most cases.
4) Select one of the blobs which is already assigned one or more pedestrians but can accommodate more pedestrian patches. Create a new pedestrian for this blob as in 3). This handles cases in which a group of people form one big blob which does not split. If we did not do this step, only one pedestrian would be assigned to this blob.
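The stickiness test in rules 1) and 2) might look as follows. Reading "a percentage of the size of both" as requiring the overlap to clear 10% of both the pedestrian area and the blob area is our interpretation, not spelled out in the paper:

```python
def blob_belongs(ped_area, blob_area, overlap, threshold=0.10):
    # The blob stays attached (or is reacquired) only while the
    # overlap is at least `threshold` times the pedestrian area AND
    # `threshold` times the blob area.
    return (overlap >= threshold * ped_area
            and overlap >= threshold * blob_area)
```

A large pedestrian brushing a small blob thus keeps it only if the contact is significant relative to the pedestrian too, which is what lets two pedestrians separate cleanly after walking past each other.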
IV. EXPERIMENTAL RESULTS

The system was implemented on a Datacube MaxVideo 20 video processor and a Datacube Max860 vector processor. It

Fig. 10. A number of snapshots from a weaving section tracking sequence (the vehicles in the two lanes closer to the edge of the highway are tracked).
was later ported to a 400-MHz Pentium PC equipped with a C80 Matrox Genesis vision board.
The system was tested on several indoor and outdoor image sequences. Several outdoor sequences in different weather conditions (sunny, cloudy, snow, etc.) have been used. In most cases, pedestrians were tracked correctly throughout the period they appeared in the scene. Scenarios included pedestrians moving at slow or very high speeds, partial and full occlusions, bicycles, and several pedestrian interactions. Interactions between pedestrians included occlusion of one another, repeated merging and splitting of blobs corresponding to two or more pedestrians walking together, pedestrians walking past each other, and pedestrians meeting and then walking back in the direction they came from. The system has a peak performance of 30 frames/s. In a relatively cluttered image with about six pedestrians, the frame processing rate dropped to about 18 frames/s. Fig. 5 shows 16 snapshots spanning a sequence of 8.4 s. The sequence demonstrates the system behavior against occlusions, both partial and full. The figure also demonstrates how the system works at the blobs level. Blobs are shown by their black bounding boxes. Notice how tracking is preserved despite the lack of one-to-one correspondence between blobs and pedestrians. Fig. 6 shows 12 snapshots from a scene under different weather conditions. The snapshots span a sequence of 35 s. A more cluttered sequence is shown in Fig. 7. This sequence was taped during a snow storm. One can see the effect of the wind on the tree, which resulted in false blobs in the difference images (frames 36, 66, 104, 140). The system deals with this situation in several ways.
1) If the blobs generated are too small, they are automatically eliminated.
2) A blob is considered at the pedestrians level only if it is tracked successfully for a certain period of time. This eliminates blobs that appear and then disappear momentarily (such as most blobs that appear due to tree motion back and forth).
3) The system can be given information about the scene, in particular, the locations where pedestrians can be expected to appear. Thus, the system will not instantiate pedestrian boxes except at these locations.
Fig. 8 shows what happens when two pedestrians walk past each other. Kalman filtering is essential here because it provides good prediction when there is little or no data. The blob-pedestrian relationship refinement procedures guarantee that the pedestrian will be related to the co
