Distributed by:
Morgan Kaufmann Publishers Inc.
340 Pine Street, 6th Floor
San Francisco, Calif. 94104-3205
ISBN: 1-55860-583-5
Printed in the United States of America

Table of Contents

Table of Contents ... iii
Author Index ... xi
Foreword ... xiv
Acknowledgements ... xvii

Volume I

Section I - Video Surveillance and Monitoring (VSAM)

Video Surveillance and Monitoring - Principal Investigator Reports

"Advances in Cooperative Multi-Sensor Video Surveillance," Takeo Kanade, Robert T. Collins, Alan J. Lipton, Peter Burt and Lambert Wixson ... 3

"Extra Sets of Eyes," Kurt G. Konolige and Robert C. Bolles ... 25

"Forest of Sensors: Using Adaptive Tracking to Classify and Monitor Activities in a Site," W. Eric L. Grimson, Chris Stauffer, R. Romano, L. Lee, Paul Viola and Olivier Faugeras ... 33

"Image Understanding Research at Rochester," Christopher Brown, Kiriakos N. Kutulakos and Randal C. Nelson ... 43

"A Multiple Perspective Interactive Video Architecture for VSAM," Simone Santini and Ramesh Jain ... 51

"Multi-Sensor Representation of Extended Scenes using Multi-View Geometry," Shmuel Peleg, Amnon Shashua, Daphna Weinshall, Michael Werman and Michal Irani ... 57

"Event Detection and Analysis from Video Streams," Gerard Medioni, Ram Nevatia and Isaac Cohen ... 63

"Visual Surveillance and Monitoring," Larry S. Davis, Rama Chellappa, Azriel Rosenfeld, David Harwood, Ismail Haritaoglu and Ross Cutler ... 73

"Aerial and Ground-Based Video Surveillance at Cornell University," Daniel P. Huttenlocher and Ramin Zabih ... 77

"Reliable Video Event Recognition for Network Cameras," Bruce Flinchbaugh ... 81

"VSAM at the MIT Media Lab and CBCL: Learning and Understanding Action in Video Imagery," Aaron Bobick, Alex Pentland and Tomaso Poggio ... 85

"Omnidirectional Vision Systems: 1998 PI Report," Shree K. Nayar and Terrance E. Boult ... 93

"Image-Based Visualization from Widely-Separated Views," Charles R. Dyer ... 101

"Retrieving Color, Patterns, Texture and Faces," Carlo Tomasi and Leonidas J. Guibas ... 107

Video Surveillance and Monitoring - Technical Papers

"Using a DEM to Determine Geospatial Object Trajectories," Robert T. Collins, Yanghai Tsin, J. Ryan Miller and Alan J. Lipton ... 115

"Homography-Based 3D Scene Analysis of Video Sequences," Mei Han and Takeo Kanade ... 123

Event Recognition and Reliability Improvements for the Autonomous Video Surveillance System

Frank Z. Brill, Thomas J. Olson, and Christopher Tserng
Texas Instruments
P.O. Box 655303, MS 8374, Dallas, TX 75265
brill@csc.ti.com, olson@csc.ti.com, tserng@csc.ti.com

Abstract

This report describes recent progress in the development of the Autonomous Video Surveillance (AVS) system, a general-purpose system for moving object detection and event recognition. AVS analyses live video of a scene and builds a description of the activity in that scene. The recent enhancements to AVS described in this report are: (1) use of collateral information sources, (2) camera hand-off, (3) vehicle event recognition, and (4) complex-event recognition. Also described is a new segmentation and tracking technique and an evaluation of AVS performing the best-view selection task.

1. Introduction

The Autonomous Video Surveillance (AVS) system processes live video streams from surveillance cameras to automatically produce a real-time map-based display of the locations of people, objects and events in a monitored region. The system allows a user to specify alarm conditions interactively, based on the locations of people and objects in the scene, the types of objects in the scene, the events in which the people and objects are involved, and the times at which the events occur. Furthermore, the user can specify the action to take when an alarm is triggered, e.g., to generate an audio alarm or write a log file. For example, the user can specify that an audio alarm should be triggered if a person deposits a briefcase on a given table between 5:00pm and 7:00am on a weeknight.

Section 2 below describes recent enhancements to the AVS system. Section 3 describes progress in improving the reliability of segmentation and tracking. Section 4 describes an experiment that quantifies the performance of the AVS "best view selection" capability.

This research was sponsored in part by the DARPA Image Understanding Program.

2. New AVS functionality

The structure and function of the AVS system is described in detail in a previous IUW paper [Olson and Brill, 1997]. The primary purpose of the current paper is to describe recent enhancements to the AVS system. These enhancements are described in four sections below: (1) collateral information sources, (2) camera hand-off, (3) vehicle event recognition, and (4) complex-event recognition.

2.1. Collateral information sources

Figure 1 shows a diagram of the AVS system. One or more "smart" cameras process the video stream to recognize events. The resulting event streams are sent to a Video Surveillance Shell (VSS), which integrates the information and displays it on a map. The VSS can also generate alarms based on the information in the event streams. In recent work, the VSS was enhanced to accept information from other sources, or "recognition devices," which can identify the objects being reported on by the cameras. For example, a camera may report that there is a person near a door. A recognition device may report that the person near the door is Joe Smith. The recognition device may be a badge reader, a keypad in which a person types their PIN, a face recognition system, or other recognition system.

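For illustration only, the association between a camera report and a recognition-device report can be sketched as follows. The paper does not give the VSS's internal data structures; the class name, report format, and thresholds below are assumptions.

```python
import math
import time

class VSS:
    """Sketch of a Video Surveillance Shell that fuses camera event reports
    with identity reports from recognition devices (badge readers, keypads,
    face recognizers). All names, fields, and thresholds are illustrative."""

    def __init__(self, max_dist_m=2.0, max_age_s=10.0):
        self.max_dist_m = max_dist_m      # how close a report must be to match
        self.max_age_s = max_age_s        # how recent a report must be to match
        self.reports = []                 # [timestamp, map_xy, track_id, identity]

    def camera_report(self, track_id, map_xy):
        """A smart camera reports an (as yet unidentified) person at a map location."""
        self.reports.append([time.time(), map_xy, track_id, None])

    def identity_report(self, device_xy, name):
        """A recognition device at a known map location reports who is there:
        attach that identity to the nearest recent camera report."""
        now = time.time()
        candidates = [r for r in self.reports
                      if now - r[0] <= self.max_age_s
                      and math.dist(r[1], device_xy) <= self.max_dist_m]
        if not candidates:
            return None
        best = min(candidates, key=lambda r: math.dist(r[1], device_xy))
        best[3] = name                    # e.g. "the person near the door is Joe Smith"
        return best[2]                    # this track now carries an identity

# Example: a camera sees track 7 near the door; the badge reader identifies Joe Smith.
vss = VSS()
vss.camera_report(7, (12.0, 3.5))
print(vss.identity_report((12.2, 3.4), "Joe Smith"))   # -> 7
```
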
2.2. Camera hand-off

Camera-1 monitors the hallway, and Camera-2 monitors the interior of the room. When a person moves through the doorway to enter the room from the hall or vice-versa, camera hand-off is necessary to enable the system to know that the person that was being monitored in the hall via Camera-1 is the same as the person being monitored in the room via Camera-2.

The AVS system accomplishes camera hand-off by integrating the information from the two cameras in the map coordinate system. The AVS "smart" cameras report the locations of the monitored objects and people in map coordinates, so that when the VSS receives reports about a person from two separate cameras, and both cameras are reporting the person's coordinates at about the same map location, the VSS can deduce that the two separate reports refer to the same person. In the example depicted in Figure 2, when a person is standing in the doorway, both cameras can see the person and report his or her location at nearly the same place. The VSS reports this as one person, using a minimum distance to allow for errors in location. When Camera-2 first sees a person at a location near the doorway and reports this to the VSS, the VSS checks to see if Camera-1 recently reported a person near the door. If so, the VSS reports the person in the room as the same one that Camera-1 had been tracking in the hall.

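A minimal sketch of this map-coordinate hand-off logic follows; the distance and age thresholds and the report format are illustrative assumptions, not values from the AVS system.

```python
import math

# Sketch of map-coordinate camera hand-off: a new track from one camera
# inherits the identity of a recent report from the other camera when the
# two reports fall at about the same map location.
HANDOFF_DIST_M = 1.5   # "minimum distance to allow for errors in location"
HANDOFF_AGE_S = 5.0    # how recently the other camera must have seen the person

def hand_off(new_report, recent_reports, now):
    """Return the existing person ID that new_report should inherit,
    or None if it appears to be a genuinely new person."""
    cam, track, pos, _t = new_report
    best_id, best_dist = None, HANDOFF_DIST_M
    for other_cam, person_id, other_pos, t in recent_reports:
        if other_cam == cam or now - t > HANDOFF_AGE_S:
            continue                       # only match against the *other* camera
        d = math.dist(pos, other_pos)
        if d <= best_dist:
            best_id, best_dist = person_id, d
    return best_id

# Example: Camera-2 first sees someone at the doorway that Camera-1 reported
# moments ago; the VSS deduces they are the same person.
recent = [("Camera-1", "person-3", (4.0, 1.0), 10.0)]
print(hand_off(("Camera-2", "track-9", (4.2, 1.1), 10.5), recent, now=10.5))  # -> person-3
```
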
2.3. Vehicle event recognition

This section describes extensions to the existing AVS system that enable the recognition of events involving interactions of people with cars. These new capabilities enable smart security cameras to monitor streets, parking lots and driveways and report when suspicious events occur. For example, a smart camera signals an alarm when a person exits a car, deposits an object near a building, reenters the car, and drives away.

2.3.1. Scope and assumptions

Extending the AVS system to handle human-vehicle interactions reliably involved two separable subproblems. First, the system's vocabulary for events and objects must be extended to handle a new class of object (vehicle) and new event types. Second, the AVS moving object detection and tracking software must be modified to handle the outdoor environment, which features variable lighting, strong shadows, atmospheric disturbances, and dynamic backgrounds. The work described here in section 2.3 addresses the first problem, extending the system to recognize vehicle events in conditions of uniform overcast with little wind. Our approach to handling general outdoor lighting conditions is discussed in section 4.

The method is further specialized for imaging conditions in which:

1. The camera views cars laterally.
2. Cars are unoccluded by other cars.
3. When cars and people overlap, only one of the overlapping objects is moving.
4. The events of interest are people getting into and out of cars.

2.3.2. Car detection

The first step in expanding the event-recognition capability of the current system was to give the system the ability to distinguish between people and cars. The system classifies objects as cars by using their sizes and aspect ratios. The size of an object in feet is obtained using the AVS system's image-coordinate-to-world-coordinate mapping. Once the system has detected a car, it analyzes the motion graph to recognize new events.

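The paper gives no thresholds, so the following classifier is only a sketch of the size-and-aspect-ratio idea; the numeric values are assumptions.

```python
def classify_object(width_ft, height_ft):
    """Illustrative classifier in the spirit of Section 2.3.2. The width and
    height are world-coordinate extents in feet obtained from the
    image-to-world mapping; the thresholds are not the AVS values."""
    aspect = width_ft / max(height_ft, 1e-6)
    if width_ft >= 9.0 and aspect >= 1.5:      # long, low blob -> car
        return "car"
    if height_ft >= 4.0 and aspect <= 1.0:     # tall, narrow blob -> person
        return "person"
    return "unknown"

print(classify_object(14.0, 5.0))   # -> car
print(classify_object(2.0, 5.5))    # -> person
```
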
2.3.3. Car event recognition

In principle, car exit and car entry events could be recognized by detecting characteristic interactions of blobs in difference images, in a manner similar to the way AVS recognizes DEPOSIT and REMOVE events. In early experiments, however, this method turned out to be unsatisfactory because the underlying motion segmentation method did not segment cars from people. Whenever people pass near the car they appear to merge with it, and track is lost until they walk away from it.

To solve this problem, a new approach involving additional image differencing was developed. The technique allows objects to be detected and tracked even when their images overlap the image of the car. This method requires two reference images: one consists of the original background scene (background image), and the other is identical to the first except it includes the car. The system takes differences between the current video image and the original reference image as usual. However, it also differences the current video image with the reference image containing the car. This allows the

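A minimal sketch of the two-reference-image differencing just described; the arrays, threshold, and function name are illustrative assumptions.

```python
import numpy as np

def detect_people_near_car(frame, background, background_with_car, thresh=25):
    """Sketch of the dual-reference differencing of Section 2.3.3. All images
    are grayscale uint8 arrays of the same shape; the threshold is an
    illustrative assumption."""
    diff_bg = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    diff_car = np.abs(frame.astype(np.int16) - background_with_car.astype(np.int16))
    moving_vs_empty_scene = diff_bg > thresh       # car + people + anything new
    moving_vs_scene_with_car = diff_car > thresh   # people remain; the parked car cancels out
    return moving_vs_empty_scene, moving_vs_scene_with_car
```

The useful property is that a person standing in front of the parked car still differs from the reference image that contains the car, so the person can be detected and tracked even while overlapping the car.
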
2.3.5.1. Simple sequence. The first sequence was filmed from the 3rd story of an office building overlooking the driveway in front of the building. A car drives up and a person exits the car, walks away, deposits a briefcase, and finally reenters the car. Then, the car drives away. In this segment, the system successfully detects the person exiting the car. However, the person entering the car is missed because the person gets grouped with a second person walking near the car.

Further on in the sequence, the car drives up again and a person exits the car, walks away, removes the briefcase, and finally reenters the car. Again, the car drives away. In this segment, both the person entering and exiting the car are recognized. In both these sequences, there was only the one false negative mentioned earlier and no false positives.

2.3.5.2. Pickup sequence. This sequence was filmed in front of a house looking at the street in front of the house. In the sequence, a person walks into the scene and waits at the curb. A car drives up, picks up the person, and drives away. The system correctly detects the person entering the car. There are no false positives or negatives.

2.3.5.3. Drop off sequence. This sequence was filmed in the same location as the previous one. In this sequence, a car drives up and a person is dropped off. The car drives away with the person still standing in the same location. Then, the person walks off. The system correctly detects the person exiting the car and does not report a false enter event when the car moves away.

2.3.6. Experiments: videotaped sequences

These sequences were run on the system straight from videotape. These were all run at a higher threshold to accommodate noise on the videotape. However, this tended to decrease the performance of the system.

2.3.6.1. Dark day. This is a 15 minute sequence that was recorded from the 3rd floor of a building on a fairly dark day. In that time span, 8 cars passed through the camera's field of view. The system detected 6 cars correctly and one false car (due to people grouped together). One car that was not detected was due to its small size. The other car was undetected because the system slowed down (due to multiple events occurring) and missed the images with the car in them. In this sequence, two people entered a car. However, both events were missed because the car was not recognized as resting due to the dark lighting conditions on this rainy day.

2.3.6.2. Cloudy day. This is a 13 minute sequence in the same location as the previous sequence except it is a cloudy day. In this time span, 9 cars passed through the camera's field of view and all of them were detected by the system. There were a total of 2 people entering a car and 2 people exiting a car. The system successfully detected them all. Additionally, it incorrectly reported one person walking near a car as an instance of a person exiting a car.

2.3.6.3. Cloudy day-extended time. This is a 30 minute sequence in the same location as the previous two. In this time span, 28 cars pass through and all of them were detected. The system successfully detected one person exiting a car but missed two others. The two people were missed because the car was on the edge of the camera's field of view and so it was not recognized immediately as a car.

2.3.7. Evaluation of car-event recognition

The modified AVS system performs reasonably well on the test data. However, it has only been tested on a small number of videotaped sequences, in which much of the action was staged. Further experiments and further work with live, uncontrolled data will be required to make the system handle outdoor vehicle events as well as it handles indoor events. The technique of using multiple reference images is interesting and can be applied to other problems, e.g., handling repositioned furniture in indoor environments. For more detail on this method, see [Tserng, 1998].

2.4. Complex events

The AVS video monitoring technology enables the recognition of specific events such as when a person enters a room, deposits or picks up an object, or loiters for a while in a given area. Although these events are more sophisticated than those detected via simple motion detection, they are still unstructured events that are detected regardless of the context in which they occur. This can result in alarms being generated on events that are not of interest.

For example, if the system is monitoring a room or store with the intention of detecting theft, the system could be set up to generate an alarm whenever an object is picked up (i.e., whenever a REMOVE event occurs). However, no theft has occurred unless the person leaves the area with the object. A simple, unstructured event recognition system would generate an alarm every time someone picked up an object, resulting in many false alarms; whereas a system that can recognize complex events could be programmed to only generate an alarm when the REMOVE event is followed by an EXIT event. The EXIT event provides context for the REMOVE event that enables the system to filter out uninteresting cases in which the person does not leave the area with the object they picked up. This section describes the design and implementation of such a complex-event recognition system.

We use the term simple event to mean an unstructured atomic event. A complex event is structured, in that it is made up of one or more sub-events. The sub-events of a complex event may be simple events, or they may be complex, enabling the definition of event hierarchies. We will simply say event to refer to an event that may be either simple or complex. In our theft example above, REMOVE and EXIT are simple events, and THEFT is a complex event. A user may also define a further event, e.g., CRIME-SPREE, which may have one or more complex THEFT events as sub-events.

We created a user interface that enables definition of a complex event by constructing a list of sub-events. After one or more complex events have been defined, the sub-events of subsequently defined complex events can be complex events themselves.

2.4.1. Complex-event recognition

Once the user has defined the complex events and the actions to take when they occur, the event recognition system recognizes these events as they occur in the monitored area. For the purposes of this section, we assume a priori that the simple events can be recognized, and that the objects involved in them can be tracked. In the implementation we will use the methods discussed in [Courtney, 1997; Olson and Brill, 1997] to track objects and recognize the simple events. In order to recognize a complex event, the system must keep a record of the sub-events that have occurred thus far, and the objects involved in them. Whenever the first sub-event in a complex event's sequence is recognized, an activation for that complex event is created. The activation contains the ID of the object involved in the event, and an index, which is the number of sub-events in the sequence that have been recognized thus far. The index is initialized to 1 when the activation is created, since the activation is only created when the first sub-event matches. The system maintains a list of current activations for each defined complex-event type. Whenever any new event is recognized, the list of current activations is consulted to see if the newly recognized (or incoming) event matches the next sub-event in the complex event. If so, the index is incremented. If the index reaches the total number of sub-events in the sequence, the complete complex event has been recognized, and any desired alarm can be generated. Also, since the complex event that was just recognized may also be a sub-event of another complex event, the activation lists are consulted again (recursively) to see if the indices of any other complex event activations can be advanced.

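To make the activation mechanism concrete, the following Python sketch follows the description above: ordered sub-events, an object ID and index per activation, and recursive propagation when a complex event completes. The class and data layout are illustrative, not the AVS implementation.

```python
class ComplexEventEngine:
    """Sketch of the activation mechanism of Section 2.4.1. A definition is an
    ordered list of sub-event names; an activation records the ID of the
    object involved and the index of the next sub-event to match."""

    def __init__(self):
        self.definitions = {}    # complex-event name -> ordered sub-event names
        self.activations = {}    # complex-event name -> list of {"id", "index"}

    def define(self, name, sub_events):
        self.definitions[name] = list(sub_events)
        self.activations[name] = []

    def on_event(self, event_type, object_id):
        """Feed one recognized event (simple or complex) into the engine and
        return every complex event completed as a consequence."""
        completed = []
        for name, seq in self.definitions.items():
            acts = self.activations[name]
            # Advance existing activations whose next sub-event matches.
            for act in list(acts):
                if act["id"] == object_id and seq[act["index"]] == event_type:
                    act["index"] += 1
                    if act["index"] == len(seq):
                        acts.remove(act)
                        completed.append((name, object_id))
            # Create a new activation when the first sub-event matches.
            if seq and seq[0] == event_type:
                if len(seq) == 1:
                    completed.append((name, object_id))
                else:
                    acts.append({"id": object_id, "index": 1})
        # A completed complex event may itself be a sub-event of another
        # complex event, so feed it back in recursively.
        for done_name, done_obj in list(completed):
            completed += self.on_event(done_name, done_obj)
        return completed

# THEFT example from the text: a REMOVE followed by an EXIT by the same person.
engine = ComplexEventEngine()
engine.define("THEFT", ["REMOVE", "EXIT"])
engine.on_event("REMOVE", "person-1")
print(engine.on_event("EXIT", "person-1"))    # -> [('THEFT', 'person-1')]
```
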
To return to our THEFT example, the complex THEFT event has two sub-events, REMOVE and EXIT. When a REMOVE event occurs, an activation for the THEFT event is created, containing the ID of the person involved in the REMOVE event, and an index set to 1. Later, when another event is recognized by the system, the activation is consulted to see if the event type of this new, incoming event matches the next sub-event in the sequence (in this case, EXIT). If the event type matches, the object ID is also checked, in this case to see if the person EXITing is the same as the person who REMOVEd the object earlier. This is to ensure that we do not signal a THEFT event when one person picks up an object and a different person exits the area. In a closed environment, the IDs used may merely be track-IDs, in which each object that enters the monitored area is assigned a unique track-ID, and the track-ID is discarded when the object is no longer being tracked. If both the event type and the object ID match, the activation's index is incremented to 2. Since there are only 2 sub-events in the complex event in this example, the entire complex event has been recognized, and an alarm is generated if desired. Also, since the THEFT event has been recognized, this newly recognized THEFT event may be a sub-event of

another complex event. When the complex THEFT event is recognized, the current activations are recursively checked to see if the theft is a part of another higher-level event, such as a CRIME-SPREE.

2.4.2. Variations and enhancements

We have described the basic mechanism of defining and recognizing complex events. There are several variations on this basic mechanism. One is to allow unordered events, i.e., complex events which are simply the conjunction or disjunction of their sub-events. Another is to allow negated sub-events, which can be used to cancel an activation when the negated sub-event occurs. For example, considering the definition for THEFT again, if the person pays for the item, it is not a theft. Also, if the person puts the item back down before leaving, no theft has occurred. A more complete definition of theft is one in which "a person picks up an item and then leaves without putting it back or paying." Assuming we can recognize the simple events REMOVE, DEPOSIT, PAY, and EXIT, the complex THEFT event can now be expressed as the ordered list (REMOVE, ~DEPOSIT, ~PAY, EXIT), where "~" indicates negation. Another application of the complex event with negated sub-events is to detect suspicious behavior in front of a building. The normal behavior may be for a person to park the car, get out of it, and then come up into the building. If the person parks the vehicle and leaves the area without coming up into the building, this may be a car bombing scenario. If we can detect the sub-events for PARK, OUTCAR, ENTER-BUILDING, and EXIT, we can define the car-bombing scenario as (PARK, OUTCAR, ~ENTER-BUILDING, EXIT).

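The matching rules for negated sub-events are not spelled out beyond cancellation, so the following sketch is one possible reading: positive sub-events advance an activation in order, and an occurrence of any negated sub-event for the same object cancels it. The "~" encoding and data layout are assumptions.

```python
def on_event_with_negation(definitions, activations, event_type, object_id):
    """Sketch of negated sub-events (Section 2.4.2): a definition such as
    ("REMOVE", "~DEPOSIT", "~PAY", "EXIT") advances on positive sub-events
    and is cancelled if a negated sub-event occurs for the same object."""
    completed = []
    for name, seq in definitions.items():
        positives = [e for e in seq if not e.startswith("~")]
        negatives = {e[1:] for e in seq if e.startswith("~")}
        acts = activations.setdefault(name, [])
        for act in list(acts):
            if act["id"] != object_id:
                continue
            if event_type in negatives:           # e.g. the person DEPOSITs or PAYs
                acts.remove(act)                  # no theft: cancel the activation
            elif positives[act["index"]] == event_type:
                act["index"] += 1
                if act["index"] == len(positives):
                    acts.remove(act)
                    completed.append((name, object_id))
        if positives and positives[0] == event_type:
            if len(positives) == 1:
                completed.append((name, object_id))
            else:
                acts.append({"id": object_id, "index": 1})
    return completed

defs = {"THEFT": ["REMOVE", "~DEPOSIT", "~PAY", "EXIT"]}
acts = {}
on_event_with_negation(defs, acts, "REMOVE", "p1")
on_event_with_negation(defs, acts, "DEPOSIT", "p1")       # item put back: cancels
print(on_event_with_negation(defs, acts, "EXIT", "p1"))   # -> [] (no theft)
```
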
Another variation is to allow the user to label the objects involved in the events, which facilitates the ability to specify that two objects be different. Consider a different car bombing scenario in which two cars pull up in front of the building, and a person gets out of one car and into the other, which drives away. The event definition must specify that there are two different cars involved: the car-bomb and the getaway-car. This can be accomplished by labelling the objects involved when defining the event, and giving different labels to objects which must be different.

Finally, one could allow multiple activations for the same event. For example, the desired behavior may be that a separate THEFT event should be signalled for each item stolen by a given person, e.g., if a person goes into a store and steals three things, three THEFT events are recognized. The basic mechanism described above signals a single THEFT event no matter how many objects are stolen. We can achieve the alternate behavior by creating multiple activations for a given event type, differing only in the IDs of the objects involved.

2.4.3. Implementation in AVS

We have described a method for defining and recognizing complex events. Most of this has been implemented and incorporated into the AVS system. This subsection describes the current implementation.

AVS analyzes the incoming video stream to detect and recognize events such as ENTER, EXIT, DEPOSIT, and REMOVE. The primary technique used by AVS for event recognition is motion graph matching as described in [Courtney, 1997]. The AVS system recognizes and reports these events in real time as illustrated in Figure 10. When the person enters the monitored area, an ENTER event is recognized, as shown in the image on the left. When the person picks up an object, a REMOVE event is recognized, as depicted in the center image. When the person exits the area, the EXIT event is signalled, as shown in the image on the right.

Figure 10: A series of simple events

While the AVS system recognizes numerous events as shown above, the user can select which events are of interest via the dialog box interface illustrated in Figure 11. The user selects the event type, object type, time, location, and duration of the event of interest using a mouse. The user can also select an action for the AVS system to take when the event is recognized. This dialog box defines one type of simple event; an arbitrary number of different simple event types can be defined via multiple uses of the dialog box. The illustration in Figure 11 shows a dialog box defining an event called "Loiter by the door," which is triggered when a person loiters in the area near the door for more than 5 seconds.

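The fields exposed by the Figure 11 dialog (event type, object type, time, location, duration, action) suggest a record along the following lines; the field names and the encoding of the "Loiter by the door" example are illustrative, not an AVS data format.

```python
from dataclasses import dataclass, field

@dataclass
class SimpleEventDef:
    """Illustrative record for a user-defined simple event (Figure 11)."""
    name: str
    event_type: str            # e.g. LOITER, ENTER, EXIT, DEPOSIT, REMOVE
    object_type: str           # e.g. person, car
    region: str                # named map region, e.g. "area near the door"
    time_window: tuple         # e.g. ("17:00", "07:00")
    min_duration_s: float = 0.0
    actions: list = field(default_factory=list)   # e.g. ["voice alarm", "log entry"]

loiter_by_the_door = SimpleEventDef(
    name="Loiter by the door",
    event_type="LOITER",
    object_type="person",
    region="near the door",
    time_window=("00:00", "24:00"),
    min_duration_s=5.0,
    actions=["voice alarm"],
)
```
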
AVS will generate a voice alarm and write a log entry when the specified event occurs. If the event is only being defined in order to be used as a sub-event in a complex event, the user might not check any action box, and no action will be taken when the event is recognized except to see if it matches the next sub-event in a complex-event activation, or to generate a new activation if it matches the first sub-event in a complex event.

After one or more simple events have been defined, the user can define a complex event via the dialog box shown in Figure 12. This dialog box presents two lists: on the left is a scrolling list of all the event types that have been defined thus far, and on the right is a list of the sub-events of the complex event being defined. The sub-event list is initially blank when defining a new complex event. When the user double-clicks with the left mouse button on an item in the event list on the left, it is added as the next item in the sub-event list on the right. When the user double-clicks with the right mouse button on an item in the event list on the left, that item is also added to the sub-event list on the right, but as a negated sub-event. The event name is prefixed with a tilde (~) to indicate that the event is negated.

Figure 11: Selecting a type of simple event

Figure 12: Defining a complex event

In the upper right corner of the complex-event definition dialog box is an option menu via which the user indicates how the sub-events are to be combined. The default selection is "ordered," to indicate sequential processing of the sub-events. The other options are "all" and "any." If "all" is selected, the complex event will be signalled if all of the sub-events are matched, regardless of order, i.e., the complex event is simply the conjunction of the sub-events. If "any" is selected, the complex event occurs if any of the sub-events occurs, i.e., the complex event is the disjunction of the sub-events. At the bottom of the dialog box, the user can select the action to take when the complex event is recognized. The user can save the entire set of event definitions to a file so that they may be read back in at a later time.

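The three combination modes correspond to three straightforward matching rules over the sub-event list; the sketch below uses the mode names from the text, but the data layout is an assumption.

```python
def complex_event_matched(mode, sub_events, observed):
    """Sketch of the 'ordered'/'all'/'any' modes of the Figure 12 dialog.
    `sub_events` is the defined sub-event list; `observed` is the sequence of
    events seen so far for one object (most recent last)."""
    if mode == "any":                       # disjunction of the sub-events
        return any(e in observed for e in sub_events)
    if mode == "all":                       # conjunction, regardless of order
        return all(e in observed for e in sub_events)
    if mode == "ordered":                   # sub-events must occur in sequence
        it = iter(observed)
        return all(e in it for e in sub_events)   # subsequence test
    raise ValueError(mode)

print(complex_event_matched("ordered", ["REMOVE", "EXIT"], ["ENTER", "REMOVE", "EXIT"]))  # True
print(complex_event_matched("all", ["REMOVE", "EXIT"], ["EXIT", "REMOVE"]))               # True
```
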
Once a simple or complex event has been defined, the AVS system immediately begins recognizing the new events in real time and taking the actions specified by the user. The AVS system, augmented as described, provides a functioning realization of the complex-event recognition method.

3. Advanced segmentation and tracking

In security applications, it is often necessary to track the movements of one or more people and objects in a scene monitored by a video camera. In real scenes, the objects move in unpredictable ways, may move close to one another, and may occlude each other. When a person moves, the shape of his or her image changes. These factors make it difficult to track the locations of individual objects throughout a scene containing multiple objects. The tracking capabilities of the original AVS system fail when there is mutual occlusion between the tracked objects. This section describes a new tracking method which overcomes the limitations of the previous tracking method and maintains the integrity of the tracks of people even when they partially occlude one another.

The segmentation algorithm described here is related to tracking systems such as [Wren et al., 1997; Grimson et al., 1998; Cai et al., 1995] in that it extends the reference image to include a statistical model of the background. Our method further extends the tracking algorithm to reason explicitly about occlusion and maintain object tracks during mutual occlusion events. Unlike the capabilities described in previous sections, the new tracking method does not run in real time, and has not yet been integrated into the AVS system. Optimizations of the new method are expected to enable it to achieve real-time operation in the future.

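The background model is characterized here only as statistical, in the spirit of [Wren et al., 1997] and [Grimson et al., 1998]; one standard instantiation is a per-pixel running Gaussian, sketched below with illustrative parameters rather than the values used in this work.

```python
import numpy as np

class GaussianBackground:
    """Per-pixel running mean/variance background model, one common way to
    realize a statistical model of the background. The learning rate and
    threshold are illustrative assumptions."""

    def __init__(self, first_frame, alpha=0.02, k=2.5):
        f = first_frame.astype(np.float32)
        self.mean = f                            # per-pixel mean estimate
        self.var = np.full_like(f, 15.0 ** 2)    # initial variance guess
        self.alpha, self.k = alpha, k

    def segment(self, frame):
        """Return a boolean foreground mask and update the model on
        background pixels only."""
        f = frame.astype(np.float32)
        diff = f - self.mean
        foreground = diff ** 2 > (self.k ** 2) * self.var
        bg = ~foreground
        self.mean[bg] += self.alpha * diff[bg]
        self.var[bg] = (1 - self.alpha) * self.var[bg] + self.alpha * diff[bg] ** 2
        return foreground
```
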
Figure 13 depicts an example scene containing two people. In (a), the two people are standing apart from each other, with Person-1 on the left and Person-2 on the right. In (b), Person-1 moves to the right so that he is partially occluded by Person-2. Using a conventional technique such as background subtraction, it is difficult to maintain the separate tracks of the two people in the scene, since the images of the two people merge into a single large region.

Figure 14 shows a sequence of frames (in normal English reading order) in which it is particularly difficult to properly maintain the tracks of the two people in the scene. In this sequence, Person-2 moves from right to left and back again, crossing in front of Person-1. There are significant occlusions (e.g., in the third frame shown), and the orientations of both people with respect to the camera change significantly throughout the sequence, making conventional template matching fail on this sequence.

Figure 13: An example scene containing two people with occlusion

Figure 14: A difficult tracking sequence

... maintained by the tracking system. P-templates can be used to reason about occlusion in a video sequence. While we only address the issue of p-templates for tracking people that are walking upright, the concept is applicable to tracking any object, e.g., vehicles and crawling people; although the shape of the p-template would need to be adapted to the type of object being tracked.

When the people in the scene overlap, t

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket