`
`UNITED STATES PATENT AND TRADEMARK OFFICE
`
`BEFORE THE PATENT TRIAL AND APPEAL BOARD
`
`Canon Inc., Canon USA, Inc., and Axis Communications AB,
`
`Petitioner,
`
`v.
`
Avigilon Fortress Corporation,
`
`Patent Owner.
`
Cases: IPR2019-00311; IPR2019-00314
`
U.S. Patent No. 7,932,923
`
Issue Date: April 26, 2011
`
`Title: Video Surveillance System Employing Video Primitives
`
`
`
`
`
`
`
`
`
`
`
`
`
`DECLARATION OF BRYAN PATRICK KASIK
`
`
`
`
`
`
I, Bryan Patrick Kasik, state and declare as follows:
`
`1.
`
`I am a Reference Librarian with the University of Virginia (“UVA”)
`
`Library, 160 McCormick Road, Charlottesville, Virginia 22904.
`
`2.
`
`I am over 18 years of age and am competent to make this Declaration.
`
`I make this Declaration based on my own personal knowledge, based on my
`
`knowledge and review of the business records and practices of the UVA Library.
`
`3.
`
`I am employed at the UVA Library as a Reference Librarian, a
`
`position I have held since July 2016. Before that, I was employed at the UVA Law
`
`Library for 9 years. Through my employment, training, and the other actions as a
`
`Reference Librarian, I have become knowledgeable about the normal business
`
`practices of the UVA Library with respect to how books, articles, and periodicals
`
`are received, cataloged, indexed, shelved, and made available to the public.
`
`4.
`
`Attached as Exhibit A to this Declaration is a true and correct copy of
`
`Brill et al., “Event Recognition and Reliability Improvements for the Autonomous
`
`Video Surveillance System,” Proceedings of the Image Understanding Workshop,
`
`Monterey, California, November 20-23, 1998, Vol. 1, pp. 267—283 (“Brill”).
`
`Exhibit A was obtained from the records that the UVA Library maintains in the
`
`Case Nos. IPR2019—0031 1; IPR2019-003 l4
`U.S. Patent No. 7,932,923
`
`Canon Ex. 1049 Pa 6 2 of 39
`
`ordinary course of its regular activities.
`
`5.
`
`Attached as Exhibit B to this Declaration is a screenshot for Brill in
`
`the UVA Library’s Workflows system, which is the UVA Library’s internal
`
`Canon Ex. 1049 Page 2 of 39
`
`
`
`
`
`Case Nos. IPR2019—0031 l; IPR2019-00314
`
`US. Patent No. 7,932,923
`
`cataloguing system. This record is one that the UVA Library maintains in the
`
`ordinary course of its regular activities as a library. The record contains
`
`information regarding the ordering and receipt of Brill by the UVA Library.
`
`Creation of such records in the Workflows system was and continues to be a
`
`regular practice of UVA Library.
`
`6.
`
`I use and rely on the UVA Library’s Workflows system on a daily
`
`basis as part of my job and have personal knowledge regarding its functions and
`
`the information contained within the Workflows system.
`
`7.
`
`According to the record in Exhibit B, Brill was ordered and entered
`
`into the UVA Library system on July 15, 1999, then assigned a call number on
`
`August 27, 1999.
`
`8.
`
`After receiving a call number at the library, Brill was labeled and
`
`moved to the stacks. This process at the UVA Library typically takes a few days
`
`depending on the volume of books being processed. According to UVA Library’s
`
`normal business practice, Brill would have been moved to the stacks and available
`
`to the public within a few days of August 27, 1999.
`
`9.
`
`Once a reference is moved to the stacks of the UVA Library it is
`
`available to be viewed within the UVA Library by any member of the public.
`
10. After being added to the UVA Library’s collection, references are searchable by keywords, including title, author, and date, in UVA’s online catalog system called Virgo. Brill would have been available within the Virgo system as soon as it was entered into Workflows, allowing a member of the public to search for and find the 1998 Image Understanding Workshop book. I personally used Virgo in 1999 to search for and identify numerous references.
`
`11.
`
`To the best of my knowledge, unless stated otherwise, the above
`
`statements are descriptions of normal business practices at the UVA Library from
`
`at least 1999 through the present.
`
I have been warned and understand that willful false statements and the like are punishable by fine or imprisonment, or both (18 U.S.C. § 1001). I declare that all statements made in this declaration of my own knowledge are true and that all statements made on information and belief are believed to be true.

I declare under penalty of perjury that the foregoing is true and correct.

Executed on August 15, 2019 in Charlottesville, Virginia.
`
`Bryan Patrick Kasik
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
EXHIBIT A
`
`
`
`
`
`Event Recognition and Reliability Improvements for
`the Autonomous Video Surveillance System
`
`Frank Z. Brill, Thomas J. Olson, and Christopher Tserng
`Texas Instruments
`P.O. Box 655303, MS 8374, Dallas, TX 75265
brill@csc.ti.com, olson@csc.ti.com, tserng@csc.ti.com
`
`Abstract
`
This report describes recent progress in the development of the Autonomous Video Surveillance (AVS) system, a general-purpose system for moving object detection and event recognition. AVS analyses live video of a scene and builds a description of the activity in that scene. The recent enhancements to AVS described in this report are: (1) use of collateral information sources, (2) camera hand-off, (3) vehicle event recognition, and (4) complex-event recognition. Also described is a new segmentation and tracking technique and an evaluation of AVS performing the best-view selection task.
`
`1. Introduction
`
The Autonomous Video Surveillance (AVS) system processes live video streams from surveillance cameras to automatically produce a real-time map-based display of the locations of people, objects and events in a monitored region. The system allows a user to specify alarm conditions interactively, based on the locations of people and objects in the scene, the types of objects in the scene, the events in which the people and objects are involved, and the times at which the events occur. Furthermore, the user can specify the action to take when an alarm is triggered, e.g., to generate an audio alarm or write a log file. For example, the user can specify that an audio alarm should be triggered if a person deposits a briefcase on a given table between 5:00pm and 7:00am on a weeknight.
Section 2 below describes recent enhancements to the AVS system. Section 3 describes progress in improving the reliability of segmentation and tracking. Section 4 describes an experiment that quantifies the performance of the AVS "best view selection" capability.

This research was sponsored in part by the DARPA Image Understanding Program.
`
`2. New AVS functionality
`
The structure and function of the AVS system are described in detail in a previous IUW paper [Olson and Brill, 1997]. The primary purpose of the current paper is to describe recent enhancements to the AVS system. These enhancements are described in four sections below: (1) collateral information sources, (2) camera hand-off, (3) vehicle event recognition, and (4) complex-event recognition.
`
`2.1. Collateral information sources
`
Figure 1 shows a diagram of the AVS system. One or more "smart" cameras process the video stream to recognize events. The resulting event streams are sent to a Video Surveillance Shell (VSS), which integrates the information and displays it on a map. The VSS can also generate alarms based on the information in the event streams. In recent work, the VSS was enhanced to accept information from other sources, or "recognition devices," which can identify the objects being reported on by the cameras. For example, a camera may report that there is a person near a door. A recognition device may report that the person near the door is Joe Smith. The recognition device may be a badge reader, a keypad in which a person types their PIN, a face recognition system, or other recognition system.
`
`
`
`
`....------,
`Map
`Video ~ ''==D=is=p=lay==I
`Surveillance
`_
`events
`Shell
`c:(})]
`- - - - - (VSS)
`LLti:::J ~ ~
`Smart Camera 2
`audio output
`F===4
`l__)
`log files
`
`I
`
`'
`
`snapshots
`
`monitors
`event filtering
`
`--~l--0171.5-
`
`Smart Camera 1
`
`-
`
`ID Device
`
`Figure I: AVS system diagram
`
The recognition device we have incorporated is a voice verification system. The user stands in a predefined location in the room, and speaks his or her name. The system matches the utterance to previously captured examples of the person speaking their name, and reports to the VSS if there is a match. The VSS now knows the identity of the person being observed, and can customize alarms based on the person's identity.

A recognition device could identify things other than people, and could classify actions instead of objects. For example, the MIT Action Recognition System (MARS) recognizes actions of people in the scene, such as raising their arms or bending over. MARS is trained by observing examples of the action to be recognized and forming "temporal templates" that briefly describe the action [Davis and Bobick, 1997]. At run time, MARS observes the motion in the scene and determines when the motion matches one of the stored temporal templates. TI has obtained an evaluation copy of the MARS software and used it as a recognition device which identifies actions and sends the result to the AVS VSS. We successfully trained MARS to recognize the actions of opening a door, and opening the drawer of a file cabinet. When MARS recognizes these actions, it sends a message to the AVS VSS, which can generate an appropriate alarm.
`
`2.2. Camera hand-off
`
As depicted in Figure 1, the AVS system incorporates multiple cameras to enable surveillance of a wider area than can be monitored via a single camera. If the fields of view of these cameras are adjacent, a person can be tracked from one monitored area to another. When the person leaves the field of view of one camera and enters another, the process of maintaining the track from one camera view to another is termed camera hand-off. Figure 2 shows an area monitored by two cameras.

Figure 2: Multiple cameras with adjacent fields of view
`
`268
`
`Canon Ex. 1049 Page 7 of 39
`
`
`
Camera-1 monitors the hallway, and Camera-2 monitors the interior of the room. When a person moves through the doorway to enter the room from the hall or vice-versa, camera hand-off is necessary to enable the system to know that the person that was being monitored in the hall via Camera-1 is the same as the person being monitored in the room via Camera-2.
`
The AVS system accomplishes camera hand-off by integrating the information from the two cameras in the map coordinate system. The AVS "smart" cameras report the locations of the monitored objects and people in map coordinates, so that when the VSS receives reports about a person from two separate cameras, and both cameras are reporting the person's coordinates at about the same map location, the VSS can deduce that the two separate reports refer to the same person. In the example depicted in Figure 2, when a person is standing in the doorway, both cameras can see the person and report his or her location at nearly the same place. The VSS reports this as one person, using a minimum distance to allow for errors in location. When Camera-2 first sees a person at a location near the doorway and reports this to the VSS, the VSS checks to see if Camera-1 recently reported a person near the door. If so, the VSS reports the person in the room as the same one that Camera-1 had been tracking in the hall.
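A minimal sketch of this hand-off rule follows; the report format, distance tolerance, and time window are assumptions chosen for illustration, since the paper does not give numeric values.

import math

HANDOFF_RADIUS_FT = 3.0  # assumed tolerance for map-coordinate error
HANDOFF_WINDOW_S = 2.0   # assumed meaning of "recently reported"

def match_handoff(new_report, recent_reports):
    # Each report: {"camera", "track_id", "x", "y", "t"}, with (x, y) in
    # shared map coordinates. Returns the track ID of a recent report from
    # a different camera at about the same map location, or None.
    for old in recent_reports:
        if old["camera"] == new_report["camera"]:
            continue  # hand-off only applies between different cameras
        close = math.hypot(new_report["x"] - old["x"],
                           new_report["y"] - old["y"]) <= HANDOFF_RADIUS_FT
        recent = abs(new_report["t"] - old["t"]) <= HANDOFF_WINDOW_S
        if close and recent:
            return old["track_id"]  # same person: continue the old track
    return None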
`
`2.3. Vehicle event recognition
`
This section describes extensions to the existing AVS system that enable the recognition of events involving interactions of people with cars. These new capabilities enable smart security cameras to monitor streets, parking lots and driveways and report when suspicious events occur. For example, a smart camera signals an alarm when a person exits a car, deposits an object near a building, reenters the car, and drives away.
`
`2.3.1. Scope and assumptions
`
Extending the AVS system to handle human-vehicle interactions reliably involved two separable subproblems. First, the system's vocabulary for events and objects must be extended to handle a new class of object (vehicle) and new event types. Second, the AVS moving object detection and tracking software must be modified to handle the outdoor environment, which features variable lighting, strong shadows, atmospheric disturbances, and dynamic backgrounds. The work described here in section 2.3 addresses the first problem, extending the system for vehicle events in conditions of uniform overcast with little wind. Our approach to handling general outdoor lighting conditions is discussed in section 4.
`
The method is further specialized for imaging conditions in which:

1. The camera views cars laterally.
2. Cars are unoccluded by other cars.
3. When cars and people overlap, only one of the overlapping objects is moving.
4. The events of interest are people getting into and out of cars.
`
`2.3.2. Car detection
`
The first step in expanding the event-recognition capability of the current system was to give the system the ability to distinguish between people and cars. The system classifies objects as cars by using their sizes and aspect ratios. The size of an object in feet is obtained using the AVS system's image-coordinate-to-world-coordinate mapping. Once the system has detected a car, it analyzes the motion graph to recognize new events.
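A minimal sketch of this size-and-aspect-ratio test; the thresholds below are assumptions for illustration, since the paper does not report its values.

def classify_object(width_ft, height_ft):
    # Classify a tracked blob from its world-coordinate size in feet.
    # Viewed laterally, cars are long and low; people are tall and narrow.
    aspect = width_ft / height_ft
    if width_ft > 9.0 and aspect > 1.5:    # assumed car thresholds
        return "car"
    if height_ft > 4.0 and aspect < 0.8:   # assumed person thresholds
        return "person"
    return "unknown"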
`
`2.3.3. Car event recognition
`
In principle, car exit and car entry events could be recognized by detecting characteristic interactions of blobs in difference images, in a manner similar to the way AVS recognizes DEPOSIT and REMOVE events. In early experiments, however, this method turned out to be unsatisfactory because the underlying motion segmentation method did not segment cars from people. Whenever people pass near the car, they appear to merge with it, and track is lost until they walk away from it.
`
To solve this problem, a new approach involving additional image differencing was developed. The technique allows objects to be detected and tracked even when their images overlap the image of the car. This method requires two reference images: one consists of the original background scene (background image), and the other is identical to the first except it includes the car. The system takes differences between the current video image and the original reference image as usual. However, it also differences the current video image with the reference image containing the car. This allows the system to detect objects which may be overlapping the car. Using this technique, it is easy to detect when people enter and exit a car. If an object disappears while overlapping with a car, it probably entered the car. Similarly, if an object appears overlapping a car, it probably exited the car.
`
`2.3.4. Basic method
`
When a car comes to rest, the following steps are taken. First, the image of the car object is removed from its frame and stored. Then, the car image is merged with the background image, creating an updated reference image containing the car. (Terminology: a reference car image is the subregion of the updated reference image that contains the car.) Then, the car background image, the region of the original background image that is replaced by the car image, is stored.

For each successive frame, two difference images are generated. One difference image, the foreground difference image, is calculated by differencing the current video image with the updated reference image. The foreground difference image will contain all the blobs that represent objects other than the car, including ones that overlap the car. The second difference image, the car difference image, is calculated using the car background image. The car difference image is formed from the difference between the current frame and the car background image, and contains the large blob for the car itself. Figures 3 and 4 show the construction and use of these images.
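A rough NumPy sketch of this double-differencing step, under the stated terminology; the threshold and array handling are assumptions, and the paper's blob grouping is omitted.

import numpy as np

DIFF_THRESHOLD = 25  # assumed per-pixel intensity threshold

def difference_masks(frame, updated_reference, car_background, car_box):
    # frame, updated_reference: grayscale arrays of the same shape;
    # car_background: the patch of the original background that the
    # resting car replaced; car_box: (top, bottom, left, right) of it.
    def diff(a, b):
        return np.abs(a.astype(int) - b.astype(int)) > DIFF_THRESHOLD

    # Everything except the car, even objects overlapping the car.
    foreground_diff = diff(frame, updated_reference)
    # The car blob itself, from the region the car occupies.
    t, b, l, r = car_box
    car_diff = diff(frame[t:b, l:r], car_background)
    return foreground_diff, car_diff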
`
Figure 3: (a) Background image. (b) Car background image. (c) Updated reference image.

Figure 4: (a) Current video image. (b) Foreground difference image.
`
`
`
`---
`,-✓
`stored car
`I
`object
`I
`I
`
`,
`r -
`-
`L
`.J
`removed
`car object
`
`*
`. ..
`
`/
`
`---
`
`-- "1111-
`
`car object
`
`I
`
`/
`
`frame prior to
`car resting
`
`car resting
`frame
`
`/
`
`/
`
`...
`
`- -
`
`- -
`
`- -
`
`- .... _
`
`current
`car object
`
`reference frame
`
`- -
`
`- .....
`/ ~
`... ---
`
`previous frame
`
`current frame
`
`Figure 5: Creation of the motion graph.
`The starred frame represents the frame prior to the background image being updated.
`
The blobs in the foreground difference image are grouped into objects using the normal grouping heuristics and placed in the current frame. The blobs in the car difference image necessarily represent the car, so they are all grouped into one current car object and placed in a special reference frame. Normal links occur between objects in the previous frame and objects in the current frame. Additionally, the stored car object, which was removed from its frame (from Step 1), is linked to the current car object which is in the reference frame. In any given sequence, there is only one reference frame.

Figure 5 demonstrates the creation of this new motion graph. As indicated by the dotted lines, all objects maintain their tracks using this method. Notice that even though the car object disappears from future frames (due to the updated reference image), it is not detected to have exited because its track is maintained throughout every frame. Using this method, the system is able to keep track of the car object as well as any objects overlapping the car. If an object appears intersecting a car object, an OUTCAR event is reported. If an object disappears while intersecting a car object, an INCAR event is reported. Figure 6 shows the output of the system. The system will continue to operate in this manner until the car in the reference frame begins to move again.
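A minimal sketch of the appear/disappear test described above; bounding boxes are (top, bottom, left, right), and the surrounding track bookkeeping is assumed.

def boxes_intersect(a, b):
    # True if two (top, bottom, left, right) boxes overlap.
    return a[0] < b[1] and b[0] < a[1] and a[2] < b[3] and b[2] < a[3]

def car_events(appeared, disappeared, car_box):
    # appeared/disappeared: boxes of objects that showed up in, or vanished
    # from, the foreground difference image this frame.
    events = []
    for box in appeared:
        if boxes_intersect(box, car_box):
            events.append("OUTCAR")  # appeared overlapping car: exited it
    for box in disappeared:
        if boxes_intersect(box, car_box):
            events.append("INCAR")   # vanished overlapping car: entered it
    return events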
`
When the car moves again, the system reverts to its normal single-reference-image state. The system detects the car's motion based on the movement of its centroid: it compares the position of the centroid of the stored car object with the centroid of the current car object. Figure 7 shows the slight movement of the car.
`
Figure 6: Final output of system.

Figure 7: (a) Reference car image. (b) Moving car image. (c) Reference car difference image. (d) Moving car difference image.
`271
`
`Canon Ex. 1049 Page 10 of 39
`
`
`
`-
`
`- - - --
`
`-~-
`
`current
`car object
`
`---
`,-✓
`stored car
`I
`object
`I
`I
`
`.,
`r -
`-
`L
`.J
`removed
`car object
`
`*
`. ..
`
`/
`
`/
`
`,,.- -~t
`It- ..._
`
`car resting
`frame
`
`previous frame
`
`reference frame
`
`- -
`-~
`
`current frame
`
`/
`
`/
`
`' ..
`
`-- "-- I
`
`/
`
`car object
`
`frame prior to
`car resting
`
`Figure 5: Creation of the motion graph.
`The starred frame represents the frame prior to the background image being updated.
`
`The blobs in the foreground difference image are
`grouped into objects using the normal grouping
`heuristics and placed in the current frame. The
`blobs in the car difference image necessarily repre(cid:173)
`sent the car, so they are all grouped into one current
`car object and placed in a special reference frame.
`Normal links occur between objects in the previous
`frame and objects in the current frame. Additional(cid:173)
`ly, the stored car object, which was removed from
`its frame, (from Step 1) is linked to the current car
`object which is in the reference frame. In any given
`sequence, there is only one reference frame.
`
`Figure 5 demonstrates the creation of this new mo(cid:173)
`tion graph. As indicated by the dotted lines, all
`objects maintain their tracks using this method.
`Notice that even though the car object disappears
`from future frames ( due to the updated reference
`image), it is not detected to have exited because its
`track is maintained throughout every frame. Using
`this method, the system is able to keep track of the
`car object as well as any objects overlapping the
`car. If an object appears intersecting a car object,
`
`an INCAR event is reported. If an object disap(cid:173)
`pears while intersecting a car object, an OUTCAR
`event is reported. Figure 6 shows the output of the
`system. The system will continue to operate in this
`manner until the car in the reference frame begins
`to move again.
`
`When the car moves again, the system reverts to its
`normal single-reference-image state. The system
`detects the car's motion based on the movement of
`its centroid. It compares the position of the cen(cid:173)
`troid of the stored car object with the centroid of
`the current car object. Figure 7 shows the slight
`movement of the car.
`
`Figure 6: Final output of system
`
`eD
`
`(a)
`
`··········,,•,..~·•·.
`
`(b)
`
`(c)
`
`(d)
`
`Figure 7: (a) Reference car image. (b) Moving car image.
`(c) Reference car difference image. (d) Moving car difference image
`
`271
`
`Canon Ex. 1049 Page 11 of 39
`
`
`
`--
`
`link from
`before car
`resting/,,,,.
`/
`
`__.,
`
`I
`I
`
`... D ...
`
`stored car
`object
`
`H
`
`:
`
`:
`
`:
`
`" ..........
`
`\
`
`:
`
`·------·*
`~ :l~>--Y•
`
`.---------,
`
`movmg car
`object
`
`next frame
`Figure 8: Restoration of normal differencing. The starred frame represents the last frame prior to the
`original reference image being restored.
`
`current frame
`
`previous frame
`
If the centroid locations differ by more than a threshold, the following sequence of events occurs to restore the system to its original state:

1. An object representing the moving car is created in the current frame.
2. The stored car object is linked to this new moving car object in the current frame.
3. Objects in the previous frame that intersect the moving car are removed from that frame.
4. The car background image is merged with the updated reference image to restore the original reference image.
5. Normal differencing continues.
`
Figure 8 demonstrates how the system is restored to its original state. Note that there is one continuous track that represents the path of the car throughout.
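A condensed sketch of the motion test and the five restoration steps; the threshold and the `state` object with its helper methods are assumptions standing in for the system's actual bookkeeping.

import math

MOVE_THRESHOLD_FT = 1.0  # assumed centroid displacement meaning "moved"

def car_moved(stored_centroid, current_centroid):
    # Compare the stored car object's centroid with the current one.
    return math.dist(stored_centroid, current_centroid) > MOVE_THRESHOLD_FT

def restore_single_reference(state):
    car = state.make_moving_car_object()           # step 1
    state.link(state.stored_car_object, car)       # step 2: one continuous track
    state.previous_frame.remove_intersecting(car)  # step 3: drop transients
    state.reference_image = state.original_background  # step 4: un-merge car
    state.mode = "single-reference"                # step 5: normal differencing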
`
When the car begins to move again, transient blobs appear in the foreground difference image due to the fact that the car is in the updated reference image, as seen in Figure 9. Therefore, to create a new moving car object in the current frame, these transient objects, which are identified by their intersection with the location of the resting car, are grouped together as one car object. If there are no transient objects, a copy of the stored car object is inserted into the current frame. This way, there is definitely a car object in the current frame to link with the stored car object. Transient objects might also appear in the previous frame when a car is moving. Therefore, these transient objects must be removed from their frame in order to prevent them from being linked to the new moving car object that was just created in the current frame. After the steps described above occur, the system continues as usual until another car comes to rest.
`
`2.3.5. Experiments: disk-based sequences
`
To test the principles behind the modified AVS system, three sequences of video that represented interesting events were captured to disk. These sequences represented events which the modified system should be able to recognize. Capturing the sequences to disk reduces noise and ensures that the system processes the same frames on every run, making the results deterministic. In addition to these sequences, longer sequences were recorded and run directly from videotape to test how the system would work under less ideal conditions.
`
Figure 9: (a) Updated reference image. (b) Current video image. (c) Foreground difference image.
`
`
`
2.3.5.1. Simple sequence. The first sequence was filmed from the 3rd story of an office building overlooking the driveway in front of the building. A car drives up and a person exits the car, walks away, deposits a briefcase, and finally reenters the car. Then, the car drives away. In this segment, the system successfully detects the person exiting the car. However, the person entering the car is missed because the person gets grouped with a second person walking near the car.

Further on in the sequence, the car drives up again and a person exits the car, walks away, removes the briefcase, and finally reenters the car. Again, the car drives away. In this segment, both the person entering and exiting the car are recognized. In both these segments, there was only the one false negative mentioned earlier and no false positives.
`
2.3.5.2. Pickup sequence. This sequence was filmed in front of a house looking at the street in front of the house. In the sequence, a person walks into the scene and waits at the curb. A car drives up, picks up the person, and drives away. The system correctly detects the person entering the car. There are no false positives or negatives.
`
2.3.5.3. Drop off sequence. This sequence was filmed in the same location as the previous one. In this sequence, a car drives up and a person is dropped off. The car drives away with the person still standing in the same location. Then, the person walks off. The system correctly detects the person exiting the car and does not report a false enter event when the car moves away.
`
`2.3.6. Experiments: videotaped sequences
`
These sequences were run on the system straight from videotape. They were all run at a higher threshold to accommodate noise on the videotape; however, this tended to decrease the performance of the system.
`
2.3.6.1. Dark day. This is a 15 minute sequence that was recorded from the 3rd floor of a building on a fairly dark day. In that time span, 8 cars passed through the camera's field of view. The system detected 6 cars correctly and one false car (due to people grouped together). One car was not detected due to its small size. The other car was undetected because the system slowed down (due to multiple events occurring) and missed the images with the car in them. In this sequence, two people entered a car. However, both events were missed because the car was not recognized as resting due to the dark lighting conditions on this rainy day.
`
2.3.6.2. Cloudy day. This is a 13 minute sequence in the same location as the previous sequence, except it is a cloudy day. In this time span, 9 cars passed through the camera's field of view and all of them were detected by the system. There were a total of 2 people entering a car and 2 people exiting a car. The system successfully detected them all. Additionally, it incorrectly reported one person walking near a car as an instance of a person exiting a car.
`
2.3.6.3. Cloudy day-extended time. This is a 30 minute sequence in the same location as the previous two. In this time span, 28 cars pass through and all of them were detected. The system successfully detected one person exiting a car but missed two others. The two people were missed because the car was on the edge of the camera's field of view and so it was not recognized immediately as a car.
`
`2.3.7. Evaluation of car-event recognition
`
The modified AVS system performs reasonably well on the test data. However, it has only been tested on a small number of videotaped sequences, in which much of the action was staged. Further experiments and further work with live, uncontrolled data will be required to make the system handle outdoor vehicle events as well as it handles indoor events. The technique of using multiple reference images is interesting and can be applied to other problems, e.g., handling repositioned furniture in indoor environments. For more detail on this method, see [Tserng, 1998].
`
`2.4. Complex events
`
The AVS video monitoring technology enables the recognition of specific events such as when a person enters a room, deposits or picks up an object, or loiters for a while in a given area. Although these events are more sophisticated than those detected via simple motion detection, they are still unstructured events that are detected regardless of the context in which they occur. This can result in alarms being generated on events that are not of interest.
`
`273
`
`Canon Ex. 1049 Page 13 of 39
`
`
`
For example, if the system is monitoring a room or store with the intention of detecting theft, the system could be set up to generate an alarm whenever an object is picked up (i.e., whenever a REMOVE event occurs). However, no theft has occurred unless the person leaves the area with the object. A simple, unstructured event recognition system would generate an alarm every time someone picked up an object, resulting in many false alarms; whereas a system that can recognize complex events could be programmed to only generate an alarm when the REMOVE event is followed by an EXIT event. The EXIT event provides context for the REMOVE event that enables the system to filter out uninteresting cases in which the person does not leave the area with the object they picked up. This section describes the design and implementation of such a complex-event recognition system.
`
We use the term simple event to mean an unstructured atomic event. A complex event is structured, in that it is made up of one or more sub-events. The sub-events of a complex event may be simple events, or they may be complex, enabling the definition of event hierarchies. We will simply say event to refer to an event that may be either simple or complex. In our theft example above, REMOVE and EXIT are simple events, and THEFT is a complex event. A user may also define a further event, e.g., CRIME-SPREE, which may have one or more complex THEFT events as sub-events.
`
We created a user interface that enables definition of a complex event by constructing a list of sub-events. After one or more complex events have been defined, the sub-events of subsequently defined complex events can be complex events themselves.
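A minimal sketch of such hierarchical definitions, using the THEFT and CRIME-SPREE examples from the text; the dictionary representation is an assumption, not the paper's data structure.

# Simple events are plain names; a complex event maps to the ordered list
# of its sub-events, each of which may itself be simple or complex.
EVENT_DEFS = {
    "THEFT": ["REMOVE", "EXIT"],
    "CRIME-SPREE": ["THEFT", "THEFT"],
}

def flatten(event, defs=EVENT_DEFS):
    # Expand an event into the sequence of simple events it requires.
    if event not in defs:
        return [event]  # simple event
    seq = []
    for sub in defs[event]:
        seq.extend(flatten(sub, defs))
    return seq

# flatten("CRIME-SPREE") -> ["REMOVE", "EXIT", "REMOVE", "EXIT"]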
`
`2.4.1. Complex-event recognition
`
Once the user has defined the complex events and the actions to take when they occur, the event recognition system recognizes these events as they occur in the monitored area. For the purposes of this section, we assume a priori that the simple events can be recognized, and that the objects involved in them can be tracked. In the implementation we will use the methods discussed in [Courtney, 1997; Olson and Brill, 1997] to track objects and recognize the simple events. In order to recognize a complex event, the system must keep a record of the sub-events that have occurred thus far, and the objects involved in them. Whenever the first sub-event in a complex event's sequence is recognized, an activation for that complex event is created. The activation contains the ID of the object involved in the event, and an index, which is the number of sub-events in the sequence that have been recognized thus far. The index is initialized to 1 when the activation is created, since the activation is only created when the first sub-event matches. The system maintains a list of current activations