Moving Object Detection and Event Recognition Algorithms for Smart Cameras

Thomas J. Olson
Frank Z. Brill

Texas Instruments
Research & Development
P.O. Box 655303, MS 8374, Dallas, TX 75265
E-mail: olson@esc.ti.com, brill@ti.com
http://www.ti.com/research/docs/iuba/index.html

Abstract

Smart video cameras analyze the video stream and translate it into a description of the scene in terms of objects, object motions, and events. This paper describes a set of algorithms for the core computations needed to build smart cameras. Together these algorithms make up the Autonomous Video Surveillance (AVS) system, a general-purpose framework for moving object detection and event recognition. Moving objects are detected using change detection, and are tracked using first-order prediction and nearest-neighbor matching. Events are recognized by applying predicates to the graph formed by linking corresponding objects in successive frames. The AVS algorithms have been used to create several novel video surveillance applications. These include a video surveillance shell that allows a human to monitor the outputs of multiple cameras, a system that takes a single high-quality snapshot of every person who enters its field of view, and a system that learns the structure of the monitored environment by watching humans move around in the scene.
1 Introduction

Video cameras today produce images, which must be examined by humans in order to be useful. Future "smart" video cameras will produce information, including descriptions of the environment they are monitoring and the events taking place in it. The information they produce may include images and video clips, but these will be carefully selected to maximize their useful information content. The symbolic information and images from smart cameras will be filtered by programs that extract data relevant to particular tasks. This filtering process will enable a single human to monitor hundreds or thousands of video streams.

(The research described in this report was sponsored in part by the DARPA Image Understanding Program.)

In pursuit of our research objectives [Flinchbaugh, 1997], we are developing the technology needed to make smart cameras a reality. Two fundamental capabilities are needed. The first is the ability to describe scenes in terms of object motions and interactions. The second is the ability to recognize important events that occur in the scene, and to pick out those that are relevant to the current task. These capabilities make it possible to develop a variety of novel and useful video surveillance applications.

1.1 Video Surveillance and Monitoring Scenarios

Our work is motivated by several types of video surveillance and monitoring scenarios.

Indoor Surveillance: Indoor surveillance provides information about areas such as building lobbies, hallways, and offices. Monitoring tasks in lobbies and hallways include detection of people depositing things (e.g., unattended luggage in an airport lounge), removing things (e.g., theft), or loitering. Office monitoring tasks typically require information about people's identities: in an office, for example, the office owner may do anything at any
time, but other people should not open desk drawers or operate the computer unless the owner is present. Cleaning staff may come in at night to vacuum and empty trash cans, but should not handle objects on the desk.

Outdoor Surveillance: Outdoor surveillance includes tasks such as monitoring a site perimeter for intrusion or threats from vehicles (e.g., car bombs). In military applications, video surveillance can function as a sentry or forward observer, e.g. by notifying commanders when enemy soldiers emerge from a wooded area or cross a road.

In order for smart cameras to be practical for real-world tasks, the algorithms they use must be robust. Current commercial video surveillance systems have a high false alarm rate [Ringler and Hoover, 1995], which renders them useless for most applications. For this reason, our research stresses robustness and quantification of detection and false alarm rates. Smart camera algorithms must also run effectively on low-cost platforms, so that they can be implemented in small, low-power packages and can be used in large numbers. Studying algorithms that can run in near real time makes it practical to conduct extensive evaluation and testing of systems, and may enable worthwhile near-term applications as well as contributing to long-term research goals.
1.2 Approach

The first step in processing a video stream for surveillance purposes is to identify the important objects in the scene. In this paper it is assumed that the important objects are those that move independently. Camera parameters are assumed to be fixed. This allows the use of simple change detection to identify moving objects. Where use of moving cameras is necessary, stabilization hardware and stabilized moving object detection algorithms can be used (e.g. [Burt et al., 1989, Nelson, 1991]). The use of criteria other than motion (e.g., salience based on shape or color, or more general object recognition) is compatible with our approach, but these criteria are not used in our current applications.
Our event recognition algorithms are based on graph matching. Moving objects in the image are tracked over time. Observations of an object in successive video frames are linked to form a directed graph (the motion graph). Events are defined in terms of predicates on the motion graph. For instance, the beginning of a chain of successive observations of an object is defined to be an ENTER event. Event detection is described in more detail below.
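As a concrete illustration of this representation, the sketch below (Python; the class and field names are illustrative assumptions, not taken from the AVS implementation) shows one way a motion-graph node could be stored and how ENTER and EXIT can be phrased as predicates on the links.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Observation:
    """One motion region observed in one frame (a node of the motion graph)."""
    frame: int
    centroid: tuple                                   # (x, y) in image coordinates
    prev: Optional["Observation"] = None              # link to the corresponding region in the previous frame
    next: List["Observation"] = field(default_factory=list)   # links into the following frame

def is_enter(obs: Observation) -> bool:
    # ENTER: the start of a chain of successive observations,
    # i.e. an observation with no predecessor in the graph.
    return obs.prev is None

def is_exit(obs: Observation) -> bool:
    # EXIT: the end of a chain, i.e. an observation with no successor.
    return len(obs.next) == 0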
Our approach to video surveillance stresses 2D, image-based algorithms and simple, low-level object representations that can be extracted reliably from the video sequence. This emphasis yields a high level of robustness and low computational cost. Object recognition and other detailed analyses are used only after the system has determined that the objects in question are interesting and merit further investigation.
1.3 Research Strategy

The primary technical goal of this research is to develop general-purpose algorithms for moving object detection and event recognition. These algorithms comprise the Autonomous Video Surveillance (AVS) system, a modular framework for building video surveillance applications. AVS is designed to be updated to incorporate better core algorithms or to tune the processing to specific domains as our research progresses.

In order to evaluate the AVS core algorithms and event recognition and tracking framework, we use them to develop applications motivated by the surveillance scenarios described above. The applications are small-scale implementations of future smart camera systems. They are designed for long-term operation, and are evaluated by allowing them to run for long periods (hours or days) and analyzing their output.

The remainder of this paper is organized as follows. The next section discusses related work. Section 3 presents the core moving object detection and event recognition algorithms, and the mechanism used to establish the 3D positions of objects. Section 4 presents applications that have been built using the AVS framework. The final section discusses the current state of the system and our future plans.
2 Related Work

Our overall approach to video surveillance has been influenced by interest in selective attention and task-oriented processing [Swain and Stricker, 1991, Rimey and Brown, 1993, Camus et al., 1993]. The fundamental problem with current video surveillance technology is that the useful information density of the images delivered to a human is very low; the vast majority of surveillance video frames contain no useful information at all. The fundamental role of the smart camera described above is to reduce the volume of data produced by the camera, and increase the value of that data. It does this by discarding irrelevant frames, and by expressing the information in the relevant frames primarily in symbolic form.
2.1 Moving Object Detection

Most algorithms for moving object detection using fixed cameras work by comparing incoming video frames to a reference image, and attributing significant differences either to motion or to noise. The algorithms differ in the form of the comparison operator they use, and in the way in which the reference image is maintained. Simple intensity differencing followed by thresholding is widely used [Jain et al., 1979, Yalamanchili et al., 1982, Kelly et al., 1995, Bobick and Davis, 1996, Courtney, 1997] because it is computationally inexpensive and works quite well in many indoor environments. Some algorithms provide a means of adapting the reference image over time, in order to track slow changes in lighting conditions and/or changes in the environment [Karmann and von Brandt, 1990, Makarov, 1996a]. Some also filter the image to reduce or remove low spatial frequency content, which again makes the detector less sensitive to lighting changes [Makarov et al., 1996b, Keller et al., 1994].
Recent work [Pentland, 1996, Kahn et al., 1996] has extended the basic change detection paradigm by replacing the reference image with a statistical model of the background. The comparison operator becomes a statistical test that estimates the probability that the observed pixel value belongs to the background.

Our baseline change detection algorithm uses thresholded absolute differencing, since this works well for our indoor surveillance scenarios. For applications where lighting change is a problem, we use the adaptive reference frame algorithm of Karmann and von Brandt [1990]. We are also experimenting with a probabilistic change detector similar to Pfinder [Pentland, 1996].
Our work assumes fixed cameras. When the camera is not fixed, simple change detection cannot be used because of background motion. One approach to this problem is to treat the scene as a collection of independently moving objects, and to detect and ignore the visual motion due to camera motion (e.g. [Burt et al., 1989]). Other researchers have proposed ways of detecting features of the optical flow that are inconsistent with a hypothesis of self motion [Nelson, 1991].
In many of our applications moving object detection is a prelude to person detection. There has been significant recent progress in the development of algorithms to locate and track humans. Pfinder (cited above) uses a coarse statistical model of human body geometry and motion to estimate the likelihood that a given pixel is part of a human. Several researchers have described methods of tracking human body and limb movements [Gavrila and Davis, 1996, Kakadiaris and Metaxas, 1996] and locating faces in images [Sung and Poggio, 1994, Rowley et al., 1996]. Intille and Bobick [1995] describe methods of tracking humans through episodes of mutual occlusion in a highly structured environment. We do not currently make use of these techniques in live experiments because of their computational cost. However, we expect that this type of analysis will eventually be an important part of smart camera processing.
2.2 Event Recognition

Most work on event recognition has focused on events that consist of a well-defined sequence of primitive motions. This class of events can be converted into spatiotemporal patterns and recognized using statistical pattern matching techniques. A number of researchers have demonstrated algorithms for recognizing gestures and sign language (e.g., Starner and Pentland, 1995). Bobick and Davis [1996] describe a method of recognizing
stereotypical motion patterns corresponding to actions such as sitting down, walking, or waving.

Figure 1: Image processing steps for moving object detection (video frame, reference image, difference image, thresholded image).

Our approach to event recognition is based on the video database indexing work of Courtney [1997], which introduced the use of predicates on the motion graph to represent events. Motion graphs are well suited to representing abstract, generic events such as "depositing an object" or "coming to rest", which are difficult to capture using the pattern-based approaches referred to above. On the other hand, pattern-based approaches can represent complex motions such as "throwing an object" or "waving", which would be difficult to express using motion graphs. It is likely that both pattern-based and abstract event recognition techniques will be needed to handle the full range of events that are of interest in surveillance applications.
3 AVS Tracking and Event Recognition Algorithms

This section describes the core technologies that provide the video surveillance and monitoring capabilities of the AVS system. There are three key technologies: moving object detection, visual tracking, and event recognition. The moving object detection routines determine when one or more objects enter a monitored scene, decide which pixels in a given video frame correspond to the moving objects versus which pixels correspond to the background, and form a simple representation of the object's image in the video frame. This representation is referred to as a motion region, and it exists in a single video frame, as distinguished from the world objects which exist in the world and give rise to the motion regions.

Visual tracking consists of determining correspondences between the motion regions over a sequence of video frames, and maintaining a single representation, or track, for the world object which gave rise to the sequence of motion regions in the sequence of frames. Finally, event recognition is a means of analyzing the collection of tracks in order to identify events of interest involving the world objects represented by the tracks.
3.1 Moving Object Detection

The moving object detection technology we employ is a 2D change detection technique similar to that described in Jain et al. [1979] and Yalamanchili et al. [1982]. Prior to activation of the monitoring system, an image of the background, i.e. an image of the scene which contains no moving or otherwise interesting objects, is captured to serve as the reference image. When the system is in operation, the absolute difference of the current video frame from the reference image is computed to produce a difference image. The difference image is then thresholded at an appropriate value to obtain a binary image in which the "off" pixels represent background pixels, and the "on" pixels represent moving object pixels. The four-connected components of moving object pixels in the thresholded image are the motion regions (see Figure 1).
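A minimal sketch of this pipeline, using NumPy and SciPy on greyscale frames; the threshold value and function boundaries are illustrative assumptions rather than the AVS implementation.

import numpy as np
from scipy import ndimage

def detect_motion_regions(frame: np.ndarray, reference: np.ndarray, threshold: int = 25):
    """Return bounding boxes of motion regions in `frame`.

    frame, reference: greyscale images as 2D uint8 arrays of the same shape.
    """
    # Absolute difference between the current frame and the reference image.
    diff = np.abs(frame.astype(np.int16) - reference.astype(np.int16))

    # Threshold: "on" pixels are candidate moving-object pixels.
    binary = diff > threshold

    # Four-connected components of "on" pixels are the motion regions.
    # (ndimage.label uses 4-connectivity by default for 2D inputs.)
    labels, _ = ndimage.label(binary)
    return ndimage.find_objects(labels)   # one (row_slice, col_slice) box per region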
Simple application of the object detection procedure outlined above results in a number of errors, largely due to the limitations of thresholding. If the threshold used is too low, camera noise and shadows will produce spurious objects; whereas if the threshold is too high, some portions of the objects in the scene will fail to be separated from the
background, resulting in breakup, in which a single world object gives rise to several motion regions within a single frame. Our general approach is to allow breakup, but to use grouping heuristics to merge multiple connected components into a single motion region and maintain a one-to-one correspondence between motion regions and world objects within each frame.
One grouping technique we employ is 2D morphological dilation of the motion regions. This enables the system to merge connected components separated by a few pixels, but using this technique to span large gaps results in a severe performance degradation. Moreover, dilation in the image space may result in incorrectly merging distinct objects which are nearby in the image (a few pixels), but are in fact separated by a large distance in the world (a few feet).

If size information is available, the connected component grouping algorithm makes use of an estimate of the size (in world coordinates) of the objects in the image. The bounding boxes of the connected components are expanded vertically and horizontally by a distance measured in feet (rather than pixels), and connected components with overlapping bounding boxes are merged into a single motion region. The technique for estimating the size of the objects in the image is described in section 3.4 below.
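The bounding-box variant of this grouping heuristic can be sketched as follows; the margin value, box representation, and simple iterative pairwise merge are illustrative assumptions, not the AVS implementation.

def merge_overlapping(boxes, margin_ft=1.0):
    """Merge bounding boxes whose expansions (in feet) overlap.

    boxes: list of (xmin, ymin, xmax, ymax) tuples in world coordinates (feet).
    Returns a list of merged boxes, one per grouped motion region.
    """
    def expand(b):
        x0, y0, x1, y1 = b
        return (x0 - margin_ft, y0 - margin_ft, x1 + margin_ft, y1 + margin_ft)

    def overlaps(a, b):
        ax0, ay0, ax1, ay1 = expand(a)
        bx0, by0, bx1, by1 = expand(b)
        return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

    merged = [list(b) for b in boxes]
    changed = True
    while changed:                       # repeat until no more pairs can be merged
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlaps(merged[i], merged[j]):
                    a, b = merged[i], merged[j]
                    merged[i] = [min(a[0], b[0]), min(a[1], b[1]),
                                 max(a[2], b[2]), max(a[3], b[3])]
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return [tuple(b) for b in merged]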
3.2 Tracking

The function of the AVS tracking routines is to establish correspondences between the motion regions in the current frame and those in the previous frame. We use the technique of Courtney [1997], which proceeds as follows. First assume that we have computed 2D velocity estimates for the motion regions in the previous frame. These velocity estimates, together with the locations of the centroids in the previous frame, are used to project the locations of the centroids of the motion regions into the current frame. Then, a mutual nearest-neighbor criterion is used to establish correspondences.
Let P be the set of motion region centroid locations in the previous frame, with p_i one such location. Let p'_i be the projected location of p_i in the current frame, and let P' be the set of all such projected locations in the current frame. Let C be the set of motion region centroid locations in the current frame. If the distance between p'_i and c_j in C is the smallest for all elements of C, and this distance is also the smallest of the distances between c_j and all elements of P' (i.e., p'_i and c_j are mutual nearest neighbors), then establish a correspondence between p_i and c_j by creating a bidirectional strong link between them. Use the difference in time and space between p_i and c_j to determine a velocity estimate for c_j, expressed in pixels per second. If there is an existing track containing p_i, add c_j to it. Otherwise, establish a new track, and add both p_i and c_j to it.
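A sketch of one tracking step under this scheme is given below; the dictionary-based representation of motion regions and tracks is an illustrative assumption, since the paper does not describe the AVS data structures at this level of detail.

import math

def track_step(prev_regions, curr_regions, dt):
    """Match previous-frame regions to current-frame regions for one frame step.

    prev_regions: list of dicts {"centroid": (x, y), "velocity": (vx, vy), "track": id}
    curr_regions: list of dicts {"centroid": (x, y)}
    dt: time between frames in seconds.
    Creates strong links between mutual nearest neighbors and propagates tracks.
    """
    # Project previous centroids forward using the first-order velocity estimate.
    projected = [(p["centroid"][0] + p["velocity"][0] * dt,
                  p["centroid"][1] + p["velocity"][1] * dt) for p in prev_regions]

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    links = []
    for i, proj in enumerate(projected):
        if not curr_regions:
            break
        # Nearest current region to this projected centroid...
        j = min(range(len(curr_regions)), key=lambda k: dist(proj, curr_regions[k]["centroid"]))
        # ...and nearest projected centroid to that current region.
        i_back = min(range(len(projected)), key=lambda k: dist(projected[k], curr_regions[j]["centroid"]))
        if i_back == i:                          # mutual nearest neighbors: strong link
            c, p = curr_regions[j], prev_regions[i]
            c["velocity"] = ((c["centroid"][0] - p["centroid"][0]) / dt,
                             (c["centroid"][1] - p["centroid"][1]) / dt)
            c["track"] = p["track"]              # extend the existing track
            links.append((i, j))
    return links

Regions left unmatched by this step would receive weak links to their (non-mutual) nearest neighbors, as described next.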
The strong links form the basis of the tracks, with a high confidence of their correctness. Video objects which do not have mutual nearest neighbors in the adjacent frame may fail to form correspondences because the underlying world object is involved in an event (e.g. enter, exit, deposit, remove). In order to assist in the identification of these events, objects without strong links are given unidirectional weak links to their (non-mutual) nearest neighbors. The weak links represent potential ambiguity in the tracking process. The motion regions in all of the frames, together with their strong and weak links, form a motion graph.
Figure 2 depicts a sample motion graph. In the figure, each frame is one-dimensional, and is represented by a vertical line (F1 through F18). Circles represent objects in the scene, the dark arrows represent strong links, and the gray arrows represent weak links. An object enters the scene in frame F1, and then moves through the scene until frame F4, where it deposits a second object. The first object continues to move through the scene, and exits at frame F8. The deposited object remains stationary. At frame F5 another object enters the scene, temporarily occludes the stationary object at frame F10 (or is occluded by it), and then proceeds to move past the stationary object. This second moving object reverses direction around frames F13 and F14, continues on to remove the stationary object in frame F16, and finally exits the scene. An additional object enters and exits a few frames later without interacting with any other object.
Figure 2: Event detection in the motion graph.

As indicated by the striped patterns in Figure 2, the correct correspondences for the tracks are ambiguous after object interactions such as the occlusion in frame F10. The AVS system resolves this ambiguity where possible by preferring to match moving objects with moving objects, and stationary objects with stationary objects. The distinction between moving and stationary tracks is computed using thresholds on the velocity estimates, and hysteresis for stabilizing transitions between moving and stationary.
Following an occlusion (which may last for several frames), the frames immediately before and after the occlusion are compared (e.g., frames F9 and F11 in Figure 2). The AVS system examines each stationary object in the pre-occlusion frame, and searches for its correspondent in the post-occlusion frame (which should be exactly where it was before, since the object is stationary). This procedure resolves a large portion of the tracking ambiguities. General resolution of ambiguities resulting from multiple moving objects in the scene is a topic for further research. The AVS system may benefit from inclusion of a "closed-world tracking" facility such as that described by Intille and Bobick [1995a, 1995b].
3.3 Event Recognition

Certain features of tracks and pairs of tracks correspond to events. For example, the beginning of a track corresponds to an ENTER event, and the end corresponds to an EXIT event. In an on-line event detection system it is preferable to detect the event as near in time as possible to the actual occurrence of the event. The previous system which used motion graphs for event detection [Courtney, 1997] operated in a batch mode, and required multiple passes over the motion graph, precluding on-line operation. The AVS system detects events in a single pass over the motion graph, as the graph is created. However, in order to reduce errors due to noise, the AVS system introduces a slight delay of n frame times (four in the current implementation) before reporting certain events. For example, in Figure 2, an enter event occurs at frame F1. The AVS system requires the track to be maintained for n frames before reporting the enter event. If the track is not maintained for the required number of frames, it is ignored, and the enter event is not reported; e.g., if n > 4, the short-lived object in Figure 2 which enters and exits only a few frames apart will not generate any events.
A track that splits into two tracks, one of which is moving and the other of which is stationary, corresponds to a DEPOSIT event. If a moving track intersects a stationary track, and then continues to move, but the stationary track ends at the intersection, this corresponds to a REMOVE event. The remove event can be generated as soon as the remover disoccludes the location of the stationary object which was removed, and the system can determine that the stationary object is no longer at that location.
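The sketch below phrases these definitions as predicates over simple track records, including the n-frame confirmation delay described above; the record fields and the overlap helper are illustrative assumptions rather than the AVS data structures.

N_CONFIRM = 4   # frames a track must persist before its ENTER event is reported (n above)

def report_enter(track, current_frame):
    # ENTER: report only after the track has been maintained for N_CONFIRM frames.
    return current_frame - track["start_frame"] >= N_CONFIRM

def is_deposit(parent, child_a, child_b):
    # DEPOSIT: a track splits into two tracks, one moving and one stationary.
    return (child_a["moving"] != child_b["moving"]
            and parent["end_frame"] == child_a["start_frame"] == child_b["start_frame"])

def is_remove(moving_track, stationary_track, frame):
    # REMOVE: a moving track intersects a stationary track, continues to move,
    # and the stationary track ends at the intersection.
    return (moving_track["moving"]
            and not stationary_track["moving"]
            and stationary_track["end_frame"] == frame
            and intersects(moving_track, stationary_track, frame))

def intersects(a, b, frame):
    # Hypothetical helper: do the two tracks' bounding boxes overlap in this frame?
    ax0, ay0, ax1, ay1 = a["box"][frame]
    bx0, by0, bx1, by1 = b["box"][frame]
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1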
Figure 3: Establishing the image to map coordinate transformation.
In a manner similar to the occlusion situation described above in section 3.2, the deposit event also gives rise to ambiguity as to which object is the depositor, and which is the depositee. For example, it may have been that the object which entered at frame F1 of Figure 2 stopped at frame F4 and deposited a moving object, and it is the deposited object which then proceeded to exit the scene at F8. Again, the AVS system relies on the moving vs. stationary distinction to resolve the ambiguity, and insists that the depositee remain stationary after a deposit event. The AVS system requires both the depositor and the depositee tracks to extend for n frames past the point at which the tracks separate (e.g., past frame F8 in Figure 2), and that the deposited object remain stationary; otherwise no deposit event is generated.
Also detected (but not illustrated in Figure 2) are REST events (when a moving object comes to a stop), and MOVE events (when a RESTing object begins to move again). Finally, one further event that is detected is the LIGHTSOUT event, which occurs whenever a large change occurs over the entire image. The motion graph need not be consulted to detect this event.
3.4 Image to World Mapping

In order to locate objects seen in the image with respect to a map, it is necessary to establish a mapping between image and map coordinates. This mapping is established in the AVS system by having a user draw quadrilaterals on the horizontal surfaces visible in an image, and the corresponding quadrilaterals on a map, as shown in Figure 3. A warp transformation from image to map coordinates is computed using the quadrilateral coordinates.

Once the transformations are established, the system can estimate the location of an object (as in Flinchbaugh and Bannon [1994]) by assuming that all objects rest on a horizontal surface. When an object is detected in the scene, the midpoint of the lowest side of the bounding box is used as the image point to project into the map window using the quadrilateral warp transformation [Wolberg, 1990].
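One standard way to realize such a quadrilateral-to-quadrilateral warp is as a projective (homography) transform estimated from the four corner correspondences. The sketch below shows this approach; the paper itself defers to Wolberg [1990] for the warp, and the function names here are illustrative.

import numpy as np

def quad_to_quad_transform(image_quad, map_quad):
    """Compute a 3x3 projective warp taking the image quadrilateral to the map quadrilateral.

    image_quad, map_quad: four (x, y) corner points each, in corresponding order.
    """
    A = []
    for (x, y), (u, v) in zip(image_quad, map_quad):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    A = np.asarray(A, dtype=float)
    # The warp coefficients form the null vector of A (found via SVD), reshaped to 3x3.
    _, _, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 3)

def image_to_map(H, bounding_box):
    """Project the midpoint of the lowest side of a bounding box onto the map."""
    x0, y0, x1, y1 = bounding_box                   # image coordinates, y increasing downward
    foot = np.array([(x0 + x1) / 2.0, y1, 1.0])     # midpoint of the bottom edge
    u, v, w = H @ foot
    return (u / w, v / w)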
4 Applications

The AVS core algorithms described in section 3 have been used as the basis for several video surveillance applications. Section 4 describes three applications that we have implemented: situational awareness, best-view selection for activity logging, and environment learning.

4.1 Situational Awareness

The goal of the situational awareness application is to produce a real-time map-based display of the locations of people, objects and events in a monitored region, and to allow a user to specify alarm conditions interactively. Alarm conditions may be based on the locations of people and objects in the scene, the types of objects in the scene, the events in which the people and objects are
involved, and the times at which the events occur. Furthermore, the user can specify the action to take when an alarm is triggered, e.g. to generate an audio alarm or write a log file. For example, the user should be able to specify that an audio alarm should be triggered if a person deposits a briefcase on a given table between 5:00 pm and 7:00 am on a weeknight.
Figure 5: User interface for specifying a monitor in AVS.
In order to determine the identities of objects (e.g., briefcase, notebook), the situational awareness system communicates with one or more object analysis modules (OAMs). The core engines capture snapshots of interesting objects in the scenes, and forward the snapshots to the OAM, along with the IDs of the tracks containing the objects. The OAM then processes the snapshot in order to determine the type of object. The OAM processing and the AVS core engine computations are asynchronous, so the core engine may have processed several more frames by the time the OAM completes its analysis. Once the analysis is complete, the OAM sends the results (an object type label) and the track ID back to the core engine. The core engine uses the track ID to associate the label with the correct object in the current frame (assuming the object has remained in the scene and been successfully tracked).
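A sketch of this asynchronous snapshot/label round trip, keyed by track ID, is given below; the queue-and-thread structure and the classify callback are illustrative assumptions about how such an OAM interface could look, not a description of the actual system.

import queue
import threading

def start_oam(classify, requests: queue.Queue, results: queue.Queue):
    """Run an object analysis module (OAM) in the background.

    classify: function mapping a snapshot image to an object type label.
    requests: queue of (track_id, snapshot) pairs sent by the core engine.
    results: queue of (track_id, label) pairs returned to the core engine.
    """
    def worker():
        while True:
            track_id, snapshot = requests.get()
            results.put((track_id, classify(snapshot)))   # analysis may lag several frames

    threading.Thread(target=worker, daemon=True).start()

def apply_labels(results: queue.Queue, active_tracks: dict):
    """Core-engine side: attach returned labels to tracks that are still alive."""
    while not results.empty():
        track_id, label = results.get_nowait()
        if track_id in active_tracks:          # the object may have left the scene meanwhile
            active_tracks[track_id]["label"] = label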
The architecture of the AVS situational awareness system is depicted in Figure 4. The system consists of one or more smart cameras communicating with a Video Surveillance Shell (VSS). Each camera has associated with it an independent AVS core engine that performs the processing described in section 3. That is, the engine finds and tracks moving objects in the scene, maps their image locations to world coordinates, and recognizes events involving the objects. Each core engine emits a stream of location and event reports to the VSS, which filters the incoming event streams for user-specified alarm conditions and takes the appropriate actions.

Figure 4: The situational awareness system.

The VSS provides a map display of the monitored area, with the locations of the objects in the scene reported as icons on the map. The VSS also allows the user to specify alarm regions and conditions. Alarm regions are specified by drawing them on the map using a mouse, and naming them as desired. The user can then specify the conditions and actions for alarms by creating one or more monitors. Figure 5 depicts the monitor creation dialog box. The user names the monitor and uses the mouse to select check boxes associated with the conditions that will trigger the monitor. The user selects the type of event, the type of object involved in the event, the day of week and time of day of the event, where the event occurs, and what to do when the alarm condition occurs. The monitor specified in Figure 5 specifies that a voice alarm
will be sounded when a briefcase is deposited on Table A between 5:00 pm and 7:00 am on a weeknight. The voice alarms are customized to the event and object type, so that when this alarm is triggered, the system will announce "deposit box" via its audio output. Figure 6 shows a person about to trigger this alarm.

Figure 6: Tracking an object in the scene on the map.
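One way such a monitor could be represented and checked against incoming event reports is sketched below; the field names, the event record layout, and the time-window handling are illustrative assumptions rather than the VSS implementation.

from dataclasses import dataclass
from datetime import datetime, time

@dataclass
class Monitor:
    name: str
    event_type: str          # e.g. "DEPOSIT"
    object_type: str         # e.g. "briefcase"
    region: str              # named alarm region drawn on the map, e.g. "Table A"
    days: set                # e.g. {"Mon", "Tue", "Wed", "Thu", "Fri"}
    start: time              # start of the active time window
    end: time                # end of the active time window (may wrap past midnight)
    action: str              # e.g. "voice_alarm"

    def triggered(self, event) -> bool:
        """event: dict with keys "type", "object_type", "region", "when" (a datetime)."""
        when: datetime = event["when"]
        t = when.time()
        if self.start <= self.end:
            in_window = self.start <= t <= self.end
        else:                                   # window wraps past midnight, e.g. 17:00 to 07:00
            in_window = t >= self.start or t <= self.end
        return (event["type"] == self.event_type
                and event["object_type"] == self.object_type
                and event["region"] == self.region
                and when.strftime("%a") in self.days
                and in_window)

A monitor corresponding to the example above might be constructed as Monitor("TableA-deposit", "DEPOSIT", "briefcase", "Table A", {"Mon", "Tue", "Wed", "Thu", "Fri"}, time(17, 0), time(7, 0), "voice_alarm").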
4.2 Best-View Selection for Activity Logging

In many video surveillance applications the goal of surveillance is not to detect events in real time and generate alarms, but rather to construct a log or audit trail of all of the activity that takes place in the camera's field of view. This log is examined by investigators after a security incident (e.g., a theft or terrorist attack), and is used to identify possible suspects or witnesses.

In order to gain experience with this type of application, we have used the tracking and event detection capabilities described in section 3 to construct a program that monitors and records the movements of humans in its field of view. For every person that it sees, it creates a log file that summarizes important information about the person, including a snapshot taken when the person was close to the camera and (if possible) facing it. The log files are made available to authorized users via the World-Wide Web.
4.2.1 Architecture

The application makes use of the AVS core algorithms to detect and track people. Upon detection of a track corresponding to a person in the input, the tracker associates a data record with the track. The data record contains a summary of information about the person, including a snapshot extracted from the current video image. As the person is tracked through the scene, the tracker examines each image of that person that it receives. If the new image is a better view of the person than the previously saved snapshot, the snapshot is replaced with the new view. When the person leaves the scene, the data record is saved to a file.
Each log entry file records the time when the person entered the scene and a list of coordinate pairs showing their position in each video frame. Each log entry file also contains the snapshot that was stored in the track record for the person when they exited the scene. Because of the way snapshots are maintained, the final snapshot is the best view of the person that the system had during tracking. Finally, the log entry file contains a pointer to the reference image that was in effect when the snapshot was taken. This information forms an extremely concise description of the person's movements and appearance while they were in the scene.
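A sketch of what such a log entry record might look like; the field names and the JSON serialization are illustrative assumptions rather than the format used by the system.

import json
from dataclasses import dataclass, field, asdict
from typing import List, Tuple

@dataclass
class PersonLogEntry:
    entered_at: str                                   # time the person entered the scene
    positions: List[Tuple[float, float]] = field(default_factory=list)  # map position per frame
    best_snapshot: str = ""                           # file name of the best-view snapshot
    reference_image: str = ""                         # reference image in effect at snapshot time

    def save(self, path: str) -> None:
        # Write the record when the person leaves the scene.
        with open(path, "w") as f:
            json.dump(asdict(self), f, indent=2)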
Figure 7: Floor plan of the area used for hallway monitoring experiments. The camera is located at the right and monitors the hallway and printer alcove.

Selecting the best view: The system uses simple heuristics to decide when the current view of a
person is better than the previously saved view. First, the new view is considered better if the subject is moving toward the camera in the current frame, and was moving away in the previously saved view. This causes the system to favor views in which the subject's face is visible. If this rule does not
