Moving Object Detection and Event Recognition Algorithms for Smart Cameras

Thomas J. Olson
Frank Z. Brill
Texas Instruments
Research & Development
P.O. Box 655303, MS 8374, Dallas, TX 75265
E-mail: olson@csc.ti.com, brill@ti.com
http://www.ti.com/research/docs/iuba/index.html

Abstract

Smart video cameras analyze the video stream and translate it into a description of the scene in terms of objects, object motions, and events. This paper describes a set of algorithms for the core computations needed to build smart cameras. Together these algorithms make up the Autonomous Video Surveillance (AVS) system, a general-purpose framework for moving object detection and event recognition. Moving objects are detected using change detection, and are tracked using first-order prediction and nearest neighbor matching. Events are recognized by applying predicates to the graph formed by linking corresponding objects in successive frames. The AVS algorithms have been used to create several novel video surveillance applications. These include a video surveillance shell that allows a human to monitor the outputs of multiple cameras, a system that takes a single high-quality snapshot of every person who enters its field of view, and a system that learns the structure of the monitored environment by watching humans move around in the scene.

1 Introduction

Video cameras today produce images, which must be examined by humans in order to be useful. Future 'smart' video cameras will produce information, including descriptions of the environment they are monitoring and the events taking place in it. The information they produce may include images and video clips, but these will be carefully selected to maximize their useful information content. The symbolic information and images from smart cameras will be filtered by programs that extract data relevant to particular tasks. This filtering process will enable a single human to monitor hundreds or thousands of video streams.

The research described in this report was sponsored in part by the DARPA Image Understanding Program.

In pursuit of our research objectives [Flinchbaugh, 1997], we are developing the technology needed to make smart cameras a reality. Two fundamental capabilities are needed. The first is the ability to describe scenes in terms of object motions and interactions. The second is the ability to recognize important events that occur in the scene, and to pick out those that are relevant to the current task. These capabilities make it possible to develop a variety of novel and useful video surveillance applications.

1.1 Video Surveillance and Monitoring Scenarios

Our work is motivated by several types of video surveillance and monitoring scenarios.

Indoor Surveillance: Indoor surveillance provides information about areas such as building lobbies, hallways, and offices. Monitoring tasks in lobbies and hallways include detection of people depositing things (e.g., unattended luggage in an airport lounge), removing things (e.g., theft), or loitering. Office monitoring tasks typically require information about people's identities: in an office, for example, the office owner may do anything at any
time, but other people should not open desk drawers or operate the computer unless the owner is present. Cleaning staff may come in at night to vacuum and empty trash cans, but should not handle objects on the desk.

Outdoor Surveillance: Outdoor surveillance includes tasks such as monitoring a site perimeter for intrusion or threats from vehicles (e.g., car bombs). In military applications, video surveillance can function as a sentry or forward observer, e.g. by notifying commanders when enemy soldiers emerge from a wooded area or cross a road.

In order for smart cameras to be practical for real-world tasks, the algorithms they use must be robust. Current commercial video surveillance systems have a high false alarm rate [Ringler and Hoover, 1995], which renders them useless for most applications. For this reason, our research stresses robustness and quantification of detection and false alarm rates. Smart camera algorithms must also run effectively on low-cost platforms, so that they can be implemented in small, low-power packages and can be used in large numbers. Studying algorithms that can run in near real time makes it practical to conduct extensive evaluation and testing of systems, and may enable worthwhile near-term applications as well as contributing to long-term research goals.

1.2 Approach

The first step in processing a video stream for surveillance purposes is to identify the important objects in the scene. In this paper it is assumed that the important objects are those that move independently. Camera parameters are assumed to be fixed. This allows the use of simple change detection to identify moving objects. Where use of moving cameras is necessary, stabilization hardware and stabilized moving object detection algorithms can be used (e.g. [Burt et al., 1989, Nelson, 1991]). The use of criteria other than motion (e.g., salience based on shape or color, or more general object recognition) is compatible with our approach, but these criteria are not used in our current applications.

Our event recognition algorithms are based on graph matching. Moving objects in the image are tracked over time. Observations of an object in successive video frames are linked to form a directed graph (the motion graph). Events are defined in terms of predicates on the motion graph. For instance, the beginning of a chain of successive observations of an object is defined to be an ENTER event. Event detection is described in more detail below.
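
As a concrete illustration, the motion graph can be thought of as a set of per-frame observations connected by links; the minimal sketch below shows one way such a node and the ENTER predicate might be expressed. The field and function names are illustrative assumptions, not the AVS data structures.

# Minimal sketch of a motion-graph node and the ENTER predicate (illustrative only).
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class MotionRegion:
    frame: int                                   # frame index of this observation
    centroid: Tuple[float, float]                # (x, y) in image coordinates
    prev_links: List["MotionRegion"] = field(default_factory=list)  # links from frame t-1
    next_links: List["MotionRegion"] = field(default_factory=list)  # links to frame t+1

def is_enter(region: MotionRegion) -> bool:
    """ENTER: the beginning of a chain of successive observations,
    i.e. an observation with no correspondent in the previous frame."""
    return len(region.prev_links) == 0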

Our approach to video surveillance stresses 2D, image-based algorithms and simple, low-level object representations that can be extracted reliably from the video sequence. This emphasis yields a high level of robustness and low computational cost. Object recognition and other detailed analyses are used only after the system has determined that the objects in question are interesting and merit further investigation.

1.3 Research Strategy

The primary technical goal of this research is to develop general-purpose algorithms for moving object detection and event recognition. These algorithms comprise the Autonomous Video Surveillance (AVS) system, a modular framework for building video surveillance applications. AVS is designed to be updated to incorporate better core algorithms or to tune the processing to specific domains as our research progresses.

In order to evaluate the AVS core algorithms and event recognition and tracking framework, we use them to develop applications motivated by the surveillance scenarios described above. The applications are small-scale implementations of future smart camera systems. They are designed for long-term operation, and are evaluated by allowing them to run for long periods (hours or days) and analyzing their output.

The remainder of this paper is organized as follows. The next section discusses related work. Section 3 presents the core moving object detection and event recognition algorithms, and the mechanism used to establish the 3D positions of objects. Section 4 presents applications that have been built using the AVS framework. The final section discusses the current state of the system and our future plans.
2 Related Work

Our overall approach to video surveillance has been influenced by interest in selective attention and task-oriented processing [Swain and Stricker, 1991, Rimey and Brown, 1993, Camus et al., 1993]. The fundamental problem with current video surveillance technology is that the useful information density of the images delivered to a human is very low; the vast majority of surveillance video frames contain no useful information at all. The fundamental role of the smart camera described above is to reduce the volume of data produced by the camera, and increase the value of that data. It does this by discarding irrelevant frames, and by expressing the information in the relevant frames primarily in symbolic form.

2.1 Moving Object Detection

Most algorithms for moving object detection using fixed cameras work by comparing incoming video frames to a reference image, and attributing significant differences either to motion or to noise. The algorithms differ in the form of the comparison operator they use, and in the way in which the reference image is maintained. Simple intensity differencing followed by thresholding is widely used [Jain et al., 1979, Yalamanchili et al., 1982, Kelly et al., 1995, Bobick and Davis, 1996, Courtney, 1997] because it is computationally inexpensive and works quite well in many indoor environments. Some algorithms provide a means of adapting the reference image over time, in order to track slow changes in lighting conditions and/or changes in the environment [Karmann and von Brandt, 1990, Makarov, 1996a]. Some also filter the image to reduce or remove low spatial frequency content, which again makes the detector less sensitive to lighting changes [Makarov et al., 1996b, Koller et al., 1994].

Recent work [Pentland, 1996, Kahn et al., 1996] has extended the basic change detection paradigm by replacing the reference image with a statistical model of the background. The comparison operator becomes a statistical test that estimates the probability that the observed pixel value belongs to the background.

Our baseline change detection algorithm uses thresholded absolute differencing, since this works well for our indoor surveillance scenarios. For applications where lighting change is a problem, we use the adaptive reference frame algorithm of Karmann and von Brandt [1990]. We are also experimenting with a probabilistic change detector similar to Pfinder [Pentland, 1996].
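
The sketch below illustrates the general shape of such a detector: thresholded absolute differencing against a reference image that is allowed to adapt slowly in background areas. The exponential update is a simplified stand-in for an adaptive reference scheme, not the Karmann and von Brandt estimator, and the parameter values are arbitrary.

# Illustrative sketch: thresholded absolute differencing with a slowly
# adapting reference image (a simplified stand-in, not the AVS code).
import numpy as np

def detect_change(frame: np.ndarray, reference: np.ndarray,
                  threshold: float = 25.0, alpha: float = 0.02):
    """frame: current grayscale image; reference: float32 background estimate.
    Returns (binary change mask, updated reference)."""
    diff = np.abs(frame.astype(np.float32) - reference)
    mask = diff > threshold                       # 'on' pixels = candidate moving-object pixels
    updated = reference.copy()
    # Adapt the reference only where no change was detected, so slow lighting
    # drift is absorbed without bleeding moving objects into the background.
    updated[~mask] = (1.0 - alpha) * reference[~mask] + alpha * frame[~mask].astype(np.float32)
    return mask, updated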

Our work assumes fixed cameras. When the camera is not fixed, simple change detection cannot be used because of background motion. One approach to this problem is to treat the scene as a collection of independently moving objects, and to detect and ignore the visual motion due to camera motion [e.g. Burt et al., 1989]. Other researchers have proposed ways of detecting features of the optical flow that are inconsistent with a hypothesis of self motion [Nelson, 1991].

In many of our applications moving object detection is a prelude to person detection. There has been significant recent progress in the development of algorithms to locate and track humans. Pfinder (cited above) uses a coarse statistical model of human body geometry and motion to estimate the likelihood that a given pixel is part of a human. Several researchers have described methods of tracking human body and limb movements [Gavrila and Davis, 1996, Kakadiaris and Metaxas, 1996] and locating faces in images [Sung and Poggio, 1994, Rowley et al., 1996]. Intille and Bobick [1995] describe methods of tracking humans through episodes of mutual occlusion in a highly structured environment. We do not currently make use of these techniques in live experiments because of their computational cost. However, we expect that this type of analysis will eventually be an important part of smart camera processing.

2.2 Event Recognition

Most work on event recognition has focussed on events that consist of a well-defined sequence of primitive motions. This class of events can be converted into spatiotemporal patterns and recognized using statistical pattern matching techniques. A number of researchers have demonstrated algorithms for recognizing gestures and sign language [e.g., Starner and Pentland, 1995]. Bobick and Davis [1996] describe a method of recognizing
stereotypical motion patterns corresponding to actions such as sitting down, walking, or waving.

Our approach to event recognition is based on the video database indexing work of Courtney [1997], which introduced the use of predicates on the motion graph to represent events. Motion graphs are well suited to representing abstract, generic events such as 'depositing an object' or 'coming to rest', which are difficult to capture using the pattern-based approaches referred to above. On the other hand, pattern-based approaches can represent complex motions such as 'throwing an object' or 'waving', which would be difficult to express using motion graphs. It is likely that both pattern-based and abstract event recognition techniques will be needed to handle the full range of events that are of interest in surveillance applications.

3 AVS Tracking and Event Recognition Algorithms

This section describes the core technologies that provide the video surveillance and monitoring capabilities of the AVS system. There are three key technologies: moving object detection, visual tracking, and event recognition. The moving object detection routines determine when one or more objects enter a monitored scene, decide which pixels in a given video frame correspond to the moving objects versus which pixels correspond to the background, and form a simple representation of the object's image in the video frame. This representation is referred to as a motion region, and it exists in a single video frame, as distinguished from the world objects which exist in the world and give rise to the motion regions.

Visual tracking consists of determining correspondences between the motion regions over a sequence of video frames, and maintaining a single representation, or track, for the world object which gave rise to the sequence of motion regions in the sequence of frames. Finally, event recognition is a means of analyzing the collection of tracks in order to identify events of interest involving the world objects represented by the tracks.

3.1 Moving Object Detection

The moving object detection technology we employ is a 2D change-detection technique similar to that described in Jain et al. [1979] and Yalamanchili et al. [1982]. Prior to activation of the monitoring system, an image of the background, i.e., an image of the scene which contains no moving or otherwise interesting objects, is captured to serve as the reference image. When the system is in operation, the absolute difference of the current video frame from the reference image is computed to produce a difference image. The difference image is then thresholded at an appropriate value to obtain a binary image in which the 'off' pixels represent background pixels, and the 'on' pixels represent 'moving object' pixels. The four-connected components of moving object pixels in the thresholded image are the motion regions (see Figure 1).
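
A compact sketch of this step is shown below; it assumes the binary change mask from the thresholded difference image has already been computed, and uses scipy's default labeling structure, which corresponds to the four-connected components described here. The function name is illustrative.

# Illustrative extraction of motion regions from a binary change mask.
import numpy as np
from scipy import ndimage

def motion_regions(change_mask: np.ndarray):
    """change_mask: boolean image from the thresholded difference image.
    Returns (label image, list of bounding-box slices, one per motion region)."""
    labels, count = ndimage.label(change_mask)   # default structure = 4-connectivity in 2D
    boxes = ndimage.find_objects(labels)         # (row slice, column slice) per component
    return labels, boxes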

Simple application of the object detection procedure outlined above results in a number of errors, largely due to the limitations of thresholding. If the threshold used is too low, camera noise and shadows will produce spurious objects, whereas if the threshold is too high, some portions of the objects in the scene will fail to be separated from the background, resulting in breakup, in which a single world object gives rise to several motion regions within a single frame. Our general approach is to allow breakup, but use grouping heuristics to merge multiple connected components into a single motion region and maintain a one-to-one correspondence between motion regions and world objects within each frame.

One grouping technique we employ is 2D morphological dilation of the motion regions. This enables the system to merge connected components separated by a few pixels, but using this technique to span large gaps results in a severe performance degradation. Moreover, dilation in the image space may result in incorrectly merging distant objects which are nearby in the image (a few pixels), but are in fact separated by a large distance in the world (a few feet).

If 3D information is available, the connected component grouping algorithm makes use of an estimate of the size (in world coordinates) of the objects in the image. The bounding boxes of the connected components are expanded vertically and horizontally by a distance measured in feet (rather than pixels), and connected components with overlapping bounding boxes are merged into a single motion region. The technique for estimating the size of the objects in the image is described in section 3.4 below.
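
The world-unit grouping heuristic might look roughly like the sketch below, which pads each component's bounding box by a margin expressed in feet (converted to pixels using a scale estimate derived from the image-to-map calibration of section 3.4) and merges boxes whose padded extents overlap. The single image-wide feet_per_pixel scale and the margin value are simplifying assumptions made for illustration; in AVS the scale estimate varies with image position.

# Illustrative grouping of broken-up components using world-unit bounding-box
# expansion (simplified: a single image-wide scale factor is assumed).
def _overlap(a, b):
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def merge_components(boxes, margin_feet=1.0, feet_per_pixel=0.05):
    """boxes: list of (xmin, ymin, xmax, ymax) in pixels. Returns merged boxes."""
    pad = margin_feet / feet_per_pixel
    merged = list(boxes)
    changed = True
    while changed:                                 # repeat until no padded boxes overlap
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                a = (merged[i][0] - pad, merged[i][1] - pad, merged[i][2] + pad, merged[i][3] + pad)
                b = (merged[j][0] - pad, merged[j][1] - pad, merged[j][2] + pad, merged[j][3] + pad)
                if _overlap(a, b):
                    merged[i] = (min(merged[i][0], merged[j][0]), min(merged[i][1], merged[j][1]),
                                 max(merged[i][2], merged[j][2]), max(merged[i][3], merged[j][3]))
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged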

3.2 Visual Tracking

The function of the AVS tracking routine is to establish correspondences between the motion regions in the current frame and those in the previous frame. We use the technique of Courtney [1997], which proceeds as follows. First, assume that we have computed 2D velocity estimates for the motion regions in the previous frame. These velocity estimates, together with the locations of the centroids in the previous frame, are used to project the locations of the centroids of the motion regions into the current frame. Then, a mutual nearest-neighbor criterion is used to establish correspondences.

Let P be the set of motion region centroid locations in the previous frame, with p_i one such location. Let p'_i be the projected location of p_i in the current frame, and let P' be the set of all such projected locations in the current frame. Let C be the set of motion region centroid locations in the current frame. If the distance between p'_i and c_i ∈ C is the smallest for all elements of C, and this distance is also the smallest of the distances between c_i and all elements of P' (i.e., p'_i and c_i are mutual nearest neighbors), then establish a correspondence between p_i and c_i by creating a bidirectional strong link between them. Use the difference in time and space between p_i and c_i to determine a velocity estimate for c_i, expressed in pixels per second. If there is an existing track containing p_i, add c_i to it. Otherwise, establish a new track, and add both p_i and c_i to it.

The strong links form the basis of the tracks with a high confidence of their correctness. Video objects which do not have mutual nearest neighbors in the adjacent frame may fail to form correspondences because the underlying world object is involved in an event (e.g., enter, exit, deposit, remove). In order to assist in the identification of these events, objects without strong links are given unidirectional weak links to their (non-mutual) nearest neighbors. These weak links represent potential ambiguity in the tracking process. The motion regions in all of the frames, together with their strong and weak links, form a motion graph.
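
The matching step could be rendered roughly as below: predicted centroids are compared against current centroids, mutual nearest neighbors become strong links, and the remaining previous regions get weak links to their nearest neighbors. This is an illustrative reading of the procedure, not the AVS source; track bookkeeping and the velocity update are omitted.

# Illustrative mutual nearest-neighbour matching with first-order prediction.
import numpy as np

def match_regions(prev_centroids, prev_velocities, curr_centroids, dt):
    """prev_centroids, curr_centroids: lists of (x, y); prev_velocities: (vx, vy)
    in pixels per second; dt: inter-frame interval in seconds.
    Returns (strong_links, weak_links) as lists of (prev_index, curr_index)."""
    projected = [np.asarray(p, float) + np.asarray(v, float) * dt
                 for p, v in zip(prev_centroids, prev_velocities)]
    curr = [np.asarray(c, float) for c in curr_centroids]
    strong, weak = [], []
    for i, p_proj in enumerate(projected):
        if not curr:
            break
        dists = [np.linalg.norm(p_proj - c) for c in curr]
        j = int(np.argmin(dists))
        # c_j must in turn have p'_i as its nearest projected centroid.
        back = [np.linalg.norm(curr[j] - q) for q in projected]
        if int(np.argmin(back)) == i:
            strong.append((i, j))      # bidirectional strong link
        else:
            weak.append((i, j))        # unidirectional weak link (potential ambiguity)
    return strong, weak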

Figure 2 depicts a sample motion graph. In the figure, each frame is one-dimensional, and is represented by a vertical line (F0 - F18). Circles represent objects in the scene, the dark arrows represent strong links, and the gray arrows represent weak links. An object enters the scene in frame F1, and then moves through the scene until frame F4, where it deposits a second object. The first object continues to move through the scene, and exits at frame F6. The deposited object remains stationary. At frame F8 another object enters the scene, temporarily occludes the stationary object at frame F10 (or is occluded by it), and then proceeds to move past the stationary object. This second moving object reverses direction around frames F13 and F14, returns to remove the stationary object in frame F16, and finally exits in frame F17. An additional object enters in frame F5 and exits at frame F8 without interacting with any other object.

[Figure 2: a sample motion graph]

As indicated by the striped fill patterns in Figure 2, the correct correspondences for the tracks are
ambiguous after object interactions such as the occlusion in frame F10. The AVS system resolves the ambiguity where possible by preferring to match moving objects with moving objects, and stationary objects with stationary objects. The distinction between moving and stationary tracks is computed using thresholds on the velocity estimates, and hysteresis for stabilizing transitions between moving and stationary.

Following an occlusion (which may last for several frames) the frames immediately before and after the occlusion are compared (e.g., frames F9 and F11 in Figure 2). The AVS system examines each stationary object in the pre-occlusion frame, and searches for its correspondent in the post-occlusion frame (which should be exactly where it was before, since the object is stationary). This procedure resolves a large portion of the tracking ambiguities. General resolution of ambiguities resulting from multiple moving objects in the scene is a topic for further research. The AVS system may benefit from inclusion of a 'closed world tracking' facility such as that described by Intille and Bobick [1995].
3.3 Event Recognition

Certain features of tracks and pairs of tracks correspond to events. For example, the beginning of a track corresponds to an ENTER event, and the end corresponds to an EXIT event. In an online event detection system, it is preferable to detect the event as near in time as possible to the actual occurrence of the event. The previous system which used motion graphs for event detection [Courtney, 1997] operated in a batch mode, and required multiple passes over the motion graph, precluding online operation. The AVS system detects events in a single pass over the motion graph, as the graph is created. However, in order to reduce errors due to noise, the AVS system introduces a slight delay of n frame times (n=3 in the current implementation) before reporting certain events. For example, in Figure 2, an enter event occurs on frame F1. The AVS system requires the track to be maintained for n frames before reporting the enter event. If the track is not maintained for the required number of frames, it is ignored, and the enter event is not reported; e.g., if n > 4, the object in Figure 2 which enters in frame F5 and exits in frame F8 will not generate any events.

A track that splits into two tracks, one of which is moving and the other of which is stationary, corresponds to a DEPOSIT event. If a moving track intersects a stationary track, and then continues to move but the stationary track ends at the intersection, this corresponds to a REMOVE event. The remove event can be generated as soon as the remover disoccludes the location of the stationary object which was removed, and the system can determine that the stationary object is no longer at that location.
In a manner similar to the occlusion situation described above in section 3.2, the deposit event also gives rise to ambiguity as to which object is the depositor, and which is the depositee. For example, it may have been that the object which entered at frame F1 of Figure 2 stopped at frame F4 and deposited a moving object, and it is the deposited object which then proceeded to exit the scene at F6. Again, the AVS system relies on a moving vs. stationary distinction to resolve the ambiguity, and insists that the depositee remain stationary after a deposit event. The AVS system requires both the depositor and the depositee tracks to extend for n frames past the point at which the tracks separate (e.g., past frame F5 in Figure 2), and that the deposited object remain stationary; otherwise no deposit event is generated.

Also detected (but not illustrated in Figure 2) are REST events (when a moving object comes to a stop), and MOVE events (when a RESTing object begins to move again). Finally, one further event that is detected is the LIGHTSOUT event, which occurs whenever a large change occurs over the entire image. The motion graph need not be consulted to detect this event.
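
The following sketch gives a loose rendering of how a few of these predicates might be expressed over tracks, including the n-frame confirmation delay. The track fields ('start', 'end', 'stationary', 'parent') are assumptions made for illustration, not the AVS data structures.

# Loose, illustrative event predicates over tracks (not the AVS implementation).
N = 3   # reporting delay in frames (n = 3 in the current implementation)

def enter(track, frame):
    """ENTER: a new track, reported once it has persisted for N frames."""
    return frame - track["start"] == N

def exit_(track, frame):
    """EXIT: a track ends; tracks shorter than N frames are never reported."""
    return track["end"] == frame and frame - track["start"] >= N

def deposit(parent, child, frame):
    """DEPOSIT: a track splits into a moving and a stationary track; confirmed
    only after the new, stationary child persists N frames past the split."""
    return (child.get("parent") is parent
            and child["stationary"]
            and not parent["stationary"]
            and frame - child["start"] >= N)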

3.4 Image to World Mapping

In order to locate objects seen in the image with respect to a map, it is necessary to establish a mapping between image and map coordinates. This mapping is established in the AVS system by having a user draw quadrilaterals on the horizontal surfaces visible in an image, and the corresponding quadrilaterals on a map, as shown in Figure 3. A warp transformation from image to map coordinates is constructed using the quadrilateral coordinates.

Once the transformations are established, the system can estimate the location of an object (as in Flinchbaugh and Bannon [1994]) by assuming that all objects rest on a horizontal surface. When an object is detected in the scene, the midpoint of the lowest side of the bounding box is used as the image point to project into the map window using the quadrilateral warp transformation [Wolberg, 1990].
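
One common form of quadrilateral-to-quadrilateral warp is a projective (homography) mapping fitted to the four corner correspondences; the sketch below uses that form and then projects a detected object's bounding-box bottom midpoint onto the map. It is an illustration of the idea rather than the AVS implementation, since the paper does not spell out the warp beyond the reference to Wolberg [1990].

# Illustrative image-to-map projection via a quadrilateral (projective) warp.
import numpy as np

def fit_quad_warp(image_quad, map_quad):
    """Each quad: four (x, y) corner points. Returns a 3x3 homography matrix."""
    A, b = [], []
    for (x, y), (u, v) in zip(image_quad, map_quad):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.asarray(A, float), np.asarray(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def object_map_location(bbox, H):
    """bbox = (xmin, ymin, xmax, ymax) in image coordinates (y grows downward).
    The midpoint of the lowest side is assumed to touch the ground plane."""
    x = (bbox[0] + bbox[2]) / 2.0
    y = bbox[3]
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w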

4 Applications

The AVS core algorithms described in section 3 have been used as the basis for several video surveillance applications. Section 4 describes three applications that we have implemented: situational awareness, best-view selection for activity logging, and environment learning.

4.1 Situational Awareness

The goal of the situational awareness application is to produce a real-time map-based display of the locations of people, objects and events in a monitored region, and to allow a user to specify alarm conditions interactively. Alarm conditions may be based on the locations of people and objects in the scene, the types of objects in the scene, the events in which the people and objects are
involved, and the times at which the events occur. Furthermore, the user can specify the action to take when an alarm is triggered, e.g., to generate an audio alarm or write a log file. For example, the user should be able to specify that an audio alarm should be triggered if a person deposits a briefcase on a given table between 5:00 pm and 7:00 am on a weeknight.
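
A rule of that kind might be evaluated against the incoming event reports roughly as sketched below; the report fields, the region name, and the simplified weeknight test are hypothetical assumptions, not the VSS interface.

# Hypothetical alarm-condition check against a single event report.
from datetime import time

def briefcase_alarm(report):
    """report: dict with 'event', 'object_type', 'region', 'when' (a datetime)."""
    t = report["when"].time()
    after_hours = t >= time(17, 0) or t < time(7, 0)     # 5:00 pm - 7:00 am (simplified)
    weeknight = report["when"].weekday() < 5             # Monday-Friday
    return (report["event"] == "DEPOSIT"
            and report["object_type"] == "briefcase"
            and report["region"] == "table_1"
            and after_hours and weeknight)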

The architecture of the AVS situational awareness system is depicted in Figure 4. The system consists of one or more smart cameras communicating with a Video Surveillance Shell (VSS). Each camera has associated with it an independent AVS core engine that performs the processing described in section 3. That is, the engine finds and tracks moving objects in the scene, maps their image locations to world coordinates, and recognizes events involving the objects. Each core engine emits a stream of location and event reports to the VSS, which filters the incoming event streams for user-specified alarm conditions and takes the appropriate actions.

Figure 4: The situational awareness system
In order to determine the identities of objects (e.g., briefcase, notebook), the situational awareness system communicates with one or more object analysis modules (OAMs). The core engines capture snapshots of interesting objects in the scenes, and forward the snapshots to the OAM, along with the IDs of the tracks containing the objects. The OAM then processes the snapshot in order to determine the type of object. The OAM processing and the AVS core engine computations are asynchronous, so the core engine may have processed several more frames by the time the OAM completes its analysis. Once the analysis is complete, the OAM sends the results (an object type label) and the track ID back to the core engine. The core engine uses the track ID to associate the label with the correct object in the current frame (assuming the object has remained in the scene and been successfully tracked).
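
The track-ID hand-off could be organized along the lines of the sketch below; the class, the OAM submit/callback interface, and the field names are illustrative assumptions, not the AVS design.

# Illustrative sketch of absorbing an asynchronous OAM result via the track ID.
class CoreEngine:
    def __init__(self):
        self.tracks = {}          # track_id -> track state (updated every frame)

    def request_label(self, oam, track_id, snapshot):
        # Snapshot analysis runs asynchronously; the engine keeps processing frames.
        oam.submit(snapshot, track_id, callback=self.on_label)

    def on_label(self, track_id, label):
        # Several frames may have elapsed; the track ID still identifies the
        # right object as long as it has remained in the scene and been tracked.
        track = self.tracks.get(track_id)
        if track is not None:
            track["type"] = label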

The VSS provides a map display of the monitored area, with the locations of the objects in the scene reported as icons on the map. The VSS also allows the user to specify alarm regions and conditions. Alarm regions are specified by drawing them on the map using a mouse, and naming them as desired. T