ATTACHMENT C
TO REQUEST FOR EX PARTE REEXAMINATION OF U.S. PATENT NO. 7,932,923
Spatio-Temporal Modeling of Video Data for On-Line Object-Oriented Query Processing

Young Francis Day, Serhan Dagtas, Mitsutoshi Iino, Ashfaq Khokhar, and Arif Ghafoor
Distributed Multimedia Systems Laboratory
School of Electrical Engineering
Purdue University, West Lafayette, IN 47907
Abstract

This paper presents a framework for data modeling and semantic abstraction of image/video data. The framework is based on spatio-temporal information associated with salient objects in an image or in a sequence of video frames, and on a set of generalized n-ary operators defined to specify spatial and temporal relationships of the objects present in the data. The methodology presented in this paper can manifest itself effectively in conceptualizing events and heterogeneous views in multimedia data as perceived by individual users. The proposed paradigm induces a multi-level indexing and searching mechanism that models information at various levels of granularity and hence allows processing of content-based queries in real time. We also devise a unified object-oriented interface for users with heterogeneous views to specify queries on the unbiased encoded data. Currently this framework is being developed to realize a highly integrated multimedia database architecture.¹
1 Introduction
Recent advances in broadband networking, high-performance computing, and storage systems have resulted in a tremendous interest in digitizing large archives of multimedia data and providing interactive access to users. Many future multimedia applications will require retrieval of video data, including searching, browsing, selective replays, editing, etc. Due to the sheer volume of the data, all these capabilities require efficient computer vision/image processing algorithms for automatic indexing and abstraction of video data. Subsequently, powerful indexing and data retrieval techniques need to be employed to support content-based query processing.

¹This research was supported in part by the National Science Foundation under grant number 9418757-EEC and in part by ARPA under contract DABT63-92-C-0022 ONR.
The key characteristic of video data is the spatial/temporal semantics associated with it, making video data quite different from other types of data such as text, voice, and images. A user of a video database can generate queries containing both temporal and spatial concepts. However, considerable semantic heterogeneity may exist among users of such data due to differences in their pre-conceived interpretation or intended use of the information given in a video clip. Semantic heterogeneity has been a difficult problem for conventional databases [6], and even today the problem is not clearly understood. Consequently, providing a comprehensive interpretation of video data is a much more complex problem.

Most of the existing video database systems either employ image processing techniques for indexing of video data [5, 3, 10, 2] or use traditional database approaches based on keywords or annotated textual descriptors [11, 15]. However, most of these systems lack the ability to provide a general-purpose, automatic indexing mechanism which renders an unbiased description of video data. Also, they do not handle semantic heterogeneity efficiently. In order to address the issues related to user-independent views and
semantic heterogeneity, we propose a framework for semantically unbiased abstraction of video data. The framework is based on spatio-temporal information associated with salient objects in an image or in a sequence of video frames, and on a set of generalized n-ary operators defined to specify spatial and temporal relationships of the objects present in the data. The methodology presented in this paper can manifest itself effectively in conceptualizing events and heterogeneous views in multimedia data as perceived by individual users. The proposed paradigm induces a multi-level indexing and searching mechanism that models information at various levels of granularity and hence allows processing of content-based queries in real time. However, a unified framework is needed for the users to express, and for the system to process, semantically heterogeneous queries on the encoded data. For this purpose, we propose an object-oriented interface that provides an elegant paradigm for representing the heterogeneous views of the users. The architecture of the proposed system is shown in Figure 1.

[Figure 1: System abstraction]

The organization of this paper is as follows. Section 2 presents the framework for characterizing various events in video data. A video database architecture based on that framework is proposed in Section 3. In Section 4, an object-oriented approach is presented for users to specify their perceived view of the video data. The paper is concluded in Section 5.

2 Framework for Characterizing Events in Video Data

Generally, a video sequence consists of ordered frames that can be partitioned into a collection of shots using various image processing techniques like histogram comparisons, motion-based indexing, and optical flow determination. Each shot contains no scene changes and is the basic element for characterizing the video data [14]. Several shots can be grouped logically into episodes or scenes, i.e., an episode is a specific sequence of shots [15]. Several episodes can be put in one sequence. The term clip denotes a generic object without any structural meaning: a portion of a video sequence with starting and ending frame numbers. In order to put things in perspective, we first suggest the following definitions.

Generic indexing: the process of identifying a clip from a video sequence and using image processing algorithms (histograms or equivalents) to partition the clip into ordered shots.

Structural indexing: the process of grouping continuous shots to form an episode and grouping continuous episodes to form a program.
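As an illustration of the generic indexing step, the following minimal sketch (ours, not the paper's; the bin count and threshold are illustrative assumptions) partitions a clip into shots by comparing gray-level histograms of consecutive frames, one of the techniques cited above:

```python
import numpy as np

def shot_boundaries(frames, bins=64, threshold=0.5):
    """frames: iterable of 2-D grayscale arrays (one per frame).
    Returns the indices of frames that start a new shot, based on the
    L1 distance between normalized histograms of consecutive frames."""
    shots, prev = [0], None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
        hist = hist / max(hist.sum(), 1)      # normalize to a distribution
        if prev is not None and np.abs(hist - prev).sum() > threshold:
            shots.append(i)                   # abrupt change: a shot cut
        prev = hist
    return shots
```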
In this paper we address issues related to structural indexing only. Generally, most episodes and programs can be expressed in the form of worldly knowledge by describing the interplay among physical objects in the course of time and their relationships in space. Physical objects may include persons, buildings, vehicles, etc. Video is a typical replica of this worldly environment. In conceptual modeling of video data for the purpose of structural indexing, it is therefore important that we identify physical objects and their relationships in time and space. Subsequently, we can represent these relations in a data structure that is suitable for users to manipulate. Temporal relations among objects have previously been modeled using methods like temporal intervals [9]. For spatial relations, most of the techniques are based on projecting objects onto a two- or three-dimensional coordinate system. Very little attempt has been made to formally express the spatio-temporal interactions of objects in a single framework. Though spatial/temporal metadata for video databases is defined in [8], no detailed modeling is provided. In the following sections, we describe a generalized framework for the spatio-temporal relationships of objects in an image or video.

2.1 Generalized Spatial and Temporal Operations

The generalized spatial and temporal operations presented in this section are an extension of our earlier work [9]. The reason for introducing the generalization in both the spatial and temporal domains is to simplify the description of complex spatial or temporal events, which are otherwise rather cumbersome to express [4]. We first give a definition of the generalized n-ary relation.

Definition 1: Generalized n-ary relation. A generalized n-ary relation R(r_1, ..., r_n) is a relation among n objects r_i that satisfies one of the conditions in Table 1, according to their positions in the space or time domain with respect to each other.
The relation is represented by its corresponding name and symbol. The operands of the relation, i.e., r_i (i = 1, ..., n), are either the projections of the positions of the objects (spatial domain) or the time spans of certain objects/events (temporal domain).

The generalized n-ary relations and the corresponding interval constraints are shown in Figure 2 and Table 1, respectively. The same fundamental relations can be used in either the space or the time domain. The difference is in the meaning of the operands rather than the operation. For the spatial domain the operands represent the physical locations of the objects, while in the temporal case they represent the duration of a certain temporal event (such as presence). The number of operands, n, in the relations is assumed variable. This generality enables any spatial or temporal situation to be represented in terms of the seven fundamental n-ary relations in Figure 2.

[Figure 2: n-ary relations]

Relation name | Symbol | Interval constraints, for all i, 1 <= i < n
before        | B      | r_i^e < r_{i+1}^s
meets         | M      | r_i^e = r_{i+1}^s
overlaps      | O      | r_i^s < r_{i+1}^s < r_i^e < r_{i+1}^e
contains      | C      | r_i^s < r_{i+1}^s < r_{i+1}^e < r_i^e
starts        | S      | r_i^s = r_{i+1}^s and r_i^e < r_{i+1}^e
completes     | CO     | r_i^s < r_{i+1}^s and r_i^e = r_{i+1}^e
equals        | E      | r_i^s = r_{i+1}^s and r_i^e = r_{i+1}^e

(r_i^s = starting coordinate of object r_i; r_i^e = ending coordinate of object r_i)

Table 1: n-ary relations
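As a concrete reading of Table 1, here is a minimal sketch (our own, not code from the paper) of the seven generalized n-ary relations over operands given as (start, end) interval pairs; each relation holds when its Table 1 constraint is satisfied by every consecutive pair of operands:

```python
def _chain(intervals, cond):
    """True if cond holds for every consecutive pair (r_i, r_{i+1})."""
    return all(cond(a, b) for a, b in zip(intervals, intervals[1:]))

# Each operand is an interval (s, e) on a spatial axis or the time axis.
def before(*r):    return _chain(r, lambda a, b: a[1] < b[0])                   # B
def meets(*r):     return _chain(r, lambda a, b: a[1] == b[0])                  # M
def overlaps(*r):  return _chain(r, lambda a, b: a[0] < b[0] < a[1] < b[1])     # O
def contains(*r):  return _chain(r, lambda a, b: a[0] < b[0] < b[1] < a[1])     # C
def starts(*r):    return _chain(r, lambda a, b: a[0] == b[0] and a[1] < b[1])  # S
def completes(*r): return _chain(r, lambda a, b: a[0] < b[0] and a[1] == b[1])  # CO
def equals(*r):    return _chain(r, lambda a, b: a[0] == b[0] and a[1] == b[1]) # E
```

For example, before((1, 3), (5, 8), (9, 12)) is true because each interval ends before the next one starts.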
2.2 Modeling of Spatial Events in a Single Frame

Assume that computer vision/image processing algorithms for object identification and recognition have been applied to the video frames, and that a spatial attribute, called the bounding volume V, has been extracted for each salient physical object present in the frame and stored in a VSDG (Video Semantic Directed Graph) [7] or an equivalent data structure. The volume describes the spatial projection of an object on the x, y, and z axes and is defined in the following way:

Bounding Volume (V) = (x_1, x_2, y_1, y_2, z_1, z_2)

A 2-D bounding box is used in those cases where only 2-D information is available. For all three coordinates, the points with subscripts 1 and 2 specify the beginning and end points of the projections, respectively.

The information provided by the bounding volumes is not sufficient to describe the meaningful semantic information present in a frame. Although it provides the most fundamental information about a frame, e.g., the locations of individual objects, it needs to be expanded to construct higher-level contents of the frame. Such detailed information contents in a single frame can be termed spatial events.

For example, presiding over a meeting attaches a meaning to some spatial area. For this event, a person in a frame may be identified such that he/she is either standing or sitting on a chair in the center or front of a meeting room. Another example of a spatial concept is the three-point position on a basketball field. Similarly, a person may be sitting on a chair or some physical object; in this case, we have a conceptual spatial object 'sitting' with the attributes 'a physical object which sits' and 'a physical object being sat on', and they are related by the 'sitting' relationship.

In order to express events in an unambiguous way, we present a formal definition of a spatial event based on the spatial operations discussed in the previous section.
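The bounding volume maps naturally onto a small record type; the following sketch (our own naming, not the paper's) stores the six coordinates and exposes the per-axis intervals consumed by the n-ary relations above:

```python
from dataclasses import dataclass

@dataclass
class BoundingVolume:
    """V = (x1, x2, y1, y2, z1, z2); subscript 1 is the beginning and
    subscript 2 the end of the projection on each axis."""
    x1: float; x2: float
    y1: float; y2: float
    z1: float = 0.0      # z defaults to a degenerate interval so that a
    z2: float = 0.0      # 2-D bounding box is just a special case

    def projection(self, axis: str):
        """The (start, end) interval of the object on one axis."""
        return {"x": (self.x1, self.x2),
                "y": (self.y1, self.y2),
                "z": (self.z1, self.z2)}[axis]
```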
A spatial event describes the relative positions of objects in a frame.

Definition 2: Spatial Event. A spatial event E_s is a logical expression consisting of various generalized n-ary spatial operations on projections, described as follows:

E_s = R_1(r_1^1, ..., r_{n_1}^1) ∘_1 R_2(r_1^2, ..., r_{n_2}^2) ∘_2 ... ∘_{m-1} R_m(r_1^m, ..., r_{n_m}^m),

where R_j, j = 1, ..., m, is a generalized n-ary relation, ∘_k, k = 1, ..., m-1, is one of the logical operators (∧ or ∨), and r_j^i is the projection of object j in relation i on the x, y, or z axis.

Note that more complex spatial events can be constructed by relating several spatial events using logical operators.

As an example of a spatial event, consider a player holding the ball in a basketball game. To simplify the characterization of this situation, we assume that when the bounding boxes of the objects player and ball are in contact with each other, the frame contains the event "player holding the ball". This is characterized by six of the n-ary relations in both the x and y coordinates and can be formally expressed as

E_s = (M(r_x^1, r_x^b) ∨ O(r_x^1, r_x^b) ∨ C(r_x^1, r_x^b) ∨ S(r_x^1, r_x^b) ∨ CO(r_x^1, r_x^b) ∨ E(r_x^1, r_x^b)) ∧
      (M(r_y^1, r_y^b) ∨ O(r_y^1, r_y^b) ∨ C(r_y^1, r_y^b) ∨ S(r_y^1, r_y^b) ∨ CO(r_y^1, r_y^b) ∨ E(r_y^1, r_y^b)),

where r_x^1 is the projection of the bounding box associated with the object player 1 on the x-axis, r_x^b is the projection of the bounding box associated with the object ball on the x-axis, and so on. If the specified condition is satisfied for a given frame, the event E_s exists.

As a side note, we mention that one could maintain the spatial event information for every frame. However, the overhead associated with such a detailed specification may be formidable, and tracking such detailed information may not even be needed for many applications. We can therefore maintain temporal information by identifying spatial events only in frames that are δ frames apart.

Spatial events can serve as the low-level (fine-grain) indexing mechanism for video data, where information contents at the frame level are generated. Modeling more complex information contents, such as gloomy weather, is a more challenging problem.
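Using the relation functions and BoundingVolume sketched earlier, the "player holding the ball" event reduces to a few lines; this is our illustrative rendering of the formula for E_s, not code from the paper:

```python
# The six "contact" relations used in the formula for E_s.
CONTACT = (meets, overlaps, contains, starts, completes, equals)

def holding_ball(player: BoundingVolume, ball: BoundingVolume) -> bool:
    """E_s: the boxes are in contact on the x axis AND on the y axis."""
    def in_contact(axis: str) -> bool:
        r_p, r_b = player.projection(axis), ball.projection(axis)
        return any(rel(r_p, r_b) for rel in CONTACT)   # disjunction of six relations
    return in_contact("x") and in_contact("y")         # conjunction over both axes
```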
2.3 Temporal Events

The next level of video information modeling involves the temporal dimension. Temporal modeling of a video clip is crucial for users to ultimately construct complex views or to describe episodes/events in the clip. Episodes can be expressed by collectively interpreting the behavior of physical objects. The behavior can be described by observing the total duration for which an object appears in a given video clip and its relative movement over the sequence of frames in which it appears. For example, the occurrence of a slam-dunk in a sports video clip can be an episode in a user's specified query.

Modeling of this episode requires tracking the motion of the player for whom the slam-dunk is being queried and carefully tracking the motion of the ball, especially when it approaches the hoop. Tracking the motion of the player and the motion of the ball are two simple temporal events. These temporal events need to be expressed prior to composing the complex episode of the slam-dunk. It is obvious that these simple events can be expressed formally as a temporal sequence of various spatial events spanning a number of frames. Composite temporal events are defined in terms of other simple or complex temporal events, relating them by the n-ary relations. We formally define a temporal event as follows.

Definition 3: Temporal Events. A simple temporal event E_st is defined as a logical operation on a set of spatial events whose durations are related by the n-ary temporal relations. Formally,

E_st = R_1(d(E_{s_1}), ..., d(E_{s_n})) ∘_1 R_2(d(E_{s_1}), ..., d(E_{s_n})) ∘_2 ... ∘_{m-1} R_m(d(E_{s_1}), ..., d(E_{s_n})),

where R_j is a generalized n-ary relation and d(E_{s_i}) is the duration of the spatial event E_{s_i}. A composite temporal event E_ct is formed by further relating the existing temporal events using the same spatio-temporal generalized operators. Formally,

E_ct = R_1(d(E_{t_1}), ..., d(E_{t_n})) ∘_1 R_2(d(E_{t_1}), ..., d(E_{t_n})) ∘_2 ... ∘_{m-1} R_m(d(E_{t_1}), ..., d(E_{t_n})),

where the d(E_{t_i}) in this case are the durations of temporal events, which may be either simple or composite.

In video data, associated with each spatial event is its duration d(E_s), during which the spatial event persists. If the event starts at frame α and ends at frame β, then d(E_s) = β - α + 1. The duration of the result of an n-ary operation is the aggregate duration, i.e., the time interval between the earliest starting time and the latest ending time of the involved operands.

A set of spatial/temporal events can be arranged in nondecreasing order of (approximate) start time. However, in many cases we may not know the exact inter-interval delays during the definition of
events. Therefore, instead of giving an exact value, we can specify ranges of inter-interval delays. In the extreme case, every duration or delay is a variable. In summary, in defining a spatio-temporal event we only have to specify the components in a certain order; the durations (the τ_i's) and the inter-interval delays (the Δ's) are optional.

An example of a temporal event is an extension of the previous example, "holding a ball", to "passing of a ball between two players". This is characterized by two events of the same type, namely "holding of the ball by a player". The pass event is then composed of these events under two conditions: first, the events should follow each other, and second, the delay between the two events should exceed some specified value. Accordingly, we can express the pass event as

E_st = B(d(E_{s_1}), d(E_{s_2})),

where B is the before n-ary operation and the d(E_{s_i}) are the durations of the spatial events, as defined in the previous section, for players 1 and 2. A composite temporal event, "3 passes that follow each other", can similarly be expressed as a sequence of temporal events:

E_ct = B(d(E_{st_1}), d(E_{st_2}), d(E_{st_3})),

where the d(E_{st_i}) are the durations of the corresponding simple temporal events.
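A sketch of how these compositions might be evaluated over detected occurrences, where each occurrence is the frame interval (α, β) of an event and before is the relation defined earlier (the delay threshold is an illustrative assumption, standing in for the paper's unspecified value):

```python
def duration(occurrence):
    """d(E) = beta - alpha + 1 for an occurrence spanning frames alpha..beta."""
    alpha, beta = occurrence
    return beta - alpha + 1

def pass_event(hold_1, hold_2, delay_threshold=5):
    """E_st = B(d(E_s1), d(E_s2)): player 2's hold follows player 1's,
    with the inter-interval delay tested against a specified value."""
    delay = hold_2[0] - hold_1[1]          # frames between the two holds
    return before(hold_1, hold_2) and delay > delay_threshold

def three_passes(pass_1, pass_2, pass_3):
    """E_ct = B(d(E_st1), d(E_st2), d(E_st3)): three passes in sequence,
    each argument being the frame interval of one pass event."""
    return before(pass_1, pass_2, pass_3)
```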
3 Video Database Architecture

Summarizing the discussion of the previous section, we have suggested three levels of semantic indexing of video databases. The first level identifies spatial events from the physical positions of salient objects. The second level maintains indexes for simple temporal events using the information maintained at the first level. Subsequently, composite temporal events are identified from simple and composite events.

The first step is to process the raw video data and extract the relevant information, such as the identities of the objects of interest and their bounding volumes. This constitutes a challenging problem even for today's advanced computer vision technology. Discussion of this problem is not the main theme of this paper; however, it is worthwhile to mention some issues related to the initial processing.

For easier and more efficient recognition, objects should be grouped into classes. This enables pre-defined object models to be used and simplifies recognition through appropriate matching techniques. Since the identities of human objects, which are obviously of special interest in video databases, are determined by their faces, their recognition should be given special treatment. Among the many different schemes that incorporate a wide range of methods, the EIGENFACE technique appears to be the most reliable and robust [16]. This method recognizes faces according to their representation in a feature space formed by the "eigenfaces" of the sample faces. The bounding volume information for any recognized object can easily be obtained through well-established edge detection algorithms, to be used in the later steps to construct the entire database.
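For intuition, a minimal eigenfaces computation in the spirit of [16] (our sketch; the array shapes and the use of an SVD are assumptions, not details from the paper) projects each face onto the top principal components of a training sample:

```python
import numpy as np

def eigenfaces(samples: np.ndarray, k: int):
    """samples: (num_faces, num_pixels) matrix of flattened face images.
    Returns the mean face and the top-k eigenfaces."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Right singular vectors of the centered sample are the eigenfaces.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def face_features(face: np.ndarray, mean: np.ndarray, basis: np.ndarray):
    """Coordinates of a face in the eigenface feature space; faces are
    recognized by comparing these coordinates."""
    return basis @ (face - mean)
```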
A possible architecture for the spatio-temporal part of the database is shown in Figure 3. At each level of indexing (event database), the event definitions as well as the actual event occurrences and their corresponding information are stored. The event databases can be implemented either in an object-oriented environment or as a relational database. We illustrate the object-oriented paradigm next.

At the first level of the architecture lies the spatial event database. This database is built by direct use of the information about physical objects and their positions. This information, for example, can be stored in a VSDG (Video Semantic Directed Graph) [7]. Each event is represented as a class. The event definition and recognition procedures are stored as methods in the class definition. We also record the following information (stored as instance variables): the object IDs of the participating objects, the clip number of the detected spatial event, and the starting and ending frame numbers of the event. The actual events identified are instances of the corresponding event class. All the instances of a class constitute the collection of the class. The spatial event database is updated with new instances and event types as they are encountered during the archiving/retrieval process.

The spatial event (E_s) information is used to construct the second-level indexing scheme, the simple temporal event database. The classes at this level represent the simple temporal events formally defined in the previous section. The methods of a class are used to represent and recognize the event, as at the first level. The instance variables of each class are the following: the durations and IDs of the component spatial events specified in the first-level indexing, the clip number of the video clip containing the simple temporal event, and the starting and ending frame numbers of the event.

At the highest level is the composite temporal event database. The classes at this level represent the temporal events among the simple/composite temporal events. The structure of the classes is the same as at the second level, except that the durations and IDs of temporal events are recorded. Composite temporal events can be recursively formed from simple and/or composite events.
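The three-level schema might be rendered as follows; the class and field names are ours, paraphrasing the instance variables listed above rather than reproducing the paper's actual schema:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class SpatialEventOccurrence:            # first level
    object_ids: List[int]                # participating physical objects
    clip: int                            # clip in which the event was detected
    start_frame: int
    end_frame: int

@dataclass
class TemporalEventOccurrence:           # second and third levels
    component_ids: List[int]             # component spatial/temporal events
    durations: List[int]                 # their durations
    clip: int
    start_frame: int
    end_frame: int

@dataclass
class EventClass:
    """An event type: its recognition procedure (stored as a method of the
    class) plus the collection of all recognized occurrences (instances)."""
    name: str
    recognize: Callable[..., bool]
    collection: list = field(default_factory=list)
```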
[Figure 3: An architecture for spatio-temporal event identification]

[Figure 4: Object-oriented abstractions (e.g., IS-PART-OF, IS-INVOLVED-IN, IS-MEMBER-OF)]

Although we have not presented a formal grammar for expressing queries, we have proposed a general framework for characterizing events using the generalized n-ary operators. A user can specify more events as needed and store them. New classes can also be formed from existing classes at the lower or same level through n-ary operations and class inheritance. Occasionally, the system may resort to processing the raw video data to identify objects that were not previously identified. We expect the proposed methodology to be helpful in providing on-line capabilities for query processing.

4 An Object-Oriented Model of Video Data

As mentioned earlier, video data is represented by three entities: physical objects, spatial events, and temporal events. For users to query video data, we propose an object-oriented approach which provides an elegant paradigm for representing a user's view. It is a hierarchical schema that allows users to compose their views of the video data. The purpose is to offer maximum flexibility to a user to represent his/her semantics, and at the same time to allow processing of heterogeneous queries through interaction with the proposed architecture. The objective of an object-oriented paradigm is twofold. First, users can define their conceptual queries involving motion in a systematic way. Second, it allows processing of users' conceptual queries using the various event databases. The system thus processes users' queries with the assistance of the proposed object-oriented views. In other words, an object-oriented view serves as a "knowledge base" for the system.

Corresponding to the three entities (physical objects, spatial events, and temporal events) used in the modeling of video data, three objects are defined from the user's point of view. These are physical objects (PO), spatial objects (SO), and temporal objects (TO). For video data, a user can use combinations of various object-oriented abstractions (such as those shown in Figure 4) on these objects to specify queries. The important feature of this hierarchy, and in general of any object-oriented abstraction, is that the terminal nodes are either POs, SOs, or TOs. Any complex video query is expressed as a function of these nodes, and processing of such queries requires searching for the occurrences of SOs and TOs over the specified POs. As an example, consider a sports video database which can be used by multiple users with different interests. Figure 5 describes an object hierarchy of the view/knowledge which a user may like to construct.
A sports fan may view the video data as a collection of players, events, and teams. Furthermore, in his/her view there are three types of players: forward, guard, and center. There are two types of events: individual_event and team_event. Teams consist of those from the NBA and the NCAA. Also, the composition of the field is described in detail. A sports fan can generate a query such as 'Give the video clips where Michael Jordan (i.e., a PO) has a slam-dunk (TO)'. The system first searches the slam-dunk relation, or collection, of the composite temporal event database to find whether Jordan appears one or more times. If he does, the clip numbers containing Michael Jordan slam-dunks are returned. Otherwise, the system goes to the lower-level event databases and event definition databases until it finds enough information to evaluate the query. A fan may also want to identify those video segments where a steal (TO) occurs around the right sideline of the front court. In this case, "right sideline" and "front court" are also objects of interest, in addition to the player and the ball.

[Figure 5: Fan's view]

The definitions of some of the classes used in these examples are given in Table 2. The methods are coded based on the generalized n-ary operators and describe the spatio-temporal processing related to the event of the corresponding object. In class NBA, "SetOf" is used to specify the association abstraction.

[Table 2: Class definitions]
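In the spirit of Table 2 (whose contents did not survive reproduction here), the following sketch shows hypothetical class definitions with a "SetOf" association and the top-down query evaluation just described; all names are illustrative:

```python
class Team:
    def __init__(self, name: str, players: set):
        self.name = name
        self.players = players            # a team is associated with a set of players

class NBA:
    def __init__(self, teams: set):
        self.teams = set(teams)           # "SetOf" association abstraction

def find_clips(event_name: str, player_id: int, databases: list) -> list:
    """Search the composite temporal event database first, then fall back
    to the lower-level databases until the query can be evaluated.
    databases: collections ordered composite -> simple -> spatial, each
    mapping an event name to its recognized occurrences."""
    for db in databases:
        clips = [occ.clip for occ in db.get(event_name, [])
                 if player_id in getattr(occ, "object_ids", [])]
        if clips:
            return clips
    return []   # not yet indexed; the system would reprocess the raw video
```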
The predicate logic described in our earlier work [7], and also used in [1, 12, 13], can be used to construct and express complex queries in a more abstract manner. The spatial objects (SOs) and temporal objects (TOs) defined earlier are used as operands for the spatial and temporal predicates. For additional detail on the use of predicate logic, we refer to [7].

5 Conclusion

We have presented a framework for semantic indexing of video data using the generalized n-ary operators proposed in this paper. The same set of operators is used for modeling both the spatial and temporal contents of an image or a sequence of video frames. This enables a unified methodology for handling content-based spatial and spatio-temporal queries. The system is hierarchical in nature and allows a multi-level indexing and searching mechanism by modeling information at various levels of semantic granularity, and hence allows processing of content-based queries without processing raw image or video data. Currently this framework is being developed at Purdue University to realize a highly integrated multimedia database architecture.
References

[1] A. Abella and J. R. Kender, "Qualitatively Describing Objects Using Spatial Prepositions," in AAAI-93, Proc. of the 11th National Conference on Artificial Intelligence, pp. 536-540.

[2] F. Arman, R. Depommier, A. Hsu, and M.-Y. Chiu, "Content-based Browsing of Video Sequences," Proc. of Second ACM International Conf. on Multimedia, San Francisco, CA, October 1994, pp. 97-102.

[3] T. Arndt and S.-K. Chang, "Image Sequence Compression by Iconic Indexing," IEEE VL '89 Workshop on Visual Languages, Rome, Italy, 1989.

[4] A. Del Bimbo, E. Vicario, and D. Zingoni, "A Spatio-Temporal Logic for Image Sequence Coding and Retrieval," Proc. of IEEE VL '92 Workshop on Visual Languages, Seattle, WA, September 1992, pp. 228-230.

[5] S.-K. Chang, Q.-Y. Shi, and C.-W. Yan, "Iconic Indexing by 2-D Strings," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-9, No. 3, May 1987, pp. 413-427.

[6] C. J. Date, An Introduction to Database Systems, Vol. 1, 5th edition, Addison-Wesley, 1990.

[7] Y. F. Day, S. Dagtas, M. Iino, A. Khokhar, and A. Ghafoor, "Object-Oriented Conceptual Modeling of Video Data," to appear in Proc. IEEE ICDE '95.
[8] R. Jain and A. Hampapur, "Metadata in Video Databases," ACM SIGMOD RECORD, Vol. 23, No. 4, December 1994, pp. 27-33.

[9] T. D. C. Little and A. Ghafoor, "Interval-Based Conceptual Models for Time-Dependent Multimedia Data," IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 4, August 1993, pp. 551-563.

[10] A. Nagasaka and Y. Tanaka, "Automatic Video Indexing and Full-Video Search for Object Appearances," in 2nd Working Conference on Visual Database Systems, Budapest, Hungary, October 1991, IFIP WG 2.6, pp. 119-133.

[11] E. Oomoto and K. Tanaka, "OVID: Design and Implementation of a Video-Object Database System," IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 4, August 1993, pp. 629-643.

[12] D. A. Randell and A. G. Cohn, "Exploiting Lattices in a Theory of Space and Time," Computers and Mathematics with Applications, Vol. 23, No. 6-9, 1992, pp. 459-476.

[13] D. A. Randell, Z. Cui, and A. G. Cohn, "A Spatial Logic Based on Regions and Connection," in Proc. of 3rd Intl. Conf. on Principles of Knowledge Representation and Reasoning, Cambridge, MA, 1992, pp. 165-176.

[14] S. W. Smoliar and H. Zhang, "Content-Based Video Indexing and Retrieval," IEEE Multimedia, Vol. 1, No. 2, Summer 1994, pp. 62-72.

[15] D. Swanberg, C.-F. Shu, and R. Jain, "Knowledge Guided Parsing in Video Databases," Proc. SPIE '93, San Jose, CA, January 1993, pp. 3-11 - 3-22.

[16] M. Turk and A. Pentland, "Eigenfaces for Recognition," Journal of Cognitive Neuroscience, Vol. 3, No. 1, Winter 1991, pp. 71-86.