Object-Oriented Conceptual Modeling of Video Data
`
Young Francis Day, Serhan Dagtas, Mitsutoshi Iino,
Ashfaq Khokhar, and Arif Ghafoor
Distributed Multimedia Systems Laboratory
School of Electrical Engineering
Purdue University, West Lafayette, IN 47907

Abstract
In this paper, we propose a graphical data model for specifying the spatio-temporal semantics of video data. The proposed model segments a video clip into subsegments consisting of objects. Each object is detected and recognized, and the relevant information about each object is recorded. The motions of objects are modeled through their relative spatial relationships as time evolves. Based on the semantics provided by this model, a user can create his/her own object-oriented view of the video database. Using propositional logic, we describe a methodology for specifying conceptual queries involving spatio-temporal semantics and for expressing views for retrieving various video clips. Alternatively, a user can sketch the query by exemplifying the concept. The proposed methodology can be used to specify spatio-temporal concepts at various levels of information granularity.
1 Introduction

The key characteristic of video data is the spatial/temporal semantics associated with it, which makes video data quite different from other types of data such as text, voice, and images. A user of a video database can generate queries containing both temporal and spatial concepts. However, considerable semantic heterogeneity may exist among users of such data due to differences in their preconceived interpretations or intended uses of the information given in a video clip. Semantic heterogeneity has been a difficult problem for conventional databases [3], and even today this problem is not clearly understood. Consequently, providing a comprehensive interpretation of video data is a much more complex problem.
This research was supported in part by the National Science Foundation under grant number 9418767-EEC.
Most of the existing video database systems either employ image processing techniques for indexing video data or use traditional database approaches based on keywords or annotated textual descriptors. For indexing, keywords and textual descriptions for video data have also been suggested in [6], based on a generalization hierarchy in the object-oriented realm. Video segments can be joined or concatenated based on the semantics of this hierarchy. However, this approach is very tedious since the perception of video contents is performed manually by users, not through an automatic image processing mechanism. A video system that automatically parses video data into scenes using a color histogram comparison routine has been proposed in [5]; to locate frames containing desired objects, a method based on comparing the color histogram maps of objects is used. In [4], a hierarchical video stream model is proposed that uses a template or histogram matching technique to identify scene changes in a video segment. A video segment is thus divided into several subsegments. In each subsegment, a frame-based model is used to index the beginning frame of the subsegment, from which objects are identified. In this system a video stream is parsed, and the information is stored in the database. However, this system has many limitations. First, the textual descriptions are manually associated with video segments. Second, the system provides parsing only for a specific type of video.
In order to address the issues related to user-independent views and semantic heterogeneity, we propose a semantically unbiased abstraction for video data using a directed graph model. The model allows us to represent the spatio-temporal aspects of the information associated with the objects (persons, buildings, vehicles, etc.) present in video data. However, such an automated video database system requires effective and robust recognition of the objects present in the video database. Due to the diverse nature of video data, we can use various techniques currently available
according to the requirements of different situations that may occur in the input. For each input video clip, using a database of known objects, we first identify the corresponding objects, their sizes and locations, and their relative positions and movements, and then encode this information in the proposed graphical model. The encoded video data may be semantically very rich. Therefore, a unified framework is needed for users to express, and for the system to process, semantically heterogeneous queries over the unbiased encoded data. For this purpose, we propose an object-oriented approach that provides an elegant paradigm for representing a user's view of the data. It is a hierarchical scheme that provides the necessary framework for a user to compose views of the data with maximum flexibility, while at the same time allowing heterogeneous queries to be processed by evaluating the proposed graphical abstraction. For this purpose we also define an interface between these modeling paradigms. Another reason for this modeling approach is to provide an efficient indexing mechanism for on-line query processing without performing computations on the raw video data, since such computation can be quite extensive. The proposed VSDG can be generated off-line and subsequently used to process users' queries on-line. The architecture of the proposed system is shown in Figure 1.

Figure 1: System abstraction
The organization of this paper is as follows. Section 2 describes the assumptions, the techniques, and the proposed graphical model; two examples of video clips are used to illustrate the model. In Section 3, an object-oriented approach is presented that lets users specify their perceived view of the video data, and the use of predicate logic for specifying spatio-temporal concepts and query types is also presented there. Section 4 describes the computer vision/image processing requirements of the proposed model. The paper is concluded in Section 5.
2 Graph-Based Conceptual Modeling

Generally, most worldly knowledge can be expressed by describing the interplay among physical objects in the course of time and their relationships in space. Physical objects may include persons, buildings, vehicles, etc. A video database is a typical replica of this worldly environment. In conceptual modeling of video data, it is therefore important that we identify physical objects and their relationships in time and space. Subsequently, we can represent these relations in a suitable structure that is useful for users to manipulate. Temporal relations among objects have previously been modeled using Petri nets [1], temporal intervals [2], languages, etc. For spatial relations, most of the techniques are based on projecting objects onto a two- or three-dimensional coordinate system. Very little attempt has been made to formally express spatio-temporal interactions of objects in a single framework. In this section, we propose a graphical model to capture both the spatial and the temporal semantics of video data. The proposed model is referred to as the Video Semantic Directed Graph (VSDG) model. The most important feature of the VSDG model is an unbiased representation of the information, while providing a reference framework for constructing semantically heterogeneous users' views of the video data. Using this model along with the object-oriented hierarchy (discussed in Section 3), the system shown in Figure 1 can be used for on-line query processing without performing any computation on the actual raw video data. In the following sections, the modeling of the space and time information associated with objects is described in detail.
2.1 Spatio-Temporal Modeling over a Sequence of Frames (a Clip)

The spatial attributes of a salient physical object present in the frames can be extracted in the form of a bounding volume, Z, which describes the spatial projection of an object in three dimensions. It is a function of the bounding rectangle (Y), the centroid, and the depth information related to the object. The bounding rectangle is computed with reference to a coordinate system whose origin is at the lower left corner of each frame. Both Z and Y are expressed as:

Bounding Rectangle (Y) = (width, height, x, y)

Bounding Volume (Z) = (Bounding Rectangle, centroid, depth)
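As a concrete illustration (ours, not the paper's), these two quantities could be represented by the following Python structures; the class and field names are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class BoundingRectangle:
    # Y = (width, height, x, y), with (x, y) measured from the lower-left corner of the frame
    width: float
    height: float
    x: float
    y: float

@dataclass
class BoundingVolume:
    # Z = (bounding rectangle, centroid, depth)
    rect: BoundingRectangle
    centroid: Tuple[float, float, float]
    depth: float
```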
`
Temporal modeling of a video clip is crucial for users to construct views or to describe episodes/events in the clip. Episodes can be expressed by collectively interpreting the behavior of physical objects. The behavior can be described by observing the total duration for which an object appears in a given video clip and its relative movement over all the frames. For example, the occurrence of a slam-dunk in a sports video clip can be an episode in a user-specified query. Processing this query requires evaluation of both the spatial and the temporal aspects of various objects.
Temporal information about objects can be captured by specifying the changes in the spatial parameters associated with the bounding volume (Z) of the objects over the sequence of frames. At the finest level, these changes can be recorded at each frame. Although this fine-grained temporal specification may be desirable for frame-based indexing of video data, it may not be required in most applications, and the overhead associated with such a detailed specification may be formidable. Alternatively, a coarse-grained temporal specification can be maintained by analyzing only frames that are δ frames apart [7]. This skip distance (δ) may depend upon the complexity of the episodes; δ is an integer with the frame as its unit. There is an obvious tradeoff between the amount of storage needed for the temporal specification and the detail of the information maintained by the VSDG model.
2.2 The Proposed Model

Formally, both the spatial and temporal specifications of a clip can be represented as a directed graph, as shown in Figure 2, consisting of n video segments labeled V_1, V_2, ..., V_n. In this model, the time spans of physical objects are represented as circular nodes in the graph. Salient objects within a segment are grouped between two rectangular nodes, where such a node marks the appearance of a new physical object. Within a segment there can be an arbitrary number of objects. A video clip may consist of several such segments, and an object may appear in any number of segments. In order to capture the temporal semantics of objects via the VSDG, a motion vector can be associated with each object, represented by a circular node. The formal definition of the VSDG is as follows.

Figure 2: VSDG representation of a clip

A VSDG is a directed graph with a set of circular nodes (P), a set of rectangular nodes (T), and a set of arcs (A) connecting circular nodes to rectangular nodes. A circular node has an attribute describing the duration for which an object (person, vehicle, etc.) appears in a video segment. A rectangular node corresponds to an event in a video clip whenever a new physical object appears. In other words, a rectangular node marks the start of a new segment that differs from its predecessor segment in terms of the appearance of a new physical object. Each circular node has exactly one incoming arc and one outgoing arc. The duration of a video segment V_i is max(duration(O_i1), ..., duration(O_im_i)), where m_i is the number of salient objects O_ij (1 ≤ j ≤ m_i) appearing in segment V_i.

Each circular node has the following attributes:

D: P → I, a mapping from the set of circular nodes to the set of durations in terms of frames.

W: P → (Z_1, ..., Z_⌊l/δ⌋), a mapping from the set of circular nodes to the set of motion vectors. The element Z_i of this vector (for all i, 1 ≤ i ≤ ⌊l/δ⌋) is the bounding volume at the i-th sampled frame. In other words, Z_i = (Bounding Box_i, depth_i, centroid_i) and Bounding Box_i = (width_i, height_i, x_i, y_i), where l represents the number of frames associated with object O_i in a given video segment and δ is the time granularity for tracking the motion of every object in a video segment.
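A rough sketch of how the VSDG just defined might be encoded; the names CircularNode, Segment, and VSDG are ours, and BoundingVolume refers to the earlier sketch.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CircularNode:
    """One salient object appearing within a segment."""
    object_id: str
    duration: int                      # D: number of frames the object appears
    motion: List["BoundingVolume"]     # W: one bounding volume every delta sampled frames

@dataclass
class Segment:
    """A video segment delimited by two rectangular nodes (events)."""
    objects: List[CircularNode] = field(default_factory=list)

    def duration(self) -> int:
        # duration(V_i) = max over the durations of its salient objects
        return max((o.duration for o in self.objects), default=0)

@dataclass
class VSDG:
    segments: List[Segment] = field(default_factory=list)
```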
`
From the motion vectors, we can derive the relative spatial relationship between any two objects for any sampled frame. The relative movements between two objects can be described as a set of mutual spatial relationships, as in [8]. For any sampled frame, the relative position between objects O_i and O_j can be captured by the spatial relationships between their projections on each coordinate axis, x, y, and z. In other words, for O_i, R_ij = (SR_ijx, SR_ijy, SR_ijz) represents the spatial relationship of the projections of O_j with respect to O_i on each axis. If there are k concurrent objects between two rectangular nodes, then k − 1 such vectors are generated for each O_i. Therefore, in a given segment of a VSDG, if an object O_i appears concurrently with O_j for l frames, then O_i has a vector M_ij = (R_ij1, ..., R_ij⌊l/δ⌋). Similarly, O_j has a vector M_ji = (R_ji1, ..., R_ji⌊l/δ⌋). To determine the speed of an object, we take the difference between the last and the first sampled bounding volumes, Z_⌊l/δ⌋ − Z_1, and divide it by ⌊l/δ⌋δ. For O_i, the vectors R and M are derived information and are additional attributes associated with the circular node representing O_i.
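One possible reading of this speed computation, reusing the structures above (the helper name and the use of centroids are our assumptions):

```python
def approximate_speed(motion: list, delta: int) -> float:
    """Average speed between the first and last sampled bounding volumes,
    in centroid-distance units per frame."""
    if len(motion) < 2:
        return 0.0
    (x0, y0, z0) = motion[0].centroid
    (x1, y1, z1) = motion[-1].centroid
    elapsed_frames = (len(motion) - 1) * delta
    dist = ((x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2) ** 0.5
    return dist / elapsed_frames
```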
`
Figure 3: Snapshots of two example video clips

2.3 An Example of VSDG-Based Modeling

In this section, we use the video clip shown in Figure 3 to illustrate the proposed model. In the example video clip (Figure 3(a)), a car (object 2) and a person (object 1) appear first; then the camera moves toward the right, and two persons (object 1 and object 5) walk toward each other and shake hands. Assuming that proper object recognition methods are used to identify these objects, we can appropriately define the bounding volume information for the objects. The complete VSDG model for the example video clip is given in Figure 4, which describes the information about the various objects and their temporal behaviors. The VSDG in Figure 4 has four rectangular nodes, which correspond to three different scene changes. The first rectangular node (t_0) marks the start of the video clip, t_1 indicates the appearance of object O_5, t_2 indicates the appearance of object O_6, and t_3 indicates the end of the video clip. There are a total of six objects, O_1, O_2, O_3, O_4, O_5, O_6, and some objects appear in multiple scenes. For example, O_1, O_2, O_3, and O_4 appear in video segments V_1 and V_2.

Figure 4: VSDG representation of the example clip
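Purely as an illustration of the structures sketched in Section 2.2, the example clip could be encoded roughly as follows; durations and motion vectors are left as placeholders because the paper does not give them, and the membership of segment V_3 is our guess.

```python
def node(name: str) -> CircularNode:
    # duration and motion are unspecified in the paper, so placeholders are used
    return CircularNode(object_id=name, duration=0, motion=[])

example_clip = VSDG(segments=[
    Segment(objects=[node(n) for n in ("O1", "O2", "O3", "O4")]),        # V1: t0..t1
    Segment(objects=[node(n) for n in ("O1", "O2", "O3", "O4", "O5")]),  # V2: t1..t2, O5 appears
    Segment(objects=[node(n) for n in ("O5", "O6")]),                    # V3: t2..t3, O6 appears
])
```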
`
3 A VSDG-Based Object-Oriented Model

As mentioned earlier, the objective of the system shown in Figure 1 is to process video queries on-line. We therefore need to represent the rich semantics of video data using a suitable data model. In this section, we propose an object-oriented model which can be interfaced with the VSDG model. The objective of proposing an object-oriented paradigm is twofold. First, it can be used by users to define their conceptual queries involving motion in a systematic way. Second, it allows processing of users' conceptual queries by evaluating the VSDG. Therefore, the system processes users' queries with the assistance of the object-oriented views. In other words, an object-oriented view serves as a "knowledge base" for the system. In Section 3.1, we describe how a user can construct views about video data consisting of various objects.
3.1 An Object-Oriented User-Defined View

As we have mentioned, video data is represented by three entities: spatial, temporal, and physical objects. For users to query video data, we propose an object-oriented approach which provides an elegant paradigm for representing a user's view. It is a hierarchical schema that allows a user to compose his/her views about the video data. The objective is to offer the maximum flexibility to users to represent their semantics and at the same time allow processing of heterogeneous queries by executing the proposed VSDG model. Generally, in any worldly knowledge, the three entities of interest are: spatial, temporal, and physical. The spatial entity, called Conceptual Spatial Object (CSO), is the spatial concept associated with an object which can be extracted at the frame level. For example, presiding over a meeting attaches a meaning to some spatial area. For this concept, a person in a frame may be identified such that he/she is either standing or sitting on a chair in the center of a meeting room. Another example of a spatial concept is 'sitting'. A person may be sitting on a chair or some other physical object. In this case, we have a conceptual spatial object 'sitting' with attributes 'a physical object which can sit' and 'a physical object being sat on', and they are related by the 'sitting' relationship. A Conceptual Temporal Object (CTO) defines a concept that extends over a sequence of frames. It may involve several CSOs or CTOs (or a combination of both). There can be several temporal relationships among physical objects (described below). Examples are 'slam-dunk', 'walking', etc. A formal definition of CTO is given in Section 3.2. Lastly, Physical Objects (PO) are the physical objects described above, which correspond to places in the VSDG. Some examples are persons, trees, and houses.

Both CSOs and CTOs can be called logical objects. In general, any concept or semantic view of a user can be expressed using a set of rules from predicate logic that operate upon CSOs, CTOs, and POs. The foundation of these rules is given in Section 3.2. The objects and rules together can describe complex behavior involving multiple objects.

For video data, a user can use a combination of various abstractions to construct his/her view of the video data. The important feature of this hierarchy, and in general of any object-oriented abstraction, is that each terminal node is either a CTO, a CSO, or a PO. Any complex video query is expressed as a function of these terminal nodes, and processing of such a query requires execution of some CTOs and CSOs over the specified POs. As an example, consider a sports video database which can be used by multiple users with widely different interests. Figure 5 describes an object hierarchy of view/knowledge which a user would like to construct.

Figure 5: Fan's view

A fan may view the video data as a collection of players, events, and teams. Furthermore, in his/her view, there are three types of players: forward, guard, and center. There are two types of events: individual_event and team_event. Teams consist of those from the NBA and the NCAA. A sports fan can generate a query such as 'Give the video clips where Michael Jordan (i.e., a PO) has a slam-dunk (CSO + CTO)'. The system first finds the image of Michael Jordan from the picture attribute of the player class and selects a set of video clips; then the slam-dunk CTO is invoked with its inputs supplied by the places in the VSDG of the selected video clips. On the other hand, a coach may want to identify those video segments where a pass (CTO) occurs around the right sideline of the front court. The system invokes the methods associated with a CTO called pass and a CSO called right sideline and uses them to evaluate the VSDG.

The definitions of some of the classes used in this example are given in Table 2. The entries are self-explanatory. The methods are coded based on predicate logic and describe the spatio-temporal processing related to the event of that object. In class NBA, "Setof" is used to specify the association abstraction. The method "slam-dunk" will be defined later.

Table 2: Class definitions
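Since the table itself did not survive extraction, the following is only a speculative sketch of what such class definitions might look like; the attributes name, team, birthdate, picture and the search() method follow fragments of the original table, while everything else is assumed.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Player:                      # a PO in the fan's view
    name: str
    team: str
    birthdate: str
    picture: bytes = b""           # used to locate the player in video clips

@dataclass
class Forward(Player):
    pass                           # Guard and Center would be declared analogously

@dataclass
class NBA:
    teams: Set[str] = field(default_factory=set)   # "Setof" association abstraction

    def search(self, player: Player) -> List["Segment"]:
        """Select candidate segments for a player (predicate-logic body omitted)."""
        raise NotImplementedError
```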
`
3.2 Predicate Logic for Spatio-Temporal Semantics

In this section we describe a methodology to express concepts and queries for video data using CSOs, CTOs, and rules based on predicate logic.

3.2.1 Spatial Predicates

Here we define two sets of spatial predicates. The first set specifies predicates involving absolute measures in video data, while the second set represents spatial predicates based on relative positions among objects in video data. These sets are defined on two arbitrary physical objects "a" and "b". Similar relations are described in [11, 9, 10].

Set I:

D(t, a, b): distance between a and b at time t (t may not be needed)
= √((x_at − x_bt)² + (y_at − y_bt)² + (z_at − z_bt)²)

DP(t1, t2, a): displacement of a between t1 and t2
= √((x_at1 − x_at2)² + (y_at1 − y_at2)² + (z_at1 − z_at2)²)

RC(t, a, b): the relative coordinates of b with respect to a at time t
= (x_bt − x_at, y_bt − y_at, z_bt − z_at)
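These three absolute predicates translate almost directly into code; a small sketch (our naming), assuming each object exposes an (x, y, z) centroid at the sampled times:

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]

def distance(a: Point, b: Point) -> float:
    """D(t, a, b): Euclidean distance between a and b at the same instant t."""
    return math.dist(a, b)

def displacement(a_t1: Point, a_t2: Point) -> float:
    """DP(t1, t2, a): how far object a has moved between t1 and t2."""
    return math.dist(a_t1, a_t2)

def relative_coordinate(a: Point, b: Point) -> Point:
    """RC(t, a, b): coordinates of b relative to a."""
    return (b[0] - a[0], b[1] - a[1], b[2] - a[2])
```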
`
In specifying the spatial predicates for Set II, thirteen spatial relations between two objects are adopted. The thirteen spatial relations and their notations are shown in Table 3. Here, we use the same notations as in our earlier temporal modeling [2]. The relations in Set II are graphically illustrated in Figure 6. Note that "a" and "b" can be either 2D or 3D objects, and that a_i denotes a's projection on the i-axis.

Table 3: Spatial/temporal relations and notations

Relation         Temporal notation   Spatial notation
before           b_t                 b
after            b_t⁻¹               b⁻¹
meets            m_t                 m
met-by           m_t⁻¹               m⁻¹
overlaps         o_t                 o
overlapped-by    o_t⁻¹               o⁻¹
starts           s_t                 s
started-by       s_t⁻¹               s⁻¹
during           d_t                 d
during⁻¹         d_t⁻¹               d⁻¹
finishes         f_t                 f
finishes⁻¹       f_t⁻¹               f⁻¹
equals           e_t                 e

The spatial predicates for Set II are represented as follows:

E(a, b): a is equal to b.
≡ e(a_x, b_x) ∧ e(a_y, b_y) ∧ e(a_z, b_z)

PP(a, b): a is a proper part of b.
≡ [s(a_x, b_x) ∨ d(a_x, b_x) ∨ f(a_x, b_x)] ∧ [s(a_y, b_y) ∨ d(a_y, b_y) ∨ f(a_y, b_y)] ∧ [s(a_z, b_z) ∨ d(a_z, b_z) ∨ f(a_z, b_z)]

EC(a, b): a externally connects with b.
≡ m(a_x, b_x) ∨ m(a_y, b_y) ∨ m(a_z, b_z) ∨ m⁻¹(a_x, b_x) ∨ m⁻¹(a_y, b_y) ∨ m⁻¹(a_z, b_z)

We define a predicate INTERSECT(a_i, b_i) as
o(a_i, b_i) ∨ s(a_i, b_i) ∨ d(a_i, b_i) ∨ f(a_i, b_i) ∨ o⁻¹(a_i, b_i) ∨ s⁻¹(a_i, b_i) ∨ d⁻¹(a_i, b_i) ∨ f⁻¹(a_i, b_i)

PO(a, b): a partially overlaps b.
≡ INTERSECT(a_x, b_x) ∧ INTERSECT(a_y, b_y) ∧ INTERSECT(a_z, b_z) ∧ ¬PP(a, b) ∧ ¬PP(b, a) ∧ ¬E(a, b)

DC(a, b): a is discrete from b.
≡ ¬INTERSECT(a_x, b_x) ∧ ¬INTERSECT(a_y, b_y) ∧ ¬INTERSECT(a_z, b_z)

The following macros abbreviate common disjunctions of the temporal relations:

Macro        Definition
B (Before)   b_t ∨ m_t
O (Overlap)  o_t ∨ s_t ∨ d_t⁻¹ ∨ f_t⁻¹ ∨ e_t
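To make the per-axis construction concrete, here is a rough sketch of a few of these relations over 1D projections and the derived region predicates; only the relations needed below are implemented, the interval encoding is our assumption, and intersects() approximates the INTERSECT disjunction.

```python
from typing import Tuple

Interval = Tuple[float, float]               # projection of an object on one axis: (low, high)
Box = Tuple[Interval, Interval, Interval]    # (x, y, z) projections

def equal(a: Interval, b: Interval) -> bool:       # e
    return a == b

def meets(a: Interval, b: Interval) -> bool:       # m: a ends exactly where b starts
    return a[1] == b[0]

def inside(a: Interval, b: Interval) -> bool:      # s, d, or f: a lies within b, a != b
    return b[0] <= a[0] and a[1] <= b[1] and a != b

def intersects(a: Interval, b: Interval) -> bool:  # approximates o, s, d, f and their inverses
    return a[0] < b[1] and b[0] < a[1]

def PP(a: Box, b: Box) -> bool:
    """a is a proper part of b: contained on every axis."""
    return all(inside(ai, bi) for ai, bi in zip(a, b))

def EC(a: Box, b: Box) -> bool:
    """a externally connects with b: the projections meet on some axis."""
    return any(meets(ai, bi) or meets(bi, ai) for ai, bi in zip(a, b))

def PO(a: Box, b: Box) -> bool:
    """a partially overlaps b: intersection on all axes, neither contains the other."""
    return (all(intersects(ai, bi) for ai, bi in zip(a, b))
            and not PP(a, b) and not PP(b, a) and a != b)

def DC(a: Box, b: Box) -> bool:
    """a is discrete from b (following the paper's formula: no axis intersects)."""
    return not any(intersects(ai, bi) for ai, bi in zip(a, b))
```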
`
3.2.3 Expressing Queries Using Predicate Logic

In the following, we classify and characterize different types of queries and explain with examples how the spatio-temporal logic predicates can be used to describe these queries. Generally, video queries can be classified into two categories: spatial queries and spatio-temporal queries. Spatial queries deal only with the spatial attributes of objects, including their appearances. The following are a few queries of this type:

• Querying whether or not an object/person is present in a video clip: (x IN v).

• Identifying the relative position of an object/person. For example, search for those video clips where Mr. X appears with Mr. Y, with X standing in front of Y. The predicate for such a query is:
  ∃f ∈ frame, ∃x, y ∈ person: x IN f ∧ y IN f ∧ x.Z.depth < y.Z.depth
  Here x and y are circular nodes in the VSDG, and x.Z.depth is the depth of the bounding volume Z of the circular node associated with x or y.
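A sketch of how such a spatial query might be evaluated against the VSDG structures from Section 2, under the assumption that circular nodes store the recognized object's name (all helper names are ours):

```python
def appears_in_front_of(clip: VSDG, name_x: str, name_y: str) -> bool:
    """True if some segment contains both objects with x closer to the camera than y."""
    for segment in clip.segments:
        xs = [o for o in segment.objects if o.object_id == name_x]
        ys = [o for o in segment.objects if o.object_id == name_y]
        for x in xs:
            for y in ys:
                # compare the depth of the first sampled bounding volumes (a simplification)
                if x.motion and y.motion and x.motion[0].depth < y.motion[0].depth:
                    return True
    return False
```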
`
The second type of queries, called spatio-temporal queries, also involve the temporal dimension and generally require abstraction of information over a large group of frames. Some typical queries which fall into this category are:

• Finding the duration of an object. For example, how long has person X appeared in a certain video clip? This query can be expressed as:
  X.duration ∧ X IN v

• Estimating the speed of an object. For example, how fast is object X walking in a certain clip?
  x IN v ∧ DP(t1, t2, x) / (t2 − t1)
  Here, t1 and t2 are two variables denoting frame numbers assigned by the system.
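The duration query, for instance, can be answered directly from the D attribute of the circular nodes; a minimal sketch with our helper name:

```python
def appearance_duration(clip: VSDG, name: str) -> int:
    """X.duration with X IN v: total number of frames in which object `name` appears in clip v."""
    return sum(node.duration
               for segment in clip.segments
               for node in segment.objects
               if node.object_id == name)
```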
`
Several complex queries can be constructed using the above basic query types. As mentioned earlier, all these queries generally require processing of various combinations of objects which are the leaf nodes of the object hierarchy (shown in Figure 5). Such processing in turn requires performing predicate logic operations on the CSOs, CTOs, and POs of the VSDG.

For example, the concept slam-dunk can be described by the following two concepts.

• A slam-dunk is a scored basket where a player's hand is on top of the basket rim and grabs the rim instantaneously after the basket is successful.

• A scored basket is categorized as a slam-dunk if the acceleration of the ball towards the basket is greater than g, the gravitational acceleration, i.e., it is not a free fall.

In order to construct the logic of this query, we can define a logic function called On_top as follows:

On_top(a, b) ≡ EC(a, b) ∧ (y_ca > y_cb)

where y_ca and y_cb are the y-coordinates of the centroids of a and b, respectively.
Using this function and the previously defined predicates, slam-dunk can be described as follows:

∃x ∈ basketball, ∃y ∈ basket_rim, ∃z ∈ basket_net, ∃h1, h2 ∈ hand:
m_t(On_top(h1, x) ∨ On_top(h2, x), EC(x, y), PP(x, z), On_top(h1, y) ∨ On_top(h2, y))
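On_top composes directly with the region predicates sketched above; a sketch under the assumption that the centroid's y-coordinate can be taken as the midpoint of the y-projection:

```python
def on_top(a: Box, b: Box) -> bool:
    """On_top(a, b): a externally connects with b and a's centroid lies above b's."""
    y_ca = (a[1][0] + a[1][1]) / 2   # midpoint of a's y-projection
    y_cb = (b[1][0] + b[1][1]) / 2
    return EC(a, b) and y_ca > y_cb
```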
Similarly, a concept like walking can be specified. Two possible ways are as follows.

(1) Assume that there is a fixed (non-moving) object in the scene (e.g., a house). Then a person is walking if the relative distance between the person and the house at two different instants is greater than a threshold. Accordingly, the following expression can be used to describe this event:

∃x ∈ person, ∃y ∈ house, ∃t1, t2:
(t1 < t2) ∧ (D(RC(t1, y, x), RC(t2, y, x)) > Threshold)

(2) Assume there is no reference object. In this case the expression is:

∃x ∈ person, ∃t1, t2:
(t1 < t2) ∧ (DP(t1, t2, x) > Threshold)

Similarly, the concept 'pass' can be described as follows:

∃x1, x2 ∈ player, ∃y ∈ basketball:
m_t(EC(x1, y), DC(x1, y) ∧ DC(x2, y), EC(x2, y))
Theoretically, any concept that requires expression of spatio-temporal interactions among objects can be specified by predicate logic expressions. We have provided only a limited number of examples, and even for those examples only a few of the possible ways of specifying them have been discussed. Occasionally, there are situations where some queries cannot be processed by the proposed models and we need to process the raw video data. Such queries may contain objects which are not already identified in the VSDG model.
3.3 Query by Temporal Video Example

An alternative way to express queries for video databases is what we call "Query by Temporal Video Example" (QTVE). In this method, queries can be entered by drawing sketches that exemplify the concept in the video clips to be searched. Possible sketches that a user can draw for "slam-dunk" are shown in Figure 7.

Figure 7: QTVE sketches for slam-dunk
In this method, the main objects of interest and their positions are roughly sketched by the user in a sequence of frames, and the temporal information is also specified by the time differences, Δt_i's. The number of frames may vary, and no time difference is required in the single-frame case. As with the video data, the sketches are first passed through a recognition process. The method used for this purpose should be different from that used for video data, because the visual characteristics of sketches are quite different from those of real video scenes. Once the objects and their positions are identified, a sub-VSDG can be formed in the same way the main VSDG for the data is built. The last step is a similarity matching between the two VSDGs to determine the video clips that contain the concept represented by the query VSDG. This matching process has to take two important issues into consideration:
First, the shapes and positions of the sketched objects are very approximate. Therefore, this information must be treated with a high degree of tolerance in the matching process. For example, a player approaching the rim from the right and a player approaching from the left are two pictorially different cases but conceptually identical. Second, the temporal information must also be used with a high tolerance. The user cannot be expected to specify time differences exactly; a range should rather be sufficient. In Figure 7, for example, a reasonable range for Δt_1 would be 0.2-1.0 sec.
In addition to sketches, real video frames sampled from the database can also be used for querying. The advantage of this is less reliance on the ability of the user to draw proper figures, but the flexibility is reduced in the sense that only predefined and stored concepts can be queried.
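The matching step could proceed roughly as follows; this is our sketch of a tolerant comparison between a query (sub-)VSDG and a database VSDG, reusing the structures and the distance predicate defined earlier, with the tolerance threshold and the scoring entirely assumed.

```python
def segments_match(query_seg: Segment, data_seg: Segment, position_tol: float = 0.25) -> bool:
    """A sketched segment matches a data segment if every sketched object has a counterpart
    whose approximate position agrees within the tolerance."""
    for q in query_seg.objects:
        candidates = [d for d in data_seg.objects if d.object_id == q.object_id]
        if not any(q.motion and d.motion and
                   distance(q.motion[0].centroid, d.motion[0].centroid) <= position_tol
                   for d in candidates):
            return False
    return True

def qtve_match(query: VSDG, data: VSDG) -> bool:
    """True if the query VSDG's segments occur, in order, somewhere in the data VSDG."""
    i = 0
    for seg in data.segments:
        if i < len(query.segments) and segments_match(query.segments[i], seg):
            i += 1
    return i == len(query.segments)
```

The Δt tolerances described above would be checked analogously, by comparing segment durations within a range rather than requiring exact agreement.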
`
4 Conclusion

We have proposed a graphical data model called the Video Semantic Directed Graph (VSDG) for unbiased abstraction and representation of video data. The model extracts spatial and temporal information about the objects in a video clip and represents it in the form of a directed graph. A user can use this graphical representation to construct his/her own view of the data. For this purpose, an object-oriented data model is described. Using the propositional logic described in the paper, a user can specify queries and hence retrieve the corresponding video clips without ever processing the raw video data. The proposed methodology employs computer vision and image processing (CVIP) techniques to automate the construction of the video database based on the VSDG model.
References

[1] T. D. C. Little, and A. Ghafoor, "Synchronization and Storage Models for Multimedia Objects," IEEE Journal on Selected Areas in Communications, Vol. 8, No. 3, April 1990, pp. 413-427.

[2] T. D. C. Little, and A. Ghafoor, "Interval-Based Conceptual Models for Time-Dependent Multimedia Data," IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 4, August 1993, pp. 551-563.

[3] C. J. Date, An Introduction to Database Systems, Vol. 1, 5th edition, Addison-Wesley, 1990.

[4] D. Swanberg, C.-F. Shu, and R. Jain, "Knowledge Guided Parsing in Video Databases," Proceedings of SPIE, San Jose, California, January 1993.

[5] A. Nagasaka, and Y. Tanaka, "Automatic Video Indexing and Full Video Search for Object Appearances," in 2nd Working Conference on Visual Database Systems, Budapest, Hungary, October 1991, IFIP WG 2.6, pp. 119-133.

[6] E. Oomoto, and K. Tanaka, "OVID: Design and Implementation of a Video-Object Database System," IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 4, August 1993, pp. 629-643.

[7] M. Ioka, and M. Kurokawa, "A Method for Retrieving Sequences of Images on the Basis of Motion Analysis," Proceedings of SPIE, Image Storage and Retrieval Systems, San Jose, 1992, pp. 35-46.

[8] A. Del Bimbo, E. Vicario, and D. Zingoni, "A Spatio-Temporal Logic for Image Sequence Coding and Retrieval," Proceedings of the IEEE VL'92 Workshop on Visual Languages, Seattle, WA, September 1992, pp. 228-230.

[9] D. A. Randell, and A. G. Cohn, "Exploiting Lattices in a Theory of Space and Time," Computers and Mathematics with Applications, Vol. 23, No. 6-9, 1992, pp. 459-476.

[10] A. Abella, and J. R. Kender, "Qualitatively Describing Objects Using Spatial Prepositions," in AAAI-93, Proceedings of the 11th National Conference on Artificial Intelligence, pp. 536-540.

[11] D. A. Randell, Z. Cui, and A. G. Cohn, "A Spatial Logic Based on Regions and Connection," in Proceedings of the 3rd International Conference on Principles of Knowledge Representation and Reasoning, Cambridge, MA, 1992, pp. 165-176.
`
`
