Multimedia Tools and Applications 6, 289–312 (1998)
© 1998 Kluwer Academic Publishers. Manufactured in The Netherlands.
A Language for Content-Based Video Retrieval

FOROUZAN GOLSHANI
Department of Computer Science, Arizona State University, Tempe, Arizona 85287-5406
golshani@asu.edu

NEVENKA DIMITROVA
Philips Research, 345 Scarborough Rd., Briarcliff Manor, NY 10510
nvd@philabs.research.philips.com
Abstract. We present an effective technique for the automatic extraction, representation, and classification of digital video, and a visual language for formulating queries that access the semantic information contained in digital video. We have devised an algorithm that extracts motion information from a video sequence. This algorithm provides a low-cost extension to the motion compensation component of the MPEG compression algorithm. In this paper, we present a visual language called VEVA for querying multimedia information in general, and video semantic information in particular. Unlike many other proposals that concentrate on browsing the data, VEVA offers a complete set of capabilities for specifying relationships between image components and formulating queries that search for objects, their motions, and their other associated characteristics. VEVA has proven to be very expressive in this context, mainly because many types of multimedia information are inherently visual in nature.

Keywords: visual languages, content-based retrieval, digital video
1. Introduction

It is generally believed that the human mind is visually oriented. For concepts that can be presented visually, people acquire information at a higher rate by discovering graphical relationships in complex pictures than by reading plain text. In keeping with this human characteristic, graphical/visual interfaces attempt to augment traditional human-computer interaction mechanisms with visual aids [25].
Human and computer vision refer to the ability to see an image and understand its details—generally known as image contents. This ability is often summarized as: to observe and to evaluate. Visual information is what humans can extract and understand from images and video. Making inferences is much easier in visual systems. For example, given two objects—one large and one small—a visual system can immediately recognize the disparity in size. Such an inference would be much more difficult to make using textual annotations.
Spatial and motion characteristics of objects derived from images and video sequences are inherently visual. This visual aspect compels us to find the visual paradigms best suited for content retrieval of images and video sequences.
In a previous paper, we presented a method for extracting object motion during video encoding by the MPEG method. Since a large part of the low-level motion analysis is already performed by the video encoder when encoding the blocks in the predicted and bidirectional frames, our algorithms perform very well in creating object trajectories based on these coarse-grained representations of the optical flow. Simply put, we extract macroblock trajectories, which are spatiotemporal representations of macroblock motion, and use them for object motion recovery. By describing the movements that we derive from the process of motion analysis, we introduce a dual hierarchy consisting of spatial and temporal parts for video sequence representation. This gives us the flexibility to examine arbitrary frames at various levels of abstraction, and to retrieve the associated temporal information (say, object trajectories) in addition to the spatial representation. See [12] for details.
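To make the macroblock-trajectory idea concrete, the following sketch (ours, not the implementation of [12]) links per-macroblock MPEG motion vectors into coarse spatiotemporal paths; the `vectors` structure, the 16-pixel grid, and the re-anchoring rule are assumptions for illustration.

```python
# Minimal sketch: linking decoded MPEG macroblock motion vectors into
# trajectories. `vectors` is a hypothetical list with one dict per frame,
# mapping a macroblock grid cell (r, c) to its displacement (dx, dy) in pixels.

MB = 16  # macroblock size in pixels (as in MPEG-1/2)

def macroblock_trajectories(vectors, start_cells):
    """Follow each starting macroblock through the frame sequence.

    Returns one trajectory per start cell: a list of (x, y) positions,
    i.e., a coarse-grained spatiotemporal sample of the optical flow.
    """
    trajectories = []
    for r, c in start_cells:
        x, y = c * MB, r * MB            # top-left pixel of the macroblock
        path = [(x, y)]
        for frame_vectors in vectors:    # one motion-vector dict per frame
            dx, dy = frame_vectors.get((r, c), (0, 0))
            x, y = x + dx, y + dy
            r, c = y // MB, x // MB      # re-anchor to the nearest grid cell
            path.append((x, y))
        trajectories.append(path)
    return trajectories
```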
Motion in a video segment may result from a number of different phenomena, and our work does not cover all of them; for example, we cannot separate camera motion from object motion. Basic camera movements include: vertical rotation (called tilting), vertical transverse movement (or booming), horizontal rotation (known as panning), horizontal transverse movement (called tracking), variations in focal length (or zooming), and horizontal lateral movement (commonly known as dollying). These six, along with no movement, i.e., fixed, constitute the seven basic camera operations. Obviously, camera operations are the cause of significant characteristics of the video data and should be modeled properly. Optical flow techniques for motion extraction work much better for detecting this kind of change. See Akutsu et al. [1] for more detail.
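As a rough illustration of why global flow analysis detects camera operations well, the sketch below classifies a dense flow field as pan, tilt, zoom, or fixed. The divergence heuristic and thresholds are our own assumptions for illustration, not the method of Akutsu et al. [1].

```python
import numpy as np

def classify_camera_motion(flow, eps=0.5):
    """Heuristic over a dense flow field `flow` of shape (H, W, 2).

    A near-uniform field suggests pan/tilt; a field radiating from the
    image center suggests zoom. Thresholds here are ad hoc.
    """
    h, w, _ = flow.shape
    mean = flow.reshape(-1, 2).mean(axis=0)       # global translation
    residual = flow - mean                        # remove translational part
    ys, xs = np.mgrid[0:h, 0:w]
    radial = np.stack([xs - w / 2, ys - h / 2], axis=-1).astype(float)
    radial /= np.linalg.norm(radial, axis=-1, keepdims=True) + 1e-9
    divergence = float((residual * radial).sum(axis=-1).mean())
    if abs(divergence) > eps:                     # flow radiates in or out
        return "zoom in" if divergence > 0 else "zoom out"
    if np.linalg.norm(mean) > eps:                # dominant uniform flow
        return "pan" if abs(mean[0]) >= abs(mean[1]) else "tilt"
    return "fixed"
```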
There are certain other types of motion that present a problem for our schemes. Our algorithms are primarily geared to recognizing motion in 2D, i.e., in the (x, y) plane. When an object is moving directly toward the camera, similarly to the zooming operation of the camera, we cannot distinguish and accurately formulate the motion. Again, to a certain degree, schemes based on optical flow analysis perform better in this regard.
In this paper, we outline the design of a multimedia database language which has well-defined semantics in both the icon-based and character-based paradigms. The reason for supporting both paradigms is to have the best of both worlds: the intuitive visual descriptions of visual languages and the scalability of character-based languages. This paper first surveys the visual metaphors for content-based retrieval of digital video in Section 2. The video information model is then presented in Section 3. In Section 4, we present the formal foundation of the language Visual Extension to VArqa (VEVA), which embodies the representation of the visual model in a common iconic and character-based language. The grammar of a language based on VEVA is given in Section 4.1. The execution model is described in Section 4.2. Section 4.3 illustrates the usage of the VEVA language through example queries and results. Some final observations are presented in the concluding section.

2. Visual metaphors for content retrieval

The basic concept of visual interaction is the visual metaphor. The quality of a visual metaphor, in terms of understandability, clarity, and effectiveness (i.e., having an intuitive connotation with the concept it represents), is very important for raising the level of communication and increasing the potential number of users. A visual metaphor may be characterized as some symbol that reminds the user of an object or concept. Such a symbol may be a part of the object or an exaggeration of one of its significant attributes. In the case of an iconic language, the visual metaphor is the icon. An icon is usually a simplified image of the function of the system or an entity, and is used for manipulation of the system. The user recognizes the objectives from the icon images and their spatial arrangements. In a flexible setting, the shape and other characteristics of the icons are tailored by the user to suit her/his own mental representation of the tasks to be performed. The specification of icons is a whole new field of research.
Visual languages are based on the direct manipulation of graphical objects. Several languages cater to the graphical description of a wide variety of applications and allow the specification of complex computing environments. We will discuss a few examples. Media Streams is a visual language that enables users to create multilayered, iconic annotations of video content [10]. The objects denoted by icons are organized into hierarchies. The icons are used to annotate video streams in what the author calls a "Media Time Line". While appealing in its simplicity and abundance of iconic expressions, Media Streams uses a fixed vocabulary for the video annotation process.
A system proposed by Little et al. supports content-based retrieval and playback [20]. A specific schema is composed of movie, scene, and actor relations with a fixed set of attributes. The system requires manual feature extraction. The features are then inserted into the schema. Querying may involve reference to the attributes of a movie, scene, or actor. Once a movie is selected, the user can browse from scene to scene beginning with the initial selection. Querying in this system is achieved by browsing the predetermined attributes of the movies. Browsing as a visual interaction metaphor in digital video is useful for applications which have a predefined set of attributes and where users are not expected to learn a new interaction language.
An algebraic approach to content-based access to video is presented in [30]. Video presentations are composed of video segments using a Video Algebraic Model. The algebra contains methods for combining video segments both temporally and spatially, as well as methods for navigation and querying. This model leads to a framework for efficient access and management of video collections. However, the search process is based on the attribute information of the algebraic video nodes, which are textually represented by human-readable, semi-structured, algebraic video files. This approach ties together attribute-based access and content browsing of the video nodes.
One premise of our work is that visual languages and browsing are extremely important for interactive capabilities in digital video applications. The language presented here, VEVA, has much in common with the above visual languages. It is an iconic language which has a formal mathematical model. Video sequences can be queried based on the motion as well as the spatial aspects of the objects. The iconic representations of all the concepts can be queried based on their temporal appearance, in a way similar to that in Media Streams. The language closely follows a model for motion recovery and representation of objects in digital video [12]. The advantage of VEVA is that it is based on the same model which is used for extracting the semantics of video sequences. As part of the language, a suite of operators can be used for automatic feature extraction and matching of objects.
3. Video information model and language

3.1. The model
In this section we present a formal foundation for our video information model. It is based on an algebraic framework [13, 31] which has been used in many other areas, including programming languages, software and hardware specification, and object-oriented modeling. Algebraic specifications are developed in the following manner. Given an alphabet consisting of several classes of symbols for types and their associated operators (functions), a schema is specified. (Formally, the schema corresponds closely to the notion of signature in the algebraic framework.) The schema has all the necessary syntactic information along with typing rules, i.e., the rules that determine what type of object(s) can be given to each operator and what type of object(s) it will return. In our system, the domain-dependent information, provided by the system developer, is combined with the application-independent constructs, provided by the system itself, in order to create the schema. In essence, the schema is a formal specification of all objects of interest, real or conceptual, and the relationships between them. Given a schema, the set of well-formed expressions is defined. These expressions are constructed by using the numerous powerful operators that are provided for each type. We will see that, despite its formalized underpinning, the language presented here is extremely simple. Strongly resembling conventional set theory, our language has a functional flavor, similar to Lucid [29] and others. In Section 3.2 we will expand the treatment of the video data type. This is based on our work on the specification of object recognition in images and motion recovery in video [11].
The basic constructs of the model are data types and functions that operate on the data types. The two main kinds of data types used in this discussion are:
• System-defined, fixed data types, called "deliverable types": string, integer, boolean, text, image, audio, and video. These are the application-independent constituents of the system and are present in any specification. The traditional data types integer, string, and boolean are known as printable objects. Analogously, we call audio and video deliverable (presentable) types, since they inherently contain a time component.
• User-defined data types, called "entity types", such as "PERSON" and "STUDENT", are those that represent objects or concepts in the real world. These generally have properties that are embodied in deliverable types.
Each data type has a number of operators associated with it. Operators that are associated with user-defined types are called user-defined functions. User-defined functions describe the domain-related relationships, cross references between entity types, and the attributes of objects. Cross references typically represent multivalued relationships. Function types have the general form:

  f : α₀ × ⋯ × αₙ₋₁ → αₙ

where every αᵢ, called a type expression, is inductively defined as:
— a data type
— α₁ ∪ α₂, α₁ × α₂, or P(α₁), where α₁ and α₂ are type expressions.

Note that the above definition allows functions to take as arguments object types, their unions, cartesian products, and powersets.
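The inductive definition translates directly into a recursive data structure. A minimal sketch (the class names are ours):

```python
from dataclasses import dataclass

@dataclass
class DataType:
    name: str          # a base type, e.g., "Video" or "Driver"

@dataclass
class Union:           # α₁ ∪ α₂
    left: object
    right: object

@dataclass
class Product:         # α₁ × α₂
    left: object
    right: object

@dataclass
class Powerset:        # P(α₁)
    of: object

# Example: the argument type of a function Driver × Car → Race.
args = Product(DataType("Driver"), DataType("Car"))
```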
In addition to the user-defined functions, we need a collection of operators that are independent of the application domain and, as such, operate on system-defined data types. Each data type, such as text, graphics, scanned images, audio, and video, has a rich selection of operators associated with it. For example, the operator appendPar performs the concatenation operation on text paragraphs for the type Text. All set-theoretical, boolean, and arithmetic operators are included. In addition, there are a number of variable binding operators. The main characteristic of these operators is that they cause a variable to range over the elements of its domain. An example is the set construction operator, which has the form { f(x) | P(x) }, where f(x) denotes the desired output objects, and P(x) denotes the retrieval predicate that must hold for those objects. While x ranges over its domain, whenever P(x) is satisfied, f(x) is added to the set. There are a number of other operators like this, including the logical quantifiers. We will see some examples when we introduce the video operators.
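The set construction operator behaves exactly like a set comprehension in a modern programming language: the bound variable ranges over its domain, P filters, and f collects. A toy illustration (the domain and the two functions are hypothetical):

```python
# { f(x) | P(x) } as a set comprehension over a toy domain.
domain = [1, 2, 3, 4, 5]

def f(x):          # specification part: what to deliver
    return x * x

def P(x):          # retrieval predicate: which x qualify
    return x % 2 == 1

result = {f(x) for x in domain if P(x)}   # {1, 9, 25}
```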
A list of the basic multimedia operators and their semantics is presented in [15]. These operators are categorized into: set operators, logical operators, operators for temporal synchronization and spatial composition, arithmetic operators, and media-specific operators for text, graphics, audio, image, and video manipulation. The basic data types and their associated operators are part of the schema (signature) of the multimedia information system. The syntax of the language is developed on the basis of this schema.
In the algebraic framework, the algebra associates with each type a set of objects that behave as mandated by the specification of that type. Thus, the set associated with the type Integer is, obviously, the set of integer numbers. Other data types denote objects that have the predefined properties. Objects of type Text are paragraphs. Data type Audio denotes one-dimensional signals, and can be thought of as a function of time. The type Image contains two-dimensional signals. Finally, Video is a three-dimensional signal F(x, y, i), represented as F_i(x, y), where i represents the frame counter, and x and y are pixel coordinates. When no confusion is expected, we may omit the references to the coordinates x and y or to the frame counter i. When needed, we will use superscripts to distinguish between different video streams, e.g., F¹_i(x, y) and F²_i(x, y). The following notation will be used:

— F_b(x, y) for the first frame of the video signal
— F_e(x, y) for the last frame of the video signal
— F_c(x, y) for the current frame of the video signal

The algebra allows partial functions. Whenever a function is not defined over a certain part of the domain, we use the symbol ⊥ to denote the undefined values.
In this paper we focus specifically on the Video data type. In the next sections we describe operators for the video data type in detail.
3.2. Physical video data type
A video data stream carries much more complex information than any other type of media. Operators on a video stream range from operators for the delivery of video streams and editing operators to operators that extract motion for video classification. In this section, we present the video operators for editing and delivery. We present the operators for motion-based content classification in Section 3.3.
3.2.1. Editing operators. The primitive operators for video editing are analogous to the list processing operators. For list processing, first head, tail, and append (car, cdr, … in Lisp) are defined, and then, based on these, more elaborate operators are introduced. A similar approach is followed for the type Video. The primitive video operators that are defined first are:

— ↓ for obtaining the first frame of a video sequence. Thus F(x, y)↓ = F_b(x, y)
— ↑ for obtaining the video sequence without its first frame.

Based on the above, we can define such operators as:

— ↓a returns the first portion of the video sequence, up to frame number a. We write F↓a to indicate the first a frames of F.
— ↑a returns the last portion of the video sequence, starting with frame number a. Thus F↑a denotes the last portion of F starting with frame number a.
— ⊕ appends one video sequence to the end of another (for the concatenation of two video sequences).
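Modeling a Video as a plain list of frames, the primitive and derived editing operators can be sketched as follows (the function names and 0-based indexing are our assumptions):

```python
def first(F):        # the ↓ operator: F↓ = F_b, the first frame
    return F[0]

def rest(F):         # the ↑ operator: the sequence without its first frame
    return F[1:]

def first_n(F, a):   # the ↓a operator: the first a frames of F
    return F[:a]

def from_n(F, a):    # the ↑a operator: the portion starting with frame a
    return F[a:]

def concat(F1, F2):  # the ⊕ operator: append F2 to the end of F1
    return F1 + F2
```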
Many editing tasks can now be introduced by means of more elaborate operators, including the following:

• Inserting a video stream into another video stream:

  v_insert : Video × Video × Integer → Video

  where

  v_insert(F¹, F², a) = F¹↓a ⊕ F² ⊕ F¹↑a   if a ≥ 0 and F¹ has ≥ a frames
                      = ⊥                   otherwise

• Extracting a video clip from a video stream:

  v_clip : Video × Integer × Integer → Video

  where

  v_clip(F, a₁, a₂) = F − F↓a₁ − F↑a₂   if a₁ ≤ a₂ and F has ≥ a₂ frames
                    = ⊥                 otherwise
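Continuing the sketch, the two elaborate operators compose the primitives; `None` stands in for the undefined value ⊥ in the partial cases:

```python
def v_insert(F1, F2, a):
    """Insert F2 into F1 after frame a: F1↓a ⊕ F2 ⊕ F1↑a."""
    if a >= 0 and len(F1) >= a:
        return concat(concat(first_n(F1, a), F2), from_n(F1, a))
    return None      # ⊥: undefined outside the domain

def v_clip(F, a1, a2):
    """Extract the clip of F between frames a1 and a2."""
    if a1 <= a2 and len(F) >= a2:
        return F[a1:a2]
    return None      # ⊥
```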
• Video cut extraction:

  cuts : Video → P(Integer)

  cuts(F) = { i | difference(F_{i−1}, F_i) > threshold }

  The operator cuts takes an input video stream and returns the set of frame numbers which correspond to a drastic scene change, i.e., a video cut. Determining the difference between the frames F_{i−1} and F_i for cut detection is not a straightforward task, because of camera operations and sophisticated video effects. Cut detection is important in the initial stages of video processing, since shot boundaries have to be identified in order to extract meaningful information within a shot [16, 22, 24].
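A minimal sketch of cuts with one common choice of difference measure, a grey-level histogram comparison; the metric and threshold are illustrative assumptions, and the cited techniques [16, 22, 24] are considerably more robust:

```python
import numpy as np

def difference(f1, f2, bins=64):
    """Normalized histogram difference between two grayscale frames."""
    h1, _ = np.histogram(f1, bins=bins, range=(0, 255))
    h2, _ = np.histogram(f2, bins=bins, range=(0, 255))
    return np.abs(h1 - h2).sum() / f1.size

def cuts(F, threshold=0.5):
    """cuts(F) = { i | difference(F_{i-1}, F_i) > threshold }."""
    return {i for i in range(1, len(F))
            if difference(F[i - 1], F[i]) > threshold}
```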
• Extracting a set of motion icons from a video stream:

  micons : Video → P(Image)

  where

  micons(F) = { F_i | i ∈ cuts(F) }

  This operator extracts the frames in a video sequence that correspond to video cut changes. A micon is a visual representation of the most representative scenes of a video stream.

• Extracting a still image from a video sequence at a given frame number:

  still : Video × Integer → Image
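micons and still then fall out directly from the cuts sketch above; since raw image arrays are not hashable, this sketch keys the motion icons by frame number:

```python
def micons(F):
    """micons(F) = { F_i | i ∈ cuts(F) }: one representative frame per cut."""
    return {i: F[i] for i in cuts(F)}

def still(F, i):
    """still : Video × Integer → Image."""
    return F[i]
```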
3.2.2. Video delivery operators. Delivery operators take a video stream and present it at the user's request during retrieval time. These operators do not change the contents of the input video stream.
Consider the video stream F_i(x, y). As before, we use F_b(x, y), F_e(x, y), and F_c(x, y) to denote the beginning frame, the ending frame, and the current frame of the video stream, respectively.
The two primary delivery (playback) operators, play and reverse, are defined as follows:

  play = F^e_{i=c} F_i(x, y)
  reverse = G^b_{i=c} F_i(x, y)

The video operators F and G are similar in nature to operators that force variables to range over their domain. Analogous to the integral sign in ∫_a^b f(x) dx, which causes x to range over the interval [a, b], F and G force the frame counter to range from c to the end or to the beginning of the video stream, respectively.
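Operationally, these range-forcing delivery operators behave like generators that sweep the frame counter from the current position; a sketch under the same list-of-frames model:

```python
def play(F, c=0):
    """Deliver frames from the current position c forward to the end."""
    for i in range(c, len(F)):
        yield F[i]

def reverse(F, c=None):
    """Deliver frames from the current position c backward to the beginning."""
    c = len(F) - 1 if c is None else c
    for i in range(c, -1, -1):
        yield F[i]
```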
3.2.3. Video attribute operators. Finally, a host of operators for querying the physical attributes of video are included. For example, given a video clip, v_length returns a number representing the length of the sequence:

  v_length : Video → Integer

Similar operators are introduced for querying other attributes of the physical video. The expected playback frame rate is given by the operator:

  frame_rate : Video → Integer

The following operator gives the physical storage size of the video:

  size : Video → Integer

Another format-related piece of video information is the resolution of the individual frames:

  resolution : Video → String

The following operator gives back the compression scheme used for storing the video data:

  compression : Video → String

Figure 1 contains a complete description of the operators that apply to the physical video type. Double lines indicate set-valued operators.

Figure 1. Physical video data type.
Figure 2. Conceptual video data type.
3.3. Spatiotemporal operators of the video data type

The purpose of object analysis and motion analysis is to extract the relevant properties of objects and their movements in order to represent the concepts emerging from the video sequences. In this section we bring together the object and motion aspects of video content analysis into a unified representation of the conceptual video data type.
Figure 2 illustrates the operators that apply to the conceptual video type.
Motion analysis starts with motion vector recovery, followed by the tracing of individual macroblock trajectories. Conceptually, this process is captured by the operator:

  extractedTrajectories : Video → Trajectory

The data type Trajectory consists of the representation of the motion paths of objects in a video sequence. Each trajectory can be thought of as an n-tuple of motion vectors. This trajectory representation is a basis for various other trajectory representations, such as curve representation and point representation. The diversity of trajectory representations makes the querying process more flexible. The matching operators used for motion retrieval depend on the method employed for trajectory representation. Examples include: an exact matching function that uses absolute frame coordinates; an exact matching function that uses relative coordinates; curve comparison based on the curve fitting approach used for the interpolated trajectory representation; approximate matching that uses the chain code representation; and qualitative matching that uses the differential chain code. The result in each case is a similarity factor between the input trajectory and a target trajectory in the set of object trajectories.
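As one example of the listed variants, here is a sketch of approximate matching over a chain code representation; quantizing to eight directions and tolerating a one-step mismatch are our assumptions:

```python
import math

def chain_code(traj, n_dirs=8):
    """Quantize a trajectory (a list of (x, y) points) into direction codes."""
    codes = []
    for (x1, y1), (x2, y2) in zip(traj, traj[1:]):
        angle = math.atan2(y2 - y1, x2 - x1) % (2 * math.pi)
        codes.append(round(angle / (2 * math.pi / n_dirs)) % n_dirs)
    return codes

def similarity(query, target, n_dirs=8):
    """Similarity factor in [0, 1]; 1.0 means identical direction sequences."""
    q, t = chain_code(query, n_dirs), chain_code(target, n_dirs)
    n = min(len(q), len(t))
    if n == 0:
        return 0.0
    agree = sum(1 for a, b in zip(q, t)
                if min((a - b) % n_dirs, (b - a) % n_dirs) <= 1)
    return agree / n
```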
The operators still and micons were described in Section 3.2. The operator features has the following signature:

  features : Image → featureMatrix

This operator takes an image and derives a set of features indexed into a feature matrix.
The operator leadsTo has the signature:

  leadsTo : featureMatrix → Object

It takes the input feature matrix and performs a classification of the input features into a predefined set of object categories.
The process of feature selection and feature mapping is discussed extensively in the computer vision literature [2, 27]. The actual process of feature selection depends on the specific domain implementation.
The operator identifiedObjects is abstracted as:

  identifiedObjects : Video → Object

where Object is the union of all user-defined data types in a particular schema. In addition, Object may contain other objects of interest. The operator identifiedObjects represents the composition of a series of other operators, such as still, features, and leadsTo.
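A sketch of identifiedObjects as the composition still → features → leadsTo, reusing the cuts and still sketches from Section 3.2; the histogram feature and the nearest-prototype classifier are placeholders for whatever the domain implementation supplies:

```python
import numpy as np

def features(image, bins=16):
    """features : Image → featureMatrix (here: a coarse intensity histogram)."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 255))
    return hist / max(hist.sum(), 1)

def leads_to(feature_matrix, prototypes):
    """leadsTo : featureMatrix → Object, via the nearest category prototype."""
    return min(prototypes,
               key=lambda label: np.abs(prototypes[label] - feature_matrix).sum())

def identified_objects(F, prototypes):
    """identifiedObjects : Video → Object(s), composed over the cut frames."""
    return {leads_to(features(still(F, i)), prototypes) for i in cuts(F)}
```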
Using object descriptions and their traversed trajectories, we infer activities:

  activeObjects : Object × Trajectory → Activity

The description of activities is derived from previously computed motion features. For example, if an object has been recognized as a car, then, by associating a straight line with it as its trajectory, it will have the activity driveStraight.
We should note that the derivation of activities is a process in which the object representation might be an empty set. The idea of deriving activities directly from a motion representation goes back more than 20 years [18]. However, in terms of implementation, there are still unanswered questions.
Once the activities in a video sequence are described, we can ask which video sequences contain a particular activity, using the operator:

  occurs : Activity → Video

Occurs is an operator that delivers a video sequence which contains a certain activity.
Figure 2 represents the conceptual video operators described above. The figure is an attempt to synthesize all of the information derivable from the video sequences. The operators for the identification of objects are simplified in order to give the complete picture.
In order to establish a correspondence between the paths traversed by objects and their spatial descriptions, we use a family of functions which take as input any object description and trajectory description. There is a variety of activities that can be inferred solely from the trajectory description, but those activities are generic ones. For example, we can infer that whatever object has a trajectory congruent to a straight line is performing a "move straight" activity. Given a certain context, say one where the only moving objects in the application domain are cars and we do not expect any other moving objects, we can infer a more specific activity such as "drive straight".

Figure 3. Car racing schema.
Consider a system for storing information on car racing, in which information about race cars and drivers is stored. The user-defined entity types and the associated operators for this application form a schema that is represented graphically in figure 3. The type expressions of some sample functions are:

  nameOf       : Driver → String
  pictureOf    : Driver → Image
  drives       : Driver → Car
  racesIn      : Driver × Car → Race
  coverage     : Race → Video
  yearOf       : Race → Integer
  winnerOf     : Race → String
  rnameOf      : Race → String
  announcement : Race → Audio
  modelOf      : Car → String
  makeOf       : Car → String
Given the car race schema (see figure 3), the following activities can be inferred from the set of video sequences:

• turnLeft is true if the trajectory orientation changes to the left with respect to the current direction.
• turnRight is true if the trajectory orientation changes to the right with respect to the current direction.
• driveStraight is true if the trajectory orientation stays the same.
• speedUp is true if the velocity increases.
• slowDown is true if the velocity decreases.
• collision is true if trajectory t₁ coincides with trajectory t₂ at a particular time instant:

  collision : Trajectory × Trajectory → Activity
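These activity predicates can be read directly off a trajectory. The sketch below uses per-step headings and step lengths; the thresholds, the sign convention for "left" (image coordinates put y downward), and the pointwise collision test are our assumptions. turnRight and slowDown are symmetric to their counterparts.

```python
import math

def headings(traj):
    """Per-step heading angles of a trajectory (a list of (x, y) points)."""
    return [math.atan2(y2 - y1, x2 - x1)
            for (x1, y1), (x2, y2) in zip(traj, traj[1:])]

def drive_straight(traj, eps=0.1):
    h = headings(traj)
    return len(h) > 1 and max(h) - min(h) < eps

def turn_left(traj, eps=0.1):
    h = headings(traj)           # decreasing angle = left turn in image coords
    return len(h) > 1 and h[-1] - h[0] < -eps

def speed_up(traj):
    d = [math.dist(p, q) for p, q in zip(traj, traj[1:])]  # step lengths
    return len(d) > 1 and d[-1] > d[0]

def collision(t1, t2, eps=1.0):
    """True if the two trajectories coincide at some common time instant."""
    return any(math.dist(p, q) < eps for p, q in zip(t1, t2))
```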
An example query on the above schema could be: show all the video sequences in which John's Ferrari is speeding up.

  { coverage(Race) | exists Driver exists Car:
      speedUp(O) occursIn coverage(Race)
      and racesIn(Driver, Car) is Race
      and nameOf(Driver) is "John"
      and makeOf(Car) is "Ferrari" }
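A hypothetical rendering of this query as a comprehension over a toy schema; the dictionary-based records, the trajectory_of helper, and the reuse of the speed_up predicate above are all illustrative assumptions:

```python
def ferrari_speedups(races, trajectory_of):
    """{ coverage(Race) | exists Driver exists Car: ... } over toy records."""
    return {
        race["coverage"]                                  # coverage(Race)
        for race in races
        for driver in race["drivers"]                     # exists Driver, Car
        if driver["name"] == "John"                       # nameOf(Driver) is "John"
        and driver["car"]["make"] == "Ferrari"            # makeOf(Car) is "Ferrari"
        and speed_up(trajectory_of(race, driver["car"]))  # speedUp occursIn coverage
    }
```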
Using the derived descriptions, many new types of queries that refer to the contents of video sequences can be specified. Examples include the following:

— Retrieve all the video sequences in which a red car is turning left.
— Show all the video sequences in which John's car is slowing down.

The VEVA query language is a visual one. It allows for the specification of spatial properties, as well as for exact and inexact specification of motion properties. In the next section, we outline the language and discuss its implementation.
4. Implementation of VEVA
A query language for multimedia information systems must provide a number of features beyond those of ordinary textual query languages. One such feature is the ability to deal with the spatial and temporal dimensions of multimedia objects. This capability ensures that, when presenting the retrieved results, the temporal precedence and spatial composition of objects are exactly those requested by the user.
Defined within the algebraic framework described in Section 3, VEVA is a query language that provides all the necessary constructs for the retrieval and management of multimedia information [15]. The basis for the language is a schema (algebraic signature) which contains entity types (both user-defined and application-independent types) and the associated operators. By using these operators, the user can visually specify a query for the desired objects in a simple way. VEVA has a formal grammar with which the set of acceptable expressions can be generated. In fact, VEVA's parent language, Varqa, yields a family of attractive graphical languages [14]. One example is VEENA [21], which provides graphical primitives for operators, functions, and sorts (sets) in the style of data-flow programming.
The main construct is the set construction operator, which has the form:

  { f(x) | P(x) }

As in most database languages, there is a specification part f(x), which stands for the objects of interest, and a filtering part P(x), which represents the conditions that must be met by the specified objects. The analogy with SQL is that the f(x) part corresponds to "SELECT ... FROM", and P(x) corresponds to the "WHERE" clause.
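To make the analogy concrete, here is a hypothetical query in all three notations (the Race attributes are invented for illustration):

```python
# VEVA:  { rnameOf(Race) | yearOf(Race) is 1997 }
# SQL:   SELECT rname FROM Race WHERE year = 1997
# Comprehension form, with f(x) = rnameOf(x) and P(x) = (yearOf(x) == 1997):

races = [{"rname": "Monaco Grand Prix", "year": 1997},
         {"rname": "Italian Grand Prix", "year": 1996}]

result = {race["rname"] for race in races if race["year"] == 1997}
# result == {"Monaco Grand Prix"}
```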
The visual language VEVA is implemented in Tcl [23], with extensions to handle image and video manipulation. The visual front end has the following functionalities: selecting a database, drawing visual queries by connecting "sorts" and "functions", and applying operators to symbols.
4.1. Visual grammar

Sentences in visual languages are assemblies of pictorial objects such as "ovals", "arrows", or "icons", with spatial relations such as "next to" or "contains" between them. Their underlying structure is a variant of directed graphs. Therefore, graph grammars are a natural means for defining the syntax of visual languages.
The grammar for the visual language VEVA is given using visual rules in the style of a picture description language, which was developed within the syntactic approach to pattern recognition [27]. The grammar rules contain nonterminal and terminal icons. The rules are given as graph rewriting rules, where the left-hand side is a nonterminal icon and the right-hand side is a graph containing nonterminal and terminal icons connected with links of certain kinds. Nonterminal icons are denoted by shaded iconic symbols, whereas terminal icons have