ACM Transactions on Information Systems

ACM
1515 Broadway
New York, NY 10036
Tel: (212) 869-7440

ACM European Service Center
Avenue Marcel Thiry 204
1200 Brussels, Belgium
Tel: 32 2 774 9602
Fax: 32 2 774 9690
Email: acm_europe@acm.org

Editor-in-Chief
Robert B. Allen / Bellcore / 2A-367 / 445 South St. / Morristown, NJ 07960-6438 USA / +1-201-829-4315 / tois@bellcore.com

Editor-in-Chief Designate
W. Bruce Croft / University of Massachusetts / Computer Science Department / LGRC 243, Box 34610 / Amherst, MA 01003-4610 USA / +1-413-545-0463 / croft@cs.umass.edu

Associate Editors
Alex Borgida / Rutgers University / Department of Computer Science / CORE Building Room 315 / Piscataway, NJ 08854 USA / +1-908-932-4744 / borgida@topaz.rutgers.edu
Mic Bowman / Transarc Corporation / The Gulf Tower / 707 Grant Street / Pittsburgh, PA 15219 USA / +1-412-338-6752 / mic@transarc.com
Shih-Fu Chang / Columbia University / Department of Electrical Engineering and Center for Telecommunications Research / New York, NY 10027 USA / +1-212-316-9068 / sfchang@ctr.columbia.edu
Prasun Dewan / Department of Computer Science / Room 150 / CB 3175, Sitterson Hall / University of North Carolina, Chapel Hill / Chapel Hill, NC 27599-3175 USA / +1-919-962-1823 / dewan@cs.unc.edu
Steven Feiner / Department of Computer Science / Columbia University / 500 W. 120 Street / New York, NY 10027 USA / +1-212-939-7083 / feiner@cs.columbia.edu
John Herring / Oracle Corporation / Spatial Information Systems / 3 Bethesda Metro Center / Suite 1400 / 20814 USA / +1-301-907-2723 / jherring@us.oracle.com
Michael N. Huhns / Department of Electrical and Computer Engineering / University of South Carolina / Columbia, SC 29208 USA / +1-803-777-5921 / huhns@ece.sc.edu
Simon Kaplan / Department of Computer Science / University of Queensland / Saint Lucia 4072 / Australia / +61-733-653-168 / s.kaplan@cs.uq.oz.au
Rob Kling / Department of Information and Computer Science / University of California at Irvine / Irvine, CA 92717 USA / +1-714-856-5955 / kling@ics.uci.edu
Ray Larson / University of California at Berkeley / School of Library and Information Studies / Berkeley, CA 94720 USA / +1-415-642-6046 / larson@sherlock.berkeley.edu
John J. Leggett / Department of Computer Science / Texas A&M University / College Station, TX 77843-3112 USA / +1-409-845-0298 / leggett@cs.tamu.edu
David D. Lewis / AT&T Bell Laboratories / 2C 408 / 600 Mountain Avenue / Murray Hill, NJ 07974 USA / +1-908-582-3976 / lewis@research.att.com
Judith Olson / CSMIL / University of Michigan / 701 Tappan Street / Ann Arbor, MI 48109-1234 USA / +1-313-747-4606 / jso@csmil.umich.edu
Paolo Paolini / Dipartimento di Elettronica / Politecnico di Milano / Piazza Leonardo da Vinci 32 / 20133 Milan, Italy / +39-2-23993520 / paolini@ipmel2.elet.polimi.it
Gerard Salton / Department of Computer Science / Cornell University / Ithaca, NY 14853 USA / +1-607-255-4117 / gs@cs.cornell.edu
Peter Schauble / Institute of Information Systems / Swiss Federal Institute of Technology (ETH) / ETH Zentrum, IFW / CH-8092 Zurich, Switzerland / +41-1-254-7222 / schauble@inf.ethz.ch
Alan F. Smeaton / School of Computer Applications / Dublin City University / Glasnevin / Dublin 9, Ireland / +353-1-7045262 / asmeaton@compapp.dcu.ie

Headquarters Quarterlies Staff
Director of Publications: Mark Mandelbaum
Associate Director of Publications: Nhora Cortes-Comerer
Managing Editor, ACM Quarterlies: Roma Simon
Associate Managing Editor, ACM Quarterlies: George Criscione

ACM Transactions on Information Systems (ISSN 1046-8188) is published 4 times a year in January, April, July, and October by the Association for Computing Machinery, Inc., 1515 Broadway, New York, NY 10036. Second-class postage paid at New York, NY 10001, and at additional mailing offices. Postmaster: Send address changes to Transactions on Information Systems, ACM, 1515 Broadway, New York, NY 10036.

Copyright © 1995 by the Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permission to republish from: Publications Dept., ACM, Inc. Fax +1 (212) 869-0481 or email <permissions@acm.org>.

For other copying of articles that carry a code at the bottom of the first or last page or screen display, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.

For subscription and submissions information, see inside back cover.

Motion Recovery for Video Content Classification

NEVENKA DIMITROVA and FOROUZAN GOLSHANI
Arizona State University, Tempe

Like other types of digital information, video sequences must be classified based on the semantics of their contents. A more-precise and more-complete extraction of semantic information will result in a more-effective classification. The most-discernible difference between still images and moving pictures stems from movements and variations. Thus, to go from the realm of still-image repositories to video databases, we must be able to deal with motion. Particularly, we need the ability to classify objects appearing in a video sequence based on their characteristics and features such as shape or color, as well as their movements. By describing the movements that we derive from the process of motion analysis, we introduce a dual hierarchy consisting of spatial and temporal parts for video sequence representation. This gives us the flexibility to examine arbitrary sequences of frames at various levels of abstraction and to retrieve the associated temporal information (say, object trajectories) in addition to the spatial representation. Our algorithm for motion detection uses the motion compensation component of the MPEG video-encoding scheme and then computes trajectories for objects of interest. The specification of a language for retrieval of video based on the spatial as well as motion characteristics is presented.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems; I.2.10 [Artificial Intelligence]: Vision and Scene Understanding-motion

General Terms: Algorithms, Design

Additional Key Words and Phrases: Content-based retrieval of video, motion recovery, MPEG compressed video analysis, video databases, video retrieval

This article is a revised version, with major extensions, of an earlier paper which was presented at the ACM Multimedia '94 Conference.
Authors' addresses: N. Dimitrova, Philips Laboratories, 345 Scarborough Road, Briarcliff Manor, NY 10562; email: nvd@philabs.philips.com; F. Golshani, Department of Computer Science and Engineering, Arizona State University, Tempe, AZ 85287-5406; email: golshani@asu.edu.
Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
© 1995 ACM 1046-8188/95/1000-0408 $03.50
ACM Transactions on Information Systems, Vol. 13, No. 4, October 1995, Pages 408-439.

1. INTRODUCTION

Applications such as video on demand, automated surveillance systems, video databases, industrial monitoring, video editing, road traffic monitoring, etc. involve storage and processing of video data. Many of these applications can benefit from retrieval of the video data based on their content. The problem is that, generally, any content retrieval model must have the capability of dealing with massive amounts of data. As such, classification is an essential step for ensuring the effectiveness of these systems.

Motion is an essential feature of video sequences. By analyzing the motion of objects we can extract information that is unique to video sequences. In human and computer vision research there are theories about extracting motion information independently of recognizing objects. This supports the idea of classifying sequences based on the motion information extracted from video sequences, regardless of the level of recognition of the objects. For example, using the motion information we can submit not only queries like "retrieve all the video sequences in which there is a moving pedestrian and a car" but also queries that involve the exact position and trajectories of the car and the pedestrian.

Previous work in dynamic computer vision can be classified into two major categories based on the type of information recovered from an image sequence: recognition through recovering structure from motion and recognition through motion directly. The first approach may be characterized as attempting to recover either low-level structures or high-level structures. The low-level structure category is primarily concerned with recovering the structure of rigid objects, whereas the high-level structure category is concerned primarily with recovering nonrigid objects from motion. Recovering objects from motion is divided into two subcategories: low-level motion recognition and high-level motion recognition. Low-level motion recognition is concerned with making the changes between consecutive video frames explicit (this is called optical flow [Horn and Schunck 1981]). High-level motion recognition is concerned with recovering coordinated sequences of events from the lower-level motion descriptions.

Compression is an inevitable process when dealing with large multimedia objects. Digital video is compressed by exploiting the inherent redundancies that are common in motion pictures. Compared to encoding of still images, video compression can result in huge reductions in size. In the compression of still images, we take advantage of spatial redundancies caused by the similarity of adjacent pixels. To reduce this type of redundancy, some form of transform-based coding (e.g., the Discrete Cosine Transform, known as DCT) is used. The objective is to transform the signal from one domain (in this case, spatial) to the frequency domain. DCT operates on 8 × 8 blocks of pixels and produces another 8 × 8 block in the frequency domain whose coefficients are subsequently quantized and coded. The important point is that most of the coefficients are near zero and after quantization will be rounded off to zero. Run-length coding, which is an algorithm for recording the number of consecutive symbols with the same value, can efficiently compress such an object. The next step is coding. By using variable-length codes (an example is Huffman tables), smaller code words are assigned to objects occurring more frequently, thus further minimizing the size.

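To make these steps concrete, the following sketch (ours, not from the article) quantizes an already-computed 8 × 8 block of DCT coefficients and run-length encodes the zig-zag-scanned result; the coefficient values and the flat quantization step are illustrative assumptions.

```python
import numpy as np

def zigzag_order(n=8):
    """(row, col) pairs of an n x n block in JPEG-style zig-zag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def run_length_encode(seq):
    """Encode a sequence as (number of preceding zeros, nonzero value) pairs."""
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, int(v)))
            run = 0
    pairs.append((run, 0))  # trailing zeros collapse into one end marker
    return pairs

# Illustrative DCT block: energy concentrated in the low-frequency corner.
coeffs = np.zeros((8, 8))
coeffs[0, 0], coeffs[0, 1], coeffs[1, 0], coeffs[1, 1] = 620.0, -34.0, 25.0, 3.0
quantized = np.round(coeffs / 16.0).astype(int)   # assumed flat quantization step

scanned = [quantized[r, c] for r, c in zigzag_order()]
print(run_length_encode(scanned))  # most entries quantize to zero, so the code is short
```
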
Our aim in the coding of video signals is to reduce the temporal redundancies. This is based on the fact that, within a sequence of related frames, except for the moving objects, the background remains unchanged. Thus, to reduce temporal redundancy, a process known as motion compensation is used. Motion compensation is based on both predictive and interpolative coding.

MPEG (Moving Pictures Expert Group) is the most general of the numerous techniques for video compression [Furht 1994; LeGall 1991; Mattison 1994]. In fact, the phrase "video in a rainbow" is used for MPEG, implying that by adjusting the parameters, one can get a close approximation of any other proposal for video encoding. Motion compensation in MPEG consists of predicting the position of each 16 × 16 block of pixels (called a macroblock) through a sequence of predicted and interpolated frames. Thus we work with three types of frames, namely those that are fully coded independently of others (called reference frames or I-frames), those that are constructed by prediction (called predicted frames or P-frames), and those that are constructed by bidirectional interpolation (known as B-frames). Encoding begins by selecting a frame pattern which dictates the frequency of I-frames and the intermixing of other frames. For example, the frame pattern IBBPBBI indicates (1) that every seventh frame is an I-frame, (2) that there is one predicted frame in the sequence, and (3) that there are two B-frames between each pair of reference and/or predicted frames. Figure 1 illustrates this pattern.

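As an illustration of how such a pattern determines frame types and anchor frames, here is a small sketch; the pattern string and helper functions are our own assumptions, not part of MPEG or of the article.

```python
def frame_types(pattern: str, num_frames: int):
    """Assign a frame type (I, P, or B) to each frame index in display order."""
    return [pattern[i % len(pattern)] for i in range(num_frames)]

def anchors(types):
    """For each frame, locate its anchor frames.

    B-frames are interpolated from the nearest preceding and following I/P
    frames; P-frames are predicted from the preceding anchor only; I-frames
    are coded independently and need no anchor.
    """
    rows = []
    for i, t in enumerate(types):
        prev_ref = next((j for j in range(i - 1, -1, -1) if types[j] in "IP"), None)
        next_ref = next((j for j in range(i + 1, len(types)) if types[j] in "IP"), None)
        if t == "I":
            rows.append((i, t, None, None))
        elif t == "P":
            rows.append((i, t, prev_ref, None))
        else:
            rows.append((i, t, prev_ref, next_ref))
    return rows

# "IBBPBB" repeated gives the IBBPBBI pattern from the text: every seventh
# frame (counting inclusively) is an I-frame.
for row in anchors(frame_types("IBBPBB", 13)):
    print(row)
```
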
Our approach to extracting object motion is based on the idea that, during video encoding by the MPEG method, a great deal of information is extracted from the motion vectors. Part of the low-level motion analysis is already performed by the video encoder. The encoder extracts the motion vectors for the encoding of the blocks in the predicted and bidirectional frames. A macroblock can be viewed as a coarse-grained representation of the optical flow. The difference is that the optical flow represents the displacement of individual pixels while the macroblock flow represents the displacement of macroblocks between two frames. At the next, intermediate level, we extract macroblock trajectories, which are spatiotemporal representations of macroblock motion. These macroblock trajectories are further used for object motion recovery. At the highest level, we associate event descriptions with object/motion representations.

Macroblock displacement in each individual frame is described by the motion vectors, which form a coarse optical-flow field. We assume that our tracing algorithm is fixed on a moving set of macroblocks and that the correspondence problem is elevated to the level of macroblocks instead of individual points. The advantage of this elevation is that even if we lose individual points (due to turning, occlusion, etc.) we are still able to trace the object through the displacement of a macroblock. In other words, the correspondence problem is much easier to solve and less ambiguous. Occlusion and tracing of objects which are continuously changing are the subject of our current investigations.

In Section 2 of this article we survey some of the research projects related to our work. In Section 3 we present the object motion analysis, starting from the low-level analysis through the high-level analysis. We discuss the importance of motion analysis and its relevance to our model, which is presented in Section 3.4. Section 4 introduces the basic OMV structures (object, motion,

In the dynamic computer vision literature there are general models for object motion estimation and representation, as well as domain-restricted models. A general architecture for the analysis of moving objects is proposed by Kubota et al. [1993]. The process of motion analysis is divided into three stages: moving-object candidate detection, object tracking, and final motion analysis. The experiments are conducted using human motion. Another approach to interpretation of the movements of articulated bodies in image sequences is presented by Rohr [1994]. The human body is represented by a three-dimensional model consisting of cylinders. This approach uses the modeling of the movement from medical motion studies. Koller et al. [1993] discuss an approach to tracking vehicles in road traffic scenes. The motion of the vehicle contour is described using an affine motion model with a translation and a change in scale. A vehicle contour is represented by closed cubic splines. We make use of the research results in all these domain-specific motion analysis projects. Our model combines the general area of motion analysis with individual frame (image) analysis.

In the case of video modeling, the video footage usually is first segmented into shots. Segmentation is an important step for detection of cut points, which can be used for further analysis. Each video shot can be represented by one or more key frames. Features such as color, shape, and texture could be extracted from the key frames. An approach for automatic video indexing and full video search is introduced by Nagasaka and Tanaka [1992]. This video-indexing method relies on automatic cut detection and selection of first frames within a shot for content representation. Otsuji and Tonomura [1993] propose a video cut detection method. Their projection detection filter is based on finding the biggest difference in consecutive-frame histogram differences over a period of time. A model-driven approach to digital video segmentation is proposed by Hampapur et al. [1994]. The paper deals with extracting features that correspond to cuts, spatial edits, and chromatic edits. The authors present an extensive formal treatment of shot boundary identification based on models of video edit effects. In our work, we rely on these methods for the initial stages of video processing, since we need to identify shot boundaries to be able to extract meaningful information within a shot.

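A minimal sketch of the simplest variant of such histogram-based cut detection; the threshold, bin count, and function names are our assumptions, not the cited authors' exact methods.

```python
import numpy as np

def gray_histogram(frame, bins=64):
    """Normalized gray-level histogram of a frame (H x W array of 0-255 values)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()

def detect_cuts(frames, threshold=0.4):
    """Flag frame indices where the consecutive-frame histogram difference
    (sum of absolute bin differences) exceeds a threshold."""
    cuts = []
    prev = gray_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i])
        if np.abs(cur - prev).sum() > threshold:
            cuts.append(i)
        prev = cur
    return cuts

# Illustrative use with synthetic frames: a dark shot followed by a bright shot.
dark = [np.full((120, 160), 30, dtype=np.uint8) for _ in range(5)]
bright = [np.full((120, 160), 220, dtype=np.uint8) for _ in range(5)]
print(detect_cuts(dark + bright))   # -> [5]
```
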
One representation scheme of segmented video footage uses key frames [Arman et al. 1994]. The video segments can also be processed for extraction of synthetic images, or layered representational images, to represent closely the meaning of the segments. A methodology for extracting a representative image, salient video stills, from a sequence of images is introduced by Teodosio and Bender [1993]. The method involves determining the optical flow between successive frames, applying affine transformations calculated from the flow-warping transforms, such as rotation, translation, etc., and applying a weighted median filter to the high-resolution image data, resulting in the final image. A similar method for synthesizing panoramic overviews from a sequence of frames is implemented by Teodosio and Mills [1993].

Swanberg et al. [1993] introduced a method for identifying desired objects, shots, and episodes prior to insertion in video databases. During the insertion process, the data are first analyzed with image-processing routines to identify the key features of the data. In this model, episodes are represented using finite automata. Only video clips with inherently well defined structure can be represented. The model exploits the spatial structure of the video data without analyzing object motion. Zhang et al. [1994] presented an evaluation and a study of knowledge-guided parsing algorithms. The method has been implemented for parsing of television news, since video content parsing is possible when one has an a priori model of a video's structure.

Another system, implemented by Little et al. [1993], supports content-based retrieval and playback. They define a specific schema composed of movie, scene, and actor relations with a fixed set of attributes. Their system requires manual feature extraction. It then fits these features into the schema. Querying involves the attributes of movie, scene, and actor. Once a movie is selected, a user can browse from scene to scene beginning with the initial selection. Weiss [1994] presented an algebraic approach to content-based access to video. Video presentations are composed of video segments using a video algebra. The algebra contains methods for temporally and spatially combining video segments, as well as methods for navigation and querying. Media Streams is a visual language that enables users to create multilayered iconic annotations of video content [Davis 1993]. The objects denoted by icons are organized into hierarchies. The icons are used to annotate the video streams in a Media Time Line. The Media Time Line is the core browser and viewer of Media Streams. It enables users to visualize video at multiple time scales simultaneously, in order to read and write multilayered, iconic annotations, and it provides one consistent interface for annotation, browsing, query, and editing of video and audio data.

The work presented here follows from a number of efforts listed above. Specifically, we use low- and intermediate-level motion analysis methods similar to those offered by Allmen [1991] and others. Our object recognition ideas have been influenced by the work of Jain and his students [Gupta et al. 1991a; 1991b], Grosky [Grosky and Mehrotra 1989], and the research in image databases. Several lines of research, such as those in Little et al. [1993], Swanberg et al. [1993], Zhang et al. [1994], and Weiss [1994], provided many useful ideas for the modeling aspects of our investigations. An early report of our work was presented in Dimitrova and Golshani [1994].

3. MOTION RECOVERY IN DIGITAL VIDEO

In this section we describe in detail each level of the motion analysis pipeline. At the low level of motion analysis we start with a domain of motion vectors. During intermediate-level motion analysis we extract motion trajectories that are made up of motion vectors. Each trajectory can be thought of as an n-tuple of motion vectors. This trajectory representation is a basis for various other trajectory representations. At the high level of motion analysis we associate an activity with a set of trajectories of an object using domain knowledge rules.

3.1 Low-Level Motion Extraction: Single Macroblock Tracing

In MPEG, to encode a macroblock in a predicted or a bidirectional frame, we first need to find the best-matching macroblock in the reference frames, then find the amount of x and y translation (i.e., the motion vector), and finally calculate the error component [Patel et al. 1993]. The motion vector is obtained by minimizing a cost function that measures the mismatch between a block and each predictor candidate. Each bidirectional and predicted frame is an abundant source of motion information. In fact, each of these frames might be considered a crude interpolation of the optical flow. Thus, the extraction of the motion vectors of a single macroblock through a sequence of frames is similar to low-level motion analysis.

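The cost-minimization step can be sketched as exhaustive block matching with a sum-of-absolute-differences (SAD) cost; the window size, block size, and function below are our illustrative assumptions rather than the encoder the article relies on.

```python
import numpy as np

def best_motion_vector(ref, cur, bx, by, block=16, search=8):
    """Exhaustive block matching: find the displacement (dx, dy), within a small
    search window, that minimizes the sum of absolute differences between the
    macroblock of the current frame at (bx, by) and a candidate block in the
    reference frame."""
    target = cur[by:by + block, bx:bx + block].astype(int)
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + block > ref.shape[1] or y + block > ref.shape[0]:
                continue
            cand = ref[y:y + block, x:x + block].astype(int)
            cost = np.abs(target - cand).sum()
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost

# Illustrative use: a bright square that moved down 4 pixels and right 2 pixels.
ref = np.zeros((64, 64), dtype=np.uint8)
ref[8:24, 8:24] = 255
cur = np.roll(ref, shift=(4, 2), axis=(0, 1))
print(best_motion_vector(ref, cur, bx=10, by=12))   # -> ((-2, -4), 0)
```
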
Tracing a macroblock can continue until the end of the video sequence if we do not impose a stopping criterion. We have a choice: stop after a certain number of frames, stop after the object (macroblock) has come to rest, stop if the block comes to a certain position in the frame, stop if the macroblock gets out of the scene, or stop if the macroblock is occluded.

The algorithm for tracing the motion of a single macroblock through one frame pattern of MPEG encoding is given in Figure 2. In Dimitrova [1995], we describe object motion tracing for video databases in more detail. The algorithm takes the forward and backward motion vectors that belong to a particular macroblock and computes the macroblock's trajectory. The algorithm computes the macroblock's position in a B-frame by averaging the positions obtained from (1) the previous block coordinates and forward motion vectors and (2) the next (predicted) block coordinates and the backward motion vector. The position of a macroblock in a P-frame is computed using only block coordinates and forward motion vectors. If during the tracing procedure the initial macroblock moves completely out of its position, then we have to extract motion vectors for the new macroblock position, which implies that we continue by tracing the macroblock whose position coincides with the (x, y) coordinates of the initial macroblock. In the rest of this article, we will use T to indicate the set of all possible motion vectors.

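A minimal sketch of these position-update rules, complementing the pseudocode in Figure 2 below; the data layout, the halving of motion vectors (Figure 2 works in half-pixel units), and the simplification of carrying only the most recent traced position are our assumptions.

```python
def trace_positions(start, frames):
    """Trace a macroblock position through one frame pattern.

    `start` is the (x, y) position of the block in the initial reference frame.
    `frames` is a list of dicts in display order, e.g.
        {"type": "P", "fwd": (fx, fy)}
        {"type": "B", "fwd": (fx, fy), "bwd": (bx, by), "next_ref": (px, py)}
    Motion vectors are assumed to be in half-pixel units, hence the / 2.
    """
    x, y = start
    trajectory = [(x, y)]
    for f in frames:
        if f["type"] == "P":
            # P-frame: displace the previous position by the forward vector only.
            # (For brevity this starts from the most recent traced position;
            # Figure 2 keeps the previous reference position separately.)
            x, y = x - f["fwd"][0] / 2, y - f["fwd"][1] / 2
        elif f["type"] == "B":
            # B-frame: average the forward prediction from the previous position
            # and the backward prediction from the next reference frame.
            fx, fy = x - f["fwd"][0] / 2, y - f["fwd"][1] / 2
            bx, by = (f["next_ref"][0] - f["bwd"][0] / 2,
                      f["next_ref"][1] - f["bwd"][1] / 2)
            x, y = (fx + bx) / 2, (fy + by) / 2
        # An I-frame would restart the trace from the best-matching block there.
        trajectory.append((x, y))
    return trajectory

print(trace_positions((64, 32), [
    {"type": "B", "fwd": (-4, 0), "bwd": (4, 0), "next_ref": (68, 32)},
    {"type": "P", "fwd": (-8, 0)},
]))   # -> [(64, 32), (66.0, 32.0), (70.0, 32.0)]
```
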
Given: frames F_i, i = 0, ..., n;
       motion vectors V = (fmx(i), fmy(i)), (bmx(i), bmy(i)), i = 1, ..., n;
       initial block coordinates bx, by
Initialize R = ∅
for i = 1, ..., n
    if F(i) ≠ I then
        if F(i) == P then
            if previousType == I
                cx = bx - fmx(i)/2;
                cy = by - fmy(i)/2;
                nextblockx = cx; nextblocky = cy;
            if previousType == P
                givenx = futurex;
                giveny = futurey;
                futurex = futurex - fmx(i)/2;
                futurey = futurey - fmy(i)/2;
        if F(i) == B then
            cx = ((givenx - fmx(i)/2) + (futurex - bmx(i)/2)) / 2;
            cy = ((giveny - fmy(i)/2) + (futurey - bmy(i)/2)) / 2;
            if block(bx,by) ∩ block(cx,cy) == ∅ then
                extract (mx(i), my(i)) for (cx,cy)
            R = R ∪ {(mx(i), my(i))}
            if F(i) is the last in a group of B frames before a P frame
                cx = futurex;
                cy = futurey;
                if block(bx,by) ∩ block(cx,cy) == ∅ then
                    extract (mx(i), my(i)) for (cx,cy)
                R = R ∪ {(mx(i), my(i))}
    if F(i) == I then
        (bx,by) ← bestMatch(bx,by) in I
    if stopping criteria == true, then
        return R;
endfor

Fig. 2. Algorithm for tracing the motion of a macroblock.

3.1.1 Trajectory Description. Various motion retrieval procedures have specific requirements for retrieving desired objects. These requirements depend on the characteristics of the retrieval, which may range from flexible to strict. The choice of trajectory representation may dictate the manner in which retrieval is conducted. Given a set of motion vectors for a macroblock, a number of mechanisms exist for trajectory representation. Below we present a sample list:

(1) Point Representation: A trajectory in this case is a set of points represented by the absolute or relative frame coordinates of the position of the object, say $\{(x_i, y_i) \mid i = 1, \ldots, n\}$, where $(x_i, y_i)$ is derived by projecting $(x, y, i)$ onto the image plane. $(x, y, i)$ denotes the position of an object, i.e., $(x, y)$, at time instant $i$.

(2) Curve Representation: A parametric B-spline curve $P(u)$ can be computed that passes through each of the trajectory points $(x_i, y_i)$ (see Farin [1990] for a detailed discussion). The first step involves generating a parameterization or knot sequence $u_1 \le u_2 \le \cdots \le u_n$. A commonly used approach employs cumulative chord lengths defined by the points $(x_i, y_i)$. The next step involves setting up and solving a tridiagonal linear system of equations whose unknowns are the control points $d_i$ of the B-splines $N_i(u)$. The linear system depends on the $x_i$, $y_i$, and $u_i$ values. This linear system can be efficiently solved in $O(n)$ time using standard techniques for tridiagonal matrices. The B-spline curve has the form
$P(u) = \sum_i d_i N_i(u)$,
and it satisfies the following:
(a) $P(u_i) = (x_i, y_i)$;
(b) $P(u)$ is a piecewise cubic polynomial, i.e., for $u_i \le u \le u_{i+1}$, $P(u)$ is a polynomial of degree less than or equal to three; and
(c) the first and the second derivatives of $P(u)$ are continuous.

(3) Chain Code Representation: We develop a piecewise linear approximation to the trajectory using a set of orientation primitives. Given a set of discrete trajectory orientation primitives, we use a zig-zag line representation of the trajectory to generate the code. Another way of viewing this approach is derived from a neighborhood matrix with each neighbor coded to correspond to the primitives in the figure [Schalkoff 1989].

(4) Differential Chain Code Representation: Each segment is coded relative to the next line segment using the direction (left or right) and the length. For example, we can have a code for: right shorter-1, right equal-2, right longer-3, left shorter-4, left equal-5, left longer-6 [Schalkoff 1989]. This scheme is useful for approximate matching of object trajectories. It is a rotation-, scaling-, and translation-invariant scheme.
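To illustrate representations (3) and (4), here is a small sketch of our own; the eight 45-degree orientation primitives and the specific codes are assumptions standing in for the primitives in the article's figure.

```python
import math

def chain_code(points):
    """Chain code of a trajectory: one orientation primitive (1-8, 45 degrees
    apart) per segment, with code 0 reserved for zero motion (a repeated point)."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            codes.append(0)                      # stationary over this frame
        else:
            angle = math.atan2(dy, dx) % (2 * math.pi)
            codes.append(1 + int(round(angle / (math.pi / 4))) % 8)
    return codes

def differential_chain_code(points):
    """Code each segment relative to the next by turn direction and relative
    length, in the spirit of the six-symbol scheme described above."""
    segs = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]
    codes = []
    for (dx0, dy0), (dx1, dy1) in zip(segs, segs[1:]):
        cross = dx0 * dy1 - dy0 * dx1            # sign gives the turn direction
        side = "left" if cross > 0 else "right"
        len0, len1 = math.hypot(dx0, dy0), math.hypot(dx1, dy1)
        rel = "equal" if math.isclose(len0, len1) else ("longer" if len1 > len0 else "shorter")
        codes.append((side, rel))
    return codes

pts = [(0, 0), (1, 0), (2, 1), (2, 3), (2, 3)]
print(chain_code(pts))                 # -> [1, 2, 3, 0]
print(differential_chain_code(pts))
```
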
Figure 3 illustrates these methods used for the representation of an arbitrary movement. Figure 3(a) is an exact coordinate representation; 3(b) is a B-spline curve representation. Figure 3(c) represents the chain-coding process, and 3(d) shows the differential chain code representation of the trajectory.

Note that in the coordinate representation and the B-spline and chain code representation schemes we have a way of representing zero motion, i.e., when the motion vector is a null vector. If the macroblock does not move over a certain number of frames, the point will be repeated. In the B-spline representation, the knot (i.e., the control point) will have a multiplicity greater than one. In the chain code representation, the zero motion is represented by the code "0." So, in all these representations the trajectory is not only a spatial representation of the object's motion (the path) but also a temporal characterization of the motion. By keeping track of the zero motion we are able to describe stationary objects as well.

The diversity of the trajectory representations makes the querying process more flexible. The actual method of representation does not have a significant impact on the querying process as long as modeling, representation, and querying are all done in the same fashion.

3.1.2 Trajectory-Matching Functions. Applications such as automated surveillance may require retrieval of either video sequences or objects contained in these sequences based on the object trajectories. For example, queries of the type "retrieve objects that have a motion trajectory whose point of origination is the main gallery door and that terminates at the Juan Miro picture on the opposite wall" may help in the identification of the person who damaged the picture.

Matching functions used for motion retrieval depend on the method employed for trajectory representation, as described below.

pendent movements, then such a nonrigid object is represented as a set of rigid objects with their respective trajectories. At the highest level of motion analysis, we associate "activities" with the object trajectory representations.

Rigid-object motion is represented by a single trajectory. The trajectory is one common representation of the trajectories of all the component macroblocks. Finding the most-representative trajectory is not a simple task. In the simplest case we can take the trajectory of the object centroid as the reference object trajectory. A more-complicated case occurs if we decide to create a common trajectory by processing all of the macroblock trajectories or by examining only a subset of all macroblock trajectories.

Mean averaging of all trajectories of the macroblocks of the object is an alternative to choosing the object centroid's trajectory. The averaging of the trajectories in the exact form is pointwise averaging of the trajectories at each frame.

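A minimal sketch of this pointwise averaging; the array layout is an assumption, since the article does not prescribe one.

```python
import numpy as np

def average_trajectory(trajectories):
    """Pointwise (per-frame) mean of several macroblock trajectories.

    `trajectories` has shape (num_macroblocks, num_frames, 2) holding (x, y)
    positions; the result is one representative trajectory of shape
    (num_frames, 2).  Using the centroid macroblock's trajectory instead
    would simply select a single row.
    """
    return np.asarray(trajectories, dtype=float).mean(axis=0)

# Three macroblocks of the same rigid object, drifting right by 2 pixels/frame.
mbs = [[(0, 0), (2, 0), (4, 0)],
       [(16, 0), (18, 0), (20, 0)],
       [(0, 16), (2, 16), (4, 16)]]
print(average_trajectory(mbs))   # frame-by-frame mean: x goes 5.33 -> 7.33 -> 9.33
```
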
The following two assumptions make the object motion recovery feasible:

(1) Integrity of Objects: We assume objects are rigid or consist of rigid parts connected to each other. We do not consider situations in which objects disintegrate. This assumption is important because we only use object trajectory representation.

(2) Motion Continuity: Each macroblock under consideration has continuous motion. This assumption is important for the trajectory representation, since every trajectory segment represents a continuation of the previous trajectory segment.

Averaging trajectories is used for determining a representation of a nonrigid body motion. For nonrigid objects, we must determine the number of trajec
