UNITED STATES DISTRICT COURT
FOR THE DISTRICT OF MASSACHUSETTS
`
CANON, INC., CANON USA, INC.,
AND AXIS COMM. AB,
Plaintiffs,

v.

AVIGILON FORTRESS
CORPORATION,
Defendant.

Civil Action No. 1:19-MC-91401-NMG-JCB
`
`DECLARATION OF KATHERINE ZIMMERMAN
`
I, Katherine Zimmerman, state and declare as follows:
`
1. I am a Scholarly Communications and Licensing Librarian at the Massachusetts Institute of Technology (“MIT”) Libraries, 105 Broadway, Building NE36, Suite 6101, Cambridge, Massachusetts 02142.
`
2. I am over 18 years of age and am competent to make this Declaration. I make this Declaration based on my own personal knowledge, based on my knowledge and review of the business records and practices of the MIT Libraries, based on conversations with other library staff, and based on the notes and records of Marilyn McSweeney, who prepared Declarations until her retirement in 2016.
`
3. I have been employed at MIT since 2016.
`
4. Through the actions described in paragraph 2, I have become knowledgeable about the MIT Libraries’ normal business practices with respect to how MIT receives, catalogs, indexes, shelves, and makes available to the public journals and publications.
`
`
5. Attached as Exhibit A to this Declaration is a true and accurate copy of the catalog record from the MIT Libraries’ online catalog system (known as the Barton Catalog) for the publication series entitled ACM Transactions on Information Systems: a publication of the Association for Computing Machinery, vols. 7 (1989) - 26 (2008) (“ACM Transactions on Information Systems”). This is a record that MIT maintains in the ordinary course of its regular activities.
`
6. Attached as Exhibit B to this Declaration is a true and accurate copy of the issue cover, first page, back cover, and full article text, for the article titled “Motion Recovery for Video Content Classification” by Nevenka Dimitrova and Forouzan Golshani, published on pages 408-439 of Volume 13, No. 4 of the ACM Transactions on Information Systems, which was published in October 1995 (the “October 1995 Issue”). The ACM Transactions on Information Systems is available in print format in vols. 7 (1989) - 26 (2008) from the MIT Libraries, and is a record that MIT maintains in the ordinary course of its regular activities.
`
7. The October 1995 Issue has an MIT Libraries date stamp of “NOV 13 1995,” indicating that the MIT Libraries received the issue on November 13, 1995.
`
8. After a serials issue receives a date stamp, it undergoes a process of being labeled and moved to a shelf of the MIT Libraries. Based on current MIT Libraries practice, this process typically takes one to two weeks. According to the MIT Libraries’ current normal business practice, the October 1995 Issue would have been displayed on a shelf of the MIT Libraries no later than November 27, 1995.
`
`
`
`
9. Once a publication is on a shelf of the MIT Libraries it is available to be viewed within the MIT Libraries by any member of the public or requested via Interlibrary Loan.
`
10. To the best of my knowledge and that of current MIT employees, unless stated otherwise, the above statements are descriptions of normal business practices at the MIT Libraries from at least the beginning of 1995 and through the present.
`
I declare under penalty of perjury that the foregoing is true and correct. Executed on October 23, 2019, at Cambridge, Massachusetts.
`
Katherine Zimmerman
`
`
`
`
`
`
`
`
`
EXHIBIT A
`
`
`
`
`MIT Libraries' catalog - Barton - Full Catalog - Full Record
`
`http://library.mit.edu/F/9RDNRF61C37HYS4TADPBBXPC1SYSD1L...
`
`Full Record
`
`Permalink for this record: http://library.mit.edu/item/000395517
`
`
Title: ACM transactions on information systems : a publication of the Association for Computing Machinery.
Continues: ACM transactions on office information systems
Online Access: v.7:no.1 (1989:Jan.)-
Library Holdings: Library Storage Annex - Off Campus Collection | HF.A184 | v.7(1989)-v.26(2008)
Shelf Access: Find it in the library/Request item
Shelf Location: Library Storage Annex - Off Campus Collection | HF.A184
Published: New York, NY : The Association, c1989-
Description: v. : ill. ; 26 cm.
Numbering: Vol. 7, no. 1 (Jan. 1989)-
Series: ACM series on computing methodologies.
Current Frequency: Quarterly
Format: Serial (e.g. journals, book series, etc.)
Note: Title from cover.
Other Format: Also available via the World Wide Web.
Subject: Electronic data processing -- Periodicals.
Information storage and retrieval systems -- Periodicals.
Information retrieval -- Periodicals.
Other Author: Association for Computing Machinery.
Title Abbreviation: ACM trans. inf. sys.
Other Title: Transactions on information systems.
Association for Computing Machinery transactions on information systems.
ISSN: 1046-8188
CODEN: ATISET
Local System Number: 000395517
`
`
`
`
`
`
`
`
`
`
EXHIBIT B
`
`
`
`
`
`
`
acm Series on Computing Methodologies

Special Issue on Video Information Retrieval

Guest Editors' Introduction
by Scott Stevens and Thomas Little

Embedded Video in Hypermedia Documents: Supporting Integration and Adaptive Control
by Dick C. A. Bulterman

A Video Retrieval and Sequencing System
by Tat-Seng Chua and Li-Qun Ruan

Motion Recovery for Video Content Classification
by Nevenka Dimitrova and Forouzan Golshani

XMovie: Architecture and Implementation of a Distributed Movie System
by Ralf Keller, Wolfgang Effelsberg, and Bernd Lamparter
`
`
`
`
`
`
acm Transactions on Information Systems
1515 Broadway
New York, NY 10036
Tel: (212) 869-7440

ACM European Service Center
Avenue Marcel Thiry 204
Brussels, Belgium
Fax: 32 2 774 9590
Email: acm_europe@acm.org
`
Editor-in-Chief: Robert E. Allen, Bellcore, Morristown, NJ

Editor-in-Chief Designate: W. Bruce Croft, University of Massachusetts, Amherst, MA

Associate Editors:
Alex Borgida, Rutgers University
Mic Bowman, Transarc Corporation
Shih-Fu Chang, Columbia University
Prasun Dewan, University of North Carolina, Chapel Hill
Steven Feiner, Columbia University
John Herring, Oracle Corporation
Michael N. Huhns, University of South Carolina
Simon Kaplan, University of Queensland
Rob Kling, University of California at Irvine
Ray Larson, University of California at Berkeley
John J. Leggett, Texas A&M University
David D. Lewis, AT&T Bell Laboratories
Judith Olson, University of Michigan
Paolo Paolini, Politecnico di Milano
Gerard Salton, Cornell University
Peter Schauble, Swiss Federal Institute of Technology (ETH), Zurich
Alan F. Smeaton, Dublin City University

Headquarters Quarterlies Staff:
Mark Mandelbaum, Director of Publications
Nhora Cortes-Comerer, Associate Director of Publications
Roma Simon, Managing Editor, ACM Quarterlies
George Criscione, Associate Managing Editor, ACM Quarterlies
`
`
`
`
`
`
ACM Transactions on Information Systems (ISSN 1046-8188) is published 4 times a year in January, April, July, and October by the Association for Computing Machinery, Inc., 1515 Broadway, New York, NY 10036. Second-class postage paid at New York, NY 10001, and at additional mailing offices. Postmaster: Send address changes to Transactions on Information Systems, ACM, 1515 Broadway, New York, NY 10036.

Copyright © 1995 by the Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permission to republish from: Publications Dept., ACM, Inc. Fax +1 (212) 869-0481 or email <permissions@acm.org>.

For articles that carry a code at the bottom of the first or last page or screen display, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.

For subscription and submissions information, see inside back cover.
`
`
`
`
`
`
`
`
Motion Recovery for Video Content Classification

NEVENKA DIMITROVA and FOROUZAN GOLSHANI
Arizona State University, Tempe
`
Like other types of digital information, video sequences must be classified based on the semantics of their contents. A more-precise and complete extraction of semantic information will result in a more-effective classification. The most-discernible difference between still images and moving pictures stems from movements and variations. Thus, to go from the realm of still-image repositories to video databases, we must be able to deal with motion. Particularly, we need the ability to classify objects appearing in a video sequence based on their characteristics and features such as shape or color, as well as their movements, by describing the movements that we derive from the process of motion analysis. We introduce a dual hierarchy consisting of spatial and temporal parts for video sequence representation. This gives us the flexibility to examine arbitrary sequences of frames at various levels of abstraction and to retrieve the associated temporal information (e.g., object trajectories) in addition to the spatial representation. Our algorithm for motion detection uses the motion compensation component of the MPEG video-encoding scheme and then computes trajectories for objects of interest. The specification of a language for retrieval of video based on the spatial as well as motion characteristics is presented.
`
Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems; I.2.10 [Artificial Intelligence]: Vision and Scene Understanding—motion

General Terms: Algorithms, Design

Additional Key Words and Phrases: Content-based retrieval of video, motion recovery, MPEG compressed video analysis, video databases, video retrieval
`
`
`1 INTRODUCTION
`
Applications such as video on demand, automated surveillance systems, video databases, industrial monitoring, video editing, road traffic monitoring, etc. involve storage and processing of video data. Many of these applications can benefit from retrieval of the video data based on their content. The problem is that, generally, any content retrieval model must have the capability of dealing with massive amounts of data. As such, classification is an essential step for ensuring the effectiveness of these systems.
`
`
This article is a revised version with major extensions of an earlier paper which was presented at the ACM Multimedia '94 Conference.
Authors' addresses: N. Dimitrova, Philips Laboratories, 345 Scarborough Road, Briarcliff Manor, NY 10562; email: nvd@philabs.philips.com; F. Golshani, Department of Computer Science and Engineering, Arizona State University, Tempe, AZ 85287-5406; email: golshani@asu.edu.
Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
© 1995 ACM 1046-8188/95/1000-0408 $03.50

ACM Transactions on Information Systems, Vol. 13, No. 4, October 1995, Pages 408-439.
`
`
Motion is an essential feature of video sequences. By analyzing motion of objects we can extract information that is unique to the video sequences. In human and computer vision research there are theories about extracting motion information independently of recognizing objects. This gives us support for the idea of classifying sequences based on the motion information extracted from video sequences regardless of the level of recognition of the objects. For example, using the motion information we can not only submit queries like "retrieve all the video sequences in which there is a moving pedestrian and a car" but also queries that involve the exact position and trajectories of the car and the pedestrian.
Previous work in dynamic computer vision can be classified into two major categories based on the type of information recovered from an image sequence: recognition through recovering structure from motion and recognition through motion directly. The first approach may be characterized as attempting to recover either low-level structures or high-level structures. The low-level structure category is primarily concerned with recovering the structure of rigid objects, whereas the high-level structure category is concerned primarily with recovering nonrigid objects from motion. Recovering objects from motion is divided into two subcategories: low-level motion recognition and high-level motion recognition. Low-level motion recognition is concerned with making the changes between consecutive video frames explicit (this is called optical flow [Horn and Schunck 1981]). High-level motion recognition is concerned with recovering coordinated sequences of events from the lower-level motion descriptions.
Compression is an inevitable process when dealing with large multimedia objects. Digital video is compressed by exploiting the inherent redundancies that are common in motion pictures. Compared to encoding of still images, video compression can result in huge reductions in size. In the compression of still images, we take advantage of spatial redundancies caused by the similarity of adjacent pixels. To reduce this type of redundancy, some form of transform-based coding (e.g., Discrete Cosine Transform, known as DCT) is used. The objective is to transform the signal from one domain (in this case, spatial) to the frequency domain. DCT operates on 8 x 8 blocks of pixels, and produces another block of 8 x 8 in the frequency domain whose coefficients are subsequently quantized and coded. The important point is that most of the coefficients are near zero and after quantization will be rounded off to zero. Run-length coding, which is an algorithm for recording the number of consecutive symbols with the same value, can efficiently compress such an object. The next step is coding. By using variable-length codes (an example is Huffman tables), smaller code words are assigned to objects occurring more frequently, thus further minimizing the size.
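As an illustration of the quantize-then-run-length step just described, here is a small Python sketch (ours, not the article's; the uniform quantization step and the toy coefficient block stand in for the real per-frequency tables and zigzag scan):

    # Sketch: quantization followed by run-length coding of an 8x8
    # DCT coefficient block. Most high-frequency coefficients round
    # to zero, so the long zero run compresses to a single pair.
    def quantize(block, step=16):
        # Uniform step for simplicity; real codecs use per-frequency steps.
        return [[round(c / step) for c in row] for row in block]

    def run_length(values):
        # Encode a flat list as (value, run-count) pairs.
        pairs = []
        for v in values:
            if pairs and pairs[-1][0] == v:
                pairs[-1] = (v, pairs[-1][1] + 1)
            else:
                pairs.append((v, 1))
        return pairs

    # Energy concentrated in the DC coefficient, as in a smooth block:
    coeffs = [[400 if (i, j) == (0, 0) else 3 for j in range(8)] for i in range(8)]
    flat = [c for row in quantize(coeffs) for c in row]
    print(run_length(flat))  # [(25, 1), (0, 63)]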
Our aim in the coding of video signals is to reduce the temporal redundancies. This is based on the fact that, within a sequence of video frames, except for the moving objects, the background remains unchanged. To reduce temporal redundancy, a process known as motion compensation is
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
used. Motion compensation is based on both predictive and interpolative coding.
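To see why motion-compensated prediction makes the temporal residual cheap to code, consider this toy sketch (ours; one-dimensional "frames" and a fixed 4-sample block, not MPEG's actual structures):

    # Sketch: motion-compensated prediction of one block. The residual
    # (current block minus displaced reference block) is near zero when
    # the motion vector is right, so it codes far more cheaply than the
    # raw block.
    def predict_block(reference, position, motion):
        # Block displaced by the motion vector in the reference "frame".
        start = position + motion
        return reference[start:start + 4]

    def residual(current_block, predicted_block):
        return [c - p for c, p in zip(current_block, predicted_block)]

    ref_frame = [10, 10, 10, 80, 81, 79, 80, 10, 10, 10]
    cur_block = [80, 81, 79, 80]  # the same object, one sample earlier
    print(residual(cur_block, predict_block(ref_frame, 2, 1)))  # [0, 0, 0, 0]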
MPEG (Moving Pictures Expert Group) is the most general of the numerous techniques for video compression [Furht 1994; LeGall 1991; Mattison 1994]. In fact, the phrase "video in a rainbow" is used for MPEG, implying that by adjusting the parameters, one can get a close approximation of any other proposal for video encoding. Motion compensation in MPEG consists of predicting the position of each 16 x 16 block of pixels (called a macroblock) through a sequence of predicted and interpolated frames. Thus we work with three types of frames—namely, those that are fully coded independently of others (called reference frames or I-frames), those that are constructed by prediction (called predicted frames or P-frames), and those that are constructed by bidirectional interpolation (known as B-frames). It begins by selecting a frame pattern which dictates the frequency of I-frames and the intermixing of other frames. For example, the frame pattern IBBPBBI indicates (1) that every seventh frame is an I-frame, (2) that there is one predicted frame in the sequence, and (3) that there are two B-frames between each pair of reference and/or predicted frames. Figure 1 illustrates this pattern.
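The frame-pattern bookkeeping can be made concrete with a few lines of Python (our sketch; real encoders parameterize this as a group-of-pictures structure):

    # Sketch: expand a repeating MPEG frame pattern into per-frame labels.
    # Repeating "IBBPBB" puts an I-frame at positions 0, 6, 12, ... which
    # is the paper's "every seventh frame" counted inclusively.
    def frame_types(pattern, num_frames):
        return [pattern[i % len(pattern)] for i in range(num_frames)]

    print("".join(frame_types("IBBPBB", 14)))  # IBBPBBIBBPBBIB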
`
Our approach to extracting object motion is based on the idea that during video encoding by the MPEG method, a great deal of information is extracted from the motion vectors. Part of the low-level motion analysis is already performed by the video encoder. The encoder extracts the motion vectors for the encoding of the blocks in the predicted and bidirectional frames. A macroblock can be viewed as a coarse-grained representation of the optical flow. The difference is that the optical flow represents the displacement of individual pixels while the macroblock flow represents the displacement of macroblocks between two frames. At the next, intermediate level, we extract macroblock trajectories which are spatiotemporal representations of macroblock motion. These macroblock trajectories are further used for object motion recovery. At the highest level, we associate the event descriptions to object/motion representations.
Macroblock displacement in each individual frame is described by the motion vectors which form a coarse optical-flow field. We assume that our tracing algorithm is fixed on a moving set of macroblocks and that the correspondence problem is elevated to the level of macroblocks instead of individual points. The advantage of this elevation is that even if we lose individual points (due to turning, occlusion, etc.) we are still able to trace the object through the displacement of a macroblock. In other words, the correspondence problem is much easier to solve and less ambiguous. Occlusion and tracing of objects which are continuously changing are the subject of our current investigations.
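A minimal sketch of the macroblock tracing described here (our illustration; the (dx, dy) vector format and starting coordinates are assumptions, not the paper's data structures):

    # Sketch: trace one macroblock through a frame sequence by chaining
    # its per-frame motion vectors into a spatiotemporal trajectory.
    def trace_macroblock(start_xy, motion_vectors):
        # motion_vectors: per-frame (dx, dy) displacements for the block.
        x, y = start_xy
        trajectory = [(x, y)]
        for dx, dy in motion_vectors:
            x, y = x + dx, y + dy
            trajectory.append((x, y))
        return trajectory

    # A block starting at (64, 32), drifting right and slightly down:
    print(trace_macroblock((64, 32), [(4, 1), (5, 0), (4, 2)]))
    # [(64, 32), (68, 33), (73, 33), (77, 35)]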
In Section 2 of this article we survey some of the research projects related to our work. In Section 3 we present the object motion analysis starting from the low-level analysis through the high-level analysis. We discuss the importance of motion analysis and its relevance to our model which is presented in Section 3.4. Section 4 introduces the basic OMV structures (object, motion,
`
`
`
`
[Figure: I-, B-, and P-frames connected by arrows labeled "Forward prediction" and "Bidirectional prediction"]

Fig. 1. Forward and bidirectional prediction in MPEG.
`
video-sequence), as the basis for the video information model. The basic retrieval operators, the OMV-language specification, and some examples are given. Empirical results are outlined in Section 5, and Section 6 presents some concluding remarks.
`
2. RELATED WORK

The research presented in this article builds on the existing results in two areas: dynamic computer vision and digital video modeling.

A current trend in computational vision is influenced by the idea that motion analysis does not depend on complex-object descriptions. Our work follows this trend and is based on the recent publications that are in agreement with this idea in computational vision. The idea of object/event recognition regardless of the existence of object representations can be traced back to the early 70's when Johansson [1976] introduced his experiments with moving-light displays. The idea was to attach lights to the joints of a human subject dressed in dark-colored clothing and observe the motion of lights against a dark background. The audience not only could recognize the object (human being) but could also describe the motion and the events taking place. Goddard [1992] investigated the high-level representations and computational processes required for the recognition of human motion based on moving-light displays. The idea is that recognition of any motion involves indexing into stored models of the movement. These stored models, called scenarios, are represented based on coordinated sequences of discrete motion events. The structures and the algorithms are articulated in the language of structured connectionist models. Allmen [1991] introduced a computational framework for intermediate-level and high-level motion analysis based on spatiotemporal surface flow and spatiotemporal flow curves. Spatiotemporal surfaces are projections of contours over time. Thus, these surfaces are representations of object motion.
`
`
`
`
`
`
In the dynamic computer vision literature there are general models for object motion estimation and representation, as well as domain-restricted models. A general architecture for the analysis of moving objects is proposed by Kubota et al. [1993]. The process of motion analysis is divided into three stages: moving-object candidate detection, object tracking, and final motion analysis. The experiments are conducted using human motion. Another approach to interpretation of the movements of articulated bodies in image sequences is presented by Rohr [1994]. The human body is represented by a three-dimensional model consisting of cylinders. This approach uses the modeling of the movement from medical motion studies. Koller et al. [1993] discuss an approach to tracking vehicles in road traffic scenes. The motion of the vehicle contour is described using an affine motion model with a translation and a change in scale. A vehicle contour is represented by closed cubic splines. We make use of the research results in all these domain-specific motion analysis projects. Our model combines the general area of motion analysis with individual frame (image) analysis.

In case of video modeling, the video footage usually is first segmented into shots. Segmentation is an important step for detection of cut points which can be used for further analysis. Each video shot can be represented by one or more key frames. Features such as color, shape, and texture could be extracted from the key frames. An approach for automatic video indexing and full video search is introduced by Nagasaka and Tanaka [1992]. This video-indexing method relies on automatic cut detection and selection of first frames within a shot for content representation. Otsuji and Tonomura [1993] propose a video cut detection method. Their projection detection filter is based on finding the biggest difference in interframe histogram differences over a period of time. A model-driven approach to digital video segmentation is proposed by Hampapur et al. [1994]. The paper deals with extracting features that correspond to cuts, spatial edits, and chromatic edits. The authors present an extensive formal treatment of shot boundary identification based on models of video edit effects. In our work, we rely on these methods for the initial stages of video processing, since we need to identify shot boundaries to be able to extract meaningful information within a shot.

One representation scheme of segmented video footage uses key frames [Arman et al. 1994]. The video segments can also be processed for extraction of synthetic images, or layered representational images, to represent closely the meaning of the segments. A methodology for extracting a representative image, salient video stills, from a sequence of images is introduced by Teodosio and Bender [1993]. The method involves determining the optical flow between successive frames, applying affine transformations calculated from the flow-warping transforms, such as rotation, translation, etc., and applying a weighted median filter to the high-resolution image data resulting in the final image. A similar method for synthesizing panoramic overviews from a sequence of frames is implemented by Teodosio and Mills [1993]. Swanberg et al. [1993] introduced a method for identifying desired objects, shots, and episodes prior to insertion in video databases. During the insertion process, the data are first analyzed with image-processing routines to identify
`
`
`
`
`
... well-defined structure can be represented. The model exploits the spatial structure of the video data without analyzing object motion. Zhang et al. [1994] presented an evaluation and a study of knowledge-guided parsing algorithms. The method has been implemented for parsing of television news, since video content parsing is possible when one has an a priori model of a video's structure.

Another system, implemented by Little et al. [1993], supports content-based retrieval and playback. They define a specific schema composed of movie, scene, and actor relations with a fixed set of attributes. Their system requires manual feature extraction. It then fits these features into the schema. Querying involves the attributes of movie, scene, and actor. Once a movie is selected, a user can browse from scene to scene beginning with the initial selection. Weiss [1994] presented an algebraic approach to content-based access to video. Video presentations are composed of video segments using a video algebra. The algebra contains methods for temporally and spatially combining video segments, as well as methods for navigation and querying. Media Streams is a visual language that enables users to create multilayered iconic annotations of video content [Davis 1993]. The objects denoted by icons are organized into hierarchies. The icons are used to annotate the video streams in a Media Time Line. The Media Time Line is the core browser and viewer of Media Streams. It enables users to visualize video at multiple time scales simultaneously, in order to read and write multilayered, iconic annotations, and it provides one consistent interface for annotation, browsing, query, and editing of video and audio data.

The work presented here follows from a number of efforts listed above. Specifically, we use low- and intermediate-level motion analysis methods similar to those offered by Allmen [1991] and others. Our object recognition ideas have been influenced by the work of Jain and his students [Gupta et al. 1991a; 1991b], Grosky [Grosky and Mehrotra 1989], and the research in image databases. Several lines of research such as those in Little et al. [1993], Swanberg et al. [1993], Zhang et al. [1994], and Weiss [1994] provided many useful ideas for the modeling aspects of our investigations. An early report of our work was presented in Dimitrova and Golshani [1994].
`
3. MOTION RECOVERY IN DIGITAL VIDEO

In this section we describe in detail each level of the motion analysis pipeline. At the low-level motion analysis we start with a domain of motion vectors. During intermediate-level motion analysis we extract motion trajectories that are made of motion vectors. Each trajectory can be thought of as an n-tuple of motion vectors. This trajectory representation is a basis for various other trajectory representations. At the high-level motion analysis we associate an activity to a set of trajectories of an object using domain knowledge rules.
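Read as data, the three levels stack roughly as follows; this sketch is our gloss on the pipeline, and the type names and the toy domain rule are illustrative assumptions, not the paper's definitions:

    # Sketch: the three levels of the motion analysis pipeline.
    # Low level: motion vectors; intermediate level: trajectories
    # (n-tuples of vectors); high level: named activities from rules.
    from typing import List, Tuple

    MotionVector = Tuple[int, int]          # (dx, dy) for one frame
    Trajectory = Tuple[MotionVector, ...]   # n-tuple of motion vectors

    def classify_activity(trajs: List[Trajectory]) -> str:
        # Toy domain rule: net horizontal drift means "moving right".
        net_dx = sum(dx for t in trajs for dx, _ in t)
        return "moving right" if net_dx > 0 else "stationary or other"

    print(classify_activity([((4, 0), (5, 1), (3, 0))]))  # moving right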
`3.1 Low-Level Motion Extraction: Single Macroblock Tracing
`In MPEG, to encode a macroblock in a predicted or a bidirectional frame. we
`.
`h
`first need to find the. best matching macroblock 1n the reference frames, t en
`- bi 1995.
`AFM Transactions on Information Systems. Vol. 13. No. 4, Otto tr
`
`Canon EX. 1055 Page 17 of 45
`
`Canon Ex. 1055 Page 17 of 45
`
`
`
`414
`
`-
`
`N. Dimitrova and F. Golshani
`
`
`
`find the amount of .r and ‘\‘ translation li.e.. the inolion vector}. and finally
`calculate the error component
`IPatel ct all 1993]. The motion vector is
`obtained by minimizing a cost function that measures the mismatch between
`a block and each predictor candidate. Each liidirectioiial and predicted frame
`is an abundant source oi" motion inlormaiion.
`in fact. each of' these frames
`might be considered a crude interpolation ol' the optical
`flow. Thus,
`the
`extraction of‘the motion rectors of a single macrohlock through a sequence of
`frames is similar to low-level motion analysis.
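The cost-function minimization described above can be illustrated with a small sum-of-absolute-differences (SAD) matcher; this exhaustive-search sketch is ours, not the encoder's actual algorithm, and the frame size, block size, and search window are arbitrary:

    # Sketch: find a block's motion vector by minimizing a
    # sum-of-absolute-differences (SAD) cost over candidate offsets.
    import random

    def sad(ref, cur, ox, oy, x, y, n=4):
        # Mismatch between the current block at (x, y) and the
        # reference block displaced by the candidate offset (ox, oy).
        return sum(abs(ref[y + oy + i][x + ox + j] - cur[y + i][x + j])
                   for i in range(n) for j in range(n))

    def best_motion_vector(ref, cur, x, y, search=2, n=4):
        # Exhaustive search over a (2*search+1)^2 offset window.
        candidates = [(ox, oy) for ox in range(-search, search + 1)
                      for oy in range(-search, search + 1)]
        return min(candidates, key=lambda o: sad(ref, cur, o[0], o[1], x, y, n))

    # Example: the current frame is the reference shifted left by one
    # pixel, so the block's best motion vector should be (1, 0).
    random.seed(0)
    ref = [[random.randrange(256) for _ in range(12)] for _ in range(12)]
    cur = [[ref[i][j + 1] if j + 1 < 12 else 0 for j in range(12)]
           for i in range(12)]
    print(best_motion_vector(ref, cur, 4, 4))  # (1, 0)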
Tracing a macroblock can continue until the end of the video sequence if we do not impose a stopping criterion. We have a choice: to stop after a certain number of frames, stop after the object (macroblock) has come to rest, stop if