United States Patent [19]
Davis et al.

US005969716A
[11] Patent Number: 5,969,716
[45] Date of Patent: Oct. 19, 1999

[54] TIME-BASED MEDIA PROCESSING SYSTEM

[75] Inventors: Marc Davis, San Francisco; David Levitt, Palo Alto,
     both of Calif.
[73] Assignee: Interval Research Corporation, Calif.

Appl. No.: 08/693,004
Filed: Aug. 6, 1996
Int. Cl. ........... G06F 17/00; G06F 3/00; G11B 27/10
U.S. Cl. ........... 345/328; 345/302; 345/967; 386/102; 386/55
Field of Search .... 345/328, 302, 327, 967, 349, 356;
                     386/4, 52, 55, 102; 707/104

References Cited

U.S. PATENT DOCUMENTS
4,914,568    4/1990   Kodosky et al. ........ 345/349
5,099,422    3/1992   Foresman et al. ....... 386/54 X
5,119,474    6/1992   Beitel et al. ......... 345/302
5,177,513    1/1993   Saito ................. 352/129
5,247,666    9/1993   Buckwold .............. 707/100
5,291,587    3/1994   Kodosky et al. ........ 345/349 X
5,301,336    4/1994   Kodosky et al. ........ 345/348
5,359,712   10/1994   Cohen et al. .......... 345/328
5,388,197    2/1995   Rayner ................ 345/328
5,414,808    5/1995   Williams .............. 345/328
5,548,340    8/1996   Bertram ............... 348/559
5,623,587    4/1997   Bulman ................ 345/435
5,659,793    8/1997   Escobar et al. ........ 345/302
5,682,326   10/1997   Klingler et al. ....... 345/302
5,708,767    1/1998   Yeo et al. ............ 345/302 X
5,724,605    3/1998   Wissner ............... 345/302
5,748,956    5/1998   Lafer et al. .......... 707/104
5,760,767    6/1998   Shore et al. .......... 345/328
5,781,188    7/1998   Amiot et al. .......... 345/328
5,861,880    1/1999   Shimizu et al. ........ 345/302
5,889,514    3/1999   Boezeman et al. ....... 345/302
5,892,506    4/1999   Hermanson ............. 345/302

FOREIGN PATENT DOCUMENTS
0564247A1   10/1993   European Pat. Off.
0687109A1   12/1995   European Pat. Off.
0706124A     4/1996   European Pat. Off.
WO93/08664   4/1993   WIPO
WO93/21635  10/1993   WIPO
WO94/16443   7/1994   WIPO
WO96/31829  10/1996   WIPO

OTHER PUBLICATIONS
“Advance bei Matador”, Fernseh- und Kino-Technik, Vol. 48, No. 5,
May 1, 1994, Heidelberg, DE, pp. 259-260.
Weitzman et al., “Automatic Presentation of Multimedia Documents
Using Relational Grammars”, 1994 ACM Proceedings, Multimedia 94,
pp. 443-451, Oct. 1994, San Francisco, California.
Davis, “Media Streams: An Iconic Visual Language for Video
Representation”, Proceedings of the 1993 Symposium on Visual
Languages, pp. 196-220, 1993.
Adobe After Effects,
URL:http://www.adobe.com/prodindex/aftereffects/main.html,
http://www.adobe.com/prodindex/aftereffects/details.html#features,
1997.
Cinebase,
URL:http://www.cinesoft.com/info/aboutcinebase/index.html, 1997.

Primary Examiner: Raymond J. Bayerl
Attorney, Agent, or Firm: Burns, Doane, Swecker & Mathis, L.L.P.
[57] ABSTRACT
Existing media signals are processed to create new media content
by defining content representations for the existing media and
establishing functional dependencies between the representations.
The content representations constitute different data types which
determine the kinds of operations that can be performed and
dependencies that can be established. Among the types of
transformation that can be achieved are synchronization,
substitution, resequencing, temporal compression and dilation, and
the creation of parametric special effects. The content
representations and their functional dependencies are combined to
construct a functional dependency network which causes the desired
transformations to occur on input media signals. The inputs to the
functional dependency network are parametrically specified by media
data types to construct a template that can be used to create
adaptive media productions.
`
`17 Claims, 8 Drawing Sheets
`
`Akamai Ex. 1009
`Akamai Techs. v. Equil IP Holdings
`IPR2023-00330
`Page 00001
`
`
`
U.S. Patent    Oct. 19, 1999    Sheet 1 of 8    5,969,716

[FIG. 1: block diagram of a computer system. Legible labels: MEDIA
INPUT, KEYBOARD, CURSOR CONTROL, DISPLAY, PRINTER, NETWORK COMM.;
reference numerals 12, 24, 26, 27, 28, 30, 31.]
[FIG. 2A: MEDIA -> PARSER -> CR]
[FIG. 2B: CR -> PARSER -> CR]
[FIG. 2C: CR -> PRODUCER -> MEDIA]
[FIG. 2D: MEDIA -> PRODUCER -> MEDIA]
[FIG. 3: MEDIA and CONTENT REPRESENTATIONS. Legible labels: PHONES,
PROSODY.]
`
`
`
U.S. Patent    Oct. 19, 1999    Sheet 2 of 8    5,969,716

[Block diagrams, including FIG. 5. Legible labels: MEDIA 1, MEDIA 2,
PARSER, CONTENT REPRESENTATION, MEDIA, FCN., PRODUCER.]
`
`
`
U.S. Patent    Oct. 19, 1999    Sheet 3 of 8    5,969,716

[FIG. 6: system architecture. Legible labels: PREVIEW WINDOW;
TIMELINE WINDOW; VIDEO; BROWSE; PREVIEW/RECORD; BROWSE/EDIT; MEDIA
FILES; BROWSE/SEARCH; SEARCH CRITERIA; APPLY FUNCTIONS; FUNCTION
LIBRARY; DATA LIBRARY; QUERY PALETTE; PLUG-IN; OPERATORS ON BUILT-IN
DATA TYPES (STREAMS, RECORDS, SEQUENCES, NUMBERS, ETC.); HIGH LEVEL
OPERATORS TO SYNCHRONIZE, PATTERN MATCH, SUBSTITUTE, ANNOTATE AUDIO,
VIDEO, MUSIC AND TEXT; FIRST ORDER DATA TYPES; HIGHER ORDER DATA
TYPES; META-DATA; USER-DEFINED FUNCTIONS; USER-DEFINED DATA TYPES;
PLATFORM-SPECIFIC; ADAPTIVE TEMPLATES; reference numerals 72, 76,
78, 80, 81, 82, 84, 86, 88.]
`
`
`
U.S. Patent    Oct. 19, 1999    Sheet 4 of 8    5,969,716

[FIG. 7: function palette. Legible title: FUNCTION PALETTE ONE.]
`
`
`
U.S. Patent    Oct. 19, 1999    Sheet 5 of 8    5,969,716
`
`
`
`
`
U.S. Patent    Oct. 19, 1999    Sheet 6 of 8    5,969,716

[FIG. 9. Legible label: KUNGFUTIMELINE.]
`
`
`
U.S. Patent    Oct. 19, 1999    Sheet 7 of 8    5,969,716
`
`
`
U.S. Patent    Oct. 19, 1999    Sheet 8 of 8    5,969,716
`
`
`
TIME-BASED MEDIA PROCESSING SYSTEM

FIELD OF THE INVENTION
The present invention is directed to the production,
transformation, modification, resequencing, and distribution of
time-based media signals, such as video and audio signals, and more
particularly to a media processing system that is capable of
providing reconfigurable, adaptive media productions that can
accept, adapt, and/or be adapted to new media signals provided by a
user, without requiring high levels of skill on the user's part.
These processes are directed to, but not limited to, the motion
picture, television, music, audio, and on-line content industries.
`
BACKGROUND OF THE INVENTION
Today's most advanced media processing systems are mechanical,
rather than computational, devices. They directly manipulate extents
of temporal media in the same manner as the first film editing
systems at the dawn of the century, and their users are still
required to think that way. In order to understand how even the most
advanced media editing systems operate, one can imagine a virtual
robot arm manipulating media according to temporal in and out
points. A different model of the content being operated upon, and of
the operations being performed, could result in different methods of
media production and different kinds of media productions. Two
historical analogies are illustrative in this connection. The first
relates to the invention of manufactured interchangeable parts in
the process of gun manufacture in the later part of the 18th
century. Before the invention of interchangeable parts, gun
manufacture suffered from a lack of standardization and reusability
of components. Every part was a unique result of handicraft, rather
than a standardized manufactured component. The invention of
manufactured interchangeable parts transformed gun production from a
pre-industrial to an industrial mode of production. In the later
part of the twentieth century, media production methods have yet to
achieve the stage of industrialization reached by gun manufacture at
the end of the eighteenth century. The current invention aims to
alter that situation.
In order for media to be produced by means of the manufacture of
interchangeable parts, purely mechanical modes of production are
insufficient. Computational media production methods are required,
in a manner analogous to the invention in the 1980's of
computational production methods in software design which enabled
the simple definition, creation, and reuse of software components.
The ability to quickly, simply and iteratively produce new media
content is of special interest in contexts where movie making has
been historically hampered by lack of skill and resources. In
particular, home consumer production of movie content suffers from
the lack of the following three capabilities which are needed to
meet these objectives:
    easy-to-use yet powerful composition tools
    access to media content which cannot be produced in the home
    tools for producing high-quality soundtracks (including
    multitrack music, dialogue, narration, and sound effects)
Another limitation associated with current media processing systems
is the fact that they are poorly suited for the re-use of
pre-existing media content. This is especially the case in
situations in which the cost and/or difficulty of creating new media
content exceed the cost and/or difficulty of reusing existing media
content. For consumers wishing to participate in media productions,
access to existing media is of paramount importance given their lack
of production skill, financial resources, and media assets.
Currently, there is no mechanism by which pre-existing recordings
can be efficiently retrieved and combined to present the desired
effect.
In summary, there is a need for a time-based media processing system
which is capable of providing high-quality, adaptive media
productions without requiring a significant level of skill on the
part of the user, and is therefore suited for use by the average
consumer. The objective of the invention is to enable new
efficiencies, methods, and forms in the production and distribution
of media content. The invention also aims to satisfy a need for a
media-processing system which facilitates the re-use of media
content, and indirectly the labor and expertise that created it.
`
SUMMARY OF THE INVENTION
In pursuit of these objectives, the present invention embodies a new
paradigm for computational media processing which is comprised of
two fundamental components:
    Content Representation
    (automatically, semi-automatically, and manually generated
    descriptive data that represent the content of media signals)
    Functional Dependency
    (functional relationships that operate on content
    representations and media signals to compute new media content)
The invention combines these two techniques to create time-based
media processing systems, which manipulate representations of media
content in order to compute new media content. The invention is
intended to support a paradigm shift from the direct manipulation of
simple temporal representations of media (frames, timecodes, etc.),
to the interactive computation of new media from higher level
representations of media content and functional dependencies among
them. This paradigm of media processing and composition enables the
production of traditional media (e.g., movies, television programs,
music videos, etc.) to be orders of magnitude faster than current
methods. As such, uses of the invention may have fundamental
consequences for the current industrial processes of media
production, distribution, and reuse. By means of content
representation and functional dependency, the current invention
creates a production process for computational media components
which can determine what they contain, and how they can be
processed, adapted, and reused.
In accordance with the present invention, a media signal is
processed in a media parser to obtain descriptive representations of
its contents. Each content representation is data that provides
information about the media signal, and is functionally dependent on
the media signal. Depending upon the particular data type of the
content representation, different kinds of information can be
obtained about the media, and different types of operations can be
performed on this information and the media it is functionally
dependent upon. Content representations also support inheritance of
behavior through directed graph structures (e.g., general to
specific) and are composable into new content representations. For
example, an audio signal can be parsed to identify its pitch. Higher
order parsing can be performed on this content representation to
obtain additional information about the media signal, such as its
prosody (i.e., its pitch pattern), or in the case of music, its
chord structures.
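As an illustration only, higher order parsing of a pitch representation might be sketched as follows. The function name `parse_prosody`, the `tolerance_hz` dead band, and the sample values are assumptions made for this sketch and do not come from the patent; labelling each step as rising, falling, or level is just one simple reading of "pitch pattern".

```python
from typing import List


def parse_prosody(pitch_hz: List[float], tolerance_hz: float = 1.0) -> List[str]:
    """Second-order parser: label each step of a pitch track.

    The input is itself a content representation (one pitch value per
    analysis frame); the output is a new content representation derived
    from it. `tolerance_hz` is a hypothetical dead band.
    """
    labels = []
    for previous, current in zip(pitch_hz, pitch_hz[1:]):
        delta = current - previous
        if delta > tolerance_hz:
            labels.append("rising")
        elif delta < -tolerance_hz:
            labels.append("falling")
        else:
            labels.append("level")
    return labels


# Pitch track as a first-order parser might emit it (made-up values, in Hz).
pitch_track = [110.0, 118.0, 131.0, 130.5, 120.0]
prosody = parse_prosody(pitch_track)
print(prosody)
```

The prosody list depends only on the pitch list, which in turn depends only on the audio signal, so the chain of functional dependency described above is preserved.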
`
Media parsers may operate automatically, semi-automatically, or
manually. Automatic media parsers require no human input in order to
produce their content representations from their input media
signals. Semi-automatic and manual media parsers require human input
or manual annotation to produce their content representations.
The information that is obtained from the content representation of
a media signal is fed to a media producer which defines a functional
relationship between input media signals and content
representations, to produce the new media production. For example,
the rate of events of a particular song might be used to control the
rate at which a video signal is played, so that events in the video
are synchronized with events in the song. Alternatively, a
soundtrack can be accelerated, decelerated and/or modified to fit it
to a video sequence. In another example, the functional relationship
can be used to substitute one item of media for another. For
instance, original sounds in a soundtrack for a video signal can be
replaced by a new set of sounds having similar properties, e.g.
durations, which correspond to those of the original sounds. In
another example, events in a video or audio signal can be detected
and used to modify one or both media signals in a particular manner
to create special effects. In yet another example, specific media
signals can be triggered in response to the content of another media
signal to, for instance, produce an animation which reacts to the
semantic content of an incoming stream of media signal with its
dependent content representation.
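The synchronization example above can be sketched under simplifying assumptions: both signals have already been parsed into lists of event times (in seconds), the lists pair up one to one, and the video is retimed segment by segment. The function name `playback_rates` and the sample times are hypothetical; the patent does not prescribe this particular calculation.

```python
from typing import List


def playback_rates(video_events: List[float], song_events: List[float]) -> List[float]:
    """Per-segment video playback rates that align video events with song events.

    A rate of 2.0 means the video segment is played at double speed so that
    it ends exactly when the matching song event occurs.
    """
    if len(video_events) != len(song_events) or len(video_events) < 2:
        raise ValueError("need the same number of events (at least two) in each signal")
    rates = []
    for i in range(len(video_events) - 1):
        video_span = video_events[i + 1] - video_events[i]
        song_span = song_events[i + 1] - song_events[i]
        if video_span <= 0 or song_span <= 0:
            raise ValueError("event times must be strictly increasing")
        rates.append(video_span / song_span)
    return rates


video_events = [0.0, 2.0, 3.0]  # e.g. shot changes in the video
song_events = [0.0, 1.0, 3.0]   # e.g. beats in the song
rates = playback_rates(video_events, song_events)
print(rates)
```

Here the first video segment must play twice as fast and the second at half speed for the events to coincide.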
In the system of the present invention, the generation of a
reconfigurable and adaptive media production is carried out in two
major phases. In the first phase, a functional dependency network is
built by a person referred to herein as a template builder. The
functional dependency network provides a functional structure, or
template, which outputs the ultimate media production. To this end,
a multiplicity of different media parsers and media producers are
employed to respectively process different types of media signals
and different data types for the content representations. The
functional dependency network is built by combining selected ones of
the media parsers and media producers in a manner to process media
signals and provide a desired functional relationship between them.
During the building phase, a fixed set of media signals are input to
the functional dependency network, and the template builder can
iteratively vary the parsers and producers to obtain a desired
result using this constant set of input signals. In addition, new
content representations and new data types can be defined during
this phase. Template builders can re-use existing templates in the
construction of new ones.
Once the template has been built, one or more inputs to the
functional dependency network can be changed from constant input
signals to parameters that are defined by their data types. The
resulting functional dependency network with parametric input(s)
forms an adaptive template that is provided to a template user. In
the second phase of the procedure, the template user provides media
signals which are of the required data type, to be used as input
signals to the functional dependency network. These media signals
are processed in accordance with the functions built into the
adaptive template to produce a new media production that adapts,
and/or adapts to, the template user's input.
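A minimal sketch of the two phases, assuming the functional dependency network can be stood in for by a plain function: the template builder fixes some inputs as constants and leaves others as parameters identified by data type, and the template user later supplies media of the required type. All names here (`AdaptiveTemplate`, `AudioSignal`, `VideoSignal`, `music_video`) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class AudioSignal:
    name: str


@dataclass
class VideoSignal:
    name: str


class AdaptiveTemplate:
    """A network with some inputs held constant and others left parametric."""

    def __init__(self, network: Callable[..., Any],
                 constants: Dict[str, Any],
                 parameters: Dict[str, type]) -> None:
        self.network = network
        self.constants = constants      # inputs fixed by the template builder
        self.parameters = parameters    # input name -> required data type

    def produce(self, **user_media: Any) -> Any:
        if set(user_media) != set(self.parameters):
            raise TypeError(f"expected inputs: {sorted(self.parameters)}")
        for name, required_type in self.parameters.items():
            if not isinstance(user_media[name], required_type):
                raise TypeError(f"{name} must be a {required_type.__name__}")
        return self.network(**self.constants, **user_media)


def music_video(song: AudioSignal, video: VideoSignal) -> str:
    # Stand-in for a real network of parsers and producers.
    return f"{video.name} cut to {song.name}"


# Phase 1: the builder keeps the song constant and parameterizes the video.
template = AdaptiveTemplate(music_video,
                            constants={"song": AudioSignal("theme.wav")},
                            parameters={"video": VideoSignal})
# Phase 2: the user supplies media of the required data type.
result = template.produce(video=VideoSignal("holiday.mov"))
print(result)
```

Checking the data type at the boundary is the design point: the template accepts any media of the right type and rejects everything else before the network runs.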
In an alternative embodiment of the invention, the constant input
signals need not be changed to parameters once the functional
dependency network has been defined. In this case, a traditional
media presentation, i.e. one which is not adaptive, is obtained.
However, the ability to produce and alter the media production in an
iterative manner provides a greater degree of efficiency and
automation than more traditional methods of media production. In
addition, the system permits pre-existing media content to be reused
in a meaningful way.
As a further feature of the invention, a visual data flow interface
is provided to facilitate the selection, combination and
construction of media parsers and producers in the building of the
functional dependency network. The manipulation of parsers,
producers, functions, media signals, data types, and content
representations is effected as the template builder selects, drags
and connects their iconic representations in a graphical data flow
network. The functionality provided by the interface is analogous to
the operation of a spreadsheet, in the sense that the network
builder can select and place data items, i.e. media signals, in a
particular arrangement, and specify functional dependencies between
the data items. The interface displays the input signals,
intermediate processing results, and final outputs in both a spatial
and a temporal manner, to provide ready comprehension of the
relationships of the media signals and the content representations
in the functional dependency network. This feature allows the
network to be constructed in an intuitive manner.
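The spreadsheet analogy can be made concrete with a small sketch in which each data item is a cell that either holds a value or computes one from other cells. The `Cell` class and the tempo example are assumptions for illustration; the patent describes a graphical interface, not this code.

```python
from typing import Any, Callable, Optional, Sequence


class Cell:
    """A data item that is either a constant or a function of other cells."""

    def __init__(self, value: Any = None,
                 formula: Optional[Callable[..., Any]] = None,
                 inputs: Sequence["Cell"] = ()) -> None:
        self.value = value
        self.formula = formula
        self.inputs = tuple(inputs)

    def get(self) -> Any:
        # Constants return their value; dependent cells recompute on demand.
        if self.formula is None:
            return self.value
        return self.formula(*(cell.get() for cell in self.inputs))


song_tempo_bpm = Cell(value=120.0)
beats_per_second = Cell(formula=lambda bpm: bpm / 60.0, inputs=[song_tempo_bpm])
video_cuts_per_second = Cell(formula=lambda bps: bps / 2.0, inputs=[beats_per_second])

before = video_cuts_per_second.get()
song_tempo_bpm.value = 90.0          # change one input ...
after = video_cuts_per_second.get()  # ... and every dependent value follows
print(before, after)
```

As in a spreadsheet, only the dependencies are specified; changing an input never requires editing the cells that depend on it.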
With the capabilities provided by the present invention, data in any
particular medium, or combination of media, undergoes parsing and/or
annotation, and subsequent functional combination, to construct a
template which can produce new media productions. The new media
productions may be produced by other template users each providing
their own media, or by the template builder, to make multiple
productions with similar structures.
The invention enables consumers to produce movie content with high
production values without the traditionally high production costs of
training, expertise, and time. The invention also enables the
creation of a new type of media production which can adapt, and
adapt to, new media input. An example of such an adaptive media
production is a music video which can incorporate new video without
loss of synchronization, or alternatively adapt its video content to
new music. From the viewpoint of consumers who desire to see
themselves reflected in movies, videos, and television programs,
only simple interactive selection, rather than editing, is required
to make or see a media production adapted to and/or adapting their
own media content.
These features of the invention, as well as the advantages offered
thereby, are explained in greater detail hereinafter with reference
to specific examples illustrated in the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a general block diagram of a computer system of the type
in which the present invention might be implemented;
FIGS. 2A-2D are schematic diagrams of the basic operations that are
performed in the context of the present invention;
FIG. 3 is a block diagram of the relationships of different types of
content representations;
FIG. 4 is a block diagram of a functional dependency network;
FIG. 5 is a block diagram of an exemplary template;
FIG. 6 is a block diagram of the architecture of a system
constructed in accordance with the present invention;
FIG. 7 is an illustration of a function palette;
FIG. 8 is an illustration of a user interface for manipulating an
audio/video signal to synchronize its events with the events of
another audio signal;
FIG. 9 is an illustration of a user interface for manipulating an
audio/video signal to substitute new sounds;
FIG. 10 is an illustration of a user interface for manipulating a
video signal to create an auto rumble effect; and
FIG. 11 is an illustration of a user interface for selecting new
media signals to produce a new media production from an adaptive
template.
DETAILED DESCRIPTION
To facilitate an understanding of the principles and features of the
present invention, it is described hereinafter with reference to
particular examples of media content and processing. In particular,
the analysis and transformation of various video and audio streams
are described in the context of simple, readily comprehensible
implementations of the invention. It will be appreciated, however,
that the practical applications of the principles which underlie the
invention are not limited to these specific examples. Rather, the
invention will find utility in a wide variety of situations and in
connection with numerous different types of media and production
contexts.
In general, the present invention is directed to the processing and
transformation of various types of media signals, to generate new
media content. The particular hardware components of a system in
which the following principles might be implemented do not form part
of the invention itself. However, an exemplary computer system is
briefly described herein to provide a thorough understanding of the
manner in which the features of the invention cooperate with the
components of such a system to produce the desired results.
Referring to FIG. 1, a computer system includes a computer 10 having
a variety of external peripheral devices 12 connected thereto. The
computer 10 includes a central processing unit 14 and associated
memory. This memory generally includes a main memory which is
typically implemented in the form of a random access memory 16, a
static memory that can comprise a read only memory 18, and a
permanent storage device, such as a magnetic or optical disk 20. The
CPU 14 communicates with each of these forms of memory through an
internal bus 22. Data pertaining to a variety of media signals can
be stored in the permanent storage device 20, and selectively loaded
into the RAM 16 as needed for processing.
The peripheral devices 12 include a data entry device such as a
keyboard 24, a pointing or cursor control device 26 such as a mouse,
trackball, pen or the like, and suitable media input devices 27,
such as a microphone and a camera. An A/V display device 28, such as
a CRT monitor or an LCD screen, provides a visual display of video
and audio information that is being processed within the computer.
The display device may also include a set of speakers (not shown) to
produce audio sounds generated in the computer. A permanent copy of
the media signal can be recorded on a suitable recording mechanism
30, such as a video cassette recorder, or the like. A network
communications device 31, such as a modem or a transceiver, provides
for communication with other computer systems. Each of these
peripheral devices communicates with the CPU 14 by means of one or
more input/output ports 32 on the computer.
In the processing of media signals in accordance with the present
invention, four fundamental types of operations are performed.
Referring to FIG. 2A, one type of operation is to parse an original
media signal into a content representation of that signal. The
original media signal comprises data which defines the content of
the signal. In the case of an audio signal, for example, that data
comprises individual samples of the amplitude of an audio pressure
wave. In the case of a video signal, that data might be the values
of the individual pixels that make up the frames of the signal.
In a first order parser, the original media data is processed, or
analyzed, to obtain new data which describes one or more attributes
of the original data. The new data, and its corresponding type
information, is referred to herein as content representation. For
instance, in the case of an audio signal, one type of first order
parser can produce output data which describes the fundamental
frequency, or pitch of the signal. A first order parser for video
might indicate each time that the video image switches to a
different camera shot. Various types of media signals will have
associated forms of content representation. For example, a speech
signal could be represented by the individual speech components,
e.g., phones, which are uttered by the speaker. In this regard,
reference is made to U.S. patent application Ser. No. 08/620,949,
filed Mar. 25, 1996, for a detailed discussion of the annotation and
transformation of media signals in accordance with speech
components. Video signals can likewise be analyzed to provide a
number of different forms of content representation. In this regard,
reference is made to Davis, “Media Streams: Representing Video for
Retrieval and Repurposing”, Ph.D. thesis submitted to the Program in
Media Arts and Sciences, Massachusetts Institute of Technology,
February 1995, particularly at Chapter 4, for a detailed discussion
of the content representation of video. The disclosure of this
thesis is incorporated herein by reference thereto.
The parsing of a media signal to generate a content representation
can be carried out automatically, semi-automatically, or manually.
For instance, to manually parse a video signal to identify different
camera shots, a human observer can view the video and annotate the
frames to identify those in which the camera shot changes. In an
automatic approach, each frame can be analyzed to determine its
color histogram, and a new shot can be labeled as one in which the
histogram changes from one frame to the next by a prespecified
threshold value. In a semiautomatic approach, the viewer can
manually identify the first few times a new shot occurs, from which
the system can determine the appropriate threshold value and
thereafter automatically detect the new camera angles.
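The automatic and semiautomatic approaches can be sketched as follows, assuming each frame is a flat list of 8-bit intensity values, the histogram has a handful of coarse bins, and the change between frames is measured as the sum of absolute bin differences. Those choices, the function names, and the rule used to learn the threshold (the smallest change seen at a manually labeled cut) are assumptions for this sketch; the patent does not specify them.

```python
from typing import List

Frame = List[int]  # pixel intensities in the range 0..255


def histogram(frame: Frame, bins: int = 4, max_value: int = 256) -> List[int]:
    """Count how many pixels fall into each of `bins` equal-width ranges."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return counts


def histogram_change(a: Frame, b: Frame) -> int:
    """Sum of absolute bin differences between two frames' histograms."""
    return sum(abs(x - y) for x, y in zip(histogram(a), histogram(b)))


def detect_shots(frames: List[Frame], threshold: int) -> List[int]:
    """Automatic parser: indices of frames that start a new shot."""
    return [i for i in range(1, len(frames))
            if histogram_change(frames[i - 1], frames[i]) >= threshold]


def threshold_from_examples(frames: List[Frame], labeled_cuts: List[int]) -> int:
    """Semiautomatic step: derive a threshold from manually labeled cuts."""
    return min(histogram_change(frames[i - 1], frames[i]) for i in labeled_cuts)


dark = [10, 20, 30, 40]
dark_again = [12, 22, 28, 60]
bright = [200, 210, 220, 250]
mid = [100, 110, 120, 130]
frames = [dark, dark_again, bright, bright, mid]

learned = threshold_from_examples(frames, labeled_cuts=[2])  # viewer marks frame 2
cuts = detect_shots(frames, learned)
print(learned, cuts)
```

With one cut labeled by the viewer, the learned threshold also finds the second, unlabeled cut at frame 4.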
Referring to FIG. 2B, in the second fundamental type of operation, a
content representation is processed in a second or higher order
parser to generate additional forms of content representation. For
example, the pitch content representation of an audio signal can be
parsed to indicate properties of its prosody, i.e. whether the pitch
is rising or falling. In the case of a video signal, a first order
content representation might compute the location of a colored
object using the color of pixels in a frame, while a second order
parser might calculate the velocity of that object from the first
order representation. In another video example, higher order parsing
of the shot data can produce content representations which identify
scene boundaries in a sequence of shots according to continuity of
diegetic (i.e. story) time and location. These types of content
representation may depend on aspects of human perception which are
not readily computable, and therefore manual and/or semi-automatic
annotation might be employed.
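The colored-object example might be sketched like this, assuming a frame is a small grid of color labels, the object's location is the centroid of the pixels matching a target color, and the object is visible in every frame. The function names, the frame rate, and the toy frames are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[str]]          # rows of color labels
Point = Tuple[float, float]      # (x, y) in pixels


def locate_object(frame: Frame, target: str) -> Optional[Point]:
    """First-order parser: centroid of pixels whose color equals `target`."""
    xs, ys = [], []
    for y, row in enumerate(frame):
        for x, pixel in enumerate(row):
            if pixel == target:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def velocities(positions: List[Point], fps: float) -> List[Point]:
    """Second-order parser: velocity in pixels per second between frames."""
    return [((x1 - x0) * fps, (y1 - y0) * fps)
            for (x0, y0), (x1, y1) in zip(positions, positions[1:])]


frames = [
    [["R", ".", "."], [".", ".", "."]],
    [[".", "R", "."], [".", ".", "."]],
    [[".", ".", "."], [".", ".", "R"]],
]
positions = [locate_object(frame, "R") for frame in frames]
speed = velocities(positions, fps=10.0)
print(positions, speed)
```

The second-order parser never touches the pixels; it works only on the first-order representation, which is what makes it a higher order parser.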
Each different form of content representation employs a data type
whose data values are functionally dependent upon the data of the
media signal. These data types effectively define a component
architecture for all media signals. In this regard, different
representations can have a hierarchical or peer-to-peer relationship
to one another. Referring to FIG. 3, different content
representations produced by first-order parsing of a given media
signal have a peer-to-peer relationship. Thus, pitch data and phone
data derived from parsing a speech signal are peers of one another.
Content representations which are produced by higher order parsers
may have a hierarchical relationship to the content representations
generated by lower-order parsers, and may have a peer-to-peer
relationship to one another. Hence, prosody data is hierarchically
dependent on pitch data. The data type inherently defines the types
of content representations and media signals that a parser or
producer can compute, and in what manner. Based on this information,
desired functional dependencies can be established between different
content representations and media signals to generate new media
content from a template.
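The peer-to-peer and hierarchical relationships named above can be captured in a small lookup, shown here as a sketch. The dictionary layout and the helper names `are_peers` and `depends_on` are assumptions; only the pitch, phones, and prosody example is taken from the text.

```python
from typing import Dict

# Each content representation maps to the data it was parsed from.
derived_from: Dict[str, str] = {
    "pitch": "speech signal",
    "phones": "speech signal",
    "prosody": "pitch",
}


def are_peers(a: str, b: str) -> bool:
    """Two representations are peers when parsed from the same source."""
    source_a = derived_from.get(a)
    return a != b and source_a is not None and source_a == derived_from.get(b)


def depends_on(representation: str, ancestor: str) -> bool:
    """True when `representation` is derived, directly or not, from `ancestor`."""
    current = representation
    while current in derived_from:
        current = derived_from[current]
        if current == ancestor:
            return True
    return False


print(are_peers("pitch", "phones"), depends_on("prosody", "pitch"))
```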
Referring to FIG. 2C, a third type of operation is the processing of
content representations to produce a new media signal. In this type
of operation, the data of the content representation might be an
input parameter to a media producer which causes a media signal to
be generated. For example, a synthetic media signal may be rendered
from its content representation, such as computer animation
parameters or MIDI sequences, respectively. In the fourth type of
operation, depicted in FIG. 2D, a media signal is transformed in
accordance with a defined media producer to produce new media
signals.
These fundamental operations define two basic types of operators
that are employed in the present invention. As used herein, a media
parser is an operator which produces content representation as its
output data, whether the input data is media data, i.e. a
first-order parser, or another form of content representation as in
second and higher order parsers. A media producer, on the other
hand, is an operator which transforms input data to produce a media
signal as its output data.
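The distinction between the two operator types lies in the output: a parser returns a content representation, a producer returns a media signal. A minimal sketch, with hypothetical `Media` and `ContentRepresentation` types and two made-up operators (an amplitude-envelope parser and a gain producer), is:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Media:
    samples: List[float]


@dataclass
class ContentRepresentation:
    data_type: str
    values: List[float]


def envelope_parser(media: Media) -> ContentRepresentation:
    """Media parser (first order): media in, content representation out."""
    return ContentRepresentation("envelope", [abs(s) for s in media.samples])


def gain_producer(media: Media, control: ContentRepresentation) -> Media:
    """Media producer: media and content representation in, media out."""
    if len(media.samples) != len(control.values):
        raise ValueError("media and control must have the same length")
    return Media([s * g for s, g in zip(media.samples, control.values)])


media_1 = Media([0.0, -0.5, 1.0])
media_2 = Media([2.0, 2.0, 2.0])
# The loudness of media_1 shapes media_2: one parser feeding one producer.
new_media = gain_producer(media_2, envelope_parser(media_1))
print(new_media)
```

Composing operators this way, with the output type of one matching the input type of the next, is all that is needed to chain them.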
In the context of the present invention, these operators are
selectively combined to build a functional dependency network. A
simple example of a functional dependency network is illustrated in
FIG. 4. Referring thereto, the functional dependency network
receives one or more media signals as input signals, and parses
these input signals to generate content representations for each.
The media signals which are input to the functional dependency
network could be retrieved from a storage medium, such as the hard
disk 20, or they can be real-time signals. The content
representations and media signals are processed in a media producer
to generate a new media signal. In the context of the present
invention, a multitude of different kinds of transformations can be
performed on media signals within the functional dependency network.
One example of a media transformation includes synchronization, in
which the events in one media signal are synchronized with events in
another media signal, e.g. by varying their playback rates. Another
type of transformation comprises sound substitution, such as foley
in traditional motion picture production, in which one type of sound
is substituted for another type of sound in an audio/video signal. A
third type of processing is the modification of a media signal in
accordance with another media signal, to produce parametric special
effects. A fourth type of processing is the triggering of a specific
media signal in accord with another media signal to, for example,
produce a reactive animation to an incoming stream of media signal
with its dependent content representation. For example, an animated
character may respond to content representations parsed in real-time
from live closed-captioned tex