MARC DAVIS
School of Information Management and Systems
University of California at Berkeley

PUBLICATIONS
marc@sims.berkeley.edu
www.sims.berkeley.edu/~marc
`
IDIC: Assembling Video Sequences from Story Plans and Content Annotations
`
Bibliographic Reference:
Marc Davis and Warren Sack. "IDIC: Assembling Video Sequences from Story Plans and Content Annotations." In: Proceedings of IEEE International Conference on Multimedia Computing and Systems in Boston, Massachusetts, IEEE Computer Society Press, 30-36, 1994.
`
`MINDGEEK EXHIBIT 1014
`
IDIC: Assembling Video Sequences from Story Plans and Content Annotations
`
Warren Sack * and Marc Davis +
`
`*
`* MIT Media Lab, Machine Understanding Group, 20 Ames Street, Cambridge, MA 02139
`MIT Media Lab, Machine Understanding Group, 20 Ames Street, Cambridge, MA 02139
`phone: 617/253-9497
`email: wsack@media.mit.edu
`phone 617/253-9497 (cid:9)
`email wsack@media.mit.edu
`
+ MIT Media Lab & Interval Research Corp., 1801 Page Mill Road, Building C, Palo Alto, CA 94304
phone: 415/354-3631
email: davis@interval.com
`
Abstract§
We describe a system, IDIC, which can generate a video sequence according to a story plan by selecting appropriate segments from an archive of annotated video. IDIC uses a simple planner to generate its stories. By critically examining the strengths and weaknesses of the representation and algorithm employed in the planner, we are able to describe some interesting similarities and differences between planning and video story generation. We use our analysis of IDIC to investigate the representation and processing issues involved in the development of video generation systems.
`
1. Introduction: The Common Sense of Television
`
Americans watch a lot of television. On average, most watch six hours of TV a day, and most households have the set on for at least eight hours [Cross 1983, p. 2]. What are we learning from the attention we spend on soap operas, sitcoms, ads, Monday night football, talk shows, and music videos? A culturally specific form of common sense. Indeed, what we are learning through television has become, to a large extent, the consensual reality of the United States. Rodney King's beating by the L.A. police, the explosion of the space shuttle Challenger, former Vice-President Quayle's comments about Murphy Brown and Murphy Brown's response to Quayle, the name of Lucy's husband (Ricky), and the slogan from the Wendy's restaurant commercial which was often quoted in the 1984 presidential race ("Where's the beef?") are all examples of events which most of us saw not with the naked eye but on television. All of these events are "common-sensical" to the extent that they are referents with which "everyone" is assumed to be familiar for the purposes of casual discourse. Since at least McCarthy's description of an advice taker [McCarthy 1958], a machine that could be programmed in a common vernacular, researchers (e.g., Lenat and Guha 1990; Hobbs and Moore 1985) have been trying to find a way to articulate "common sense" in a computationally interpretable form. It is striking that none of this research has been aimed at representing television, a subject which occupies almost as many of Americans' waking hours as work and school. One of our current concerns is to address this oversight. This paper describes some of our initial efforts to articulate the "common sense" of television.

§ Published in the Proceedings of the IEEE International Conference on Multimedia Computing and Systems, May 14-19, 1994, Boston, MA.
`
With our long-term research agenda we seek to address two issues, one technological and one theoretical:
`
• The Technological Issue: Interactive Television: In the next few years the technology of television will be integrated with computers. As a consequence, television (and also the "common sense" of television) will change. Viewers will have access to services which will allow them to search for and download movies and all types of television shows from distant sources. With the advent of digital television, it will also be possible to program "interactive" shows which allow the viewer to, for example, specify a change in narrative, replace characters or actors, specify camera movements, or, in general, play the role of the director in a limited manner. In our research we are attempting to find the means to represent, index, and automatically draw inferences about television shows. We hope that this work will provide the underpinnings necessary to support the functionality of an interactive television technology.
`
Figure 1: The "Rescue" Trailer
[Six stills, labeled Negotiate, Fight, Fight (continued), Threaten, Rescue, and Rescue (continued); road-map arrows labeled establishing-negotiate, break-down, threaten-renewed-violence, and pre-emptive-rescue.]
`
• The Theoretical Issue: Television and AI Theories of Common Sense: Within the discipline of artificial intelligence (AI) we often speak as though knowledge comes in only two flavors: (1) expert knowledge; and (2) culturally independent "common sense" knowledge. Everyone is assumed to possess at least some "common sense." Thus human novices, students, readers, viewers, or learners in general are prefigured in the literature of AI as "non-experts," i.e., as minds which possess the ubiquitous "common sense" but lack a specific sort of knowledge, the expertise of a particular professional or academic discipline. This is an inadequate representation of "common sense" because it leaves no room for a study of the sorts of culturally specific and rarely archived knowledges that many of us are fluent in, e.g., popular culture. Consequently, we would contend that contemporary AI theories of representation are inadequate to the task of representing the "common sense" of television. The "common sense" of television is the content of television and the sort of learning and transformations experienced by viewers of television. In short, in AI it is difficult to construct flexible and perceptive representations of popular culture in general, and of television in particular, because there exist no adequate means to represent the fact that producers and viewers know a lot of things which are neither as culturally independent as "common sense" has been presumed to be by AI researchers, nor as professionally or academically specialized as expert knowledge.
`
Our initial steps toward our long-term research goals have taken the form of what we tend to call a "literature review by critical re-implementation." We are trying to reassess and extend older work in artificial intelligence (AI) to see whether it is arguably applicable to the relatively unexamined domain of television. Our methodology involves re-implementing cognitive models as computer programs and then integrating them into larger systems for annotating, analyzing, and generating video. Instead of "writing off" older work, we are attempting to give ourselves first-hand experience with computer-based instantiations of prior research. Our aim has been to find a set of indexing and inferencing techniques which will allow us to create programs that can automatically create new videos by composing parts of other videos stored in a digital archive. The work reported in the present paper was originally initiated to illustrate how planning techniques, as they have been described in the artificial intelligence literature, are not applicable to the task of video generation. To our own surprise, however, we found that some planning techniques are indeed of interest in the domain of video generation.
`
This paper is divided into two sections.
`
(1) An Example: We give an example of the sort of videos that our simplest system can generate. This simplest of systems is nothing fancy: its inferencing capabilities are built upon a GPS-type [Newell and others 1963] planner. But the system's output is of interest because it allows us to illustrate the sorts of mechanisms inherent to the domain of automatic video generation.
`
(2) GPS and Video Generation: We describe the architecture of our simplest system to point out the sources of the strengths and weaknesses illustrated by its output. Many arguments have been made in the AI literature to demonstrate that it is unrealistic to imagine that simple planning routines could ever do anything practical [Chapman 1987]. However, the analysis we provide of our system investigates how planning can be a tool for framing the problems of video generation: we find certain aspects of the representations used in planners (e.g., operators with add and delete lists) to be a useful description of concepts ubiquitous to film theory and thus essential to any sort of reasoning about film and video. In addition, we point out some essential, but technically commensurable, differences between planning and story generation.
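To make the phrase "operators with add and delete lists" concrete, here is a minimal Python sketch of a STRIPS/GPS-style operator applied to a set of narrative facts. The operator name break-down is borrowed from the paper's "Rescue" example. Its preconditions, add list, and delete list are hypothetical and are not taken from IDIC's operator library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """A STRIPS/GPS-style operator: applicable when its preconditions hold;
    applying it removes the delete list and then adds the add list."""
    action: str
    preconds: frozenset
    add_list: frozenset
    del_list: frozenset = frozenset()

    def applicable(self, state: frozenset) -> bool:
        return self.preconds <= state

    def apply(self, state: frozenset) -> frozenset:
        if not self.applicable(state):
            missing = set(self.preconds - state)
            raise ValueError(f"{self.action}: preconditions {missing} not met")
        return (state - self.del_list) | self.add_list


# Hypothetical operator: a negotiation scene gives way to a fight scene.
break_down = Operator(
    action="break-down",
    preconds=frozenset({"negotiate"}),
    add_list=frozenset({"fight"}),
    del_list=frozenset({"negotiate"}),
)

before = frozenset({"negotiate"})
after = break_down.apply(before)
print(sorted(after))  # ['fight']
```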
`
2. An Example: The "Rescue" Trailer
`
Our simplest video generator (which we call IDIC) uses a version of GPS [Newell and others 1963] to plan out a story; it indexes into an archive of digital video to select scenes to illustrate each part of the generated story, and then edits the scenes together into a newly created video story. The user can specify the sorts of actions that should be portrayed in the story that IDIC plans out. The query to IDIC which generated the "Rescue" video represented in Figure 1 was the following:
`
(idic (gps '() '(rescue) *sttng-movie-ops*))
`
The user calls GPS with a start state (shown as empty in the example above), a conjunction of goals, and a list of operators; the output of GPS is then passed to IDIC, which assembles the appropriate video footage to create a new video. We have written a library of GPS operators for the domain of Star Trek: The Next Generation (hereafter referred to as STTNG) trailers. In other words, IDIC generates new STTNG trailers from an archive of existing trailers for STTNG episodes.
`
`3 3
`
`Page 5 of 10
`
`MINDGEEK EXHIBIT 1014
`
`
`
We are using a modified version of the GPS program that can be found in [Norvig, 1992]. Given the goal of generating a story (i.e., a story plan) that contains a rescue, GPS produces the following output:
`
Goal: rescue
Consider: pre-emptive-rescue
  Goal: threaten
  Consider: threaten-renewed-violence
    Goal: fight
    Consider: escalate
      Goal: threaten
    Consider: break-down
      Goal: negotiate
      Consider: appease
        Goal: threaten
      Consider: de-escalate
        Goal: fight
      Consider: establishing-negotiate
      Action: establishing-negotiate
    Action: break-down
  Action: threaten-renewed-violence
Action: pre-emptive-rescue
`
Story Plan:
((executing establishing-negotiate)
 (executing break-down)
 (executing threaten-renewed-violence)
 (executing pre-emptive-rescue))
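The trace above can be reproduced by a short means-ends analysis sketch in Python. The sketch follows the structure of Norvig's GPS: achieve-all, achieve, apply-op, and a goal stack that blocks recursive subgoals. It is an illustration, not the authors' modified program. The operator definitions are hypothetical. Their preconditions and add lists were inferred from the order in which goals and operators appear in the trace, and their delete lists were invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    action: str
    preconds: tuple = ()
    add_list: tuple = ()
    del_list: tuple = ()


# Hypothetical operators inferred from the trace; the real *sttng-movie-ops*
# library is not listed in the paper. GPS tries operators in list order.
STTNG_MOVIE_OPS = [
    Op("pre-emptive-rescue", ("threaten",), ("rescue",), ("threaten",)),
    Op("threaten-renewed-violence", ("fight",), ("threaten",), ("fight",)),
    Op("escalate", ("threaten",), ("fight",)),
    Op("break-down", ("negotiate",), ("fight",), ("negotiate",)),
    Op("appease", ("threaten",), ("negotiate",)),
    Op("de-escalate", ("fight",), ("negotiate",)),
    Op("establishing-negotiate", (), ("negotiate",)),
]


def gps(state, goals, ops, trace=None):
    """Means-ends analysis in the style of Norvig's GPS. Returns the executed
    actions in order, or None if the goals cannot be achieved."""
    trace = [] if trace is None else trace

    def achieve_all(state, goals, stack):
        for goal in goals:
            state = achieve(state, goal, stack)
            if state is None:
                return None
        # Every goal must still hold (a later step may have deleted one).
        return state if all(g in state for g in goals) else None

    def achieve(state, goal, stack):
        trace.append("  " * len(stack) + f"Goal: {goal}")
        if goal in state:
            return state
        if goal in stack:  # already pursuing this goal: avoid infinite regress
            return None
        for op in ops:
            if goal in op.add_list:
                result = apply_op(state, goal, op, stack)
                if result is not None:
                    return result
        return None

    def apply_op(state, goal, op, stack):
        indent = "  " * len(stack)
        trace.append(indent + f"Consider: {op.action}")
        state2 = achieve_all(state, op.preconds, [goal] + stack)
        if state2 is None:
            return None
        trace.append(indent + f"Action: {op.action}")
        kept = [s for s in state2 if s not in op.del_list]
        return kept + [("executing", op.action)] + list(op.add_list)

    final = achieve_all(list(state), goals, [])
    if final is None:
        return None
    return [s[1] for s in final if isinstance(s, tuple) and s[0] == "executing"]


trace = []
plan = gps([], ["rescue"], STTNG_MOVIE_OPS, trace)
print("\n".join(trace))
print(plan)
```

The failed branches (escalate, appease, de-escalate) come from the goal-stack check. Each of these operators needs a goal that is already being pursued higher up, so GPS abandons it and tries the next operator that could achieve the same goal.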
`
The final story plan uses four GPS operators: establishing-negotiate, break-down, threaten-renewed-violence, and pre-emptive-rescue. IDIC illustrates GPS's output using four scenes selected from its archive of digital video. Figure 1 contains six stills from a video that IDIC created automatically in response to the query above. The stills are numbered in the temporal order of the video produced by IDIC. The lower-left corner of Figure 1 is a "road map" describing how GPS planned out the four scenes that constitute the final video. The six small panels in the lower left are reproductions of the six stills at the top of the figure. The connections between the scenes represented by the stills are shown in the lower left as a sequence of arrows, each labeled with the GPS operator which was used to link two scenes together. Figure 1 is thus both a representation of the video produced by IDIC and a summary of how the GPS operators "explain" the connections between the different scenes in the video.
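The scene-selection step, in which one archived scene is chosen for each operator in the story plan, can be sketched as follows. The sketch assumes that each archived clip carries a set of annotations naming the story actions it can illustrate. The paper does not specify IDIC's annotation schema or its selection policy, so the first-unused-match rule, the clip identifiers, and the timings below are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    clip_id: str            # hypothetical identifier for an archived segment
    start: float            # in-point, seconds into the source trailer
    end: float              # out-point, seconds into the source trailer
    annotations: frozenset  # story actions this clip can illustrate


def assemble(plan, archive):
    """Return an edit list with one unused, suitably annotated clip per plan
    step, in plan order; raise LookupError if some step cannot be illustrated."""
    used = set()
    edit_list = []
    for action in plan:
        clip = next(
            (c for c in archive if action in c.annotations and c.clip_id not in used),
            None,
        )
        if clip is None:
            raise LookupError(f"no clip annotated with {action!r}")
        used.add(clip.clip_id)
        edit_list.append(clip)
    return edit_list


# Hypothetical archive entries; ids and timings are made up for illustration.
ARCHIVE = [
    Clip("trailer-a/scene-2", 4.0, 7.5, frozenset({"establishing-negotiate"})),
    Clip("trailer-b/scene-5", 12.0, 15.0, frozenset({"break-down", "escalate"})),
    Clip("trailer-c/scene-1", 0.0, 3.0, frozenset({"threaten-renewed-violence"})),
    Clip("trailer-d/scene-4", 9.0, 13.5, frozenset({"pre-emptive-rescue"})),
]

plan = ["establishing-negotiate", "break-down",
        "threaten-renewed-violence", "pre-emptive-rescue"]
for clip in assemble(plan, ARCHIVE):
    print(clip.clip_id, clip.start, clip.end)
```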
`
3. Generating Trailers with GPS
`
3.1 Why Star Trek: The Next Generation Trailers?
`
We chose to analyze, represent, and generate trailers of the popular syndicated television series Star Trek: The Next Generation for several reasons: its characters and stories all take place within one limited, yet rich, narrative universe; and there is a practice among Star Trek fans of re-editing shows and generating new stories within the narrative universe of Star Trek, a practice which has been studied by researchers [Jenkins 1992]. Furthermore, because they are short, trailers are a tractable object of study for both practical (disk space) and theoretical (narrative complexity) reasons.
`
3.2 Representing Media: Audio and Video in STTNG Trailers
`
The first step in making a video generator is to analyze the structure of what is to be generated. For STTNG trailers, as with most videos, the main structural decomposition is into separate video and audio tracks. These tracks can be broken down into logical segments: for the video, scenes separated by cuts; for the audio, dialogue segments separated by pauses.
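Splitting the audio track into dialogue segments separated by pauses can be sketched as below. The sketch assumes that word-level timings are already available, for example from a hand-aligned transcript. The 0.5-second pause threshold and the timings in the example are illustrative assumptions, not values from the paper.

```python
def segment_by_pauses(words, min_pause=0.5):
    """Group (start, end, text) word timings into dialogue segments,
    starting a new segment whenever the silence before a word lasts
    at least min_pause seconds."""
    segments = []
    for start, end, text in sorted(words):
        if segments and start - segments[-1]["end"] < min_pause:
            current = segments[-1]
            current["end"] = max(current["end"], end)
            current["text"] += " " + text
        else:
            segments.append({"start": start, "end": end, "text": text})
    return segments


# Illustrative timings (seconds) for two lines of dialogue.
words = [
    (0.0, 0.2, "You"), (0.25, 0.4, "are"), (0.45, 0.5, "a"), (0.55, 1.0, "traitor!"),
    (2.0, 2.1, "It"), (2.15, 2.25, "is"), (2.3, 2.35, "a"), (2.4, 2.6, "good"),
    (2.65, 2.8, "day"), (2.85, 2.95, "to"), (3.0, 3.3, "die."),
]
for seg in segment_by_pauses(words):
    print(seg["start"], seg["end"], seg["text"])
```

Finding scenes on the video track by detecting cuts is the visual counterpart of this step. It would need frame-level analysis, which is outside this sketch.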
`
In analyzing the structure of the dialogue segments we found an unexpected result: a single STTNG trailer can be decomposed into two separate, yet coherent, trailers which have no scenes in common. For example, as shown below, by annotating a single trailer according to who is speaking in each scene, we are able to make two trailers: a trailer in which only the narrator speaks, and a trailer in which only the characters (Duras, Worf, and Picard) speak.
`
Narrator: On an all new episode of Star Trek the Next Generation...
Duras: You are a traitor!
Narrator: Worf is accused of treason and faces a Klingon death penalty.
Worf: It is a good day to die.
Narrator: His enemies are hiding the truth that could free him.
Picard: You will not execute a member of my crew.
Narrator: Now Picard must risk his life to defend Worf's innocence
Narrator: on the next all new episode of Star Trek the Next Generation.
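The speaker-based decomposition of the trailer above can be sketched in a few lines of Python. The sketch assumes that each dialogue segment is stored as a (speaker, line) pair. This representation is hypothetical. In practice each segment would also need in-points and out-points into the video track, and those are omitted here.

```python
# Speaker-annotated dialogue segments of the trailer above, in temporal order.
TRAILER = [
    ("Narrator", "On an all new episode of Star Trek the Next Generation..."),
    ("Duras", "You are a traitor!"),
    ("Narrator", "Worf is accused of treason and faces a Klingon death penalty."),
    ("Worf", "It is a good day to die."),
    ("Narrator", "His enemies are hiding the truth that could free him."),
    ("Picard", "You will not execute a member of my crew."),
    ("Narrator", "Now Picard must risk his life to defend Worf's innocence"),
    ("Narrator", "on the next all new episode of Star Trek the Next Generation."),
]


def split_by_speaker(segments, narrator="Narrator"):
    """Partition annotated segments into a narrator-only trailer and a
    characters-only trailer, preserving temporal order within each."""
    narrator_cut = [s for s in segments if s[0] == narrator]
    character_cut = [s for s in segments if s[0] != narrator]
    return narrator_cut, character_cut


narrator_cut, character_cut = split_by_speaker(TRAILER)
for speaker, line in character_cut:
    print(f"{speaker}: {line}")
```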
`
These two different trailers also elucidate the relationship between the separate audio and video tracks of the movie. Even with only the audio track, the narrator's trailer tells a coherent story in which the characters are named (Worf, Picard, enemies) and the action and basic conflict are described. The characters' trailer relies far more on the video track for its coherence. Characters are identified by being seen rather than being spoken of, since they predominantly speak in the first or second person rather than being spoken about in the third person as in the narrator's trailer. Sentences often contain deictic references which can only be resolved visually. For example, in another trailer Data says "You cannot survive in this," a sentence that relies on the video to fill in what "this" refers to. And the main action and conflict of the story are depicted in the video rather than described in the audio. Given these results, one can postulate two theoretical extremes into which a trailer can be decomposed: a trailer which is audio only and a trailer which is silent.

There are relevant historical examples for both theoretical extremes. For audio-only narrative the examples are numerous, ranging from pre-literate oral storytelling and epic poetry to radio plays. For video-only narrative there are interesting precedents from theater and film. The "dumb show" performed in Hamlet III.ii reminds us of the theatrical practice of pantomime stories: a "dumb show" functioned as a sort of "trailer" for a play by silently enacting its plot in short form before the play began. The triumphs of Chaplin and others from the early days of cinema provide us with a rich tradition of silent visual narratives. In its purest non-verbal form, this corpus would encompass scenes which did not use "title-cards" to provide narrative information.
`
In the case of audio-only narrative, one can imagine a trailer in which the representation of action and story is wholly dependent on a representation of the dialogue. The task would then be one of natural language text interpretation (ignoring for the moment the use of non-speech audio in the trailer). In the case of video-only narrative, the representation of action and story would rely wholly on the representation of the visual events and transitions in the video, that is, of story elements which are intelligible without any use of sound. An ideal representation of a trailer would