`
`Jonathan Dakss, Stefan Agamanolis, Edmond Chalom, and V. Michael Bove, Jr.
`
`MIT Media Laboratory
`20 Ames Street, Cambridge, Massachusetts 02139 USA
`
`ABSTRACT
`
Hyperlinked video is video in which specific objects are made selectable by some form of user interface, and
the user’s interactions with these objects modify the presentation of the video. Identifying and tracking the
objects remains one of the chief difficulties in authoring hyperlinked video; we solve this problem through the
use of a video tracking and segmentation algorithm that uses color, texture, motion, and position parameters.
An author uses a computer mouse to scribble roughly on each desired object in a frame of video, and the
system generates a segmentation mask for that frame and following frames. We have applied this technique
in the production of a soap opera program, with the result that the user can inquire about purchasing
clothing and furnishings used in the show. We will discuss this and other uses of the technology, describe our
experiences in using the segmentation algorithm for hyperlinked video purposes, and present several different
user-interface methods appropriate for hyperlinked video.
`
Keywords: hyperlinked video, hypertext, video object segmentation, video object tracking, digital television
`
`1. INTRODUCTION
`
`Users of the World Wide Web are familiar with the concept of hyperlinks, in which “clicking” on specially
`tagged words or graphics in a document retrieves other documents, or perhaps modifies the current one.
`The idea of applying the same kind of interaction in video programs has often been discussed as a desirable
possibility: consider for instance a fashion program in which clicking on an article of clothing provides
`information about it, or a nature documentary in which children click on plants and animals in the scene to
`learn more about them. Playback of such material is well within the capabilities of typical digital television
`decoders with graphical overlay capability, but creating it has posed a challenge because of the difficulty of
`identifying and tracking the selectable regions in every frame, by either manual or automatic methods.
`
`We have developed a method for tracking and segmenting video objects that simplifies the process of creating
`hyperlinked video. The author of the video uses a computer mouse to scribble roughly on each desired object
`in a frame of video and the system generates full segmentation masks for that frame and for following and
`preceding frames until there is a scene change or an entrance of new objects. These masks label every pixel
`in every frame of the video as belonging to one of the regions roughly sketched out by the author at the
`beginning of the process. The author may then associate each region with a particular action (e.g. graphical
overlay, switching to a different video data stream, transmission of data on a back channel). During playback,
the viewer can select objects with a mouse or an analogous device, such as an enhanced TV remote control
with point-and-click capability. In our demonstrations, we use a video projector that can identify the location
of a laser pointer aimed at its screen.
Further author information:
J.D.: E-mail: dakss@media.mit.edu
S.A.: E-mail: stefan@media.mit.edu
E.C.: E-mail: chalom@media.mit.edu
V.M.B. (correspondence): E-mail: vmb@media.mit.edu
`
`
`
`
`We apply a novel method of using color, texture, motion, and position features to segment and track video
`objects. Our system uses a combination of these attributes to develop multi-modal statistical models for
`each region as roughly defined by the author. The system then creates the segmentation masks by finding
`areas that are statistically similar and tracking them throughout a video scene. The authoring tool and the
playback system are supported by Isis, a programming language specially tailored for object-based media.
`
We utilized this system to create HyperSoap, a hyperlinked video program that resembles television serial
dramas (known as “soap operas”) in which the viewer can select props, clothing, and scenery to see purchasing
information for the item, such as its price and retailer. We produced this program entirely from scratch,
not starting with pre-made video material, in order to learn more about how the production (scripting,
shooting, editing) of hyperlinked video would differ from that of traditional television programming. We
also learned a great deal about how people interact with hyperlinked video, and based our design of several
modes of user interaction on this information.
`
`In the following section we briefly describe existing hyperlinked video software. In section 3, we discuss our
`authoring system and how it differs from other approaches. In sections 4 and 5, we describe HyperSoap in
`more detail and discuss several issues that we confronted in the process of designing it. Section 6 summarizes
`our work and presents other applications for this technology.
`
`2. RELATED WORK
`
Several companies have announced products for authoring hyperlinked video. VisualSHOCK MOVIE from
the New Business Development Group of Mitsubishi Electric America, Inc. [1] concentrates on playback from
the Web or a local source (e.g. CD-ROM, DVD-ROM). The authoring tool, called the Movie MapEditor,
requires specifying a rectangular bounding box around the location of a desired object in the starting and
ending frames of a video sequence, and a tracking algorithm estimates the position of the “hot button” in
intermediate frames. Manual tracking by the operator is also supported. HotVideo from IBM’s Internet
Media Group [2] has a similar approach, but supports a larger set of geometrical object shapes. Intermediate
locations of an object may be specified in multiple keyframes or located by a tracking algorithm. Veon’s
V-Active [3] is another current authoring tool that incorporates an object-tracking algorithm.
`
Although satisfactory in some situations, these methods for associating links to objects are too coarse for
many varieties of video sequences. Particular difficulties include associating links to objects that overlap or
are enclosed within larger objects, such as a window of a house and the house itself, and perhaps a large
tree partially obscuring both. To enable the association of links with arbitrarily shaped regions that may be
changing over time, a system that can differentiate among objects with greater precision is needed. Many such
segmentation systems exist that employ an automatic process to identify objects in video or imagery with
block-level or pixel-level precision. For example, the VisualSEEk system [4] automatically identifies regions
based on color and texture distribution. This and other automatic systems are limited by the quality of the
segmentation of objects and the relevance of the identifiable objects to the needs of the author. Precise and
high-resolution segmentation masks can be an important factor in playback systems for hyperlinked video
as well, since they could be used in graphical overlays to indicate the presence of hyperlinks or to highlight
objects when they are selected by the viewer.
`
`3. OUR APPROACH
`
We have developed a novel segmentation system that classifies every pixel in every frame of a video sequence
as a member of an object or group of objects [5,6]. The author of the video defines what the objects are by
providing the algorithm with “training data” in the form of rough scribbles on each object in one frame of a
sequence. The system creates a statistical model for each object based on color, texture, motion, and position
features calculated from the training data. These models are then used to classify each of the remaining
pixels in the frame as a member of the object with the highest statistical similarity. By tracking the training
data forward and backward through the sequence and applying the same operations, several seconds’ worth
of video can be segmented. In the following sections, we describe this process in greater detail.
`
`
`
`
`
`
`Figure 1. A frame from the HyperSoap video, and an example of “training data” scribbled by the author.
`
`3.1. Object identification
`
`In the first step of the segmentation process, the author selects a single representative frame from the
`sequence (typically one in which the important objects are fully visible) and highlights representative pixels
`inside of each object using a simple drawing tool. This information serves as the “training data” for the
`segmentation algorithm (Figure 1).
`
The system also estimates the location of the training data pixels within each of the remaining frames
in the sequence using a block-matching tracking scheme. When this stage is complete, pixels in each frame
have been classified as corresponding to the objects defined by the author.
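By way of illustration, the core of such a block-matching step might look like the Python sketch below; the block size, search range, and sum-of-absolute-differences cost are assumptions made for the sketch rather than the parameters of the actual system, and single-channel intensity frames are assumed.

    import numpy as np

    def track_pixel(prev_frame, next_frame, row, col, block=7, search=8):
        """Estimate where the pixel at (row, col) in prev_frame moved to in
        next_frame by matching a small intensity block within a search window."""
        h = block // 2
        H, W = prev_frame.shape
        # Reference block around the training pixel (clamped to the frame edges).
        r0, r1 = max(row - h, 0), min(row + h + 1, H)
        c0, c1 = max(col - h, 0), min(col + h + 1, W)
        ref = prev_frame[r0:r1, c0:c1].astype(np.float64)
        best_cost, best_pos = np.inf, (row, col)
        for dr in range(-search, search + 1):
            for dc in range(-search, search + 1):
                rr0, cc0 = r0 + dr, c0 + dc
                rr1, cc1 = rr0 + ref.shape[0], cc0 + ref.shape[1]
                if rr0 < 0 or cc0 < 0 or rr1 > H or cc1 > W:
                    continue
                cand = next_frame[rr0:rr1, cc0:cc1].astype(np.float64)
                cost = np.abs(ref - cand).sum()  # sum of absolute differences
                if cost < best_cost:
                    best_cost, best_pos = cost, (row + dr, col + dc)
        return best_pos  # estimated location of the training pixel in next_frame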
`
`3.2. Feature calculation
`
`In the next stage of the process, the system calculates a multi-dimensional feature vector for every pixel
`in the video sequence. Unlike many segmentation systems that use only one or two different features to
`help distinguish objects in a video sequence, our system allows the author to select any combination from a
`variety of different features that estimate the color, motion, and texture properties of each pixel. The row
`and column positions of a pixel in the frame may also be used as features. Combining the data of several
`features in this way yields a more robust segmentation whose quality is less affected by misleading data from
`a single feature.
`
Three primary color spaces, RGB, YIQ, and LAB, may be used to indicate the color features of pixels,
although other spaces are possible as well. Motion features may be calculated with two different constant
brightness constraint (or “optical flow”) algorithms, one proposed by Kanade and Lucas [7] and the other
by Horn and Schunck [8]. The system incorporates two different texture classification models. One is a
local-statistics measure which computes intensity mean and standard deviation across pixel blocks at several
scales. A more complex technique, known as simultaneous autoregressive modeling (SAR) [9], uses a linear
prediction method in which a pixel’s value is estimated from those of its neighbors. Here the weighting
coefficients of the neighbors and the error term are the texture features.
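To make the feature vector concrete, the sketch below assembles color (YIQ, via the standard NTSC transform), position, and a two-scale local-statistics texture measure for a single pixel; the two scales are arbitrary choices for the sketch, and the motion and SAR features are omitted for brevity.

    import numpy as np

    def pixel_features(rgb_frame, row, col):
        """Build a feature vector for one pixel: color (YIQ), position, and
        a local-statistics texture measure at two neighborhood scales."""
        r, g, b = rgb_frame[row, col].astype(np.float64)
        # RGB -> YIQ using the standard NTSC transform.
        y = 0.299 * r + 0.587 * g + 0.114 * b
        i = 0.596 * r - 0.274 * g - 0.322 * b
        q = 0.211 * r - 0.523 * g + 0.312 * b
        gray = rgb_frame.astype(np.float64).mean(axis=2)
        texture = []
        for half in (2, 4):  # 5x5 and 9x9 neighborhoods
            r0, c0 = max(row - half, 0), max(col - half, 0)
            block = gray[r0:row + half + 1, c0:col + half + 1]
            texture += [block.mean(), block.std()]  # intensity mean and std
        return np.array([y, i, q, row, col] + texture)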
`
`3.3. Pixel classification
`
`The next step in the process is to build statistical models for each object based on the feature information
`calculated for the tracked “training data” pixels. The system then compares the feature information of each
`remaining pixel to these models and labels the pixel as a member of the region to which it bears the greatest
`statistical similarity. A certainty measurement is also included in the classification of each pixel.
`
`Because only statistical methods are employed to classify pixels, it is likely that there will be small aberrations
`
`
`
`
`
`
`Figure 2. The output segmentation mask for the frame shown in Figure 1.
`
`in the output, such as a small group of incorrectly classified pixels surrounded by properly classified pixels.
`To help rectify these anomalies, any pixel that was classified with a certainty factor below a specific threshold
`is reclassified according to a “K-nearest neighbor” strategy that reassigns the pixel to the object that is most
`popular among its neighboring pixels. Once this stage is complete, the final output of the system is a
segmentation mask that associates every pixel in the entire video sequence with a particular object (Figure 2).
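The cleanup pass might be sketched as follows, with a 3 x 3 neighborhood majority vote standing in for the K-nearest-neighbor strategy; the certainty threshold is left as a parameter.

    import numpy as np
    from collections import Counter

    def refine_mask(labels, certainty, threshold):
        """Reassign low-certainty pixels to the label most popular among
        their eight neighbors, in the spirit of a K-nearest-neighbor vote."""
        H, W = labels.shape
        out = labels.copy()
        for r in range(H):
            for c in range(W):
                if certainty[r, c] >= threshold:
                    continue
                neighbors = [labels[rr, cc]
                             for rr in range(max(r - 1, 0), min(r + 2, H))
                             for cc in range(max(c - 1, 0), min(c + 2, W))
                             if (rr, cc) != (r, c)]
                out[r, c] = Counter(neighbors).most_common(1)[0][0]
        return out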
`
`3.4. Linking objects with actions
`
`The last step in the authoring process is to link the objects in the video with the actions that should be
`taken when they are selected by the viewer, such as the display of additional information or a change in
`the presentation of the video. Selecting an object might cause a graphic overlay to appear containing a
description of the object, or it might cause a new video clip to be displayed. For example, in HyperCafe [10]
`the viewer can navigate through various video clips of conversations occurring in a cafe by selecting actors
`in the video or icons that appear at certain times during the presentation.
`
Linking objects with actions involves creating a system for associating each object with some kind of data,
such as a URL, a procedure, an image, or a structure containing several separate pieces of data. It also
involves writing the computer program or script that will interpret that data to perform the desired action.
Associating objects with data can be accomplished in a number of ways, including information attached
separately to each frame of the video or a central database that can be queried during playback. For
example, VisualSHOCK maintains an “anchor list,” accessible via a drop-down menu system, where data
structures containing link information are stored and later referenced by the VisualSHOCK movie player to
perform the desired action.
`
In our system, a simple text file database associates each object with important data, although other
strategies may be used. The playback system, written in the Isis programming language [11], includes
instructions on how to read and interpret the data in the database. When a viewer selects an object in the
video, the value of the corresponding pixel in the segmentation mask identifies the object. The playback
system retrieves the data associated with that object from the database and changes the presentation of
the video accordingly.
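For illustration, the essential lookup can be sketched in a few lines; the pipe-delimited file format and field names below are invented for the example rather than taken from our actual database.

    def load_object_db(path):
        """Parse a simple text database with one object per line in the
        (hypothetical) form 'id|name|price|retailer'."""
        db = {}
        with open(path) as f:
            for line in f:
                obj_id, name, price, retailer = line.strip().split("|")
                db[int(obj_id)] = {"name": name, "price": price,
                                   "retailer": retailer}
        return db

    def on_select(x, y, mask, db):
        """Map a viewer selection at screen position (x, y) to an object via
        the segmentation mask, then fetch that object's product data."""
        obj_id = int(mask[y, x])
        return db.get(obj_id)   # None if the pixel belongs to no linked object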
`
`4. HYPERSOAP
`
We used our authoring tool to create HyperSoap, a four-minute hyperlinked video drama closely patterned
`after daytime television soap operas. The viewer watches this show on a large projection screen with the
`knowledge that everything in the scene is for sale, including clothes, props and furnishings. Instead of using
a mouse, the viewer selects objects with a laser pointer (Figure 3). Pointing the laser at the screen highlights
selectable objects, and keeping the pointer fixed on one object for a short period selects it (indicated by a
change in color of the highlight).
`
`
`
`
`
`
Figure 3. Viewers interact with HyperSoap by aiming a laser pointer at objects in a large screen video
projection.
`
Depending on the mode of playback and the preferences of the viewer, the playback system will display
information about the selected object in a number of different ways. In one particular mode, the system
waits for an appropriate point to interrupt the video, typically when an actor has finished speaking his
line, and displays a separate screen containing a detailed still image of the selected product along with a
text box that includes the product’s brand name, description, and price. In another mode, appropriate
for a broadcast scenario or when the viewer desires more instant feedback, an abbreviated information box
appears immediately, without pausing the video, and then fades away after a few seconds (Figure 4). If
requested by the viewer, a summary of all the products that were selected is shown at the end of the video.
`
`We oversaw every aspect of the production of HyperSoap, including scriptwriting, storyboarding, shooting
`and editing. We also supervised the creation of a unique musical soundtrack in which individual pieces,
`composed to match the mood of a particular part of the scene, are capable of being seamlessly looped and
`cross-faded. When the video is paused to display product information, the music continues to play in the
`background, lessening the impact of the interruption on the continuity of the video.
`
The pixel-level segmentation of the 40 products appearing in HyperSoap was good enough to permit the
use of the segmentation mask for highlighting objects. A total of 45 shots were processed through the
system individually, where the number of linkable objects per shot ranged from 10 to 20. We found that to
maintain a consistently high quality of segmentation, we needed to provide new training data to the system
approximately every 30 frames, or one second of video, in order to minimize the propagation of error when
estimating the location of the trained pixels in unmarked frames. However, this still provided an exceptional
level of automation of the segmentation process; the author scribbled on an average of 5000 pixels in each
training frame, meaning that for each 30-frame sequence the algorithm was required to classify 99.8 percent
of the total pixels.
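To put those figures in perspective, assuming for the sake of the arithmetic SIF-resolution frames of 352 x 240 pixels (an assumption; the frame size is not stated here): a 30-frame span then contains 30 x 84,480, or roughly 2.5 million, pixels, and 5000 author-labeled pixels amount to 5000 / 2,534,400, or about 0.2 percent, leaving the remaining 99.8 percent for the algorithm to classify.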
`
`
`
`
Figure 4. A screen grab of HyperSoap during playback in which an object has been selected; the product
overlay shown reads “Marina ID. Earrings, $10.” The crosshairs have been added to indicate where the
viewer is aiming the laser pointer.
`
`5. DESIGNING HYPERSOAP
`
`In this section, we describe some of the issues we were confronted with throughout the process of designing
`HyperSoap, such as how viewers should interact with our video, how the presence of hyperlinks should be
`indicated, and how hyperlink actions should be executed. In addition we discuss ways in which the production
`of hyperlinked video might differ from that of traditional programs.
`
`5.1. Venue and mode of interaction
`
`One of the most critical issues in designing hyperlinked video is one of venue. The video might be presented
`in a window on a computer monitor or on a standard television set, or perhaps in a large screen projection.
`The viewer might be sitting at a desk at work or on a sofa at home. Several different kinds of devices might
`be used to select objects and interact with the video. In some ways, using a mouse on a PC may be natural
`for hyperlinked video, considering that people are familiar with using a mouse to activate hyperlinks on the
`World Wide Web. However, the desktop is not always an ideal place to view video. Many of the genres
`of content suitable for hyperlinked video are those that people are accustomed to watching in their living
`rooms.
`
A television viewing situation may be more natural for certain kinds of programming, but devices enabling
viewers to interact with the video are less developed in this environment than in other venues. Since viewers
are comfortable using a remote control device to browse through channels of programming on their television,
it makes sense that they might also use this device to browse through hyperlinks. For example, WebTV’s
WebPIP [12] users can press a button on their remote control when an icon appears on the screen indicating
the presence of a link. They can then choose to display the web page content referenced by the link or
archive the link for later viewing. However, this system allows for only one link per segment of video. We
envision several different kinds of devices that could be used to select from several links that are available
simultaneously. For example, a viewer could cycle through available links with a button on a remote control,
and then activate a link by pressing another button. Other approaches might incorporate a position touch
pad or a joystick. Alternatively, embedding an inertial sensor inside a remote control would allow the viewer
to move the remote in the air to control the position of a cursor on the screen.
`
`We felt the usual computer screen and mouse style of interaction would not be appropriate for the kind of
`content we were developing in HyperSoap. We designed our playback system with the intent of simulating
`a future television viewing scenario, one in which a device with significant computational ability (e.g. a
`set-top box, digital television receiver or DVD player) would be capable of mediating viewer interactions and
`modifying the presentation based on the author’s design. Our prototype viewing system includes a large
`screen projection driven by a workstation running Isis, the programming language in which the playback
`
`
`
`
`software is written.
`
`As discussed earlier, the viewer selects objects with a hand-held laser pointer. A small video camera attached
`to the video projector enables the playback system to sense the location of the laser dot on the projection.
`This is possible because the red dot generated by the laser is always brighter than the brightest possible
`image our projector can produce. Since the camera’s image is never perfectly aligned to the projection, a
`coordinate transformation is applied to correct for any displacement or keystone distortion. The parameters
`for this transformation need to be calculated only once by a special calibration program after the position
of the projector/camera is fixed relative to the screen.
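The detection and correction steps can be sketched as follows, assuming the dot is found by simple brightness thresholding on the camera image and the correction is a standard 3 x 3 planar homography computed by the calibration step; the threshold value is illustrative.

    import numpy as np

    def find_laser_dot(camera_frame, threshold=250):
        """Locate the laser dot as the centroid of camera pixels brighter
        than anything the projector itself can produce."""
        ys, xs = np.nonzero(camera_frame >= threshold)
        if len(xs) == 0:
            return None                    # no dot visible this frame
        return xs.mean(), ys.mean()

    def camera_to_projection(H, x, y):
        """Correct displacement and keystone distortion with a 3x3
        homography H, computed once from calibration correspondences."""
        p = H @ np.array([x, y, 1.0])
        return p[0] / p[2], p[1] / p[2]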
`
`5.2. Indication of hyperlinks
`
`When displaying hyperlinked video, it is important to indicate the presence of hyperlinks to viewers in a
`manner suited to the application and to the goals of the viewing experience. In most cases, the presence of
`hyperlinks should be readily apparent but not overly distracting. If possible, the preferences of the viewer
`should be addressed as well.
`
`Many of the existing hyperlinked video playback tools that employ a pointing device for interaction indicate
`the presence of hyperlinks by changing the shape of the cursor when it is positioned over a hyperlinked
`object. When this happens, a name or description for the link may be displayed in an area of the computer
`screen. For example, IBM’s HotVideo software changes the cursor to an icon that corresponds to the media
`type of the information contained in the link. HotVideo also displays an icon at all times during playback
`that changes color to indicate when hyperlink opportunities exist. Similarly, the WebPIP system displays a
`small icon in a graphic overlay when a hyperlink opportunity is available.
`
`Other approaches have been used as well. HotVideo is capable of displaying wireframe shapes around the
`hyperlinked objects in the video, although this method can cause the video to become quite cluttered if there
`are several hyperlinks in a single frame. It can also be confusing to the viewer if a linked object occupies a
`large portion of the video frame or surrounds other objects. A HotVideo author can also choose to indicate
`hyperlinks by changing the brightness or tint of the pixels inside the wireframe shape, or by applying other
`transformations.
`
In developing HyperSoap, we wanted every object to be selectable all the time. If the viewer is given this
knowledge prior to watching the video, then indicating the presence of hyperlinks is not necessary. When
nobody is interacting, the video looks fairly ordinary, enabling the viewer to watch the presentation in
a more passive manner without any distraction that might arise from indicating hyperlink opportunities.
`When the viewer selects an object with the laser pointer, the playback system utilizes the information in
`the segmentation mask to highlight all of the pixels that comprise the object. The highlight remains on the
`object for a few seconds after the laser pointer is turned off and while the corresponding product information
`is presented, and then fades out. We also found that subsampling the segmentation mask by a factor of two
`had a minimal effect on the quality of the rendered object highlights.
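A mask-based highlight of this kind might be rendered as in the sketch below; the tint color, blend factor, and nearest-neighbor upsampling of the subsampled mask are illustrative choices (even frame dimensions are assumed).

    import numpy as np

    def highlight(frame, mask_half, obj_id, tint=(255, 255, 0), alpha=0.4):
        """Tint the pixels of the selected object. mask_half is the
        segmentation mask subsampled by two; it is upsampled here by
        pixel repetition (frame dimensions assumed even)."""
        mask = np.repeat(np.repeat(mask_half, 2, axis=0), 2, axis=1)
        sel = mask == obj_id
        out = frame.astype(np.float64)
        out[sel] = (1 - alpha) * out[sel] + alpha * np.asarray(tint, np.float64)
        return out.astype(np.uint8)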
`
`5.3. Initiation of hyperlink actions
`
Unlike traditional hypertext documents, hyperlinked video has a temporal nature that raises interesting
issues regarding the timing of actions associated with selected hyperlinks. In many hyperlinked video
playback systems, clicking on a link causes a new web page to be loaded immediately in a separate window. However,
`displaying lengthy or detailed information while the video continues to run may cause the viewer to feel
`distracted from the video content. Likewise, any delay in showing the link information may blur the context
`of the information such that it is no longer relevant to the video presentation. Thus, it is important to
`consider how and when hyperlink actions should occur in order to best serve the goals of the application. It
`may be important to maintain the continuity of the video while still presenting the desired link information
`to the viewer in a timely and meaningful fashion.
`
`From our experimentation with HyperSoap, we found that viewers have a variety of preferences regarding
`the timing with which product information is displayed when an object is selected. When a viewer selected
`
`
`
`
`a linked object in an early version of HyperSoap, the video was interrupted immediately to show the link
`content. After a few seconds, the link content would disappear and the video would resume from the point it
`was interrupted. While some users enjoyed the instant response from the system, others found it somewhat
`troubling that their interactions would cut off actors in the middle of delivering lines or interrupt other
`important moments within the program. Also, viewers tended to forget the context of the video content
`before the interruption and found themselves somewhat lost in the plot when it resumed.
`
`Our next implementation yielded a more desirable manner of interaction: when a viewer selected a hyperlink,
`the system waited until the video reached a break in the action, typically after an actor finished speaking a
`line, before pausing the video and showing the link information. This yielded a more favorable reaction from
viewers, although the delay in displaying the information caused many first-time users to think the playback
`mechanism was not working properly. Others who felt more interested in the link information expressed that
`they would have preferred instant feedback.
`
`In the latest implementation, there are three modes from which to choose. In the first, link information is
`displayed at breakpoints as described above, but with the addition that a selected object remains highlighted
`until the breakpoint is reached in order to indicate to the viewer that the system hasn’t “forgotten” about
`it. In the second mode, abbreviated product information is displayed immediately in a graphic overlay when
`an object is selected, without pausing the video. In the last mode, no information about selected products
`is displayed until the very end of the show.
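The dispatch among these modes can be as simple as the following sketch; the mode names and callback structure are an illustration rather than the actual Isis implementation.

    from enum import Enum

    class Mode(Enum):
        PAUSE_AT_BREAK = 1    # detailed info at the next natural break
        OVERLAY_NOW = 2       # brief overlay at once; video keeps playing
        SUMMARY_AT_END = 3    # collect selections for an end-of-show summary

    def handle_selection(obj, mode, pending, summary, show_overlay):
        """Route a viewer selection according to the playback mode."""
        if mode is Mode.PAUSE_AT_BREAK:
            pending.append(obj)     # object stays highlighted until the break
        elif mode is Mode.OVERLAY_NOW:
            show_overlay(obj)       # abbreviated info, no pause
        else:
            summary.append(obj)     # shown only at the end of the video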
`
`5.4. Production design
`
`HyperSoap raises several interesting issues regarding the differences between traditional television program-
`ming and hyperlinked video content. A traditional television soap opera is designed in such a way that
`simply adding hyperlinks and allowing interruptions would likely detract from its effectiveness because they
`would obstruct the flow of the story.
`
`In designing the content for HyperSoap, we had to move away from a format where the story is central and
`any product placement or viewer interaction are decorations, to a new format where all of these components
`are equally important and dependent on each other for the survival of the presentation. The knowledge that
`everything would be for sale and that viewers would be interacting was part of the program’s design from
`the earliest stages of planning. This entailed several changes in the way the scene was scripted, shot, and
`edited when compared to a traditional TV program.
`
`For example, shots were designed to emphasize the products for sale while not diverting too much attention
`from the flow of the story. Certain actions are written into the script in order to provide natural opportunities
`for close-ups of products which would have otherwise been difficult to see or select with our interaction device.
`Similarly, the scene is edited so that shots are either long enough or returned to often enough for viewers to
`spot products and decide to select them. The story itself involves the products in ways that might increase
`their value to the consumer. Integrating the story with the products makes interruptions to show product
`information less jarring overall. The knowledge that everything is for sale and that the program was designed
for the dual purpose of entertainment and shopping motivates the viewer to “stay tuned” and to interact
`with the products.
`
`6. CONCLUSIONS
`
The pixel-level segmentation enabled by our approach was critical in HyperSoap because the scene contained
many oddly-shaped and overlapping objects. Identifying objects by scribbling roughly on key frames in the
`sequence was an effective way of dealing with complex shapes and scenes containing large numbers of objects.
`The pixel-level tracking of objects helped automate the authoring process since many of these objects move
`rapidly and change shape as a result of both camera movement and the action in the scene.
`
`We are currently exploring how hyperlinked video capabilities might be utilized in educational applications.
`For example, children could collect specimens and learn more about particular aspects of nature by selecting
`
`
`
`
`animals and foliage in a “safari” program. Hyperlinked video might also be useful in training applications,
`especially those that involve complicated tools and a strong temporal component. Consider a presentation of
`a surgical procedure in which selecting objects would allow viewers with different backgrounds to learn more
`about the instruments and people involved or follow alternative edits of the footage that emphasize their
`individual interests or preferences. Stories in which there is an emphasis on interactions with physical objects
`can also benefit from hyperlinked video. For example, we implemented a different version of HyperSoap in
`which selecting objects reveals hidden plot details related to those objects.
`
`The tools we have developed can be applied to these and other kinds of hyperlinked video programming as
`well. Although our segmentation system can handle many different types of video content, it is particularly
well-suited to highly detailed sequences containing complex object shapes and movements. Our playback
`system was specially tailored to fulfill the specific goals of the HyperSoap application, but many of the ideas
`behind its design are applicable to other kinds of content. In addition, the creation of HyperSoap alerted us
`to many important aspects of the design of hyperlinked video in general, including production techniques,
`choice of venue, and methods for viewer interaction.
`
`ACKNOWLEDGMENTS
`
We would like to thank Michael Ponder, JCPenney, and the JCPenney Headquarters Production Staff for
`their invaluable assistance with the HyperSoap shoot. We would also like to thank Kevin Brooks, Paul
`Nemirovsky and Alex Westner for providing their scriptwriting, scoring and Foley sound expertise. Also a
`special thank you to JuliaAnn Lewis, Matthew Rupe and Andrew T. Chandler, for their tremendous acting
`talents. This research has been supported by the Digital Life Consortium at the MIT Media Laboratory.
`
`REFERENCES
`
1. Mitsubishi Electric America, Inc., VisualSHOCK MOVIE Web Site, http://www.visualshock.com.
2. IBM, HotVideo Web Site, http://www.software.ibm.com/net.media/solutions/hotvideo.
3. Veon, V-Active Web Site, http://www.veon.com.
4. J. R. Smith and S.-F. Chang, “Local color and texture extraction in spatial query,” in IEEE Proc. Int.
Conf. Image Processing, pp. III-1011–III-1016, 1996.
5. E. Chalom, Statistical Image Sequence Segmentation Using Multidimensional Attributes. PhD thesis,
MIT, Cambridge MA, 1998.
6. E. Chalom and V. M. Bove, Jr., “Segmentation of an image sequence using multi-dimensional image
attributes,” in IEEE Proc. Int. Conf. Image Processing, pp. II-525–II-528, 1996.
7. T. Kanade and B. D. Lucas, “An iterative image registration technique with an application to stereo
vision,” in Proc. DARPA Image Understanding Workshop, pp. 121–130, 1981.
8. B. K. P. Horn and B. G. Schunck, “Determining optical flow,” AI Memo 572, MIT Artificial Intelligence
Laboratory, April 1980.
9. A. K. Jain and J. Mao, “Texture classification and segmentation using multiresolution simultaneous
autoregressive models,” Pattern Recognition 25, pp. 173–188, February 1992.
10. N. Sawhney, D. Balcom, and I. Smith, “Authoring and navigating video in space and time,” IEEE
Multimedia 4, pp. 30–39, October 1997.
11. S. Agamanolis and V. M. Bove, Jr., “Multilevel scripting for responsive multimedia,” IEEE Multimedia
4, pp. 40–49, October 1997.
`
`12. WebTV Web Site, http://www.webtv.com.
`
`
`