HUMAN-COMPUTER INTERACTION
Kluwer Academic Publishers

AMAZON EX. 1011
Amazon v. CustomPlay
US Patent No. 9,124,950

Page i

PERSONALIZED DIGITAL TELEVISION

HUMAN-COMPUTER INTERACTION SERIES

VOLUME 6
Editors-in-Chief

John Karat, IBM Thomas Watson Research Center (USA)
Jean Vanderdonckt, Université Catholique de Louvain (Belgium)

Editorial Board

Gregory Abowd, Georgia Institute of Technology (USA)
Gaëlle Calvary, IIHM-CLIPS-IMAG (France)
John Carroll, Virginia Tech (USA)
Gilbert Cockton, University of Sunderland (United Kingdom)
Mary Czerwinski, Microsoft Research (USA)
Steve Feiner, Columbia University (USA)
Elizabeth Furtado, University of Fortaleza (Brazil)
Kristina Höök, SICS (Sweden)
Robert Jacob, Tufts University (USA)
Robin Jeffries, Sun Microsystems (USA)
Peter Johnson, University of Bath (United Kingdom)
Kumiyo Nakakoji, University of Tokyo (Japan)
Philippe Palanque, Université Paul Sabatier (France)
Oscar Pastor, University of Valencia (Spain)
Fabio Paternò, CNUCE-CNR (Italy)
Costin Pribeanu, National Institute for Research & Development in Informatics (Romania)
Marilyn Salzman, Salzman Consulting (USA)
Chris Schmandt, Massachusetts Institute of Technology (USA)
Markus Stolze, IBM Zürich (Switzerland)
Gerd Szwillus, Universität Paderborn (Germany)
Manfred Tscheligi, Center for Usability Research and Engineering (Austria)
Gerrit van der Veer, Vrije Universiteit Amsterdam (The Netherlands)
Shumin Zhai, IBM Almaden Research Center (USA)

The titles published in this series are listed at the end of this volume.
Personalized Digital Television

Targeting Programs to Individual Viewers

Edited by

Liliana Ardissono
Dipartimento di Informatica, Università di Torino, Italy

Alfred Kobsa
University of California, Irvine, CA, U.S.A.

and

Mark Maybury
Information Technology Division, The MITRE Corporation, Bedford, MA, U.S.A.

KLUWER ACADEMIC PUBLISHERS

NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW

`

eBook ISBN: 1-4020-2164-X
Print ISBN: 1-4020-2163-1

©2004 Kluwer Academic Publishers
New York, Boston, Dordrecht, London, Moscow

Print ©2004 Kluwer Academic Publishers
Dordrecht

All rights reserved

No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher.

Created in the United States of America

Visit Kluwer Online at: http://kluweronline.com
and Kluwer's eBookstore at: http://ebooks.kluweronline.com

TABLE OF CONTENTS

Preface  vii

Introduction  ix

Part 1: Electronic Program Guides  1

1. User Modeling and Recommendation Techniques for Personalized
   Electronic Program Guides  3
   Liliana Ardissono, Cristina Gena, Pietro Torasso,
   Fabio Bellifemine, Angelo Difino and Barbara Negro

2. TV Personalization System. Design of a TV Show Recommender
   Engine and Interface  27
   John Zimmerman, Kaushal Kurapati, Anna L. Buczak,
   Dave Schaffer, Srinivas Gutta and Jacquelyn Martino

3. Case-Studies on the Evolution of the Personalized Electronic
   Program Guide  53
   Barry Smyth and Paul Cotter

4. Interactive Television Personalization. From Guides to Programs  73
   Derry O'Sullivan, Barry Smyth, David Wilson, Kieran McDonald
   and Alan F. Smeaton

5. Group modeling: Selecting a Sequence of Television Items to Suit a
   Group of Viewers  93
   Judith Masthoff

6. Categorization of Japanese TV Viewers Based on Program Genres
   They Watch  143
   Yumiko Hara, Yumiko Tomomune and Maki Shigemori

Part 2: Broadcast News and Personalized Content  175

7. Personalcasting: Tailored Broadcast News  177
   Mark Maybury, Warren Greiff, Stanley Boykin, Jay Ponte,
   Chad McHenry and Lisa Ferro

8. Media Augmentation and Personalization through Multimedia
   Processing and Information Extraction  203
   Nevenka Dimitrova, John Zimmerman, Angel Janevski,
   Lalitha Agnihotri, Norman Haas, Dongge Li, Ruud Bolle,
   Senem Velipasalar, Thomas McGee and Lira Nikolovska

9. ContentMorphing: A Novel System for Broadcast Delivery
   of Personalizable Content  235
   Avni Rambhia, Gene Wen, Spencer Cheng

Part 3: ITV User Interfaces  257

10. Designing Usable Interfaces for TV Recommender Systems  259
    Jeroen van Barneveld and Mark van Setten

11. The Time-Pillar World. A 3D Paradigm for the New Enlarged TV
    Information Domain  287
    Fabio Pittarello
Chapter 8

Media Augmentation and Personalization Through
Multimedia Processing and Information Extraction

NEVENKA DIMITROVA1, JOHN ZIMMERMAN2, ANGEL JANEVSKI1,
LALITHA AGNIHOTRI1, NORMAN HAAS3, DONGGE LI4, RUUD BOLLE3,
SENEM VELIPASALAR3, THOMAS MCGEE1 and LIRA NIKOLOVSKA5

1Philips Research, 345 Scarborough Rd., Briarcliff Manor, NY 10510, USA.
e-mail: {Nevenka.Dimitrova,Angel.Janevski,Lalitha.Agnihotri,Tom.McGee}@philips.com
2Human-Computer Interaction Institute, Carnegie Mellon, Pittsburgh, PA, USA.
e-mail: johnz@cs.cmu.edu
3IBM T.J. Watson, 30 Saw Mill River Road, Hawthorne, NY 10532, USA.
e-mail: {nhaas,bolle}@us.ibm.com
4Motorola Labs, 1301 East Algonquin Road, Schaumburg, Illinois 60196.
e-mail: dongge.li@motorola.com
5MIT, Department of Architecture, 265 Massachusetts Avenue N51-340, Cambridge,
MA 02139, USA. e-mail: lira@mit.edu

Abstract. This chapter details the value and methods of content augmentation and personalization across different media such as TV and the Web. We illustrate how metadata extraction can aid in combining different media to produce a novel content consumption and interaction experience. We present two pilot content augmentation applications. The first, called MyInfo, combines automatically segmented and summarized TV news with information extracted from Web sources. Our news summarization and metadata extraction process employs text summarization, anchor detection and visual key element selection. Enhanced metadata allows matching against the user profile for personalization. Our second pilot application, called InfoSip, performs person identification and scene annotation based on actor presence. Person identification relies on visual, audio and text analysis, and talking-face detection. The InfoSip application links person identity information with filmographies and biographies extracted from the Web, improving the TV viewing experience by allowing users to easily query their TVs for information about actors in the current scene.

Key words. content augmentation, personalization, profile, personal news, video indexing, video segmentation, video summarization, information extraction, TV interface, user interface design, interactive TV
1. Introduction

For many years, people have enjoyed using their televisions as a primary means for obtaining news, information and entertainment, because of the rich viewing experience they provide. TVs offer viewers a chance to instantly connect with people and places around the world. We call this a lean-back approach to content consumption. More recently, the Web has emerged as a comparably rich source of content. However, unlike TV, which allows users to select only channels, the Web offers users much more interactive access to expanding volumes of data from PCs and laptops. We call this a lean-forward approach to content. We explore the process and value of linking content from these two different, yet related, media experiences. We want to generate a lean-natural approach that combines the best of these two media and marries it to users' lifestyles.

L. Ardissono et al. (eds.), Personalized Digital Television, 203-233, 2004.
© 2004 Kluwer Academic Publishers. Printed in the Netherlands.
At a high level, we wanted to explore how cross-media information linking and personalization generates additional value for content. We call this research direction Content Augmentation. As an example, imagine a user watches a movie that has characters gambling in Las Vegas. A content augmentation application can extract the location from the movie; then, in anticipation of the user's inquiry, it can peruse the Web for supplemental information such as the prices and availability of rooms in the casino featured in the film, instructions for the game the characters play, information on the design and history of the hotel, etc. In addition, this application can employ a user profile, personalizing the linked content by prioritizing the types of links a user most often explores.

To test this model, we developed a pilot system. We began by focus-group-testing several concepts and, based on the group's reaction, designed and implemented a personal news application (MyInfo) and a movie information retrieval application (InfoSip) that enhance the traditional media experience by combining Web and TV content.

This chapter details the current TV experience (Section 2.1), related work in content understanding and Web/TV information linking (Section 2.2), our user-centered design process (Section 3.1), pilot applications (Sections 3.2 and 3.3), system overview (Section 4), multimedia annotation and integration methods (Section 5), Web information extraction methods (Section 6), and our personalization model (Section 7). We present our conclusions in Section 8.
2. Augmented User Experience

The current TV experience grows out of a 50-year tradition of broadcasters trying to capture a mass audience. They used both demographic data and input from advertisers to determine which programs to play at the various times of day. More recently, the emergence of niche-based TV channels such as CNN (news), MTV (music), ESPN (sports), and HGTV (home and garden) allows viewers more control over when they view the content they desire. In addition, the arrival of electronic program guides (EPGs) has allowed viewers to browse the program offerings by genre, date, time, channel, and title and, in some cases, to search using keywords, a big step forward over traditional paper guides that allow access by time and channel only.
2.1. THE CURRENT TV NAVIGATION AND PERSONALIZATION

Current EPGs found in digital satellite settop boxes, cable settop boxes, and personal video recorders from TiVo (www.tivo.com) and ReplayTV (www.digitalnetworksna.com/replaytv/default.asp) offer users advanced methods for finding something to watch or record. These systems generally hold one to two weeks' worth of TV data, including program titles, synopses, genres, actors, producers, directors, times of broadcast, and channels. Viewers can use EPGs to browse listings by time, channel, genre, or program title. In addition, viewers can search for specific titles, actors, directors, etc. Finally, the TiVo system offers a recommender that lists highly rated programs and automatically records these programs when space is available on its hard disk.
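The browsing and searching described above amounts to filtering a small metadata table. The following sketch illustrates the idea; the `Listing` fields and function names are our own illustrative choices, not any vendor's API:

```python
from dataclasses import dataclass, field

@dataclass
class Listing:
    """One EPG entry: a small subset of the metadata named in the text."""
    title: str
    genre: str
    channel: str
    start_hour: int          # hour of broadcast, 0-23
    actors: list = field(default_factory=list)

def browse(listings, genre=None, channel=None, hour=None):
    """Browse listings by time, channel, or genre, as current EPGs allow."""
    out = listings
    if genre is not None:
        out = [l for l in out if l.genre == genre]
    if channel is not None:
        out = [l for l in out if l.channel == channel]
    if hour is not None:
        out = [l for l in out if l.start_hour == hour]
    return out

def search(listings, keyword):
    """Keyword search over titles and actor names."""
    kw = keyword.lower()
    return [l for l in listings
            if kw in l.title.lower()
            or any(kw in a.lower() for a in l.actors)]
```

Keyword search over actors and titles is the step that paper guides cannot offer; everything else is the familiar time/channel grid expressed as filters.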
Although TiVo is currently the only commercial product with a recommender, much personalization research has been done in this area. Das and Horst developed the TV Advisor, where users enter their explicit preferences in order to produce a list of recommendations (Das et al., 1998). Cotter and Smyth's PTV uses a mixture of case-based reasoning and collaborative filtering to learn users' preferences in order to generate recommendations (Cotter et al., 2000). Ardissono et al. created the Personalized EPG, which employs an agent-based system designed for settop box operation (Ardissono et al., 2001). Three user modeling modules collaborate in preparing the final recommendations: Explicit Preferences Expert, Stereotypical Expert, and Dynamic Expert. And Zimmerman et al. developed a recommender that uses a neural network to combine results from both an explicit and an implicit recommender (Zimmerman et al., this volume). What all these recommenders have in common is that they only examine program-level metadata. They do not have any detailed understanding of the program, and cannot help users find interesting segments within a TV program.

There has also been research in personalization related to adaptive hypermedia systems (Brusilovsky, 2003). These systems build a model of the goals, preferences and knowledge of each individual user, and use this model throughout the interaction with the user in order to adapt to that user's needs.

The Video Scout project we previously developed offers an early view of personalization at a subprogram level (Jasinschi et al., 2001; Zimmerman et al., 2001). Video Scout offers users two methods for personalizing the TV experience. First, Scout can display TV show segments (Figure 8.1). For example, it segments talk shows into host/guest segments, and displays musical performances and individual jokes. Second, Scout offers a user interface element called 'TV magnets' (Figure 8.2). If users specify financial news topics and celebrity names, then Scout watches TV and stores matching segments, monitoring the contents of talk shows for celebrity clips and searching the contents of financial news programs for financial news stories. Subprogram-level access to TV programs improves the TV experience by allowing users more control over the content they watch.
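The 'TV magnet' behavior can be pictured as a keyword filter over segment transcripts. The sketch below is our own deliberate simplification of the idea, not Video Scout's actual matching logic:

```python
def matches_magnet(transcript, keywords):
    """True if the segment transcript mentions any of the user's topics.
    A simple substring stand-in for Video Scout's segment matching."""
    text = transcript.lower()
    return any(kw.lower() in text for kw in keywords)

def run_magnets(segments, magnets):
    """Store each incoming segment under every magnet it matches.

    segments: iterable of (segment_id, transcript) pairs.
    magnets:  dict mapping magnet name -> list of user keywords.
    """
    stored = {name: [] for name in magnets}
    for seg_id, transcript in segments:
        for name, keywords in magnets.items():
            if matches_magnet(transcript, keywords):
                stored[name].append(seg_id)
    return stored
```

The point of the structure is that matching happens per segment, not per program: a single talk show can feed clips into several different magnets.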
2.2. RELATED WORK IN CONTENT ANALYSIS AND ENHANCED TV

Recently, there has been increasing interest in hyperlinking video with supplemental information. Examples include Microsoft and CBS's interactive TV (Microsoft 1997), ABC's enhanced TV (ABC 2003), the HyperSoap project at the MIT Media Lab (Dakss et al.), and Jiang and Elmagarmid's work on their Logical Hypervideo Data Model (Jiang et al., 1998).

Figure 8.1. Talk show segmented into host and guest segments.
In 1997, at the National Association of Broadcasters' Expo, we saw Microsoft demonstrate their Enhanced TV concept. This concept allowed users to see Internet data associated with a TV program while watching the program. The Internet content appeared on the side and bottom of the TV screen while the TV show played. Since then, Microsoft has been working with broadcasters such as CBS to deliver interactive TV versions of the Grammy Awards, NCAA Basketball, and even TV dramas like CSI (Microsoft 2000). The current implementation works only for users with WebTV Plus service or with a Microsoft UltimateTV settop box.

Figure 8.2. Financial news magnet screen with four stored clips from two TV shows.
`ABC’s enhanced TV broadcasts allow users to view supplemental information such as
`player statistics for football games, answer questions for game shows, and answer polling
`questions for talk and news shows (ABC 2003). The interaction takes place on a
`computer displaying synchronized Webcast data that corresponds to events on the
`TV show. The current implementation can make it di⁄cult for users, as their attention
`is needed on two screens simultaneously. In addition, the lean forward model of com-
`puter use is not completely appropriate for the more lean back task of watching TV.
`Both the Microsoft/CBS and the ABC products combine Internet content with TV
`shows. However, neither allows users much freedom to explore. The Internet content
`is packaged and sent to users by the same people who created the TV program. Also,
`neither product personalizes either the TV show or the Internet content for individual
`users.
Another concept, called 'HyperSoap' (Dakss et al.), allows TV viewers using a special remote control to point at clothing, props and other furnishings on a soap opera in order to learn how they can be purchased. The research group studied how people interact with hyperlinked video and employed this information in developing different modes of interaction. The design of the system matches current TV viewing in that it allows users to interact with a remote control. However, one clear challenge for this model is how to deal with objects that jump around on the screen as the story jumps from cut to cut.

Figure 8.3. Weather screen with Web story highlighted.
Figure 8.4. Headlines screen with TV story highlighted.

Jiang and Elmagarmid have introduced a novel video data model called the 'Logical Hypervideo Data Model' (Jiang et al. 1998). The model is capable of representing multilevel video abstractions with video entities that users are interested in (defined as hot objects) and their semantic associations with other logical video abstractions, including hot objects themselves. The semantic associations are modeled as video hyperlinks, and video data with this property are called hypervideo. Video hyperlinks provide a flexible and effective way of browsing video data. However, in this system, all the associations are derived manually. Users communicate with the system using a query language. This method of interaction allows them to explore information, but conflicts with the lean-back model of TV viewing.
Broadcast news analysis and retrieval for various purposes has also been an active area of research for a number of years. We created an initial 'Personal News Retrieval System' in 1996 to test the feasibility of video broadcast filtering in the news domain (Elenbaas et al. 1999). The news broadcasts from different channels were semi-automatically indexed on a server. A client application invoked from a Web browser allowed users to search individual stories. Searching is based on anchorperson, broadcaster, category, location, top stories and keywords.

Merlino et al. developed the 'Broadcast News Editor/Navigator' (BNE/BNN) (Merlino et al., 1997). They rely on the format of the broadcast being broken down into a series of states, such as start of broadcast, advertising, new story, and end of broadcast. They use multi-source cues such as text cues ('back to you in New York'), audio silence to find commercials, and visual cues such as black frames and single and double booth-anchor recognition.
Hanjalic and his colleagues describe a semi-automatic news analysis method based on pre-selection of categories (Hanjalic et al., 1999). They find anchorperson shots using a template, matching the shots by matching individual frames. They also incorporated a simple word-spotting algorithm to form reports and use this for topic specification. Other systems dealing with news retrieval have been reported in the literature (Ahanger et al., 1997; Brown et al., 1995; Chen et al., 1997; Maybury 2000). In addition, there is very recent research that performs automated segmentation of news and user modeling to generate personalcasts (Maybury et al., this volume).

Broadcast TV companies have also tried to come up with Internet versions of their content. For example, CNN has a limited number of current stories and an archive of old ones available in RealVideo or MPEG-4 (NetShow) format. (See http://www.cnn.com/videoselect/ for more details.)

The difference between our applications, MyInfo and InfoSip, and the cited systems is threefold: (i) our applications integrate both Web and TV content, as opposed to limiting users to a single source; (ii) our interface employs a TV-like interaction; and (iii) MyInfo performs extensive prioritization and personalization based on detailed user preferences.
3. Pilot Applications

In order to explore and demonstrate the usefulness of content augmentation, we applied a selective process of filtering initial ideas and concepts. In this section, we present our process and the pilot applications.

MyInfo and InfoSip are both designed to enhance the features of a Personal Video Recorder (PVR) such as a TiVo, ReplayTV, or UltimateTV. These hard-disk-based settop boxes currently allow users to easily store large numbers of shows. The segmented news stories, movies and supplemental information from the Web will all be stored on a PVR for access by users using a traditional remote control that has a few additional buttons. These applications are not currently intended to work with live broadcasts.
3.1. THE DESIGN PROCESS

We began by conducting a brainstorming session that included engineers and designers with experience in video processing, Web information retrieval, and Web and interactive TV design. We produced twenty concepts that coalesced into the following themes:

- Connect: Connect users with each other, with their community, and with the live world.
- Explore: Support users' ability to move deeper into a specific topic. Allow users to specify the level of detail they require.
- Anticipate: Extract, classify, and summarize information before users request it.
- Summarize: Reduce overwhelming amounts of content (especially redundant content) into appropriate chunks based on user context.

After concept generation, we conducted two focus group sessions. Our focus group consisted of four men and four women living in the suburbs near New York City. They came from different educational, ethnic, and socio-economic backgrounds; however, they all enjoyed watching TV and all had access to and experience with using the Web.
Our first session focused on evaluating and prioritizing the different concepts. In addition, participants shared their current strategies, preferences, and gripes for watching TV and collecting information from the Web. The following two concepts received particularly high ratings from participants:

1. Personal News: the application supplements TV news stories with richer detail obtained from the Web.
2. Actor Info: the application displays Web links for actors in the movie currently being viewed.

Our second focus group employed the same participants, and used a participatory design approach to better define the pilot applications. In exploring the personal news concept, participants revealed that they currently sought out news using a niche-surfing technique. When they wanted to know something like the price of a stock, the outcome of a sporting event or the weather, they would tune their TVs to an appropriate channel such as ESPN (sports), MSNBC (finance), or the Weather Channel and then wait for the information to appear. They generally did not use the Web for this sort of high-level news because it required them to abandon household tasks such as making breakfast or folding laundry in order to go upstairs and boot a computer. They desired a system that offered faster access to personal news around the themes of sports, finance, traffic, weather, local events, and headlines. They wanted access to the freshest information for these content zones from any TV in their home.

In exploring the Actor Info application, participants really liked the idea of viewing supplemental information for a movie, but they did not want to be interrupted. Instead they wanted to be able to easily ask questions such as: Who's that actor? What's that song? Where are they? What kind of shoes are those? They wanted the answers to these questions to appear immediately on the screen in an overlay. This way, they could get the information they wanted without interruption. They did not want links to Web sites. Instead, they wanted much more digested and summarized information. For more detail on the design process, please see (Zimmerman et al., this volume).
3.2. MYINFO

Users access the MyInfo application via a remote control. They can select any of the six content zones identified by the focus group in order to see personal Web-extracted data and the latest TV stories that match this zone. In addition, users can press a button labeled 'MyInfo' in order to see a personalized TV news broadcast that displays TV news and Web-extracted data from all of the content zones.
`

The interface displays an expanded story on the left, and a prioritized list of stories on the right. The top story always contains the Web-extracted information that matches specific requests in the user profile. The Web-extracted information includes: for weather, a four-day forecast for the specified zip code; for sports, the latest scores and upcoming games for specified teams; for local events, a listing prioritized by how soon the event will happen, distance from the home, and degree of match to keywords in the profile; for traffic, delays for user-specified routes and 'hot spots'; and for finance, current prices for stocks, change in price, and percent change for indexes, stocks, and funds listed in the profile.
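For local events, the three ranking criteria named above (imminence, proximity, and keyword match) can be combined into a single score. The weights and normalizations below are our illustrative assumptions, not the system's actual formula:

```python
def event_score(days_until, distance_km, keyword_hits,
                w_soon=0.5, w_near=0.3, w_match=0.2):
    """Higher score = listed earlier. Sooner and closer events, and
    events matching more profile keywords, rank higher."""
    soon = 1.0 / (1.0 + max(days_until, 0))    # imminence
    near = 1.0 / (1.0 + max(distance_km, 0))   # proximity to home
    match = min(keyword_hits, 5) / 5.0         # cap keyword contribution
    return w_soon * soon + w_near * near + w_match * match

def prioritize_events(events):
    """events: list of (name, days_until, distance_km, keyword_hits)."""
    return sorted(events, key=lambda e: event_score(*e[1:]), reverse=True)
```

With these particular weights, imminence dominates: a same-day event far from home can still outrank a nearby event a week away. Tuning that balance is exactly what a per-user profile would control.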
By pressing the NEXT button, users can navigate down the list of stories. This allows them to effectively skip stories they do not want to hear. In addition, they can press the PLAY-ALL button in order to automatically play all the stories in a single content zone. The interaction supports users' lifestyles, and takes a step towards a lean-natural interface. Users can quickly check information such as weather and traffic right before they leave their homes. They can also play back all, or sections of, the personalized news as a TV show, leaving themselves free to carry out tasks in their homes such as eating, cooking, and laundry.
3.3. INFOSIP

The InfoSip pilot application allows users to sip information about actors in a scene while watching a movie. Users press the WHO button on the remote control and detailed information appears at the bottom of the screen. Currently, our system provides an image, a biography, a filmography, and current rumors for all actors in the current scene (Figure 8.5). We manually extract the image from the video, but we hope to automate this process using our actor identification algorithms (Section 5.5). The descriptive information is automatically extracted from the Web. This application has an advantage over supplemental metadata supplied on DVDs, in that it is always up to date. In the example below, Tim Robbins' filmography details work he did in 2002, even though the source movie, Robert Altman's The Player, was released in 1992.

Figure 8.5. InfoSip screen.
During the collaborative design session, the participants stated that they often saw an actor whom they recognized but could not place. They wanted a simple method of selecting one of the actors and seeing enough information to help them remember where they had seen that actor before. The decision to display all of the actors in the current scene takes a step towards a lean-natural interface by allowing users to both sip the metadata and view the movie simultaneously. Listing all actors in the movie would generate too large a list to navigate and would run the risk of drawing the user away from watching the movie. Displaying only the actors currently on screen would often require users to scan back in the movie because, by the time they realized they wanted the information and grabbed the remote control, the shot with the actor they wanted might have ended. The filmographies have two pieces of additional information that support functionality that was designed but not yet implemented. Their display can be personalized by using a viewing history to highlight movies in which the user has seen the specific actor, aiding the recognition task. In addition, when filmographies contain movies that match movies scheduled for broadcast, users can use this interface to select movies for recording.
3.4. DEMONSTRATION

We developed these applications to stimulate conversations between stakeholders in the TV/Web content value chain, from media producers, packagers, and distributors to media consumers. The original idea was to develop these applications as demonstrators in order to explore the target applications for consumers. We hoped to use the applications to generate business models and new application concepts with colleagues in the content creation, broadcasting, and distribution domains. However, in the future, we plan to perform a qualitative evaluation of these applications with users.
4. System Overview

The system diagram in Figure 8.6 shows the high-level chain of content processing and augmentation. Unannotated or partially annotated content is delivered to the service provider (e.g. content provider, broadcaster), where generic analysis and augmentation are performed.

Content and (optionally) metadata are delivered to the first step (Feature Extraction and Integration) of the processing chain. At the server stage of the augmentation, the system extracts features and summarizes the content, generating descriptive metadata. (A more detailed description of this step is given in Section 5.) The generated metadata, in conjunction with any existing metadata, is then used to augment the content with additional information from Web sources. This information is provided by using Information Extraction from Web pages (WebIE), as described in Section 6. The augmentation (Augmentation) that occurs at the server side is general, in that it is not based on any personal profile. Following broadcaster augmentation, the content with the complete metadata is formatted and delivered to the consumer device (Formatting).

Figure 8.6. Content Augmentation system diagram.
The remaining augmentation is performed in the client stage. Here, a consumer device has the capability of storing content, metadata (in Storage), and a user profile (User Profile). The device also has a prioritization module that relies on the user profile. This is used to perform a secondary augmentation (Augmentation) with Web information (WebIE), but this time based solely on user preferences. The information obtained is stored together with the content and is presented to users (Interaction Engine) as if it were a part of the original program. One of the reasons we kept all personalization on the client was to help ensure privacy, a major concern of users in our focus group.

There are several delivery pathways for the augmentation data, depending on the implementation of the system and the business model. Encoding metadata with the media is the most straightforward approach to delivering augmentation, but alternative pathways are also possible. Web broadcasts or subscription-based data retrieval can also offer localized or personalized versions of the augmentation data. Finally, the principal division into server and client stages in Figure 8.6 is mainly to emphasize various aspects of the system. Implementations of the system in which various client functions are provided by the server and, inversely, server functions are performed by the client, are possible.
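The division of labor in Figure 8.6 can be summarized in code. Every function below is a toy stand-in for a whole subsystem (Sections 5 and 6); the names and data shapes are ours, chosen only to show where generic augmentation ends and profile-driven augmentation begins:

```python
def extract_features(content):
    # Stand-in for the multimedia analysis of Section 5.
    return {"summary": content[:40], "topics": ["weather", "sports"]}

def web_ie(topics):
    # Stand-in for Web information extraction (Section 6).
    return [f"web-link:{t}" for t in topics]

def server_stage(content, metadata=None):
    """Generic augmentation: no personal profile is consulted here."""
    md = dict(metadata or {})
    md.update(extract_features(content))
    md["links"] = web_ie(md["topics"])
    return {"content": content, "metadata": md}   # 'Formatting' step

def client_stage(package, profile):
    """Secondary, profile-driven augmentation on the consumer device.
    The profile never leaves the client, preserving privacy."""
    md = package["metadata"]
    ranked = [l for t in profile["interests"] for l in md["links"]
              if l.endswith(t)]                   # prioritization module
    ranked += web_ie(profile["interests"])        # personal WebIE pass
    return {"content": package["content"], "links": ranked}
```

The key design choice this sketch preserves is that `server_stage` sees content but no profile, while `client_stage` sees the profile but performs only local work.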
5. Content Processing

Methods for automatic metadata extraction can be divided into coarse- and fine-grain segmentation and abstraction. In this section, we briefly introduce the methods used for our applications. For MyInfo, we coarsely segment the news broadcast into individual stories, as described in Section 5.1. Next, each story is summarized by a representative textual summary and a frame that captures the visual summary. Text summarization is described in Section 5.2. Visual summarization is performed by detecting shots of the news anchor (as described in Section 5.3) and selecting the most important visual key element (as described in Section 5.4). For InfoSip, we apply person identification using both face and voice identification, as described in Section 5.5.
5.1. COARSE SEGMENTATION

Our approach exploits well-known, previously reported cues to segment commercials and news segments from news programs (Merlino et al. 1997; Boykin et al. 1999). We first find the commercial breaks in a particular news program, and then we perform story segmentation within the news portion. For stories, we use the story break markup ('>>>') in the closed captioning. In addition, we have investigated the detection of story segment boundaries at a macrosegment level (McGee et al. 1999; Dimitrova et al. 2003).
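The closed-caption story-break convention can be used directly: split the caption stream wherever a line begins with '>>>'. A minimal sketch of that split (real captions would also need the commercial removal described below, which is handled separately):

```python
def segment_stories(caption_lines):
    """Split closed-caption lines into stories at the '>>>' break markup."""
    stories, current = [], []
    for line in caption_lines:
        if line.lstrip().startswith(">>>"):
            if current:                  # close out the previous story
                stories.append(" ".join(current))
            current = [line.lstrip().lstrip("> ").strip()]
        else:
            current.append(line.strip())
    if current:
        stories.append(" ".join(current))
    return stories
```

Each '>>>' marker both terminates the running story and opens the next one, so the broadcast falls out as an ordered list of story transcripts ready for summarization.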
There is a variety of commercial detectors that perform text, audio, and visual analysis to determine if TV programs contain commercial breaks (Blum 1992; Bonner et al., 1982; Boykin et al., 1999; Merlino et al., 1997). Since our domain consists of 'commercial aware' programs, in which the anchors announce that a commercial break is coming up, we were able to use a computationally inexpensive, genre-specific, text-based commercial detector. In part, this relies on the absence of closed captioning for 30 seconds or more, and in part, it relies on the news anchors using cue phrases to segue to/from the commercials, such as 'coming up after the break' and 'welcome back'. We look for onset cues such as 'right back', 'come back', 'u
