Kluwer Academic Publishers

AMAZON EX.1011
Amazon v. CustomPlay
US Patent No. 9,124,950
PERSONALIZED DIGITAL TELEVISION

HUMAN-COMPUTER INTERACTION SERIES

VOLUME 6

Editors-in-Chief

John Karat, IBM Thomas Watson Research Center (USA)
Jean Vanderdonckt, Université Catholique de Louvain (Belgium)

Editorial Board

Gregory Abowd, Georgia Institute of Technology (USA)
Gaëlle Calvary, IIHM-CLIPS-IMAG (France)
John Carroll, Virginia Tech (USA)
Gilbert Cockton, University of Sunderland (United Kingdom)
Mary Czerwinski, Microsoft Research (USA)
Steve Feiner, Columbia University (USA)
Elizabeth Furtado, University of Fortaleza (Brazil)
Kristina Höök, SICS (Sweden)
Robert Jacob, Tufts University (USA)
Robin Jeffries, Sun Microsystems (USA)
Peter Johnson, University of Bath (United Kingdom)
Kumiyo Nakakoji, University of Tokyo (Japan)
Philippe Palanque, Université Paul Sabatier (France)
Oscar Pastor, University of Valencia (Spain)
Fabio Paternò, CNUCE-CNR (Italy)
Costin Pribeanu, National Institute for Research & Development in Informatics (Romania)
Marilyn Salzman, Salzman Consulting (USA)
Chris Schmandt, Massachusetts Institute of Technology (USA)
Markus Stolze, IBM Zürich (Switzerland)
Gerd Szwillus, Universität Paderborn (Germany)
Manfred Tscheligi, Center for Usability Research and Engineering (Austria)
Gerrit van der Veer, Vrije Universiteit Amsterdam (The Netherlands)
Shumin Zhai, IBM Almaden Research Center (USA)

The titles published in this series are listed at the end of this volume.
Personalized Digital Television

Targeting Programs to Individual Viewers

Edited by

Liliana Ardissono
Dipartimento di Informatica, Università di Torino, Italy

Alfred Kobsa
University of California, Irvine, CA, U.S.A.

and

Mark Maybury
Information Technology Division, The MITRE Corporation, Bedford, MA, U.S.A.

KLUWER ACADEMIC PUBLISHERS
NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW
eBook ISBN: 1-4020-2164-X
Print ISBN: 1-4020-2163-1

©2004 Kluwer Academic Publishers
New York, Boston, Dordrecht, London, Moscow

Print ©2004 Kluwer Academic Publishers
Dordrecht

All rights reserved

No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher

Created in the United States of America

Visit Kluwer Online at: http://kluweronline.com
and Kluwer's eBookstore at: http://ebooks.kluweronline.com
TABLE OF CONTENTS

Preface  vii

Introduction  ix

Part 1: Electronic Program Guides  1

1. User Modeling and Recommendation Techniques for Personalized Electronic Program Guides  3
   Liliana Ardissono, Cristina Gena, Pietro Torasso, Fabio Bellifemine, Angelo Difino and Barbara Negro

2. TV Personalization System. Design of a TV Show Recommender Engine and Interface  27
   John Zimmerman, Kaushal Kurapati, Anna L. Buczak, Dave Schaffer, Srinivas Gutta and Jacquelyn Martino

3. Case-Studies on the Evolution of the Personalized Electronic Program Guide  53
   Barry Smyth and Paul Cotter

4. Interactive Television Personalization. From Guides to Programs  73
   Derry O'Sullivan, Barry Smyth, David Wilson, Kieran Mc Donald and Alan F. Smeaton

5. Group Modeling: Selecting a Sequence of Television Items to Suit a Group of Viewers  93
   Judith Masthoff

6. Categorization of Japanese TV Viewers Based on Program Genres They Watch  143
   Yumiko Hara, Yumiko Tomomune and Maki Shigemori

Part 2: Broadcast News and Personalized Content  175

7. Personalcasting: Tailored Broadcast News  177
   Mark Maybury, Warren Greiff, Stanley Boykin, Jay Ponte, Chad McHenry and Lisa Ferro
8. Media Augmentation and Personalization through Multimedia Processing and Information Extraction  203
   Nevenka Dimitrova, John Zimmerman, Angel Janevski, Lalitha Agnihotri, Norman Haas, Dongge Li, Ruud Bolle, Senem Velipasalar, Thomas McGee and Lira Nikolovska

9. ContentMorphing: A Novel System for Broadcast Delivery of Personalizable Content  235
   Avni Rambhia, Gene Wen, Spencer Cheng

Part 3: ITV User Interfaces  257

10. Designing Usable Interfaces for TV Recommender Systems  259
    Jeroen van Barneveld and Mark van Setten

11. The Time-Pillar World. A 3D Paradigm for the New Enlarged TV Information Domain  287
    Fabio Pittarello
Chapter 8

Media Augmentation and Personalization Through Multimedia Processing and Information Extraction

NEVENKA DIMITROVA1, JOHN ZIMMERMAN2, ANGEL JANEVSKI1, LALITHA AGNIHOTRI1, NORMAN HAAS3, DONGGE LI4, RUUD BOLLE3, SENEM VELIPASALAR3, THOMAS MCGEE1 and LIRA NIKOLOVSKA5
1Philips Research, 345 Scarborough Rd., Briarcliff Manor, NY 10510, USA.
e-mail: {Nevenka.Dimitrova,Angel.Janevski,Lalitha.Agnihotri,Tom.McGee}@philips.com
2Human-Computer Interaction Institute, Carnegie Mellon, Pittsburgh, PA, USA.
e-mail: johnz@cs.cmu.edu
3IBM T.J. Watson, 30 Saw Mill River Road, Hawthorne, NY 10532, USA.
e-mail: {nhaas,bolle}@us.ibm.com
4Motorola Labs, 1301 East Algonquin Road, Schaumburg, Illinois 60196.
e-mail: dongge.li@motorola.com
5MIT, Department of Architecture, 265 Massachusetts Avenue N51-340, Cambridge, MA 02139, USA. e-mail: lira@mit.edu
Abstract. This chapter details the value and methods for content augmentation and personalization among different media such as TV and Web. We illustrate how metadata extraction can aid in combining different media to produce a novel content consumption and interaction experience. We present two pilot content augmentation applications. The first, called MyInfo, combines automatically segmented and summarized TV news with information extracted from Web sources. Our news summarization and metadata extraction process employs text summarization, anchor detection and visual key element selection. Enhanced metadata allows matching against the user profile for personalization. Our second pilot application, called InfoSip, performs person identification and scene annotation based on actor presence. Person identification relies on visual, audio, and text analysis and talking face detection. The InfoSip application links person identity information with filmographies and biographies extracted from the Web, improving the TV viewing experience by allowing users to easily query their TVs for information about actors in the current scene.

Key words. content augmentation, personalization, profile, personal news, video indexing, video segmentation, video summarization, information extraction, TV interface, user interface design, interactive TV
1. Introduction

For many years, people have enjoyed using their televisions as a primary means for obtaining news, information and entertainment, because of the rich viewing experience they provide. TVs offer viewers a chance to instantly connect with people and places around the world. We call this a lean-back approach to content consumption. More recently the Web has emerged as a comparably rich source of content. However, unlike TV, which allows users to select only channels, the Web offers users much more interactive access to expanding volumes of data from PCs and laptops. We call this a lean-forward approach to content. We explore the process and value of linking content from these two different, yet related, media experiences. We want to generate a lean-natural approach that combines the best of these two media and marries it to users' lifestyles.

L. Ardissono et al. (eds.), Personalized Digital Television, 203-233, 2004.
© 2004 Kluwer Academic Publishers. Printed in the Netherlands.
At a high level we wanted to explore how cross-media information linking and personalization generates additional value for content. We call this research direction Content Augmentation. As an example, imagine a user watches a movie that has characters gambling in Las Vegas. A content augmentation application can extract the location from the movie; then, in anticipation of the user's inquiry, it can peruse the Web for supplemental information such as the prices and availability of rooms in the casino featured in the film, instructions for the game the characters play, information on the design and history of the hotel, etc. In addition, this application can employ a user profile, personalizing the linked content by prioritizing the types of links a user most often explores.

To test this model, we developed a pilot system. We began by focus-group testing several concepts and, based on the group's reaction, designed and implemented a personal news application (MyInfo) and a movie information retrieval application (InfoSip) that enhance the traditional media experience by combining Web and TV content.

This chapter details the current TV experience (Section 2.1), related work in content understanding and Web/TV information linking (Section 2.2), our user-centered design process (Section 3.1), pilot applications (Sections 3.2 and 3.3), system overview (Section 4), multimedia annotation and integration methods (Section 5), Web information extraction methods (Section 6), and our personalization model (Section 7). We present our conclusions in Section 8.
2. Augmented User Experience

The current TV experience grows out of a 50-year tradition of broadcasters trying to capture a mass audience. They used both demographic data and input from advertisers to determine which programs to play at the various times of day. More recently, the emergence of niche-based TV channels such as CNN (news), MTV (music), ESPN (sports), and HGTV (home and garden) allows viewers more control over when they view the content they desire. In addition, the arrival of electronic program guides (EPGs) has allowed viewers to browse the program offerings by genre, date, time, channel, and title, and, in some cases, to search using keywords, a big step forward over traditional paper guides that allow access by time and channel only.

2.1. THE CURRENT TV NAVIGATION AND PERSONALIZATION

Current EPGs found in digital satellite settop boxes, cable settop boxes, and personal video recorders from TiVo (www.tivo.com) and ReplayTV (www.digitalnetworksna.com/replaytv/default.asp) offer users advanced methods for finding something to watch or record. These systems generally hold one to two weeks' worth of TV data, including program titles, synopses, genres, actors, producers, directors, times of broadcast, and channels. Viewers can use EPGs to browse listings by time, channel, genre, or program title. In addition, viewers can search for specific titles, actors, directors, etc. Finally, the TiVo system offers a recommender that lists highly rated programs and automatically records these programs when space is available on its hard disk.
Although TiVo is currently the only commercial product with a recommender, much personalization research has been done in this area. Das and Horst developed the TV Advisor, where users enter their explicit preferences in order to produce a list of recommendations (Das et al., 1998). Cotter and Smyth's PTV uses a mixture of case-based reasoning and collaborative filtering to learn users' preferences in order to generate recommendations (Cotter et al., 2000). Ardissono et al. created the Personalized EPG that employs an agent-based system designed for settop box operation (Ardissono et al., 2001). Three user modeling modules collaborate in preparing the final recommendations: Explicit Preferences Expert, Stereotypical Expert, and Dynamic Expert. And Zimmerman et al. developed a recommender that uses a neural network to combine results from both an explicit and an implicit recommender (Zimmerman et al., this volume). What all these recommenders have in common is that they only examine program-level metadata. They do not have any detailed understanding of the program, and cannot help users find interesting segments within a TV program.

There has also been research in personalization related to adaptive hypermedia systems (Brusilovsky, 2003). These systems build a model of the goals, preferences and knowledge of each individual user, and use this model throughout the interaction with the user, in order to adapt to the needs of that user.

The Video Scout project we previously developed offers an early view of personalization at a subprogram level (Jasinschi et al., 2001, Zimmerman et al., 2001). Video Scout offers users two methods for personalizing the TV experience. First, Scout can display TV show segments (Figure 8.1). For example, it segments talk shows into host/guest segments, musical performances, and individual jokes. Second, Scout offers a user interface element called 'TV magnets' (Figure 8.2). If users specify financial news topics and celebrity names, then Scout watches TV and stores matching segments, monitoring the contents of talk shows for celebrity clips and searching the contents of financial news programs for financial news stories. Subprogram-level access to TV programs improves the TV experience by allowing users more control over the content they watch.
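The 'TV magnet' behavior described above, storing segments that match user-specified topics or names, can be sketched roughly as follows. This is an illustrative sketch rather than the actual Video Scout implementation; the segment dictionary and the simple substring matching are assumptions.

```python
def tv_magnet(segments, magnet_terms):
    """Keep segments whose transcript mentions any user-specified magnet term.

    `segments` is assumed to be a list of dicts with a 'transcript' field;
    Video Scout's real matching is richer than this substring test.
    """
    terms = [t.lower() for t in magnet_terms]
    return [s for s in segments
            if any(t in s["transcript"].lower() for t in terms)]

# Example: a financial-news magnet for two user-specified topics.
segments = [
    {"show": "Talk Tonight", "transcript": "Our guest tonight needs no introduction."},
    {"show": "Market Wrap", "transcript": "Tech stocks rallied as the Nasdaq climbed."},
]
stored = tv_magnet(segments, ["Nasdaq", "interest rates"])
```

Only the second segment mentions a magnet term, so only it would be stored for later viewing.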
2.2. RELATED WORK IN CONTENT ANALYSIS AND ENHANCED TV

Figure 8.1. Talk show segmented into host and guest segments.

Recently, there has been increasing interest in hyperlinking video with supplemental information. Examples include Microsoft and CBS's interactive TV (Microsoft 1997), ABC's enhanced TV (ABC 2003), the HyperSoap project at the MIT Media Lab (Dakss), and Jiang and Elmagarmid's work on their Logical Hypervideo Data Model (Jiang et al., 1998).
In 1997 at the National Association of Broadcasters' Expo, we saw Microsoft demonstrate their Enhanced TV concept. This concept allowed users to see Internet data associated with a TV program while watching the program. The Internet content appeared on the side and bottom of the TV screen while the TV show played. Since then Microsoft has been working with broadcasters such as CBS to deliver interactive TV versions of the Grammy Awards, NCAA Basketball, and even TV dramas like CSI (Microsoft 2000). The current implementation works only for users with WebTV Plus service or with a Microsoft UltimateTV settop box.

Figure 8.2. Financial news magnet screen with four stored clips from two TV shows.
ABC's enhanced TV broadcasts allow users to view supplemental information such as player statistics for football games, answer questions for game shows, and answer polling questions for talk and news shows (ABC 2003). The interaction takes place on a computer displaying synchronized Webcast data that corresponds to events on the TV show. The current implementation can make it difficult for users, as their attention is needed on two screens simultaneously. In addition, the lean-forward model of computer use is not completely appropriate for the more lean-back task of watching TV.

Both the Microsoft/CBS and the ABC products combine Internet content with TV shows. However, neither allows users much freedom to explore. The Internet content is packaged and sent to users by the same people who created the TV program. Also, neither product personalizes either the TV show or the Internet content for individual users.

Another concept called 'HyperSoap' (Dakss et al.) allows TV viewers using a special remote control to point at clothing, props and other furnishings on a soap opera in order to learn how they can be purchased. The research group studied how people interact with hyperlinked video and employed this information in developing different modes of interaction. The design of the system matches current TV viewing in that it allows users to interact with a remote control. However, one clear challenge for this model is how to deal with objects that jump around on the screen as the story jumps from cut to cut.

Figure 8.3. Weather screen with Web story highlighted.
Figure 8.4. Headlines screen with TV story highlighted.

Jiang and Elmagarmid have introduced a novel video data model called the 'Logical Hypervideo Data Model' (Jiang et al. 1998). The model is capable of representing multilevel video abstractions with video entities that users are interested in (defined as hot objects) and their semantic associations with other logical video abstractions, including hot objects themselves. The semantic associations are modeled as video hyperlinks, and video data with this property are called hypervideo. Video hyperlinks provide a flexible and effective way of browsing video data. However, in this system, all the associations are derived manually. Users communicate with the system using a query language. This method of interaction allows them to explore information, but conflicts with the lean-back model of TV viewing.
Broadcast news analysis and retrieval for various purposes has also been an active area of research for a number of years. We created an initial 'Personal News Retrieval System' in 1996 to test the feasibility of video broadcast filtering in the news domain (Elenbaas et al. 1999). The news broadcasts from different channels were semi-automatically indexed on a server. A client application invoked from a Web browser allowed users to search individual stories. Searching is based on anchorperson, broadcaster, category, location, top stories and keywords.

Merlino et al. developed the 'Broadcast News Editor/Navigator' (BNE/BNN) (Merlino et al., 1997). They rely on the format of the broadcast being broken down into a series of states, such as start of broadcast, advertising, news story, and end of broadcast. They use multi-source cues such as text cues ('back to you in New York'), audio silence to find commercials, and visual cues such as black frames and single and double booth anchor recognition.

Hanjalic and his colleagues describe a semi-automatic news analysis method based on pre-selection of categories (Hanjalic et al., 1999). They find anchorperson shots using a template, matching the shots by matching individual frames. They also incorporated a simple word-spotting algorithm to form reports and use this for topic specification.
Other systems dealing with news retrieval have been reported in the literature (Ahanger et al., 1997, Brown et al., 1995, Chen et al., 1997, Maybury 2000). In addition, there is very recent research that performs automated segmentation of news and user modeling to generate personalcasts (Maybury et al., this volume).

Broadcast TV companies have also tried to come up with Internet versions of their content. For example, CNN has a limited number of current stories and an archive of old ones available in RealVideo or MPEG-4 (NetShow) format. (See http://www.cnn.com/videoselect/ for more details.)

The difference between our applications MyInfo and InfoSip and the cited systems is threefold: (i) our applications integrate both Web and TV content, as opposed to limiting users to a single source, (ii) our interface employs a TV-like interaction, and (iii) MyInfo performs extensive prioritization and personalization based on detailed user preferences.
3. Pilot Applications

In order to explore and demonstrate the usefulness of content augmentation, we applied a selective process of filtering initial ideas and concepts. In this section, we present our process and the pilot applications.

MyInfo and InfoSip are both designed to enhance the features of a Personal Video Recorder (PVR) such as TiVo, ReplayTV, or UltimateTV. These hard disk-based settop boxes currently allow users to easily store large numbers of shows. The segmented news stories, movies and supplemental information from the Web will all be stored on a PVR for access by users with a traditional remote control that has a few additional buttons. These applications are not currently intended to work with live broadcasts.
3.1. THE DESIGN PROCESS

We began by conducting a brainstorming session that included engineers and designers with experience in video processing, Web information retrieval, and Web and interactive TV design. We produced twenty concepts that coalesced into the following themes:

- Connect: Connect users with each other, with their community, and with the live world.
- Explore: Support users' ability to move deeper into a specific topic. Allow users to specify the level of detail they require.
- Anticipate: Extract, classify, and summarize information before users request it.
- Summarize: Reduce overwhelming amounts of content (especially redundant content) into appropriate chunks based on user context.
After concept generation, we conducted two focus group sessions. Our focus group consisted of four men and four women living in the suburbs near New York City. They came from different educational, ethnic, and socio-economic backgrounds; however, they all enjoyed watching TV and all had access to and experience with using the Web.

Our first session focused on evaluating and prioritizing the different concepts. In addition, participants shared their current strategies, preferences, and gripes around watching TV and collecting information from the Web. The following two concepts received particularly high ratings from participants:

1. Personal News: the application supplements TV news stories with richer detail obtained from the Web.
2. Actor Info: the application displays Web links for actors in the movie currently being viewed.

Our second focus group employed the same participants, and used a participatory design approach to better define the pilot applications. In exploring the personal news concept, participants revealed that they currently sought out news using a niche surfing technique. When they wanted to know something like the price of a stock, the outcome of a sporting event or the weather, they would tune their TVs to an appropriate channel such as ESPN (sports), MSNBC (finance), or the Weather Channel and then wait for the information to appear. They generally did not use the Web for this sort of high-level news because it required them to abandon household tasks such as making breakfast or folding laundry in order to go upstairs and boot a computer. They desired a system that offered faster access to personal news around the themes of sports, finance, traffic, weather, local events, and headlines. They wanted access to the freshest information for these content zones from any TV in their home.

In exploring the Actor Info application, participants really liked the idea of viewing supplemental information for a movie, but they did not want to be interrupted. Instead they wanted to be able to easily ask questions such as: Who's that actor? What's that song? Where are they? What kind of shoes are those? They wanted the answers to these questions to appear immediately on the screen in an overlay. This way, they could get the information they wanted without interruption. They did not want links to Web sites. Instead, they wanted much more digested and summarized information. For more detail on the design process, please see (Zimmerman et al., this volume).
3.2. MYINFO

Users access the MyInfo application via a remote control. They can select any of the six content zones identified by the focus group in order to see personal Web-extracted data and the latest TV stories that match this zone. In addition, users can press a button labeled 'MyInfo' in order to see a personalized TV news broadcast that displays TV news and Web-extracted data from all of the content zones.

The interface displays an expanded story on the left, and a prioritized list of stories on the right. The top story always contains the Web-extracted information, which matches specific requests in the user profile. The Web-extracted information includes: for weather, a four-day forecast for the specified zip code; for sports, the latest scores and upcoming games for specified teams; for local events, a listing prioritized by how soon the event will happen, distance from the home, and degree of match to keywords in the profile; for traffic, delays for user-specified routes and 'hot spots'; and for finance, current prices for stocks, change in price, and percent change for indexes, stocks, and funds listed in the profile.

By pressing the NEXT button, users can navigate down the list of stories. This allows them to effectively skip stories they do not want to hear. In addition, they can press the PLAY-ALL button in order to automatically play all the stories in a single content zone. The interaction supports users' lifestyles, and takes a step towards a lean-natural interface. Users can quickly check information such as weather and traffic right before they leave their homes. They can also play back all, or sections of, the personalized news as a TV show, leaving themselves free to carry out tasks in their homes such as eating, cooking, and laundry.
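The local-events prioritization described above (sooner, closer, and better keyword match ranks higher) can be sketched as follows. The field names and numeric weights are illustrative assumptions; the chapter does not give the actual scoring formula.

```python
from dataclasses import dataclass

@dataclass
class LocalEvent:
    title: str
    days_until: float        # how soon the event happens
    miles_from_home: float   # distance from the user's home
    description: str

def event_rank_key(event, profile_keywords):
    """Lower key = higher priority: imminent, nearby, keyword-matching events first.

    The weights (1.0 per day, 0.1 per mile, 2.0 per keyword hit) are made-up
    values for illustration only.
    """
    text = (event.title + " " + event.description).lower()
    matches = sum(1 for kw in profile_keywords if kw.lower() in text)
    return event.days_until + 0.1 * event.miles_from_home - 2.0 * matches

def prioritize_events(events, profile_keywords):
    return sorted(events, key=lambda e: event_rank_key(e, profile_keywords))
```

Under this rule, a jazz concert two days away that matches a 'jazz' profile keyword would rank above a generic fair happening next week.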
3.3. INFOSIP

The InfoSip pilot application allows users to sip information about actors in a scene while watching a movie. Users press the WHO button on the remote control and detailed information appears at the bottom of the screen. Currently, our system provides an image, a biography, a filmography, and current rumors for all actors in the current scene (Figure 8.5). We manually extract the image from the video, but we hope to automate this process using our actor identification algorithms (Section 5.5). The descriptive information is automatically extracted from the Web. This application has an advantage over supplemental metadata supplied on DVDs, in that it is always up to date. In the example below, Tim Robbins' filmography details work he did in 2002, even though the source movie, Robert Altman's The Player, was released in 1992.

Figure 8.5. InfoSip screen.

During the collaborative design session, the participants stated that they often saw an actor whom they recognized but could not place. They wanted a simple method of selecting one of the actors, and seeing enough information to help them remember where they had seen that actor before. The decision to display all of the actors in the current scene takes a step towards a lean-natural interface by allowing users to both sip the metadata and view the movie simultaneously. Listing all actors in the movie would generate too large a list to navigate and would run the risk of drawing the user away from watching the movie. Displaying only the actors currently on screen would often require users to scan back in the movie, because, by the time they realized they wanted the information and grabbed the remote control, the shot with the actor they wanted might have ended. The filmographies have two pieces of additional information that support functionality that was designed but not yet implemented. Their display can be personalized by using a viewing history to highlight movies the user has seen the specific actor in, aiding the recognition task. In addition, when filmographies contain movies that match movies scheduled for broadcast, users can use this interface to select movies for recording.
3.4. DEMONSTRATION

We developed these applications to stimulate conversations between stakeholders in the TV/Web content value chain, from media producers, packagers, and distributors to media consumers. The original idea was to develop these applications as demonstrators in order to explore the target applications for consumers. We hoped to use the applications to generate business models and new application concepts with colleagues in the content creation, broadcasting, and distribution domains. However, in the future, we plan to perform a qualitative evaluation of these applications with users.
4. System Overview

The system diagram in Figure 8.6 shows the high-level chain of content processing and augmentation. Unannotated or partially annotated content is delivered to the service provider (e.g. content provider, broadcaster), where generic analysis and augmentation is performed.

Content and (optionally) metadata are delivered to the first step (Feature Extraction and Integration) of the processing chain. At the server stage of the augmentation, the system extracts features and summarizes the content, generating descriptive metadata. (A more detailed description of this step is given in Section 5.) The generated metadata, in conjunction with any existing metadata, is then used to augment the content with additional information from Web sources. This information is provided by using Information Extraction from Web pages (WebIE), as described in Section 6. The augmentation (Augmentation) that occurs at the server side is general, in that it is not based on any personal profile. Following broadcaster augmentation, the content with the complete metadata is formatted and delivered to the consumer device (Formatting).

Figure 8.6. Content Augmentation system diagram.

The remaining augmentation is performed in the client stage. Here, a consumer device has the capability of storing content, metadata (in Storage), and a user profile (User Profile). The device also has a prioritization module that relies on the user profile. This is used to perform a secondary augmentation (Augmentation) with Web information (WebIE), but this time based solely on user preferences. The information obtained is stored together with the content and is presented to users (Interaction Engine) as if it were a part of the original program. One of the reasons we kept all personalization on the client was to help ensure privacy, a major concern of users in our focus group.

There are several delivery pathways for the augmentation data, depending on the implementation of the system and the business model. Encoding metadata with the media is the most straightforward approach to delivering augmentation, but alternative pathways are also possible. Web broadcasts or subscription-based data retrieval can also offer localized or personalized versions of the augmentation data. Finally, the principal division into server and client stages in Figure 8.6 is mainly to emphasize various aspects of the system. Implementations of the system where various client functions are provided by the server, and, inversely, server functions are performed by the client, are possible.
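The two-stage flow just described (generic, profile-free augmentation at the server, then profile-driven augmentation on the consumer device) can be sketched as follows. The function names and the dict-based metadata are illustrative assumptions; the real modules are the ones labeled in Figure 8.6.

```python
def extract_features(content):
    # Stand-in for the Section 5 analysis (segmentation, summarization).
    return {"summary": content[:40]}

def web_ie(metadata, profile=None):
    # Stand-in for Web information extraction (Section 6). With a profile,
    # extraction is driven solely by the user's stated interests.
    topics = profile["topics"] if profile else ["general"]
    return ["web:" + t for t in topics]

def server_stage(content, metadata=None):
    """Server side: generic augmentation, not based on any personal profile."""
    md = dict(metadata or {})
    md.update(extract_features(content))
    md["generic_links"] = web_ie(md)            # generic WebIE
    return content, md                          # formatted, then delivered

def client_stage(content, metadata, profile):
    """Client side: secondary augmentation based solely on the user profile."""
    md = dict(metadata)
    md["personal_links"] = web_ie(md, profile)  # profile-driven WebIE
    return content, md                          # stored, then shown via the UI

# A news item flows through both stages; personalization stays on the client.
content, md = server_stage("Mayor opens new downtown transit line today")
content, md = client_stage(content, md, {"topics": ["traffic", "finance"]})
```

Keeping `client_stage` on the consumer device mirrors the privacy rationale above: the profile never leaves the settop box.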
5. Content Processing

Methods for automatic metadata extraction can be divided into coarse- and fine-grain segmentation and abstraction. In this section, we briefly introduce the methods used for our applications. For MyInfo, we coarsely segment the news broadcast into individual stories as described in Section 5.1. Next, each story is summarized by a representative textual summary and a frame that captures the visual summary. Text summarization is described in Section 5.2. Visual summarization is performed by detecting shots of the news anchor (as described in Section 5.3) and selecting the most important visual key element (as described in Section 5.4). For InfoSip, we apply person identification using both face and voice identification, as described in Section 5.5.

5.1. COARSE SEGMENTATION

Our approach exploits well-known, previously reported cues to segment commercials and news segments from news programs (Merlino et al. 1997 and Boykin et al. 1999). We first find the commercial breaks in a particular news program, and then we perform story segmentation within the news portion. For stories, we use the story break markup ('>>>') in the closed captioning. In addition, we have investigated the detection of story segment boundaries at a macrosegment level (McGee et al. 1999, Dimitrova et al. 2003).
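Story segmentation on the '>>>' caption markers can be sketched as a simple splitter. This is an illustrative sketch assuming a line-based caption transcript; the system described here additionally uses macrosegment-level cues.

```python
def segment_stories(caption_lines):
    """Split a closed-caption transcript into stories at '>>>' break markers."""
    stories, current = [], []
    for line in caption_lines:
        if line.lstrip().startswith(">>>"):
            if current:                       # close out the previous story
                stories.append(" ".join(current))
            current = [line.lstrip().lstrip(">").strip()]
        else:
            current.append(line.strip())
    if current:
        stories.append(" ".join(current))
    return stories

captions = [
    ">>> A fire broke out downtown this morning.",
    "Crews had the blaze under control by noon.",
    ">>> In sports, the home team won in overtime.",
]
stories = segment_stories(captions)
```

Each '>>>' marker opens a new story, so the three caption lines above yield two stories.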
There is a variety of commercial detectors that perform text, audio, and visual analysis to determine if TV programs contain commercial breaks (Blum 1992, Bonner et al., 1982, Boykin et al., 1999, Merlino et al., 1997). Since our domain consists of 'commercial aware' programs, in which the anchors announce that a commercial break is coming up, we were able to use a computationally inexpensive, genre-specific, text-based commercial detector. In part, this relies on the absence of closed captioning for 30 seconds or more, and in part, it relies on the news anchors using cue phrases to segue to/from the commercials, such as 'coming up after the break' and 'welcome back'. We look for onset cues such as 'right back', 'come back', 'u