US 20090300475A1

(19) United States
(12) Patent Application Publication
     Fink et al.
(10) Pub. No.: US 2009/0300475 A1
(43) Pub. Date: Dec. 3, 2009
`
`(54) WEB-BASED SYSTEM FOR
`COLLABORATIVE GENERATION OF
`INTERACTIVE VIDEOS
`
`(75)
`
`Inventors:
`
`Michael Fink, Mountain View, CA
`(US); Ryan Junee, Mountain View,
`CA (US); Sigalit Bar, Haifa (IL);
`Aviad Barzilai, Haifa (IL); Isaac
`Elias, Haifa (IL); Julian Frumar,
`Mountain View, CA (US); Herbert
`Ho, Mountain View, CA (US); Nir
`Kerem, Haifa (IL); Simon Ratner,
`Mountain View, CA (US); Jasson
`Arthur Schrock, Mountain View,
`CA (US); Ran Tavory, Haifa (IL)
`
Correspondence Address:
GOOGLE / FENWICK
SILICON VALLEY CENTER, 801 CALIFORNIA ST.
MOUNTAIN VIEW, CA 94041 (US)
`
`(73) Assignee:
`
`Google Inc., Mountain View, CA
`(US)
`
`(21) Appl. No.:
`
`12/388,365
`
`(22) Filed:
`
`Feb.18,2009
`
`Related U.S. Application Data
`
`(60) Provisional application No. 61/058,459, filed on Jun.
`3, 2008.
`
`Publication Classification
`
`(51)
`
`Int. Cl.
`G06F 17121
`(2006.01)
`G06F 17130
`(2006.01)
`G06F 21100
`(2006.01)
`(52) U.S. Cl. ... 715/230; 707/104.1; 726/4; 707/E17.009;
`707/E17.044
`
`(57)
`
`ABSTRACT
`
Systems and methods are provided for adding and displaying
interactive annotations for existing online hosted videos. A
graphical annotation interface allows the creation of annotations
and association of the annotations with a video. Annotations
may be of different types and have different functionality,
such as altering the appearance and/or behavior of an
existing video, e.g. by supplementing it with text, allowing
linking to other videos or web pages, or pausing playback of
the video. Authentication of a user desiring to perform annotation
of a video may be performed in various manners, such
as by checking a uniform resource locator (URL) against an
existing list, checking a user identifier against an access list,
and the like. As a result of authentication, a user is accorded
the appropriate annotation abilities, such as full annotation,
no annotation, or annotation restricted to a particular temporal
or spatial portion of the video.
`
[Front-page drawing: block diagram of the system, showing a video hosting server 108 (network interface 122, front end server 124, video server 126, video database 128, user database 140), a client 130 with browser 132 and embedded player 134, an annotation server 150 (video analysis module 152, annotation database 154), and an authentication server 170 with URL list 171, connected via network 105.]
`
`
`
[FIG. 1: Sheet 1 of 5 — block diagram of the system architecture: video hosting server 108 (network interface 122, front end server 124, video server 126, video database 128, user database 140), annotation server 150 (video analysis module 152, annotation database 154), authentication server 170 (URL list 171), and client 130 (browser 132, embedded player 134), connected via network 105.]
`
`
Patent Application Publication    Dec. 3, 2009  Sheet 2 of 5    US 2009/0300475 A1

[FIG. 2: video playback view with interactive annotations; see paragraph [0026].]
`
`
[FIG. 3: Sheet 3 of 5 — annotation editing interface: main video area 202 with handles 215A, annotation icons 302-305, and editing regions 301, 310, and 315 containing captions ("HAPPY BIRTHDAY MOM", "YOUR BELOVED SON ON THE KITCHEN TABLE", "WHAT'S BEHIND THE WINDOW?"), time ranges (e.g., 0:00:03.5 to 0:00:13.3), "PASTE LINK TO A YOUTUBE VIDEO, CHANNEL, OR SEARCH RESULT" fields, and SAVE DRAFT / PREVIEW buttons.]
`
Patent Application Publication    Dec. 3, 2009  Sheet 4 of 5    US 2009/0300475 A1

[FIG. 4: flow diagram of the steps involved in adding annotations to videos; see paragraph [0040].]
`
`
[FIG. 5: Sheet 5 of 5 — collaborative annotation interface 500: video player 505 with playback timeline (00:10 through 01:20), a list of existing annotations with authors and time stamps (e.g., "@00:02 (5 SECS) BY YOU", "CLICK HERE FOR THE OTHER PIXAR FEATURE FILM TRAILERS", "PAUSE ANNOTATION"), SHOW: ALL / MINE / OTHERS filters, SAVE DRAFT / PUBLISH / PREVIEW / DELETE ALL controls 525, and an "INVITE OTHERS TO ADD ANNOTATIONS" panel with a shareable link (http://www.youtube.com/watch?v=7_SrHIAqLak&group_id=234jHjs8), COPY and DISABLE & RESET LINK buttons, and the note "THIS LINK ALLOWS OTHERS TO ADD ANNOTATIONS TO YOUR VIDEO. RESETTING THE LINK WILL NOT REMOVE EXISTING ANNOTATIONS." Additional reference numbers include 507, 510, 516, and 520.]
`
`
`
`WEB-BASED SYSTEM FOR
`COLLABORATIVE GENERATION OF
`INTERACTIVE VIDEOS
`
`CROSS REFERENCE TO RELATED
`APPLICATIONS
`
`[0001] This application claims the benefit of Provisional
`Application No. 61/058,459, filed on Jun. 3, 2008, which is
`hereby incorporated herein by reference.
`
`TECHNICAL FIELD
`
`[0002] The disclosed embodiments relate generally to the
`collaborative generation ofinteractive features for digital vid(cid:173)
`eos.
`
`BACKGROUND
`
`[0003] Conventional web-based systems perm1ttmg the
`storage and display of digital videos typically only allow
`commenting on the video as a whole. In particular, if viewers
`wish to comment on or otherwise reference a particular por(cid:173)
`tion of the video, they are obliged to explicitly describe the
`portion by text or time in the video and other indirect means.
`Conventional systems also have simplistic controls for anno(cid:173)
`tating a video, to the extent that they allow such annotation at
`all. Rather, such systems either allow only the owner (e.g., a
`user who uploaded the video) to add annotations, or else allow
`all users to do so, without restrictions.
`
`SUMMARY
`
`[0004] The present invention includes systems and meth(cid:173)
`ods for adding and displaying interactive annotations for
`online hosted videos. A graphical annotation interface allows
`the creation of annotations and association of the annotations
`with a video. Annotations may be of different types and have
`different functionality, such as altering the appearance and/or
`behavior of an existing video, e.g. by supplementing it with
`text, allowing linking to other videos or web pages, or pausing
`playback of the video.
`[0005] Authentication of a user desiring to perform anno(cid:173)
`tation of a video may be performed in various manners, such
`as by checking a uniform resource locator (URL) against an
`existing list, checking a user identifier against an access list,
`and the like. As a result of authentication, a user is accorded
`the appropriate annotation abilities, such as full annotation,
`no annotation, or annotation restricted to a particular tempo(cid:173)
`ral or spatial portion of the video.
`[0006] The features and advantages described in this sum(cid:173)
`mary and the following detailed description are not all-inclu(cid:173)
`sive. Many additional features and advantages will be appar(cid:173)
`ent to one of ordinary skill in the art in view of the drawings,
`specification, and claims presented herein.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`[0007] FIG. 1 is a block diagram of a system architecture
`for allowing annotation of online hosted videos, according to
`one embodiment.
`[0008] FIG. 2 illustrates different types of annotations that
`may be added to a video, according to one embodiment.
`[0009] FIG. 3 depicts a user interface for creating the anno(cid:173)
`tations of FIG. 2, according to one embodiment.
`[0010] FIG. 4 illustrates the steps involved in adding anno(cid:173)
`tations to videos, according to one embodiment.
`
`[0011] FIG. 5 illustrates an annotation interface allowing
`the addition of annotations and providing information on
`existing annotations, according to one embodiment.
`[0012] The figures depict various embodiments of the
`present invention for purposes ofillustration only. One skilled
`in the art will readily recognize from the following discussion
`that alternative embodiments of the structures and methods
`illustrated herein may be employed without departing from
`the principles of the invention described herein.
`
`DETAILED DESCRIPTION OF DRAWINGS
`
`[0013] FIG. 1 is a block diagram of a system architecture in
`accordance with one embodiment. As illustrated in FIG. 1, a
`video hosting server 108 includes a front end server 124, a
`video server 126, a network interface 122, a video database
`128, and a user database 140. Other conventional features,
`such as firewalls, load balancers, application servers, fail over
`servers, site management tools, and so forth are not shown so
`as to more clearly illustrate the features of the system.
`Examples of a suitable video hosting server 108 for imple(cid:173)
`mentation of the system include the YouTube™ and Google
`Video™ websites; other video hosting sites are known as
`well, and can be adapted to operate according the teaching
`disclosed herein. It will be understood that the term "website"
`represents any system and method of providing content and is
`not intended to be limited to systems that support content
`provided via the Internet or the HTTP protocol. The various
`servers are conventionally implemented, whether as a single
`piece of software or hardware or as multiple pieces of soft(cid:173)
`ware or hardware and can couple to the network 105 via the
`network interface 122. In general, functions described in one
`embodiment as being performed on the server side can also be
`performed on the client side in other embodiments if appro(cid:173)
`priate.
`[0014] A client 130 executes a browser 132, and connects to
`the front end server 124 via a network 105, which is typically
`the Internet, but may also be any network, including but not
`limited to a LAN, a MAN, a WAN, a mobile, wired or wireless
`network, a private network, or a virtual private network.
`While only a single client 130 and browser 132 are shown, it
`is understood that very large numbers (e.g., millions) of cli(cid:173)
`ents are supported and can be in communication with the
`video hosting server 108 at any time. The client 130 may
`include a variety of different computing devices. Examples of
`client devices 130 are personal computers, digital assistants,
`personal digital assistants, cellular phones, mobile phones,
`smart phones or laptop computers. As will be obvious to one
`of ordinary skill in the art, the present invention is not limited
`to the devices listed above.
[0015] In some embodiments, the browser 132 includes an
embedded video player 134 such as, for example, the Flash™
player from Adobe Systems, Inc. or any other player adapted
for the video file formats used in the video hosting server 108.
A user can access a video from the video hosting server 108
by browsing a catalog of videos, conducting searches on
keywords, reviewing playlists from other users or the system
administrator (e.g., collections of videos forming channels),
or viewing videos associated with a particular user group
(e.g., communities).
`[0016] Video server 126 receives uploaded media content
`from content providers and allows content to be viewed by
`client 130. Content may be uploaded to video server 126 via
`the Internet from a personal computer, through a cellular
`network from a telephone or PDA, or by other means for
`
`
`
`US 2009/0300475 Al
`
`Dec. 3, 2009
`
`2
`
`transferring data over network 105 known to those of ordinary
`skill in the art. Content may be downloaded from video server
`126 in a similar manner; in one embodiment media content is
`provided as a file download to a client 130; in an alternative
`embodiment, media content is streamed client 130. The
`means by which media content is received by video server
`126 need not match the means by which it is delivered to
`client 130. For example, a content provider may upload a
`video via a browser on a personal computer, whereas client
`130 may view that video as a stream sent to a PDA. Note also
`that video server 126 may itself serve as the content provider.
`Communications between the client 130 and video hosting
`server 108, or between the other distinct units of FIG. 1, may
`be encrypted or otherwise encoded.
`[0017] Users of clients 130 can also search for videos based
`on keywords, tags or other metadata. These requests are
`received as queries by the front end server 124 and provided
`to the video server 126, which is responsible for searching the
`video database 128 for videos that satisfy the user queries.
`The video server 126 supports searching on any fielded data
`for a video, including its title, description, tags, author, cat(cid:173)
`egory and so forth.
`[0018] Users of the clients 130 and browser 132 can upload
`content to the video hosting server 108 via network 105. The
`uploaded content can include, for example, video, audio or a
`combination of video and audio. The uploaded content is
`processed and stored in the video database 128. This process(cid:173)
`ing can include format conversion (transcoding), compres(cid:173)
`sion, metadata tagging, and other data processing. An
`uploaded content file is associated with the uploading user,
`and so the user's account record is updated in the user data(cid:173)
`base 140 as needed.
`[0019] For purposes of convenience and the description of
`one embodiment, the uploaded content will be referred to a
`"videos", "video files", or "video items", but no limitation on
`the types of content that can be uploaded are intended by this
`terminology. Each uploaded video is assigned a video iden(cid:173)
`tifier when it is processed.
`[0020] The user database 140 is responsible for maintain(cid:173)
`ing a record of all users viewing videos on the website. Each
`individual user is assigned a user ID ( also referred to as a user
`identity). The user ID can be based on any identifying infor(cid:173)
`mation, such as the user's IP address, user name, or the like.
`The user database may also contain information about the
`reputation of the user in both the video context, as well as
`through other applications, such as the use of email or text
`messaging. The user database may further contain informa(cid:173)
`tion about membership in user groups, e.g. a group of users
`that can view the same annotations. The user database may
`further contain, for a given user, a list of identities of other
`users who are considered friends of the user. (The term "list",
`as used herein for concepts such as lists of authorized users,
`URL lists, and the like, refers broadly to a set of elements,
`where the elements may or may not be ordered.)
`[0021] The video database 128 is used to store the received
`videos. The video database 128 stores video content and
`associated metadata, provided by their respective content
`owners. The video files have metadata associated with each
`file such as a video ID, artist, video title, label, genre, and time
`length.
`[0022] An annotation server 150 provides the ability to
`view and add annotations to videos in the video database 128.
`The annotation server 150 collects various annotations-such
`as text boxes, "thought bubbles", and the like-from uploads
`
`by a user or the owner of a video, from publishers, or as a
`result of video analysis techniques. It then stores these anno(cid:173)
`tations within an annotation database 154. The annotation
`server 150 also provides to entities such as the client 130 or
`the video hosting server 108, for a given video, annotations
`stored within the annotation database 154 for that video.
`Generally, an annotation modifies the behavior of an other(cid:173)
`wise non-interactive video, providing interactive overlays
`with which a user can interact, or altering the usual flow of
`playback of the video, for example. Examples of interactive
`overlays include text boxes, thought bubbles, spotlights,
`hyperlinks, menus, polls, and the like, any of which can have
`an arbitrarily sophisticated user interface behavior. In one
`embodiment, the annotation server 150 is on a separate physi(cid:173)
`cal server from the video hosting server 108, although in other
`embodiments the annotation functionality is included within
`the video hosting server 108.
`[0023] The annotation database 154 maintains an associa(cid:173)
`tion between each annotation and the appropriate portion of
`the annotated video. In one embodiment, for example, the
`annotation database 154 stores an identifier of the annotation
`type (e.g., a text box) along with any information associated
`with that type ( e.g., a text caption), a time stamp(s) of the
`video to which the annotation applies (e.g., from time 01:05
`to time 01 :26), an identifier of the video which the annotation
`annotates, and an identifier of a user who submitted the anno(cid:173)
`tation ( e.g., a username ). Some types of annotation may also
`be associated with a link to another web page, video, network
`object, or the like. Many other storage implementations for
`annotations would be equally possible to one of skill in the art.
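As an illustrative sketch of such a storage layout — the field names below are assumptions chosen for clarity, not taken from the application — an annotation record might be modeled as:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AnnotationRecord:
    """One row in a hypothetical annotation database (names are illustrative)."""
    annotation_type: str        # e.g., "text_box", "thought_bubble", "spotlight", "pause"
    content: str                # type-specific payload, e.g., a text caption
    video_id: str               # identifier of the video being annotated
    start_time: float           # seconds into the video where the annotation appears
    end_time: float             # seconds into the video where it disappears
    author: str                 # identifier of the user who submitted the annotation
    link: Optional[str] = None  # optional target web page, video, or network object

# Example record for a text box active from 01:05 to 01:26 of a video:
rec = AnnotationRecord("text_box", "Happy birthday mom",
                       video_id="abc123", start_time=65.0, end_time=86.0,
                       author="jfrumar")
```

Any relational or key-value store keyed on `video_id` would serve equally well, as the paragraph above notes.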
`[0024] A video analysis module 152 can be used by the
`annotation server 150 to automatically generate annotations,
`orto suggest them to a user. This can entail techniques such as
`speech analysis, vision analysis (e.g., face detection, object
`recognition, and optical character recognition (OCR)), or
`crawling annotations explicitly or implicitly available.
`[0025] Since annotation of videos may be accomplished
`from remote locations over the network 105 by a variety of
`users, an authentication mechanism can be used to restrict
`annotations to only a subset of users. Thus, an authentication
`server 170 is provided to verify access by clients 130 to
`annotation functionality of the annotation server 150. As
`described further below, authentication may be performed in
`a number of ways in different embodiments, such as using
`secret links, access control lists, user credibility scores, or
`permissions based on community moderation. In one
`embodiment, a three-tiered permissions system is employed,
`with a first, lowest permission tier for users who can only
`view and interact with annotations of a video by clicking on
`links, a second, higher permission tier for those users who can
`add or modify their own annotations, and a third, highest
`permission tier for those users who can also modify and delete
`any annotations in the video. The use of secret links employs
`a URL list 171, which associates videos with a URL through
`which access to an annotation interface is obtained. In one
`embodiment, the authentication server 170 is implemented as
`a component of video hosting server 108.
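The three-tiered permission scheme described above can be sketched as follows; the tier and action names are illustrative assumptions, not identifiers from the application:

```python
from enum import IntEnum

class Tier(IntEnum):
    """Three-tiered permission model from the text (names are illustrative)."""
    VIEW = 1      # may only view annotations and interact by clicking links
    ANNOTATE = 2  # may also add or modify their own annotations
    MODERATE = 3  # may also modify or delete any annotation in the video

def allowed_actions(tier: Tier) -> set:
    """Return the set of annotation actions a permission tier grants."""
    actions = {"view", "follow_links"}
    if tier >= Tier.ANNOTATE:
        actions |= {"add_own", "modify_own"}
    if tier >= Tier.MODERATE:
        actions |= {"modify_any", "delete_any"}
    return actions
```

Using an ordered enum makes each higher tier a strict superset of the one below it, matching the lowest-to-highest structure described in the paragraph.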
`[0026] FIG. 2 illustrates some different types of interactive
`annotations (hereinafter "annotations") that may be added to
`a video, according to one embodiment. A main video area 202
`displayed on the client 130 plays a video stored in the video
`database 128 and served to the client 130 by the video server
`126. Playback can be controlled via, for example, a video
`controls area 204. In the illustrated example, three distinct
`
`
`
`US 2009/0300475 Al
`
`Dec. 3, 2009
`
`3
`
`annotations 205-215 have been added. Annotations 205 and
`210 are text boxes and thought bubbles, which display static
`text. Annotation 215 is a spotlight that displays text, e.g.
`"What's behind the window?" in response to a user hovering
`the mouse within its boundaries. Any of these annotation
`types can have a time range during which it is active, e.g. from
`a time 0:15 to 0:36. For example, the text box 205 could be set
`to appear 15 seconds into the playing of the video and disap(cid:173)
`pear 21 seconds later, after a user has had a chance to read it.
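The time-range behavior described above amounts to filtering annotations by the current playback position. A minimal sketch, assuming annotations are stored as (start, end, payload) tuples in seconds (a representation chosen here for illustration):

```python
def active_annotations(annotations, t):
    """Return the annotations whose time range covers playback position t.

    Each annotation is a (start_seconds, end_seconds, payload) tuple; the
    player would call this on every position update and display the result.
    """
    return [a for a in annotations if a[0] <= t <= a[1]]

# Text box active 0:15-0:36 (as in the example) and a bubble active 0:10-0:14:
anns = [(15.0, 36.0, "text box 205"), (10.0, 14.0, "thought bubble 210")]
```

At playback position 20 seconds, only the text box is active and would be drawn.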
`[0027] Any of these annotation types may also have arbi(cid:173)
`trarily sophisticated presentation, such as shape and text col(cid:173)
`oring and styling, or associated actions, such as displaying
`additional annotations or redirecting the user to a target web(cid:173)
`based location such as a uniform resource locator (URL)
`upon being activated, such as by a mouse click, mouse over,
`press ofa key corresponding to the annotation, or the like. The
`target location to which control is transferred could include an
`advertisement, or content including an advertisement. For
`example, clicking on spotlight 215 could lead to a web page
`describing a particular product. The target location could also
`cause display of an object or scene taken from a different
`perspective, e.g. the back side of an object taken from a
`different camera angle. Additionally, the target location could
`have a link, button, or annotation that transfers control back to
`the original video, instead of to a different video. In one
`embodiment, control can be transferred back to a particular
`moment in the original video, e.g., as specified by a URL
`encoding the video identifier and a description of the moment
`in the video, such as "t=0:22", denoting a time 22 seconds into
`the video. Such uses of time stamps in URLs can be used to
`construct, for example, a branching series of pages, which can
`be used to create an interactive storyline within a single video.
`This allows, for example, rapid transfer to another video
`portion, without the delay entailed by obtaining a different
`video. In one embodiment, an annotation can be displayed
`conditionally, for example if a user mouses over another
`annotation, when that other annotation is displayed either at
`the same time or a later time.
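The "t=0:22" time-stamp fragment described above can be turned into a seek position with a small parser. A sketch, assuming a simple colon-separated syntax (the application does not pin down the exact URL grammar):

```python
def parse_time_fragment(fragment: str) -> int:
    """Convert a 't=M:SS' or 't=H:MM:SS' URL fragment into whole seconds.

    For example, 't=0:22' denotes 22 seconds into the video, as in the
    example above; the player would seek to the returned position.
    """
    value = fragment.split("=", 1)[1]
    total = 0
    for part in value.split(":"):
        total = total * 60 + int(part)  # accumulate hours/minutes/seconds
    return total
```

`parse_time_fragment("t=0:22")` yields 22, so a link carrying that fragment can jump directly to the referenced moment rather than fetching a different video.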
`[0028] Annotations may also be added to modify the play(cid:173)
`back of the video, rather than to present an interactive graphi(cid:173)
`cal interface. For example, a pause annotation causes play(cid:173)
`back of the video to halt for a given time delay, including an
`unlimited time. This allows, for example, an arbitrary amount
`of time for users to make a choice before the video continues.
`Using the time stamps in URLs as described above, one can
`modify the playback of a video so that, for example, clicking
`( or even positioning the mouse over) a door will seek to the
`portion of the video that displays the door opening and the
`room behind it. This can increase the level of interactivity in
`a video to a degree similar to that of a computer game.
`[0029] The use of various types of annotations can be used
`to modify standard linear video viewing in a number of dif(cid:173)
`ferent ways. They could be used to implement, for example, a
`menu-style interface, in which the video displays several
`choices via annotations with links to other pages, and then
`pauses the video to allow the user to select one of the choices.
`The menu items could be still annotations, animated video
`annotations, and the like, and could be displayed in a tradi(cid:173)
`tional list of items, as separate labeled visual objects, or in a
`variety of other manners. They could also be used to imple(cid:173)
`ment branching story lines, where clicking on one annotation
`leads to one continuation of the video, and clicking on a
`different annotation leads to a different continuation. For
`example, annotations could be used to implement an interac-
`
`tive game of "rock, paper, scissors", in which, for instance,
`clicking on an annotation corresponding to a "rock", "paper",
`or "scissors" choice leads to a separate video or portion of the
`same video depicting a tie, a win, or a loss, respectively, each
`outcome potentially leading to the display of additional anno(cid:173)
`tations representing a second round of the game. The menu
`items could also be used to implement multi-perspective sto(cid:173)
`rylines, wherein clicking on the annotated face of an actor
`leads to seeing the remainder of the story from that actor's
`perspective.
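A branching storyline of this kind is essentially a graph whose edges are link annotations. A minimal sketch of the "rock, paper, scissors" example — all segment identifiers here are hypothetical, invented for illustration:

```python
# Each node is a video (or video segment); each annotation maps a clickable
# choice to the next node. Per the example above: rock -> tie, paper -> win,
# scissors -> loss, and every outcome offers a second round.
story = {
    "round1":     {"rock": "tie_video", "paper": "win_video", "scissors": "loss_video"},
    "tie_video":  {"play_again": "round1"},
    "win_video":  {"play_again": "round1"},
    "loss_video": {"play_again": "round1"},
}

def next_segment(current: str, choice: str) -> str:
    """Follow the annotation link for `choice` from the current segment."""
    return story[current][choice]
```

Each edge would in practice carry a time-stamped URL as described in paragraph [0027], so following a choice is just a seek or a page load.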
`[0030] FIG. 3 depicts a user interface for manually creating
`the annotations of FIG. 2, according to one embodiment.
`Annotation icons 302-305 correspond to four annotation
`types (speech bubbles, text boxes, spotlights, and pauses,
`respectively); selecting one of them and then clicking on the
`playing video creates an annotation of that type at the location
`and time corresponding to the click. The annotation then has
`default values, such as text captions, time ranges, boundaries,
`and associated URLs. In FIG. 3, editing regions 310,305, and
`315 correspond to displayed annotations 205, 210, and 215,
`respectively, and the contents thereof can be edited to change
`the values of the caption. Editing region 310, for example,
`comprises a text caption 31 OA, a time range 31 OB, and a link
`310C. The link can be, for example, a page within the video
`service that denotes a watch page for a video or that denotes
`a channel displaying thumbnails for several related videos.
`The text caption 31 OA has been set to the value "Happy
`birthday mom" by the user, and the time range 310B is set to
`last from 0:00:03, fifth frame, to 0:00: 13, third frame, and the
`link 310C does not yet have a specified value. Editing of
`annotations can also be accomplished graphically; for
`example, the boundary of callout 215 can be changed by
`dragging on the handles 215A associated with the boundaries.
`[0031] As an alternative or addition to manually creating
`the annotations using the user interface of FIG. 3, the video
`analysis module 152 of FIG. 1 can be used to automatically
`detect temporal and spatial locations to add annotations, to
`determine their associated values, and/or to control the
`behavior of existing annotations.
`[0032] One example for such analysis is face detection,
`which can be used in the video of FIG. 3 to detect the face of
`the boy in the images as a human face and to suggest the
`creation of a text bubble in association therewith, or it could
`be used to automatically provide or suggest a caption describ(cid:173)
`ing the recognized face.
`[0033] Another example could include applying object rec(cid:173)
`ognition methods based on local descriptor matching. (See,
`for example, "A Performance Evaluation of Local Descrip(cid:173)
`tors", Mikolajczyk, K.; Schmid, C., IEEE Transactions on
`PatternAnalysis and Machine Intelligence, Volume 27, Issue
`identify
`10:1615-1630). Such object recognition can
`instances of known textured objects, such as locations, build(cid:173)
`ings, books, CD covers, and the like. (Example images for
`training recognition of such objects can be readily found in
`product/location catalogs which associate the product name
`with one or more images of the product). Once an object, such
`as the cover of a particular CD, is detected, the manual anno(cid:173)
`tation process can be simplified by providing an educated
`guess regarding the object's spatial and temporal positioning.
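The local-descriptor matching cited above typically pairs each query descriptor with its nearest reference descriptor, keeping only matches that pass a distance-ratio test. A didactic pure-Python stand-in (real systems use high-dimensional descriptors and indexed search, not this toy version):

```python
import math

def match_descriptors(query, reference, ratio=0.75):
    """Toy nearest-neighbor descriptor matching with a ratio test.

    `query` and `reference` are lists of equal-length feature vectors.
    Returns (query_index, reference_index) pairs where the best reference
    match is clearly closer than the runner-up, suggesting a true match.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    matches = []
    for qi, q in enumerate(query):
        ranked = sorted(range(len(reference)), key=lambda ri: dist(q, reference[ri]))
        if len(ranked) >= 2:
            best, second = ranked[0], ranked[1]
            # Accept only if the best match is much closer than the second best.
            if dist(q, reference[best]) < ratio * dist(q, reference[second]):
                matches.append((qi, best))
    return matches
```

Descriptors extracted from a video frame that match a catalog image of, say, a CD cover would localize the object and seed the "educated guess" about where an annotation belongs.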
`[0034] Recognized objects can then be associated with
`annotations, such as links presenting more information, e.g.
`from a given viewpoint. For example, events can be presented
`from a national perspective by using object recognition to
`identify objects associated with a certain nationality and pre-
`
`
`
`US 2009/0300475 Al
`
`Dec. 3, 2009
`
`4
`
`senting associated information, e.g. associating, with the
`Indian team members of a cricket match, a link to the next
`event that the team will be participating in. As another
`example, an athlete recognized using object recognition can
`be associated with a link or other annotation data that pro(cid:173)
`vides statistics, personal information, etc. on the athlete.
`[0035] Additionally, in conjunction with a search index of a
`search engine such as Google™ or YouTube™, if an object in
`a video is indeed recognized, then a phrase describing that
`product could be executed against the search index and the top
`search result suggested as a link for the object ( e.g., searching
`for the title of a recognized music CD and linking to a product
`search page corresponding to that title).
`[0036]
`In one embodiment, an annotation link corresponds
`to a search query so that if a user clicks on the annotation, the
`user will see a search result page for the query. For example,
`a user may view all videos posted by a person in the video who
`has been identified by the user and whose name has been used
`as a search term. This type of annotation allows the results
`page to be up to date since a search on a search term associated
`with an annotation will not always yield the same results
`page.
`[0037] Object recognition could further be used to identify
`locations of interest, and in combination with "geotagged"
`videos in which location information is embedded, videos
`related to the recognized location could be provided or sug(cid:173)
`gested as links.
`[0038] Object recognition could further be augmented by
`tracking the movement of the object across frames, thereby
`moving any associated annotations along with it. For
`example, if the boy moved his position in the various frames
`of the video, object recognition could be used to track the
`boy's face as he moves and to automatically reposition the
`text bubble 210 near the detected face in each frame. This
`would be a type of annotation that moves within the frame in
`connection with an object in the frame. In the case of a Flash
`player, analysis of the video would preferably be done on the
`server while display of the annotation in different frame loca(cid:173)
`tions during video play would generally be achieved within
`the player as the object moves within the video.
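The repositioning step described above reduces to applying a fixed offset to the tracked object's per-frame position. A sketch, assuming detection has already produced a bounding box per frame (the data format here is an illustrative assumption):

```python
def reposition_annotation(annotation_offset, tracked_boxes):
    """Compute per-frame annotation positions that follow a tracked object.

    `tracked_boxes` maps frame number -> (x, y, w, h) bounding box of the
    detected object (e.g., a face); `annotation_offset` is the (dx, dy)
    placement of the bubble relative to the box's top-left corner. The
    detection/tracking itself is assumed done elsewhere (e.g., server-side).
    """
    dx, dy = annotation_offset
    return {frame: (x + dx, y + dy)
            for frame, (x, y, w, h) in tracked_boxes.items()}

# If the face moves right between frames 1 and 2, the bubble follows it:
positions = reposition_annotation((0, -20), {1: (100, 80, 40, 40),
                                             2: (110, 80, 40, 40)})
```

The player would then draw the bubble at `positions[frame]` on each frame, giving the effect of an annotation pinned to the moving face.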
`[0039] Yet another type of analysis is optical character rec(cid:173)
`ognition (OCR), the details of which are known to one of skill
`in the art. For example, words on a billboard could be recog(cid:173)
`nized using OCR and a corresponding textual caption auto(cid:173)
`matically provided or suggested for the billboard.
`[0040] FIG. 4 illustrates steps involved in adding annota(cid:173)
`tions to videos, according to one embodiment. The client 130
`requests a video from the video server 108 using the network
`105. The front end server 124 of the video hosting server 108
`receives the request and delegates to the video server 126,
`which obtains the video from the video database 128 and
`provides it 410 to the client 130. In one embodiment, the
`video hosting server 108 then delegates to the annotation
`server 150 to provide the annotation user interface; in other
`embodiments, the video hosting server 108 requests the anno(cid:173)
`tations for the video from the annotation server 150 and then
`itself provides the annotation user interface, providing any
`annotations created via the user interface to the annotation
`server for storage. In one embodiment, the provided annota(cid:173)
`tion user interface differs depending on the identity of the user
`doing the annotation. For example, if the annotating us