Self-Describing Schemes for Interoperable MPEG-7
Multimedia Content Descriptions

Seungyup Paek, Ana B. Benitez, and Shih-Fu Chang

Image & Advanced TV Lab, Department of Electrical Engineering
Columbia University, 1312 S.W. Mudd, Mail Code 4712 Box F-4
New York, NY 10027, USA

ABSTRACT

In this paper, we present the self-describing schemes for interoperable image/video content descriptions, which are being developed as part of our proposal to the MPEG-7 standard. MPEG-7 aims to standardize content descriptions for multimedia data. The objective of this standard is to facilitate content-focused applications like multimedia searching, filtering, browsing, and summarization. To ensure maximum interoperability and flexibility, our descriptions are defined using the eXtensible Markup Language (XML), developed by the World Wide Web Consortium. We demonstrate the feasibility and efficiency of our self-describing schemes in our MPEG-7 testbed. First, we show how our scheme can accommodate image and video descriptions that are generated by a wide variety of systems. Then, we present two systems being developed that are enabled and enhanced by the proposed approach for multimedia content descriptions. The first system is an intelligent search engine with an associated expressive query interface. The second system is a new version of MetaSEEk, a metasearch system for mediation among multiple search engines for audio-visual information.

Keywords: MPEG-7, self-describing scheme, interoperability, audio-visual content description, visual information system, metasearch, XML.

1. INTRODUCTION

It is increasingly easy to access digital multimedia information. Correspondingly, it has become increasingly important to develop systems that process, filter, search, and organize this information, so that useful knowledge can be derived from the exploding mass of information that is becoming accessible. To enable exciting new systems for processing, searching, filtering, and organizing multimedia information, it has become clear that an interoperable method of describing multimedia content is necessary. This is the objective of the emerging MPEG-7 standardization effort.

In this paper, we first give a brief overview of the objectives of the MPEG-7 standard. MPEG-7 aims at the standardization of content descriptions of multimedia data. The objectives of this standard are to facilitate content-focused applications like multimedia searching, filtering, browsing, and summarization.

Then, we present self-describing schemes for interoperable image/video content descriptions, which are being developed as part of our proposal to MPEG-7. To ensure maximum interoperability and flexibility, our descriptions use the eXtensible Markup Language (XML), developed by the World Wide Web Consortium. Under the proposed self-describing schemes, an image is represented as a set of relevant objects that are organized in one or more object hierarchies. Similarly, a video is viewed as a set of relevant events that can be combined hierarchically in one or more event hierarchies. Both objects and events are described by feature descriptors that can link to external extraction and similarity code.

Finally, we demonstrate the feasibility and efficiency of our self-describing schemes in our MPEG-7 testbed. In our testbed, we show how our scheme can accommodate image and video descriptions that are generated by a wide variety of systems. In addition, we introduce two systems being developed that are enabled and enhanced by our approach for multimedia content descriptions. The first system is an intelligent search engine with an associated expressive query interface. The second system is a metasearch system for mediation among multiple search engines for audio-visual information.

1. Email: {syp, ana, sfchang}@ee.columbia.edu ; WWW: http://www.ee.columbia.edu/~{syp, ana, sfchang}/

2. MPEG-7 STANDARD AND SCENARIOS

2.1. MPEG-7 standard

The MPEG-7 standard [14] has the objective of specifying a standard set of descriptors to describe various types of multimedia information. MPEG-7 will also standardize ways to define other descriptors as well as Description Schemes (DSs) for the structure of descriptors and their relationships. This description (i.e. the combination of descriptors and description schemes) will be associated with the content itself to allow fast and efficient searching for material of a user's interest. MPEG-7 will also standardize a language to specify description schemes, i.e. a Description Definition Language (DDL), and the schemes for encoding the descriptions of multimedia content.

2.2. MPEG-7 scenarios

MPEG-7 will improve existing applications and enable completely new ones. We review three of the application scenarios that will be most significantly impacted [3]: distributed processing, exchange, and personalized viewing of multimedia content.

• Distributed processing

MPEG-7 will provide the ability to interchange descriptions of audio-visual material independently of any platform, any vendor, and any application, which will enable the distributed processing of multimedia content. This standard for interoperable content descriptions will mean that data from a variety of sources can be plugged into a variety of distributed applications such as multimedia processors, editors, retrieval systems, filtering agents, etc. Some of these applications may be provided by third parties, generating a sub-industry of providers of multimedia tools that can work with the standard descriptions of the multimedia data.

The vision of the near future is one in which a user can access various content providers' web sites to download content and associated indexing data, obtained by some low-level or high-level processing. The user can then proceed to access several tool providers' web sites to download tools (e.g. Java applets) to manipulate the heterogeneous data descriptions in particular ways, according to the user's personal interests. An example of such a multimedia tool is a video editor. An MPEG-7-compliant video editor will be able to manipulate and process video content from a variety of sources if the description associated with each video is MPEG-7 compliant. Each video may come with varying degrees of description detail such as camera motion, scene cuts, annotations, and object segmentations.

• Content exchange

A second scenario that will greatly benefit from an interoperable content-description standard is the exchange of multimedia content among heterogeneous audio-visual databases. MPEG-7 will provide the means to express, exchange, translate, and reuse existing descriptions of audio-visual material.

Currently, TV broadcasters, radio broadcasters, and other content providers manage and store an enormous amount of audio-visual material. This material is currently described manually using textual information and proprietary databases. Describing audio-visual material is an expensive and time-consuming task, so it is desirable to minimize the re-indexing of data that has been processed before.

Consider a media company that purchases videos from a TV broadcaster. The TV broadcaster has already described and indexed the content in their proprietary description scheme. Without an interoperable content description, the purchasing company would have to invest manpower to manually translate the broadcaster's description into their proprietary scheme. Interchange of multimedia content descriptions would be possible if all the content providers embraced the same scheme and system. As this is unlikely to happen, MPEG-7 proposes to adopt a single industry-wide interoperable interchange format that is system and vendor independent.

• Customized views

Finally, multimedia players and viewers compliant with the multimedia description standard will provide users with innovative capabilities such as multiple views of the data configured by the user. The user could change the display's configuration without requiring the data to be downloaded again in a different format from the content broadcaster.

The ability to capture and transmit semantic and structural annotations of the audio-visual data, made possible by MPEG-7, greatly expands the range of possibilities for client-side manipulation of the data for display purposes. For example, a browsing system can allow users to quickly browse through videos if it receives information about their semantic structure. In the case of a tennis match video, the viewer could choose to view only the third game of the second set, all the overhead smashes made by one player, etc.

These examples only hint at the possible uses that creative multimedia-application designers will find for richly structured data delivered in a standardized way based on MPEG-7.

3. SELF-DESCRIBING SCHEMES

In this section, we present description schemes for interoperable image/video content descriptions. The proposed description schemes are self-describing in the sense that they combine the data and the structure of the data in the same format. The advantages of this type of description are flexibility, easy validation, and efficient exchange.

`3.1. eXtensible Markup Language (XML)
`
`SGML (Standard Generalized Markup Language, ISO 8879) is a standard language for defining and using document formats.
`SGML allows documents to be self-describing, i.e. they describe their own grammar by specifying the tag set used in the
`document and the structural relationships that those tags represent. SGML makes it possible to define your own formats for
`your own documents, to handle large and complex documents, and to manage large information repositories. However, full
`SGML contains many optional features that are not needed for Web applications and has proven to be too complex to current
`vendors of Web browsers.
`
`The World Wide Web Consortium (W3C) has created an SGML Working Group to build a set of specifications to
`make it easy and straightforward to use the beneficial features of SGML on the Web [21]. The goal of the W3C SGML activity
`is to enable the delivery of self-describing data structures of arbitrary depth and complexity to applications that require such
`structures. The first phase of this effort is the specification of a simplified subset of SGML specially designed for Web
`applications. This subset, called XML (Extensible Markup Language), retains the key SGML advantages in a language that is
`designed to be vastly easier to learn, use, and implement than full SGML.
`
Before describing the image and video DSs, we present some of the core features of XML. Let's start with a simple XML element:

<image> Hello MPEG-7 world! </image>

<image> is the start tag and </image> is the end tag; Hello MPEG-7 world! is the content of the element. What does the image tag mean? In short, it means anything you want it to mean. XML predefines no tags at all. Rather than relying on a few hundred predefined tags, XML lets you create the tags you need to describe your data. Users define what is allowed in each document by providing rules, collectively known as the Document Type Definition (DTD). The DTD states the element types with their characteristics, the notations, and the entities allowed in the document. Apart from the DTD, XML documents must follow some basic well-formedness rules. This is the minimum criterion for XML parsers and processors.

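As an illustration, a DTD against which the element above would be valid might look as follows. This fragment is our own minimal sketch, not part of any MPEG-7 DTD; the collection element and the source attribute are invented for the example.

<!-- A minimal, hypothetical DTD for documents built from <image> elements. -->
<!ELEMENT collection (image+)> <!-- a collection holds one or more images -->
<!ELEMENT image (#PCDATA)>     <!-- an image element contains character data -->
<!ATTLIST image id ID #IMPLIED source CDATA #IMPLIED>

A validating XML parser would reject any document whose elements or attributes violate these rules, which is what makes this type of description easy to validate.
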
Text in XML documents consists of characters. A document's text is divided into character data and markup. In a first approximation, markup describes a document's logical structure while character data is the basic content of the document. Generally, anything inside a pair of <> angle brackets is markup and anything that is not inside these brackets is character data. Start tags and empty tags may optionally contain attributes. An attribute is a name-value pair separated by an equal sign. Work is in progress to include binary data in XML tags. Currently, XML allows defining binary entities pointing to binary data (e.g. images). They require an associated notation describing the type of resource (e.g. GIF and JPG).

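For example, a binary entity and its notation could be declared in the DTD and then referenced from an attribute as follows; the notation, entity, and attribute names here are ours, for illustration only.

<!-- In the DTD: a notation for the JPG format and a binary (unparsed) entity. -->
<!NOTATION jpg SYSTEM "image/jpeg">
<!ENTITY portrait SYSTEM "portrait.jpg" NDATA jpg>
<!ATTLIST image img_ref ENTITY #IMPLIED>

<!-- In the document: the img_ref attribute points to the binary entity. -->
<image img_ref="portrait"> Family portrait </image>
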
3.2. Image description scheme

In this section, we present the proposed description scheme for images. To clarify the explanation, we will use the example shown in Figure 1. Using this example, we will walk through the image DS expressed in XML. Along the way, we will explain the use of various XML elements that are defined for the proposed image DS. The complete set of rules for the tags in the image and video description schemes is defined in our document type definitions [15]. Another advantage of using XML as the DDL is that it provides the capability to import external description schemes' DTDs and incorporate them in one description by using namespaces. We will see an example later in this section.

The basic description element of our image description scheme is the object element (<object>). An object element represents a region of the image for which some features are available. There are two different types of objects: physical and logical objects. Physical objects usually correspond to continuous regions of the image with some descriptors in common (semantics, features, etc.) - in other words, real objects in the image. Logical objects are groupings of objects based on some high-level semantic relationships (e.g. faces). The object element comprises the concepts of group of objects, objects, and regions in the visual literature. The set of all objects identified in an image is included within the object set element (<object_set>).

For the image example of Figure 1.a, we have chosen to describe the objects listed below. Each object element has a unique identifier within an image description. The identifier is expressed as an attribute of the object element (id). Another attribute of the object element (type) distinguishes between physical and logical objects. We have left the content of each object element empty to show clearly the overall structure of the image description. Later in the section, we will describe the features that can be included within the object element.

<object_set>
    <object id="0" type="PHYSICAL"> </object> <!-- Family portrait -->
    <object id="1" type="PHYSICAL"> </object> <!-- Father -->
    <object id="2" type="PHYSICAL"> </object> <!-- Mother -->
    <object id="3" type="LOGICAL"> </object> <!-- Faces -->
    <object id="4" type="PHYSICAL"> </object> <!-- Father's face -->
    <object id="5" type="PHYSICAL"> </object> <!-- Mother's face -->
</object_set>

[Figure 1: a) Image example. b) High-level description of the image by the proposed image description scheme: a set of objects {0, 1, 2, 3, 4, 5, ...}; a physical object hierarchy in which the family portrait (object 0) contains the father (1) and the mother (2), whose faces are objects 4 and 5 respectively; and a logical object hierarchy in which the faces object (3) groups objects 4 and 5.]

The image description scheme is comprised of object elements that are combined hierarchically in one or more object hierarchy elements (<object_hierarchy>). The hierarchy is a way to organize the object elements in the object set element. Each object hierarchy consists of a tree of object node elements (<object_node>). Each object node points to an object. The objects in an image can be organized by their location in the image or by their semantic relationships. These two ways to group objects generate two types of hierarchies: physical and logical hierarchies. A physical hierarchy describes the physical location of the objects in the image. On the other hand, a logical hierarchy organizes the objects based on a higher-level understanding of their semantics, similar to semantic clustering.

Continuing with the image example in Figure 1.a, two possible hierarchies are shown in Figure 1.b. These hierarchies are expressed in XML below. The type of hierarchy is included in the object hierarchy element as an attribute (type). The object node element has a unique identifier in the form of an attribute (id). The object node element references an object element by using the latter's unique identifier. The reference to the object element is included as an attribute (object_ref). An object element can also include links back to nodes in the object hierarchy as an attribute (object_node_ref).

<object_hierarchy type="PHYSICAL"> <!-- Physical hierarchy -->
    <object_node id="10" object_ref="0"> <!-- Portrait -->
        <object_node id="11" object_ref="1"> <!-- Father -->
            <object_node id="12" object_ref="4"/> <!-- Father's face -->
        </object_node>
        <object_node id="13" object_ref="2"> <!-- Mother -->
            <object_node id="14" object_ref="5"/> <!-- Mother's face -->
        </object_node>
    </object_node>
</object_hierarchy>
<object_hierarchy type="LOGICAL"> <!-- Logical hierarchy: faces in the image -->
    <object_node id="15" object_ref="3"> <!-- Faces -->
`<object_node id ="16" object_ref="4"/> <!— Fa the r's fa c e —>
`<object_node id ="17" o bje c t_ref="5"/> <!— Mother's fa c e —>
`</object_node>
`</objec thie ra It hy>
`
An object set element and one or more object hierarchy elements form the image element (<image>). The image element symbolizes the image or picture being described.

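Putting the pieces together, the skeleton of a complete image description nests as sketched below. This shows the nesting only; the exact content models are fixed by the DTDs in [15].

<image> <!-- the image being described -->
    <object_set>
        <object id="0" type="PHYSICAL"> </object>
        <!-- ... remaining objects ... -->
    </object_set>
    <object_hierarchy type="PHYSICAL"> <!-- physical organization -->
        <object_node id="10" object_ref="0"/>
    </object_hierarchy>
    <object_hierarchy type="LOGICAL"> <!-- further optional hierarchies -->
        <object_node id="15" object_ref="3"/>
    </object_hierarchy>
</image>
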
In our image description scheme, the object element contains the feature elements; they include location, color, texture, shape, size, motion, time, and annotation elements, among others. Time and motion descriptors only make sense when the object belongs to a video sequence. The location element contains pointers to the locations of the image. Note that annotations can be textual, visual, or audio. These features can be extracted or assigned automatically or manually. For those features extracted automatically, the feature descriptors can include links to external extraction and similarity matching code. An example is included below. This example also shows how external DSs can be imported and combined with ours.

`<object id ="4" typ e ="PHYSIC A L" object_node_ref="12 16 1 > <!— Fa the r's face —>
`<c olor> </c olor>
`<texture>
`<ta mu ra >
`<ta mura_va lue c oa rseness=13.01" contra st="0.39" o de nta tio n=13.77>
`<c ode type ="IEX1RACTIO N" la ng ua ge="JAVA" version="1.2"> <!— Link to extra ction code —>
`<loc a tion> <loc a tion_site href="ftp ://extra c tion.ta mura .ja va "/> </loc a tion>
`</code>
`</ta mura >
`</texture>
`<sha pe> </sha pc >
`<position> </position>
`<!— import and use of e xte ma I a nnota tion DS's DID —>
`<text_a nnota tio n xmlns:extAnDS="http ://www.other.ds/a nnotations.dtd">
`<extAnDS:C la ss>Fa c e </extAnDS:C la ss>
`</te xt_a nnotation>
`<fp bjec t>
`
In summary, both the object set and object hierarchy elements are part of the image element (<image>). The objects in the object set are combined hierarchically in one or more object hierarchy elements. For efficient traversal of the image description, links are provided to traverse from objects in the object set to corresponding object nodes in the object hierarchy and vice versa. The objects include various feature descriptors that can link to external extraction and similarity matching code.

3.3. Video description scheme

In this section, we present the proposed description scheme (DS) for videos. To clarify the explanation, we will use the example shown in Figure 2. Using this example, we will walk through the video DS expressed in XML. Along the way, we will explain the use of various XML elements that are defined for the proposed MPEG-7 video DS. The structures of the image and video description schemes are very similar.

The basic description element of our video description scheme is the event element (<event>). An event represents one or more shots of the video for which some features are available. We distinguish three different types of events: a shot, a continuous group of shots, and a discontinuous group of shots. Discontinuous groups of shots will usually be associated together based on common features (e.g. background color) or high-level semantic relationships (e.g. actor on screen). The event element comprises the concepts of story, scene, and shot in the visual literature. The set of all events identified in a video is included within the event set element (<event_set>).

For the video example of Figure 2.a, we have chosen to describe the events listed below. Each event element has a unique identifier within a video description. The identifier is expressed as an attribute of the event element (id). Another attribute of the event element (type) distinguishes between the three different types of events. We have left each event element empty to show clearly the overall structure of the video description. Later in the section, we will describe the features that can be included within the event element.

<event_set>
    <event id="0" type="SHOT"> </event> <!-- The tiger -->
    <event id="1" type="SHOT"> </event> <!-- Stalking the prey -->
`<event id =12" type ='SHOT' > </event> <!-- C ha se —>
`<event id ="3" type ='SHOT' > </event> <!-- Ca pture —>
`<event id ="4" type ="C 0 WON UO US G ROUP_SHOIS" > </event> <!— Feed ing —>
`<event id ="5" type ="SHOT' > </event> <!-- Hiding the food -->
`<event id =16" type ="SHOT' > </event> <!-- Feeding the young —>
`</eve nt_set>
`
[Figure 2: a) Video example: a timeline running from 0:00 to 0:17, with "The tiger" [event 0] spanning the whole sequence; "Stalking the prey" [event 1], "Chase" [event 2], and "Capture" [event 3] covering the intervals marked at 0:00, 0:03, 0:09, and 0:12; and "Feeding" [event 4] covering the final interval up to 0:17, composed of "Hiding the food" [event 5] and "Feeding the young" [event 6]. b) High-level description of the video by the proposed video description scheme: a set of events {0, 1, 2, 3, 4, 5, 6, ...} and a physical event hierarchy in which event 0 contains events 1, 2, 3, and 4, and event 4 contains events 5 and 6.]

The video description scheme is comprised of event elements that are combined hierarchically in one or more event hierarchy elements (<event_hierarchy>). The hierarchy is a way to organize the event elements in the event set element. Each event hierarchy consists of a tree of event node elements (<event_node>). Each event node points to an event. The events in a video can be organized by their location in the video or by their semantic relationships. These two ways to group events generate two types of hierarchies: physical and logical hierarchies. A physical hierarchy describes the time composition of the events in the video. On the other hand, a logical hierarchy organizes the events based on a higher-level understanding of their semantics, similar to semantic clustering.

Continuing with the video example in Figure 2.a, one possible hierarchy is shown in Figure 2.b. The corresponding XML is below. The type of hierarchy is included in the event hierarchy element as an attribute (type). The event node element has a unique identifier as an attribute (id). The event node element references an event element by using the latter's unique identifier. The reference to the event element is included as an attribute (event_ref). An event element can include links back to nodes in the event hierarchy to jump between events and event nodes in both directions (event_node_ref).

<event_hierarchy type="PHYSICAL">
    <event_node id="10" event_ref="0"> <!-- The Tiger -->
        <event_node id="11" event_ref="1"/> <!-- Stalking the prey -->
        <event_node id="12" event_ref="2"/> <!-- Chase -->
        <event_node id="13" event_ref="3"/> <!-- Capture -->
        <event_node id="14" event_ref="4"> <!-- Feeding -->
            <event_node id="15" event_ref="5"/> <!-- Hiding the food -->
            <event_node id="16" event_ref="6"/> <!-- Feeding the young -->
        </event_node>
    </event_node>
</event_hierarchy>

An event set element and one or more event hierarchy elements form the video element (<video>). The video element symbolizes the video sequence being described.

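Analogously to the image case, the skeleton of a complete video description nests as sketched below; again, the exact content models are fixed by the DTDs in [15].

<video> <!-- the video sequence being described -->
    <event_set>
        <event id="0" type="SHOT"> </event>
        <!-- ... remaining events ... -->
    </event_set>
    <event_hierarchy type="PHYSICAL"> <!-- one or more hierarchies -->
        <event_node id="10" event_ref="0"/>
    </event_hierarchy>
</video>
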
In our video description scheme, the event element contains the feature elements; they include location, shot transition (i.e. various within-shot or across-shot special effects), camera motion, time, key frame, annotation, and object set elements, among others. The object element is defined in the image description scheme; it represents the relevant objects in the event. As in the image DS, these features can be extracted or assigned automatically or manually. For those features extracted automatically, the feature descriptors can include links to extraction and similarity matching code. For example,
`<event id ="3" type d'PHYSE A L" event_node_ref="10"> <!— Ca p tu re —>
`<object_set> </object_set>
`<c a me ra _m otio n>
`<ba c kg ID un_a ffine_mod e I>
`<ba c kg rbund_affine_motion va lue>
`<pa nning d ire c tio n="NE7>
`<zoom direction="117>
`</bac kg round_a ffine_motion_va lue>
`<code type ="DIS1A NC E" la ngua ge="JAVA"version="1.0"> <!— Link to similarity ma tc hing c ode —>
`<location> <loc a tio n_site href="ftp://dist.ba c g round .a ffine '7> </location>
`</code>
`</bac kg round_a ffine_mod e I>
`</c a me ra _m otio n>
`<time> </time>
`</event>
In summary, both the event set and event hierarchy elements are part of the video element (<video>). The event elements in the event set element are combined hierarchically in one or more event hierarchy elements. For efficient traversal of the video description, links are provided to traverse from events in the event set to corresponding event nodes in the event hierarchy and vice versa. The events include various feature descriptors that can link to extraction and similarity matching code.

4. MPEG-7 TESTBED

The proposed self-describing schemes are intuitive, flexible, and efficient. We demonstrate the feasibility of our self-describing schemes in our MPEG-7 testbed. In our testbed, we are using the self-describing schemes for descriptions of images and videos that are generated by a wide variety of systems we have developed. We are developing two systems that are enabled and enhanced by our approach for multimedia content descriptions. The first system is an intelligent search engine with an associated expressive query interface. The second system is a new version of MetaSEEk, a metasearch system for mediation among multiple search engines for audio-visual information.

4.1. Description generator

In our MPEG-7 testbed, we are using various image/video processing, analysis, and annotation systems to generate a rich variety of descriptions for a collection of image/video items, as shown in Figure 3. The descriptions that we generate for visual content include low-level visual features of automatically segmented regions, user-defined semantic objects, high-level scene properties, classifications, and associated textual information. We are also including descriptions that are generated by our collaborators. As described in section 3.1., we are using XML as the DDL for the descriptions. The descriptions have the structure of the image/video description schemes (DSs) described in sections 3.2. and 3.3. The DS and DDL are designed to accommodate descriptions generated by a wide variety of heterogeneous systems.

Once all the descriptions for an image/video item are generated, they are loaded into a database, which the search engine accesses. We now describe the systems used to generate the descriptions.

• VideoQ: Region-based indexing and searching system.

This system extracts visual features such as color, texture, motion, shape, and size for automatically segmented regions of a video sequence [5]. The system first decomposes a video into separate shots. This is performed by scene change detection [13]. Scene changes may be either abrupt or transitional (e.g. dissolve, fade in/out, and wipe). For each shot, the system estimates the global motion (i.e. the motion of the dominant background) and the camera motion. Then, it segments, detects, and tracks regions across the frames in the shot, computing different visual features for each region. For each shot, the description generated by this system is a set of regions with visual and motion features, and the camera motion. Some keywords, assigned manually, are also available for each shot.

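As a sketch of how this output maps onto the video DS, a single shot produced by VideoQ might be encoded along the following lines. The identifiers and keyword values are invented, and the empty feature elements stand in for content whose models are fixed by the DTDs in [15].

<event id="7" type="SHOT" event_node_ref="20"> <!-- one automatically detected shot -->
    <object_set>
        <object id="8" type="PHYSICAL"> <!-- an automatically segmented, tracked region -->
            <color> </color>
            <texture> </texture>
            <shape> </shape>
            <motion> </motion> <!-- trajectory of the tracked region -->
        </object>
    </object_set>
    <camera_motion> </camera_motion> <!-- estimated global and camera motion -->
    <annotation> tiger, grassland </annotation> <!-- manually assigned keywords -->
</event>
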
• AMOS: Video object segmentation system.

Currently, fully automatic segmentation of semantic objects is only successful in constrained visual domains. The AMOS system [23] takes a powerful approach in which automatic segmentation is integrated with user input to track semantic objects in video sequences. For general video sources, the system allows users to define an approximate object boundary by using a tracing interface. Given the approximate object boundary, the system automatically refines the boundary and tracks the movement of the object in subsequent frames of the video. The system is robust enough to handle many real-world situations that are hard to model in existing approaches, including complex objects, fast and intermittent motion, complicated backgrounds, multiple moving objects, and partial occlusion. The description generated by this system is a set of semantic objects with the associated regions and features that can be manually annotated with text.

• MPEG domain face detection system.

This system efficiently and automatically detects faces directly in the MPEG compressed domain [20]. The human face is an important subject in video. It is ubiquitous in news, documentaries, movies, etc., providing key information to the viewer for the understanding of the video content. This system provides a set of regions with face labels.

• WebClip: Hierarchical video browsing system.

This system parses compressed MPEG video streams to extract shot boundaries, moving objects, object features, and camera motion [12]. It also generates a hierarchical shot-based browsing interface for intuitive visualization and editing of videos.

[Figure 3: Architecture for combining descriptions generated by heterogeneous systems. Image/video content is processed by VideoQ video object feature extraction, MPEG-domain face detection, AMOS object segmentation, WebClip hierarchical video browsing, manual text annotation, In Lumine scene classification, and the Visual Apprentice model-based classification system, each producing an MPEG-7 description. These descriptions, together with descriptions generated by collaborators, are integrated and stored in the image/video database, which the image/video search engine accesses to serve the query interface and image/video browser.]

• Visual Apprentice: Model-based image classification system.

Many automatic image classification systems are based on a pre-defined set of classes in which class-specific algorithms are used to perform classification. The Visual Apprentice [10] allows users to define their own classes and provide examples that are used to automatically learn visual models. The visual models are based on automatically segmented regions, their associated visual features, and their spatial relationships. For example, the user may build a visual model of a portrait in which one person wearing a blue suit is seated on a brown sofa, and a second person is standing to the right of the seated person. The system uses a combination of lazy learning, decision trees, and evolution programs during classification. The description generated by this system is a set of text annotations, i.e. the user-defined classes, for each image.

• In Lumine: Scene classification system.

The In Lumine system [16] is a method for high-level semantic classification of images and video shots based on low-level visual features. The core of the system consists of various machine learning techniques such as rule induction, clustering, and nearest neighbor classification. The system is being used to classify images and video scenes into high-level semantic scene classes such as {nature landscape}, {city/suburb}, {indoor}, and {outdoor}.
