Query by Image and Video Content: The QBIC System
`
Myron Flickner, Harpreet Sawhney, Wayne Niblack, Jonathan Ashley, Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee, Dragutin Petkovic, David Steele, and Peter Yanker

IBM Almaden Research Center
`
Picture yourself as a fashion designer needing images of fabrics with a particular mixture of colors, a museum cataloger looking for artifacts of a particular shape and textured pattern, or a movie producer needing a video clip of a red car-like object moving from right to left with the camera zooming. How do you find these images? Even though today's technology enables us to acquire, manipulate, transmit, and store vast on-line image and video collections, the search methodologies used to find pictorial information are still limited due to difficult research problems (see "Semantic versus nonsemantic" sidebar). Typically, these methodologies depend on file IDs, keywords, or text associated with the images. And, although powerful, they

• don't allow queries based directly on the visual properties of the images,
• are dependent on the particular vocabulary used, and
• don't provide queries for images similar to a given image.

Research on ways to extend and improve query methods for image databases is widespread, and results have been presented in workshops, conferences,1,2 and surveys.
We have developed the QBIC (Query by Image Content) system to explore content-based retrieval methods. QBIC allows queries on large image and video databases based on

• example images,
• user-constructed sketches and drawings,
• selected color and texture patterns,
• camera and object motion, and
• other graphical information.
QBIC* lets users find pictorial information in large image and video databases based on color, shape, texture, and sketches. QBIC technology is part of several IBM products.
`
*To run an interactive query, visit the QBIC Web server at http://wwwqbic.almaden.ibm.com.
`
Semantic versus nonsemantic information

At first glance, content-based querying appears deceptively simple because we humans seem to be so good at it. If a program can be written to extract semantically relevant text phrases from images, the problem may be solved by using currently available text-search technology. Unfortunately, in an unconstrained environment, the task of writing this program is beyond the reach of current technology in image understanding. At an artificial intelligence conference several years ago, a challenge was issued to the audience to write a program that would identify all the dogs pictured in a children's book, a task most 3-year-olds can easily accomplish. Nobody in the audience accepted the challenge, and this remains an open problem.

Perceptual organization—the process of grouping image features into meaningful objects and attaching semantic descriptions to scenes through model matching—is an unsolved problem in image understanding. Humans are much better than computers at extracting semantic descriptions from pictures. Computers, however, are better than humans at measuring properties and retaining these in long-term memory.

One of the guiding principles used by QBIC is to let computers do what they do best—quantifiable measurement—and let humans do what they do best—attaching semantic meaning. QBIC can find "fish-shaped objects," since shape is a measurable property that can be extracted. However, since fish occur in many shapes, the only fish that will be found will have a shape close to the drawn shape. This is not the same as the much harder semantical query of finding all the pictures of fish in a pictorial database.
`
`
`
`,ear? (cid:9)
Figure 1. QBIC query by drawn color. Drawn query specification on left; best 21 results sorted by similarity to the query on right. The results were selected from a 12,968-picture database.
`
`
`Figure 2. QBIC database population (top) and query (bottom) architecture.
`
`
`
Figure 3. QBIC still image population interface. Entry for scene text at top. Tools in row are polygon outliner, rectangle outliner, ellipse outliner, paintbrush, eraser, line drawing, object translation, flood fill, and snake outliner.
`
`
Two key properties of QBIC are (1) its use of image and video content—computable properties of color, texture, shape, and motion of images, videos, and their objects—in the queries, and (2) its graphical query language in which queries are posed by drawing, selecting, and other graphical means. Related systems, such as MIT's Photobook3 and the Trademark and Art Museum applications from ETL,4 also address these common issues. This article describes the QBIC system and demonstrates its query capabilities.
QBIC SYSTEM OVERVIEW

Figure 1 illustrates a typical QBIC query.** The left side shows the query specification, where the user painted a large magenta circular area on a green background using standard drawing tools. Query results are shown on the right: an ordered list of "hits" similar to the query. The order of the results is top to bottom, then left to right, to support horizontal scrolling. In general, all queries follow this model in that the query is specified by using graphical means—drawing, selecting from a color wheel, selecting a sample image, and so on—and results are displayed as an ordered set of images.
To achieve this functionality, QBIC has two main components: database population (the process of creating an image database) and database query. During the population, images and videos are processed to extract features describing their content—colors, textures, shapes, and camera and object motion—and the features are stored in a database. During the query, the user composes a query graphically. Features are generated from the graphical query and then input to a matching engine that finds images or videos from the database with similar features. Figure 2 shows the system architecture.
`
Data model

For both population and query, the QBIC data model has

• still images or scenes (full images) that contain objects (subsets of an image), and
• video shots that consist of sets of contiguous frames and contain motion objects.
`
For still images, the QBIC data model distinguishes between "scenes" (or images) and "objects." A scene is an image or single representative frame of video. An object is a part of a scene—for example, the fox in Figure 3—or a moving entity in a video. For still image database population, features are extracted from images and objects and stored in a database as shown in the top left part of Figure 2.
Videos are broken into clips called shots. Representative frames, or r-frames, are generated for each extracted shot. R-frames are treated as still images, and features are extracted and stored in the database. Further processing of shots generates motion objects—for example, a car moving across the screen.
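To make the data model concrete, here is a minimal sketch of how scenes, objects, and shots might be represented in code. The class and field names are our own illustration, not QBIC's actual internal structures.

```python
from dataclasses import dataclass, field
from typing import Optional
import numpy as np

@dataclass
class VisualObject:
    """A region of a scene, stored as a binary mask (True = object pixel).
    Masks may overlap and may have several disconnected components."""
    mask: np.ndarray                        # 2D boolean array, scene-sized
    text: Optional[str] = None              # e.g., "baby on beach"
    features: dict = field(default_factory=dict)   # color, texture, shape, ...

@dataclass
class Scene:
    """A full image, or a single representative frame (r-frame) of a video."""
    thumbnail: np.ndarray                   # standard-sized icon of the image
    text: Optional[str] = None              # annotation for the whole scene
    objects: list = field(default_factory=list)    # VisualObject instances
    features: dict = field(default_factory=dict)

@dataclass
class Shot:
    """A set of contiguous video frames; its r-frame is treated as a still."""
    first_frame: int
    last_frame: int
    r_frame: Scene
    motion_objects: list = field(default_factory=list)
```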
Queries are allowed on objects ("Find images with a red, round object"), scenes ("Find images that have approximately 30-percent red and 15-percent blue colors"), shots ("Find all shots panning from left to right"), or any combination ("Find images that have 30 percent red and contain a blue textured object").
In QBIC, similarity queries are done against the database of pre-extracted features using distance functions between the features. These functions are intended to mimic human perception to approximate a perceptual ordering of the database. Figure 2 shows the match engine, the collection of all distance functions. The match engine interacts with a filtering/indexing module (see "Fast searching and indexing" sidebar, next page) to support fast searching methodologies such as indexing. Users interact with the query interface to generate a query specification, resulting in the features that define the query.
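As a sketch of what one such distance function can look like, the snippet below ranks a database by a quadratic-form color histogram distance d(x, y) = (x - y)^T A (x - y), where A[i, j] encodes the perceptual similarity between histogram bins i and j. This form is consistent with the 256 matrix-vector multiply mentioned in the "Fast searching and indexing" sidebar, but the function names and parameters are illustrative assumptions.

```python
import numpy as np

def quadratic_form_distance(x, y, A):
    """Histogram distance d(x, y) = (x - y)^T A (x - y); A[i, j] is the
    perceptual similarity between color bins i and j."""
    d = np.asarray(x) - np.asarray(y)
    return float(d @ A @ d)

def rank_by_similarity(query_hist, db_hists, A, k=21):
    """Return the indices of the k database images most similar to the
    query, in increasing order of distance (a perceptual ordering)."""
    dists = [quadratic_form_distance(query_hist, h, A) for h in db_hists]
    return list(np.argsort(dists)[:k])
```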
DATABASE POPULATION

In still image database population, the images are reduced to a standard-sized icon called a thumbnail and annotated with any available text information. Object identification is an optional but key part of this step. It lets users manually, semiautomatically, or fully automatically identify interesting regions—which we call objects—in the images. Internally, each object is represented as a binary mask. There may be an arbitrary number of objects per image. Objects can overlap and can consist of multiple disconnected components like the set of dots on a polka-dot dress. Text, like "baby on beach," can be associated with an outlined object or with the scene as a whole.
`
**The scene image database used in the figures consists of about 7,450 images from the Mediasource Series of images and audio from Applied Optical Media Corp., 4,100 images from the PhotoDisc sampler CD, 950 images from the Corel Professional Photo CD collection, and 450 images from an IBM collection.
`
`
`
`
Fast searching and indexing

Indexing tabular data for exact matching or range searches in traditional databases is a well-understood problem, and structures like B-trees provide efficient access mechanisms. In this scenario, indexing assures sublinear search while maintaining completeness; that is, all records satisfying the query are returned without the need for examining each record in the database. However, in the context of similarity matching for visual content, traditional indexing methods may not be appropriate. For queries in which similarity is defined as a distance metric in high-dimensional feature spaces (for example, color histogram queries), indexing involves clustering and indexable representations of the clusters. In the case of queries that combine similarity matching with spatial constraints on objects, the problem is more involved. Data structures for fast access of high-dimensional features for spatial relationships must be invented.

In a query, features from the database are compared to corresponding features from the query specification to determine which images are a good match. For a small database, sequential scanning of the features followed by straightforward similarity computations is adequate. But as the database grows, this combination can be too slow. To speed up the queries, we have investigated a variety of techniques. Two of the most promising follow.
`
Filtering

A computationally fast filter is applied to all data, and only items that pass through the filter are operated on by the second stage, which computes the true similarity metric. For example, in QBIC we have shown that color histogram matching, which is based on a 256-dimensional color histogram and requires a 256 matrix-vector multiply, can be made efficient by filtering. The filtering step employs a much faster computation in a 3D space with no loss in accuracy. Thus, for a query on a database of 10,000 elements, the fast filter is applied to produce the best 1,000 color histogram matches. These filtered histograms are subsequently passed to the slower complete matching operation to obtain, say, the best 200 matches to display to a user, with the guarantee that the global best 200 in the database have been found.
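The two-stage idea might be sketched as follows. The cheap 3D prefilter here is Euclidean distance between average colors, an illustrative stand-in: the actual QBIC filter is constructed so that it never discards a true match, a guarantee this naive version does not provide.

```python
import numpy as np

def two_stage_match(query_hist, query_avg, db_hists, db_avgs, A,
                    n_filter=1000, n_final=200):
    """Two-stage filtered matching: a fast 3D test prunes the database,
    then the full 256-D quadratic-form distance ranks the survivors."""
    # Stage 1: keep the n_filter candidates closest in average color (3D).
    coarse = np.linalg.norm(np.asarray(db_avgs) - query_avg, axis=1)
    candidates = np.argsort(coarse)[:n_filter]

    # Stage 2: exact (slower) quadratic-form distance on candidates only.
    def full_distance(i):
        d = query_hist - db_hists[i]
        return d @ A @ d

    return sorted(candidates, key=full_distance)[:n_final]
```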
`
Indexing

For low-dimensional features such as average color and texture (each 3D), multidimensional indexing methods such as R*-trees can be used. For high-dimensional features—for example, our 20-dimensional moment-based shape feature vector—the dimensionality is reduced using the K-L, or principal component, transform. This produces a low-dimensional space, as low as two or three dimensions, which could be indexed by using R*-trees.
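A minimal sketch of the K-L (principal component) reduction follows, projecting 20-dimensional shape vectors onto their top two or three principal components so a spatial index like an R*-tree can handle them. The dimension counts mirror the sidebar; the code itself is our illustration.

```python
import numpy as np

def kl_transform(features, out_dim=3):
    """Project (n x 20) feature vectors onto their top `out_dim`
    principal components, producing low-dimensional keys that can
    be stored in an R*-tree or similar spatial index."""
    X = np.asarray(features, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
    top = eigvecs[:, np.argsort(eigvals)[::-1][:out_dim]]
    return centered @ top
```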
`
Object-outlining tools

Ideally, object identification would be automatic, but this is generally difficult. The alternative—manual identification—is tedious and can inhibit query-by-content applications. As a result, we have devoted considerable effort to developing tools to aid in this step. In recent work, we have successfully used fully automatic unsupervised segmentation methods along with a foreground/background model to identify objects in a restricted class of images. The images, typical of museums and retail catalogs, have a small number of foreground objects on a generally separable background. Figure 4 shows example results. Even in this domain, robust algorithms are required because of the textured and variegated backgrounds.
`
We also provide semiautomatic tools for identifying objects. One is an enhanced flood-fill technique. Flood-fill methods, found in most photo-editing programs, start from a single object pixel and repeatedly add adjacent pixels whose values are within some given threshold of the original pixel. Selecting the threshold, which must change from image to image and object to object, is tedious. We automatically calculate a dynamic threshold by having the user click on background as well as object points. For reasonably uniform objects that are distinct from the background, this operation allows fast object identification without manually adjusting a threshold. The example in Figure 3 shows an object, a fox, identified by using only a few clicks.
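One plausible way to derive the dynamic threshold from the user's clicks is to make it large enough to span the spread of the clicked object pixels yet smaller than the distance to the clicked background pixels. The midpoint rule below is our guess at such a heuristic (grayscale for simplicity), not the published algorithm.

```python
import numpy as np
from collections import deque

def dynamic_threshold(object_values, background_values):
    """Choose a fill threshold halfway between the clicked object
    pixels' spread and the nearest clicked background value
    (an illustrative heuristic)."""
    obj = np.asarray(object_values, dtype=float)
    spread = np.abs(obj - obj.mean()).max()
    gap = min(abs(float(b) - obj.mean()) for b in background_values)
    return (spread + gap) / 2.0

def flood_fill(image, seed, threshold):
    """Standard flood fill: grow a binary mask from the seed pixel,
    adding 4-connected neighbors within `threshold` of the seed value."""
    h, w = image.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_value = float(image[seed])
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        if not (0 <= y < h and 0 <= x < w) or mask[y, x]:
            continue
        if abs(float(image[y, x]) - seed_value) > threshold:
            continue
        mask[y, x] = True
        queue.extend([(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)])
    return mask
```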
`
Figure 4. Top row is the original image. Bottom row contains the automatically extracted objects using a foreground/background model. Heuristics encode the knowledge that objects tend to be in the center of the picture.
`
`
`
`
We designed another outlining tool to help users track object edges. This tool takes a user-drawn curve and automatically aligns it with nearby image edges. Based on the "snakes" concept developed in recent computer vision research, the tool finds the curve that maximizes the image gradient magnitude along the curve.

The spline snake formulation we use allows for smooth solutions to the resulting nonlinear minimization problem. The computation is done at interactive speeds so that, as the user draws a curve, it is "rubber-banded" to lie along object boundaries.
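A greatly simplified, discrete version of the snake idea appears below: each control point of the user's curve is greedily nudged toward the nearby pixel with the highest gradient magnitude, while a crude smoothness penalty holds the curve together. QBIC's actual tool solves a spline-based nonlinear minimization; this sketch only conveys the objective being optimized.

```python
import numpy as np

def refine_curve(points, grad_mag, smooth_weight=0.5, radius=2, iters=20):
    """Greedy snake refinement on a closed curve: move each control
    point (y, x) to the neighborhood pixel that best trades off high
    gradient magnitude against distance from its neighbors' midpoint."""
    pts = [tuple(p) for p in points]
    h, w = grad_mag.shape
    for _ in range(iters):
        for i, (y, x) in enumerate(pts):
            mid = (np.array(pts[i - 1]) + np.array(pts[(i + 1) % len(pts)])) / 2.0
            best, best_score = (y, x), -np.inf
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < h and 0 <= nx < w):
                        continue
                    score = (grad_mag[ny, nx]
                             - smooth_weight * np.hypot(ny - mid[0], nx - mid[1]))
                    if score > best_score:
                        best, best_score = (ny, nx), score
            pts[i] = best
    return pts
```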
`
Video data

For video data, database population has three major components:

• shot detection,
• representative frame creation for each shot, and
• derivation of a layered representation of coherently moving structures/objects.
`
Shots are short sequences of contiguous frames that we use for annotation and querying. For instance, a video clip may consist of a shot smoothly panning over the skyline of San Francisco, switching to a panning shot of the Bay meeting the ocean, and then to one that zooms to the Golden Gate Bridge. In general, a set of contiguous frames may be grouped into a shot because they

• depict the same scene,
• signify a single camera operation,
• contain a distinct event or an action like a significant presence and persistence of an object, or
• are chosen as a single indexable entity by the user.

Our effort is to detect many shots automatically in a preprocessing step and provide an easy-to-use interface for the rest.
`
SHOT DETECTION. Gross scene changes or scene cuts are the first indicators of shot boundaries. Methods for detecting scene cuts proposed in the literature essentially fall into two classes: (1) those based on global representations like color/intensity histogram
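A minimal sketch of this first, histogram-based class of detectors: flag a cut wherever the global histograms of consecutive frames differ by more than a threshold. The bin count, distance measure, and threshold below are illustrative choices, not the specific detector used in QBIC.

```python
import numpy as np

def detect_cuts(frames, bins=64, threshold=0.4):
    """Flag a scene cut between consecutive grayscale frames whose
    normalized global intensity histograms differ by more than
    `threshold` in L1 distance."""
    cuts, prev_hist = [], None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
        hist = hist / max(hist.sum(), 1)
        if prev_hist is not None and np.abs(hist - prev_hist).sum() > threshold:
            cuts.append(i)          # cut between frame i-1 and frame i
        prev_hist = hist
    return cuts
```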
`
Figure 5. Scene cuts automatically extracted from a 1,148-frame sales demo from Energy Productions.
`