US006282317B1

(12) United States Patent
Luo et al.

(10) Patent No.: US 6,282,317 B1
(45) Date of Patent: Aug. 28, 2001

(54) METHOD FOR AUTOMATIC DETERMINATION OF MAIN SUBJECTS IN PHOTOGRAPHIC IMAGES

(75) Inventors: Jiebo Luo, Pittsford; Stephen Etz, Fairport; Amit Singhal, Rochester, all of NY (US)

(73) Assignee: Eastman Kodak Company, Rochester, NY (US)

( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/223,860

(22) Filed: Dec. 31, 1998

(51) Int. Cl.7 .................................. G06K 9/46
(52) U.S. Cl. .................................. 382/203
(58) Field of Search .................. 382/203, 204, 205, 206, 207, 168, 169, 173, 218, 195, 155, 156, 159, 160, 190, 202, 227
(56) References Cited

U.S. PATENT DOCUMENTS

4,975,975 * 12/1990 Filipski ................... 382/38
5,724,456    3/1998 Boyack et al. ............. 382/274
5,809,322 *  9/1998 Akerib .................... 395/800.14
5,850,352 * 12/1998 Moezzi et al. ............. 364/514
5,862,260 *  1/1999 Rhoads .................... 382/232
5,983,218 * 11/1999 Syeda-Mahmood ............. 707/104
6,014,461 *  1/2000 Hennessey et al. .......... 382/195

* cited by examiner

OTHER PUBLICATIONS

V. D. Gesu, et al., "Local operators to detect regions of interest," Pattern Recognition Letters, vol. 18, pp. 1077-1081, 1997.
R. Milanese, Detecting salient regions in an image: From biological evidence to computer implementations, PhD thesis, University of Geneva, Switzerland, 1993.
X. Marichal, et al., "Automatic detection of interest areas of an image or of a sequence of images," in Proc. IEEE Int. Conf. Image Process., 1996.
W. Osberger, et al., "Automatic identification of perceptually important regions in an image," in Proc. IEEE Int. Conf. Pattern Recognition, 1998.
Q. Huang, et al., "Foreground/background segmentation of color images by integration of multiple cues," in Proc. IEEE Int. Conf. Image Process., 1995.
T. F. Syeda-Mahmood, "Data and model-driven selection using color regions," Int. J. Comput. Vision, vol. 21, no. 1, pp. 9-36, 1997.
Luo, et al., "Towards physics-based segmentation of photographic color images," Proceedings of the IEEE International Conference on Image Processing, 1997.
L. R. Williams, "Perceptual organization of occluding contours," in Proc. IEEE Int. Conf. Computer Vision, 1990.
J. August, et al., "Fragment grouping via the principle of perceptual occlusion," in Proc. IEEE Int. Conf. Pattern Recognition, 1996.
Lee, "Color image quantization based on physics and psychophysics," Journal of the Society of Photographic Science and Technology of Japan, vol. 59, no. 1, pp. 212-225, 1996.
J. Pearl, Probabilistic Reasoning in Intelligent Systems, San Francisco, CA: Morgan Kaufmann, 1988.

Primary Examiner—Andrew W. Johns
Assistant Examiner—Seyed Azarian
(74) Attorney, Agent, or Firm—David M. Woods
(57) ABSTRACT

A method for detecting a main subject in an image, the method comprises: receiving a digital image; extracting regions of arbitrary shape and size defined by actual objects from the digital image; grouping the regions into larger segments corresponding to physically coherent objects; extracting for each of the regions at least one structural saliency feature and at least one semantic saliency feature; and integrating the saliency features using a probabilistic reasoning engine into an estimate of a belief that each region is the main subject.

37 Claims, 9 Drawing Sheets
U.S. Patent          Aug. 28, 2001          Sheet 1 of 9          US 6,282,317 B1

[FIG. 1: perspective view of a computer system for implementing the present invention.]
U.S. Patent          Aug. 28, 2001          Sheet 2 of 9          US 6,282,317 B1

[FIG. 2: flow diagram of the method: input image S0; SEGMENTATION S2; NON-PURPOSIVE PERCEPTUAL GROUPING S4; PURPOSIVE PERCEPTUAL GROUPING S6; STRUCTURAL FEATURE EXTRACTION S8a and SEMANTIC FEATURE EXTRACTION S8b; PROBABILISTIC REASONING S10; BELIEF MAP S12.]
U.S. Patent          Aug. 28, 2001          Sheet 3 of 9          US 6,282,317 B1

[FIG. 3: sensitivity characteristic of a belief sensor; BELIEF plotted against SENSOR VALUE between MIN and MAX.]

[FIG. 7: two-level Bayes net; root node MAIN SUBJECT connected to leaf nodes FEATURE 1 through FEATURE N.]
U.S. Patent          Aug. 28, 2001          Sheet 4 of 9          US 6,282,317 B1

[FIG. 4(a): location PDF with unknown orientation, shown as a 2D function.]
U.S. Patent          Aug. 28, 2001          Sheet 5 of 9          US 6,282,317 B1

[FIG. 4(b): WIDTH DISTRIBUTION OF MAIN SUBJECT; mean truth value (0 to 1) versus pixel position (0 to 255).]

[FIG. 4(c): HEIGHT DISTRIBUTION OF MAIN SUBJECT; mean truth value (0 to 1) versus pixel position (0 to 383).]
U.S. Patent          Aug. 28, 2001          Sheet 6 of 9          US 6,282,317 B1

[FIG. 5(a): location PDF with known orientation, shown as a 2D function.]
U.S. Patent          Aug. 28, 2001          Sheet 7 of 9          US 6,282,317 B1

[FIG. 5(b): HORIZONTAL DISTRIBUTION OF MAIN SUBJECT; mean truth value (0 to 1) versus horizontal position (0 to 1.0).]

[FIG. 5(c): VERTICAL DISTRIBUTION OF MAIN SUBJECT; mean truth value (0 to 1) versus vertical position (0 to 1.0).]
U.S. Patent          Aug. 28, 2001          Sheet 8 of 9          US 6,282,317 B1

[FIG. 6: computation of relative saliency for the central circular region using an extended neighborhood marked by a dotted-line box.]
U.S. Patent          Aug. 28, 2001          Sheet 9 of 9          US 6,282,317 B1

[FIG. 8: block diagram of the preferred segmentation method.]
METHOD FOR AUTOMATIC DETERMINATION OF MAIN SUBJECTS IN PHOTOGRAPHIC IMAGES

FIELD OF THE INVENTION

The invention relates generally to the field of digital image processing and, more particularly, to locating main subjects, or equivalently, regions of photographic interest, in a digital image.
BACKGROUND OF THE INVENTION

In photographic pictures, a main subject is defined as what the photographer tries to capture in the scene. The first-party truth is defined as the opinion of the photographer, and the third-party truth is defined as the opinion of an observer other than the photographer and the subject (if applicable). In general, the first-party truth typically is not available, because observers lack the specific knowledge that the photographer may have about the people, setting, event, and the like. On the other hand, there is, in general, good agreement among third-party observers if the photographer has successfully used the picture to communicate his or her interest in the main subject to the viewers. Therefore, it is possible to design a method to automatically perform the task of detecting main subjects in images.

Main subject detection provides a measure of saliency or relative importance for different regions that are associated with different subjects in an image. It enables a discriminative treatment of the scene contents for a number of applications. The output of the overall system can be modified versions of the image, semantic information, and action.
The methods disclosed by the prior art fall into two major categories. The first category is considered "pixel-based" because such methods were designed to locate interesting pixels, "spots", or "blocks", which usually do not correspond to entities of objects or subjects in an image. The second category is considered "region-based" because such methods were designed to locate interesting regions, which do correspond to entities of objects or subjects in an image.
Most pixel-based approaches to region-of-interest detection are essentially edge detectors. V. D. Gesu, et al., "Local operators to detect regions of interest," Pattern Recognition Letters, vol. 18, pp. 1077-1081, 1997, used two local operators based on the computation of local moments and symmetries to derive the selection. Arguing that the performance of a visual system is strongly influenced by information processing done at the early vision stage, two transforms, named the discrete moment transform (DMT) and the discrete symmetry transform (DST), are computed to measure local central moments about each pixel and local radial symmetry. In order to exclude trivial symmetry cases, nonuniform region selection is needed. The specific DMT operator acts like a detector of prominent edges (occlusion boundaries), and the DST operator acts like a detector of symmetric blobs. The results from the two operators are combined via a logic "AND" operation. Some morphological operations are needed to dilate the edge-like raw output map generated by the DMT operator.
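By way of illustration, the fusion step described above (thresholding an edge-like DMT response and a blob-like DST response, dilating the thin edges, and intersecting the two masks) can be sketched as follows. This is a schematic reading of the combination logic only; the response maps, thresholds, and dilation amount are hypothetical, not a reconstruction of the published DMT/DST operators.

```python
import numpy as np
from scipy import ndimage

def combine_dmt_dst(dmt_map, dst_map, edge_thresh=0.5, blob_thresh=0.5):
    """Schematic AND-fusion of an edge-like map and a symmetry map.

    dmt_map, dst_map: 2-D float response arrays (hypothetical outputs of
    the DMT and DST operators). Thin DMT edges are dilated so that they
    can overlap the thresholded DST blobs before the logical AND.
    """
    edges = dmt_map > edge_thresh                         # prominent edges
    edges = ndimage.binary_dilation(edges, iterations=3)  # morphological dilation
    blobs = dst_map > blob_thresh                         # symmetric blobs
    return np.logical_and(edges, blobs)                   # regions salient to both
```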
R. Milanese, Detecting salient regions in an image: From biological evidence to computer implementations, PhD thesis, University of Geneva, Switzerland, 1993, developed a computational model of visual attention, which combines knowledge about the human visual system with computer vision techniques. The model is structured into three major stages. First, multiple feature maps are extracted from the input image (for example, orientation, curvature, color contrast, and the like). Second, a corresponding number of "conspicuity" maps are computed using a derivative-of-Gaussian model, which enhances regions of interest in each feature map. Finally, a nonlinear relaxation process is used to integrate the conspicuity maps into a single representation by finding a compromise among inter-map and intra-map inconsistencies. The effectiveness of the approach was demonstrated using a few relatively simple images with remarkable regions of interest.
To determine an optimal tonal reproduction, J. R. Boyack, et al., U.S. Pat. No. 5,724,456, developed a system that partitions the image into blocks, combines certain blocks into sectors, and then determines a difference between the maximum and minimum average block values for each sector. A sector is labeled an active sector if the difference exceeds a pre-determined threshold value. All weighted counts of active sectors are plotted versus the average luminance sector values in a histogram, which is then shifted via some predetermined criterion so that the average luminance sector value of interest will fall within a destination window corresponding to the tonal reproduction capability of a destination application.

In summary, this type of pixel-based approach does not explicitly detect regions of interest corresponding to semantically meaningful subjects in the scene. Rather, these methods attempt to detect regions where certain changes occur in order to direct attention or gather statistics about the scene.
X. Marichal, et al., "Automatic detection of interest areas of an image or of a sequence of images," in Proc. IEEE Int. Conf. Image Process., 1996, developed a fuzzy-logic-based system to detect interesting areas in a video sequence. A number of subjective knowledge-based interest criteria were evaluated for segmented regions in an image. These criteria include: (1) an interaction criterion (a window predefined by a human operator); (2) a border criterion (rejecting regions having a large number of pixels along the picture borders); (3) a face-texture criterion (de-emphasizing regions whose texture does not correspond to skin samples); (4) a motion criterion (rejecting regions with no motion and low gradient, or regions with very large motion and high gradient); and (5) a continuity criterion (temporal stability in motion). The main application of this method is directing the resources in video coding, in particular for videophone or videoconference. It is clear that motion is the most effective criterion for this technique, which targets video instead of still images. Moreover, the fuzzy logic functions were designed in an ad hoc fashion. Lastly, this method requires a window predefined by a human operator, and therefore is not fully automatic.
W. Osberger, et al., "Automatic identification of perceptually important regions in an image," in Proc. IEEE Int. Conf. Pattern Recognition, 1998, evaluated several features known to influence human visual attention for each region of a segmented image to produce an importance value for each feature in each region. The features mentioned include low-level factors (contrast, size, shape, color, motion) and higher-level factors (location, foreground/background, people, context), but only contrast, size, shape, location, and foreground/background (determining background by determining the proportion of the total image border that is contained in each region) were implemented. Moreover, this method chose to treat each factor as being of equal importance, arguing that (1) there is little quantitative data which indicates the relative importance of these different factors, and (2) the relative importance is likely to change from one image to another. Note that segmentation was obtained using the split-and-merge method based on 8x8 image blocks, and this segmentation method often results in over-segmentation and blotchiness around actual objects.
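In code, the equal-importance choice reduces to an unweighted mean of the per-feature importance values for a region. The sketch below is an illustrative reading of that scheme, with hypothetical feature names and scores, not Osberger's implementation.

```python
def region_importance(feature_scores):
    """Equal-weight combination: a region's importance is the plain mean
    of its per-feature importance values (names and values hypothetical)."""
    return sum(feature_scores.values()) / len(feature_scores)

# e.g., a region scored on the five implemented factors:
print(region_importance({"contrast": 0.7, "size": 0.4, "shape": 0.5,
                         "location": 0.9, "foreground": 1.0}))  # -> 0.7
```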
Q. Huang, et al., "Foreground/background segmentation of color images by integration of multiple cues," in Proc. IEEE Int. Conf. Image Process., 1995, addressed automatic segmentation of color images into foreground and background with the assumption that background regions are relatively smooth but may have gradually varying colors or be lightly textured. A multi-level segmentation scheme was devised that included color clustering, unsupervised segmentation based on the MDL (Minimum Description Length) principle, edge-based foreground/background separation, and integration of both region-based and edge-based segmentation. In particular, the MDL-based segmentation algorithm was used to further group the regions from the initial color clustering, and the four corners of the image were used to adaptively determine an estimate of the background gradient magnitude. The method was tested on around 100 well-composed images with a prominent main subject centered in the image against a large area of the assumed type of uncluttered background.
T. F. Syeda-Mahmood, "Data and model-driven selection using color regions," Int. J. Comput. Vision, vol. 21, no. 1, pp. 9-36, 1997, proposed a data-driven region selection method using color region segmentation and region-based saliency measurement. A collection of 220 primary color categories was pre-defined in the form of a color LUT (look-up table). Pixels are mapped to one of the color categories, grouped together through connected-component analysis, and further merged according to compatible color categories. Two types of saliency measures, namely self-saliency and relative saliency, are linearly combined using heuristic weighting factors to determine the overall saliency. In particular, self-saliency included color saturation, brightness, and size, while relative saliency included color contrast (defined by CIE distance) and size contrast between the concerned region and the surrounding region that is ranked highest among neighbors by size, extent, and contrast in successive order.
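The final fusion in that method is a simple weighted sum. The sketch below illustrates a linear combination of a self-saliency term and a relative-saliency term in the spirit of the description above; the component formulas, field names, and weights are all invented placeholders rather than the published measures.

```python
def overall_saliency(region, surround, w_self=0.5, w_rel=0.5):
    """Linear fusion of self-saliency and relative saliency with heuristic
    weights. `region` and `surround` are dicts with pre-normalized
    'saturation', 'brightness', and 'size' entries in [0, 1]."""
    self_saliency = (region["saturation"] + region["brightness"]
                     + region["size"]) / 3.0
    # stand-in for CIE color distance to the highest-ranked neighbor
    color_contrast = abs(region["brightness"] - surround["brightness"])
    size_contrast = abs(region["size"] - surround["size"])
    relative_saliency = 0.5 * color_contrast + 0.5 * size_contrast
    return w_self * self_saliency + w_rel * relative_saliency
```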
In summary, almost all of these reported methods have been developed for targeted types of images: video-conferencing or TV news broadcasting images, where the main subject is a talking person against a relatively simple static background (Osberger, Marichal); museum images, where there is a prominent main subject centered in the image against a large area of relatively clean background (Huang); and toy-world images, where the main subjects are a few distinctively colored and shaped objects (Milanese, Syeda-Mahmood). These methods were either not designed for unconstrained photographic images or, even if designed with generic principles, were only demonstrated as effective on rather simple images. The criteria and reasoning processes used were somewhat inadequate for less constrained images, such as photographic images.
SUMMARY OF THE INVENTION

It is an object of this invention to provide a method for detecting the location of main subjects within a digitally captured image, thereby overcoming one or more of the problems set forth above.

It is also an object of this invention to provide a measure of belief for the location of main subjects within a digitally captured image, thereby capturing the intrinsic degree of uncertainty in determining the relative importance of different subjects in an image. The output of the algorithm is in the form of a list of segmented regions ranked in descending order of their likelihood as potential main subjects for a generic or specific application. Furthermore, this list can be converted into a map in which the brightness of a region is proportional to the main subject belief of the region.

It is also an object of this invention to use ground truth data. Ground truth, defined as human-outlined main subjects, is used for feature selection and for training the reasoning engine.

It is also an object of this invention to provide a method of finding main subjects in an image in an automatic manner.

It is also an object of this invention to provide a method of finding main subjects in an image with no constraints or assumptions on scene contents.

It is further an object of the invention to use the main subject location and main subject belief to obtain estimates of the scene characteristics.
The present invention comprises the steps of:

a) receiving a digital image;

b) extracting regions of arbitrary shape and size defined by actual objects from the digital image;

c) grouping the regions into larger segments corresponding to physically coherent objects;

d) extracting for each of the regions at least one structural saliency feature and at least one semantic saliency feature; and,

e) integrating the saliency features using a probabilistic reasoning engine into an estimate of a belief that each region is the main subject (a minimal code sketch of these steps appears below).
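Steps a) through e) map directly onto a pipeline of function calls. The following sketch shows that structure only: every function is a hypothetical stub, and the patent's actual algorithms (adaptive Bayesian segmentation, perceptual grouping, feature extraction, and Bayes net inference) are replaced by trivial placeholders.

```python
from typing import Dict, List

def segment_into_regions(image) -> List[Dict]:
    # b) stub: pretend the whole image is a single region
    return [{"pixels": image}]

def group_regions(regions: List[Dict]) -> List[Dict]:
    # c) stub: no grouping performed
    return regions

def structural_features(region: Dict) -> Dict[str, float]:
    # d) stub structural saliency features (e.g., centrality, borderness)
    return {"centrality": 0.5, "borderness": 0.1}

def semantic_features(region: Dict) -> Dict[str, float]:
    # d) stub semantic saliency features (e.g., flesh, sky)
    return {"flesh": 0.2, "sky": 0.0}

def reasoning_engine(structural: Dict, semantic: Dict) -> float:
    # e) stub: the patent uses a Bayes net; a plain average stands in here
    scores = list(structural.values()) + list(semantic.values())
    return sum(scores) / len(scores)

def main_subject_beliefs(image) -> List[float]:
    # a) the digital image is received as the argument
    regions = group_regions(segment_into_regions(image))   # b), c)
    return [reasoning_engine(structural_features(r), semantic_features(r))
            for r in regions]                               # d), e)
```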
The above and other objects of the present invention will become more apparent when taken in conjunction with the following description and drawings, wherein identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.
ADVANTAGEOUS EFFECT OF THE INVENTION

The present invention has the following advantages:

a robust image segmentation method capable of identifying object regions of arbitrary shapes and sizes, based on physics-motivated adaptive Bayesian clustering and non-purposive grouping;

emphasis on perceptual grouping capable of organizing regions corresponding to different parts of physically coherent subjects;

utilization of a non-binary representation of the ground truth, which captures the inherent uncertainty in determining the belief of the main subject, to guide the design of the system;

a rigorous, systematic statistical training mechanism to determine the relative importance of different features through ground truth collection and contingency table building;

extensive, robust feature extraction and evidence collection;

combination of structural saliency and semantic saliency, the latter facilitated by explicit identification of key foreground and background subject matters;

combination of self-saliency and relative-saliency measures for structural saliency features; and,

a robust Bayes-net-based probabilistic inference engine suitable for integrating incomplete information.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a perspective view of a computer system for implementing the present invention;
FIG. 2 is a block diagram illustrating a software program of the present invention;

FIG. 3 is an illustration of the sensitivity characteristic of a belief sensor with sigmoidal shape used in the present invention;

FIG. 4 is an illustration of the location PDF with unknown orientation: FIG. 4(a) is an illustration of the PDF in the form of a 2D function, FIG. 4(b) is an illustration of the PDF in the form of its projection along the width direction, and FIG. 4(c) is an illustration of the PDF in the form of its projection along the height direction;

FIG. 5 is an illustration of the location PDF with known orientation: FIG. 5(a) is an illustration of the PDF in the form of a 2D function, FIG. 5(b) is an illustration of the PDF in the form of its projection along the width direction, and FIG. 5(c) is an illustration of the PDF in the form of its projection along the height direction;

FIG. 6 is an illustration of the computation of relative saliency for the central circular region using an extended neighborhood as marked by the box of dotted line;

FIG. 7 is an illustration of a two-level Bayes net used in the present invention; and,

FIG. 8 is a block diagram of a preferred segmentation method.
DETAILED DESCRIPTION OF THE INVENTION

In the following description, the present invention will be described in the preferred embodiment as a software program. Those skilled in the art will readily recognize that the equivalent of such software may also be constructed in hardware.

Still further, as used herein, computer readable storage medium may comprise, for example: magnetic storage media such as a magnetic disk (such as a floppy disk) or magnetic tape; optical storage media such as an optical disc, optical tape, or machine-readable bar code; solid state electronic storage devices such as random access memory (RAM) or read only memory (ROM); or any other physical device or medium employed to store a computer program.
Referring to FIG. 1, there is illustrated a computer system 10 for implementing the present invention. Although the computer system 10 is shown for the purpose of illustrating a preferred embodiment, the present invention is not limited to the computer system 10 shown, but may be used on any electronic processing system. The computer system 10 includes a microprocessor-based unit 20 for receiving and processing software programs and for performing other processing functions. A touch screen display 30 is electrically connected to the microprocessor-based unit 20 for displaying user-related information associated with the software, and for receiving user input via touching the screen. A keyboard 40 is also connected to the microprocessor-based unit 20 for permitting a user to input information to the software. As an alternative to using the keyboard 40 for input, a mouse 50 may be used for moving a selector 52 on the display 30 and for selecting an item on which the selector 52 overlays, as is well known in the art.

A compact disk-read only memory (CD-ROM) 55 is connected to the microprocessor-based unit 20 for receiving software programs and for providing a means of inputting the software programs and other information to the microprocessor-based unit 20 via a compact disk 57, which typically includes a software program. In addition, a floppy disk 61 may also include a software program, and is inserted into the microprocessor-based unit 20 for inputting the software program. Still further, the microprocessor-based unit 20 may be programmed, as is well known in the art, for storing the software program internally. A printer 56 is connected to the microprocessor-based unit 20 for printing a hardcopy of the output of the computer system 10.

Images may also be displayed on the display 30 via a personal computer card (PC card) 62 or, as it was formerly known, a personal computer memory card international association card (PCMCIA card), which contains digitized images electronically embodied in the card 62. The PC card 62 is ultimately inserted into the microprocessor-based unit 20 for permitting visual display of the image on the display 30.
Referring to FIG. 2, there is shown a block diagram of an overview of the present invention. First, an input image of a natural scene is acquired and stored S0 in digital form. Then, the image is segmented S2 into a few regions of homogeneous properties. Next, the region segments are grouped into larger regions based on similarity measures S4 through non-purposive perceptual grouping, and further grouped into larger regions corresponding to perceptually coherent objects S6 through purposive grouping (purposive grouping concerns specific objects). The regions are evaluated for their saliency S8 using two independent yet complementary types of saliency features: structural saliency features and semantic saliency features. The structural saliency features, including a set of low-level early vision features and a set of geometric features, are extracted S8a, and are further processed to generate a set of self-saliency features and a set of relative saliency features. Semantic saliency features in the form of key subject matters, which are likely to be part of either the foreground (for example, people) or the background (for example, sky, grass), are detected S8b to provide semantic cues as well as scene context cues. The evidences of both types are integrated S10 using a reasoning engine based on a Bayes net to yield the final belief map of the main subject S12.
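As a simplified illustration of the integration step S10, a two-level Bayes net (cf. FIG. 7) whose leaf feature nodes are conditionally independent given the root reduces to the naive-Bayes update below. The conditional probabilities and feature names are invented placeholders, not trained values from the patent.

```python
def belief_from_features(evidence, cpt, prior=0.5):
    """Two-level Bayes net posterior: root = 'region is the main subject',
    leaves = observed feature detectors (assumed conditionally independent).

    evidence: dict feature -> True/False observation
    cpt: dict feature -> (P(obs | main subject), P(obs | not main subject))
    """
    p_ms, p_not = prior, 1.0 - prior
    for feat, observed in evidence.items():
        p_obs_ms, p_obs_not = cpt[feat]
        if observed:
            p_ms *= p_obs_ms
            p_not *= p_obs_not
        else:
            p_ms *= 1.0 - p_obs_ms
            p_not *= 1.0 - p_obs_not
    return p_ms / (p_ms + p_not)          # posterior belief for the region

# Hypothetical numbers for a central, flesh-colored region:
cpt = {"central": (0.8, 0.3), "flesh": (0.6, 0.1)}
print(belief_from_features({"central": True, "flesh": True}, cpt))  # ~0.94
```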
For the purpose of semantic interpretation of images, a single criterion is clearly insufficient. The human brain, furnished with its a priori knowledge and enormous memory of real-world subjects and scenarios, combines different subjective criteria in order to give an assessment of the interesting or primary subject(s) in a scene. The following extensive list of features is believed to have influence on the human brain in performing such a somewhat intangible task as main subject detection: location, size, brightness, colorfulness, texturefulness, key subject matter, shape, symmetry, spatial relationship (surroundedness/occlusion), borderness, indoor/outdoor, orientation, depth (when applicable), and motion (when applicable for video sequences).

In the present invention, the low-level early vision features include color, brightness, and texture. The geometric features include location (centrality), spatial relationship (borderness, adjacency, surroundedness, and occlusion), size, shape, and symmetry. The semantic features include flesh, face, sky, grass, and other green vegetation. Those skilled in the art can define more features without departing from the scope of the present invention.
S2: Region Segmentation

The adaptive Bayesian color segmentation algorithm (Luo et al., "Towards physics-based segmentation of photographic color images," Proceedings of the IEEE International Conference on Image Processing, 1997) is used to generate a tractable number of physically coherent regions of arbitrary shape. Although this segmentation method is preferred, it will be appreciated that a person of ordinary skill in the art can use a different segmentation method to obtain object regions of arbitrary shape without departing from the scope of the present invention. Segmentation of arbitrarily shaped regions provides the advantages of: (1) accurate measure of the size, shape, location of, and spatial relationship among objects; (2) accurate measure of the color and texture of objects; and (3) accurate classification of key subject matters.
Referring to FIG. 8, there is shown a block diagram of the preferred segmentation algorithm. First, an initial segmentation of the image into regions is obtained S50. A color histogram of the image is computed and then partitioned into a plurality of clusters that correspond to distinctive, prominent colors in the image. Each pixel of the image is classified to the closest cluster in the color space according to a preferred physics-based color distance metric with respect to the mean values of the color clusters (Luo et al., "Towards physics-based segmentation of photographic color images," Proceedings of the IEEE International Conference on Image Processing, 1997). This classification process results in an initial segmentation of the image. A neighborhood window is placed at each pixel in order to determine what neighborhood pixels are used to compute the local color histogram for this pixel. The window size is initially set at the size of the entire image S52, so that the local color histogram is the same as the one for the entire image and does not need to be recomputed. Next, an iterative procedure is performed between two alternating processes: re-computing S54 the local mean values of each color class based on the current segmentation, and re-classifying the pixels according to the updated local mean values of the color classes S56. This iterative procedure is performed until convergence is reached S60. During this iterative procedure, the strength of the spatial constraints can be adjusted in a gradual manner S58 (for example, the value of β, which indicates the strength of the spatial constraints, is increased linearly with each iteration). After convergence is reached for a particular window size, the window used to estimate the local mean values for the color classes is reduced by half in size S62. The iterative procedure is repeated for the reduced window size to allow more accurate estimation of the local mean values for the color classes. This mechanism introduces spatial adaptivity into the segmentation process. Finally, the segmentation of the image is obtained when the iterative procedure reaches convergence for the minimum window size S64.
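The control flow of S50 through S64 resembles an adaptive clustering loop with a shrinking estimation window. The sketch below captures only that control flow: Euclidean RGB distance stands in for the patent's physics-based color distance metric, and global cluster means stand in for the windowed local means, so this is a structural simplification rather than the disclosed algorithm.

```python
import numpy as np

def adaptive_color_segmentation(image, n_clusters=6, min_window=16):
    """Simplified echo of steps S50-S64 on an (H, W, 3) uint8 image."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(float)
    rng = np.random.default_rng(0)
    # S50: initial segmentation from prominent colors (here: random seeds)
    means = pixels[rng.choice(len(pixels), n_clusters, replace=False)]
    window = max(h, w)                      # S52: window starts at full-image size
    beta = 0.0                              # strength of spatial constraints (stub)
    labels = np.zeros(len(pixels), dtype=int)
    while window >= min_window:
        for _ in range(100):                # alternate S54/S56 until S60
            dists = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
            labels = dists.argmin(axis=1)   # S56: re-classify pixels
            new_means = np.array([pixels[labels == k].mean(axis=0)
                                  if np.any(labels == k) else means[k]
                                  for k in range(n_clusters)])
            beta += 0.1                     # S58: gradually tighten constraints
            if np.allclose(new_means, means, atol=0.5):
                break                       # S60: convergence reached
            means = new_means
        window //= 2                        # S62: halve the estimation window
    return labels.reshape(h, w)             # S64: final segmentation map
```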
S4 & S6: Perceptual Grouping

The segmented regions may be grouped into larger segments that consist of regions belonging to the same object. Perceptual grouping can be non-purposive or purposive. Referring to FIG. 2, non-purposive perceptual grouping S4 can eliminate over-segmentation due to large illumination differences, for example, a table or wall with remarkable illumination falloff over a distance. Purposive perceptual grouping S6 is generally based on smooth, noncoincidental connection of joints between parts of the same object, and in certain cases on models of typical objects (for example, a person has a head, torso, and limbs).

Perceptual grouping facilitates the recognition of high-level vision features. Without proper perceptual grouping, it is difficult to perform object recognition and proper assessment of such properties as size and shape. Perceptual grouping includes: merging small regions into large regions based on similarity in properties and compactness of the would-be merged region (non-purposive grouping); and grouping parts that belong to the same object based on commonly shared background, compactness of the would-be merged region, smoothness in contour connection between regions, and models of specific objects (purposive grouping).
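A toy rendering of the non-purposive merge criterion (similarity in properties plus compactness of the would-be merged region) might look as follows. The region representation, similarity metric, and both thresholds are invented for illustration; the patent does not disclose these values.

```python
import math

def should_merge(region_a, region_b, color_tol=20.0, min_compactness=0.3):
    """Toy non-purposive grouping test: merge two adjacent regions when
    their mean colors are similar AND the would-be merged region remains
    compact. Regions are dicts with 'mean_color' (r, g, b), 'area', and
    'perimeter' entries."""
    color_dist = math.dist(region_a["mean_color"], region_b["mean_color"])
    merged_area = region_a["area"] + region_b["area"]
    # crude perimeter estimate for the merged region (shared border ignored)
    merged_perimeter = region_a["perimeter"] + region_b["perimeter"]
    # isoperimetric compactness: 1.0 for a disk, smaller for ragged shapes
    compactness = 4.0 * math.pi * merged_area / merged_perimeter ** 2
    return color_dist < color_tol and compactness > min_compactness
```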
S8: Feature Extraction

For each region, an extensive set of features, which are shown to contribute to visual attention, are extracted and associated evidences are then computed.
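As shown in FIG. 3, a belief sensor has a sigmoid-shaped sensitivity characteristic that maps a raw sensor (feature) value to a belief between a minimum and a maximum. A generic version of such a sensor might look as follows; the midpoint and steepness parameters are illustrative assumptions, not values disclosed in the patent.

```python
import math

def belief_sensor(sensor_value, midpoint=0.5, steepness=10.0):
    """Generic sigmoid-shaped belief sensor (cf. FIG. 3): maps a raw
    feature value to a belief in (0, 1). Midpoint and steepness are
    illustrative parameters."""
    return 1.0 / (1.0 + math.exp(-steepness * (sensor_value - midpoint)))
```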
