`(12) Patent Application Publication (10) Pub. No.: US 2007/0030391 A1
`Kim et al.
(43) Pub. Date: Feb. 8, 2007
`
`
(54) APPARATUS, MEDIUM, AND METHOD
SEGMENTING VIDEO SEQUENCES BASED
ON TOPIC
`(75) Inventors: Jungbae Kim, Yongin-si (KR); Doosun
`Hwang, Seoul (KR); Jiyeun Kim,
`Seoul (KR)
`Correspondence Address:
`STAAS & HALSEY LLP
SUITE 700
`1201 NEW YORK AVENUE, N.W.
`WASHINGTON, DC 20005 (US)
(73) Assignee: Samsung Electronics Co., Ltd., Suwon-si (KR)
(21) Appl. No.: 11/498,857
(22) Filed: Aug. 4, 2006
`
`(30)
`
`Foreign Application Priority Data
`
`Aug. 4, 2005 (KR)............................ 10-2005-0071507
`
`Publication Classification
`
(51) Int. Cl.
H04N 5/445 (2006.01)
(52) U.S. Cl. .............................................................. 348/564
`
(57) ABSTRACT
Provided are an apparatus, medium, and method segmenting
video sequences based on a topic. The apparatus may
include a start-shot determination unit detecting a plurality
of key-frames by using character information from video
sequences including a plurality of frames to determine the
detected key-frames as start-shots for each topic, and a topic
list creation unit creating a topic list by using the start-shots
for each topic.
`
[Cover figure: video sequences segmented into topic chapters (e.g., CHAPTER 4 through CHAPTER 22), each with START and CONTENTS entries]
`Petitioner Apple Inc. - Ex. 1049, p. 1
`
`
`
`
`
`
[FIG. 2: start-shot determination unit and topic list creation unit]
[FIG. 3: video sequences and an EPG signal input to a pre-processing unit (310), a face detection unit (330), and a key-frame determination unit (350)]
`
`
`
[FIG. 4A: EPG information — TITLE: WJ REPORTERS; CHANNEL: KBS2; BROADCASTING TIME: PM 9:55 - PM 11:05; GENRE: CURRENT AFFAIRS/DOCUMENTARY-SOCIETY; BROADCASTER: JUNG-MIN, HWANG]
[FIG. 4B: key-frames detected as a result of character clustering]
`
`
`
[FIG. 4C: video sequences and an EPG signal input to a thumbnail image creation unit, a scene change detection unit, an EPG analyzer unit, and a number-of-main-characters determination unit, feeding the key-frame determination unit]
`
`
`
[FIG. 6A: thumbnail image re-organization unit and classifying unit]
[FIG. 7: examples of sub-windows 710, 730, and 750]
`
`
`
[FIG. 8A: simple characteristics 811, 812, 813, 814, 815, and 816]
[FIG. 8B: simple characteristics 821 and 823 applied to a face image]
`
`
`
[FIG. 9: frame image segmentation for face detection; face 900]
`
`
`
[FIG. 10A: flowchart of thumbnail image re-organization, sub-window generation, and face detection for the first and second sections]
`
`
`
`
`
`
`
`
`
`
`
[FIG. 10B: continuation of the face detection flowchart for the remaining sections]
`
`
`
[FIG. 11: clothing information extraction unit, character clustering unit, and main character determination unit]
[FIG. 12: operation of the clothing information extraction unit (1210)]
`
`
`
`
`APPARATUS, MEDIUM, AND METHOD
`SEGMENTING VIDEO SEQUENCES BASED ON
`TOPIC
`
`CROSS-REFERENCE TO RELATED
`APPLICATIONS
0001. This application claims the benefit of Korean Patent Application No. 10-2005-0071507, filed on Aug. 4, 2005, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein in its entirety by reference.
`
`BACKGROUND OF THE INVENTION
`
`0002)
`1. Field of the Invention
`0003. An embodiment of the present invention relates to
`segmentation of video sequences, and more particularly, to
`an apparatus, medium, and method segmenting video
`sequences based on a topic at high speed by detecting main
`characters.
`0004 2. Description of the Related Art
0005. Developments in digital signal processing techniques such as video and audio compression have allowed users to retrieve and browse desired multimedia content at desired points in time. Fundamental techniques required to browse and retrieve non-linear multimedia content include shot segmentation and shot clustering, with these two techniques being most important for structurally and hierarchically analyzing multimedia content.
0006. A "shot" in a video program is a sequence of frames that can be obtained from a video camera without interruption, and may function as a basic unit for analyzing or organizing the video program. The shot may mean a single frame or a plurality of frames; however, for simplicity of explanation, the term shot will be exemplified by the single frame, noting that embodiments of the invention are not limited to the same. In addition, a "scene" in the video program is a semantic element of a video construction or development of a story, and includes a collection of shots related to one another by the same semantic context. The concept of the shot or the scene may be similarly applied to an audio program as well as the video program.
0007. A multimedia indexing technique allows users to easily browse or retrieve a desired part of the video program. A conventional multimedia indexing technique may include extracting organizational information of video content in units of shots or scenes, extracting main characteristic elements such as key-frames capable of representing a corresponding segment for each organizational unit, indexing the organizational information for multimedia content, and describing semantic information, such as an occurrence of an event, advent of visual or auditory objects, and conditions and backgrounds of objects, along a temporal axis.
0008. However, such conventional multimedia content indexing techniques fail to easily identify the result of a summarization because excessive segments are generated when segmentation is performed on the basis of scene change. In addition, conventional techniques fail to accurately detect start points of the segments because the multimedia content is not segmented on the basis of similarity of content, but rather, the multimedia content is summarized using a single piece of information such as similarity of colors. Further, it is difficult to summarize the multimedia content when a broadcast type or genre is changed because only a characteristic of a particular genre is used. Moreover, due to an excessive processing load generated during the summarization of the multimedia content, it is difficult to apply conventional techniques to embedded systems such as mobile phones, personal digital assistants (PDAs), and digital cameras, which have low-performance processors.
`
`SUMMARY OF THE INVENTION
`0009. An embodiment of the present invention provides
`an apparatus, medium, and method for segmenting video
`sequences based on a topic, at high speed, based on the
`detection of main characters.
`0010 Additional aspects and/or advantages of the inven
`tion will be set forth in part in the description which follows
`and, in part, will be apparent from the description, or may be
`learned by practice of the invention.
0011. To achieve the above and/or other aspects and advantages, embodiments of the present invention include an apparatus for topic-based segmenting of a video program, the apparatus including a start-shot determination unit to detect a plurality of key-frames based on character information from video sequences including a plurality of frames to determine the detected key-frames as start-shots for each topic, and a topic list creation unit to create a topic list based on the start-shots for each topic.
`0012. The start-shot determination unit may detect key
`frames based on clothing information of at least one main
`character.
`0013 The topic list creation unit may organize frames
`existing between a current topic start-shot and a next topic
`start-shot into a current topic episode, and add the current
`topic episode to the start-shot of each topic in the topic list.
`0014 Further, the start-shot determination unit may
`include a pre-processing unit to determine frames belonging
`to a respective scene by detecting scene change among
`frames included in the video sequences and to obtain a
`number of main characters appearing in the video sequences,
`a face detection unit to detect faces from the determined
`frames belonging to the respective scene to determine face
`detection frames, and a key-frame determination unit to
`cluster the determined face detection frames according to the
`main characters corresponding to the number of main char
`acters to determine the key-frames.
`0015 The pre-processing unit may detect the scene
`change by calculating similarity between a current frame
`and a previous frame.
`0016.
`In addition, the pre-processing unit may obtain the
`number of main characters from an electronic program guide
`(EPG) signal.
`0017. The pre-processing unit may include a thumbnail
`image creation unit to create thumbnail images for input
`frames, a scene change detection unit to detect the scene
`change using similarity of color histograms between thumb
`nail images of neighboring frames, and a number-of-main
`characters determination unit to determine the number of
`main characters by analyzing an EPG signal.
`
0018. In addition, the face detection unit may include a thumbnail image re-organization unit to create an integral image for thumbnail images of input frames and to re-organize the thumbnail images using the integral image, a sub-window generation unit to generate a sub-window for the re-organized thumbnail images, and a classifying unit to determine whether the sub-window includes a face.
`0.019
`Here, the face detection unit may divide the thumb
`nail images of the input frames into a plurality of sections
`having a section having a highest probability of detecting the
`face, and sequentially provide the plurality of sections to the
`thumbnail image re-organization unit in descending order
`from the section having the highest probability of detecting
`the face to a section having a lowest probability of detecting
`the face.
`0020. The key-frame determination unit may further
`include a clothing information extraction unit to extract
`clothing information from a face detection frame, a character
`clustering unit to perform a character clustering method
`based on the extracted clothing information, and a main
`character determination unit to select a cluster correspond
`ing to the main character from a plurality of clusters,
`clustered in the character clustering unit, corresponding to
`the number of main characters and to provide frames
`included in the selected cluster as key-frames of each topic.
`0021. The clothing information may include a clothing
`color histogram.
`0022. To achieve the above and/or other aspects and
`advantages, embodiments of the present invention include a
`method of topic based segmenting of video sequences, the
`method including detecting a plurality of key-frames based
`on character information from video sequences including a
`plurality of frames to determine the detected key-frames as
`start-shots for each topic, and creating a topic list based on
`the start-shots for each topic.
0023. The determination of the start-shots may include
`detecting key-frames based on clothing information of at
`least one main character.
`0024. Further, the creation of the topic list may include
`organizing frames existing between a current topic start-shot
`and a next topic start-shot into a current topic episode, and
`adding the current topic episode to the start-shot of each
`topic in the topic list.
0025. The determination of the start-shots may include detecting a scene change from the frames included in the video sequences to determine frames belonging to a respective scene and obtaining a number of main characters appearing in the video sequences, detecting faces from the determined frames belonging to the respective scene to determine face detection frames, and clustering the determined face detection frames according to the main characters corresponding to the number of main characters to determine the face detection frames as key-frames.
`0026. The scene change may be detected by creating
`thumbnail images of input frames and using similarity of
`color histograms between thumbnail images of neighboring
`frames.
`0027. In addition, the number of main characters may be
`obtained by analyzing an electronic program guide (EPG)
`signal.
`
0028. The detection of the faces may include creating an integral image for thumbnail images of input frames and re-organizing the thumbnail images using the integral image, generating a sub-window for the re-organized thumbnail images, and determining whether the sub-window includes a face.
`0029. The detection of the faces may further include
`dividing the thumbnail images of the input frames into a
`plurality of sections including a section having a highest
`probability of detecting a face, and sequentially providing
`the thumbnail images for the thumbnail image re-organizing
`in descending order from the section having the highest
`probability of detecting the face to a section having a lowest
`probability of detecting the face.
`0030 The determination of the key-frames may include
`extracting clothing information from the face detection
`frames, performing a character clustering method based on
`the extracted clothing information, and selecting a cluster
`corresponding to the main character from a plurality of
`clusters corresponding to the number of main characters and
`providing frames included in the selected cluster as the
`key-frames of each topic.
0031. To achieve the above and/or other aspects and
`advantages, embodiments of the present invention include a
`medium including computer readable code to implement a
`method of topic based segmenting of video sequences, the
`method may include detecting a plurality of key-frames
`based on character information from video sequences
`including a plurality of frames to determine the detected
`key-frames as start-shots for each topic, and creating a topic
`list based on the start-shots for each topic.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`0032. These and/or other aspects and advantages of the
`invention will become apparent and more readily appreci
`ated from the following description of the embodiments,
`taken in conjunction with the accompanying drawings of
`which:
`0033 FIG. 1 illustrates an example of topic-based seg
`mentation of video sequences related to news;
0034. FIG. 2 illustrates an apparatus for segmenting video sequences based on a topic, according to an embodiment of the present invention;
`0035 FIG. 3 illustrates a start-shot determination unit,
`such as that of FIG. 2, according to an embodiment of the
`present invention;
0036. FIGS. 4A to 4C illustrate an operation of each element of a start-shot determination unit, such as that of FIG. 3, according to an embodiment of the present invention;
`0037 FIG. 5 illustrates a pre-processing unit of a start
`shot determination unit, such as that of FIG. 3, according to
`an embodiment of the present invention;
`0038 FIG. 6A illustrates a face detection unit of a start
`shot determination unit, such as that of FIG. 3, according to
`an embodiment of the present invention;
`0039 FIG. 6B illustrates a method of organizing an
`integral image, according to an embodiment of the present
`invention;
`
`0040 FIG. 7 illustrates an example of a sub-window used
`in a face detection unit of a start-shot determination unit,
`such as that of FIG. 3, according to an embodiment of the
`present invention;
`0041
`FIGS. 8A and 8B illustrate examples of character
`istics used in a classifier of a face detection unit, such as that
`of FIG. 6A, according to an embodiment of the present
`invention;
`0.042
`FIG. 9 illustrates an example of frame image
`segmentation for detecting faces in a face detection unit of
`a start-shot determination unit, such as that of FIG. 3,
`according to an embodiment of the present invention;
0043. FIGS. 10A and 10B illustrate an operation of a face detection unit of a start-shot determination unit, such as that of FIG. 3, according to an embodiment of the present invention;
`0044 FIG. 11 illustrates a key-frame determination unit
`of a start-shot determination unit, such as that of FIG. 3,
`according to an embodiment of the present invention; and
0045. FIG. 12 illustrates an operation of a clothing information extraction unit, such as that of FIG. 11, according to an embodiment of the present invention.
`
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENTS
`0046 Reference will now be made in detail to embodi
`ments of the present invention, examples of which are
`illustrated in the accompanying drawings, wherein like
`reference numerals refer to the like elements throughout.
`Embodiments are described below to explain the present
`invention by referring to the figures.
0047. FIG. 1 illustrates an example of topic-based segmentation of video sequences related to news. Referring to FIG. 1, Chapters 1 to 25 are segmented based on a topic, whereby each chapter includes a start-shot set as a key-frame having a main character and material frames, e.g., an episode, for supporting corresponding content. Here, though only news has been shown, embodiments of the present invention are equally available for alternate topics in addition to news.
`0.048
`FIG. 2 illustrates an apparatus for segmenting
`Video sequences based on a topic, according to an embodi
`ment of the present invention. Referring to FIG. 2, the
`apparatus for segmenting video sequences based on a topic
`may include a start-shot determination unit 210 and a topic
`list creation unit 230, for example, in order to segment video
`sequences based on the topic by detecting the main charac
`terS.
0049. Referring to FIG. 2, the start-shot determination unit 210 may detect a plurality of key-frames by using character information from video sequences including a plurality of frames to determine the detected key-frames as start-shots for each topic. In one embodiment, a main character may appear in each key-frame. In addition, an operation of detecting the start-shot may preferably be performed in units of scenes.
`0050. The topic list creation unit 230 may further create
`a topic list by using the start-shots for each topic determined
`by the start-shot determination unit 210. The start-shots
`detected for each scene are combined to create the topic list.
`In one embodiment, frames existing between a current topic
`start-shot and a next topic start-shot are made into a current
`
`topic episode, and the current topic episode is added to the
`start-shot of each topic of the topic list.
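The grouping of frames between consecutive start-shots into episodes can be sketched as follows; the function and parameter names are illustrative assumptions, not identifiers from the patent:

```python
def create_topic_list(start_shots, total_frames):
    """Group frames between consecutive topic start-shots into episodes.

    start_shots: sorted frame indices detected as topic start-shots.
    total_frames: total number of frames in the video sequences.
    Returns a list of (start_shot, episode_frames) pairs.
    """
    topic_list = []
    for i, start in enumerate(start_shots):
        # The episode spans from the current start-shot up to (but not
        # including) the next topic's start-shot; the last episode runs
        # to the end of the video sequences.
        end = start_shots[i + 1] if i + 1 < len(start_shots) else total_frames
        episode = list(range(start, end))
        topic_list.append((start, episode))
    return topic_list
```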
`0051
`FIG. 3 illustrates a make up of a start-shot deter
`mination unit 210. Such as that of FIG. 2, according to an
`embodiment of the present invention. The start-shot deter
`mination unit 210 may include a pre-processing unit 310, a
`face detection unit 330, and a key-frame determination unit
`350, for example.
`0.052
`Referring to FIG. 3, the pre-processing unit 310
`may receive video sequences making up one video program
`and detect scene changes to determine frames belonging to
`a current scene. In addition, the pre-processing unit 310 may
`receive an electronic program guide (EPG) signal of a
`corresponding video program and determines the number of
`main characters. As shown in FIG. 4A, the EPG signal may
`include various kinds of information Such as broadcasting
`time, program genre, title, name of a director, names of
`characters, plot, etc.
0053. The face detection unit 330 may detect faces in
`each of the frames belonging to the current scene, e.g., as
`determined by the pre-processing unit 310. Since the main
`characters may look to the front, front faces may be detected.
`In this case, only whether a face exists may be determined,
`for example, regardless of the number of faces in each
`frame. Here, a variety of well-known face detection algo
`rithms may be employed to detect faces.
0054. The key-frame determination unit 350 may detect clothing information from the frames in which faces have been detected, e.g., in the face detection unit 330, cluster frames for each character corresponding to the clothing information, and determine frames including the main character as the key-frames, e.g., start-shots of a corresponding topic. Since the clothing information of a main character seldom changes in a single video program, the clothing information may be used in a character clustering method. Clusters having relatively few frames may also be removed from a plurality of clusters generated as a result of the clustering, in consideration of a determined number of main characters, e.g., as determined in the pre-processing unit 310, assuming that the main characters appear more frequently compared to other characters. The key-frame determination unit 350, thus, may determine the result of the character clustering, for example, the key-frames of FIG. 4B, and use the key-frames to create a topic list as shown in FIG. 4C.
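A minimal sketch of clothing-based character clustering as described above might look as follows. The greedy assignment, the intersection threshold, and all names are assumptions for illustration; the patent does not specify a particular clustering algorithm:

```python
import numpy as np

def cluster_by_clothing(frames, num_main_characters, threshold=0.8):
    """Greedily cluster face-detection frames by clothing color histogram.

    frames: list of (frame_index, clothing_histogram) pairs, histograms
    assumed normalized to sum to 1. A frame joins the first cluster whose
    representative histogram intersects it above the threshold.
    """
    clusters = []  # each cluster: {"rep": histogram, "frames": [indices]}
    for idx, hist in frames:
        for c in clusters:
            if np.minimum(c["rep"], hist).sum() >= threshold:
                c["frames"].append(idx)
                break
        else:
            clusters.append({"rep": hist, "frames": [idx]})
    # Keep the largest clusters, assuming main characters appear more
    # frequently than other characters; small clusters are discarded.
    clusters.sort(key=lambda c: len(c["frames"]), reverse=True)
    return [c["frames"] for c in clusters[:num_main_characters]]
```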
0055. FIG. 5 illustrates a makeup of a pre-processing unit 310 of a start-shot determination unit 210, such as that of FIG. 3, according to an embodiment of the present invention. The pre-processing unit 310 may include a frame input unit 510, a thumbnail image creation unit 530, a scene change detection unit 550, an EPG analyzing unit 570, and a number-of-main-characters determination unit 590, for example.
0056. Referring to FIG. 5, the frame input unit 510 may sequentially receive frame images detected from the video sequences.
0057. The thumbnail image creation unit 530 may sample pixels with a constant interval for original frame images provided from the frame input unit 510 in a size of W×H to create thumbnail images having a reduced size of w×h. These thumbnail images allow the face detection unit 330 to detect faces at a higher speed in comparison with when the original frame images are used.
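The constant-interval sampling described above amounts to nearest-neighbor downsampling; a sketch under that assumption (names are illustrative):

```python
import numpy as np

def create_thumbnail(frame, w, h):
    """Shrink a W x H frame to w x h by sampling pixels at a constant interval.

    frame: array of shape (H, W) or (H, W, channels).
    Picks one source pixel per cell of a w x h grid.
    """
    H, W = frame.shape[:2]
    rows = (np.arange(h) * H) // h  # evenly spaced source rows
    cols = (np.arange(w) * W) // w  # evenly spaced source columns
    return frame[rows][:, cols]
```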
`
0058. The scene change detection unit 550 may store previous frame images and calculate similarity of color histograms between two successive frame images, e.g., between a current frame image and the previous frame image. When the calculated similarity is lower than a predetermined threshold value, it may be determined that a scene change is detected in the current frame. In this case, the similarity Sim(H_t, H_{t+1}) may be calculated from the below Equation 1, for example.

Sim(H_t, H_{t+1}) = sum over n = 1 to N of min(H_t(n), H_{t+1}(n))    Equation 1

0059. Here, H_t corresponds to a color histogram of the previous frame image, H_{t+1} corresponds to a color histogram of the current frame image, and N corresponds to a histogram level.
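Equation 1 is a histogram intersection; a sketch of the scene-change test, assuming histograms normalized to sum to 1 and an illustrative threshold (the patent leaves the threshold value unspecified):

```python
import numpy as np

def histogram_similarity(hist_prev, hist_cur):
    """Equation 1: histogram intersection between consecutive frames.

    Both histograms are assumed normalized to the same total, so the
    similarity lies in [0, 1] when each sums to 1.
    """
    return np.minimum(hist_prev, hist_cur).sum()

def is_scene_change(hist_prev, hist_cur, threshold=0.6):
    """Declare a scene change when similarity falls below the threshold."""
    return histogram_similarity(hist_prev, hist_cur) < threshold
```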
0060. The EPG analyzing unit 570 may analyze an EPG signal included in a single video program, and the number-of-main-characters determination unit 590 may determine the number of main characters based on the result of the analysis in the EPG analyzing unit 570.
`0061
`FIG. 6A illustrates a detailed make up of a face
`detection unit 330 of a start-shot determination unit 210,
`such as that of FIG. 3, according to an embodiment of the
`present invention. The face detection unit 330 may include
`a thumbnail image re-organization unit 610, a Sub-window
`generation unit 630, and a classifying unit 650, for example.
0062. Referring to FIG. 6A, the thumbnail re-organization unit 610 may obtain integral images at each point from the thumbnail images for the frames belonging to the current scene, e.g., as provided by the pre-processing unit 310, to re-organize a thumbnail image. A method of obtaining the integral images will be further described below in greater detail with reference to FIG. 6B.
0063. Referring to FIG. 6B, the thumbnail image may include four regions A, B, C, and D, and four points a, b, c, and d, specified according to an embodiment of the present invention. An integral image of a point a refers to a sum of pixel values in a region on an upper left side of the point a. That is, the integral image of the point a corresponds to a sum of pixel values in the region A. In this case, each of the pixel values may include a luminance level of a pixel, for example. In addition, an integral square image of the point a refers to a sum of squared pixel values in the region on the upper left side of the point a. That is, the integral square image at the point a corresponds to a sum of squared pixel values included in the region A. This concept of such an integral image allows convenient calculation of the sum of the pixel values in any region of an image. In addition, use of such an integral image allows for fast segmentation in the segmentation unit 670. For example, the sum of the pixel values of the region D may be calculated from the below Equation 2, for example.

S(D) = ii(d) - ii(b) - ii(c) + ii(a)    Equation 2

0064. Here, ii(d) corresponds to the integral image of the point d, ii(b) corresponds to the integral image of the point b, ii(c) corresponds to the integral image of the point c, and ii(a) corresponds to the integral image of the point a.
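The integral image and the four-point lookup of Equation 2 can be sketched as follows; the zero padding is an implementation convenience (an assumption, not stated in the patent) that makes border lookups uniform:

```python
import numpy as np

def integral_image(img):
    """Integral image: entry (y, x) is the sum of img[:y, :x], i.e. the
    pixels above and to the left of point (y, x).

    Padded with a leading zero row and column so the four-point lookup
    of Equation 2 needs no special cases at the image border.
    """
    return np.pad(img.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))

def region_sum(ii, top, left, bottom, right):
    """Equation 2: S(D) = ii(d) - ii(b) - ii(c) + ii(a), where a, b, c, d
    are the corners of the rectangle img[top:bottom, left:right]."""
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]
```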
`
0065. The thumbnail image re-organization unit 610 may re-organize the thumbnail images using integral images at each point, as calculated from Equation 2, for example. In one embodiment, the inclusion of the thumbnail re-organization unit 610 may be optional.
0066. The sub-window generation unit 630 may generate sub-windows by dividing the re-organized thumbnail images, e.g., as re-organized in the thumbnail image re-organization unit 610. In one embodiment, the size of the sub-window may be previously determined and may be linearly enlarged by a predetermined ratio. For example, the size of the sub-window may be initially set to 20x20 pixels, and the entire image may be divided using the sub-window having the above initial size. Then the size of the sub-window may be linearly enlarged by a ratio of 1:2, and the entire image may be divided again using the sub-window having the enlarged size. The image may be divided by enlarging the size of the sub-window until the size of the sub-window becomes equal to the size of the entire image. The sub-windows generated in the sub-window generation unit 630 may be superposed with one another, for example. Reference numerals 710, 730, and 750 of FIG. 7 further illustrate examples of sub-windows generated by the sub-window generation unit 630.
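The multiscale sub-window sweep above can be sketched as follows. Reading "a ratio of 1:2" as doubling the window size, and using an overlapping half-window stride, are both assumptions; the patent only requires a predetermined enlargement ratio and allows windows to be superposed:

```python
def generate_subwindows(img_w, img_h, initial=20, scale=2.0, step_frac=0.5):
    """Slide square sub-windows over the image, enlarging the window until
    it reaches the size of the entire image; windows may overlap.

    Returns (x, y, size) triples for each generated sub-window.
    """
    windows = []
    size = initial
    while size <= min(img_w, img_h):
        step = max(1, int(size * step_frac))  # half-window stride: overlap
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                windows.append((x, y, size))
        size = int(size * scale)  # assumed reading of "ratio of 1:2"
    return windows
```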
0067. The classifying unit 650 may be implemented by n stages S1 to Sn, which may further be cascaded. Each of the stages S1 to Sn detects faces using classifiers based on a simple characteristic. The number of classifiers may also increase as the stage number increases. For example, four or five classifiers may be used in the first stage S1, fifteen to twenty classifiers may be used in the second stage S2, and so on.
0068. Each stage may have a weighted sum for a plurality of classifiers and may determine whether the face has been successfully detected based on the sign of the weighted sum. The sign of the weighted sum of each stage can be expressed by the following Equation 3, for example.

sign( sum over m = 1 to M of c_m * f_m(x) )    Equation 3

0069. Here, c_m corresponds to a weighting value of a classifier, and f_m(x) corresponds to an output of a classifier. Each classifier has a single simple characteristic and a threshold value. As a result, -1 or +1 is output as the value of f_m(x).
0070. In the classifying unit 650, the first stage S1 may receive the k-th sub-window provided from the sub-window generation unit 630 and tries to detect faces. When the face detection fails, the k-th sub-window is determined as a non-face sub-window. Conversely, when the face detection is successful, the k-th sub-window image is provided to the second stage S2. When the face detection is successful in the k-th sub-window of the final stage Sn, the k-th sub-window is determined as a face sub-window. On the other hand, an Adaboost learning algorithm may also be employed in each classifier to select the weighting value. According to the Adaboost algorithm, some important visual characteristics are selected from a large characteristic set to generate a very efficient classifier. Such a cascaded stage structure allows the non-face sub-window to be determined even by using a small number of simple characteristics. Therefore, the non-face sub-window can be directly rejected at initial stages such as the first or second stage, and then the next (k+1)-th sub-window can be received to detect faces. As a result, it is possible to improve a total speed of the face detection process.
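The cascade evaluation of Equations 3 and paragraph 0070 can be sketched as follows; the data layout (a stage as a list of weight/classifier pairs) is an assumption for illustration:

```python
def cascade_detect(subwindow, stages):
    """Evaluate a sub-window against cascaded stages (per Equation 3).

    stages: list of stages; each stage is a list of (c_m, f_m) pairs,
    where f_m(subwindow) returns -1 or +1. A stage passes when the
    weighted sum is positive; a failed stage rejects immediately, which
    is what makes the cascade fast on non-face sub-windows.
    """
    for stage in stages:
        weighted_sum = sum(c * f(subwindow) for c, f in stage)
        if weighted_sum <= 0:
            return False  # non-face: rejected at this stage
    return True  # passed every stage: face sub-window
```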
0071. FIG. 8A illustrates edge simple characteristics 811 and 812, and line simple characteristics 813, 814, 815, and 816 used in each classifier of the classifying unit 650, according to an embodiment of the present invention. Each simple characteristic includes two or three rectangular areas having a white or black color. Each classifier subtracts the sum of pixel values of the white rectangular area from the sum of pixel values of the black rectangular area according to the simple characteristics, and the subtraction result is compared with the threshold value corresponding to the simple characteristic. A sign value of -1 or +1 is output depending on the result of the comparison between the subtraction result and the threshold value. FIG. 8B further illustrates an example of eye detection using a line simple characteristic 821 having one white rectangular area and two black rectangular areas or an edge simple characteristic 823 having one white rectangular area and one black rectangular area. When the line simple characteristic is used, the difference of pixel values between an eye region and a nose ridge region of a face is measured taking into consideration that the eye region is darker than the nose ridge region. When the edge simple characteristic is used, the difference of gradations between the eye region and the cheek region of a face is measured taking into consideration that the eye region is darker than the cheek region. As described above, the simple characteristics for detecting the face may be variously provided.
0072. FIG. 9 illustrates an example of frame image segmentation for detecting faces at high speed using a face detection unit 330 of a start-shot determination unit 210, such as that of FIG. 3, according to an embodiment of the present invention. A frame image may be divided into first to fifth sections 910, 930, 950, 970, and 990 according to a possibility of face existence before the thumbnail images are input to the thumbnail re-organization unit 610. In this case, the segmentation locations for each section may be statistically determined through experiments or simulations, for example. Generally, since the first section 910 has the highest probability of detecting the face 900, the plurality of sections may be sequentially provided to the thumbnail image re-organization unit