US007911513B2

(12) United States Patent
Garrison et al.

(10) Patent No.: US 7,911,513 B2
(45) Date of Patent: Mar. 22, 2011
(54) SIMULATING SHORT DEPTH OF FIELD TO MAXIMIZE PRIVACY IN VIDEOTELEPHONY

(75) Inventors: William J. Garrison, Warminster, PA (US); Albert Fitzgerald Elcock, Havertown, PA (US)

(73) Assignee: General Instrument Corporation, Horsham, PA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 456 days.

(21) Appl. No.: 11/737,813

(22) Filed: Apr. 20, 2007

(65) Prior Publication Data
     US 2008/0259154 A1          Oct. 23, 2008

(51) Int. Cl.
     H04N 5/262 (2006.01)

(52) U.S. Cl. ....................................................... 348/239

(58) Field of Classification Search .................. 348/586, 348/239
     See application file for complete search history.
(56) References Cited

U.S. PATENT DOCUMENTS

 5,384,615 A       1/1995   Hsieh et al.
 6,148,113 A      11/2000   Wolverton et al.
 6,590,571 B2      7/2003   Laffargue et al.
 6,950,130 B1*     9/2005   Qian ...................... 348/239
 7,227,567 B1*     6/2007   Beck et al. ............. 348/14.07
 2002/0079425 A1   6/2002   Rhoads
 2004/0120584 A1*  6/2004   Jang et al. .............. 382/232
 2006/0193509 A1*  8/2006   Criminisi et al. ......... 382/154
 2007/0053513 A1   3/2007   Hoffberg
 2007/0237393 A1* 10/2007   Zhang et al. ............. 382/173

OTHER PUBLICATIONS

Erik Hjelmas et al., "Face Detection: A Survey," Computer Vision and Image Understanding 83, pp. 236-274 (2001).
Harold M. Merklinger, "A Technical View of Bokeh," Photo Techniques, May/Jun. 1997, 5 pages.
Gary Bradski et al., "Learning-Based Computer Vision with Intel's Open Source Computer Vision Library," Intel Technology Journal, vol. 9, issue 02, May 19, 2005, pp. 119-130.
PCT International Search Report and Written Opinion for PCT/US2008/058338, dated Jun. 30, 2008.

* cited by examiner
Primary Examiner - James M. Hannett
(74) Attorney, Agent, or Firm - Stewart M. Wiener

(57) ABSTRACT
An arrangement for simulating a short depth of field in a captured videophone image is provided in which the background portion of the image is digitally segregated and blurred to render it indistinct. Thus, the displayed video of a user in the foreground is kept in focus while the background appears to be out of focus. Image tracking or fixed templates are used to segregate an area of interest that is kept in focus from the remaining captured video image. Image processing techniques are applied to groups of pixels in the remaining portion to blur that portion of the captured video image. Such techniques include the application of filters that are alternatively selected from convolution filters in the spatial domain (e.g., mean, median, or Gaussian filters), or frequency filters in the frequency domain (e.g., low-pass or Gaussian filters). User-selectable control is optionally implemented for controlling the type of foreground/background segregation technique utilized (i.e., dynamic face-tracking or fixed template shape), the degree of blurring applied to the background, and on/off control of the background blurring.

20 Claims, 9 Drawing Sheets
[Drawing Sheets 1-9 are not reproduced in this text extraction. Figure content recoverable from the extraction residue:

FIG. 4: pictorial view of illustrative arrangement 400, including foreground 442, middle ground 452, and background 462 with family member 460; network 418; and homes 413 and 435.

FIG. 5: pictorial view of a videophone, including operating controls 532.

FIGS. 12-14: illustrative template having a transition area (FIG. 12); illustrative image array (elements I11 through I69) and kernel array (elements K11 through K23) used to perform convolution (FIG. 13); and an illustrative 3x3 mean-filter kernel (1410) in which each element is 1/9 (FIG. 14).

FIG. 15: simplified diagram of an illustrative videophone architecture 1500, including a hardware layer (1516).

FIG. 16: flowchart of an illustrative method: start (1605); video image captured by camera with long depth of field (1611); spatially segregate and buffer a portion of the captured video image (1616); image process the segregated video portion to increase circle of confusion (1620); generate composite video image (1622); refresh buffer with composite video image (1625); optionally render composite video image onto display screen to confirm privacy enablement to user (1631); transmit composite image to remote videophone (1635); end (1640).

FIG. 17: screen shot confirming privacy enablement to the user (image content not recoverable).]
SIMULATING SHORT DEPTH OF FIELD TO MAXIMIZE PRIVACY IN VIDEOTELEPHONY

BACKGROUND

Current videophones use cameras having a long depth of field, which results in the subject matter in a scene captured by the camera being in focus from foreground to background. This compares to video images captured by cameras having a shorter depth of field, where subject matter in the foreground appears in focus while subject matter in the background of the scene appears out of focus.

Long depth of field in videophones generally results from a small digital imaging sensor size relative to the lens aperture, in combination with a fixed focal length and shutter speed. These particular design parameters are selected in order to provide good videophone image quality while maintaining low component costs, which is important for videophones sold into the highly competitive consumer electronics market.

Consumer-market videophones provide excellent performance overall, and the long depth of field provided is normally acceptable in many settings. Not only does it give the perception that the videophone image is sharp and clear overall, but a videophone can also be used in a variety of settings without the user worrying that some portions of a captured scene will be out of focus. For example, a group of people on one end of a videophone call can have some participants positioned close to the camera while others are farther away. Another user may wish to use the videophone to show something that needs to be kept at some distance from the camera.

However, the videophone's long depth of field can present issues in some situations. Some users may find the details in the background of the received video image to be distracting. Others might be uncomfortable that their videophone captures too clear a view of themselves, their home, or their surroundings, and represents some degree of intrusion on their privacy. And even for those users who fully embrace the videophone's capabilities, it is possible that details of a user's life may be unintentionally revealed during a videophone call. For example, a person might not realize that a videophone call is taking place and walk through the background in a state of attire that is inappropriate for viewing by people outside the home.

One current solution to address privacy concerns is placing controls on the videophone that let a user turn the videophone camera off while keeping the audio portion of the call intact. While effective in many situations, this is an all-or-nothing solution that not all users accept, since the loss of the video function removes a primary feature provided by the videophone. In addition, such user controls do not prevent the accidental capture of undesirable or inappropriate content by the videophone.

SUMMARY

An arrangement for simulating a short depth of field in a captured videophone image is provided in which the background portion of the image is digitally segregated and blurred to render it indistinct. As a result, the displayed video image of a videophone user in the foreground is kept in focus while the background appears to be out of focus.

In various illustrative examples, image detection and tracking techniques are used to dynamically segregate a portion of interest that is kept in focus, such as a person's face or face and shoulder area, from the remaining video image. Image processing techniques are applied to groups of pixels in the remaining portion to blur that portion and render it indistinct.
Such techniques include the application of one or more filters selected from convolution filters in the spatial domain (e.g., mean, median, or Gaussian filters), or frequency filters in the frequency domain (e.g., low-pass or Gaussian filters). Fixed templates are also alternatively utilizable to segregate the portions of the captured video which are respectively focused and blurred. The templates have various shapes, including those that are substantially rectangular, oval, or arch-shaped. For example, application of the oval-shaped template keeps the portion of the captured video image falling inside a fixed oval in focus, while the remaining portion of the image falling outside the oval is digitally blurred.

User-selectable control is optionally provided to enable control of the type of foreground/background segregation technique utilized (i.e., dynamic object detection/tracking or fixed template shape), the degree of blurring applied to the background, and on/off control of the background blurring.

The simulated short depth of field provided by the present arrangement advantageously enables a high degree of privacy to be implemented while preserving the intrinsic value of videophone telephony by keeping the video component of the videophone call intact. The privacy feature is provided using economically implemented digital image processing techniques that do not require modifications or additions to the camera hardware, which would add undesirable costs. In addition, the blurred background portion of the video image appears natural to the viewer because short depth of field images are in common use in television, movies, and other media presentations. Thus, privacy is enabled in a non-intrusive manner that does not interfere with the videophone call or bring attention to the fact that privacy is being utilized.
DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a camera and two black and white patterned targets located in the camera's field of view;
FIGS. 2 and 3 show images captured by the camera to illustrate depth of field;
FIG. 4 is a pictorial view of an illustrative arrangement showing two videophone users;
FIG. 5 is a pictorial view of one of the videophones shown in FIG. 4;
FIG. 6 shows an illustrative screen shot of a video image having a long depth of field that is rendered by a videophone;
FIG. 7 shows an illustrative screen shot of a video image with a simulated short depth of field that is rendered by a videophone in accordance with the principles of the present arrangement;
FIG. 8 is an illustration showing an illustrative segregation of a captured video image into a portion of interest that is kept in focus and a remaining portion that is blurred using a variety of alternative image processing techniques;
FIGS. 9-11 show various illustrative fixed templates, each of which segregates a portion of interest in a video image that is kept in focus while the remaining portions are blurred;
FIG. 12 is a diagram of an illustrative template having a transition area between the portion of interest that is kept in focus and the blurred portion;
FIG. 13 shows illustrative image and kernel arrays used to perform convolution attendant to application of digital filtering;
FIG. 14 is an illustrative kernel used with a mean (i.e., averaging) digital filter;
FIG. 15 is a simplified diagram of an illustrative videophone architecture;
FIG. 16 is a flowchart of an illustrative method of simulating depth of field effects in a video image; and
FIG. 17 shows an illustrative screen shot of a video image with a simulated short depth of field that is rendered by a videophone to provide positive feedback to a user that privacy is enabled in accordance with the principles of the present arrangement.

Like reference numerals indicate like elements throughout the drawings.

DETAILED DESCRIPTION

Various compositional techniques are employed in traditional photography to emphasize the primary subject matter in a scene. One such technique is known as "Bokeh," a Japanese term that translates to "fuzzy" or "dizziness." Bokeh refers to the use of out-of-focus highlights or areas in a rendered image. Bokeh techniques may be used for a variety of functional, artistic, or aesthetic reasons in which an attribute known as "depth of field" is manipulated to provide the desired effect, where the primary subject is kept in focus while the remaining portion of the rendered image is out of focus.
Depth of field in both still and video photography is determined by lens aperture, film negative/image sensor size (in traditional/digital imaging, respectively), and focal length. Traditional 35 mm film has a short depth of field because the negative size is large compared with the lens aperture. By comparison, to minimize costs, most videophones targeted at the consumer market use a very small digital image sensor along with an optics package that has a fixed focal length and shutter speed. Thus, traditional techniques used to shorten depth of field by adjusting the aperture number (i.e., f/stop) down below the lens's maximum aperture and reducing shutter speed to compensate for exposure are not generally applicable to videophone cameras.
Depth of field is the range of distance around the focal plane which is acceptably sharp. The depth of field varies depending on camera type, aperture, and focusing distance, although the rendered image size and viewing distance can influence the perception of it. The depth of field does not change abruptly from sharp to unsharp, but instead occurs as a gradual transition. In fact, everything immediately in front of or behind the focusing distance begins to lose sharpness, even if this is not perceived by the viewer or resolved by the camera.

Because there is no critical point of transition, a term called the "circle of confusion" is used to define how much a particular point needs to be blurred in order to be perceived as unsharp. The circle of confusion is an optical spot caused by a cone of light from a lens not coming to a perfect focus when imaging a point source. Objects with a small circle of confusion show a clear and clean dot and are in focus. Objects with a large circle of confusion show a dot with blurry edges and are out of focus.

Accordingly, the present arrangement provides a person's face or other area of interest in the foreground of the rendered videophone image with a small circle of confusion. The remaining portion of the image is rendered with a large circle of confusion. Further discussion of Bokeh techniques, circle of confusion, and sample images is available in H. Merklinger, A Technical View of Bokeh, Photo Techniques, May/June (1997).
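As a concrete, purely illustrative companion to the circle of confusion discussion above, the following Python sketch evaluates the standard thin-lens approximation for the blur-spot diameter. The function name and the sample focal length, f-number, and distances are assumptions chosen to resemble a small fixed-focus videophone lens; they are not values taken from this patent.

# Sketch: thin-lens estimate of the circle of confusion (blur-spot
# diameter on the sensor). All sample numbers below are assumptions.

def circle_of_confusion(f_mm, f_number, focus_mm, subject_mm):
    """Blur-spot diameter (mm) for a subject at subject_mm when a lens
    of focal length f_mm is focused at focus_mm."""
    aperture = f_mm / f_number  # aperture diameter A = f/N
    return (aperture
            * abs(subject_mm - focus_mm) / subject_mm
            * f_mm / (focus_mm - f_mm))

if __name__ == "__main__":
    f, N, focus = 4.0, 2.8, 600.0  # short lens focused at 0.6 m
    for subject in (600.0, 1500.0, 3000.0):  # foreground to background
        print(f"subject at {subject / 1000:.1f} m -> "
              f"CoC {circle_of_confusion(f, N, focus, subject):.4f} mm")

Run with these assumed numbers, the blur spot stays in the thousandths-of-a-millimeter range even for distant subjects, which is consistent with the long depth of field attributed above to small-sensor, fixed-focal-length videophone cameras.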
FIGS. 1-3 are provided to illustrate the application of the principles of depth of field to the present arrangement. FIG. 1 is a pictorial illustration showing a camera 105 having two black and white patterned targets 112 and 115 within its field of view. As shown, target 112 is in the foreground of the camera's field of view and target 115 is in the background. FIG. 2 shows an example of the appearance of an image with a long depth of field taken by camera 105. As shown, targets 112 and 115 are both in focus. By comparison, FIG. 3 shows an example of an image having a shorter depth of field. Here, the target 112 in the foreground is in focus, but target 115 in the background is no longer in focus and appears blurry.
Turning to FIG. 4, there is shown an illustrative arrangement 400 in which two videophone users are engaged in a video telephony session. User 405 is using videophone 408 in home 413. Videophone 408 is coupled over a network 418 to videophone 426 used by user 430 in home 435. Videophones generally provide better image quality, with both higher frame rates and resolution, when calls are carried over broadband networks, although some videophones are configured to work over regular public switched telephone networks ("PSTNs"). Broadband network services are commonly provided by cable, DSL (Digital Subscriber Line), and satellite service providers. Videophones are normally used in pairs, where each party on the call uses a videophone.
FIG. 5 is a pictorial view of the videophone 408 shown in FIG. 4. Videophone 408 is representative of videophones that are available to the consumer market. Videophone 408 includes a display component 502 that is attached to a base 505 with a mounting arm 512. Base 505 is configured to allow videophone 408 to be positioned on a desk or table, for example. A camera 514, having a lens that is oriented towards the videophone user, is disposed in the display component, as shown. A microphone (not shown) is also positioned near camera 514 to capture voices and other sounds associated with a videophone call.
Camera 514 is commonly implemented using a CCD (charge-coupled device) image sensor that captures images, formed from a multiplicity of pixels (i.e., discrete picture elements), of the videophone user and surrounding area. The images from camera 514 are subjected to digital signal processing in videophone 408 to generate a digital video image output stream that is transmitted to the videophone 426 on the other end of the videophone call. In this illustrative example, the digital video image output stream is a compressed video stream compliant with the MPEG-4 video standard defined by the Moving Picture Experts Group with the International Organization for Standardization ("ISO"). In alternative embodiments, other formats and/or video compression schemes are usable, including one selected from MPEG-1, MPEG-2, MPEG-7, MPEG-21, VC-1 (also known as Society of Motion Picture and Television Engineers SMPTE 421M), DV (Digital Video), DivX created by DivX, Inc. (formerly known as DivXNetworks, Inc.), International Telecommunications Union ITU H.261, ITU H.263, ITU H.264, WMV (Windows Media Video), RealMedia, RealVideo, Apple QuickTime, ASF (Advanced Streaming Format, also known as Advanced Systems Format), AVI (Audio Video Interleave), 3GPP (3rd Generation Partnership Project), 3GPP2 (3rd Generation Partnership Project 2), JPEG (Joint Photographic Experts Group), or Motion-JPEG.
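For a rough sense of how processed frames might be packaged into a compressed MPEG-4 stream like the one described above, the hedged Python sketch below uses OpenCV's VideoWriter. The "mp4v" codec tag, file name, frame size, and frame rate are illustrative assumptions standing in for the videophone's real-time encode-and-transmit path, which the patent does not specify.

import cv2

# Sketch: encode camera frames as MPEG-4 video (a stand-in for the
# videophone's transmit path). Name, size, and fps are assumptions.
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
writer = cv2.VideoWriter("outgoing.mp4", fourcc, 30.0, (640, 480))

capture = cv2.VideoCapture(0)  # the videophone camera
for _ in range(300):           # roughly ten seconds at 30 fps
    ok, frame = capture.read()
    if not ok:
        break
    frame = cv2.resize(frame, (640, 480))
    # ...foreground/background privacy processing would occur here...
    writer.write(frame)

capture.release()
writer.release()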
Display component 502 includes a screen 516 that comprises a receiving picture area 520 and a sending picture area 525. The receiving picture area 520 of screen 516 is arranged to display the video image of the user 430 captured by a camera in videophone 426 shown in FIG. 4. The sending picture area 525 displays a relatively smaller image of the user 405 captured by the camera 514. Sending picture area 525 thus enables user 405 to see the picture of himself that is being sent and seen by the other user 430. Such feedback is important to enable user 405 to place himself in the field of view of camera 514 with the desired positioning and framing within the captured video image.
Mounting arm 512 is arranged to position the display component 502 and camera 514 at a distance above the base 505 to provide comfortable viewing of the displayed video image and to position the camera 514 with a good field of view of the videophone user. Disposed in mounting arm 512 are videophone operating controls 532, which are provided for the user to place videophone calls, set user preferences, adjust videophone settings, and the like.
Referring again to FIG. 4, videophone user 430 is positioned in the foreground of a scene 440 captured by the camera disposed in videophone 426. The foreground is indicated by reference numeral 442. Similarly, as shown, a houseplant 450 is in the middle ground 452 of the scene, and a family member 460 is in the background 462.
FIG. 6 shows an illustrative screen shot 600 of a video image of the captured scene 440 in FIG. 4 as rendered onto screen 516 by the videophone 408. As shown, the rendered image appears with a long depth of field, as user 430, houseplant 450, and family member 460 are all in focus. As noted above, such long depth of field is normally provided for video images rendered by conventional videophones. And such clear imaging of all the subject matter in the captured scene may present privacy concerns.
In comparison to the conventional long depth of field video image shown in FIG. 6, FIG. 7 shows an illustrative screen shot 700 of a video image having a simulated short depth of field as provided by the present arrangement. The video image shown in screen shot 700 is of the same captured scene 440 as rendered onto screen 516 by the videophone 408. Here, only the image of the user 430 in the foreground 442 is kept in focus, while the houseplant 450 and family member 460 are blurred and rendered indistinct, as indicated by the dot patterns in FIG. 7.
FIG. 8 is an illustration showing an illustrative segregation of a captured video image into a region of interest 805 that is kept in focus and a remaining portion 810 that is blurred using one of several alternative image processing techniques (as described below in the text accompanying FIGS. 13 and 14). In this illustrative example, object detection techniques are utilized in which a specific feature, in this case the user's face, head, and shoulders, is dynamically detected in the captured video image and tracked as the user moves and/or changes position during the course of the videophone call. While FIG. 8 shows an area of interest comprising the user's face, head, and shoulder region, other areas of interest may also be defined for detection and tracking. For example, the area of the image kept in focus using a dynamic detection and tracking technique may be limited to just the user's face area.
Object detection, and in particular face detection, is an important element of various computer vision areas, such as image retrieval, shot detection, video surveillance, etc. The goal is to find an object of a pre-defined class in a video image. A variety of conventional object detection techniques for video images are usable depending on the requirements of a specific application. Such techniques include feature-based approaches, which locate face geometry features by extracting certain image features, such as edges, color regions, textures, contours, video motion cues, etc., and then using heuristics to find configurations and/or combinations of those features specific to the object of interest.

Other object detection techniques use image-based approaches, in which the location of objects such as faces is essentially treated as a pattern recognition problem. The basic approach in recognizing face patterns is via a training procedure which classifies examples into face and non-face prototype classes. Comparison between these classes and a 2D intensity array (hence the name image-based) extracted from an input image allows the decision of face existence to be made. Image-based approaches include linear subspace methods, neural networks, and statistical approaches.
An overview of these techniques and a discussion of others may be found in E. Hjelmas and B. K. Low, Face Detection: A Survey, Computer Vision and Image Understanding 83, 236-274 (2001). In addition, a variety of open source code resources are available to implement appropriate face-detection algorithms, including the OpenCV computer vision library from Intel Corporation, which provides both low-level and high-level APIs (application programming interfaces) for face detection using a statistical model. This statistical model, or classifier, takes multiple instances of the object class of interest, or "positive" samples, and multiple "negative" samples, i.e., images that do not contain objects of interest. Positive and negative samples together make a training set. During training, different features are extracted from the training samples, and distinctive features that can be used to classify the object are selected. This information is "compressed" into the statistical model parameters. If the trained classifier does not detect an object (misses the object) or mistakenly detects an absent object (i.e., gives a false alarm), it is easy to make an adjustment by adding the corresponding positive or negative samples to the training set. More information on Intel OpenCV face detection may be found in G. Bradski, A. Kaehler, and V. Pisarevsky, Learning-Based Computer Vision with Intel's Open Source Computer Vision Library, Intel Technology Journal, Vol. 9, Issue 2, (2005).
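To make the OpenCV-based approach concrete, here is a minimal Python sketch of Haar-cascade face detection using the modern cv2 bindings (the patent-era library exposed the same classifier through its C API). The bundled cascade file path assumes the opencv-python distribution, and the padding used to grow the face box into a face-head-shoulders region is an illustrative assumption.

import cv2

# Sketch: find the largest face so its (padded) region can be kept in
# focus. Cascade file and padding factor are assumptions.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def region_of_interest(frame, pad=0.4):
    """Return (x, y, w, h) around the largest detected face, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest face
    dx, dy = int(w * pad), int(h * pad)
    rows, cols = frame.shape[:2]
    x0, y0 = max(0, x - dx), max(0, y - dy)
    x1 = min(cols, x + w + dx)
    y1 = min(rows, y + h + 2 * dy)  # extra room below for shoulders
    return x0, y0, x1 - x0, y1 - y0

Re-running the detector every frame, or every few frames with the last good region reused in between, gives the dynamic tracking behavior described above for a moving user.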
FIGS. 9-11 show illustrative examples of fixed templates that are applied to a captured video image to segregate the portion of interest from the remaining portion. By comparison to the object detection technique, where the shape of the target portion dynamically varies as the subject moves, the templates in FIGS. 9-11 use a fixed border between the target and remaining portions. Use of fixed templates may provide a less complex implementation of the segregation aspect of the present privacy arrangement while maintaining the majority of its functionality, which may be beneficial in some scenarios. In an optional arrangement, control is provided to the videophone user to select from various templates to find the template that best matches the particular use and circumstances. In other arrangements, the relative sizes of the target and remaining portions may be adjusted, either in fixed increments or continuously over a fixed range.
As shown, template 900 in FIG. 9 has a substantially rectangular target portion 905 that is disposed in an area that fills approximately the central two-thirds of the screen. Target portion 905 is positioned to allow the remaining portion 910 to fill the top and sides of the screen. This template makes use of the observation that most videophone users position themselves to fill the central portion of the videophone camera's field of view. Accordingly, the areas of potential privacy concern will tend to be at the top and sides of the captured image. As noted above, in optional arrangements the relative size between the target portion 905 and remaining portion 910 may be configured to be user-adjustable, as indicated by the dashed rectangle 925 in FIG. 9.
FIG. 10 shows a template 1000 that is similar to that shown in FIG. 9 (by occupying approximately the central two-thirds of the screen) except that the top of the target portion 1005 is curved. Thus, the target portion 1005 is substantially arch-shaped. Use of this shape increases the area of the remaining portion 1010 and may provide a better fit between in-focus and blurred portions for a particular user's application.
FIG. 11 shows a template 1100 in which the target portion 1105 is substantially oval-shaped. In this case, the remaining portion 1110 surrounds the target portion 1105, so that privacy blurring will be performed at the bottom center of the rendered image (unlike templates 900 and 1000) along with the top and side areas of the screen.
FIG. 12 shows an illustrative template 1200 having a transition area 1202 between the target portion 1205, in which focus is kept intact, and the remaining portion 1210, which is blurred using the techniques described herein. The transition area 1202 is configured with an intermediate degree of circle of confusion between the target portion 1205 and remaining portion 1210. This enables a softer transition between focused and blurred areas, which may help to make the rendered image appear more natural in some situations. The size of the transition area 1202 is a design choice that will normally be selected according to the requirements of a particular application. Although the transition area is shown being used with a template having an oval target portion, it is emphasized that such a transition area may be used with any target portion shape in both fixed template and dynamic object detection embodiments.
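The fixed-template segregation, together with the transition area of FIG. 12, can be sketched as a soft mask. In the Python example below, the oval's proportions and the feather width that stands in for transition area 1202 are illustrative assumptions rather than dimensions taken from the patent.

import cv2
import numpy as np

# Sketch: oval fixed template with a feathered transition band.
# The axes fractions and feather width are assumptions.
def oval_template_mask(shape, axes_frac=(0.35, 0.45), feather_px=31):
    """Float32 mask: 1.0 inside the oval (kept in focus), 0.0 outside
    (to be blurred), with a smooth ramp as the transition area."""
    rows, cols = shape[:2]
    mask = np.zeros((rows, cols), dtype=np.float32)
    center = (cols // 2, rows // 2)
    axes = (int(cols * axes_frac[0]), int(rows * axes_frac[1]))
    cv2.ellipse(mask, center, axes, 0, 0, 360, 1.0, thickness=-1)
    k = feather_px | 1  # Gaussian kernel size must be odd
    return cv2.GaussianBlur(mask, (k, k), 0)

Because the ramp yields intermediate mask values, pixels near the border receive an intermediate amount of blur, which is one way to realize the intermediate circle of confusion that the transition area calls for.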
Once a captured video image is segregated into a portion of interest and a remaining portion, digital image processing is performed to increase the circle of confusion for groups of pixels in the remaining portion, to thereby blur it and render it indistinct. In this illustrative example, the digital image processing comprises filtering in either the spatial domain or the frequency domain.

The spatial domain is normal image space, in which an image is represented by intensities at given points in space. The spatial domain is a common representation for image data. A convolution operator is applied to blur the pixels in the remaining portion. Convolution is a simple mathematical operation which is fundamental to many common image processing operations. Convolution provides a way of multiplying together two arrays of numbers, generally of different sizes but of the same dimensionality, to produce a third array of numbers of the same dimensionality. This can be used in image processing to implement operators whose output pixel values are simple linear combinations of certain input pixel values.

In an image processing context, one of the input arrays is typically a set of intensity values (i.e., gray levels) for one of the color components in the video image, for example using the RGB (red green blue) color model. The second array is usually much smaller, is also two-dimensional (although it may be just a single pixel thick), and is known as the kernel. FIG. 13 shows an example image 1305 and kernel 1310 used to illustrate convolution.
The convolution is performed by sliding the kernel over the image, generally starting at the top left corner, so as to move the kernel through all the positions where the kernel fits entirely within the boundaries of the image. (Note that implementations differ in what they do at the edges of images, as explained below.) Each kernel position corresponds to a single output pixel, the value of which is:

O(i, j) = \sum_{k=1}^{m} \sum_{l=1}^{n} I(i + k - 1, j + l - 1) \, K(k, l)

where the image I has M rows and N columns, the kernel K has m rows and n columns, i runs from 1 to M - m + 1, and j runs from 1 to N - n + 1.

In one illustrative example, the convolution filter applied is called a mean filter, in which each pixel in the image is replaced by the average value of its neighbors, including itself. Mean filters are also commonly referred to as "box," "smoothing," or "averaging" filters. The kernel used for the mean filter represents the size and shape of the neighborhood to be sampled when calculating the mean. Often a 3x3 square kernel is used, as indicated by reference numeral 1410 in FIG. 14, although larger 5x5, 7x7, etc., kernels may also be used to create more blurring. The kernel may also be applied more than once.

A median filter is alternatively utilized, in which the average value used in the mean filter is replaced by the median value of the neighboring pixels.

In another illustrative example, a Gaussian filter is applied to blur the remaining portions other than the portion of interest in the image to be rendered in focus. This filter uses a kernel having a shape that represents a Gaussian (i.e., bell-shaped curve) as represented by:

G(x, y) = \frac{1}{2 \pi \sigma^{2}} \, e^{- \frac{x^{2} + y^{2}}{2 \sigma^{2}}}
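Tying the pieces together, this final Python sketch blurs a copy of the frame with one of the convolution filters named above (mean/box, median, or Gaussian) and composites it against the original under a keep-in-focus mask such as those produced in the earlier sketches; a small frequency-domain low-pass variant is included as well. The kernel sizes, cutoff fraction, and blending scheme are illustrative assumptions, not the patent's specified implementation.

import cv2
import numpy as np

# Sketch: simulate a short depth of field by blurring everything
# outside a keep-in-focus mask. Kernel sizes are assumptions.
def simulate_short_depth_of_field(frame, mask, method="gaussian"):
    """frame: BGR uint8 image; mask: float32 in [0, 1], 1.0 = in focus."""
    if method == "mean":
        blurred = cv2.blur(frame, (15, 15))        # box/averaging filter
    elif method == "median":
        blurred = cv2.medianBlur(frame, 15)        # median filter
    else:
        blurred = cv2.GaussianBlur(frame, (31, 31), 0)  # Gaussian filter
    # Per-pixel blend: 1.0 keeps the original, 0.0 takes the blur, and
    # intermediate values reproduce the transition area's softer edge.
    m = mask[..., None]  # broadcast the mask over the color channels
    out = frame.astype(np.float32) * m + blurred.astype(np.float32) * (1.0 - m)
    return out.astype(np.uint8)

def fft_lowpass(channel, cutoff_frac=0.08):
    """Frequency-domain variant: keep only low-frequency coefficients
    of one grayscale channel (a low-pass filter)."""
    spectrum = np.fft.fftshift(np.fft.fft2(channel.astype(np.float32)))
    rows, cols = channel.shape
    cy, cx = rows // 2, cols // 2
    ry, rx = int(rows * cutoff_frac), int(cols * cutoff_frac)
    kept = np.zeros_like(spectrum)
    kept[cy - ry:cy + ry, cx - rx:cx + rx] = \
        spectrum[cy - ry:cy + ry, cx - rx:cx + rx]
    restored = np.abs(np.fft.ifft2(np.fft.ifftshift(kept)))
    return np.clip(restored, 0, 255).astype(np.uint8)

Applying the kernel more than once, or switching to a larger kernel, increases the effective circle of confusion in the blurred region, corresponding to the user-selectable degree of blurring described in the summary.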