`
(12) United States Patent
Thomas

(10) Patent No.: US 7,027,659 B1
(45) Date of Patent: Apr. 11, 2006
`
`(54) METHOD AND APPARATUS FOR
`GENERATING VIDEO IMAGES
`
`(75) Inventor: David R. Thomas, Opio (FR)
`
`(73) Assignee: Texas Instruments Incorporated,
`Dallas, TX (US)
`
`( * ) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 0 days.
`
(21) Appl. No.: 09/315,247
`
(22) Filed: May 20, 1999
`
(30) Foreign Application Priority Data

May 20, 1998 (EP) ................................ 98401228
`
(51) Int. Cl.
G06K 9/40 (2006.01)
(52) U.S. Cl. ..................................... 382/254; 382/260
(58) Field of Classification Search ...... 382/254-255,
382/260-269; 358/1.9
See application file for complete search history.

(56) References Cited
`
`U.S. PATENT DOCUMENTS
`
5,629,752 A * 5/1997 Kinjo ........................ 355/35
`
`FOREIGN PATENT DOCUMENTS
`
`OTHER PUBLICATIONS
`
Omura et al.: "3D Display With Accommodative Compensation (3DDAC) Employing Real-Time Gaze Detection", SID International Symposium, Digest of Technical Papers, San Diego, May 12-17, 1996, vol. 27, pp. 889-892.*
`
`* cited by examiner
`
Primary Examiner: Thomas D. Lee
Assistant Examiner: Stephen Brinich
(74) Attorney, Agent, or Firm: Wade James Brady III; Frederick J. Telecky, Jr.
`
(57) ABSTRACT
`
A method and system for generating a video image is disclosed in which an object is monitored with a video camera to produce a sequence of video frames. Each of the video frames is divided into a plurality of regions, each region being representative of a portion of said object. For example, the frame of the video image may include the head and shoulder region of a user. Regions corresponding to predetermined facial features may be selected, such as the chin, opposing edges of the mouth, the nose, and the outer edge of each eye. At least one of the plurality of regions is selected. In the illustrative example, the selected region may comprise the mouth of the monitored user. The selected region is then recombined with each of the remaining regions of the video frame to form a display video image. The sequence of video frames is then displayed such that the selected region is perceived by an observer to form a sharp image, and remaining regions of said display video image are less sharp in accordance with the distance between said respective portion of said object and said video camera.
`
JP 2001208524 A * 8/2001

25 Claims, 13 Drawing Sheets
`
`Page 1
`
`SECURUS EXHIBIT 2005
`
`
`
U.S. Patent    Apr. 11, 2006    Sheet 1 of 13    US 7,027,659 B1

FIG. 1 (drawing; reference numerals dx, 14)

Page 2
`
`
`
U.S. Patent    Apr. 11, 2006    Sheet 2 of 13    US 7,027,659 B1

FIG. 2b    FIG. 2c (drawings)

Page 3
`
`
`
U.S. Patent    Apr. 11, 2006    Sheet 3 of 13    US 7,027,659 B1

FIG. 3 (flowchart):
302: MONITOR EVENT
304: GENERATE AUDIO/VIDEO
306: DIVIDE VIDEO FRAMES
308: SELECT REGION OF VIDEO FRAME
310: DE-EMPHASIZE REMAINING REGIONS OF VIDEO FRAME
312: RECOMBINE REGIONS
314: DISPLAY VIDEO FRAMES

Page 4
`
`
`
U.S. Patent    Apr. 11, 2006    Sheet 4 of 13    US 7,027,659 B1

FIG. 4b (drawing; reference numerals 405, 410, 415)

Page 5
`
`
`
U.S. Patent    Apr. 11, 2006    Sheet 5 of 13    US 7,027,659 B1

(drawing)

Page 6
`
`
`
U.S. Patent    Apr. 11, 2006    Sheet 6 of 13    US 7,027,659 B1

(block diagram)

Page 7
`
`
`
U.S. Patent    Apr. 11, 2006    Sheet 7 of 13    US 7,027,659 B1

(block diagram)

Page 8
`
`
`
U.S. Patent    Apr. 11, 2006    Sheet 8 of 13    US 7,027,659 B1

FIG. 8a (flowchart):
805: MONITOR EVENT
806: GENERATE AUDIO/VIDEO
807: IDENTIFY ACTIVE ENTITY
808: SELECT IMAGE PLANES
809: DIVIDE IMAGE PLANES INTO BLOCKS

FIG. 8b (flowchart):
820: QUANTIZE VIDEO/AUDIO
822: ENCODE VIDEO/AUDIO
824: ELIMINATE VIDEO/AUDIO
826: COMPRESS VIDEO/AUDIO
828: TRANSMIT SIGNAL
830: MOTION ESTIMATION
832: LOOP FILTERING
834: QUANTIZE VIDEO/AUDIO

Page 9
`
`
`
U.S. Patent    Apr. 11, 2006    Sheet 9 of 13    US 7,027,659 B1

FIG. 10 (drawing; reference numeral 2010)

Page 10
`
`
`
U.S. Patent    Apr. 11, 2006    Sheet 10 of 13    US 7,027,659 B1

FIG. 9b (flowchart): REFRESH FIRST IMAGE PLANE (914), REFRESH SECOND IMAGE PLANE (920), REFRESH THIRD IMAGE PLANE, ..., REFRESH NTH IMAGE PLANE (932); decision points 916, 922, 928, 934; connectors 815, 818, 910, 912.

Page 11
`
`
`
U.S. Patent    Apr. 11, 2006    Sheet 11 of 13    US 7,027,659 B1

FIG. 11    FIG. 12    FIG. 13 (drawings; reference numeral 2202)

Page 12
`
`
`
U.S. Patent    Apr. 11, 2006    Sheet 12 of 13    US 7,027,659 B1

FIG. 14 (drawing; reference numerals 2100, 2103, 2110)

FIG. 16 (drawing; reference numerals 2111, 2113, 2300)

Page 13
`
`
`
U.S. Patent    Apr. 11, 2006    Sheet 13 of 13    US 7,027,659 B1

FIG. 17b (drawing)

Page 14
`
`
`
`METHOD AND APPARATUS FOR
`GENERATING VIDEO IMAGES
`
`FIELD OF THE INVENTION
`
The present invention relates generally to the field of video images, and more particularly to a method and apparatus for generating video images that are perceived by the observer to be three-dimensional.
`
`BACKGROUND OF THE INVENTION
`
Many systems for generating pseudo three-dimensional (3D) images have been developed over recent years. Generally, these systems can be characterised by the methods by which the observer is deceived into perceiving the image as three-dimensional (i.e. having depth).

In the real world, the human eye perceives depth in an image due to the combination of a number of visual cues.

With the first visual cue, more distant objects are perceived by the observer to be both smaller and higher in the field of view than objects that are closer to the observer. Typically, distant objects are also blocked from the observer's field of view by closer objects, and the observer perceives their resolution, contrast, and brightness to be less well defined.

With the second visual cue, the observer perceives an apparent change in the position of the object relative to the more distant background image as his own position changes. This effect is known as parallax and can affect the image perceived by the observer in both the horizontal and vertical planes.

With the third visual cue, the lateral separation of the observer's eyes means that the distance between a point on an object and each eye can be different. This effect is known in the art as binocular disparity and results in each eye observing a slightly different perspective. However, in real life, this effect is resolved by the human brain to produce the single clear image perceived by the observer.

The fourth visual cue to three-dimensional perception of video images is depth disparity. Since the human eye has a finite field of view in both the horizontal and vertical planes, the eye tends to focus on an object, or region of an object, that is of immediate interest. Consequently, surrounding objects, or regions of the object, which form the background image are out of focus and blurred. The human brain perceives these surrounding objects or regions to be at a different distance to provide a depth cue.

Known stereoscopic and auto-stereoscopic systems for generating pseudo three-dimensional images generate alternate and slightly differing frames of the video image for each eye. The different frames are intended to correspond to the different views perceived by the human brain due to the separation between the eyes, and produce a binocular disparity.

The observer of a video image generated using a stereoscopic system must be provided with an optical device such as a pair of spectacles having one red lens and one green lens. A separate frame of the video image is shown alternately for each eye and at sufficient frequency that the observer resolves a single image.

Auto-stereoscopic systems were developed to produce video images with multiple image planes (i.e. the observer can view around foreground objects). These auto-stereoscopic systems are designed to focus separate frames of the image into each eye using an arrangement of optical elements. Typically, these elements will include vertically aligned lenticular lenses. These systems have found application in items such as postcards, but their more widespread use is limited by the narrow field of view.
As the observer of a stereoscopic or auto-stereoscopic image changes their point of focus, either by looking from one object to another, or by looking at a different region of the object, the eyes must readjust. Each eye will take a finite period to adjust to the focal plane associated with the object perceived by the observer. Therefore, the focal plane of the image perceived by each eye may differ, and the human brain must converge the images into a single focused image of the object (known in the art as convergence).

Similarly, the human eye has a finite depth of focus, or region in space in which the focus of an object can be resolved. This is due to the physical requirement for the cornea to change shape to produce a sharp image of the object on the surface of the retina. Therefore, as the observer switches his attention from a distant object to a close object or vice versa, objects outside the field of view become less well defined and blur (known in the art as accommodation).

Recent research has shown that users of stereoscopic and auto-stereoscopic systems are prone to fatigue, eye strain, and headaches. It is thought that this can be attributed to the fact that convergence and accommodation of images in the real world coincide, and hence the human brain interprets muscular actions associated with the control of the cornea to predict that objects are at different distances.

Conversely, in stereoscopic and auto-stereoscopic systems convergence and accommodation occur at different points in space. FIG. 1 illustrates a stereoscopic system for generating three-dimensional video images in which a display screen 10, such as an LCD or CRT display, shows an image 12 of an object. The eyes of the observer 16 are focused on the display 10, producing an accommodation distance Da. However, the object 12 is perceived to be in front of the display 10, and hence the convergence distance Dc at which the image 14 of the object 12 is perceived is between the display 10 (where the object is in focus) and the observer 16.

Since the object 12 is not perceived by the observer 16 to be at the display surface 10, the human brain directs the eyes at the point in space where it predicts the image 14 to be. This results in the human brain being provided with conflicting signals that are indicative of the accommodation and convergence, and can result in fatigue, eye strain and headaches.
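The accommodation/convergence mismatch of FIG. 1 can be made concrete with a little standard binocular geometry. The sketch below is an illustration only: the 65 mm eye separation and the two viewing distances are assumed values, not taken from the patent.

```python
import math

def vergence_angle(eye_separation_m, distance_m):
    # Angle through which the two eyes rotate toward each other to
    # fixate a point at the given distance (standard binocular geometry).
    return 2.0 * math.atan(eye_separation_m / (2.0 * distance_m))

EYE_SEP = 0.065   # assumed interocular distance, metres
Da = 0.60         # accommodation distance: eyes focused on the display 10
Dc = 0.40         # convergence distance: where the image 14 is perceived

mismatch = vergence_angle(EYE_SEP, Dc) - vergence_angle(EYE_SEP, Da)
# mismatch > 0: the eyes converge more strongly than the focus distance
# warrants -- the conflicting cue associated with fatigue and eye strain.
```

In a real scene Da equals Dc and the mismatch vanishes; a stereoscopic display forces it to be non-zero whenever the object is perceived off the screen plane.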
`
`SUMMARY OF THE INVENTION
`
Therefore, a need has arisen for a method and apparatus for generating an image that is perceived by the observer to be three-dimensional, and in which the accommodation and convergence of the image substantially coincide, thereby alleviating eye strain and fatigue.

Accordingly, the present invention provides a method and system for generating a video image. An object is monitored with a video camera to produce a sequence of video frames. Each of the video frames is divided into a plurality of regions, each region being representative of a portion of the object. For example, the frame of the video image may include the head and shoulder region of a user. Regions corresponding to predetermined facial features may be selected, such as the chin, opposing edges of the mouth, the nose, and the outer edge of each eye. Preferably, the frame of the video image is divided into substantially triangular regions or blocks of pixels. The selection of regions of
`Page 15
`
`
`
frames of a monitored video image is discussed in co-pending European Patent Application No. 974017725, filed 23 Jul. 1997.
At least one of the plurality of regions is selected. In the illustrative example, the selected region may comprise the mouth of the monitored user. The selected region is then recombined with each of the remaining regions of the video frame to form a display video image.

The sequence of video frames is then displayed such that the selected region is perceived by an observer to form a sharp image, and remaining regions of the display video image are less sharp in accordance with the distance between the respective portion of the object and the selected region.

In a further embodiment of the invention, video data indicative of each region of the video frames is transmitted to a receiver before one of the plurality of regions is selected. Typically, the selected region will be a region of the frame of the video image defining a foreground object. However, regions of the frame may also be selected by an observer.

In a yet further preferred embodiment of the invention, the region of the video frame is selected according to the position of an object relative to at least one other object monitored by the video camera. Typically, this includes selecting a region of the frame defining an active entity in a monitored event, such as, for example, the mouth or eyes of a monitored user.

The video image is divided into a plurality of regions each defining a focal plane, so that each focal plane is representative of a different distance between a respective portion of the object and the video camera.

Preferably, the remaining regions of the frame are de-emphasised according to the distance between a respective portion of the object and the selected region. Greater de-emphasis is applied to regions of the video image representing portions of the object at a greater distance from the selected region than to regions representing portions at a smaller distance. Therefore, more distant portions of the object are less well defined in the resulting video image.

In a yet further preferred embodiment of the present invention, the selected region of the frame of the video image is recombined with artificially generated simulations of the remaining regions of the video image.
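The distance-dependent de-emphasis described above can be sketched in a few lines. The linear mapping from depth gap to de-emphasis level and the cap are illustrative assumptions; the patent does not prescribe a particular blur function.

```python
def deemphasis_level(region_depth, selected_depth, max_level=8):
    # Map the depth gap between a region and the selected region to a
    # de-emphasis level (e.g. a blur-kernel radius): zero for the
    # selected focal plane, growing with the gap up to a cap.
    return min(max_level, abs(region_depth - selected_depth))

# Regions at depths 2, 3, 5 and 40, with the selected region at depth 2:
# the selected plane stays sharp; distant planes blur progressively.
levels = [deemphasis_level(d, 2) for d in (2, 3, 5, 40)]  # [0, 1, 3, 8]
```

Any monotonically increasing mapping would serve; what matters for the depth cue is only that sharpness falls off with distance from the selected region.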
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
The present invention will now be further described, by way of example, with reference to the accompanying drawings in which:

FIG. 1 shows a schematic illustration of a stereoscopic system for generating pseudo three-dimensional video images according to the prior art;

FIGS. 2a-2c show an array of spaced photographic transparencies for illustrating the principles of the method of the present invention;

FIG. 3 is a block diagram illustrating the method of the present invention;

FIGS. 4a-4c show an illustration of the method of FIG. 3 utilised in a video-conferencing system;

FIG. 5 illustrates the division of the head and shoulder portion of an active entity into blocks of pixels for use in the method of FIG. 3;
`
`
FIGS. 6a-6b illustrate the transmitting portion and the receiving portion respectively of the video-conferencing system of FIGS. 4a-4c;

FIG. 7 schematically illustrates a method of determining the relative image planes forming the video display image;

FIGS. 8a-8b are block diagrams illustrating the operation of the transmitting portion and the receiving portion of the video-conferencing system of FIGS. 4a-4c;

FIG. 9 is a block diagram illustrating a process for refreshing the video display image;

FIG. 10 is a schematic that illustrates a camera that is focused on an object in a plane of focus;

FIG. 11 is a schematic that illustrates how objects beyond the plane of focus of the camera in FIG. 10 appear defocused by an amount corresponding to the depth disparity, according to an aspect of the present invention;

FIG. 12 is a schematic that illustrates how a different depth disparity results in a different amount of defocusing, according to an aspect of the present invention;

FIG. 13 is a block diagram of a camera that contains a digital signal processor for processing images according to aspects of the present invention;

FIG. 14 is a schematic that illustrates two separate cameras focused on an object in a plane of focus;

FIG. 15 is a schematic that illustrates how an alignment error can be corrected when the mechanical plane of focus of the two cameras of FIG. 14 is offset from the optical plane of focus, according to an aspect of the present invention;

FIG. 16 is a schematic that illustrates how objects beyond the plane of focus of the cameras in FIG. 14 appear defocused by an amount corresponding to the depth disparity, according to an aspect of the present invention; and

FIGS. 17a-17b show the formation of image planes from blocks of pixels.
`
`DETAILED DESCRIPTION OF THE DRAWINGS
`
The method of the present invention can be considered analogous to viewing an array of spaced photographic transparencies as illustrated in FIGS. 2a-2c. The transparencies (20, 22, 24) are arranged such that each transparency is separated from adjacent transparencies by a distance dx. For the purposes of illustration, each transparency (20, 22, 24) comprises an image of a different object (26, 28, 30) and defines an image plane.

It will be appreciated that although the present invention is described in relation to an array of transparencies (20, 22, 24) each of which represents a different object, the principles disclosed are equally applicable to an array in which each transparency is representative of a different region of a single object at a predefined distance from the observer.

The first transparency 20 shows an image of an object 26 (i.e. flowers), the second transparency 22 shows an image of an object 28 (i.e. an elephant), and the third transparency 24 shows an image of an object 30 (i.e. a building). The first, second and third transparencies (20, 22, 24) are separated from the observer 16 by distances d1, d2 and d3 respectively.

Referring now to FIG. 2a, when the observer 16 is focused on the object 26 contained in the first transparency 20, the accommodation and convergence distance are equivalent to distance d1. Since the eyes of the observer 16 are focused on the object 26 contained in the first transparency 20, the objects (28, 30) contained in the second and third transparencies (22, 24) are perceived by the observer 16 to be blurred due to depth disparity.
`
`Page 16
`
`
`
Should the observer 16 switch his attention from the object 26 contained in the first transparency 20, for example by focusing on the object 28 contained in the second transparency 22 (FIG. 2b), the eyes will not immediately focus on the object 28. However, the eyes will focus on the object 28 on completion of a finite acclimatisation period, and the accommodation and convergence distance will then be equivalent to distance d2.

When the eyes of the observer 16 are focused on the object 28 contained in the second transparency 22, the objects (26, 30) contained in the first and third transparencies (20, 24) are perceived by the observer 16 to be blurred due to depth disparity. However, since the object 26 contained in the first transparency 20 is in front of the objects (28, 30) contained in the second and third transparencies (22, 24), the focused image of object 28 may be partially obscured by the object 26 of the first transparency 20. As the observer 16 changes his position relative to the object 26 contained in the first transparency 20, more or less of the object 28 contained in the second transparency 22 may come into the observer's field of view. Similarly, as the observer 16 changes his position/orientation relative to the object 26 contained in the first transparency 20, more or less of the object 30 contained in the third transparency 24 may come into the observer's field of view.

Should the observer 16 switch his attention to the object 30 contained in the third transparency 24, as illustrated in FIG. 2c, the eyes will focus on object 30 following a finite acclimatisation period. Consequently, the accommodation and convergence distance are equivalent to distance d3. When the eyes of the observer 16 are then focused on the object 30 contained in the third transparency 24, the objects (26, 28) contained in the first and second transparencies (20, 22) are perceived by the observer 16 to be blurred due to depth disparity, the object 26 in the first transparency 20 being less well defined than the object 28 in the second transparency 22.

Since the objects (26, 28) contained in the first and second transparencies (20, 22) are in front of the object 30, it may be partially obscured by the objects (26, 28) contained in the first and second transparencies (20, 22). As the observer 16 changes his position/orientation relative to the objects (26, 28) contained in the first or second transparency (20, 22), more or less of the object 30 contained in the third transparency 24 may come into the observer's field of view.

Effectively, the system is creating a plurality of virtual image planes, so that the image that is in focus can be viewed in free space.
FIG. 3 illustrates in block schematic form the method of the present invention. Firstly, an event or scene is monitored using a video camera (Block 302). The video camera generates video data comprising a sequence of video frames, each video frame being indicative of the monitored event at an instant in time (Block 304). The video frame is then divided into a number of regions or blocks of pixels (Block 306).

Generally, the video frame will be divided into a predetermined number of regions. Where certain portions of the video frame contain objects or information that require greater definition, the number of pixels in each region or block, or the number of regions or blocks, may be increased. Alternatively, where the video frame contains objects or information that require less definition, sub-regions or sub-blocks that are representative of groups of, for example, four pixels can be provided. These sub-regions or sub-blocks enable data transmission or data storage requirements to be alleviated.
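The variable-granularity division described above can be sketched as follows. The square blocks and the four-way split rule are illustrative assumptions (the patent also allows substantially triangular regions); the predicate stands in for whatever detail criterion the processor applies.

```python
def divide_frame(width, height, block, needs_detail):
    # Divide a width x height frame into square blocks of side `block`.
    # Blocks for which needs_detail(x, y) is true are split into four
    # half-size sub-blocks, giving finer granularity where the frame
    # carries more information.
    regions = []
    for y in range(0, height, block):
        for x in range(0, width, block):
            if needs_detail(x, y):
                half = block // 2
                regions += [(x, y, half), (x + half, y, half),
                            (x, y + half, half), (x + half, y + half, half)]
            else:
                regions.append((x, y, block))
    return regions

# 32x32 frame, 16-pixel blocks; only the top-left block is high-detail,
# so it splits into four 8-pixel sub-blocks: 3 coarse + 4 fine = 7 regions.
regions = divide_frame(32, 32, 16, lambda x, y: x == 0 and y == 0)
```

Applying the split recursively would yield the sub-region/sub-block hierarchy mentioned in the text, trading resolution for transmission and storage cost region by region.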
`
`
Typically, the regions or blocks, and sub-regions or sub-blocks, will be selected by a processor. Digital signal processors (DSPs) such as those manufactured by Texas Instruments Incorporated of Dallas, Tex. are particularly suitable for such applications. However, the operation of the processor may be overridden by the user where objects of particular importance, for example a white board used in a presentation, are used. Therefore, the video frame may be divided into a plurality of regions of varying size, a greater number of regions being assigned to regions of the video frame that contain objects or information that require greater definition.

In a video-conferencing environment, it has been found that observers of the generated display video images are better able to comprehend audio data (speech) where the facial movements of other users are distinct. Therefore, it is desirable to maintain and even enhance the resolution of the display video image in regions comprising the facial features of the user. Since a large part of the facial movement that takes place during a conversation is produced to generate spoken information, there is an inherent correlation between the generated speech and the facial features of the user at any instant. Thus, the regions of the video frame containing facial features of the user, such as the mouth, eyes, chin, etc., require greater definition.

One or more regions of the video frame are selected either by the user or by a processor according to a point of reference on the video frame (Block 308). The selected regions are generally stored in a memory, and remaining regions of the video frame are de-emphasised (Block 310) so that these regions appear blurred or out-of-focus in the resulting display video image. These remaining regions may be artificially simulated by the display receiving equipment to alleviate the data transmission requirements of the system. Alternatively, key integers of the remaining regions of the video frame can be determined by the user or a processor, and may be utilised to generate a simulation of the remaining regions of the video frame.

The de-emphasised or simulated remaining regions are then recombined with the selected region(s) of the video frame to produce each frame of the display video image (Block 312). Each frame of the display video image is then sequentially displayed to the observer (Block 314).
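Blocks 308-312 of the flow above can be summarised in a few lines. This is a sketch only: regions are modelled as a name-to-pixels mapping, the de-emphasis function is caller-supplied, and none of these names come from the patent.

```python
def build_display_frame(regions, selected, deemphasise):
    # Keep the selected region(s) sharp (Block 308), de-emphasise the
    # rest (Block 310), and recombine everything into one display
    # frame (Block 312).
    return {name: (pixels if name in selected else deemphasise(pixels))
            for name, pixels in regions.items()}

frame = {"mouth": [9, 9, 9], "background": [9, 9, 9]}
display = build_display_frame(frame, {"mouth"},
                              lambda px: [p // 2 for p in px])
# display["mouth"] is unchanged; display["background"] is de-emphasised.
```

In the simulated-region variant described above, `deemphasise` would be replaced at the receiver by a function that regenerates the region from its key integers rather than blurring transmitted pixels.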
For convenience, the present invention will now be described in detail with reference to a video communication system, and specifically a video-conferencing apparatus. However, the skilled person will appreciate that the principles, apparatus and features of the invention may find application in various other fields where pseudo three-dimensional images are required.

FIGS. 4a-4c illustrate a typical video-conferencing scenario in which participants 410 at a first location (generally denoted 400) are in audio/video communication with the participants 410' at a second location (generally denoted 400').

Referring to the first location 400 illustrated in FIGS. 4a and 4b, a video camera 412 is utilised to monitor the first location during the video conference. FIG. 4b illustrates three alternative locations for a single video camera 412. It will be apparent to the skilled person that the system may utilise any one, or a combination of more than one, of these and other video camera 412 locations. In particular, the video camera 412 is utilised to monitor the active entity 405, or instantaneously active participant (i.e. the person speaking or giving a presentation), at the first location, and is directed at, and focused on, the active entity 405. However, generally, due to the large field of view and depth of field of the camera 412, other participants 410 and surrounding and background
`
`Page 17
`
`
`
features at the first location will be captured by the camera 412 while it monitors the active entity 405.

Referring now to the second location 400' illustrated in FIG. 4c, the participants 410' in the second location will observe, on a screen 415, a display video image generated from the scene captured by the camera 412. More particularly, the participants will observe a display video image of the active entity 405 and other objects within the camera 412 field of view.

When the display video image includes an image of the active entity 405, it has been found that participants 410' derive significant information from the facial regions. In fact, it has been found that participants 410' are better able to comprehend the audio component (i.e. speech) when regions, particularly around the mouth and eyes, of an active entity 405 are well defined and the resolution of the display video image in these regions is good. In particular, it is known that participants 410' are better able to determine the speech of the active entity 405 if the instantaneous shape of the mouth can be determined.

Co-pending European Patent Application No. 974017725, filed 23 Jul. 1997 and assigned to Texas Instruments France, describes a video communication system which utilises this concept by updating the data associated with the facial regions of the active entity 405 in the display video image more frequently than surrounding regions.
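The unequal refresh rates just described can be sketched as a simple per-region schedule. The region names and period values below are illustrative assumptions, not taken from the patent or the co-pending application.

```python
def regions_to_refresh(tick, periods):
    # A region whose period is p is refreshed on every p-th frame;
    # information-bearing facial regions get period 1 (every frame),
    # static surroundings a longer period.
    return {region for region, period in periods.items()
            if tick % period == 0}

periods = {"mouth": 1, "eyes": 1, "ears": 5, "background": 10}
schedule = [regions_to_refresh(t, periods) for t in range(1, 11)]
# "mouth" and "eyes" are refreshed on every frame;
# "background" only on frame 10.
```

The bandwidth saving follows directly: per ten frames this schedule transmits the background once instead of ten times, while the mouth region keeps full temporal resolution.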
FIG. 5 illustrates the head and shoulder region of an active entity 405 monitored by the video camera 412, as described in the teachings of co-pending European Application No. 974017725.

Preferably, a processor selects integers corresponding to predetermined facial features. For example, the selected integers in FIG. 5 may be the chin 512, opposing edges of the mouth 514' and 514" respectively, the nose 516, and the outer edge of each eye 518 and 520 respectively.

The video image may be divided into substantially triangular regions or blocks of pixels. Each of these regions is represented by an eigen phase. Regions where motion is likely to be frequent (i.e. the background) but which assist the participants 410' little in their comprehension of the audio data (speech) comprise a larger area of pixels than other regions. Conversely, regions from which participants 410' gain much assistance in the comprehension of the audio data (e.g. mouth, chin, eyes, nose) comprise a smaller area of pixels. Therefore, eigen phases for video data corresponding to regions enclosed by the integers 512, 514, 516, 518, 520 are representative of a smaller area of pixels than eigen phases corresponding to other regions.
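The inverse relationship between a region's importance and its pixel area can be sketched as below. The specific base area, the power-of-two shrinkage, and the importance weights are illustrative assumptions; "eigen phase" is treated here simply as a per-region descriptor whose cost scales with region count.

```python
def block_area(base_area, importance):
    # More important regions (mouth, chin, eyes, nose) are covered by
    # smaller blocks, so each eigen phase describes fewer pixels there;
    # low-importance background regions use larger blocks.
    return max(1, base_area // (2 ** importance))

areas = {name: block_area(256, imp)
         for name, imp in {"background": 0, "ears": 1,
                           "nose": 3, "mouth": 4}.items()}
# background blocks stay at 256 pixels; mouth blocks shrink to 16.
```

The effect is that definition is concentrated where it aids speech comprehension, at a roughly constant overall descriptor budget.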
Since observers will tend to focus on the information-bearing facial regions 512, 514, 516, 518, 520, 521 of the active entity 405, other adjacent facial features, such as, for example, the ears, need not be refreshed as frequently. Furthermore, as observers of the display image generally focus on the information-bearing portion of the facial regions of the active entity 405, other regions of the display image can be less well defined without detriment to the observer.

In fact, it has been discovered that these regions can be de-emphasised to generate a display image that is analogous, when observed by the participants 410', to someone observing an image of himself in a mirror. It has further been found that an image in which the information-bearing facial regions are sharply in focus, and other regions are de-emphasised, generates a so-called "Mona Lisa" effect whereby it appears to each participant 410' that the active entity is looking directly at that participant 410'.
`
`
`8
`Operation of a video communication system 600 accord
`ing to a preferred embodiment of the present invention Will
`noW be described With reference to FIGS. 6*16. For con
`venience, the schematic illustration of the video communi
`cation system 600 Will be described in terms of a transmit
`ting portion 610 and a receiving portion 650. HoWever, it
`Will be understood by the skilled person that generally
`operation of the video communication system 600 Will
`require both the transmitting portion 610 and the receiving
`portion 650 to be capable of both generating and transmit
`ting video data, and receiving and converting the video data
`to generate a display video image.
`The transmitting portion 610 includes a video camera
`412, camera actuation device 614, image plane module 616,
`video quantiZation module 618, coding module 620, pre
`processing module 622, loop ?ltering circuit 624, motion
`estimation module 626, memory 628, compression module
`630, and audio quantiZation module 632.
`The receiving portion 650 comprises a video display 652,
`dequantiZation module 654, decoding module 656, post
`processing module 658, loop ?ltering module 660, motion
`estimation module 662, memory 664, and decompression
`module 666. It should be understood that vario