US007027659B1

(12) United States Patent
Thomas

(10) Patent No.: US 7,027,659 B1
(45) Date of Patent: Apr. 11, 2006

(54) METHOD AND APPARATUS FOR GENERATING VIDEO IMAGES

(75) Inventor: David R. Thomas, Opio (FR)

(73) Assignee: Texas Instruments Incorporated, Dallas, TX (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/315,247

(22) Filed: May 20, 1999

(30) Foreign Application Priority Data
May 20, 1998 (EP) ................ 98401228

(51) Int. Cl.
    G06K 9/40 (2006.01)
(52) U.S. Cl. ................ 382/254; 382/260
(58) Field of Classification Search ................ 382/254-255, 382/260-269; 358/19
    See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
5,629,752 A * 5/1997 Kinjo ................ 355/35

FOREIGN PATENT DOCUMENTS
JP 2001208524 A * 8/2001

OTHER PUBLICATIONS
Omura et al.: "3D Display With Accommodative Compensation (3DDAC) Employing Real-Time Gaze Detection", SID International Symposium, Digest of Technical Papers, San Diego, May 12-17, 1996, vol. 27, pp. 889-892.*

* cited by examiner

Primary Examiner: Thomas D. Lee
Assistant Examiner: Stephen Brinich
(74) Attorney, Agent, or Firm: Wade James Brady III; Frederick J. Telecky, Jr.

(57) ABSTRACT

A method and system for generating a video image is disclosed in which an object is monitored with a video camera to produce a sequence of video frames. Each of the video frames is divided into a plurality of regions, each region being representative of a portion of said object. For example, the frame of the video image may include the head and shoulder region of a user. Regions corresponding to predetermined facial features may be selected, such as the chin, opposing edges of the mouth, the nose, and the outer edge of each eye. At least one of the plurality of regions is selected. In the illustrative example, the selected region may comprise the mouth of the monitored user. The selected region is then recombined with each of the remaining regions of the video frame to form a display video image. The sequence of video frames is then displayed such that the selected region is perceived by an observer to form a sharp image, and remaining regions of said display video image are less sharp in accordance with the distance between said respective portion of said object and said video camera.

25 Claims, 13 Drawing Sheets
[Drawing Sheet 1 of 13: FIG. 1 (prior-art stereoscopic display system)]
[Drawing Sheet 2 of 13: FIGS. 2a-2c (array of spaced photographic transparencies)]
[Drawing Sheet 3 of 13: FIG. 3, method flowchart: 302 MONITOR EVENT → 304 GENERATE AUDIO/VIDEO → 306 DIVIDE VIDEO FRAMES → 308 SELECT REGION OF VIDEO FRAME → 310 DE-EMPHASIZE REMAINING REGIONS OF VIDEO FRAME → 312 RECOMBINE REGIONS → 314 DISPLAY VIDEO FRAMES]
[Drawing Sheet 4 of 13: FIGS. 4a-4c, video-conferencing scenario (active entity 405, participants 410, screen 415)]
[Drawing Sheet 5 of 13: FIG. 5 (division of the head and shoulder region into blocks of pixels)]
[Drawing Sheet 6 of 13: FIG. 6a, transmitting portion block diagram]
[Drawing Sheet 7 of 13: FIG. 6b, receiving portion block diagram]
[Drawing Sheet 8 of 13: FIGS. 8a-8b, operation flowchart: 805 MONITOR EVENT → 806 GENERATE AUDIO/VIDEO → 807 IDENTIFY ACTIVE ENTITY → 808 SELECT z IMAGE PLANES → 809 DIVIDE IMAGE PLANES INTO BLOCKS → 820 QUANTIZE VIDEO/AUDIO → 822 ENCODE VIDEO/AUDIO → 824 ELIMINATE VIDEO/AUDIO → 826 COMPRESS VIDEO/AUDIO → 828 TRANSMIT SIGNAL, with feedback loop 830 MOTION ESTIMATION → 832 LOOP FILTERING → 834 QUANTIZE VIDEO/AUDIO]
[Drawing Sheet 9 of 13: FIG. 10, camera 2010 focused on an object in a plane of focus]
[Drawing Sheet 10 of 13: FIG. 9b, refresh flowchart (entry 815): 914 REFRESH FIRST IMAGE PLANE → 920 REFRESH SECOND IMAGE PLANE → REFRESH THIRD IMAGE PLANE → 932 REFRESH NTH IMAGE PLANE, with decision steps 916, 922, 928, 934, exiting to 818]
[Drawing Sheet 11 of 13: FIGS. 11-13 (reference numeral 2202)]
[Drawing Sheet 12 of 13: FIGS. 14-16 (reference numerals 2100, 2103, 2110, 2111, 2113, 2300)]
[Drawing Sheet 13 of 13: FIGS. 17a-17b (formation of image planes from blocks of pixels)]
METHOD AND APPARATUS FOR GENERATING VIDEO IMAGES

FIELD OF THE INVENTION

The present invention relates generally to the field of video images, and more particularly to a method and apparatus for generating video images that are perceived by the observer to be three-dimensional.
BACKGROUND OF THE INVENTION

Many systems for generating pseudo three-dimensional (3D) images have been developed over recent years. Generally, these systems can be characterised by the methods by which the observer is deceived into perceiving the image as three-dimensional (i.e. having depth).

In the real world, the human eye perceives depth in an image due to the combination of a number of visual cues.

With the first visual cue, more distant objects are perceived by the observer to be both smaller and higher in the field of view than objects that are closer to the observer. Typically, distant objects are also blocked from the observer's field of view by closer objects, and the observer perceives their resolution, contrast, and brightness to be less well defined.

With the second visual cue, the observer perceives an apparent change in the position of the object relative to the more distant background image as his own position changes. This effect is known as parallax and can affect the image perceived by the observer in both the horizontal and vertical planes.

With the third visual cue, the lateral separation of the observer's eyes means that the distance between a point on an object and each eye can be different. This effect is known in the art as binocular disparity and results in each eye observing a slightly different perspective. However, in real life, this effect is resolved by the human brain to produce the single clear image perceived by the observer.

The fourth visual cue to three-dimensional perception of video images is depth disparity. Since the human eye has a finite field of view in both the horizontal and vertical planes, the eye tends to focus on an object, or region of an object, that is of immediate interest. Consequently, surrounding objects, or regions of the object, which form the background image are out of focus and blurred. The human brain perceives these surrounding objects or regions to be at a different distance, which provides a depth cue.

Known stereoscopic and auto-stereoscopic systems for generating pseudo three-dimensional images generate alternate and slightly differing frames of the video image for each eye. The differing frames are intended to correspond to the different views perceived by the human brain due to the separation between the eyes, and so produce a binocular disparity.

The observer of a video image generated using a stereoscopic system must be provided with an optical device, such as a pair of spectacles having one red lens and one green lens. A separate frame of the video image is shown alternately to each eye, at sufficient frequency that the observer resolves a single image.

Auto-stereoscopic systems were developed to produce video images with multiple image planes (i.e. the observer can view around foreground objects). These auto-stereoscopic systems are designed to focus separate frames of the image into each eye using an arrangement of optical elements. Typically, these elements will include vertically aligned lenticular lenses. These systems have found application in items such as postcards, but their more widespread use is limited by their narrow field of view.

As the observer of a stereoscopic or auto-stereoscopic image changes their point of focus, either by looking from one object to another or by looking at a different region of the object, the eyes must readjust. Each eye will take a finite period to adjust to the focal plane associated with the object perceived by the observer. Therefore, the focal plane of the image perceived by each eye may differ, and the human brain must converge the images into a single focused image of the object (known in the art as convergence).

Similarly, the human eye has a finite depth of focus, or region in space in which the focus of an object can be resolved. This is due to the physical requirement for the cornea to change shape to produce a sharp image of the object on the surface of the retina. Therefore, as the observer switches his attention from a distant object to a close object, or vice versa, objects outside the field of view become less well defined and blur (known in the art as accommodation).

Recent research has shown that users of stereoscopic and auto-stereoscopic systems are prone to fatigue, eye strain, and headaches. It is thought that this can be attributed to the fact that convergence and accommodation of images in the real world coincide, and hence the human brain interprets muscular actions associated with the control of the cornea to predict that objects are at different distances.

Conversely, in stereoscopic and auto-stereoscopic systems convergence and accommodation occur at different points in space. FIG. 1 illustrates a stereoscopic system for generating three-dimensional video images in which a display screen 10, such as an LCD or CRT display, shows an image 12 of an object. The eyes of the observer 16 are focused on the display 10, producing an accommodation distance Da. However, the object 12 is perceived to be in front of the display 10, and hence the convergence distance Dc at which the image 14 of the object 12 is perceived lies between the display 10 (where the object is in focus) and the observer 16.
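The size of this accommodation/convergence mismatch follows from similar-triangle geometry. The short derivation below is an editorial illustration using standard stereoscopy, not text from the patent; the symbols e (eye separation), s (crossed on-screen disparity) and Da, and the example numbers, are all assumed values.

    % Editorial illustration (standard stereo geometry, not from the patent).
    % Separation of the two lines of sight at depth z from the eyes:
    %   w(z) = e - (e + s) * z / Da
    % Convergence occurs where w(z) = 0:
    \[
      D_c = \frac{e}{e+s}\,D_a ,
      \qquad
      e = 65\ \mathrm{mm},\ s = 10\ \mathrm{mm},\ D_a = 600\ \mathrm{mm}
      \ \Rightarrow\
      D_c = \tfrac{65}{75}\times 600 \approx 520\ \mathrm{mm}.
    \]

With these numbers the eyes converge roughly 80 mm in front of the display while accommodation remains at the display surface, which is exactly the conflict described here.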
Since the object 12 is not perceived by the observer 16 to be at the display surface 10, the human brain directs the eyes at the point in space where it predicts the image 14 to be. This provides the human brain with conflicting signals indicative of the accommodation and convergence, and can result in fatigue, eye strain and headaches.
SUMMARY OF THE INVENTION

Therefore, a need has arisen for a method and apparatus for generating an image that is perceived by the observer to be three-dimensional, and in which the accommodation and convergence of the image substantially coincide, thereby alleviating eye strain and fatigue.

Accordingly, the present invention provides a method and system for generating a video image. An object is monitored with a video camera to produce a sequence of video frames. Each of the video frames is divided into a plurality of regions, each region being representative of a portion of the object. For example, the frame of the video image may include the head and shoulder region of a user. Regions corresponding to predetermined facial features may be selected, such as the chin, opposing edges of the mouth, the nose, and the outer edge of each eye. Preferably, the frame of the video image is divided into substantially triangular regions or blocks of pixels. The selection of regions of frames of a monitored video image is discussed in co-pending European Patent Application No. 97401772.5, filed 23 Jul. 1997.

At least one of the plurality of regions is selected. In the illustrative example, the selected region may comprise the mouth of the monitored user. The selected region is then recombined with each of the remaining regions of the video frame to form a display video image.

The sequence of video frames is then displayed such that the selected region is perceived by an observer to form a sharp image, and the remaining regions of the display video image are less sharp in accordance with the distance between the respective portion of the object and the selected region.

In a further embodiment of the invention, video data indicative of each region of the video frames is transmitted to a receiver before one of the plurality of regions is selected. Typically, the selected region will be a region of the frame of the video image defining a foreground object. However, regions of the frame may also be selected by an observer.

In a yet further preferred embodiment of the invention, the region of the video frame is selected according to the position of an object relative to at least one other object monitored by the video camera. Typically, this includes selecting a region of the frame defining an active entity in a monitored event, such as, for example, the mouth or eyes of a monitored user.

The video image is divided into a plurality of regions each defining a focal plane, so that each focal plane is representative of a different distance between a respective portion of the object and the video camera.

Preferably, the remaining regions of the frame are de-emphasised according to the distance between a respective portion of the object and the selected region: greater de-emphasis is applied to regions of the video image representing portions of the object that are further from the selected region than to regions representing portions that are nearer to it. Therefore, more distant portions of the object are less well defined in the resulting video image.
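As a concrete illustration of this graduated de-emphasis, the following sketch blurs each non-selected region in proportion to its depth offset from the selected region and recombines the result. It is editorial, not the patented implementation: the Gaussian blur, the linear kernel schedule, and the names (de_emphasize, region_masks, blur_per_unit) are assumptions, and it presumes per-region depths are already known.

    import cv2
    import numpy as np

    def de_emphasize(frame, region_masks, depths, selected, blur_per_unit=2.0):
        """Blur each non-selected region in proportion to its depth offset
        from the selected region, then recombine into one display frame."""
        out = frame.copy()
        for i, mask in enumerate(region_masks):        # one boolean mask per region
            if i == selected:
                continue                               # selected region stays sharp
            # Kernel size grows with depth disparity; OpenCV needs it odd.
            k = 2 * int(abs(depths[i] - depths[selected]) * blur_per_unit) + 1
            layer = cv2.GaussianBlur(frame, (k, k), 0)
            out[mask] = layer[mask]                    # recombine region by region
        return out

A region at the same depth as the selected one receives a 1x1 kernel (no visible blur), while progressively larger kernels de-emphasise regions further away.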
In a yet further preferred embodiment of the present invention, the selected region of the frame of the video image is recombined with artificially generated simulations of the remaining regions of the video image.
BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will now be further described, by way of example, with reference to the accompanying drawings in which:

FIG. 1 shows a schematic illustration of a stereoscopic system for generating pseudo three-dimensional video images according to the prior art;

FIGS. 2a-2c show an array of spaced photographic transparencies for illustrating the principles of the method of the present invention;

FIG. 3 is a block diagram illustrating the method of the present invention;

FIGS. 4a-4c show an illustration of the method of FIG. 3 utilised in a video-conferencing system;

FIG. 5 illustrates the division of the head and shoulder portion of an active entity into blocks of pixels for use in the method of FIG. 3;

FIGS. 6a-6b illustrate the transmitting portion and the receiving portion respectively of the video-conferencing system of FIGS. 4a-4c;

FIG. 7 schematically illustrates a method of determining the relative image planes forming the video display image;

FIGS. 8a-8b are block diagrams illustrating the operation of the transmitting portion and the receiving portion of the video-conferencing system of FIGS. 4a-4c;

FIG. 9 is a block diagram illustrating a process for refreshing the video display image;

FIG. 10 is a schematic that illustrates a camera that is focused on an object in a plane of focus;

FIG. 11 is a schematic that illustrates how objects beyond the plane of focus of the camera in FIG. 10 appear defocused by an amount corresponding to the depth disparity, according to an aspect of the present invention;

FIG. 12 is a schematic that illustrates how a different depth disparity results in a different amount of defocusing, according to an aspect of the present invention;

FIG. 13 is a block diagram of a camera that contains a digital signal processor for processing images according to aspects of the present invention;

FIG. 14 is a schematic that illustrates two separate cameras focused on an object in a plane of focus;

FIG. 15 is a schematic that illustrates how an alignment error can be corrected when the mechanical plane of focus of the two cameras of FIG. 14 is offset from the optical plane of focus, according to an aspect of the present invention;

FIG. 16 is a schematic that illustrates how objects beyond the plane of focus of the cameras in FIG. 14 appear defocused by an amount corresponding to the depth disparity, according to an aspect of the present invention; and

FIGS. 17a-17b show the formation of image planes from blocks of pixels.
DETAILED DESCRIPTION OF THE DRAWINGS

The method of the present invention can be considered analogous to viewing an array of spaced photographic transparencies, as illustrated in FIGS. 2a-2c. The transparencies (20, 22, 24) are arranged such that each transparency is separated from adjacent transparencies by a distance dx. For the purposes of illustration, each transparency (20, 22, 24) comprises an image of a different object (26, 28, 30) and defines an image plane.

It will be appreciated that although the present invention is described in relation to an array of transparencies (20, 22, 24), each of which represents a different object, the principles disclosed are equally applicable to an array in which each transparency is representative of a different region of a single object at a predefined distance from the observer.

The first transparency 20 shows an image of an object 26 (i.e. flowers), the second transparency 22 shows an image of an object 28 (i.e. an elephant), and the third transparency 24 shows an image of an object 30 (i.e. a building). The first, second and third transparencies (20, 22, 24) are separated from the observer 16 by distances d1, d2 and d3 respectively.

Referring now to FIG. 2a, when the observer 16 is focused on the object 26 contained in the first transparency 20, the accommodation and convergence distance are equivalent to distance d1. Since the eyes of the observer 16 are focused on the object 26 contained in the first transparency 20, the objects (28, 30) contained in the second and third transparencies (22, 24) are perceived by the observer 16 to be blurred due to depth disparity.
Should the observer 16 switch his attention from the object 26 contained in the first transparency 20, for example by focusing on the object 28 contained in the second transparency 22 (FIG. 2b), the eyes will not immediately focus on the object 28. However, the eyes will focus on the object 28 on completion of a finite acclimatisation period, and the accommodation and convergence distance will then be equivalent to distance d2.

When the eyes of the observer 16 are focused on the object 28 contained in the second transparency 22, the objects (26, 30) contained in the first and third transparencies (20, 24) are perceived by the observer 16 to be blurred due to depth disparity. However, since the object 26 contained in the first transparency 20 is in front of the objects (28, 30) contained in the second and third transparencies (22, 24), the focused image of object 28 may be partially obscured by the object 26 of the first transparency 20. As the observer 16 changes his position relative to the object 26 contained in the first transparency 20, more or less of the object 28 contained in the second transparency 22 may come into the observer's field of view. Similarly, as the observer 16 changes his position/orientation relative to the object 26 contained in the first transparency 20, more or less of the object 30 contained in the third transparency 24 may come into the observer's field of view.

Should the observer 16 switch his attention to the object 30 contained in the third transparency 24, as illustrated in FIG. 2c, the eyes will focus on object 30 following a finite acclimatisation period. Consequently, the accommodation and convergence distance are equivalent to distance d3. When the eyes of the observer 16 are then focused on the object 30 contained in the third transparency 24, the objects (26, 28) contained in the first and second transparencies (20, 22) are perceived by the observer 16 to be blurred due to depth disparity, the object 26 in the first transparency 20 being less well defined than the object 28 in the second transparency 22.

Since the objects (26, 28) contained in the first and second transparencies (20, 22) are in front of the object 30, it may be partially obscured by them. As the observer 16 changes his position/orientation relative to the objects (26, 28) contained in the first or second transparency (20, 22), more or less of the object 30 contained in the third transparency 24 may come into the observer's field of view.

Effectively, the system creates a plurality of virtual image planes, so that the image that is in focus can be viewed in free space.
FIG. 3 illustrates in block schematic form the method of the present invention. Firstly, an event or scene is monitored using a video camera (Block 302). The video camera generates video data comprising a sequence of video frames, each video frame being indicative of the monitored event at an instant in time (Block 304). Each video frame is then divided into a number of regions or blocks of pixels (Block 306).

Generally, the video frame will be divided into a predetermined number of regions. Where certain portions of the video frame contain objects or information that require greater definition, the number of pixels in each region or block, or the number of regions or blocks, may be increased. Alternatively, where the video frame contains objects or information that require less definition, sub-regions or sub-blocks that are representative of groups of, for example, four pixels can be provided. These sub-regions or sub-blocks enable data transmission or data storage requirements to be alleviated.
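A minimal sketch of this variable-definition division follows, assuming a NumPy frame and a per-block boolean detail mask; the fixed block size, the 2x2 averaging for low-detail areas, and the helper name divide_frame are illustrative choices, not the patent's.

    import numpy as np

    def divide_frame(frame, block=16, detail_mask=None):
        """Divide a frame into fixed-size blocks (cf. Block 306 of FIG. 3).
        Blocks flagged low-detail are reduced to sub-blocks in which each
        value stands for a group of four pixels (a 2x2 mean), easing data
        transmission and storage as described above."""
        h, w = frame.shape[:2]
        blocks = {}
        for y in range(0, h - h % block, block):
            for x in range(0, w - w % block, block):
                tile = frame[y:y + block, x:x + block]
                if detail_mask is not None and not detail_mask[y // block, x // block]:
                    # Average each 2x2 group of pixels into one sub-block value.
                    tile = tile.reshape(block // 2, 2, block // 2, 2, -1).mean(axis=(1, 3))
                blocks[(y, x)] = tile
        return blocks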
Typically, the regions or blocks, and sub-regions or sub-blocks, will be selected by a processor. Digital signal processors (DSPs) such as those manufactured by Texas Instruments Incorporated of Dallas, Tex. are particularly suitable for such applications. However, the operation of the processor may be overridden by the user where objects of particular importance, for example a whiteboard used in a presentation, are involved. Therefore, the video frame may be divided into a plurality of regions of varying size, a greater number of regions being assigned to portions of the video frame that contain objects or information requiring greater definition.

In a video-conferencing environment, it has been found that observers of the generated display video images are better able to comprehend audio data (speech) where the facial movements of other users are distinct. Therefore, it is desirable to maintain and even enhance the resolution of the display video image in regions comprising the facial features of the user. Since a large part of the facial movement that takes place during a conversation is produced to generate spoken information, there is an inherent correlation between the generated speech and the facial features of the user at any instant. Thus, the regions of the video frame containing facial features of the user, such as the mouth, eyes, chin, etc., require greater definition.
One or more regions of the video frame are selected, either by the user or by a processor, according to a point of reference on the video frame (Block 308). The selected regions are generally stored in a memory, and the remaining regions of the video frame are de-emphasised (Block 310) so that these regions appear blurred or out of focus in the resulting display video image. These remaining regions may be artificially simulated by the display receiving equipment to alleviate the data transmission requirements of the system. Alternatively, key integers of the remaining regions of the video frame can be determined by the user or a processor, and may be utilised to generate a simulation of the remaining regions of the video frame.

The de-emphasised or simulated remaining regions are then recombined with the selected region(s) of the video frame to produce each frame of the display video image (Block 312). Each frame of the display video image is then sequentially displayed to the observer (Block 314).
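Read together, Blocks 302-314 form a capture-process-display loop. The sketch below strings them together, reusing the de_emphasize helper sketched earlier; it is a hedged editorial illustration, and divide_into_regions and choose_region stand in for whatever segmentation and selection (by processor or by user) an actual system would provide.

    import cv2

    def run_display(camera_index=0, choose_region=None):
        """One pass over FIG. 3: monitor (302), generate frames (304),
        divide (306), select (308), de-emphasise and recombine (310/312),
        display (314)."""
        cap = cv2.VideoCapture(camera_index)                    # Blocks 302/304
        try:
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                masks, depths = divide_into_regions(frame)      # Block 306 (assumed helper)
                sel = choose_region(masks) if choose_region else 0   # Block 308
                shown = de_emphasize(frame, masks, depths, sel)      # Blocks 310/312
                cv2.imshow("display video image", shown)             # Block 314
                if cv2.waitKey(1) & 0xFF == ord("q"):
                    break
        finally:
            cap.release()
            cv2.destroyAllWindows()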
For convenience, the present invention will now be described in detail with reference to a video communication system, and specifically a video-conferencing apparatus. However, the skilled person will appreciate that the principles, apparatus and features of the invention may find application in various other fields where pseudo three-dimensional images are required.

FIGS. 4a-4c illustrate a typical video-conferencing scenario in which participants 410 at a first location (generally denoted 400) are in audio/video communication with the participants 410' at a second location (generally denoted 400').

Referring to the first location 400 illustrated in FIGS. 4a and 4b, a video camera 412 is utilised to monitor the first location during the video conference. FIG. 4b illustrates three alternative locations for a single video camera 412. It will be apparent to the skilled person that the system may utilise any one, or a combination of more than one, of these and other video camera 412 locations. In particular, the video camera 412 is utilised to monitor the active entity 405, or instantaneously active participant (i.e. the person speaking or giving a presentation), at the first location and is directed at, and focused on, the active entity 405. However, generally, due to the large field of view and depth of field of the camera 412, other participants 410 and surrounding and background features at the first location will be captured by the camera 412 while it monitors the active entity 405.
Referring now to the second location 400' illustrated in FIG. 4c, the participants 410' in the second location will observe, on a screen 415, a display video image generated from the scene captured by the camera 412. More particularly, the participants will observe a display video image of the active entity 405 and other objects within the camera 412 field of view.

When the display video image includes an image of the active entity 405, it has been found that participants 410' derive significant information from the facial regions. In fact, it has been found that participants 410' are better able to comprehend the audio component (i.e. speech) when regions of an active entity 405, particularly around the mouth and eyes, are well defined and the resolution of the display video image in these regions is good. In particular, it is known that participants 410' are better able to determine the speech of the active entity 405 if the instantaneous shape of the mouth can be determined.
Co-pending European Patent Application No. 97401772.5, filed 23 Jul. 1997 and assigned to Texas Instruments France, describes a video communication system which utilises this concept by updating the data associated with the facial regions of the active entity 405 in the display video image more frequently than the surrounding regions.
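The update-frequency idea in the paragraph above can be sketched as a simple per-plane refresh schedule (compare the refresh loop of FIG. 9b). The doubling schedule below is an editorial assumption; the co-pending application, not this sketch, defines the actual mechanism.

    def planes_to_refresh(frame_no, priorities):
        """Return the indices of image planes due for refresh at this frame.
        Priority 0 (e.g. the facial regions of the active entity) refreshes
        every frame; priority p refreshes every 2**p frames."""
        return [plane for plane, p in enumerate(priorities)
                if frame_no % (2 ** p) == 0]

For example, planes_to_refresh(4, [0, 2, 3]) returns [0, 1]: the facial plane refreshes on every frame, the second plane on every fourth frame, and the third only on every eighth.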
FIG. 5 illustrates the head and shoulder region of an active entity 405 monitored by the video camera 412 as described in the teachings of co-pending European Application No. 97401772.5.

Preferably, a processor selects integers corresponding to predetermined facial features. For example, the selected integers in FIG. 5 may be the chin 512, opposing edges of the mouth 514' and 514'' respectively, the nose 516, and the outer edge of each eye 518 and 520 respectively.

The video image may be divided into substantially triangular regions or blocks of pixels. Each of these regions is represented by an eigen phase. Regions where motion is likely to be frequent (i.e. the background) but which assist the participants 410' little in their comprehension of the audio data (speech) comprise a larger area of pixels than other regions. Conversely, regions from which participants 410' gain much assistance in the comprehension of the audio data (e.g. mouth, chin, eyes, nose) comprise a smaller area of pixels. Therefore, eigen phases for video data corresponding to regions enclosed by the integers 512, 514, 516, 518, 520 are representative of a smaller area of pixels than eigen phases corresponding to other regions.
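One way to realise such a triangular division, offered here only as an editorial sketch, is to triangulate the selected facial integers together with the frame corners and treat triangles near a landmark as the small, high-definition regions. The Delaunay construction (via SciPy), the fine_radius threshold, and every name below are assumptions; the patent says only that the regions are "substantially triangular".

    import numpy as np
    from scipy.spatial import Delaunay

    def triangular_regions(landmarks, frame_shape, fine_radius=40.0):
        """Triangulate facial integers (chin 512, mouth edges 514'/514'',
        nose 516, eye edges 518/520) plus the frame corners.  Triangles
        whose centroid lies near a landmark are flagged information-bearing
        and would be represented at a smaller pixel area."""
        h, w = frame_shape[:2]
        lm = np.asarray(landmarks, float)
        pts = np.vstack([lm, [[0, 0], [w - 1, 0], [0, h - 1], [w - 1, h - 1]]])
        tri = Delaunay(pts)
        centroids = pts[tri.simplices].mean(axis=1)   # one centroid per triangle
        fine = np.array([np.linalg.norm(lm - c, axis=1).min() < fine_radius
                         for c in centroids])
        return tri.simplices, fine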
Since observers will tend to focus on the information-bearing facial regions 512, 514, 516, 518, 520, 521 of the active entity 405, other adjacent facial features, such as, for example, the ears, need not be refreshed as frequently. Furthermore, as observers of the display image generally focus on the information-bearing portion of the facial regions of the active entity 405, other regions of the display image can be less well defined without detriment to the observer.

In fact, it has been discovered that these regions can be de-emphasised to generate a display image that is analogous, when observed by the participants 410', to someone observing an image of himself in a mirror. It has further been found that an image in which the information-bearing facial regions are sharply in focus, and other regions are de-emphasised, generates a so-called "Mona Lisa" effect whereby it appears to each participant 410' that the active entity is looking directly at that participant 410'.
Operation of a video communication system 600 according to a preferred embodiment of the present invention will now be described with reference to FIGS. 6-16. For convenience, the schematic illustration of the video communication system 600 will be described in terms of a transmitting portion 610 and a receiving portion 650. However, it will be understood by the skilled person that generally operation of the video communication system 600 will require both the transmitting portion 610 and the receiving portion 650 to be capable of both generating and transmitting video data, and receiving and converting the video data to generate a display video image.
The transmitting portion 610 includes a video camera 412, camera actuation device 614, image plane module 616, video quantization module 618, coding module 620, pre-processing module 622, loop filtering circuit 624, motion estimation module 626, memory 628, compression module 630, and audio quantization module 632.
The receiving portion 650 comprises a video display 652, dequantization module 654, decoding module 656, post-processing module 658, loop filtering module 660, motion estimation module 662, memory 664, and decompression module 666. It should be understood that vario
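The near-symmetry of the two portions can be summarised in code. The listing below merely restates the module enumerations above; gathering them into ordered lists is an editorial assumption, and no behaviour of the modules is implied.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class TransmittingPortion610:
        # Module chain of the transmitting portion 610, as enumerated above.
        stages: List[str] = field(default_factory=lambda: [
            "video camera 412", "camera actuation device 614",
            "image plane module 616", "video quantization module 618",
            "coding module 620", "pre-processing module 622",
            "loop filtering circuit 624", "motion estimation module 626",
            "memory 628", "compression module 630",
            "audio quantization module 632",
        ])

    @dataclass
    class ReceivingPortion650:
        # Mirror-image module chain of the receiving portion 650.
        stages: List[str] = field(default_factory=lambda: [
            "video display 652", "dequantization module 654",
            "decoding module 656", "post-processing module 658",
            "loop filtering module 660", "motion estimation module 662",
            "memory 664", "decompression module 666",
        ])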
