United States Patent [19]                          [11] Patent Number:   5,686,957
Baker                                               [45] Date of Patent:  Nov. 11, 1997

[54] TELECONFERENCING IMAGING SYSTEM WITH AUTOMATIC CAMERA STEERING

[75] Inventor:  Robert G. Baker, Delray Beach, Fla.

[73] Assignee:  International Business Machines Corporation, Armonk, N.Y.

[21] Appl. No.: 496,742

[22] Filed:     Jun. 30, 1995

                Related U.S. Application Data

[63] Continuation-in-part of Ser. No. 281,331, Jul. 27, 1994, Pat. No. 5,508,734.

[51] Int. Cl.6 ............................................... H04N 7/18
[52] U.S. Cl. ..................................................... 348/36
[58] Field of Search ................. 348/15, 214, 36, 53, 580; H04N 7/18

[56]                    References Cited

                  U.S. PATENT DOCUMENTS

     4,264,928    4/1981   Schober ............................ 348/15
     4,980,761   12/1990   Natori ............................. 348/15
     5,508,734    4/1996   Baker .............................. 348/36

Primary Examiner—Howard W. Britton
Attorney, Agent, or Firm—Richard A. Tomlin; John C. Black; Malin, Haley, DiMaggio & Crosby, P.A.

[57]                        ABSTRACT

An automatic, voice-directional video camera image steering system specifically for use for teleconferencing that electronically selects segmented images from a selected panoramic video scene, typically around a conference table, so that the participant in the conference currently speaking will be the selected segmented image in the proper viewing aspect ratio, eliminating the need for manual camera movement or automated mechanical camera movement. The system includes an audio detection circuit from an array of microphones that can instantaneously determine the direction of a particular speaker and provide directional signals to a video camera and lens system that provides a panoramic display that can electronically select portions of that image and, through warping techniques, remove any distortion from the most significant portions of the image, which lie from the horizon up to approximately 30 degrees in a hemispheric viewing area.

                  14 Claims, 7 Drawing Sheets
`
[Front-page representative drawing: only the block label AUDIO DIRECTION PROCESSOR and reference numerals 15 and 17 are legible in the scan.]
`
`
`
`
U.S. Patent          Nov. 11, 1997          Sheets 1-7          5,686,957

[Drawing sheets reproduced as images; only scattered block labels are legible in the scan:]
[Sheet 1 of 7: DISPLAY, TRANSFORM ENGINE, PROCESSOR]
[Sheet 2 of 7: INTERPOLATION COEFFICIENT PROCESSOR, WARP ENGINE, COLUMN(Y) WARP ENGINE, DIRECTION]
[Sheet 3 of 7: no legible labels]
[Sheet 4 of 7: no legible labels]
[Sheet 5 of 7: no legible labels]
[Sheet 6 of 7, including FIG. 6B: PROCESSING AND WARPING CIRCUITS, INTERPOLATION COEFFICIENT, MULTIPLY/ACCUMULATE, BUFFER UNIT, WARPED ADDRESS, WARPED DATA, COLUMN(Y) WARP ENGINE, BUFFER, DESTINATION ADDRESS]
[Sheet 7 of 7: PANORAMIC IMAGE, TRANSFORM PARAMETERS, LOOKUP, three parallel PROCESSING AND WARPING CIRCUITS blocks, TO DISPLAY, HOST, BUS, CONTROL, ADDRESS, MEMORY]
`
`
`TELECONFERENCING IMAGING SYSTEM
`WITH AUTOMATIC CAMERA STEERING
`
This application is a continuation-in-part of U.S. patent
application Ser. No. 281,331, filed Jul. 27, 1994, now U.S. Pat. No.
5,508,734.
`
`BACKGROUND OF THE INVENTION
`1. Field of the Invention
`
This invention relates to a video conferencing system that
has automatic, voice-directional camera image steering, and
specifically to a teleconferencing system that employs auto-
matic video image selection of the current participant speak-
ing, electronically selected from a panoramic video scene.
`2. Description of the Prior Art
`Teleconferencing provides for the exchange of video and
`audio information between remotely separated participants.
`Typically, a first group of participants is arranged around a
`conference table or seated strategically in a conference room
`and telecommunicating with a second group of participants
`similarly situated at a remote location. One or more video
cameras at each location create video images of the par-
`ticipants through manual manipulation of each video
`camera, normally directed at the participant speaking at the
`moment. Microphones at each location provide for sound
`transmission signals. The video image and audio voice
`signals are then transmitted to the remote location. The
`video image is projected onto a large screen or other type of
`video display which also would include audio outputs for
`providing the sounds.
`Manual manipulation of each video camera at each con-
`ference site is required to change the direction of each
camera to different participants as speakers change, unless a
large overall view of all the participants is maintained. Such
`a process is labor intensive. Also image content and
`perspective, dependent on the location of the video camera
`relative to the participants, contributes to the quality of the
`final visual display available to the participants watching the
`display screen. The quality of the image and the scene
`content all contribute to the overall effectiveness of the
telecommunication process. In particular, in a setting such as
`a conference table in a conference room, a hemispheric or
`panoramic viewpoint would be much more efficient for
`video image capture of surrounding selected participants.
`With a hemispheric scene, certain efficiencies are gained by
`eliminating large areas that are unused scene content while
`concentrating on a band of hemispheric areas populated by
`the teleconferencing participants. Therefore, it is believed
`that hemispheric or panoramic electronic imaging would be
`greatly beneficial to a teleconferencing environment, espe-
`cially when controlled with audio directional processors.
The selected video image is taken from a desired segment of
`a hemispherical view in the correct video aspect ratio. A
`centralized panoramic image capture system which already
`has a distorted picture of the hemisphere bounded by the
`plane of the table upward selects a portion of the scene and
`warps the image to correspond to a normal aspect ratio view
`of the person speaking. The signal can be converted to
`whatever display format is desired for transmission to a
`remote location. The present invention has incorporated, in
`one automated system, audio beam steering and electroni-
`cally selectable subviews of a much larger panoramic scene.
The video subviews can be converted to an NTSC display
`format for transmission to a remote location for video
`display.
`The collection, storage, and display of large areas of
visual information can be an expensive and difficult process
to achieve accurately. With the recent increased emphasis on
multimedia applications, various methods and apparatuses
have been developed to manage visual data. A unique class
of multimedia data sets is that of hemispheric visual data.
Known multimedia methods and apparatuses attempt to
combine various multimedia imaging data, such as still and
motion (or video) images, with audio content using storage
media such as photographic film, computer diskettes, compact
discs (CDs), and interactive CDs. These are used in
traditional multimedia applications in various fields, such as
entertainment and education. Teleconferencing is an application
where automated electronic selection of scene content
would result in greatly improved usability. Non-multimedia
applications also exist that would employ hemispheric visual
data, such as in security, surveillance, unmanned
exploration, and fire and police situations. However, as will
be described below, the known methods and apparatuses
have certain limitations in capturing and manipulating valuable
information and hemispheric scenes in a rapid (i.e.,
real-time) and cost effective manner.
One well known multimedia technique is used at theme
parks, wherein visual information from a scene is displayed
on a screen or collection of screens that covers almost 360
degrees field of view. Such a technique unfortunately results
in the consumption of vast quantities of film collected from
multiple cameras, requires specially designed carriages to
carry and support the cameras during filming of the scene,
and necessitates synchronization of shots during capture and
display. The technique is also limited in that the visual image
cannot be obtained with a single camera nor manipulated for
display, e.g., pan, tilt, zoom, etc., after initial acquisition.
Hence, this technique, while providing entertainment, is
unable to fulfill critical technical requirements of many
functional applications.
Other known techniques for capturing and storing visual
information about a large field of view (FOV) are described
in U.S. Pat. Nos. 4,125,862; 4,442,453; and 5,185,667. In
U.S. Pat. No. 4,125,862, a system is disclosed that converts
signal information from a scene into digital form, stores the
data of the digitized scene serially in two-dimensional
format, and reads out the data by repetitive scan in a
direction orthogonally related to the direction in which the
data was stored. U.S. Pat. No. 4,442,453 discloses a system
in which a landscape is photographed and stored on film.
The film is then developed, with display accomplished by
scanning with electro-optical sensors at "near real-time"
rates. These techniques, however, do not provide instant
visual image display, do not cover the field of view required
for desired applications (hemispheric or 180 degrees field-
of-view), do not generate visual image data in the format
provided by the techniques of this invention, and are also not
easily manipulated for further display, e.g., pan, tilt, etc.
`The technique disclosed in U.S. Pat. No. 5,185,667 over-
comes some of the above-identified drawbacks in that it is
`able to capture a near-hemispheric field of view, correct the
`image using high speed circuitry to form a normal image,
`and electronically manipulate and display the image at
`real-time rates.
`
`For many hemispheric visual applications, however, even
`U.S. Pat. No. 5,185,667 has limitations in obtaining suffi-
`cient information of critical and useful details. This is
particularly true when the camera is oriented with the central
`axis of the lens perpendicular to the plane bounding the
`hemisphere of acquisition (i.e. lens pointing straight up). In
`such applications, the majority of critical detail in a scene is
contained in areas of the field along the horizon and little or
`no useful details are contained in central areas of the field
`
`located closer to the axis of the lens (the horizon being
`defined as the plane parallel to the image or camera plane
`and perpendicular to the optical axis of the imaging system).
`For example, in surveillance, the imaging system is aimed
`upward and the majority of the critical detail in the scene
`includes people, buildings, trees, etc., most of which are
located within only a few degrees along the horizon (i.e., this
`is the peripheral content). Also, in this example, although the
sky makes up the larger central arc of the view, it contains
`little or no useful information requiring higher relative
`resolution.
`
`To obtain sufficient detail on the critical objects in the
`scene, the technique should differentiate between the rel-
`evant visual information along the horizon and the remain-
`ing visual information in the scene in order to provide
`greater resolution in areas of higher importance. U.S. Pat.
No. 5,185,667 does not differentiate between this relevant
visual information contained along the horizon and the
remaining visual information in this scene. Thus, it fails to
`yield a sufficient quality representation of the critical detail
`of the scene for projected applications.
Instead, the techniques described above concentrate on
`obtaining, storing, and displaying the entire visual informa-
`tion in the scene, even when portions of this information are
`not necessary or useful. To obtain the near-hemispheric
`visual information, such techniques require specific lens
`types to map image information in the field of view to an
`image plane (where either a photographic film or electronic
detector or imager is placed). Known examples, U.S. Pat.
No. 5,185,667 and U.S. Pat. No. 4,442,453, respectively use
`a fish-eye lens and a general wide-angle lens. As these lenses
`map information of a large field without differentiation
`between the central and peripheral areas, information from
`the periphery will be less fully represented in the image
`plane than from the central area of acquisition.
`In U.S. Pat. No. 4,170,400, Bach et al. describes a
`wide-angle optical system employing a fiber optic bundle
`that has differing geometric shapes at the imaging ends.
`Although this is useful in itself for collecting and reposi-
`tioning image data, bending of light is a natural character-
`istic of optical fibers and not exclusive to that patent.
`Further, U.S. Pat. No. 4,170,400 employs a portion of a
`spherical mirror to gather optical information, rendering a
`very reduced subset of the periphery in the final imaging
`result. This configuration is significantly different from the
`multi-element lens combination described in the present
`invention.
`
Imperfections in the image representation of any field
inherently result from the nature of creating an image with
any spherical glass (or plastic) medium such as a lens. The
magnitude of these imperfections increases proportionally to
the distance a point in the field is from the axis perpendicular
to the optical imaging system. As the angle between the
optical axis and a point in the field increases, aberrations of
the corresponding image increase proportional to this angle
cubed. Hence, aberrations are more highly exaggerated in
the peripheral areas with respect to more central areas of a
hemispheric image.
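As a purely illustrative aside on the cubic relationship just stated, the short Python sketch below works through it numerically; the proportionality constant and the sample angles are arbitrary assumptions, not values from the disclosure.

    import math

    # Aberration blur assumed to grow as the cube of the field angle.
    def relative_aberration(theta_deg, k=1.0):
        return k * math.radians(theta_deg) ** 3

    for theta in (10, 30, 60, 80):
        print(theta, "deg:", round(relative_aberration(theta), 4))
    # A point 60 degrees off-axis is blurred (60/30)**3 = 8 times more than a
    # point 30 degrees off-axis, which is why peripheral detail suffers most.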
Although the lens types above achieve a view of a large
field, the valuable content from the peripheral areas lacks in
potential image quality (resolution) mapping because the
imaging device and system does not differentiate between
these areas and the central areas of less valuable detail.
Often, the difference in imaging capabilities between the two
areas is compensated for by using only the central portion of
a lens to capture the scene ("stopping the lens down"). This
works in effect to reduce the image quality of both areas such
that the difference in error is a lesser percentage of the
smallest area even the central area can resolve. Simultaneously,
this compensation technique further degrades the performance
of the lens by limiting the amount of light which is allowed
to enter the lens, and thus reducing the overall intensity of
the image.

More typically, the peripheral content imaged by a conventional
lens is so degraded in comparison with the central area that
the lens allows for only a minimal area of the periphery to be
recorded by the film or electronic imager. As a result of these
"off-axis" aberrations inherent to large fields of view, the
relevant information of the horizon in the scene can be
underutilized or, worse yet, lost.
`Another limitation in U.S. Pat. No. 5,185,667 is its
`organization for recording only views already corrected for
`perspective. The nature of that methodology is that the
`specific view of interest must be selected and transformed
prior to the recording process. The result is that no additional
`selection of views can be accomplished after the storage
`process, reducing system flexibility from the user’s perspec-
`tive.
`
Hence, there is a demand in the industry for single camera
`imaging systems that efficiently capture, store, and display
`valuable visual information within a hemispheric field of
`view containing particularly peripheral content, and that
`allow electronic manipulation and selective display of the
`image post-acquisition while minimizing distortion effects.
`Such a system finds advantageous application in a tele-
`conferencing environment in accordance with the present
`invention.
`
`Limited control of video cameras is disclosed in the prior
art. U.S. Pat. No. 4,980,761, issued to Natori, Dec. 25, 1990,
`describes an image processing system that rotates a camera
`for a teleconference system. A control unit outputs a drive
`signal based on an audio signal to control the movement of
`the image until
`the image controlling unit receives an
`operational completion signal. In this case, the rotational
movement of the camera, moving the video image from one
`participant to another participant, alleviates having to view
`the camera movement. Once the camera stops skewing, the
`picture will then provide the proper aspect ratio. A plurality
`of microphones are provided to each attendant. A sound
`control unit then determines with a speaker detection unit
`which participant is speaking. U.S. Pat. No. 4,965,819
shows a video conferencing system for courtroom and other
`applications in which case each system includes a local
`module that includes a loud speaker, a video camera, a video
`monitoring unit and a microphone for each local conferee.
`U.S. Pat. No. 5,206,721 issued to Ashida, Apr. 27, 1993,
shows a television conference system that allows for auto-
`matically mechanically moving and directing a camera
`towards a speaking participant. In this system a microphone
`is provided for each participant and is recognized by the
`control system. Image slew is corrected to avoid camera
image motion. A review of these systems thus shows that the
`automation provided is very expensive and in every case
`requires individualized equipment for each participant.
Limited audio direction finding for multiple microphone
arrays is known in the prior art. For example, a self steering
digital microphone array defined by W. Kellerman of Bell
Labs at ICASSP in 1991 created a teleconference in which
a unique steering algorithm was used to determine direction
of sound taking into account the acoustical environment in
which the system was located. Also a two stage algorithm
for determining talker location from linear microphone array
data was developed by H. Silverman and S. Kirkman at
Brown University and disclosed in April, 1992. The filtered
cross correlation of the system is introduced as the locating
algorithm.
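The cited array techniques locate a talker by correlating the signals received at different microphones. A minimal, hypothetical Python sketch of that idea follows; it uses a plain (unfiltered) cross-correlation of only two channels, and the sample rate, microphone spacing, and speed of sound are assumed example values rather than parameters of the cited systems or of the present invention.

    import numpy as np

    def estimate_bearing(sig_a, sig_b, fs, spacing, c=343.0):
        # Cross-correlate the two channels; the lag of the peak is the time
        # difference of arrival (TDOA) of the sound at mic B relative to mic A.
        corr = np.correlate(sig_b, sig_a, mode="full")
        lag = int(np.argmax(corr)) - (len(sig_a) - 1)   # in samples
        tdoa = lag / fs                                  # in seconds
        # Far-field geometry: tdoa = spacing * sin(bearing) / c
        return np.degrees(np.arcsin(np.clip(tdoa * c / spacing, -1.0, 1.0)))

    # Self-test with a synthetic 1 kHz tone arriving 30 degrees off broadside.
    fs, spacing = 48000.0, 0.20
    t = np.arange(0, 0.05, 1.0 / fs)
    delay = spacing * np.sin(np.radians(30.0)) / 343.0
    sig_a = np.sin(2 * np.pi * 1000.0 * t)
    sig_b = np.sin(2 * np.pi * 1000.0 * (t - delay))
    print(round(estimate_bearing(sig_a, sig_b, fs, spacing), 1))   # approx. 30.0

A practical audio direction processor would filter the channels before correlating them and would combine many microphone pairs, as the cited work does.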
A "telepresence" concept from BellCorp briefly described
in IEEE Network Magazine in March, 1992 suggests a
spherical camera for use in the teleconference system.
However, the entire image is sent in composite form for the
remote users to select from at the other end. The present
invention is quite different and includes automated pointing
and control, including incorporation in one automated system
of both audio beam steering and selectable subviews of a
much larger panoramic scene.
`
SUMMARY OF THE INVENTION
`
`The present invention comprises a video conferencing,
`voice-directional video imaging system for automatic elec-
`tronic video image manipulation of a selected, directional
`signal of a hemispheric conference scene transmitted to a
`remote conference site. The system employs three separate
subsystems for voice-directed, electronic image manipula-
`tion suitable for automated teleconferencing imaging in a
`desirable video aspect ratio.
The audio beam, voice pickup and directing subsystem
includes a plurality of microphones strategically positioned
near a predetermined central location, such as on a conference
table. The microphone array is arranged to receive and
transmit the voices of participants, while simultaneously
determining the direction of a participant speaking relative
to the second subsystem, which is a hemispheric imaging
system used with a video camera. The third subsystem is a
personal computer or controller circuits in conjunction with
the hemispheric imaging system which ultimately provides
automatic image selection of the participant speaking that is
ultimately transmitted as a video signal to the remote video
display at the remote teleconference location.
The hemispheric electronic image manipulator subsystem
includes a video camera having a capture lens in accordance
with the invention that allows for useful electronic manipulation
of a segmented portion of a hemispheric scene. In a
conference table setting, as viewed from the center of the
conference table, participants are arranged around the table
in the lower segment of the hemisphere, with the plane of the
table top forming the base of the hemisphere. The electronic
image is warped to provide a desired subview in proper
aspect ratio in the audio selected direction.
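For illustration only, the following Python sketch shows one way an aspect-ratio-correct subview could be cut out of a circular panoramic frame around an audio-selected azimuth. It assumes a simple equidistant fisheye mapping with the lens axis vertical (horizon at the rim of the image circle, zenith at its centre) and nearest-neighbour sampling; the function and parameter names are hypothetical, and the actual lens mapping and warping hardware of the invention are not reproduced here.

    import numpy as np

    def dewarp_sector(src, azimuth_deg, h_fov_deg=60.0, v_fov_deg=45.0,
                      out_w=320, out_h=240):
        # Cut an upright, aspect-ratio-correct subview out of a circular
        # panoramic (hemispheric) frame, centred on the selected azimuth.
        cy, cx = (src.shape[0] - 1) / 2.0, (src.shape[1] - 1) / 2.0
        radius = min(cx, cy)
        out = np.zeros((out_h, out_w) + src.shape[2:], dtype=src.dtype)
        for j in range(out_h):
            elev = (out_h - 1 - j) / (out_h - 1) * v_fov_deg   # 0 = horizon
            r = radius * (1.0 - elev / 90.0)                   # equidistant model
            for i in range(out_w):
                az = np.radians(azimuth_deg +
                                (i - out_w / 2.0) / out_w * h_fov_deg)
                y = int(round(cy + r * np.sin(az)))
                x = int(round(cx + r * np.cos(az)))
                if 0 <= y < src.shape[0] and 0 <= x < src.shape[1]:
                    out[j, i] = src[y, x]
        return out

    # Usage: view = dewarp_sector(panoramic_frame, azimuth_deg=speaker_bearing)

In the disclosed system, comparable per-pixel address generation and resampling is carried out by dedicated warping circuits rather than by an unoptimized software loop such as this one.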
`The present invention provides a new and useful voice-
`directional visual
`imaging system that emphasizes the
`peripheral content of a hemispheric field of view using a
`single video camera. The invention allows user-selected
`portions of a hemispheric scene to be electronically
`manipulated, transmitted, and displayed remotely from the
`video camera in real-time and in a cost-effective manner.
`
The visual imaging system of the present invention
involves a video camera having a lens with enhanced peripheral
content imaging capabilities. The lens provides an
enhanced view of the valuable information in the scene's
periphery by imaging the field of view to the image plane
such that the ratio of the size of the smallest detail contained
within the periphery of the scene to the size of the smallest
resolving pixel of an image device is increased. For this to
be accomplished, the peripheral content must map to a larger
percentage of a given image detector area and,
simultaneously, the mapped image of the central area of the
scene must be minimized by the lens so that it does not
interfere with the peripheral content now covering a wider
`annulus in the image plane. Information in the image plane
`is then detected by the video camera. The detected infor-
`mation of the entire hemispheric scene is then stored as a
`single image in memory using traditional methods.
When a portion of the scene is to be displayed, the image
information relating to the relevant portion of the scene is
instantaneously retrieved from memory. A transform processor
subsystem electronically manipulates the scene for
display as a perspective-correct image on a display device,
such as a teleconference display screen or monitor, as if the
particular portion of the scene had been viewed directly with
the video camera pointed in that direction. The transform
processor subsystem compensates for the distortion or difference
in magnification between the central and peripheral
areas of the scene caused by the lens by applying appropriate
correction criteria to bring the selected portion of the scene
into standard viewing format. The transform processor subsystem
can also more fully compensate for any aberrations
of the enhanced peripheral image because of the image's
improved resolution as it covers a larger portion of the image
device (increased number of pixels used to detect and
measure the smallest detail in the periphery image). More
pixels equates to more measurement data, hence more
accurate data collection.
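The correction applied by such a transform processor ultimately comes down to computing, for every output pixel, a fractional source address and blending the neighbouring source pixels. A hypothetical Python sketch of that inner multiply/accumulate step (bilinear interpolation) is shown below; the function name and the tiny test image are illustrative only.

    import numpy as np

    def bilinear_sample(img, x, y):
        # Multiply/accumulate the four neighbouring pixels with bilinear
        # interpolation coefficients to sample a non-integer source address.
        x0, y0 = int(np.floor(x)), int(np.floor(y))
        x1, y1 = min(x0 + 1, img.shape[1] - 1), min(y0 + 1, img.shape[0] - 1)
        fx, fy = x - x0, y - y0
        # Four interpolation coefficients (they always sum to 1.0).
        c00, c10 = (1 - fx) * (1 - fy), fx * (1 - fy)
        c01, c11 = (1 - fx) * fy, fx * fy
        return (c00 * img[y0, x0] + c10 * img[y0, x1] +
                c01 * img[y1, x0] + c11 * img[y1, x1])

    # Example: sample a tiny ramp image between pixel centres.
    img = np.array([[0.0, 1.0], [2.0, 3.0]])
    print(bilinear_sample(img, 0.25, 0.5))   # 1.25

The interpolation-coefficient and multiply/accumulate blocks labelled in the drawing sheets appear to correspond to this kind of weighting step, performed in hardware.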
`
`The stored image can also be manipulated by the trans-
`form processor subsystem to display an operator-selected
`portion of the image through particular movements, such as
pan, zoom, up/down, tilt, rotation, etc.
`By emphasizing the peripheral content of a scene, the
`visual imaging system can use a single camera to capture the
relevant visual information within a panoramic field of view
`existing along the horizon, while being able to convention-
`ally store and easily display the scene, or portions thereof, in
`real-time. Using a single optical system and camera is not
`only cost-effective, but keeps all hemispheric visual data
`automatically time-synchronized.
`In the present invention, at a conference table view point,
`with participants seated around a conference table, hemi-
`spheric scene content is ideally suited for segmented sub-
`views of participants, especially when directionally elec-
`tronically manipulated by voice actuation. The video image
`should be of the current speaker.
One advantage of the present invention is that the unique
`visual imaging system lens can capture information from a
`hemispheric scene by emphasizing the peripheral portion of
`the hemispheric field of view and thus provide greater
`resolution with existing imaging devices for the relevant
`visual
`information in the scene. As an example, if an
`ordinary fisheye lens focuses the lowest 15 degrees up from
`the horizon on ten percent of the imager at the imaging plane
`and the peripheral-enhancing lens focuses that same 15
`degrees on fifty percent of the imager, there is a five-fold
`increase in resolution using the same imaging device.
`Depending on the application and exact formulation of the
`lens equations, there will be at least a five times increase in
`resolving power by this lens/imager combination.
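Worked through numerically (the 480-pixel image-circle radius below is an assumed example figure, not taken from the disclosure):

    # Ordinary fisheye: lowest 15 degrees above the horizon -> 10% of the imager.
    # Peripheral-enhancing lens: the same 15 degrees -> 50% of the imager.
    imager_radius_px = 480
    fisheye_px  = 0.10 * imager_radius_px    # 48 pixels for the band
    enhanced_px = 0.50 * imager_radius_px    # 240 pixels for the same band
    print(fisheye_px, enhanced_px, enhanced_px / fisheye_px)   # 48.0 240.0 5.0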
The third subsystem of the present invention comprises a
control apparatus, such as a personal computer or other
collection of electronic circuits, connected to the imaging
system to allow flexible operation delivering options and
defaults, including an override of the automated video image
manipulation. A minimal control program in the software of
the host controller provides the options that may be
necessary for individual teleconferences. An example would
be to delay switching time segments between speakers, or
perhaps the use of alternate cameras that may include a dual
display.
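By way of a purely hypothetical sketch of the kind of options such a control program might expose (the class name, method names, and default values below are assumptions for illustration, not taken from the disclosure):

    import time

    class ConferenceViewController:
        # A minimum dwell time before the view follows a new speaker, plus a
        # manual override of the automatic steering.
        def __init__(self, switch_delay_s=2.0):
            self.switch_delay_s = switch_delay_s
            self.manual_override = False
            self.current_azimuth = 0.0
            self._last_switch = time.monotonic()

        def on_speaker_direction(self, azimuth_deg):
            # Ignore automatic steering while an operator override is active,
            # and suppress rapid back-and-forth cuts between speakers.
            now = time.monotonic()
            if (self.manual_override or
                    now - self._last_switch < self.switch_delay_s):
                return self.current_azimuth
            self.current_azimuth = azimuth_deg
            self._last_switch = now
            return self.current_azimuth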
`
In operation, at a particular teleconferencing site, participants
will be arranged at a conference table or in a conference
room with an array of microphones, each of which will
pick up the normal speaking voice of each participant. The
array of microphones is directly connected to an audio
direction processor. The hemispheric lens system in conjunction
with the video camera is attached to view warping
logic as explained above and to the controller, or a personal
computer. The video and audio signals are then transmitted
through a transmission medium in an NTSC or other format
to the remote teleconferencing site for remote display.

Sound from the participant speaking that is processed in
the audio direction processor determines the direction of the
participant speaking relative to the panoramic video camera
and lens. Once the particular speaker direction is
determined, the panoramic image of a specific hemispherical
region of interest, such as the participant's face, is processed
to provide a normal video aspect ratio view for the remote
participants using the system.
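Tying the pieces together, one pass of the operation just described might look like the following hypothetical sketch, which reuses the illustrative helpers sketched earlier (estimate_bearing, dewarp_sector, ConferenceViewController); capture, encoding, and transmission are left as stubs, and in the disclosed system the warping itself is performed by dedicated circuits rather than in software.

    def teleconference_step(mic_a_samples, mic_b_samples, pano_frame,
                            controller, fs, spacing):
        # 1. Audio direction processor: where is the current speaker?
        bearing = estimate_bearing(mic_a_samples, mic_b_samples, fs, spacing)
        # 2. Host controller: apply switching-delay / override policy.
        azimuth = controller.on_speaker_direction(bearing)
        # 3. Warping logic: cut the aspect-ratio-correct subview for that azimuth.
        view = dewarp_sector(pano_frame, azimuth_deg=azimuth)
        # 4. Hand the subview to an NTSC (or other format) encoder for
        #    transmission to the remote site.
        return view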
`
It is a principal object and advantage of this invention to
provide a video conferencing system that automatically
directs a video image to the participant that is speaking while
providing a hemispherical video imaging subview that can
be electronically manipulated in the direction of the speaker
selected from a panoramic scene.

It is another principal advantage of this invention to
provide an automatic teleconferencing system that saves
transmission time, reduces coincident cost by eliminating or
reducing manual operation of a video camera, and does not
detract from the concentration of the subject during the
conference.

And yet another advantage of this invention is to provide
an automatic video camera with electronic image manipulation
for video conferencing equipment that has no moving
mechanical parts or physical mechanisms, which improves
the reliability of the system and reduces maintenance costs
and service costs.

In accordance with these and other objects which will
become apparent hereinafter, the instant invention will now
be described with particular reference to the accompanying
drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic illustration of the visual imaging
system organization and components of the parent application.

FIG. 1A is a schematic illustration of the automated video
conferencing system organization and components.

FIGS. 2A, 2B, and 2C show a cross sectional diagram
indicating the field input and output rays and the resulting
relative field coverage a lens typically provides in the image
plane for detection by an imager device.

FIGS. 3AA, 3AB, and 3AC show a cross sectional
diagram indicating the field input and output rays and the
resulting field coverage that optical system Example I,
constructed according to the principles of the present
invention, provides in the image plane for detection by an
imaging device or substrate.

FIGS. 3BA and 3BB show a cross sectional diagram indicating
the field input and output rays and the resulting field
coverage that optical system Example II of this present
invention provides in the image plane for detection by an
imaging device or substrate.

FIG. 4 is a schematic representation of the mapping
locations on the imaging device.

FIG. 5 is a schematic block diagram of the panoramic
transform processor subsystem for use with the teleconferencing
system of the present invention.

FIGS. 6A and 6B are a schematic diagram showing how
multiple transform processor subsystems can be tied into the
same distorted image to provide multiple different view
perspectives to different users from the same source image
as described in the parent application.
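As a hypothetical software analogue of the multiple-viewer arrangement of FIGS. 6A and 6B, several independent view requests can be served from the same stored, distorted frame. The sketch below reuses the illustrative dewarp_sector function from above; the azimuth and field-of-view values are arbitrary examples.

    def render_views(pano_frame, view_requests):
        # view_requests: one (azimuth, horizontal field of view) pair per viewer.
        return [dewarp_sector(pano_frame, azimuth_deg=az, h_fov_deg=fov)
                for az, fov in view_requests]

    # e.g. three remote conferees each watching a different participant:
    # views = render_views(frame, [(30.0, 60.0), (150.0, 60.0), (270.0, 60.0)])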
`
DESCRIPTION OF THE PREFERRED EMBODIMENTS

The invention will be defined initially with a brief
description of the principles thereof.

Principles of the Present Invention

As described in the parent U.S. patent application, the
imaging invention stems from the realization by the inventors
that in many of the technical hemispheric field
applications, where the image detector is parallel to the
plane of the horizon, much of the relevant visual information
in the scene (e.g., trees, mountains, people, etc.) is found
only in a small angle with respect to the horizon. Although
the length of the arc from the horizon containing the relevant
information varies depending upon the particular
application, the inventors have determined that in many
situations, almost all the relevant visual information is
contained within about 10 to 45 degrees with respect to the
horizon. This determination is especially true with respect to
the teleconference environment, which is normally centered
around a conference table or conference room.

To maximize data collection and resolution for analysis
and/or display of the relevant visual information located in
this portion of the hemispheric scene, it is desirable to
maximize the dedication of the available image detection
area to this peripheral field portion. To accommodate this, it
is necessary that the "central" portion of the scene (from 45
to 90 degrees with respect to the horizon) cover only the
remaining areas of the imager plane so as not to interfere
with light from the periphery.

In many cases, since the "central" area contains less
detailed information, such as a solid white ceiling or a clear
or lightly clouded sky, it is allowable to maximize completely
the dedication of the available image detection area
to the peripheral field portion by reducing the portion of the
imager device representing the "central" area to near zero.
Of course, in certain instances, it is desirable to analyze this
less detailed information, but this portion of the scene can be
minimized to some extent without significant degradation of
such visual information. As will be described herein in more
detail, the present invention provides two manners (Example
I and Example II) for capturing, storing, and selectively
displaying the critical visual information in a scene for many
important applications.
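For a rough sense of what that dedication of imager area means, the following hypothetical comparison counts the radial pixels available to the band from the horizon up to 45 degrees under an ordinary equidistant fisheye mapping and under a mapping that gives 90 percent of the image radius to that band; the radius and the 90 percent figure are assumed example values, not parameters of the disclosed lenses.

    radius_px = 480                       # assumed image-circle radius
    band_deg, hemi_deg = 45.0, 90.0       # "useful" band vs. full hemisphere

    equidistant_px = radius_px * band_deg / hemi_deg   # r grows linearly with angle
    enhanced_px    = radius_px * 0.90                  # most of the radius for the band
    print(equidistant_px, enhanced_px, enhanced_px / equidistant_px)   # 240.0 432.0 1.8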
`
System Organization and Components

Referring now to the drawings, and initially to FIG. 1, the
visual imaging system of the parent invention includes a still
image or moving picture camera 10, having a lens, indicated
generally at 14, designed to capture and enhance the peripheral
content of a hemispheric scene. The captured scene can
be stored onto an assortment of media, e.g., photographic
film 16, electronic storage 18, or other conventional storage
means. Electronic storage 18 is preferred because of the ease
of electronic manipulation thereof. Additionally, photographic
film 16 requires an image scanner 20 or other
`capture-and-conversion method to change the image into
`electronic format before electronic manipulation can be
`performed. A video camera 11 in FIG. 1A is used for the
`teleconferencing system to capture the image.
`The stored electronic image data is then selectively
`accessed by a transfor