(12) United States Patent
     Knappe

(10) Patent No.: US 7,180,997 B2
(45) Date of Patent: Feb. 20, 2007

(54) METHOD AND SYSTEM FOR IMPROVING
     THE INTELLIGIBILITY OF A MODERATOR
     DURING A MULTIPARTY
     COMMUNICATION SESSION

(75) Inventor: Michael E. Knappe, Sunnyvale, CA (US)

(73) Assignee: Cisco Technology, Inc., San Jose, CA (US)

(*) Notice: Subject to any disclaimer, the term of this
    patent is extended or adjusted under 35
    U.S.C. 154(b) by 241 days.

(21) Appl. No.: 10/236,484

(22) Filed: Sep. 6, 2002

(65) Prior Publication Data
     US 2004/0052218 A1    Mar. 18, 2004

(51) Int. Cl.
     H04M 3/56 (2006.01)

(52) U.S. Cl. ............ 379/387.01; 379/202.01

(58) Field of Classification Search ............ 379/202,
     379/202.01, 203.01, 204.01, 205.01, 206.01
     See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

4,499,578 A          2/1985   Marouf et al. ........ 370/62
6,011,851 A          1/2000   Connor et al. ........ 381/17
6,178,237 B1         1/2001   Horn ................. 379/202
6,272,214 B1         8/2001   Jonsson .............. 379/202.01
2002/0181686 A1*    12/2002   Howard et al. ........ 379/202.01

FOREIGN PATENT DOCUMENTS

EP   0 730 365 A2      9/1996
WO   WO 00/72560 A1   11/2000

OTHER PUBLICATIONS

PCT Notification of International Search Report, Application No.
PCT/US03/25580 filed Aug. 15, 2003, Authorized by Carole Emery
and mailed Jan. 1, 2004.
Kok Soon Phua et al., "Spatial Speech Coding for Multi-Teleconferencing,"
TENCON 99, Proceedings of the IEEE Region 10
Conference, Cheju Island, South Korea, Sep. 15-17, 1999, pp.
313-316.

* cited by examiner

Primary Examiner: Sinh Tran
Assistant Examiner: Walter F Briney, III
(74) Attorney, Agent, or Firm: Baker Botts L.L.P.

(57) ABSTRACT

A system and method for improving the intelligibility of a
moderator during a multiparty communication session includes
receiving a plurality of participant voice streams
from a plurality of respective conference participants. An
incoming moderator voice stream may be received from a
moderator. The plurality of participant voice streams and the
moderator voice stream are transmitted such that the
intelligibility of the moderator voice stream is improved relative
to at least one of the participant voice streams.

40 Claims, 4 Drawing Sheets
[Representative drawing: flow diagram of FIG. 8 (steps 200 through 212), reproduced on Sheet 4 of 4.]

`CSCO-1023
`CISCO SYSTEMS, INC. / Page 1 of 12
`
`
`
U.S. Patent    Feb. 20, 2007    Sheet 1 of 4    US 7,180,997 B2

[FIG. 1: block diagram of the communications system, showing communications devices connected by links 28 to a network that includes a call manager and a conference bridge.]

[FIG. 2: block diagram of the conference bridge 32, showing controller 50, buffers 52, converters 54, normalizer 56, mixer 58 and database 60. For each conference, the database holds conference parameters 62 comprising conference participants 64 and participant priorities 66.]
`
`
`
`
U.S. Patent    Feb. 20, 2007    Sheet 2 of 4    US 7,180,997 B2

[FIG. 3: monaural mixer, with participant inputs feeding summers that produce the conference outputs.]

[FIG. 4: stereo mixer 100, with participant inputs feeding directional processors (DP) 106 and summers 108 that produce the conference outputs.]

[FIG. 5: spatial placement of participants around listener 122. Axes: lateral directivity, and depth perception from front to back.]

`
`
`
U.S. Patent    Feb. 20, 2007    Sheet 3 of 4    US 7,180,997 B2

[FIG. 6: spatial movement of the conference moderator to a position of higher prominence. Axes: lateral directivity and depth.]

[FIG. 7: directional processor 106, containing spatial processors, and summer 108.]

`
`
`
U.S. Patent    Feb. 20, 2007    Sheet 4 of 4    US 7,180,997 B2

[FIG. 8: flow diagram]
200  Establish multi-party communication session
202  Identify priority of participants
204  Receive participant voice streams
206  Moderator speaking?
       NO:  208  Transmit participant voice streams
       YES: 210  Improve intelligibility of moderator voice stream
            212  Transmit participant and moderator voice streams
     Conference terminated?

`
`
`
METHOD AND SYSTEM FOR IMPROVING
THE INTELLIGIBILITY OF A MODERATOR
DURING A MULTIPARTY
COMMUNICATION SESSION
`
`TECHNICAL FIELD OF THE INVENTION
`
`The present invention relates generally to the field of
`multiparty communications, and more particularly to a
`method and system for improving the intelligibility of a
`moderator during a multiparty communication session.
`
`
`BACKGROUND OF THE INVENTION
`
Communication networks, such as the Public Switched
Telephone Network (PSTN), for transporting electrical
representations of audible sounds from one location to another,
are well known. Additionally, packet switched networks,
such as the Internet, are able to perform a similar function
by transporting packets containing data that represents
audible sounds from one location to another. The audible
sounds are encoded into digital data and placed into packets
at the origination point, and extracted from the packets and
decoded into audible sounds at the destination point.

Such communication networks also allow multiple people
to participate in a single call, typically known as a
"conference call." In a conference call, the audible sounds at each
device, usually telephones, are distributed to all of the other
devices participating in the conference call. Thus, each
participant in the conference call may share information
with all of the other participants.

Modern business practices often require that several
persons meet on the telephone to engage in a conference call.
The conference call has introduced certain applications and
techniques that are superior to those found in a meeting with
persons physically present in the same location. For
example, a conference call attendee who is not participating
at the moment may wish to mute their audio output and
simply listen to the other conferencees. This allows the
particular conferencee to work on another project while still
participating in the conference.

While the conference call has been substantially helpful in
minimizing travel expenses and other costs associated with
business over long distances, significant obstacles still
remain in accomplishing many tasks with the same
efficiency as one would in having a meeting with all persons in
the same physical location. For example, the inability of
conferencees to use or see visual aids or commands
complicates the control and organization of the conference. This
often results in multiple speakers "stepping on" each other's
speech such that the resultant audio is incomprehensible.
Furthermore, it is difficult to determine which participant(s)
is speaking at any given time. Accordingly, it is difficult for
a designated moderator(s) to control the flow and/or
organization of the conference.
`
`
`SUMMARY OF THE INVENTION
`
The present invention provides an improved method and
system for enhancing the intelligibility of a moderator
during a multiparty communication session that substantially
eliminate or reduce the disadvantages and problems
associated with previous systems and methods. In particular,
the intelligibility of audio generated by a conference
moderator is improved with respect to other conference
participants. This enhances the ability of the moderator to control
the organization, flow and/or control of the conference.

In accordance with a particular embodiment of the present
invention, a system and method for improving the
intelligibility of a moderator during a multi-party communications
session includes receiving a plurality of participant voice
streams from a plurality of respective conference
participants. An incoming moderator voice stream is also received
from a moderator. The method includes processing the
plurality of participant voice streams and the moderator
voice stream such that the intelligibility of the moderator
voice stream is enhanced relative to at least one of the
participant voice streams.

In accordance with another embodiment of the present
invention, the method includes storing a priority associated
with each of the plurality of participant voice streams, such
that at least one lowest priority voice stream may be
identified. In this embodiment, the incoming moderator voice
stream is detected and transmitted to the conference
participants. The at least one lowest priority voice stream may be
blocked from transmission while the moderator voice
stream is being transmitted.

In accordance with yet another embodiment, an increase
in signal strength of the incoming moderator voice stream is
detected. The participant voice streams are transmitted with
a diminished signal strength approximately proportional to
the increase in the signal strength of the incoming moderator
voice stream.
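
As an illustration of the proportional attenuation described above, the following is a minimal sketch only, not the disclosed implementation. The function name `duck_gain`, the `sensitivity` and `floor` parameters, and the use of a relative level increase are all assumptions introduced for this example.

```python
def duck_gain(baseline_level: float, current_level: float,
              sensitivity: float = 1.0, floor: float = 0.2) -> float:
    """Return a gain in [floor, 1.0] to apply to participant streams.

    Hypothetical helper: the attenuation grows in proportion to the
    relative increase of the moderator's level over a baseline.
    """
    if baseline_level <= 0.0:
        return 1.0  # no usable baseline, leave participants untouched
    # Relative increase of the moderator level (never negative).
    increase = max(0.0, (current_level - baseline_level) / baseline_level)
    # Attenuate proportionally, but never below the assumed floor.
    return max(floor, 1.0 - sensitivity * increase)


def apply_gain(samples: list, gain: float) -> list:
    """Scale one block of participant samples by the computed gain."""
    return [gain * s for s in samples]
```

With these assumed defaults, a moderator level that rises 50% above its baseline halves the participant streams, and larger increases are limited by the floor so that participants are never fully silenced.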

Technical advantages of particular embodiments of the
present invention include an improved method and system
for improving the intelligibility of a moderator during a
multiparty communication session. The present invention
allows the moderator's voice stream to be enhanced with
respect to other conference participants. Accordingly,
conference participants can distinguish the moderator from
other conference participants.

Another technical advantage of particular embodiments
of the present invention includes a method for improving the
intelligibility of a moderator, in which the lowest priority
voice stream may be blocked while the moderator's voice
stream is being transmitted. This allows the moderator to
speak over one participant, while allowing the rest of the
participants to continue speaking intelligibly, while the
moderator is exercising control over the telephone
conference.

Other technical advantages of the present invention will
be readily apparent to one skilled in the art from the
following figures, description and claims. Moreover, while
specific advantages have been enumerated above, various
embodiments may include all, some, or none of the
enumerated advantages.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
For a more complete understanding of the present
invention and its advantages, reference is now made to the
following description taken in conjunction with the
accompanying drawings, wherein like numerals represent like
parts, in which:
FIG. 1 is a block diagram illustrating a communications
system, in accordance with one embodiment of the present
invention;
FIG. 2 is a block diagram illustrating details of the
conference bridge of FIG. 1, in accordance with one
embodiment of the present invention;
FIG. 3 is a block diagram illustrating a monaural mixer
for the conference bridge of FIG. 2, in accordance with one
embodiment of the present invention;
`
FIG. 4 is a block diagram illustrating a stereo mixer for
the conference bridge of FIG. 2, in accordance with one
embodiment of the present invention;
FIG. 5 is a block diagram illustrating spatial placements
of participants in a stereo conference stream generated by
the stereo mixer of FIG. 4, in accordance with one
embodiment of the present invention;
FIG. 6 is a block diagram illustrating spatial movement of
a conference moderator to a position of higher prominence
in a stereo conference stream, in accordance with one
embodiment of the present invention;
FIG. 7 is a block diagram illustrating the directional
processors and summers of the stereo mixer of FIG. 4, in
accordance with one embodiment of the present invention;
and
FIG. 8 is a flow diagram illustrating a method for
improving the intelligibility of a moderator during a conference
call, in accordance with one embodiment of the present
invention.
`
DETAILED DESCRIPTION OF THE
INVENTION

FIG. 1 illustrates a communication system 12 in
accordance with one embodiment of the present invention. In this
embodiment, the communication system 12 is a distributed
system transmitting audio, video, voice, data and other
suitable types of real-time and/or non real-time traffic
between source and destination endpoints. Communication
system 12 may be used to conduct multiple party telephone
conference communication sessions. In accordance with a
particular embodiment of the present invention, various
components of communication system 12 may be configured
to automatically improve the intelligibility of a moderator
during a multi-party communication session. The
disclosed embodiments allow the moderator to exercise
control and influence over the telephone conference, without
completely silencing all other participants. The various
methods and systems by which this is accomplished are
described and illustrated herein.

Referring to FIG. 1, communication system 12 includes a
network 14 connecting a plurality of communication devices
16 to each other and to standard analog telephones 18
through a gateway 20 and the public switched telephone
network (PSTN) 22. The communication devices 16,
standard analog telephones 18 and gateway 20 are connected to
the network 14 and/or PSTN 22 through twisted pair, cable,
fiber optic, radio frequency, infrared, microwave and/or any
other suitable type of wireline or wireless links 28.

In accordance with a particular embodiment, network 14
is the Internet, a wide area network (WAN), a local area
network (LAN) or other suitable packet-switched network.
In the Internet embodiment, the network 14 transmits
information in Internet Protocol (IP) packets. Telephony voice
information is transmitted in the Voice over IP (VoIP)
format. Real-time IP packets such as VoIP packets are
encapsulated in real-time transport protocol (RTP) packets
for transmission over the network 14. It will be understood
that the network 14 may comprise any other suitable types
of elements and links and that traffic may be otherwise
suitably transmitted using other protocols and formats.

The communication devices 16 comprise IP or other
digital telephones, personal and other suitable computers or
computing devices, personal digital assistants (PDAs), cell
or other mobile telephones or handsets, or any other device or
set of devices such as the telephone 18 and gateway 20
combination capable of communicating real-time audio,
video and/or other information over the network 14. The
communication devices 16 also communicate control
information with the network 14 to control call setup, teardown
and processing as well as call services.

For voice calls, the communication devices 16 comprise
real-time applications that play traffic as it is received or
substantially as it is received and to which packet delivery
cannot be interrupted without severely degrading
performance. A codec (coder/decoder) converts audio, video or
other suitable signals generated by users from analog signals
into digital form. The digitally encoded data is encapsulated
into IP or other suitable packets for transmission over the
network 14. IP packets received from the network 14 are
converted back into analog signals and played to the user. It
will be understood that the communication devices 16 may
otherwise suitably encode and decode signals transmitted
over or received from the network 14.

The gateway 20 provides conversion between analog
and/or digital formats. The standard analog telephones 18
communicate standard telephony signals through PSTN 22
to the gateway 20. At the gateway 20, the signals are
converted to IP packets in the VoIP format. Similarly, VoIP
packets received from the network 14 are converted into
standard telephony signals for delivery to the destination
telephone 18 through PSTN 22. The gateway 20 also
translates between the network call control system and the
Signaling System 7 (SS7) protocol and/or other signaling
protocols used in PSTN 22.

In one embodiment, the network 14 includes a call
manager 30 and a conference bridge 32. The call manager 30
and the conference bridge 32 may be located in a central
facility or have their functionality distributed across and/or
at the periphery of the network 14. The call manager 30 and
the conference bridge 32 are connected to the network 14 by
any suitable type of wireline or wireless link. In another
embodiment, the network 14 may be operated without the
call manager 30, in which case the communication devices
16 may communicate control information directly with each
other or with other suitable network elements. In this
embodiment, services are provided by the communication
devices 16 and/or other suitable network elements.

The call manager 30 manages calls in the network 14. A
call is any communication session between two or more
parties. The parties may be persons and/or equipment such
as computers. The sessions may include real-time
connections, connections having real-time characteristics, non
real-time connections and/or a combination of connection types.

The call manager 30 is responsive to service requests
from the communication devices 16, including the standard
telephones 18 through the gateway 20. For example, the call
manager 30 may provide voicemail, bridging, multicasting,
call hold, conference call and other multiparty
communications and/or other suitable services for the communication
devices 16. The call manager 30 provides services by
performing the services, controlling performance of the
services, delegating performance of the services and/or by
otherwise initiating the services.

The conference bridge 32 provides conference call and
other suitable audio, video, and/or real-time multiparty
communication sessions between communication devices
16. A multiparty communication session includes three or
more parties exchanging audio and/or other suitable
information. In particular, the conference bridge 32 receives
media from participating devices 16 and, using suitable
signal processing techniques, mixes the media to produce
conference signals. During normal operation, each device 16
receives a conference signal that includes contributions from
all other participating devices. As used herein, the term "each"
means every one of at least a subset of the identified items.

As described in more detail below, the conference bridge
32 improves the intelligibility of a moderator during
multiparty communication sessions. In particular, the conference
bridge 32 provides a system and method for enhancing the
voice stream of the conference moderator with respect to
voice streams of other participants.

In operation, a call initiation request is first sent to the call
manager 30 when a call is placed over the network 14. The
call initiation request may be generated by a communication
device 16 and/or the gateway 20 for telephones 18. Once the
call manager 30 receives the call initiation request, the call
manager 30 sends a signal to the initiating communication
device 16 and/or gateway 20 for telephones 18 offering to
call the destination device. If the destination device can
accept the call, the destination device replies to the call
manager 30 that it will accept the call. Upon receiving this
acceptance, the call manager 30 transmits a signal to the
destination device causing it to ring. When the call is
answered, the call manager 30 instructs the called device and
the originating device to begin media streaming to each
other. If the originating device is a PSTN telephone 18, the
media streaming occurs between the gateway 20 and the
destination device. The gateway 20 then transmits the media
to the telephone 18.

For conference calls, the call manager 30 identifies
participants based on the called number or other suitable
criteria. The call manager 30 controls the conference bridge
32 to set up, process and tear down conference calls and
other multiparty communication sessions. During the
multiparty communications sessions, participants are connected
and stream media through the conference bridge 32. The
media is cross connected and mixed to produce conference
output streams for each participant. The conference output
stream for a participant includes the media of all other
participants, a subset of other participants or other suitable
mix dictated by the type of multiparty session and/or the
participant.

FIG. 2 illustrates details of the conference bridge 32 in
accordance with a particular embodiment of the present
invention. In this embodiment, the conference bridge 32
provides real-time multiparty audio connections between
three or more participants. It will be understood that the
conference bridge 32 may support other types of suitable
multiparty communications sessions including real-time
audio streams and/or video streams, without departing from
the scope of the present invention.

Referring to FIG. 2, conference bridge 32 includes a
controller 50, buffers 52, converters 54, normalizer 56,
mixer 58 and database 60. The controller 50, buffers 52,
converters 54, normalizer 56, mixer (e.g., adaptive
summers) 58 and database 60 of the conference bridge as well
as other suitable components of the communications system
12 may comprise logic encoded in media. Logic comprises
functional instructions for carrying out programmed tasks.
The media comprises computer disks or other suitable
computer-readable media, application specific integrated
circuits (ASICs), field programmable gate arrays (FPGAs) or
other suitable specific or general purpose processors,
transmission media or other suitable media in which logic may be
encoded and utilized.

The controller 50 directs the other components of the
conference bridge 32 and communicates with the call
manager 30 to set up, process and tear down conference calls.
The controller 50 also receives information regarding the
priority of each participant either directly from the
communication devices 16 or through the call manager 30. Such
information is stored in the database 60.

The buffers 52 include input and output buffers. The input
buffers receive and buffer packets of input audio streams
from participants for processing by the conference bridge
32. The output buffers receive and buffer conference output
streams generated by the conference bridge 32 for
transmission to participants. In a particular embodiment, a particular
input buffer or set of input buffer resources is assigned to
each audio input stream and a particular output buffer or set
of output buffer resources is assigned to each conference
output stream. The input and output buffers may be
associated with corresponding input and output ports or interfaces
and perform error check, packet loss prevention, packet
ordering and congestion control functions.

The converters 54 include input and output converters.
The input converters receive input packets of a participant
from a corresponding buffer and convert the packets from the
native format of the participant's device 16 to a standard
format of the conference bridge 32 for cross linking and
processing in the conference bridge 32. Conversely, the
output converters receive conference output streams for
participants in the standard format and convert the
conference output streams to the native format of the participants'
devices. In this way, the conference bridge 32 allows
participants to connect using a variety of devices and
technologies.

The normalizers 56 include input and/or output
normalizers. The normalizers receive packets from the input audio
streams in a common format and normalize the timing of the
packets for cross connections in the mixer 58.

The mixer 58 includes a plurality of summers or other
suitable signal processing resources each operable to sum,
add or otherwise combine a plurality of input streams into
conference output streams for participants to a conference
call. As described in more detail below, the mixer 58 may be
a monaural mixer or a stereo mixer. Once the mixer 58 has
generated the conference output streams, each conference
output stream is converted by a corresponding converter and
buffered by a corresponding output buffer for transmission
to the corresponding participant.

The database 60 includes a set of conference parameters
62 for each ongoing conference call of the conference bridge
32. The conference parameters 62 for each conference call
include an identification of participants 64 and the priority
assigned to each participant (priority parameters) 66 for the
conference call. In one embodiment, the participants are
identified at the beginning of a conference call based on
caller ID, phone number or other suitable identifier. The
priority parameters may be initially set to a default.
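
The per-conference record described above could be modeled along the following lines. This is a sketch under stated assumptions rather than the disclosed data layout: the class name `ConferenceParameters`, the string identifiers, the `DEFAULT_PRIORITY` value, and the convention that a larger number means a higher priority are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

DEFAULT_PRIORITY = 0  # assumed default; larger value = higher priority


@dataclass
class ConferenceParameters:
    """Hypothetical model of conference parameters 62 for one call."""
    participants: List[str] = field(default_factory=list)   # identification 64
    priorities: Dict[str, int] = field(default_factory=dict)  # priorities 66

    def add_participant(self, ident: str,
                        priority: int = DEFAULT_PRIORITY) -> None:
        """Register a participant (e.g. by caller ID) with a priority."""
        if ident not in self.priorities:
            self.participants.append(ident)
        self.priorities[ident] = priority

    def lowest_priority(self) -> List[str]:
        """Return every participant that shares the lowest priority."""
        if not self.priorities:
            return []
        lowest = min(self.priorities.values())
        return [p for p in self.participants if self.priorities[p] == lowest]
```

A lookup such as `lowest_priority()` is the kind of query a bridge would need in order to identify the at least one lowest priority voice stream mentioned in the summary.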

FIG. 3 illustrates components and operation of the mixer
58 in a monaural embodiment of the present invention. In
particular, FIG. 3 illustrates details of a monaural mixer 80
in accordance with a particular embodiment. It will be
understood that a monaural mixer may be otherwise suitably
implemented without departing from the scope of the
present invention.

Referring to FIG. 3, the monaural mixer 80 receives
participant input streams 84 and combines the streams in
summers 82 to generate conference output streams 86 for
each participant to a conference call. In one embodiment,
each participant is assigned a summer 82 that receives audio
input streams from each other participant to the conference
call. The summer 82 combines the audio input streams to
generate a conference output stream for delivery to the
participant.
`
During normal operation, each participant receives the
audio input of each other participant. Thus, for example, the
conference output stream of participant 1 includes the audio
inputs of participants 2–5. Similarly, the conference output
stream of participant 2 includes the audio inputs of
participants 1 and 3–5. The conference output stream of participant
3 includes the audio inputs of participants 1–2 and 4–5. The
conference output stream of participant 4 includes the audio
inputs of participants 1–3 and 5. The conference output
stream of participant 5 includes the audio inputs of
participants 1–4.
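
The per-participant summing described above can be sketched as follows. This is an illustrative sketch only: the function name `mix_for_each_participant` and the representation of a stream as a list of float samples keyed by participant number are assumptions, not details taken from the disclosure.

```python
from typing import Dict, List


def mix_for_each_participant(
        inputs: Dict[int, List[float]]) -> Dict[int, List[float]]:
    """Give each participant the sum of every other participant's input.

    `inputs` maps a participant number to one block of samples; all
    blocks are assumed to be time-aligned and of equal length.
    """
    outputs: Dict[int, List[float]] = {}
    for listener in inputs:
        # One "summer" per listener: collect everyone except the listener.
        others = [block for pid, block in inputs.items() if pid != listener]
        # Add the remaining blocks together sample by sample.
        outputs[listener] = [sum(column) for column in zip(*others)]
    return outputs
```

Leaving a participant's own input out of that participant's output stream matches the example above, in which participant 1 hears participants 2–5 but not participant 1.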

The audio input 84 of the conference moderator may be
amplified and/or the audio input 84 of the remaining
participants attenuated to focus on or provide higher
prominence to the audio input 84 of the conference moderator. A
higher prominence is provided by increasing the
intelligibility of the moderator relative to the remaining participants.

For a conference moderator, the audio streams may be
made prominent in the conference output stream by
amplifying the voice input stream of the moderator or by
attenuating voice input streams 90 of the other participants. For
example, the voice input stream of the moderator may be
multiplied by "1.2" while the voice input streams of the
other conference participants are multiplied by "0.8". Other
methods for enhancing the intelligibility of the moderator
with respect to other conference participants will be
described with regard to FIGS. 4–8.
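
The gain example above (1.2 for the moderator, 0.8 for the other participants) can be sketched as a weighted sum. This is a minimal sketch under assumptions: the function name `mix_with_moderator_gain`, the sample format (floats in the range -1.0 to 1.0) and the final clipping step are hypothetical and are not stated in the text.

```python
from typing import Dict, List

MODERATOR_GAIN = 1.2    # example multiplier from the text
PARTICIPANT_GAIN = 0.8  # example multiplier from the text


def mix_with_moderator_gain(inputs: Dict[int, List[float]],
                            listener: int,
                            moderator: int) -> List[float]:
    """Build one listener's output with the moderator made prominent."""
    others = {pid: blk for pid, blk in inputs.items() if pid != listener}
    if not others:
        return []
    length = min(len(blk) for blk in others.values())
    mixed = []
    for i in range(length):
        acc = 0.0
        for pid, blk in others.items():
            gain = MODERATOR_GAIN if pid == moderator else PARTICIPANT_GAIN
            acc += gain * blk[i]
        # Assumed safeguard: keep the sum inside the sample range.
        mixed.append(max(-1.0, min(1.0, acc)))
    return mixed
```

For instance, with participant 3 as moderator, listener 1 receives 0.8 times the input of participant 2 plus 1.2 times the input of participant 3.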

FIGS. 4–7 illustrate components and operation of the
mixer 58 in a stereo embodiment of the present invention. In
particular, FIG. 4 illustrates details of a stereo mixer 100 in
accordance with a particular embodiment. FIG. 5 illustrates
spatial positioning of participant audio in a stereo
conference stream of a conference call participant. FIG. 7
illustrates details of a directional processor 106 and a summer
108 of the stereo mixer 100.

Referring to FIG. 4, the stereo mixer 100 receives
participant input streams 102 and generates stereo conference
output streams 104 using the directional processors 106 and
the summers 108. In one embodiment, each participant is
assigned a directional processor 106 and a summer 108. The
directional processor 106 receives audio input streams 102
from other participants to the conference call and generates
spatially positioned stereo streams that are combined by the
summer 108 to generate the stereo conference output
streams 104. Each stereo conference output stream 104
includes a left (L) and a right (R) channel.

During normal operation, each participant receives the
audio input of each other participant to a conference call.
Thus, for example, the stereo conference output stream for
participant 1 includes the audio inputs of participants 2–5.
Similarly, the stereo conference output stream for participant
2 includes the audio inputs from participants 1 and 3–5. The
stereo conference output stream for participant 3 includes
the audio inputs of participants 1–2 and 4–5. The stereo
conference output stream for participant 4 includes the audio
inputs from participants 1–3 and 5. The stereo conference
output stream for participant 5 includes the audio inputs
from participants 1–4.

Referring to FIG. 5, each stereo conference output stream
104 includes audio inputs or sources 120 from the other
participants or groups of participants that are perceived by
the listener 122 as coming from different spatial locations.
The spatial locations vary from front to back in the listener's
depth perception and from left to right in the listener's lateral
directivity. Because the sound sources are spatially
separated, the listener 122 can more easily focus on individual
sound sources of auditory information in the presence of
other sound sources. Thus, the spatial separation of the
sound sources 120 increases the ability of the listener 122 to
differentiate between the multiple sound sources 120.

In the illustrated embodiment, each participant 1–4 is
spatially positioned in front and at an equal distance from the
participant 5. In this configuration, each participant 1–4 has
an equal degree or substantial degree of prominence with
respect to the participant 5. As described in more detail
below, participants 1–4 in the stereo conference output
stream 104 may be repositioned to the foreground to provide
a higher degree of intelligibility and prominence to
participant 5.

Referring to FIG. 6, the output stream 104 for participant
5, for example, includes the audio input of moderator 3 in
the foreground with the other participants 1, 2 and 4 in the
background. The foreground position provides participant 5
or other listener 160 with the highest degree of intelligibility
such that the listener may focus on moderator 3 or other
moderator audio sources 162 while still hearing
non-moderator sources 164 in the background. It will be understood
that moderator 162 may be otherwise suitably positioned in
the output stream(s) 104, without departing from the scope
of the present invention.
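
One simple way to picture the foreground and background placement described above is sketched below. It is an assumption-laden sketch, not the disclosed processing: it uses constant-power intensity panning for left/right position and a plain gain as a stand-in for perceived depth, and the names `place_source` and `stereo_mix`, the azimuth convention and the example gain values are all hypothetical.

```python
import math
from typing import List, Tuple

Source = Tuple[List[float], float, float]  # (samples, azimuth, depth_gain)


def place_source(samples: List[float], azimuth: float,
                 depth_gain: float) -> Tuple[List[float], List[float]]:
    """Pan a mono block; azimuth -1.0 is hard left, +1.0 is hard right."""
    theta = (azimuth + 1.0) * math.pi / 4.0  # map [-1, 1] to [0, pi/2]
    left_gain = math.cos(theta) * depth_gain
    right_gain = math.sin(theta) * depth_gain
    return ([left_gain * s for s in samples],
            [right_gain * s for s in samples])


def stereo_mix(sources: List[Source]) -> Tuple[List[float], List[float]]:
    """Sum the panned sources into combined left and right channels."""
    length = min(len(samples) for samples, _, _ in sources)
    left = [0.0] * length
    right = [0.0] * length
    for samples, azimuth, depth_gain in sources:
        l_blk, r_blk = place_source(samples, azimuth, depth_gain)
        for i in range(length):
            left[i] += l_blk[i]
            right[i] += r_blk[i]
    return left, right
```

In this sketch a moderator could be given a centered position with a depth gain of 1.0 (foreground), while the other participants are spread left and right with a smaller assumed gain such as 0.5 (background), so they remain audible but less prominent.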

Referring to FIG. 7, the directional processor 106 of the
stereo mixer 100 includes a plurality of spatial processors
180 and the summer 108 includes left and right channel
summers 182. The spatial processors 180 each present
monaural sources at different locations in a binaural sound
field using standard intensity panning and/or Head Related
Transfer Function (HRTF) position filtering. The binaural
sound streams each include left and right channel
components 184 generating a perceived position such as, for
example, back/left, front/center and back/right. The left
channel of each binaural stream is fed to the left channel
summer 182 while the right channels are fed to the right
channel summer 182. The summers 182 generate a
combined left stream 186 and combined right stream 188
including a perceived plurality of discrete audio inputs spatially
positioned in two or three dimensional space relative to the
listener. Further information regarding the directional
processor 106 and summer 108 is provided in co-owned U.S.
Pat. No. 6,011,851, which is hereby incorpo