(12) United States Patent
`Aldous et al.
`
US006654722B1
`
(10) Patent No.: US 6,654,722 B1
(45) Date of Patent: Nov. 25, 2003
`
`(54) VOICE OVER IP PROTOCOL BASED
`SPEECH SYSTEM
`
(75) Inventors: Anne M. Aldous, Davie, FL (US);
`Joseph Celi, Jr., Boca Raton, FL (US);
`Brett Gavagni, Sunrise, FL (US);
`Kyriakos Leontiades, Boca Raton, FL
`(US); Bruce D. Lucas, Yorktown
`Heights, NY (US); David E. Reich,
`Jupiter, FL (US)
`
`(73) Assignee: International Business Machines
`Corporation, Armonk, NY (US)
`
(*) Notice: Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 681 days.
`
`(21) Appl. No.: 09/596,769
`
(22) Filed: Jun. 19, 2000
`
(51) Int. Cl.7 ................................................ G10L 11/00
`(52) U.S. Cl. .................................................... 704/270.1
`(58) Field of Search .............................. 704/270, 270.1,
`704/275, 276; 370/237; 379/88.02, 221.01,
`219; 709/204
`
(56) References Cited

U.S. PATENT DOCUMENTS
5,581,600 A * 12/1996 Watts et al. ............. 379/88.02
5,881,135 A * 3/1999 Watts et al. ............. 379/88.02
5,916,302 A * 6/1999 Dunn et al. ................. 709/204
5,933,490 A * 8/1999 White et al. ........... 379/221.01
5,983,190 A * 11/1999 Trower et al. .............. 704/276
6,014,437 A * 1/2000 Acker et al. ................ 379/219
6,154,445 A * 11/2000 Farris et al. ................ 370/237
6,269,336 B1 * 7/2001 Ladd et al. ................. 704/270
* cited by examiner
`
`Primary Examiner-David D. Knepper
`(74) Attorney, Agent, or Firm-Akerman Senterfitt
`
`(57)
`
`ABSTRACT
`
A VoIP-enabled speech server can include a speech application
which can be configured to communicate with a VoIP
telephony gateway server over a VoIP communications path.
The VoIP-enabled speech server can also include a
VoIP-compliant call control interface to the VoIP telephony gateway
server, the VoIP-compliant call control interface establishing
the VoIP communications path. In operation, the speech
application can receive VoIP-compliant packets from the
VoIP telephony gateway server over the VoIP communications
path. Subsequently, digitized audio data can be reconstructed
from the VoIP-compliant packets, and the digitized
audio data can be speech-to-text converted. Additionally,
text can be synthesized into digitized audio data and the
digitized audio data can be encapsulated in VoIP-compliant
packets which can be transmitted over the VoIP communications
path to the telephony gateway server.
`
`21 Claims, 3 Drawing Sheets
`
[Representative drawing, excerpt of FIG. 1: telephone network 2, VoIP telephony gateway server 3, VoIP Enabled Speech Server 5, Web server 7, VoiceXML document 8.]
`CISCO EXHIBIT 1012
`Page 1 of 10
`
[Drawing sheet 1 of 3. FIG. 1: telephone device 1, telephone network 2, VoIP telephony gateway server 3, VoIP network 4, VoIP Enabled Speech Server 5, data communications network 6, Web server 7, VoiceXML document 8.]
`
[Drawing sheet 2 of 3. FIG. 2: telephone device 1, telephone network 2, VoIP telephony gateway server 3, VoIP network 4, three VoIP Enabled Speech Servers 5, call control interfaces 13, VoIP Gatekeeper 14, advanced call management functions 15, call processor 16, VoIP communications paths 18.]
`
[Drawing sheet 3 of 3. FIG. 3: VoIP Enabled Speech Server 5 with elements 21, 22 and 23; Speech Application containing speech browser 30, JSAPI interface 33, Speech Recognition Engine 34 and Text to Speech Engine 35; H.323 Telephony Module.]
`
`VOICE OVER IP PROTOCOL BASED
`SPEECH SYSTEM
`
`
`BACKGROUND OF THE INVENTION
`1. Technical Field
`This invention relates to the field of voice recognition and
`more particularly to a speech application for use in a Voice
`over IP protocol network.
`2. Description of the Related Art
LAN telephony, which means "the integration of telephony
and data services provided by packet-switched data
networks," is the technology that takes person-to-person
communication to a new high level and associated costs to
a lower level. LAN telephony enables a more flexible and
cost-efficient use of many applications, for example automated
call distribution, interactive voice response, voice
logging, etc. This is in contrast to the relatively limited
integration offered by the current voice/data integration
paradigm, computer-telephony integration, in which voice
traffic is kept separate from data traffic and carried over
circuit-switched links. Whereas the old paradigm for integrating
data and voice has been to use the circuit-switched
telephony fabric for data communications, the obvious
drawbacks of the relatively low bandwidth available to data
traffic, the inefficiency of circuit-switched data communications
due to the "bursty" nature of data traffic, and the
limited voice/data integration possibilities have led to
present topologies in which IP data servers are bundled with
proprietary PBXs or voice circuit switches in order to
provide a loose integration between circuit-switched and
packet-switched networks, and voice is carried by the
circuit-switched network.
One of the most common uses of LAN telephony is in the
enterprise Internet/Intranet environment, referred to as IP
telephony. The Voice over IP ("VoIP") protocol is the
protocol upon which voice traffic can be transmitted across
IP networks. In a VoIP network, analog speech signals
received from an analog speech audio source, for example a
PSTN or a microphone, are digitized, compressed and
translated into IP packets for transmission over an IP network.
Several well-known protocols implement the VoIP
protocol specification including H.323, Session Initiation
Protocol ("SIP") and Media Gateway Control Protocol
("MGCP").
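The packetization step described above can be illustrated with a minimal sketch. It assumes RTP framing (the transport used by H.323) and 8 kHz, 8-bit G.711 mu-law audio with payload type 0; the patent itself does not specify a packet layout, and the function names, the 160-sample packet size and the SSRC value are illustrative assumptions.

```python
import struct

RTP_VERSION = 2
PAYLOAD_TYPE_PCMU = 0  # static RTP payload type for G.711 mu-law


def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
                     payload_type: int = PAYLOAD_TYPE_PCMU) -> bytes:
    """Prefix an audio payload with a minimal 12-byte RTP header."""
    first_byte = RTP_VERSION << 6      # no padding, no extension, no CSRC entries
    second_byte = payload_type & 0x7F  # marker bit left clear
    header = struct.pack("!BBHII", first_byte, second_byte,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def packetize(audio: bytes, samples_per_packet: int = 160, ssrc: int = 0x1234):
    """Split one-byte-per-sample audio into RTP packets (160 samples = 20 ms at 8 kHz)."""
    packets = []
    for seq, offset in enumerate(range(0, len(audio), samples_per_packet)):
        chunk = audio[offset:offset + samples_per_packet]
        # The timestamp counts samples, so it equals the byte offset here.
        packets.append(build_rtp_packet(chunk, seq, offset, ssrc))
    return packets
```

For example, `packetize(bytes(400))` yields three packets: two carrying 160 samples each and a final one carrying the remaining 80.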
A common application for IP telephony is the integration
of voice mail ("v-mail") and electronic mail ("e-mail").
Another application can include voice logging by financial
or emergency-response organizations. Additionally, automated
call distribution ("ACD") can be facilitated whereby
an ACD server performs value-based queuing of incoming
telephone calls. Finally, interactive voice response systems
can incorporate IP telephony in which responses are
preprogrammed in a server as a workflow component. Still,
speech recognition and speech synthesis applications
("speech applications") have lagged in the use of IP
telephony.
In particular, speech applications operate on real-time
audio signals which cannot tolerate latencies associated with
traditional data communications. As such, where speech
applications have been incorporated in an IP telephony
topology, the speech applications have been closely integrated
with the IP telephony server in order to preclude a
negative impact from network-based latencies. Accordingly,
the design and development of such IP telephony enabled
speech applications have been closely linked to the
proprietary nature of the IP telephony server.
`
The tight linkage between the speech application and the
IP telephony server substantially limits both the design and
the extensibility of the speech application. Specifically, in
the present paradigm the speech application design must
incorporate functionality directly related to the chosen
protocol for transporting packetized voice data to a speech
recognition system and from a speech synthesis system. The
development of a superior voice transport protocol, by
nature of the tight linkage between the IP telephony server
and the speech application, can compel the redesign of the
speech application. Accordingly, there exists a need for a
VoIP-based speech system in which the design and
implementation of the speech application remains separate
from the design and implementation of the IP telephony
system.
`
`SUMMARY OF THE INVENTION
`
It is an object of the present invention to provide a
VoIP-based speech system in which the design and
implementation of the speech application remains separate from
the design and implementation of the IP telephony system.
It is a further object of the present invention to provide a
VoIP-enabled speech server which can receive audio input
from the IP telephony system over a VoIP network. It is yet
another object of the present invention to provide a method
for coupling a speech application to a telephony gateway
server in a VoIP network. Finally, it is an object of the
present invention to provide each of the VoIP-based speech
system, the VoIP-enabled speech server and the method for
coupling the speech application to the telephony gateway
server using standards-based interfaces to the VoIP network,
the telephony gateway server and the speech application.
These and other objects of the present invention are
accomplished in a VoIP-based speech system including: a
VoIP telephony gateway server; at least one speech server,
each speech server containing a VoIP-enabled speech
application; a VoIP-compliant call control interface between the
VoIP telephony gateway server and the speech server; and,
a VoIP communications path between the VoIP telephony
gateway server and the speech application in the at least one
speech server. In the VoIP-based speech system, the VoIP
telephony gateway server and the speech application can
establish the VoIP communications path through the
VoIP-compliant call control interface.
In operation, the VoIP telephony gateway server can
receive audio signals from a telephony interface, digitize the
audio signals into digitized audio data, compress the
digitized audio data into VoIP-compliant packets, and transmit
the VoIP-compliant packets to the speech application in the
at least one speech server through the VoIP communications
path using the VoIP protocol. Correspondingly, the speech
application can receive the VoIP-compliant packets,
reconstruct the digitized audio data from the VoIP-compliant
packets, and speech-to-text convert the digitized audio
data. In addition, the speech application can synthesize text
into digitized audio data, encapsulate the digitized audio
data in VoIP-compliant packets and transmit the
VoIP-compliant packets through the VoIP communications path to
the VoIP telephony gateway server. Subsequently, the VoIP
telephony gateway server can receive the VoIP-compliant
packets, reconstruct the digitized audio data from the
VoIP-compliant packets, and transmit the digitized audio data
through the telephony interface.
In one aspect of the present invention, the VoIP telephony
server can include a telephony interface and a VoIP
Gatekeeper. The VoIP Gatekeeper can receive a voice call
through the telephony interface, and responsively, the VoIP
Gatekeeper can choose a speech server from among the
speech servers. Once a speech server has been chosen, the
VoIP Gatekeeper can alert the VoIP-enabled speech
application in the chosen speech server that the voice call has
been received.
In another aspect of the present invention, the speech
server can include a speech recognition engine; a
text-to-speech engine; a call control interface for establishing a
voice call connection through the VoIP telephony gateway
server; and, an audio data path. Notably, the audio data path
can stream audio data through the established voice call
connection to the speech recognition engine. Similarly, the
audio data path can stream audio data through the
established voice call connection from the text-to-speech engine.
In yet another aspect of the present invention, the speech
application can be a speech browser. The speech browser
can retrieve Web content responsive to voice commands
received through the VoIP communications path. Also, the
speech browser can speech synthesize the retrieved Web
content into audio data. Finally, the speech browser can
transmit the audio data through the VoIP communications
path to the VoIP telephony gateway server. Significantly, the
Web content can be a VoiceXML document.
Preferably, the speech server can be implemented using
standards-based interfaces to the VoIP telephony gateway
server, the VoIP communications path, and the speech
application. Specifically, the speech server can include a speech
recognition engine; a text-to-speech engine; a JSAPI speech
interface; a JTAPI telephony interface; and a JMF media
interface. The JTAPI telephony interface can establish a
voice call connection for transporting digital audio data
between the telephony gateway server and the speech
application. The JMF media interface can establish a data
path for transporting the digital audio data between the
speech application and the voice call connection. The JSAPI
speech interface can communicate the digitized audio data
from the speech application to the speech recognition
engine. Similarly, the JSAPI speech interface can
communicate speech synthesized audio data from the text-to-speech
engine to the speech application.
The present invention can also be embodied in a
VoIP-enabled speech server which can include a speech
application which can be configured to communicate with a VoIP
telephony gateway server over a VoIP communications path.
The VoIP-enabled speech server can also include a
VoIP-compliant call control interface to the VoIP telephony
gateway server, the VoIP-compliant call control interface
establishing the VoIP communications path. In operation, the
speech application can receive VoIP-compliant packets from
the VoIP telephony gateway server over the VoIP
communications path. Subsequently, digitized audio data can be
reconstructed from the VoIP-compliant packets, and the
digitized audio data can be speech-to-text converted.
Additionally, text can be synthesized into digitized audio
data and the digitized audio data can be encapsulated in
VoIP-compliant packets which can be transmitted over the
VoIP communications path to the telephony gateway server.
In another aspect of the VoIP-enabled speech server, the
VoIP-enabled speech server can include a speech
recognition engine, a text-to-speech engine and an audio data path.
The audio data path can stream audio data through the
established voice call connection to the speech recognition
engine. Also, the audio data path can stream audio data
through the established voice call connection from the
text-to-speech engine.
Preferably, the speech application is a speech browser.
The speech browser can retrieve Web content responsive to
voice commands received through the VoIP communications
path. The speech browser can also speech synthesize the
retrieved Web content into audio data. Subsequently, the
speech browser can transmit the audio data through the VoIP
communications path to the VoIP telephony gateway server.
Significantly, the Web content can be a VoiceXML
document.
Preferably, the VoIP-enabled speech server can be
implemented using standards-based interfaces to the VoIP
telephony gateway server, the VoIP communications path, and
the speech application. Specifically, the VoIP-enabled
speech server can include a JTAPI telephony interface for
establishing a voice call connection for transporting digital
audio data between the telephony gateway server and the
speech application. Additionally, the VoIP-enabled speech
server can have a JMF media interface for establishing a data
path for transporting the digital audio data between the
speech application and the voice call connection. Finally, the
VoIP-enabled speech server can have a JSAPI speech
interface both for communicating the digitized audio data from
the speech application to the speech recognition engine, and
for communicating speech synthesized audio data from the
text-to-speech engine to the speech application.
Finally, the present invention can include a method for
coupling a speech application to a telephony gateway server
in a VoIP network. The method can include the steps of
establishing a VoIP communications path with the VoIP
telephony gateway server and configuring the speech
application to communicate with the telephony gateway server
over the established VoIP communications path.
Additionally, VoIP-compliant packets can be received from
the telephony gateway server over the established VoIP
communications path. Digitized audio data can be
reconstructed from the VoIP-compliant packets and, subsequently,
the digitized audio data can be speech-to-text converted.
Additionally, the method can include the steps of
synthesizing text into digitized audio data; encapsulating the
digitized audio data in VoIP-compliant packets; and,
transmitting the VoIP-compliant packets over the VoIP
communications path to the telephony gateway server.
In the preferred embodiment, the method can further
include the steps of retrieving Web content responsive to
speech recognized voice commands received through the
VoIP communications path; synthesizing the retrieved Web
content into audio data; and, transmitting the audio data
through the VoIP communications path to the telephony
gateway server. Significantly, the Web content can be a
VoiceXML document.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`There are presently shown in the drawings embodiments
`which are presently preferred, it being understood, however,
`that the invention is not limited to the precise arrangements
`and instrumentalities shown.
`FIG. 1 is a schematic illustration of a VoIP-based speech
`system according to the present invention.
FIG. 2 is a diagram of a preferred architecture for the VoIP
telephony gateway server of FIG. 1.
`FIG. 3 is a diagram of a preferred architecture for the
`speech server of FIG. 1.
`
`DETAILED DESCRIPTION OF THE
`INVENTION
The present invention is a Voice over IP ("VoIP") based
speech system in which a speech server can be coupled to a
telephony gateway server in a VoIP network. The telephony
gateway server can receive voice calls from an external
telephone network, for example a public switched telephone
network ("PSTN"), an integrated services digital network
("ISDN") and the like. The speech server can include a
speech application which can receive real-time speech input
through a VoIP communications path originating from voice
calls in the telephony gateway server. Likewise, the speech
application can transmit speech synthesized audio data
through the VoIP communications path to the telephony
gateway server and ultimately to a termination point in the
external telephone network. Significantly, the speech
application can receive voice browser commands through the
voice call, responsive to which the speech application can
retrieve Web content from external Web servers.
Additionally, the Web content can be speech synthesized
and transmitted through the VoIP communications path, also
as part of the voice call. In the preferred embodiment, the
Web content can be a VoiceXML document.
FIG. 1 illustrates a VoIP-based speech system according to
the preferred embodiment. Notably, as is well-known in the
art, the VoIP specification can be implemented using several
published standards, for instance H.323, SIP and MGCP.
The present invention implements H.323, although
the invention is not limited with regard to the particular
implementation of VoIP. As shown in FIG. 1, in operation,
a user can initiate a voice call using telephone device 1. The
voice call can attempt to connect with a VoIP telephony
gateway server 3 through a telephone network 2, for instance
a PSTN or ISDN. The VoIP telephony gateway server 3 can
translate the address of the intended recipient of the voice
call to the IP address of a device residing in the VoIP network
4, in this instance a VoIP Enabled Speech Server 5.
Subsequently, the VoIP telephony gateway server 3 can
notify the VoIP Enabled Speech Server 5 of the voice call,
which the VoIP Enabled Speech Server 5 can accept. Upon
accepting the voice call, the VoIP Enabled Speech Server 5
can establish a VoIP communications path between the VoIP
telephony gateway server 3 and the VoIP Enabled Speech
Server 5 such that VoIP-compliant packets of audio data can
be transported between the VoIP telephony gateway server 3
and the VoIP Enabled Speech Server 5. In this manner, audio
data originating in the telephone device 1 can be received
and processed in the VoIP Enabled Speech Server 5.
Likewise, audio data originating in the VoIP Enabled Speech
Server 5 can be transmitted back to the telephone device 1.
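The call setup sequence described for FIG. 1 (address translation, notification, acceptance, path establishment) can be sketched as follows. This is an illustrative model only: the class names, the dictionary-based address table and the path identifier format are assumptions and do not represent H.323 signaling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SpeechServer:
    """Stand-in for a VoIP Enabled Speech Server residing in the VoIP network."""
    ip_address: str
    active_paths: List[str] = field(default_factory=list)

    def accept_call(self, call_id: str) -> str:
        """Accept the call and return an identifier for the new communications path."""
        path_id = f"{call_id}@{self.ip_address}"
        self.active_paths.append(path_id)
        return path_id


@dataclass
class GatewayServer:
    """Stand-in for the VoIP telephony gateway server."""
    address_table: Dict[str, str]     # dialed number -> IP address in the VoIP network
    servers: Dict[str, SpeechServer]  # IP address -> speech server

    def route_call(self, call_id: str, dialed_number: str) -> Optional[str]:
        ip = self.address_table.get(dialed_number)  # address translation
        if ip is None or ip not in self.servers:
            return None                             # no VoIP endpoint for this number
        # Notify the speech server, which accepts and establishes the path.
        return self.servers[ip].accept_call(call_id)
```

A gateway built with one table entry, such as `{"5551000": "10.0.0.5"}`, routes a call to that number to the server at `10.0.0.5` and returns `None` for any other number.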
Notably, in the preferred embodiment, the VoIP Enabled
Speech Server 5 can accept voice commands originating in
the telephone device 1 for retrieving Web content from a
Web server 7 in a data communications network 6.
Specifically, the Web content can be a VoiceXML
document 8. In response, the VoIP Enabled Speech Server 5 can
retrieve the VoiceXML document 8 from the Web server 7
and can synthesize audio data according to instructions
contained in the VoiceXML document 8. Subsequently, the
synthesized audio data can be transported across the VoIP
network 4 to the VoIP telephony gateway server 3 and
ultimately to the telephone device 1.
FIG. 2 illustrates the VoIP network 4 of FIG. 1 and
contains a more detailed illustration of the VoIP telephony
gateway server 3. As shown in FIG. 2, the VoIP telephony
gateway server 3 can receive the voice call from the
telephone device 1 through the telephone network 2 into a
telephony interface 11. The telephony interface 11 can
perform address translation of the address of the intended
recipient of the voice call and can direct the voice call
accordingly. Specifically, in the preferred embodiment, a
VoIP Gatekeeper 14 is incorporated in the VoIP telephony
gateway server 3 in order to provide call management
functionality to the VoIP telephony gateway server 3. In
particular, the VoIP Gatekeeper 14 can perform
load-balancing in order to ensure the high availability of VoIP
Enabled Speech Servers 5 able to receive the voice call.
Hence, upon receiving a voice call in the telephony
interface 11, call control can be passed to the VoIP
Gatekeeper 14 through call control interfaces 13. Notably, the
VoIP Gatekeeper 14 can communicate with other
components of the VoIP telephony gateway server 3 through data
path 17. Moreover, a call control interface 13 can be
included in the VoIP Gatekeeper 14 in order to control the
establishment, progress and termination of voice calls
processed through the VoIP Gatekeeper 14. Because the
preferred implementation of VoIP is an implementation of the
RTP-based H.323 standard, the call control interfaces 13 are
H.323-based call control interfaces.
Subsequently, the control having been passed to the VoIP
Gatekeeper 14, call processor 16, using advanced call
management functions 15, can examine the status of each VoIP
Enabled Speech Server 5 in the VoIP network 4 and identify
a VoIP Enabled Speech Server 5 in the VoIP network 4 best
suited to receive the voice call. As a result, the VoIP
Gatekeeper 14 can choose a suitable VoIP Enabled Speech
Server 5 and can alert the chosen VoIP Enabled Speech
Server 5 of the voice call.
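One way the gatekeeper's selection step could work is sketched below. The patent does not define what makes a server "best suited"; the least-relative-load policy, the `ServerStatus` fields and the function name are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ServerStatus:
    """Hypothetical status record the gatekeeper might hold for each speech server."""
    name: str
    available: bool
    active_calls: int
    max_calls: int


def choose_speech_server(statuses: Iterable[ServerStatus]) -> Optional[ServerStatus]:
    """Return the available server with the lowest relative load, or None if all are full."""
    candidates = [s for s in statuses
                  if s.available and s.active_calls < s.max_calls]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.active_calls / s.max_calls)
```

Comparing relative rather than absolute load lets servers of different capacity share calls proportionally; a simple round-robin policy would be an equally plausible reading of the text.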
Upon receiving an alert, the chosen VoIP Enabled Speech
Server 5 can establish a VoIP communications path 18
between the VoIP telephony gateway server 3 and the VoIP
Enabled Speech Server 5 through which VoIP-compliant
packets can be transmitted. Subsequently, the telephony
interface 11 can digitize audio signals contained in the voice
call into digitized audio data, compress the digitized audio
data into VoIP-compliant packets, and transmit the
VoIP-compliant packets to the chosen VoIP Enabled Speech
Server 5 through the VoIP communications path 18 using the
VoIP protocol.
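The compression step can be illustrated with mu-law companding, the scheme underlying G.711, a codec commonly carried over H.323. The patent does not name a codec, so this choice is an assumption. The sketch uses the continuous companding formula with a simple 8-bit quantizer and is not a bit-exact G.711 encoder.

```python
import math

MU = 255.0  # companding parameter used by mu-law telephony


def mu_law_compress(x: float) -> float:
    """Map a sample in [-1, 1] to a companded value in [-1, 1]."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Invert mu_law_compress."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_sample(x: float) -> int:
    """Quantize the companded value to an 8-bit code in 0..255."""
    return round((mu_law_compress(x) + 1.0) / 2.0 * 255)


def decode_sample(code: int) -> float:
    """Recover an approximate sample from an 8-bit code."""
    return mu_law_expand(code / 255 * 2.0 - 1.0)
```

Because the curve is logarithmic, quiet samples keep finer resolution than loud ones, which is why eight bits per sample suffice for telephone-quality speech.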
Significantly, the present invention is not limited to the
particular arrangement of the VoIP telephony gateway server
3. In particular, the depiction of the VoIP Gatekeeper 14 as
a separate entity from the remaining components of the VoIP
telephony gateway server 3 is not meant to limit the
invention as such. Rather, the placement of the VoIP Gatekeeper
14 in FIG. 2 is intended for illustrative purposes only.
Additionally, the scope of the invention with regard to the
VoIP telephony gateway server 3 in combination with the
VoIP Gatekeeper 14 should be limited only inasmuch as the
VoIP telephony gateway server 3 can receive a voice call and
the VoIP Gatekeeper 14 can perform call management by
identifying a suitable terminus for the voice call in the VoIP
network 4.
FIG. 3 illustrates a preferred architecture for the VoIP
Enabled Speech Server 5 of FIG. 1. The VoIP Enabled
Speech Server 5 can be implemented in a conventional
network server which traditionally includes a central
processing unit (CPU) and internal memory devices, such as
random access memory (RAM) 21 and fixed storage 22, for
example a hard disk drive (HDD). Because the VoIP Enabled
Speech Server 5 is speech-enabled, the VoIP Enabled Speech
Server 5 also includes audio circuitry (not shown) so as to
provide an audio processing capability to the VoIP Enabled
Speech Server 5.
The VoIP Enabled Speech Server 5 can store in the fixed
storage 22 an operating system 23 upon which various
application programs can execute. Additionally, the fixed
storage 22 can store therein a speech application 24 and a
VoIP telephony module 25. The operating system 23 can
include any suitable operating system, for example
Microsoft Windows NT®, Sun Solaris® or Debian Linux.
Notably, the invention is not limited in regard to the
arrangement of speech application 24 and telephony module 25 in
relation to the operating system 23. Rather, each can be
integrated with the other in various combinations. For
example, the VoIP telephony module 25 can be integrated in
the operating system 23. Alternatively, the VoIP telephony
module 25 can remain independent of the operating system
23.
Also, the invention is not limited to the storage location
of the VoIP telephony module 25, the speech application 24
and the components thereof. Rather, the present invention
can be implemented in a more complex distributed system in
which the various components reside in multiple network
servers and execute in process address spaces remote from
one another, each application communicating with other
applications through well-known interprocess
communication mechanisms, for example TCP/IP. Upon the bootstrap
of the VoIP Enabled Speech Server 5, the operating system
23 can load into RAM 21. Subsequently, both the speech
application 24 and the VoIP telephony module 25 can load
and execute in RAM 21. Once executing, the VoIP Enabled
Speech Server 5 is configured to receive a voice call and
subsequent data over a VoIP communications path.
The speech application 24 can include a speech
recognition engine 34 and a text-to-speech engine 35. In operation,
the VoIP Enabled Speech Server 5 can receive
VoIP-compliant packets, reconstruct digitized audio data from the
VoIP-compliant packets, and speech-to-text convert the
digitized audio data in the speech recognition engine 34.
Conversely, the speech application 24 can synthesize text
into digitized audio data in the text-to-speech engine 35,
encapsulate the digitized audio data in VoIP-compliant
packets and transmit the VoIP-compliant packets through the
VoIP communications path 18 to the VoIP telephony
gateway server 3.
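The receive path described above (packets in, digitized audio out, then speech-to-text conversion) might look like the following sketch. It assumes each packet carries a fixed 12-byte RTP header with no CSRC list and that reordering by sequence number is sufficient; jitter buffering, loss handling and sequence wrap-around are omitted. The `recognizer` callable is a hypothetical placeholder for the speech recognition engine.

```python
import struct
from typing import Callable, Iterable

# Assumed fixed 12-byte RTP header: flags, payload type, sequence, timestamp, SSRC.
RTP_HEADER = struct.Struct("!BBHII")


def reconstruct_audio(packets: Iterable[bytes]) -> bytes:
    """Reorder packets by sequence number and concatenate their payloads."""
    parsed = []
    for packet in packets:
        _, _, seq, _, _ = RTP_HEADER.unpack_from(packet)
        parsed.append((seq, packet[RTP_HEADER.size:]))
    parsed.sort(key=lambda item: item[0])  # ignores 16-bit sequence wrap-around
    return b"".join(payload for _, payload in parsed)


def recognize_from_packets(packets: Iterable[bytes],
                           recognizer: Callable[[bytes], str]) -> str:
    """Hand the reconstructed audio to a speech-to-text callable."""
    return recognizer(reconstruct_audio(packets))
```

Keeping the recognizer behind a plain callable mirrors the separation the patent argues for: the packet handling knows nothing about how recognition is performed.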
In the preferred embodiment, the speech application
includes a speech browser 30. Notably, the speech browser
30 can retrieve Web content responsive to voice commands
which are received through the VoIP communications path
18, speech-to-text converted by the speech recognition
engine 34, and interpreted by the speech browser 30. Also,
the speech browser 30 can transmit received Web content to
the text-to-speech engine 35 for speech synthesis prior to
transmitting the speech synthesized audio data through the
VoIP communications path 18 to the VoIP telephony
gateway server 3. Significantly, the Web content can be a
VoiceXML document 8.
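As a rough illustration of how a speech browser could turn a retrieved VoiceXML document into text for the text-to-speech engine, the sketch below collects the contents of `<prompt>` elements. The sample document and the function name are invented for illustration. It assumes a document without an XML namespace; documents that declare the VoiceXML namespace would need namespace-qualified tag names.

```python
import xml.etree.ElementTree as ET

SAMPLE_VXML = """<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block>
      <prompt>Welcome to the example site.</prompt>
      <prompt>Say news or weather.</prompt>
    </block>
  </form>
</vxml>"""


def extract_prompts(vxml_text: str) -> list:
    """Return the text of every <prompt> element, in document order."""
    root = ET.fromstring(vxml_text)
    prompts = []
    for prompt in root.iter("prompt"):
        # Join nested text and collapse internal whitespace.
        prompts.append(" ".join("".join(prompt.itertext()).split()))
    return prompts
```

Each returned string would then be passed to the text-to-speech engine and the resulting audio sent back over the VoIP communications path.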
Preferably, the speech application can be implemented
using standards-based interfaces to VoIP communications
and the speech recognition and speech synthesis functions.
Specifically, the speech application 24 can include a JSAPI
speech interface 33 between the speech recognition and
text-to-speech engines 34, 35 and the speech browser 30.
Also, the speech application 24 can include a JTAPI
telephony interface 31 between the telephony module 25 and the
speech browser 30. Finally, the speech application 24 can
include a JMF media interface 32 between the telephony
module 25 and the speech browser 30.
The JTAPI telephony interface 31 can be used by the
speech browser 30 to establish a voice call connection for
transporting VoIP-compliant packets containing digital
audio data between the telephony gateway server 3 and the
speech application 24. The JMF media interface 32 can
establish a VoIP communications data path for transporting
the VoIP-compliant packets containing the digital audio data
between the speech application 24 and the voice call
connection. The JSAPI speech interface 33 can communicate
the digitized audio data from the speech application 24 to the
speech recognition engine 34. Similarly, the JSAPI speech
interface 33 can communicate speech synthesized audio data
from the text-to-speech engine 35 to the speech application
24.
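The division of labor among the three interfaces can be sketched with abstract interfaces. The Python protocols below are hypothetical stand-ins for the roles the patent assigns to JTAPI (call control), JMF (media path) and JSAPI (speech engines); none of the method names are taken from those Java APIs.

```python
from typing import Callable, Protocol


class TelephonyInterface(Protocol):  # role assigned to JTAPI in the patent
    def answer_call(self) -> None: ...


class MediaInterface(Protocol):      # role assigned to JMF
    def read_audio(self) -> bytes: ...
    def write_audio(self, audio: bytes) -> None: ...


class SpeechInterface(Protocol):     # role assigned to JSAPI
    def recognize(self, audio: bytes) -> str: ...
    def synthesize(self, text: str) -> bytes: ...


class SpeechBrowser:
    """Depends only on the three interfaces, not on any transport or engine."""

    def __init__(self, telephony: TelephonyInterface,
                 media: MediaInterface, speech: SpeechInterface) -> None:
        self.telephony, self.media, self.speech = telephony, media, speech

    def handle_turn(self, respond: Callable[[str], str]) -> str:
        self.telephony.answer_call()
        command = self.speech.recognize(self.media.read_audio())
        self.media.write_audio(self.speech.synthesize(respond(command)))
        return command


class InMemoryStack:
    """Toy implementation of all three interfaces, for demonstration only."""

    def __init__(self, incoming: bytes) -> None:
        self.incoming, self.outgoing, self.answered = incoming, b"", False

    def answer_call(self) -> None:
        self.answered = True

    def read_audio(self) -> bytes:
        return self.incoming

    def write_audio(self, audio: bytes) -> None:
        self.outgoing += audio

    def recognize(self, audio: bytes) -> str:
        return audio.decode("ascii")   # pretends the audio bytes are the words spoken

    def synthesize(self, text: str) -> bytes:
        return text.encode("ascii")    # pretends the text bytes are audio
```

Because `SpeechBrowser` touches only the interfaces, a different telephony module or speech engine can be substituted without changing the browser.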
The VoIP-based speech system of the present invention
allows a user to access a Web site using a telephone. The
user is audibly presented with prompts describing the Web
site and the Web site's features. Thus, the presentation of the
VoIP-based speech system is similar to an Interactive Voice
Response system. During the presentation of the Web site,
the user can provide spoken commands to the VoIP-based
speech system in order to select options and input
information for completing Web-based forms, etc. Advantageously,
the VoIP-based speech system can retrieve Web content
having, as its page description language, VoiceXML.
A significant element of the present invention is the
speech server which can send and receive audio and control
messages using H.323, a well-known, standard VoIP
protocol. The use of the VoIP protocol permits the speech server to
remain isolated from other elements of the speech system
and therefore allows the speech server to be better
optimized. The use of an isolated speech server also simplifies
considerably the development process, since it frees the
speech server from the details of interacting with one of a
multitude of available telephony hardware implementations.
Finally, the use of an isolated VoIP enabled speech server
allows the speech server to be used with any VoIP telephony
system that supports the standard H.323 protocol, or other
VoIP protocol, without any special development effort.
`A VoIP-based speech system in accordance
