(12) United States Patent
Maes et al.

(10) Patent No.: US 7,003,463 B1
(45) Date of Patent: Feb. 21, 2006

(54) SYSTEM AND METHOD FOR PROVIDING NETWORK COORDINATED CONVERSATIONAL SERVICES

(75) Inventors: Stephane H. Maes, Danbury, CT (US); Ponani Gopalakrishnan, Yorktown Heights, NY (US)

(73) Assignee: International Business Machines Corporation, Armonk, NY (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/806,425

(22) PCT Filed: Oct. 1, 1999

(86) PCT No.: PCT/US99/22925
§ 371 (c)(1), (2), (4) Date: Jun. 25, 2001

(87) PCT Pub. No.: WO 00/21075
PCT Pub. Date: Apr. 13, 2000

Related U.S. Application Data

(60) Provisional application No. 60/102,957, filed on Oct. 2, 1998; provisional application No. 60/117,595, filed on Jan. 27, 1999.

(51) Int. Cl.
G10L 21/00 (2006.01)
G10L 15/00 (2006.01)
G06F 15/16 (2006.01)

(52) U.S. Cl. ................. 704/270.1; 704/231; 709/203; 709/201

(58) Field of Classification Search ........... 704/270.1; 709/203
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

5,544,228 A 8/1996 Wagner et al.
5,594,789 A 1/1997 Seazholtz et al.
5,774,857 A 6/1998 Newlin

(Continued)

FOREIGN PATENT DOCUMENTS

EP 0450610 A2 10/1991

(Continued)

OTHER PUBLICATIONS

Patent Abstract of Japan, for publication No.: 09-098221.

(Continued)

Primary Examiner—W. R. Young
Assistant Examiner—Matthew J. Sked
(74) Attorney, Agent, or Firm—Frank V. DeRosa; F. Chau & Associates, LLC

(57) ABSTRACT

A system and method for providing automatic and coordinated sharing of conversational resources, e.g., functions and arguments, between network-connected servers and devices and their corresponding applications. In one aspect, a system for providing automatic and coordinated sharing of conversational resources includes a network having a first and second network device, the first and second network device each comprising a set of conversational resources, a dialog manager for managing a conversation and executing calls requesting a conversational service, and a communication stack for communicating messages over the network using conversational protocols, wherein the conversational protocols establish coordinated network communication between the dialog managers of the first and second network device to automatically share the set of conversational resources of the first and second network device, when necessary, to perform their respective requested conversational service.

6 Claims, 5 Drawing Sheets
`
US 7,003,463 B1

U.S. PATENT DOCUMENTS

5,943,648 A 8/1999 Tel .............................. 704/270.1
5,956,683 A 9/1999 Jacobs et al. .................. 704/270.1
5,960,399 A 9/1999 Barclay et al.
6,098,041 A 8/2000 Matsumoto .................... 704/260
6,119,087 A 9/2000 Kuhn et al. .................... 704/270
6,173,259 B1 1/2001 ................................. 704/275
6,195,641 B1 2/2001 Loring et al. ................. 379/88.03
6,282,268 B1 8/2001 Hughes et al. ................ 704/10
6,282,508 B1 8/2001 Kimura et al. ................ 704/270.1
6,327,568 B1 12/2001 Joost .......................... 704/270.1
6,353,348 B1 3/2002 Besting et al.
6,408,272 B1 6/2002 White et al. .................. 704/270.1
6,456,974 B1 9/2002 Baker et al. .................. 704/270.1
6,594,628 B1 7/2003 Jacobs et al. ................. 704/231
6,615,171 B1 9/2003 Kanevsky et al. ............. 704/246
2005/0131704 A1 6/2005 Dragosh et al. ............. 704/270.1

FOREIGN PATENT DOCUMENTS

EP 0654930 A1 5/1995
JP 09-098221 4/1997
JP 10-207683 3/1998
JP 10-214258 8/1998
JP 10-228431 8/1998
WO 97/47122 12/1997

OTHER PUBLICATIONS

Patent Abstract of Japan, for publication No.: 10-207683.
Patent Abstract of Japan, for publication No.: 10-214258.
Patent Abstract of Japan, for publication No.: 10-228431.

* cited by examiner
`
`
[FIG. 1 (Sheet 1 of 5): block diagram of the system. The local client device comprises local application(s) 104, a local dialog manager 103, a client communication stack 111 (with conversational protocols 112 and conversational discovery, registration and negotiation protocols 113), local conversational engines 102, and a local audio capture and acoustic front end that passes features/waveforms to the engines. The networked server comprises server application(s) 109, a server dialog manager 108, a server communication stack (with conversational protocols 116 and conversational discovery, registration and negotiation protocols 117), and server conversational engines 107; a further networked server 110 provides additional conversational engines.]
`
`
`
[FIG. 2 (Sheet 2 of 5): flow diagram of the processing of a request from a local application; legible labels include "Request from Local Application", "Waveform to Remote Server", and "Features/Waveform/Results ... for Processing".]
`
`
`
[FIG. 3 (Sheet 3 of 5): flow diagram. Steps include: Receive Input Speech Locally or Requests From Local Application; Local Processing?; Allocate Local Conversational Engine To A Port; Allocate Local Engine To Another Port If Not Currently Used; Allocate Another Engine To Original Port If Local Engine Is Not Available; Is Remote Server Overloaded?; Select Another Server/Device For Remote Processing; Perform Remote Processing.]
`
`
`
[FIG. 4 (Sheet 4 of 5): block diagram of a distributed system for providing conversational services employing a conversational browser.]

[FIG. 5 (Sheet 5 of 5): block diagram of a distributed system for providing conversational services employing a conversational browser.]
`
`
`
SYSTEM AND METHOD FOR PROVIDING NETWORK COORDINATED CONVERSATIONAL SERVICES

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a U.S. National Phase Application filed under 35 U.S.C. 371 based on International Application No. PCT/US99/22925, filed on Oct. 1, 1999, which is based on provisional applications U.S. Ser. No. 60/102,957, filed on Oct. 2, 1998, and U.S. Ser. No. 60/117,595, filed on Jan. 27, 1999.
`
BACKGROUND

1. Technical Field

The present application relates generally to conversational systems and, more particularly, to a system and method for automatic and coordinated sharing of conversational functions/resources between network-connected devices, servers and applications.

2. Description of Related Art

Conventional conversational systems (i.e., systems with purely voice I/O or multi-modal systems with voice I/O) are typically limited to personal computers (PCs) and local machines having suitable architecture and sufficient processing power. For telephony applications, on the other hand, conversational systems are typically located on a server (e.g., the IVR server) and accessible via conventional and cellular phones. Although such conversational systems are becoming increasingly popular, typically all the conversational processing is performed either on the client side or on the server side (i.e., all the configurations are either fully local or fully client/server).
With the emergence of pervasive computing, it is expected that billions of low resource client devices (e.g., PDAs, smartphones, etc.) will be networked together. Due to the decreasing size of these client devices and the increasing complexity of the tasks that users expect such devices to perform, the user interface (UI) becomes a critical issue, since conventional graphical user interfaces (GUI) on such small client devices would be impractical. For this reason, it is to be expected that conversational systems will be a key element of the user interface to provide purely speech/audio I/O or multi-modal I/O with speech/audio I/O.

Consequently, speech embedded conversational applications in portable client devices are being developed and reaching maturity. Unfortunately, because of limited resources, it is to be expected that such client devices may not be able to perform complex conversational services such as, for example, speech recognition (especially when the vocabulary size is large or specialized or when domain specific/application specific language models or grammars are needed), NLU (natural language understanding), NLG (natural language generation), TTS (text-to-speech synthesis), audio capture and compression/decompression, playback, dialog generation, dialog management, speaker recognition, topic recognition, and audio/multimedia indexing and searching. For instance, the memory and CPU (and other resource) limitations of a device can limit the conversational capabilities that such device can offer.
Moreover, even if a networked device is "powerful" enough (in terms of CPU and memory) to execute all these conversational tasks, the device may not have the appropriate conversational resources (e.g., engines) or conversational arguments (i.e., the data files used by the engines, such as grammars, language models, vocabulary files, parsing, tags, voiceprints, TTS rules, etc.) to perform the appropriate task. Indeed, some conversational functions may be too specific and proper to a given service, thereby requiring back end information that is only available from other devices or machines on the network. For example, NLU and NLG services on a client device typically require server-side assistance, since the complete set of conversational arguments or functions needed to generate the dialog (e.g., parser, tagger, translator, etc.) either requires a large amount of memory for storage (not available in the client devices) or is too extensive (in terms of communication bandwidth) to transfer to the client side. This problem is further exacerbated with multi-lingual applications when a client device or local application has insufficient memory or processing power to store and process the arguments that are needed to process speech and perform conversational functions in multiple languages. Instead, the user must manually connect to a remote server for performing such tasks.
Also, the problems associated with a distributed architecture and distributed processing between clients and servers require new methods for conversational networking. Such methods comprise management of traffic and resources distributed across the network to guarantee appropriate dialog flow for each user engaged in a conversational interaction across the network.

Accordingly, a system and method that allows a network device with limited resources to perform complex, specific conversational tasks using networked resources in a manner which is automatic and transparent to a user is highly desirable.
`
SUMMARY OF THE INVENTION

The present invention is directed to a system and method for providing automatic and coordinated sharing of conversational resources between network-connected servers and devices (and their corresponding applications). A system according to one embodiment of the present invention comprises a plurality of networked servers, devices and/or applications that are made "conversationally aware" of each other by communicating messages using conversational network protocols (or methods) that allow each conversationally aware network device to share conversational resources automatically and in a coordinated and synchronized manner, so as to provide a seamless conversational interface through an interface of one of the network devices.

In accordance with one aspect of the present invention, a system for providing automatic and coordinated sharing of conversational resources comprises:

a network comprising at least a first and second network device;

the first and second network device each comprising:

a set of conversational resources;

a dialog manager for managing a conversation and executing calls requesting a conversational service; and

a communication stack for communicating messages using conversational protocols over the network, wherein the messages communicated by the conversational protocols establish coordinated network communication between the dialog managers of the first and second device to automatically share the set of conversational resources of the first and second network device, when necessary, to perform their respective requested conversational service.
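The component structure just recited can be pictured with a short sketch. The following Python fragment is purely illustrative (the patent prescribes no programming interface, and every name here is invented): each network device pairs a set of conversational resources with a dialog manager and a communication stack, and the stack carries protocol messages between dialog managers.

```python
# Illustrative sketch only; hypothetical names, not the patent's API.
from dataclasses import dataclass, field

@dataclass
class ConversationalResources:
    engines: set = field(default_factory=set)    # e.g. {"asr", "tts", "nlu"}
    arguments: set = field(default_factory=set)  # data files: grammars, models, ...

@dataclass
class DialogManager:
    resources: ConversationalResources
    def on_message(self, message: dict) -> None:
        # A real dialog manager would decide whether to serve the request
        # locally or route it onward; here we only acknowledge it.
        print(f"received {message['type']} message")
    def can_serve(self, service: str) -> bool:
        return service in self.resources.engines

@dataclass
class CommunicationStack:
    device_id: str
    def send(self, peer: "NetworkDevice", message: dict) -> None:
        # Stands in for serializing the message with the conversational
        # protocols; here we hand it to the peer's dialog manager directly.
        peer.dialog_manager.on_message(message)

@dataclass
class NetworkDevice:
    name: str
    resources: ConversationalResources
    def __post_init__(self):
        self.dialog_manager = DialogManager(self.resources)
        self.stack = CommunicationStack(self.name)

# Two devices sharing resources: the client lacks NLU, so its dialog
# manager would route NLU calls to the server over the communication stack.
client = NetworkDevice("client", ConversationalResources({"asr_small", "tts"}))
server = NetworkDevice("server", ConversationalResources({"asr_large", "nlu", "nlg"}))
client.stack.send(server, {"type": "service_request", "service": "nlu"})
```

The point of the structure is that the dialog managers, not the applications, decide where a requested conversational service runs.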
`
The present invention allows a low resource client device to transparently perform simple tasks locally, as well as complex tasks in binary or analog connection with a server (or other device) having more complex conversational capabilities. The server-side functions (such as speech recognition) can be performed through a regular IP network or LAN network, as well as via digital transmission over a conventional telephone line or a packet switched network, or via any conventional wireless data protocol over a wireless network.

Advantageously, the present invention offers a full-fledged conversational user interface on any device (such as a pervasive embedded device) with limited CPU, memory and power capabilities (as well as limited conversational resources), which provides complex conversational services using a low resource client device without the need to download, for example, the necessary conversational arguments from a network server. The local capabilities allow the user to utilize the local device without requiring connection, e.g., outside the coverage of a wireless phone provider. Also, the cost of a continuous connection is reduced, and the difficulties of recovery when such continuous connections are lost can be mitigated.

These and other aspects, features and advantages of the present invention will be described and become apparent from the following detailed description of preferred embodiments, which is to be read in connection with the accompanying drawings.
`
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a system for providing conversational services via automatic and coordinated sharing of conversational resources between networked devices according to an embodiment of the present invention;

FIG. 2 is a flow diagram of a method for providing conversational services via automatic and coordinated sharing of conversational resources between networked devices according to one aspect of the present invention;

FIG. 3 is a flow diagram of a method for providing conversational services via automatic and coordinated sharing of conversational resources between networked devices according to another aspect of the present invention;

FIG. 4 is a block diagram of a distributed system for providing conversational services according to another embodiment of the present invention employing a conversational browser; and

FIG. 5 is a block diagram of a distributed system for providing conversational services according to another embodiment of the present invention employing a conversational browser.
`
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

It is to be understood that the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof. Preferably, the present invention is implemented in software as an application comprising program instructions that are tangibly embodied on a program storage device (e.g., magnetic floppy disk, RAM, CD ROM, ROM and Flash memory) and executable by any device or machine comprising suitable architecture, such as one or more central processing units (CPU), a random access memory (RAM), and audio input/output (I/O) interface(s).
It is to be further understood that, because some of the constituent system components and method steps depicted in the accompanying Figures are preferably implemented in software, the actual connections between the system components (or the process steps) may differ depending upon the manner in which the present invention is programmed. Given the teachings herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present invention.
Referring now to FIG. 1, a block diagram illustrates a system for providing conversational services through the automatic and coordinated sharing of conversational resources and conversational arguments (data files) between networked devices according to an exemplary embodiment of the present invention. The system comprises a local client device 100 comprising an acoustic front end 101 for processing audio/speech input and outputting audio/speech generated by the client device 100. The client device 100 may be, for example, a smartphone or any speech-enabled PDA (personal digital assistant). The client device 100 further comprises one or more local conversational engines 102 for processing the acoustic features and/or waveforms generated and/or captured by the acoustic front-end 101 and generating dialog for output to the user. The local conversational engines 102 can include, for instance, an embedded speech recognition engine, a speaker recognition engine, a TTS engine, an NLU and NLG engine and an audio capture and compression/decompression engine, as well as any other type of conversational engine.
The client device 100 further comprises a local dialog manager 103 that performs task management and controls and coordinates the execution of a conversational service (either locally or via a network device) that is requested via a system call (API or protocol call), as well as managing the dialog locally and with networked devices. More specifically, as explained in greater detail below, the dialog manager 103 determines whether a given conversational service is to be processed and executed locally on the client 100 or on a remote network-connected server (or device). This determination is based on factors such as the conversational capabilities of the client 100 as compared with the capabilities of other networked devices, as well as the available resources and conversational arguments that may be necessary for processing a requested conversational service. Other factors include network traffic and anticipated delays in receiving results from networked devices. The dialog manager 103 performs task management and resource management tasks such as load management and resource allocation, as well as managing the dialog between the local conversational engines 102 and speech-enabled local applications 104.
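As a rough illustration of that determination, the following sketch (hypothetical field names and thresholds; not the patent's algorithm) weighs local capability, available conversational arguments, load, and anticipated network delay when routing a service request:

```python
# Minimal sketch of the local-vs-remote routing decision; all keys,
# names and thresholds are assumptions made for illustration.

def choose_executor(service, local, remote_devices, max_delay_ms=300):
    """Return the device that should execute `service`.

    `local` and each entry of `remote_devices` are dicts with hypothetical
    keys: 'engines' (set), 'arguments' (set of data files), 'load' (0..1),
    and 'rtt_ms' (estimated round-trip time; 0 for the local device).
    """
    def capable(dev):
        return (service["engine"] in dev["engines"]
                and service["arguments"] <= dev["arguments"])

    if capable(local) and local["load"] < 0.8:
        return local  # simple tasks stay on the client
    candidates = [d for d in remote_devices
                  if capable(d) and d["rtt_ms"] <= max_delay_ms]
    if not candidates:
        return local if capable(local) else None
    # Prefer lightly loaded devices with low anticipated delay.
    return min(candidates, key=lambda d: (d["load"], d["rtt_ms"]))

local = {"name": "client", "engines": {"asr_small"},
         "arguments": {"digits.fsg"}, "load": 0.2, "rtt_ms": 0}
server = {"name": "server", "engines": {"asr_large", "nlu"},
          "arguments": {"digits.fsg", "lvcsr.lm"}, "load": 0.5, "rtt_ms": 40}
task = {"engine": "asr_large", "arguments": {"lvcsr.lm"}}
print(choose_executor(task, local, [server])["name"])  # -> server
```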
As shown in FIG. 1 by way of example, the client device 100 is network-connected via network 105 to a server 106 that comprises server applications 109, as well as server conversational engines 107 for providing conversational services to the client device 100 (or any other network device or application) as necessary. As with the local engines 102, the server engines 107 can include, for instance, an embedded speech recognition engine, a TTS engine, an NLU and NLG engine, an audio capture and compression/decompression engine, as well as any other type of conversational engine. The server 106 comprises a server dialog manager 108 which operates in a manner similar to the local dialog manager 103 as described above. For example, the server dialog manager 108 determines whether a request for a conversational service from the local dialog manager 103 is to be processed and executed by the server 106 or on another remote network-connected server or device. In addition, the server dialog manager 108 manages the dialog between the server conversational engines 107 and speech-enabled server applications 109.
The system of FIG. 1 further illustrates the client device 100 and the remote server 106 being network-connected to a server 110 having conversational engines and/or conversational arguments that are accessible by the client 100 and server 106 as needed. The network 105 may be, for example, the Internet, a LAN (local area network), a corporate intranet, a PSTN (public switched telephone network) or a wireless network (for wireless communication via RF (radio frequency) or IR (infrared)). It is to be understood that although FIG. 1 depicts a client/server system as that term is understood by those skilled in the art, the system of FIG. 1 can include a plurality of networked servers, devices and applications that are "conversationally aware" of each other to provide automatic and coordinated sharing of conversational functions, arguments and resources. As explained in further detail below, such "conversational awareness" may be achieved using conversational network protocols (or methods) to transmit messages that are processed by the respective dialog managers to allow the networked devices to share conversational resources and functions in an automatic and synchronized manner. Such conversational coordination provides a seamless conversational interface for accessing remote servers, devices and applications through the interface of one network device.
In particular, to provide conversational coordination between the networked devices to share their conversational functions, resources and arguments, each of the networked devices communicates messages using conversational protocols (or methods) to exchange information regarding their conversational capabilities and requirements. For instance, as shown in FIG. 1, the client device 100 comprises a communication stack 111 for transmitting and receiving messages using conversational protocols 112, conversational discovery, registration and negotiation protocols 113, and speech transmission protocols 114 (or conversational coding protocols). Likewise, the server 106 comprises a server communication stack 115 comprising conversational protocols 116, conversational discovery, registration and negotiation protocols 117, and speech transmission protocols 118. These protocols (methods) are discussed in detail with respect to a CVM (conversational virtual machine) in the patent application IBM Docket No. YO999-111P, filed concurrently herewith, entitled "Conversational Computing Via Conversational Virtual Machine" (i.e., International Appl. No. PCT/US99/22927, filed on Oct. 1, 1999, and corresponding U.S. patent application Ser. No. 09/806,565), which is commonly assigned and incorporated herein by reference.
Briefly, the conversational protocols 112, 116 (or what is referred to as "distributed conversational protocols" in YO999-111P) are protocols (or methods) that allow the networked devices (e.g., client 100 and server 106) or applications to transmit messages for registering their conversational state, arguments and context with the dialog managers of other network devices. The conversational protocols 112, 116 also allow the devices to exchange other information such as applets, ActiveX components, and other executable code that allows the devices or associated applications to coordinate a conversation between such devices in, e.g., a master/slave or peer-to-peer conversational network configuration. The distributed conversational protocols 112, 116 allow the exchange of information to coordinate the conversation involving multiple devices or applications, including master/slave conversational networks, peer conversational networks, and silent partners. The information that may be exchanged between networked devices using the distributed conversational protocols comprises: pointers to data files (arguments); transfer (if needed) of data files and other conversational arguments; notification of input and output events and recognition results; conversational engine API calls and results; notification of state and context changes and other system events; registration updates (handshake for registration); negotiation updates (handshake for negotiation); and discovery updates when a requested resource is lost. (One possible encoding of such messages is sketched below.)
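Illustrative only: the patent fixes no wire format, and the message type names below merely paraphrase the list above.

```python
# One conceivable encoding of distributed conversational protocol
# messages; the JSON framing and field names are assumptions.
from enum import Enum, auto
import json

class MsgType(Enum):
    DATA_FILE_POINTER = auto()    # pointer to (or transfer of) conversational arguments
    IO_EVENT = auto()             # input/output event notification
    RECOGNITION_RESULT = auto()   # engine recognition results
    ENGINE_API_CALL = auto()      # conversational engine API call or result
    STATE_CHANGE = auto()         # state/context change or other system event
    REGISTRATION_UPDATE = auto()  # handshake for registration
    NEGOTIATION_UPDATE = auto()   # handshake for negotiation
    DISCOVERY_UPDATE = auto()     # e.g. a requested resource was lost

def encode(msg_type: MsgType, sender: str, payload: dict) -> bytes:
    return json.dumps({"type": msg_type.name, "from": sender,
                       "payload": payload}).encode()

# e.g. notify peers that a grammar is available by pointer rather than by copy
wire = encode(MsgType.DATA_FILE_POINTER, "client-100",
              {"resource": "grammar", "url": "http://server/grammars/date.fsg"})
print(json.loads(wire))
```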
The (distributed) conversational protocols also comprise dialog manager (DM) protocols, which allow the dialog managers to distribute services, behavior and conversational applications, I/O and engine APIs, such as described in IBM Docket No. YO999-111P. For instance, the DM protocols allow the following information to be exchanged: (1) DM architecture registration (e.g., each DM can be a collection of local DMs); (2) pointers to associated meta-information (user, device capabilities, application needs, etc.); (3) negotiation of DM network topology (e.g., master/slave, peer-to-peer); (4) data files (conversational arguments), if applicable (i.e., if engines are used that are controlled by a master DM); (5) notification of I/O events such as user input and outputs to users, for transfer to engines and/or addition to contexts; (6) notification of recognition events; (7) transfer of processed input from engines to a master DM; (8) transfer of responsibility of master DM to registered DMs; (9) DM processing result events; (10) DM exceptions; (11) transfer of confidence and ambiguity results, proposed feedback and output, proposed expectation state, proposed action, proposed context changes, proposed new dialog state; (12) decision notification, context update, action update, state update, etc.; (13) notification of completed, failed or interrupted action; (14) notification of context changes; and/or (15) data files, context and state updates due to action. (A toy handshake covering items (1) through (3) is sketched below.)
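The following toy handshake illustrates items (1) through (3): two dialog managers register, exchange pointers to meta-information, and negotiate a master/slave topology. Choosing the master by a capability score is an assumption made for the sketch, not a rule stated in the patent.

```python
# Toy DM registration and topology negotiation; all fields are invented.

def negotiate_topology(dm_a: dict, dm_b: dict) -> tuple:
    """Pick a master from two DM registrations, here by capability score."""
    def score(dm):
        return (len(dm["engines"]), dm["cpu_mips"])
    master, slave = sorted((dm_a, dm_b), key=score, reverse=True)
    return master["id"], slave["id"]

dm_client = {"id": "dm-client", "engines": ["asr_small", "tts"],
             "cpu_mips": 200, "meta": "http://client/meta"}
dm_server = {"id": "dm-server", "engines": ["asr_large", "nlu", "nlg", "tts"],
             "cpu_mips": 4000, "meta": "http://server/meta"}
master, slave = negotiate_topology(dm_client, dm_server)
print(f"master={master}, slave={slave}")  # the better-equipped DM drives the dialog
```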
For instance, in a master/slave network configuration, only one of the networked devices drives the conversation at any given time. In particular, the master device (i.e., the dialog manager of the master device) manages and coordinates the conversation between the network devices and decides which device will perform a given conversational service or function. This decision can be based on the information provided by each of the devices or applications regarding their conversational capabilities. This decision may also be based on the master determining which slave device (having the necessary conversational capabilities) can perform the given conversational function most optimally. For instance, the master can request a plurality of slaves to perform speech recognition and provide the results to the master. The master can then select the optimal results. It is to be understood that what is described here at the level of the speech recognition is the mechanism at the level of the DM (dialog manager) protocols between distributed dialog managers (as described in YO999-111P). Indeed, when dialog occurs between multiple dialog managers, the master will obtain a measure of the score of the results of each dialog manager, and a decision will be taken accordingly to determine which dialog manager proceeds with the input, not only on the basis of the speech recognition accuracy, but also based on the dialog (meaning), context and history (as well as other items under consideration, such as the preferences of the user, the history, and the preferences of the application).
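A compact sketch of that arbitration step (hypothetical names and scores): the master collects scored results from several dialog managers and picks a winner using the acoustic score plus a bias standing in for dialog meaning, context, history and preferences.

```python
# Sketch of master-side arbitration among slave results; illustrative only.

def arbitrate(results, context_bias):
    """results: list of {'dm': str, 'text': str, 'acoustic_score': float}.
    context_bias: maps a hypothesis to a bonus reflecting dialog state,
    history and preferences (assumed to be computed elsewhere)."""
    def total(r):
        return r["acoustic_score"] + context_bias.get(r["text"], 0.0)
    return max(results, key=total)

results = [
    {"dm": "dm-server-1", "text": "call mom", "acoustic_score": 0.72},
    {"dm": "dm-server-2", "text": "call tom", "acoustic_score": 0.74},
]
# The dialog history suggests the user was just talking about their mother,
# so the slightly lower acoustic score still wins.
winner = arbitrate(results, context_bias={"call mom": 0.10})
print(winner["dm"], "->", winner["text"])  # dm-server-1 -> call mom
```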
In peer-to-peer connections, each device will attempt to determine the functions that it can perform and log a request to do so. The device that has accepted the task will perform such task and then score its performance. The devices will then negotiate which device will perform the task based on their scores.
In one embodiment, the distributed conversational protocols 112, 116 are implemented via RMI (remote method invocation) or RPC (remote procedure call) system calls to implement the calls between the applications and the different conversational engines over the network. As is known in the art, RPC is a protocol that allows one application to request a service from another application across the network. Similarly, RMI is a method by which objects can interact in a distributed network. RMI allows one or more objects to be passed along with the request. In addition, the information can be stored in an object which is exchanged via CORBA or DCOM, or presented in a declarative manner (such as via XML). As discussed in the above-incorporated patent application IBM Docket No. YO999-111P, the conversational protocols (methods) (or the distributed protocols) can be used for achieving distributed implementation of conversational functions supported by a CVM (conversational virtual machine) shell, between conversational applications and the CVM shell via conversational APIs, or between the CVM and conversational engines via conversational engine APIs. The conversational engine APIs are interfaces between the core engines and the applications using them, and protocols to communicate with core engines (local and/or networked). The conversational APIs provide an API layer to hook or develop conversationally aware applications, which includes foundation classes and components to build conversational user interfaces.
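The patent names RMI, RPC, CORBA, DCOM, and declarative XML as concrete mechanisms. As a stand-in that runs anywhere, the sketch below exposes a stubbed conversational engine call over XML-RPC using Python's standard library; the engine function, its arguments, and the port are invented for illustration.

```python
# Remote engine call sketched with XML-RPC as a stand-in for RMI/RPC.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def recognize(waveform_id: str, grammar: str) -> dict:
    # Stub standing in for a server-side speech recognition engine call.
    return {"text": "play voicemail", "score": 0.91,
            "waveform": waveform_id, "grammar": grammar}

server = SimpleXMLRPCServer(("127.0.0.1", 8099),  # arbitrary port
                            allow_none=True, logRequests=False)
server.register_function(recognize)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the local dialog manager invokes the remote engine as if it
# were a local procedure, then acts on the result.
engine = ServerProxy("http://127.0.0.1:8099")
print(engine.recognize("utt-42", "voicemail.fsg"))
server.shutdown()
```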
Similarly, a dialog manager in accordance with the present invention can communicate via APIs with applications and engines (local and/or networked). In this manner, a dialog manager can act on the results and callbacks from all remote procedures (procedural calls to remote engines and applications) as if it were a local application so as to, e.g., arbitrate between the applications and resources (local and/or networked) to prioritize and determine the active application, and determine which result to consider as active.
The conversational discovery, registration and negotiation protocols 113, 117 are network protocols (or methods) that are used to "discover" local or network conversationally aware systems (i.e., applications or devices that "speak" conversational protocols). The registration protocols allow devices or applications to register their conversational capabilities, state and arguments. The negotiation protocols allow devices to negotiate master/slave, peer-to-peer or silent partner networks.

In one embodiment, the discovery protocols implement a "broadcast and listen" approach to trigger a reaction from other "broadcast and listen" devices. This can allow, for instance, the creation of dynamic and spontaneous networks (such as Bluetooth and Hopping networks discussed below). In another embodiment, a default server (possibly the master) setting can be used which registers the "address" of the different network devices. In this embodiment, discovery amounts to each device in the network communicating with the server to check the list of registered devices so as to determine which devices to connect to. The information that is exchanged via the discovery protocols comprises the following: (1) broadcast requests for handshake or listening for requests; (2) exchange of device identifiers; (3) exchange of handles/pointers for first registration; and (4) exchange of handles for first negotiation. (A minimal sketch of the broadcast-and-listen approach follows.)
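This sketch uses UDP broadcast on an arbitrary port, with invented message fields; real deployments (e.g., the Bluetooth-style spontaneous networks mentioned above) would differ.

```python
# "Broadcast and listen" discovery sketch; port and fields are assumptions.
import json
import socket
import threading
import time

PORT = 50007  # arbitrary port chosen for this sketch

def listen(stop: threading.Event) -> None:
    """Listen for discovery handshakes and acknowledge them."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", PORT))
        s.settimeout(0.5)
        while not stop.is_set():
            try:
                data, addr = s.recvfrom(1024)
            except socket.timeout:
                continue
            msg = json.loads(data)
            if msg.get("type") == "discovery_handshake":
                # A real device would answer with its own identifier and a
                # handle/pointer for first registration (items (2)-(3) above).
                print(f"heard {msg['device_id']} from {addr[0]}")

stop = threading.Event()
threading.Thread(target=listen, args=(stop,), daemon=True).start()
time.sleep(0.2)  # let the listener bind before broadcasting

with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    payload = json.dumps({"type": "discovery_handshake",
                          "device_id": "client-100",
                          "registration_handle": "reg-0001"}).encode()
    # Sent to localhost so the sketch is self-contained; a real device
    # would send to the subnet broadcast address.
    s.sendto(payload, ("127.0.0.1", PORT))

time.sleep(1.0)
stop.set()
```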
In one embodiment for implementing the registration protocols, upon connection, the devices can exchange information about their conversational capabilities with a prearranged protocol (e.g., TTS English, any text, speech recognition, 500 words + FSG grammar, no speaker recognition, etc.) by exchanging a set of flags or a device property object. Likewise, applications can exchange engine requirement lists. With a master/slave network configuration, the master dialog manager can compile all the lists and match the functions and needs with conversational capabilities (see the sketch after this list). In the absence of a master device (dialog manager), a common server can be used to transmit the conversational information to each machine or device in the network. The registration protocols allow the following information to be exchanged: (1) capabilities and load messages, including definition and update events; (2) engine resources (whether a given device includes NLU, DM, NLG, TTS, speaker recognition, speech recognition, compression, coding, storage, etc.); (3) I/O capabilities; (4) CPU, memory, and load capabilities; (5) data file types (domain specific, dictionary, language models, languages, etc.); (6) network addresses and features; (7) information about a user (definition and update events); (8) user preferences for the device, application or dialog; (9) customization; (10) user experience; (11) help; (12) capability requirements per application (and application state) (definition and update events); (13) meta-information for CUI services and behaviors (help files, categories, conversational priorities, etc.) (definition and update events, typically via pointer to table); (14) protocol handshakes; and/or (15) topology negotiation.
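As a sketch of that matching step, the fragment below models the device property object as a set of capability flags and matches it against application engine-requirement lists, in the way a master dialog manager might compile and match them; all flag names are invented, loosely mirroring the example above ("TTS English", speech recognition with a 500-word FSG grammar, etc.).

```python
# Capability registration and matching sketch; names are illustrative.

device_properties = {
    "client-100": {"tts_english", "asr_fsg_500w"},
    "server-106": {"tts_english", "asr_lvcsr", "nlu", "nlg", "speaker_rec"},
}

app_requirements = {
    "voicemail_app": ["asr_fsg_500w", "tts_english"],
    "dictation_app": ["asr_lvcsr", "nlu"],
}

def match(requirements, properties):
    """Master-DM style matching of each application's engine needs to the
    registered capabilities of the networked devices."""
    plan = {}
    for app, needs in requirements.items():
        for device, caps in properties.items():
            if set(needs) <= caps:
                plan[app] = device
                break
        else:
            plan[app] = None  # no registered device can serve this app
    return plan

print(match(app_requirements, device_properties))
# {'voicemail_app': 'client-100', 'dictation_app': 'server-106'}
```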
Registration may be performed using a traditional communication protocol such as TCP/IP, TCP/IP 29, X-10 or CEBus, and socket communication between devices. The devices use a distributed conversationa