Julia et al.

(10) Patent No.: US 6,513,063 B1
(45) Date of Patent: Jan. 28, 2003
5,805,775 A    9/1998  Eberman et al. ............ 395/12
5,855,002 A   12/1998  Armstrong ................. 704/270
6,003,072 A   12/1999  Gerritsen et al. .......... 709/218
6,012,030 A    1/2000  French-St. George et al. .. 704/275
6,026,388 A    2/2000  Liddy et al. .............. 707/5
6,173,279 B1 * 1/2001  Levin et al. .............. 707/1
6,192,338 B1   2/2001  Haszto et al. ............. 704/257

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/524,868
(22) Filed: Mar. 14, 2000

FOREIGN PATENT DOCUMENTS
EP  0 895 396 A2   2/1999 ............ H04M/3/50
EP  1 094 406 A2   4/2001 ........... G06F/17/30
Related U.S. Application Data
(63) Continuation-in-part of application No. 09/225,198, filed on Jan. 5, 1999.
(60) Provisional application No. 60/124,718, filed on Mar. 17, 1999, provisional application No. 60/124,720, filed on Mar. 17, 1999, and provisional application No. 60/124,719, filed on Mar. 17, 1999.

OTHER PUBLICATIONS
http://www.ai.sri.com/natural-language/projects/arpa-sls/nat-lang.html "Gemini: A Natural Language System for Spoken-Language Understanding"; "Interleaving Syntax and Semantics in an Efficient Bottom-Up Parser".
http://www.ai.sri.com/natural-language/projects/arpa-sls/spnl-int.html "Combining Linguistic and Statistical Knowledge Sources in Natural Language Processing for ATIS".
(List continued on next page.)

Primary Examiner: David Wiley
(74) Attorney, Agent, or Firm: Moser, Patterson & Sheridan, LLP; Kin-Wah Tong, Esq.
(51) Int. Cl. ................................................ G06F 13/00
(52) U.S. Cl. ........................................ 709/219; 709/227
(58) Field of Search .................................. 709/219, 227

(56) References Cited
U.S. PATENT DOCUMENTS
5,197,005 A   3/1993  Schwartz et al. ........... 364/419
5,386,556 A   1/1995  Hedin et al. .............. 395/600
5,434,777 A   7/1995  Luciw ..................... 364/419
5,519,608 A   5/1996  Kupiec .................... 364/419.08
5,608,624 A   3/1997  Luciw ..................... 395/794
5,721,938 A   2/1998  Stuckey ................... 395/754
5,729,659 A   3/1998  Potter .................... 395/2.79
5,748,974 A   5/1998  Johnson ................... 395/759
5,774,859 A   6/1998  Houser et al. ............. 704/275
5,794,050 A   8/1998  Dahlgren et al. ........... 395/708
5,802,526 A   9/1998  Fawcett et al. ............ 707/104

(57) ABSTRACT
A system, method, and article of manufacture are provided for navigating an electronic data source that has a scripted online interface by means of spoken input. When a spoken request is received from a user, it is interpreted. A navigation query is constructed based on the interpretation of the speech input and a template extracted by scraping an online scripted interface to the data source. The resulting interpretation of the request is thereupon used to automatically construct an operational navigation query to retrieve the desired information from one or more electronic network data sources, which is then transmitted to a client device of the user.

102 Claims, 7 Drawing Sheets
[Front-page drawing: the flowchart of FIG. 4 (402 receive spoken NL request; 404 interpret request; 405 identify/select data source; 406 construct navigation query; 407 deficiencies?; 408 navigate data source; 409 refine query?; 412 solicit additional (multimodal) user input; 410 transmit and display to client).]
Comcast - Exhibit 1012, page 1
OTHER PUBLICATIONS
http://www.ai.sri.com/~oaa/applications.html "InfoWiz: An Animated Voice Interactive Information System".
http://www.ai.sri.com/~lesaf/commandtalk.html: "CommandTalk: A Spoken-Language Interface for Battlefield Simulations", 1997, by Robert Moore, John Dowding, Harry Bratt, J. Mark Gawron, Yonael Gorfu and Adam Cheyer, in "Proceedings of the Fifth Conference on Applied Natural Language Processing", Washington, DC, pp. 1-7, Association for Computational Linguistics.
"The CommandTalk Spoken Dialogue System", 1999, by Amanda Stent, John Dowding, Jean Mark Gawron, Elizabeth Owen Bratt and Robert Moore, in "Proceedings of the Thirty-Seventh Annual Meeting of the ACL", pp. 183-190, University of Maryland, College Park, MD, Association for Computational Linguistics.
http://www.ai.sri.com/~lesaf/commandtalk.html "Interpreting Language in Context in CommandTalk", 1999, by John Dowding and Elizabeth Owen Bratt and Sharon Goldwater, in "Communicative Agents: The Use of Natural Language in Embodied Systems", pp. 63-67, Association for Computing Machinery (ACM) Special Interest Group on Artificial Intelligence (SIGART), Seattle, WA.
Stent, Amanda et al., "The CommandTalk Spoken Dialogue System", SRI International.
Moore, Robert et al., "CommandTalk: A Spoken-Language Interface for Battlefield Simulations", Oct. 23, 1997, SRI International.
Dowding, John et al., "Interpreting Language in Context in CommandTalk", Feb. 5, 1999, SRI International.
http://www.ai.sri.com/~oaa/infowiz.html, InfoWiz: An Animated Voice Interactive Information System, May 8, 2000.
Dowding, John, "Interleaving Syntax and Semantics in an Efficient Bottom-up Parser", SRI International.
Moore, Robert et al., "Combining Linguistic and Statistical Knowledge Sources in Natural-Language Processing for ATIS", SRI International.
Dowding, John et al., "Gemini: A Natural Language System For Spoken-Language Understanding", SRI International.
Wyard, P.J. et al., "Spoken Language Systems - Beyond Prompt and Response", BT Technology Journal, Vol. 14, No. 1, 1996, pp. 187-207.
Notification of Transmittal of The International Search Report and the International Search Report, dated Sep. 7, 2001, filed in International Appln. No. PCT/US01/07924.
* cited by examiner
[U.S. Patent, Jan. 28, 2003, Sheet 1 of 7: FIG. 1a, showing elements 102, 104, network 106, 108 through 108n, 110, 112, and request processing logic 300 (see FIG. 3).]
[Sheet 2 of 7: FIG. 1b.]
[Sheet 3 of 7: FIG. 2, showing elements 204, network 206, 208, 210, and request processing logic 300 (see FIG. 3).]
[Sheet 4 of 7: FIG. 3, request processing logic 300, comprising speech recognition engine, natural language parser, query construction logic, and query refinement logic.]
[Sheet 5 of 7: FIG. 4 flowchart: 402 receive spoken NL request; 404 interpret request; 405 identify/select data source; 406 construct navigation query; 408 navigate data source; 409 refine query?; if yes, 412 solicit additional (multimodal) user input; if no, 410 transmit and display to client.]
[Sheet 6 of 7: FIG. 5: (from step 406, FIG. 4) scrape the online scripted form to extract an input template; instantiate the input template using the interpretation of step 404; (to step 407, FIG. 4).]
[Sheet 7 of 7: FIG. 6.]
ACCESSING NETWORK-BASED ELECTRONIC INFORMATION THROUGH SCRIPTED ONLINE INTERFACES USING SPOKEN INPUT
This is a continuation-in-part of co-pending U.S. patent application Ser. No. 09/225,198, filed Jan. 5, 1999, Provisional U.S. Patent Application No. 60/124,718, filed Mar. 17, 1999, Provisional U.S. Patent Application No. 60/124,720, filed Mar. 17, 1999, and Provisional U.S. Patent Application No. 60/124,719, filed Mar. 17, 1999, from which applications priority is claimed and these applications are incorporated herein by reference.
BACKGROUND OF THE INVENTION
The present invention relates generally to the navigation of electronic data by means of spoken natural language requests, and to feedback mechanisms and methods for resolving the errors and ambiguities that may be associated with such requests.
As global electronic connectivity continues to grow, and the universe of electronic data potentially available to users continues to expand, there is a growing need for information navigation technology that allows relatively naive users to navigate and access desired data by means of natural language input. In many of the most important markets, including the home entertainment arena as well as mobile computing, spoken natural language input is highly desirable, if not ideal. As just one example, the proliferation of high-bandwidth communications infrastructure for the home entertainment market (cable, satellite, broadband) enables delivery of movies-on-demand and other interactive multimedia content to the consumer's home television set. For users to take full advantage of this content stream ultimately requires interactive navigation of content databases in a manner that is too complex for user-friendly selection by means of a traditional remote-control clicker. Allowing spoken natural language requests as the input modality for rapidly searching and accessing desired content is an important objective for a successful consumer entertainment product in a context offering a dizzying range of database content choices. As further examples, this same need to drive navigation of (and transaction with) relatively complex data warehouses using spoken natural language requests applies equally to surfing the Internet/Web or other networks for general information, multimedia content, or e-commerce transactions.
In general, the existing navigational systems for browsing electronic databases and data warehouses (search engines, menus, etc.) have been designed without navigation via spoken natural language as a specific goal. So today's world is full of existing electronic data navigation systems that do not assume browsing via natural spoken commands, but rather assume text and mouse-click inputs (or in the case of TV remote controls, even less). Simply recognizing voice commands within an extremely limited vocabulary and grammar, the spoken equivalent of button/click input (e.g., speaking "channel 5" selects TV channel 5), is really not sufficient by itself to satisfy the objectives described above. In order to deliver a true "win" for users, the voice-driven front-end must accept spoken natural language input in a manner that is intuitive to users. For example, the front-end should not require learning a highly specialized command language or format. More fundamentally, the front-end must allow users to speak directly in terms of what the user ultimately wants, e.g., "I'd like to see a Western film directed by Clint Eastwood", as opposed to speaking in terms of arbitrary navigation structures (e.g., hierarchical layers of menus, commands, etc.) that are essentially artifacts reflecting constraints of the pre-existing text/click navigation system. At the same time, the front-end must recognize and accommodate the reality that a stream of naive spoken natural language input will, over time, typically present a variety of errors and/or ambiguities: e.g., garbled/unrecognized words (did the user say "Eastwood" or "Easter"?) and under-constrained requests ("Show me the Clint Eastwood movie"). An approach is needed for handling and resolving such errors and ambiguities in a rapid, user-friendly, non-frustrating manner.
What is needed is a methodology and apparatus for rapidly constructing a voice-driven front-end atop an existing, non-voice data navigation system, whereby users can interact by means of intuitive natural language input not strictly conforming to the step-by-step browsing architecture of the existing navigation system, and wherein any errors or ambiguities in user input are rapidly and conveniently resolved. The solution to this need should be compatible with the constraints of a multi-user, distributed environment such as the Internet/Web or a proprietary high-bandwidth content delivery network; a solution contemplating one-at-a-time user interactions at a single location is insufficient, for example.

SUMMARY OF THE INVENTION
The present invention addresses the above needs by providing a system, method, and article of manufacture for navigating network-based electronic data sources in response to spoken input requests. When a spoken input request is received from a user, it is interpreted, such as by using a speech recognition engine to extract speech data from acoustic voice signals, and using a language parser to linguistically parse the speech data. The interpretation of the spoken request can be performed on a computing device locally with the user or remotely from the user. The resulting interpretation of the request is thereupon used to automatically construct an operational navigation query to retrieve the desired information from one or more electronic network data sources, which is then transmitted to a client device of the user. If the network data source is a database, the navigation query is constructed in the format of a database query language.
Typically, errors or ambiguities emerge in the interpretation of the spoken request, such that the system cannot instantiate a complete, valid navigational template. This is to be expected occasionally, and one preferred aspect of the invention is the ability to handle such errors and ambiguities in a relatively graceful and user-friendly manner. Instead of simply rejecting such input and defaulting to traditional input modes or simply asking the user to try again, a preferred embodiment of the present invention seeks to converge rapidly toward instantiation of a valid navigational template by soliciting additional clarification from the user as necessary, either before or after a navigation of the data source, via multimodal input, i.e., by means of menu selection or other input modalities including and in addition to spoken input. This clarifying, multi-modal dialogue takes advantage of whatever partial navigational information has been gleaned from the initial interpretation of the user's spoken request. This clarification process continues until the system converges toward an adequately instantiated navigational template, which is in turn used to navigate the network-based data and retrieve the user's desired information. The retrieved information is transmitted across the network and presented to the user on a suitable client display device.
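The clarification behavior summarized above amounts to a slot-filling loop over a navigational template. The following Python sketch is illustrative only; the template fields and the solicit callback (standing in for any multimodal prompt, such as an on-screen menu or a follow-up spoken question) are hypothetical assumptions, not details taken from the patent.

```python
# Hypothetical sketch: converge toward a fully instantiated navigational
# template by soliciting values for whatever slots remain unfilled.

REQUIRED_SLOTS = ("genre", "director")  # hypothetical template fields

def refine(template: dict, solicit) -> dict:
    """Repeatedly ask (via any input modality) for missing slot values."""
    template = dict(template)
    while True:
        missing = [s for s in REQUIRED_SLOTS if not template.get(s)]
        if not missing:
            return template  # adequately instantiated; ready to navigate
        # 'solicit' abstracts the multimodal clarification prompt.
        template[missing[0]] = solicit(missing[0])

# Under-constrained request: "Show me the Clint Eastwood movie"
partial = {"director": "Clint Eastwood"}
full = refine(partial, lambda slot: "western")  # user clarifies the genre
print(full)
```

An already-complete template passes through the loop unchanged, so the same entry point serves both well-constrained and under-constrained requests.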
BRIEF DESCRIPTION OF THE DRAWINGS
The invention, together with further advantages thereof, may best be understood by reference to the following description taken in conjunction with the accompanying drawings in which:
FIG. 1a illustrates a system providing a spoken natural language interface for network-based information navigation, in accordance with an embodiment of the present invention with server-side processing of requests;
FIG. 1b illustrates another system providing a spoken natural language interface for network-based information navigation, in accordance with an embodiment of the present invention with client-side processing of requests;
FIG. 2 illustrates a system providing a spoken natural language interface for network-based information navigation, in accordance with an embodiment of the present invention for a mobile computing scenario;
FIG. 3 illustrates the functional logic components of a request processing module in accordance with an embodiment of the present invention;
FIG. 4 illustrates a process utilizing spoken natural language for navigating an electronic database in accordance with one embodiment of the present invention;
FIG. 5 illustrates a process for constructing a navigational query for accessing an online data source via an interactive, scripted (e.g., CGI) form; and
FIG. 6 illustrates an embodiment of the present invention utilizing a community of distributed, collaborating electronic agents.
DETAILED DESCRIPTION OF THE INVENTION
1. System Architecture
a. Server-End Processing of Spoken Input
FIG. 1a is an illustration of a data navigation system driven by spoken natural language input, in accordance with one embodiment of the present invention. As shown, a user's voice input data is captured by a voice input device 102, such as a microphone. Preferably voice input device 102 includes a button or the like that can be pressed or held down to activate a listening mode, so that the system need not continually pay attention to, or be confused by, irrelevant background noise. In one preferred embodiment well-suited for the home entertainment setting, voice input device 102 is a portable remote control device with an integrated microphone, and the voice data is transmitted from device 102 preferably via infrared (or other wireless) link to communications box 104 (e.g., a set-top box or a similar communications device that is capable of retransmitting the raw voice data and/or processing the voice data) local to the user's environment and coupled to communications network 106. The voice data is then transmitted across network 106 to a remote server or servers 108. The voice data may preferably be transmitted in compressed digitized form, or alternatively (particularly where bandwidth constraints are significant) in analog format, e.g., via frequency modulated transmission, in the latter case being digitized upon arrival at remote server 108.
At remote server 108, the voice data is processed by request processing logic 300 in order to understand the user's request and construct an appropriate query or request for navigation of remote data source 110, in accordance with the interpretation process exemplified in FIG. 4 and FIG. 5 and discussed in greater detail below. For purposes of executing this process, request processing logic 300 comprises functional modules including speech recognition engine 310, natural language (NL) parser 320, query construction logic 330, and query refinement logic 340, as shown in FIG. 3. Data source 110 may comprise database(s), Internet/web site(s), or other electronic information repositories, and preferably resides on a central server or servers which may or may not be the same as server 108, depending on the storage and bandwidth needs of the application and the resources available to the practitioner. Data source 110 may include multimedia content, such as movies or other digital video and audio content, other various forms of entertainment data, or other electronic information. The contents of data source 110 are navigated, i.e., the contents are accessed and searched, for retrieval of the particular information desired by the user, using the processes of FIGS. 4 and 5 as described in greater detail below.
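The division of request processing logic 300 into recognition, parsing, and query-construction stages can be sketched in code. The following Python sketch is illustrative only: the class names, the toy regular-expression "parser", and the SQL-style query format are assumptions made for this example, not details taken from the patent.

```python
import re

# Hypothetical sketch of functional modules of request processing logic
# 300: speech recognition (310), NL parsing (320), query construction (330).

class SpeechRecognitionEngine:
    """Stands in for engine 310; a real engine decodes acoustic data."""
    def recognize(self, voice_data: bytes) -> str:
        # A real engine processes audio; for illustration we assume the
        # utterance has already been transcribed to text.
        return voice_data.decode("utf-8")

class NLParser:
    """Stands in for parser 320: extracts a structured representation."""
    def parse(self, text: str) -> dict:
        slots = {}
        m = re.search(r"directed by ([\w. ]+)", text, re.I)
        if m:
            slots["director"] = m.group(1).strip()
        if re.search(r"\bwestern\b", text, re.I):
            slots["genre"] = "western"
        return slots

class QueryConstructionLogic:
    """Stands in for logic 330: turns slots into a navigation query."""
    def construct(self, slots: dict) -> str:
        conditions = " AND ".join(f"{k}='{v}'" for k, v in sorted(slots.items()))
        return f"SELECT title FROM movies WHERE {conditions}"

def process_request(voice_data: bytes) -> str:
    text = SpeechRecognitionEngine().recognize(voice_data)
    slots = NLParser().parse(text)
    return QueryConstructionLogic().construct(slots)

print(process_request(b"I'd like to see a western film directed by Clint Eastwood"))
```

Query refinement logic 340 is omitted here; it would operate on the slot dictionary when parsing leaves the template under-constrained.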
Once the desired information has been retrieved from data source 110, it is electronically transmitted via network 106 to the user for viewing on client display device 112. In a preferred embodiment well-suited for the home entertainment setting, display device 112 is a television monitor or similar audiovisual entertainment device, typically in stationary position for comfortable viewing by users. In addition, in such preferred embodiment, display device 112 is coupled to or integrated with a communications box (which is preferably the same as communications box 104, but may also be a separate unit) for receiving and decoding/formatting the desired electronic information that is received across communications network 106.
Network 106 is a two-way electronic communications network and may be embodied in electronic communication infrastructure including coaxial (cable television) lines, DSL, fiber-optic cable, traditional copper wire (twisted pair), or any other type of hardwired connection. Network 106 may also include a wireless connection such as a satellite-based connection, cellular connection, or other type of wireless connection. Network 106 may be part of the Internet and may support TCP/IP communications, or may be embodied in a proprietary network, or in any other electronic communications network infrastructure, whether packet-switched or connection-oriented. A design consideration is that network 106 preferably provide suitable bandwidth depending upon the nature of the content anticipated for the desired application.
b. Client-End Processing of Spoken Input
FIG. 1b is an illustration of a data navigation system driven by spoken natural language input, in accordance with a second embodiment of the present invention. Again, a user's voice input data is captured by a voice input device 102, such as a microphone. In the embodiment shown in FIG. 1b, the voice data is transmitted from device 102 to request processing logic 300, hosted on a local speech processor, for processing and interpretation. In the preferred embodiment illustrated in FIG. 1b, the local speech processor is conveniently integrated as part of communications box 104, although implementation in a physically separate (but communicatively coupled) unit is also possible as will be readily apparent to those of skill in the art. The voice data is processed by the components of request processing logic 300 in order to understand the user's request and construct an appropriate query or request for navigation of remote data source 110, in accordance with the interpretation process exemplified in FIGS. 4 and 5 as discussed in greater detail below.
The resulting navigational query is then transmitted electronically across network 106 to data source 110, which
preferably resides on a central server or servers 108. As in FIG. 1a, data source 110 may comprise database(s), Internet/web site(s), or other electronic information repositories, and preferably may include multimedia content, such as movies or other digital video and audio content, other various forms of entertainment data, or other electronic information. The contents of data source 110 are then navigated, i.e., the contents are accessed and searched, for retrieval of the particular information desired by the user, preferably using the process of FIGS. 4 and 5 as described in greater detail below. Once the desired information has been retrieved from data source 110, it is electronically transmitted via network 106 to the user for viewing on client display device 112.
In one embodiment in accordance with FIG. 1b and well-suited for the home entertainment setting, voice input device 102 is a portable remote control device with an integrated microphone, and the voice data is transmitted from device 102 preferably via infrared (or other wireless) link to the local speech processor. The local speech processor is coupled to communications network 106, and also preferably to client display device 112 (especially for purposes of query refinement transmissions, as discussed below in connection with FIG. 4, step 412), and preferably may be integrated within or coupled to communications box 104. In addition, especially for purposes of a home entertainment application, display device 112 is preferably a television monitor or similar audiovisual entertainment device, typically in stationary position for comfortable viewing by users. In addition, in such preferred embodiment, display device 112 is coupled to a communications box (which is preferably the same as communications box 104, but may also be a physically separate unit) for receiving and decoding/formatting the desired electronic information that is received across communications network 106.
Design considerations favoring server-side processing and interpretation of spoken input requests, as exemplified in FIG. 1a, include minimizing the need to distribute costly computational hardware and software to all client users in order to perform speech and language processing. Design considerations favoring client-side processing, as exemplified in FIG. 1b, include minimizing the quantity of data sent upstream across the network from each client, as the speech recognition is performed before transmission across the network and only the query data and/or request needs to be sent, thus reducing the upstream bandwidth requirements.
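The bandwidth tradeoff can be made concrete by comparing the two upstream message shapes. The payload figures below are illustrative assumptions, not measurements from the patent: raw audio runs to tens of kilobytes per utterance, while a recognized and interpreted query is a short string.

```python
# Illustrative comparison of upstream payloads (sizes are assumptions).
# Server-side processing (FIG. 1a): the client ships raw voice data.
raw_voice_upstream = {"type": "voice", "payload_bytes": 64_000}  # e.g., ~4 s of audio

# Client-side processing (FIG. 1b): recognition and interpretation happen
# locally, so only the compact query text goes upstream.
query_upstream = {"type": "query", "payload": "genre=western&director=Clint+Eastwood"}

print(raw_voice_upstream["payload_bytes"])  # 64000
print(len(query_upstream["payload"]))       # 37
```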
c. Mobile Client Embodiment
A mobile computing embodiment of the present invention may be implemented by practitioners as a variation on the embodiments of either FIG. 1a or FIG. 1b. For example, as depicted in FIG. 2, a mobile variation in accordance with the server-side processing architecture illustrated in FIG. 1a may be implemented by replacing voice input device 102, communications box 104, and client display device 112, with an integrated, mobile, information appliance 202 such as a cellular telephone or wireless personal digital assistant (wireless PDA). Mobile information appliance 202 essentially performs the functions of the replaced components. Thus, mobile information appliance 202 receives spoken natural language input requests from the user in the form of voice data, and transmits that data (preferably via wireless data receiving station 204) across communications network 206 for server-side interpretation of the request, in similar fashion as described above in connection with FIG. 1. Navigation of data source 210 and retrieval of desired information likewise proceeds in an analogous manner as described above. Display information transmitted electronically back to the user across network 206 is displayed for the
user on the display of information appliance 202, and audio information is output through the appliance's speakers.
Practitioners will further appreciate, in light of the above teachings, that if mobile information appliance 202 is equipped with sufficient computational processing power, then a mobile variation of the client-side architecture exemplified in FIG. 1b may similarly be implemented. In that case, the modules corresponding to request processing logic 300 would be embodied locally in the computational resources of mobile information appliance 202, and the logical flow of data would otherwise follow in a manner analogous to that previously described in connection with FIG. 1b.
As illustrated in FIG. 2, multiple users, each having their own client input device, may issue requests, simultaneously or otherwise, for navigation of data source 210. This is equally true (though not explicitly drawn) for the embodiments depicted in FIGS. 1a and 1b. Data source 210 (or 110), being a network accessible information resource, has typically already been constructed to support access requests from simultaneous multiple network users, as known by practitioners of ordinary skill in the art. In the case of server-side speech processing, as exemplified in FIGS. 1a and 2, the interpretation logic and error correction logic modules are also preferably designed and implemented to support queuing and multi-tasking of requests from multiple simultaneous network users, as will be appreciated by those of skill in the art.
It will be apparent to those skilled in the art that additional implementations, permutations and combinations of the embodiments set forth in FIGS. 1a, 1b, and 2 may be created without straying from the scope and spirit of the present invention. For example, practitioners will understand, in light of the above teachings and design considerations, that it is possible to divide and allocate the functional components of request processing logic 300 between client and server. For example, speech recognition, in entirety or perhaps just early stages such as feature extraction, might be performed locally on the client end, perhaps to reduce bandwidth requirements, while natural language parsing and other necessary processing might be performed upstream on the server end, so that more extensive computational power need not be distributed locally to each client. In that case, corresponding portions of request processing logic 300, such as speech recognition engine 310 or portions thereof, would reside locally at the client as in FIG. 1b, while other component modules would be hosted at the server end as in FIGS. 1a and 2.
Further, practitioners may choose to implement each of the various embodiments described above on any number of different hardware and software computing platforms and environments and various combinations thereof, including, by way of just a few examples: a general-purpose hardware microprocessor such as the Intel Pentium series; operating system software such as Microsoft Windows/CE, Palm OS, or Apple Mac OS (particularly for client devices and client-side processing), or Unix, Linux, or Windows/NT (the latter three particularly for network data servers and server-side processing); and/or proprietary information access platforms such as Microsoft's WebTV or the Diva Systems video-on-demand system.
2. Processing Methodology
The present invention provides a spoken natural language interface for interrogation of remote electronic databases and retrieval of desired information. A preferred embodiment of the present invention utilizes the basic methodology outlined in the flow diagram of FIG. 4 in order to provide this interface. This methodology will now be discussed.
a. Interpreting Spoken Natural Language Requests
At step 402, the user's spoken request for information is initially received in the form of raw (acoustic) voice data by a suitable input device, as previously discussed in connection with FIGS. 1-2. At step 404 the voice data received from the user is interpreted in order to understand the user's request for information. Preferably this step includes performing speech recognition in order to extract words from the voice data, and further includes natural language parsing of those words in order to generate a structured linguistic representation of the user's request.
Speech recognition in step 404 is performed using speech recognition engine 310. A variety of commercial-quality speech recognition engines are readily available on the market, as practitioners will know. For example, Nuance Communications offers a suite of speech recognition engines, including Nuance 6, its current flagship product, and Nuance Express, a lower cost package for entry-level applications. As one other example, IBM offers the ViaVoice speech recognition engine, including a low-cost shrink-wrapped version available through popular consumer distribution channels. Basically, a speech recognition engine processes acoustic voice data and attempts to generate a text stream of recognized words.
Typically, the speech recognition engine is provided with a vocabulary lexicon of likely words or phrases that the recognition engine can match against its analysis of acoustical signals, for purposes of a given application. Preferably, the lexicon is dynamically adjusted to reflect the current user context, as established by the preceding user inputs. For example, if a user is engaged in a dialogue with the system about movie selection, the recognition engine's vocabulary may preferably be adjusted to favor relevant words and phrases, such as a stored list of proper names for popular movie actors and directors, etc. Whereas if the current dialogue involves selection and viewing of a sports event, the engine's vocabulary might preferably be adjusted to favor a stored list of proper names for professional sports teams, etc. In addition, a spee
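The context-dependent lexicon adjustment described above might be sketched as follows. The context names and word lists here are hypothetical examples for illustration, not vocabularies taken from the patent.

```python
# Hypothetical sketch: bias the recognizer's lexicon toward the current
# dialogue context, as established by the preceding user inputs.

BASE_LEXICON = {"show", "me", "the", "a", "channel"}

CONTEXT_LEXICONS = {
    "movies": {"eastwood", "spielberg", "western", "comedy"},
    "sports": {"giants", "lakers", "yankees", "score"},
}

def active_lexicon(context: str) -> set:
    """Return the base vocabulary plus the context-favored words."""
    return BASE_LEXICON | CONTEXT_LEXICONS.get(context, set())

print(sorted(active_lexicon("movies")))
```

A recognizer given this merged set would weight "eastwood" over an acoustically similar out-of-context word while the movie-selection dialogue is active.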