US006513063B1

(12) United States Patent
Julia et al.

(10) Patent No.: US 6,513,063 B1
(45) Date of Patent: Jan. 28, 2003

(54) ACCESSING NETWORK-BASED ELECTRONIC INFORMATION THROUGH SCRIPTED ONLINE INTERFACES USING SPOKEN INPUT

(75) Inventors: Luc Julia, Menlo Park, CA (US); Dimitris Voutsas, Thessaloniki (GR); Adam Cheyer, Palo Alto, CA (US)

(73) Assignee: SRI International, Menlo Park, CA (US)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/524,868
(22) Filed: Mar. 14, 2000

Related U.S. Application Data

(63) Continuation-in-part of application No. 09/225,198, filed on Jan. 5, 1999.
(60) Provisional application No. 60/124,718, filed on Mar. 17, 1999, provisional application No. 60/124,720, filed on Mar. 17, 1999, and provisional application No. 60/124,719, filed on Mar. 17, 1999.

(51) Int. Cl.7 .............................. G06F 13/00
(52) U.S. Cl. .............................. 709/219; 709/227
(58) Field of Search ...................... 709/219, 227

(56) References Cited

U.S. PATENT DOCUMENTS

5,197,005 A     3/1993  Schwartz et al. ........... 364/419
5,386,556 A     1/1995  Hedin et al. .............. 395/600
5,434,777 A     7/1995  Luciw ..................... 364/419
5,519,608 A     5/1996  Kupiec .................... 364/419.08
5,608,624 A     3/1997  Luciw ..................... 395/794
5,721,938 A     2/1998  Stuckey ................... 395/754
5,729,659 A     3/1998  Potter .................... 395/2.79
5,748,974 A     5/1998  Johnson ................... 395/759
5,774,859 A     6/1998  Houser et al. ............. 704/275
5,794,050 A     8/1998  Dahlgren et al. ........... 395/708
5,802,526 A     9/1998  Fawcett et al. ............ 707/104
5,805,775 A     9/1998  Eberman et al. ............ 395/12
5,855,002 A    12/1998  Armstrong ................. 704/270
6,003,072 A    12/1999  Gerritsen et al. .......... 709/218
6,012,030 A     1/2000  French-St. George et al. .. 704/275
6,026,388 A     2/2000  Liddy et al. .............. 707/1
6,173,279 B1 *  1/2001  Levin et al. .............. 707/5
6,192,338 B1    2/2001  Haszto et al. ............. 704/257

FOREIGN PATENT DOCUMENTS

EP  0 895 396 A2   2/1999  ............ H04M/3/50
EP  1 094 406 A2   4/2001  ............ G06F/17/30

OTHER PUBLICATIONS

http://www.ai.sri.com/natural-language/projects/arpa-sls/nat-lang.html, "Gemini: A Natural Language System for Spoken-Language Understanding" and "Interleaving Syntax and Semantics in an Efficient Bottom-Up Parser."
http://www.ai.sri.com/natural-language/projects/arpa-sls/spnl-int.html, "Combining Linguistic and Statistical Knowledge Sources in Natural Language Processing for ATIS."
(List continued on next page.)

Primary Examiner - David Wiley
(74) Attorney, Agent, or Firm - Moser, Patterson & Sheridan, LLP; Kin-Wah Tong, Esq.

(57) ABSTRACT

A system, method, and article of manufacture are provided for navigating an electronic data source that has a scripted online interface by means of spoken input. When a spoken request is received from a user, it is interpreted. A navigation query is constructed based on the interpretation of the speech input and a template extracted by scraping an online scripted interface to the data source. The resulting interpretation of the request is thereupon used to automatically construct an operational navigation query to retrieve the desired information from one or more electronic network data sources, which is then transmitted to a client device of the user.

102 Claims, 7 Drawing Sheets
[Front-page figure: the FIG. 4 flow diagram: receive spoken NL request (402); interpret request (404); identify/select data source (405); construct navigation query (406); deficiencies? (407); navigate data source (408); refine query? (409); if no, transmit and display to client (410); solicit additional (multimodal) user input (412).]
OTHER PUBLICATIONS

http://www.ai.sri.com/~oaa/applications.html, "InfoWiz: An Animated Voice Interactive Information System."
http://www.ai.sri.com/~lesaf/commandtalk.html, "CommandTalk: A Spoken-Language Interface for Battlefield Simulations," 1997, by Robert Moore, John Dowding, Harry Bratt, J. Mark Gawron, Yonael Gorfu and Adam Cheyer, in "Proceedings of the Fifth Conference on Applied Natural Language Processing," Washington, DC, pp. 1-7, Association for Computational Linguistics.
"The CommandTalk Spoken Dialogue System," 1999, by Amanda Stent, John Dowding, Jean Mark Gawron, Elizabeth Owen Bratt and Robert Moore, in "Proceedings of the Thirty-Seventh Annual Meeting of the ACL," pp. 183-190, University of Maryland, College Park, MD, Association for Computational Linguistics.
http://www.ai.sri.com/~lesaf/commandtalk.html, "Interpreting Language in Context in CommandTalk," 1999, by John Dowding and Elizabeth Owen Bratt and Sharon Goldwater, in "Communicative Agents: The Use of Natural Language in Embodied Systems," pp. 63-67, Association for Computing Machinery (ACM) Special Interest Group on Artificial Intelligence (SIGART), Seattle, WA.
Stent, Amanda et al., "The CommandTalk Spoken Dialogue System," SRI International.
Moore, Robert et al., "CommandTalk: A Spoken-Language Interface for Battlefield Simulations," Oct. 23, 1997, SRI International.
Dowding, John et al., "Interpreting Language in Context in CommandTalk," Feb. 5, 1999, SRI International.
http://www.ai.sri.com/~oaa/infowiz.html, "InfoWiz: An Animated Voice Interactive Information System," May 8, 2000.
Dowding, John, "Interleaving Syntax and Semantics in an Efficient Bottom-up Parser," SRI International.
Moore, Robert et al., "Combining Linguistic and Statistical Knowledge Sources in Natural-Language Processing for ATIS," SRI International.
Dowding, John et al., "Gemini: A Natural Language System For Spoken-Language Understanding," SRI International.
Wyard, P.J. et al., "Spoken Language Systems-Beyond Prompt and Response," BT Technology Journal, vol. 14, No. 1, 1996, pp. 187-207.
Notification of Transmittal of the International Search Report and the International Search Report, dated Sep. 7, 2001, filed in International Appln. No. PCT/US01/07924.

* cited by examiner
[Sheet 1 of 7, FIG. 1a: server-side processing embodiment: voice input device 102, communications box 104, network 106, servers 108 through 108n with request processing logic 300 (see FIG. 3), data source 110, client display device 112.]
[Sheet 2 of 7, FIG. 1b: client-side processing embodiment.]
[Sheet 3 of 7, FIG. 2: mobile computing embodiment: mobile information appliance, wireless data receiving station 204, network 206, server 208 with request processing logic 300 (see FIG. 3), data source 210.]
[Sheet 4 of 7, FIG. 3: request processing logic 300, comprising speech recognition engine, natural language parser, query construction logic, and query refinement logic.]
[Sheet 5 of 7, FIG. 4: flow diagram: receive spoken NL request (402); interpret request (404); identify/select data source (405); construct navigation query (406); navigate data source (408); refine query? (409); if no, transmit and display to client (410); solicit additional (multimodal) user input (412).]
[Sheet 6 of 7, FIG. 5: (from step 406, FIG. 4) scrape the online scripted form to extract an input template; instantiate the input template using the interpretation of step 404; (to step 407, FIG. 4).]
[Sheet 7 of 7, FIG. 6: community of distributed, collaborating electronic agents coordinated by a facilitator.]
ACCESSING NETWORK-BASED ELECTRONIC INFORMATION THROUGH SCRIPTED ONLINE INTERFACES USING SPOKEN INPUT

This is a Continuation-In-Part of co-pending U.S. patent application Ser. No. 09/225,198, filed Jan. 5, 1999, Provisional U.S. Patent Application No. 60/124,718, filed Mar. 17, 1999, Provisional U.S. Patent Application No. 60/124,720, filed Mar. 17, 1999, and Provisional U.S. Patent Application No. 60/124,719, filed Mar. 17, 1999, from which applications priority is claimed and these applications are incorporated herein by reference.
BACKGROUND OF THE INVENTION

The present invention relates generally to the navigation of electronic data by means of spoken natural language requests, and to feedback mechanisms and methods for resolving the errors and ambiguities that may be associated with such requests.

As global electronic connectivity continues to grow, and the universe of electronic data potentially available to users continues to expand, there is a growing need for information navigation technology that allows relatively naive users to navigate and access desired data by means of natural language input. In many of the most important markets, including the home entertainment arena as well as mobile computing, spoken natural language input is highly desirable, if not ideal. As just one example, the proliferation of high-bandwidth communications infrastructure for the home entertainment market (cable, satellite, broadband) enables delivery of movies-on-demand and other interactive multimedia content to the consumer's home television set. For users to take full advantage of this content stream ultimately requires interactive navigation of content databases in a manner that is too complex for user-friendly selection by means of a traditional remote-control clicker. Allowing spoken natural language requests as the input modality for rapidly searching and accessing desired content is an important objective for a successful consumer entertainment product in a context offering a dizzying range of database content choices. As further examples, this same need to drive navigation of (and transaction with) relatively complex data warehouses using spoken natural language requests applies equally to surfing the Internet/Web or other networks for general information, multimedia content, or e-commerce transactions.

In general, the existing navigational systems for browsing electronic databases and data warehouses (search engines, menus, etc.) have been designed without navigation via spoken natural language as a specific goal. So today's world is full of existing electronic data navigation systems that do not assume browsing via natural spoken commands, but rather assume text and mouse-click inputs (or in the case of TV remote controls, even less). Simply recognizing voice commands within an extremely limited vocabulary and grammar, the spoken equivalent of button/click input (e.g., speaking "channel 5" selects TV channel 5), is really not sufficient by itself to satisfy the objectives described above. In order to deliver a true "win" for users, the voice-driven front-end must accept spoken natural language input in a manner that is intuitive to users. For example, the front-end should not require learning a highly specialized command language or format. More fundamentally, the front-end must allow users to speak directly in terms of what the user ultimately wants, e.g., "I'd like to see a Western film directed by Clint Eastwood," as opposed to speaking in terms of arbitrary navigation structures (e.g., hierarchical layers of menus, commands, etc.) that are essentially artifacts reflecting constraints of the pre-existing text/click navigation system. At the same time, the front-end must recognize and accommodate the reality that a stream of naive spoken natural language input will, over time, typically present a variety of errors and/or ambiguities: e.g., garbled/unrecognized words (did the user say "Eastwood" or "Easter"?) and under-constrained requests ("Show me the Clint Eastwood movie"). An approach is needed for handling and resolving such errors and ambiguities in a rapid, user-friendly, non-frustrating manner.

What is needed is a methodology and apparatus for rapidly constructing a voice-driven front-end atop an existing, non-voice data navigation system, whereby users can interact by means of intuitive natural language input not strictly conforming to the step-by-step browsing architecture of the existing navigation system, and wherein any errors or ambiguities in user input are rapidly and conveniently resolved. The solution to this need should be compatible with the constraints of a multi-user, distributed environment such as the Internet/Web or a proprietary high-bandwidth content delivery network; a solution contemplating one-at-a-time user interactions at a single location is insufficient, for example.
SUMMARY OF THE INVENTION

The present invention addresses the above needs by providing a system, method, and article of manufacture for navigating network-based electronic data sources in response to spoken input requests. When a spoken input request is received from a user, it is interpreted, such as by using a speech recognition engine to extract speech data from acoustic voice signals, and using a language parser to linguistically parse the speech data. The interpretation of the spoken request can be performed on a computing device locally with the user or remotely from the user. The resulting interpretation of the request is thereupon used to automatically construct an operational navigation query to retrieve the desired information from one or more electronic network data sources, which is then transmitted to a client device of the user. If the network data source is a database, the navigation query is constructed in the format of a database query language.
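
To make the query-construction step concrete, the following is a minimal sketch, not part of the patent text; the Interpretation class, the slot names, and the "movies" table are assumptions invented for the example:

    # Illustrative sketch only: maps a hypothetical parsed interpretation
    # (attribute/value slots) onto a SQL navigation query. The patent
    # prescribes no particular schema or query language beyond "a
    # database query language."
    from dataclasses import dataclass, field

    @dataclass
    class Interpretation:
        # e.g. {"genre": "Western", "director": "Clint Eastwood"}
        slots: dict = field(default_factory=dict)

    def build_navigation_query(interp: Interpretation, table: str = "movies"):
        """Render filled slots as a parameterized SQL query."""
        conditions = " AND ".join(f"{attr} = ?" for attr in interp.slots) or "1 = 1"
        sql = f"SELECT title FROM {table} WHERE {conditions}"
        return sql, tuple(interp.slots.values())

    # "I'd like to see a Western film directed by Clint Eastwood"
    request = Interpretation({"genre": "Western", "director": "Clint Eastwood"})
    print(build_navigation_query(request))
    # ('SELECT title FROM movies WHERE genre = ? AND director = ?',
    #  ('Western', 'Clint Eastwood'))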
Typically, errors or ambiguities emerge in the interpretation of the spoken request, such that the system cannot instantiate a complete, valid navigational template. This is to be expected occasionally, and one preferred aspect of the invention is the ability to handle such errors and ambiguities in a relatively graceful and user-friendly manner. Instead of simply rejecting such input and defaulting to traditional input modes or simply asking the user to try again, a preferred embodiment of the present invention seeks to converge rapidly toward instantiation of a valid navigational template by soliciting additional clarification from the user as necessary, either before or after a navigation of the data source, via multimodal input, i.e., by means of menu selection or other input modalities including and in addition to spoken input. This clarifying, multi-modal dialogue takes advantage of whatever partial navigational information has been gleaned from the initial interpretation of the user's spoken request. This clarification process continues until the system converges toward an adequately instantiated navigational template, which is in turn used to navigate the network-based data and retrieve the user's desired information. The retrieved information is transmitted across the network and presented to the user on a suitable client display device.
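
A minimal sketch of this clarification loop follows, assuming a hypothetical slot-based template and an ask_user callback standing in for whichever modality (menu selection, click, or speech) supplies the answer; none of these names come from the patent:

    # Illustrative sketch of converging on an instantiated navigational
    # template by soliciting clarification for missing slots, while
    # keeping whatever partial information was already gleaned.
    REQUIRED_SLOTS = ("genre", "director")

    def clarify_until_instantiated(slots: dict, ask_user) -> dict:
        """Solicit additional (multimodal) input until the template
        is adequately instantiated."""
        while True:
            missing = [s for s in REQUIRED_SLOTS if not slots.get(s)]
            if not missing:
                return slots  # template fully instantiated; ready to navigate
            # ask_user may present a menu, accept speech, etc.
            slots[missing[0]] = ask_user(f"Please specify the {missing[0]}.")

    # e.g. clarify_until_instantiated({"genre": "Western"}, input)
    # reuses the partial "Western" slot and prompts only for the director.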
BRIEF DESCRIPTION OF THE DRAWINGS

The invention, together with further advantages thereof, may best be understood by reference to the following description taken in conjunction with the accompanying drawings in which:

FIG. 1a illustrates a system providing a spoken natural language interface for network-based information navigation, in accordance with an embodiment of the present invention with server-side processing of requests;

FIG. 1b illustrates another system providing a spoken natural language interface for network-based information navigation, in accordance with an embodiment of the present invention with client-side processing of requests;

FIG. 2 illustrates a system providing a spoken natural language interface for network-based information navigation, in accordance with an embodiment of the present invention for a mobile computing scenario;

FIG. 3 illustrates the functional logic components of a request processing module in accordance with an embodiment of the present invention;

FIG. 4 illustrates a process utilizing spoken natural language for navigating an electronic database in accordance with one embodiment of the present invention;

FIG. 5 illustrates a process for constructing a navigational query for accessing an online data source via an interactive, scripted (e.g., CGI) form; and

FIG. 6 illustrates an embodiment of the present invention utilizing a community of distributed, collaborating electronic agents.
DETAILED DESCRIPTION OF THE INVENTION

1. System Architecture

a. Server-End Processing of Spoken Input

FIG. 1a is an illustration of a data navigation system driven by spoken natural language input, in accordance with one embodiment of the present invention. As shown, a user's voice input data is captured by a voice input device 102, such as a microphone. Preferably voice input device 102 includes a button or the like that can be pressed or held down to activate a listening mode, so that the system need not continually pay attention to, or be confused by, irrelevant background noise. In one preferred embodiment well-suited for the home entertainment setting, voice input device 102 is a portable remote control device with an integrated microphone, and the voice data is transmitted from device 102 preferably via infrared (or other wireless) link to communications box 104 (e.g., a set-top box or a similar communications device that is capable of retransmitting the raw voice data and/or processing the voice data) local to the user's environment and coupled to communications network 106. The voice data is then transmitted across network 106 to a remote server or servers 108. The voice data may preferably be transmitted in compressed digitized form, or alternatively, particularly where bandwidth constraints are significant, in analog format (e.g., via frequency modulated transmission), in the latter case being digitized upon arrival at remote server 108.

At remote server 108, the voice data is processed by request processing logic 300 in order to understand the user's request and construct an appropriate query or request for navigation of remote data source 110, in accordance with the interpretation process exemplified in FIG. 4 and FIG. 5 and discussed in greater detail below. For purposes of executing this process, request processing logic 300 comprises functional modules including speech recognition engine 310, natural language (NL) parser 320, query construction logic 330, and query refinement logic 340, as shown in FIG. 3. Data source 110 may comprise database(s), Internet/web site(s), or other electronic information repositories, and preferably resides on a central server or servers which may or may not be the same as server 108, depending on the storage and bandwidth needs of the application and the resources available to the practitioner. Data source 110 may include multimedia content, such as movies or other digital video and audio content, other various forms of entertainment data, or other electronic information. The contents of data source 110 are navigated, i.e., the contents are accessed and searched for retrieval of the particular information desired by the user, using the processes of FIGS. 4 and 5 as described in greater detail below.
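
For illustration, the four modules just named could be chained along the following lines. This is a sketch under assumed interfaces: the patent identifies modules 310 through 340 and their roles, but prescribes no API, so every class and method name here is an assumption:

    # Illustrative sketch of request processing logic 300 as a pipeline
    # of its four functional modules. The component interfaces are
    # hypothetical stand-ins.
    class RequestProcessingLogic:
        def __init__(self, engine, parser, constructor, refiner):
            self.engine = engine            # speech recognition engine 310
            self.parser = parser            # natural language (NL) parser 320
            self.constructor = constructor  # query construction logic 330
            self.refiner = refiner          # query refinement logic 340

        def process(self, voice_data, user_io):
            words = self.engine.recognize(voice_data)       # acoustic data -> text
            interpretation = self.parser.parse(words)       # text -> structured request
            query = self.constructor.build(interpretation)  # request -> navigation query
            while query.has_deficiencies():                 # cf. step 407 of FIG. 4
                query = self.refiner.refine(query, user_io) # solicit multimodal input
            return query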
Once the desired information has been retrieved from data source 110, it is electronically transmitted via network 106 to the user for viewing on client display device 112. In a preferred embodiment well-suited for the home entertainment setting, display device 112 is a television monitor or similar audiovisual entertainment device, typically in stationary position for comfortable viewing by users. In addition, in such preferred embodiment, display device 112 is coupled to or integrated with a communications box (which is preferably the same as communications box 104, but may also be a separate unit) for receiving and decoding/formatting the desired electronic information that is received across communications network 106.

Network 106 is a two-way electronic communications network and may be embodied in electronic communication infrastructure including coaxial (cable television) lines, DSL, fiber-optic cable, traditional copper wire (twisted pair), or any other type of hardwired connection. Network 106 may also include a wireless connection such as a satellite-based connection, cellular connection, or other type of wireless connection. Network 106 may be part of the Internet and may support TCP/IP communications, or may be embodied in a proprietary network, or in any other electronic communications network infrastructure, whether packet-switched or connection-oriented. A design consideration is that network 106 preferably provide suitable bandwidth depending upon the nature of the content anticipated for the desired application.
b. Client-End Processing of Spoken Input

FIG. 1b is an illustration of a data navigation system driven by spoken natural language input, in accordance with a second embodiment of the present invention. Again, a user's voice input data is captured by a voice input device 102, such as a microphone. In the embodiment shown in FIG. 1b, the voice data is transmitted from device 102 to request processing logic 300, hosted on a local speech processor, for processing and interpretation. In the preferred embodiment illustrated in FIG. 1b, the local speech processor is conveniently integrated as part of communications box 104, although implementation in a physically separate (but communicatively coupled) unit is also possible as will be readily apparent to those of skill in the art. The voice data is processed by the components of request processing logic 300 in order to understand the user's request and construct an appropriate query or request for navigation of remote data source 110, in accordance with the interpretation process exemplified in FIGS. 4 and 5 as discussed in greater detail below.

The resulting navigational query is then transmitted electronically across network 106 to data source 110, which
preferably resides on a central server or servers 108. As in FIG. 1a, data source 110 may comprise database(s), Internet/web site(s), or other electronic information repositories, and preferably may include multimedia content, such as movies or other digital video and audio content, other various forms of entertainment data, or other electronic information. The contents of data source 110 are then navigated, i.e., the contents are accessed and searched for retrieval of the particular information desired by the user, preferably using the process of FIGS. 4 and 5 as described in greater detail below. Once the desired information has been retrieved from data source 110, it is electronically transmitted via network 106 to the user for viewing on client display device 112.

In one embodiment in accordance with FIG. 1b and well-suited for the home entertainment setting, voice input device 102 is a portable remote control device with an integrated microphone, and the voice data is transmitted from device 102 preferably via infrared (or other wireless) link to the local speech processor. The local speech processor is coupled to communications network 106, and also preferably to client display device 112 (especially for purposes of query refinement transmissions, as discussed below in connection with FIG. 4, step 412), and preferably may be integrated within or coupled to communications box 104. In addition, especially for purposes of a home entertainment application, display device 112 is preferably a television monitor or similar audiovisual entertainment device, typically in stationary position for comfortable viewing by users. In addition, in such preferred embodiment, display device 112 is coupled to a communications box (which is preferably the same as communications box 104, but may also be a physically separate unit) for receiving and decoding/formatting the desired electronic information that is received across communications network 106.

Design considerations favoring server-side processing and interpretation of spoken input requests, as exemplified in FIG. 1a, include minimizing the need to distribute costly computational hardware and software to all client users in order to perform speech and language processing. Design considerations favoring client-side processing, as exemplified in FIG. 1b, include minimizing the quantity of data sent upstream across the network from each client, as the speech recognition is performed before transmission across the network and only the query data and/or request needs to be sent, thus reducing the upstream bandwidth requirements.

c. Mobile Client Embodiment

A mobile computing embodiment of the present invention may be implemented by practitioners as a variation on the embodiments of either FIG. 1a or FIG. 1b. For example, as depicted in FIG. 2, a mobile variation in accordance with the server-side processing architecture illustrated in FIG. 1a may be implemented by replacing voice input device 102, communications box 104, and client display device 112 with an integrated, mobile, information appliance 202 such as a cellular telephone or wireless personal digital assistant (wireless PDA). Mobile information appliance 202 essentially performs the functions of the replaced components. Thus, mobile information appliance 202 receives spoken natural language input requests from the user in the form of voice data, and transmits that data (preferably via wireless data receiving station 204) across communications network 206 for server-side interpretation of the request, in similar fashion as described above in connection with FIG. 1. Navigation of data source 210 and retrieval of desired information likewise proceeds in an analogous manner as described above. Display information transmitted electronically back to the user across network 206 is displayed for the
user on the display of information appliance 202, and audio information is output through the appliance's speakers.

Practitioners will further appreciate, in light of the above teachings, that if mobile information appliance 202 is equipped with sufficient computational processing power, then a mobile variation of the client-side architecture exemplified in FIG. 1b may similarly be implemented. In that case, the modules corresponding to request processing logic 300 would be embodied locally in the computational resources of mobile information appliance 202, and the logical flow of data would otherwise follow in a manner analogous to that previously described in connection with FIG. 1b.

As illustrated in FIG. 2, multiple users, each having their own client input device, may issue requests, simultaneously or otherwise, for navigation of data source 210. This is equally true (though not explicitly drawn) for the embodiments depicted in FIGS. 1a and 1b. Data source 210 (or 110), being a network accessible information resource, has typically already been constructed to support access requests from simultaneous multiple network users, as known by practitioners of ordinary skill in the art. In the case of server-side speech processing, as exemplified in FIGS. 1a and 2, the interpretation logic and error correction logic modules are also preferably designed and implemented to support queuing and multi-tasking of requests from multiple simultaneous network users, as will be appreciated by those of skill in the art.

It will be apparent to those skilled in the art that additional implementations, permutations and combinations of the embodiments set forth in FIGS. 1a, 1b, and 2 may be created without straying from the scope and spirit of the present invention. For example, practitioners will understand, in light of the above teachings and design considerations, that it is possible to divide and allocate the functional components of request processing logic 300 between client and server. For example, speech recognition, in its entirety or perhaps just early stages such as feature extraction, might be performed locally on the client end, perhaps to reduce bandwidth requirements, while natural language parsing and other necessary processing might be performed upstream on the server end, so that more extensive computational power need not be distributed locally to each client. In that case, corresponding portions of request processing logic 300, such as speech recognition engine 310 or portions thereof, would reside locally at the client as in FIG. 1b, while other component modules would be hosted at the server end as in FIGS. 1a and 2.
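
A toy sketch of that division of labor follows, with hypothetical stubs for each stage; the 160-sample framing, the placeholder recognition result, and all function names are invented for the example and are not the patent's specification:

    # Illustrative sketch: splitting request processing logic 300 so that
    # early-stage feature extraction runs on the client while recognition
    # and parsing run upstream on the server.

    def extract_features(voice_data: bytes) -> list:
        # Stand-in for client-side acoustic feature extraction; the
        # resulting frames are far smaller than the raw audio.
        return [sum(voice_data[i:i + 160]) for i in range(0, len(voice_data), 160)]

    class SpeechServer:
        def recognize(self, features: list) -> list:
            return ["show", "me", "westerns"]  # placeholder recognition result

        def interpret(self, features: list) -> dict:
            words = self.recognize(features)   # finish speech recognition
            return {"words": words}            # NL parsing would build on this

    def client_request(voice_data: bytes, server: SpeechServer) -> dict:
        features = extract_features(voice_data)  # runs locally on the client
        return server.interpret(features)        # only compact features go upstream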
Further, practitioners may choose to implement each of the various embodiments described above on any number of different hardware and software computing platforms and environments and various combinations thereof, including, by way of just a few examples: a general-purpose hardware microprocessor such as the Intel Pentium series; operating system software such as Microsoft Windows/CE, Palm OS, or Apple Mac OS (particularly for client devices and client-side processing), or Unix, Linux, or Windows/NT (the latter three particularly for network data servers and server-side processing); and/or proprietary information access platforms such as Microsoft's WebTV or the Diva Systems video-on-demand system.

2. Processing Methodology

The present invention provides a spoken natural language interface for interrogation of remote electronic databases and retrieval of desired information. A preferred embodiment of the present invention utilizes the basic methodology outlined in the flow diagram of FIG. 4 in order to provide this interface. This methodology will now be discussed.
a. Interpreting Spoken Natural Language Requests

At step 402, the user's spoken request for information is initially received in the form of raw (acoustic) voice data by a suitable input device, as previously discussed in connection with FIGS. 1-2. At step 404 the voice data received from the user is interpreted in order to understand the user's request for information. Preferably this step includes performing speech recognition in order to extract words from the voice data, and further includes natural language parsing of those words in order to generate a structured linguistic representation of the user's request.

Speech recognition in step 404 is performed using speech recognition engine 310. A variety of commercial-quality speech recognition engines are readily available on the market, as practitioners will know. For example, Nuance Communications offers a suite of speech recognition engines, including Nuance 6, its current flagship product, and Nuance Express, a lower cost package for entry-level applications. As one other example, IBM offers the ViaVoice speech recognition engine, including a low-cost shrink-wrapped version available through popular consumer distribution channels. Basically, a speech recognition engine processes acoustic voice data and attempts to generate a text stream of recognized words.

Typically, the speech recognition engine is provided with a vocabulary lexicon of likely words or phrases that the recognition engine can match against its analysis of acoustical signals, for purposes of a given application. Preferably, the lexicon is dynamically adjusted to reflect the current user context, as established by the preceding user inputs. For example, if a user is engaged in a dialogue with the system about movie selection, the recognition engine's vocabulary may preferably be adjusted to favor relevant words and phrases, such as a stored list of proper names for popular movie actors and directors, etc. Whereas if the current dialogue involves selection and viewing of a sports event, the engine's vocabulary might preferably be adjusted to favor a stored list of proper names for professional sports teams, etc. In addition, a spee
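
A minimal sketch of such context-dependent lexicon adjustment follows; the Recognizer class, its interface, and the example word lists are invented for the illustration and are not an actual engine API from the patent:

    # Illustrative sketch of biasing a recognizer's vocabulary lexicon
    # toward the current dialogue context, as described above.
    CONTEXT_LEXICONS = {
        "movies": ["western", "director", "Clint Eastwood", "comedy"],
        "sports": ["49ers", "Giants", "kickoff", "score"],
    }

    class Recognizer:
        def __init__(self, base_vocabulary):
            self.vocabulary = set(base_vocabulary)

        def adjust_for_context(self, context: str) -> None:
            """Favor words and phrases relevant to the current context."""
            self.vocabulary |= set(CONTEXT_LEXICONS.get(context, []))

    recognizer = Recognizer(["show", "me", "the"])
    recognizer.adjust_for_context("movies")  # user is in a movie-selection dialogue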
