`
`(19) World Intellectual Property Organization
`International Bureau
`
`(43) International Publication Date
`12 July 2001 (12.07.2001)
`
`PCT
`
`(10) International Publication Number
`WO 01/50453 A2
`
(51) International Patent Classification7: G10L
`
(74) Agent: WEN, Liu; Liu & Liu LLP, 811 West 7th Street,
`Suite 1100, Los Angeles, CA 90017 (US).
`
(21) International Application Number: PCT/US01/00376
`
`(22) International Filing Date: 4 January 2001 (04.01.2001)
`
(25) Filing Language: English

(26) Publication Language: English
`
(30) Priority Data:
60/174,371    4 January 2000 (04.01.2000)    US
`
(71) Applicant (for all designated States except US): HEYANITA, INC. [US/US]; 6100 Wilshire Blvd., Suite 600, Los Angeles, CA 90048 (US).

(72) Inventors; and
(75) Inventors/Applicants (for US only): KOVATCH, Alexander, L. [US/US]; 1140 The Strand #B, Manhattan Beach, CA 90266 (US). KUWADEKAR, Sanjeev [US/US]; 11679 Amigo Avenue, Northridge, CA 91326 (US). DESAI, Adesh [US/US]; 12021 Doral Avenue, Northridge, CA 91326 (US). SODHI, Deepak [US/US]; 12041 Doral Avenue, Northridge, CA 91326 (US).
`
`(81) Designated States (national): AE, AG, AL, AM, AT, AU,
`AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CR, CU, CZ,
DE, DK, DM, DZ, EE, ES, FI, GB, GD, GE, GH, GM, HR,
`HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR,
`LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX, MZ,
`NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM,
`TR, TT, TZ, UA, UG, US, UZ, VN, YU, ZA, ZW.
`
`(84) Designated States (regional): ARIPO patent (GH, GM,
`KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZW), Eurasian
`patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European
patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE,
`IT, LU, MC, NL, PT, SE, TR), OAPI patent (BF, BJ, CF,
`CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG).
`
`Published:
`Without international search report and to be republished
`upon receipt of that report.
`
For two-letter codes and other abbreviations, refer to the "Guidance Notes on Codes and Abbreviations" appearing at the beginning of each regular issue of the PCT Gazette.
`
(54) Title: INTERACTIVE VOICE RESPONSE SYSTEM

(57) Abstract: A voice response system and method for navigating any network and using facilities and applications provided by various destination nodes within the network. No change is required in the applications provided by the destination nodes. A user can control and navigate the system with no prior knowledge of the system via self-discovery facilities provided as part of a learning system that adapts itself to the user.
`
`
`
`WO 01/50453
`
PCT/US01/00376
`
`INTERACTIVE VOICE RESPONSE SYSTEM
`
BACKGROUND OF THE INVENTION
`
1. Field of the Invention
`
The present invention relates to voice-based interactive user interfaces, particularly to interactive voice response systems, and more particularly to interactive voice response systems for accessing information from a computer network via remote telephony devices.
`
2. Description of Related Art
`
`Voice mail and other interactive voice response (IVR) systems allow a user to access
`
`audio information stored in a computer memory such as a hard disk. Typically, the
`
`audio information is stored in audio files created either by the user or for the user.
`
Conventional IVR systems use dual-tone multi-frequency (DTMF) signalling to allow the user to interact with the server through a standard telephone keypad. Pre-recorded audio information is available on IVR systems in the form of instructional phrases such as, "Please type in your account number followed by the pound sign." Pre-recorded audio is also used for introductory phrases such as "Your account balance is ..." At this point, the IVR computer may access a connected database that stores the requested account balance in numerical format, convert the numerical format to an audio format using a numerical text-to-speech engine, and state the account balance. This conversion from numerical format to audio format is extremely rigid and completely predefined. IVR systems are "closed" in that each IVR system is uniquely designed and not connected to a computer network, and IVR systems cannot be used interchangeably. Also, these IVR systems are designed specifically for audio interaction.
`
In contrast, audio/visual information on an audio/visual server in a computer network may be accessed using a personal computer. For example, a World Wide Web (Web) page on the Internet may be accessed using a computer linked through an Internet access provider, such as America On Line™ or Prodigy™, to a Web server.
`
The Internet has emerged as a mass communications, commerce and entertainment medium. Worldwide, people are enabled to interact, distribute and collect information, create community with individuals sharing similar interests and make purchases electronically. According to International Data Corporation ("IDC"), worldwide e-commerce totaled approximately $32 billion in 1998 and is expected to total over $425 billion in 2002. IDC also projects that worldwide Internet use will grow from approximately 142 million users in 1998 to 502 million users in 2003. In light of the proliferation of Internet usage, Forrester Research projects that global online advertising spending will reach $33 billion by 2004, while online advertising in the U.S. will grow from $2.8 billion in 1999 to $22 billion in 2004.
`
The growth of the Internet over the past five years has been nothing short of spectacular, particularly in the U.S. This proliferation, however, is largely confined to westernized countries. Recent studies by CommerceNet and the Stanford Institute for the Quantitative Study of Society have yielded some startling results:
`
• 92% of the world's population has no access to the Internet
• 90% of the U.S. population also has no access to the Internet at least half of the time
• People are more mobile than ever before
• Cell phone penetration is rapidly increasing
• A quarter of the U.S. population is apprehensive about or experiences difficulty using computers and the Internet
`
Further, in certain situations, use of a computer may not be feasible or access to a computer may not be possible. For example, a cellular telephone user driving an automobile may want to know about traffic in the surrounding area; however, the user cannot operate a computer while in the car. In situations such as this, an audio interface may be useful for obtaining information from the Internet or another computer network.
`
The telecommunications industry has experienced strong growth over the last decade. Despite its growth, the highly fragmented telecommunications industry is being changed by the emergence of the Internet as a global medium for communication, news, information and commerce. Substantial portions of the commerce and advertising markets remain uncaptured. The proliferation of Internet, cellular and telecommunications users, combined with the global reach and lower cost of distribution in such arenas, has created a powerful channel for delivering entertainment and information and conducting related advertising and commerce.
`
It is interesting to note that each area code enables nearly 8 million separate telephone numbers and the total number of area codes in service has nearly doubled since 1991, growing from 119 to 215, according to the FCC. In California alone, the California Public Utilities Commission expects the number of area codes in service to increase from 13 in January 1997 to 40 by 2002. A significant portion of this growth is due to the rapid proliferation of cellular and PCS telephone service. The number of U.S. wireless subscribers is expected to grow to 149 million in 2003, representing a wireless market penetration of 53%. The global wireless penetration is expected to increase from 425 million in 1999 to 953 million in 2003.
`
U.S. Patent No. 5,884,262 discloses a computer document audio access and conversion system that allows a user to access information originally formatted for audio/visual interfacing on a computer network via a simple telephone. Of course, files formatted specifically for audio interfacing can also be accessed by the system. A user can call a designated telephone number and request a file via dual-tone multi-frequency (DTMF) signaling or through voice commands. The system analyzes the request and accesses a predetermined document. The document may be in a standard document file format, such as hyper-text mark-up language (HTML), which is used on the World Wide Web. The document is analyzed by the system, and depending on the different types of formats used in the document, information is translated from an audio/visual format to an audio format and played to the user via the telephone interface. The document may contain links to other documents that can be invoked to access such other documents. In addition, the system can have a native command capability that allows the system to act independently of the accessed document contents to replay a document or carry out functions similar to those available in conventional web browsers.
`
The system disclosed in U.S. Patent No. 5,884,262 is limited to handling information originally formatted for audio/visual interfacing to a computer network via a telephone. There is a need for flexible interactive access to information that is not originally formatted for audio interfacing to a computer network via telephony devices. There is a need for interactive telephony access to a computer network, such as the Internet, to expand and enrich usage with unique and compelling content and products.
`
SUMMARY OF THE INVENTION
`
The present invention is directed to an interactive voice response system that permits users to access information that is not originally formatted for audio interfacing to an information exchange network, such as a computer network. The user's spoken utterance is analyzed and matched with an index of destinations. A list of valid destinations is produced and the user is then guided along the path with pre-recorded voice prompts. The user accessing the system can control the navigation via more speech and/or telephone keypad entry. The intent of the system is to be able to come up with a single choice destination amongst the many offered within the system.
`
The decision to choose a valid destination is driven by a variety of factors:

• User preferences
• User profile derived from usage pattern history
• User responses
• Advertiser rules
• Utterance match weightage
• Active context
• Call origin
• Call date/time
• Call length
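One way to picture how such factors could be combined is a weighted score per candidate destination. The following is only a minimal sketch under stated assumptions: the weight values, the factor keys, the `Candidate` class and the `rank_destinations` function are all hypothetical and do not come from the specification, which names the factors but not how they are weighed.

```python
from dataclasses import dataclass, field

# Hypothetical weights; the specification lists the factors but gives no numbers.
WEIGHTS = {
    "utterance_match": 0.5,
    "user_preference": 0.3,
    "active_context": 0.2,
}


@dataclass
class Candidate:
    name: str
    # Per-factor scores in the range 0.0..1.0, keyed like WEIGHTS.
    scores: dict = field(default_factory=dict)


def weighted_score(candidate):
    """Sum each factor score multiplied by its weight (missing factors count as 0)."""
    return sum(WEIGHTS[k] * candidate.scores.get(k, 0.0) for k in WEIGHTS)


def rank_destinations(candidates):
    """Return the candidates sorted by weighted score, best first."""
    return sorted(candidates, key=weighted_score, reverse=True)


candidates = [
    Candidate("weather", {"utterance_match": 0.9, "user_preference": 0.2}),
    Candidate("stocks", {"utterance_match": 0.4, "user_preference": 0.9,
                         "active_context": 1.0}),
]
best = rank_destinations(candidates)[0]
print(best.name)
```

In this sketch a weaker utterance match can still win when the user's preferences and the active context point strongly to one destination, which is the kind of trade-off the factor list suggests.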
`
The destination that is derived earlier is then accessed via spoken utterance and/or telephone keypad entry. User-specific information about the destination is derived from the user profile and the current call context and is used to offer access to the facilities offered by the destination. The facilities offered are specific to the application provided by the destination node.
`
User responses and queries are appropriately translated to the destination format and vice versa. All of the interaction is via concatenated pre-recorded or synthesized voice segments or fragments.
`
The inventive voice response system includes a number of novel functional and logical components, including, without limitation, a query engine, ad generator, web parser, profiler and replication engine, managed by a manager. These components may physically reside in the same or different servers.
`
The present invention will be described in reference to "HeyAnita", and in the alternate "Anita", which references relate to the commercial system launched by HeyAnita, Inc. (www.heyanita.com).
`
HeyAnita Inc.'s proposed solution is to enable the world's population to access, by voice, the wealth of information and applications available on the Internet, using any type of phone: rotary, touchtone or wireless. The rationale behind this vision is threefold:

1. Everyone knows how to use a telephone.
2. Most cities in the world already have reliable land-line phones as well as wireless infrastructure.
3. The easiest user interface is the speaker's natural language, both spoken and heard.
`
As competition within Internet and cellular usage intensifies, high traffic Internet portals, other e-commerce providers and traditional companies will continue to seek ways to expand and enrich their consumer offerings with unique and compelling content and products. This will create significant opportunities for HeyAnita to connect eyeballs to eardrums, thereby enabling these companies to target and reach a significantly expanded audience.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
FIG. 1 is a schematic representation of the Anita Server Architecture.

FIG. 2 is a schematic representation of the logical internal structure of the Anita Server.

FIG. 3 is a schematic representation of the overall HeyAnita global infrastructure that comprises Anita Servers in various countries, cities, and other locales.

FIG. 4 illustrates one embodiment of a "tree" structure that exemplifies how clarification questions would be asked while narrowing down a search.

FIG. 5 is a schematic representation of the HeyAnita Operating System.
`
DETAILED DESCRIPTION OF THE INVENTION
`
The present description is of the best presently contemplated mode of carrying out the invention. This description is made for the purpose of illustrating the general principles of the invention and should not be taken in a limiting sense. The scope of the invention is best determined by reference to the appended claims.
`
The present invention will be described below in reference to the Internet as an example of an information exchange network. The present invention is applicable to other types of information networks without departing from the scope and spirit of the present invention.
`
The HeyAnita Solution
`
HeyAnita enables individuals to surf the Internet from any phone, anywhere, anytime simply by using their voice. By utilizing its revolutionary HeyAnita operating system ("HeyAnita OS") technology and easy-to-use interface, HeyAnita establishes a comprehensive Voice Internet Portal ("VIP"), providing a voice interface to the Internet and allowing Internet and telephone users to access volumes of information, headline news, stock quotes, horoscopes, auctions, food delivery services, weather forecasts, sports scores, travel, shipping status, free integrated voice mail, and much more. In addition, HeyAnita enables e-commerce providers to add voice application (v-application) services to their existing platform and enables traditional corporations to efficiently compete in the digital arena. HeyAnita's unique solution increases traffic and commerce by providing access to individuals who do not use traditional Web-based browsers and also allows traditional Internet users access from locations lacking connectivity.
`
HeyAnita uses its proprietary technology and easy-to-use interface to create an informative and entertaining environment to attract and retain a large and loyal user base. In addition to its easily brandable name and concept, HeyAnita offers the most comprehensive array of voice-enabled services and allows phone users to access the Internet in multiple languages. Appendix B sets forth some of the application features possible with the inventive HeyAnita system.
`
`Architecture
`
HeyAnita Voice Platform is a set of components based on Microsoft Windows DNA architecture that allows developers and power-users to rapidly develop and deploy speech applications. The platform is an open environment that encapsulates a speech recognition engine, audio input sources (speaker, telephone) and audio output sources (speaker, telephone). It provides a vendor-independent interface to the voice application by providing a consistent interface to the various audio devices and the speech recognition engine.
`
Any application written to these interfaces can be ported from one device to another or from one speech recognition vendor to another merely by creating the appropriate object. For example, developers can develop and test their voice applications using a PC speaker and a microphone and then move the application to the telephone just by creating objects that support the telephone device.
`
The primary design considerations, features and functionalities for the HeyAnita Voice Platform are:
`
Device Transparency: HeyAnita Voice Platform is not tied to any hardware device. It provides plug-and-play flexibility to switch the underlying hardware without having to modify the actual application. Because of this, developers do not need any special hardware to write and test their applications. They will be able to write their applications on standard Microsoft Windows PCs and deploy them on any telephony platform.
`
Speech Recognition Engine Transparency: HeyAnita Voice Platform is not tied to any specific speech recognition engine. It provides plug-and-play flexibility to switch the underlying speech recognition engine without having to modify the actual application. Developers will be able to develop applications on any shareware speech recognition engine and later deploy them on any of the popular commercial speech recognition engines such as SpeechWorks or Nuance.
`
Language of Choice: HeyAnita Voice Platform does not force developers to learn a new language such as VXML. In addition to W3C VXML, HeyAnita Voice Platform allows developers to write applications in a language of their choice. For instance, any COM-compliant language such as Visual Basic, Visual C++ or Java can be used to develop applications on the HeyAnita Voice Platform.
`
Rich VUI: HeyAnita Voice Platform's open architecture allows developers to plug in third-party components to make their Voice User Interfaces richer. Developers do not have to settle for mediocre Voice Interfaces because of the limitations in the platform or language.
`
Location Transparency: HeyAnita Voice Platform allows developers to host their applications on any server on the Internet. All the pieces of HeyAnita Voice Platform are developed with location transparency in mind.
`
Multiple Language Support: HeyAnita Voice Platform has been designed to support international languages. Any application written on HeyAnita Voice Platform can be localized in any international language without any code changes.
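Localization without code changes usually means that prompts are looked up by key and language rather than hard-coded. The sketch below illustrates that idea only; the `PROMPTS` table, its file paths and the `prompt_uri` function are hypothetical and are not taken from the specification.

```python
# Hypothetical prompt catalog: one table of prompt URIs per language code.
PROMPTS = {
    "en": {
        "greeting": "prompts/en/greeting.wav",
        "goodbye": "prompts/en/goodbye.wav",
    },
    "es": {
        "greeting": "prompts/es/greeting.wav",
    },
}


def prompt_uri(key, language, fallback="en"):
    """Return the prompt for `language`, or the fallback-language prompt if missing."""
    localized = PROMPTS.get(language, {})
    if key in localized:
        return localized[key]
    return PROMPTS[fallback][key]


print(prompt_uri("greeting", "es"))
print(prompt_uri("goodbye", "es"))
```

Adding a language in this sketch means adding a table entry; the application code that calls `prompt_uri` stays the same.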
`
HeyAnita Voice Platform/HeyAnita OS:
`
HeyAnita OS is a multi-threaded surrogate process that hosts all the HeyAnita components and application objects. It takes care of all the thread management, monitoring and administration so that application writers do not have to worry about issues such as thread synchronization. Fig. 5 shows the components of the HeyAnita OS (100).
`
`HeyAnita Speech Objects (110):
`
These are a set of COM+ components that encapsulate hardware devices and speech recognition engines. Once the applications are written using these interfaces, they can be ported easily from one hardware device to another or from one recognition engine to another by simply replacing the corresponding HeyAnita Speech Object.
`
Speech Recognition Manager (SR) - This object encapsulates the speech recognition engine and the text-to-speech engines and provides a consistent interface to these engines in a vendor-independent fashion.

Audio Source (AI) - This object encapsulates the audio input device and provides a consistent interface in a device-independent fashion.

Audio Destination (AO) - This object encapsulates the audio output device and provides a consistent interface in a device-independent fashion.

Grammar Object (GO) - This object provides a consistent interface to provide grammar files for speech recognition. The grammar files can reside anywhere on the Internet. The grammar object refers to the grammar files by URI.

Prompt Object (PO) - This object provides a consistent interface to provide prompts in speech applications. The prompts can reside anywhere on the Internet. The prompt object refers to the prompt files by URI.
`
A typical voice application will create an SR object for speech recognition, an AI object as an audio input object, an AO object as an audio output, a GO object for recognizing speech and several PO objects for the various prompts it may require. The application can then play the prompts using the audio out object, accept input using the audio in object and recognize the input using the speech recognition object while the grammar object gives context to the speech recognition object.
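The interplay of these objects can be sketched with plain interfaces. The specification describes COM+ components; the Python below is only an analogy, and every class and method name in it (`AudioSource.listen`, `AudioDestination.play`, `SpeechRecognizer.recognize`, `ask`, and the stand-in implementations) is hypothetical. The "audio" is represented as text so the sketch runs without any hardware.

```python
from typing import Optional, Protocol, Set


class AudioDestination(Protocol):      # plays the role of the AO object
    def play(self, prompt_uri: str) -> None: ...


class AudioSource(Protocol):           # plays the role of the AI object
    def listen(self) -> str: ...


class SpeechRecognizer(Protocol):      # plays the role of the SR object
    def recognize(self, audio: str, grammar: Set[str]) -> Optional[str]: ...


class ListSpeaker:
    """Stand-in audio output: records prompt URIs instead of playing them."""
    def __init__(self):
        self.played = []

    def play(self, prompt_uri):
        self.played.append(prompt_uri)


class ScriptedMicrophone:
    """Stand-in audio input: returns a canned utterance (already text here)."""
    def __init__(self, utterance):
        self.utterance = utterance

    def listen(self):
        return self.utterance


class ExactMatchRecognizer:
    """Stand-in recognizer: accepts only phrases listed in the grammar."""
    def recognize(self, audio, grammar):
        phrase = audio.strip().lower()
        return phrase if phrase in grammar else None


def ask(prompt_uri, grammar, audio_in, audio_out, recognizer):
    """Play a prompt, capture input, and recognize it against the grammar."""
    audio_out.play(prompt_uri)
    return recognizer.recognize(audio_in.listen(), grammar)


speaker = ListSpeaker()
result = ask("http://example.com/prompts/main_menu.wav",
             {"weather", "stocks"},
             ScriptedMicrophone("Weather"), speaker, ExactMatchRecognizer())
print(result)
```

Because `ask` depends only on the three interfaces, swapping `ScriptedMicrophone` for a telephony-backed source would not require changing the application logic, which mirrors the porting claim made above.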
`
HeyAnita Agent (116):
`
HeyAnita Agent is a set of COM+ objects that allow speech applications to access data in a consistent manner. This makes speech applications transparent to the underlying data format. Applications access data in any OLE DB-compliant database, XML page, HTML page or WAP page using the same programming model.
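The "same programming model" idea can be illustrated with two data sources that expose one lookup method. This is a sketch under stated assumptions, not the HeyAnita Agent itself: `SqlQuotes`, `XmlQuotes`, `lookup`, `speak_quote`, the table layout and the XML shape are all invented for illustration, with SQLite standing in for an OLE DB database.

```python
import sqlite3
import xml.etree.ElementTree as ET


class SqlQuotes:
    """Quote lookup backed by a relational table (SQLite stands in for OLE DB)."""
    def __init__(self, conn):
        self.conn = conn

    def lookup(self, symbol):
        row = self.conn.execute(
            "SELECT price FROM quotes WHERE symbol = ?", (symbol,)
        ).fetchone()
        return float(row[0]) if row else None


class XmlQuotes:
    """Quote lookup backed by an XML document."""
    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)

    def lookup(self, symbol):
        for quote in self.root.findall("quote"):
            if quote.get("symbol") == symbol:
                return float(quote.get("price"))
        return None


def speak_quote(source, symbol):
    """Application code: identical regardless of where the data lives."""
    price = source.lookup(symbol)
    if price is None:
        return f"No quote found for {symbol}."
    return f"{symbol} is trading at {price:.2f}."


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (symbol TEXT, price REAL)")
conn.execute("INSERT INTO quotes VALUES ('ACME', 12.5)")
sql_source = SqlQuotes(conn)
xml_source = XmlQuotes('<quotes><quote symbol="ACME" price="12.50"/></quotes>')

print(speak_quote(sql_source, "ACME"))
print(speak_quote(xml_source, "ACME"))
```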
`
`Speech Applications (114):
`
Speech applications are written as a set of COM+ components or VXML files. These applications can be written in any COM-compliant language such as Visual Basic, Visual C++ or Java. It is also possible to write an application using multiple languages, e.g., it is possible to make use of a VXML file inside a Visual Basic speech application. This flexibility allows developers to write voice applications faster and in the language they are most comfortable with.
`
Applications written to HeyAnita speech platforms don't have to reside on the same server on which the platform resides. These COM+ components can be installed locally on the telephony server or any remote machine. In fact, these applications can reside anywhere on the Internet. Applications on the Internet communicate with the platform using SOAP.
`
`HeyAnita Tools/Wizards (118):
`
HeyAnita tools are a set of design time controls (DTCs) that allow developers to quickly generate Speech Applications in a drag-and-drop fashion. Developers do not have to learn a new language such as VXML. All the code is generated by these design time controls. These tools are provided for all components included in the HeyAnita framework. In addition to the DTCs, add-ins are provided for Office to facilitate easy authoring of content.
`
Many components from the HeyAnita framework have associated metadata and data elements. Tools are provided for easy management of this content. Application wizards are provided for popular functions, such as a "shopping cart", "get a stock quote" etc. In addition, since the HeyAnita wizard model is a Visual Studio DTC, developers can create their own wizards or extend existing ones.
`
HeyAnita Framework (112):
`
HeyAnita framework provides a number of plug-and-play COM+ components to facilitate rapid development and deployment of voice applications. Using these components as building blocks and writing just the code to glue them together, programmers can create voice applications in a matter of hours. All the necessary voice user interface, grammars and functionality are implemented by these components. All the components contain the necessary audio prompts and grammars. Developers, however, have the ability to override these by customizing their prompts or grammars.
`
This is an extensible, open framework. It allows developers to add new value-added components to this framework by simply exposing a set of published COM+ interfaces. Most of the HeyAnita portal applications are built using this framework.
`
Depending on the functionality, these components fall into one of the following categories:

• Basic Components: These are basic building blocks for constructing a voice application. When developers use these components, they automatically get consistent and easy-to-use voice interfaces across all their applications.
• Data-bound components: These components implement a standardized voice interface on top of commonly used data elements.
• Value-added components: Value-added components provide all the bells and whistles for making the voice user interface entertaining and fun to use.
`
`Basic Components:
`
The HeyAnita framework may include the following basic components:

1. Sentence: Plays back a set of sentences.
2. Input: Gets voice command input from the user.
3. Menu: Implements smart voice menu.
4. Number: Plays back a number.
5. Currency: Plays back currency.
6. Date: Plays back date.
7. Time: Plays back time.
8. Credit Card: Gets credit card information from the user.
9. Social Security Number: Gets social security number from the user.
10. Name: Gets name information from the user.
11. Address: Gets address information from the user.
12. VXML Parser: Parses and executes a W3C compatible VXML stream.
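Components such as Number and Currency play back values as concatenated speech fragments. The following is a minimal sketch of how a value might be broken into fragment names, assuming one pre-recorded prompt per English word; the function names `number_fragments` and `currency_fragments`, the 0 to 999 range limit and the fragment vocabulary are illustrative assumptions, not details from the specification.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_fragments(n):
    """Return the fragment names to concatenate for an integer 0 <= n <= 999."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return [ONES[n]]
    parts = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        parts += [ONES[hundreds], "hundred"]
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        parts.append(TENS[tens])
        if ones:
            parts.append(ONES[ones])
    elif rest:
        parts.append(ONES[rest])
    return parts


def currency_fragments(total_cents):
    """Return fragments for a dollar amount given in cents (dollars up to 999)."""
    dollars, cents = divmod(total_cents, 100)
    parts = number_fragments(dollars) + ["dollars"]
    if cents:
        parts += ["and"] + number_fragments(cents) + ["cents"]
    return parts


print(number_fragments(315))
print(currency_fragments(1250))
```

Each returned name would map to one audio file, so the playback step reduces to playing the files in order.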
`
`Data-bound Components:
`
The HeyAnita framework may include the following data-bound components:

1. Stock Quote: Retrieves individual stock quotes.
2. Portfolio: Retrieves quotes for all the stocks in the portfolio. Also allows the users to manage their portfolios.
3. Weather: Retrieves weather information.
4. Movie Show Times: Retrieves movie show times.
5. Movie Previews: Retrieves movie previews.
6. Store/Service Locator: Locates a store or a service.
7. Status Inquiry: Checks the status of an order or shipment.
8. Yellow Pages: Yellow Pages inquiries.

Developers will be able to bind these to any OLE DB provider or XML repository to retrieve the necessary data.
`
Value-added Components:

The HeyAnita framework may include the following value-added components:

1. AdMixer: Selects advertisements based on the user's preferences and history.
2. Randomize: Randomizes selection of audio prompts (from a pre-defined set).
3. Joke-of-the-day: Selects a joke of the day.
4. Login: Allows users to log in.
5. Registration: Allows users to register.
6. Debug: Adds debugging trace to the voice application.
7. Notifications/Alerts: Sends outbound notifications/alerts.
`
`Anita Server
`
One of the primary components of the HeyAnita system is the Anita Server 120 (Fig. 1) that implements the HeyAnita Voice Platform, which consists of several components to implement the following functionality and features:
`
1. Wait for an incoming call
2. When a call is received, listen to the user's voice as commands and/or free-form speech or telephone keypad entry
3. Decompose spoken utterance into proprietary commands using proprietary word-mapping techniques and voice recognition grammar
4. Ask relevant questions in order to determine user preferences and context
5. Identify the destination using proprietary search algorithms within the destination tree
6. Navigate to the destination and retrieve requested information
7. Translate retrieved information into voice prompts
8. Generate commercials based on user preferences, usage history patterns and context
9. Intermix commercials and information in a seamless manner to generate a unique, entertaining experience for the user
10. Return information back to the user in the form of concatenated speech fragments and/or synthesized voice
`
ANITA SERVER - ARCHITECTURE
`
Fig. 1 is a schematic representation of the Anita Server Architecture. The Anita Server 120 is a fault-tolerant, scalable, remotely manageable, multi-threaded NT Service. It comprises the following components:
`
a. Anita Telephone Interface (1)

Implements call management features such as ring and hangup detection, call switch-over, call transfer, call waiting and tromboning. This also implements functionality to transform computer audio files (.wav files) to audio streams that can be played on a telephone 15 and to detect user utterances on the phone line to pass them on to the Anita Speech Recognition Engine. This may be implemented using Dialogic system software version DNA 3.2 and Nuance Speech recognition system version 6.2.
`
b. Anita Speech Recognition Engine (2)

Translates spoken utterances to a set of text phrases. This engine supports a number of languages and is speaker independent. This may be implemented using Nuance Speech recognition system version 6.2. This engine serves as input to the Anita Natural Language Engine, described below.
`
c. Anita Natural Language Engine (3)

Converts natural language sentences to a set of structured commands. These structured commands are then used to drive the Anita Query Engine. The Anita Natural Language Engine, in conjunction with the Anita Query Engine, identifies destination nodes and the applications that are available to the user. This engine serves as input to the Anita Query Engine, described below.
`
d. Anita Query Engine (4)

Maps commands to an application defined using the HeyAnita Speech Objects 110 and Speech Applications 114, or the HeyAnita function library (see example in Appendix A) and state machine definition language. An example of an application would be to obtain weather information using the Yahoo! Web site. This would provide a user of the system the capability of listening to weather information for a set of cities or zip codes. The Anita Query Engine does the following:

1) Play voice prompts for the user to exactly identify an application
2) Generate web URLs to initiate execution of the selected application
3) Hand over control to the Anita State Machine and Web Parser, described below
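Step 2, generating a web URL for the selected application, might look like the following. This is a minimal sketch, assuming a registry that maps a structured command to a base URL and its required parameters; the `APPLICATIONS` table, the `weather.example.com` address and the `build_request_url` function are hypothetical and do not describe the actual Anita Query Engine or any real web site's interface.

```python
from urllib.parse import urlencode

# Hypothetical application registry; real entries would come from the
# system's index of destinations.
APPLICATIONS = {
    "weather": {
        "base_url": "https://weather.example.com/forecast",
        "params": ["zip"],
    },
}


def build_request_url(command, slots):
    """Build the URL that starts the application selected by `command`."""
    app = APPLICATIONS[command]
    missing = [p for p in app["params"] if p not in slots]
    if missing:
        # In a voice dialog this is where a clarifying prompt would be played.
        raise KeyError(f"missing slots: {missing}")
    query = urlencode({p: slots[p] for p in app["params"]})
    return f"{app['base_url']}?{query}"


url = build_request_url("weather", {"zip": "90048"})
print(url)
```

Using `urlencode` keeps user-supplied values, such as a spoken city name containing spaces, safely escaped in the query string.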
`
e. Anita State Machine and Web Parser (8)

Anita State Machine and Web Parser executes state machines written using a proprietary function library. This retrieves information from web sites and other applications that are enabled for this operation. In addition, its web-parsing function also allows the Anita Query Engine to retrieve web pages from any conventional web site on the Internet and convert unstructured HTML data into meaningful structured data. It is not mandatory to make changes to existing web sites to make them work with Anita State Machine and Web Parser. An example of this would be the operations performed to pass in a zip code to the Yahoo! web site, execute the form to retrieve the results, select and format the results, and play relevant information in the form of concatenated speech fragments. In this scenario the Yahoo! web site was not modified to support the operations, nor was it aware that a voice-enabled application was using its HTML-based services.
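Converting HTML into structured data can be sketched with the Python standard library's `html.parser` module. The example below is an illustration only: the `CellCollector` class, the sample table and the city/temperature layout are assumptions, and the proprietary function library described above is not shown.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            if not self.rows:          # tolerate a <td> with no enclosing <tr>
                self.rows.append([])
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


# Hypothetical page fragment standing in for a retrieved weather page.
html = ("<table><tr><td>Los Angeles</td><td>72</td></tr>"
        "<tr><td>Northridge</td><td>75</td></tr></table>")

parser = CellCollector()
parser.feed(html)
parser.close()

forecast = {city: int(temp) for city, temp in parser.rows}
sentences = [f"{city}, {temp} degrees." for city, temp in forecast.items()]
print(sentences)
```

The structured `forecast` mapping is what a later step would turn into speech fragments; the source page needs no changes for this to work, which matches the point made above.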
`
f. Anita Profiler (10)

During each user session, Anita Query Engine transfers relevant information to Anita Profiler. Anita Profiler captures and filters this information to build a repository of user preferences, navigational history and usage patterns. Anita Profiler recognizes the phone number of the incoming caller and can work without any user registration.
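A profile keyed by the caller's phone number can be sketched with a counter per caller. The `Profiler` class, its `record` and `top_destinations` methods, and the sample phone number are hypothetical; the specification does not describe the repository's actual structure.

```python
from collections import Counter


class Profiler:
    """Track which destinations each caller visits, keyed by phone number."""

    def __init__(self):
        self._history = {}

    def record(self, caller_number, destination):
        """Count one visit; no prior registration of the caller is needed."""
        self._history.setdefault(caller_number, Counter())[destination] += 1

    def top_destinations(self, caller_number, n=3):
        """Return up to n destinations, most visited first."""
        counts = self._history.get(caller_number, Counter())
        return [dest for dest, _ in counts.most_common(n)]


profiler = Profiler()
for dest in ["weather", "stocks", "weather"]:
    profiler.record("+13105550100", dest)

print(profiler.top_destinations("+13105550100"))
print(profiler.top_destinations("+13105550199"))
```

An unknown caller simply yields an empty history, so the same lookup works for first-time and returning callers.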
`
g. Anita Ad Generator/Mixer (9)

Implements complex algorithms to create an entertaining experience for the
`
`user by mixing advertisements and information in a seamless manner. This algorithm
`