`Imielinski et al.
`
US006240448B1

(10) Patent No.: US 6,240,448 B1
(45) Date of Patent: May 29, 2001
`
`(54) METHOD AND SYSTEM FOR AUDIO
`ACCESS TO INFORMATION IN A WIDE
`AREA COMPUTER NETWORK
`
(75) Inventors: Tomasz Imielinski, North Brunswick;
     Aashu Virmani, Bridgewater, both of NJ (US)

(73) Assignee: Rutgers, The State University of New
     Jersey, Piscataway, NJ (US)

(*) Notice: Subject to any disclaimer, the term of this
     patent is extended or adjusted under 35
     U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/091,591
(22) PCT Filed: Dec. 20, 1996
(86) PCT No.: PCT/US96/20409
     § 371 Date: May 7, 1999
     § 102(e) Date: May 7, 1999
(87) PCT Pub. No.: WO97/23973
     PCT Pub. Date: Jul. 3, 1997

Related U.S. Application Data
(60) Provisional application No. 60/009,153, filed on Dec. 22,
     1995.

(51) Int. Cl.7 ............................. H04L 12/16; H04M 1/64
(52) U.S. Cl. ........................ 709/218; 709/246; 379/90.01
(58) Field of Search ..................................... 709/229, 217,
     709/218, 219, 246; 379/88.15, 90.01, 93.01,
     93.25, 101.01

(56) References Cited

U.S. PATENT DOCUMENTS

5,297,249   3/1994   Bernstein et al. ................... 345/356
5,305,375   4/1994   Sagara et al. ..................... 379/88.27
5,349,636   9/1994   Irribarren .......................... 379/88.15
5,351,276   9/1994   Doll, Jr. et al. .................. 379/88.17

Primary Examiner: Dung C. Dinh
(74) Attorney, Agent, or Firm: Mathews, Collins,
     Shepherd & Gould, P.A.

(57) ABSTRACT
`
Method and system for representing different types of information
on a wide area network in a form suitable for access
`over an audio interface (12). Audio enabled pages (29) are
`created to link particular text data, which data can be from
`the World Wide Web (WWW). The audio enabled pages (29)
`can be retrieved by an audio web server (16, 18) for
`interpreting the pages into audio which is displayed at the
`audio interface (12). Audio input means are created to
traverse the links of the audio enabled pages (29). A user can
`use the keypad of the phone (12) or spoken commands to
`traverse the audio menus. In addition, dynamic audio input
`means can be created to selectively traverse a database of the
`WWW information.
`
`31 Claims, 5 Drawing Sheets
`
`RingCentral Ex-1005, p. 1
`RingCentral v. Estech
`IPR2021-00574
`
`
`
[Sheet 1 of 5, FIG. 1: system 10 with user 11 and audio web server 16, connected using HTTP, FTP, TCP/IP and other protocols.]
`
`
`
[Sheet 2 of 5, FIG. 2]
`
`
`
[Sheet 3 of 5, FIG. 3: block diagram of the audio web server, with connections 17 and 19, containing:
  50  Call processor
  52  Text to audio conversion
  54  Interpreter of audio enabled pages
  56  Audio web manager
  58  Memory for locally stored audio enabled pages
  60  Memory for accessed audio enabled pages
  62  Memory for user profiles]
`
`
`
[Sheet 4 of 5]

FIG. 4 (method 100)
  102  Enter text to be displayed when user accesses audio enabled pages
  104  Create links between user's keypad and audio enabled pages
  106  Create user audio input menu requesting user to enter data from the keypad to access data files
  108  Create user audio input menus to traverse links of audio enabled pages

FIG. 5 (method 200)
  202  Specify a number of target attributes of the database
  204  Specify a number of identifier attributes identifying the target attributes
  206  Determine rules controlling the flow of user input into user audio menus
`
`
`
[Sheet 5 of 5]

FIG. 6
  302  Select preference of attributes
  304  Create graph of attributes by identifying attributes which immediately follow one another
  306  Add to the graph a sort of attributes in decreasing order of selectivity
  308  Topologically sort graph determined in block 306
  310  Determine attributes of zero degree
  312  If ambiguities exist for attributes of zero degree, perform ambiguity resolution
  314  Create first audio enabled page for determined attributes of zero degree
  316  Determine attributes immediately following from attributes of zero degree
  318  If ambiguities exist for double edge attributes, perform ambiguity resolution
  320  Create subsequent audio enabled page for double edge attributes
  322  Update graph by removing determined attributes of zero degree, double edge attributes and alternate attributes from graph until no attributes are represented
  324  End of topological sort
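The FIG. 6 flow can be read as a topological sort that peels attributes off the graph in layers, one menu page per layer. The following is a minimal Python sketch of that reading only. It assumes that "zero degree" means zero in-degree, and it leaves out selectivity ordering, ambiguity resolution, and double edge and alternate attributes. The function and parameter names are illustrative and do not come from the patent.

```python
from collections import defaultdict


def layered_order(attributes, follows):
    """Group attributes into layers; each layer could back one audio menu page.

    `follows` holds (earlier, later) pairs meaning `later` is asked after
    `earlier`. Both names are hypothetical.
    """
    indegree = {a: 0 for a in attributes}
    successors = defaultdict(list)
    for earlier, later in follows:
        successors[earlier].append(later)
        indegree[later] += 1

    remaining = set(attributes)
    layers = []
    while remaining:
        # Attributes with no unprocessed predecessor (assumed reading of
        # "attributes of zero degree").
        ready = sorted(a for a in remaining if indegree[a] == 0)
        if not ready:
            raise ValueError("cycle in attribute ordering")
        layers.append(ready)
        # Remove the processed attributes from the graph, as in block 322.
        for a in ready:
            remaining.discard(a)
            for nxt in successors[a]:
                indegree[nxt] -= 1
    return layers


if __name__ == "__main__":
    print(layered_order(
        ["market", "index", "price"],
        [("market", "index"), ("index", "price")],
    ))
```

Each returned layer lists attributes that could be prompted for together before the graph is updated and the next layer is determined.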
`
`
`
`METHOD AND SYSTEM FOR AUDIO
`ACCESS TO INFORMATION IN A WIDE
`AREA COMPUTER NETWORK
`
This application claims the benefit of Provisional Application No.
60/009,153, filed Dec. 22, 1995.
`
`BACKGROUND OF THE INVENTION
`
1. Field of the Invention
The present invention relates to a method and system for
audio access to resources in a wide area public network,
such as the Internet.
2. Description of the Prior Art
The number of users of wide area computer networks
such as the Internet and the World Wide Web (WWW) is
growing exponentially. A number of information services
and resources are currently offered on the Internet and
WWW. The underlying framework of these services is that
a user enters a query on a computer which has access to the
Internet. The user can input an address of the resource or can
use a search engine for conducting a search of available
resources. The query is processed and a connection between
the user and a site on the Internet is established with a
conventional protocol, such as http. A set of answers is
generated and returned to the user's computer using the
protocol. For example, stock quote searchable resources have
been developed which include information directed to the
prices of stocks in different stock markets. A user can query
a particular stock, e.g., IBM, or index, e.g., utilities, and the
resource returns a set of prices satisfying the query. One
problem with accessing the WWW resources is that a user
must have access to a computer which is connected to the
Internet. However, the majority of the world's population
does not have access to a computer. Also, a user who is away
from the office or home where their computer is located,
and is without a portable laptop computer,
is not in a position to access the Internet.
There exist current state of the art audio products on the
WWW for embedding audio into a Web page or transmitting
full duplex phone conversation over the Internet. The WWW
is formed of Web pages authored in a language referred to
as hypertext mark-up language (HTML). The products
digitize the audio or phone conversation with a sound card. The
digitized audio is encoded for compressing the audio data in
order to provide real time connections over the Internet. The
encoded data can be embedded into a Web page by linking
the Web page to the encoded data with functions specified in
HTML. After a user accesses the Internet with a computer,
the user's Web browser receives the encoded audio data
when the user accesses the Web page. A user can play the
audio on a media player at the user's site by clicking
on an audio link in the Web page. Alternatively, the encoded
audio data can be directly transmitted across the Internet,
e.g., in an Internet phone product. An Internet phone
application decodes the encoded audio data and plays the
transmitted data at a subscriber's phone. The above-described
applications have the drawback that the encoded audio
stream is a large amount of data which must be encoded and,
even after encoding, the audio data may be slow in traversing
the Internet. In addition, the current state of the art audio
products require the use of a computer to access the audio
services.
One current state of the art attempt to overcome the
aforementioned problem of not having access to a computer
has been to provide a service which recites verbatim an
existing WWW page to a user. The service can be accessed
by the user through establishing a voice connection to the
service. This solution has the drawback that existing Web
pages include text with embedded links, for example, to
other addresses of resources, which is difficult to read and to
be understood by the user. Also, numeric and spreadsheet
data which are typically represented in a two dimensional
visual table are difficult to convert to speech and, even if the
table is converted to speech, the amount of data in the table
is difficult for the user to understand and remember.
In summary, existing approaches to make information
available on the World Wide Web accessible over an audio
interface involve an automatic translation of HTML documents
into audio. However, this process cannot be fully automated,
and in general such an approach is not extensible beyond
simple text-only pages. For instance, it cannot be used to
represent numeric data, spreadsheets, tables and databases
effectively.

SUMMARY OF THE INVENTION
`
Briefly described, the present invention relates to a system
and method for providing access to Internet resources with
a telephone. The process uses defined steps to represent
information to be presented over audio. The system and
method are used to represent different kinds of information
existing on the web in a form suitable for access over a
variety of audio interfaces which include touch tone and
speech recognition.
Audio enabled pages are created to link particular text
data, which data can be from conventional Web pages.
Audio enabled pages are stored at an audio web server or on
individual user machines. An authoring language, audio text
manipulation language, referred to as ATML, can be used for
generating the audio enabled pages. An audio web server
translates the audio enabled pages into audio. The audio
enabled pages form an "Audio Web", similar to conventional
HTML authored pages forming the current Internet
and World Wide Web. Accordingly, the system of the present
invention has the advantage that it is not necessary for the
user to have an Internet account to use the Audio Web. A user
can access audio enabled pages through the Audio Web with
a conventional audio interface, such as a phone, and can
create audio enabled pages with the audio interface.
The system includes at least one audio web server which can be
accessed by the user with a telephone. The audio web servers
include call processing features for processing the user's
call. Also, the audio web servers provide text to speech
conversions to translate text data represented in audio
enabled pages into audio. In addition, the audio web servers
can include conventional speech to text converting hardware
or software. The speech to text conversion is used to enable
a user to access an audio enabled page with spoken input.
Instead of entering "My stock" from the keypad, the user can
say "My stock" and obtain an audio enabled page. Speech
recognition has the advantage of making search and indexing
tools easier to use. The ambiguities of speech recognition
can be resolved using audio input menus listing all
possible interpretations of the ambiguous word. Typically, a
user's call is routed to the audio web server which is closest
in proximity to the user's telephone network, i.e., within the
user's local calling area.
In operation, a user calls a dedicated name, for example,
1-800-AWEB, and is connected to an audio web server. Upon
connecting with the audio web server, the user is queried by
the audio web server with audio input means to allow the
user to select data from an audio enabled page. Selection by
the user results in an audio enabled page being brought into
the audio web server. If the audio enabled page is located
remotely from the audio web server, a protocol such as http
is used to connect the audio web server to a universal
resource locator (URL). The URL is a physical address in
terms of the WWW where the audio enabled page is actually
residing. The audio enabled page is received at the audio
web server and converted into audio at the audio web server.
The main challenge in authoring the audio enabled pages
is that it takes a number of audio enabled pages to represent
the same information as one conventional "visual" page
authored in HTML. For example, a two dimensional
conventional table which may be accessed on the Internet has
four columns representing different markets (NASDAQ,
NYSE, AMEX and DOW) and a number of rows
corresponding to different indexes (utilities, transportation ... ).
Audio access to the above table can be created by asking the
user first for the column and then for the row (or vice versa).
Audio access for the two dimensional table uses N×M+2
pages (one page to ask for the market selection, another for
the index selection and N×M pages to store the proper values
to be read, in which N is the number of columns and M is the
number of rows in the table). Accordingly, the Audio Web
formed of audio enabled pages is more "dense", or requires
more pages to represent a two dimensional table, than the
WWW.
The present invention provides a method for generating
audio enabled pages in which links are established between
a user's keypad of a phone and audio enabled pages. A user
audio input menu is created for requesting the user to enter
data from the keypad to access information of data files. In
addition, the method creates user audio input menus to
traverse links of audio enabled pages. Alternatively, speech
recognition of commands spoken by the user can be used to
establish user input accessing and creating audio pages.
A method is also provided for generating dynamic audio
input menus to selectively traverse a database. Attributes of
the database are defined as Identifier attributes, specifying
headings of the database, and Target attributes, specifying
values in the database. Rules are determined for controlling
the flow of user input into the dynamic input audio menus.
The rules include ambiguity resolution of user input values.
A graphical representation of the attributes can be used to
determine the selective ordering of attributes in the dynamic
audio input menu.
The system and method of the present invention can
establish user profiles so that a user can form one larger
concatenated number or keyword to reach a particular audio
enabled page directly without having to listen to the
sequence of the dynamic audio input menus. Thus, for
example, after being verified, the user can dial "MY
STOCK" and this will automatically provide him with the
latest quotes of his stock portfolio. Additionally, similar to
HTML, the user can form a "Hot" list of the most frequently
visited audio enabled pages.
In summary, a user can access audio enabled pages which
can be located at a URL to form an Audio Web. It will be
realized that only certain types of currently provided Web
pages are good candidates for conversion into an audio
enabled page. For example, an audio enabled page would
have limited use for converting image data of a WWW page.
Current WWW information services include numerous
applications, such as financial (stock quotes, mortgage
calculators), weather, traffic and entertainment information,
which can be advantageously accessed using Audio Web
technology from any phone, including pay phones, without a
user having to carry a computer, no matter how small the
computer could be made. Additionally, Audio Web resources
can be advantageously accessed by handicapped (blind)
users when the audio enabled pages are combined with
speech recognition. The invention will be more fully
described by reference to the following drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of a system for providing
audio access in a wide area computer network of the present
invention.
FIG. 2 is a flow diagram of the progress of an audio
application using the system shown in FIG. 1.
FIG. 3 is a schematic diagram of an audio web server of
the present invention.
FIG. 4 is a flow diagram of a method for generating audio
enabled pages of the present invention.
FIG. 5 is a flow diagram of a method for generating
dynamic input audio menus to selectively traverse a
database of the present invention.
FIG. 6 is a flow diagram of a method for selecting the
order of attributes in the dynamic input audio menus of the
present invention.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENT

`During the course of this description, like numbers will be
`used to identify like elements according to the different
`figures which illustrate the invention.
`FIG. 1 provides a schematic diagram of a system for audio
`access in a wide area computer network 10 in accordance
`with the teachings of the present invention. User 11 uses
`phone 12 to connect to a conventional telephone network 14.
Phone 12 can be any audio interface including cellular
phones, speaker phones, pay phones and touch tone phones.
Telephone network 14 connects to an audio web server 16,
18 over audio connection 17. Preferably, telephone network
14 connects to the audio web server 16 or 18 which is closest
in proximity to user 11. For example, telephone network 14
would first try to connect to the audio web server 16 or 18
which is located in the same area code or local exchange as
phone 12. If no audio web servers are in the same area code
or local exchange, telephone network 14 would connect to
an audio web server 16 or 18 located in an area code
different from that of phone 12. It will be appreciated that any number
`of audio web servers can be established based on the number
`of users 11 of system 10.
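The routing preference described above can be sketched as follows. This is an illustrative Python sketch, not the telephone network's actual routing logic. The `AudioWebServer` record, its fields, and the choice to prefer a matching local exchange over a matching area code are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AudioWebServer:
    # Hypothetical record describing where a server is reachable.
    name: str
    area_code: str
    exchange: str


def pick_server(
    phone_area_code: str,
    phone_exchange: str,
    servers: Sequence[AudioWebServer],
) -> Optional[AudioWebServer]:
    """Prefer a server near the caller, falling back to any other server."""
    # First choice (assumed): same area code and same local exchange.
    for server in servers:
        if server.area_code == phone_area_code and server.exchange == phone_exchange:
            return server
    # Second choice: same area code.
    for server in servers:
        if server.area_code == phone_area_code:
            return server
    # Otherwise connect to a server in a different area code, if any exists.
    return servers[0] if servers else None
```

A call such as `pick_server("732", "932", servers)` returns the closest match from the supplied list, or `None` when no servers are registered.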
`Audio web servers 16, 18 connect to a wide area computer
`network 20, i.e., the Internet over data connection 19. Audio
`web servers 16, 18 receive text data which can be audio
`enabled pages or text files over data connection 19 from
wide area computer network 20 using conventional protocols
such as http, ftp, and tcp/ip. Audio web servers 16, 18
convert the received text data into audio. Audio web servers
`16, 18 transmit the audio over audio connection 17 to
`telephone network 14 which provides the audio to user 11.
`Conventional information services or content providers
`reside on non-audio WWW servers 22, 23, in wide area
`computer network 20. For example, information services
can include a home page dispatcher, e.g., magazines and
newspapers; search engines, e.g., Infoseek; and generic
home page owners. Currently, there are many companies on
`the WWW that allow users to query their databases. The
`WWW services typically return to the user a set of answers
`corresponding to the user's query from the database of a
WWW service. For example, a stock market information
provider may have a WWW site that allows a user to obtain
`a stock quote for a particular stock on a particular stock
`exchange.
`Audio enabled pages 24 are generated in a language
`referred to as audio text markup language (ATML) for
`enabling information to be displayed as audio to user 11.
`Audio enabled pages 24 can be generated from data residing
`on non-audio WWW servers 22, 23. Audio enabled pages
can be linked in order to include data residing in other audio
enabled pages 24. Audio enabled pages 24 can reside in
memory of non-audio WWW servers 22, 23. Alternatively,
audio enabled pages 24 can reside at an audio Web user's
location 25 or at audio web servers 16, 18. Audio enabled
pages 24 are interpreted by audio web servers 16, 18 for
displaying the linked data as audio to telephone network 14,
`which will be described in more detail below.
`FIG. 2 illustrates a flow diagram of progress of an audio
`application using system 10. In block 30, user 11 uses
`telephone network 14 to call an audio web server 16, 18. For
example, user 11 can call an audio web server 16, 18 by dialing
1-800-GOA-UWEB with phone 12. Audio web server 16, 18
responds to the call and provides a list of options to the user
in block 32. For example, the options can include traversing
any audio enabled page 24 by entering the address of the
audio enabled page 24 with the keypad of phone 12. In block
34, user 11 makes selections until a query is resolved to an
audio enabled page 24 using touch tone interactive dialogue.
For example, user 11 can make selections from the keypad
of phone 12. The following is a typical scenario of the
interaction between user 11 and audio web server 16, 18 in
`blocks 32 and 34.
`User: Dials 1-800-GOA-UWEB
`Audio Web Server: Responds, "Hello, you have reached
`1-800-AWEB site. Please identify yourself."
`User: Types his user name with phone keypad
`Audio Web Server: Responds, "Password"
`User: Types his password with phone keypad
Audio Web Server: Queries, "If you know the address of
the page you want to reach, please dial it now."
`Audio Web Server: Queries, "Otherwise Press 1 for
`E-mail."
`Audio Web Server: Queries, "Press 2 for HOT LIST
`(BookMarks)"
`If the option for E-mail is selected by the user, the user's
`E-mail can be sent over phone 12. The Hot List option is
`used to keep a profile of a user's preferred audio enabled
`pages 24. The Hot List of audio enabled pages 24 can be
`further organized into a user dependent directory. The user
`dependent directory can be accessed by user 11 interacting
`with audio web server 16, 18. The following is an example
`of a created user dependent directory as queried from audio
`web server 16, 18.
`Audio Web Server:
`Press 1 for IBM Stock Quote
`Press 2 for traffic on 27
`Press 3 for weather in New Brunswick
`Press 4 for the events in "Old Bay" this week.
`In block 34, user 11 can also be requested by audio web
`server 16, 18 to enter data via the keypad of phone 12. For
example, after user 11 presses "1" for IBM Stock Quote, user
11 can be requested by audio web server 16, 18 to enter a
value for the stock, e.g., high or low. In block 36, a decision
`is made as to whether the selected audio enabled page 24
resides locally on audio web server 16, 18 or remotely from
audio web server 16, 18. If the audio enabled page 24 resides
locally on audio web server 16, 18, the audio web servers
`16, 18 convert the text data of audio enabled pages 24 to
`audio for transmission of audio to user 11 in block 40. The
`audio enabled page can be presented to user 11 as a menu of
`choices. These audio enabled pages 24 are established with
`audio page links to allow a user to recursively traverse a
page, as described in more detail below. Audio page links
`can be read to a user upon pressing predetermined numbers
`of the keypad of phone 12. For example, audio server 16, 18
`can prompt user 11 with the following response: "Press 3 for
`weather".
`Alternatively, if the audio enabled page 24 resides
`remotely of audio web server 16, 18, a connection is
`established to the address of the audio enabled pages 24 in
`block 38. The address of the audio enabled page 24 can be
a conventional URL. Audio enabled pages 24 are transmitted
to audio web servers 16, 18 as text data with standard
http, ftp, tcp/ip or other protocols known in the art. The
audio enabled page 24 is transmitted to audio web server 16,
18. The audio enabled page 24 is then transmitted to user 11 as
audio in block 40.
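The decision in blocks 36, 38 and 40 can be sketched in Python as follows. This is an illustrative outline only. The function names, the dictionary used as the local page store, and the injected `fetch_remote` and `text_to_audio` callables are assumptions, standing in for whatever protocol client and text to speech component a server actually uses.

```python
from typing import Callable, Dict


def retrieve_page(
    address: str,
    local_pages: Dict[str, str],
    fetch_remote: Callable[[str], str],
) -> str:
    """Return the text of an audio enabled page, locally if possible."""
    # Block 36: does the page reside locally on the audio web server?
    if address in local_pages:
        return local_pages[address]
    # Block 38: otherwise connect to the page's address (for example a URL).
    return fetch_remote(address)


def serve_page(
    address: str,
    local_pages: Dict[str, str],
    fetch_remote: Callable[[str], str],
    text_to_audio: Callable[[str], bytes],
) -> bytes:
    """Block 40: convert the page text to audio for transmission to the user."""
    page_text = retrieve_page(address, local_pages, fetch_remote)
    return text_to_audio(page_text)
```

Passing the remote fetch and the text to audio conversion in as callables keeps the sketch independent of any particular network library or speech product.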
`Alternatively, speech recognition can be used at audio
`web servers 16, 18 to interpret words spoken by user 11 into
`selections to the menu without using a keypad. A speech
`index can be generated of spoken words to provide search
`capabilities in a similar manner as WWW indexes. Each
`audio enabled page 24 has a "context" to understand and
resolve the spoken commands. The use of a context makes
it easier to interpret commands since the vocabulary will be
limited. Additionally, global and local commands can be
recognized for each audio enabled page 24. For example, a
spoken command "Links" can refer to a global command
that results in all links originating from a given audio
enabled page being read to user 11. Similarly, spoken
commands, such as "Forward" and "Back", can result in
forward and backward linking to audio enabled pages 24.
Spoken commands, such as "Index", "Email" and "Hotlink",
can be used to link to audio enabled pages as discussed
above. Local commands can act like "local variables" to be
interpreted in the context of the given page. An example of
a local command is as follows. The spoken command "Order
the ticket" can be used with an audio enabled page that
prompts "If you would like to order the ticket".
The spoken command can be
viewed as a member of the audio enabled page's context or
dictionary. The audio enabled page's context can be defined
as a union of all global terms and local terms. The global and
local terms can be downloaded together with the audio
enabled page 24 to the audio web server 16, 18. The speech
index can be organized as a tree with nodes represented as
audio enabled pages with the "context" limited to the global
and local terms.
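The idea of a page context as the union of global and local terms can be sketched in Python as follows. This is a minimal illustration that starts from already recognized text, not from audio. The global command set is taken from the examples in the description, while the function names and the case-insensitive matching are assumptions.

```python
from typing import Iterable, Optional, Set, Tuple

# Global commands named in the description; matching in lowercase is an assumption.
GLOBAL_TERMS: Set[str] = {"links", "forward", "back", "index", "email", "hotlink"}


def page_context(local_terms: Iterable[str]) -> Set[str]:
    """Context of a page: the union of all global terms and its local terms."""
    return GLOBAL_TERMS | {term.lower() for term in local_terms}


def resolve_command(spoken: str, local_terms: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Classify recognized text as a global or local command, or reject it."""
    phrase = spoken.strip().lower()
    if phrase in GLOBAL_TERMS:
        return ("global", phrase)
    if phrase in {term.lower() for term in local_terms}:
        return ("local", phrase)
    # Outside the page's limited vocabulary.
    return None
```

For a page whose local terms are `["Order the ticket"]`, `resolve_command("Back", ...)` is classified as global and `resolve_command("order the ticket", ...)` as local.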
`FIG. 3 illustrates a schematic diagram of audio web server
`16, 18. For example, audio web server 16, 18 can be a
Pentium 200 running Windows NT 4.0. Call processor 50
receives requests for establishing a call to audio web server
16, 18 over connection 17 from telephone network 14. Call
processor 50 establishes, maintains and terminates calls
between telephone network 14 and audio web server 16, 18.
An example of a call processor useful for practice of the
present invention is manufactured by Dialogic as dual span
series T-1/E-1, ISDN, PRI.
`Call processor 50 establishes a link to audio web manager
`56. Selections entered from the keypad of phone 12 to
establish a connection to an audio enabled page 24 are
transmitted from call processor 50 to audio web manager 56.
Alternatively, spoken command selections can be transmitted
from call processor 50 to speech text converter 51 for
`converting the spoken commands into text. The converted
`text is forwarded to audio web manager 56. Audio web
`manager 56 establishes electronic connections 19 to wide
`area computer network 20 for accessing audio enabled pages
`24 which are located remotely of audio web server 16, 18.
`Upon a predetermined selection by user 11, audio enabled
`pages 24 can be retrieved from a URL located in wide area
`computer network 20 and stored in memory for accessed
audio enabled pages 60. An interpreter of audio enabled
pages 54 interprets audio enabled pages 24 into text data.
Text to audio converter 52 converts text data from interpreter
54 to audio. The audio is transmitted from call processor 50
to user 11. An example of a text to audio converter useful for
practice of the present invention is AT&T Watson text to
speech software or DECtalk. Audio Web server 18 can
include memory for locally storing audio enabled pages 58.
User profiles directed to predetermined user links, referred
to as Hot links, can be stored in memory for user profiles 62.
FIG. 4 illustrates a flow diagram of a method for generating
audio enabled pages 100. In block 102, text to be
displayed as a welcome message for an audio enabled page
24 is determined. A command referred to as "TEXT" can be
used to generate the message. The combination of the below
described commands for authoring audio enabled pages is
referred to as ATML. The audio enabled pages 24 are
generated with a conventional text editor or with a graphical
software interface such as TCL-TK, as described in
Introduction to TCL_Tk by John Ousterhout. An example of the
format of the TEXT command is:
`TEXT="Hello, you've requested the audio stock quote
`reporting page."
`In block 104, links between a user's telephone keypad and
`audio enabled pages 24 are determined. A command referred
`to as "LINK" can be used to identify an audio prompt for the
`user and a link to an audio enabled page 24. An example of
`a format of a LINK command is as follows:
`LINK=number: prompt: file.atml.
`In the LINK command, the term "number" indicates the
number of the keypad which can be pressed by user 11 to
access an audio enabled page 24. The audio enabled page is
referred to with the term "file.atml". The term "prompt"
refers to an audio prompt which is spoken to the user.
Preferably, when an audio enabled page 24 is translated by
audio web server 16, 18, the prompts of all LINK commands
in the audio enabled page 24 are spoken to user 11.
Thereafter, if user 11 presses a number of the keypad
specified by the LINK command, the audio enabled page
linked to the specified number of the keypad is retrieved at
audio web server 16, 18, either by locally accessing the
audio enabled page 24 if it is stored locally at the audio
server or remotely accessing the audio enabled page 24 by
electronically connecting to the URL of the audio enabled
page and forwarding audio enabled page 24 with a standard
protocol to audio web server 16, 18. The retrieved audio
`enabled page 24 is stored as the parent page at audio server
`16, 18.
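As an illustration of the TEXT and LINK formats described above, the following Python sketch parses a small page into a welcome message and a keypad-to-page table. It is an assumption-laden sketch: only the two command shapes quoted in the description are handled, one command per line is assumed, and the `Page` and `Link` classes, the function names, and the sample LINK line are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Link:
    key: str      # keypad number the user presses
    prompt: str   # audio prompt spoken to the user
    target: str   # linked page, e.g. "file.atml" or a URL


@dataclass
class Page:
    text: str = ""
    links: List[Link] = field(default_factory=list)


def parse_atml(source: str) -> Page:
    """Parse TEXT= and LINK= lines; all other lines are ignored in this sketch."""
    page = Page()
    for raw in source.splitlines():
        line = raw.strip()
        if line.startswith("TEXT="):
            page.text = line[len("TEXT="):].strip().strip('"')
        elif line.startswith("LINK="):
            # Split on the first two colons only, so a URL target keeps its colons.
            key, prompt, target = (p.strip() for p in line[len("LINK="):].split(":", 2))
            page.links.append(Link(key, prompt, target))
    return page


def keypad_table(page: Page) -> Dict[str, str]:
    """Map each keypad number to the page it links to."""
    return {link.key: link.target for link in page.links}


SAMPLE = '''TEXT="Hello, you've requested the audio stock quote reporting page."
LINK=3: Press 3 for weather: weather.atml
'''
```

With `SAMPLE` above, `keypad_table(parse_atml(SAMPLE))` maps `"3"` to `"weather.atml"`, and the prompts in `page.links` are what a server would speak before waiting for a key press.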
`In block 106, user input audio menus are created which
request user 11 to enter a plurality of data from the
keypad of phone 12 or make a spoken selection over phone 12
`in order to access data files. Each piece of entered data is
`resolved into a section of a point key. The point key indexes
`a data file which can be used to answer a user's query. A
`command referred to as ENTER can be used for creating
`user input menus having the following format:
ENTER=prompt1:format1:validation_file1.db,
prompt2:format2:validation_file2.db,
...
promptn:formatn:validation_filen.db,
DataFileName.txt
`The terms "format 1", "format 2" ... "format n", identify
`the format to be used by user 11 when entering data from the
`keypad of phone 12. For example, the format can be
`represented by terms "(d)+#" which translate into "Read all
`digits of the data entered from the keypad of a user phone
`until a # sign is seen". This format is used in cases in which
`input length of data to be entered by a user from the keypad
of phone 12 is not known a priori. Alternatively, the format
`can be represented by the terms "d(n)" which translates into
`the command "read up to first n digits" of the data entered
`from the keypad of a phone 12 at audio web server 16, 18.
`This format can be used in cases where it is known that a
`fixed number of digits are used to represent data entered
from the keypad. The term "prompt" can be used to refer to
an audio prompt that is spoken to the user, which may direct
the user as to what input should be entered into the
input audio menu.
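The two input formats described above can be sketched in Python as a reader over a stream of key presses. This is an illustration under assumptions: key presses arrive as single characters, non-digit keys other than "#" are skipped, and the function name `read_field` is hypothetical.

```python
import re
from typing import Iterator


def read_field(fmt: str, keys: Iterator[str]) -> str:
    """Consume key presses from `keys` according to one ENTER format."""
    if fmt == "(d)+#":
        # Read all digits until a # sign is seen (input length not known a priori).
        digits = []
        for key in keys:
            if key == "#":
                break
            if key.isdigit():
                digits.append(key)
        return "".join(digits)

    match = re.fullmatch(r"d\((\d+)\)", fmt)
    if match:
        # Read up to the first n digits.
        limit = int(match.group(1))
        digits = []
        while len(digits) < limit:
            key = next(keys, None)
            if key is None:
                break
            if key.isdigit():
                digits.append(key)
        return "".join(digits)

    raise ValueError(f"unsupported format: {fmt!r}")
```

Because the reader pulls from a shared iterator, several fields can be read in turn from keys the user has already entered, for example `keys = iter("9085551234#42")` followed by `read_field("(d)+#", keys)` and then `read_field("d(2)", keys)`.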
`The term "DataFileName.txt" is used to refer to an output
datafile to be accessed by audio web server 16, 18. The output
`datafile includes pieces of data indexed by the point key. A
`user read-ahead is possible to allow a user 11 that is familiar
`with the audio input menu to enter data from the keypad
`even before the prompts are issued. For example, user 11
could enter all information from prompt1, ... promptn after
hearing prompt1 of the user input audio menu for generating
`the output datafile.
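One way to picture the point key is as a lookup key assembled from the entered values. The Python sketch below is hypothetical throughout: the description does not say how the sections are combined or how the data file is laid out, so the ":" separator, the tab-separated record layout, and the sample records are assumptions made only for illustration.

```python
from typing import Dict, Iterable, Optional, Sequence

SEPARATOR = ":"  # assumption: the source does not specify how sections are joined


def point_key(sections: Sequence[str]) -> str:
    """Combine the entered values, one section per prompt, into a single key."""
    return SEPARATOR.join(sections)


def load_data_file(lines: Iterable[str]) -> Dict[str, str]:
    """Parse hypothetical 'key<TAB>text to speak' records into a lookup table."""
    table: Dict[str, str] = {}
    for line in lines:
        key, sep, value = line.rstrip("\n").partition("\t")
        if sep:
            table[key] = value
    return table


def answer(sections: Sequence[str], table: Dict[str, str]) -> Optional[str]:
    """Return the text indexed by the point key, or None if nothing matches."""
    return table.get(point_key(sections))


# Placeholder records, not real data.
RECORDS = [
    "426:1\tplaceholder text for entry 426, choice 1",
    "426:2\tplaceholder text for entry 426, choice 2",
]
```

With these records, `answer(["426", "2"], load_data_file(RECORDS))` returns the second placeholder string, which a server could then pass to text to audio conversion.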
An ENTER command having a format such as
ENTER=datafile.text can be used to read the entire contents of a data
file. For example, the command ENTER=current_weather.text
can be used to read the contents of a current
weather file.
The following is an example for generating audio
enabled pages 24 and user