`ACCESS SYSTEM AND DEVICE CONTROLLER
`
`FIELD OF THE INVENTION
`
`The present invention relates generally to the field of providing a robust and highly
`reliable system that allows users to browse web sites and retrieve information by using
`conversational voice commands. Additionally, the present invention allows users to control and
`monitor other systems and devices that are connected to the Internet or any other network by
`using voice commands.
`
`BACKGROUND OF THE INVENTION
`
`Standard access to information over the Internet requires the use of a personal computer
`(PC). Standard Internet access is thus limited to situations where personal computers are
`accessible, for example at a desk or at times during travel when a portable PC can be used.
`Portable PCs allow for Internet access in a variety of situations, but if they are to be used away
`from wall connectors, they must be used with costly portable modems or constant mobile phone
`connections. Further, the user must generally have space to place a portable PC to obtain a
`reasonable means of access to the Internet.
`
`Telephones and mobile phones, on the other hand, offer convenient and relatively
`inexpensive methods of communication. Mobile phones in particular can be used in virtually any
`situation or environment, and home, office, and pay telephones are almost always accessible and
`cost much less than mobile phones or mobile modems. Telephones and mobile phones on their
`own, however, do not currently allow a user voice access to information via the Internet because
`a useful connection generally requires some type of input device and display to give and receive
`information over the Internet. Thus, a need exists for an Internet information access method that
`makes the vast resources on the Internet available to telephone and mobile phone users without
`bulky and costly input devices and displays.
`
`Currently, three options exist for a user who wishes to gather information from a web site
`accessible over the Internet. The first option is to use a desktop or a laptop computer connected
`to a telephone line via a modem or connected to a network with Internet access. The second
`option is to use a Personal Digital Assistant (PDA) that has the capability of connecting to the
`Internet either through a modem or a wireless connection. The third option is to use one of the
`newly designed web-phones or web-pagers that are now being offered on the market. Although
`each of these options can allow a user to access the Internet and browse web sites, each of them
`has its own drawbacks.
`
`Desktop computers are very large and bulky and are difficult to transport. Laptop
`computers solve this inconvenience, but many are still quite heavy and are inconvenient to carry.
`Further, laptop computers cannot be carried and used everywhere that a user travels. For
`instance, if a user wishes to obtain information from a remote location where no electricity or
`communication lines are installed, it would be nearly impossible to use a laptop computer.
`Oftentimes, information is needed on an immediate basis where a computer is not accessible.
`
`Petitioner Google Ex-1031, 0001
`
`
`
`Furthermore, the use of laptop or desktop computers to access the Internet requires a connection
`to either a network or a POTS (Plain Old Telephone Service) line. Oftentimes, such connections
`are not available when a user desires to connect to the Internet.
`
`The second option for remotely accessing web sites is the use of PDAs. These devices
`also have their own set of drawbacks. First, PDAs that have the ability to connect to the Internet
`and access web sites are not readily available. As a result, these PDAs tend to be very expensive.
`Furthermore, users are usually required to pay a special service fee to enable the web browsing
`feature of the PDA. A further disadvantage of these PDAs is that web sites must be specifically
`designed to allow these devices to access information on the web site. Therefore, a limited
`number of web sites are available that are accessible by these web-enabled PDAs. Finally, it is
`very common today for users to carry cell phones; however, users must also carry a separate
`PDA if they require the ability to gather information from various web sites. Therefore, users
`must carry two separate devices and must also subscribe to and pay for two separate services.
`That is, a user must pay for both cellular telephone service and also for the web-enabling service
`for the PDA. This results in a very expensive alternative for the consumer.
`
`The third alternative mentioned above is the use of web-phones or web-pagers. These
`devices suffer many of the same drawbacks as PDAs. First, these devices are expensive to
`purchase. Further, the number of web sites accessible to these devices is limited since web sites
`must be specifically designed to allow access by these devices. Furthermore, users are often
`required to pay an additional fee in order to gain wireless web access. Again, this service is
`expensive. Another drawback of these web-phones or web-pagers is that as technology develops,
`the methods used by the various web sites to allow access by these devices may change. These
`changes may require users to purchase new web-phones or web-pagers or have the current device
`serviced in order to upgrade the firmware or operating system stored within the device. At the
`least, this would be inconvenient to users and may actually be quite expensive.
`
`SUMMARY OF THE INVENTION
`
`One object of this invention is to allow quick, efficient, and inexpensive information
`retrieval from the Internet or other computer networks via standard telephones, mobile phones,
`or other voice based communication devices.
`
`Another object of this invention is to provide secure and reliable retrieval of information
`over the Internet or other computer network in any situation where a user has access to a
`standard telephone or mobile phone.
`
`In accordance with one aspect of the present invention, these and other objectives are
`realized by a voice activated information access method. The voice activated information access
`method comprises a user communicating with voice servers. The voice servers receive voice
`messages from the users and employ speech to text conversion programs to translate the voice
`messages to computer readable requests. These computer readable requests are then sent to
`information retrieval computers which access and retrieve information from sites on the Internet
`or other networks or information sources corresponding to the requests received from the voice
`servers. The information retrieval computers associate incoming requests with proper locators
`which can be used to access information sources. These locators are sent to access their
`corresponding information sources. Information from the corresponding information sources is
`then sent back to the retrieval computers. The retrieval computers process the information to
`assure the information is in a proper text-based format. This text-based information is then sent
`back to the voice servers. The voice servers process the text into speech recognizable by the user,
`and transmit a speech message with the requested information back to the user.
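
The round trip described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names `VoiceServer` and `RetrievalComputer`, the lookup table of locators, and the stubbed speech-to-text, text-to-speech, and fetch callables are hypothetical stand-ins for the speech engines and network access named in the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class VoiceServer:
    """Hypothetical voice server wrapping speech-to-text and text-to-speech engines."""
    speech_to_text: Callable[[bytes], str]
    text_to_speech: Callable[[str], bytes]


@dataclass
class RetrievalComputer:
    """Hypothetical information retrieval computer."""
    locators: Dict[str, str]      # computer-readable request -> locator (e.g. a URL)
    fetch: Callable[[str], str]   # stand-in for contacting an information source

    def handle(self, request: str) -> str:
        locator = self.locators[request]   # associate the request with a locator
        raw = self.fetch(locator)          # retrieve information from the source
        return " ".join(raw.split())       # reduce to plain, single-spaced text


def voice_round_trip(audio: bytes, vs: VoiceServer, irc: RetrievalComputer) -> bytes:
    request = vs.speech_to_text(audio)     # voice message -> computer-readable request
    text = irc.handle(request)             # request -> text-based information
    return vs.text_to_speech(text)         # text -> speech returned to the user


# Usage with toy stand-ins in place of real speech engines and network access.
vs = VoiceServer(speech_to_text=lambda a: a.decode().strip().lower(),
                 text_to_speech=lambda t: t.encode())
irc = RetrievalComputer(locators={"weather": "http://example.com/weather"},
                        fetch=lambda url: "  Sunny,\n 72 F ")
reply = voice_round_trip(b"Weather", vs, irc)   # b"Sunny, 72 F"
```

Keeping the speech engines behind plain callables lets each stage of the voice, request, locator, and text path be exercised without audio hardware or network access.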
`
`The present invention allows users to access and browse web sites without being
`subjected to the added expenses, inconveniences, and limitations that exist in currently available
`web browsing systems. This is accomplished by providing a system and method that allows users
`to browse web sites using conversational voice commands spoken into any type of voice
`receiving device (i.e., any type of wireline or wireless telephone, IP phone, or other wireless
`device). These spoken commands are then converted into data messages by a speech recognition
`engine running on a media server. These data messages are then processed by a web browsing
`module and transmitted to the desired web site using the Internet. Responses sent from a web site
`are received and processed by the web browsing module and transmitted as formatted data to the
`media server. The media server converts this data into audible messages by either matching the
`data with a prerecorded audio prompt or by using a speech synthesizer.
`
`The voice browser system and method of the present invention uses a web site polling
`and ranking methodology that allows the system to detect changes in web sites and adapt to those
`changes in real-time. This enables the voice browser system to deliver highly reliable
`information to users over any voice enabled device. This ranking system also enables the present
`invention to provide rapid responses to user requests. Long delays before receiving responses to
`requests are not tolerated by users of voice-based systems, such as telephones. When a user
`speaks into a telephone, an almost immediate response is expected. This expectation does not
`exist for non-voice communications, such as email transmissions or accessing a web site using a
`personal computer. In such situations, a reasonable amount of transmission delay is acceptable.
`The ranking system of the present invention ensures users will always receive the fastest possible
`response to their request.
`
`An alternative embodiment of the present invention allows users to control and monitor
`the operation of a variety of devices connected to a network using voice commands spoken into a
`voice receiving device.
`
`It is an object of the present invention to allow users to gather information from web sites
`by using voice receiving devices, such as wireline or wireless telephones.
`
`An additional object of the present invention is to provide a system and method that
`allows the searching and retrieving of publicly available information by controlling a web
`browsing module using naturally spoken voice commands.
`
`It is an object of the present invention to provide a robust voice browsing system that can
`obtain the same information from several web sites based upon a ranking order. The ranking
`order is automatically adjusted if the system detects that a given web site is not functioning, is
`too slow, or has been modified in such a way that the requested information cannot be retrieved
`any longer.
`
`A further object of the present invention is to provide a system and method that allows
`users to browse web sites using a single device that requires only one subscription service.
`
`A still further object of the invention is to allow users to gather information from web
`sites from any location where a telephonic connection can be made.
`
`Another object of the present invention is to allow users to browse web sites on the
`Internet using conversational voice commands spoken into wireless or wireline telephones.
`
`An additional object of the present invention is to provide a system and method for using
`voice commands to control and monitor devices connected to a network.
`
`It is an object of the present invention to provide a system and method which allows
`devices connected to a network to be controlled by conversational voice commands spoken into
`any voice enabled device interconnected with the same network.
`
`A further object of the present invention is to allow devices connected to the Internet to
`be controlled by conversational voice commands spoken into any voice enabled device
`connected to the Internet.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 is a depiction of the voice browsing system of the first embodiment of the present
`invention; and
`
`FIG. 2 is a depiction of the device browsing system of the second embodiment of the present
`invention.
`
`DETAILED DESCRIPTION OF THE INVENTION
`
`A first embodiment of the present invention is a system and method for allowing users to
`browse web sites and retrieve information by using conversational voice commands spoken into a
`voice-receiving device. A user verbally requests information and is given the requested
`information in a voice message. Users are not required to learn a special language or command
`set in order to communicate with the voice browsing system of the present invention.
`
`A user may begin interacting with the voice browsing system by placing a telephone call via a
`standard or mobile telephone to a voice server and verbally requesting information available on
`the Internet
`or other computer networks. The voice server processes the request using a speech recognition
`program. The output of the speech recognition program is a computer-readable request that is
`sent to an information retrieval computer. The information retrieval computer associates the
`computer readable data from the voice server with information sources, which the information
`retrieval computer contacts with an information request. Multiple information sources may be
`prioritized by the information retrieval computer so that if any one source is unavailable, other
`sources can be quickly contacted. When an information source receives an information request, it
`responds with the desired information which is transferred back to the information retrieval
`computer. This begins the information’s return trip to the user.
`
`The information retrieval computer processes the incoming information into a text
`message that is readable by a text to voice converter located on the voice server, and then sends
`this text message to the voice server. At this point, the information retrieval computer is capable
`of parsing incoming information from a variety of sources and in a variety of formats. The voice
`server receives the text message and, using a text to voice converter, converts it into a voice
`message that it sends back to the user. The user then receives the voice message containing the
`requested information, completing one session of the voice based information retrieval process.
`If the user wishes to have more information, he starts the process again with another verbal
`request.
`
`The invention can be used on a variety of systems with several types of data transmission
`procedures. In one application of the present invention, the user’s initial call to a voice server is
`made over the Public Switched Telephone Network (PSTN). This call may be made using a
`standard telephone, a cellular phone, a digital mobile phone, or any other type of PSTN voice
`communication device. Following the voice recognition step in the voice server, the outgoing
`request may be transferred to an information retrieval computer using a variety of data
`communication services. For example, the outgoing request may be transmitted over the Internet
`using the Transmission Control Protocol/Internet Protocol (TCP/IP) model. Alternately,
`asynchronous transfer mode (ATM), or a frame relay service may be used to transmit the request
`from a voice server to an information retrieval computer. A voice server and an information
`retrieval computer may also be linked to each other via a local area network (LAN), a wide area
`network (WAN), or a wireless network, using a wide variety of network architectures.
`
`It is presently preferred to transmit digital requests from a voice server to an information
`retrieval computer through a firewall, using the TCP/IP model. It is also presently preferred for
`information requests to pass through a firewall on their way out from information retrieval
`computers, and for the returning information to pass once again through a firewall on its way to
`an information retrieval computer.
`
`The information retrieval computer is capable of using an intelligent middleware
`algorithm to track changes in the location and format of information so that pertinent information
`is consistently available to the user. For example, the information retrieval computer may scan an
`information source for titles, and transmit these titles back to a voice server. If the user hears a
`title that he believes will contain the necessary information, the user may say this title to the
`voice server, which transfers the title back to the information retrieval computer. Then, the
`information retrieval computer can send back information associated with the title. Using this
`system, a user may quickly negotiate a complicated information source to find the exact
`information he wants.
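
The title-based navigation just described might be sketched as follows, using Python's standard `html.parser` module. The assumptions are ours, not the source's: titles are taken to be `<h1>` to `<h3>` headings, the information associated with a title is the paragraph text that follows it, and the spoken title arrives as already-recognized text and is matched case-insensitively.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class TitleScanner(HTMLParser):
    """Maps each heading to the paragraph text that follows it (an assumed page layout)."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self) -> None:
        super().__init__()
        self.sections: Dict[str, List[str]] = {}
        self._heading_buf: Optional[List[str]] = None  # non-None while inside a heading
        self._in_para = False
        self._current: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._heading_buf = []
        elif tag == "p":
            self._in_para = True

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._heading_buf is not None:
            self._current = "".join(self._heading_buf).strip()
            self.sections.setdefault(self._current, [])
            self._heading_buf = None
        elif tag == "p":
            self._in_para = False

    def handle_data(self, data):
        text = data.strip()
        if self._heading_buf is not None:
            self._heading_buf.append(data)
        elif self._in_para and self._current is not None and text:
            self.sections[self._current].append(text)


def scan_titles(html: str) -> Dict[str, str]:
    """Return {title: associated text} for one information source."""
    parser = TitleScanner()
    parser.feed(html)
    parser.close()
    return {title: " ".join(parts) for title, parts in parser.sections.items()}


def select_title(sections: Dict[str, str], spoken: str) -> Optional[str]:
    """Return the text for the title the user spoke, or None if nothing matches."""
    wanted = spoken.strip().lower()
    for title, text in sections.items():
        if title.lower() == wanted:
            return text
    return None


page = "<h2>Local Weather</h2><p>Rain likely.</p><h2>Sports</h2><p>Home team wins.</p>"
sections = scan_titles(page)
titles = list(sections)                    # read back to the user as choices
answer = select_title(sections, "sports")  # "Home team wins."
```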
`
`Common and ordinary commands and phrases are all that is required for a user to operate the
`voice browsing system. The voice browsing system recognizes naturally spoken voice commands
`and does not have to
`be trained to recognize the voice patterns of each individual user. Such voice recognition systems
`use phonemes to recognize spoken words and not voice patterns.
`
`The first embodiment allows users to select from various categories of information and to
`search those categories for desired data by using conversational voice commands. The present
`invention uses a media server containing a speech recognition engine. This speech recognition
`engine is used to recognize natural, conversational voice commands spoken by the user and to
`convert them into data messages. These data messages are processed by a web browsing module
`and used to access the appropriate web site to gather information requested by the user. The
`media server also contains a speech synthesis engine that converts the data responses received
`from the various web sites into audio messages that are transmitted to the user. A more detailed
`description of this embodiment will now be provided.
`
`Referring to FIG. 1, a database 2 designed by Webley Systems Incorporated is connected
`to one or more web browsing modules 4 as well as to one or more media servers 6. This database
`2 contains a listing of accessible web sites, parameters required to access each listed web site,
`“content descriptors” that describe the expected format of the responses received from each web
`site, and the rank number of each web site. These features will be further described later. The
`database also contains a listing of pre-recorded voice responses. Further, database 2 may contain
`customer profile information, system activity reports, and any other data or software modules
`necessary for the testing or administration of the voice browsing system.
`
`The web browsing modules 4 provide access to any computer network such as the
`Internet 8. These modules also perform the task of receiving responses from web sites and
`extracting the data requested by the user. This task is also known as “content extraction.” The
`web browsing modules 4 also perform the task of periodically polling or “pinging” various web
`sites and modifying the ranking numbers of these web sites depending upon their response and
`speed. This polling feature is further discussed below. The media servers 6 provide speech
`recognition, speech synthesis, and call handling functions. The speech recognition function is
`performed by a speech recognition engine that converts voice commands received from the
`user’s voice receiving device 10 (i.e., any type of wireline or wireless telephone, Internet
`Protocol (IP) phones, or other special wireless units) into data messages that comply with the
`appropriate communications protocol. In the preferred embodiment, the data messages must
`comply with the TCP/IP communication protocol. These data messages are then processed by the
`web browsing modules 4 and the appropriate web site 12 is accessed over the Internet 8 to obtain
`the desired information. The web sites accessible by this system may be written in any type of
`software language, including XML, HDML, HTML, or any variation of these languages. The
`speech synthesis function of media server 6 is performed by a speech synthesis engine that
`converts the data messages received from the web site 12 into audio messages. These audio
`messages are then transmitted to the user’s voice receiving device 10.
`
`A preferred speech recognition engine is developed by Nuance Communications of 1380
`Willow Road, Menlo Park, California 94025 (www.nuance.com). A preferred speech synthesis
`engine is developed by Lernout and Hauspie Speech Products, 52 Third Avenue, Burlington,
`Massachusetts 01803 (www.lhsl.com). Further, a preferred media server is based on an Intel
`Dual Pentium III 650 MHz microprocessor running a Natural Speech Recognition Engine from
`Nuance Communications and a Speech Synthesis Engine developed by Webley Systems using
`Lernout and Hauspie software. The Nuance engine delivers 40 recognition units as defined in the
`vendor specification.
`
`In operation, a user establishes a connection between his voice receiving device 10 and a
`media server 6. This may be done through the Public Switched Telephone Network (PSTN) 16
`by calling a telephone number associated with the voice browsing system. Once the connection
`is established, the media server 6 initiates an interactive voice response (IVR) application. The
`IVR application aurally provides the user with a list of options, such as, “stock quotes”, “flight
`status”, “yellow pages”, “weather”, and “news”. The user selects the desired option by speaking
`the name of the option into the voice receiving device 10.
`
`As an example, if a user wishes to obtain restaurant information, he may speak into his
`telephone the words “yellow pages”. The IVR application would then ask the user what he would
`like to find and the user may respond by stating “restaurants”. The user may then be provided
`with further options related to searching for the desired restaurant. For instance, the user may be
`provided with the following restaurant options, “Mexican Restaurants”, “Italian Restaurants”, or
`“American Restaurants”. The user then speaks into his voice receiving device 10 the restaurant
`type of interest. The IVR application running on the media server 6 may also request additional
`information limiting the geographic scope of the restaurants to be reported to the user. For
`instance, the IVR application may ask the user to identify the zip code of the area where the
`restaurant should be located. The media server 6 uses the speech recognition engine to interpret
`the speech commands received from the user. The speech recognition engine converts the user’s
`voice signals into properly formatted data messages that are subsequently used by a web
`browsing module 4 to access the appropriate web sites 12 via the Internet.
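
The menu-driven dialog in the restaurant example can be modeled as a walk down an option tree. The sketch below is hypothetical: the `MENU` tree, the `run_dialog` helper, and the convention that each leaf lists the parameters (here only a zip code) still to be prompted for are illustrative assumptions, and the utterances are assumed to have already been converted to text by the speech recognition engine.

```python
from typing import Dict, List, Union

# Hypothetical option tree: inner nodes are menus, leaves list parameters to request.
Menu = Dict[str, Union["Menu", List[str]]]

MENU: Menu = {
    "yellow pages": {
        "restaurants": {
            "mexican restaurants": ["zip code"],
            "italian restaurants": ["zip code"],
            "american restaurants": ["zip code"],
        },
    },
    "weather": ["zip code"],
}


def run_dialog(utterances: List[str], menu: Menu = MENU) -> Dict[str, object]:
    """Consume recognized utterances in order and build a structured request."""
    answers = iter(utterances)
    node: Union[Menu, List[str]] = menu
    path: List[str] = []
    while isinstance(node, dict):
        choice = next(answers).strip().lower()
        if choice not in node:
            # A real IVR would re-prompt; this sketch simply reports the valid options.
            raise ValueError(f"unrecognized option {choice!r}; options: {sorted(node)}")
        path.append(choice)
        node = node[choice]
    params = {name: next(answers).strip() for name in node}
    return {"path": path, "params": params}


request = run_dialog(["Yellow pages", "restaurants", "Italian restaurants", "60601"])
# {'path': ['yellow pages', 'restaurants', 'italian restaurants'],
#  'params': {'zip code': '60601'}}
```

The resulting path and parameters are the kind of structured data a web browsing module would need in order to fill in a site's access parameters.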
`
`The content information received from the responding web site 12 is then processed by
`the web browsing module 4 according to the “content descriptor” information stored in database
`2. This processed response is then transmitted to the media server 6 for conversion into audio
`messages either by using the speech synthesis software or by selecting among the prerecorded
`voice responses contained within the database 2.
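
The choice between a prerecorded voice response and synthesized speech might be made phrase by phrase, as in this hypothetical sketch: `PROMPTS` stands in for the prerecorded responses held in database 2, the clip file names are invented, and `synthesize` is a placeholder for a real speech synthesis engine.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical prerecorded prompts: normalized phrase -> audio clip identifier.
PROMPTS: Dict[str, str] = {
    "here are the results": "results_intro.wav",
    "no matches were found": "no_matches.wav",
}


def plan_audio(phrases: List[str],
               synthesize: Callable[[str], str],
               prompts: Dict[str, str] = PROMPTS) -> List[Tuple[str, str]]:
    """Return a playlist of ("prompt", clip) or ("tts", audio) items in order."""
    playlist: List[Tuple[str, str]] = []
    for phrase in phrases:
        key = phrase.strip().rstrip(".").lower()
        if key in prompts:
            playlist.append(("prompt", prompts[key]))     # exact match: reuse the recording
        else:
            playlist.append(("tts", synthesize(phrase)))   # otherwise synthesize it
    return playlist


playlist = plan_audio(["Here are the results.", "Luigi's, 312-555-0100"],
                      synthesize=lambda text: f"<speech:{text}>")
# [('prompt', 'results_intro.wav'), ('tts', "<speech:Luigi's, 312-555-0100>")]
```

Fixed phrases can reuse recordings, while variable data such as names and telephone numbers falls through to synthesis.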
`
`The database 2 contains a listing of web sites 12 that may be accessed by the voice
`browser system of the present invention. This database 2 is an integral part of the voice browsing
`system of the present invention. The web sites listed in this database 2 are grouped by the various
`categories to which they apply. For instance, a set of web sites useful in obtaining restaurant
`information is listed under the “restaurants” category. For each web site listed within database 2,
`a table exists listing the parameters required to access the particular web site and the data
`formatting requirements. For instance, certain web sites may require inputting a zip code or a
`user identification number in order to access the site. The database 2 also contains “content
`descriptors” related to each listed web site. These “content descriptors” provide information
`regarding how to interpret responses received from that web site. Individual web sites may
`provide response data in varying formats. The “content descriptors” allow the web browsing
`modules 4 to successfully recognize the data received from each web site and reformat the data
`into a format useable by the media servers 6.
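
One way to picture a database 2 entry and its “content descriptor” is sketched below. This is an assumption-laden illustration, not the record format used by the system described: the `SiteEntry` fields, the URL template, and the choice to express a content descriptor as a regular expression with named groups are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional
from urllib.parse import quote


@dataclass
class SiteEntry:
    """Hypothetical database 2 record for one accessible web site."""
    category: str
    url_template: str            # access parameters are substituted into this template
    required_params: List[str]   # e.g. a zip code or user identification number
    content_descriptor: str      # regex with named groups describing the response format
    rank: int

    def build_url(self, params: Dict[str, str]) -> str:
        missing = [p for p in self.required_params if p not in params]
        if missing:
            raise KeyError(f"missing access parameters: {missing}")
        return self.url_template.format(**{k: quote(v, safe="") for k, v in params.items()})

    def extract(self, response: str) -> Optional[Dict[str, str]]:
        """Return the named fields, or None if the response is not in the expected format."""
        match = re.search(self.content_descriptor, response)
        return match.groupdict() if match else None


entry = SiteEntry(
    category="restaurants",
    url_template="http://example.com/search?zip={zip}&type={cuisine}",
    required_params=["zip", "cuisine"],
    content_descriptor=r"Name:\s*(?P<name>[^\n]+)\nPhone:\s*(?P<phone>[\d-]+)",
    rank=1,
)
url = entry.build_url({"zip": "60601", "cuisine": "italian"})
fields = entry.extract("Name: Luigi's\nPhone: 312-555-0100\n")
# fields == {'name': "Luigi's", 'phone': '312-555-0100'}
```

Returning `None` when the descriptor does not match gives the browsing module a simple signal that a site's response format may have changed.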
`
`For each category searchable by a user, the database 2 may list several web sites that may
`be searched. Each of these web sites is assigned a rank number. As an example, three different
`web sites may be listed as searchable under the category of “restaurants”. Each of those web sites
`will be assigned a rank number such as 1, 2, or 3. The site with the highest rank (rank = 1) will
`be the first web site accessed by a web browser module 4. If the information requested by the
`user cannot be found at this first web site, then the web browser module 4 will search the second
`ranked web site and so forth down the line until it is able to obtain all information requested by
`the user or has no more web sites left to check.
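
The rank-ordered search can be sketched as a loop over the sites of one category, sorted so that rank 1 is tried first, and stopping once every requested item has been found. The function name `ranked_search` and the `query` callable, which stands in for accessing a site and applying its content descriptor and returns `None` on failure, are hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional, Set, Tuple


def ranked_search(sites: Iterable[Tuple[str, int]],
                  wanted: Set[str],
                  query: Callable[[str], Optional[Dict[str, str]]]) -> Dict[str, str]:
    """Query sites in ascending rank number until all wanted fields are found."""
    found: Dict[str, str] = {}
    for site, _rank in sorted(sites, key=lambda s: s[1]):   # rank 1 is tried first
        if wanted.issubset(found):
            break                                           # everything already obtained
        result = query(site)
        if not result:
            continue                                        # failed or unusable response
        for field in wanted.difference(found):
            if field in result:
                found[field] = result[field]
    return found                                            # may be partial if sites run out


responses = {"site-a": {"name": "Luigi's"}, "site-b": None,
             "site-c": {"name": "Other", "phone": "312-555-0100"}}
asked = []


def fake_query(site):
    asked.append(site)
    return responses[site]


result = ranked_search([("site-b", 2), ("site-c", 3), ("site-a", 1)],
                       {"name", "phone"}, fake_query)
# result == {'name': "Luigi's", 'phone': '312-555-0100'}
# asked == ['site-a', 'site-b', 'site-c']
```

Fields already supplied by a higher-ranked site are kept, so a lower-ranked site only fills in what is still missing.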
`
`The web site ranking method and system of the present invention enables the voice
`browser system to be robust and adaptable to changes that may occur as web sites evolve. In the
`rapidly changing area of Internet applications, web sites change frequently. For instance, the
`information required by a web site 12 to perform a search or the format of the reported response
`data may change. Without the ability to adequately monitor and detect these changes, a search
`requested by users may turn up with an incomplete response, no response, or an error. Such
`useless responses may result from incomplete data being provided to the web site or the web
`browser module 4 being unable to recognize the response data messages received from the
`searched web site 12.
`
`The robustness and reliability of the voice browsing system of the present invention are
`further improved by continually polling or “pinging” each of the sites listed in the database 2.
`During this polling function, a web browsing module 4 sends brief requests to each web site
`listed in database 2. The web browsing module 4 monitors the response received from each web
`site and determines whether it is a complete response and whether the response is in the expected
`format specified by the “content descriptors” listed in database 2. Those polled web sites that
`provide complete responses in the format specified by the “content descriptors” have their
`ranking established based on the speed of their responses. Web sites that provide fast response
`times will be assigned higher rankings than those with slow response times. If the web browsing
`module 4 receives no response from the polled web site or if the response received is not in the
`expected format, then the rank of that web site is lowered. Additionally, an alarm may be
`generated for the system administrator indicating that the specified web site has been modified
`and requires further review.
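
The polling and re-ranking step might look like the following minimal sketch. The `probe` and `validate` callables, the injectable clock, and the rule of placing every failed site below all working ones (rather than, for example, lowering it by a fixed amount) are assumptions made for illustration; the text states only that working sites are ranked by speed, that failed sites have their rank lowered, and that an administrator alarm may be raised.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class PollResult:
    site: str
    ok: bool                   # complete response in the expected format
    seconds: Optional[float]   # response time, or None if the site did not respond


def poll(site: str,
         probe: Callable[[str], str],
         validate: Callable[[str, str], bool],
         clock: Callable[[], float] = time.monotonic) -> PollResult:
    """Send one brief request and check the reply against the site's content descriptor."""
    start = clock()
    try:
        body = probe(site)
    except Exception:                      # timeout, connection error, etc.
        return PollResult(site, False, None)
    elapsed = clock() - start
    return PollResult(site, validate(site, body), elapsed)


def rerank(results: List[PollResult], alarm: Callable[[str], None]) -> Dict[str, int]:
    """Fastest valid site gets rank 1; failed sites are ranked below all valid ones."""
    valid = sorted((r for r in results if r.ok), key=lambda r: r.seconds)
    failed = [r for r in results if not r.ok]
    for r in failed:
        alarm(r.site)                      # flag the site for administrator review
    ordered = valid + failed
    return {r.site: rank for rank, r in enumerate(ordered, start=1)}


ticks = iter([10.0, 10.5])                 # fake clock so the timing is deterministic
checked = poll("site-x", probe=lambda s: "Name: A\nPhone: 1",
               validate=lambda s, body: body.startswith("Name:"),
               clock=lambda: next(ticks))
# checked == PollResult(site='site-x', ok=True, seconds=0.5)

alarms: List[str] = []
ranks = rerank([PollResult("a", True, 0.9), PollResult("b", False, None),
                PollResult("c", True, 0.2)], alarms.append)
# ranks == {'c': 1, 'a': 2, 'b': 3}; alarms == ['b']
```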
`
`Since the web browsing modules 4 access web sites based upon their ranking number,
`only those web sites that produce useful and error-free responses will be used by the voice
`browser system to gather information requested by the user. Further, since the ranking numbers
`are also based upon the speed of a web site in providing responses, only the most time efficient
`sites are accessed. Those web sites that the voice browser system has trouble accessing will be
`assigned lower rank numbers and therefore will not be primarily used in performing searches.
`This system assures that users will get complete, timely, and relevant responses to their requests.
`Without this feature, users may often be provided with information that is not relevant to their
`request or may not get any information at all. The constant polling and re-ranking of the web
`sites used within each category allows the voice browser of the present invention to operate
`efficiently. Finally, it allows the voice browser system of the present invention to dynamically
`adapt to changes in the rapidly evolving web sites that exist on the Internet.
`
`Selecting only those web sites that provide rapid responses is an important factor for
`maintaining the desirability and usability of the present invention to users. When users access
`web sites using devices such as personal computers, delays in receiving responses are tolerated
`and are even expected; however, such delays are not expected when a user communicates with a
`telephone. Users expect communications over a telephone to occur immediately with a minimal
`amount of delay time. A user attempting to find information using a telephone expects
`immediate responses to his search requests. A system that introduces too much delay between
`the time the user makes a request and the time of response will not be tolerated by users and will
`lose its usefulness. The ranking system of the present invention is crucial to avoiding this
`problem and providing a useful system to users.
`
`A second embodiment of the present invention is depicted in FIG. 2. This
`embodiment provides a system and method for controlling a variety of devices 20 connected to a
`network 22 by using conversational voice commands spoken into a voice receiving device 24
`(i.e., wireline or wireless telephones, Internet Protocol (IP) phones, or other special wireless
`units). The networked devices may include various household devices or household systems. For
`instance, voice commands may be used to control household security systems, VCRs, TVs,
`outdoor or indoor lighting, sprinklers, or heating and air conditioning systems.
`
`Each of these systems or devices 20 is connected to a network 22. These devices 20 may
`contain embedded microprocessors or may be connected to other computer equipment that allow
`the device 20 to communicate with network 22. This network 22 interfaces with one or more
`device browsing modules 26 manufactured by Webley Systems Incorporated. The device
`browsing modules perform many of the same functions as the web browsing modules 4 discussed
`above in the first preferred embodiment. The device browsing modules 26 are also connected to
`a database 28.
`
`Database 28 lists all devices that are connected to the network 22. The database 28 also
`contains a listing of the options and functions available for each of the devices 20 connected on
`the network 22. Furthermore, database 28 contains the information necessary to properly
`communicate with the networked devices 20. Such information would include, for example,
`communication protocols, message formatting requirements, and required operating parameters.
`Database 28 may also include any other data or software necessary to test and administer the
`device browsing system.
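
A database 28 record and the translation of a recognized command into a device message could be sketched as follows. Everything concrete here is hypothetical: the device names, network addresses, protocol labels, and message strings are invented placeholders, and real networked devices would each need their own protocol handling.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class DeviceRecord:
    """Hypothetical database 28 entry for one networked device."""
    address: str                # network address of the device or its controller
    protocol: str               # label for the communication protocol to use
    functions: Dict[str, str]   # spoken function phrase -> device message


DEVICES: Dict[str, DeviceRecord] = {
    "porch lights": DeviceRecord("10.0.0.21", "http",
                                 {"turn on": "POWER ON", "turn off": "POWER OFF"}),
    "sprinklers": DeviceRecord("10.0.0.35", "http",
                               {"turn on": "ZONE ALL START", "turn off": "ZONE ALL STOP"}),
}


def translate(command: str,
              devices: Dict[str, DeviceRecord] = DEVICES) -> Tuple[str, str, str]:
    """Map recognized text to (address, protocol, message) for a device browsing module."""
    text = " ".join(command.lower().split())
    for name, record in devices.items():
        if name in text:
            for phrase, message in record.functions.items():
                if phrase in text:
                    return record.address, record.protocol, message
            raise ValueError(f"{name!r} does not support that function")
    raise ValueError("no known device mentioned")


msg = translate("Please turn off the sprinklers")   # ('10.0.0.35', 'http', 'ZONE ALL STOP')
```

Keeping the option list and message format per device in one record mirrors the role the text assigns to database 28: the browsing module needs no device-specific code beyond what the record describes.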
`
`A device browsing module 26 also receives messages from the various networked
`devices 20, appropriately formats those messages, and transmits them to one or more media
`servers 30 which are part of the device browsing system. The user’s voice receiving devices 24
`access the device browsing system by calling into a media server 30 via the Public Switched
`Telephone Network (PSTN) 32.
`
`The function of the media servers 30 is to provide speech synthesis and natural speech
`recognition. When data messages are received from the device browser module 26, a media
`server 30 will convert the data message into audio messages that are transmitted to the voice
`receiving device of the user 24. Voice commands received from the voice receiving device of the
`user 24 are converted by a media server 30 into data messages conforming to the appropriate
`communication protocol via the speech recognition software engine running on the media server
`30. A preferred speech recognition engine is developed by Nuance Communications of 1380
`Willow Road, Menlo Park, California 94025 (www.nuance.com). A preferred speech synthesis
`engine is developed by Lernout and Hauspie Speech Products, 52 Third Avenue, Burlington,
`Massachusetts 01803 (www.lhsl.com). A preferred media server 30 is based on the Intel Dual
`Pentium III 650 MHz microprocessor running a Natural Speech Recognition Engine from
`Nuance Communications. This e