(19) United States
(12) Patent Application Publication    (10) Pub. No.: US 2005/0049854 A1
     Reding et al.                     (43) Pub. Date: Mar. 3, 2005

(54) METHODS AND APPARATUS FOR GENERATING, UPDATING AND DISTRIBUTING SPEECH RECOGNITION MODELS

(76) Inventors: Craig Reding, Midland Park, NJ (US); Suzi Levas, Nanuet, NY (US)

Correspondence Address:
VERIZON CORPORATE SERVICES GROUP INC.
C/O CHRISTIAN R. ANDERSEN
600 HIDDEN RIDGE DRIVE
MAILCODE HQE03H14
IRVING, TX 75038 (US)

(21) Appl. No.: 10/961,781
(22) Filed: Oct. 8, 2004

Related U.S. Application Data
(63) Continuation of application No. 09/726,972, filed on Nov. 30, 2000, now Pat. No. 6,823,306.

Publication Classification
(51) Int. Cl.: G10L 11/00
(52) U.S. Cl.: 704/201

(57) ABSTRACT
Techniques for generating, distributing, and using speech recognition models are described. A shared speech processing facility is used to support speech recognition for a wide variety of devices with limited capabilities, including business computer systems, personal data assistants, etc., which are coupled to the speech processing facility via a communications channel, e.g., the Internet. Devices with audio capture capability record and transmit to the speech processing facility, via the Internet, digitized speech and receive speech processing services, e.g., speech recognition model generation and/or speech recognition services, in response. The Internet is used to return speech recognition models and/or information identifying recognized words or phrases. Thus, the speech processing facility can be used to provide speech recognition capabilities to devices without such capabilities and/or to augment a device's speech processing capability. Voice dialing, telephone control and/or other services are provided by the speech processing facility in response to speech recognition results.
[Title-page drawing — Fig. 1: system 100 showing business premises, customer premises 1 through N, a speech processing facility, and telephone network 22]
Exhibit 1023
Page 01 of 26
[Sheet 1 of 14 — Fig. 1: the communication system, including customer premises and business premises (drawing labels garbled in extraction)]
[Sheet 2 of 14 — Fig. 2: the communications system of FIG. 1 in greater detail, including a telephone network and conference calling elements (drawing labels garbled in extraction)]
[Sheet 3 of 14 — Fig. 3: computer system with processor, audio input/output device, and modem coupled to the telephone network (drawing labels partly garbled in extraction)]
[Sheet 4 of 14 — Fig. 4: memory 302 containing a voice dialing routine 416, word processor with speech recognition interface 418, voice dialing data 422 with SD and SI voice dialing customer records and telephone contact number, speech recognition routines 401, speaker independent SRMs 402, speaker dependent SRMs 406, model training routine 408, and speech data 412 including extracted feature information and a digital speech recording]
[Sheet 5 of 14 — Fig. 5: voice dialing customer record, including address and remote speech processing system fields (drawing labels garbled in extraction)]
[Sheet 6 of 14 — Fig. 6: voice dialing IP 70 with memory 612 holding a speech recognition routine, call setup routine 613, and voice dialing routine 615; a database 614 of personal and corporate dialer records (customer 1..N records 622-628, corporation 1..N records); a speech recognizer circuit; switch interface 606 to the SSP; processor 608; and network interface 610 to the Internet]
[Sheet 7 of 14 — Fig. 7: model training routine 700: start 702; receive text of word or name to be trained 704; prompt user to state the word or name 706; record user speech 708; if local feature extraction is supported 710, perform feature extraction on the received speech 712; transmit user identifier in addition to recorded speech, extracted feature information, text version of the speech, and/or existing speech recognition model to the speech processing facility 714; receive speech recognition model(s) from the speech processing facility 716; store received model(s) 718; stop 720]
[Sheet 8 of 14 — Fig. 8: voice dialing routine 800: monitor for speech 802; once speech is received 806, perform feature extraction locally if supported; if local speech recognition capability is available, call the local voice dialing subroutine; if the local voice dialing operation is unsuccessful, transmit the system user ID 818 and speech and/or extracted feature information 820 to the remote speech processing facility, also transmitting a telephone contact number when no telephone-computer connection exists 822; detect the response from the remote speech processing facility 826; if the response includes a telephone number to be dialed 828, dial the telephone number 829; provide a message to the system user indicating that the named party is being called or that the system was unable to identify a party to be called]
[Sheet 9 of 14 — Fig. 9: local voice dialing subroutine 900: perform a speech recognition operation using locally stored speech recognition models and extracted feature information 902; if no name is recognized 904, return to step 810 with an unsuccessful voice dialing indicator 908; if a computer-telephone connection exists, dial the telephone number associated with the received name and detect call completion; otherwise transmit the telephone number to be dialed and the contact telephone number to the call establishment device; return to step 810 with a successful voice dialing indicator 918]
[Sheet 10 of 14 — Fig. 10: remote voice dialing routine 1000: receive voice dialing service input from a remote device 1004; retrieve voice dialing information to be used with the identified user 1008; if extracted feature information was not received, perform a feature extraction operation on the received speech 1010; perform speech recognition using the retrieved voice dialing information; if a name is recognized and a telephone contact number was received, transmit the telephone number to be dialed and the contact telephone number to the call initiation device, otherwise transmit the telephone number associated with the recognized name to the remote computer system; if no name is recognized and an additional remote speech processing system is associated with the identified user 1022, transmit the system user ID 1024 and speech and/or extracted feature information to the additional remote speech processing facility; otherwise send a message to the remote computer system notifying it of the failed voice dialing attempt 1028; stop]
[Sheet 11 of 14 — Fig. 11: call establishment routine 1100: start 1102; receive telephone number to be dialed and contact telephone number 1104; initiate a call to the telephone number to be dialed; initiate a call to the contact telephone number; bridge the initiated calls; allow the bridged call to terminate in the normal manner 1112; stop 1114]
[Sheet 12 of 14 — Fig. 12: model generation routine 1200: start 1202; monitor for input 1204; receive information 1206a (user ID, speech/feature information, text information and model type info; or user ID, existing model, existing model type, speech/feature information, updated model type); augment the training database with the received information; generate a new speech model from the received speech, other received information, and/or speech in the training database; store the generated speech recognition model; transmit the generated speech model to the device which requested a model generation or update service; separately, maintain a system clock 1224 and, when a preselected time between updates has occurred, transmit updated models and/or updated speech recognition software to user devices]
[Sheet 13 of 14 — Fig. 13: speech processing facility with audio signal processing circuitry, a processor, a training database, and connections to the telephone network and the Internet (drawing labels partly garbled in extraction)]
[Sheet 14 of 14 — Fig. 14: speech recognition routine 1400: start 1402; receive a speech recognition service request from a remote device, the request including speech or extracted feature information and a system identifier 1404; perform a speech recognition operation using the received speech or extracted feature information; generate a message including the speech recognition results, including recognized words in text form; transmit the generated message including the recognized words to the system identified by the system identifier associated with the received speech]
METHODS AND APPARATUS FOR GENERATING, UPDATING AND DISTRIBUTING SPEECH RECOGNITION MODELS
FIELD OF THE INVENTION

[0001] The present invention is directed to speech recognition techniques and, more particularly, to methods and apparatus for generating speech recognition models, distributing speech recognition models, and performing speech recognition operations, e.g., voice dialing and word processing operations, using speech recognition models.
BACKGROUND OF THE INVENTION

[0002] Speech recognition, which includes both speaker independent speech recognition and speaker dependent speech recognition, is used for a wide variety of applications.

[0003] Speech recognition normally involves the use of speech recognition models or templates that have been trained using speech samples provided by one or more individuals. Commonly used speech recognition models include Hidden Markov Models (HMMs). An example of a common template is a dynamic time warping (DTW) template. In the context of the present application, "speech recognition model" is intended to encompass both speech recognition models and templates which are used for speech recognition purposes.
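As an illustration of the template matching just mentioned, the distance between an utterance and a stored DTW template can be sketched in Python. This is illustrative only and is not taken from the patent; the one-dimensional feature values and the template names are hypothetical:

```python
def dtw_distance(a, b):
    """Classic dynamic time warping: minimal cumulative distance
    aligning feature sequence a with feature sequence b."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])              # local distance (1-D features)
            cost[i][j] = d + min(cost[i - 1][j],       # insertion
                                 cost[i][j - 1],       # deletion
                                 cost[i - 1][j - 1])   # match
    return cost[n][m]

# Matching an utterance against stored templates picks the lowest distance.
templates = {"home": [1.0, 2.0, 3.0], "office": [3.0, 3.0, 1.0]}
utterance = [1.1, 2.1, 2.9]
best = min(templates, key=lambda name: dtw_distance(utterance, templates[name]))
```

Real recognizers operate on multi-dimensional feature vectors rather than scalars, but the alignment idea is the same.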
[0004] As part of a speech recognition operation, speech input is normally digitized and then processed. The processing normally involves extracting feature information, e.g., energy and/or timing information, from the digitized signal. The extracted feature information normally takes the form of one or more feature vectors. The extracted feature vectors are then compared to one or more speech recognition models in an attempt to recognize words, phrases or sounds.
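The feature extraction step described above can be illustrated with a very simple short-time energy computation. This is a minimal sketch, not the patent's method; production systems typically use richer features such as cepstral coefficients, and the frame size below is illustrative:

```python
def energy_features(samples, frame_size=160):
    """Split digitized speech into fixed-size frames and compute the
    average energy of each frame, yielding a simple feature sequence."""
    features = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        features.append(sum(s * s for s in frame) / frame_size)
    return features

# 320 samples at frame_size=160 yield two feature values.
feats = energy_features([0.5] * 320, frame_size=160)
```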
[0005] In speech recognition systems, various actions, e.g., dialing a telephone number, entering information into a form, etc., are often performed in response to the results of the speech recognition operation.
[0006] Before speech recognition operations can be performed, one or more speech recognition models need to be trained. Speech recognition models can be either speaker dependent or speaker independent. Speaker dependent (SD) speech recognition models are normally trained using speech from a single individual and are designed so that they should accurately recognize the speech of the individual who provided the training speech, but not necessarily other individuals. Speaker independent (SI) speech recognition models are normally generated from speech provided by numerous individuals or from text. The generated speaker independent speech recognition models often represent composite models which take into consideration variations between different speakers, e.g., due to differing pronunciations of the same word. Speaker independent speech recognition models are designed to accurately identify speech from a wide range of individuals, including individuals who did not provide speech samples for training purposes.
[0007] In general, model training involves one or more individuals speaking a word or phrase, converting the speech into digital signal data, and then processing the digital signal data to generate a speech recognition model. Model training frequently involves an iterative process of computing a speech recognition model, scoring the model, and then using the results of the scoring operation to further improve and retrain the speech recognition model.
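The iterative train/score/retrain loop can be sketched abstractly. Everything below is a stand-in: the "model" is just an element-wise mean of equal-length training utterances and the score is a negative mean squared error, in place of real HMM re-estimation:

```python
def train_model(samples):
    """Toy 'model': element-wise mean of equal-length training utterances."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def score(model, samples):
    """Toy score: negative mean squared error of the model on the data
    (higher is better, mirroring a likelihood-style score)."""
    err = 0.0
    for s in samples:
        err += sum((m - x) ** 2 for m, x in zip(model, s)) / len(s)
    return -err / len(samples)

def iterative_training(samples, rounds=3):
    """Train, score, and retrain until the score stops improving."""
    model = train_model(samples)
    best = score(model, samples)
    for _ in range(rounds):
        model = train_model(samples)   # a real system would re-estimate here
        s = score(model, samples)
        if s <= best:                  # stop when scoring no longer improves
            break
        best = s
    return model, best

model, best = iterative_training([[1.0, 2.0], [3.0, 4.0]])
```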
[0008] Speech recognition model training processes can be very computationally complex. This is particularly true in the case of SI models, where audio data from numerous speakers is normally processed to generate each model. For this reason, speech recognition models are often generated using relatively powerful computer systems.
[0009] Individual speech recognition models can take up a considerable amount of storage space. For this reason, it is often impractical to store speech recognition models corresponding to large numbers of words or phrases, e.g., the names of all the people in a mid-sized company or a large dictionary, in a portable device or speech recognizer where storage space, e.g., memory, is limited.
[0010] In addition to limits in storage capacity, portable devices are often equipped with limited processing power. Speech recognition, like the model training process, can be a relatively computationally complex process and can therefore be time consuming given limited processing resources. Since most users of a speech processing system expect a prompt response from the system, to satisfy user demands speech processing often needs to be performed in real or near real time. As the number of potential words which may be recognized increases, so does the amount of processing required to perform a speech recognition operation. Thus, devices with limited processing power which may be able to perform a speech recognition operation involving recognizing, e.g., 20 possible names in near real time, may not be fast enough to perform a recognition operation in near real time where the number of names is increased to 100 possible names.

[0011] In the case of voice dialing and other applications where the recognition results need to be generated in near real time, e.g., with relatively little delay, the limited processing power of portable devices often limits the size of the vocabulary which can be considered as possible recognition outcomes.
[0012] In addition to the above implementation problems, implementers of speech recognition systems are often confronted with logistical problems associated with collecting speech samples to be used for model training purposes. This is particularly a problem in the case of speaker independent speech recognition models, where the robustness of the models is often a function of the number of speech samples used for training and the differences between the individuals providing the samples. In applications where speech recognition models are to be used over a wide geographical region, it is particularly desirable that speech samples be collected from the various geographic regions where the models will ultimately be used. In this manner, regional speech differences can be taken into account during model training.
[0013] Another problem confronting implementers of speech recognition systems is that older speech recognition models may include different feature information than current speech recognition models. When updating a system to use newer speech recognition models, previously used models in addition to speech recognition software may have to
be revised or replaced. This frequently requires speech samples to retrain and/or update the older models. Thus, the problems of collecting training data and training speech recognition models discussed above are often encountered when updating existing speech recognition systems.

[0014] In systems using multiple speech recognition devices, speech model incompatibility may require the extraction of different speech features for different speech recognition devices when the devices are used to perform a speech recognition operation on the same speech segment. Accordingly, in some cases it is desirable to be able to supply the speech to be processed to multiple systems so that each system can perform its own feature extraction operation.
[0015] In view of the above discussion, it is apparent that there is a need for new and improved methods and apparatus relating to a wider range of speech recognition issues. For example, there is a need for improvements with regard to the collecting of speech samples for purposes of training speech recognition models. There is also a need for improved methods of providing users of portable devices with limited processing power, e.g., notebook computers and personal data assistants (PDAs), with speech recognition functionality. Improved methods of providing speech recognition functionality in systems where different types of speech recognition models are used by different speech recognizers are also desirable. Enhanced methods and apparatus for updating speech recognition models are also desirable.
SUMMARY OF THE INVENTION

[0016] The present invention is directed to methods and apparatus for generating, distributing, and using speech recognition models. In accordance with the present invention, a shared, e.g., centralized, speech processing facility is used to support speech recognition for a wide variety of devices, e.g., notebook computers, business computer systems, personal data assistants, etc. The centralized speech processing facility of the present invention may be located at a physically remote site, e.g., in a different room, building, or even country, than the devices to which it provides speech processing and/or speech recognition services. The shared speech processing facility may be coupled to numerous devices via the Internet and/or one or more other communications channels, such as telephone lines, a local area network (LAN), etc.
[0017] In various embodiments, the Internet is used as the communications channel via which model training data is collected and/or speech recognition input is received by the shared speech processing facility of the present invention. Speech files may be sent to the speech processing facility as electronic mail (E-mail) message attachments. The Internet is also used to return speech recognition models and/or information identifying recognized words or phrases included in the processed speech. The speech recognition models may be returned as E-mail message attachments, while the recognized words may be returned as text in the body of an E-mail message or in a text file attachment to an E-mail message.
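Packaging a speech file as an E-mail attachment, as described above, might look like the following with Python's standard email library. The addresses, subject line, and filename are placeholders, not values from the patent:

```python
from email.message import EmailMessage

def build_speech_email(wav_bytes, filename="utterance.wav"):
    """Package digitized speech as an e-mail attachment for transmission
    to a speech processing facility."""
    msg = EmailMessage()
    msg["From"] = "user@example.com"                   # placeholder address
    msg["To"] = "training@speech-facility.example"     # placeholder address
    msg["Subject"] = "Speech sample for model training"
    msg.set_content("Attached: digitized speech for model generation.")
    msg.add_attachment(wav_bytes, maintype="audio", subtype="wav",
                       filename=filename)
    return msg

msg = build_speech_email(b"\x00\x01\x02")
# msg could then be handed to smtplib.SMTP(...).send_message(msg)
```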
[0018] Thus, via the Internet, devices with audio capture capability and Internet access can record and transmit to the centralized speech processing facility of the present invention digitized speech, e.g., as speech files. The speech processing facility then performs a model training operation or speech recognition operation using the received speech. A speech recognition model or a data message including the recognized words, phrases or other information is then returned, depending on whether a model training or recognition operation was performed, to the device which supplied the speech.
[0019] Thus, the speech processing facility of the present invention can be used to provide speech recognition capabilities and/or to augment a device's speech processing capability by performing speech recognition model training operations and/or additional speech recognition operations which can be used to supplement local speech recognition attempts.
[0020] For example, in various embodiments of the present invention, the generation of speech recognition models to be used locally is performed by the remote speech processing facility. In one such embodiment, when the local computer device needs a speech recognition model to be trained, the local computer system collects the necessary training data, e.g., speech samples from the system user and text corresponding to the retrieved speech samples, and then transmits the training data, e.g., via the Internet, to the speech processing facility of the present invention. The speech processing facility then generates one or more speech recognition models and returns them to the local computer system for use in local speech recognition operations.
[0021] In various embodiments, the shared speech processing facility updates a training database with the speech samples received from local computer systems. In this way, a more robust set of training data is created at the remote speech processing facility as part of the model training and/or updating process, without imposing additional burdens on individual devices beyond those needed to support services being provided to a user of an individual device, e.g., a notebook computer or PDA. As the training database is augmented, speaker independent speech recognition models may be retrained periodically using the updated training data and then transmitted to those computer systems which use speech recognition models corresponding to those models which are retrained. In this manner, multiple local systems can benefit from one or more different users initiating the retraining of speech recognition models to enhance recognition results.
[0022] As discussed above, in various embodiments, the remote speech processing facility of the present invention is used to perform speech recognition operations and then return the recognition results or take other actions based on the recognition results. For example, in one embodiment business computer systems capture speech from, e.g., customers, and then transmit the speech or extracted speech information to the shared speech processing facility via the Internet. The remote speech processing facility performs speech recognition operations on the received speech and/or received extracted speech information. The results of the recognition operation, e.g., recognized words in the form of, e.g., text, are then returned to the business computer system which supplied the processed speech or speech information. The business system can then use the information returned by the speech processing facility, e.g., recognized text, to fill in forms or perform other services, such as automatically responding to verbal customer inquiries. Thus, the remote
speech processing method of the present invention can be used to supply speech processing capabilities to customers, e.g., businesses, who cannot, or do not want to, support local speech processing operations.
[0023] In addition to providing speech recognition capabilities to systems which cannot perform speech recognition locally, the speech processing facility of the present invention is used in various embodiments to augment the speech recognition capabilities of various devices, such as notebook computers and personal data assistants. In such embodiments, the remote speech processing facility may be used to perform speech recognition when the local device is unable to obtain a satisfactory recognition result, e.g., because of a limited vocabulary or limited processing capability.
[0024] In one particular exemplary embodiment, a notebook computer attempts to perform a voice dialing operation on received speech using locally stored speech recognition models prior to contacting the speech processing facility of the present invention. If the local speech recognition operation fails to result in the recognition of a name, the received speech or extracted feature information is transmitted to the remote speech processing facility. If the local notebook computer cannot perform a dialing operation, the notebook computer also transmits to the remote speech processing facility a telephone number where the user of the notebook computer can be contacted by telephone. The remote speech processing facility performs a speech recognition operation using the received speech and/or extracted feature information. If the speech recognition operation results in the recognition of a name with which a telephone number is associated, the telephone number is retrieved from the remote speech processing facility's memory. The telephone number is returned to the device requesting that the voice dialing speech recognition operation be performed, unless a contact telephone number was provided with the speech and/or extracted feature information. In such a case, the speech processing facility uses telephone circuitry to initiate one telephone call to the telephone number retrieved from memory and another telephone call to the received contact telephone number. When the two calls are answered, they are bridged, thereby completing the voice dialing operation.
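The local-first, remote-fallback dialing decision in this embodiment can be sketched with stubbed recognizers. Both recognizer functions, the name-to-number table, and the return conventions below are hypothetical stand-ins for the networked components the patent describes:

```python
def voice_dial(speech, local_recognize, remote_recognize, contact_number=None):
    """Try local recognition first; fall back to the remote facility.
    Returns ('dial', number) when the requesting device should dial, or
    ('bridge', number, contact_number) when the facility should place
    both calls and bridge them; ('failed', None) otherwise."""
    name_to_number = {"alice": "555-0100", "bob": "555-0101"}  # illustrative

    name = local_recognize(speech)
    if name is None:                        # local attempt failed
        name = remote_recognize(speech)     # send speech/features to facility
    if name is None or name not in name_to_number:
        return ("failed", None)

    number = name_to_number[name]
    if contact_number is not None:
        # Device can't dial itself: facility initiates and bridges both calls.
        return ("bridge", number, contact_number)
    return ("dial", number)

# A local recognizer with a tiny vocabulary misses; the remote one succeeds.
result = voice_dial("call bob",
                    local_recognize=lambda s: None,
                    remote_recognize=lambda s: "bob")
```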
[0025] In addition to generating new speech recognition models to be used in speech processing operations and providing speech recognition services, the centralized speech processing facility of the present invention can be used for modernizing existing speech recognition systems by upgrading speech recognition models and the speech recognition engine used therewith. In one particular embodiment, speech recognition models or templates are received via the Internet from a system to be updated, along with speech corresponding to the modeled words. The received models or templates and/or speech are used to generate updated models which include different speech characteristic information or have a different model format than the existing speech recognition models. The updated models are returned to the speech recognition systems along with, in some cases, new speech recognition engine software.
[0026] In one particular embodiment, speech recognition templates used by voice dialing systems are updated and replaced with HMMs generated by the central processing system of the present invention.

[0027] At the time the templates are replaced, the speech recognition engine software is also replaced with a new speech recognition engine which uses HMMs for recognition purposes.
[0028] Various additional features and advantages of the present invention will be apparent from the detailed description which follows.
BRIEF DESCRIPTION OF THE DRAWINGS

[0029] FIG. 1 illustrates a communication system implemented in accordance with an exemplary embodiment of the present invention.

[0030] FIG. 2 illustrates the communications system of FIG. 1 in greater detail.

[0031] FIG. 3 illustrates a computer system for use in the communications system illustrated in FIG. 1.

[0032] FIG. 4 illustrates memory which may be used as the memory of a computer in the system illustrated in FIG. 1.

[0033] FIG. 5 illustrates a voice dialing customer record implemented in accordance with the present invention.

[0034] FIG. 6 illustrates a voice dialing IP device which may be used in the system illustrated in FIG. 1.

[0035] FIG. 7 illustrates a model training routine of the present invention.

[0036] FIG. 8 illustrates an exemplary voice dialing routine of the present invention.

[0037] FIG. 9 illustrates a local voice dialing subroutine of the present invention.

[0038] FIG. 10 illustrates a remote voice dialing routine implemented in accordance with the present invention.

[0039] FIG. 11 illustrates a call establishment routine of the present invention.

[0040] FIG. 12 illustrates a model generation routine of the present invention.

[0041] FIG. 13 illustrates a speech processing facility implemented in accordance with one embodiment of the present invention.

[0042] FIG. 14 illustrates a speech recognition routine that can be executed by the speech processing facility of FIG. 13.
DETAILED DESCRIPTION

[0043] As discussed above, the present invention is directed to methods and apparatus for generating speech recognition models, distributing speech recognition models, and performing speech recognition operations, e.g., voice dialing and word processing operations, using speech recognition models.
[0044] FIG. 1 illustrates a communications system 100 implemented in accordance with the present invention. As illustrated, the system 100 includes a business premises 10 and customer premises 12, 14, 16. Each one of the premises 10, 12, 14, 16 represents a customer or business site. While only one business premises 10 is shown, it is to be understood that any number of business and customer premises may be
included in the system 100. The various premises 10, 12, 14, 16 are coupled together, and to a shared speech processing facility 18 of the present invention, via the Internet 20 and a telephone network 22. Connections to the Internet 20 may be via digital subscriber lines (DSL), cable modems, cable lines, high speed data links, e.g., T1 links, dial-up lines, wireless connections, or a wide range of other communications channels. The premises 10, 12, 14, 16 may be connected to the speech processing facility 18 via a LAN or other communications channel instead of, or in addition to, the Internet.
[0045] While businesses have frequently contracted for high speed Internet connections, e.g., T1 links and other high speed services, which may be on during all hours of business service, residential customers are now also moving to relatively high speed Internet connections which are "always on". As part of such services, a link to the Internet is maintained while the computer user has his/her computer on, avoiding delays associated with establishing an Internet connection when data needs to be sent or received over the Internet. Examples of such Internet connections include cable modem services and DSL services. Such services frequently support sufficient bandwidth for the transmission of audio signals. As the speed of Internet connections increases, the number of Internet service subscribers capable of transmitting audio signals in real or near real time will continue to increase.
[0046] The speech processing facility 18 is capable of receiving speech from the various premises 10, 12, 14, 16 and performing speech processing operations thereon. The operations may include speech model training, e.g., gener-
