United States Patent [19]
Wrench, Jr. et al.

[11] Patent Number: 5,036,539
[45] Date of Patent: Jul. 30, 1991

[54] REAL-TIME SPEECH PROCESSING DEVELOPMENT SYSTEM
[75] Inventors: Edwin H. Wrench, Jr.; Alan L. Higgins, both of San Diego, Calif.
[73] Assignee: ITT Corporation, New York, N.Y.
[21] Appl. No.: 376,076
[22] Filed: Jul. 6, 1989
[51] Int. Cl.: G10L 7/08; G10L 7/02
[52] U.S. Cl.: 381/43; 381/41
[58] Field of Search: 381/41-46; 364/513.5

[56] References Cited

U.S. PATENT DOCUMENTS
4,144,582   3/1979   Hyatt             381/43
4,641,238   2/1987   Kneib             364/200
4,653,097   3/1987   Watanabe et al.   381/42
4,695,977   9/1987   Hansen et al.     364/900
4,799,144   1/1989   Parruck et al.    364/200
4,827,522   5/1989   Matsuura et al.   381/43

OTHER PUBLICATIONS
Perdue et al., AT&T Technical Journal, Conversant 1 Voice System: Architecture and Applications, Sep./Oct. 1986, vol. 65, No. 5, pp. 34-47.

Primary Examiner-Gary V. Harkcom
Assistant Examiner-Michelle Doerrler
Attorney, Agent, or Firm-Arthur L. Plevy
[57] ABSTRACT
A real-time speech processing development system has a control subsystem (CS) and a recognition subsystem (RS) interconnected by a CS/RS interface. The control subsystem includes a control processor, an operator interface, a user interface, and a control program module for loading any one of a plurality of control programs which employ speech recognition processes. The recognition system RS includes a master processor, speech signal processor, and template matching processors all interconnected on a common bus which communicates with the control subsystem through the mediation of the CS/RS interface. The two-part configuration allows the control subsystem to be accessed by the operator for non-real-time system functions, and the recognition subsystem to be accessed by the user for real-time speech processing functions. An embodiment of a speaker verification system includes template enrollment, template training, recognition by template concatenation and time alignment, silence and filler template generation, and speaker monitoring modes.

11 Claims, 5 Drawing Sheets
`
[Front-page drawing (reproduces FIG. 1): control subsystem with general purpose control processor, operator's terminal, database storage, and digital data interface, joined by the CS/RS interface to the recognition subsystem RS with signal processing and recognition and an analog speech interface.]

`Amazon / Zentian Limited
`Exhibit 1010
`Page 1
`
`

`

U.S. Patent    July 30, 1991    Sheet 1 of 5    5,036,539

[FIG. 1: block diagram. Control subsystem CS (general purpose control processor, operator's terminal, digital data interface, printer interface) and recognition subsystem (signal processing and recognition, analog interface).]

[FIG. 2: block diagram. Control subsystem (control processor, operator interface, user interface, peripherals, database, CS/RS interface) and recognition subsystem (master processor, template matching processors, speech signal processors with user speech input).]

`
`
`

`

U.S. Patent    July 30, 1991    Sheet 2 of 5    5,036,539

[FIG. 3: CONTROL SUBSYSTEM CONTROL STRUCTURE. Operator interface program (operator, DBMS, setup file) above the speaker enrollment program, speaker verification program, and speaker monitoring program, each connected to the user, the DBMS, and the RS.]

[FIG. 4: RECOGNITION SUBSYSTEM CONTROL STRUCTURE. Modules: Initialize; Accept Frame Data (from speech signal processor); Transmit Message; Receive Message; Background Diag.; Compute Template Values (Recog., Train, Filler, Monitor). External connections: CS, template proc., speech sig. proc.]
`
`
`

`

U.S. Patent    July 30, 1991    Sheet 3 of 5    5,036,539

[FIG. 5: MESSAGE PROTOCOL FOR TEMPLATE TRAINING, between the control subsystem and the recognition subsystem.
Messages shown between the subsystems, in order: Train command; ACK response; ACK response; Update Templates command; ACK response; ACK response; ACK response; Update Templates command; ACK response.
Recognition subsystem actions, in order: recognizer constructs syntax; recognizer verifies script; prompt message; schedules SNTX; performs open recognition; correct results; results message; schedules TMPT; templates trained; prompt message; schedules SNTX; performs open recognition; incorrect results; results message; performs forced recognition; correct results; results message; schedules TMPT; templates trained; training complete message; goes to standby mode.]
`
`
`

`

U.S. Patent    July 30, 1991    Sheet 4 of 5    5,036,539

[FIG. 6: MESSAGE PROTOCOL FOR TEMPLATE ENROLLMENT, between the control subsystem and the recognition subsystem.
Messages shown between the subsystems, in order: Enroll command; ACK response; ACK response; ACK response.
Recognition subsystem actions, in order: recognizer constructs syntax; recognizer verifies script (if loaded); prompt message; schedules SNTX; performs recognition; schedules template enrollment (TMPT); stores template in memory; prompt message; schedules SNTX; performs recognition; schedules template enrollment (TMPT); stores template in memory; training complete message; goes to standby mode.]
`
`
`

`

U.S. Patent    July 30, 1991    Sheet 5 of 5    5,036,539

[FIG. 7: MESSAGE PROTOCOL FOR SPEECH RECOGNITION, between the control subsystem and the recognition subsystem.
Control subsystem commands: Recognize command; Standby Mode command.
Recognition subsystem actions, in order: schedules open recognition (SNTX); ACK; performs open recognition; results message; reschedules open recognition (SNTX); performs open recognition; results message; goes to standby mode; ACK.]
`
`
`

`

REAL-TIME SPEECH PROCESSING DEVELOPMENT SYSTEM

FIELD OF INVENTION
The present invention relates to a computerized system for performing speaker verification, speech recognition, and other speech processing functions, and particularly, to one which can be efficiently used for both system development operations and real-time speech processing operations.

BACKGROUND OF INVENTION
Conventional speech processing systems commonly employ a speech recognition module which transforms input signals representing speech utterances into discrete representations that are compared to stored digital representations (templates) of expected words or speech sound units. The input speech signals are "recognized" usually by using a statistical algorithm to measure and detect a match to a corresponding word or sound template. Speech processing systems and algorithms are usually designed for one or more particular modes of operation, e.g., speaker-dependent or independent speech recognition, text- or application-dependent or independent speech recognition, speaker verification (authentication of identity), speaker recognition (selection from a number of candidates), or speaker monitoring (identity, direction, etc.). The design of such systems can vary widely with the application, speaker vocabulary, syntax, or environment of use.

Over the past several years, speech processing technology has achieved a level of performance sufficient to admit the introduction of successful commercial products. Development work continues to further improve the accuracy, reduce the vulnerability, and expand the capabilities of such systems. However, progress toward improvement has been limited by the available tools for system and algorithm development.

One factor limiting progress is that error rates have become low enough, for example, in text-dependent speaker verification, that a large test must be performed to ascertain whether an improvement has been made. To illustrate, if the probability of false acceptance is on the order of 1/1000, and the test is designed to observe 30 errors, then 30,000 trials are needed. Performing such a test using a simulation running on a time-sharing computer could take weeks or months. To mitigate this problem, tests may be run using a fast special-purpose hardware implementation of the recognition algorithm. However, this leads to a second problem, i.e. making changes to the algorithm may be very difficult because of the constraints imposed by the hardware or software.

A third important factor is that the recognition system itself influences the user's speaking behavior. This influence is absent if the user's speech input is prerecorded and the user does not have a real-time interaction with the system. The environment in which the system is installed, the details of the user interface, and the feedback of past acceptance or rejection decisions can all affect the user's interaction with the system. Thus, valid testing in the intended environment of use requires a real-time implementation of the recognition algorithm and an accurate simulation of the user interface.

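The test-size figure in the Background (30 errors at a false-acceptance probability of about 1/1000) follows from dividing the number of errors to be observed by the per-trial error probability. The short sketch below only reproduces that arithmetic; the function name `trials_needed` is illustrative and does not appear in the patent.

```python
from fractions import Fraction
from math import ceil


def trials_needed(error_probability, target_errors):
    """Expected number of trials needed to observe `target_errors` errors.

    Exact rational arithmetic avoids floating-point rounding before ceil().
    """
    return ceil(target_errors / Fraction(error_probability))


# Figures quoted in the text: probability 1/1000, 30 errors to observe.
print(trials_needed(Fraction(1, 1000), 30))  # 30000
```

Halving the error rate doubles the number of trials for the same error count, which is why the text treats test duration as a limiting factor.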
SUMMARY OF INVENTION
In order to improve upon the utility and effectiveness of conventional speech processing systems, it is a principal object of the invention that the system be operable in real-time response to speaker input using a realistic user interface, while at the same time remain flexible and accessible enough to be a useful tool for the development and improvement of speech recognition algorithms and system designs. A particular object of the invention is to provide a speech processing development system which allows non-real-time access to its command, control, and recognition structures so that changes can be made readily to the system design and/or recognition algorithm, and which simultaneously allows real-time interaction of a speaker with its speech recognition functions.

In accordance with the invention, a real-time speech processing development system comprises:

(a) a speech recognition subsystem including a master processor, a template matching processor, a speech signal processor, and speech signal input means, wherein said master processor is configured to receive internal subsystem operation commands for performing speech recognition and to thereupon generate subsystem execution commands for real-time operation of said speech signal processor to process speech signals received from a user through said speech signal input means into corresponding digital representations, and of said template matching processor to compare the digital representations provided by said speech signal processor with stored templates of word or sound units and to produce a real-time speech recognition output based thereon;

(b) a control subsystem including an operator interface for control communications with a system operator, a user interface for control communications with a user, a control program module, and a control processor connected to said operator interface, said speaker interface, and said control program module, wherein said control processor is configured to receive operator control input through said operator interface and to access said control program module to load one of a plurality of control programs selected by the operator, including a speech recognition control program, and wherein said control processor is further configured to operate said user interface, when the speech recognition control program is selected by the operator, for control communications with the user, and to execute said speech recognition control program so as to generate the internal subsystem operation commands for performing speech recognition provided to said master processor of said recognition subsystem; and

(c) an interface connected between said control subsystem and said recognition system for transmitting the internal subsystem operation commands for performing speech recognition from said control subsystem to said recognition subsystem and the real-time speech recognition output from said recognition subsystem to said control subsystem, whereby said control subsystem can be accessed by the operator for non-real-time system development functions while said recognition subsystem can be accessed by the user for real-time speech recognition functions.
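
The three-part arrangement in (a) through (c) can be pictured as two independent parties exchanging messages through a pair of buffers, one per direction. The following is a minimal sketch under stated assumptions: the class names, the `recognizer` callable, and the use of in-memory queues are all hypothetical, and only the message names "Recognize Speech", "Acknowledgement", and "Recognition Results" are taken from the specification.

```python
from collections import deque


class CsRsInterface:
    """Toy model of the interface: one buffer per direction."""

    def __init__(self):
        self.transmit = deque()  # commands travelling CS -> RS
        self.receive = deque()   # outputs travelling RS -> CS


class ControlSubsystem:
    def __init__(self, interface):
        self.interface = interface

    def send(self, command):
        self.interface.transmit.append(command)

    def collect(self):
        """Drain and return everything the RS has sent back so far."""
        messages = list(self.interface.receive)
        self.interface.receive.clear()
        return messages


class RecognitionSubsystem:
    def __init__(self, interface, recognizer):
        self.interface = interface
        self.recognizer = recognizer  # hypothetical stand-in for real-time recognition

    def service(self):
        """Acknowledge each pending command and queue any recognition output."""
        while self.interface.transmit:
            command = self.interface.transmit.popleft()
            self.interface.receive.append(("Acknowledgement", command))
            if command == "Recognize Speech":
                self.interface.receive.append(("Recognition Results", self.recognizer()))


interface = CsRsInterface()
cs = ControlSubsystem(interface)
rs = RecognitionSubsystem(interface, recognizer=lambda: "hello")
cs.send("Recognize Speech")
rs.service()
print(cs.collect())
```

Because the two sides touch only the buffers, the control side can be edited or restarted without changing the recognition side, which is the separation the summary emphasizes.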
In the preferred embodiment of the invention, the control program module of the control subsystem includes control programs for a speaker enrollment program for enrolling speech or vocabulary samples of a speaker into the system, a speaker verification program for verifying a speaker based upon comparisons to stored speech samples of the speaker, and a speaker monitoring program for passively monitoring the identity of a speaker. The three control programs of the system each includes use of the core speech recognition program.

BRIEF DESCRIPTION OF DRAWINGS
The above objects and further features and advantages of the invention are described in detail below in conjunction with the drawings, of which:

FIG. 1 is a schematic diagram of the external configuration of a real-time speech processing development system in accordance with the invention;

FIG. 2 is a schematic diagram of the internal configuration of a control subsystem and a recognition subsystem for the real-time speech processing development system of the invention;

FIG. 3 is a schematic diagram of the structure of the control programs for the control subsystem shown in FIG. 2;

FIG. 4 is a schematic diagram of the control structure of the recognition subsystem shown in FIG. 2;

FIG. 5 is a diagram of the communication exchange between the control and recognition subsystems for the template training process;

FIG. 6 is a diagram of the communication exchange between the control and recognition subsystems for the template enrollment process; and

FIG. 7 is a diagram of the communication exchange between the control and recognition subsystems for the speech recognition process.

DETAILED DESCRIPTION OF INVENTION
The present invention encompasses speech processing systems for a wide range of speech processing functions and modes of operation, including speaker-dependent or independent speech recognition, text- or application-dependent or independent speech recognition, speaker verification, speaker recognition, speaker monitoring, continuous speech recognition, wordspotting, and isolated word recognition, as well as for different environments of use and field applications. The preferred embodiment of the invention described herein is directed to a system for speaker verification using a speaker-dependent, template-matching recognition algorithm. However, it is to be understood that the principles of the invention are equally applicable to other types of systems and are not limited to the described system.

For a wider, more detailed explanation of speech recognition systems, the following are incorporated herein by reference: "A Comparison of Four Techniques for Automatic Speaker Recognition", by R. E. Wohlford, E. H. Wrench, and B. P. Landell, 1980 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 3, pp. 908-911; "A Realtime Implementation of a Text Independent Speaker Recognition System", by E. H. Wrench, 1981 ICASSP, vol. 1, pp. 193-196; "Keyword Recognition Using Template Concatenation", by A. L. Higgins and R. E. Wohlford, 1985 ICASSP; "Speaker Recognition by Template Matching", by A. L. Higgins, Proceedings of Speech Technology 1986, New York, N.Y.; "Improved Speech Recognition in Noise", by B. P. Landell, R. E. Wohlford, and L. G. Bahler, 1986 ICASSP, vol. 1, no. 1; U.S. Pat. No. 4,720,863 issued Jan. 19, 1988, to K. P. Li and E. H. Wrench; and copending U.S. patent application No. 346,054, filed on 5/2/89, by B. P. Landell et al., entitled "Automatic Speech Recognition System Using Seed Templates", now U.S. Pat. No. 4,994,983.

Referring to FIG. 1, a real-time speech processing development system in accordance with the invention has a control subsystem CS and a recognition subsystem RS interconnected by a CS/RS interface. The control subsystem CS includes a general purpose control processor, an operator interface to an operator's terminal, a user interface to a user's terminal, a database storage, and input/output interfaces to system peripherals such as a printer, modem, datalink, etc. The recognition system RS includes a signal processing and recognition module which receives analog speech signal input from an input device (microphone set) for the user.

As shown in FIG. 2, the internal structure of the control subsystem CS includes the general purpose control processor which is interconnected for external communication with the operator through the operator interface and with the user through the user interface, the CS/RS interface to the common bus for internal communication with the recognition subsystem RS, the database storage, and the digital data input/output interface. The control processor is connected to a control program module, to be described further below.

The control processor is configured to receive operator control input through the operator interface and to access the control program module to load one of a plurality of control programs selected by the operator. The control program is executed by the control processor to control communications with the operator and/or user, and to generate internal subsystem operation commands which are provided through the CS/RS interface to the master processor of the recognition subsystem RS via a common bus. The database storage is used to store the system data files, the user data files, and the template files.

The internal structure of the recognition subsystem RS includes a master processor, a plurality of template matching processors in parallel, and a speech signal processor which is connected to the speech signal input device. The master processor, template processors, and speech signal processor are all interconnected with the common bus. The master processor is configured to receive the internal subsystem operation commands for performing speech recognition sent from the control subsystem CS via the common bus, and to thereupon generate subsystem execution commands. The RS execution commands activate RS program modules (to be described hereinafter) for real-time operation of the speech signal processor to process input speech signals into corresponding digital representations, and of the template matching processors to compare the digital representations provided by the speech signal processor with templates of word or sound units and to produce a real-time speech recognition output based thereon.

The common bus connects the control subsystem and the recognition system through the mediation of the CS/RS interface to transmit the internal subsystem operation commands from the control subsystem CS to the recognition subsystem RS and the real-time outputs and responses from the recognition subsystem to the control subsystem. The two-subsystem configuration allows the control subsystem to be accessed by the operator for non-real-time system functions, such as handling the user interface, program decision-making,
program editing, database management, and system initialization, testing and diagnostics. This frees the recognition subsystem to be accessed by the user for real-time speech functions, such as speech signal digitization, spectral frame analysis, and template matching.

In a practical implementation, the control subsystem hardware is a Sun 3/160 Unix workstation which includes a 16 MHz Motorola 68020 CPU with 8 megabytes of CPU memory and 160 megabytes of disk storage. A System Administrator CRT console is used as the operator's console, and a VT-100 type terminal as the user's terminal. The recognition subsystem is composed of an ICS-100 4-channel A/D and D/A converter, a 400 KHz maximum sampling rate, and 4255 tap programmable digital FIR filters for signal acquisition and conditioning; a Sky Challenger board containing two TMS-32020s at 20 MHz clock rate to perform front end signal processing of real-time parameter values of the input speech signals into frames of speech data; and seven Motorola 68020-based single-board processors, one of which is used to serve as the master processor, and the other six to perform template matching. Each 68020 board contains 1 megabyte of random access memory and operates at a 20 MHz clock rate. All of the above devices are commercially available VME-compatible boards that plug directly into the Sun 3/160 VME backplane. The VME bus serves as the medium of communication between the two subsystems. The throughput of the system is approximately 3000 template frames per input frame. A microphone preamplifier and a digital sampling clock are implemented on a custom wire wrap board mounted with the ICS-100 A/D converter. The entire system is contained within the chassis of the Sun workstation. A telephone style handset is plugged into a connector located on the rear of the unit.

A database management system (DBMS) is provided with the control subsystem to manage the data storage and retrieval operations. In the speaker verification operational mode of the system, for example, three databases are maintained: a user database, an access database, and a score database. The user database contains personal information about each user of the system such as name, address, etc. The access database contains information about each attempted access such as date, time, identity code, and verification decision. The operator may query the user and the access databases to audit the system usage. The score database is used by the verification control program to set thresholds for acceptance or rejection.

PROGRAM STRUCTURE OF CONTROL SUBSYSTEM
Referring to FIG. 3, the program structure for the control subsystem CS will now be described. The implemented CS embodiment employs the Sun Unix operating system and a set of control programs written in the language C. A separate control program exists for each of the distinct tasks performed by the CS, including speaker enrollment, speaker verification, and speaker monitoring. The function of each control program is to perform the particular task by issuing appropriate internal subsystem operation commands to the RS and to receive and process the results. The control programs also implement the user interface, display instructions, prompt the operator or user, interpret the operator's or user's terminal keyboard input, and store the program's data files (DBMS).

The overall operation of the system is under the control of an operator interface program which allows the operator complete control over the operation of the system. This program is used to select the mode of operation, the input source, system parameters, etc. It creates a "setup" file containing the system parameters and invokes the appropriate control program by opening a new Unix process. Upon startup, the control program reads the setup file. Each control program can also be run using shell scripts without the operator interface program, which facilitates off-line processing of pre-recorded speaker databases. The interface is preferably menu driven to minimize the need for typed input.

A wide range of different operational modes can be setup, tested, and simulated on the described system by selectively loading the appropriate control program or programs from the control program module. The specific function and design of each control program is a matter of choice and need not be described in detail herein. Examples of the operation of speaker enrollment, speaker verification, and speaker monitoring programs are given further below for purposes of illustration. The essential concept in the present invention is the dual configuration of the control and recognition subsystems connected by a common interface wherein a particular control program selected by an operator is loaded by the control subsystem and the real-time speech processing functions of the selected program are executed by the recognition subsystem in response to internal subsystem operation commands generated by the control program. This dual configuration keeps the recognition subsystem accessible to the speaker/user for real-time simulation, while allowing the control subsystem to pursue non-time-critical tasks and maintain flexibility in program range and development.

PROGRAM STRUCTURE OF RECOGNITION SUBSYSTEM
The master processor of the recognition subsystem RS acts as the subsystem executive to the six template processors and the speech signal processor. As shown in FIG. 4, the master processor software consists of a system executive program and six program modules of different priority levels, labelled 0-5. In order of decreasing priority they are: Initialization; Accept Frame Data; Receive Message; Transmit Message; Compute Template Values; and Background. The Initialization module performs the various tasks for initializing the RS parameters. The Accept Frame Data module coordinates signal acquisition, conditioning, and processing by the speech signal processor into frames of speech data. The Transmit and Receive Message modules coordinate sending response outputs to and receiving command inputs from the CS. The Compute Template Values module performs various speech recognition functions, such as template matching, template training (for enrolling a speaker vocabulary), filler template generation (for silence and other filler templates), and monitoring (for speaker monitoring and verification operational modes). The Background module performs system diagnostics, and is always scheduled.

Template matching is performed by the master processor and the six 68020 template processors. The software for these processors is designed to implement template matching by dynamic programming (DP). The master processor receives a frame of speech data input from the frontend speech signal processor, receives command messages from the CS, and performs the DP algorithm of searching for the closest match among templates downloaded from the CS. The template processors are called once per frame to perform the template difference (distance) computations in parallel, each one processing a preassigned group of templates evenly divided among the template processors.

Tasks to be performed by the RS are requested by sending "inter-module packets" which are placed on a scheduling queue. When a module completes a task and gives up control, the system executive schedules the next priority task by scanning the queue from highest priority to lowest priority. The executive has entry points on the queue to schedule modules both from an interrupt code and from a noninterrupt code. If the priority of a requested task is higher than that of the currently-running task, the executive performs a context switch, giving control to the requested task. The interrupted task is put on the scheduling queue, and the program counter and all registers are preserved so that it can take up where it was interrupted upon regaining control.

Within the RS, each processor may access the memory of all the processors on the common bus. Communication between the master processor and the template processors is performed by means of setting flags in the shared memory. The frontend speech signal processor passes frames of input speech data to the master processor by sending it an interrupt once per frame. The A/D converter passes PCM (pulse code modulated) data to the frontend through a parallel port and generates an interrupt once per sample.

CS/RS INTERFACE
Other than the analog input from the microphone, the only input/output to/from the recognition subsystem RS are the internal subsystem operation commands issued by the CS under a selected control program and the recognition responses provided by the RS through the CS/RS interface. Thus, the command structure to the RS is entirely transparent to the user, and the user can interact with the RS in real-time simulation through the user interface of the CS. The CS/RS interface includes a pair of message buffers, i.e. a transmit buffer for messages from the CS to the RS, and a receive buffer for messages from the RS to the CS. The buffers can be implemented on either side of the VME bus. Sending and receiving messages on the Sun workstation is done by Unix device drivers. The frequency of messages is low enough that there is no difficulty in servicing them. Messages are exchanged synchronously. The CS/RS interface facilitates asynchronous input and output from and to the control programs running on the CS. The interface maintains the input and output queues in the buffers, and invokes the appropriate message handling routine when a message arrives.

The CS writes directly into the transmit buffer and reads directly from the receive buffer. The CS begins a transmission to the RS by writing a message header and [...]ate command modules and logic in the RS to carry out the CS command message.

The RS sends a message to the CS by writing the data words (if any) into the receive buffer along with a message header. The RS then schedules the low-level communications (RS transmit) module to place the message in a location that the CS will access and sends an interrupt request to notify the CS that a new message is available. The CS then sends an interrupt acknowledgement. The RS responds with a status/id, clears the interrupt request, and clears the header and data words from the receive buffer. If appropriate, the CS sends an acknowledgement to the RS upon receipt of the message. Upon receipt of the acknowledgement, the RS terminates the transmission sequence.

In the preferred embodiment of the system having speaker enrollment, verification, and monitoring modes of operation, examples of CS-to-RS messages include: Calibrate Noise; Create Silence Template; Enroll New Template; Train Template (by averaging a repeated utterance with an enrolled template); Recognize Speech; Upload and Download Templates; Download Syntax; Download System Parameters; Filler Generation; and Monitoring Mode. Examples of RS-to-CS messages include: Acknowledgement; Recognition Results; Prompt Speaker; Action Completed; Upload Templates; Upload Noise Calibration; Error messages; and Speaker Verifier Results. The important system functions and internal command sequences of a preferred speech processing development system are described in more detail below.

SYSTEM FUNCTIONS OF SPEAKER VERIFICATION SYSTEM
In one preferred embodiment, a speech processing development system is designed for speaker verification, i.e. for accepting or rejecting the claimed identity of a speaker seeking to enter a controlled area based upon measurements of input speech samples. The verification system includes control programs for enrollment of the speaker's vocabulary (speech samples) in the system, for verification of a speaker's input speech as compared to templates of a previously enrolled vocabulary, and for monitoring a speaker's input without constraint from the system except to authenticate the speaker's identity claim periodically.

The verification system uses a speech recognition algorithm based upon template matching for both text-dependent and text-independent applications. The recognition algorithm is the core of the recognition subsystem. In general terms, the algorithm is of the type which concatenates a sequence of templates and adjusts their time alignment to minimize a measure of dissimilarity between the concatenated templates and the incoming speech. A finite-state syntax is specified to limit the template sequences that the algorithm matches to the incoming speech. Special nodes are declared in the syntax to allow templates corresponding to silence (background noise) or non-speech sounds to be matched at any
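
The idea of adjusting time alignment to minimize dissimilarity between a stored template and incoming frames can be illustrated with a textbook dynamic time warping (DTW) routine. This is a generic sketch, not the algorithm of the patent: it omits template concatenation and the finite-state syntax, and the function names, the squared-Euclidean frame distance, and the one-dimensional toy frames are all assumptions made for illustration.

```python
def frame_distance(a, b):
    """Squared Euclidean distance between two feature frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dtw_distance(template, speech):
    """Minimum accumulated frame distance over all monotonic alignments."""
    inf = float("inf")
    rows, cols = len(template), len(speech)
    # cost[i][j]: best accumulated distance aligning template[:i] with speech[:j]
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            d = frame_distance(template[i - 1], speech[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[rows][cols]


def closest_template(templates, speech):
    """Return the name of the template with the smallest DTW distance."""
    return min(templates, key=lambda name: dtw_distance(templates[name], speech))


templates = {
    "rising": [[1.0], [2.0], [3.0]],
    "falling": [[3.0], [2.0], [1.0]],
}
# A slower rendition of the "rising" pattern: same shape, more frames.
speech = [[1.0], [1.0], [2.0], [3.0], [3.0]]
print(closest_template(templates, speech))  # rising
```

Because each template's distance is computed independently, the per-template work can be divided among several processors, which matches the text's description of template processors each handling a preassigned group of templates.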

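The priority scheduling described under PROGRAM STRUCTURE OF RECOGNITION SUBSYSTEM (six modules at priority levels 0-5, a scheduling queue, and preemption of a lower-priority task by a higher-priority request) can be sketched as a small simulation. Everything beyond the module names is an assumption: the `Executive` class, the tick-based notion of work, and the `arrivals` table are hypothetical devices for illustration, and the saved "remaining work" count merely stands in for the preserved program counter and registers.

```python
import heapq
from itertools import count

# Module names in order of decreasing priority (level 0 is highest).
MODULES = ["Initialization", "Accept Frame Data", "Receive Message",
           "Transmit Message", "Compute Template Values", "Background"]


class Executive:
    def __init__(self):
        self._queue = []      # heap of (priority, sequence, task)
        self._seq = count()   # tie-breaker: first come, first served within a level
        self.log = []         # name of the module that ran on each tick

    def request(self, priority, steps):
        """Queue a request for `steps` ticks of work at the given priority."""
        task = {"remaining": steps}
        heapq.heappush(self._queue, (priority, next(self._seq), task))

    def run(self, arrivals=None):
        """Run to completion; `arrivals` maps tick -> [(priority, steps), ...]."""
        arrivals = arrivals or {}
        tick = 0
        while self._queue or any(t >= tick for t in arrivals):
            for priority, steps in arrivals.get(tick, []):
                self.request(priority, steps)
            if self._queue:
                # Highest-priority pending task gets the tick; an unfinished
                # task goes back on the queue with its state preserved.
                priority, seq, task = heapq.heappop(self._queue)
                task["remaining"] -= 1
                self.log.append(MODULES[priority])
                if task["remaining"] > 0:
                    heapq.heappush(self._queue, (priority, seq, task))
            tick += 1
        return self.log


executive = Executive()
executive.request(4, 3)                  # Compute Template Values, 3 ticks
print(executive.run({1: [(1, 1)]}))      # a frame arrives at tick 1 and preempts
```

In this run the template computation is interrupted for one tick by Accept Frame Data and then resumes where it left off.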