`Wrench, Jr. et al.
`
[54] REAL-TIME SPEECH PROCESSING DEVELOPMENT SYSTEM
[75] Inventors: Edwin H. Wrench, Jr.; Alan L. Higgins, both of San Diego, Calif.
[73] Assignee: ITT Corporation, New York, N.Y.
[21] Appl. No.: 376,076
[22] Filed: Jul. 6, 1989
[51] Int. Cl.: G10L 7/08; G10L 7/02
[52] U.S. Cl.: 381/43; 381/41
[58] Field of Search: 381/41-46; 364/513.5
`
[56] References Cited
U.S. PATENT DOCUMENTS
4,144,582   3/1979   Hyatt              381/43
4,641,238   2/1987   Kneib              364/200
4,653,097   3/1987   Watanabe et al.    381/42
4,695,977   9/1987   Hansen et al.      364/900
4,799,144   1/1989   Parruck et al.     364/200
4,827,522   5/1989   Matsuura et al.    381/43
`OTHER PUBLICATIONS
`Perdue et al., AT&T Technical Journal, Conversant 1
`Voice System: Architecture and Applications, Sep./Oct.
`1986, vol. 65, No. 5; pp. 34-47.
`
[11] Patent Number: 5,036,539
[45] Date of Patent: Jul. 30, 1991
`
`Primary Examiner-Gary V. Harkconn
`Assistant Examiner-Michelle Doerrler
`Attorney, Agent, or Firm-Arthur L. Plevy
[57] ABSTRACT
A real-time speech processing development system has a control subsystem (CS) and a recognition subsystem (RS) interconnected by a CS/RS interface. The control subsystem includes a control processor, an operator interface, a user interface, and a control program module for loading any one of a plurality of control programs which employ speech recognition processes. The recognition system RS includes a master processor, speech signal processor, and template matching processors all interconnected on a common bus which communicates with the control subsystem through the mediation of the CS/RS interface. The two-part configuration allows the control subsystem to be accessed by the operator for non-real-time system functions, and the recognition subsystem to be accessed by the user for real-time speech processing functions. An embodiment of a speaker verification system includes template enrollment, template training, recognition by template concatenation and time alignment, silence and filler template generation, and speaker monitoring modes.
`
`11 Claims, 5 Drawing Sheets

[Front-page drawing (FIG. 1): control subsystem CS (general purpose control processor, database storage, digital data interface, operator's terminal) connected through the CS/RS interface to recognition subsystem RS (signal processing and recognition, analog speech interface).]

`IPR2023-00037
`Apple EX1010 Page 1
`
`
`
`U.S. Patent
`
`July 30, 1991
`
`Sheet 1 of 5
`
`5,036,539
`
[FIG. 1: External configuration. Control subsystem CS (general purpose control processor, digital data interface, printer interface, operator's terminal) connected through the CS/RS interface to the recognition subsystem (signal processing and recognition, analog interface).]

[FIG. 2: Internal configuration. Control subsystem: control processor with operator, user, and peripheral connections, user interface, CS/RS interface, and database. Recognition subsystem: master processor, template matching processors, and speech signal processor receiving the user's speech signal.]
`

[FIG. 3: Control subsystem control structure. An operator interface program (operator, DBMS, setup file) invokes the speaker enrollment program, the speaker verification program, and the speaker monitoring program, each of which connects to the user, the DBMS, and the RS.]

[FIG. 4: Recognition subsystem control structure. Modules: Initialize; Accept Frame Data (from speech signal processor); Transmit Message; Receive Message; Compute Template Values (recognition, training, filler, monitor); Background Diagnostics. External connections: CS, template processors, speech signal processor.]
`

[FIG. 5: Message protocol for template training between the control subsystem and the recognition subsystem. Control subsystem messages: Train command, Update Templates commands, ACK responses. Recognition subsystem steps: recognizer constructs syntax; recognizer verifies script; prompt message; schedules SNTX; performs open recognition (correct or incorrect results); results message; performs forced recognition after incorrect results; schedules TMPT; templates trained; training complete message; goes to standby mode.]
`

[FIG. 6: Message protocol for template enrollment between the control subsystem and the recognition subsystem. Control subsystem messages: Enroll command, ACK responses. Recognition subsystem steps: recognizer constructs syntax; recognizer verifies script; prompt message; schedules SNTX; performs recognition; schedules template enrollment (TMPT); stores template in memory; repeated per prompt; training complete message; goes to standby mode.]
`
`
`
[FIG. 7: Message protocol for speech recognition between the control subsystem and the recognition subsystem. Control subsystem messages: Recognize command, Standby Mode command. Recognition subsystem steps: schedules open recognition (SNTX); ACK; performs open recognition; results message; reschedules open recognition (SNTX); performs open recognition; results message; goes to standby mode; ACK.]
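
The exchange shown for FIG. 7 can be read as a small two-state protocol: after a Recognize command the recognition subsystem acknowledges and reports a result for each utterance until a Standby Mode command arrives. The following Python sketch is illustrative only. The function name `run_session`, the event tuples, and the log strings are hypothetical and do not come from the patent, and the actual message formats are not specified in this section.

```python
def run_session(events):
    """Replay a timeline of events against a toy recognition subsystem.

    Each event is (source, payload). A "CS" event carries a command name;
    a "USER" event carries the word a stub recognizer would report.
    Returns the list of messages the toy RS sends back to the CS.
    """
    log = []
    recognizing = False  # False corresponds to standby mode
    for source, payload in events:
        if source == "CS" and payload == "RECOGNIZE":
            recognizing = True           # open recognition is scheduled
            log.append("RS: ACK")
        elif source == "CS" and payload == "STANDBY":
            recognizing = False          # RS goes to standby mode
            log.append("RS: ACK")
        elif source == "USER" and recognizing:
            # Open recognition runs, a results message is sent, and
            # recognition stays scheduled for the next utterance.
            log.append(f"RS: RESULTS {payload}")
    return log


if __name__ == "__main__":
    timeline = [
        ("CS", "RECOGNIZE"),
        ("USER", "open"),
        ("USER", "door"),
        ("CS", "STANDBY"),
        ("USER", "ignored"),
    ]
    for line in run_session(timeline):
        print(line)
```

Utterances that arrive in standby mode produce no message, which mirrors the figure ending with the recognition subsystem idle after the final ACK.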
`
`REAL-TIME SPEECH PROCESSING
`DEVELOPMENT SYSTEM
FIELD OF INVENTION
The present invention relates to a computerized system for performing speaker verification, speech recognition, and other speech processing functions, and particularly, to one which can be efficiently used for both system development operations and real-time speech processing operations.
BACKGROUND OF INVENTION
Conventional speech processing systems commonly employ a speech recognition module which transforms input signals representing speech utterances into discrete representations that are compared to stored digital representations (templates) of expected words or speech sound units. The input speech signals are "recognized" usually by using a statistical algorithm to measure and detect a match to a corresponding word or sound template. Speech processing systems and algorithms are usually designed for one or more particular modes of operation, e.g., speaker-dependent or independent speech recognition, text- or application-dependent or independent speech recognition, speaker verification (authentication of identity), speaker recognition (selection from a number of candidates), or speaker monitoring (identity, direction, etc.). The design of such systems can vary widely with the application, speaker vocabulary, syntax, or environment of use.
Over the past several years, speech processing technology has achieved a level of performance sufficient to admit the introduction of successful commercial products. Development work continues to further improve the accuracy, reduce the vulnerability, and expand the capabilities of such systems. However, progress toward improvement has been limited by the available tools for system and algorithm development.
One factor limiting progress is that error rates have become low enough, for example, in text-dependent speaker verification, that a large test must be performed to ascertain whether an improvement has been made. To illustrate, if the probability of false acceptance is on the order of 1/1000, and the test is designed to observe 30 errors, then 30,000 trials are needed. Performing such a test using a simulation running on a time-sharing computer could take weeks or months. To mitigate this problem, tests may be run using a fast special-purpose hardware implementation of the recognition algorithm. However, this leads to a second problem, i.e. making changes to the algorithm may be very difficult because of the constraints imposed by the hardware or software.
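
The trial count quoted above follows from dividing the number of errors to be observed by the per-trial error probability. A minimal Python sketch of that arithmetic is given here, assuming the expected-value relation errors = trials x probability. The function name `trials_needed` is hypothetical, and the calculation ignores statistical spread.

```python
import math
from fractions import Fraction


def trials_needed(target_errors, error_probability):
    """Expected number of trials required to observe `target_errors` errors.

    Assumes E[errors] = trials * error_probability (an illustrative
    simplification; a real test plan would also consider confidence bounds).
    """
    return math.ceil(target_errors / error_probability)


if __name__ == "__main__":
    # Figures from the text: 30 errors at a false-acceptance rate of 1/1000.
    print(trials_needed(30, Fraction(1, 1000)))  # prints 30000
```

Using `Fraction` keeps the division exact, so the result matches the 30,000 trials stated in the text without floating-point rounding.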
A third important factor is that the recognition system itself influences the user's speaking behavior. This influence is absent if the user's speech input is prerecorded and the user does not have a real-time interaction with the system. The environment in which the system is installed, the details of the user interface, and the feedback of past acceptance or rejection decisions can all affect the user's interaction with the system. Thus, valid testing in the intended environment of use requires a real-time implementation of the recognition algorithm and an accurate simulation of the user interface.
SUMMARY OF INVENTION
In order to improve upon the utility and effectiveness of conventional speech processing systems, it is a principal object of the invention that the system be operable in real-time response to speaker input using a realistic user interface, while at the same time remain flexible and accessible enough to be a useful tool for the development and improvement of speech recognition algorithms and system designs. A particular object of the invention is to provide a speech processing development system which allows non-real-time access to its command, control, and recognition structures so that changes can be made readily to the system design and/or recognition algorithm, and which simultaneously allows real-time interaction of a speaker with its speech recognition functions.
In accordance with the invention, a real-time speech processing development system comprises:
(a) a speech recognition subsystem including a master processor, a template matching processor, a speech signal processor, and speech signal input means, wherein said master processor is configured to receive internal subsystem operation commands for performing speech recognition and to thereupon generate subsystem execution commands for real-time operation of said speech signal processor to process speech signals received from a user through said speech signal input means into corresponding digital representations, and of said template matching processor to compare the digital representations provided by said speech signal processor with stored templates of word or sound units and to produce a real-time speech recognition output based thereon;
(b) a control subsystem including an operator interface for control communications with a system operator, a user interface for control communications with a user, a control program module, and a control processor connected to said operator interface, said speaker interface, and said control program module, wherein said control processor is configured to receive operator control input through said operator interface and to access said control program module to load one of a plurality of control programs selected by the operator, including a speech recognition control program, and wherein said control processor is further configured to operate said user interface, when the speech recognition control program is selected by the operator, for control communications with the user, and to execute said speech recognition control program so as to generate the internal subsystem operation commands for performing speech recognition provided to said master processor of said recognition subsystem; and
(c) an interface connected between said control subsystem and said recognition system for transmitting the internal subsystem operation commands for performing speech recognition from said control subsystem to said recognition subsystem and the real-time speech recognition output from said recognition subsystem to said control subsystem, whereby said control subsystem can be accessed by the operator for non-real-time system development functions while said recognition subsystem can be accessed by the user for real-time speech recognition functions.
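
The division of labor among elements (a), (b), and (c) can be illustrated with a toy model in which the interface is a pair of queues, the control subsystem loads an operator-selected program and issues commands, and the recognition subsystem answers each command with a result. This is a sketch under stated assumptions, not the patented implementation. All class names, the `"RECOGNIZE"` and `"RESULT"` strings, and the exact-match lookup standing in for template matching are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Interface:
    """Stand-in for element (c): commands flow one way, results the other."""
    to_rs: deque = field(default_factory=deque)
    to_cs: deque = field(default_factory=deque)


class RecognitionSubsystem:
    """Stand-in for element (a). Matching is a placeholder lookup, not DSP."""

    def __init__(self, interface, templates):
        self.interface = interface
        self.templates = templates  # label -> stored representation

    def step(self, speech_input):
        """Handle one pending command, if any, against one speech input."""
        if not self.interface.to_rs:
            return
        command = self.interface.to_rs.popleft()
        if command == "RECOGNIZE":
            label = next(
                (name for name, stored in self.templates.items()
                 if stored == speech_input),
                None,
            )
            self.interface.to_cs.append(("RESULT", label))


class ControlSubsystem:
    """Stand-in for element (b): loads and runs an operator-selected program."""

    def __init__(self, interface, programs):
        self.interface = interface
        self.programs = programs  # program name -> list of commands to issue
        self.loaded = []

    def load(self, name):
        self.loaded = self.programs[name]

    def run(self):
        for command in self.loaded:
            self.interface.to_rs.append(command)

    def collect(self):
        results = list(self.interface.to_cs)
        self.interface.to_cs.clear()
        return results


if __name__ == "__main__":
    link = Interface()
    rs = RecognitionSubsystem(link, {"yes": (1, 2), "no": (3, 4)})
    cs = ControlSubsystem(link, {"speech_recognition": ["RECOGNIZE"]})
    cs.load("speech_recognition")
    cs.run()
    rs.step((3, 4))
    print(cs.collect())  # prints [('RESULT', 'no')]
```

Because the two sides share only the queues, the control-side program table can be changed without touching the recognition side, which is the flexibility the summary attributes to the two-subsystem arrangement.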
In the preferred embodiment of the invention, the control program module of the control subsystem includes control programs for a speaker enrollment program for enrolling speech or vocabulary samples of a
speaker into the system, a speaker verification program for verifying a speaker based upon comparisons to stored speech samples of the speaker, and a speaker monitoring program for passively monitoring the identity of a speaker. The three control programs of the system each includes use of the core speech recognition program.

BRIEF DESCRIPTION OF DRAWINGS
The above objects and further features and advantages of the invention are described in detail below in conjunction with the drawings, of which:
FIG. 1 is a schematic diagram of the external configuration of a real-time speech processing development system in accordance with the invention;
FIG. 2 is a schematic diagram of the internal configuration of a control subsystem and a recognition subsystem for the real-time speech processing development system of the invention;
FIG. 3 is a schematic diagram of the structure of the control programs for the control subsystem shown in FIG. 2;
FIG. 4 is a schematic diagram of the control structure of the recognition subsystem shown in FIG. 2;
FIG. 5 is a diagram of the communication exchange between the control and recognition subsystems for the template training process;
FIG. 6 is a diagram of the communication exchange between the control and recognition subsystems for the template enrollment process; and
FIG. 7 is a diagram of the communication exchange between the control and recognition subsystems for the speech recognition process.

DETAILED DESCRIPTION OF INVENTION
The present invention encompasses speech processing systems for a wide range of speech processing functions and modes of operation, including speaker-dependent or independent speech recognition, text- or application-dependent or independent speech recognition, speaker verification, speaker recognition, speaker monitoring, continuous speech recognition, wordspotting, and isolated word recognition, as well as for different environments of use and field applications. The preferred embodiment of the invention described herein is directed to a system for speaker verification using a speaker-dependent, template-matching recognition algorithm. However, it is to be understood that the principles of the invention are equally applicable to other types of systems and are not limited to the described system.
For a wider, more detailed explanation of speech recognition systems, the following are incorporated herein by reference: "A Comparison of Four Techniques for Automatic Speaker Recognition", by R. E. Wohlford, E. H. Wrench, and B. P. Landell, 1980 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 3, pp. 908-911; "A Realtime Implementation of a Text Independent Speaker Recognition System", by E. H. Wrench, 1981 ICASSP, vol. 1, pp. 193-196; "Keyword Recognition Using Template Concatenation", by A. L. Higgins and R. E. Wohlford, 1985 ICASSP; "Speaker Recognition by Template Matching", by A. L. Higgins, Proceedings of Speech Technology 1986, New York, N.Y.; "Improved Speech Recognition in Noise", by B. P. Landell, R. E. Wohlford, and L. G. Bahler, 1986 ICASSP, vol. 1, no. 1; U.S. Pat. No. 4,720,863 issued Jan. 19, 1988, to K. P. Li and E. H. Wrench; and copending U.S. patent application No. 346,054, filed on 5/2/89, by B. P. Landell et al., entitled "Automatic Speech Recognition System Using Seed Templates", now U.S. Pat. No. 4,994,983.
Referring to FIG. 1, a real-time speech processing development system in accordance with the invention has a control subsystem CS and a recognition subsystem RS interconnected by a CS/RS interface. The control subsystem CS includes a general purpose control processor, an operator interface to an operator's terminal, a user interface to a user's terminal, a database storage, and input/output interfaces to system peripherals such as a printer, modem, datalink, etc. The recognition system RS includes a signal processing and recognition module which receives analog speech signal input from an input device (microphone set) for the user.
As shown in FIG. 2, the internal structure of the control subsystem CS includes the general purpose control processor which is interconnected for external communication with the operator through the operator interface and with the user through the user interface, the CS/RS interface to the common bus for internal communication with the recognition subsystem RS, the database storage, and the digital data input/output interface. The control processor is connected to a control program module, to be described further below.
The control processor is configured to receive operator control input through the operator interface and to access the control program module to load one of a plurality of control programs selected by the operator. The control program is executed by the control processor to control communications with the operator and/or user, and to generate internal subsystem operation commands which are provided through the CS/RS interface to the master processor of the recognition subsystem RS via a common bus. The database storage is used to store the system data files, the user data files, and the template files.
The internal structure of the recognition subsystem RS includes a master processor, a plurality of template matching processors in parallel, and a speech signal processor which is connected to the speech signal input device. The master processor, template processors, and speech signal processor are all interconnected with the common bus. The master processor is configured to receive the internal subsystem operation commands for performing speech recognition sent from the control subsystem CS via the common bus, and to thereupon generate subsystem execution commands. The RS execution commands activate RS program modules (to be described hereinafter) for real-time operation of the speech signal processor to process input speech signals into corresponding digital representations, and of the template matching processors to compare the digital representations provided by the speech signal processor with templates of word or sound units and to produce a real-time speech recognition output based thereon.
The common bus connects the control subsystem and the recognition system through the mediation of the CS/RS interface to transmit the internal subsystem operation commands from the control subsystem CS to the recognition subsystem RS and the real-time outputs and responses from the recognition subsystem to the control subsystem. The two-subsystem configuration allows the control subsystem to be accessed by the operator for non-real-time system functions, such as handling the user interface, program decision-making,
program editing, database management, and system initialization, testing and diagnostics. This frees the recognition subsystem to be accessed by the user for real-time speech functions, such as speech signal digitization, spectral frame analysis, and template matching.
In a practical implementation, the control subsystem hardware is a Sun 3/160 Unix workstation which includes a 16 MHz Motorola 68020 CPU with 8 megabytes of CPU memory and 160 megabytes of disk storage. A System Administrator CRT console is used as the operator's console, and a VT-100 type terminal as the user's terminal. The recognition subsystem is composed of an ICS-100 4-channel A/D and D/A converter, a 400 kHz maximum sampling rate, and 4 255-tap programmable digital FIR filters for signal acquisition and conditioning; a Sky Challenger board containing two TMS-32020s at 20 MHz clock rate to perform front end signal processing of real-time parameter values of the input speech signals into frames of speech data; and seven Motorola 68020-based single-board processors, one of which is used to serve as the master processor, and the other six to perform template matching. Each 68020 board contains 1 megabyte of random access memory and operates at a 20 MHz clock rate. All of the above devices are commercially available VME-compatible boards that plug directly into the Sun 3/160 VME backplane. The VME bus serves as the medium of communication between the two subsystems. The throughput of the system is approximately 3000 template frames per input frame. A microphone preamplifier and a digital sampling clock are implemented on a custom wire wrap board mounted with the ICS-100 A/D converter. The entire system is contained within the chassis of the Sun workstation. A telephone style handset is plugged into a connector located on the rear of the unit.
A database management system (DBMS) is provided with the control subsystem to manage the data storage and retrieval operations. In the speaker verification operational mode of the system, for example, three databases are maintained: a user database, an access database, and a score database. The user database contains personal information about each user of the system such as name, address, etc. The access database contains information about each attempted access such as date, time, identity code, and verification decision. The operator may query the user and the access databases to audit the system usage. The score database is used by the verification control program to set thresholds for acceptance or rejection.

PROGRAM STRUCTURE OF CONTROL SUBSYSTEM
Referring to FIG. 3, the program structure for the control subsystem CS will now be described. The implemented CS embodiment employs the Sun Unix operating system and a set of control programs written in the language C. A separate control program exists for each of the distinct tasks performed by the CS, including speaker enrollment, speaker verification, and speaker monitoring. The function of each control program is to perform the particular task by issuing appropriate internal subsystem operation commands to the RS and to receive and process the results. The control programs also implement the user interface, display instructions, prompt the operator or user, interpret the operator's or user's terminal keyboard input, and store the program's data files (DBMS).
The overall operation of the system is under the control of an operator interface program which allows the operator complete control over the operation of the system. This program is used to select the mode of operation, the input source, system parameters, etc. It creates a "setup" file containing the system parameters and invokes the appropriate control program by opening a new Unix process. Upon startup, the control program reads the setup file. Each control program can also be run using shell scripts without the operator interface program, which facilitates off-line processing of pre-recorded speaker databases. The interface is preferably menu driven to minimize the need for typed input.
A wide range of different operational modes can be set up, tested, and simulated on the described system by selectively loading the appropriate control program or programs from the control program module. The specific function and design of each control program is a matter of choice and need not be described in detail herein. Examples of the operation of speaker enrollment, speaker verification, and speaker monitoring programs are given further below for purposes of illustration. The essential concept in the present invention is the dual configuration of the control and recognition subsystems connected by a common interface wherein a particular control program selected by an operator is loaded by the control subsystem and the real-time speech processing functions of the selected program are executed by the recognition subsystem in response to internal subsystem operation commands generated by the control program. This dual configuration keeps the recognition subsystem accessible to the speaker/user for real-time simulation, while allowing the control subsystem to pursue non-time-critical tasks and maintain flexibility in program range and development.

PROGRAM STRUCTURE OF RECOGNITION SUBSYSTEM
The master processor of the recognition subsystem RS acts as the subsystem executive to the six template processors and the speech signal processor. As shown in FIG. 4, the master processor software consists of a system executive program and six program modules of different priority levels, labelled 0-5. In order of decreasing priority they are: Initialization; Accept Frame Data; Receive Message; Transmit Message; Compute Template Values; and Background. The Initialization module performs the various tasks for initializing the RS parameters. The Accept Frame Data module coordinates signal acquisition, conditioning, and processing by the speech signal processor into frames of speech data. The Transmit and Receive Message modules coordinate sending response outputs to and receiving command inputs from the CS. The Compute Template Values module performs various speech recognition functions, such as template matching, template training (for enrolling a speaker vocabulary), filler template generation (for silence and other filler templates), and monitoring (for speaker monitoring and verification operational modes). The Background module performs system diagnostics, and is always scheduled.
Template matching is performed by the master processor and the six 68020 template processors. The software for these processors is designed to implement template matching by dynamic programming (DP). The master processor receives a frame of speech data input from the frontend speech signal processor, receives command messages from the CS, and performs the DP algorithm of searching for the closest match among templates downloaded from the CS. The template processors are called once per frame to perform the template difference (distance) computations in parallel, each one processing a preassigned group of templates evenly divided among the template processors.
Tasks to be performed by the RS are requested by sending "inter-module packets" which are placed on a scheduling queue. When a module completes a task and gives up control, the system executive schedules the next priority task by scanning the queue from highest priority to lowest priority. The executive has entry points on the queue to schedule modules both from an interrupt code and from a noninterrupt code. If the priority of a requested task is higher than that of the currently-running task, the executive performs a context switch, giving control to the requested task. The interrupted task is put on the scheduling queue, and the program counter and all registers are preserved so that it can take up where it was interrupted upon regaining control.
Within the RS, each processor may access the memory of all the processors on the common bus. Communication between the master processor and the template processors is performed by means of setting flags in the shared memory. The frontend speech signal processor passes frames of input speech data to the master processor by sending it an interrupt once per frame. The A/D converter passes PCM (pulse code modulated) data to the frontend through a parallel port and generates an interrupt once per sample.

CS/RS INTERFACE
Other than the analog input from the microphone, the only input/output to/from the recognition subsystem RS are the internal subsystem operation commands issued by the CS under a selected control program and the recognition responses provided by the RS through the CS/RS interface. Thus, the command structure to the RS is entirely transparent to the user, and the user can interact with the RS in real-time simulation through the user interface of the CS. The CS/RS interface includes a pair of message buffers, i.e. a transmit buffer for messages from the CS to the RS, and a receive buffer for messages from the RS to the CS. The buffers can be implemented on either side of the VME bus. Sending and receiving messages on the Sun workstation is done by Unix device drivers. The frequency of messages is low enough that there is no difficulty in servicing them. Messages are exchanged synchronously. The CS/RS interface facilitates asynchronous input and output from and to the control programs running on the CS. The interface maintains the input and output queues in the buffers, and invokes the appropriate message handling routine when a message arrives.
The CS writes directly into the transmit buffer and reads directly from the receive buffer. The CS begins a transmission to the RS by writing a message header and any appropriate data words into the transmit buffer. It [...]ate command modules and logic in the RS to carry out the CS command message.
The RS sends a message to the CS by writing the data words (if any) into the receive buffer along with a message header. The RS then schedules the low-level communications (RS transmit) module to place the message in a location that the CS will access and sends an interrupt request to notify the CS that a new message is available. The CS then sends an interrupt acknowledgement. The RS responds with a status/id, clears the interrupt request, and clears the header and data words from the receive buffer. If appropriate, the CS sends an acknowledgement to the RS upon receipt of the message. Upon receipt of the acknowledgement, the RS terminates the transmission sequence.
In the preferred embodiment of the system having speaker enrollment, verification, and monitoring modes of operation, examples of CS-to-RS messages include: Calibrate Noise; Create Silence Template; Enroll New Template; Train Template (by averaging a repeated utterance with an enrolled template); Recognize Speech; Upload and Download Templates; Download Syntax; Download System Parameters; Filler Generation; and Monitoring Mode. Examples of RS-to-CS messages include: Acknowledgement; Recognition Results; Prompt Speaker; Action Completed; Upload Templates; Upload Noise Calibration; Error messages; and Speaker Verifier Results. The important system functions and internal command sequences of a preferred speech processing development system are described in more detail below.

SYSTEM FUNCTIONS OF SPEAKER VERIFICATION SYSTEM
In one preferred embodiment, a speech processing development system is designed for speaker verification, i.e. for accepting or rejecting the claimed identity of a speaker seeking to enter a controlled area based upon measurements of input speech samples. The verification system includes control programs for enrollment of the speaker's vocabulary (speech samples) in the system, for verification of a speaker's input speech as compared to templates of a previously enrolled vocabulary, and for monitoring a speaker's input without constraint from the system except to authenticate the speaker's identity claim periodically.
The verification system uses a speech recognition algorithm based upon template matching for both text-dependent and text-independent applications. The recognition algorithm is the core of the recognition subsystem. In general terms, the algorithm is of the type which concatenates a sequence of templates and adjusts their time alignment to minimize a measure of dissimilarity between the concatenated templates and the incoming speech. A finite-state syntax is specified to limit the template sequences that the algorithm matches to the incoming speech. Special nodes are declared in the syntax to allow templates corresponding to silence (background noise) or non-speech sounds to be matched at any point in the speech.
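
The idea of concatenating templates and adjusting their time alignment can be illustrated with a small dynamic-programming example. This sketch is a simplification under stated assumptions and is not the patented algorithm. It enumerates each allowed template sequence explicitly instead of searching a finite-state syntax in a single pass, it uses squared Euclidean distance as the dissimilarity measure, and all names (`frame_distance`, `dtw_distance`, `recognize`, the `"sil"` template) are hypothetical.

```python
def frame_distance(a, b):
    """Squared Euclidean distance between two feature frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dtw_distance(reference, speech):
    """Dynamic-programming time alignment cost between two frame sequences."""
    inf = float("inf")
    rows, cols = len(reference), len(speech)
    # cost[i][j] = best cost aligning the first i reference frames
    # with the first j speech frames.
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            local = frame_distance(reference[i - 1], speech[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # reference frame advances alone
                cost[i][j - 1],      # speech frame advances alone
                cost[i - 1][j - 1],  # both advance
            )
    return cost[rows][cols]


def recognize(speech, templates, allowed_sequences):
    """Return (sequence, score) for the best-matching allowed sequence."""
    best = None
    for sequence in allowed_sequences:
        # Concatenate the frames of every template named in the sequence.
        reference = [frame for name in sequence for frame in templates[name]]
        score = dtw_distance(reference, speech)
        if best is None or score < best[1]:
            best = (sequence, score)
    return best


if __name__ == "__main__":
    templates = {
        "sil": [(0.0,)],
        "one": [(1.0,), (2.0,)],
        "two": [(5.0,), (6.0,)],
    }
    allowed = [("sil", "one", "sil"), ("sil", "two", "sil")]
    speech = [(0.0,), (1.0,), (1.0,), (2.0,), (0.0,)]
    print(recognize(speech, templates, allowed))
```

In the example the utterance holds one frame longer than the "one" template, and the alignment absorbs the extra frame at no cost, so the silence-bracketed "one" sequence wins over "two". The silence template plays the role the text assigns to the special syntax nodes, here only at the start and end of each sequence.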
`The heavy computational load