(19) United States
(12) Patent Application Publication
(10) Pub. No.: US 2007/0043563 A1
(43) Pub. Date: Feb. 22, 2007
Comerford et al.

(54) METHODS AND APPARATUS FOR BUFFERING DATA FOR USE IN ACCORDANCE WITH A SPEECH RECOGNITION SYSTEM

(75) Inventors: Liam D. Comerford, Carmel, NY (US); David Carl Frank, Ossining, NY (US); Burn L. Lewis, Ossining, NY (US); Leonid Rachevksy, Ossining, NY (US); Mahesh Viswanathan, Yorktown Heights, NY (US)

Correspondence Address:
Ryan, Mason & Lewis, LLP
90 Forest Avenue
Locust Valley, NY 11560 (US)

(73) Assignee: International Business Machines Corporation, Armonk, NY

(21) Appl. No.: 11/209,004

(22) Filed: Aug. 22, 2005

Publication Classification

(51) Int. Cl.
G10L 15/06 (2006.01)
(52) U.S. Cl. 704/243

(57) ABSTRACT

Techniques are disclosed for overcoming errors in speech recognition systems. For example, a technique for processing acoustic data in accordance with a speech recognition system comprises the following steps/operations. Acoustic data is obtained in association with the speech recognition system. The acoustic data is recorded using a combination of a first buffer area and a second buffer area, such that the recording of the acoustic data using the combination of the two buffer areas at least substantially minimizes one or more truncation errors associated with operation of the speech recognition system.

`Exhibit 1014
`Page 01 of 12

Sheet 1 of 4: FIG. 1 (diagram of circular buffer 110 and linear buffer 120; drawing not legible in this copy)

Sheet 2 of 4: FIG. 2 (flow diagram)
201 INITIALIZE BUFFERS
202 BEGIN CIRCULAR BUFFER RECORDING
203, 204 MICON RECEIVED? (NO: remain at 204; YES: continue to 205)
205 SAVE CIRCULAR BUFFER WRITE POINTER ADDRESS
206 SET WRITE POINTER TO START OF LINEAR BUFFER
207 PREPEND LINEAR COPY OF CIRCULAR BUFFER TO LINEAR BUFFER
208 FIND SILENCE IN COMPOSITE AUDIO BUFFER
209 PASS BUFFER START ADDRESS TO ASR AND START DECODING
210 WAIT FOR ASR SILENCE DETECTION
211 HALT RECORDING
212 OVERWRITE CIRCULAR BUFFER
213 RESTART CIRCULAR BUFFER RECORDING
214 ALLOCATE NEW OR CLEAR LINEAR BUFFER

Sheet 3 of 4: FIG. 3
SMALL LINEAR BUFFER (300): START (302), CURRENT WRITE POSITION (304), FINISH (306)
LARGE CIRCULAR BUFFER (308): BUFFER START AND END (310) (FILLING BEGINS HERE ON FIRST USE); current write position (312); FILLING PROCEEDS "CLOCKWISE"
LOW AMPLITUDE SOUNDS (314); HIGH AMPLITUDE SOUNDS (316)
SPEECH TO SILENCE TRANSITION DETECTED (318); SILENCE TO SPEECH TRANSITION DETECTED (320)

Sheet 4 of 4: FIG. 4, BUFFER MANAGEMENT PROCESS (400)
402 INITIALIZE ASR, CIRCULAR BUFFER AND OTHER SOFTWARE COMPONENTS
406 START AUDIO RECORDING PROCESS AND SPEECH TRANSITION DETECTION
Other boxes (flow layout and most reference numerals not legible in this copy):
APPEND LINEAR BUFFER CONTENTS TO CIRCULAR BUFFER AT CURRENT WRITE POSITION
STORE TRANSITION AND CURRENT CIRCULAR BUFFER WRITE POSITION
EXAMINE SPEECH TRANSITION RECORD TO DETERMINE WHETHER USER IS SPEAKING
IN SILENT BUFFER INTERVAL?
CONSTRUCT DATA BUFFER FROM CIRCULAR BUFFER DATA FOR ASR
CONSTRUCT DATA BUFFER FROM LEADING SILENCE AND INCOMING AUDIO FOR ASR
START ASR RECOGNITION PROCESS
DETECTS SILENCE

FIG. 5 (500): PROCESSOR, MEMORY, USER INTERFACE (reference numeral 540 also appears)

METHODS AND APPARATUS FOR BUFFERING DATA FOR USE IN ACCORDANCE WITH A SPEECH RECOGNITION SYSTEM

FIELD OF INVENTION

[0001] The present invention relates generally to speech processing systems and, more particularly, to speech recognition systems.

BACKGROUND OF THE INVENTION

[0002] Speech recognition systems may be described in terms of several properties. These include whether they use discrete word vocabularies (typical of large vocabulary recognizers) or grammar-based vocabularies (typical of small vocabulary recognizers). They also include whether they continuously process an uninterrupted stream of input audio or commence processing on command (typically a “microphone on” or “MICON” event). Recognizers that use control events may terminate recognition on an external event (typically a “microphone off” or “MICOFF” event), on completion of processing of an audio buffer, or on detection of silence in the buffered audio data. In any case, the processed audio stream may be “live” or streamed from a buffer.

[0003] A common problem of continuously operated recognition systems is that they generate large numbers of recognition errors and spurious recognition output at times when the recognition system is not being addressed. For example, in a vehicle-based speech recognition system, this problem may occur due to audio from the radio, person-to-person conversation, and/or noise. This fact makes the use of a microphone button or other dialog pacing mechanism almost universal in automotive (telematic) speech recognition applications.

[0004] A common problem of microphone-button paced speech applications is that the application user fails to operate the button correctly. The two typical errors are the following:
- completing the press of the microphone-on button only after speech has already begun;
- releasing the microphone-on button (or pushing the microphone-off button) before speech has ended.

In either case, these operating errors by the user cut off some of the speech intended to be recognized.

[0005] Accordingly, techniques for overcoming errors in speech recognition systems are needed.

SUMMARY OF THE INVENTION

[0006] Principles of the present invention provide techniques for overcoming errors in speech recognition systems.

[0007] It is to be understood that any errors occurring in a speech recognition system that could cause loss of speech intended to be recognized will be generally referred to herein as “truncation errors.” Two examples of user operation failure in microphone-button paced speech applications are given above. It is to be understood, however, that the phrase “truncation error” is not intended to be limited thereto.

[0008] In one aspect of the invention, a technique for processing acoustic data in accordance with a speech recognition system comprises the following steps/operations. Acoustic data is obtained in association with the speech recognition system. The acoustic data is recorded using a combination of a first buffer area and a second buffer area, such that the recording of the acoustic data using the combination of the two buffer areas at least substantially minimizes one or more truncation errors associated with operation of the speech recognition system.

[0009] It is to be appreciated that the data structure types chosen for each of the buffer areas reflect the functions of those areas. By way of example, the first buffer type may be chosen to allow continuous recording, so that the most recent few seconds (depending on the buffer size) of acoustic data are available for processing. Data structures such as circular or ring buffers and FIFO (First In First Out) queues are suitable examples. The second buffer type may be chosen to ensure that the system will not run out of buffer space when recording long utterances. Linked lists or appropriately large memory blocks are exemplary data structures for implementing such buffers.

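A minimal sketch of the second buffer type follows. It assumes that samples arrive as Python integers and that the buffer grows in fixed-size blocks. A Python list of blocks stands in for the linked list mentioned above. The class name `ChunkedLinearBuffer` and the `block_size` parameter are hypothetical.

```python
class ChunkedLinearBuffer:
    """Growable "linear" buffer built from fixed-size blocks (hypothetical helper).

    A new block is linked on whenever the last one is full, so a long utterance
    never runs out of space. Reads still see one contiguous, oldest-first sequence.
    """

    def __init__(self, block_size: int = 1024) -> None:
        if block_size <= 0:
            raise ValueError("block_size must be positive")
        self.block_size = block_size
        # A list of blocks stands in for a linked list of memory blocks.
        self.blocks: list[list[int]] = [[]]

    def append(self, samples) -> None:
        for s in samples:
            if len(self.blocks[-1]) == self.block_size:
                self.blocks.append([])  # current block full: allocate and link a new one
            self.blocks[-1].append(s)

    def __len__(self) -> int:
        return sum(len(b) for b in self.blocks)

    def contents(self) -> list[int]:
        # Concatenate the blocks in allocation order.
        return [s for block in self.blocks for s in block]
```

For example, appending ten samples with `block_size=4` produces three blocks (4 + 4 + 2 samples). `contents()` returns the ten samples in their original order.
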
[0010] It is to be further appreciated that the acoustic data referenced here may be, by way of example, digital representations of speech and other audio signals present at a system input microphone. In any case, it is data comprising acoustic features, or data in a format suitable for extracting acoustic features, as is well known in the art. Such features may be used to determine whether or not speech was taking place at the time of the recording. Further, such data may be decoded into text representing the words that the user uttered, which form part of the acoustic data.

[0011] In one embodiment, the recording step/operation may further comprise:
- recording acoustic data obtained by the speech recognition system in the first buffer area;
- when an indication that the speech recognition system is being addressed is detected, stopping recording of acoustic data in the first buffer area and starting recording of acoustic data obtained by the speech recognition system at the start of the second buffer area; and
- prepending the acoustic data in the first buffer area to the beginning of the acoustic data stored in the second buffer area.

This data may be arranged so that the oldest acoustic data in the first buffer area is located at the start of the segment prepended to the second buffer area. The acoustic data recorded immediately before the indication that the system is being addressed then ends the prepended segment. It is also contiguous in memory with the acoustic data that immediately followed the “being addressed” event and is stored in the second buffer area.

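The recording sequence of [0011] can be sketched at frame granularity. The sketch makes these assumptions:
- a `collections.deque` with `maxlen` serves as the circular first buffer area;
- a Python list serves as the linear second area;
- the class, its method names, and the string frames in the usage are illustrative only and are not part of the application.

```python
from collections import deque


class AntiTruncationRecorder:
    """Sketch of the two-buffer recording scheme (hypothetical class and method names).

    Frames go to a fixed-size circular buffer until a MICON-style event, then to a
    linear buffer. The composite view prepends the circular contents, oldest first,
    so the last pre-MICON frame is immediately followed by the first post-MICON frame.
    """

    def __init__(self, circular_frames: int) -> None:
        self.circular = deque(maxlen=circular_frames)  # oldest frames fall off when full
        self.linear: list = []
        self.addressed = False  # True once the "being addressed" indication arrives

    def on_frame(self, frame) -> None:
        if self.addressed:
            self.linear.append(frame)
        else:
            self.circular.append(frame)

    def on_micon(self) -> None:
        # Stop recording into the circular buffer; later frames go to the linear buffer.
        self.addressed = True

    def composite(self):
        """Return (composite frames, index at which post-MICON audio begins)."""
        prepended = list(self.circular)  # a deque iterates from oldest to newest
        return prepended + self.linear, len(prepended)
```

Consider a three-frame circular area that records frames `a` to `e`, receives MICON, then records `x` and `y`. The composite is `c d e x y`, with the junction at index 3. The oldest surviving pre-MICON frame leads, and `e` sits directly before `x`.
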
[0012] The technique may further comprise processing the acoustic data in the composite buffer area (prepended segment and second buffer area) to detect features indicating silence. The location of the silence closest to the end of the prepended segment may then be used as the location in the composite buffer at which speech intended for the system to process begins. This silence will be in the prepended segment if the indication of speech was given after speech started. It will follow the end of the prepended segment if speech began after the indication event.

[0013] The technique may further comprise decoding acoustic data in the composite buffer area from the acoustic data format into text. The decoding of acoustic data in the composite buffer area may begin when the starting silence location has been established from the acoustic data.

[0014] The recording of acoustic data in the second buffer area may continue until two conditions are met: an indication that the speech recognition system is no longer being addressed is detected, and a silence indication is detected in the acoustic data recorded in the second buffer area. When both are detected, recording of acoustic data in the second buffer area may stop and recording of acoustic data in the first buffer area may restart.

[0015] The indication that the speech recognition system is being addressed may comprise a microphone on event, and the indication that the speech recognition system is no longer being addressed may comprise a microphone off event.

[0016] The first buffer area may comprise a circular buffer, and the second buffer area may comprise a linear buffer. Further, the first buffer area and the second buffer area may be at least part of a single storage data structure, or may be at least part of separate storage data structures. These buffers may be in addition to any buffer resources maintained by the speech recognizer.

[0017] In another embodiment, the recording step/operation may further comprise:
- recording acoustic data obtained by the speech recognition system in the first buffer area;
- appending acoustic data recorded in the first buffer area to the second buffer area when the first buffer area is full;
- identifying the existence of a speech region and a silence region in the acoustic data appended to the second buffer area;
- detecting when an indication occurs that the speech recognition system is being addressed; and
- filling a recognition buffer area, once that indication is detected:
  - when a speech region is identified, at least with the acoustic data appended to the second buffer area;
  - when a silence region is identified, at least with incoming acoustic data for the speech recognition system.

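A minimal sketch of the final decision in [0017] follows. It rests on several assumptions:
- frames are plain Python values;
- the region labels come from an external speech/silence detector;
- a fixed number of stored frames (`lead_frames`, hypothetical) is kept as leading silence. FIG. 4's "leading silence" box suggests this detail, but the text does not specify it.

The function name and parameters are illustrative only.

```python
def fill_recognition_buffer(stored, last_region, speech_start, incoming, lead_frames=2):
    """Choose what goes into the recognition buffer when the user addresses the system.

    stored: frames already appended to the second (large) buffer, oldest first
    last_region: "speech" or "silence", the region identified most recently
    speech_start: index in `stored` where the current speech region began
    incoming: frames arriving after the "being addressed" indication
    lead_frames: hypothetical count of stored frames kept as leading silence
    """
    if last_region == "speech":
        # The user started talking before the indication: begin at the speech onset.
        return list(stored[speech_start:]) + list(incoming)
    if last_region == "silence":
        # The user was silent: a little leading silence plus the incoming audio.
        # (Guard needed because stored[-0:] would return the whole list.)
        lead = list(stored[-lead_frames:]) if lead_frames > 0 else []
        return lead + list(incoming)
    raise ValueError(f"unknown region label: {last_region!r}")
```

Here is how the two branches compare. With `stored = [0, 1, 2, 3]` and speech having started at index 2, the speech branch yields `[2, 3]` plus the incoming frames. The silence branch with `lead_frames=1` keeps only frame `3` ahead of the incoming audio.
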
[0018] These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0019] FIG. 1 is a diagram illustrating an anti-truncation buffering methodology, according to an embodiment of the invention;

[0020] FIG. 2 is a flow diagram illustrating more details of an anti-truncation buffering methodology, according to an embodiment of the invention;

[0021] FIG. 3 is a diagram illustrating an anti-truncation buffering methodology, according to another embodiment of the invention;

[0022] FIG. 4 is a flow diagram illustrating more details of an anti-truncation buffering methodology, according to another embodiment of the invention; and

[0023] FIG. 5 is a diagram illustrating a computing system for use in implementing an anti-truncation buffering methodology, according to an embodiment of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0024] The present invention may be described below in the context of a particular computing system environment and an illustrative speech recognition application, but it is to be understood that the invention is not so limited. Rather, the invention is more generally applicable to any computing system environment and any speech recognition application in which it would be desirable to overcome truncation errors.

[0025] As used herein, the phrase “acoustic data” generally refers to any acoustic input picked up and transduced by a microphone of the Automatic Speech Recognizer (ASR). This includes, but is not limited to, acoustic input representative of speech and acoustic input representative of silence. Depending on the requirements or features of a particular implementation, acoustic data may refer to any of the following:
- compressed audio;
- audio that has undergone feature extraction and is represented, for example, as cepstra;
- any other digital representation of audio that is suitable for silence detection and decoding into text.

[0026] As will be explained below, illustrative embodiments of the invention address the truncation error problem by means of a software apparatus employing the following:
- the “silence detection” capability of an automatic speech recognizer;
- an audio buffer arrangement;
- a buffer management algorithm;
- the events generated by microphone mechanisms that signal the user's intention to address the system, such as “Push-to-Talk” or “Push-for-Attention” buttons.

[0027] When a “Push-to-Talk” button is employed in an apparatus containing an ASR, button depression typically causes a “MICON” event, and button release typically causes a “MICOFF” event. When a “Push-for-Attention” button is employed in an apparatus containing an ASR, button depression typically causes a “MICON” event, and button release typically does not produce any event. Other mechanisms, such as video speech recognition, may imitate either of these patterns.

[0028] Conventional ASRs provide “alignment data” that indicate which parts of an audio stream or buffer have been recognized or decoded into which particular words or silence.

[0029] In a conventional ASR, audio to be recognized begins to be stored in buffer memory upon receipt of the MICON message and ceases to be stored upon receipt of the MICOFF message. In accordance with illustrative principles of the invention, the acoustic data, which may include speech by the user, is continuously recorded in a circular buffer. That is, the microphone of the ASR continuously picks up acoustic data, which is continuously stored in the buffer arrangement of the invention, as will be explained in detail below. This storing occurs regardless of the receipt of a MICON event or a MICOFF event. As will be seen, such events determine which buffer of the buffer arrangement does the storing.

[0030] As is known, the terms “circular buffer” or “ring buffer” refer to a commonly used programming method in which a software module manages a region of memory. Once the region has been filled with incoming data, new data is written beginning again at the start of the memory region. The management software retains the address of the current “write” location and the locations of the beginning and end of the memory region. Any portion of the memory region may be read. The management software can therefore read the data from the memory region as a continuous stream, in the correct order, even when the end of the data appears at a lower memory address than the start of the data (viewing the region as a linear segment of memory). The managing software thus makes the effective topology of the memory region into a ring. In contrast, a “linear buffer” typically does not have such a wrapping feature. When the end of a linear buffer is reached, the processor must therefore allocate additional buffer space.

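The wrap-around behaviour described in [0030] can be shown with a short sketch that keeps an explicit write pointer into a fixed-size list. Reading from any physical index with modular arithmetic returns the data in recording order, even when the newest samples sit at lower indices than the oldest ones. `RingBuffer` and its methods are hypothetical names, not an API from the application.

```python
class RingBuffer:
    """Fixed-size circular buffer with an explicit write pointer (illustrative sketch)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.data = [0] * capacity
        self.write_ptr = 0  # physical index of the next slot to be written

    def write(self, samples) -> None:
        for s in samples:
            self.data[self.write_ptr] = s
            # On reaching the end of the region, wrap back to its start.
            self.write_ptr = (self.write_ptr + 1) % len(self.data)

    def read(self, start: int, n: int) -> list:
        """Read n samples starting at physical index `start`, wrapping past the end."""
        if n > len(self.data):
            raise ValueError("cannot read more samples than the buffer holds")
        return [self.data[(start + i) % len(self.data)] for i in range(n)]
```

After six writes into four slots, the raw storage is `[5, 6, 3, 4]`, with the end of the data at a lower index than its start. Even so, `read(write_ptr, 4)` returns `[3, 4, 5, 6]`, the samples in recording order.
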
[0031] Accordingly, in accordance with illustrative principles of the invention using a circular buffer and a linear buffer, upon receipt of the MICON message, the ASR and the software system of this invention proceed to:

[0032] 1. mark the point in the circular buffer corresponding to the time at which the MICON was received by storing the address of that memory location, and halt recording in that buffer;

[0033] 2. buffer all further speech in a separate linear buffer;

[0034] 3. prepend a linear copy of the circular buffer to the linear buffer; and

[0035] 4. process the resulting composite buffer until the silence closest to the MICON marker is found.

[0036] The silence found in step 4 is taken to be the start of the utterance.

[0037] Depending on the particular ASR configuration, decoding into text may now proceed or be postponed until the MICOFF message is received or silence is detected in the most recently buffered audio. In either case, the audio buffer of step 2 continues to store additional acoustic data until both a silence is detected and a MICOFF message is received.

[0038] When both the terminal silence and the MICOFF event have been detected, recording recommences in the circular buffer.

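The termination rule of [0037] and [0038] can be sketched as a small state holder. Under this rule, linear recording stops only when both terminal silence and MICOFF have been seen, in either order. The class and method names are hypothetical. Resetting the silence flag when speech resumes is an added assumption. Without it, a mid-utterance pause followed by MICOFF would end recording while the user is still talking.

```python
class UtteranceEndTracker:
    """Tracks the two conditions that end linear-buffer recording (hypothetical helper).

    Recording continues until BOTH a MICOFF event has been received AND the silence
    detector has reported terminal silence, in either order.
    """

    def __init__(self) -> None:
        self.micoff_seen = False
        self.silence_seen = False

    def on_micoff(self) -> bool:
        self.micoff_seen = True
        return self.done()

    def on_silence(self) -> bool:
        self.silence_seen = True
        return self.done()

    def on_speech(self) -> None:
        # Assumption: speech resumed after a pause, so the earlier silence was not terminal.
        self.silence_seen = False

    def done(self) -> bool:
        return self.micoff_seen and self.silence_seen
```

Consider a pause, then more speech, then MICOFF. That sequence does not end the utterance. The next reported silence does.
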
[0039] By these means, the MICON and MICOFF messages are reinterpreted as indications of the approximate time segment of speech to be decoded. The exact boundaries of that segment are then determined using the silence detection features of the ASR. In ASRs such as the Embedded ViaVoice™ product available from IBM Corporation (Armonk, N.Y.), the silence detection feature operates in parallel with the speech-to-text decoding functions. By the time both the MICOFF and silence conditions have been met, the speech sounds have therefore also been decoded to text. The text is then available for use by application software, dialog managers, or other software components.

[0040] Referring initially to FIG. 1, a diagram illustrates an anti-truncation buffering methodology, according to an embodiment of the invention. It is to be understood that illustrative principles of the invention operate in the context of a computing system containing an acoustic signal capture and encoding capability. Software embodying the invention directs the encoded audio into either a circular buffer 110 or a linear buffer 120 by changing the value of a Write Pointer. The Write Pointer (shown as 130 for the circular buffer and 140 for the linear buffer) indicates the next available memory address within a buffer's address range.

[0041] Thus, as illustrated in FIG. 1, audio recording into the circular buffer has taken place for as long as the system has been turned on. At time 1, a MICON event is received. User speech may have begun prior to this event or may follow this event. The write pointer, originally pointing to a position (130) in circular buffer 110, is then repositioned to point to a position (140) in linear buffer 120.

[0042] A linear copy of the circular buffer is prepended to the linear buffer (170). The data in this linear copy is arranged so that the oldest acoustic data in the circular buffer is located at the start of the segment prepended to the second buffer. The acoustic data recorded immediately before the indication that the system is being addressed therefore ends the prepended segment. It is also contiguous in memory with the acoustic data that immediately followed the MICON event and is stored in the second area.

[0043] The resulting composite buffer is searched for acoustic data representing silence. The location of the silence closest to the end of the prepended segment may then be used as the location in the composite buffer at which speech intended for the system to process begins. This silence will be in the prepended segment if the indication of speech was given after speech started. It will follow the end of the prepended segment if speech began after the MICON event.

[0044] The ASR may now start decoding (into text) the content of the augmented linear buffer from the address of the first detected silence. Later, at, say, time 3, the ASR either detects silence (150) in the linear buffer data or receives a MICOFF event corresponding to time 3. As shown, at time 3 the write pointer points to a position (160) in the linear buffer, because the encoded audio continues to be written while the ASR decodes.

[0045] Referring now to FIG. 2, a more detailed flow diagram illustrates an anti-truncation buffering methodology, according to an embodiment of the invention.

[0046] In step 201, the methodology is started.

[0047] In step 202, a set of buffers (a circular buffer and a linear buffer) and their low-level support software are instantiated. The specification and programming support for such buffers and low-level support are well known to those of ordinary skill in the art. The circular buffer is of fixed size, so its oldest content is continually overwritten by the most recently captured acoustic signal. The linear buffer and its support are arranged so that additional buffer space can be concatenated to the end of the buffer whenever the buffer runs low on space for recording.

[0048] In step 203, recording into the circular buffer begins. This completes the initialization phase.

[0049] In step 204, the methodology waits for a “MICON” event or signal, or otherwise tests for the indication that the user is addressing the system, by the means provided in the specific implementation of the system. If no event or indication is present, the methodology remains at step 204. Otherwise, in the presence of an indication, event, or message, the system proceeds to step 205.

[0050] In step 205, the memory location of the Write Pointer in the circular buffer is stored for later use.

[0051] In step 206, the value of the Write Pointer is changed to the memory address of the start of a linear buffer, where audio recording continues without interruption. This is possible because of a large difference in speed. Encoding the audio for storage in digital computer memory requires computing on digital samples acquired at a rate of several tens of thousands per second. Modern computers, by contrast, are capable of several billion operations per second. There is, therefore, ample time to switch storage locations (change the Write Pointer) without interrupting the receipt, encoding, or recording of the audio signal.

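As a rough illustration of the time budget in step 206, this worked example uses two assumed figures, neither of which comes from the application:
- a sampling rate of $f_s = 3 \times 10^{4}$ samples per second, one reading of "several tens of thousands";
- a processor that executes $R = 10^{9}$ operations per second, below the "several billion" cited.

```latex
\Delta t = \frac{1}{f_s} = \frac{1}{3 \times 10^{4}\ \mathrm{s^{-1}}} \approx 33\ \mu\mathrm{s},
\qquad
N_{\mathrm{ops}} = R \, \Delta t = \frac{10^{9}\ \mathrm{s^{-1}}}{3 \times 10^{4}\ \mathrm{s^{-1}}} \approx 3.3 \times 10^{4}
```

Even under these conservative assumptions, tens of thousands of operations fit between consecutive samples. Updating a write pointer needs only a handful of operations.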
[0052] In step 207, a linear copy is made of the circular buffer segment. The copy begins at the oldest recording, found at the address of the write pointer in the circular buffer that was stored in step 205. It continues using data from the circular buffer until it reaches the address immediately preceding the write pointer address. This copy buffer is then prepended to the linear buffer. As a result, the location holding the last data written into the circular buffer is contiguous with the first location written in the linear buffer. Alternatively, the copy of the circular buffer could be made into memory space, immediately preceding the linear buffer, that had been allocated for the purpose of holding the copy buffer.

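Step 207 can be sketched with plain index arithmetic. The sketch assumes storage is a Python list and that the circular buffer has wrapped at least once, so every slot holds valid audio. The function name and parameters are hypothetical.

```python
def prepend_circular_copy(circular, saved_write_ptr, linear):
    """Step 207 sketch: unroll the circular buffer and prepend it to the linear buffer.

    circular: the circular buffer's storage, viewed as a flat list
    saved_write_ptr: write-pointer index stored in step 205 (the next slot to be
        written, which in a full buffer holds the oldest sample)
    linear: samples recorded into the linear buffer since MICON

    Returns (composite, junction), where composite[junction - 1] is the last
    pre-MICON sample and composite[junction] is the first post-MICON sample.
    """
    if not 0 <= saved_write_ptr < len(circular):
        raise IndexError("saved write pointer lies outside the circular buffer")
    # Oldest data runs from the saved pointer to the end of storage, then wraps
    # around up to the slot just before the saved pointer (the newest sample).
    copy = circular[saved_write_ptr:] + circular[:saved_write_ptr]
    return copy + list(linear), len(copy)
```

For example, `prepend_circular_copy([5, 6, 3, 4], 2, [7, 8])` returns `([3, 4, 5, 6, 7, 8], 4)`. Index 3 holds the last pre-MICON sample (6), and index 4 holds the first sample written to the linear buffer (7).
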
[0053] In step 208, the audio in the composite buffer is decoded by the silence detection mechanism of the ASR. The “alignment data” of the ASR is used to determine the beginning memory address of the silence that is in closest temporal proximity to the MICON event. The address of the silence closest to the end of the prepended segment may then be used as the address in the composite buffer at which speech intended for the system to process begins. This silence will be in the prepended segment if the indication of speech was given after speech started. It will follow the end of the prepended segment if speech began after the indication event. This address is stored for later use.

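A sketch of the search in step 208 follows. It assumes the ASR's alignment data has already been converted into `(label, start, end)` tuples. The offsets index into the composite buffer, and silence is labelled `"sil"`. This tuple format is hypothetical and does not correspond to any particular ASR API. "Closest" is measured here as the distance from the junction (the end of the prepended segment) to the nearest sample of each silence segment.

```python
def utterance_start(alignment, junction):
    """Return the start offset of the silence segment closest to `junction`, or None.

    alignment: list of (label, start, end) tuples with `end` exclusive, in
        composite-buffer offsets; "sil" marks silence (hypothetical labels)
    junction: offset of the first post-MICON sample (end of the prepended segment)
    """
    def distance(segment):
        _, start, end = segment
        if start <= junction < end:
            return 0  # the junction falls inside this silence
        return min(abs(start - junction), abs(end - 1 - junction))

    silences = [seg for seg in alignment if seg[0] == "sil"]
    if not silences:
        return None  # no silence found; the caller decides how to proceed
    return min(silences, key=distance)[1]
```

Take the alignment `sil 0-100, speech 100-400, sil 400-450, speech 450-900`, with the junction at 500 (speech began before MICON). The function returns 400, the start of the silence just before the speech onset inside the prepended segment.
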
[0054] In step 209, the start address found in step 208 is passed to the ASR with the instruction to begin decoding.

[0055] In step 210, the ASR or some other mechanism detects that the user has stopped speaking for long enough to signal that the utterance is complete.

[0056] In step 211, the linear buffer recording is halted. At some time after this, the ASR returns the results of the recognition process through a channel created in another part of the application. This return is not a focus of the invention.

[0057] In step 212, the circular buffer is overwritten with values that cannot be misrecognized as silence and, in step 213, the recording process begins again.

[0058] In step 214, either the old linear buffer is cleared or a new buffer is allocated. In either case, the buffer has a “prepend” segment long enough to hold the complete contents of the circular buffer (e.g., the segment between arrows 170 and 140 in FIG. 1). Control then returns to step 204, so that the anti-truncation buffering functionality can be used for the next utterance.

[0059] The above embodiment has been described in terms of a circular buffer of fixed size, a linear buffer that can be extended, a microphone button supplying a “MICON” signal, and an automatic speech recognizer with a silence detection feature. It should be understood that those of ordinary skill in the art may realize other configurations of buffers and other buffer segment selection mechanisms. These could be applied to implementing this invention without departing from the spirit of the invention. They include, but are not limited to, the following:
- single buffer configurations that are expanded or changed in topology when “MICON” or its equivalent is detected;
- multiple buffer configurations in which new allocation plays a reduced or nonexistent role;
- mechanisms that detect that speech is being directed to the recognition system without the use of a “microphone button”;
- configurations that use First In First Out (FIFO) queues in hardware or software in place of the circular buffer;
- utterance absence detection mechanisms other than acoustic silence detection.

[0060] An alternative embodiment of the invention is described below.

[0061] In this alternative embodiment, a large circular buffer is allocated when the program is started. This buffer is longer than the longest expected user utterance. For practical purposes, the buffer may hold approximately 100 seconds of recorded speech. For the purpose of discussion, an ASR capable of providing several services is assumed. These services are:
- the capability to convert analog audio signals into a digital format such as pulse code modulated (PCM) format;
- the ability to detect, within several hundred milliseconds, when speech sounds have begun and when they have ended;
- the ability to decode PCM audio stored in a memory buffer into a text representation of the speech audio stored in that buffer.

The Embedded ViaVoice speech recognition engine from IBM Corporation (Armonk, N.Y.) is a currently commercially available ASR with these capabilities. That is, the ASR adapted for use in the embodiments of FIGS. 1 and 2 may be adapted for use here as well.

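For a sense of scale, assume the circular buffer stores 16-bit mono PCM sampled at 16 kHz. Both values are assumptions; the application specifies only the approximate 100-second duration. The required storage is then:

```latex
B = T \cdot f_s \cdot b
  = 100\ \mathrm{s} \times 16\,000\ \mathrm{samples/s} \times 2\ \mathrm{bytes/sample}
  = 3.2 \times 10^{6}\ \mathrm{bytes}
```

A buffer of roughly 3.2 MB is therefore enough under these assumptions. Higher sampling rates or sample widths scale the figure linearly.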
[0062] Referring now to FIG. 3, a diagram illustrates the anti-truncation buffering methodology according to another embodiment of the invention. As in the embodiment above, data structures called buffers are used to capture and retain audio signals. A memory buffer with a complete recording of a user utterance can then be supplied to the ASR for decoding into text. The data structures allocated at initialization are the linear buffer 300 and the circular buffer 308. As is well understood in the art, buffers are associated with data structures that keep track of the buffer start (302, 310), finish (306, 310), and current writing location (304, 312), along with other management data.

[0063] In operation, the ASR is used to capture speech in a form suitable for later recognition. It is typical for an ASR to deal with short frames of speech audio. In the case of the IBM Embedded ViaVoice™ ASR, these frames are 100 milliseconds in length. To preserve the captured audio for later use, the ASR is provided with a linear buffer in which it stores audio until the buffer is filled. A signal, referred to as a “callback,” then triggers a software component that transfers the content of the small linear buffer to the next appropriate location in the large circular buffer.

[0064] Operation of the invention can be understood by considering several computer processes that are carried on “simultaneously” in “threads,” in the sense used by those with ordinary skill in the art of computer programming. These processes and the operation of the invention are described below in the context of FIGS. 3 and 4.

[0065] The software instantiating this invention begins at step 400.

[0066] In step 402, the software components supporting the invention (the ASR) and the software components comprising the invention are initialized. For example, in the case of the IBM Embedded ViaVoice ASR product, part of the initialization includes making Application Programmer's Interface (API) calls to the ASR to cause it to allocate and [...] the small linear buffer 300.

[0067] An API call is made in step 406 to cause the ASR to begin its audio recording function using the linear buffer 300. This buffer 300 fills until the current write location 304 corresponds to the buffer finish location 306. When this condition is detected, in step 408, a callback is generated to a function 410 that appends the content of buffer 300 to the content of buffer 308 at location 312.

[0068] Function 412 then sets the current write location 304 of the linear buffer 300 equal to the buffer start 302. This switch takes less time than the ASR needs to process the next frame of speech, so the change in write location has no effect on the continuity or integrity of the recorded data. The same function 412 advances the current write location 312 of the circular buffer to the location corresponding to the end of the newly appended frame.

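The interplay of the small linear buffer 300 and the large circular buffer 308 in [0067] and [0068] can be sketched in single-threaded form. The sketch assumes sample-by-sample recording into Python lists. It calls the "callback" directly when location 304 reaches 306, whereas a real ASR would invoke it asynchronously once per frame. `TwoLevelRecorder` and its members are hypothetical; the comments map them to the reference numerals.

```python
class TwoLevelRecorder:
    """Sketch of the FIG. 3 buffer pair (hypothetical names throughout)."""

    def __init__(self, small_size: int, circular_size: int) -> None:
        if small_size <= 0 or circular_size < small_size:
            raise ValueError("need 0 < small_size <= circular_size")
        self.small = [0] * small_size   # small linear buffer 300
        self.small_pos = 0              # current write location 304
        self.circular = [0] * circular_size  # large circular buffer 308
        self.circ_pos = 0               # current write location 312

    def record(self, sample) -> None:
        self.small[self.small_pos] = sample
        self.small_pos += 1
        if self.small_pos == len(self.small):  # 304 has reached finish 306
            self._on_small_buffer_full()       # stands in for the callback of step 408

    def _on_small_buffer_full(self) -> None:
        n = len(self.circular)
        # Function 410: append the small buffer's content at write location 312,
        # wrapping around the end of the circular buffer if necessary.
        for i, s in enumerate(self.small):
            self.circular[(self.circ_pos + i) % n] = s
        # Function 412: reset write location 304 to the buffer start 302 ...
        self.small_pos = 0
        # ... and advance 312 to the end of the newly appended frame.
        self.circ_pos = (self.circ_pos + len(self.small)) % n
```

Take a 2-sample small buffer and a 5-sample circular buffer, and record samples 1 to 7. The circular buffer ends up as `[6, 2, 3, 4, 5]`, with location 312 at index 1, because the third flush wrapped around. Sample 7 is still waiting in the small buffer.
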
[0069] In step 414, the ASR is queried to determine whether a transition from speech sounds to silence 318 or from silence to speech sounds 320 occurred, as detected in the last few frames.
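Paragraph [0069] is cut off in this copy, so the use made of the query result is not available here. FIG. 4's box labels ("STORE TRANSITION AND CURRENT CIRCULAR BUFFER WRITE POSITION" and "EXAMINE SPEECH TRANSITION RECORD TO DETERMINE WHETHER USER IS SPEAKING") suggest a record like the one sketched below. The following are all assumptions:
- the data layout;
- the names;
- the rule that the most recent transition decides whether the user is speaking.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    kind: str            # "silence_to_speech" (320) or "speech_to_silence" (318)
    write_position: int  # circular buffer write location 312 when the transition was seen


class TransitionRecord:
    """Hypothetical store of detected transitions and their buffer positions."""

    def __init__(self) -> None:
        self.transitions: list[Transition] = []

    def store(self, kind: str, write_position: int) -> None:
        if kind not in ("silence_to_speech", "speech_to_silence"):
            raise ValueError(f"unknown transition kind: {kind}")
        self.transitions.append(Transition(kind, write_position))

    def user_is_speaking(self) -> bool:
        # Assumption: the most recent transition tells which region the audio is in.
        return bool(self.transitions) and self.transitions[-1].kind == "silence_to_speech"

    def speech_onset(self):
        """Write position of the latest silence-to-speech transition, or None."""
        for t in reversed(self.transitions):
            if t.kind == "silence_to_speech":
                return t.write_position
        return None
```

Storing each transition together with the circular write position makes it possible to locate the start of the current speech in buffer 308 later without rescanning the audio.
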
