United States Patent [19]
Goldhor et al.

[11] Patent Number: 5,231,670
[45] Date of Patent: Jul. 27, 1993

[54] VOICE CONTROLLED SYSTEM AND METHOD FOR GENERATING TEXT FROM A VOICE CONTROLLED INPUT

[75] Inventors: Richard S. Goldhor, Belmont; John F. Dooley, Reading; Christopher N. Hume, Belmont; James P. Lerner, Newton; Brian D. Wilson, Houston, all of Mass.

[73] Assignee: Kurzweil Applied Intelligence, Inc., Waltham, Mass.

[21] Appl. No.: 855,461

[22] Filed: Mar. 19, 1992

Related U.S. Application Data

[63] Continuation of Ser. No. 57,332, Jun. 1, 1987, abandoned.

[51] Int. Cl. ..........................
[52] U.S. Cl. ..........................
[58] Field of Search ........................... 381/43

[56] References Cited

U.S. PATENT DOCUMENTS

 .........  12/1968  Clapper
3,846,586  11/1974  Griggs
3,989,897  11/1976  Carver
4,144,582   3/1979  Hyatt ................ 381/44
4,435,617   3/1984  Griggs ............... 381/44
4,468,804   3/1984  Kates et al.
4,507,750   3/1985  Frantz et al.
4,630,304  12/1986  Borth et al.
4,661,915   4/1987  Ott .................. 381/44

Primary Examiner: Emanuel S. Kemeny
Attorney, Agent, or Firm: Henry D. Pahl, Jr.

[57] ABSTRACT

Disclosed is a system and method for generating text from a voice input that divides the processing of each speech event into a dictation event and a text event. Each dictation event handles the processing of data relating to the input into the system, and each text event deals with the generation of text from the inputted voice signals. In order to easily distinguish the dictation events from each other and text events from each other, the system and method creates a data structure for storing certain information relating to each individual event. Such data structures enable the system and method to process both simple spoken words as well as spoken commands and to provide the necessary text generation in response to the spoken words or to execute an appropriate function in response to a command. Speech recognition includes the ability to distinguish between dictation text and commands.

25 Claims, 11 Drawing Sheets
[Representative drawing (FIG. 1): block diagram of system 10; components shown are the speech signal processor 14, speech event analyzer 16, dictation event subsystem 18, text event subsystem 20, and application program 22; labeled data flows include speech event data, candidate sets, translations and DE handles, recognizer commands, and application input.]
[Sheet 1 of 11 - FIG. 1: block diagram of the voice input-text generating system 10: speech signal processor 14 (speech event data), speech event analyzer 16 (candidate sets), dictation event subsystem 18 (translations and DE handles), text event subsystem 20 (application input), application program 22; recognizer commands are shown flowing back from the subsystems.]
[Sheet 2 of 11 - FIG. 2: dictation event record 30 containing dictation event handle 32, chronological relationship info 34, candidate set information 36, best match candidate 38, correct choice candidate 40, recognizer performance info 42, recognizer state information 44, and implementation-dependent info 46. FIG. 3: text event record 50 containing text event handle 52, chronological relationship info 54, hierarchical relationship info 56, textual relationship info 58, associated dictation event handle 60, input event information 62, application state information 64, and implementation-dependent info 66.]
[Sheet 3 of 11 - FIG. 4a: processing of speech events (wait for next speech event 72; create new dictation event 74; fill in recognizer state information 76; best match = best valid candidate 78; translate best match 80; send translation to application). FIG. 4b: processing of input events (wait for next input event 92; create new text event 94; fill in application state information 96; associate TE with DE if appropriate 98; process input event 100).]
[Sheet 4 of 11 - FIG. 5: "SCRATCH THAT" command (steps 112-122): is there a current text event?; X = current TE, Y = associated DE; undo effects of X using application state info; set dictation state using recognizer state info; remove X from TEDB and Y from DEDB.]
[Sheet 5 of 11 - FIG. 6 (steps 128-138): is there a current text event?; X = current TE; Y = TE back N from X (chronologically); make Y the current TE; set application state using application state info; set dictation state using associated DEH.]
[Sheet 6 of 11 - FIG. 7 (steps 142-154): is there a current text event?; X = current TE; Y = TE forward N from X (chronologically); make Y the current TE; set application state using application state info; set dictation state using associated DEH.]
[Sheet 7 of 11 - FIG. 8: "BACKTRACK" command (steps 160-172): is there a current text event?; X = current TE, Y = desired event; undo effects of X using state info; Z = TE chronologically previous to X; remove X.]
[Sheet 8 of 11 - FIG. 9: "FILL IN FORM" command (steps 178-186): is there a current dictation event?; X = current TE; F = form specification from input info in X; create a TE for each field in F; make the current TE the first field in F.]
[Sheet 9 of 11 - FIG. 10: "TAKE N" command (steps 192-202): is there a current dictation event?; X = current DE, B = best match; undo effects of translation of B; set dictation state using recognizer state info; mark B as invalid for best match; best match = N'th best valid candidate; translate best match.]
[Sheet 10 of 11 - FIG. 11: "TRY AGAIN" command (steps 206-226): is there a current text event?; X = current TE, Y = associated DE; undo effects of X using application state info; set dictation state using recognizer state info; remember currently invalid candidates; remove X from TEDB and Y from DEDB; wait for next speech event; mark remembered candidates invalid; best match = best valid candidate; translate best match.]
[Sheet 11 of 11 - FIG. 12: block diagram of the system components: microphone 12, speech signal processor 14, processor 15, and output display device 23.]
VOICE CONTROLLED SYSTEM AND METHOD FOR GENERATING TEXT FROM A VOICE CONTROLLED INPUT

This is a continuation of co-pending application Ser. No. 07/057,332, filed on Jun. 1, 1987, now abandoned.
BACKGROUND OF THE INVENTION

The present invention relates to text generating systems that generate text from a voice input and, more particularly, to a text generating system that generates text from a voice input and also accepts voice generated control commands, with the ability to distinguish between the input and the commands.

Voice controlled text generating systems generally include a microphone for detecting speech signals. A speech signal processor transforms the detected speech signals into a representation suitable for recognition by a processor (e.g., short term spectral cross-sections) and transmits the processed signals to a speech event analyzer which generates a set of recognition candidates in response to each detected speech event. A recognition candidate is a vocabulary item, stored in a system memory (not shown), that is similar to the detected speech event which represents the spoken word or phrase. The system creates a candidate set that includes all of the recognition candidates; in other words, the candidate set includes all known vocabulary items which are sufficiently similar to the detected speech event that the speech event analyzer 16 decides that there is a high degree of probability that the speech event is an instance of the vocabulary item represented by the recognition candidate.

In order to enable the system to choose the most appropriate candidate, the system assigns a recognition score to each candidate. The recognition score indicates the likelihood that the speech event is an instance of the candidate, and after processing is complete, the recognition candidate with the highest recognition score is designated the "Best Match". The system then selects the "Best Match" as the candidate representing the chosen vocabulary item.

After the best match candidate has been selected, the system translates the candidate and transmits the translated candidate to the application. In other words, the translation is the input to the application that has been designated as the input to be sent when the candidate is chosen as best match for a particular state of the recognizer. As a result, in theory, there is a specified translation for each combination of best match vocabulary item and recognizer state. Often, of course, a translation is simply the spelling of the best match word or phrase, but it can be any legal input to the application.
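By way of illustration, the selection of a best match and its translation might be sketched as follows. This is a hedged example, not the patented implementation; the names (Candidate, build_candidate_set, score_fn, translation_table) and the thresholding scheme are assumptions introduced here for clarity.

    from dataclasses import dataclass

    @dataclass
    class Candidate:
        vocabulary_item: str      # the word or phrase this candidate represents
        recognition_score: float  # likelihood that the speech event is an instance of it

    def build_candidate_set(speech_event_data, vocabulary, score_fn, threshold):
        # Keep every vocabulary item scored as sufficiently similar to the speech event.
        scored = [Candidate(item, score_fn(speech_event_data, item)) for item in vocabulary]
        return [c for c in scored if c.recognition_score >= threshold]

    def best_match(candidate_set):
        # The candidate with the highest recognition score is designated the "Best Match".
        return max(candidate_set, key=lambda c: c.recognition_score)

    def translate(candidate, recognizer_state, translation_table):
        # A translation is keyed on (vocabulary item, recognizer state); by default it is
        # simply the spelling of the word, but it may be any legal input to the application.
        return translation_table.get((candidate.vocabulary_item, recognizer_state),
                                     candidate.vocabulary_item)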
In addition to accepting voice input and deriving the corresponding text from that voice input, it is also desirable to be able to control the system through the use of voice commands. In such a system, the voice commands actuate assigned tasks. This is especially important for a system designed for use by handicapped individuals or by an individual who does not have free use of his/her hands because the hands are occupied with another task during use of the system. Moreover, when a text generating system is used for dictation, the person dictating usually cannot efficiently use a keyboard, and voice operated commands would greatly facilitate use of the system.

Known systems treat verbal input as typed input, or in other words, convert the speech into keystrokes. A speaker, however, supplies input to the system in word units, and verbal commands are more easily understood in terms of word units rather than characters. For this reason, known systems do not make effective use of vocal commands, especially commands involving backtracking through documents.

Another problem with known systems is that they assume that each input character is correct, and as a result the systems do not efficiently correct mistakenly translated verbal input.

It is therefore a principal object of the present invention to provide a system and method for generating text from a voice input that organizes and records information about system state and verbal and non-verbal input.

Another object of the present invention is to provide a system and method for generating text from a voice input that reliably and effectively implements system functions which make it possible for the user to inform the system of misrecognitions; for the system to undo the effects of said misrecognitions; for the user to control the application by referring directly to earlier events in the dictation process; and for the system to control and modify the recognition of speech, including the ability to learn from earlier misrecognitions.

A still further object of the present invention is to provide a system and method for generating text from a voice input that organizes the process of speech dictation to computerized systems, and the response of those systems, into similar structures which can be used to effectively control and modify system operation.

A further object of the present invention is to provide a system and method for generating text from a voice input that groups and organizes the various inputs in a manner that facilitates retrieval of any input.
SUMMARY OF THE INVENTION

Accordingly, the system and method for generating text from a voice input of the present invention divides the processing of each speech event into a dictation event and a text event. Each dictation event handles the processing of data relating to the input into the system, and each text event deals with the generation of text from the inputted voice signals. In order to easily distinguish the dictation events from each other and text events from each other, the system and method creates a data structure for storing certain information relating to each individual event. Such data structures enable the system and method to process both simple spoken words as well as spoken commands and to provide the necessary text generation in response to the spoken words or to execute an appropriate function in response to a command.
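As a rough illustration of this division of labor (a sketch only; the record contents and names below are assumptions drawn from the description that follows, not code from the patent), each speech event can be thought of as producing one dictation event record and one associated text event record, each kept in its own database.

    import itertools

    _handles = itertools.count(1)
    dictation_event_db = {}  # recognition-side records, one per speech event
    text_event_db = {}       # application-side records, one per resulting input event

    def record_speech_event(candidate_set, recognizer_state, application_state):
        # Each speech event yields a dictation event (input processing) and an
        # associated text event (text generation), each stored in its own database.
        de_handle = next(_handles)
        dictation_event_db[de_handle] = {
            "candidates": candidate_set,           # what the recognizer heard
            "recognizer_state": recognizer_state,  # allows later re-recognition and undo
        }
        te_handle = next(_handles)
        text_event_db[te_handle] = {
            "associated_de": de_handle,              # ties the text back to its speech event
            "application_state": application_state,  # allows the text's effects to be undone
        }
        return de_handle, te_handle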
These and other objects and features of the present invention will be more fully understood from the following detailed description which should be read in light of the accompanying drawings in which corresponding reference numerals refer to corresponding parts throughout the several views.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of the voice input-text generating system of the present invention;

FIG. 2 is a diagram of a typical data structure utilized to represent each dictation event created by the system and method of the present invention;
FIG. 3 is a diagram of a data structure used to represent each text event generated in response to a dictation event by the system and method of the present invention;

FIG. 4a is a flow chart of the operation of the system and method of the present invention in processing speech events;

FIG. 4b is a flow chart of the operation of the system and method of the present invention in processing input events;

FIG. 5 is a flow chart of the operation of the system and method of the present invention in response to a "SCRATCH THAT" command;

FIG. 6 is a flow chart of the operation of the system and method of the present invention in response to a "FORWARD N" command;

FIG. 7 is a flow chart of the operation of the system and method of the present invention in response to a "BACK N" command;

FIG. 8 is a flow chart of the operation of the system and method of the present invention in response to a "BACKTRACK" command;

FIG. 9 is a flow chart of the operation of the system and method of the present invention in response to a "FILL IN FORM" command;

FIG. 10 is a flow chart of the operation of the system and method of the present invention in response to a "TAKE N" command;

FIG. 11 is a flow chart of the operation of the system and method of the present invention in response to a "TRY AGAIN" command; and

FIG. 12 is a block diagram of the components of the system of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring to FIGS. 1 and 12, the system for generating text from voice input of the present invention includes a microphone 12 or other voice input means attached to a speech signal processor 14 which converts the speech signals into digital data that can be processed by processor 15. The processor 15 performs several distinct functions, including serving as the speech event analyzer 16, the dictation event subsystem 18, the text event subsystem 20, and the executor of the application program. The speech signal processor 14 generates speech event data and transmits this data to the processor 15 to be processed first by the speech event analyzer 16. The speech event analyzer 16 generates a list or set of possible candidates that represent the voice input processed by the speech signal processor 14. The speech event analyzer 16 transmits the candidate sets to a dictation event subsystem 18. The dictation event subsystem 18 analyzes the candidate sets and chooses the "BEST MATCH", i.e. the candidate with the highest degree of similarity. This candidate is then considered the correct translation, and the dictation event subsystem forwards the translation to text event subsystem 20, which in turn inputs the translated text to an application. During the execution of the application, text can be displayed on output display device 23, which can be a CRT, printer, etc.
The recognition candidates that are included in the candidate sets transmitted from the speech event analyzer 16 to the dictation event subsystem 18 are vocabulary items similar to the detected speech event. The entire set includes all known vocabulary items which are sufficiently similar to the detected speech event that the speech event analyzer 16 decides there is an appreciable possibility that the speech event was an instance of the vocabulary item. Each candidate includes an associated recognition score which indicates the likelihood that the speech event is an instance of that candidate.

The translation input to the application when a particular candidate is chosen as best match generally represents the spelling of that particular candidate word or phrase. As will be described below, the translation may also be any other legal input into a particular application, and the translation may in fact be used to control the application by voice. The translation may also include input to the recognizer whereby the operation of the recognizer can be controlled and its state changed.

As mentioned above, a dictation event describes the operation of the system of the present invention at the input stage of the system. A dictation event record is a formal data object that describes the speech event, and the speech event is an occurrence in the speech signal of an event interpreted as a word or phrase by the recognizer. For each such speech event, the recognizer stores useful information in a dictation event database and provides techniques (commands, subroutine calls, macros, etc.) by which certain specified operations may be performed on the dictation event database. Before discussing these various operations, the structure of each individual data element in each record of the dictation event database will be described.
Referring now to FIG. 2, there is shown a dictation event record 30 of the dictation event database for a single dictation event. Each record includes a dictation event handle 32 which is generally an address in the database where this record is stored. The chronological relationship information element 34 includes addresses or pointers to the other dictation event records created immediately before and immediately after the current dictation event record.

The candidate set information element 36 contains information relating to each of the potential recognition candidates that is chosen by the speech event analyzer 16, and in one embodiment this information is a list of hash codes representing each one of the candidates. Element 36 of the dictation event record 30 will also frequently include the recognition scores representing the probability that each candidate is the best match for the speech event data transmitted to the speech event analyzer 16. The best match candidate element 38 indicates the candidate chosen as the best match, and in one embodiment this element is an index into the candidate set contained in element 36. In other words, element 38 points to the best match candidate in the candidate set. The correct choice element 40 of the dictation event record is also an index into the candidate set that points to the correctly translated speech pattern. Of course, this record may point to the same candidate as the best match candidate element 38.
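Read as a data structure, the dictation event record of FIG. 2 might look roughly as follows. This rendering is hypothetical; the field names simply mirror elements 32 through 46 and are not taken from the patent.

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class DictationEventRecord:
        handle: int                                    # 32: identifier/address of this record
        prev_handle: Optional[int] = None              # 34: chronologically previous record
        next_handle: Optional[int] = None              # 34: chronologically next record
        candidate_set: list = field(default_factory=list)  # 36: candidates and their scores
        best_match_index: Optional[int] = None         # 38: index into candidate_set
        correct_choice_index: Optional[int] = None     # 40: index of the correct candidate
        recognizer_performance: dict = field(default_factory=dict)  # 42: waveform, acoustics, thresholds
        recognizer_state: dict = field(default_factory=dict)        # 44: state for exact reset
        implementation_dependent: dict = field(default_factory=dict)  # 46: e.g. speaker adaptation data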
The recognizer performance information element 42 is a rather large substructure of the dictation event record 30. This element 42 receives data from various modules in the recognizer, and this data represents a variety of information items regarding the performance of the recognizer. For example, in a preferred embodiment element 42 includes an internal representation of the waveform. By storing this internal representation, the system may play back the speech represented by the waveform. This element may also contain information concerning the acoustic characteristics of various spoken phrases and may also include thresholds used internally to choose candidates.

The recognizer state information element 44 contains state variables that insure that the same input to the system 10 provides the same output. In addition, for each dictation event the recognizer state information element 44 stores information describing the state of the recognition program. This information enables all values to be exactly reset and avoids causing the system to re-learn correct translations for speech events. The final element shown in the dictation event record is the implementation-dependent information element 46. Element 46 stores many different data items, including, for example, data that allows the updating of vocabulary recognition data as a result of the way a speaker says words.

A dictation event, and consequently a dictation event record 30, is created as part of the process of recognizing a word. The creation of a dictation event record includes the allocation or reservation of memory space in the dictation event database for the dictation event record which will store the information described above and shown in FIG. 2. The data record 30 is also initialized at the time it is created, and the system generates a dictation event handle 32 which uniquely specifies that dictation event record 30. Handle 32 is stored for each specific dictation event record by each facility within the recognizer or application which may later want to refer to a particular dictation event. Once the system creates a dictation event record 30, a dictation event can be selected as the active dictation event for any dictation event operation by specifying its dictation event handle. Alternatively, a dictation event can be selected by specifying another dictation event which stands in some relationship to the desired dictation event (such as chronologically following it) and specifying the relevant relationship. If no dictation event is currently active, a "null dictation event" may be specified.

As described above, a candidate set is associated with each dictation event. From this set, the system chooses a best match candidate. Several operations can be performed on a dictation event record that relate to the candidate set of the dictation event. In particular, a recognition candidate in the set can be marked as incorrect; a candidate can be marked as selected (i.e., can be specified by the user as a correct recognition for the speech event which the dictation event represents); and candidates in the set can be reordered so that for any speech event a different candidate than the candidate originally determined by the system is produced as the best match candidate each time the speech event occurs. Finally, the entire candidate set can be retrieved for display, to enable a user of the system to select the correct candidate, or for further processing.
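Those candidate set operations, together with the recognizer state reset described in the next paragraph, might be sketched as the following helper functions operating on the record sketched above. The function names and the recognizer's load_state call are assumptions for illustration only.

    def mark_incorrect(record, index):
        # Mark a recognition candidate as incorrect so it is skipped as best match.
        invalid = getattr(record, "invalid_indices", set())
        invalid.add(index)
        record.invalid_indices = invalid

    def select_correct(record, index):
        # Record the user's selection as the correct recognition for this speech event.
        record.correct_choice_index = index

    def reorder_best_match(record, index):
        # Make a different candidate the one produced as best match for this event.
        record.best_match_index = index

    def retrieve_candidates(record):
        # Return the whole candidate set, e.g. for display so the user can choose.
        return list(record.candidate_set)

    def reset_recognizer(recognizer, record):
        # Restore the recognizer to its state at the time of the speech event, so the
        # same input can be re-analyzed and will produce the same output.
        recognizer.load_state(record.recognizer_state)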
Another important operation performed on the dictation event database is the resetting of the state of the recognizer to the recognizer state at the time of the occurrence of a speech event. A common example of this resetting is the re-analysis of an utterance which was incorrectly recognized. It is the recognizer state information 44 stored in the dictation event record 30 that is used to perform the reset operation, and as discussed above this information includes state variables that enable the system to provide the same output for the same input.

The system 10 of the present invention generally maintains the dictation event database only within a given dictation session and not between sessions. However, as users of the system will frequently interrupt a dictation session and continue at a later time, either the entire dictation event database or individual dictation event records can be permanently stored in files for use either in analyzing the performance of the recognizer or in recreating the dictation session at a later time.

The system 10 also allows dictation event records to be deleted from the dictation event database in order to minimize the amount of storage required for the dictation event database. Dictation event records may also be deleted in order to reduce the time required to perform other dictation event operations, thereby reducing the searching time as well as the time associated with other operations. Typically, dictation events corresponding to the least recent speech events are removed first.

When the user, the recognizer or the application determines that the system correctly recognized a particular speech event or incorrectly recognized a speech event, a process of adapting the speech related data upon which the performance of the recognizer depends may be carried out. Information stored in the recognizer state information element 44 may be used in this process.

A chronological relationship exists between any two records of the dictation event database, as one record was created before the other record. The speech event of the dictation event record that was created earlier occurred before the speech event of the dictation event which was created later. This chronological order can also be determined from the structure of the dictation event database. As described above, the chronological relationship information element 34 generally will include a pointer to and from chronologically adjacent dictation event records.

After the system 10 of the present invention processes a dictation event, a text event is then created. Referring to FIG. 3, each text event record 50 is a formal data object which contains data describing an "input event". Such input events include the reception by an application of some input that can be treated as a single unit. One important class of input events is the reception of output from the recognizer, and this output is generally in the form of translations. Other input events include typed input, input from pointing devices such as a mouse, etc.

For each input event, the application stores useful information in a text event database that includes a number of text event records 50. The application also provides techniques (commands, subroutine calls, macros, etc.) by which certain specified operations may be performed on the text event database. The term "text event" has been chosen to describe all application events whether or not the input events involve the processing or creation of text, and therefore text event records are also used to record information about all types of input events.

A text event record 50 of the text event database is created as part of the process of accepting input to the application. The creation includes the allocation or reservation of memory space in the text event database for the record which will store the information comprising the text event. The creation also involves the initialization of that data record, and the generation of a text event handle 52 which can be subsequently used to uniquely specify that text event.
As in the case of the dictation event handle 32, the text event handle 52 generally represents a memory address of a particular text event in the text event database. This handle is stored for each application facility that may later want to reference that text event record, as the text event can be referenced by specifying its text event handle. Alternatively, a text event can be referenced by specifying another text event record which stands in some relationship to the desired text event (such as chronologically following) and specifying the relevant relationship.

Each text event record contains data describing the input event which resulted in the creation of the text event. The actual input data itself may be stored in the text event record, and a code number is stored in the text event record that identifies the input event type. This data is stored in the input event information element 62. Examples of typical types of input events are: reception of a translation from a recognizer; keyboard input; input from a pointing device; and "input" from a preprogrammed application activity.

Each text event record 50 also includes a data element that provides chronological relationship information with respect to other text event records. As in the case of the chronological relationship information element 34 in the dictation event records 30, the chronological relationship information element 54 in the text event records 50 includes links to and from the text event records that were created immediately before and after each event record.

Unlike dictation events, text events can have hierarchical relationships with respect to each other. Each text event record 50 contains a hierarchical relationship information element 56 identifying those text events which are either immediately superior or immediately inferior to itself. This superior and inferior relationship is created if a given text event is active when a new text event is created. In such a situation, the active text event is considered to be the superior of the next created text event. For any text event record in the text event database, it is possible to determine all superior text events (if they exist) and all of its inferior text events (if they exist). Of course this order is only a partial ordering since not all text events stand in a hierarchical relationship to each other. The data stored in the hierarchical relationship information element 56 may be either a list of addresses of the superior and inferior text event records or links and pointers to appropriate lists of superior and inferior text events.

The text event record 50 also includes a data element that stores the textual relationship information so that actual text may be linked. In other words, any two consecutive items of text are identified as being consecutive so that the system may jump around to different text events and still maintain the proper order of the output text. This textual relationship information element 58 of each text event record is generally a pointer to and from each text event record which indicates the relative textual position of any two text events that have text associated with them. This feature is especially important in a word processing program where text is added to a target document. In such a situation, for any two text events which result in such addition of text to the same document, an ordering can be determined which specifies which text event corresponds to text closer to the beginning of the document and which text corresponds to text closer to the end of
the document. This, of course, is only a partial ordering since not all text events are associated with text in the same document.
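Taken together, the text event record of FIG. 3 can likewise be sketched as a data structure. This is a hypothetical rendering; the field names mirror elements 52 through 66 as labeled in FIG. 3 and described here, not any actual code.

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class TextEventRecord:
        handle: int                                     # 52: identifier/address of this record
        prev_handle: Optional[int] = None               # 54: text event created just before
        next_handle: Optional[int] = None               # 54: text event created just after
        superior_handles: list = field(default_factory=list)  # 56: hierarchically superior events
        inferior_handles: list = field(default_factory=list)  # 56: hierarchically inferior events
        text_before: Optional[int] = None               # 58: event whose text precedes this one
        text_after: Optional[int] = None                # 58: event whose text follows this one
        associated_de_handle: Optional[int] = None      # 60: dictation event that produced it
        input_event: dict = field(default_factory=dict)         # 62: input type and data
        application_state: dict = field(default_factory=dict)   # 64: state for undoing its effects
        implementation_dependent: dict = field(default_factory=dict)  # 66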
Each text event record also contains an input event information element 62 describing the input which resulted in the creation of the text event. The actual input data itself may be stored in the text event record, or a code number may be used and stored in the text event record that identifies the input event type. Examples of types of input events are: reception of a translation from the recognizer; keyboard input; input from a pointing dev
