`
`_________________
`
`BEFORE THE PATENT TRIAL AND APPEAL BOARD
`
`_________________
`
`APPLE INC.,
`Petitioner
`
`v.
`
`ZENTIAN LIMITED,
`Patent Owner
`_________________
`
`Inter Partes Review Case No. IPR2023-00034
`U.S. Patent No. 7,979,277
`
`DECLARATION OF CHRISTOPHER SCHMANDT
`IN SUPPORT OF PETITION FOR INTER PARTES REVIEW OF
`U.S. PATENT NO. 7,979,277
`
`
`TABLE OF CONTENTS
`
I. INTRODUCTION AND QUALIFICATIONS .......... 15
    A. Educational Background and Professional Experience .......... 15
II. METHODOLOGY: MATERIALS CONSIDERED .......... 18
III. OVERVIEW OF LEGAL STANDARDS .......... 21
    A. Person of Ordinary Skill in the Art .......... 22
    B. Obviousness .......... 22
    C. Analogous Art .......... 28
    D. Claim Construction .......... 29
        1. Construction Pursuant to 35 U.S.C. § 112, ¶ 6 .......... 29
IV. LEVEL OF A PERSON OF ORDINARY SKILL .......... 31
V. OVERVIEW OF THE TECHNOLOGY .......... 32
    A. Speech Recognition .......... 32
    B. Feature Vectors .......... 40
    C. Acoustic Models .......... 49
    D. Hidden Markov Models .......... 51
    E. Distance Calculations .......... 59
    F. Gaussian Distribution and Probability .......... 63
    G. Speech Recognition System Hardware .......... 66
    H. Pipelining .......... 75
    I. Interrupts .......... 79
    J. Prior Art Speech Recognition Systems .......... 81
VI. OVERVIEW OF THE ’277 PATENT .......... 82
VII. SUMMARY OF UNPATENTABILITY .......... 83
VIII. OVERVIEW OF THE PRIOR ART .......... 85
    A. Overview of Jiang .......... 85
    B. Overview of Baumgartner .......... 85
    C. Overview of Brown .......... 86
    D. Overview of Kazeroonian .......... 86
    E. Overview of Vensko .......... 87
    F. Overview of Smyth .......... 87
IX. OPINIONS REGARDING GROUND 1: CLAIMS 1, 5, 7, 12, AND 14-16 ARE OBVIOUS OVER JIANG, BAUMGARTNER, AND BROWN .......... 88
    A. Independent Claim 1 .......... 88
        1. Claim 1(Pre): “A speech recognition circuit, comprising” .......... 88
        2. Claim 1(a): “an audio front end for calculating a feature vector from an audio signal,” .......... 89
        3. Claim 1(b): “wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame;” .......... 94
            a) The ’277 Patent’s Discussion of “extracted and/or derived quantities” .......... 94
            b) Jiang’s Teachings .......... 96
        4. Claim 1(c): “a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model; and” .......... 98
            a) The ’277 Patent’s Discussion of “distances” .......... 100
            b) Jiang’s Teachings .......... 102
                (1) “a calculating circuit” .......... 102
                (2) “a plurality of predetermined acoustic states of an acoustic model” .......... 104
                (3) “a feature vector” .......... 107
                (4) “calculating distances …” .......... 110
            c) Baumgartner Teaches a “calculating circuit” .......... 121
            d) Motivation to Combine Jiang and Baumgartner’s Teachings .......... 126
        5. Claim 1(d): “a search stage for using said calculated distances to identify words within a lexical tree, the lexical tree comprising a model of words;” .......... 127
            a) “a lexical tree” .......... 127
            b) “the lexical tree comprising a model of words” .......... 130
            c) “a search stage for using said calculated distances to identify words within a lexical tree” .......... 131
        6. Claim 1(e): “wherein said audio front end and said search stage are implemented using a first processor, and said calculating circuit is implemented using a second processor, and” .......... 135
            a) Overview of Mapping .......... 135
            b) Baumgartner Teaches an “audio front end,” a “calculating circuit,” and a “search stage” .......... 139
            c) Baumgartner Teaches an Audio Front End “implemented using a first processor” .......... 146
            d) Baumgartner Teaches a Calculating Circuit “implemented using a second processor” .......... 149
            e) Baumgartner Teaches a Search Stage “implemented using a first processor” .......... 151
            f) Motivation to Combine Jiang and Baumgartner .......... 153
        7. Claim 1(f): “wherein data is pipelined from the front end to the calculating circuit to the search stage.” .......... 160
            a) Brown’s Teachings .......... 162
            b) Motivation to Combine Brown with Jiang-Baumgartner .......... 164
    B. Dependent Claim 5 .......... 168
        1. “A speech recognition circuit as claimed in claim 1, wherein the said calculating circuit is configured to autonomously calculate distances for every acoustic state defined by the acoustic model.” .......... 168
    C. Dependent Claim 7 .......... 170
        1. “The speech recognition circuit of claim 1, wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame.” .......... 170
    D. Dependent Claim 12 .......... 171
        1. “The speech recognition circuit of claim 1, wherein the audio front end is configured to input a digital audio signal.” .......... 171
    E. Independent Claim 14 .......... 171
        1. Claim 14(Pre): “A speech recognition circuit, comprising:” .......... 171
        2. Claim 14(a): “an audio front end for calculating a feature vector from an audio signal,” .......... 171
        3. Claim 14(b): “wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame;” .......... 171
        4. Claim 14(c): “calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model; and” .......... 172
        5. Claim 14(d): “a search stage for using said calculated distances to identify words within a lexical tree, the lexical tree comprising a model of words;” .......... 173
        6. Claim 14(e): “wherein said audio front end, said calculating means, and said search stage are connected to each other to enable pipelined data flow.” .......... 173
    F. Independent Claim 15 .......... 173
        1. Claim 15(Pre): “A speech recognition method, comprising:” .......... 173
        2. Claim 15(a): “calculating a feature vector from an audio signal using an audio front end,” .......... 174
        3. Claim 15(b): “wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame;” .......... 174
        4. Claim 15(c): “calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit; and” .......... 174
        5. Claim 15(d): “using a search stage to identify words within a lexical tree using said calculated distances, the lexical tree comprising a model of words;” .......... 174
        6. Claim 15(e): “wherein data is pipelined from the front end, to the calculating circuit, and to the search stage.” .......... 175
    G. Independent Claim 16 .......... 175
        1. Claim 16(Pre): “A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method, the code comprising:” .......... 175
        2. Claim 16(a): “code for controlling the processor to calculate a feature vector from an audio signal,” .......... 176
        3. Claim 16(b): “wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame;” .......... 176
        4. Claim 16(c): “code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model; and” .......... 177
        5. Claim 16(d): “code for controlling the processor to identify words within a lexical tree using said calculated distances, the lexical tree comprising a model of words,” .......... 177
        6. Claim 16(e): “wherein data is pipelined by the processor pursuant to the code from the feature calculation, to the distance calculation, and to the word identification.” .......... 177
X. OPINIONS REGARDING GROUND 2: CLAIM 4 IS OBVIOUS OVER JIANG, BAUMGARTNER, BROWN, AND KAZEROONIAN .......... 177
    A. Dependent Claim 4 .......... 177
        1. Claim 4: “A speech recognition circuit as claimed in claim 1, wherein the first processor supports multi-threaded operation, and runs the search stage and front ends as separate threads.” .......... 177
XI. OPINIONS REGARDING GROUND 3: CLAIMS 9-10 ARE OBVIOUS OVER JIANG, BAUMGARTNER, BROWN, AND VENSKO .......... 182
    A. Dependent Claim 9 .......... 182
        1. Claim 9: “The speech recognition circuit of claim 1, wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end.” .......... 182
    B. Dependent Claim 10 .......... 188
        1. Claim 10: “The speech recognition circuit of claim 1, wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory.” .......... 188
XII. OPINIONS REGARDING GROUND 4: CLAIMS 1, 5, 7, 12, AND 14-16 ARE OBVIOUS OVER JIANG, BAUMGARTNER, BROWN, AND SMYTH .......... 189
    A. Independent Claim 1 .......... 189
        1. Claim 1(Pre): “A speech recognition circuit, comprising:” .......... 189
        2. Claim 1(a): “an audio front end for calculating a feature vector from an audio signal,” .......... 189
        3. Claim 1(b): “wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame;” .......... 189
        4. Claim 1(c): “a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model; and” .......... 189
            a) “a calculating circuit” .......... 190
            b) “a plurality of predetermined acoustic states” .......... 192
            c) “[a plurality of predetermined acoustic states] of an acoustic model” .......... 196
            d) “for calculating distances …” .......... 200
            e) Motivation to Modify Jiang-Baumgartner-Brown with Smyth .......... 203
        5. Claim 1(d): “a search stage for using said calculated distances to identify words within a lexical tree, the lexical tree comprising a model of words;” .......... 206
        6. Claim 1(e): “wherein said audio front end and said search stage are implemented using a first processor, and said calculating circuit is implemented using a second processor, and” .......... 206
        7. Claim 1(f): “wherein data is pipelined from the front end to the calculating circuit to the search stage.” .......... 206
    B. Dependent Claim 5 .......... 207
        1. Claim 5: “A speech recognition circuit as claimed in claim 1, wherein the said calculating circuit is configured to autonomously calculate distances for every acoustic state defined by the acoustic model.” .......... 207
    C. Dependent Claim 7 .......... 207
        1. Claim 7: “The speech recognition circuit of claim 1, wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame.” .......... 207
    D. Dependent Claim 12 .......... 207
        1. Claim 12: “The speech recognition circuit of claim 1, wherein the audio front end is configured to input a digital audio signal.” .......... 207
    E. Independent Claim 14 .......... 207
        1. Claim 14(Pre): “a speech recognition circuit, comprising:” .......... 207
        2. Claim 14(a): “an audio front end for calculating a feature vector from an audio signal,” .......... 207
        3. Claim 14(b): “wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame;” .......... 207
        4. Claim 14(c): “calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model; and” .......... 208
        5. Claim 14(d): “a search stage for using said calculated distances to identify words within a lexical tree, the lexical tree comprising a model of words;” .......... 208
        6. Claim 14(e): “wherein said audio front end, said calculating means, and said search stage are connected to each other to enable pipelined data flow.” .......... 208
    F. Independent Claim 15 .......... 208
        1. Claim 15(Pre): “A speech recognition method, comprising:” .......... 208
        2. Claim 15(a): “calculating a feature vector from an audio signal using an audio front end,” .......... 208
        3. Claim 15(b): “wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame;” .......... 209
        4. Claim 15(c): “calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit; and” .......... 209
        5. Claim 15(d): “using a search stage to identify words within a lexical tree using said calculated distances, the lexical tree comprising a model of words;” .......... 209
        6. Claim 15(e): “wherein data is pipelined from the front end, to the calculating circuit, and to the search stage.” .......... 209
    G. Independent Claim 16 .......... 209
        1. Claim 16(Pre): “A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method, the code comprising:” .......... 209
        2. Claim 16(a): “code for controlling the processor to calculate a feature vector from an audio signal,” .......... 209
        3. Claim 16(b): “wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame;” .......... 210
        4. Claim 16(c): “code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model; and” .......... 210
        5. Claim 16(d): “code for controlling the processor to identify words within a lexical tree using said calculated distances, the lexical tree comprising a model of words,” .......... 210
        6. Claim 16(e): “wherein data is pipelined by the processor pursuant to the code from the feature calculation, to the distance calculation, and to the word identification.” .......... 210
XIII. OPINIONS REGARDING GROUND 5: CLAIM 4 IS OBVIOUS OVER JIANG, BAUMGARTNER, BROWN, SMYTH, AND KAZEROONIAN .......... 210
    A. Dependent Claim 4 .......... 210
        1. Claim 4: “A speech recognition circuit as claimed in claim 1, wherein the first processor supports multi-threaded operation, and runs the search stage and front ends as separate threads” .......... 210
XIV. OPINIONS REGARDING GROUND 6: CLAIMS 9-10 ARE OBVIOUS OVER JIANG, BAUMGARTNER, BROWN, SMYTH, AND VENSKO .......... 211
    A. Dependent Claim 9 .......... 211
        1. Claim 9: “The speech recognition circuit of claim 1, wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end.” .......... 211
    B. Dependent Claim 10 .......... 211
        1. Claim 10: “The speech recognition circuit of claim 1, wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory.” .......... 211
XV. CONCLUSION .......... 212
`
`
`
`
`CLAIM LISTING
`
`Claim 1:
`
`1(Pre) A speech recognition circuit, comprising:
`
`1(a) an audio front end for calculating a feature vector from an audio signal,
`
`1(b) wherein the feature vector comprises a plurality of extracted and/or
`
`derived quantities from said audio signal during a defined audio time frame;
`
`1(c) a calculating circuit for calculating distances indicating the similarity
`
`between a feature vector and a plurality of predetermined acoustic states of an
`
acoustic model; and
`
`1(d) a search stage for using said calculated distances to identify words within
`
`a lexical tree, the lexical tree comprising a model of words;
`
`1(e) wherein said audio front end and said search stage are implemented using
`
`a first processor, and said calculating circuit is implemented using a second
`
`processor, and
`
`1(f) wherein data is pipelined from the front end to the calculating circuit to
`
`the search stage.
`
`Claim 4:
`
`A speech recognition circuit as claimed in claim 1, wherein the first processor
`
`supports multi-threaded operation, and runs the search stage and front ends as
`
`separate threads.
`
`
`Claim 5:
`
`A speech recognition circuit as claimed in claim 1, wherein the said
`
`calculating circuit is configured to autonomously calculate distances for every
`
`acoustic state defined by the acoustic model.
`
`Claim 7:
`
`The speech recognition circuit of claim 1, wherein the feature vector
`
`comprises a plurality of spectral components of an audio signal for a predetermined
`
`time frame.
`
`Claim 9:
`
`The speech recognition circuit of claim 1, wherein the speech accelerator has
`
`an interrupt signal to inform the front end that the accelerator is ready to receive a
`
`next feature vector from the front end.
`
`Claim 10:
`
`The speech recognition circuit of claim 1, wherein the accelerator signals to
`
`the search stage when the distances for a new frame are available in a result memory.
`
`Claim 12:
`
`The speech recognition circuit of claim 1, wherein the audio front end is
`
`configured to input a digital audio signal.
`
`Claim 14:
`
`14(Pre) A speech recognition circuit, comprising:
`
`14(a) an audio front end for calculating a feature vector from an audio signal,
`
`14(b) wherein the feature vector comprises a plurality of extracted and/or
`
`derived quantities from said audio signal during a defined audio time frame;
`
`14(c) calculating means for calculating a distance indicating the similarity
`
`between a feature vector and a predetermined acoustic state of an acoustic model;
`
`and
`
`14(d) a search stage for using said calculated distances to identify words
`
`within a lexical tree, the lexical tree comprising a model of words;
`
`14(e) wherein said audio front end, said calculating means, and said search
`
`stage are connected to each other to enable pipelined data flow.
`
`Claim 15:
`
`15(Pre) A speech recognition method, comprising:
`
`15(a) calculating a feature vector from an audio signal using an audio front
`
`end,
`
`15(b) wherein the feature vector comprises a plurality of extracted and/or
`
`derived quantities from said audio signal during a defined audio time frame;
`
`15(c) calculating a distance indicating the similarity between a feature vector
`
`and a predetermined acoustic state of an acoustic model using a calculating circuit;
`
`and
`
`15(d) using a search stage to identify words within a lexical tree using said
`
`calculated distances, the lexical tree comprising a model of words;
`
`15(e) wherein data is pipelined from the front end, to the calculating circuit,
`
`and to the search stage.
`
`Claim 16:
`
`16(Pre) A non-transitory storage medium storing processor implementable
`
`code for controlling at least one processor to implement a speech recognition
`
`method, the code comprising:
`
`16(a) code for controlling the processor to calculate a feature vector from an
`
`audio signal,
`
`16(b) wherein the feature vector comprises a plurality of extracted and/or
`
`derived quantities from said audio signal during a defined audio time frame;
`
`16(c) code for controlling the processor to calculate a distance indicating the
`
`similarity between a feature vector and a predetermined acoustic state of an acoustic
`
`model; and
`
`16(d) code for controlling the processor to identify words within a lexical tree
`
`using said calculated distances, the lexical tree comprising a model of words,
`
`16(e) wherein data is pipelined by the processor pursuant to the code from the
`
`feature calculation, to the distance calculation, and to the word identification.
`
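For orientation only, the following short Python sketch illustrates the arrangement recited in claim 1: an audio front end and a search stage run on one worker (standing in for the claimed “first processor”), a distance-calculating stage runs on a second worker (the “second processor”), and frames of data are handed off from the front end, to the calculating circuit, to the search stage through queues (the pipelining). The sketch is my own simplified illustration; it is not code from the ’277 Patent or from any reference discussed in this declaration, and every name, dimension, and the toy “acoustic model” in it are hypothetical placeholders.

import queue
import threading

import numpy as np

N_STATES = 8         # hypothetical number of acoustic states in the toy model
FEATURE_DIM = 13     # hypothetical feature-vector dimension

# Stand-in "acoustic model": one reference vector per predetermined acoustic state.
acoustic_states = np.random.rand(N_STATES, FEATURE_DIM)

feature_q = queue.Queue()    # front end -> calculating stage
distance_q = queue.Queue()   # calculating stage -> search stage


def first_processor(audio_frames, results):
    """Front end plus search stage. The front end reduces each audio frame to a
    toy "feature vector"; the search step then consumes the distance vectors
    produced by the second worker and picks the closest acoustic state per
    frame (a placeholder for a real lexical-tree search)."""
    for frame in audio_frames:
        feature_q.put(frame.mean(axis=0))   # toy feature extraction
    feature_q.put(None)                     # end-of-stream marker
    while True:
        distances = distance_q.get()
        if distances is None:
            break
        results.append(int(np.argmin(distances)))


def second_processor():
    """Calculating circuit: converts each feature vector into a vector of
    distances to every predetermined acoustic state."""
    while True:
        feature = feature_q.get()
        if feature is None:
            distance_q.put(None)
            break
        distance_q.put(np.linalg.norm(acoustic_states - feature, axis=1))


if __name__ == "__main__":
    audio = [np.random.rand(160, FEATURE_DIM) for _ in range(5)]  # five fake frames
    best_states = []
    t1 = threading.Thread(target=first_processor, args=(audio, best_states))
    t2 = threading.Thread(target=second_processor)
    t1.start(); t2.start()
    t1.join(); t2.join()
    print("closest acoustic state per frame:", best_states)

The two threads and the two queues are only meant to make the claimed division of labor and the pipelined hand-off concrete; a system of the sort the claims describe would use dedicated hardware or separate physical processors rather than Python threads.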
`
`I, Christopher Schmandt, declare as follows:
`
I. INTRODUCTION AND QUALIFICATIONS

1. I am over the age of 21 and am competent to make this declaration.
`
A. Educational Background and Professional Experience

2. I retired several years ago after a 40-year career at the Massachusetts Institute of Technology (“MIT”); for most of that time I was employed as a Principal Research Scientist at the Media Laboratory. In that role I also served as faculty for the MIT Media Arts and Sciences academic program. I was a founder of the Media Laboratory, a research lab which now spans two buildings.
`
3. I received my B.S. degree in Electrical Engineering and Computer Science from MIT in 1978, and my M.S. in Visual Studies (Computer Graphics), also from MIT. I was employed at MIT from 1980, initially at the Architecture Machine Group, an early computer graphics and interactive systems research lab. In 1985, I helped found the Media Laboratory and continued to work there until retirement. I was director of a research group titled “Living Mobile.” My research spanned distributed communication and collaborative systems, with an emphasis on multimedia and user interfaces and a strong focus on speech-based systems. I have over 70 published conference and journal papers and one book in the field of speech technology and user interaction.
`
4. For the first fifteen years of my career, my research emphasized speech recognition and speech user interfaces. I built the first conversational computer system utilizing speech recognition and synthesis (“Put That There”) starting in 1980. I continued to innovate speech user interfaces using recognition, text-to-speech synthesis, and recorded audio in a wide variety of projects. I built one of the first graphical user interfaces for audio editing, employing keyword recognition on voice memos, in 1982 (Intelligent Ear). I built the first research-grade unified messaging system, which combined text and voice messages into a single inbox, with speech recognition over the phone for remote access and a graphical user interface for desktop access, in 1983 (Phone Slave). Along with my students, we built the first system for real-time spoken driving directions, including speech-accessible maps of Cambridge, Massachusetts, in 1987 (Back Seat Driver). We built some of the earliest speech-based personal assistants for managing messages, calendars, contacts, etc. (Conversational Desktop 1985, Chatter 1993, MailCall 1996). We built quite a few systems employing speech recognition in handheld mobile devices (ComMotion 1999, Nomadic Radio 2000, Impromptu 2001, and Symphony 2004, for example). We applied speech recognition to large bodies of everyday conversations captured with a wearable device and utilized as a memory aid (Memory Prosthesis 2004). We used speech recognition on radio newscasts to build a personalized version of audio newscasts (Synthetic News Radio, 1999), and a few years earlier had also investigated adding speech recognition to a mouse-based window system.
`
5. I was later awarded the prestigious Association for Computing Machinery (ACM) Computer-Human Interaction (CHI) Academy membership, specifically for those years of work pioneering speech user interfaces.
`
6. In the course of my research, I built a number of speech recognition client/server distributed systems, with the first being in 1985. Much of the initial motivation for a server architecture was that speech recognition required expensive digital signal processing hardware that we could not afford to put on each computer, so a central server with the required hardware was used. Later versions of the speech recognition server architecture allowed certain computers to perform specialized tasks serving a number of client computers providing voice user interfaces, either on screens or over telephone connections.
`
7. Because of my early work with distributed speech systems, I served for several years in the mid-1990s with a working group on the impact of multimedia systems on the Internet, reporting to the Internet Engineering Task Force (IETF) and later the Internet Activities Board (IAB). This work impacted emerging standards such as Session Initiation Protocol (SIP).
`
8. In my faculty position I taught graduate-level courses in speech technology and user interaction design, and directly supervised student research and theses at the Bachelor's, Master's, and PhD levels. I oversaw the Master's and PhD thesis programs for the entire Media Arts and Sciences academic program during my more senior years. I also served on the Media Laboratory intellectual property committee for many years.
`
II. METHODOLOGY: MATERIALS CONSIDERED

9. I have relied upon my education, knowledge, and experience with speech technology and speech recognition systems, as well as the other materials discussed in this declaration, in forming my opinions.
`
10. For this work, I have been asked to review U.S. Patent No. 7,979,277 (“the ’277 Patent”) (Ex. 1001) including the specification and claims, and the ’277 Patent’s prosecution history (“’277 File History”) (Ex. 1002). In developing my opinions relating to the ’277 Patent, I have considered the materials cited or discussed herein, incl