UNITED STATES PATENT AND TRADEMARK OFFICE
____________

BEFORE THE PATENT TRIAL AND APPEAL BOARD
____________

APPLE INC.,
Petitioner

v.

PARUS HOLDINGS, INC.,
Patent Owner
____________

IPR2020-00686
Patent No. 7,076,431

AND

IPR2020-00687
Patent No. 9,451,084
____________

SUPPLEMENTAL DECLARATION OF DR. LOREN TERVEEN
I, Dr. Loren Terveen, hereby declare the following:

I. INTRODUCTION

1. I have been asked to respond to certain issues raised by Patent Owner in Patent Owner’s Response dated December 23, 2020 (“POR”). All of my opinions expressed in my original declaration (Ex. 1003) remain the same. I have reviewed the relevant portions of the POR (Paper 15) and the relevant portions of Mr. Occhiogrosso’s declaration (Ex. 2025) and deposition transcript (Ex. 1039) in connection with preparing this supplemental declaration. References to opinions of the ’431 Patent below are intended as equally applicable to the ’084 Patent.
II. OPINIONS

A. A Two-Step Speech Recognition Process Is Described in Both the ’431 and ’084 Patents and Ladd

2. As I discussed in my original declaration (Ex. 1003) at ¶¶ 81-83, Ladd teaches a system for retrieving information by uttering speech commands into a voice enabled device and for providing information retrieved from an information source, such as “web pages” or “web sites.” Specifically, Ladd’s system is an IVR (Interactive Voice Response) system that may answer a question, such as “what is the weather,” from a web site in response to a spoken user request. Ex. 1003, ¶¶ 78, 81-82, citing Ladd, 2:19-64, 3:7-53, 9:1-21. In an IVR system, including specifically Ladd’s, the computing system must determine the content of at least some of the speech uttered by the user in order to identify desired information for retrieval from an appropriate information source. For example, when a user inquires about the current weather in Chicago, the system must determine that the key words “weather” and “Chicago” were spoken and, by comparison to the grammar, determine the command corresponding to the spoken words, i.e., that the user is commanding the system to retrieve Chicago’s weather. Ladd, 2:48-54, 4:64-5:11, 8:23-25, 10:3-11, 11:50-64, 38:4-16. This is in contrast to Mr. Occhiogrosso’s description of the mere transcribing of free speech that may occur in some systems, where spoken utterances are transformed from audio messages into text and stored in memory, but no content is determined for any transcribed words. Ex. 1039, Occhiogrosso Dep. Tr., 39:10-40:22.
3. In order for an IVR system to act upon user speech, it must perform two distinct steps. In the first step, the speech recognition device simply transforms the sound wave into text. Ex. 1039, 33:11-16, 49:5-19. At this juncture, the speech recognition device has not yet determined any content of what was said, i.e., what instruction is being commanded; it has merely generated a textual data message. Id. For example, a speech recognition device that has performed only this first step may generate the character string “weather” after the word “weather” was spoken, but the device does not yet know what to do in response to the character string “weather.” There are a number of methods by which a system may perform this first step of converting the spoken words into text, but Ladd is not specific on how it requires step one to occur. I note that Mr. Occhiogrosso also agrees there are various speech recognition algorithms to recognize the user’s speech and convert it into text. Ex. 1039, 54:6-16.

4. It is not until the second step of content recognition of the spoken speech that a speech recognition device determines the content of the spoken words (e.g., determining that the user uttered “weather” and is therefore instructing the IVR system to retrieve and respond with the current weather). Mr. Occhiogrosso agreed with this during his deposition in differentiating between the first step of converting speech into text and the second step of using a recognition grammar to “address[] what words are.” Id. at 50:17–51:8. Speech recognition devices that do not determine the content of transcribed words cannot act in response to the spoken words. Id. at 40:13-22 (Mr. Occhiogrosso opining that when the user is “simply speaking and there is no higher order context of a recognition grammar that meters or governs the speech, then the speech recognition engine will dutifully translate what the user is speaking into text” and that “free speech” or “free text” is “effectively a dictation application with no imposed recognition grammar”). Systems such as the ’431 Patent and Ladd must perform both steps to act upon a spoken command to retrieve desired information, namely the steps of (1) converting speech utterances into text words, and (2) comparing the textual words to a grammar to determine the content of the spoken command. As I explain further below, the statement in the ’431 Patent that it “recognizes spoken words without using predefined voice patterns” is characterizing a method of performing the first step of speech recognition (transforming speech to text). In contrast, Ladd’s description of determining a “speech pattern” is characterizing a method of performing the second step of speech recognition (determining the content of the text). ’431 Patent, 4:38-43; Ladd, 9:27-44. I further note this second step is recited in the claims of the ’431 Patent at Limitations 1(f)-1(h), which recite the recognition grammar, that the speech command comprises an information request selectable by the user, and selecting the recognition grammar upon receiving the speech command.
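For illustration, the two steps I describe can be sketched in a few lines of code. This is a minimal sketch of the two-step process only; the grammar contents, function names, and placeholder speech-to-text step are my own hypothetical examples, not an implementation from Ladd or the ’431 Patent.

```python
# Minimal sketch of the two-step IVR speech recognition process.
# All names and grammar contents are hypothetical illustrations.

# Recognition grammar: key words mapped to the commands they invoke.
GRAMMAR = {
    "weather": "retrieve_current_weather",
    "stock": "retrieve_stock_quote",
}

def transcribe(audio: bytes) -> str:
    """Step 1: transform the sound wave into text. Placeholder for any
    speech-to-text method; no content is determined at this step."""
    raise NotImplementedError  # any STT engine could fill this role

def interpret(text: str):
    """Step 2: compare the textual words against the recognition
    grammar to determine the content, i.e., the command spoken."""
    for word in text.lower().split():
        if word in GRAMMAR:
            return GRAMMAR[word]
    return None  # mere dictation: text exists, but no command is recognized
```

After step one alone, the system holds only a character string (e.g., “weather”); only after step two does it know which function to execute.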
5. The ’431 Patent confirms the two-step process. Specifically, the speech recognition engine 300 “converts voice commands received from the user’s voice enabled device 112…into data messages.” ’431 Patent, 6:4-8. “The media server 106 uses the speech recognition engine 300 to interpret the speech commands received from the user. Based upon these commands, the media server 106 retrieves the appropriate web site record 200 from the database 100.” Id. at 16:3-7. Therefore, the ’431 Patent describes a system where the speech commands are converted into data messages, i.e., text, and then the converted speech commands are interpreted to determine what web site record to retrieve.
6. Ladd also confirms its system performs a two-step speech recognition process, stating: “The STT unit 256 of the VRU server 234 receives speech inputs or communications from the user and converts the speech inputs to textual information (i.e., a text message). The textual information can be sent or routed to the communication devices 201, 202, 203 and 204, the content providers 208 and 209, the markup language servers, the voice browser, and the application server 242.” Ladd, 9:11-54, 10:3-20, 38:4-16. Ladd teaches the VRU server 234, which includes the ASR unit 254. As I discussed in ¶¶ 90-91 and 110 of my original Declaration, the ASR unit 254 is a speaker independent speech recognition device, as recited in the claims of the ’431 and ’084 Patents. I further note that Ladd teaches a VRU client 232 that is connected to the VRU server 234. Ladd, 8:3-5. The VRU client is part of the communication node 212, e.g., a mobile phone. See Ex. 1003, ¶ 91, citing Ladd, 7:28-33, Fig. 3; “The VRU client 232 processes speech communications…from the user.” Ladd, 8:5-7. Ladd further teaches the VRU client 232 “routes the speech communications to the VRU server 234.” Ladd, 8:7-9. Ladd teaches “It will be recognized that the VRU client 232 can be integrated with the VRU server.” Ladd, 8:10-11. The VRU client 232 includes voice communications boards that include a voice recognition unit having a vocabulary “for detecting a speech pattern (i.e., a key word or phrase).” Ladd, 8:19-28. Ladd further explains:

    The VRU server 234 receives speech communications from the user via the VRU client 232. The VRU server 234 processes the speech communications and compares the speech communications against a vocabulary or grammar stored in the database server unit 244 or a memory device. The VRU server 234 provides output signals, representing the result of the speech processing, to the LAN 240. The LAN 240 routes the output signal to the call control unit 236, the application server 242, and/or the voice browser 250. The communication node 212 then performs a specific function associated with the output signals.

Ladd, 8:55-67. Ladd then goes on to discuss the VRU server 234 including various components, including the ASR unit 254. Ladd, 9:1-3. Ladd specifically discloses the ASR unit determines whether a speech pattern matches any stored grammar or vocabulary. Ladd, 9:32-36. As I also discuss in ¶ 8 below, Ladd teaches using various grammars to “interpret the user’s response,” substantially similar to the ’431 Patent. Ladd, 19:24-26.
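The routing Ladd describes at 8:55-67 can be summarized schematically. The sketch below is my own illustration of that data flow under the reading set out above; the function names, the grammar stand-in, and the signal strings are hypothetical, not code from Ladd.

```python
# Schematic sketch (hypothetical names) of the flow at Ladd, 8:55-67:
# the VRU client routes speech to the VRU server, which compares it
# against a stored vocabulary/grammar; an output signal then drives a
# specific function at the communication node.

GRAMMAR_DB = {"weather": "weather_lookup_signal"}  # stand-in for database server unit 244

def vru_client(speech: bytes) -> bytes:
    """VRU client 232: receives the user's speech communications and
    routes them to the VRU server."""
    return speech

def vru_server(speech: bytes, transcribe) -> str | None:
    """VRU server 234 / ASR unit 254: processes the speech and compares
    it against the vocabulary or grammar; emits an output signal."""
    text = transcribe(speech)  # speech-to-text (method unspecified in Ladd)
    for key_word, signal in GRAMMAR_DB.items():
        if key_word in text.lower():
            return signal  # routed via the LAN to the appropriate unit
    return None

def communication_node(signal: str) -> None:
    """Communication node 212: performs the specific function
    associated with the output signal."""
    print(f"performing function for: {signal}")
```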
7. Reading the above-discussed disclosures collectively, it is my understanding that the user’s speech communications are received by the VRU client 232 and transmitted to the VRU server 234, which then processes the user’s speech communications. Specifically, the speech communications are compared against a vocabulary or grammar. Based on the comparison, the communication node performs specific functions. Therefore, in my opinion, because the VRU client 232 may be integrated in the VRU server 234, and the VRU server 234 includes the ASR unit 254 (which I discussed at ¶¶ 91-93, 110 of my previous declaration), the comparison of the user’s speech communications to the vocabulary or grammar is performed by the ASR unit.
8. Ladd also teaches a GRAMMAR input “used to specify an input grammar when interpreting the user’s responses.” Ladd, 20:48-58. Prior to interpreting the user’s responses using the grammar input, Ladd teaches collecting input from the user and “convert[ing] the input to text using the speech to text unit” and then sending the text to the markup language server (which proceeds to perform the functions instructed by the user’s spoken words). Ladd, 20:5-10; see also id. at 20:20-21 (“The FORM input makes use of the speech to text unit to convert user input to text.”), 20:23-27 (discussing that if the user said “John Smith,” then the text string “john smith” would be sent to the server). Thus, the user’s responses are interpreted using input grammars for various categories described throughout Ladd, including, for example, a DATE input grammar for interpreting dates (Ladd, 19:22-26) and a MONEY input grammar for interpreting a user’s response related to the input of money (Ladd, 21:61-64). In each example, the user’s speech is transformed to text, and the system then determines how to interpret the input (i.e., selecting a recognition grammar). Ladd, 19:22-26, 21:61-64. Therefore, Ladd teaches, in my opinion, performing a first step of converting the speech input into text. After the speech input is converted into text, the Ladd system interprets the user commands by identifying key words. These key words are, per Ladd, speech patterns.
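By way of illustration only, the following sketch shows how transcribed text might be interpreted against a selected input grammar in the manner Ladd describes for its DATE and MONEY inputs. The grammar patterns and function name are my own hypothetical examples, not Ladd’s implementation.

```python
import re

# Hypothetical input grammars, loosely modeled on Ladd's DATE and MONEY
# input types: each grammar tells the system how to interpret the text
# produced by the speech-to-text step. Patterns are illustrative only.
INPUT_GRAMMARS = {
    "DATE": re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{2,4})\b"),
    "MONEY": re.compile(r"\b(\d+)\s*(dollars|pesos)\b"),
}

def interpret_with_grammar(text: str, grammar_name: str):
    """Interpret transcribed text using the input grammar specified by
    the markup (cf. Ladd's GRAMMAR input); return the matched content,
    or None if the utterance does not fit the selected grammar."""
    match = INPUT_GRAMMARS[grammar_name].search(text.lower())
    return match.groups() if match else None

# e.g., interpret_with_grammar("deposit 50 dollars", "MONEY")
# returns ("50", "dollars")
```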
9. I also note Ladd describes using a commercially-available product from a company called Nuance to transform speech into text. This is the same commercially-available product from Nuance that the ’431 and ’084 Patents describe, further indicating to me that Ladd teaches a speaker independent speech recognition device substantially similar to the speaker independent speech recognition device described and claimed in the ’431 Patent. Ex. 1001, 6:4-24; Ladd, 8:23-28.
B. Ladd Equates a “Grammar” with a “Vocabulary”

10. Ladd repeatedly equates a grammar and a vocabulary, using the terms interchangeably. Ladd, 4:22-25, 6:25-29, 9:32-35, 10:12-14. I also note Mr. Occhiogrosso agreed on the equivalence relationship between a grammar and a vocabulary. Ex. 1039, 17:21-18:3, 19:8-12. Thus, a PHOSITA would have recognized that Ladd’s meaning of “grammar” generally equates with its meaning of “vocabulary.” Notably, Mr. Occhiogrosso also stated that grammars don’t “have anything to do” with [predefined] voice patterns, expressly stating that they are not correlated. Id. at 31:9-11, 32:24–33:1.
C. Ladd Defines a Speech/Voice Pattern as a Key Word or Key Phrase

11. As I discussed in my previous declaration, Ladd expressly identifies its system as providing “speaker independent automatic speech recognition of speech inputs,” and processing the speech inputs “to determine whether a word or speech pattern matches any of the [stored] grammars or vocabulary.” Ex. 1003, ¶ 90, citing Ladd, 9:28-44, 8:19-28. Specifically, Ladd states that its system “may include a voice recognition system engine having a vocabulary for detecting a speech pattern (i.e. a key word or phrase).” Ladd, 8:23-25. Thus, when the user utters a spoken command, the two steps of speech recognition discussed above are performed. First, the sound of the spoken command is transformed into text. Ladd refers to words spoken by a user as “speech communications” or “speech inputs.” Ladd, 8:58-61, 9:28-30. The textual version of the words is then compared to a “vocabulary” or “grammar” to identify key words or key phrases invoking functions in the system. Ladd, 9:28-38, 10:3-20. Specifically, Ladd states that “the ASR unit 254 sends an output signal to implement the specific function associated with the recognized voice pattern.” Ladd, 9:35-38. The speech/voice pattern, i.e., the key word or phrase corresponding to the spoken command, is only recognized in the second step of the process, after the speech inputs have been transformed into text.
12. In my opinion, Ladd provides an express definition of a speech or voice pattern as a key word or phrase. As discussed in my previous declaration, Ladd states “…a speech pattern (i.e. a key word or phrase).” Ladd, 8:23-25; Ex. 1003, ¶¶ 106-107, 111-112. Here the “i.e.” means “in other words” or “that is,” which I understand to mean the key word and key phrase are being presented as other words for, or a definition of, a “speech pattern.”
13. It is also my opinion a PHOSITA would have understood Ladd’s teaching of “…a speech pattern (i.e. a key word or phrase)” (Ladd, 8:23-25) means the “key” is modifying both “word” and “phrase,” meaning Ladd is searching for a key word or key phrase.
14. My opinion that Ladd provides an express definition of speech/voice pattern as a key word or phrase is consistent with multiple other teachings in Ladd, including in the context of the overall sentence at 8:23-25: “The voice communication boards may include a voice recognition engine having a vocabulary for detecting a speech pattern (i.e., a key word or phrase).” Here, Ladd is stating that the voice recognition engine (1) has a vocabulary, (2) the vocabulary is used to detect a speech pattern, and (3) the speech pattern is a key word or phrase. Shortly after this teaching in Ladd, Ladd further explains that the speech inputs are compared against a “vocabulary” or “grammar” to detect key words or phrases. Ladd, 8:58-61, 10:3-20. Matching of the speech patterns to the grammar or vocabulary is also discussed at 9:28-44.
15. The understanding that Ladd’s speech/voice patterns are key words or phrases is further confirmed in the discussion at 9:28-44 (which I discussed in my original declaration, Ex. 1003, ¶¶ 105-108):

    The ASR unit 254 of the VRU server 234 provides speaker independent automatic speech recognition of speech inputs or communications from the user. It is contemplated that the ASR unit 254 can include speaker dependent speech recognition. The ASR unit 254 processes the speech inputs from the user to determine whether a word or a speech pattern matches any of the grammars or vocabulary stored in the database server unit 244 or downloaded from the voice browser. When the ASR unit 254 identifies a selected speech pattern of the speech inputs, the ASR unit 254 sends an output signal to implement the specific function associated with the recognized voice pattern. The ASR unit 254 is preferably a speaker independent speech recognition software package, Model No. RecServer, available from Nuance Communications. It is contemplated that the ASR unit 254 can be any suitable speech recognition unit to detect voice communications from a user.

Here, Ladd is explaining that the automatic speech recognition unit (ASR unit) processes the speech inputs to determine whether a speech pattern matches any stored grammar or vocabulary. If there is a match, i.e., the user spoke a speech pattern (that is, a key word or phrase) matching a grammar/vocabulary, then the ASR unit outputs a signal associated with the word selected by the user, i.e., the key word or phrase spoken by the user. Ladd provides several examples of the user speaking a key word or phrase to thereby make a selection. For instance, Ladd describes a process by which its IVR system may conduct a dialog with a user concerning selecting a desired soda. Ladd, 17:1-27, 23:40-44, Claim 8. The user’s spoken words are matched against a set of key words for sodas including Coke, Pepsi, 7Up, and root beer. Ladd, 17:1-35. Depending on the key word detected, a next step in the browser is selected by the system. Ladd, 23:40-44, Claim 8. Thus, the key word or phrase matching the grammar or vocabulary (i.e., step 2) causes the ASR unit to “send an output signal to implement the specific function associated with the recognized voice pattern,” determining the system’s response. Ladd, 9:36-39. Consequently, by speaking a key word or phrase, e.g., by saying “Coke,” the user selects a speech pattern (the key word “Coke”), and the ASR identifies the selected speech pattern (the key word “Coke”) by matching the speech inputs to the grammar/vocabulary. The “selected speech pattern” discussed at Ladd, 9:36-38 is the speech pattern, i.e., the key word or phrase, selected to be spoken by the user.
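The soda example reduces to simple keyword-driven branching, which the following sketch illustrates. This is my own hypothetical rendering of the logic I describe; the step names are not taken from Ladd.

```python
# Illustrative sketch of keyword-driven dialog branching in the manner
# of Ladd's soda example; the step names are hypothetical, not Ladd's.
SODA_GRAMMAR = {"coke", "pepsi", "7up", "root beer"}

def next_dialog_step(transcribed_text: str) -> str:
    """Match the transcribed utterance against the soda key words and
    select the next step in the dialog accordingly."""
    text = transcribed_text.lower()
    for key_word in SODA_GRAMMAR:
        if key_word in text:
            return f"confirm_order:{key_word}"  # function tied to the key word
    return "reprompt_user"  # no speech pattern matched the grammar/vocabulary
```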
16. Reading at least these collective disclosures (discussed in the above two paragraphs) together, a PHOSITA would reasonably understand the user’s speech inputs that are converted to text are then compared to a vocabulary/grammar to detect a speech pattern, where the speech pattern is a key word or phrase.

17. I also note Ladd uses the phrases “speech pattern” and “voice pattern” elsewhere. Ladd, 4:15-18, 6:50-57. In my opinion, each of these discussions is consistent with my opinion that Ladd uses the phrase speech/voice pattern to mean a key word or phrase. In the discussion at 4:15-18, Ladd is discussing speaker-dependent speech recognition where the speaker is identified by “detecting a unique speech pattern”: “The system may also identify the user by detecting a unique speech pattern from the user (i.e., speaker verification) or a PIN entered using voice commands or DTMF tones.” Ladd includes a similar teaching at 6:50-57:

    When the user accesses the electronic network 206 from a communication device not registered with the system (i.e., a payphone, a phone of a non-subscriber, etc.), the node answers the call and prompts the user to enter his or her name and/or a personal identification number (PIN) using speech commands or DTMF tones. The node can also utilize speaker verification to identify a particular speech pattern of the user.

Ladd, 6:50-57. In my opinion, these disclosures (4:15-18, 6:50-57) discussing a speech pattern are referring to the user uttering a unique key word or phrase to identify the user to the device. For example, the sentence at 4:15-18 is discussing that the system may identify the user and provides, as examples for identification, the unique speech pattern from the user or a PIN. A PIN is commonly understood as a unique identifier. Similarly, a unique key word or phrase, i.e., the disclosed “unique speech pattern,” would also have been understood to identify the user, akin to the user saying a password. Therefore, in my opinion, Ladd’s disclosure at 4:15-18 is describing a circumstance where the user can identify himself or herself to the system by saying a unique key word or phrase or by saying a PIN.
18. I note Ladd, 4:15-18 is an example of speaker-dependent speech recognition, but only insofar as the user is speaking a unique key word or phrase, i.e., the unique speech pattern, to identify the user to the device. That is, there is nothing in Ladd that indicates the user’s unique voice attributes, akin to voice printing, are being identified in Ladd. The user’s spoken words are recognized and converted into text, but it is the content recognition of determining the user spoke a unique speech pattern that actually identifies the user to the device. This method of user identification makes sense within the context of Ladd, which is intended for use by users from any network-enabled device. Ladd, 2:40-47.
19. Similarly, the discussion at 6:50-57 is stating the user can identify himself or herself to the system by entering a PIN. The following sentence is “[t]he node can also use speaker verification” to identify a “particular speech pattern” of the user. Similar to the disclosure at 4:15-18, I understand this section of Ladd to be explaining that the user can verify his/her identity to the system based on a particular speech pattern, i.e., a particular key word or phrase the user speaks to the system.
20. Ladd also uses the term speech/voice pattern at 6:29-34, which states:

    The node 212 can provide various dialog voice personalities (i.e., a female voice, a male voice, etc.) and can implement various grammars (i.e., vocabulary) to detect and respond to the audio inputs from the user. In addition, the communication node can automatically select various speech recognition models (i.e., an English model, a Spanish model, an English accent model, etc.) based upon a user profile, the user’s communication device, and/or the user’s speech patterns. The communication node 212 can also allow the user to select a particular speech recognition model.

Ladd, 6:29-34. Here, Ladd is discussing speech recognition models that recognize the English language (the “English model”), the Spanish language (the “Spanish model”), or users who speak with an English accent (the “English accent model”). Ladd discloses the speech recognition model could be selected based on the user profile. In my opinion, selection of the speech recognition model based on the user profile may occur in instances where the user previously indicated their native language is English or Spanish or where they indicated they were born in or lived in a country that speaks with an English accent (e.g., the UK). Ladd also discloses the speech recognition model could be selected based on the user’s communication device, which indicates to me a geographic region in which the user is located. Finally, Ladd discloses the speech recognition model could be based on the user’s speech patterns. Because we know from Ladd, 8:23-25 that the speech patterns are key words or phrases, I understand this disclosure at Ladd, 6:29-34 to be stating that, based on key words or phrases the user speaks, a certain speech recognition model is selected. Although not detailed in Ladd, an example that would reasonably be contemplated in view of Ladd’s disclosure is the user speaking the particular word “pesos” instead of “dollars” in response to a prompt of “How much would you like to deposit?” See Ladd, 22:4-19 (describing a MONEY input where the user is depositing money via the IVR). In this instance, if the user says “pesos,” which is recognized as the currency of Mexico, then the system would identify the speech pattern of “peso” as a key word indicating the Spanish model with a Spanish speech recognition engine should be employed with the user.
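A minimal sketch of this kind of model selection follows, under my reading of Ladd, 6:29-34. The selection criteria, key words, and model names are my own hypothetical illustrations of the logic, not Ladd’s implementation.

```python
# Hypothetical sketch of selecting a speech recognition model based on
# a user profile, the user's device, or detected key words (speech
# patterns), per my reading of Ladd, 6:29-34. All names are illustrative.
KEYWORD_TO_MODEL = {"pesos": "spanish_model", "pram": "english_accent_model"}

def select_model(profile_language: str | None,
                 device_region: str | None,
                 transcribed_text: str) -> str:
    # 1. The user profile governs if the user declared a language.
    if profile_language == "spanish":
        return "spanish_model"
    # 2. Otherwise infer from the communication device's region.
    if device_region == "uk":
        return "english_accent_model"
    # 3. Otherwise check the user's speech patterns (key words).
    for key_word, model in KEYWORD_TO_MODEL.items():
        if key_word in transcribed_text.lower():
            return model
    return "english_model"  # default model
```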
21. My opinion is further confirmed by Ladd’s disclosure at 4:20-36, which describes selecting a grammar and a personality based on various factors, including the accent of the caller. Selection of a grammar based on the accent of the caller indicates to me the Ladd IVR system is advanced enough to recognize an accent and select a particular grammar based on the accent. Thus, one example in Ladd is an English accent. British-English speakers often use different terms than American-English speakers to identify the same thing, such as “pram” in Britain versus “stroller” in America. Therefore, should the user be recognized as having an English accent, I understand Ladd’s disclosure at 4:20-36 as selecting a grammar based on the English accent. For example, the selected grammar may then recognize “pram” as the equivalent of a stroller if spoken by the user.
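As an illustration of what such accent-based grammar selection could look like, consider the following sketch. The synonym table and accent labels are hypothetical, offered only to make concrete the logic I describe; they are not disclosed in Ladd.

```python
# Hypothetical illustration of accent-based grammar selection: when the
# caller is recognized as having an English (British) accent, a grammar
# mapping British terms onto American equivalents is selected.
ACCENT_GRAMMARS = {
    "british": {"pram": "stroller", "lorry": "truck"},
    "american": {},  # no substitutions needed
}

def normalize(text: str, accent: str) -> str:
    """Apply the grammar selected for the detected accent so that,
    e.g., 'pram' is recognized as the equivalent of 'stroller'."""
    grammar = ACCENT_GRAMMARS.get(accent, {})
    return " ".join(grammar.get(word, word) for word in text.lower().split())
```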
22. In sum, each of the disclosures in Ladd that reference a speech/voice pattern informs me that Ladd consistently uses the term to describe a key word or phrase, where recognition of the spoken word as a key word or phrase is determined by matching the voice pattern against the grammar or vocabulary. Additionally, in each instance Ladd uses the phrase “speech pattern” or “voice pattern,” a PHOSITA would have understood Ladd’s use of a “speech pattern” or “voice pattern” is not the same as the “predefined voice pattern” that the ’431 and ’084 Patents purportedly exclude. Ex. 1001, 4:30-43. Instead, Ladd’s “speech patterns” are key words or phrases determined using a grammar (or vocabulary), which is not only allowed by the ’431 and ’084 Patents but is explicitly performed in at least one example described by the patents. See, e.g., Ex. 1001, 6:44-56. This conclusion regarding the distinction between the “predefined voice pattern” excluded by the ’431 and ’084 Patents and the “speech pattern” of Ladd would have been recognized by a PHOSITA for any of the reasons I discuss here, each supported by evidence in the ’431 and ’084 Patents, Ladd, and opinions from myself and/or Mr. Occhiogrosso.
D. Ladd Does Not Include Any Disclosure Indicating the Disclosed Speech/Voice Patterns Are Spectral Energy as a Function of Time

23. Parus’s construction of a “speaker-independent speech recognition device” requires a “speech recognition device that recognizes spoken words without using predefined voice patterns.” Paper 15, 21-24. Mr. Occhiogrosso explained in his deposition that it is Parus’s position that the ’431 Patent’s meaning of these “predefined voice patterns” being excluded is “a word or utterance, and its spectral energy—typically—spectral energy as a function of time.” Ex. 1039, Dep. Tr., 25:12-17, 25:22–26:13, 30:10-16.
24. In my opinion, no description of spectral energy (as a function of time or otherwise) appears explicitly in Ladd, nor would a PHOSITA have understood such processing to be performed by Ladd implicitly or inherently. None of the citations I discussed above where Ladd uses the phrase “speech pattern” or “voice pattern” indicate to me that Ladd is using the phrase to mean spectral energy over time. Moreover, I went through each use of the phrase in Ladd above and provided an explanation for why the phrase is consistently used in Ladd to mean a key word or phrase. Additionally, there is no disclosure in Ladd that would, in my opinion, teach or suggest to a PHOSITA that Ladd even converts speech to text using the spectral energy of the speech input as a function of time. Ladd does not detail how the speech is converted into text and instead just states that the user’s speech inputs are converted into text. See, e.g., Ladd, 9:45-54. This is understandable within the context of Ladd, where the discussion focuses on Ladd’s advanced IVR system and not mere speech recognition of converting speech into text, which was well-known at the time of Ladd and discussed in the Background of my original Declaration.
25. Returning to the previous discussion of the two steps of speech and command recognition, the difference between the ’431 and ’084 Patents’ “predefined voice patterns” and Ladd’s “speech patterns” is, in my opinion, evident: spectral energy as a function of time is one method by which a system might transform audible sound into text (and is thus one method of performing the first step of speech recognition), while Ladd’s “speech pattern” merely refers to detection of key words to determine content (a method of performing the second step of recognizing a command).
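For technical context, “spectral energy as a function of time” is the kind of representation a spectrogram captures, commonly computed with a short-time Fourier transform over successive windows of the audio. The sketch below is my own illustration of that computation, offered only to make the excluded concept concrete; it is not something disclosed in Ladd or the ’431 and ’084 Patents.

```python
import numpy as np

# Technical illustration only (not disclosed in Ladd or the patents):
# spectral energy as a function of time, computed with a short-time
# Fourier transform (STFT) over successive windows of an audio signal.
def spectral_energy_over_time(samples: np.ndarray,
                              frame_len: int = 512,
                              hop: int = 256) -> np.ndarray:
    """Return a (num_frames, frame_len // 2 + 1) array in which each
    row is the spectral energy of one windowed frame of the signal."""
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len] * window
        spectrum = np.fft.rfft(frame)
        frames.append(np.abs(spectrum) ** 2)  # energy per frequency bin
    return np.array(frames)
```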
E. Sequential Access of Websites

26. Mr. Occhiogrosso opines that “sequential access of websites is very different from sequential access of a database.” Ex. 2025, 94-95. This is incorrect, as I’ve discussed in my previous declaration. Ex. 1003, ¶¶ 103-104, 122. Both a website and a database electronically store information for access via a network using network addresses. See, e.g., Ex. 1004, 11:50-63; Ex. 1006, 9:33-44. Ladd expressly states that one of its “content sources” may include “a database, scripts, and/or markup language documents or pages,” illustrating to a PHOSITA that databases and web pages are treated similarly in Ladd as sources of content, and that methods applied to databases would have been appropriate for application to website searches as well. Ex. 1004, 11:50-63. The frequency with which a content source, be it a website or database, is updated is irrelevant to the process taught by Ladd in view of Kurosawa and Goedken, in which the algorithm simply continues searching until the information to be retrieved is found. If the information is not found on any particular website, or if that website is unavailable, the algorithm will continue until the information to be retrieved is found.
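The sequential-search logic described above reduces to a simple loop. The sketch below is my own illustration of that logic; the source names and the fetch callback are hypothetical, not code from Ladd, Kurosawa, or Goedken.

```python
# Illustrative sketch of sequential access of content sources (websites
# or databases) until the requested information is found. Source names
# and the fetch function are hypothetical.
CONTENT_SOURCES = [
    "https://weather.example.com",
    "https://backup-weather.example.net",
    "db://local_weather_cache",
]

def retrieve(query: str, fetch):
    """Try each content source in order; skip unavailable sources and
    continue until the information to be retrieved is found."""
    for source in CONTENT_SOURCES:
        try:
            result = fetch(source, query)
        except ConnectionError:
            continue  # source unavailable: move on to the next one
        if result is not None:
            return result  # information found: stop searching
    return None  # all sources exhausted without finding the information
```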