`
`Paper No. ___
`Filed: May 21, 2021
`
`
`
`
`
`
`
`
`
`
`UNITED STATES PATENT AND TRADEMARK OFFICE
`_____________________________
`
`BEFORE THE PATENT TRIAL AND APPEAL BOARD
`
`_____________________________
`
`AMAZON.COM, INC.; AMAZON.COM LLC; AMAZON WEB
`SERVICES, INC.; A2Z DEVELOPMENT CENTER, INC. D/B/A LAB126;
`RAWLES LLC; AMZN MOBILE LLC; AMZN MOBILE 2 LLC;
`AMAZON.COM SERVICES, INC. F/K/A AMAZON FULFILLMENT
`SERVICES, INC.; and AMAZON.COM SERVICES LLC (formerly
`AMAZON DIGITAL SERVICES LLC),
`Petitioners,
`
`v.
`
`VB ASSETS, LLC,
`Patent Owner.
`_____________________________
`
`Case IPR2020-01367
`Patent No. 8,073,681
`_____________________________
`
`
`PATENT OWNER’S RESPONSE
`PURSUANT TO 37 C.F.R. § 42.120
`
`
`
TABLE OF CONTENTS

I.   INTRODUCTION ........................................................................................... 1

II.  PERSON OF ORDINARY SKILL IN THE ART .......................................... 3

III. GROUND 1: CLAIMS 1-2, 9, 13-14, 21, 25-26, AND 33 ARE NOT
     RENDERED OBVIOUS BY KENNEWICK ................................................. 4

     A.  The petition fails to establish that Kennewick renders obvious
         “the utterance includes one or more words that have different
         meanings in different contexts” ............................................................. 4

     B.  The petition fails to establish that Kennewick renders obvious
         “short-term shared knowledge” ............................................................. 7

         1.  Automated Speech Recognition (“ASR”), Natural Language
             Understanding (“NLU”), and User Modeling (“UM”) .................. 8

         2.  Kennewick’s tagged recognized words are not short-term
             shared knowledge ......................................................................... 12

IV.  GROUNDS 2 THROUGH 6: CLAIMS 3-8, 10-12, 15-20, 22-24, 27-
     32, AND 34-36 ARE NOT OBVIOUS ......................................................... 18

V.   CONCLUSION .............................................................................................. 19
`
`
`
I. INTRODUCTION
`Amazon.com, Inc., et al. (“Petitioners”) filed a petition for inter partes
`
`review of claims 1-42 of U.S. Patent No. 8,073,681 (the “’681 patent,” EX1001).
`
`The Board issued its decision instituting trial (“Decision,” Paper 7) on February 4,
`
`2021, finding that Petitioners had established a reasonable likelihood of success on
`
`only claims 37, 39, and 41. VB Assets, LLC (“Patent Owner”) hereby requests
`
`that the Board now issue a final written decision rejecting all grounds of challenge
`
that are not moot and confirming that claims 1-36 are not unpatentable.1
`
`As already recognized by the Board, the petition’s arguments with respect to
`
`the three remaining independent claims 1, 13, and 25 are flawed at least because
`
`the petition fails to identify prior art disclosure of interpreting one or more words
`
`that have “different meanings in different contexts.” Petitioners rely on two
`
`disclosures in the Kennewick application. In the first, Kennewick teaches that the
`
word “temperature” within different dialogs can indicate different contexts—e.g.,

the context of weather or the context of a measurement. Kennewick never says

there is an ambiguity in the word “temperature” within a single utterance.


1 Patent Owner disclaimed claims 37-42 by filing a Statutory Disclaimer under

37 C.F.R. § 1.321(a) on May 17, 2021, a copy of which is submitted as Exhibit

2005. This response addresses the grounds of unpatentability not rendered moot

by Patent Owner’s disclaimer.

Moreover, as the
`
`Board already recognized, the word “temperature” has the same meaning whether
`
`referring to body temperature or outdoor temperature—the “measurement of
`
`warmth or coolness.” Decision, 14. Similarly, as the Board also understood in the
`
`institution decision, Petitioners’ reliance on Kennewick’s discussion of the term
`
`“flight one hundred and twenty t[w]o” is misplaced because that term is only
`
`disclosed in a single context. Id., 14-15. This shortcoming is fatal to Petitioners’
`
`challenge to all claims remaining in the ’681 patent.
`
`The Board need look no further than the deficiency already recognized in the
`
`institution decision to reject Petitioners’ challenge at this stage as well.
`
`Nevertheless, the petition suffers from an additional significant flaw. Petitioners
`
`argue that Kennewick’s tagged recognized words constitute short-term shared
`
`knowledge. Yet this collection of recognized words tagged with the user’s identity
`
`has not been interpreted by the system. This is fatal to Petitioners’ case because
`
the ’681 patent discloses that short-term shared knowledge is accumulated by using
`
`recognized words as input and then interpreting those words to extract shared
`
expectations and assumptions in order to determine a user’s intent. Because mere

recognized words impart no meaning to the utterances, they cannot constitute

short-term shared knowledge—a limitation that Petitioners’ own
`
`expert, Dr. Smyth, acknowledged in both his declaration and at his deposition.
`
`2
`
`
`
`EX1002, ¶42 (Recognized words alone are “not sufficient for an algorithm to
`
`understand the user’s goals and intent.”); EX2003, 88:16-24 (“Q: Does
`
`Kennewick’s dialog history keep track of the user’s goals and preferences? A: In
`
`the sense that the dialog history is a tagged set of recognized words and phrases
`
`that the user spoke earlier in the conversation, then, no.”).
`
`Accordingly, the Board should reject the grounds of challenge not mooted
`
`by Patent Owner’s disclaimer and, consistent with the preliminary finding in the
`
`institution decision, conclude that claims 1-36 have not been shown to be
`
`unpatentable.
`
II. PERSON OF ORDINARY SKILL IN THE ART
`Petitioners define the level of a person of ordinary skill in the art (“POSA”)
`
`at the time of the invention of the ’681 patent as someone who “would have at least
`
`a Bachelor-level degree in computer science, computer engineering, electrical
`
`engineering, or a related field in computing technology, and two years of
`
`experience with automatic speech recognition and natural language understanding,
`
`or equivalent education, research experience, or knowledge.” Pet., 4-5; EX1002,
`
`¶30. Patent Owner and its expert, Dr. Gershman, have also applied the same
`
definition in this response. EX2001, ¶¶19-20.
`
`3
`
`
`
`III. GROUND 1: CLAIMS 1-2, 9, 13-14, 21, 25-26, AND 33 ARE NOT
`RENDERED OBVIOUS BY KENNEWICK
`The petition fails to establish that claims 1-2, 9, 13-14, 21, 25-26, and 33 are
`
`rendered obvious by Kennewick alone or in combination with the knowledge of a
`
`POSA. EX2001, ¶¶21-54. First, as the Board has already recognized, the petition
`
`fails to establish that Kennewick discloses an utterance containing one or more
`
`words that have different meanings in different contexts. EX2001, ¶¶22-31.
`
`Second, the petition fails to establish that Kennewick discloses short-term shared
`
`knowledge. EX2001, ¶¶32-54.
`
`A. The petition fails to establish that Kennewick renders obvious
`“the utterance includes one or more words that have different
`meanings in different contexts”
`The petition fails to show that Kennewick discloses limitation 1.a. of the
`
`’681 patent, which recites “receiving an utterance at a voice input device during a
`
`current conversation with a user, wherein the utterance includes one or more words
`
`that have different meanings in different contexts.” EX2001, ¶22. This limitation
`
`recites a specific type of ambiguity where “one or more words … have different
`
`meanings in different contexts.” EX2001, ¶23. However, as noted in the
`
`institution decision, the petition cites to no disclosure in Kennewick describing this
`
`specific type of ambiguity. Id.; Decision, 14. Instead, Petitioners cite a
`
`“temperature” example that does not have two different meanings, and a “flight”
`
`4
`
`
`
`example that relates to determining which of two different words that sound the
`
`same is the correct word in a single context. EX2001, ¶¶24-30.
`
`The petition first relies on Kennewick’s teachings that an utterance
`
`containing “temperature” can “impl[y] a context value of weather for the question”
`
`and that “within a different dialog, the keyword ‘temperature’ can imply a context
`
`for a measurement.” Pet., 16 (quoting EX1003, ¶160); EX2001, ¶24. At the
`
outset, this disclosure in Kennewick does not concern a single utterance
`
`containing one or more words that have different meanings in different contexts.
`
`EX2001, ¶25. Instead, Kennewick explicitly says that the keyword “temperature”
`
`can imply a different context within different dialogs. EX1003, ¶160. Kennewick
`
`never says it recognizes different meanings for the word “temperature,” and even
`
`when it says “temperature” can indicate different contexts, it explicitly says that is
`
`for “different dialog[s]”—that is, expressly not a single utterance. Id.; EX2001,
`
`¶25. Thus, the petition’s assertion that Kennewick teaches that “‘temperature’
`
`could be interpreted as an outdoor temperature or as a body temperature” is not
`
`true. Pet., 16; EX2001, ¶25.
`
`Moreover, the petition still fails to establish two different meanings of
`
`“temperature.” EX2001, ¶¶26-27. The petition asserts that “‘temperature’ could
`
`be interpreted as an outdoor temperature or as a body temperature depending on
`
`the context.” Pet., 16; see also EX1002, ¶69. But “outdoor temperature” and
`
`5
`
`
`
`“body temperature” do not involve different meanings of “temperature.” EX2001,
`
`¶26. The additional qualifiers “outdoor” and “body” do not change the meaning of
`
`“temperature,” that is, the degree of hotness or coldness. EX2001, ¶27. Instead,
`
`they simply indicate the object whose temperature is being measured. Id. As the
`
`Board correctly concluded in the institution decision, “the word ‘temperature’ has
`
`the same meaning in both of these examples” because “the word ‘temperature’
`
`refers to a measurement of warmth or coolness.” Decision, 14.
`
`The second example Petitioners rely on relates to the question “what about
`
flight one hundred and twenty too.” Pet., 16-17 (citing EX1003, ¶163); EX2001,
`
`¶28. Kennewick explains: “The parser and domain agent use flight information in
`
`the database and network information along with context to determine the most
`
`plausible interpretation among; flight 100 and flight 20 also, flight 100 and flight
`
`22, flight 122, etc.” EX1003, ¶163. The petition asserts that this teaches one
`
`context “in which ‘flight 122’ exists” and a different context “where the system
`
just listed several flight options including ‘flight 100’ and ‘flight 20.’” Pet., 16-17.
`
`However, Kennewick does not say that there are multiple contexts. EX2001, ¶29.
`
At most, Kennewick describes interpreting “flight one hundred and twenty too” in a
`
`single flight information context. Id. As the Board correctly noted in finding no
`
`reasonable likelihood of unpatentability, “Dr. Smyth does not provide any support
`
`6
`
`
`
`for his assertion that these hypothetical situations would constitute different
`
`contexts.” Decision, 15; EX2001, ¶29.
`
`Notably, this passage of Kennewick describes formulating a question or
`
`command after the context has already been determined and the appropriate agent
`
`has been invoked. EX2001, ¶30. The preceding paragraph indicates “the context
`
`for the question or command has been determined” (EX1003, ¶162 (emphasis
`
`added)) and the following disclosure relates to formulating a question or command
`
`for processing by the agent (EX1003, ¶¶162-63). EX2001, ¶30. In other words,
`
`Kennewick teaches that it disambiguates the phrase “flight one hundred and twenty
`
`too” after a single context has already been established. Thus, to the extent there
`
`are multiple meanings of the phrase “flight one hundred and twenty too,” they are
`
`multiple meanings within the same context. Id. Accordingly, Petitioners have
`
`failed to show that “flight one hundred and twenty too” has different meanings in
`
`different contexts. Id.
`
`Thus, the petition fails to establish that Kennewick discloses this element or
`
`renders it obvious. EX2001, ¶31.
`
B. The petition fails to establish that Kennewick renders obvious
“short-term shared knowledge”
`While, as noted in the institution decision, Petitioners’ failure to demonstrate
`
`Kennewick discloses the “different meanings in different contexts” limitation is
`
`enough on its own to reject the ground of challenge, the petition also falls short in
`
`7
`
`
`
`another respect. Petitioners’ theory that Kennewick’s tagged recognized words are
`
`“short-term shared knowledge [that] includes knowledge about the utterance
`
`received during the current conversation” is flawed because both experts agree that
`
`mere recognized words alone are not sufficient for a system to understand a user’s
`
`preferences, goals, or intent. EX2001, ¶¶32-54.
`
`In order to understand the flaws in the petition’s case, some background is
`
`provided below. EX2001, ¶33. As acknowledged in Dr. Smyth’s declaration,
`
`natural language conversational systems typically have multiple, distinct
`
`components, including an automatic speech recognition (ASR) component, a
`
`natural language understanding (NLU) component, and a user modeling (UM)
`
`component. EX1002, ¶¶39-42; see also EX2001, ¶¶34-40. Because Kennewick
`
`discloses that its tagged recognized words are generated by an ASR component,
`
`and because the ’681 patent discloses that short-term shared knowledge is
`
`generated by an NLU component, it is important to understand the differences
`
`between ASR and NLU. EX2001, ¶33. With that understanding in mind, it
`
`becomes evident that Kennewick’s tagged recognized words fail to meet the
`
`recited short-term shared knowledge. Id.
`
`1.
`
`Automated speech recognition (“ASR”), Natural Language
`Understanding (“NLU”), and User Modeling (“UM”)
`To understand the flaws with the petition’s mapping of the claim elements, it
`
`is important to understand the key components of natural language conversational
`
`8
`
`
`
`systems at the time of the ’681 patent. EX2001, ¶34. As explained below, and as
`
`both experts agree, these components include automatic speech recognition
`
`(“ASR”) (which converts a user’s utterance into recognized words), natural
`
`language understanding (“NLU”) (which interprets the recognized words), and
`
`user modeling (“UM”) (which uses the interpreted words to create representations
`
`of knowledge about the user). Id.
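
For illustration only, this division of labor can be sketched in code. The following minimal Python sketch uses hypothetical function names and canned outputs; it is not drawn from Kennewick, the ’681 patent, or any exhibit, and is offered solely to make the component boundaries concrete:

# Illustrative sketch only (hypothetical names and canned outputs).
# ASR yields bare words; NLU assigns meaning to those words; UM folds
# the NLU's output into a representation of the user.

def asr(audio: bytes) -> list[str]:
    """ASR: map an acoustic waveform to recognized words (no meaning)."""
    return ["i", "would", "like", "to", "fly", "to", "new", "york"]  # canned

def nlu(words: list[str]) -> dict:
    """NLU: interpret the recognized words to extract intent and entities."""
    meaning = {}
    if "fly" in words:
        meaning["intent"] = "find_flight"
    if "new" in words and "york" in words:
        meaning["destination"] = "New York"
    return meaning

def update_user_model(model: dict, meaning: dict) -> dict:
    """UM: fold NLU-extracted knowledge into a representation of the user."""
    if "destination" in meaning:
        model.setdefault("destinations_of_interest", []).append(
            meaning["destination"])
    return model

words = asr(b"...")    # words only, no meaning
meaning = nlu(words)   # e.g., {"intent": "find_flight", "destination": "New York"}
model = update_user_model({}, meaning)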
`
`Petitioners’ expert Dr. Smyth devotes several pages to describing how
`
`automated speech recognition (ASR) and natural language understanding (NLU)
`
`are distinct components. EX1002, ¶¶39-42; EX2001, ¶¶34-38. According to Dr.
`
`Smyth, “ASR involves recognizing words from spoken audio for each user
`
`utterance” whereas “NLU is the process of determining the meaning of the
`
`recognized words in the utterance.” EX1002, ¶39. Since “recognized words alone
`
`are typically not sufficient for an algorithm to understand the user’s goals and
`
`intent,” it is up to the NLU component to “interpret the meaning of the recognized
`
`words.” Id., ¶42; see also EX2003, 18:11-13 (“If a speech recognition engine
`
`produces recognized words, then NLU can be used to try to understand what those
`
`words mean.”). As Dr. Smyth testified, ASR can “recognize the word Portland”
`
`but is not capable of determining the meaning of the word—it is “the role of the
`
`NLU component” to derive meaning. EX2003, 32:7-14; see also 34:9-12 (“ASR
`
`broadly refers to technologies that take acoustic waveforms and map them to
`
`9
`
`
`
`words, and NLU broadly refers to technologies that try to infer meaning from those
`
`words.”).
`
Dr. Smyth also testifies that the NLU component extracts knowledge from
`
`the interpreted words. EX2001, ¶37. For example, Dr. Smyth notes that the NLU
`
component interprets the phrase “I would like to fly to New York” to extract

knowledge such as the request being one “to find a flight” and “that the flight is for the
`
`individual who is speaking.” EX1002, ¶42. Furthermore, Dr. Smyth notes that the
`
`NLU component “would need to understand that ‘New York’ is the name of a
`
`location, and that it is not the name of specific airport but is a city with multiple
`
`airports associated with it.” Id. (emphasis added). Dr. Smyth makes clear that it is
`
`“NLU technologies” which “handle such situations,” and not the ASR component.
`
`Id.; EX2001, ¶37.
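
The kind of structured knowledge Dr. Smyth describes can be pictured as follows. This is an illustrative Python sketch only; the field names, values, and airport codes are hypothetical and are not drawn from any exhibit:

# Illustrative only: one possible shape for the knowledge Dr. Smyth says
# the NLU component must extract from "I would like to fly to New York"
# (EX1002, ¶42). Field names and airport codes are hypothetical.
from dataclasses import dataclass, field

@dataclass
class FlightRequest:
    intent: str                  # the request is one "to find a flight"
    traveler: str                # the flight is for the individual speaking
    destination: str             # "New York" is the name of a location
    destination_kind: str        # a city, not a specific airport
    candidate_airports: list[str] = field(default_factory=list)

meaning = FlightRequest(
    intent="find_flight",
    traveler="speaker",
    destination="New York",
    destination_kind="city",
    candidate_airports=["JFK", "LGA", "EWR"],  # a city with multiple airports
)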
`
The distinction between ASR and NLU is further underscored by the fact that they
`
`are often used separately. EX2001, ¶¶35, 38. For example, ASR components can
`
`be used on their own for speech dictation applications such as assisting disabled
`
`persons in transcribing messages. Id., ¶38. Similarly, NLU systems are used to
`
`interpret written text, in which case an ASR component is not necessary. Id.; see
`
`also EX2003, 19:3-4 (“It’s fair to say NLU can be used outside the scope of speech
`
`recognition.”); 26:12-15 (“[I]f you’re focusing on an NLU research project, it
`
`doesn’t make sense to design an ASR system from scratch.”).
`
`10
`
`
`
`Another component that Petitioners’ expert Dr. Smyth discusses is the User
`
`Modeling (UM) component, which allows a system to have “access to specific
`
`knowledge about a user (such as their travel preferences)” by UM methods that
`
`“build an internal representation of a user.” EX1002, ¶44; see also EX2003,
`
`40:23-24 (“[A] user model is … some representation of information or knowledge
`
`about a user.”); EX2001, ¶39. As Dr. Smyth acknowledges, the user model can be
`
`built from previous user utterances such as “a user’s travel preferences from words
`
`in past conversations that a user has had with the system, or inferences from words
`
in earlier utterances in the current conversation.” EX1002, ¶44. Representing

knowledge about a user extracted from an utterance requires extracting meaning

from the user’s utterances, a function performed by the NLU component, not the

ASR component. EX2001, ¶39; see EX1002, ¶42
`
`(“[R]ecognized words alone are typically not sufficient for an algorithm to
`
`understand the user’s goals and intent.”). Once the knowledge about the user’s
`
`travel preferences is extracted by the NLU, the UM component can use it to build a
`
`user model. EX2001, ¶39.
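
Again purely for illustration, and with hypothetical names not drawn from any exhibit, the dependency Dr. Smyth describes (UM consuming NLU output rather than raw ASR output) might be sketched as:

# Illustrative only (hypothetical structures). A user model is built from
# knowledge the NLU has already extracted; bare recognized words such as
# ["i", "prefer", "aisle", "seats"] carry no structure to incorporate.

def incorporate(model: dict, nlu_output: dict) -> dict:
    """UM step: record NLU-extracted preferences in the user model."""
    for pref, value in nlu_output.get("preferences", {}).items():
        model.setdefault("travel_preferences", {})[pref] = value
    return model

model = incorporate({}, {"preferences": {"seat": "aisle"}})
# model == {"travel_preferences": {"seat": "aisle"}}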
`
`As discussed below, Dr. Smyth admits that Kennewick’s dialog history,
`
`which consists of tagged recognized words, is created by an ASR component.
`
`However, the ’681 patent’s description of short-term shared knowledge
`
`necessitates interpretation of the words, which would be performed by an NLU
`
`11
`
`
`
`component using the recognized words as input. EX2001, ¶40. This distinction
`
`proves fatal to the petition’s theory.
`
`2. Kennewick’s tagged recognized words are not short-term
`shared knowledge
`The petition fails to establish that Kennewick discloses or renders obvious
`
`“short-term shared knowledge” because Kennewick’s tagged recognized words are
`
`not interpreted and therefore do not constitute short-term shared knowledge.
`
`EX2001, ¶¶41-54.
`
`The petition begins its analysis of this limitation by discussing Kennewick’s
`
`disclosure of “recognize[d] words and phrases” of a user’s utterance that are
`
`tagged “with [the user’s] identity in all further processing.” Pet., 17 (quoting
`
`EX1003, ¶155); EX2001, ¶42. Specifically, Kennewick discloses that “speech
`
`recognition engine 120 may recognize words and phrases” and then adds the user
`
`identity tags. EX1003, ¶155. The petition asserts that “Kennewick’s system of
`
`tagging recognized words and phrases of each utterance with a user identity allows
`
`the system to build a dialog history of words and phrases uttered by that user
`
`during the conversation.” Pet., 18. The petition thus asserts that this dialog history
`
`containing the tagged recognized words is the claimed accumulation of “short-term
`
`shared knowledge about the current conversation.” Id.
`
`As an initial matter, the petition fails to explain how tagged recognized
`
`words would constitute any sort of knowledge about the utterance received.
`
`12
`
`
`
EX2001, ¶43. Indeed, the petition contains no discussion of what

“knowledge about the utterance” Petitioners believe is reflected in
`
`Kennewick’s tagged recognized words. Id. Nor does Dr. Smyth, whose analysis is
`
`identical to the petition’s. See EX1002, ¶¶71-73; EX2001, ¶43.
`
`Indeed, tagged recognized words do not constitute short-term shared
`
`knowledge about the received utterance as described and claimed in the ’681
`
`patent and as understood by ordinary artisans. EX2001, ¶¶44-46. Instead, the ’681
`
`patent explains that short-term shared knowledge is built by the Session Input
`
`Accumulator using recognized words as an input, whereupon assumptions and
`
`expectations are extracted from the utterances to build short-term knowledge. Id.
`
`In other words, the Session Input Accumulator is part of an NLU component, not
`
`an ASR component, and therefore mere recognized words do not constitute short-
`
term shared knowledge. Id. While the ’681 patent does acknowledge that short-

term shared knowledge can include “conversation history,” the ’681 patent’s

“conversation history,” as explained below, is a history of interpreted

utterances, unlike Kennewick’s tagged recognized words. Id.
`
`Specifically, the ’681 patent explains that “shared knowledge,” which
`
`includes “both short-term and long-term knowledge,” “may enable a user and a
`
`voice user interface to share assumptions and expectations … that facilitate a
`
`cooperative conversation between human users and a system.” EX1001, 4:51-53,
`
`13
`
`
`
`5:25-30; EX2001, ¶46. Shared knowledge is built by the Session Input
`
`Accumulator. EX2001, ¶48; EX1001, 13:44, 15:39-40 (“Intelligent Hypothesis
`
`Builder 310 may use short-term shared knowledge from the Session Input
`
`Accumulator.”). The Session Input Accumulator accumulates a number of inputs
`
`upon which the short-term shared knowledge is built, including “recognition text
`
`for each utterance, a recorded speech file for each utterance, a list-item selection
`
`history, a graphical user interface manipulation history, or other input data.”
`
EX1001, 13:47-49. The Session Input Accumulator uses these accumulated inputs
`
`to “build shared knowledge models.” Id., 13:44; EX2001, ¶48. The Session Input
`
`Accumulator then passes the shared knowledge to the “Intelligent Hypothesis
`
`Builder 310” which “may use short-term knowledge to generate one or more
`
`implicit hypotheses of a user's intent when an utterance may be missing required
`
`qualifiers or other information needed to complete a request or task.” EX1001,
`
`15:65-16:2 (emphasis added); EX2001, ¶48. These hypotheses about a user’s
`
`intent are graded and used by the Adaptive Response Builder 315 to generate an
`
`adapted natural language response. EX2001, ¶48; EX1001, 16:60-17:20. To
`
`summarize, the Session Input Accumulator receives input such as recognized
`
`words of an utterance, interprets those words to build shared knowledge, and
`
`passes such knowledge to the Intelligent Hypothesis Builder for building
`
hypotheses about user intent that are used to form adapted responses. EX2001, ¶48.
`
`14
`
`
`
`The fact that recognized words are used as an input for building shared
`
`knowledge is supported by the ’681 patent’s explanation that shared knowledge
`
`consists of “assumptions and expectations such as topic knowledge, conversation
`
`history, word usage, jargon, tone (e.g., formal, humorous, terse, etc.), or other
`
`assumptions and/or expectations that facilitate interaction at a Human-to-Machine
`
`interface.” EX1001, 14:26-30 (emphasis added); EX2001, ¶¶44-46. In other
`
`words, shared knowledge allows participants in a conversation to determine each
`
`other’s intent and provide meaningful responses. EX2001, ¶¶46-47. For example,
`
`if a speaker says, “I had a burger last night, and it was so good, I left a big tip,”
`
`there is at least shared knowledge between the two speakers that the presence of a
`
`“tip” indicates that the burger was consumed at a sit-down restaurant (as opposed
`
`to a fast-food establishment or a home-made meal). Id. Thus, the
`
`assumption/expectation of a sit-down restaurant could result in the other speaker
`
`asking, “Which restaurant was it?” or “How was the service?” Id. However,
`
`without such shared, tacit knowledge about restaurants, the dialog would be
`
`incomprehensible. Id. The shared knowledge is an integral part of the
`
`conversation. Id.
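
To make the contrast concrete, the burger example can be restated in these terms. The following Python sketch is illustrative only, with hypothetical structures and rules not drawn from the ’681 patent, Kennewick, or any exhibit: a tagged dialog history holds only words plus a user identity, while extracting the shared assumption requires a separate interpretation step:

# Illustrative only (hypothetical structures and rules). Tagged recognized
# words carry no assumptions or expectations; extracting the "sit-down
# restaurant" assumption requires interpreting the words.

tagged_dialog_history = [
    ("user_1", "i had a burger last night"),        # words + identity tag
    ("user_1", "it was so good i left a big tip"),  # still just words
]

def extract_assumptions(history: list[tuple[str, str]]) -> list[str]:
    """NLU-style step: interpret the words to extract shared assumptions."""
    assumptions = []
    if any("tip" in words for _, words in history):
        assumptions.append("meal was eaten at a sit-down restaurant")
    return assumptions

def respond(assumptions: list[str]) -> str:
    """Use the shared assumption to form a meaningful follow-up."""
    if "meal was eaten at a sit-down restaurant" in assumptions:
        return "Which restaurant was it?"
    return "Tell me more."

print(respond(extract_assumptions(tagged_dialog_history)))
# -> "Which restaurant was it?"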
`
`Thus, because shared knowledge consists of shared assumptions and
`
`expectations, and because the building of shared knowledge is performed by an
`
`NLU component that uses recognized words as input, Kennewick’s tagged
`
`15
`
`
`
`recognized words are not shared knowledge. EX2001, ¶49. Dr. Smyth’s own
`
`testimony confirms that mere tagged recognized words do not constitute short-term
`
`shared knowledge about a conversation. Id. As he explains it, recognized words
`
`are “not sufficient for an algorithm to understand the user’s goals and intent.”
`
EX1002, ¶42; see also EX2003, 88:16-24 (“Q: Does Kennewick’s dialog history
`
`keep track of the user’s goals and preferences? A: In the sense that the dialog
`
`history is a tagged set of recognized words and phrases that the user spoke earlier
`
`in the conversation, then, no.”); cf. EX1001, 14:25-30 (“[A] user and a voice user
`
`interface may share assumptions and expectations such as topic knowledge …”).
`
`Since Kennewick’s tagged recognized words do not provide the system with
`
`understanding of the user’s goals, intent, or preferences, they do not provide the
`
`system with short-term shared knowledge about the conversation (i.e. assumptions
`
`and expectations about the conversation). EX2001, ¶49. Indeed, Kennewick’s
`
`tagged recognized words do not reflect any meaning of the user’s utterance. Id.;
`
`EX2003, 34:9-12 (“ASR broadly refers to technologies that take acoustic
`
`waveforms and map them to words, and NLU broadly refers to technologies that
`
`try to infer meaning from those words.”).2
`
`
2 Kennewick’s tagged recognized words are not strictly the recognized words

from the ASR component, as the recognized words are also tagged with the user

identity. EX1003, ¶155; EX2001, ¶50. However, the tags do not change the fact

that the words are not interpreted and therefore the system has not obtained any

knowledge about, e.g., a user’s goals, preferences, intents, or assumptions from the

utterances. EX2001, ¶50; see also EX2003, 88:17-24 (“Q: Does Kennewick’s

dialog history keep track of the user’s goals and preferences? A: In the sense that

the dialog history is a tagged set of recognized words and phrases that the user

spoke earlier in the conversation, then, no.”).

The petition argues that “Kennewick’s description of building dialog history

by tagging dialog utterances is consistent with the ’681 patent’s disclosure of

‘short-term knowledge’ including a ‘history of the conversation’ built ‘as the

conversation progresses.’” Pet., 18 (quoting EX1001, 15:65-16:7, 15:38-42);

EX2001, ¶¶51-52. However, the ’681 patent’s “conversation history,” which is

built by an NLU component, is clearly not the same as Kennewick’s tagged

recognized words, which are built by an ASR component. EX2001, ¶52.

Kennewick’s tagged recognized words are built by its speech recognition engine,

an ASR component. EX1003, ¶155; EX2003, 55:3-6 (“[A]s described by

Kennewick, speech recognition engine has the characteristics of an ASR

component in its conversational interface system.”). But as the petition

acknowledges, the ’681 patent teaches that it is the “Session Input Accumulator”
`
`
`
`which “build[s] a history of the conversation.” EX1001, 16:6-7. Yet as explained
`
`above, the Session Input Accumulator receives as input “recognition text for each
`
`utterance” in order to build the conversation history. Id., 13:47-51; EX2001, ¶52.
`
`Moreover, the ’681 patent makes clear that “shared knowledge 305” includes
`
`“share[d] assumptions and expectations such as … conversation history.”
`
`EX1001, 14:24-27 (emphasis added). In other words, just like a typical NLU
`
`component, the Session Input Accumulator receives recognized text as input,
`
`interprets the text, and builds shared assumptions and expectations (i.e. shared
`
`knowledge), which includes the conversation history. EX2001, ¶52. Mere
`
`recognized words generated by an ASR component are insufficient to constitute
`
`shared assumptions and expectations. EX2001, ¶¶52-53; EX1002, ¶42; see also
`
`EX2003, 88:17-24. Thus, the petition is wrong to equate the building of
`
`Kennewick’s tagged recognized words by an ASR component with the building of
`
`the ’681 patent’s conversation history by an NLU component. EX2001, ¶¶52-53.
`
`For these reasons, Kennewick’s tagged recognized words are not “short-term
`
`shared knowledge.” EX2001, ¶54. Accordingly, the petition fails to establish that
`
`this limitation is disclosed or rendered obvious by Kennewick.
`
`IV. GROUNDS 2 THROUGH 6: CLAIMS 3-8, 10-12, 15-20, 22-24, 27-32,
`AND 34-36 ARE NOT OBVIOUS
The claims challenged in Grounds 2 through 6 of the petition that were not

disclaimed all depend from claims 1, 13, and 25. The petition’s analysis of

all the challenged claims in Grounds 2 through 6 relies on its analysis of claims 1,
`
`13, and 25 in Ground 1. See, e.g., Pet., 32 (“As described in Section IX.A,
`
`Kennewick renders obvious independent claims 1, 13, and 25 … This section
`
`identifies additional disclosures of Huang that render these dependent claims
`
`obvious.”). Thus, these grounds of challenge fail for the same reasons discussed
`
`above. EX2001, ¶55.
`
`V. CONCLUSION
`For at least the reasons set forth above, as well as those noted in the Board’s
`
`institution decision, Petitioners have failed to meet their burden and the challenged
`
`claims should be found not unpatentable.
`
`
`
`
`
`Date: May 21, 2021
`
`
`
`
`
`Respectfully submitted,
`
`
`
`/ Matthew A. Argenti /
`Matthew A. Argenti, Lead Counsel
`Reg. No. 61,836
`
`
`
`19
`
`
`
`CERTIFICATE OF COMPLIANCE
`
`Pursuant to §42.24(d), the undersigned certifies that this paper contains no
`
`more than 14,000 words, not including the portions of the paper exempted by
`
`§42.24(b). According to the word-processing system used to prepare this paper, the
`
`paper contains 4,096 words.
`
`
`
`
`
`Date: May 21, 2021
`
`
`
`
`
`Respectfully submitted,
`
`
`
`/ Matthew A. Argenti /
`Matthew A. Argenti, Lead Counsel
`Reg. No. 61,836
`
`
`
`20
`
`
`
`LIST OF EXHIBITS
`
Exhibit No.   Description

2001          Declaration of Anatole Gershman

2002          Curriculum Vitae of Anatole Gershman

2003          Transcript of May 4, 2021 deposition of Padhraic Smyth, Ph.D.

2004          Intentionally Left Blank

2005          Disclaimer in Patent Under 37 C.F.R. § 1.321(a), filed May 17,
              2021 for U.S. Patent No. 8,073,681
`
`
`
`
`
`
`
`21
`
`
`
`CERTIFICATE OF SERVICE
`
`I certify that the foregoing Patent Owner’s Response Pursuant to 37 C.F.R. §
`
`42.120 and Exhibits 2001-2005 were served on this 21st day of May, 2021, on the
`
`Petitioners at the correspondence address of the Petitioners as follows:
`
`J. David Hadden
`Saina Shamilov
`Brian Hoffman
`Johnson K. Kuncheria
`Allen Wang
`FENWICK & WEST LLP
`VBAssets-IPR@fenwick.com
`
`
`
`
`
`
`Date: May 21, 2021
`
`
`
`
`
`
`Respectfully submitted,
`
`
`
`/ Matthew A. Argenti /
` Matthew A. Argenti, Lead Counsel
` Reg. No. 61,836
`
`
`
`22
`
`