In re Patent of: Morton et al.
U.S. Patent No.: 7,296,121
Issue Date: Nov. 13, 2007
Appl. Serial No.: 10/966,161
Filing Date: Oct. 15, 2004
Title: REDUCING PROBE TRAFFIC IN MULTIPROCESSOR SYSTEMS
Case Nos.: IPR2015-00159, IPR2015-00163

`REPLY DECLARATION OF DR. ROBERT HORST
`
1. I have reviewed the “Patent Owner Response” in IPR2015-00159, the
`
`“Patent Owner Response” in IPR2015-00163 and the “Declaration of Vojin
`
`Oklobdzija, Ph.D. in Support of Patent Owner’s Responses,” each filed on August
`
`11, 2015. I also considered the references cited herein, including, for example:
`
`U.S. Patent Application Publication Number 2002/0053004 to Pong (“Pong”) (Ex.
`
`1003); U.S. Patent No. 7,698,509 to Koster et al. (“Koster”) (Ex. 1009);
`
`Deposition Transcript of Dr. Vojin G. Oklobdzija Vol. 1, November 23, 2015 (Ex.
`
`1026); Deposition Transcript of Dr. Vojin G. Oklobdzija Vol. 2, November 24,
`
`2015 (Ex. 1027); David E. Culler et al., Parallel Computer Architecture: A
`
`Hardware/software Approach (1st Ed.) (1998) (Ex. 1028); “InfiniBand
`
`Architecture Specification Volume 1 Release 1.0.a” (June 19, 2001) (Ex. 1029);
`
`James Laudon and Daniel Lenoski, Proceedings of the 24th Annual International
`
`Symposium on Computer Architecture, “The SGI Origin: A ccNUMA Highly
`
`Scalable Server” (1997) (Ex. 1030); Excerpts from Merriam-Webster’s Collegiate
`
`Page 1 of 28
`
`APPLE 1025
`Apple et al. v. Memory Integrity
`IPR2015-00163
`
`
`
`Attorney Docket No.: 39521-0007IP1
`U.S. Patent No. 7,296,121
`Dictionary (10th ed. 1999) (Ex. 2014); Excerpts from Laughton et al., Electrical
`
`Engineer’s Reference Book, pp. 15/3 (16th ed. 2003) (Ex. 2015); and Fong Pong et
`
`al., Design and Performance of SMPs With Asynchronous Caches (Nov. 1999)
`
`(“the Pong whitepaper”) (Ex. 2024)1. Moreover, I attended the deposition of Dr.
`
`Oklobdzija on November 23 and 24, 2015. In my declaration, I am applying the
`
`standards and legal principles that I applied when drafting the declaration entitled
`
`“Declaration of Dr. Robert Horst” dated October 28, 2014, which were outlined in
`
`paragraphs 8 and 40-61 of that document. Based on these principles and my
`
`expertise in the relevant technology, I disagree with several inaccurate and/or
`
`misleading statements in both the Patent Owner’s Responses and Dr. Oklobdzija’s
`
`August 11, 2015 Declaration. Below, I address some of these statements.
`
I. Proper Construction of the Term “Programmed”

2. Dr. Oklobdzija asserts that “the broadest reasonable interpretation of
`
`the term ‘programmed’ in the context of the ’121 patent refers to a device that has
`
`been configured by a sequence of instructions.” Ex. 2016, ¶ 33. Dr. Oklobdzija
`
`further asserts that “the broadest reasonable interpretation of ‘programmed’ would
`
not include hardwired logic.” Ex. 2016, ¶ 38. I respectfully disagree, because the

term “programmed” is commonly used to describe the design and configuration of

hardwired logic.

1 Throughout this declaration I refer to Ex. 1004 as “Pong” or the “Pong
prior art reference.” I refer to Ex. 2024 as the “Pong whitepaper.”
`
3. For example, a field programmable gate array (FPGA) is effectively
`
`an array of logic gates that can be inter-wired in different configurations according
`
`to a manufacturer’s programming. Contrary to Dr. Oklobdzija’s contention that
`
`the term “programmed” is limited to devices that execute a series of instructions, a
`
`field programmable gate array need not execute any instructions. Rather, a
`
`designer specifies a configuration of physical interconnections between the logic
`
`gates and transfers the configuration to a storage device. Depending on the type of
`
`FPGA, the configuration is either stored in non-volatile storage inside the FPGA,
`
`or the configuration is automatically transferred into the FPGA from a non-volatile
`
`memory when power is first applied. After initialization, the logic gates within the
`
`FPGA perform logical operations corresponding to the configuration and input
`
`signals.
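
For illustration only, the behavior described in this paragraph can be modeled in a few lines of Python. The sketch is a simplified model under stated assumptions, not a description of any particular FPGA or vendor tool: the names `LookupTable` and `config_bits` are hypothetical, and a real device holds its configuration in hardware storage cells rather than in software objects. It shows a two-input logic element whose output depends only on its stored configuration and its input signals, so that loading a different configuration yields a different logic function.

```python
class LookupTable:
    """Hypothetical model of a 2-input FPGA logic element (an assumption, not a vendor API)."""

    def __init__(self, config_bits):
        # config_bits[i] is the output for the input pair whose binary value is i.
        if len(config_bits) != 4:
            raise ValueError("a 2-input table needs exactly 4 configuration bits")
        self.config_bits = tuple(config_bits)

    def evaluate(self, a, b):
        # The inputs select a stored bit; the modeled element fetches and decodes no instructions.
        return self.config_bits[(a << 1) | b]


# The same kind of element "programmed" two different ways by loading different configurations.
and_gate = LookupTable([0, 0, 0, 1])
xor_gate = LookupTable([0, 1, 1, 0])
```

In this model, the only difference between `and_gate` and `xor_gate` is the stored configuration, which mirrors the point that the configuration, rather than an executed instruction sequence, determines the logical operations performed.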
`
`4. During his deposition, Dr. Oklobdzija admitted that an FPGA
`
`“doesn’t use [a] sequence of instructions.” Ex. 1026, 123:12-20. Dr. Oklobdzija
`
asserted that, despite the use of the term “programmable” in their name, FPGAs are
`
`not “programmable in a sense of executing a sequence of instructions.” Ex. 1026,
`
`123:21-23. He suggested that “the better, more accurate term would be field
`
`configurable logic because it's configured.” Ex. 1026, 123:24-124:1. I have not
`
`heard of FPGAs referred to as “field configurable gate arrays” outside of this legal
`
`proceeding, and Dr. Oklobdzija stated he has not either. See Ex. 1026, 124:4-11.
`
5. Contrary to Dr. Oklobdzija’s assertions during the deposition, the
`
`’121 Patent uses the term “programmable,” not “configurable,” when referring to
`
`devices that need not execute instructions, teaching that “the cache coherence
`
`controller 230 is a specially configured programmable chip such as a
`
`programmable logic device or a field programmable gate array.” Ex. 1001, 7:49-
`
`52. Moreover, the evidence cited in Dr. Oklobdzija’s declaration states that “in
`
`hardwired logic systems the physical interconnections of the elements govern the
`
`routes by which data flows between the processing elements and thus the sequence
`
`of processing operations performed on the data.” Ex. 2015 (Excerpts from
`
`Laughton et al., Electrical Engineer’s Reference Book, pp. 15/3 (16th ed. 2003)) at
`
`15/3. This is entirely consistent with the operation of the aforementioned field
`
`programmable gate array.
`
6. Therefore, Dr. Oklobdzija’s definition of the term “programmed” to
`
`exclude hardwired systems is inconsistent with the use of the term
`
`“programmable” in the ’121 Patent, with the evidence Dr. Oklobdzija cites in his
`
`declaration, and with my experience with the term. The Merriam-Webster’s
`
`Collegiate Dictionary cited in Patent Owner’s Responses includes a definition for
`
`“program” that is “to work out a sequence of operations to be performed by (a
`
`mechanism).” Ex. 2014, p. 931. I believe this definition is consistent with the
`
`usage of the term “programmed” in the ’121 Patent and the understanding of a
`
`person of ordinary skill in the art. Accordingly, I believe a reasonable
`
`interpretation of the term “programmed” is “designed to perform a sequence of
`
`operations,” regardless of whether this design is in hardware or software.
`
II. Reply to Statements Made With Regard to Pong

A. Pong Enables the Instituted Claims of the ’121 Patent

7. In his declaration, Dr. Oklobdzija asserts that “teachings of Pong are
`
`confusing, internally inconsistent, and omit key disclosures that would enable one
`
`of ordinary skill in the art to practice the limitations of any of the independent
`
`claims of the ’121 Patent.” Ex. 2016, ¶ 73. I disagree. The level of
`
`implementation detail provided by Pong is consistent with the level of
`
`implementation detail provided in similar prior art disclosures of cache coherent
`
`systems, such as Koster. A person of ordinary skill in the art would have known
`
`how to implement the Pong system without undue experimentation.
`
8. Pong’s disclosure is organized into a number of embodiments, which
are generally demarcated by easily identifiable headings. Pong describes various
implementations for certain features, but a person of ordinary skill in the art would
have been able to distinguish these implementations and would have understood
how they relate to each of the embodiments.
`
9. A first set of embodiments, illustrated in FIGS. 2-3, are described in

paragraphs 0029-0043. In these embodiments, the queues in the memory
`
`controller are “designed to broadcast the request to all other processors and the
`
`memory.” Ex. 1003, ¶ 0032. This is how these particular embodiments “emulat[e]
`
`a shared bus type of protocol,” a protocol that similarly ensures that all requests are
`
`sent to all processors. Ex. 1003, ¶ 0029. Thus, Pong does not repeatedly treat “the
`
`conventional bus based architecture as interchangeable with a point-to-point
`
`architecture,” as asserted by Dr. Oklobdzija. Ex. 2016, ¶ 76. Rather, just as in
`
`other prior art references, such as the Culler book cited by Dr. Oklobdzija (Ex.
`
`2011), Pong acknowledges that, in some embodiments, point-to-point architectures
`
`may be implemented to broadcast requests. See Ex. 1028, p. 556 (discussing the
`
`use of a scalable interconnection network using a two-level protocol with
`
`coherence within a node handled by an inner protocol and coherence between
`
`nodes handled with an outer protocol).
`
`10. To this point, I agree with Dr. Oklobdzija that the Culler book to
`
`which he cites is a “very highly regarded” textbook, and was taught to college
`
`students at the time of the ’121 Patent. Ex. 1026, 38:3-40:11. As a result, I believe
`
`it is an excellent indication of the information regarding cache coherency available
`
`to a person of ordinary skill in the art at the time of the ’121 Patent.
`
`11. The Culler textbook describes Pong’s general architecture as
`
`“common” and identifies it as one of “four categories” into which all memory
`
`hierarchies generally fall, depending on the relative positioning of processors,
`
`cache(s), and main memory with respect to the interconnect. Ex. 1028, pp. 270-
`
`271. Pong’s architecture, known as “dancehall,” is shown in FIG. 5.2(c) of the
`
Culler textbook, and applies to systems in which each processor has its own private
cache and is connected to main memory by a point-to-point interconnection
network. See Ex. 1028, p. 271. Chapter 8 of the Culler textbook details various
cache coherent “directory schemes [that] rely on point-to-point network
transactions,” proving that the use of cache coherence schemes in scalable
point-to-point architectures was well known before 2002. Ex. 1028, p. 555
(introducing the subject matter of Chapter 8). Notably, in the context of the ’121
patent, point-to-point architectures (’121), point-to-point links (Pong) and
point-to-point interconnection networks (Culler) are substantially the same
because they all contrast a switched or directly connected network with a shared bus.
`

[Figure: side-by-side architecture diagrams labeled “Culler Book Cited by PO” and “Pong”]

12. Starting in paragraph 44, Pong describes “[f]urther [o]ptimizations” to

the embodiments of FIGS. 2 and 3, and variations of these optimizations are
`
`specifically illustrated and discussed with regard to FIGS. 4-6. In the
`
`embodiments of FIGS. 4-6, a central directory located in either the memory
`
`controller (FIG. 4), a separate memory component (FIG. 5), or folded into the data
`
`blocks in main memory (FIG. 6) is used to filter requests to reduce traffic. See Ex.
`
`1003, ¶¶ 0056-0060.
`
`13. Pong describes various ways in which such a directory may be used to
`
filter requests, and a person of ordinary skill in the art would have been able to easily distinguish between
`
`them. For example, in paragraph 47, Pong describes an embodiment in which a
`
`requesting processor uses information contained in a directory to specifically
`
`address requests to a processor storing a valid copy. Ex. 1003, ¶ 0047. The Culler
`
`textbook also discusses protocols which involve “finding the source of the
`
`directory information upon a miss and determining the location of the relevant
`
`copies.” See Ex. 1028, p. 565. Pong describes an alternative approach in paragraph
`
`56, in which the directory itself “filters the request and addresses it to the
`
`appropriate processors.” Ex. 1003, ¶ 0056. Again, the Culler textbook describes
`
`this as a well understood alternative to the embodiment of paragraph 47 that
`
`reduces transactions on the interconnect, referring to it as “intervention
`
`forwarding.” See Ex. 1028, p. 585. In other words, the Culler textbook
`
`demonstrates that a person of ordinary skill in the art would have understood
`
`Pong’s description in paragraphs 47 and 56 as well-known alternative
`
`embodiments, and would not have viewed them as conflicting.
`
`14. Relatedly, Pong describes three ways in which the directory could be
`
`structured. First, in paragraphs 47 to 49, Pong describes that the directory can
`
`include an “ID of the processor that currently has a particular data block.” Ex.
`
`1003, ¶ 0047. In this implementation, the directory information (i.e., the processor
`
`ID) is used to direct read requests, but write requests (i.e., write update or write
`
`invalidation requests) are broadcast. See Ex. 1003, ¶¶ 0047-0049. Second, in
`
`paragraphs 51 and 56 to 57, Pong describes that the directory can include a
`
`presence bit vector. See Ex. 1003, ¶ 0051 (“One way to implement the directory is
`
`with a presence bit vector”). In this implementation, the presence bit vector is used
`
`to filter “all traffic” by directing read and write requests to all processors that the
`
`bit vector indicates have a copy of the data block. See Ex. 1003, ¶¶ 0050, 0051,
`
`0056-0057. Third, in paragraphs 49 and 60 to 61, Pong describes that the directory
`
`can include a processor ID and a presence bit vector. See Ex. 1003, ¶¶ 0049, 0060-
`
`0061. In this implementation, the processor ID is used to direct read requests to
`
`the processor associated with the processor ID, and write requests are directed to
`
`those processors that the bit vector indicates have a copy of the data block. See id.
`
`A person of ordinary skill in the art would have understood that at least the second
`
`and third of these three implementations could be employed with regard to the
`
`embodiment shown in FIG. 4.
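
For illustration only, the three directory structures summarized in this paragraph can be contrasted in a short Python sketch. The sketch is a simplified model of the routing behavior as characterized above, not Pong’s implementation: the function name `targets_for_request` and its parameters are hypothetical, `owner_id` stands in for the processor ID field, and `presence_bits` stands in for the presence bit vector (bit i set meaning processor i has a copy).

```python
def targets_for_request(kind, requester, num_processors, owner_id=None, presence_bits=None):
    """Return the processors a request is sent to under a hypothetical directory entry.

    kind is "read" or "write". owner_id and presence_bits are illustrative stand-ins
    for the processor ID and the presence bit vector; either may be absent.
    """
    others = [p for p in range(num_processors) if p != requester]

    # First and third structures: a read is directed to the identified processor.
    if kind == "read" and owner_id is not None:
        return [owner_id]

    # Second and third structures: remaining requests are filtered by the bit vector.
    if presence_bits is not None:
        return [p for p in others if presence_bits[p]]

    # First structure: with no bit vector available, writes are broadcast.
    return others
```

For example, with four processors and processor 0 requesting, a write under the processor-ID-only structure reaches processors 1, 2 and 3, whereas the same write under a presence bit vector of `[1, 0, 1, 1]` reaches only processors 2 and 3.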
`
`15. The other “incompatible embodiments” identified by Dr. Oklobdzija
`
`in paragraph 79 of his declaration would have similarly been identified as
`
`distinctly operable alternative embodiments of Pong’s system by a person of
`
`ordinary skill in the art, and not contradictions regarding a single embodiment.
`
`However, the embodiment on which I focused in my original declaration was the
`
`embodiment shown in FIG. 4, which is described, at least in part, in paragraphs
`
`0050-0057, where Pong specifies that, “[u]nless otherwise noted, the description of
`
the components is the same as provided” with regard to FIGS. 2 and 3. Ex. 1003, ¶

0056. In other words, Pong is clear that any description of the embodiment shown
`
`in FIG. 4 takes precedence over any contradictory description of the previous
`
`embodiments.
`
`16. With regard to FIG. 4, Pong describes that the directory may be
`
`implemented with a presence bit vector. Ex. 1003, ¶ 0051. The Culler textbook
`
`confirms that the use of such a presence bit vector was common at the time of the
`
`’121 Patent and even provides specific description regarding how directories
`
`commonly used a presence bit vector to implement a cache coherence scheme. See
`
`Ex. 1028, pp. 560-64. Thus, the Culler textbook demonstrates that a person of
`
`ordinary skill in the art would have readily known how to implement the
`
`embodiment shown in FIG. 4 of Pong, in which a centralized directory utilizes
`
`intervention forwarding and a presence bit vector to implement a cache coherence
`
`protocol.
`
`17. While the over 120 pages devoted to directory-based cache coherence
`
`in the Culler textbook demonstrate that there were a number of low-level
`
`implementation choices a person of ordinary skill could have made in making the
`
`embodiment shown in FIG. 4, these pages provide evidence that these details were
`
`well within the ability and knowledge of a person of ordinary skill in the art. See
`
`generally Ex. 1028, Chap. 8. In fact, the Culler textbook details various case
`
`studies of working directory-based systems and simulations that were commonly
`
`used to test them. See Ex. 1028, pp. 576-77, 596-644.
`
`18. Therefore, I believe that Pong would have enabled one of ordinary
`
`skill in the art to make or use the claimed invention without undue
`
`experimentation, particularly when considering the Pong disclosure together with
`
`information in the prior art, such as the teachings of the Culler textbook.
`
`Moreover, I disagree with Dr. Oklobdzija’s conclusion that, “based on the
`
`disclosures of Pong and the background knowledge of one skilled in the art, it
`
`would take in excess of two years to design, verify, and implement a computer
`
`system with a point-to-point architecture and supporting cache coherency.” Ex.
`
`2016, ¶ 83. In my experience, it usually takes more than two years to “design,
`
`verify, and implement” a multiprocessor system. Therefore, in my opinion, even
`
`assuming it took two years to “design, verify, and implement,” such a period
`
`would not constitute undue experimentation.
`
B. Pong’s Presence Bit Vector Indicates Validity

`19. Dr. Oklobdzija asserts that “Pong itself suggests that other
`
`mechanisms track validity, and that Pong’s presence bit vector does not track
`
`validity.” Ex. 2016, ¶ 91. I disagree. As I described in my original declaration, in
`
`either an update or a write invalidation type of cache coherence protocol, presence
`
`bits in a centralized directory, like the directory filter 400 of the memory
`
`controller, only indicate the presence of valid copies of memory. See Ex. 1014 at
`
`¶¶ A-10 to A-13. Dr. Oklobdzija’s assertions to the contrary demonstrate his
`
`misunderstanding of cache coherence directories generally and Pong’s directory
`
`specifically.
`
`20. First, Dr. Oklobdzija cites to paragraph 13 of Pong as proof that
`
`“Pong keeps track of validity separately from presence.” Ex. 2016, ¶ 92.
`
`However, the paragraph cited by Dr. Oklobdzija is not clearly tied to the FIG. 4
`
`embodiment of Pong on which Petitioner relies, and, in describing FIG. 4, Pong
`
`suggests that any previous disclosure that contradicts its description of FIG. 4 does
`
`not apply. See Ex. 1003, ¶ 0056.
`
`21. Moreover, Pong’s processors may be designed to redundantly check
`
`validity, such that disclosure of the processors’ separate validity check does not
`
`mean that the directory filter 400 fails to also track validity. For example, at the
`
time of the ’121 patent, processors were capable of flushing memory lines from
`
`their caches without alerting the central directory. Ex. 1028, pp. 562 (“there will
`
`be periods when a directory’s knowledge of a cache state is incorrect”), 604
`
`(discussing replacement of a block in a cache optionally triggering a replacement
`
hint message back to the home to clear its presence bit in the directory, but noting
`
`that many systems do not use replacement hints and thus fail to alert the central
`
`directory when flushing a memory line in a clean state). In this case, based on the
`
`best information available to it, the directory indicates the cache still contains a
`
`valid copy of the memory line, but the memory line has been replaced. See id.
`
`Notably, this disconnect between the best information available at the central
`
`directory regarding where valid copies reside in a multi-cache system and the
`
actual location of such valid copies is also present in the system of the ’121 Patent

itself, which fails to disclose any mechanism by which a cache that flushes a

memory line alerts the ’121 Patent’s central directory.
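
For illustration only, the disconnect described in this paragraph can be sketched in Python. The sketch is a simplified model under stated assumptions, not the design of Pong, the ’121 Patent, or any system described in the Culler textbook: the classes `Directory` and `Cache` and the flag `sends_hints` are hypothetical, and only a single memory line is tracked.

```python
class Directory:
    """Hypothetical central directory: one presence bit per cache for a single memory line."""

    def __init__(self, num_caches):
        self.presence = [False] * num_caches

    def record_fill(self, cache_id):
        self.presence[cache_id] = True

    def record_replacement_hint(self, cache_id):
        self.presence[cache_id] = False


class Cache:
    """Hypothetical processor cache that may or may not send replacement hints."""

    def __init__(self, cache_id, directory, sends_hints):
        self.cache_id = cache_id
        self.directory = directory
        self.sends_hints = sends_hints
        self.has_line = False

    def fill(self):
        self.has_line = True
        self.directory.record_fill(self.cache_id)

    def evict_clean_line(self):
        # The line leaves the cache; the directory learns of it only if a hint is sent.
        self.has_line = False
        if self.sends_hints:
            self.directory.record_replacement_hint(self.cache_id)


directory = Directory(num_caches=2)
silent = Cache(0, directory, sends_hints=False)
hinting = Cache(1, directory, sends_hints=True)
for cache in (silent, hinting):
    cache.fill()
    cache.evict_clean_line()
```

After both caches fill and then evict the line, the modeled directory still marks cache 0 as holding a copy, because cache 0 sent no hint, while the bit for cache 1 has been cleared. The directory entry reflects the best information available to it rather than the actual cache contents.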
`
`22. Put simply, the fact that the memory controller uses the presence bit
`
`vector to forward requests to only a subset of the processor caches is indicative of
`
`the directory’s belief, based on the best information available at the memory
`
`controller, that only that subset of processor caches has the necessary valid copies
`
`of the requested memory lines in order to productively respond to the requests.
`
`After all, it makes no sense for the memory controller to forward requests to
`
`processor caches that it believes have an invalid copy, as such processor caches are
`
`already known to be unable to respond to the requests and, thus, forwarding
`
`requests to these caches does nothing more than needlessly increase traffic in
`
`Pong’s control path. Such a needless increase in traffic contradicts the very reason
`
`why the presence bit vector is used in Pong’s system – i.e., to optimize
`
`performance by “limiting traffic in the control path.” Ex. 1003, ¶ 0045.
`
`23. Dr. Oklobdzija’s citation of the Dragon protocol does not contradict
`
`my understanding of the presence bit vector in Pong. See Ex. 2016, ¶ 90. In the
`
`Dragon protocol, all states are “valid” states, because “the protocol always keeps
`
`the blocks in the cache up-to-date, so it is always okay to use the data present in
`
`the cache if the tag match succeeds.” Ex. 2011 at 302. This is confirmatory of my
`
`interpretation of a presence bit vector in write update protocols: when a cache line
`
`is present it is necessarily valid, because a write update protocol always updates
`
`caches to keep them valid.
`
`24. Dr. Oklobdzija asserts that “there are legitimate reasons why Pong
`
`may keep track of present, but stale data in its presence bit vector, and it cannot be
`
`assumed that Pong necessarily changes the bit value upon write invalidates.” Ex.
`
`2016, ¶ 93. However, none of the evidence cited by Dr. Oklobdzija supports this
`
`conclusion. Paragraph 91 of Dr. Oklobdzija’s declaration cites to paragraph 61 of
`
`Pong which relates to the embodiment shown in FIG. 6, a different embodiment
`
`than is described with regard to FIG. 4. In this embodiment, the presence bit
`
`vector is incorporated into memory as part of the data blocks and supplemented
`
`with a validity bit. Ex. 1003, ¶ 0061. However, the validity bit in FIG. 6 indicates
`
`the state of the block in memory (i.e., whether the copy stored in main memory is
`
`valid or invalid). See Ex. 1003, FIG. 6 (which labels state information 602 “Mem
`
`Valid”). The presence bit vector in the memory block, in contrast, indicates the
`
`state of the block in the caches (i.e., present and valid, or not present/invalid).
`
`Thus, the embodiment in FIG. 6 is not evidence that Pong’s presence bit vector
`
`does not track validity, as Dr. Oklobdzija suggests.
`
`25. Paragraph 93 of Dr. Oklobdzija’s declaration cites to Culler, which
`
`describes tracking a “not present” condition in addition to MESI states. However,
`
`Culler does not describe or suggest tracking memory lines that are present and
`
`invalid, as Dr. Oklobdzija implies. In other words, Culler provides no evidence
`
`that a presence bit vector like the one described in Pong would ever set a presence
`
`bit to ‘1’ (indicating presence of a copy in a cache) where the directory has
`
`previously received information that the stored copy is stale, i.e., invalid.
`
`Interestingly, this section of Culler refers to “not present” as a state, providing
`
`further proof that a person of ordinary skill in the art would understand Pong’s
`
`presence bit vector as indicative of states within a cache coherence protocol. Ex.
`
1028, pp. 307-310 (“Note in the table that a new state NP (not present), is
`
`introduced” (emphasis added)).
`
`26. Therefore, the presence bits in the directory filter 400 of the memory
`
`controller described by Pong only indicate the presence of valid copies of memory.
`
Validity is undoubtedly a state of the memory blocks stored in the processors’
`
`caches. See, e.g., Ex. 1028, p. 280.
`
C. Pong’s Read Requests Elicit a Response that Maintains Cache Coherence

`27. Dr. Oklobdzija asserts that “one cannot conclude that Pong discloses
`
`that its read requests are mechanisms for maintaining cache coherency which are
`
`filtered by Pong’s memory controller.” Ex. 2016, ¶ 97. I disagree.
`
`28. Pong’s FIG. 4 embodiment discloses a “directory filter that receives
`
`requests … determines which processors have a copy of the data block of interest,
`
`and forwards the requests to … these processors.” Ex. 1003, ¶ 0057 (emphasis
`
`added). Pong discloses a cache coherent system. Ex. 1003, ¶ 1003 (“invention
`
`relates to shared memory, multiprocessor systems, and in particular, cache
`
`coherence protocols.”). As such, the read requests are designed to elicit cache
`
`coherent copies of the requested memory line. See, e.g., Ex. 1028, pp. 276-77.
`
`The read requests also elicit a response (to maintain cache coherency) from the
`
`memory controller node, which forwards the request to other processors and
`
`updates its presence bit vector directory. Ex. 1003, ¶ 0057. Thus, Pong’s cache-
`
`to-cache read requests elicit responses (i.e., by the responding processors and by
`
`the memory controller) that “maintain cache coherency” in the system because
`
`these responses are designed to supply valid copies of a requested memory line to
`
`the requesting processor and maintain a current presence bit vector, which is
`
`essential to the cache coherence protocol.
`
29. In fact, Pong describes that, “[w]hen the bit corresponding to a

processor is set in the bit vector, the processor has a copy of the data block.” Ex. 1003,

¶ 0051 (emphasis added). In order to indicate that a processor has a copy of the
`
`data block (as opposed to indicating that the processor will have a copy), a person
`
`of ordinary skill would have understood that Pong’s system waits until a read
`
`response has been returned to the requesting processor to update the directory. At
`
the time of the ’121 patent, it was common for directory-based systems to wait
`
`until a read response is returned to a requesting processor to update the directory to
`
`indicate that the requesting processor was storing a copy of the requested data. See
`
`Ex. 1028, p. 563.
`
`30. Pong describes that “[i]n the case of a write invalidation [cache
`
`coherence] protocol, the memory controller . . . uses a directory to reduce traffic in
`
`the control path as explained in the next section.” Ex. 1003, ¶ 0049. In other
`
`words, Pong describes the directory playing a central role in the cache coherence
`
`protocol, as it is used, for example, to direct write invalidations to the processors
`
`storing cached copies of an invalidated memory line. Thus, an update to Pong’s
`
`directory filter 400 maintains cache coherency within the system by, for example,
`
`ensuring that write invalidations are forwarded to the proper processors. Because
`
`Pong teaches that a response to a read request results in an update to the directory
`
`filter 400 (i.e., the setting of a bit corresponding to a processor in the bit vector), a
`
read request elicits a response from a node to maintain cache coherency in a

system.

D. Pong Describes that the Memory Controller Accumulates Multiple Read
Responses to a Single Request

`31. Dr. Oklobdzija asserts that “Pong does not teach accumulating
`
`multiple responses to a probe because, in Pong, only one processor ever responds
`
`to a request.” Ex. 2016, ¶ 109. This is incorrect.
`
`32. With regard to the implementation illustrated in FIG. 4, Pong
`
`describes that “the memory controller directs a request from the request queue to
`
`the directory, which filters the request and addresses it to the appropriate
`
`processors.” Ex. 1003, ¶ 0056 (emphasis added). Pong provides an example in
`
`which “[t]he directory filter 400 . . . determines which processors have a copy of
`
`the data block of interest, and forwards the request to the snoopQ(s) (e.g., 406,
`
`408) corresponding to these processors via the address bus 410.” Ex. 1003, ¶ 0057
`
`(emphasis added). In this description, Pong refers generally to all types of requests
`
`and does not limit its teachings to only write requests, as it does in other portions
`
`of the disclosure. See, e.g., Ex. 1003, ¶¶ 0024, 0048. Thus, in the implementation
`
`shown in FIG. 4, Pong describes that the directory filter 400 forwards individual
`
`requests (either read or write) to those processors that “have a copy of the data
`
`block of interest.” See Ex. 1003, ¶ 0057.
`
33. “[E]ach processor propagates a read or write request through its cache
`
`hierarchy independently,” and, “if the processor has the requested block, it
`
`proceeds to provide it to the requesting processor.” Ex. 1003, ¶ 0024. Pong
`
`teaches that, “[u]nless otherwise noted, the description of the components [shown
`
`in FIG. 4] is the same as provided above [with regard to FIGS. 2 and 3].” Ex.
`
`1003, ¶ 0056. Because Pong’s description in paragraph 24 is consistent with FIG.
`
`4, Pong’s description that “if the processor has the requested block, it proceeds to
`
`provide it to the requesting processor” is applicable to the embodiment shown in
`
`FIG. 4. In other words, Pong describes that when a processor receives a request
`
`and it has the requested data block, it provides a copy of the data block to the
`
`requesting processor. Pong provides no alternative description in which a
`
`processor receives a request, has the requested data block, and does not respond
`
`with the data block.
`
`34. Therefore, because Pong describes that the directory filter 400 of the
`
`memory controller is configured to forward a read request to all processors that are
`
`determined to have a copy of the requested data block and those processors will
`
`respond with a copy (as long as they have not flushed the data block from their
`
`cache), Pong describes multiple responses to a single probe.
`
`35. Pong describes that the memory controller gathers multiple responses
`
`to a probe. As described above, Pong describes that, “if the processor has the
`
`requested block, it proceeds to provide it to the requesting processor.” Ex. 1003, ¶
`
`0024. When the processor is responding to a request for a data block, it “transfers
`
`the data block to an internal data queue 374,” which then “processes data blocks in
`
`FIFO order, and transfers it to the corresponding data queue 352 in the memory
`
`controller.” Ex. 1003, ¶ 0043. In other words, the memory controller of Pong
`
`gathers, in its data queues, each of the responses from each of the processors to
`
`which a request was sent.
`
`36. Moreover, a person of ordinary skill in the art would understand that
`
`Pong’s system would have simultaneously stored multiple read responses to a
`
`single request. Though Pong does not explicitly describe an example in which the
`
`memory controller concurrently stores multiple responses to a single request, a
`
`person of ordinary skill in the art would have understood that, during normal
`
`operation, responses from multiple processors to a request would not always be
`
`spaced apart in time such that each response would have been forwarded through
`
`the memory controller’s queues before the next response is received by the
`
`memory controller. Put another way, in at least some cases, a person of ordinary
`
`skill in the art would have known that two processors responding to a request in
`
`Pong’s system have similar response times so that the two processors will respond
`
`in close enough succession that those responses will be simultaneously stored in
`
`the queues of Pong’s memory controller.
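
For illustration only, the timing argument in this paragraph can be expressed as a small Python model. The sketch rests on assumptions that do not come from Pong: it treats the memory controller’s data queues in aggregate as a single first-in, first-out queue with one forwarding path, it assumes a constant forwarding time per response, and the function name `max_queue_occupancy` and all numeric values are hypothetical.

```python
from collections import deque


def max_queue_occupancy(arrival_times, service_time):
    """Largest number of responses held at once in a hypothetical FIFO data queue.

    arrival_times: non-negative times at which each processor's response reaches
    the memory controller.
    service_time: time the controller needs to forward one response (assumed constant).
    """
    pending = deque()      # departure times of responses still held, in FIFO order
    last_departure = 0
    peak = 0
    for arrival in sorted(arrival_times):
        # Discard responses that were already forwarded before this one arrived.
        while pending and pending[0] <= arrival:
            pending.popleft()
        start = max(arrival, last_departure)
        last_departure = start + service_time
        pending.append(last_departure)
        peak = max(peak, len(pending))
    return peak
```

Under these assumptions, two responses arriving at times 10 and 11 with a forwarding time of 4 are both held at once, whereas responses arriving at times 10 and 30 never overlap in the queue. Whether responses overlap therefore depends only on how closely they arrive relative to the forwarding time.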
`
`37. This is not simply a possibility, but a reality of normal operation of
`
`multiprocessor systems. For example,