`(U.S. Patent No. 7,620,800)
`
`UNITED STATES PATENT AND TRADEMARK OFFICE
`
`––––––––––––––––––
`
`BEFORE THE PATENT TRIAL AND APPEAL BOARD
`
`––––––––––––––––––
`
`MICROSOFT CORPORATION,
`Petitioner,
`
`v.
`
`DIRECTSTREAM, LLC,
`Patent Owner.
`
`––––––––––––––––––
`
`Case No. IPR2018-01605, -01606, -01607
`U.S. Patent No. 7,620,800 B2
`
`––––––––––––––––––
`
`REPLY DECLARATION OF DR. HAROLD STONE
`
`Microsoft Corp. v. Directstream, LLC, IPR2018-01605, -01606, -01607
`Petitioner Microsoft Corp. – Ex. 1076 Cover
`
`
`
`IPR2018-01605, -01606, -01607
`(U.S. Patent No. 7,620,800)
`
`1.
`
`I have reviewed Patent Owner’s Response and Dr. Houman
`
`Homayoun’s supporting declaration. I have been asked to reply to certain
`
`statements by Patent Owner and its expert. This is my declaration on those topics.
`
`I. COMPUTATIONAL LOOPS
`
`2.
`
`Patent Owner and Dr. Homayoun have proposed that the definition of
`
`“computational loop” should be “a set of computations that is executed repeatedly
`
`per datum, either a fixed number of times or until some condition is true or false.”
`
`Patent Owner Response (“Response”), 65; EX2112¶¶207, 225-226, 229-230, 233,
`
`241. I disagree.
`
`3.
`
`In my prior declaration, I did not offer a construction for the phrase
`
`“computational loop” because I believed the term should have its plain and
`
`ordinary meaning and that meaning was generally well known. I have reviewed
`
`the Board’s Institution Decision (Paper 21) and agree that the Board’s construction
`
`of this term as “a set of computations that is executed repeatedly, either a fixed
`
`number of times or until some condition is true or false” reflects that plain and
`
`ordinary meaning. Institution Decision, 23. Based on my experience, this is how a
`
`person of ordinary skill in the art (“a Skilled Artisan”) reading the 800 Patent
`
`would understand the term in the 2002 time frame. I disagree with Patent Owner’s
`
`1
`
`Microsoft Corp. v. Directstream, LLC, IPR2018-01605, -01606, -01607
`Petitioner Microsoft Corp. – Ex. 1076, p. 1
`
`
`
`IPR2018-01605, -01606, -01607
`(U.S. Patent No. 7,620,800)
`
`and Dr. Homayoun’s proposed modification because it does not reflect that
`
`understanding.
`
`4.
`
`For example, nothing in the words “computational loop” requires
`
`repeatedly executing computations “per datum,” as the dictionary definition relied
`
`on by the Board confirms. While the patent discloses loops, EX1005, FIGS. 4A-
`
`4B, 6A-6G, 5:65-6:28, and 6:42-7:37, these loops do not disclose loop calculations
`
`that are repeated multiple times “per datum.” Rather, what is disclosed is merely
`
`the repeated execution of certain computations, i.e., a computational loop. A
`
`Skilled Artisan would understand that these examples are embodiments of
`
`“computational loop” as recited in the claims, and these exemplify the broadest
`
`reasonable interpretation of “computational loop.”
`
`5.
`
`Indeed, the Patent Owner’s proposed interpretation of “computational
`
`loop” is inconsistent with the disclosure of the 800 Patent. For instance, one
`
`example of a “computational loop” included in the disclosure of the 800 Patent can
`
`be found in the prior art paper published in 2001 by Caliga and Barker and
`
`incorporated-by-reference into the 800 Patent at EX1005, 4:58-62. Patent Owner
`
`cites this paper in support of its construction of “computational loop.” See
`
`Response, 67. That loop is reproduced below:
`
`2
`
`Microsoft Corp. v. Directstream, LLC, IPR2018-01605, -01606, -01607
`Petitioner Microsoft Corp. – Ex. 1076, p. 2
`
`
`
`IPR2018-01605, -01606, -01607
`(U.S. Patent No. 7,620,800)
`
`EX2037, 5 (annotated).
`
`6.
`
`Note that this figure specifically identifies the structure as a “loop
`
`over filter coefficients” and implements the following calculation for index j:
`
`Sum = datain(j)*R(j) + datain(j +1)*R(j+1)+Sum, j = 0,2,4,…,ncoef-1
`
`3
`
`Microsoft Corp. v. Directstream, LLC, IPR2018-01605, -01606, -01607
`Petitioner Microsoft Corp. – Ex. 1076, p. 3
`
`
`
`IPR2018-01605, -01606, -01607
`(U.S. Patent No. 7,620,800)
`
`7.
`
`Note that to carry out this equation, multiple iterations of the same
`
`calculations must occur, albeit with different data values for “datain,” “R,” and
`
`“Sum.” The values for “datain” and “R” are different for each iteration, as the
`
`indices for each will be different for each iteration. The value of “Sum” will also
`
`be different for each iteration, as that value is captured in a register each cycle and
`
`fed back over the path that I annotated as “computational loop” to be used in the
`
`next cycle for the next iteration of the calculation.
`
`8.
`
`The loop above – which is incorporated into the disclosure of the 800
`
`Patent – therefore does not meet the Patent Owner’s claim construction for
`
`“computational loop” because it is not a set of computations executed repeatedly
`
`per datum a fixed number of times. Patent Owner’s interpretation therefore
`
`excludes perhaps the most detailed example of a “computational loop” included in
`
`its patent. I believe a Skilled Artisan would accordingly not read the claim phrase
`
`“computational loop” as narrowly as Patent Owner does.
`
`9.
`
`Patent Owner also asserts that Splash2 (EX1007) does not satisfy the
`
`claimed “computational loops” under its proposed interpretation. Response, 78-81.
`
`I disagree with that conclusion as well. As I discussed in my original declaration,
`
`the Unidirectional Systolic Array described in Chapter 8 of Splash2 includes
`
`multiple processing elements executing computational loops simultaneously to
`4
`
`Microsoft Corp. v. Directstream, LLC, IPR2018-01605, -01606, -01607
`Petitioner Microsoft Corp. – Ex. 1076, p. 4
`
`
`
`IPR2018-01605, -01606, -01607
`(U.S. Patent No. 7,620,800)
`
`solve the edit distance calculation. See, e.g., EX1003¶¶105-06. That calculation
`
`requires the comparison of two genetic sequences: a source sequence and a target
`
`sequence. To perform the comparison, each processing element is initially loaded
`
`with a source sequence character. EX1007, 103; EX1003¶217. The characters of
`
`the target sequence are then streamed through the processing elements and
`
`compared against the stored source sequence characters. EX1003¶¶220-23. In
`
`each processing element during one time step, one target sequence character is
`
`compared against the source sequence character stored in that processing element.
`
`Id. In that same processing element during the next time step, a new target
`
`sequence character is compared against that same source sequence character. Id.,
`
`¶¶221-22. Thus, in each time step each processing element compares a different
`
`target sequence character against its stored source sequence character.
`
`10. As I also explained in my original declaration, each processing
`
`element of the Unidirectional Systolic Array performs these comparisons by
`
`executing the computations depicted in FIG. 8.12:
`
`5
`
`Microsoft Corp. v. Directstream, LLC, IPR2018-01605, -01606, -01607
`Petitioner Microsoft Corp. – Ex. 1076, p. 5
`
`
`
`IPR2018-01605, -01606, -01607
`(U.S. Patent No. 7,620,800)
`
`EX1007, 105; EX1003¶¶146, 211. These instructions calculate the edit distance
`
`between one target sequence character and the stored source sequence character in
`
`the particular processing element. EX1003¶146.
`
`11. Accordingly, because each processing element in the Unidirectional
`
`Array executes its loop instructions multiple times for the same source character,
`
`it executes the loop repeatedly per datum, as Patent Owner argues is required by
`
`the claims.
`
`6
`
`Microsoft Corp. v. Directstream, LLC, IPR2018-01605, -01606, -01607
`Petitioner Microsoft Corp. – Ex. 1076, p. 6
`
`
`
`IPR2018-01605, -01606, -01607
`(U.S. Patent No. 7,620,800)
`
`12. Moreover, in both the Unidirectional and Bidirectional Systolic
`
`Arrays disclosed in Splash2, each processing element executes its disclosed
`
`computational loop instructions “a fixed number of times or until some condition is
`
`true or false.” For example, Splash2 states that both systolic arrays described in
`
`Chapter 8 “exploit the locality of reference by computing the entries along each
`
`antidiagonal in parallel, as shown in Figure 8.4. … Each processing element (PE)
`
`computes the distances along a particular diagonal matrix. A block diagram of the
`
`PE and a listing of the algorithm it executes are shown in Figures 8.6 and 8.7,
`
`respectively.” EX1007, 100. A Skilled Artisan would have understood from this
`
`disclosure that each processing element executes its loop instructions until it
`
`“computes the distances along a particular diagonal of the distance matrix.” See
`
`EX1003¶¶225-26, 237-38.
`
`13.
`
`Splash2 also discloses that the sequences being compared in each
`
`array are of length “m” and “n,” respectively. EX1007, 103-04. A Skilled Artisan
`
`would therefore understand that each processing element carries out execution of
`
`its loop instructions until those indices (“m” and “n”) are reached, as I explained
`
`during my deposition. EX2063, 225:19-226:5 (“So that one of skill in the art
`
`would stop this when those indices are reached.”) Indeed, Patent Owner makes
`
`this same point in its response. See Response, 79 (“This is confirmed by Petitioner
`7
`
`Microsoft Corp. v. Directstream, LLC, IPR2018-01605, -01606, -01607
`Petitioner Microsoft Corp. – Ex. 1076, p. 7
`
`
`
`IPR2018-01605, -01606, -01607
`(U.S. Patent No. 7,620,800)
`
`and its expert that the ‘loop-endloop’ repeats until the amount of data
`
`concludes.”)(Emphasis in original.)
`
`14.
`
`In other words, the Unidirectional Systolic Array of Splash2 performs
`
`“a set of computations [as set forth in FIG. 8.12] that is executed repeatedly [once
`
`for each target sequence character streamed through the array] per datum [per
`
`source sequence character], either a fixed number of times or until some
`
`condition is true or false [until the last target sequence character is compared
`
`against the last source sequence character].”
`
`II. LOOPING ON SPLASH2
`
`15.
`
`Patent Owner and its expert also assert that the Splash 2 system
`
`implemented loops on an attached CPU rather than in the FPGAs of the system,
`
`relying solely on a paper by Gokhale and Minnich, entitled “FPGA Computing In
`
`A Data Parallel C.” Response 83; EX2112¶209. I disagree.
`
`16.
`
`The paper by Gokhale and Minnich does not relate to the edit distance
`
`calculations described in Chapter 8 of Splash2, which formed the basis of the
`
`opinions in my original declaration. Rather, the Gokhale and Minnich paper
`
`describes a technique for automatically synthesizing digital logic on the Splash 2
`
`system for programs written in a language called Data-parallel Bit-serial C, or
`
`“dbC,” for an SIMD (single-instruction multiple data) implementation. See
`
`8
`
`Microsoft Corp. v. Directstream, LLC, IPR2018-01605, -01606, -01607
`Petitioner Microsoft Corp. – Ex. 1076, p. 8
`
`
`
`IPR2018-01605, -01606, -01607
`(U.S. Patent No. 7,620,800)
`
`Gokhale, Maya, and Ron Minnich. "FPGA computing in a data parallel C." [1993]
`
`Proceedings IEEE Workshop on FPGAs for Custom Computing Machines. IEEE,
`
`1993 (EX1074), 94.
`
`17.
`
`I note that Splash2 discloses that the systolic arrays described in
`
`Chapter 8 and used to calculate edit distance were programmed in VHDL, not dbC.
`
`EX1007, 106. And a Skilled Artisan would understand that the systolic array
`
`structure of the edit distance implementations is not an SIMD structure. So
`
`whether or not the system described by Gokhale and Minnich implemented loops
`
`on the Splash 2 CPU, that paper has nothing to do with the systolic arrays
`
`described in Chapter 8 of Splash2.
`
`18.
`
`Indeed, as I explained in my original declaration, the computational
`
`loops of the systolic arrays of Splash2 are implemented in the FPGAs of the Splash
`
`2 system. EX1003¶¶208, 217-38. Splash2 expressly states that “[b]oth the
`
`bidirectional and unidirectional systolic arrays have been implemented on the
`
`Splash 2 programmable logic array, with versions for DNA and protein
`
`sequences.” EX1007, 104 (emphasis added).
`
`19.
`
`Splash2 also states that for the bidirectional systolic array, “[e]ach
`
`processing element (PE) computes the distances along a particular diagonal of the
`
`distance matrix. A block diagram of the PE and a listing of the algorithm it
`9
`
`Microsoft Corp. v. Directstream, LLC, IPR2018-01605, -01606, -01607
`Petitioner Microsoft Corp. – Ex. 1076, p. 9
`
`
`
`IPR2018-01605, -01606, -01607
`(U.S. Patent No. 7,620,800)
`
`executes are shown in Figures 8.6 and 8.7, respectively.” EX1007, 100. Figure
`
`8.7 depicts the loop instructions for that array. See id., 101. Similarly, for the
`
`unidirectional array, Splash2 states “[t]he algorithm executed by each PE in the
`
`unidirectional array is listed in Figure 8.12.” Id., 104. Figure 8.12 depicts the loop
`
`instructions for the unidirectional array. See id., 105. Thus, Splash2 discloses that
`
`it is the processing elements, or “PEs,” of the arrays that execute the looping
`
`instructions. The book further states, however, that “[f]or the DNA version of the
`
`bidirectional array, each of the 16 array FPGAs (Xl to X16) contains 24 PEs,”
`
`and “[i]n the DNA version of the unidirectional array, each of the 16 array FPGAs
`
`(Xl to Xl6) holds 14 PEs.” Id., 107 (emphasis added). Thus, Splash2 expressly
`
`states that the processing elements that implement the loop instructions for both
`
`systolic arrays are implemented in the FPGAs of the Splash2 system.
`
`III. RAPID
`
`20.
`
`Patent Owner and its expert also assert RaPiD (EX1009) does not
`
`disclose a computational loop, but instead discloses “a bypass or forwarding path.”
`
`Response, 89-90; EX2112¶¶235-40. I disagree.
`
`21. As I explained in my original declaration, RaPiD discloses an
`
`implementation of the Discrete Cosine Transform (DCT) on a systolic array of a
`
`reconfigurable computing system. EX1003¶303; EX1009, 5-7. It implements a 2-
`
`10
`
`Microsoft Corp. v. Directstream, LLC, IPR2018-01605, -01606, -01607
`Petitioner Microsoft Corp. – Ex. 1076, p. 10
`
`
`
`IPR2018-01605, -01606, -01607
`(U.S. Patent No. 7,620,800)
`
`D DCT operation as “two sequential 1-D DCTs,” EX1009, 5, which it explains can
`
`be implemented using the following two “dot product” equations:
`
`EX1003¶307. As I have explained, each of these equations calculates the
`
`dot product of a row of the first matrix (a in Eq. 5 and z in Eq. 6) and a column of
`
`the second matrix w (in both, though with difference indices). EX1003¶321.
`
`22. As I have also explained in my original declaration at ¶¶321-29, such
`
`calculations require multiple iterations, each iteration including
`
`(i) a multiplication of one element from a row of the first matrix (a or
`z) with a corresponding element from a column of the second matrix
`(w), and
`
`(ii) the addition of the product of that multiplication to the running
`sum of such products calculated during the previous iterations.
`
`11
`
`Microsoft Corp. v. Directstream, LLC, IPR2018-01605, -01606, -01607
`Petitioner Microsoft Corp. – Ex. 1076, p. 11
`
`
`
`IPR2018-01605, -01606, -01607
`(U.S. Patent No. 7,620,800)
`
`Thus, the equations set forth above calculate a running sum, with the intermediate
`
`sum output by each iteration fed back as an input to the next iteration of
`
`calculations.
`
`23.
`
`To implement the 2-D DCT, RaPiD discloses the following hardware
`
`of Figure 10 corresponding to a cell:
`
`EX1009, 111.1
`
`1 I note that a Skilled Artisan would have understood that the term “netlist” as used
`in Figure 10 to refer to the fact that the figure shows the electrical connectivity of
`the circuit realized in the RaPiD cell.
`
`12
`
`Microsoft Corp. v. Directstream, LLC, IPR2018-01605, -01606, -01607
`Petitioner Microsoft Corp. – Ex. 1076, p. 12
`
`
`
`IPR2018-01605, -01606, -01607
`(U.S. Patent No. 7,620,800)
`
`24.
`
`To illustrate how the calculations set forth above are carried out
`
`repeatedly, I highlighted in an annotated version of Figure 10 in ¶327 of my
`
`original declaration the location in the hardware where the output of the ALU (i.e.,
`
`the running sum) is looped back to the ALU input for use in the next iteration of
`
`the loop:
`
`25.
`
`Patent Owner and its expert assert that the data path I highlighted in
`
`Figure 10 above does not illustrate a computational loop, but rather is “a structure
`
`called a bypass or forwarding path.” Response, 89. I disagree. As explained
`
`above, the data path highlighted in Figure 10 loops the intermediate, or running,
`
`sum from the output of the ALU to its input so that it may be used in the next
`
`iteration of the DCT calculation. The path therefore represents the location in the
`
`13
`
`Microsoft Corp. v. Directstream, LLC, IPR2018-01605, -01606, -01607
`Petitioner Microsoft Corp. – Ex. 1076, p. 13
`
`
`
`IPR2018-01605, -01606, -01607
`(U.S. Patent No. 7,620,800)
`
`hardware of RaPiD where the running sum is fed back to an ALU input, so that it
`
`may be used in the next iteration of the computational loop.2
`
`26.
`
`That the disclosure of RaPiD constitutes a “computational loop” for
`
`purposes of the 800 Patent is beyond argument. Compare, for example, the loop of
`
`RaPiD to the computational loop incorporated-by-reference into the 800 Patent via
`
`EX2037 and discussed above. Both the RaPiD loop and the loop disclosed in
`
`EX2037 accumulate a running, or intermediate, sum by feeding back a value of the
`
`2 I also note that the Patent Owner and its expert assert that a “bypass” is not a
`“computational loop,” Response, 89-90, EX2112¶¶235-40, but do not provide
`support for that contention. The expert does cite to an earlier declaration that
`includes a Figure 4.56 the expert characterizes as including a “forwarding
`path.” Specifically, he identifies the “blue” portion of that figure as representing
`the forwarding path. See EX2029, ¶45, citing a figure from EX2043 at page 311
`(p. 334, using the exhibit page numbers). The blue portion of that figure, however,
`does not include the same loop structure that I have identified in RaPiD as a
`“computational loop.” In particular, the structure of RaPiD Figure 10 that I have
`highlighted feeds back the intermediate sum output by the ALU to the input of the
`ALU so that it can be used in the next iteration of the calculations. The blue
`portion of Figure 4.56 communicates control information through various stages to
`a “Forwarding Unit,” and then to the control inputs of two MUXes. EX2043 at
`311. Thus, the blue path of Figure 4.56 does not loop back the output of any
`structure to the input of that structure so that it can be used in the next iteration of
`calculations. RaPiD thus discloses something different from the forwarding path
`cited by Patent Owner’s expert. It discloses a “computational loop.”
`
`14
`
`Microsoft Corp. v. Directstream, LLC, IPR2018-01605, -01606, -01607
`Petitioner Microsoft Corp. – Ex. 1076, p. 14
`
`
`
`IPR2018-01605, -01606, -01607
`(U.S. Patent No. 7,620,800)
`
`sum obtained in the previous step to be used in the next step. A Skilled Artisan
`
`would therefore understand that the loop structure disclosed in RaPiD is analogous
`
`to the structure identified as a “loop” in the 800 Patent through the incorporated-
`
`by-reference paper EX2037.
`
`27.
`
`Finally, I note that during my deposition I agreed that Figure 10
`
`depicts a bypass, but I was not asked and did not specify where in the Figure that
`
`bypass is located. EX2063, 201:21-202:1. In particular, my testimony indicating
`
`that there is a bypass in Figure 10 referred to the bottom wire of Figure 10 labeled
`
`“Column 1-D DCT results flows out,” which either bypasses the cell or is
`
`terminated at the multiplexor while the cell’s DCT results are passed instead to the
`
`next cell. That bottom wire – and not the feedback path of Figure 10 I highlighted
`
`in my original declaration at ¶327 -- is the bypass I was referring to in my
`
`testimony, which I would have stated had I been asked.
`
`Dated: November 26, 2019
`
`Respectfully Submitted,
`
`(cid:3)(cid:5)(cid:4)(cid:2)(cid:7) (cid:1)(cid:6)(cid:7)
`
`15
`
`Microsoft Corp. v. Directstream, LLC, IPR2018-01605, -01606, -01607
`Petitioner Microsoft Corp. – Ex. 1076, p. 15
`
`