`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2176, p. 1
`
`
`
`18 A. LEw AND R. HALVERSON, JR. programming problems can be solved on an FPGA-based class of computers. 1. INTRODUCTION Dynamic programming has long had a reputation as a general-purpose methodology for solving optimization problems, but one that is not as useful in practice as it is in principle. To increase its practicality, many efforts have been made to develop more efficient dynamic programming algorithms using innovative computational means, including the design of parallel processing al- gorithms and computers. Other work on the design of parallel algorithms is often constrained by the adoption of particular hardware as target machines (e.g., array processors or hypercubes), and other work on the design of parallel machines is often limited by the choice of particular applications (such as partial differential equations or image processing). We report on our own complementary efforts to implement dynamic programming algorithms on an innovative class of parallel processors, and to improve the architectural design of such processors to make imple- mentations of dynamic programming algoritl~ms more efficient. In subsequent sections, we first discuss how an architectural successor to the HPC [1], the Functional Memory Computer (FMC) [4,5], allows implementations of deterministic algorithms without some of the artificial constraints imposed by the nature of von Neumann computers. The distinguishing characteristic of deterministic algorithms is the introduction of "loop variables" whose primary purpose is to control the sequencing of operations, such as may be required to find the minimum in a set. Using the FMC, such explicit sequencing is no longer necessary. Instead, nondeterministic constructs are made available in a decision table (DT) [6] source lan- guage, and expression-level computations are executed in dataflow fashion using FPGAs. We have built a prototype of the FMC incorporating several FPGA chips, and used it to execute a nondeterministic bubble sorting algorithm. This example demonstrates the viability of nonde- terministic programming on the FMC. We conclude with a discussion of how nondeterministic dynamic programming (DP) algorithms can be expressed and executed on the FMC. 2. DETERMINISTIC DP ON THE FMC To introduce the basic ideas, we consider first the following procedure which finds the shortest path length in an acyclic graph using dynamic programming [7, p. 360]: procedure f indshortestpath(var shpathlength: integer) ; var i,j ,k:integer; min : dat atype ; ptr : integer ; dtpbegin ! lambda= 01111 i>O -TTTF j>n -TFF- d[i,j]+f [j] <min --TF- f In] : =0 X .... f [i] :-rain -X--- t [i] : =ptr [ -X--- rain: --maxint ) XX--- min:=d[i,j]+f [j] --X-- pit : =n+ 1 XX--- ptr : -j --X-- i :-n-1 X .... £:=i-I -X--- j : -i+ 1 XX--- j : -j +1 --XX- shpathleng~h: -f [i] .... X repeat ! XXXX- exit ! .... X dtpend )
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2176, p. 2
`
`
`
`Dynamic Programming 19 The DT conventions adopted here (e.g., that lambda is initially set to zero by the system) are as described in [1]. Using the HPC [4], this procedure can be executed more quickly by performing the condition tests within FPGAs in parallel, comparing different conditions (e.g., i>0 and j>n) or different rules (columns) concurrently. Using the FMC [5], the calculations in the stubs can be performed in parallel in FPGAs using combinational logic, resulting in even greater speedup. For the shortest path problem in particular, a speedup of up to 38% has been achieved using our prototype machine. 3. NONDETERMINISTIC DECISION TABLES The dynamic programming algorithm given above was designed implicitly for execution on a sequential (von Neumann) computer. The variables i and j are loop variables that specify a deterministic sequence of operations. The i {=n-1 dot, into 1} loop is associated with the dynamic programming Mgorithm itself, which computes the shortest distance from the i th node to the "target" node n in a sequence that guarantees values are computed before they are needed for later computations (assuming the graph has been topologically sorted). On the other hand, the j {--i+l to n} loop is associated with finding the minimum in a set, where the values in the set are tested in a specific sequential order. However, if unconstrained by the usual characteristics of von Neumann computers, the testing for a set minimum can be performed nondeterministically. (We use the term nondeterminism in a general sense, which includes the parallel case; nondeterministic operations can be performed in a sequence in any order, or in parallel in concurrent or interleaved fashion.) We conclude that one key to speeding up the dynamic programming process is to replace the sequencing associated with loop variables by nondeterministic set operations whenever possible. We have extended our decision table language to add such nondeterministic or parallel processing constructs. As an illustration, consider the following nondeterministic bubble sort algorithm, in which an exchange operation takes place whenever an adjacent pair of array elements is found in the wrong order: {j in 1..n-lla[j]>a[j+l]} <> 0 IFT ................................. +-- exchange (a [j ] , a [ j + 1 ] ) ~ -X exit ! ~ X- (Note: 0 denotes the empty set here.) This procedure should be compared with the usual bubble sort algorithm that uses two loops, one within the other, the two distinct loop variables specifying a deterministic sequence of comparisons and potential exchanges. However, in the DT given here, "in" is a nondeterministic set selection operation, which is used so that comparisons (and related subscripting calculations) can be performed concurrently. Applied to the shortest path prob- lem, the nondeterministic set selection operation may be used to implement a nondeterministic Find-the-Minimum-In-a-Set operation. Note that if {j in 1..nla [j] <rain}--0, then rain is the minimum in set a. In practice, sets may be implemented using arrays, but a nondeterministic FindMinInSet operation would still be distinguished by the absence of a loop variable and its associated sequencing. 4. NONDETERMINISTIC DYNAMIC PROGRAMMING A nondeterministic version of the shortest path dynamic programming Mgorithm can be based upon a SetUp procedure that creates a set of values whose minimum is to be found, and a FindMinInSet procedure that finds the minimum of this SetUp set. Specifically, we need two procedures, SetUp (g, a,.. ) to produce a set {g [b] --- d (a, b) +f ct (b,..) } for all b, and Find- MinInSet (g ,min,pos) to find the minimum in the set {g [b] } (and the position of the minimum value). We discussed in the previous section how the FindMinInSet operation can be implemented
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2176, p. 3
`
`
`
`20 A. LEW AND R. HALVERSON, JR. nondeterministically. The remaining problem of implementing SetUp nondeterministically as well will be discussed in the next section. Recall that to find the shortest path in an acyclic graph using dynamic programming, we may recursively solve the DP functional equation fct(a)=min {d(a,b)+fct(b)}, b with boundary condition fct (TGT)=0, using FindM±nInSet to find the minimum in the set g[b] = d (a,b)+fct (b), for b=t.. n. We remark that if d(a, b) is set to "infinity" (MAXINT) when there is no branch from a to b, then the nodes need not be topologically sorted. A program using these ideas, which assumes arrays are used to represent sets, follows: program dtspag; const n=...; /* NO. OF NODES */ type settyp=array[l..n] of integer; var d:array[1..n] of array [l . . n] of integer; function FCT(src ,tgt : integer ; vat next : integer) : integer; procedure SetUp(var g:settyp; a:integer) ; begin /* produce a set {g[b]=d[a,b]+FCT(b,tgt,nx)} for all b */ end; vat min: integer; pos : integer ; g: settyp; dtpbegin ! srcftgt [ TF ........................... +-- min : =0 ; poe : =tgt [ X- SetUp(g,src) [-X FindMinInSet (g, min, poe) [ -X FCT : --min; next : =pos [XX exit ! [ XX dtpend ! {end FCT} var min:integer; temp : integer ; begin ... {readdata(n,d) ;} min: =FCT ( 1, n, temp) ; • . . {writeln( ' sp path length=',min) ; } end. (Note: FCT is a recursive function, calling itself within SetUp.) Nondeterministic versions of dynamic programming algorithms, such as those for finding the shortest path in a cyclic graph and for finding the shortest cyclic path traversing all nodes once (the traveling-salesman problem), can also be implemented using the same FindMinInSet operation. 5. IMPLEMENTATION CONSIDERATIONS A variety of different ways to solve dynamic programming problems using nondeterministic or parallel computation have been suggested; for example, see [8]. For the most part, these other approaches require relatively expensive parallel computer hardware and are intended for large-scale "massive" problems• We have a long-term interest in these problems as well, but here we focus our attention instead on small-scale "ordinary" albeit real-world problems and on the development of nondeterministic dynamic programming algorithms for execution on PC- class computers, extended perhaps by the addition of coprocessing boards. Our FMC is such a computer.
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2176, p. 4
`
`
`
`Dynamic Programming 21 In the preceding section, we presented a decision table program designed for execution on the FMC that uses dynamic programming to compute the shortest path in a graph. FindMinIn- Set can be implemented nondeterministically in the fashion indicated at the end of Section 3. However, a nondeterministic implementation of SetUp is more complicated, primarily because it involves recursion. We can handle recursion easily in other contexts, by using stacks imple- mented as arrays, but this would be inefficient for the shortest path problem. As an alternative, we can implement SetUp deterministically, in an order specified by a loop variable, while still implementing FtndM±nInSet nondeterministically. This alternative is doable today. However, the nondeterministic shortest path algorithm so implemented would not be useful in practice. In any event, there are other problems that affect the practicality of our approach. One re- striction is that, for even small problems, the arrays (especially d) may not fit entirely within a single FPGA, so set operations must be implemented across interconnected FPGAs. Multiple chip configurations would require at least an operand-in/operand-out path on each chip inter- connecting it to its neighbors. Furthermore, for large dimensional problems, the arrays can no longer fit within all the available FPGAs, s~ set operations cannot be implemented wholly in combinational logic. One way to handle such problems is by using array sequencing instead of nondeterministic set operations. Use of the FMC in some cases, or of the HPC in others, is still possible, but without many of their benefits. We will discuss alternatives to this in a subsequent paper. One of our main research objectives is to explore possible ways to take advantage of repro- grammable combinational logic, say, to implement a library of nondeterministic functions, of which FindMinInSet is an example. Of course, in the future, we expect larger FPGA chips, and perhaps newly designed ones that are better suited for (and possibly motivated by) FM architec- tures, to become available, which should mitigate some of the dimensionality problems we have mentioned here. We are also examining how to partition problems in a fashion that makes good use of a given set of unconnected (or minimally connected) FPGAs. For set selection, a divide-and-conquer procedure should be helpful [9]. For dynamic programming, some of the partitioning approaches proposed long ago (e.g., "tearing" [10]), which were of limited usefulness on von Neumann sys- tems, may prove useful with this new FMC architecture. These are only a few of the many directions that appear to be worth pursuing. 6. CONCLUSION We have implemented a working prototype of our innovative FMC system, and intend to investigate ways to improve it and to extend its usefulness, specifically to execute nondeterministic dynamic programming algorithms as introduced here. The benefits of the FMC are only now beginning to be explored. In the short term, we expect there to be special-purpose applications for which the building of a customized computing machine would be cost-effective; examples of such applications include routing problems using dynamic programming (perhaps for an airline reservation system), Petri net modeling and simulation, and expert system design. In the long term, there is the potential that our architectural design concepts can be scaled up and lead to a new generation of general-purpose computer systems useful for some grander and more challenging problems. REFERENCES 1. A. Lew and R. Halverson, Jr., Dynamic programming, decision tables, and the Hawaii parallel computer, In Proc. 5 th Bellman Continuum, (1993); Computers Math. Applic. 2"I (9/10), 121-127 (1994). 2. R. Bellman, An Introduction to Artificial Intelligence: Can Computers Think?, Boyd & Fraser, San Fran- cisco, CA, (1978). 3. S. Trimberger, A reprogrammable gate array and applications, Proceedings o/the IEEE 81 (7), 1030-1041 (1993).
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2176, p. 5
`
`
`
`22 A. LEw AND R. HALVERSON, JR. 4. R. Halverson, Jr. and A. Lew, Programming the Hawaii parallel computer, Proc. ACM ~nd Intl. Workshop on FPGAs, (1994). 5. R. Halverson, Jr. and A. Lew, Programming with functional memory, Proc. ~3 rd Intl. Conf. on Parallel Processing, (1994). 6. CODASYL Task Group, A Modern Appraisal of Decision Tables, ACM, New York, (1970). 7. A. Lew, Computer Science: A Mathematical Introduction, Prentice-Hall International, London, (1985). 8. D.P. Bertsekas and J.N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, Prentice-Hall, Englewood Cliffs, N J, (1989). 9. A.V. Aho, J.E. Hopcroft and J.D. Ullman, The Design and Analysis of Computer Algorithms, Addi- son-Wesley, Reading, MA, (1974). 10. D.V. Steward, Partitioning and tearing of systems of equations, SIAM J. Numer. Anal. Ser. B 2, 345-365 (1965).
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2176, p. 6
`
`