`
`Programmability with increased performance? New strategies to
`attain this goal include two approaches to data flow architecture:
`data flow multiprocessors and the cell block architecture.
`
`Data Flow Supercomputers
`
`Jack B. Dennis
`MIT Laboratory for Computer Science
`
`The architects of supercomputers must meet three
`challenges if the next generation of machines is to find
`productive large-scale application to the important prob-
`lems of computational physics. First, they must achieve
`high performance at acceptable cost. Instruction execu-
`tion rates of a billion floating-point operations each sec-
`ond are in demand, whereas current architectures require
`intricate programming to attain a fraction of their poten-
`tial, at best around one tenth of the goal. Brute force ap-
`proaches to increase the speed of conventional architec-
`tures have reached their limit and fail to take advantage of
`the major recent advances in semiconductor device tech-
`nology. Second, they must exploit the potential of LSI
`technology. Novel architectures are needed which use
`large numbers but only a few different types of parts, each
`with a high logic-to-pin ratio. In a supercomputer, most
`of these parts must be productive most of the time; hence
`the need to exploit concurrency of computation on a mas-
`sive scale. Third, it must be possible to program super-
`computers to exploit their performance potential. This
`has proven to be an enormous problem, even in the case of
`computations for which reasonably straightforward For-
`tran programs exist. Thus present supercomputer archi-
`tectures have exacerbated rather than resolved the soft-
`ware crisis.
`It appears that the objectives of improving program-
`mability and increasing performance are in conflict, and
`new approaches are necessary. However, any major de-
`parture from conventional architectures based on sequen-
`tial program execution requires that the whole process of
`program design, structure, and compilation be redone
`along new lines. One architecture under consideration is a
`multiprocessor machine made of hundreds of intercom-
`municating microcomputer processing elements. This
`architecture has attracted wide interest, but has many
`drawbacks; even if the processing elements had full float-
`
`ing-point capability and ran at a million instructions per
`second, at least one thousand would be required to attain
`a billion instructions per second performance. For such a
`number of processing elements there is no known way of
`permitting access to a shared memory without severe
`performance degradation. Similarly, no known way of
`arranging conventional microprocessors for synchroniza-
`tion or message passing allows efficient operation while
`exploiting fine grain parallelism in an application. And
`finally, there is no programming language or methodo-
`logy that supports mapping application codes onto such a
`multiprocessor in a way that achieves high performance.
`Language-based computer design can ensure the pro-
`grammability of a radical architecture. In a language-
`based design the computer is a hardware interpreter for a
`specific base language, and programs to be run on the sys-
`tem must be expressed in this language. I Because future
`supercomputers must support massive concurrency to
`achieve a significant increase in performance, a base lan-
`guage for supercomputers must allow expression of con-
`currency of program execution on a large scale. Since con-
`ventional languages such as Fortran are based on a global
`state model of computer operation, these languages are
`unsuitable for the next generation of supercomputers and
`will eventually be abandoned for large-scale scientific
`computation. At present, functional or applicative pro-
`gramming languages and data flow models of computa-
`tion are the only known foundation appropriate for a su-
`percomputer base language. Two programming lan-
`guages have been designed recently in response to the
`need for an applicative programming language suitable
`for scientific numerical computation: ID, developed at Ir-
`vine,2 and Val, designed at MIT.3'4
`Data flow architectures offer a possible solution to the
`problem of efficiently exploiting concurrency of compu-
`tation on a large scale, and they are compatible with
`
`48
`
`0018-9162/80/1100-0048$00.75 g 1980 IEEE
`
`COMPUTER
`
`Patent Owner Saint Regis Mohawk Tribe
`Ex. 2048, p. 1
`
`
`
`SRC00034162
`
`modern concepts of program structure. Therefore, they
`should not suffer so much from the difficulties of pro-
`gramming that *have hampered other approaches to
`highly parallel computation.
`The data flow concept is a fundamentally different way
`of looking at instruction execution in machine-level pro-
`grams-an alternative to sequential instruction execu-
`tion. In a data flow computer, an instruction is ready for
`execution when its operands have arrived. There is no
`concept of control flow, and data flow computers do not
`have program location counters. A consequence of data-
`activated instruction execution is that many instructions
`of a data flow program may be available for execution at
`once. Thus, highly concurrent computation is a natural
`consequence of the data flow concept.
`The idea of data-driven computation is old,5'6 but only
`in recent years have architectural schemes with attractive
`anticipated performance and the capability of supporting
`a general level of user language been developed. Work on
`data-driven concepts of program structure and on the
`design of practical data-driven computers is now in prog-
`ress in at least a dozen laboratories in the US and Europe.
`Several processors with data-driven instruction execution
`have been built, and more hardware projects are being
`planned. Most of this work on architectural concepts for
`data flow computation is based on a program representa-
`tion known as data flow program graphs7 which evolved
`from work of Rodriguez,8 Adams,9 and Karp and
`Miller.10 In fact, data flow computers are a form of
`language-based architecture in which program graphs are
`the base language. As shown in Figure 1, data flow pro-
`gram graphs serve as a formally specified interface be-
`tween system architecture on one hand and user program-
`ming language on the other. The architect's task is to
`define and realize a computer system that faithfully im-
`plements the formal behavior of program graphs; the
`language implementer's task is to translate source
`language programs into their equivalent as program
`graphs.
`
`The techniques used to translate source language pro-
`grams into data flow graphs I I are similar to the methods
`used in conventional optimizing compilers to analyze the
`paths of data dependency in source programs. High-level
`programming languages for data flow computation should
`be designed so it is easy for the translator to identify data
`dependence and generate program graphs that expose
`parallelism. The primary sources of difficulty are un-
`restricted transfer of control and the "side effects"
`resulting from assignment to a global variable or input ar-
`guments of a procedure. Removal of these sources of dif-
`ficulty not only makes concurrency easy to identify, it
`also improves program structure. Programs are more
`modular, and are easier to understand and verify. The im-
`plications of data flow for language designers are dis-
`cussed by Ackerman. 12
`This article presents two architectures from the variety
`of schemes devised to support computations expressed as
`data flow graphs. First we explain data flow graphs by ex-
`amples, and show how they are represented as collections
`of activity templates. Next we describe the basic instruc-
`tion-handling mechanism used in most current projects to
`
`November 1980
`
`build prototype data flow systems. Then we develop the
`two contrasting architectures and discuss the reasons for
`their differences-in particular the different approaches
`to communicating information between parts of a data
`flow machine.
`
`Data flow programs
`
`A data flow program graph is made up of actors con-
`nected by arcs. One kind of actor is the operator shown in
`Figure 2, drawn as a circle with a function symbol written
`inside-in this case +, indicating addition. An operator
`also has input arcs and output arcs which carry tokens
`bearing values. The arcs define paths over which values
`from one actor are conveyed by tokens to other actors.
`Tokens are placed on and removed from the arcs of a pro-
`gram graph according to firing rules, which are illustrated
`for an operator in Figure 3. To be enabled, tokens must be
`present on each input arc, and there must be no token on
`any output arc of the actor. Any enabled actor may be
`fired. In the case of an operator, this means removing one
`token from each input arc, applying the specified func-
`tion to the values carried by those tokens, and placing
`tokens labeled with the result value on the output arcs.
`
`Figure 1. Program graphs as a base language.
`
`Figure 2. Data flow actor.
`
`Figure 3. Firing rule: (a) before; (b) after.
`
`49
`
`Patent Owner Saint Regis Mohawk Tribe
`Ex. 2048, p. 2
`
`
`
`SRC00034163
`
`Figure 4. Interconnection of operators.
`
`Figure 5. An activity template.
`
`Operators may be connected as shown in Figure 4 to
`form program graphs. Here, presenting tokens bearing
`values for x and y at the two inputs will enable computa-
`tion of the value
`
`z = (x+y) * (x-y)
`
`by the program graph, placing a token carrying the result
`value on output arc z.
`Another representation for data flow programs-one
`much closer to the machine language used in prototype
`data flow computers-is useful in understanding the
`working of these machines. In this scheme, a data flow
`program is a collection of activity templates, each cor-
`responding to one or more actors of a data flow program
`graph. An activity template corresponding to the plus
`operator (Figure 2) is shown in Figure 5. There are four
`fields: an operation code specifying the operation to be
`performed; two receivers, which are places waiting to be
`filled in with operand values; and destination fields (in
`this case one), which specify what is to be done with the
`result of the operation on the operands.
`An instruction of a data flow program is the fixed por-
`tion of an activity template. It consists of the operation
`code and the destinations; that is,
`
`instruction:
`< opcode, destinations>
`
`Figure 6 shows how activity templates are joined to repre-
`sent a program graph, specifically the composition of op-
`erators in Figure 4. Each destination field specifies a tar-
`get receiver by giving the address of some activity tem-
`plate and an input integer specifying which receiver of the
`template is the target; that is,
`
`destination:
`<address, input>
`
`Figure 6. Configuration of activity templates for the pro-
`gram graph of Figure 4.
`
`Program structures for conditionals and iteration are
`illustrated in Figures 7 and 8. These use two new data flow
`actors, switch and merge, which control the routing ofda-
`ta values. The switch actor sends a data input to its T or F
`output to match a true or false boolean control input. The
`merge actor forwards a data value from its T or F input ac-
`cording to its boolean input value. The conditional pro-
`gram graph and implementation in Figure 7 represent
`computation of
`
`y: =(IFx>3 THENx+2 ELSEx-I)*4
`
`and the program graph and implementation in Figure 8
`represent the iterative computation
`
`WHILE x>O DO = x-3
`
`Execution of a machine program consisting of activity
`templates is viewed as follows. The contents of a template
`activated by the presence of an operand value in each re-
`ceiver take the form
`
`operation packet:
`<opcode, operands, destinations>
`
`Such a packet specifies one result packet of the form
`
`result packet:
`<value, destination>
`
`for each destination field of the template. Generation of a
`result packet, in turn, causes the value to be placed in the
`receiver designated by its destination field.
`Note that this view of data flow computation does not
`explicitly honor the rule of program graphs that tokens
`must be absent from the output arcs of an actor for it to
`fire. Yet there are situations where it is attractive to use a
`program graph in pipelined fashion, as illustrated in Fig-
`ure 9a. Here, one computation by the graph has produced
`the value 6 on arc z while a new computation represented
`
`COMPUTER
`
`50
`
`Patent Owner Saint Regis Mohawk Tribe
`Ex. 2048, p. 3
`
`
`
`SRC00034164
`
`r e
`
`lee
`
`3]
`
`Figure 7. A conditional schema (a) and its implementation (b).
`
`Figure 8. An iterative schema (a) and its implementation (b).
`
`Figure 9. Pipelining in a data flow program (a) and its implementation (b).
`
`November 1980
`
`51
`
`Patent Owner Saint Regis Mohawk Tribe
`Ex. 2048, p. 4
`
`
`
`SRC00034165
`
`OPERATION
`UNIT(S)
`
`INSTRUCTION
`QUEUE
`
`ACTIVITY
`STORE
`
`UPDATE
`
`by input values 5 and 3 on arcs xandy is ready to begin. To
`faithfully implement this computation, the add instruc-
`tion must not be reactivated until its previous result has
`been used by the multiply instruction. This constraint is
`enforced through use of acknowledge signals generated
`by specially marked designations (*) in an activity tem-
`plate. Acknowledge signals, in general, are sent to the
`templates that supply operand values to the activity tem-
`plate in question (Figure 9b). The enabling rule now re-
`quires that all receivers contain values, and the required
`number of acknowledge signals have been received. This
`number (if nonzero) is written adjacent to the opcode of
`an activity template.
`
`The basic mechanism
`
`The basic instruction execution mechanism used in sev-
`eral current data flow projects is illustrated in Figure 10.
`The data flow program describing the computation to be
`performed is held as a collection of activity templates in
`the activity store. Each activity template has a unique ad-
`dress which is entered in the instruction queue unit (a
`FIFO buffer store) when the instruction is ready for exe-
`cution. The fetch unit takes an instruction address from
`the instruction queue and reads the activity template from
`the activity store, forms it into an operation packet, and
`passes it on to the operation unit. The operation unit per-
`forms the operation specified by the operation code on
`the operand values, generating one result packet for each
`destinatiorn field of the operation packet. The update unit
`receives result packets and enters the values they carry in-
`to operand fields of activity templates as specified by their
`destination fields. The update unit also tests whether all
`operand and acknowledge packets required to activate
`
`Figure 10. Basic instruction execution mechanism.
`
`52
`
`the destination instruction have been received and, if so,
`enters the instruction address in the instruction queue.
`During program execution, the number of entries in the
`instruction queue measures the degree of concurrency
`present in the program. The basic mechanism of Figure 10
`can exploit this potential to a limited but significant de-
`gree: once the fetch unit has sent an operation packet off
`to the operation unit, it may immediately read another en-
`try from the instruction queue without waiting for the in-
`struction previously fetched to be completely processed.
`Thus a continuous stream of operation packets may flow
`from the fetch unit to the operation unit so long as the in-
`struction queue is not empty.
`This mechanism is aptly called a circular pipeline-ac-
`tivity controlled by the flow of information packets tra-
`verses the ring of units leftwise. A number of packets may
`be flowing simultaneously in different parts of the ring on
`behalf of different instructions in concurrent execution.
`Thus the ring operates as a pipeline system with all of its
`units actively processing packets at once. The degree of
`concurrency possible is limited by the number of units on
`the ring and the degree of pipelining within each unit. Ad-
`ditional concurrency may be exploited by splitting any
`unit in the ring into several units which can be allocated to
`concurrent activities. Ultimately, the level of concurrency
`is limited by the capacity of the data paths connecting the
`units of the ring. This basic mechanism is essentially that
`implemented in a prototype data flow processing element
`built by a group at the Texas Instruments Company.13
`The same mechanism, elaborated to handle data flow
`procedures, was described earlier by Rumbaugh, 14 and a
`new project at Manchester University uses another varia-
`tion of the same scheme.'5
`
`The data flow multiprocessor
`
`The level of concurrency exploited may be increased
`enormously by connecting many processing elements of
`the form we have described to form a data flow multipro-
`cessor system. Figure 1 la shows many processing ele-
`ments connected through a communication system, and
`Figure 1 lb shows how each processing element relates to
`the communication system. The data flow program is di-
`vided into parts which are distributed over the processing
`elements. The activity stores of the processing elements
`collectively realize a single large address space, so the ad.
`dress field of a destination may select uniquely any activi-
`ty template in the system. Each processing element sends a
`result packet through the communication network if its
`destination address specifies a nonlocal activity template,
`and to its own update unit otherwise.
`The communication network is responsible for deliver-
`ing each result packet received to the processing element
`that holds the target activity template. This network,
`called a routing network, transmits each packet arriving
`at an input port to the output specified by information
`contained in the packet. The requirements of a routing
`network for a data flow multiprocessor differ in two im-
`portant ways from those of a processor/memory switch
`for a conventional multiprocessor system. First, informa-
`tion flow in a routing network is in one direction-an im-
`
`COMPUTER
`
`Patent Owner Saint Regis Mohawk Tribe
`Ex. 2048, p. 5
`
`
`
`SRC00034166
`
`COMMUNI-
`CATION
`NETWORK
`SYSTEM:
`
`CATION
`SYSTEM
`
`ACTIVITY
`STORE
`
`mediate reply from the target unit to the originating unit is
`not required. Second, since each processing element holds
`many enabled instructions ready for processing, some
`delay can be tolerated in transmission of result packets
`without slowing down the overall rate of computation.
`The crossbar switch in conventional multiprocessor
`systems meets requirements for immediate response and
`small delay by providing for signal paths from any input
`to any output. These paths are established on request and
`maintained until a reply completes a processor/memory
`transaction. This arrangement is needlessly expensive for
`a data flow multiprocessor, and a number of alternative
`network structures have been proposed. The ring form of
`communication network is used in many computer net-
`works, and has been used by Texas Instruments to couple
`four processing elements in their prototype data flow
`computer. The drawback of the ring is that delay grows
`linearly with size, and there is a fixed bound on capacity.
`Several groups have proposed tree-structured networks
`for communicating among processing elements.16"17"18
`Here, the drawback is that traffic density at the root node
`may be unacceptably high. Advantages ofthe tree are that
`the worst case distance between leaves grows only as log2 N
`(for a binary tree), and many pairs of nodes are connected
`by short paths.
`The packet routing network shown in Figure 12 is a
`structure currently attracting much attention. A routing
`network with N input and N output ports may be as-
`sembled from (N/2) log2(N) units, each of which is a 2 x 2
`router. A 2 x 2 router receives packets at two input ports
`and transmits each received packet at one of its output
`ports according to an address bit contained in the packet.
`Packets are handled first come, first served, and both out-
`put ports may be active concurrently. Delay through an
`Nx N network increases as 1og2 N, and capacity rises
`nearly linearly with N. This form of routing network is de-
`scribed in Leung'9 and Tripathi and Lipovski.20 Several
`related structures have been analyzed for capacity and de-
`lay.21
`
`The cell block architecture
`
`In a data flow multiprocessor (Figure 11), we noted the
`problem of partitioning the instructions of a program
`among the processing elements to concentrate communi-
`cation among instructions held in the same processing ele-
`ment. This is advantageous because the time to transport
`a result packet to a nonlocal processor through the rout-
`ing network will be longer (perhaps much longer) than the
`time to forward a result locally.
`At MIT, an architecture has been proposed in response
`to an opposing view: each instruction is equally accessible
`to result packets generated by any other instruction,
`regardless of where they reside in the machine.22'23 The
`structure of this machine is shown in Figure 13. The heart
`of this architecture is a large set of instruction cells, each
`of which holds one activity template of a data flow pro-
`gram. Result packets arrive at instruction cells from the
`distribution network. Each instruction cell sends an op-
`eration packet to the arbitration network when all oper-
`ands and signals have been received. The function of the
`
`November 1980
`
`Figure 11. Data flow multiprocessor: (a) connection of many process-
`ing elements through a communication system; (b) relationship of
`each PE to the communication system.
`
`Figure 12. Routing network structure.
`
`53
`
`Patent Owner Saint Regis Mohawk Tribe
`Ex. 2048, p. 6
`
`
`
`SRC00034167
`
`DISTRI-
`BUTION
`NETWORK
`
`INSTRUCTION
`fies
`
`OPERATION
`SECTION
`
`DISTRI-
`BUTION
`NETWORK
`
`CELL BLOCKS
`
`ARBI-
`TRATION
`NETWORK
`
`ARBI-
`TRATION
`NETWORK
`
`UPDATE
`
`ACTIVITY
`STORE
`
`Figure 13. Genesis of the cell block architecture.
`
`Figure 14. Practical form of the cell block architecture.
`
`Figure 15. Cell block implementation.
`
`operation section is to execute instructions and to for-
`ward result packets to target instructions by way of the
`distribution network.
`The design in Figure 13 is impractical if the instruction
`cells are fabricated as individual physical units, since the
`number of devices and interconnections would be enor-
`mous. A more attractive structure is obtained if the in-
`struction cells are grouped into blocks and each block re-
`alized as a single device. Such an instruction cell block has
`a single input port for result packets and a single output
`port for operation packets. Thus one cell block unit re-
`places many instruction cells and the associated portion
`of the distribution network. Moreover, a byte-serial for-
`mat for result and operation packets further reduces the
`number of interconnections between cell blocks and other
`units.
`
`COMPUTER
`
`Patent Owner Saint Regis Mohawk Tribe
`Ex. 2048, p. 7
`
`
`
`SRC00034168
`
`. . . . . . . .
`
`. . . . . .
`
`. . . . . .
`
`. . . . . . . . .
`
`. . . . . .
`
`. . .
`
`. . . .
`
`1,895
`2,195
`
`TRANSNET corporation
`201-688-7800
`T W X 7 1 0 - 9 8 5 - 5 4 8 5
`
`P C E
`t a r y
`UNION, N.J. 07083
`
`.
`
`The resulting structure is shown in Figure 14. Here, sev-
`eral cell blocks are served by a shared group of functional
`, Pk. The arbitration network in each section
`units Pi, .
`.
`of the machine passes each operation packet to the appro-
`priate functional unit according to its opcode. The num-
`ber of functional unit types in such a machine is likely to
`be small (four, for example), or just one universal func-
`tional unit type might be provided, in which case the arbi-
`tration network becomes trivial.
`The relationship between the cell block architecture
`and the basic mechanism described earlier becomes clear
`when one considers how a cell block unit would be con-
`structed. As shown in Figure 15, a cell block would in-
`clude storage for activity templates, a buffer store for ad-
`dresses of enabled instructions, and control units to re-
`ceive result packets and transmit operation packets.
`These control units are functionally equivalent to the
`fetch and update units of the basic mechanism. The cell
`block differs from the basic data flow processing element
`in that the cell block contains no functional units, and
`there is no shortcut for result packets destined for succes-
`sor instructions held in the same cell block.
`
`Discussion and conclusions
`
`In the cell block architecture, communication of a re-
`sult packet from one instruction to its successor is equally
`easy (or equally difficult, depending on your point of
`view) regardless of how the two instructions are placed
`within the entire activity store of the machine. Thus the
`programmer need not be concerned that his program
`might run slowly due to an unfortunate distribution of in-
`structions in the activity store address space. In fact, a
`random allocation of instructions may prove to be ade-
`quate.
`In the data flow multiprocessor, communication be-
`tween two instructions is much quicker if these instruc-
`tions are allocated to the same processing element. Thus a
`program may run much faster if its instructions are clus-
`tered to minimize communication traffic between clusters
`and each cluster is allocated to one processing element.
`Since it will be handling significantly less packet traffic,
`the communication network of the data flow multipro-
`cessor will be simpler and less expensive than the distribu-
`tion network in the cell block architecture. Whether the
`cost reduction justifies the additional programming ef-
`fort is a matter of debate, contingent on the area of appli-
`cation, the technology of fabrication, and the time frame
`under consideration.
`Although the routing networks in the two forms of data
`flow processor have a much more favorable growth of
`logic complexity (N log N) with increasing size than the
`switching networks of conventional multiprocessor sys-
`tems, their growth is still more than linear. Moreover, in
`all suggested physical structures for N x N routing net-
`works, the complexity as measured- by total wire length
`grows as O(N2). This fact shows that interconnection
`complexity still places limits on the size of practical multi-
`unit systems which support universal intercommunica-
`tion. If we need still larger systems, it appears we must set-
`tle for arrangements of units that only support com-
`
`November 1980
`
`Reader Service Number 9
`
`munication with immediate neighbors.
`The advant-age data flow architectures have over other
`approaches to high-performance computation is that the
`scheduling and synchronization of concurrent activities
`are built in at the hardware level, enabling each instruc-
`tion execution to be treated as an independent concurrent
`action. This allows efficient fine grain parallelism, which
`is precluded when the synchronization and scheduling
`functions are realized in software or microcode. Further-
`more, there are well-defined rules for translating high-
`level programs into data flow machine code.
`What are the prospects for data flow supercomputers?
`Machines based on either of the two architectures pre-
`sented in this paper could be built today. A machine hav-
`ing up to 512 processing elements or cell blocks seems fea-
`sible. For example, a 4 x 4 router for packets, each sent as
`a series of 8-bit bytes, could be fabricated as a 100-pin LSI
`device, and fewer than one thousand of these devices
`could interconnect 512 processing elements or cell blocks.
`If each processing unit could operate at two million in-
`structions per second, the goal of a billion instructions per
`second would be achieved.
`Yet there are problems to be solved and issues to be ad-
`dressed. It is difficult to see how data flow computers
`could support programs written in Fortran without re-
`strictions on and careful tailoring of the code. Study is
`just beginning on applicative languages like Val and
`ID.24,25 These promise solutions to the problems of map-
`
`1
`
`IIW
`IRlbX1tl
`ly;U-1%
`PURCHASE 12-24 MONTH FULL1 36 MONTH
`OWNERSHIP PLAN I LEASE PLAN
`PLAN
`PURCHASE
`PER MONTH
`36 MOS.
`24 MOS.
`12 MOS.
`PRICE
`DESCRIPTION
`$ 61
`......... $1,695
`LA36 DECwriter
`$162
`$ 90
`LA34 DECwriter IV ........: .1,095
`40
`59
`105
`1,295
`LA34 DECwriter IV Forms CtrI.
`124
`47
`69
`239
`90
`2,495
`LA120 DECwriter IlIl KSR
`140
`LA180 DECprinter I. 2,095
`75
`200
`117
`VT100 CRT DECscope
`68
`1,895
`101
`182
`VT132 CRT DECscope
`83
`122
`220
`2,295
`DT80/1 DATAMEDIA CRT
`
`1 995
`
`191
`
`106
`
`72
`
`...
`
`1,595
`153
`T1745 Portable Terminal
`2,595
`249
`T1765 Bubble Memory Terminal
`T1810 RO Printer .1,895 182
`T1820 KSR Printer .2,195 210
`1,595
`153
`T1825 KSR Printer
`
`85
`146
`101
`117
`85
`
`57
`94
`68
`79
`57
`
`.
`
`.
`
`84
`139
`210
`316
`278
`
`91
`
`47
`78
`117
`176
`155
`
`51
`
`32
`53
`79
`119
`105
`
`34
`43
`47
`54
`96
`
`ADM3A CRT Terminal
`875
`ADM31 CRT Terminal . 1,450
`!
`2,195
`ADM42 CRT Terminal
`3,295
`QUME Letter Quality KSR
`QUME Letter Quality RO
`2,895
`HAZELTINE 1420 CRT
`945
`1,195
`64
`115
`HAZELTINE 1500 CRT
`1,295
`69
`124
`HAZELTINE 1552 CRT
`1,495
`80
`144
`Hewlett-Packard 2621A CRT
`2,650
`254
`Hewlett-Packard 2621P CRT
`142
`FULL OWNERSHIP AFTER 12 OR 24 MONTHS
`10% PURCHASE OPTION AFTER 36 MONTHS
`ACCESSORIES AND PERIPHERAL EQUIPMENT
`ACOUSTIC COUPLERS * MODEMS * THERMAL PAPER
`RIBBONS * INTERFACE MODULES * FLOPPY DISK UNITS
`PROMPT DELIVERY * EFFICIENT SERVICE
`
`Patent Owner Saint Regis Mohawk Tribe
`Ex. 2048, p. 8
`
`
`
`SRC00034169
`
`56
`
`ping high-level programs into machine-level programs
`that effectively utilize machine resources, but much re-
`mains to be done. Creative research is needed to handle
`data structures in a manner consistent with principles of
`data flow computation. These are among the problems
`under study in our data flow project at MIT. E
`
`Acknowledgment
`
`This paper is based on research supported by the
`Lawrence Livermore National Laboratory of the Univer-
`sity of California under contract 8545403.
`
`References
`
`I .
`
`2.
`
`4.
`
`5.
`
`7.
`
`8.
`
`9.
`
`J.B. Dennis, "On the Design and Specification of a Com-
`mon Base Language," Proc. Symp. Computers and Auto-
`mata, Polytechnic Press, Polytechnic Institute of Brook-
`lyn, Apr. 1971, pp. 47-74.
`Arvind, K.P. Gostelow, and W. Plouffe, An Asynchro-
`nous Programming Language and Computing Machine,
`Dept. of Information and Computer Science, University of
`California, Irvine, Technical Report 114a, Dec. 1978, 97
`PP.
`3. W.B. Ackerman and J.B. Dennis, VAL:A Value Oriented
`Algorithmic Language, Preliminary Reference Manual,
`Laboratory for Computer Science, MIT, Technical Report
`TR-218, June 1979, 80 pp.
`J.R. McGraw, Data Flow Computing: The VAL Lan-
`guage, submitted for publication.
`R.R. Seeber and A.B. Lindquist, "Associative Logic for
`Highly Parallel Systems," AFIPS Conf. Proc, 1963, pp.
`489-493.
`6. R.M. Shapiro, H. Saint, and D.L. Presberg, Representa-
`tion of Algorithms as Cyclic Partial Orderings, Applied
`Data Research, Wakefield, Mass., Report CA-7112-2711,
`Dec. 1971.
`J.B. Dennis, "First Version of a Data Flow Procedure
`Language," Lecture Notes in Computer Sci., Vol. 19,
`Springer-Verlag, 1974, pp. 362-376.
`J.E. Rodriguez, A Graph Model for Parallel Computa-
`tion, Laboratory for Computer Science, MIT, Technical
`Report TR-64, Sept. 1969, 120 pp.
`D.A. Adams, A Computation Model With Data Flow Se-
`quencing, Cbmputer Science Dept., School of Humanities
`and Sciences, Stanford University, Technical Report CS
`117, Dec. 1968, 130 pp.
`10. R.M. Karp and R.E. Miller, "Properties of a Model for
`Parallel
`Termination,
`Determinacy,
`Computations:
`Queueing," SIAM J. Applied Math., Vol. 14, Nov. 1966,
`pp. 1390-1411.
`J.D. Brock and L.B. Montz, "Translation and Optimiza-
`tion of Data Flow Programs," Proc. 1979 Int'l Conf. on
`Parallel Processing, Bellaire, Mich., Aug. 1979, pp. 46-54.
`12. W.B. Ackerman, "Data Flow Languages," AFIPS Conf.
`Proc., Vol. 48, 1979 NCC, New York, June 1979, pp.
`1087-1095.
`13. M. Cornish, private communication, Texas Instruments
`Corp., Austin, Tex.
`J.E. Rumbaugh, "A Data Flow Multiprocessor," IEEE
`Trans. Computers, Vol. C-26, No. 2, Feb. 1977, pp.
`138-146.
`
`11.
`
`14.
`
`15.
`
`16.
`
`17.
`
`I. Watson and J. Gurd, "A Prototype Data Flow Com-
`puter With Token Labelling," AFIPS Conf. Proc., 1979
`NCC, New York, June 1979, pp. 623-628.
`A. Davis, "A Data Flow Evaluation System Based on the
`Concept of Recursive Locality," AFIPS Conf. Proc., Vol.
`48, 1979 NCC, New York, June 1979, pp. 1079-1086.
`A. Despain and D. Patterson, "X-Tree: A Tree Structured
`Multi-Processor Computer Architecture, " Proc. Fifth An-
`nual Symp. Computer Architecture, Apr. 1978, pp.
`144-150.
`18. R.M. Keller, G. Lindstrom, and S.S. Patil, "A Loosely-
`Coupled Applicative Multi-processing System," AFIPS
`Conf. Proc., 1979 NCC, New York, June 1979, pp.
`613-622.
`C. Leung, On a Design Methodology for Packet Com-
`munication Architectures Based on a Hardware Design
`Language, submitted for publication.
`A.R. Tripathi and G.J. Lopovski, "Packet Switching in
`Banyan Networks," Proc. Sixth AnnualSymp. Computer
`Architecture, Apr. 1979, pp. 160-167.
`21. G.A. Boughton, Routing Networks in Packet Com-
`munication Architectures, MS Thesis, Dept. of Electrical
`Engineering