Associative and Parallel Processors

KENNETH J. THURBER

Product Development Group, Sperry Univac Defense Systems, St. Paul, Minnesota 55165

Computer, Information and Control Sciences Department, University of Minnesota,
Minneapolis, Minnesota 55455

LEON D. WALD

Test Equipment Engineering, Government and Aeronautical Products Division, Honeywell, Inc.,
Minneapolis, Minnesota
This paper is a tutorial survey of the area of parallel and associative processors. The paper covers the main design tradeoffs and major architectures of SIMD (Single Instruction Stream Multiple Data Stream) systems. Summaries of ILLIAC IV, STARAN, OMEN, and PEPE, the major SIMD processors, are included.

Keywords and Phrases: associative processor, parallel processor, OMEN, STARAN, PEPE, ILLIAC IV, architecture, large-scale systems, SIMD processors, array processors, ensemble.

CR Categories: 3.80, 4.20, 6.22
Copyright © 1976, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted, provided that ACM's copyright notice is given and that reference is made to this publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery.

CONTENTS

INTRODUCTION
  Purpose and Scope
  Surveys
  Classification of Computer Architectures
  Definitions
  Reasons for Use of SIMD Processors
  Applications
  Matrix Multiplication on a Parallel Processor
SIMD GENEALOGY
MAIN TRADEOFF ISSUES
  Associative Addressing
  The State Transition Concept and Associative Processor Algorithms
  Processing Element (PE) Interconnections
  Input/Output
  Host Interaction
  Memory Distribution
  Software
  Activity Control and Match Resolution
  Balance Between Logic and Memory
ASSOCIATIVE PROCESSORS
  Introduction
  DLM (Distributed Logic Memory)
  Highly Parallel Associative Processors
  Mixed Mode and Multidimensional Memories
  Implemented Associative Processors
  Associative Processor Simulators
PARALLEL PROCESSORS AND ENSEMBLES
  Introduction
  Unger's Machine
  SOLOMON I, SOLOMON II, and ILLIAC IV
  Machines With Other Interconnection Structures
  Orthogonal Computer
  PEPE
  Comment
ACTUAL MACHINES
  Introduction
  STARAN
  OMEN
  PEPE
  ILLIAC IV
CONCLUSION
REFERENCES

INTRODUCTION

Purpose and Scope

The purpose of this paper is to present a tutorial survey on the subject of parallel and associative processors. The paper covers the topics of system categorizations, applications, main tradeoff issues, historically important architectures, and the architectures of systems that are currently available.

Currently, the microprocessor/computer-on-a-chip revolution is providing the potential for production of very cost-effective high-performance computers through utilization of a large number of these processors in parallel or in a network. The parallel and associative processors provide a class of architectures which can be readily used to take immediate advantage of the microprocessor technology.

Surveys

A number of good surveys on the subject (or bordering on the subject) of parallel and associative processors have been published [1-10]. Some of these surveys are only of historical interest due to their age. Several conferences in this area may also be of interest [11-14].

Classification of Computer Architectures

Many approaches to classification of computer architectures are possible. Most techniques use global architecture properties and are thus valid only within limited ranges. The main classification techniques discussed below provide a framework within which to view associative and parallel processors.

Flynn [15] proposed a classification scheme that divides systems into categories based upon their procedure and data streams. The four categories are:

1) SISD (Single Instruction Stream Single Data Stream) — a uniprocessor such as a single processor IBM 360.
2) MISD (Multiple Instruction Stream Single Data Stream) — a pipeline processor such as CDC STAR.
3) SIMD (Single Instruction Stream Multiple Data Stream) — an associative or parallel processor such as ILLIAC IV.
4) MIMD (Multiple Instruction Stream Multiple Data Stream) — a multiprocessor such as a Univac 1108 multiprocessor system.

Murtha and Beadles [16] proposed a classification technique based upon parallelism properties. Their categories are:

1) General-purpose network computer;
2) Special-purpose network with global parallelism; and
3) Non-global, semi-independent network with local parallelism — this category is a catchall for machines which do not fit into the first two categories.

Categories 1 and 2 have the following subcases:

1.a) General-purpose network with centralized common control;
1.b) General-purpose network with identical processors but independent instruction execution actions;
2.a) Pattern processors;
2.b) Associative processors.

The purpose of this classification technique was to differentiate between multiprocessors and highly parallel organizations.

Another possible classification view, suggested by Hobbs, et al. [1], consists of the following categories:

1) Multiprocessor,
2) Associative processor,
3) Network or array processor, and
4) Functional machines.

Further, Hobbs, et al. suggested that architectures could be classified based upon the amount of parallelism in:

1) Control,
2) Processing units, and
3) Data streams.

However, it was noted that these parameters were present in all highly parallel machines and were therefore not adequate to define a machine architecture.

In his article against parallel processors, Shore [17] presents a unique classification technique which derives the machine descriptions from the description of a uniprocessor. The machine categories considered are summarized as follows:

1) Machine I — a uniprocessor.
2) Machine II — a bit-slice associative processor built from Machine I by adding bit-slice processing and access capability (e.g., STARAN).
3) Machine III — an orthogonal computer derived from Machine II by adding parallel word processing and access capability (e.g., OMEN).
4) Machine IV — a machine derived from Machine I by replicating the processing units (e.g., PEPE).
5) Machine V — a machine derived from Machine IV by adding interconnections between processors (e.g., ILLIAC IV).
6) Machine VI — a machine derived from Machine I by integrating processing logic into every memory element (e.g., Kautz's logic-in-memory computers [85, 86]).

Higbie [67] classifies computers using Flynn's four basic categories. However, he expands the SIMD category into the following four subcategories:

1) Array Processor — a processor that processes data in parallel and addresses the data by address instead of by tag or value.
2) Associative Memory Processor — a processor that operates on data addressed by tag or value rather than by address (Note: this definition does not require parallel operation; however, the definition does allow for machines that operate on data in parallel).
3) Associative Array Processor — a processor that is associative and also operates on arrays of data (typically, the operations are on a bit-slice basis, i.e., a single bit of many words).
4) Orthogonal Processor — a processor with two subsystems — an associative array processor subsystem and a serial (SISD) processor subsystem — which share a common memory array.

Higbie's categories provide for the identification of the ability to perform parallel, associative, and serial processing. Higbie defines a parallel processor to be any computer that contains multiple arithmetic units and operates on multiple data streams. Clearly, all of Higbie's four subcategories of SIMD processors fit his definition of a parallel processor.

Although none of the classification schemes presented here are mathematically precise, they are necessary to place associative and parallel processors in perspective. In the next section the terminology of associative and parallel processors will be defined for use in the remainder of this paper.

Definitions

The definitions used for the remainder of this paper are based upon Flynn [15], and are consistent with those used by Thurber [10]. The definitions are:

1) SIMD machine: any computer with a single global control unit which drives multiple processing units, all of which either execute or ignore the current instruction.
2) Associative Processor: any SIMD machine in which the processing units (or processing memory) are addressed by a property of the data contents rather than by address (i.e., multiply A and B together in all elements where A < B).
3) Parallel Processor: any SIMD machine in which the processing elements are the order of complexity of current day small computers and which has, typically, a high level of interconnectivity between processing elements.
4) Ensemble: a parallel processor in which the interconnection level between processing elements is very low or nonexistent.

Clearly, the intersection of these definitions will not be null due to the overlap in classifying machine architectures.

Reasons for Use of SIMD Processors

There are many reasons for the use of SIMD architectures. Some of the most important are:

1) Functional
   a) Large problems such as weather data processing.
   b) Problems with inherent data structure and parallelism.
   c) Reliability and graceful degradation.
   d) System complexity.
   e) Computational load.
2) Hardware
   a) Better use of hardware on problems with large amounts of parallelism.
   b) Advent of LSI microprocessors.
   c) Economy of duplicate structures.
   d) Lower nonrecurring and recurring costs.
3) Software
   a) Simpler than for multiprocessor.
   b) Easier to construct large systems.
   c) Less executive function requirements.

However, the reader is cautioned to remember that these are special-purpose machines and any attempt to apply them to an incorrectly sized, or designed, problem is an exercise in futility.
Applications

Numerous applications have been proposed for associative and parallel processors. Whether an application is ever implemented is based upon the economics of the problem. It is not enough to be able to describe a problem solution. To date, considerable energy has been expended searching for cost-effective parallel solutions, but little has been done to actually implement them. Examples of such applications are: matrix manipulation, differential equations, and linear programming.

Several areas of application appear quite well suited to associative and parallel processing. In some cases, SIMD processors provide cost-effective system augmentation (e.g., air traffic control and associative head-per-track disks). In others, SIMD processors are very close to functional analogs of the physical system (e.g., data compression, concentration, and multiplexing). The use of associative and parallel processors appears promising in the elimination of critical bottlenecks in current general-purpose computer systems; however, only very small associative memories are required, and cost is really not a critical factor in the solution of these problems. Such problems may be encountered in the management of computer resources, and might involve such items as protection mechanisms, resource allocation, and memory management.

The application trend for associative processors is well defined. Due to cost factors, applications will be limited (in the near future) to problems such as resource management, virtual memory mechanisms, and some augmentation of current systems, rather than to data-base management or large file searching and processing. The application trend for parallel processors and ensembles seems to be in the area of large data computation problems such as weather data processing, nuclear data processing, and ballistic missile defense. Researchers must concentrate on systems aspects of the problem, not on developing solutions to problems which are ill-defined or nonexistent. The only truly useful data-processing applications of parallel and associative processors have been developed through extensive work at the system level, followed by parallel systems design, rather than vice versa.

SIMD processors have been proposed and appear well suited for a number of applications; these are listed here along with their primary references. Factors to be considered in applying SIMD processors have been discussed elsewhere [18]. To be useful for solving the listed problems, the highly parallel processors must become more cost-effective. The suggested application areas are:

1) Applications in which associative processors appear to be cost-effective:
   a) Virtual memory mechanisms [19]
   b) Resource allocation [20]
   c) Hardware executives [21]
   d) Interrupt processing [22, 23]
   e) Protection mechanisms [19]
   f) Scheduling [20]
2) Applications in which parallel processors appear cost-effective and in which associative processors may be cost-effective:
   a) Bulk filtering [24]
   b) Tracking [25]
   c) Air traffic control [26]
   d) Data compression [27]
   e) Communications multiplexing [28, 29]
   f) Signal processing [30]
3) Applications in which associative processors would be useful if they were more cost-effective and in which parallel processors are probably cost-effective:
   a) Information retrieval [31]
   b) Sorting [32]
   c) Symbol manipulation [33]
   d) Pattern recognition [34]
   e) Picture processing [35]
   f) Sonar processing [36]
   g) Sea surveillance [37]
   h) Graph processing [38]
   i) Dynamic programming [39]
   j) Differential equations [40]
   k) Eigenvector [41]
   l) Matrix operations [42]
   m) Network flow analysis [43]
   n) Document retrieval [44]
   o) Data file manipulation [45]
   p) Machine document translation [46]
   q) Data file searching [47]
   r) Compilation [48]
   s) Formatted files [49]
   t) Automatic abstracting [50, 51]
   u) Dictionary-look-up translations [52]
   v) Data management [53]
   w) Theorem proving [54]
   x) Computer graphics [55]
   y) Weather data processing [56]

Hollander's paper [2] presents many experts' opinions about the applicability of associative and parallel processors. Slotnick [57] and Fuller [58] have compared associative and parallel processors. They concluded that parallel processors appear more useful than associative processors, but the machines would always be special purpose. Shore [17] has presented a very eloquent case against SIMD machines and parallel processors.

Matrix Multiplication on a Parallel Processor

Matrix manipulations can be used to illustrate the potential of a parallel processor. One of the major computations in many of the possible applications cited previously is the matrix multiply. Assume that a parallel processor such as that shown in Figure 1 is used to perform this operation. Each cell is interconnected to its four nearest neighbors. Cannon [59] derived the following algorithm to multiply two n x n matrices together in n stages using this processor. Each processing element will be assumed to have three registers, AREG, BREG, and CREG.

[Figure 1. Parallel Processor with PE Connections to the Four Nearest Neighbors; a control unit drives the PE array, with memory transfers to and from all PEs.]

Algorithm:
  Set:       CREG = 0
             AREG[PE I,J] = A[I,J] for all I,J ≤ N
             BREG[PE I,J] = B[I,J] for all I,J ≤ N
  Shift:     Ith row of A left I-1 columns for all I ≤ N
             Jth column of B up J-1 rows for all J ≤ N
  Multiply:  TREG = AREG times BREG (in parallel in all PEs)
  Add:       CREG = CREG + TREG (in parallel in all PEs)
  Shift:     AREG right one column
             BREG down one row
  Jump:      If not Nth pass, jump to Multiply

As an example, let C = A x B, where

  A = | a1 a2 a3 |        B = | b1 b4 b7 |
      | a4 a5 a6 |            | b2 b5 b8 |
      | a7 a8 a9 |            | b3 b6 b9 |

After initialization, the memory map for CREG is:

  | 0 0 0 |
  | 0 0 0 |
  | 0 0 0 |

For AREG, the memory map (after the initial row shifts) is:

  | a1 a2 a3 |
  | a5 a6 a4 |
  | a9 a7 a8 |

And the BREG map (after the initial column shifts) is:

  | b1 b5 b9 |
  | b2 b6 b7 |
  | b3 b4 b8 |

After the multiply, add, shift, and jump, the memory maps appear as follows:

  CREG = | a1b1 a2b5 a3b9 |        AREG = | a3 a1 a2 |
         | a5b2 a6b6 a4b7 |               | a4 a5 a6 |
         | a9b3 a7b4 a8b8 |               | a8 a9 a7 |

After two more iterations the multiply will be finished and CREG[PE I,J] will contain C[I,J].
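For readers who want to trace the data movement concretely, the following Python sketch simulates the algorithm (an illustrative reconstruction, not code from the original paper). The AREG, BREG, and CREG names follow the text; cyclic list rotations stand in for the nearest-neighbor shifts, and the nested loops stand in for what the PE array does in one parallel step.

    def cannon_multiply(A, B):
        """Simulate Cannon's algorithm on an n x n grid of PEs.

        Each PE (i, j) holds one element of AREG, BREG, and the
        accumulator CREG; rotations model the nearest-neighbor shifts.
        """
        n = len(A)
        # Set: CREG = 0, AREG[PE i,j] = A[i][j], BREG[PE i,j] = B[i][j].
        CREG = [[0] * n for _ in range(n)]
        AREG = [row[:] for row in A]
        BREG = [row[:] for row in B]

        # Initial skew (0-based): row i of AREG left i columns,
        # column j of BREG up j rows.
        AREG = [AREG[i][i:] + AREG[i][:i] for i in range(n)]
        BREG = [[BREG[(i + j) % n][j] for j in range(n)] for i in range(n)]

        for _ in range(n):                      # n passes
            for i in range(n):                  # Multiply and Add,
                for j in range(n):              # in parallel in all PEs
                    CREG[i][j] += AREG[i][j] * BREG[i][j]
            # Shift: AREG right one column, BREG down one row.
            AREG = [row[-1:] + row[:-1] for row in AREG]
            BREG = [[BREG[(i - 1) % n][j] for j in range(n)] for i in range(n)]
        return CREG

    # Check against an ordinary triple-loop product.
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    B = [[9, 6, 3], [8, 5, 2], [7, 4, 1]]
    expected = [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]
    assert cannon_multiply(A, B) == expected

Running the sketch on a 3 x 3 input reproduces the memory maps shown above: after the first pass the accumulator holds one product term per cell, and after two more passes CREG[PE I,J] contains C[I,J].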
SIMD GENEALOGY

Figure 2 indicates the major parallel and associative architectures considered in this paper, together with their generic relationships. These may differ substantially, yet they are related by virtue of their global associative or parallel properties.

[Figure 2. SIMD Genealogy: a tree rooted at SIMD machines, with branches for associative processors (MEMEX, the cryogenic catalog memory, distributed logic memories, the association-storing processor, the tree channel processor, highly parallel machines, bit-slice and byte-slice machines, and STARAN), parallel processors (Unger's machine, SOLOMON I and II, ILLIAC IV, the Orthogonal Computer, and OMEN), and ensembles (PEPE), with mixed mode and multidimensional memories bridging the associative branch.]
MAIN TRADEOFF ISSUES

To allow the reader to place the processor architectures in perspective, the major system tradeoff issues in SIMD processors will be discussed next. It should be understood that SIMD processors are special purpose. They are useful for a limited set of applications which may be characterized as having [18]:

1) A large number of independent data sets;
2) No relationships between data sets that prevent them from being processed in parallel;
3) The requirement for quick throughput response; and
4) The potential to exploit associative addressing selection techniques.

Associative Addressing

Figure 3 indicates the main features required to implement an associative address process. The designer must include facilities to address or query the processor associatively, i.e., to address the memory array by content. In Figure 3, the application may require that all employees with a salary of $35 per day be located. This would be accomplished by performing an equality search on the daily salary field of the file. This search is performed in parallel. To set up the search, a Data Register must be loaded with the daily salary ($35) for comparison; a Mask Register is included to mask the data register so that only the desired fields may be searched; a Word Select (activity select) Register specifies the subset of words to be addressed; a Results Register is required to collect the results of the search; a Match Indicator is used to indicate the number of matches; and a Multiple Match Resolver must be provided to generate a pointer to the "topmost" matched word. An eight-word example of a personnel file stored in a simple associative memory is illustrated in Figure 3; it assumes that the search for a salary of $35 per day was initiated on the entire file. All necessary registers and features are illustrated.

[Figure 3. Associative Processor Storing a Personnel File: the memory array holds fields NAME, EMPLOYEE NUMBER, and DAILY SALARY; surrounding it are the Control Unit, Data Register, Mask Register, Word Select Register, Search Results Register, Match Indicator, Multiple Match Resolver, and external processing logic.]
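The register setup can be summarized in a short software model. The sketch below is a minimal illustration of Figure 3, not code from the original paper, and it assumes a word layout invented for the example (employee number in the high bits, daily salary in the low eight bits); the Data, Mask, Word Select, and Search Results Registers, the Match Indicator, and the Multiple Match Resolver appear as the corresponding variables.

    def associative_search(memory, data, mask, word_select):
        """Masked equality search over every word of an associative memory.

        memory:      list of words, each an integer
        data:        comparand held in the Data Register
        mask:        Mask Register; 1-bits mark the field being searched
        word_select: Word Select Register; 1 enables a word for the search
        Returns the Search Results Register contents, the Match Indicator
        count, and the Multiple Match Resolver pointer to the topmost match.
        """
        results = [1 if enabled and (word & mask) == (data & mask) else 0
                   for enabled, word in zip(word_select, memory)]
        match_count = sum(results)                           # Match Indicator
        topmost = results.index(1) if match_count else None  # Match Resolver
        return results, match_count, topmost

    # Eight-word personnel file: word = (employee number << 8) | daily salary.
    payroll = [(100 + i) << 8 | s
               for i, s in enumerate([30, 35, 42, 35, 28, 35, 50, 33])]
    hits, count, top = associative_search(payroll, data=35, mask=0xFF,
                                          word_select=[1] * 8)
    print(hits, count, top)   # [0, 1, 0, 1, 0, 1, 0, 0], 3 matches, topmost word 1

Note that the comparison loop is only a stand-in for hardware in which every enabled word compares itself against the masked comparand simultaneously.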

The State Transition Concept and Associative Processor Algorithms

In 1965, Fuller and Bird [35] described a bit-slice associative processor architecture. Fuller and Bird discussed their algorithms in terms of state transitions. To perform parallel processing of data in an associative processor, two basic operations are required. These are a content search and a "multiwrite" operation. The multiwrite operation consists of the ability to simultaneously write into selected bits of several selected words. With a content search and multiwrite capability, an associative processor can be used to process data in highly parallel fashion.

For example, consider the problem of adding field A to field B. This can be accomplished as a sequence of bit pair additions (Ai added to Bi), with suitable attention paid to the carry bit. If bit 1 is the least significant and bit n the most, Figure 4 indicates the state table for the bit-serial addition of A to B with the result accumulating into B. As shown, Bi (result) and Ci+1 differ from Bi (operand) and Ci in the four states 1, 3, 4, and 6. One possible addition method would be to search for these states and multiwrite the correct Bi and Ci+1 into the appropriate fields, i.e., set up a "tag field" designated as Ci, and set C1 = 0. Then operations would proceed from the least significant bit to the most significant bit of the pair of operands. At each bit position i, the fields Ai, Bi, and Ci would be searched for state 1. Then Ci+1 = 0, Bi = 1 would be multiwritten into all matching words. After a search for state 3, Ci+1 = 1, Bi = 0 would be multiwritten into all matching words. This same procedure would be carried out for states 4 and 6. If a word is matched on any search, it would be removed from the list of active cells until operations moved from bit position i to i + 1, so that writing the resultant Bi and the carry-in for bit position i + 1 into the memory does not cause the data already processed to be reprocessed.

[Figure 4. Bit-Slice Addition State Transition Diagram; Ci+1 is the carry into bit position i + 1 from bit position i.]
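A compact way to check the procedure is to model it in software. The sketch below is an illustrative model, not from the original paper: fields A and B are stored as per-word bit lists, the carry lives in a Ci tag field, and the four search-and-multiwrite steps are applied at each bit position. The Python loops stand in for searches and multiwrites that the hardware performs over all words in parallel, and the state numbering follows the (Ai, Bi, Ci) states described for Figure 4.

    def add_fields_bit_serial(A, B, nbits):
        """Add field A into field B for every word at once, Fuller-Bird style.

        A, B: lists of per-word bit lists, index 0 = least significant bit.
        A content search finds all active words in a given (Ai, Bi, Ci)
        state; a multiwrite then rewrites Bi and the carry tag in every
        matching word. Only states 1, 3, 4, and 6 change any bits.
        """
        nwords = len(A)
        carry = [0] * nwords                   # the Ci "tag field", C1 = 0
        # (Ai, Bi, Ci) states whose outputs differ, with the values to write.
        transitions = {
            (0, 0, 1): (1, 0),   # state 1: write Bi = 1, Ci+1 = 0
            (0, 1, 1): (0, 1),   # state 3: write Bi = 0, Ci+1 = 1
            (1, 0, 0): (1, 0),   # state 4: write Bi = 1, Ci+1 = 0
            (1, 1, 0): (0, 1),   # state 6: write Bi = 0, Ci+1 = 1
        }
        for i in range(nbits):                 # least to most significant bit
            active = [True] * nwords           # all words rejoin the search
            for state, (new_b, new_c) in transitions.items():
                # Content search: which active words are in this state?
                matched = [w for w in range(nwords) if active[w]
                           and (A[w][i], B[w][i], carry[w]) == state]
                # Multiwrite into all matching words simultaneously; matched
                # words drop out of the remaining searches for this bit.
                for w in matched:
                    B[w][i], carry[w] = new_b, new_c
                    active[w] = False
        return B

    # Two 4-bit additions processed together (bits listed LSB first):
    A = [[1, 0, 1, 0], [0, 1, 1, 0]]   # A = 5 and 6
    B = [[1, 1, 0, 0], [1, 0, 0, 0]]   # B = 3 and 1
    print(add_fields_bit_serial(A, B, 4))  # [[0, 0, 0, 1], [1, 1, 1, 0]] = 8 and 7

Deactivating a matched word for the remainder of the current bit position is exactly the bookkeeping the text calls for: it prevents a freshly written Bi or carry from being matched again before the operation moves on to bit i + 1.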
Obviously, utilizing the sequential state transformation approach, any desired transformations can be processed in an associative processor with simple content search and multiwrite capability using very simple word logic. Operating on the state transition concept, six general requirements of the architecture of an associative processor were deduced. These are:

1) The basic processing primitive is the identification of all words meeting some binary state variable configuration;
2) No explicit addressing is necessary since processing can be performed simultaneously over all data sets of interest;
3) Data is processed "column serially" to limit the number of state transformations to be remembered;
4) Tag bits (temporary bit-slice storage) are very useful (for example, the carry bit);
5) A search results/word select register and multiword write capability are desirable; and
6) Stored program control of the associative processor is desirable.

Fuller and Bird recognized an important correlation between the way in which data is stored in the processor's memory and its processing speed. As examples of this phenomenon, they described both word per cell (W/C) and bit per cell (B/C) data organizations for their associative processor. Figures 5 and 6 show the W/C and B/C storage organizations for nine four-bit words X11, ..., X33, respectively. The word per cell organization has become an industry standard, and it is, in fact, the way most associative processors store data. The desire to perform bit-slice processing on many data sets simultaneously without much interword communication led to the widespread use of this type of organization.
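The flavor of the word per cell organization can be sketched briefly; the cell map below is illustrative only, since the exact layouts of Figures 5 and 6 are not reproduced here. In a W/C layout each cell holds one whole word, and a bit-slice command touches the same bit of every word at once.

    # Word-per-cell (W/C) layout for nine four-bit words X11 ... X33:
    # cell (i, j) holds all four bits of word Xij, least significant bit
    # first. (An assumed model; a bit-per-cell (B/C) layout would instead
    # spread the four bits of each word across four separate cells.)
    cells = {(i, j): [0, 0, 0, 0] for i in range(1, 4) for j in range(1, 4)}

    def read_bit_slice(cells, k):
        """One bit-slice access: bit k of every word, read in parallel."""
        return {cell: bits[k] for cell, bits in cells.items()}

    def multiwrite_bit_slice(cells, k, value, selected):
        """Multiwrite: set bit k of all selected words simultaneously."""
        for cell in selected:
            cells[cell][k] = value

    multiwrite_bit_slice(cells, 0, 1, selected=[(1, 3), (2, 2)])
    print(read_bit_slice(cells, 0))   # bit 0 of all nine words in one access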
[Figure 5. Word Per Cell Organization; bits 0 to 3 of word X13 occupy a single cell.]

W/C is very efficient if a large number of cells are involved; operands can be stored in the same cell, and most operands are being processed by each bit-slice command. For example, in Figure 5, this type of organization would be very efficient if,