`
`
`
Associative and Parallel Processors

KENNETH J. THURBER

Product Development Group, Sperry Univac Defense Systems, St. Paul, Minnesota 55165, and
Computer, Information and Control Sciences Department, University of Minnesota,
Minneapolis, Minnesota 55455

LEON D. WALD

Test Equipment Engineering, Government and Aeronautical Products Division, Honeywell, Inc.,
Minneapolis, Minnesota
`
`
This paper is a tutorial survey of the area of parallel and associative processors. The paper covers the main design tradeoffs and major architectures of SIMD (Single Instruction Stream Multiple Data Stream) systems. Summaries of ILLIAC IV, STARAN, OMEN, and PEPE, the major SIMD processors, are included.

Keywords and Phrases: associative processor, parallel processor, OMEN, STARAN, PEPE, ILLIAC IV, architecture, large-scale systems, SIMD processors, array processors, ensemble.

CR Categories: 3.80, 4.20, 6.22
`
`
`
INTRODUCTION

Purpose and Scope

The purpose of this paper is to present a tutorial survey on the subject of parallel and associative processors. The paper covers the topics of system categorizations, applications, main tradeoff issues, historically important architectures, and the architectures of systems that are currently available.

Currently, the microprocessor/computer-on-a-chip revolution is providing the potential for production of very cost-effective high-performance computers through utilization of a large number of these processors in parallel or in a network. The parallel and associative processors provide a class of architectures which can be readily used to take immediate advantage of the microprocessor technology.

Surveys

A number of good surveys on the subject (or bordering on the subject) of parallel and associative processors have been published [1-10]. Some of these surveys are only of historical interest due to their age. Several conferences in this area may also be of interest [11-14].

Classification of Computer Architectures

Many approaches to classification of computer architectures are possible. Most techniques use global architecture properties and are thus valid only within limited ranges. The main classification techniques discussed below provide a framework within which to view associative and parallel processors.

Flynn [15] proposed a classification scheme that divides systems into categories based upon their procedure and data streams. The four categories are:

1) SISD (Single Instruction Stream Single Data Stream) — a uniprocessor such as a single processor IBM 360.
2) MISD (Multiple Instruction Stream Single Data Stream) — a pipeline processor such as CDC STAR.
Copyright © 1976, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted, provided that ACM's copyright notice is given and that reference is made to this publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery.
Computing Surveys, Vol. 7, No. 4, December 1975
`
`216
`
`
`
`.
`
`
`
`
`
`
`
`Kenneth J. Thurber and Leon D. Wald
`
`
`
`
CONTENTS

INTRODUCTION
  Purpose and Scope
  Surveys
  Classification of Computer Architectures
  Definitions
  Reasons for Use of SIMD Processors
  Applications
  Matrix Multiplication on a Parallel Processor
SIMD GENEALOGY
MAIN TRADEOFF ISSUES
  Associative Addressing
  The State Transition Concept and Associative Processor Algorithms
  Processing Element (PE) Interconnections
  Input/Output
  Host Interaction
  Memory Distribution
  Software
  Activity Control and Match Resolution
  Balance Between Logic and Memory
ASSOCIATIVE PROCESSORS
  Introduction
  DLM (Distributed Logic Memory)
  Highly Parallel Associative Processors
  Mixed Mode and Multidimensional Memories
  Implemented Associative Processors
  Associative Processor Simulators
PARALLEL PROCESSORS AND ENSEMBLES
  Introduction
  Unger's Machine
  SOLOMON I, SOLOMON II, and ILLIAC IV
  Machines With Other Interconnection Structures
  Orthogonal Computer
  PEPE
  Comment
ACTUAL MACHINES
  Introduction
  STARAN
  OMEN
  PEPE
  ILLIAC IV
CONCLUSION
REFERENCES

3) SIMD (Single Instruction Stream Multiple Data Stream) — an associative or parallel processor such as ILLIAC IV.
4) MIMD (Multiple Instruction Stream Multiple Data Stream) — a multiprocessor such as a Univac 1108 multiprocessor system.

Murtha and Beadles [16] proposed a classification technique based upon parallelism properties. Their categories are:

1) General-purpose network computer;
2) Special-purpose network with global parallelism; and
3) Non-global, semi-independent network with local parallelism — this category is a catchall for machines which do not fit into the first two categories.

Categories 1 and 2 have the following subcases:

1.a) General-purpose network with centralized common control;
1.b) General-purpose network with identical processors but independent instruction execution actions;
2.a) Pattern processors;
2.b) Associative processors.

The purpose of this classification technique was to differentiate between multiprocessors and highly parallel organizations.

Another possible classification view, suggested by Hobbs, et al. [1], consists of the following categories:

1) Multiprocessor,
2) Associative processor,
3) Network or array processor, and
4) Functional machines.

Further, Hobbs, et al. suggested that architectures could be classified based upon the amount of parallelism in:

1) Control,
2) Processing units, and
3) Data streams.

However, it was noted that these parameters were present in all highly parallel machines and were therefore not adequate to define a machine architecture.

In his article against parallel processors, Shore [17] presents a unique classification technique which derives the machine descriptions from the description of a uniprocessor. The machine categories considered are summarized as follows:

1) Machine I — a uniprocessor.
2) Machine II — a bit-slice associative processor built from Machine I by adding bit-slice processing and access capability (e.g., STARAN).
3) Machine III — an orthogonal computer derived from Machine II by adding parallel word processing and access capability (e.g., OMEN).
`
`
`
`
4) Machine IV — a machine derived from Machine I by replicating the processing units (e.g., PEPE).
5) Machine V — a machine derived from Machine IV by adding interconnections between processors (e.g., ILLIAC IV).
6) Machine VI — a machine derived from Machine I by integrating processing logic into every memory element (e.g., Kautz's logic-in-memory computers [85, 86]).

Higbie [67] classifies computers using Flynn's four basic categories. However, he expands the SIMD category into the following four subcategories:

1) Array Processor — a processor that processes data in parallel and addresses the data by address instead of by tag or value.
2) Associative Memory Processor — a processor that operates on data addressed by tag or value rather than by address (Note: this definition does not require parallel operation; however, the definition does allow for machines that operate on data in parallel).
3) Associative Array Processor — a processor that is associative and also operates on arrays of data (typically, the operations are on a bit-slice basis, i.e., a single bit of many words).
4) Orthogonal Processor — a processor with two subsystems — an associative array processor subsystem and a serial (SISD) processor subsystem — which share a common memory array.

Higbie's categories provide for the identification of the ability to perform parallel, associative, and serial processing. Higbie defines a parallel processor to be any computer that contains multiple arithmetic units and operates on multiple data streams. Clearly, all of Higbie's four subcategories of SIMD processors fit his definition of a parallel processor.

Although none of the classification schemes presented here are mathematically precise, they are necessary to place associative and parallel processors in perspective. In the next section the terminology of associative and parallel processors will be defined for use in the remainder of this paper.

Definitions

The definitions used for the remainder of this paper are based upon Flynn [15], and are consistent with those used by Thurber [10]. The definitions are:

1) SIMD machine: any computer with a single global control unit which drives multiple processing units, all of which either execute or ignore the current instruction.
2) Associative Processor: any SIMD machine in which the processing units (or processing memory) are addressed by a property of the data contents rather than by address (i.e., multiply A and B together in all elements where A < B).
3) Parallel Processor: any SIMD machine in which the processing elements are of the order of complexity of current day small computers and which has, typically, a high level of interconnectivity between processing elements.
4) Ensemble: a parallel processor in which the interconnection level between processing elements is very low or nonexistent.

Clearly, the intersection of these definitions will not be null due to the overlap in classifying machine architectures.
Reasons for Use of SIMD Processors

There are many reasons for the use of SIMD architectures. Some of the most important are:

1) Functional
   a) Large problems such as weather data processing.
   b) Problems with inherent data structure and parallelism.
   c) Reliability and graceful degradation.
   d) System complexity.
   e) Computational load.
2) Hardware
   a) Better use of hardware on problems with large amounts of parallelism.
   b) Advent of LSI microprocessors.
   c) Economy of duplicate structures.
   d) Lower nonrecurring and recurring costs.
3) Software
   a) Simpler than for multiprocessor.
   b) Easier to construct large systems.
   c) Less executive function requirements.

However, the reader is cautioned to remember that these are special-purpose machines and any attempt to apply them to an incorrectly sized, or designed, problem is an exercise in futility.
`
`
Applications

Numerous applications have been proposed for associative and parallel processors. Whether an application is ever implemented is based upon the economics of the problem. It is not enough to be able to describe a problem solution. To date, considerable energy has been expended searching for cost-effective parallel solutions, but little has been done to actually implement them. Examples of such applications are: matrix manipulation, differential equations, and linear programming.

Several areas of application appear quite well suited to associative and parallel processing. In some cases, SIMD processors provide cost-effective system augmentation (e.g., air traffic control and associative head-per-track disks). In others, SIMD processors are very close to functional analogs of the physical system (e.g., data compression, concentration, and multiplexing).

The use of associative and parallel processors appears promising in the elimination of critical bottlenecks in current general-purpose computer systems; however, only very small associative memories are required, and cost is really not a critical factor in the solution of these problems. Such problems may be encountered in the management of computer resources, and might involve such items as protection mechanisms, resource allocation, and memory management.

The application trend for associative processors is well defined. Due to cost factors, applications will be limited (in the near future) to problems such as resource management, virtual memory mechanisms, and some augmentation of current systems, rather than to data-base management or large file searching and processing. The application trend for parallel processors and ensembles seems to be in the area of large data computation problems such as weather data processing, nuclear data processing, and ballistic missile defense. Researchers must concentrate on systems aspects of the problem, not on developing solutions to problems which are ill-defined or nonexistent. The only truly useful data-processing applications of parallel and associative processors have been developed through extensive work at the system level, followed by parallel systems design, rather than vice versa.
SIMD processors have been proposed and appear well suited for a number of applications; these are listed here along with their primary references. Factors to be considered in applying SIMD processors have been discussed elsewhere [18]. To be useful for solving the listed problems, the highly parallel processors must become more cost-effective. The suggested application areas are:

1) Applications in which associative processors appear to be cost-effective:
   a) Virtual memory mechanisms [19]
   b) Resource allocation [20]
   c) Hardware executives [21]
   d) Interrupt processing [22, 23]
   e) Protection mechanisms [19]
   f) Scheduling [20]
2) Applications in which parallel processors appear cost-effective and in which associative processors may be cost-effective:
   a) Bulk filtering [24]
   b) Tracking [25]
   c) Air traffic control [26]
   d) Data compression [27]
   e) Communications multiplexing [28, 29]
   f) Signal processing [30]
3) Applications in which associative processors would be useful if they were more cost-effective and in which parallel processors are probably cost-effective:
   a) Information retrieval [31]
   b) Sorting [32]
   c) Symbol manipulation [33]
   d) Pattern recognition [34]
   e) Picture processing [35]
   f) Sonar processing [36]
   g) Sea surveillance [37]
   h) Graph processing [38]
   i) Dynamic programming [39]
   j) Differential equations [40]
   k) Eigenvector [41]
   l) Matrix operations [42]
   m) Network flow analysis [43]
   n) Document retrieval [44]
   o) Data file manipulation [45]
   p) Machine document translation [46]
   q) Data file searching [47]
   r) Compilation [48]
   s) Formatted files [49]
   t) Automatic abstracting [50, 51]
`
`.
`
`
`219
`
`
`
`
`
`
`
`
`
`
`Jump:
`
`
`
`
`
`Associative and Parallel Processors
`
`
`
`
`
`
`
`
`
`
`[52]
`vu) Dictionary-look-up translations
`Each cell is interconnected to its four nearest
`
`
`
`
`
`
`
`
`
`([53]
`v) Data management
`neighbors. Cannon [59] derived the following
`
`
`
`
`
`
`
`
`
`
`
`
`(54]
`w) Theorem proving
`algorithm to multiply two nxn matrices to-
`
`
`
`
`
`
`
`
`
`
`
`
`{[55]
`x) Computergraphics
`gether in n stages using this processor.
`
`
`
`
`
`
`{56]
`y) Weather data processing
`
`
`
`
`
`
`presents many Algorithm:
`Hollander’s paper
`[2]
`
`
`
`
`
`
`
`
`
`
`CREG =0
`experts’ opinions about
`the applicability of
`Set:
`
`
`
`
`
`
`
`
`
`
`
`
`
`associative and parallel processors. Slotnick BREG (PE; y) =B[I,J] for all
`
`
`
`
`
`
`
`
`
`
`
`[57] and Fuller
`[58] have compared asso-
`LJ <N
`,
`
`
`
`
`
`
`
`
`
`
`
`
`
`ciative and parallel processors. They concluded
`AREG (PE, yJ = [I,J] for all
`
`
`
`
`
`
`
`
`
`,
`that parallel processors appear more useful than
`LJ <N
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`associative processors, but the machines would Ith row ofAleft I-1 columnsShift:
`
`
`
`
`
`
`
`
`
`
`
`
`
`always be special purpose. Shore [17] has pre-
`for all I<N
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`sented a very eloquent case against SIMD
`Jth column of B up J-1 rows
`
`
`
`
`
`
`
`
`machinesand parallel processors.
`for all J <N
`
`
`
`
`
`Multiply: atin BREG)
`Matrix Multiplication on a Parallel Processor
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Matrix manipulations can be used to illustrate
`(CREG=CREG + TREG)
`Add:
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`the potential of a parallel processor. One of the
`In Parallel in all PEs
`
`
`
`
`
`
`
`
`
`
`
`
`
`major computations in many of the possible Shift:|AREGright one row
`
`
`
`
`
`
`
`
`
`
`applications cited previously is
`the matrix
`BREG downone column
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`multiply. Assume that a parallel processor such
`If not Nth pass, Jump to
`
`
`
`
`
`
`
`
`
`
`
`as that shown in Figure 1
`is used to perform
`Multiply:
`
`
`this operation.
`
`
`
`
`
`
`
`Each processing element will be assumed to
`
`
`
`
`
`
`
`have three registers, AREG, BREG, and CREG.
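The algorithm above can be sketched as a short simulation (a hypothetical Python rendering, not from the paper; the register names AREG, BREG, and CREG follow the text, and end-around list slicing stands in for the nearest-neighbor shift network):

```python
# Sketch (illustrative): simulating Cannon's algorithm for C = A x B on an
# n x n grid of PEs, each PE (I,J) holding one AREG, BREG, CREG scalar.

def cannon_multiply(A, B):
    n = len(A)
    # Set: load AREG and BREG, clear CREG.
    AREG = [row[:] for row in A]
    BREG = [row[:] for row in B]
    CREG = [[0] * n for _ in range(n)]
    # Shift: Ith row of A left I-1 columns; Jth column of B up J-1 rows.
    for i in range(n):
        AREG[i] = AREG[i][i:] + AREG[i][:i]
    for j in range(n):
        col = [BREG[i][j] for i in range(n)]
        col = col[j:] + col[:j]
        for i in range(n):
            BREG[i][j] = col[i]
    # N passes of Multiply / Add / Shift.
    for _ in range(n):
        for i in range(n):
            for j in range(n):
                # Multiply and Add in all PEs at once (here, a loop).
                CREG[i][j] += AREG[i][j] * BREG[i][j]
        # Shift: AREG right one column, BREG down one row (end-around).
        for i in range(n):
            AREG[i] = AREG[i][-1:] + AREG[i][:-1]
        BREG = BREG[-1:] + BREG[:-1]
    return CREG
```

After the initial skew, PE (I,J) always holds a pair A[I,K], B[K,J] for some common K, so each pass accumulates one term of the inner product and n passes complete the multiply.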
`
`
`
[Figure 1. Parallel Processor with PE Connections to the Four Nearest Neighbors. The figure shows a control unit and control memory with paths to and from all PEs, which form a grid with nearest-neighbor connections.]
`
`
`
`
`
`
`
As an example let:

    A = | a1  a2  a3 |      B = | b1  b2  b3 |
        | a4  a5  a6 |          | b4  b5  b6 |
        | a7  a8  a9 |          | b7  b8  b9 |

and C = A x B.

After initialization, the memory map for CREG is:

    | 0  0  0 |
    | 0  0  0 |
    | 0  0  0 |

For AREG, the memory map is:

    | a1  a2  a3 |
    | a5  a6  a4 |
    | a9  a7  a8 |
And the BREG map is:

    | b1  b5  b9 |
    | b4  b8  b3 |
    | b7  b2  b6 |

After the multiply, add, shift, and jump, the memory maps appear as follows:

    CREG = | a1b1  a2b5  a3b9 |
           | a5b4  a6b8  a4b3 |
           | a9b7  a7b2  a8b6 |

    AREG = | a3  a1  a2 |
           | a4  a5  a6 |
           | a8  a9  a7 |

After two more iterations the multiply will be finished and CREG [PE I,J] will contain C[I,J].

SIMD GENEALOGY

Figure 2 indicates the major parallel and associative architectures considered in this paper, together with their generic relationships. These may differ substantially, yet they are related by virtue of their global associative or parallel properties.
[Figure 2. SIMD Genealogy. A tree rooted at SIMD MACHINES with branches for associative processors, parallel processors, and ensembles. Associative-processor lineage: MEMEX [72], distributed logic memory [31], cryogenic catalog memory [73], association-storing processor [78], tree channel processor [80], two-dimensional and bulk distributed logic memories [81], highly parallel machines, augmented content-addressable memory [85], associative control switch [24], associative logic processing system [87], mixed mode and multidimensional memories [23], and implemented associative processors: bit slice [113], adder skewed bit slice [92], EXOR skewed bit slice [93], byte slice [106], distributed logic [20], and STARAN [114]. Parallel-processor lineage: Unger machine [107], SOLOMON I [108], SOLOMON II [109], ILLIAC IV [57], orthogonal computer [105], and OMEN [112]. Ensemble lineage: PEPE [115].]
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
MAIN TRADEOFF ISSUES

To allow the reader to place the presented architectures in perspective, the major system tradeoff issues in SIMD processors will be discussed next. It should be understood that SIMD processors are special purpose. They are useful for a limited set of applications which may be characterized as having [18]:

1) A large number of independent data sets;
2) No relationships between data sets that prevent them from being processed in parallel;
3) The requirement for quick throughput response; and
4) The potential to exploit associative addressing selection techniques.

Associative Addressing

Figure 3 indicates the main features required to implement an associative address process. The designer must include facilities to address or query the processor associatively, i.e., to address the memory array by content. In Figure 3, the application may require that all employees with a salary of $35 per day be located. This would be accomplished by performing an equality search on the daily salary field of the file. This search is performed in parallel. To set up the search a Data Register must be loaded with the daily salary ($35) for comparison; a Mask Register is included to mask the data register so that only the desired fields may be searched; a Word Select (activity select) Register specifies the subset of words to be addressed; a Results Register is required to collect the results of the search; a Match Indicator is used to indicate the number of matches; and a Multiple Match Resolver must be provided to generate a pointer to the "topmost" matched word. An eight-word example of a personnel file stored in a simple associative memory is illustrated in Figure 3; it assumes that the search for a salary of $35 per day was initiated on the entire file. All necessary registers and features are illustrated.
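The interplay of these registers can be illustrated with a minimal software sketch (hypothetical word widths and field layout, not the paper's hardware; the register names follow the description above):

```python
# Sketch (illustrative): a masked equality search over an 8-word associative
# memory, as in the $35-per-day example. Each word is a fixed-width bit
# pattern whose low 8 bits are assumed to hold the daily salary field.

def equality_search(words, data, mask, word_select):
    """Search Results Register: 1 where (word & mask) == (data & mask),
    restricted to the words enabled by the Word Select Register."""
    results = []
    for word, enabled in zip(words, word_select):
        results.append(1 if enabled and (word & mask) == (data & mask) else 0)
    return results

# Eight-word personnel file (hypothetical contents).
words = [0x0123, 0x0435, 0x0935, 0x0810, 0x0F35, 0x0222, 0x0A35, 0x0700]
data = 0x35        # Data Register: salary $35 in the salary field
mask = 0xFF        # Mask Register: compare only the salary field
select = [1] * 8   # Word Select Register: entire file active

results = equality_search(words, data, mask, select)
match_indicator = sum(results)                         # number of matches
first = results.index(1) if match_indicator else None  # Multiple Match Resolver
```

The Mask Register confines the comparison to one field, and the Multiple Match Resolver reduces the results bits to a single "topmost" pointer, exactly as in the figure.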
`
[Figure 3. Associative Processor Storing a Personnel File. The figure shows a control unit driving a Data Register and Mask Register over a memory array whose field descriptor has the general form NAME, DAILY SALARY, EMPLOYEE NUMBER; a Word Select Register, Search Results Register, Match Indicator, and Multiple Match Resolver connect the array to external processing logic.]
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`222
`
`
`
`:
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`a 0
`
`* CARRY INTO BIT POSITION 141 FROM BIT POSITION 1
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Figure 4. Bit-Slice Addition State Transition
`
`Diagram.
`
`
`
`
`
`
`
`
`
`
`
`Computing Surveys, Vol. 7, No. 4, December 1975
`
`INTEL - 1010
`
`
`
`
`
`tH
`
`
`
`
`
`
`
`
`
`
`Kenneth J. Thurber and Leon D. Wald
`
`
`
`
`
`
The State Transition Concept and Associative Processor Algorithms

In 1965, Fuller and Bird [35] described a bit-slice associative processor architecture. Fuller and Bird discussed their algorithms in terms of state transitions. To perform parallel processing of data in an associative processor, two basic operations are required. These are a content search and a "multiwrite" operation. The multiwrite operation consists of the ability to simultaneously write into selected bits of several selected words. With a content search and multiwrite capability, an associative processor can be used to process data in highly parallel fashion.

For example, consider the problem of adding field A to field B. This can be accomplished as a sequence of bit pair additions (Ai added to Bi), with suitable attention paid to the carry bit. If bit 1 is the least significant and bit n the most, Figure 4 indicates the state table for the bit-serial addition of A to B with the result accumulating into B. As shown, Bi (result) and Ci+1 differ from Bi (operand) and Ci in the four states 1, 3, 4, and 6. One possible addition method would be to search for these states and multiwrite the correct Bi and Ci+1 into the appropriate fields, i.e., set up a "tag field" designated as Ci, and set C1 = 0. Then operations would proceed from the least significant bit to the most significant bit of the pair of operands. At each bit position i, Ai, Bi, and Ci would be searched for state 1. Then Ci+1 = 0, Bi = 1 would be multiwritten into all matching words. After a search for state 3, Ci+1 = 1, Bi = 0 would be multiwritten into all matching words. This same procedure would be carried out for states 4 and 6. If a word is matched on any search, it would be removed from the list of active cells until operations moved from bit position i to i+1, so that writing the resultant Bi and the carry-in for bit position i+1 into the memory does not cause the data already processed to be reprocessed. Obviously, utilizing the sequential state transformation approach, any desired transformations can be processed in an associative processor with simple content search and multiwrite capability using very simple word logic.

Operating on the state transition concept, six general requirements of the architecture of an associative processor were deduced. These are:

1) The basic processing primitive is the identification of all words meeting some binary state variable configuration;
2) No explicit addressing is necessary since processing can be performed simultaneously over all data sets of interest;
3) Data is processed "column serially" to limit the number of state transformations to be remembered;
4) Tag bits (temporary bit-slice storage) are very useful (for example, the carry bit);
5) A search results/word select register and multiword write capability are desirable; and
6) Stored program control of the associative processor is desirable.

Fuller and Bird recognized an important correlation between the way in which data is stored in the processor's memory and its processing speed. As examples of this phenomenon, they described both word per cell (W/C) and bit per cell (B/C) data organizations for their associative processor. Figures 5 and 6 show the W/C and B/C storage organizations for nine four-bit words X11, ..., X33, respectively.

[Figure 5. Word Per Cell Organization. Each cell holds bits 0 to 3 of one word (e.g., bits 0 to 3 of word X13).]

The word per cell organization has become an industry standard, and it is, in fact, the way most associative processors store data. The desire to perform bit-slice processing on many data sets simultaneously without much inter-word communication led to the widespread use of this type of organization. W/C is very efficient if a large number of cells are involved; operands can be stored in the same cell, and most operands are being processed by each bit-slice command. For example, in Figure 5, this type of organization would be very efficient if, for all operands Xi1, Xi2, Xi3, it were desired to calculate Xi1 + (Xi2 * Xi3) for all i. In this case all cells could be activated and the calculation could be performed in a bit-serial manner over all operands.
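The search-and-multiwrite addition described in this section can be sketched as a simulation (a hypothetical Python rendering; the real machine would perform each content search and each multiwrite over all words in parallel, where the sketch loops):

```python
# Sketch (illustrative): bit-serial addition B := A + B over many words at
# once, using only "search for a state of (Ai, Bi, Ci)" and "multiwrite the
# new Bi and carry" operations, as in the Fuller-Bird scheme.

def associative_add(words, nbits):
    """words: list of dicts with bit-lists 'A' and 'B'; index 0 is the LSB."""
    for w in words:
        w["C"] = 0  # carry tag bit, cleared before the scan (C1 = 0)
    # States (Ai, Bi, Ci) numbered 4*Ai + 2*Bi + Ci. Only states 1, 3, 4,
    # and 6 change Bi or the carry, so only they need a search-and-multiwrite.
    writes = {1: (1, 0), 3: (0, 1), 4: (1, 0), 6: (0, 1)}  # state -> (Bi', Ci+1)
    for i in range(nbits):  # least significant bit position first
        matched = set()     # words deactivated until we move to bit i+1
        for state, (new_b, carry) in writes.items():
            a, b, c = (state >> 2) & 1, (state >> 1) & 1, state & 1
            for k, w in enumerate(words):  # content search over all words
                if k in matched:
                    continue
                if (w["A"][i], w["B"][i], w["C"]) == (a, b, c):
                    w["B"][i], w["C"] = new_b, carry  # multiwrite
                    matched.add(k)
    return words
```

The `matched` set mirrors the paper's point that a word, once rewritten at bit position i, must leave the active list so its new state is not reprocessed before the scan advances to bit i+1.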
`
`
`
`
`
`
`However, there are problems in which