Associative and Parallel Processors

KENNETH J. THURBER

Product Development Group, Sperry Univac Defense Systems, St. Paul, Minnesota 55165
Computer, Information and Control Sciences Department, University of Minnesota, Minneapolis, Minnesota 55455

LEON D. WALD

Test Equipment Engineering, Government and Aeronautical Products Division, Honeywell, Inc., Minneapolis, Minnesota
This paper is a tutorial survey of the area of parallel and associative processors. The paper covers the main design tradeoffs and major architectures of SIMD (Single Instruction Stream Multiple Data Stream) systems. Summaries of ILLIAC IV, STARAN, OMEN, and PEPE, the major SIMD processors, are included.

Keywords and Phrases: associative processor, parallel processor, OMEN, STARAN, PEPE, ILLIAC IV, architecture, large-scale systems, SIMD processors, array processors, ensemble

CR Categories: 3.80, 4.20, 6.22
INTRODUCTION

Purpose and Scope

The purpose of this paper is to present a tutorial survey on the subject of parallel and associative processors. The paper covers the topics of system categorizations, applications, main tradeoff issues, historically important architectures, and the architectures of systems that are currently available.

Currently, the microprocessor/computer-on-a-chip revolution is providing the potential for production of very cost-effective high-performance computers through utilization of a large number of these processors in parallel or in a network. The parallel and associative processors provide a class of architectures which can be readily used to take immediate advantage of the microprocessor technology.

Surveys

A number of good surveys on the subject (or bordering on the subject) of parallel and associative processors have been published [1-10]. Some of these surveys are only of historical interest due to their age. Several conferences in this area may also be of interest [11-14].

Classification of Computer Architectures

Many approaches to classification of computer architectures are possible. Most techniques use global architecture properties and are thus valid only within limited ranges. The main classification techniques discussed below provide a framework within which to view associative and parallel processors.

Flynn [15] proposed a classification scheme that divides systems into categories based upon their procedure and data streams. The four categories are:
1) SISD (Single Instruction Stream Single Data Stream) — a uniprocessor such as a single processor IBM 360.
2) MISD (Multiple Instruction Stream Single Data Stream) — a pipeline processor such as CDC STAR.
Copyright © 1976, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted, provided that ACM's copyright notice is given and that reference is made to this publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery.

Computing Surveys, Vol. 7, No. 4, December 1975
3) SIMD (Single Instruction Stream Multiple Data Stream) — an associative or parallel processor such as ILLIAC IV.
4) MIMD (Multiple Instruction Stream Multiple Data Stream) — a multiprocessor such as a Univac 1108 multiprocessor system.

Murtha and Beadles [16] proposed a classification technique based upon parallelism properties. Their categories are:
1) General-purpose network computer;
2) Special-purpose network with global parallelism; and
3) Non-global, semi-independent network with local parallelism — this category is a catchall for machines which do not fit into the first two categories.
Categories 1 and 2 have the following subcases:
   1.a) General-purpose network with centralized common control;
   1.b) General-purpose network with identical processors but independent instruction execution actions;
   2.a) Pattern processors;
   2.b) Associative processors.
The purpose of this classification technique was to differentiate between multiprocessors and highly parallel organizations.

Another possible classification view, suggested by Hobbs, et al. [1], consists of the following categories:
1) Multiprocessor;
2) Associative processor;
3) Network or array processor; and
4) Functional machines.
Further, Hobbs, et al. suggested that architectures could be classified based upon the amount of parallelism in:
1) Control;
2) Processing units; and
3) Data streams.
However, it was noted that these parameters were present in all highly parallel machines and were therefore not adequate to define a machine architecture.

In his article against parallel processors, Shore [17] presents a unique classification technique which derives the machine descriptions from the description of a uniprocessor. The machine categories considered are summarized as follows:
1) Machine I — a uniprocessor.
2) Machine II — a bit-slice associative processor built from Machine I by adding bit-slice processing and access capability (e.g., STARAN).
3) Machine III — an orthogonal computer derived from Machine II by adding parallel word processing and access capability (e.g., OMEN).
CONTENTS

INTRODUCTION
   Purpose and Scope
   Surveys
   Classification of Computer Architectures
   Definitions
   Reasons for Use of SIMD Processors
   Applications
   Matrix Multiplication on a Parallel Processor
SIMD GENEALOGY
MAIN TRADEOFF ISSUES
   Associative Addressing
   The State Transition Concept and Associative Processor Algorithms
   Processing Element (PE) Interconnections
   Input/Output
   Host Interaction
   Memory Distribution
   Software
   Activity Control and Match Resolution
   Balance Between Logic and Memory
ASSOCIATIVE PROCESSORS
   Introduction
   DLM (Distributed Logic Memory)
   Highly Parallel Associative Processors
   Mixed Mode and Multidimensional Memories
   Implemented Associative Processors
   Associative Processor Simulators
PARALLEL PROCESSORS AND ENSEMBLES
   Introduction
   Unger's Machine
   SOLOMON I, SOLOMON II, and ILLIAC IV
   Machines With Other Interconnection Structures
   Orthogonal Computer
   PEPE
   Comment
ACTUAL MACHINES
   Introduction
   STARAN
   OMEN
   PEPE
   ILLIAC IV
CONCLUSION
REFERENCES
4) Machine IV — a machine derived from Machine I by replicating the processing units (e.g., PEPE).
5) Machine V — a machine derived from Machine IV by adding interconnections between processors (e.g., ILLIAC IV).
6) Machine VI — a machine derived from Machine I by integrating processing logic into every memory element (e.g., Kautz's logic-in-memory computers [85, 86]).

Higbie [67] classifies computers using Flynn's four basic categories. However, he expands the SIMD category into the following four subcategories:
1) Array Processor — a processor that processes data in parallel and addresses the data by address instead of by tag or value.
2) Associative Memory Processor — a processor that operates on data addressed by tag or value rather than by address (Note: this definition does not require parallel operation; however, the definition does allow for machines that operate on data in parallel).
3) Associative Array Processor — a processor that is associative and also operates on arrays of data (typically, the operations are on a bit-slice basis, i.e., a single bit of many words).
4) Orthogonal Processor — a processor with two subsystems — an associative array processor subsystem and a serial (SISD) processor subsystem — which share a common memory array.
Higbie's categories provide for the identification of the ability to perform parallel, associative, and serial processing. Higbie defines a parallel processor to be any computer that contains multiple arithmetic units and operates on multiple data streams. Clearly, all of Higbie's four subcategories of SIMD processors fit his definition of a parallel processor.

Although none of the classification schemes presented here are mathematically precise, they are necessary to place associative and parallel processors in perspective. In the next section the terminology of associative and parallel processors will be defined for use in the remainder of this paper.

Definitions

The definitions used for the remainder of this paper are based upon Flynn [15], and are consistent with those used by Thurber [10]. The definitions are:
1) SIMD machine: any computer with a single global control unit which drives multiple processing units, all of which either execute or ignore the current instruction.
2) Associative Processor: any SIMD machine in which the processing units (or processing memory) are addressed by a property of the data contents rather than by address (i.e., multiply A and B together in all elements where A < B).
3) Parallel Processor: any SIMD machine in which the processing elements are the order of complexity of current day small computers and which has, typically, a high level of interconnectivity between processing elements.
4) Ensemble: a parallel processor in which the interconnection level between processing elements is very low or nonexistent.
Clearly, the intersection of these definitions will not be null due to the overlap in classifying machine architectures.

Reasons for Use of SIMD Processors

There are many reasons for the use of SIMD architectures. Some of the most important are:
1) Functional
   a) Large problems such as weather data processing.
   b) Problems with inherent data structure and parallelism.
   c) Reliability and graceful degradation.
   d) System complexity.
   e) Computational load.
2) Hardware
   a) Better use of hardware on problems with large amounts of parallelism.
   b) Advent of LSI microprocessors.
   c) Economy of duplicate structures.
   d) Lower nonrecurring and recurring costs.
3) Software
   a) Simpler than for multiprocessor.
   b) Easier to construct large systems.
   c) Less executive function requirements.
However, the reader is cautioned to remember that these are special-purpose machines and any attempt to apply them to an incorrectly
sized, or designed, problem is an exercise in futility.

Applications

Numerous applications have been proposed for associative and parallel processors. Whether an application is ever implemented is based upon the economics of the problem. It is not enough to be able to describe a problem solution. To date, considerable energy has been expended searching for cost-effective parallel solutions, but little has been done to actually implement them. Examples of such applications are: matrix manipulation, differential equations, and linear programming.

Several areas of application appear quite well suited to associative and parallel processing. In some cases, SIMD processors provide cost-effective system augmentation (e.g., air traffic control and associative head-per-track disks). In others, SIMD processors are very close to functional analogs of the physical system (e.g., data compression, concentration, and multiplexing). The use of associative and parallel processors appears promising in the elimination of critical bottlenecks in current general-purpose computer systems; however, only very small associative memories are required, and cost is really not a critical factor in the solution of these problems. Such problems may be encountered in the management of computer resources, and might involve such items as protection mechanisms, resource allocation, and memory management.

The application trend for associative processors is well defined. Due to cost factors, applications will be limited (in the near future) to problems such as resource management, virtual memory mechanisms, and some augmentation of current systems, rather than to data-base management or large file searching and processing. The application trend for parallel processors and ensembles seems to be in the area of large data computation problems such as weather data processing, nuclear data processing, and ballistic missile defense. Researchers must concentrate on systems aspects of the problem, not on developing solutions to problems which are ill-defined or nonexistent. The only truly useful data-processing applications of parallel and associative processors have been developed through extensive work at the system level, followed by parallel systems design, rather than vice versa.

SIMD processors have been proposed and appear well suited for a number of applications; these are listed here along with their primary references. Factors to be considered in applying SIMD processors have been discussed elsewhere [18]. To be useful for solving the listed problems, the highly parallel processors must become more cost-effective. The suggested application areas are:
1) Applications in which associative processors appear to be cost-effective:
   a) Virtual memory mechanisms [19]
   b) Resource allocation [20]
   c) Hardware executives [21]
   d) Interrupt processing [22, 23]
   e) Protection mechanisms [19]
   f) Scheduling [20]
2) Applications in which parallel processors appear cost-effective and in which associative processors may be cost-effective:
   a) Bulk filtering [24]
   b) Tracking [25]
   c) Air traffic control [26]
   d) Data compression [27]
   e) Communications multiplexing [28, 29]
   f) Signal processing [30]
3) Applications in which associative processors would be useful if they were more cost-effective and in which parallel processors are probably cost-effective:
   a) Information retrieval [31]
   b) Sorting [32]
   c) Symbol manipulation [33]
   d) Pattern recognition [34]
   e) Picture processing [35]
   f) Sonar processing [36]
   g) Sea surveillance [37]
   h) Graph processing [38]
   i) Dynamic programming [39]
   j) Differential equations [40]
   k) Eigenvector [41]
   l) Matrix operations [42]
   m) Network flow analysis [43]
   n) Document retrieval [44]
   o) Data file manipulation [45]
   p) Machine document translation [46]
   q) Data file searching [47]
   r) Compilation [48]
   s) Formatted files [49]
   t) Automatic abstracting [50, 51]
   u) Dictionary-look-up translations [52]
   v) Data management [53]
   w) Theorem proving [54]
   x) Computer graphics [55]
   y) Weather data processing [56]

Hollander's paper [2] presents many experts' opinions about the applicability of associative and parallel processors. Slotnick [57] and Fuller [58] have compared associative and parallel processors. They concluded that parallel processors appear more useful than associative processors, but the machines would always be special purpose. Shore [17] has presented a very eloquent case against SIMD machines and parallel processors.

Matrix Multiplication on a Parallel Processor

Matrix manipulations can be used to illustrate the potential of a parallel processor. One of the major computations in many of the possible applications cited previously is the matrix multiply. Assume that a parallel processor such as that shown in Figure 1 is used to perform this operation. Each processing element will be assumed to have three registers, AREG, BREG, and CREG. Each cell is interconnected to its four nearest neighbors. Cannon [59] derived the following algorithm to multiply two n x n matrices together in n stages using this processor.

Algorithm:
   Set:      CREG = 0
             BREG (PE I,J) = B[I,J] for all I,J ≤ N
             AREG (PE I,J) = A[I,J] for all I,J ≤ N
   Shift:    Ith row of A left I-1 columns for all I ≤ N
             Jth column of B up J-1 rows for all J ≤ N
   Multiply: (TREG = AREG * BREG)
   Add:      (CREG = CREG + TREG) in parallel in all PEs
   Shift:    AREG right one row
             BREG down one column
   Jump:     If not Nth pass, jump to Multiply.

[Figure 1. Parallel Processor with PE Connections to the Four Nearest Neighbors. A control unit with memory drives a grid of PEs, with paths to and from all PEs.]

As an example let:

    A = a1 a2 a3        B = b1 b2 b3
        a4 a5 a6            b4 b5 b6
        a7 a8 a9            b7 b8 b9

and C = A x B.

After initialization, the memory map for CREG is:

    0 0 0
    0 0 0
    0 0 0

For AREG, the memory map is:

    a1 a2 a3
    a5 a6 a4
    a9 a7 a8
And the BREG map is:

    b1 b5 b9
    b4 b8 b3
    b7 b2 b6

After the multiply, add, shift, and jump, the memory maps appear as follows:

    CREG = a1b1  a2b5  a3b9        AREG = a3 a1 a2
           a5b4  a6b8  a4b3               a4 a5 a6
           a9b7  a7b2  a8b6               a8 a9 a7

After two more iterations the multiply will be finished and CREG [PE I,J] will contain C[I,J].
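The register-level loop above can be simulated directly in sequential software. The sketch below is an illustration, not the authors' implementation: the function name is invented, numpy arrays stand in for the PE grid, and np.roll stands in for the nearest-neighbor shift network of Figure 1.

```python
import numpy as np

def cannon_matmul(A, B):
    """Simulate Cannon's algorithm on an n x n grid of PEs.

    AREG, BREG, and CREG each hold one value per PE; np.roll models
    the nearest-neighbor shifts (negative = left/up, positive = right/down).
    """
    n = A.shape[0]
    AREG = A.astype(float)
    BREG = B.astype(float)
    CREG = np.zeros((n, n))

    # Set/Shift: skew row I of A left I-1 columns, column J of B up J-1 rows.
    for i in range(n):
        AREG[i, :] = np.roll(AREG[i, :], -i)
    for j in range(n):
        BREG[:, j] = np.roll(BREG[:, j], -j)

    # Multiply/Add/Shift, repeated n times (the "if not Nth pass" loop).
    for _ in range(n):
        CREG += AREG * BREG              # TREG = AREG * BREG; CREG += TREG
        AREG = np.roll(AREG, 1, axis=1)  # shift AREG right one position
        BREG = np.roll(BREG, 1, axis=0)  # shift BREG down one position
    return CREG
```

At pass t, PE (I,J) multiplies A[I,K] by B[K,J] with K = (I+J-t) mod n, so after n passes every CREG element has accumulated its full inner product.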
SIMD GENEALOGY

Figure 2 indicates the major parallel and associative architectures considered in this paper, together with their generic relationships. These may differ substantially, yet they are related by virtue of their global associative or parallel properties.

[Figure 2. SIMD Genealogy. A chart rooted at SIMD MACHINES, branching into ASSOCIATIVE PROCESSORS (Memex [72], distributed logic memory [31], cryogenic catalog memory [73], association-storing processor [78], tree channel processor [80], two-dimensional and bulk distributed logic memories [81], highly parallel machines, augmented content-addressable memory [85], associative control switch [24], associative logic processing system [87], mixed mode and multidimensional [23], and the implemented associative processors: bit slice [113], adder skewed bit slice [92], EXOR skewed bit slice [93], byte slice [106], distributed logic [20], STARAN [114]), PARALLEL PROCESSORS (Unger machine [107], SOLOMON I [108], SOLOMON II [109], ILLIAC IV [57], orthogonal computer [105], OMEN [112]), and ENSEMBLES (PEPE [115]).]
MAIN TRADEOFF ISSUES

To allow the reader to place the presented architectures in perspective, the major system tradeoff issues in SIMD processors will be discussed next. It should be understood that SIMD processors are special purpose. They are useful for a limited set of applications which may be characterized as having [18]:
1) A large number of independent data sets;
2) No relationships between data sets that prevent them from being processed in parallel;
3) The requirement for quick throughput response; and
4) The potential to exploit associative addressing selection techniques.

Associative Addressing

Figure 3 indicates the main features required to implement an associative address process. The designer must include facilities to address or query the processor associatively, i.e., to address the memory array by content. In Figure 3, the application may require that all employees with a salary of $35 per day be located. This would be accomplished by performing an equality search on the daily salary field of the file. This search is performed in parallel. To set up the search, a Data Register must be loaded with the daily salary ($35) for comparison; a Mask Register is included to mask the data register so that only the desired fields may be searched; a Word Select (activity select) Register specifies the subset of words to be addressed; a Results Register is required to collect the results of the search; a Match Indicator is used to indicate the number of matches; and a Multiple Match Resolver must be provided to generate a pointer to the "topmost" matched word. An eight-word example of a personnel file stored in a simple associative memory is illustrated in Figure 3; it assumes that the search for a salary of $35 per day was initiated on the entire file. All necessary registers and features are illustrated.

[Figure 3. Associative Processor Storing a Personnel File. The figure shows a control unit; Data, Mask, Word Select, and Search Results Registers; a memory array whose field descriptor has the general form NAME, DAILY SALARY, EMPLOYEE NUMBER; a Match Indicator; a Multiple Match Resolver; and external processing logic.]
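The register set of Figure 3 can be mimicked in a few lines of software. This is a toy sketch, not Figure 3's hardware: the 24-bit field layout, the bit positions, and all names are assumptions made for illustration.

```python
def equality_search(memory, data, mask, word_select):
    """Masked equality search over an associative memory array.

    memory      -- list of integers, one per stored word
    data        -- Data Register (the comparand)
    mask        -- Mask Register (1-bits select the field to compare)
    word_select -- Word Select Register (words eligible for this search)
    Returns (search results register, match indicator, resolved pointer).
    """
    # In hardware every word is compared in one parallel step; the
    # comprehension below models that single step.
    results = [
        bool(selected) and (word & mask) == (data & mask)
        for word, selected in zip(memory, word_select)
    ]
    match_count = sum(results)  # Match Indicator: number of matches
    # Multiple Match Resolver: pointer to the "topmost" matched word.
    topmost = results.index(True) if match_count else None
    return results, match_count, topmost

# Hypothetical layout: name (12 bits) | daily salary (6 bits) | number (6 bits).
SALARY_SHIFT = 6
SALARY_MASK = 0x3F << SALARY_SHIFT

def make_word(name_code, salary, number):
    return (name_code << 12) | (salary << SALARY_SHIFT) | number
```

The $35-per-day search over an eight-word file is then one call: equality_search(file, 35 << SALARY_SHIFT, SALARY_MASK, [True] * 8).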

`
`
`
`
`
`
`
`
`
`
`
`
`
`
`222
`
`
`
`:
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`a 0
`
`* CARRY INTO BIT POSITION 141 FROM BIT POSITION 1
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Figure 4. Bit-Slice Addition State Transition
`
`Diagram.
`
`
`
`
`
`
`
`
`
`
`
`Computing Surveys, Vol. 7, No. 4, December 1975
`
`INTEL - 1010
`
`
`
`
`
`tH
`
`
`
`
`
`
`
`
`
`
`Kenneth J. Thurber and Leon D. Wald
`
`
`
`
`
`
`PRESENT
`The State Transition Concept and Associative
`
`
`
`
`Processor Algorithms
`
`
`
`
`
`
`
`
`
`as fT
`In 1965, Fuller and Bird [35] described a bit-
`
`
`
`
`
`
`
`
`
`oof oe
`slice associative processor architecture. Fuller
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`and Bird discussed their algorithms in terms of
`ty t j|eo__»)SESE)
`
`o
`
`
`
`
`
`
`state transitions. To perform parallel processing
`fe
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`of data in an associative processor, two basic
`
`
`
`
`
`
`
`
`
`
`
`
`operations are required. These are a content
`
`
`
`
`
`
`
`
`0
`search and a “multiwrite” operation. The multi-
`
`
`
`
`
`
`
`
`
`write operation consists of
`the ability to
`
`
`
`
`
`
`
`
`simultaneously write into selected bits of
`
`
`
`
`
`
`
`
`
`several selected words. With a content search
`
`
`
`
`
`
`
`
`and multiwrite capability, an associative proc-
`
`
`
`
`
`
`
`
`
`essor can be used to process data in highly
`
`
`parallel fashion.
`
`
`
`
`
`
`consider
`the problem of
`example,
`For
`
`
`
`
`
`
`
`
`
`
`adding field A to field B. This can be accom-
`
`
`
`
`
`
`
`
`
`plished as a sequence of bit pair additions (A;
`
`
`
`
`
`
`
`
`
`added to Bj), with suitable attention paid to the
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`the
`1) The basic processing primitive is
`carry bit. If bit 1 is the least significant and bit
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`n the most, Figure 4 indicates the state table
`identification of all words meeting some
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`binary state variable configuration;
`for the bit-serial addition of A to B with the
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`result accumulating into B. As shown B; (result)
`2) No explicit addressing is necessary since
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`can
`processing
`be performed simul-
`and C; +
`differ from B; (operand) and C;in
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`taneously over all data sets of interest,
`the four states, 1, 3, 4, and 6. One possible
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`3) Data is processed “columnserially” to
`addition method would be to search for these
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`states and multiwrite the correct B; and Cj + 1
`limit the numberof state transformations
`
`
`
`
`
`
`
`
`
`
`
`
into the appropriate fields, i.e., set up a "tag field" designated as Ci, and set C1 = 0. Then operations would proceed from the least significant bit to the most significant bit of the pair of operands. At each bit position, i, Ai, Bi, and Ci would be searched for state 1. Then Ci+1 = 0, Bi = 1 would be multiwritten into all matching words. After a search for state 3, Ci+1 = 1, Bi = 0 would be multiwritten into all matching words. This same procedure would be carried out for states 4 and 6. If a word is matched on any search, it would be removed from the list of active cells until operations moved from bit position i to i + 1, so that writing the resultant Bi and the carry-in for bit position i + 1 into the memory does not cause the data already processed to be reprocessed.

Obviously, utilizing the sequential state transformation approach, any desired transformations can be processed in an associative processor with simple content search and multiwrite capability using very simple word logic. Operating on the state transition concept, six general requirements of the architecture of an associative processor were deduced. These are:

... to be remembered;
4) Tag bits (temporary bit-slice storage) are very useful (for example, the carry bit);
5) A search results/word select register and multiword write capability are desirable; and
6) Stored program control of the associative processor is desirable.

Fuller and Bird recognized an important correlation between the way in which data is stored in the processor's memory and its processing speed. As examples of this phenomenon, they described both word per cell (W/C) and bit per cell (B/C) data organizations for their associative processor. Figures 5 and 6 show the W/C and B/C storage organizations for nine four-bit words X11, ..., X33, respectively. The word per cell organization has become an industry standard, and it is, in fact, the way most associative processors store data. The desire to perform bit-slice processing on many data sets simultaneously without much interword communication led to the widespread use of this type of organization. W/C is very efficient if a large number of cells are involved;
[Figure 5. Word Per Cell Organization — each cell holds bits 0 to 3 of one word (e.g., word X13).]
operands can be stored in the same cell, and most operands are being processed by each bit-slice command. For example, in Figure 5, this type of organization would be very efficient if, for all operands Xi1, Xi2, Xi3, it were desired to calculate Xi1 + (Xi2 * Xi3) for all i. In this case all cells could be activated and the calculation could be performed in a bit-serial manner over all operands.
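The search-and-multiwrite addition described above can be sketched in a few lines of simulation. This is our own illustration, not Fuller and Bird's design: the `Cell` structure and function names are hypothetical, and we assume the paper's "state k" encodes the triple (Ai, Bi, Ci) as the binary number AiBiCi, which is consistent with the writes quoted for states 1 and 3. The sum A + B is returned in the B field, with the carry held in the tag bit C.

```python
# Sketch (assumed, not from the paper) of bit-serial addition on a
# word-per-cell associative memory: each cell holds fields A and B and a
# one-bit carry tag C; B is overwritten with A + B, least significant
# bit first, using only equality search and multiwrite.

N_BITS = 4  # matches the four-bit words of Figures 5 and 6

class Cell:
    def __init__(self, a, b):
        self.a, self.b, self.c = a, b, 0  # tag field C, with C1 = 0

def bit(x, i):
    return (x >> i) & 1

def bitserial_add(cells):
    for i in range(N_BITS):            # least to most significant bit
        active = list(cells)           # word select: all cells start active
        # (Ai, Bi, Ci) patterns needing a write, with the new Bi and the
        # carry-in for bit position i + 1 (states 1, 3, 4, 6, assuming
        # state number = 4*Ai + 2*Bi + Ci):
        for (ai, bi, ci), (new_b, new_c) in [
            ((0, 0, 1), (1, 0)),       # state 1
            ((0, 1, 1), (0, 1)),       # state 3
            ((1, 0, 0), (1, 0)),       # state 4
            ((1, 1, 0), (0, 1)),       # state 6
        ]:
            matched = [w for w in active
                       if (bit(w.a, i), bit(w.b, i), w.c) == (ai, bi, ci)]
            for w in matched:          # multiwrite into all matching words
                w.b = (w.b & ~(1 << i)) | (new_b << i)
                w.c = new_c
                active.remove(w)       # inactive until bit position i + 1

cells = [Cell(3, 6), Cell(5, 5), Cell(0, 9)]
bitserial_add(cells)
print([w.b for w in cells])            # prints [9, 10, 9]
```

Removing matched words from the active list is essential, exactly as the text explains: a state-4 write (Bi becomes 1) would otherwise leave the cell matching the state-6 search at the same bit position, and the freshly written result would be reprocessed.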
However, there are problems in which
