`
I, Rachel J. Watters, am a librarian, and the Director of Wisconsin TechSearch ("WTS"), located at 728 State Street, Madison, Wisconsin, 53706. WTS is an interlibrary loan department at the University of Wisconsin-Madison. I have worked as a librarian at the University of Wisconsin library system since 1998. I have been employed at WTS since 2002, first as a librarian and, beginning in 2011, as the Director. Through the course of my employment, I have become well informed about the operations of the University of Wisconsin library system, which follows standard library practices.
`
`This Declaration relates to the dates of receipt and availability of the following:
`
`Thurber, K. J., & Wald, L. D. (1975). Associative and parallel
`processors. ACM Computing Surveys, 7(4), 215-255.
`
Standard operating procedures for materials at the University of Wisconsin-Madison Libraries. When an issue was received by the Library, it would be checked in, stamped with the date of receipt, added to library holdings records, and made available to readers as soon after its arrival as possible. The procedure normally took a few days or at most 2 to 3 weeks.
`
Exhibit A to this Declaration is a true and accurate copy of the issue cover with library date stamp of ACM Computing Surveys (1975), from the University of Wisconsin-Madison Library collection. Exhibit A also includes an excerpt of pages 215 to 255 of that issue, showing the article entitled Associative and parallel processors (1975).

INTEL - 1011

Declaration of Rachel J. Watters on Authentication of Publication

Based on this information, the date stamp on the journal title page indicates Associative and Parallel Processors (1975) was received by the Engineering Library at the University of Wisconsin-Madison on February 25, 1976.
`
Based on the information in Exhibit A, it is clear that the volume was received by the library on or before February 25, 1976, and catalogued and available to library patrons within a few days or at most 2 to 3 weeks after February 25, 1976.

I declare that all statements made herein of my own knowledge are true and that all statements made on information and belief are believed to be true; and further that these statements were made with the knowledge that willful false statements and the like so made are punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United States Code.
`
Date: January 28, 2020
`
`Wisconsin TechSearch
`Memorial Library
`728 State Street
`Madison, Wisconsin
`53706
`
`
Rachel J. Watters
`Director
`
`
`
`
[Exhibit A: scanned cover of the journal issue, marked "Special Issue: Computer Systems Architecture" and dated December 1975, listing contributors including Wald, Organick, Patil, Keller, Anderson, Jensen, and Thurber. The cover bears the library receipt stamp "ENGINEERING FEB 25".]
`
`
`
`COMPUTING SURVEYS
`
`The Survey and Tutorial Journal of the ACM
`
`Volume 7 Number 4 December 1975
`
`ASSOCIATION FOR
`COMPUTING MACHINERY
`Founded in 1947 as the society of the
`computing community, the
`Association for Computing
Machinery is dedicated to the
`development of information
`processing as a discipline, and to
`the responsible use of computers
`in an increasing diversity of
`applications.
`
`Editor in Chief
ELLIOTT I. ORGANICK
`Department of Computer Science
`University of Utah
`Salt Lake City, Utah 84112
`
`Executive Editor
`LEE REVENS
`1133 Avenue of the Americas
`New York, New York 10036
`
`GERARD SALTON
Cornell University
`
`COUNCIL
`Jean E. Sammet, President
`Herbert R. J. Grosch, Vice President
John W. Hamblen, Secretary
`Aaron Finerman, Treasurer
`John A. Gosden, Chairman,
`Publications Board
`David H. Brandin, Chairman,
`SIG/SIC Board
`Anthony Ralston, Past President
`Members-at-Large
`Herbert S. Bright (1974-76)
`Anita Cochran (1974-76)
`Peter J. Denning (1974-78)
M. Stuart Lynn (1974-76)
`Daniel D. McCracken (1974-78)
`Peter Wegner (1972-76)
Regional Representatives
Thomas Angell, Allegheny Region (1973-76)
George G. Dodd, East Central Region (1973-76)
Barry Gordon, Greater New York Region (1973-76)
Carl Hammer, Capital Region (1974-77)
Fred H. Harris, N. Central Region (1972-75)
Willard J. Holden, Pacific Region (1972-75)
Horst Hünke, European Region (1974-77)
Olin G. Johnson, S. Central Region (1972-75)
Roger L. Mills, S. California Region (1974-77)
Gerard A. Salton, Northeast Region (1972-75)
John L. Tischhauser, Mountain Region (1974-77)
Larry L. Bell, Southeast Region (1975-76)

The Surveys Editorial Board (Terms through 1978)
HERBERT S. BRIGHT, Computation Planning, Inc.
THOMAS A. CHEATHAM, Harvard University
FERNANDO J. CORBATO, Project MAC/MIT
PETER J. DENNING, Purdue University
JACK B. DENNIS, MIT
RICHARD KARP, University of California, Berkeley
RAYMOND E. MILLER, IBM
MARTIN H. SCHULTZ, Yale University
PETER WEGNER, Brown University
ERIC A. WEISS, Sun Services Corp.
`
`Editorial Staff
`Associate Editor: Victoria Gillen
`Administrative Assistant: Mary Neumeyer
`
Surveys and tutorials are solicited. Submittals should be addressed, in triplicate, to one of the editors. Manuscripts, including bibliographic material, must be typed double spaced, with the original on bond paper.
Correspondence about accepted papers should be addressed to the Executive Editor.
Subscription rates (annual): ACM members, $7.00 (single copies $3.00); nonmembers, $30.00 (single copies $8.00).
When payment is included with membership applications, renewal notices, or invoices, please use the address Association for Computing Machinery, P.O. Box 12105, Church Street Station, New York, NY 10249.
Notices of address changes and requests for membership applications and subscription information should be addressed to Association for Computing Machinery, 1133 Avenue of the Americas, New York, NY 10036. Allow 6 to 8 weeks for change of name and address or new membership to become effective. Send old label with new address notification. To avoid interruption of service, notify your local Post Office before change of residence; for a fee, the Post Office will forward 2d and 3d class periodicals.
Advertising: ACM, 1133 Avenue of the Americas, New York, N.Y. 10036. (212) 265-6300.
`
Published quarterly in March, June, September, and December at Mt. Royal & Guilford Aves., Baltimore, Md. 21202 by the Association for Computing Machinery, Inc. Copyright © 1975 Association for Computing Machinery, Inc. Second-class postage paid at New York, New York and at additional mailing offices.
`
`EXECUTIVE DIRECTOR
`Sidney Weinstein
`EXECUTIVE SECRETARY
Irene Hollister
`PUBLICATIONS BOARD
`John A. Gosden, Chairman
`Aaron Finerman
`Patrick C. Fischer
`Willard J. Holden
`Raymond E. Miller
`B. L. Trippett
`Martha E. Williams
`
`OTHER ACM PUBLICATIONS
`Communications of the ACM
`Computing Reviews
`Journal of the Association for
`Computing Machinery
`Transactions on Mathematical
`Software
`Collected Algorithms
ACM Monograph Series
`Proceedings of Conferences
`Special Publications
`
`
`
`
`Associative and Parallel Processors
`
`KENNETH J. THURBER
`
`Product Development Group, Sperry Univac Defense Systems, St. Paul, Minnesota 55165
`Computer, Information and Control Sciences Department, University of Minnesota,
`Minneapolis, Minnesota 55455
`
`LEON D. WALD
`
`Test Equipment Engineering, Government and Aeronautical Products Division, Honeywell, Inc.,
`Minneapolis, Minnesota
`
`This paper is a tutorial survey of the area of parallel and associative processors.
`The paper covers the main design tradeoffs and major architectures of SIMD
`(Single Instruction Stream Multiple Data Stream) systems. Summaries of
`ILLIAC IV, STARAN, OMEN, and PEPE, the major SIMD processors, are
`included.
`
`Keywords and Phrases: associative processor, parallel processor, OMEN,
`STARAN, PEPE, ILLIAC IV, architecture, large-scale systems, SIMD
`processors, array processors, ensemble.
`
`CR Categories: 3.80, 4.20, 6.22
`
`INTRODUCTION
`
Purpose and Scope

The purpose of this paper is to present a tutorial survey on the subject of parallel and associative processors. The paper covers the topics of system categorizations, applications, main tradeoff issues, historically important architectures, and the architectures of systems that are currently available.
Currently, the microprocessor/computer-on-a-chip revolution is providing the potential for production of very cost-effective high-performance computers through utilization of a large number of these processors in parallel or in a network. The parallel and associative processors provide a class of architectures which can be readily used to take immediate advantage of the microprocessor technology.
`
Surveys

A number of good surveys on the subject (or bordering on the subject) of parallel and associative processors have been published [1-10]. Some of these surveys are only of historical interest due to their age. Several conferences in this area may also be of interest [11-14].

Classification of Computer Architectures

Many approaches to classification of computer architectures are possible. Most techniques use global architecture properties and are thus valid only within limited ranges. The main classification techniques discussed below provide a framework within which to view associative and parallel processors.
Flynn [15] proposed a classification scheme that divides systems into categories based upon their procedure and data streams. The four categories are:
1) SISD (Single Instruction Stream Single Data Stream) — a uniprocessor such as a single processor IBM 360.
2) MISD (Multiple Instruction Stream Single Data Stream) — a pipeline processor such as CDC STAR.

Copyright © 1976, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted, provided that ACM's copyright notice is given and that reference is made to this publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery.
`
`Computing Surveys, Vol. 7, No. 4, December 1975
`
`
`
`
`
`
`
`CONTENTS
`
`INTRODUCTION
`Purpose and Scope
`Surveys
`Classification of Computer Architectures
`Definitions
`Reasons for Use of SIMD Processors
`Applications
`Matrix Multiplication on a Parallel Processor
`SIMD GENEALOGY
MAIN TRADEOFF ISSUES
`Associative Addressing
`The State Transition Concept and Associative
`Processor Algorithms
`Processing Element (PE) Interconnections
`Input/Output
`Host Interaction
Memory Distribution
`Software
`Activity Control and Match Resolution
`Balance Between Logic and Memory
`ASSOCIATIVE PROCESSORS
`Introduction
`DLM (Distributed Logic Memory)
`Highly Parallel Associative Processors
`Mixed Mode and Multidimensional Memories
`Implemented Associative Processors
`Associative Processor Simulators
PARALLEL PROCESSORS AND ENSEMBLES
`Introduction
`Unger’s Machine
SOLOMON I, SOLOMON II, and ILLIAC IV
`Machines With Other Interconnection Structures
`Orthogonal Computer
`PEPE
`Comment
`ACTUAL MACHINES
`Introduction
`STARAN
`OMEN
`PEPE
`ILLIAC IV
`CONCLUSION
`REFERENCES
`
`
`3) SIMD (Single Instruction Stream Multiple
`Data Stream) — an associative or parallel
processor such as ILLIAC IV.
`4) MIMD (Multiple Instruction Stream
`Multiple Data Stream) — a multiprocessor
`such as a Univac 1108 multiprocessor
`system.
Murtha and Beadles [16] proposed a classification technique based upon parallelism properties. Their categories are:
1) General-purpose network computer;
2) Special-purpose network with global parallelism; and
3) Non-global, semi-independent network with local parallelism — this category is a catchall for machines which do not fit into the first two categories.
Categories 1 and 2 have the following subcases:
1.a) General-purpose network with centralized common control;
1.b) General-purpose network with identical processors but independent instruction execution actions;
2.a) Pattern processors;
2.b) Associative processors.
The purpose of this classification technique was to differentiate between multiprocessors and highly parallel organizations.
Another possible classification view, suggested by Hobbs et al. [1], consists of the following categories:
1) Multiprocessor,
2) Associative processor,
3) Network or array processor, and
4) Functional machines.
Further, Hobbs et al. suggested that architectures could be classified based upon the amount of parallelism in:
1) Control,
2) Processing units, and
3) Data streams.
However, it was noted that these parameters were present in all highly parallel machines and were therefore not adequate to define a machine architecture.
In his article against parallel processors, Shore [17] presents a unique classification technique which derives the machine descriptions from the description of a uniprocessor. The machine categories considered are summarized as follows:
1) Machine I — a uniprocessor.
2) Machine II — a bit-slice associative processor built from Machine I by adding bit-slice processing and access capability (e.g., STARAN).
3) Machine III — an orthogonal computer derived from Machine II by adding parallel word processing and access capability (e.g., OMEN).
`
`
`
`
4) Machine IV — a machine derived from Machine I by replicating the processing units (e.g., PEPE).
5) Machine V — a machine derived from Machine IV by adding interconnections between processors (e.g., ILLIAC IV).
6) Machine VI — a machine derived from Machine I by integrating processing logic into every memory element (e.g., Kautz's logic-in-memory computers [85, 86]).
Higbie [67] classifies computers using Flynn's four basic categories. However, he expands the SIMD category into the following four subcategories:
1) Array Processor — a processor that processes data in parallel and addresses the data by address instead of by tag or value.
2) Associative Memory Processor — a processor that operates on data addressed by tag or value rather than by address. (Note: this definition does not require parallel operation; however, the definition does allow for machines that operate on data in parallel.)
3) Associative Array Processor — a processor that is associative and also operates on arrays of data (typically, the operations are on a bit-slice basis; i.e., a single bit of many words).
4) Orthogonal Processor — a processor with two subsystems — an associative array processor subsystem and a serial (SISD) processor subsystem — which share a common memory array.
Higbie's categories provide for the identification of the ability to perform parallel, associative, and serial processing. Higbie defines a parallel processor to be any computer that contains multiple arithmetic units and operates on multiple data streams. Clearly, all of Higbie's four subcategories of SIMD processors fit his definition of a parallel processor.
Although none of the classification schemes presented here is mathematically precise, they are necessary to place associative and parallel processors in perspective. In the next section the terminology of associative and parallel processors will be defined for use in the remainder of this paper.

Definitions

The definitions used for the remainder of this paper are based upon Flynn [15], and are consistent with those used by Thurber [10]. The definitions are:
1) SIMD machine: any computer with a single global control unit which drives multiple processing units, all of which either execute or ignore the current instruction.
2) Associative Processor: any SIMD machine in which the processing units (or processing memory) are addressed by a property of the data contents rather than by address (i.e., multiply A and B together in all elements where A < B).
3) Parallel Processor: any SIMD machine in which the processing elements are of the order of complexity of current day small computers and which has, typically, a high level of interconnectivity between processing elements.
4) Ensemble: a parallel processor in which the interconnection level between processing elements is very low or nonexistent.
Clearly, the intersection of these definitions will not be null, due to the overlap in classifying machine architectures.

Reasons for Use of SIMD Processors

There are many reasons for the use of SIMD architectures. Some of the most important are:
1) Functional
a) Large problems such as weather data processing.
b) Problems with inherent data structure and parallelism.
c) Reliability and graceful degradation.
d) System complexity.
e) Computational load.
2) Hardware
a) Better use of hardware on problems with large amounts of parallelism.
b) Advent of LSI microprocessors.
c) Economy of duplicate structures.
d) Lower nonrecurring and recurring costs.
3) Software
a) Simpler than for multiprocessor.
b) Easier to construct large systems.
c) Less executive function requirements.
However, the reader is cautioned to remember that these are special-purpose machines and any attempt to apply them to an incorrectly
`
`
`
`
`sized, or designed, problem is an exercise in
`futility.
`
`Applications
`
Numerous applications have been proposed for
`associative and parallel processors. Whether an
`application is ever implemented is based upon
`the economics of the problem. It is not enough
`to be able to describe a problem solution. To
`date, considerable energy has been expended
`searching for cost-effective parallel solutions,
`but little has been done to actually implement
`them. Examples of
`such applications are:
`matrix manipulation, differential equations,
`and linear programming.
`Several areas of application appear quite well
`suited to associative and parallel processing. In
`some cases, SIMD processors provide cost-
`effective system augmentation (e.g., air traffic
`control and associative head-per-track disks). In
`others, SIMD processors are very close to func-
`tional analogs of the physical system (e.g., data
`compression, concentration, and multiplexing).
`The use of associative and parallel processors
`appears promising in the elimination of critical
`bottlenecks in current general-purpose com-
puter systems; however, only very small associative memories are required, and cost is really
`not a critical factor in the solution of these
`problems. Such problems may be encountered
`in the management of computer resources, and
`might involve such items as protection mecha-
`nisms,
`resource allocation, and memory
`management.
`The application trend for associative
`processors is well defined. Due to cost factors,
`applications will be limited (in the near future)
`to problems such as
`resource management,
`virtual memory mechanisms, and some aug-
`mentation of current systems, rather than to
`data-base management or large file searching
`and processing. The application trend for
parallel processors and ensembles seems to be in
`the area of large data computation problems
`such as weather data processing, nuclear data
`processing, and ballistic missile defense. Re-
`searchers must concentrate on systems aspects
`of the problem, not on developing solutions to
`problems which are ill-defined or nonexistent.
The only truly useful data-processing applications of parallel and associative processors have been developed through extensive work at the systems level, followed by parallel systems design, rather than vice versa.
SIMD processors have been proposed and appear well suited for a number of applications; these are listed here along with their primary references. Factors to be considered in applying SIMD processors have been discussed elsewhere [18]. To be useful for solving the listed problems, the highly parallel processors must become more cost-effective. The suggested application areas are:
1) Applications in which associative processors appear to be cost-effective:
a) Virtual memory mechanisms [19]
b) Resource allocation [20]
c) Hardware executives [21]
d) Interrupt processing [22, 23]
e) Protection mechanisms [19]
f) Scheduling [20]
2) Applications in which parallel processors appear cost-effective and in which associative processors may be cost-effective:
a) Bulk filtering [24]
b) Tracking [25]
c) Air traffic control [26]
d) Data compression [27]
e) Communications multiplexing [28, 29]
f) Signal processing [30]
3) Applications in which associative processors would be useful if they were more cost-effective and in which parallel processors are probably cost-effective are:
a) Information retrieval [31]
b) Sorting [32]
c) Symbol manipulation [33]
d) Pattern recognition [34]
e) Picture processing [35]
f) Sonar processing [36]
g) Sea surveillance [37]
h) Graph processing [38]
i) Dynamic programming [39]
j) Differential equations [40]
k) Eigenvector [41]
l) Matrix operations [42]
m) Network flow analysis [43]
n) Document retrieval [44]
o) Data file manipulation [45]
p) Machine document translation [46]
q) Data file searching [47]
r) Compilation [48]
s) Formatted files [49]
t) Automatic abstracting [50, 51]
`
`
`
`
`
u) Dictionary-look-up translations [52]
v) Data management [53]
w) Theorem proving [54]
x) Computer graphics [55]
y) Weather data processing [56]
Hollander's paper [2] presents many experts' opinions about the applicability of associative and parallel processors. Slotnick [57] and Fuller [58] have compared associative and parallel processors. They concluded that parallel processors appear more useful than associative processors, but the machines would always be special purpose. Shore [17] has presented a very eloquent case against SIMD machines and parallel processors.
`
`Matrix Multiplication on a Parallel Processor
`
Matrix manipulations can be used to illustrate the potential of a parallel processor. One of the major computations in many of the possible applications cited previously is the matrix multiply. Assume that a parallel processor such as that shown in Figure 1 is used to perform this operation.
Each processing element will be assumed to have three registers, AREG, BREG, and CREG.
`
[Figure 1 (caption below) shows a control unit driving the array of PEs, with memory paths to and from all PEs.]

Each cell is interconnected to its four nearest neighbors. Cannon [59] derived the following algorithm to multiply two n x n matrices together in n stages using this processor.
`
Algorithm:
  Set:      CREG = 0
            BREG[PE(I,J)] = B[I,J] for all I, J <= N
            AREG[PE(I,J)] = A[I,J] for all I, J <= N
  Shift:    Ith row of A left I-1 columns, for all I <= N
            Jth column of B up J-1 rows, for all J <= N
  Multiply: TREG = AREG times BREG, in parallel in all PEs
  Add:      CREG = CREG + TREG, in parallel in all PEs
  Shift:    AREG right one column; BREG down one row
  Jump:     If not Nth pass, jump to Multiply
`
As an example let:

    A = | a1 a2 a3 |        B = | b1 b2 b3 |
        | a4 a5 a6 |            | b4 b5 b6 |
        | a7 a8 a9 |            | b7 b8 b9 |

and C = A x B.
`
After initialization, the memory map for CREG is:

    CREG = | 0 0 0 |
           | 0 0 0 |
           | 0 0 0 |

For AREG, the memory map is:

    AREG = | a1 a2 a3 |
           | a5 a6 a4 |
           | a9 a7 a8 |
`
Figure 1. Parallel Processor with PE connections to the Four Nearest Neighbors.
`
`
`
`
And the BREG map is:

    BREG = | b1 b5 b9 |
           | b4 b8 b3 |
           | b7 b2 b6 |

After the multiply, add, shift, and jump, the memory maps appear as follows:
`
    CREG = | a1b1 a2b5 a3b9 |
           | a5b4 a6b8 a4b3 |
           | a9b7 a7b2 a8b6 |

    AREG = | a3 a1 a2 |
           | a4 a5 a6 |
           | a8 a9 a7 |

After two more iterations the multiply will be finished and CREG[PE(I,J)] will contain C[I,J].
`
`SIMD GENEALOGY
`
`Figure 2 indicates the major parallel and asso-
`ciative architectures considered in this paper,
`together with their generic relationships. These
`may differ substantially, yet they are related by
`virtue of
`their global associative or parallel
`properties.
`
[Figure 2. SIMD Genealogy. The tree relates SIMD machines in two branches. Associative processors: cryogenic catalog memory (73), distributed logic memory (31), association-storing processor (78), tree and two-dimensional channel processors (80), augmented content-addressable memory (85), associative control switch (24), associative logic processing system (87), mixed mode and multidimensional memories (23), bulk and distributed logic memories (81), and implemented associative processors, including bit slice (113), adder skewed bit slice (92), EXOR skewed bit slice (93), byte slice (106), and distributed logic (20) designs. Parallel processors and ensembles: Unger machine (107), SOLOMON I (108), SOLOMON II (109), ILLIAC IV (57), orthogonal computer (105), OMEN (112), PEPE (115), and MEMEX (72).]
`
`
`
`
MAIN TRADEOFF ISSUES

To allow the reader to place the presented architectures in perspective, the major system tradeoff issues in SIMD processors will be discussed next. It should be understood that SIMD processors are special purpose. They are useful for a limited set of applications which may be characterized as having [18]:
1) A large number of independent data sets;
2) No relationships between data sets that prevent them from being processed in parallel;
3) The requirement for quick throughput response; and
4) The potential to exploit associative addressing selection techniques.
`
Associative Addressing

Figure 3 indicates the main features required to implement an associative address process. The designer must include facilities to address or query the processor associatively, i.e., to
`
`
address the memory array by content. In Figure 3, the application may require that all employees with a salary of $35 per day be located. This would be accomplished by performing an equality search on the daily salary field of the file. This search is performed in parallel. To set up the search, a Data Register must be loaded with the daily salary ($35) for comparison; a Mask Register is included to mask the data register so that only the desired fields may be searched; a Word Select (activity select) Register specifies the subset of words to be addressed; a Results Register is required to collect the results of the search; a Match Indicator is used to indicate the number of matches; and a Multiple Match Resolver must be provided to generate a pointer to the "topmost" matched word. An eight-word example of a personnel file stored in a simple associative memory is illustrated in Figure 3; it assumes that the search for a salary of $35 per day was initiated on the entire file. All necessary registers and features are illustrated.
`
[Figure 3. Associative Processor Storing a Personnel File. A control unit drives a Data Register, Mask Register, Word Select Register, and the memory array; search outputs feed a Search Results Register, Match Indicator, Multiple Match Resolver, and external processing logic. The stored fields include employee number and daily salary.]
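The masked equality search just described can be sketched in Python. The registers are modeled as plain integers and boolean lists, and the field layout (low 8 bits daily salary, next 8 bits employee number) is an illustrative assumption, not taken from Figure 3.

```python
# Sketch of an associative equality search: every selected word is compared
# against the Data Register, but only on the bit positions enabled by the
# Mask Register; the result is one match bit per word.

def equality_search(words, data_reg, mask_reg, word_select):
    """Return the Search Results Register: one bit per selected word that
    matches data_reg on the bits set in mask_reg."""
    return [select and (w & mask_reg) == (data_reg & mask_reg)
            for w, select in zip(words, word_select)]

def first_match(results):
    """Multiple Match Resolver: pointer to the 'topmost' matched word."""
    return next((i for i, hit in enumerate(results) if hit), None)

# Hypothetical personnel file: low 8 bits = daily salary, next 8 = number.
SALARY_MASK = 0xFF
file_words = [(num << 8) | salary
              for num, salary in [(1, 40), (2, 35), (3, 35), (4, 50)]]
hits = equality_search(file_words, data_reg=35, mask_reg=SALARY_MASK,
                       word_select=[True] * len(file_words))
# Match Indicator = sum(hits); the resolver points at the first match.
```

Every comparison is independent, which is what lets real associative memories evaluate all words in a single parallel step.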
`
`Computing Surveys, Vol. 7
`
`INTEL - 1011
`
`
`
`
The State Transition Concept and Associative Processor Algorithms

In 1965, Fuller and Bird [35] described a bit-slice associative processor architecture. Fuller and Bird discussed their algorithms in terms of state transitions. To perform parallel processing of data in an associative processor, two basic operations are required. These are a content search and a "multiwrite" operation. The multiwrite operation consists of the ability to simultaneously write into selected bits of several selected words. With a content search and multiwrite capability, an associative processor can be used to process data in highly parallel fashion.
For example, consider the problem of adding field A to field B. This can be accomplished as a sequence of bit-pair additions (Ai added to Bi), with suitable attention paid to the carry bit. If bit 1 is the least significant and bit n the most, Figure 4 indicates the state table for the bit-serial addition of A to B with the result accumulating into B. As shown, Bi (result) and Ci+1 differ from Bi (operand) and Ci in the four states 1, 3, 4, and 6. One possible addition method would be to search for these states and multiwrite the correct Bi and Ci+1 into the appropriate fields, i.e., set up a "tag field" designated as Ci, and set C1 = 0. Then operations would proceed from the least significant bit to the most significant bit of the pair of operands. At each bit position i, the fields Ai, Bi, and Ci would be searched for state 1. Then Ci+1 = 0, Bi = 1 would be multiwritten into all matching words. After a search for state 3, Ci+1 = 1, Bi = 0 would be multiwritten into all matching words. This same procedure would be carried out for states 4 and 6. If a word is matched on any search, it would be removed from the list of active cells until operations moved from bit position i to i + 1, so that writing the resultant Bi and the carry-in for bit position i + 1 into the memory does not cause the data already processed to be reprocessed.
Obviously, utilizing the sequential state transformation approach, any desired transformations can be processed in an associative processor with simple content search and multiwrite capability using very simple word logic.
Operating on the state transition concept, six general requirements of the architecture of an associative processor were deduced. These are:
`
`Ay
`
`8;
`
`Cc
`
`* CARRY iNTO BIT POSITION i+1 FAOM BIT POSITION i.
`
`Figure 4. Bit-Slice Addition State Transition
`Diagram.
`
1) The basic processing primitive is the identification of all words meeting some binary state variable configuration;
2) No explicit addressing is necessary, since processing can be performed simultaneously over all data sets of interest;
3) Data is processed "column serially" to limit the number of state transformations to be remembered;
4) Tag bits (temporary bit-slice storage) are very useful (for example, the carry bit);
5) A search results/word select register and multiword write capability are desirable; and
6) Stored program control of the associative processor is desirable.
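The search-and-multiwrite addition described above can be sketched as a sequential Python simulation. The word layout is an illustrative assumption: each word is a dict with integer A and B fields, and the carry tag field is held in a side table rather than in a real tag bit slice.

```python
# Fuller-Bird style bit-serial add B := A + B, using only two primitives:
# a content search (find all active words in a given (Ai, Bi, Ci) state)
# and a multiwrite (write the new Bi and carry into every matching word).

def associative_add(words, n_bits):
    """In each word {'A': int, 'B': int}, accumulate B := A + B mod 2**n_bits."""
    carry = {id(w): 0 for w in words}          # the C tag field; C1 = 0
    for i in range(n_bits):                    # least significant bit first
        done = set()                           # words deactivated at this bit
        # The four states where B(result) or carry-out differ from the
        # operands: (Ai, Bi, Ci) -> (new Bi, carry-out).
        transitions = {(0, 0, 1): (1, 0), (0, 1, 1): (0, 1),
                       (1, 0, 0): (1, 0), (1, 1, 0): (0, 1)}
        for (a, b, c), (new_b, new_c) in transitions.items():
            # Content search over all still-active words.
            hits = [w for w in words if id(w) not in done
                    and (w['A'] >> i) & 1 == a
                    and (w['B'] >> i) & 1 == b
                    and carry[id(w)] == c]
            for w in hits:                     # multiwrite into every match
                w['B'] = (w['B'] & ~(1 << i)) | (new_b << i)
                carry[id(w)] = new_c
                done.add(id(w))                # drop from the active list
    return words
```

Words in the four remaining states need no write at all, since their sum bit and carry already equal the operand bit and carry-in; that is why only four searches per bit position suffice.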
Fuller and Bird recognized an important correlation between the way in which data is stored in the processor's memory and its processing speed. As examples of this phenomenon, they described both word per cell (W/C) and bit per cell (B/C) data organizations for their associative processor. Figures 5 and 6 show the W/C and B/C storage organizations for nine four-bit words X11, ..., X33, respectively. The word per cell organization has become an industry standard, and it is, in fact, the way most associative processors store data. The desire to perform bit-slice processing on many data sets simultaneously without much interword communication led to the widespread use of this type of organization. W/C is very efficient if a large number of cells are involved;
`
`
`
`
[Figure 5. Word Per Cell Organization. Each cell holds bits 0 to 3 of one of the words X11 through X33.]
`
operands can be stored in the same cell, and most operands are being processed by each bit-slice command. For example, in Figure 5, this type of organization would be very efficient if, for all operands Xi1, Xi2, Xi3, it were desired to calculate Xi1 + (Xi2 * Xi3) for all i. In this case all cells could be activated and the calculation could be performed in a bit-serial manner over all operands.
However, there are problems in which a B/C approach is more efficient. (The B/C approach is the same as described by Crane and Githens [76], and was discovered independently by Fuller and Bird [35].) The B/C organization allows many operands to be processed simultaneously in a bit-parallel mode. Because many operands are treated in a bit-parallel fashion, B/C processing can be very efficient, even on only a few operands, in comparison with W/C processing. Also, operands stored in the same bit slice can be easily picked up. Therefore, B/C processing can be efficient in cases where only a few operands are activated for processing by an instruction, or for applications in which extensive pairing of operands must occur.
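The contrast between the two organizations can be sketched in Python. The nine word values are made up, and each "cell" is just a Python value rather than associative memory hardware; the point is only that B/C storage is the bit-slice transpose of W/C storage.

```python
# Nine hypothetical four-bit words X11..X33, stored two ways. In W/C each
# cell holds one whole word, so reading bit slice i means touching every
# cell. In B/C each cell already holds one bit slice, so one cell read
# yields bit i of all nine operands in bit-parallel fashion.

words = [0b0101, 0b0011, 0b1111, 0b1000, 0b0110,
         0b1010, 0b0001, 0b1101, 0b0111]      # X11 .. X33
N_BITS = 4

# Word per cell: cell k = word k.
wc_cells = list(words)

def wc_bit_slice(cells, i):
    """Gather bit i of every word; one bit-slice step over all W/C cells."""
    return [(w >> i) & 1 for w in cells]

# Bit per cell: cell i = the i-th bit slice of all words.
bc_cells = [wc_bit_slice(wc_cells, i) for i in range(N_BITS)]
```

The two layouts hold identical information; B/C is simply the transpose, which is why a single B/C cell access replaces a full sweep over the W/C cells when many operands must be paired or processed bit-parallel.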
Using the state transition concept, associative processor algorithms may be clearly described for a variety of architectures. Considering the range of architectures currently available, no attempt will be made to describe the algorithms here. Rather, a list will be given of the more important operations that can be performed. Obviously there are many variations of these algorithms.
Define a field as a set of bit slices. There are two mai