`
I, Rachel J. Watters, am a librarian, and the Director of Wisconsin TechSearch ("WTS"), located at 728 State Street, Madison, Wisconsin, 53706. WTS is an interlibrary loan department at the University of Wisconsin-Madison. I have worked as a librarian at the University of Wisconsin library system since 1998. I have been employed at WTS since 2002, first as a librarian and, beginning in 2011, as the Director. Through the course of my employment, I have become well informed about the operations of the University of Wisconsin library system, which follows standard library practices.

This Declaration relates to the dates of receipt and availability of the following:

Thurber, K. J., & Wald, L. D. (1975). Associative and parallel processors. ACM Computing Surveys, 7(4), 215-255.

Standard operating procedures for materials at the University of Wisconsin-Madison Libraries. When an issue was received by the Library, it would be checked in, stamped with the date of receipt, added to library holdings records, and made available to readers as soon after its arrival as possible. The procedure normally took a few days or at most 2 to 3 weeks.

Exhibit A to this Declaration is a true and accurate copy of the issue cover with library date stamp of ACM Computing Surveys (1975), from the University of Wisconsin-Madison Library collection. Exhibit A also includes an excerpt of pages 215 to 255 of that issue, showing the article entitled Associative and parallel processors.
`
`INTEL - 1011
`
`
`
`
`Declaration of Rachel J. Watters on Authentication of Publication
`
(1975). Based on this information, the date stamp on the journal title page indicates Associative and Parallel Processors (1975) was received by the Engineering Library at the University of Wisconsin-Madison on February 25, 1976.

Based on the information in Exhibit A, it is clear that the volume was received by the library on or before February 25, 1976, and was catalogued and available to library patrons within a few days or at most 2 to 3 weeks after February 25, 1976.

I declare that all statements made herein of my own knowledge are true and that all statements made on information and belief are believed to be true; and further that these statements were made with the knowledge that willful false statements and the like so made are punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United States Code.
`
`Date: January 28, 2020
`
`Wisconsin TechSearch
`
`Memorial Library
`728 State Street
`
Madison, Wisconsin 53706
`
Rachel J. Watters
Director
`
`
`
`
[Exhibit A: cover of Computing Surveys, Number 4, December 1975, bearing a University of Wisconsin-Madison library stamp. Special Issue: Computer Systems Architecture. The cover lists contributors including Wald, Organick, Patil, Keller, Anderson, Jensen, and Thurber.]
`
`
`
`COMPUTING SURVEYS
`
`The Survey and Tutorial Journal of the ACM
`
`Volume 7 Number 4 December 1975
`
`Editor in Chief
ELLIOTT I. ORGANICK
`Department of Computer Science
`University of Utah
`Salt Lake City, Utah 84112
`
`Executive Editor
`LEE REVENS
`1133 Avenue of the Americas
`New York, New York 10036
`
The Surveys Editorial Board (Terms through 1978)
HERBERT S. BRIGHT, Computation Planning, Inc.
THOMAS A. CHEATHAM, Harvard University
FERNANDO J. CORBATO, Project MAC/MIT
PETER J. DENNING, Purdue University
JACK B. DENNIS, MIT
RICHARD KARP, University of California, Berkeley
RAYMOND E. MILLER, IBM
GERARD SALTON, Cornell University
MARTIN H. SCHULTZ, Yale University
PETER WEGNER, Brown University
ERIC A. WEISS, Sun Services Corp.
`
`Editorial Staff
`Associate Editor: Victoria Gillen
`Administrative Assistant: Mary Neumeyer
`
Surveys and tutorials are solicited. Submittals should be addressed, in triplicate, to one of the editors. Manuscripts, including bibliographic material, must be typed double spaced, with the original on bond paper. Correspondence about accepted papers should be addressed to the Executive Editor.
Subscription rates (annual): ACM members, $7.00 (single copies $3.00); nonmembers, $30.00 (single copies $8.00).
When payment is included with membership applications, renewal notices, or invoices, please use the address Association for Computing Machinery, P.O. Box 12105, Church Street Station, New York, NY 10249.
Notices of address changes and requests for membership applications and subscription information should be addressed to Association for Computing Machinery, 1133 Avenue of the Americas, New York, NY 10036. Allow 6 to 8 weeks for change of name and address or new membership to become effective. Send old label with new address notification. To avoid interruption of service, notify your local Post Office before change of residence; for a fee, the Post Office will forward 2d and 3d class periodicals.
Advertising: ACM, 1133 Avenue of the Americas, New York, N.Y. 10036. (212) 265-6300.
`
Published quarterly in March, June, September, and December at Mt. Royal & Guilford Aves., Baltimore, Md. 21202 by the Association for Computing Machinery, Inc. Copyright © 1975 Association for Computing Machinery, Inc. Second-class postage paid at New York, New York and at additional mailing offices.
`
ASSOCIATION FOR COMPUTING MACHINERY
Founded in 1947 as the society of the computing community, the Association for Computing Machinery is dedicated to the development of information processing as a discipline, and to the responsible use of computers in an increasing diversity of applications.
`
COUNCIL
Jean E. Sammet, President
Herbert R. J. Grosch, Vice President
John W. Hamblen, Secretary
Aaron Finerman, Treasurer
John A. Gosden, Chairman, Publications Board
David H. Brandin, Chairman, SIG/SIC Board
Anthony Ralston, Past President
Members-at-Large
Herbert S. Bright (1974-76)
Anita Cochran (1974-76)
Peter J. Denning (1974-78)
M. Stuart Lynn (1974-76)
Daniel D. McCracken (1974-78)
Peter Wegner (1972-76)
Regional Representatives
Thomas Angeli, Allegheny Region (1973-76)
George G. Dodd, East Central Region (1973-76)
Barry Gordon, Greater New York Region (1973-76)
Carl Hammer, Capital Region (1974-77)
Fred H. Harris, N. Central Region (1972-75)
Willard J. Holden, Pacific Region (1972-75)
Horst Hünke, European Region (1974-77)
Olin G. Johnson, S. Central Region (1972-75)
Roger L. Mills, S. California Region (1974-77)
Gerard A. Salton, Northeast Region (1972-75)
John L. Tischhauser, Mountain Region (1974-77)
Larry L. Bell, Southeast Region (1975-76)
`
`EXECUTIVE DIRECTOR
`Sidney Weinstein
`EXECUTIVE SECRETARY
`Irene Hollister
`PUBLICATIONS BOARD
`John A. Gosden, Chairman
`Aaron Finerman
`Patrick C. Fischer
`Willard J. Holden
`Raymond E. Miller
`B. L. Trippett
`Martha E. Williams
`
`OTHER ACM PUBLICATIONS
`Communications of the ACM
`Computing Reviews
`Journal of the Association for
`Computing Machinery
`Transactions on Mathematical
`Software
`Collected Algorithms
`ACM Monograph Series
Proceedings of Conferences
`Special Publications
`
`
`
`
Associative and Parallel Processors
`
`KENNETH J. THURBER
`
`Product Development Group, Sperry Univac Defense Systems, St. Paul, Minnesota 55165
`Computer, Information and Control Sciences Department, University of Minnesota,
`Minneapolis, Minnesota 55455
`
`LEON D. WALD
`
`Test Equipment Engineering, Government and Aeronautical Products Division, Honeywell, Inc.,
`Minneapolis, Minnesota
`
`This paper is a tutorial survey of the area of parallel and associative processors.
`The paper covers the main design tradeoffs and major architectures of SIMD
`(Single Instruction Stream Multiple Data Stream) systems. Summaries of
`ILLIAC IV, STARAN, OMEN, and PEPE, the major SIMD processors, are
`included.
`
`Keywords and Phrases: associative processor, parallel processor, OMEN,
`STARAN, PEPE, ILLIAC IV, architecture, large-scale systems, SIMD
`processors, array processors, ensemble.
`
CR Categories: 3.80, 4.20, 6.22
`
`INTRODUCTION
`
`Purpose and Scope
`
The purpose of this paper is to present a tutorial survey on the subject of parallel and associative processors. The paper covers the topics of system categorizations, applications, main tradeoff issues, historically important architectures, and the architectures of systems that are currently available.

Currently, the microprocessor/computer-on-a-chip revolution is providing the potential for production of very cost-effective high-performance computers through utilization of a large number of these processors in parallel or in a network. The parallel and associative processors provide a class of architectures which can be readily used to take immediate advantage of the microprocessor technology.
`
`Surveys
`
A number of good surveys on the subject (or bordering on the subject) of parallel and associative processors have been published [1-10]. Some of these surveys are only of historical interest due to their age. Several conferences in this area may also be of interest [11-14].
`
`Classification of Computer Architectures
`
Many approaches to classification of computer architectures are possible. Most techniques use global architecture properties and are thus valid only within limited ranges. The main classification techniques discussed below provide a framework within which to view associative and parallel processors.

Flynn [15] proposed a classification scheme that divides systems into categories based upon their procedure and data streams. The four categories are:
1) SISD (Single Instruction Stream Single Data Stream) - a uniprocessor such as a single processor IBM 360.
2) MISD (Multiple Instruction Stream Single Data Stream) - a pipeline processor such as CDC STAR.
`
Copyright © 1976, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted, provided that ACM's copyright notice is given and that reference is made to this publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery.

Computing Surveys, Vol. 7, No. 4, December 1975
`
`
`
`
216 • Kenneth J. Thurber and Leon D. Wald
`
`
`
CONTENTS

INTRODUCTION
  Purpose and Scope
  Surveys
  Classification of Computer Architectures
  Definitions
  Reasons for Use of SIMD Processors
  Applications
  Matrix Multiplication on a Parallel Processor
SIMD GENEALOGY
MAIN TRADEOFF ISSUES
  Associative Addressing
  The State Transition Concept and Associative Processor Algorithms
  Processing Element (PE) Interconnections
  I/O
  Host Interaction
  Memory Distribution
  Software
  Activity Control and Match Resolution
  Balance Between Logic and Memory
ASSOCIATIVE PROCESSORS
  Introduction
  DLM (Distributed Logic Memory)
  Highly Parallel Associative Processors
  Mixed Mode and Multidimensional Memories
  Implemented Associative Processors
  Associative Processor Simulators
PARALLEL PROCESSORS AND ENSEMBLES
  Introduction
  Unger's Machine
  SOLOMON I, SOLOMON II, and ILLIAC IV
  Machines With Other Interconnection Structures
  Orthogonal Computer
  PEPE
  Comment
ACTUAL MACHINES
  Introduction
  STARAN
  OMEN
  PEPE
  ILLIAC IV
CONCLUSION
REFERENCES
`
3) SIMD (Single Instruction Stream Multiple Data Stream) - an associative or parallel processor such as ILLIAC IV.
4) MIMD (Multiple Instruction Stream Multiple Data Stream) - a multiprocessor such as a Univac 1108 multiprocessor system.
Murtha and Beadles [16] proposed a classification technique based upon parallelism properties. Their categories are:
1) General-purpose network computer;
2) Special-purpose network with global parallelism; and
3) Non-global, semi-independent network with local parallelism - this category is a catchall for machines which do not fit into the first two categories.
Categories 1 and 2 have the following subclasses:
1.a) General-purpose network with centralized common control;
1.b) General-purpose network with identical processors but independent instruction execution actions;
2.a) Pattern processors;
2.b) Associative processors.
The purpose of this classification technique was to differentiate between multiprocessors and highly parallel organizations.
Another possible classification view, suggested by Hobbs et al. [1], consists of the following categories:
1) Multiprocessor,
2) Associative processor,
3) Network or array processor, and
4) Functional machines.
Further, Hobbs et al. suggested that architectures could be classified based upon the amount of parallelism in:
1) Control,
2) Processing units, and
3) Data streams.
However, it was noted that these parameters were present in all highly parallel machines and were therefore not adequate to define a machine architecture.
`
In his article against parallel processors, Shore [17] presents a unique classification technique which derives the machine descriptions from the description of a uniprocessor. The machine categories considered are summarized as follows:
1) Machine I - a uniprocessor.
2) Machine II - a bit-slice associative processor built from Machine I by adding bit-slice processing and access capability (e.g., STARAN).
3) Machine III - an orthogonal computer derived from Machine II by adding parallel word processing and access capability (e.g., OMEN).
`
`
`
`
Associative and Parallel Processors • 217
`
4) Machine IV - a machine derived from Machine I by replicating the processing units (e.g., PEPE).
5) Machine V - a machine derived from Machine IV by adding interconnections between processors (e.g., ILLIAC IV).
6) Machine VI - a machine derived from Machine I by integrating processing logic into every memory element (e.g., Kautz's logic-in-memory computers [85, 86]).
Higbie [67] classifies computers using Flynn's four basic categories. However, he expands the SIMD category into the following four subcategories:
1) Array Processor - a processor that processes data in parallel and addresses the data by address instead of by tag or value.
2) Associative Memory Processor - a processor that operates on data addressed by tag or value rather than by address. (Note: this definition does not require parallel operation; however, the definition does allow for machines that operate on data in parallel.)
3) Associative Array Processor - a processor that is associative and also operates on arrays of data (typically, the operations are on a bit-slice basis, i.e., a single bit of many words).
4) Orthogonal Processor - a processor with two subsystems, an associative array processor subsystem and a serial (SISD) processor subsystem, which share a common memory array.
Higbie's categories provide for the identification of the ability to perform parallel, associative, and serial processing. Higbie defines a parallel processor to be any computer that contains multiple arithmetic units and operates on multiple data streams. Clearly, all of Higbie's four subcategories of SIMD processors fit his definition of a parallel processor.
Although none of the classification schemes presented here are mathematically precise, they are necessary to place associative and parallel processors in perspective. In the next section the terminology of associative and parallel processors will be defined for use in the remainder of this paper.
`
`Definitions
`
The definitions used for the remainder of this paper are based upon Flynn [15], and are consistent with those used by Thurber [10]. The definitions are:
1) SIMD machine: any computer with a single global control unit which drives multiple processing units, all of which either execute or ignore the current instruction.
2) Associative Processor: any SIMD machine in which the processing units (or processing memory) are addressed by a property of the data contents rather than by address (i.e., multiply A and B together in all elements where A < B).
3) Parallel Processor: any SIMD machine in which the processing elements are on the order of complexity of current day small computers and which has, typically, a high level of interconnectivity between processing elements.
4) Ensemble: a parallel processor in which the interconnection level between processing elements is very low or nonexistent.
Clearly, the intersection of these definitions will not be null, due to the overlap in classifying machine architectures.
`
`Reasons for Use of SIMD Processors
`
There are many reasons for the use of SIMD architectures. Some of the most important are:
1) Functional
  a) Large problems such as weather data processing.
  b) Problems with inherent data structure and parallelism.
  c) Reliability and graceful degradation.
  d) System complexity.
  e) Computational load.
2) Hardware
  a) Better use of hardware on problems with large amounts of parallelism.
  b) Advent of LSI microprocessors.
  c) Economy of duplicate structures.
  d) Lower nonrecurring and recurring costs.
3) Software
  a) Simpler than for multiprocessor.
  b) Easier to construct large systems.
  c) Less executive function requirements.
However, the reader is cautioned to remember that these are special-purpose machines and any attempt to apply them to an incorrectly
`
`
`
`
`
`sized, or designed, problem is an exercise in
`futility.
`
`Applications
`
Numerous applications have been proposed for associative and parallel processors. Whether an application is ever implemented is based upon the economics of the problem. It is not enough to be able to describe a problem solution. To date, considerable energy has been expended searching for cost-effective parallel solutions, but little has been done to actually implement them. Examples of such applications are: matrix manipulation, differential equations, and linear programming.

Several areas of application appear quite well suited to associative and parallel processing. In some cases, SIMD processors provide cost-effective system augmentation (e.g., air traffic control and associative head-per-track disks). In others, SIMD processors are very close to functional analogs of the physical system (e.g., data compression, concentration, and multiplexing). The use of associative and parallel processors appears promising in the elimination of critical bottlenecks in current general-purpose computer systems; however, only very small associative memories are required, and cost is really not a critical factor in the solution of these problems. Such problems may be encountered in the management of computer resources, and might involve such items as protection mechanisms, resource allocation, and memory management.

The application trend for associative processors is well defined. Due to cost factors, applications will be limited (in the near future) to problems such as resource management, virtual memory mechanisms, and some augmentation of current systems, rather than to data-base management or large file searching and processing. The application trend for parallel processors and ensembles seems to be in the area of large data computation problems such as weather data processing, nuclear data processing, and ballistic missile defense. Researchers must concentrate on systems aspects of the problem, not on developing solutions to problems which are ill-defined or nonexistent. The only truly useful data-processing applications of parallel and associative processors have been developed through extensive work at the system level, followed by parallel systems design, rather than vice versa.
SIMD processors have been proposed and appear well suited for a number of applications; these are listed here along with their primary references. Factors to be considered in applying SIMD processors have been discussed elsewhere [18]. To be useful for solving the listed problems, the highly parallel processors must become more cost-effective. The suggested application areas are:
1) Applications in which associative processors appear to be cost-effective:
  a) Virtual memory mechanisms [19]
  b) Resource allocation [20]
  c) Hardware executives [21]
  d) Interrupt processing [22,23]
  e) Protection mechanisms [19]
  f) Scheduling [20]
2) Applications in which parallel processors appear cost-effective and in which associative processors may be cost-effective:
  a) Bulk filtering [24]
  b) Tracking [25]
  c) Air traffic control [26]
  d) Data compression [27]
  e) Communications multiplexing [28,29]
  f) Signal processing [30]
3) Applications in which associative processors would be useful if they were more cost-effective and in which parallel processors are probably cost-effective are:
  a) Information retrieval [31]
  b) Sorting [32]
  c) Symbol manipulation [33]
  d) Pattern recognition [34]
  e) Picture processing [35]
  f) Sonar processing [36]
  g) Sea surveillance [37]
  h) Graph processing [38]
  i) Dynamic programming [39]
  j) Differential equations [40]
  k) Eigenvector [41]
  l) Matrix operations [42]
  m) Network flow analysis [43]
  n) Document retrieval [44]
  o) Data file manipulation [45]
  p) Machine document translation [46]
  q) Data file searching [47]
  r) Compilation [48]
  s) Formatted files [49]
  t) Automatic abstracting [50,51]
`
`
`
`
`
  u) Dictionary look-up translations [52]
  v) Data management [53]
  w) Theorem proving [54]
  x) Computer graphics [55]
  y) Weather data processing [56]
Hollander's paper [2] presents many experts' opinions about the applicability of associative and parallel processors. Slotnick [57] and Fuller [58] have compared associative and parallel processors. They concluded that parallel processors appear more useful than associative processors, but the machines would always be special purpose. Shore [17] has presented a very eloquent case against SIMD machines and parallel processors.
`
`Matrix Multiplication on a Parallel Processor
`
Matrix manipulations can be used to illustrate the potential of a parallel processor. One of the major computations in many of the possible applications cited previously is the matrix multiply. Assume that a parallel processor such as that shown in Figure 1 is used to perform this operation.

Each processing element will be assumed to have three registers, AREG, BREG, and CREG.
`
[Figure 1. Parallel Processor with PE connections to the Four Nearest Neighbors: a control unit exchanges instructions and memory data with all PEs, which are arranged in a two-dimensional grid.]
`
Each cell is interconnected to its four nearest neighbors. Cannon [59] derived the following algorithm to multiply two n x n matrices together in n stages using this processor.
`
Algorithm:
Set:       CREG = 0
           BREG[PE I,J] = B[I,J] for all I,J <= N
           AREG[PE I,J] = A[I,J] for all I,J <= N
Shift:     Ith row of A left I-1 columns for all I <= N
           Jth column of B up J-1 rows for all J <= N
Multiply:  (TREG = AREG times BREG) in parallel in all PEs
Add:       (CREG = CREG + TREG) in parallel in all PEs
Shift:     AREG right one column
           BREG down one row
Jump:      If not Nth pass, jump to Multiply
`
As an example, let

A = a1 a2 a3      B = b1 b2 b3      and C = A x B.
    a4 a5 a6          b4 b5 b6
    a7 a8 a9          b7 b8 b9

After initialization, the memory map for CREG is:

CREG = 0 0 0
       0 0 0
       0 0 0

For AREG (each row I shifted left I-1 columns), the memory map is:

AREG = a1 a2 a3
       a5 a6 a4
       a9 a7 a8
`
`
`
`
And the BREG map (each column J shifted up J-1 rows) is:

BREG = b1 b5 b9
       b4 b8 b3
       b7 b2 b6
`
After the multiply, add, shift, and jump, the memory maps appear as follows:

CREG = a1b1 a2b5 a3b9
       a5b4 a6b8 a4b3
       a9b7 a7b2 a8b6

AREG = a3 a1 a2
       a4 a5 a6
       a8 a9 a7

After two more iterations the multiply will be finished and CREG[PE I,J] will contain C[I,J].
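The three-stage trace above can be reproduced in software. Below is a minimal sequential sketch of Cannon's algorithm; the register names AREG, BREG, and CREG follow the text, while the function name and the Python rotation idioms are assumptions of this illustration, not the paper's notation.

```python
def cannon_matmul(A, B):
    """Simulate Cannon's algorithm for two n x n matrices in n stages."""
    n = len(A)
    CREG = [[0] * n for _ in range(n)]  # Set: CREG = 0 in every PE
    # Skew: shift the Ith row of A left I-1 columns and
    # the Jth column of B up J-1 rows (I, J counted from 1).
    AREG = [[A[i][(i + j) % n] for j in range(n)] for i in range(n)]
    BREG = [[B[(i + j) % n][j] for j in range(n)] for i in range(n)]
    for _ in range(n):  # n multiply/add/shift passes
        for i in range(n):
            for j in range(n):
                # Multiply and Add, "in parallel" in all PEs:
                CREG[i][j] += AREG[i][j] * BREG[i][j]
        # Shift: each AREG row right one column, each BREG column down one row.
        AREG = [row[-1:] + row[:-1] for row in AREG]
        BREG = [BREG[-1]] + BREG[:-1]
    return CREG
```

On the 3 x 3 example, PE(1,1) accumulates a1b1, then a3b7, then a2b4, which is exactly C[1,1].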
`
`SIMD GENEALOGY
`
Figure 2 indicates the major parallel and associative architectures considered in this paper, together with their generic relationships. These may differ substantially, yet they are related by virtue of their global associative or parallel properties.
[Figure 2. SIMD Genealogy: SIMD machines branch into associative processors, parallel processors, and ensembles. The chart traces, among others, Unger's machine, Memex, the cryogenic catalog memory, the distributed logic memory, highly parallel associative processors, mixed mode and multidimensional memories, implemented associative processors, SOLOMON I and II, ILLIAC IV, the Orthogonal Computer, OMEN, STARAN, and PEPE, with reference numbers for each.]
`
`
`
`
`
`MAIN TRADEOFF ISSUES
`
To allow the reader to place the presented architectures in perspective, the major system tradeoff issues in SIMD processors will be discussed next. It should be understood that SIMD processors are special purpose. They are useful for a limited set of applications which may be characterized as having [18]:
1) A large number of independent data sets;
2) No relationships between data sets that prevent them from being processed in parallel;
3) The requirement for quick throughput response; and
4) The potential to exploit associative addressing selection techniques.
`
`Associative Addressing
`
Figure 3 indicates the main features required to implement an associative address process. The designer must include facilities to address or query the processor associatively, i.e., to address the memory array by content. In Figure 3, the application may require that all employees with a salary of $35 per day be located. This would be accomplished by performing an equality search on the daily salary field of the file. This search is performed in parallel. To set up the search, a Data Register must be loaded with the daily salary ($35) for comparison; a Mask Register is included to mask the data register so that only the desired fields may be searched; a Word Select (activity select) Register specifies the subset of words to be addressed; a Results Register is required to collect the results of the search; a Match Indicator is used to indicate the number of matches; and a Multiple Match Resolver must be provided to generate a pointer to the "topmost" matched word. An eight-word example of a personnel file stored in a simple associative memory is illustrated in Figure 3; it assumes that the search for a salary of $35 per day was initiated on the entire file. All necessary registers and features are illustrated.
`
[Figure 3. Associative Processor Storing a Personnel File: the memory array, holding employee-number and daily-salary fields, together with its Data Register, Mask Register, Word Select Register, Search Results Register, Match Indicator, Multiple Match Resolver, and external processing logic.]
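The equality search of Figure 3 can be sketched in a few lines. The following toy model is an assumption-laden illustration rather than any machine's actual behavior: words are Python integers with a hypothetical field layout (low 8 bits = daily salary), and the names follow the figure's registers.

```python
def equality_search(memory, data, mask, word_select):
    """Compare every word against the masked comparand 'in parallel'.

    Returns the search-results register as a list of booleans,
    one per word; inactive words never match.
    """
    return [active and (word & mask) == (data & mask)
            for active, word in zip(word_select, memory)]

# Eight-word personnel file: low 8 bits = daily salary, high bits = employee number.
memory = [(emp << 8) | salary for emp, salary in
          [(1, 35), (2, 40), (3, 35), (4, 50), (5, 20), (6, 35), (7, 60), (8, 35)]]

data = 35                 # Data Register: comparand ($35 per day)
mask = 0xFF               # Mask Register: restrict the search to the salary field
word_select = [True] * 8  # Word Select Register: the entire file is active

results = equality_search(memory, data, mask, word_select)
match_count = sum(results)        # Match Indicator
topmost = results.index(True)     # Multiple Match Resolver: pointer to topmost match
```

The bitwise AND with the mask plays the role of the Mask Register: only the salary field participates in the comparison, so the employee-number bits never disturb the search.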
`
`
`
`
`
`The State Transition Concept and Associative
`Processor Algorithms
`
In 1965, Fuller and Bird [35] described a bit-slice associative processor architecture. Fuller and Bird discussed their algorithms in terms of state transitions. To perform parallel processing of data in an associative processor, two basic operations are required. These are a content search and a "multiwrite" operation. The multiwrite operation consists of the ability to simultaneously write into selected bits of several selected words. With a content search and multiwrite capability, an associative processor can be used to process data in highly parallel fashion.
For example, consider the problem of adding field A to field B. This can be accomplished as a sequence of bit pair additions (Ai added to Bi), with suitable attention paid to the carry bit. If bit 1 is the least significant and bit n the most, Figure 4 indicates the state table for the bit-serial addition of A to B with the result accumulating into B. As shown, Bi (result) and Ci+1 differ from Bi (operand) and Ci in the four states 1, 3, 4, and 6. One possible addition method would be to search for these states and multiwrite the correct Bi and Ci+1 into the appropriate fields, i.e., set up a "tag field" designated as Ci, and set C1 = 0. Then operations would proceed from the least significant bit to the most significant bit of the pair of operands. At each bit position, i, Ai, Bi, and Ci would be searched for state 1. Then Ci+1 = 0, Bi = 1 would be multiwritten into all matching words. After a search for state 3, Ci+1 = 1, Bi = 0 would be multiwritten into all matching words. This same procedure would be carried out for states 4 and 6. If a word is matched on any search, it would be removed from the list of active cells until operations moved from bit position i to i+1, so that writing the resultant Bi and the carry-in for bit position i+1 into the memory does not cause the data already processed to be reprocessed.
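A software rendering may make the search-and-multiwrite loop concrete. The sketch below simulates the four rewriting states over a list of words; the dictionary encoding and the function name are assumptions of this illustration, and bit index 0 plays the role of the text's bit 1 (least significant).

```python
def associative_add(words, nbits):
    """Bit-serially add field A into field B using only content
    search and multiwrite, in the style of the Fuller-Bird scheme."""
    # The (Ai, Bi, Ci) states in which Bi and the carry must be rewritten,
    # mapped to the (new Bi, new carry) values to multiwrite: states 1, 3, 4, 6.
    rewrite = {(0, 0, 1): (1, 0),   # state 1: Bi = 1, carry = 0
               (0, 1, 1): (0, 1),   # state 3: Bi = 0, carry = 1
               (1, 0, 0): (1, 0),   # state 4: Bi = 1, carry = 0
               (1, 1, 0): (0, 1)}   # state 6: Bi = 0, carry = 1
    for w in words:
        w['C'] = 0                  # carry tag bit-slice, C1 = 0
    for i in range(nbits):          # least significant bit position first
        active = list(range(len(words)))
        for state, (b, c) in rewrite.items():
            # Content search over the active words for (Ai, Bi, Ci) == state ...
            matched = [k for k in active
                       if (words[k]['A'][i], words[k]['B'][i], words[k]['C']) == state]
            for k in matched:
                # ... then multiwrite the new Bi and carry into every matching
                # word, retiring it from the active list until bit position i+1.
                words[k]['B'][i], words[k]['C'] = b, c
                active.remove(k)
    return words
```

The four unlisted states leave both Bi and the carry unchanged, so after n bit positions B holds the sum and C the final carry, exactly as the state table prescribes.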
Obviously, utilizing the sequential state transformation approach, any desired transformations can be processed in an associative processor with simple content search and multiwrite capability using very simple word logic. Operating on the state transition concept, six general requirements of the architecture of an associative processor were deduced. These are:
`
[Figure 4. Bit-Slice Addition State Transition Diagram, tabulating present state (Ai, Bi, Ci), partial sum, and the carry into bit position i+1 from bit position i.]
`
1) The basic processing primitive is the identification of all words meeting some binary state variable configuration;
2) No explicit addressing is necessary since processing can be performed simultaneously over all data sets of interest;
3) Data is processed "column serially" to limit the number of state transformations to be remembered;
4) Tag bits (temporary bit-slice storage) are very useful (for example, the carry bit);
5) A search results/word select register and multiword write capability are desirable; and
6) Stored program control of the associative processor is desirable.
Fuller and Bird recognized an important correlation between the way in which data is stored in the processor's memory and its processing speed. As examples of this phenomenon, they described both word per cell (W/C) and bit per cell (B/C) data organizations for their associative processor. Figures 5 and 6 show the W/C and B/C storage organizations for nine four-bit words X11, ..., X33, respectively. The word per cell organization has become an industry standard, and it is, in fact, the way most associative processors store data. The desire to perform bit-slice processing on many data sets simultaneously without much interword communication led to the widespread use of this type of organization. W/C is very efficient if a large number of cells are involved;
`
`
`
`
`
[Figure 5. Word Per Cell Organization: bits 0 to 3 of each word X11, ..., X33 stored together in a single cell.]
`
operands can be stored in the same cell, and most operands are being processed by each bit-slice command. For example, in Figure 5, this type of organization would be very efficient if, for all operands Xi1, Xi2, Xi3, it were desired to calculate Xi1 + (Xi2 * Xi3) for all i. In this case all cells could be activated and the calculation could be performed in a bit-serial manner over all operands.
However, there are problems in which a B/C approach is more efficient. (The B/C approach is the same as described by Crane and Githens [76], and was discovered independently by Fuller and Bird [35].) The B/C organization allows many operands to be processed simultaneously in a bit-parallel mode. Because many operands are treated in a bit-parallel fashion, B/C processing can be very efficient, even on only a few operands, in comparison with W/C processing. Also, operands stored in the same bit slice can be easily picked up. Therefore, B/C processing can be efficient in cases where only a few operands are activated for processing by an instruction, or for applications in which extensive pairing of operands must occur.
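The two layouts can be contrasted with a toy transpose; the word values and variable names below are hypothetical, not taken from Figures 5 and 6. One layout lets a single access touch every bit of one word, the other lets it touch one bit position of every word.

```python
NBITS = 4
# W/C flavor: each cell holds one whole four-bit word (nine words, X11 .. X33).
wc = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# B/C flavor: slice i holds bit i of every word, so one access reaches the
# same bit position of all words at once (the transpose of the W/C array).
bc = [[(w >> i) & 1 for w in wc] for i in range(NBITS)]

# Round-tripping shows the two organizations carry identical content;
# only the dimension along which a single access runs differs.
restored = [sum(bc[i][k] << i for i in range(NBITS)) for k in range(len(wc))]
assert restored == wc
```

In this picture, sweeping the slices of `bc` one at a time is the bit-serial, all-words-at-once style that suits W/C machines with many active cells, while handing over a whole word in one access is the bit-parallel style that makes B/C attractive when only a few operands are active.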
Using the state transition concept, associative processor algorithms may be clearly described for a variety of architectures. Considering the range of architectures currently available, no attempt will be made to describe the algorithms here. Rather, a list will be given