IEEE TRANSACTIONS ON COMPUTERS, VOL. C-30, NO. 12, DECEMBER 1981
`
`PASM: A Partitionable SIMD/MIMD System
`for Image Processing and Pattern Recognition
`
HOWARD JAY SIEGEL, MEMBER, IEEE, LEAH J. SIEGEL, MEMBER, IEEE, FREDERICK C. KEMMERER, MEMBER, IEEE, PHILIP T. MUELLER, JR., MEMBER, IEEE, HAROLD E. SMALLEY, JR., MEMBER, IEEE, AND S. DIANE SMITH, MEMBER, IEEE
`
Abstract--PASM, a large-scale multimicroprocessor system being designed at Purdue University for image processing and pattern recognition, is described. This system can be dynamically reconfigured to operate as one or more independent SIMD and/or MIMD machines. PASM consists of a parallel computation unit, which contains N processors, N memories, and an interconnection network; Q microcontrollers, each of which controls N/Q processors; N/Q parallel secondary storage devices; a distributed memory management system; and a system control unit, to coordinate the other system components. Possible values for N and Q are 1024 and 16, respectively. The control schemes and memory management in PASM are explored. Examples of how PASM can be used to perform image processing tasks are given.
`
Index Terms--Image processing, memory management, MIMD machines, multimicroprocessor systems, multiple-SIMD machines, parallel processing, partitionable computer systems, PASM, reconfigurable computer systems, SIMD machines.
`
I. INTRODUCTION
`
As a result of the microprocessor revolution, it is now feasible to build multimicroprocessor systems capable of performing image processing tasks more rapidly than previously possible. There are many image processing tasks which can be performed on a parallel processing system, but are prohibitively expensive to perform on a conventional computer system due to the large amount of time required to do the tasks [37]. In addition, a multimicroprocessor system can use parallelism to perform the real-time image processing required for such applications as robot (machine) vision, automatic guidance of air- and spacecraft, and air traffic control.
There are several types of parallel processing systems. An SIMD (single instruction stream-multiple data stream) machine [18] typically consists of a set of N processors, N memories, an interconnection network, and a control unit (e.g., the Illiac IV [10]). The control unit broadcasts instructions to the processors, and all active ("turned on") processors execute the same instruction at the same time. Each processor executes instructions using data taken from a memory with which only it is associated. The interconnection network allows interprocessor communication. An MSIMD (multiple-SIMD) system is a parallel processing system which can be structured as one or more independent SIMD machines (e.g., MAP [31], [32]). The Illiac IV was originally designed as an MSIMD system [3]. An MIMD (multiple instruction stream-multiple data stream) machine [18] typically consists of N processors and N memories, where each processor can follow an independent instruction stream (e.g., C.mmp [60]). As with SIMD architectures, there is a multiple data stream and an interconnection network. A partitionable SIMD/MIMD system is a parallel processing system which can be structured as one or more independent SIMD and/or MIMD machines. In this paper PASM [47], [48], a partitionable SIMD/MIMD system being designed at Purdue University for image processing and pattern recognition, is described.
Many designers have discussed the possibilities of building large-scale parallel processing systems, employing 2^10 to 2^16 microprocessors, in SIMD (e.g., binary n-cube array [34]) and MIMD (e.g., CHoPP [54], [58]) configurations. Without the presence of such a large number of processors, the concept of partitioning the system into smaller machines which can operate as SIMD or MIMD machines was unnecessary. Nutt [34] has suggested a machine which is a multiple-SIMD system. Lipovski and Tripathi [27] have considered the idea of combining the SIMD and MIMD modes of operation in one system. In addition, developments in recent years have shown the importance of parallelism to image processing, using both cellular logic arrays (e.g., CLIP [50], BASE 8 [35]) and SIMD systems (e.g., STARAN [36]). A variety of such systems are discussed in [19]. Thus, the time seems right to investigate how to construct a computer system such as PASM: a machine which can be dynamically reconfigured as one or more SIMD and/or MIMD machines, optimized for a variety of important image processing and pattern recognition tasks.

Manuscript received September 11, 1979; revised May 8, 1980 and March 4, 1981. This work was supported by the Air Force Office of Scientific Research, Air Force Systems Command, USAF, under Grant AFOSR-78-3581, and the Defense Mapping Agency, monitored by the U.S. Air Force Rome Air Development Center Information Division, under Contract F30602-78-C-0075 through the University of Michigan.

H. J. Siegel and L. J. Siegel are with the School of Electrical Engineering, Purdue University, West Lafayette, IN 47907.

F. C. Kemmerer was with the School of Electrical Engineering, Purdue University, West Lafayette, IN 47907. He is now with Bell Laboratories, Whippany, NJ 07981.

P. T. Mueller, Jr., was with the School of Electrical Engineering, Purdue University, West Lafayette, IN 47907. He is now with Geophysical Services, Inc., Dallas, TX 75268.

H. E. Smalley, Jr., was with the School of Electrical Engineering, Purdue University, West Lafayette, IN 47907. He is now with Hitachi America, Ltd., Atlanta, GA 30368.

S. D. Smith was with the School of Electrical Engineering, Purdue University, West Lafayette, IN 47907. She is now with the Department of Electrical and Computer Engineering, University of Wisconsin, Madison, WI 53706.
`
`
0018-9340/81/1200-0934$00.75 © 1981 IEEE
`
`
`
`
The use of parallel processing in image processing has been limited in the past due to cost constraints. Most systems used small numbers of processors (e.g., Illiac IV [10]), processors of limited capabilities (e.g., STARAN [36]), or specialized logic modules (e.g., PPM [24]). With the development of microprocessors and related technologies it is reasonable to consider parallel systems using a large number of complete processors.

SIMD machines can be used for "local" processing of segments of images in parallel. For example, the image can be segmented and each processor assigned a segment. Then, following the same set of instructions, such tasks as line thinning, threshold dependent operations, and gap filling can be done in parallel for all segments of the image simultaneously. Also in SIMD mode, matrix arithmetic used for such tasks as statistical pattern recognition can be done efficiently. MIMD machines can be used to perform different "global" pattern recognition tasks in parallel, using multiple copies of the image or one or more shared copies. For example, in cases where the goal is to locate two or more distinct objects in an image, each object can be assigned a processor or set of processors to search for it. An SIMD/MIMD application might involve using the same set of microprocessors for preprocessing an image in SIMD mode and then doing a pattern recognition task in MIMD mode.
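The "local" SIMD processing described above can be illustrated with a small simulation. The function below is an illustrative sketch, not PASM software: the image is split into one segment per processing element, and the serial loop stands in for the lockstep execution a real SIMD machine would provide.

```python
def simd_threshold(image, n_procs, thresh):
    # image: list of rows. Split the rows into n_procs contiguous
    # segments, one per processing element (PE).
    rows_per_pe = len(image) // n_procs
    segments = [image[i * rows_per_pe:(i + 1) * rows_per_pe]
                for i in range(n_procs)]
    # Every PE executes the same instruction stream on its own segment;
    # a real SIMD machine would do this in lockstep, here we map serially.
    out = []
    for seg in segments:
        out.extend([[1 if pixel > thresh else 0 for pixel in row]
                    for row in seg])
    return out

image = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 ramp image
result = simd_threshold(image, n_procs=4, thresh=7)
# rows holding values 0..7 map to 0; rows holding 8..15 map to 1
```

The same pattern applies to line thinning or gap filling: only the per-pixel operation changes, while the segment assignment and lockstep broadcast stay the same.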
PASM, a partitionable SIMD/MIMD machine, is a large-scale dynamically reconfigurable multimicroprocessor system [43], [46]-[48]. It is a special purpose system being designed to exploit the parallelism of image processing and pattern recognition tasks. It can also be applied to related areas such as speech processing and biomedical signal processing. In this paper the architecture of PASM is presented and examples of its use in performing image processing tasks are given.
Computer system designers have been considering various multimicrocomputer architectures, such as [9], [11], [23], [26], [27], [34], [54], [58], and [59]. PASM combines the following features:

1) it can be partitioned to operate as many independent SIMD and/or MIMD machines of varying sizes, and

2) it is being developed using a variety of problems in image processing and pattern recognition to guide the design choices.

The purpose of PASM is to serve as a vehicle for experimenting with parallel image processing and pattern recognition. It is not meant to be a production-line machine, but rather a research tool. The design attempts to incorporate the needed flexibility for studying large-scale SIMD and MIMD parallelism, while keeping system costs "reasonable."

In this paper the overall organization of PASM is presented. In particular, the control structure and secondary storage systems are described, and some application examples are given. The purpose here is to present design concepts that will allow a system to exploit the parallelism that, for example, 1024 processors can provide. Implementation details are currently under study and are beyond the scope and length of this paper.
Fig. 1 is a block diagram of the basic components of PASM. The parallel computation unit (PCU) contains N = 2^n processors, N memory modules, and an interconnection network.

Fig. 1. Block diagram overview of PASM.
The PCU processors are microprocessors that perform the actual SIMD and MIMD computations. The PCU memory modules are used by the PCU processors for data storage in SIMD mode and both data and instruction storage in MIMD mode. The interconnection network provides a means of communication among the PCU processors and memory modules. Two possible ways to organize the PCU and different types of partitionable networks which can be used are described in Section II.

The microcontrollers (MC's) are a set of microprocessors which act as the control units for the PCU processors in SIMD mode and orchestrate the activities of the PCU processors in MIMD mode. There are Q = 2^q MC's. Each MC controls N/Q processors. A virtual SIMD machine (partition) of size RN/Q, where R = 2^r and 0 ≤ r ≤ q, is obtained by loading R MC memory modules with the same instructions simultaneously. Similarly, a virtual MIMD machine of size RN/Q is obtained by combining the efforts of the PCU processors of R MC's. Q is therefore the maximum number of partitions allowable, and N/Q is the size of the smallest partition. Possible values for N and Q are 1024 and 16, respectively. Control storage contains the programs for the MC's. The MC's are discussed in more detail in Section III.
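The set of possible partition sizes follows directly from the parameters above; the short sketch below just enumerates them for the example values N = 1024 and Q = 16 given in the text.

```python
# Possible virtual machine (partition) sizes in PASM, using the
# example values from the text: N = 1024 PCU processors, Q = 16 MC's.
N, Q = 1024, 16          # N = 2**n processors, Q = 2**q microcontrollers
q = Q.bit_length() - 1   # q = 4

# A partition combines R = 2**r MC's, 0 <= r <= q, so its size is R*N/Q.
sizes = [(2 ** r) * N // Q for r in range(q + 1)]
# smallest partition: N/Q = 64 processors; largest: N = 1024 (r = q)
```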
The memory storage system provides secondary storage space for the data files in SIMD mode, and for the data and program files in MIMD mode. The memory management system controls the transferring of files between the memory storage system and the PCU memory modules. It employs a set of cooperating dedicated microprocessors. Multiple storage devices are used in the memory storage system to allow parallel data transfers. The secondary memory system is described in Section IV.

The system control unit is a conventional machine, such as a PDP-11, and is responsible for the overall coordination of the activities of the other components of PASM. Examples of the tasks the system control unit will perform include program development, job scheduling, and coordination of the loading of the PCU memory modules from the memory storage system with the loading of the MC memory modules from control storage. By carefully choosing which tasks should be assigned to the system control unit and which should be assigned to other system components, the system control unit can work effectively and not become a bottleneck. Examples of this include using the MC's to act as the control units for virtual SIMD machines, controlling the interconnection network with routing tags generated by each PCU processor, and having the memory management system supervise primary/secondary memory transfers.
`
`
`
`
Fig. 2. PE-to-PE configuration of the parallel computation unit.

Fig. 3. Processor-to-memory configuration of the parallel computation unit.
`
Together, Sections II, III, and IV present the overall architecture of PASM. Particular attention is paid to the ways in which the control structure and secondary memory scheme allow PASM to be efficiently partitioned into independent virtual machines. Variations in the design of PASM's PCU which still support these control and secondary memory ideas are examined. This examination demonstrates how the concepts underlying PASM can be used in the design of different systems.

In Section V image processing algorithms using PASM are presented. In particular, smoothing and histogram calculations are examined. Using these examples, the potential improvement a system such as PASM can provide over serial machines is demonstrated.

II. PARALLEL COMPUTATION UNIT

A. PCU Organization

The parallel computation unit (PCU) contains processors, memories, and an interconnection network. One configuration of these components is to connect a memory module to each processor to form a processor-memory pair called a processing element (PE) (e.g., Illiac IV [10]). The interconnection network is used for communications among PE's. This "PE-to-PE" configuration is shown in Fig. 2. A pair of memory units is used for each memory module. This double-buffering scheme allows data to be moved between one memory unit and secondary storage (the memory storage system) while the processor operates on data in the other memory unit. In the PE-to-PE configuration, "local" memory references are relatively fast; however, the transfer of large blocks of data from PE to PE is delayed by the memory fetching and storing which must be done.

The "P-to-M" (processor-to-memory) configuration, shown in Fig. 3, uses the interconnection network to connect the processors to the memory modules. Again, double-buffering is employed. There is no "local" memory. To fetch an operand from a memory, the processor must first send the address of the operand through the interconnection network to a memory. Then the processor receives the operand from the memory via the interconnection network. Advantages of the P-to-M configuration are that a memory connected to a processor can be reconnected to another processor, effectively transferring the entire contents of the memory from one processor to another, and that the number of memories does not have to be equal to the number of processors (e.g., BSP [12]). A disadvantage is that all memory references must go through the interconnection network.

A more detailed analysis reveals some of the tradeoffs involved in these two configurations. If T_mem is the time required for a memory access (either a read or a write), and T_net is the time to send a data item through the interconnection network, then the time required for a memory reference in the P-to-M configuration, T_PM, is given by

T_PM = 2T_net + T_mem.   (1)

T_net must be included twice, once for the processor to send the address to the memory and once for transferring the data. (The time required for controlling the network is omitted since control methods vary.)

For the PE-to-PE configuration the time required for a memory reference, T_PE, depends on the location of the memory which is to be used. If the memory is local, then

T_PE = T_mem.   (2)

If the memory is connected to some other processor, then

T_PE = T_net + T_rec + T_mem.   (3)

T_rec represents the time required for the PE which has the data item to recognize and service the data request. This may require a significantly longer delay than T_net. If p is the probability of a local memory reference, then (2) and (3) can be combined to give the expected memory reference time

E[T_PE] = pT_mem + (1 − p)(T_net + T_rec + T_mem).   (4)

Comparing (1) and (4),

T_PM ≥ E[T_PE]   (5)

for p sufficiently large. Thus, the "best" configuration is task dependent.

When operating in SIMD mode with the PE-to-PE configuration, it is often possible to omit the occurrence of T_rec in (3) and reduce T_PE to T_net + T_mem. This is done by computing the address of the desired data in the processor connected to the memory to be accessed (e.g., see the algorithms in Section V-B). Thus, (4) reduces to
`
E[T_PE] = pT_mem + (1 − p)(T_net + T_mem).   (6)
`
Therefore, when operating in SIMD mode the PE-to-PE configuration is preferable.

When operating in MIMD mode, the PE-to-PE configuration requires that two processors be involved in every nonlocal memory reference. The efforts of two processors involved in a data transfer can be coordinated by having the processor which initiates the transfer interrupt the other processor, or by dedicating one of these processors to handling data transfers. In the P-to-M configuration the memories are shared by the processors, i.e., more than one processor can access the same memory for either data or instructions. However, for the image processing tasks that have been examined, most data and instructions can be stored in the local memory, reducing the impact of this consideration.

The PE-to-PE configuration will be used in PASM. Depending on the application for which a different partitionable SIMD/MIMD system is intended, the P-to-M configuration may be preferable. The interconnection networks, control structure, and secondary memory system described below can be used in conjunction with either.
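The task dependence captured by (1)-(5) can be made concrete with a small numeric sketch. The timing values below are arbitrary illustrative assumptions, not measurements of any real system.

```python
def t_pm(t_net, t_mem):
    # (1): a P-to-M reference always crosses the network twice.
    return 2 * t_net + t_mem

def expected_t_pe(p, t_net, t_rec, t_mem):
    # (4): a PE-to-PE reference is local with probability p; otherwise
    # it pays the network, recognition, and memory costs.
    return p * t_mem + (1 - p) * (t_net + t_rec + t_mem)

# Illustrative (made-up) unit costs: memory access 1, network transfer 2,
# remote-request recognition and service 4.
t_net, t_rec, t_mem = 2, 4, 1

# With mostly local references PE-to-PE wins; with mostly remote
# references P-to-M wins -- the "best" configuration is task dependent.
mostly_local = expected_t_pe(0.9, t_net, t_rec, t_mem)   # 0.9*1 + 0.1*7 = 1.6
mostly_remote = expected_t_pe(0.1, t_net, t_rec, t_mem)  # 0.1*1 + 0.9*7 = 6.4
fixed = t_pm(t_net, t_mem)                               # 2*2 + 1 = 5
```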
`
B. Interconnection Networks
`
Two types of multistage interconnection networks are being considered for PASM: the Generalized Cube and the Augmented Data Manipulator (ADM). Both of these networks have the following properties: 1) a logical structure consisting of n stages of N/2 (Cube) or N (ADM) switches [49]; 2) distributed control by routing tags generated by each processor [25], [28]; 3) the ability to operate in SIMD mode (N simultaneous transfers) and MIMD mode [44], [45]; and 4) the ability to be partitioned into independent subnetworks [42]. As discussed in [49], the Cube network is equivalent or directly related to other networks in the literature (e.g., [5], [6], [24], [33], and [34]). The ADM is an extension of the data manipulator [16].

Both the Cube and ADM networks can be implemented as single stage networks [38], [40], [41] instead of as multistage networks. These single stage implementations can also be partitioned [42], are compatible with the PASM control and memory management schemes, and may be appropriate for MSIMD systems, depending on the intended applications. However, since there is only a single stage of switches, for the MIMD mode of operation intermediate processors may have to be interrupted to transfer data. (See [40] for more information.) Thus, single stage networks are not appropriate for PASM, but might be for an MSIMD system based on the design concepts of PASM.

The tradeoffs between the Cube and ADM multistage networks for PASM are currently under study. The ADM network is more flexible and fault tolerant [40], [44], but is more complex. The Cube may be more cost effective and sufficient for the system's needs. In the following sections it will be assumed that the processors will be partitioned such that the addresses of all of the processors in a partition agree in their low-order bit positions. This constraint will allow either the Cube or ADM network to be used as the partitionable interconnection network in PASM. Details of these networks are beyond the scope of this paper, and readers are referred to the references indicated.
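The low-order-bit partitioning constraint can be sketched as follows; the helper below is purely illustrative and not part of any PASM software.

```python
def partition_members(mc_addr, q_minus_r, n):
    """Enumerate physical addresses of the processors in one partition.

    All members agree with mc_addr in their low-order (q - r) bits;
    the remaining high-order bits range over all values. n is the
    number of address bits (N = 2**n processors).
    """
    low = mc_addr & ((1 << q_minus_r) - 1)      # fixed low-order bits
    return [(high << q_minus_r) | low
            for high in range(1 << (n - q_minus_r))]

# Toy system: N = 16 processors (n = 4), partition fixing 2 low-order bits.
members = partition_members(mc_addr=0b01, q_minus_r=2, n=4)
# members: [0b0001, 0b0101, 0b1001, 0b1101] -> [1, 5, 9, 13]
```

Because the fixed bits are the low-order ones, either the Cube or the ADM network can route within such a partition without interfering with the others.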
`
C. PCU Processors

The PCU processors will be specially designed for parallel image processing. Simulation studies are currently being conducted to aid in determining an efficient instruction set. A PASM prototype will most likely be based on user-microprogrammable components to obtain needed flexibility, while the final system will employ custom VLSI processors.
`
III. MICROCONTROLLERS

A. Introduction

In this section the microcontrollers (MC's) are discussed. It is the MC's that enable PASM to operate as an MSIMD system.

In general, the possible advantages of an MSIMD system over an SIMD system with a similar number of PE's include the following.

1) Fault Detection: For situations where high reliability is needed, three partitions can run the same program on the same data and compare results.

2) Fault Tolerance: If a single PE fails, only those logical SIMD machines (partitions) which must include the failed PE need to be disabled. The rest of the system can continue to function.

3) Multiple Simultaneous Users: Since there can be multiple independent virtual SIMD machines, there can be multiple simultaneous users of the system, each executing a different SIMD program.

4) Program Development: Rather than trying to debug an SIMD program on, for example, 1024 PE's, it can be debugged on a smaller size virtual SIMD machine of 32 or 64 PE's.

5) Variable Machine Size for Efficiency: If a task requires only N/2 of N available PE's, the other N/2 can be used for another task.

6) Subtask Parallelism: Two independent SIMD subtasks that are part of the same job can be executed in parallel, sharing results if necessary.

For PASM's intended application as a tool for experimenting with the use of parallelism in image processing and pattern recognition, points 2)-6) above will be significant.
B. Microcontroller Organization and Addressing Conventions

In order to have a partitionable system, some form of multiple control units must be provided. In PASM this is done by having Q = 2^q MC's, physically addressed (numbered) from 0 to Q − 1. Each MC controls N/Q PCU processors, as shown in Fig. 4.

An MC is a microprocessor which is attached to a memory module. Each memory module consists of a pair of memory units so that memory loading and computations can be overlapped. This double-buffering reduces the likelihood of MC's
`
`
`
`
Fig. 4. PASM microcontrollers (MC's).
`
being idle while waiting for a program to be loaded. In SIMD mode, each MC fetches instructions from its memory module, executing the control flow instructions (e.g., branches) and broadcasting the data processing instructions to its PCU processors. The physical addresses of the N/Q processors which are connected to an MC must all have the same low-order q bits so that the network can be partitioned. The value of these low-order q bits is the physical address of the MC. A virtual SIMD machine of size RN/Q, where R = 2^r and 0 ≤ r ≤ q, is obtained by loading R MC's with the same instructions and synchronizing the MC's. The physical addresses of these MC's must have the same low-order q − r bits so that all of the PCU processors in the partition have the same low-order q − r physical address bits. Similarly, a virtual MIMD machine of size RN/Q is obtained by combining the efforts of the PCU PE's associated with R MC's which have the same low-order q − r physical address bits. In MIMD mode the MC's may be used to help coordinate the activities of their PCU PE's.

The approach of permanently assigning a fixed number of PCU PE's to each MC has several advantages over allowing a varying assignment, such as in [11] and [31]. One advantage is that the operating system need only schedule (and monitor the "busy" status of) Q MC's, rather than N PCU PE's. When Q = 16 (or even 32) and N = 1024, this is a substantial savings. Another advantage is that no crossbar switch is needed for connecting processors and control units (as proposed in [11] and [31]). Also, this fixed connection scheme allows the efficient use of multiple secondary storage devices, as is discussed in Section IV-B. The main disadvantage of this approach is that each virtual machine size must be a power of two, with a minimum value of N/Q. However, for PASM's intended experimental environment, flexibility at reasonable cost is the goal, not maximum processor utilization.
This basic MC organization can be enhanced to allow the sharing of memory modules by the MC's in a partition. The MC's can be connected by a shared reconfigurable ("shortable") bus such as described in [2], as shown in Fig. 5. The MC's must be ordered on the bus in terms of the bit reverse of their addresses due to the partitioning rules. (The sharing of
`
Fig. 5. Reconfigurable shared bus scheme for interconnecting microcontroller (MC) processors and MC memory modules, for Q = 8. Each box can be set to "through" or "short."
`
memories using a reconfigurable bus is employed in [22].) This enhanced MC connection scheme would provide more program space for jobs using multiple MC's and would also provide a degree of fault tolerance, since known-faulty MC memory modules could be ignored. These advantages come at the expense of additional system complexity, and the inclusion of the enhanced scheme in PASM will depend on cost constraints at implementation time.
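The bit-reversed bus ordering can be sketched as follows. This toy function is illustrative only; it assumes q address bits per MC, and its point is that every legal partition (MC's agreeing in their low-order bits) then occupies a contiguous stretch of the shortable bus.

```python
def bit_reverse(x, q):
    # Reverse the low-order q bits of x (e.g., 001 -> 100 for q = 3).
    result = 0
    for _ in range(q):
        result = (result << 1) | (x & 1)
        x >>= 1
    return result

# Order Q = 8 MC's on the shortable bus by the bit reverse of their
# physical addresses.
q = 3
bus_order = sorted(range(1 << q), key=lambda mc: bit_reverse(mc, q))
# bus_order: [0, 4, 2, 6, 1, 5, 3, 7]; e.g., the partition of MC's with
# low-order bit 0 ({0, 2, 4, 6}) sits in the first four bus positions.
```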
In each partition the PCU processors and memory modules are assigned logical addresses. Given a virtual machine of size RN/Q, the processors and memory modules for this partition have logical addresses (numbers) 0 to (RN/Q) − 1, R = 2^r, 0 ≤ r ≤ q. Assuming that the MC's have been assigned as described above, then the logical number of a processor or memory module is the high-order n + r − q bits of the physical number. Similarly, the MC's assigned to the partition are logically numbered (addressed) from 0 to R − 1. For R > 1, the logical number of an MC is the high-order r bits of its physical number. The PASM language compilers and operating system will be used to convert from logical to physical addresses, so a system user will deal only with logical addresses.
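The physical-to-logical mapping just described can be sketched directly. The function below is an illustrative model, with n, q, and r as in the text; it is not drawn from any PASM software.

```python
def logical_processor_number(phys, q, r):
    """Logical number of a PCU processor inside its partition.

    phys is the n-bit physical address; the low-order (q - r) bits
    identify the partition, so the remaining high-order (n + r - q)
    bits are the logical number within it.
    """
    return phys >> (q - r)

# Toy system: N = 16 (n = 4), Q = 4 (q = 2), partition of R = 2 (r = 1)
# MC's. Partition members agree in their low-order q - r = 1 bit (here 0).
q, r = 2, 1
members = [0b0000, 0b0010, 0b0100, 0b0110, 0b1000, 0b1010, 0b1100, 0b1110]
logical = [logical_processor_number(p, q, r) for p in members]
# logical: [0, 1, 2, 3, 4, 5, 6, 7] -- RN/Q = 8 processors, numbered 0..7
```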
`
C. Communications with the System Control Unit

SIMD programs are stored in control storage (see Fig. 1), which is the secondary storage for the MC's. The loading of these programs from control storage into the MC memory units is controlled by the system control unit. When large SIMD jobs are run, that is, jobs which require more than N/Q processors, more than one MC executes the same set of instructions. With the basic scheme in the previous section, each MC has its own memory, so if more than one MC is to be used, then several memories must be loaded with the same set of instructions.

The fastest way to load several MC memories with the same set of instructions is to load all of the memories at the same time. This can be accomplished by connecting the control storage to all of the MC memory modules via a bus. Each memory module is either enabled or disabled for loading from control storage, depending on the contents of the Q-bit MC memory load register. The Q-bit MC memory select register selects for each MC processor which memory unit of its memory module it should use for instructions. An enabled memory unit not being used by an MC processor receives the data from control storage. Both of these registers are set by the system control unit.

The Q-bit MC status register contains the go/done status of the MC's. When the system control unit sets the ith bit to a "1," MC i sets its program counter to zero and begins executing the contents of the memory unit that is specified by the MC memory select register. When the MC is done, it resets its bit of the MC status register to "0" and sends an interrupt to the system control unit to inform it that an MC has finished.

Further details about the communications between the MC's and the system control unit are in [47].
`
D. Communications Among Microcontrollers

Instructions which examine the collective status of all of the PE's of a virtual SIMD machine include "if any," "if all," and "if none." These instructions change the flow of control of the program at execution time depending on whether any or all processors in the SIMD machine satisfy some condition. For example, if each PE is processing data from a different radar unit, but all PE's are looking for enemy planes, it is desirable for the control unit to know "if any" of the PE's has discovered a possible attack.

The task of computing an "if any" type of statement that involves several MC's can be handled using the existing hardware. This can be accomplished by having the PCU processors associated with these MC's use a recursive doubling [33] type of algorithm.
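A recursive doubling reduction of this kind can be sketched as follows. The code models only the combining pattern among the PE's and is an illustrative assumption, not an excerpt from the paper.

```python
def recursive_doubling_any(flags):
    """Combine per-PE Boolean flags into a global "if any" result.

    Models recursive doubling: in step d, PE i combines its value with
    that of PE (i XOR 2**d), so after log2(N) steps every PE holds the
    global OR. len(flags) must be a power of two.
    """
    n = len(flags)
    vals = list(flags)
    d = 1
    while d < n:
        # One lockstep exchange: every PE pairs with the PE d away.
        vals = [vals[i] or vals[i ^ d] for i in range(n)]
        d *= 2
    return vals  # every PE now holds the same answer

result = recursive_doubling_any([False] * 7 + [True])
# after 3 steps, all 8 PE's hold True
```

The MC's need only read one PE's result afterward, since every PE converges to the same value.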
Specialized hardware to handle this task can be added inexpensively and will result in a faster solution than the software approach. Each MC will have a q + 1 bit job identification number (ID) for the job it is running (there can be at most one job in each of the 2Q memory units). The MC's will share an MC communication bus, consisting of an ID bus and a data bus. When an "if any" type instruction is encountered, each MC associated with the job determines the result of the "if any" type of statement for its N/Q PE's and sends a request to use the communication bus. When an MC receives permission to use the bus, it broadcasts its job ID to all of the MC's (including itself) via the ID bus. If an MC is running the job with the ID which is on the ID bus, it then puts its local results onto the data bus. The data bus is one bit wide and will be constructed as a Q input "wired and" gate. This allows all of the MC's associated with a job to put their data on the bus simultaneously and read the result. Each MC serviced removes itself from the queue. The MC's will then execute the conditional branch in the common instruction stream. All MC's in a partition will be synchronized, executing the "if any" type of instruction at the same time. None of the MC's will continue past that instruction until after its job ID has appeared on the ID bus.

The system control unit is not burdened under this method. More details about this communications scheme are in [47].
`
E. Enabling and Disabling PCU Processors

In SIMD mode all of the active PCU PE's will execute instructions broadcast to them by their MC. A masking scheme is a method for determining which PCU PE's will be active at a given point in time. An SIMD machine may have several different masking schemes.

The general masking scheme uses an N-bit vector to determine which PCU PE's to activate. PE i will be active if the ith bit of the vector is a 1, for 0 ≤ i < N. A mask instruction is executed whenever a change in the active status of the PE's is required. The Illiac IV, which has 64 processors and 64-bit words, uses general masks [51]. However, when N is larger, say 1024, a scheme such as this becomes less appealing.

The PE address masking scheme [28] uses an n-position mask to specify which of the N PCU PE's are to be activated. Each position of the mask corresponds to a bit position in the addresses of the PE's. Each position of the mask will contain either a 0, 1, or X (DON'T CARE), and the only PE's that will be active are those whose address matches the mask: 0 matches 0, 1 matches 1, and either 0 or 1 matches X. Square brackets denote a mask. Superscripts are used as repetition factors. For example, MASK [X^(n-1) 0] activates all even numbered PE's; MASK [0^(n-i) X^i] activates PE's 0 to 2^i − 1.
A negative PE address mask [39] is similar to a regular PE address mask, except that it activates all those PE's which do not match the mask. Negative PE address masks are prefixed with a minus sign to distinguish them from regular PE address masks. For example, for N = 8, MASK [−00X] activates all PE's except 0 and 1. This type of mask can activate sets of PE's a single regular PE address mask cannot.
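The PE address mask semantics can be sketched as follows. The flat string representation (one character per address bit) is an illustrative assumption; the paper's notation uses superscript repetition factors instead.

```python
def mask_active(mask, n):
    """Return the set of active PE addresses for an n-bit PE address mask.

    mask is a string of '0', '1', 'X' characters, most significant bit
    first; a leading '-' makes it a negative mask, which activates
    exactly the PE's a positive mask would not.
    """
    negative = mask.startswith('-')
    body = mask.lstrip('-')
    assert len(body) == n

    def matches(pe):
        bits = format(pe, '0{}b'.format(n))
        # 0 matches 0, 1 matches 1, X matches either.
        return all(m == 'X' or m == b for m, b in zip(body, bits))

    return {pe for pe in range(1 << n) if matches(pe) != negative}

# N = 8 (n = 3) examples from the text:
evens = mask_active('XX0', 3)    # all even numbered PE's: {0, 2, 4, 6}
not01 = mask_active('-00X', 3)   # all PE's except 0 and 1
```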
Like general masks, PE address masks are specified in the SIMD program. PE address masks are more restrictive than general masks, in