IEEE TRANSACTIONS ON COMPUTERS, VOL. C-30, NO. 12, DECEMBER 1981
PASM: A Partitionable SIMD/MIMD System for Image Processing and Pattern Recognition

HOWARD JAY SIEGEL, MEMBER, IEEE, LEAH J. SIEGEL, MEMBER, IEEE, FREDERICK C. KEMMERER, MEMBER, IEEE, PHILIP T. MUELLER, JR., MEMBER, IEEE, HAROLD E. SMALLEY, JR., MEMBER, IEEE, AND S. DIANE SMITH, MEMBER, IEEE
Abstract--PASM, a large-scale multimicroprocessor system being designed at Purdue University for image processing and pattern recognition, is described. This system can be dynamically reconfigured to operate as one or more independent SIMD and/or MIMD machines. PASM consists of a parallel computation unit, which contains N processors, N memories, and an interconnection network; Q microcontrollers, each of which controls N/Q processors; N/Q parallel secondary storage devices; a distributed memory management system; and a system control unit, to coordinate the other system components. Possible values for N and Q are 1024 and 16, respectively. The control schemes and memory management in PASM are explored. Examples of how PASM can be used to perform image processing tasks are given.
Index Terms--Image processing, memory management, MIMD machines, multimicroprocessor systems, multiple-SIMD machines, parallel processing, partitionable computer systems, PASM, reconfigurable computer systems, SIMD machines.
I. INTRODUCTION
As a result of the microprocessor revolution, it is now feasible to build multimicroprocessor systems capable of performing image processing tasks more rapidly than previously possible. There are many image processing tasks which can be performed on a parallel processing system, but are prohibitively expensive to perform on a conventional computer system due to the large amount of time required to do the tasks [37]. In addition, a multimicroprocessor system can use parallelism to perform the real-time image processing required for such applications as robot (machine) vision, automatic guidance of air and spacecraft, and air traffic control.
There are several types of parallel processing systems. An SIMD (single instruction stream-multiple data stream) machine [18] typically consists of a set of N processors, N memories, an interconnection network, and a control unit (e.g., the Illiac IV [10]). The control unit broadcasts instructions to the processors and all active ("turned on") processors execute the same instruction at the same time. Each processor executes instructions using data taken from a memory with which only it is associated. The interconnection network allows interprocessor communication. An MSIMD (multiple-SIMD) system is a parallel processing system which can be structured as one or more independent SIMD machines (e.g., MAP [31], [32]). The Illiac IV was originally designed as an MSIMD system [3]. An MIMD (multiple instruction stream-multiple data stream) machine [18] typically consists of N processors and N memories, where each processor can follow an independent instruction stream (e.g., C.mmp [60]). As with SIMD architectures, there is a multiple data stream and an interconnection network. A partitionable SIMD/MIMD system is a parallel processing system which can be structured as one or more independent SIMD and/or MIMD machines. In this paper PASM [47], [48], a partitionable SIMD/MIMD system being designed at Purdue University for image processing and pattern recognition, is described.
Many designers have discussed the possibilities of building large-scale parallel processing systems, employing 2^10 to 2^16 microprocessors, in SIMD (e.g., binary n-cube array [34]) and MIMD (e.g., CHoPP [54], [58]) configurations. Without the presence of such a large number of processors, the concept of partitioning the system into smaller machines which can operate as SIMD or MIMD machines was unnecessary. Nutt [34] has suggested a machine which is a multiple-SIMD system. Lipovski and Tripathi [27] have considered the idea of combining the SIMD and MIMD modes of operation in one system. In addition, developments in recent years have shown the importance of parallelism in image processing, using both cellular logic arrays (e.g., CLIP [50], BASE 8 [35]) and SIMD systems (e.g., STARAN [36]). A variety of such systems are discussed in [19]. Thus, the time seems right to investigate how to construct a computer system such as PASM: a machine which can be dynamically reconfigured as one or more SIMD and/or MIMD machines, optimized for a variety of important image processing and pattern recognition tasks.

Manuscript received September 11, 1979; revised May 8, 1980 and March 4, 1981. This work was supported by the Air Force Office of Scientific Research, Air Force Systems Command, USAF, under Grant AFOSR-78-3581, and by the Defense Mapping Agency, monitored by the U.S. Air Force Rome Air Development Center Information Sciences Division, under Contract F30602-78-C-0075 through the University of Michigan.
H. J. Siegel and L. J. Siegel are with the School of Electrical Engineering, Purdue University, West Lafayette, IN 47907.
F. C. Kemmerer was with the School of Electrical Engineering, Purdue University, West Lafayette, IN 47907. He is now with Bell Laboratories, Whippany, NJ 07981.
P. T. Mueller, Jr. was with the School of Electrical Engineering, Purdue University, West Lafayette, IN 47907. He is now with Geophysical Services, Inc., Dallas, TX 75268.
H. E. Smalley, Jr. was with the School of Electrical Engineering, Purdue University, West Lafayette, IN 47907. He is now with Hitachi America, Ltd., Atlanta, GA 30368.
S. D. Smith was with the School of Electrical Engineering, Purdue University, West Lafayette, IN 47907. She is now with the Department of Electrical and Computer Engineering, University of Wisconsin, Madison, WI 53706.
The use of parallel processing in image processing has been limited in the past due to cost constraints. Most systems used small numbers of processors (e.g., Illiac IV [10]), processors of limited capabilities (e.g., STARAN [36]), or specialized logic modules (e.g., PPM [24]). With the development of microprocessors and related technologies it is reasonable to consider parallel systems using a large number of complete processors.
SIMD machines can be used for "local" processing of segments of images in parallel. For example, the image can be segmented and each processor assigned a segment. Then, following the same set of instructions, such tasks as line thinning, threshold dependent operations, and gap filling can be done in parallel for all segments of the image simultaneously. Also in SIMD mode, matrix arithmetic used for such tasks as statistical pattern recognition can be done efficiently. MIMD machines can be used to perform different "global" pattern recognition tasks in parallel, using multiple copies of the image or one or more shared copies. For example, in cases where the goal is to locate two or more distinct objects in an image, each object can be assigned a processor or set of processors to search for it. An SIMD/MIMD application might involve using the same set of microprocessors for preprocessing an image in SIMD mode and then doing a pattern recognition task in MIMD mode.
PASM, a partitionable SIMD/MIMD machine, is a large-scale dynamically reconfigurable multimicroprocessor system [43], [46]-[48]. It is a special purpose system being designed to exploit the parallelism of image processing and pattern recognition tasks. It can also be applied to related areas such as speech processing and biomedical signal processing. In this paper the architecture of PASM is presented and examples of its use in performing image processing tasks are given.
Computer system designers have been considering various multimicrocomputer architectures, such as [9], [11], [23], [26], [27], [34], [54], [58], and [59]. PASM combines the following features:
1) it can be partitioned to operate as many independent SIMD and/or MIMD machines of varying sizes, and
2) it is being developed using a variety of problems in image processing and pattern recognition to guide the design choices.
The purpose of PASM is to serve as a vehicle for experimenting with parallel image processing and pattern recognition. It is not meant to be a production-line machine, but rather a research tool. The design attempts to incorporate the needed flexibility for studying large-scale SIMD and MIMD parallelism, while keeping system costs "reasonable."

In this paper the overall organization of PASM is presented. In particular, the control structure and secondary storage systems are described, and some application examples are given. The purpose here is to present design concepts that will allow a system to exploit the parallelism that, for example, 1024 processors can provide. Implementation details are currently under study and are beyond the scope and length of this paper.
Fig. 1 is a block diagram of the basic components of PASM. The parallel computation unit (PCU) contains N = 2^n processors, N memory modules, and an interconnection network. The PCU processors are microprocessors that perform the actual SIMD and MIMD computations. The PCU memory modules are used by the PCU processors for data storage in SIMD mode and for both data and instruction storage in MIMD mode. The interconnection network provides a means of communication among the PCU processors and memory modules. Two possible ways to organize the PCU and different types of partitionable networks which can be used are described in Section II.

Fig. 1. Block diagram overview of PASM.
The microcontrollers (MC's) are a set of microprocessors which act as the control units for the PCU processors in SIMD mode and orchestrate the activities of the PCU processors in MIMD mode. There are Q = 2^q MC's. Each MC controls N/Q PCU processors. A virtual SIMD machine (partition) of size RN/Q, where R = 2^r and 0 ≤ r ≤ q, is obtained by loading R MC memory modules with the same instructions simultaneously. Similarly, a virtual MIMD machine of size RN/Q is obtained by combining the efforts of the PCU processors of R MC's. Q is therefore the maximum number of partitions allowable, and N/Q is the size of the smallest partition. Possible values for N and Q are 1024 and 16, respectively. Control storage contains the programs for the MC's. The MC's are discussed in more detail in Section III.
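As a concrete illustration of these parameters, the short sketch below enumerates the allowable virtual machine sizes, assuming the example values N = 1024 and Q = 16 given above; the variable names are ours, not part of the PASM design.

# Example values from the text: N = 1024 PCU processors, Q = 16 MC's.
N, Q = 1024, 16
q = Q.bit_length() - 1          # Q = 2**q, so q = 4
# A virtual machine is formed from R = 2**r MC's, 0 <= r <= q, giving R*N/Q processors.
sizes = [(1 << r) * (N // Q) for r in range(q + 1)]
print(sizes)                    # [64, 128, 256, 512, 1024]
# At most Q = 16 partitions can exist at once; the smallest partition has N/Q = 64 processors.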
The memory storage system provides secondary storage space for the data files in SIMD mode, and for the data and program files in MIMD mode. The memory management system controls the transferring of files between the memory storage system and the PCU memory modules. It employs a set of cooperating dedicated microprocessors. Multiple storage devices are used in the memory storage system to allow parallel data transfers. The secondary memory system is described in Section IV.
The system control unit is a conventional machine, such as a PDP-11, and is responsible for the overall coordination of the activities of the other components of PASM. Examples of the tasks the system control unit will perform include program development, job scheduling, and coordination of the loading of the PCU memory modules from the memory storage system with the loading of the MC memory modules from control storage. By carefully choosing which tasks should be assigned to the system control unit and which should be assigned to other system components, the system control unit can work effectively and not become a bottleneck. Examples of this include using the MC's to act as the control units for virtual SIMD machines, controlling the interconnection network with routing tags generated by each PCU processor, and having the memory management system supervise primary/secondary memory transfers.
Together, Sections II, III, and IV present the overall architecture of PASM. Particular attention is paid to the ways in which the control structure and secondary memory scheme allow PASM to be efficiently partitioned into independent virtual machines. Variations in the design of PASM's PCU which still support these control and secondary memory ideas are examined. This examination demonstrates how the concepts underlying PASM can be used in the design of different systems.

In Section V image processing algorithms using PASM are presented. In particular, smoothing and histogram calculations are examined. Using these examples, the potential improvement a system such as PASM can provide over serial machines is demonstrated.

II. PARALLEL COMPUTATION UNIT

A. PCU Organization

The parallel computation unit (PCU) contains processors, memories, and an interconnection network. One configuration of these components is to connect a memory module to each processor to form a processor-memory pair called a processing element (PE) (e.g., Illiac IV [10]). The interconnection network is used for communications among PE's. This "PE-to-PE" configuration is shown in Fig. 2. A pair of memory units is used for each memory module. This double-buffering scheme allows data to be moved between one memory unit and secondary storage (the memory storage system) while the processor operates on data in the other memory unit. In the PE-to-PE configuration, "local" memory references are relatively fast; however, the transfer of large blocks of data from PE to PE is delayed by the memory fetching and storing which must be done.

Fig. 2. PE-to-PE configuration of the parallel computation unit.

The "P-to-M" (processor-to-memory) configuration, shown in Fig. 3, uses the interconnection network to connect the processors to the memory modules. Again, double-buffering is employed. There is no "local" memory. To fetch an operand from a memory, the processor must first send the address of the operand through the interconnection network to a memory. Then the processor receives the operand from the memory via the interconnection network. Advantages of the P-to-M configuration are that a memory connected to a processor can be reconnected to another processor, effectively transferring the entire contents of the memory from one processor to another, and that the number of memories does not have to be equal to the number of processors (e.g., BSP [12]). A disadvantage is that all memory references must go through the interconnection network.

Fig. 3. Processor-to-memory configuration of the parallel computation unit.
A more detailed analysis reveals some of the tradeoffs involved in these two configurations. If T_mem is the time required for a memory access (either a read or a write), and T_in is the time to send a data item through the interconnection network, then the time required for a memory reference in the P-to-M configuration, T_PM, is given by

T_PM = T_mem + 2T_in.   (1)

T_in must be included twice, once for the processor to send the address to the memory and once for transferring the data. (The time required for controlling the network is omitted since control methods vary.)

For the PE-to-PE configuration the time required for a memory reference, T_PE, depends on the location of the memory which is to be used. If the memory is local, then

T_PE = T_mem.   (2)

If the memory is connected to some other processor, then

T_PE = T_int + T_mem + 2T_in.   (3)

T_int represents the time required for the PE which has the data item to recognize and service the data request. This may require a significantly longer delay than T_in. If p is the probability of a local memory reference, then (2) and (3) can be combined to give the expected memory reference time

E[T_PE] = pT_mem + (1 - p)(T_int + T_mem + 2T_in).   (4)

Comparing (1) and (4),

T_PM ≥ E[T_PE]   (5)

for p sufficiently large. Thus, the "best" configuration is task dependent.

When operating in SIMD mode with the PE-to-PE configuration, it is often possible to omit the occurrence of T_int in (3). This is done by computing the address of the desired data in the processor connected to the memory to be accessed (e.g., see the algorithms in Section V-B).
Thus, (4) reduces to

E[T_PE] = pT_mem + (1 - p)(T_mem + 2T_in).   (6)

Therefore, when operating in SIMD mode the PE-to-PE configuration is preferable.
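To make the tradeoff captured by (1), (4), and (6) concrete, the following sketch evaluates both configurations for a range of local-reference probabilities p. The timing values are hypothetical placeholders; only the formulas themselves come from the discussion above.

def t_pm(t_mem, t_in):
    # Eq. (1): P-to-M reference; the network is crossed once for the address, once for the data.
    return t_mem + 2 * t_in

def e_t_pe(p, t_mem, t_in, t_int):
    # Eq. (4): expected PE-to-PE reference time, p = probability of a local reference.
    return p * t_mem + (1 - p) * (t_int + t_mem + 2 * t_in)

def e_t_pe_simd(p, t_mem, t_in):
    # Eq. (6): SIMD-mode PE-to-PE time, with the T_int term eliminated.
    return p * t_mem + (1 - p) * (t_mem + 2 * t_in)

t_mem, t_in, t_int = 1.0, 0.5, 2.0          # assumed units, for illustration only
for p in (0.5, 0.9, 0.99):
    print(p, t_pm(t_mem, t_in), e_t_pe(p, t_mem, t_in, t_int), e_t_pe_simd(p, t_mem, t_in))

For these assumed values the behavior described by (5) is visible: once p is large enough, the expected PE-to-PE time drops below the fixed P-to-M time.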
When operating in MIMD mode, the PE-to-PE configuration requires that two processors be involved in every nonlocal memory reference. The efforts of the two processors involved in a data transfer can be coordinated by having the processor which initiates the transfer interrupt the other processor, or by dedicating one of these processors to handling data transfers. In the P-to-M configuration the memories are shared by the processors, i.e., more than one processor can access the same memory for either data or instructions. However, for the image processing tasks that have been examined, most data and instructions can be stored in the local memory, reducing the impact of this consideration.

The PE-to-PE configuration will be used in PASM. Depending on the application for which a different partitionable SIMD/MIMD system is intended, the P-to-M configuration may be preferable. The interconnection networks, control structure, and secondary memory system described below can be used in conjunction with either.
B. Interconnection Networks
Two types of multistage interconnection networks are being considered for PASM: the Generalized Cube and the Augmented Data Manipulator (ADM). Both of these networks have the following properties: 1) a logical structure consisting of n stages of N/2 (Cube) or N (ADM) switches [49]; 2) distributed control by routing tags generated by each processor [25], [28]; 3) the ability to operate in SIMD mode (N simultaneous transfers) and MIMD mode [44], [45]; and 4) the ability to be partitioned into independent subnetworks [42]. As discussed in [49], the Cube network is equivalent or directly related to other networks in the literature (e.g., [5], [6], [24], [33], and [34]). The ADM is an extension of the data manipulator [16].

Both the Cube and ADM networks can be implemented as single stage networks [38], [40], [41] instead of as multistage networks. These single stage implementations can also be partitioned [42], are compatible with the PASM control and memory management schemes, and may be appropriate for MSIMD systems, depending on the intended applications. However, since there is only a single stage of switches, for the MIMD mode of operation intermediate processors may have to be interrupted to transfer data. (See [40] for more information.) Thus, single stage networks are not appropriate for PASM, but might be for an MSIMD system based on the design concepts of PASM.

The tradeoffs between the Cube and ADM multistage networks for PASM are currently under study. The ADM network is more flexible and fault tolerant [40], [44], but is more complex. The Cube may be more cost effective and sufficient for the system's needs. In the following sections it will be assumed that the processors will be partitioned such that the addresses of all of the processors in a partition agree in their low-order bit positions. This constraint will allow either the Cube or ADM network to be used as the partitionable interconnection network in PASM. Details of these networks are beyond the scope of this paper, and readers are referred to the references indicated.
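As an aside, the routing-tag idea can be sketched for a cube-type topology. The fragment below assumes the usual exclusive-or tag scheme associated with Generalized Cube networks; it is an illustration of distributed control, not a description of PASM's actual switch hardware.

def cube_routing_tag(src, dst):
    # Each processor forms its own tag: bit i = 1 tells stage i to exchange
    # (complement address bit i); bit i = 0 tells it to pass straight through.
    return src ^ dst

def route(src, dst, n):
    # Follow the tag through n stages and return the sequence of addresses visited.
    addr, path = src, [src]
    tag = cube_routing_tag(src, dst)
    for stage in range(n):
        if (tag >> stage) & 1:
            addr ^= 1 << stage
        path.append(addr)
    return path

assert route(0b0101, 0b0011, 4)[-1] == 0b0011
# Source and destination addresses that agree in their low-order bits produce tags with
# 0's in those positions, so the corresponding stages are never asked to exchange; this
# is the property that allows partitioning into independent subnetworks.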
C. PCU Processors

The PCU processors will be specially designed for parallel image processing. Simulation studies are currently being conducted to aid in determining an efficient instruction set. A PASM prototype will most likely be based on user-microprogrammable components to obtain needed flexibility, while the final system will employ custom VLSI processors.
III. MICROCONTROLLERS

A. Introduction

In this section the microcontrollers (MC's) are discussed. It is the MC's that enable PASM to operate as an MSIMD system.
In general, the possible advantages of an MSIMD system over an SIMD system with a similar number of PE's include the following.
1) Fault Detection: For situations where high reliability is needed, three partitions can run the same program on the same data and compare results.
2) Fault Tolerance: If a single PE fails, only those logical SIMD machines (partitions) which must include the failed PE need to be disabled. The rest of the system can continue to function.
3) Multiple Simultaneous Users: Since there can be multiple independent virtual SIMD machines, there can be multiple simultaneous users of the system, each executing a different SIMD program.
4) Program Development: Rather than trying to debug an SIMD program on, for example, 1024 PE's, it can be debugged on a smaller size virtual SIMD machine of 32 or 64 PE's.
5) Variable Machine Size for Efficiency: If a task requires only N/2 of the N available PE's, the other N/2 can be used for another task.
6) Subtask Parallelism: Two independent SIMD subtasks that are part of the same job can be executed in parallel, sharing results if necessary.
For PASM's intended application as a tool for experimenting with the use of parallelism in image processing and pattern recognition, points 2)-6) above will be significant.
B. Microcontroller Organization and Addressing Conventions
In order to have a partitionable system, some form of multiple control units must be provided. In PASM this is done by having Q = 2^q MC's, physically addressed (numbered) from 0 to Q - 1. Each MC controls N/Q PCU processors, as shown in Fig. 4.
Fig. 4. PASM microcontrollers (MC's).
An MC is a microprocessor which is attached to a memory module. Each memory module consists of a pair of memory units so that memory loading and computations can be overlapped. This double-buffering reduces the likelihood of MC's being idle while waiting for a program to be loaded. In SIMD mode, each MC fetches instructions from its memory module, executing the control flow instructions (e.g., branches) and broadcasting the data processing instructions to its PCU processors. The physical addresses of the N/Q processors which are connected to an MC must all have the same low-order q bits so that the network can be partitioned. The value of these low-order q bits is the physical address of the MC. A virtual SIMD machine of size RN/Q, where R = 2^r and 0 ≤ r ≤ q, is obtained by loading R MC's with the same instructions and synchronizing the MC's. The physical addresses of these MC's must have the same low-order q - r bits so that all of the PCU processors in the partition have the same low-order q - r physical address bits. Similarly, a virtual MIMD machine of size RN/Q is obtained by combining the efforts of the PCU PE's associated with R MC's which have the same low-order q - r physical address bits. In MIMD mode the MC's may be used to help coordinate the activities of their PCU PE's.
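A small sketch may help fix these addressing rules; the constants below are the example values N = 1024 and Q = 16, and the function names are ours.

n, q = 10, 4                    # example values: N = 2**n = 1024, Q = 2**q = 16
N, Q = 1 << n, 1 << q

def mc_of(pe_physical):
    # The N/Q processors attached to one MC share their low-order q address bits,
    # and those q bits are the MC's physical address.
    return pe_physical & (Q - 1)

def mcs_in_partition(mc_physical, r):
    # MC's combined into one virtual machine of size R*N/Q (R = 2**r) must agree
    # in their low-order q - r physical address bits.
    mask = (1 << (q - r)) - 1
    return [m for m in range(Q) if (m & mask) == (mc_physical & mask)]

print(mc_of(0b1100110101))      # 5: this PE is controlled by MC 5
print(mcs_in_partition(5, 2))   # [1, 5, 9, 13]: four MC's, i.e., 4 * 64 = 256 PE's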
The approach of permanently assigning a fixed number of PCU PE's to each MC has several advantages over allowing a varying assignment, such as in [11] and [31]. One advantage is that the operating system need only schedule (and monitor the "busy" status of) Q MC's, rather than N PCU PE's. When Q = 16 (or even 32) and N = 1024, this is a substantial savings. Another advantage is that no crossbar switch is needed for connecting processors and control units (as proposed in [11] and [31]). Also, this fixed connection scheme allows the efficient use of multiple secondary storage devices, as is discussed in Section IV-B. The main disadvantage of this approach is that each virtual machine size must be a power of two, with a minimum value of N/Q. However, for PASM's intended experimental environment, flexibility at reasonable cost is the goal, not maximum processor utilization.
This basic MC organization can be enhanced to allow the sharing of memory modules by the MC's in a partition. The MC's can be connected by a shared reconfigurable ("shortable") bus such as described in [2], as shown in Fig. 5. The MC's must be ordered on the bus in terms of the bit reverse of their addresses due to the partitioning rules. (The sharing of memories using a reconfigurable bus is employed in [22].) This enhanced MC connection scheme would provide more program space for jobs using multiple MC's and would also provide a degree of fault tolerance, since known-faulty MC memory modules could be ignored. These advantages come at the expense of additional system complexity, and the inclusion of the enhanced scheme in PASM will depend on cost constraints at implementation time.

Fig. 5. Reconfigurable shared bus scheme for interconnecting microcontroller (MC) processors and MC memory modules, for Q = 8. Each box can be set to "through" or "short."
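The reason for the bit-reversed ordering on this bus can be seen with a few lines of code (Q = 8 here, matching the figure; the helper function is ours): every legal partition, i.e., every set of MC's agreeing in their low-order q - r bits, ends up occupying a contiguous run of the bus, which a shortable bus can then isolate.

def bit_reverse(x, bits):
    # Reverse the low 'bits' bits of x.
    y = 0
    for _ in range(bits):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y

q = 3                                        # Q = 8 MC's, as in Fig. 5
bus_order = sorted(range(1 << q), key=lambda mc: bit_reverse(mc, q))
print(bus_order)                             # [0, 4, 2, 6, 1, 5, 3, 7]
# MC's 1 and 5 (a two-MC partition) are adjacent; MC's 0, 2, 4, 6 (a four-MC partition)
# occupy the first four positions.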
In each partition the PCU processors and memory modules are assigned logical addresses. Given a virtual machine of size RN/Q, the processors and memory modules for this partition have logical addresses (numbers) 0 to (RN/Q) - 1, R = 2^r, 0 ≤ r ≤ q. Assuming that the MC's have been assigned as described above, then the logical number of a processor or memory module is the high-order n + r - q bits of the physical number. Similarly, the MC's assigned to the partition are logically numbered (addressed) from 0 to R - 1. For R > 1, the logical number of an MC is the high-order r bits of its physical number. The PASM language compilers and operating system will be used to convert from logical to physical addresses, so a system user will deal only with logical addresses.
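Continuing the addressing sketch (again with the example values n = 10 and q = 4, and with names that are ours), the logical numbers follow directly from the bit-field definitions above.

n, q = 10, 4                    # example values: N = 1024, Q = 16

def logical_pe(pe_physical, r):
    # Logical PE number in a partition of size R*N/Q: the high-order n + r - q bits
    # of the physical number, i.e., the physical number shifted right by q - r.
    return pe_physical >> (q - r)

def logical_mc(mc_physical, r):
    # For R > 1, the logical MC number is the high-order r bits of its q-bit physical number.
    return mc_physical >> (q - r)

# With r = 2, the MC's with physical addresses 1, 5, 9, 13 receive logical numbers
# 0, 1, 2, 3, and physical PE 0b1100110101 gets logical number 0b11001101 (= 205)
# within its 256-processor partition.
assert [logical_mc(m, 2) for m in (1, 5, 9, 13)] == [0, 1, 2, 3]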
C. Communications with the System Control Unit
SIMD programs are stored in control storage (see Fig. 1), which is the secondary storage for the MC's. The loading of these programs from control storage into the MC memory units is controlled by the system control unit. When large SIMD jobs are run, that is, jobs which require more than N/Q processors, more than one MC executes the same set of instructions. With the basic scheme in the previous section, each MC has its own memory, so if more than one MC is to be used, then several memories must be loaded with the same set of instructions.

The fastest way to load several MC memories with the same set of instructions is to load all of the memories at the same time. This can be accomplished by connecting the control storage to all of the MC memory modules via a bus. Each memory module is either enabled or disabled for loading from control storage, depending on the contents of the Q-bit MC
memory load register. The Q-bit MC memory select register selects for each MC processor which memory unit of its memory module it should use for instructions. An enabled memory unit not being used by an MC processor receives the data from control storage. Both of these registers are set by the system control unit.

The Q-bit MC status register contains the go/done status of the MC's. When the system control unit sets the ith bit to a "1," MC i sets its program counter to zero and begins executing the contents of the memory unit that is specified by the MC memory select register. When the MC is done, it resets its bit of the MC status register to "0" and sends an interrupt to the system control unit to inform it that an MC has finished.

Further details about the communications between the MC's and the system control unit are in [47].
D. Communications Among Microcontrollers

Instructions which examine the collective status of all of the PE's of a virtual SIMD machine include "if any," "if all," and "if none." These instructions change the flow of control of the program at execution time depending on whether any or all processors in the SIMD machine satisfy some condition. For example, if each PE is processing data from a different radar unit, but all PE's are looking for enemy planes, it is desirable for the control unit to know "if any" of the PE's has discovered a possible attack.

The task of computing an "if any" type of statement that involves several MC's can be handled using the existing hardware. This can be accomplished by having the PCU processors associated with these MC's use a recursive doubling [53] type of algorithm.
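A sketch of the recursive doubling idea, reduced to its essentials (the list stands in for one Boolean flag per participating unit; actual PASM transfers would go through the interconnection network):

def if_any(flags):
    # Recursive doubling OR-reduction: at each step every position combines with the
    # position 2**k away, so after log2(len(flags)) steps every position holds the
    # global "if any" result.  len(flags) is assumed to be a power of two.
    vals = list(flags)
    k = 1
    while k < len(vals):
        vals = [vals[i] or vals[i ^ k] for i in range(len(vals))]
        k <<= 1
    return vals[0]

print(if_any([False] * 7 + [True]))   # True: one unit has detected the condition
print(if_any([False] * 8))            # False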
Specialized hardware to handle this task can be added inexpensively and will result in a faster solution than the software approach. Each MC will have a q + 1 bit job identification number (ID) for the job it is running (there can be at most one job in each of the 2Q memory units). The MC's will share an MC communication bus, consisting of an ID bus and a data bus. When an "if any" type instruction is encountered, each MC associated with the job determines the result of the "if any" type of statement for its N/Q PE's and sends a request to use the communication bus. When an MC receives permission to use the bus, it broadcasts its job ID to all of the MC's (including itself) via the ID bus. If an MC is running the job with the ID which is on the ID bus, it then puts its local results onto the data bus. The data bus is one bit wide and will be constructed as a Q input "wired and" gate. This allows all of the MC's associated with a job to put their data on the bus simultaneously and read the result. Each MC serviced removes itself from the queue. The MC's will then execute the conditional branch in the common instruction stream. All MC's in a partition will be synchronized, executing the "if any" type of instruction at the same time. None of the MC's will continue past that instruction until after its job ID has appeared on the ID bus.
The system control unit is not burdened under this method. More details about this communications scheme are in [47].
E. Enabling and Disabling PCU Processors

In SIMD mode all of the active PCU PE's will execute instructions broadcast to them by their MC. A masking scheme is a method for determining which PCU PE's will be active at a given point in time. An SIMD machine may have several different masking schemes.

The general masking scheme uses an N-bit vector to determine which PCU PE's to activate. PE i will be active if the ith bit of the vector is a 1, for 0 ≤ i < N. A mask instruction is executed whenever a change in the active status of the PE's is required. The Illiac IV, which has 64 processors and 64-bit words, uses general masks [51]. However, when N is larger, say 1024, a scheme such as this becomes less appealing.
The PE address masking scheme [28] uses an n-position mask to specify which of the N PCU PE's are to be activated. Each position of the mask corresponds to a bit position in the addresses of the PE's. Each position of the mask will contain either a 0, 1, or X (DON'T CARE), and the only PE's that will be active are those whose address matches the mask: 0 matches 0, 1 matches 1, and either 0 or 1 matches X. Square brackets denote a mask. Superscripts are used as repetition factors. For example, MASK [X^(n-1)0] activates all even numbered PE's; MASK [0^(n-i)X^i] activates PE's 0 to 2^i - 1.

A negative PE address mask [39] is similar to a regular PE address mask, except that it activates all those PE's which do not match the mask. Negative PE address masks are prefixed with a minus sign to distinguish them from regular PE address masks. For example, for N = 8, MASK [-00X] activates all PE's except 0 and 1. This type of mask can activate sets of PE's a single regular PE address mask cannot.
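The masking rules can be summarized in a few lines; the mask syntax below (a string of 0, 1, and X characters, with a leading minus sign for a negative mask) is an informal rendering of the notation used above.

def matches(addr, mask, n_bits):
    # A PE is activated if its n-bit address matches the mask: 0 matches 0,
    # 1 matches 1, X matches either.  A leading '-' inverts the result.
    negate = mask.startswith('-')
    body = mask[1:] if negate else mask
    bits = format(addr, '0{}b'.format(n_bits))
    hit = all(m in ('X', b) for m, b in zip(body, bits))
    return hit != negate

def active_pes(n_bits, mask):
    return [pe for pe in range(1 << n_bits) if matches(pe, mask, n_bits)]

print(active_pes(3, 'XX0'))     # [0, 2, 4, 6]        all even-numbered PE's
print(active_pes(3, '0XX'))     # [0, 1, 2, 3]        PE's 0 to 2**2 - 1
print(active_pes(3, '-00X'))    # [2, 3, 4, 5, 6, 7]  all PE's except 0 and 1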
Like general masks, PE address masks are specified in the SIMD program. PE address masks are more restrictive than general masks, in