`Baldwin
`
`[54] SYSTEM AND METHOD USING
`DOUBLE-BUFFER PREVIEW MODE
[75] Inventor: David R. Baldwin, Weybridge, United Kingdom
[73] Assignee: DuPont Pixel Systems Limited, London, United Kingdom
[21] Appl. No.: 925,238
[22] Filed: Jul. 31, 1992

Related U.S. Application Data
[63] Continuation of Ser. No. 326,781, Mar. 21, 1989, abandoned.

[30] Foreign Application Priority Data
Mar. 23, 1988 [GB] United Kingdom ................. 8806850
Mar. 23, 1988 [GB] United Kingdom ................. 8806856
Mar. 23, 1988 [GB] United Kingdom ................. 8806864
Mar. 23, 1988 [GB] United Kingdom ................. 8806865
[51] Int. Cl.5 .............................................. G06F 12/06
`[52] U.S. Cl. .................................... 395/425; 395/800;
`364/DIG. 1; 364/228.1; 364/238.4; 364/242.6;
`364/242.91; 364/244; 364/244.8; 364/245.5;
`364/245.7; 364/284; 364/284.1
`[58] Field of Search ................ 395/425, 250, 325, 725
[56] References Cited
U.S. PATENT DOCUMENTS
3,623,017 11/1971 Lowell et al. ....................... 395/550
4,149,242 4/1979 Pirz ..................................... 395/325
4,172,287 10/1979 Kawabe et al. ..................... 364/736
4,396,978 8/1983 Hammer et al. ..................... 364/200
4,443,846 4/1984 Adcock ................................ 364/200
4,495,567 1/1985 Treen ................................... 364/200
4,633,434 12/1986 Scheuneman ....................... 395/550
4,722,049 1/1988 Lahti .................................... 395/375
4,870,572 9/1989 Hosono et al. ...................... 364/200
`FOREIGN PATENT DOCUMENTS
`0186150 2/1986 European Pat. Off. .
`0085435 10/1986 European Pat. Off. .
`0284751 2/1988 European Pat. Off. .
`2162406A 6/1985 United Kingdom .
`WO86/07174 4/1986 World Int. Prop. O. .
`
US005329630A
[11] Patent Number: 5,329,630
[45] Date of Patent: Jul. 12, 1994
`OTHER PUBLICATIONS
David C. Wyland, “Dual-Port RAMs Simplify Communication
in Computer Systems,” Integrated Device Technology, Inc., 1986.
Bureaux D'Etudes Automatismes, No. 32, Mar. 1987,
pp. 85–87; J. Gustafson: Un super-ordinateur vectoriel
homogene, p. 85, figure; p. 85, left-hand column, line
35–p. 87, middle column, line 9.
Conference Proceedings IEEE Southeastcon '87,
Tampa, Fla., Apr., 1987, vol. 1, pp. 225–228; M. C.
Ertem: A reconfigurable co-processor for microprocessor
systems, FIGS. 2–4; p. 226, left-hand column, line 6–p.
227, left-hand column, line 25.
`(List continued on next page.)
`Primary Examiner—Paul V. Kulik
`Attorney, Agent, or Firm—Robert Groover
`[57]
`ABSTRACT
A novel double buffering subsystem, wherein a dual-port
memory is partitioned in software so that the top half of
the memory is allocated to one processor, and the bottom
half to the other. (This allocation is switched when both
processors set respective flag bits indicating that they
are ready to switch.) On accesses to this memory,
additional bits tag the access as “physical,” “logical,”
or “preview.” A physical access is interpreted as a literal
address within the full memory, and the double buffering is
ignored. A logical access is supplemented by an additional
address bit, determined by the double buffering switch
state. A preview access is used for read access only, and
goes to the opposite bank of memory from that which would
be accessed in a logical access. This double-buffer
architecture is advantageously used, in a multiprocessor
system, at the interface between a numeric processor and a
cache bus. The preview access can help to avoid data flow
inefficiencies at synchronization points in pipelined
algorithms.
`
`19 Claims, 59 Drawing Sheets
[Front-page drawing. Legible labels: FP Port and CP Port, each
with Addr (A1:A5) and Mode inputs; SWAP from FP; SWAP from CP.
Access Mode encoding: 00 - Physical, 01 - Logical, 10 - Preview.]
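The three access modes described in the abstract can be illustrated with a minimal software sketch. It is not the patented circuit: the class name `DoubleBuffer`, the port names `"fp"` and `"cp"`, the 5-bit bank size, and the choice of which port starts on which bank are all assumptions made for illustration. Only the mode encoding (00, 01, 10) and the behavior of each mode are taken from the text and the front-page drawing.

```python
from enum import IntEnum

BANK_BITS = 5                 # assumed bank size: 2**5 words per bank
BANK_SIZE = 1 << BANK_BITS


class Mode(IntEnum):
    PHYSICAL = 0b00           # literal address, double buffering ignored
    LOGICAL = 0b01            # bank bit supplied by the swap state
    PREVIEW = 0b10            # read-only, opposite bank from a logical access


class DoubleBuffer:
    """Hypothetical model of a memory shared by an FP port and a CP port."""

    def __init__(self):
        self.mem = [0] * (2 * BANK_SIZE)
        self.fp_bank = 0      # bank the FP port sees on a logical access (assumed start)
        self.ready = {"fp": False, "cp": False}

    def translate(self, port, mode, addr, write=False):
        """Map (port, mode, address) to an address in the full memory."""
        if mode == Mode.PHYSICAL:
            if not 0 <= addr < 2 * BANK_SIZE:
                raise IndexError("physical address out of range")
            return addr
        if not 0 <= addr < BANK_SIZE:
            raise IndexError("logical address out of range")
        bank = self.fp_bank if port == "fp" else self.fp_bank ^ 1
        if mode == Mode.PREVIEW:
            if write:
                raise PermissionError("preview accesses are read-only")
            bank ^= 1         # look at the other processor's bank
        return (bank << BANK_BITS) | addr

    def read(self, port, mode, addr):
        return self.mem[self.translate(port, mode, addr)]

    def write(self, port, mode, addr, value):
        self.mem[self.translate(port, mode, addr, write=True)] = value

    def request_swap(self, port):
        """Set this port's flag; the banks switch only when both flags are set."""
        self.ready[port] = True
        if all(self.ready.values()):
            self.fp_bank ^= 1
            self.ready = {"fp": False, "cp": False}
```

In this sketch a value written by the CP port with a logical access is visible to the FP port through a preview read before the swap, and through an ordinary logical read after both ports have requested the swap.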
`
`
`
`5,329,630
`Page 2
`
`OTHER PUBLICATIONS
`
Proceedings of the Fourth Euromicro Symposium on
Microprocessing and Microprogramming, Munich,
Oct. 1978, pp. 358–365; F. B. Jorgensen et al.: A
Bi-microprocessor implementation of a variable topology
multiprocessor node, FIGS. 1–6, p. 358, right-hand column,
line 13–p. 362, right-hand column, line 21.
G. J. Myers: Digital system design with LSI bit-slice logic,
1980, pp. 230–239, John Wiley & Sons, Inc., US p. 237,
lines 1–4.
W. Lichtenstein, “The Architecture of the Culler”,
Mar. 1986, IEEE Compcon Spring 86, pp. 467–470.
Proceedings of the IEEE, vol. 73, No. 5, May 1985, pp.
852–873, IEEE, New York; J. Allen: “Computer
architecture for digital signal processing”.
Computer Design, vol. 16, No. 6, Jun. 1977, pp. 151–163;
A. J. Weissberger: “Analysis of multiple-microprocessor
system architectures”, FIGS. 7, 8, p. 161.
IEEE Electro, vol. 8, Apr. 1983, pp. 3/31–5, New York;
B. J. New: “Address generation in signal/array
processors”.
Proceedings ICASSP, Dallas, Apr. 6th–9th, 1987, vol. 1,
pp. 531–534; D. M. Taylor, et al.: “A novel VLSI digital
signal processor architecture for high-speed vector
and transform operations”.
IBM Technical Disclosure Bulletin, vol. 27, No. 4A, Sep.
1984, pp. 2184–2186, New York; J. P. Beraud et al.:
“Fast fourier transform calculating circuit”.
`
`
`
U.S. Patent        July 12, 1994        Sheet 1 of 59        5,329,630

Fig. 1 (block diagram; legible labels: CM 190, DTP 120, DTP MC I/F,
Data Cache Memory 140, FP 130, GIP I/F 170, Host I/F 160, I/F 150;
bus label 256)
`
`
`
Sheets 2 to 6 of 59 (figure numbers not legible; surviving labels:
Reg/buffer and Zero extend on Sheet 4; Sequencer, TD bus 122 and
Constant Field on Sheet 5)
`
`
`
`
`
Sheet 7 of 59
Fig. 4A (legible labels: Transfer clock generator, Local transfer
clocks, CP Extension, CD bus 112, CD Bus Transceivers 444, Local CP
extension registers)

Sheet 8 of 59
Fig. 4B (legible labels: Cache bus 144; reference numerals 420, 431,
432, 433, 434)
`
`
`
Sheets 9 to 12 of 59 (drawing text not legible)
`
`
`
Sheet 13 of 59
Fig. 7 (legible labels: TD bus 122, Fifo full 770, Strobes 760,
Data Pipe Out 730, Fifo 740, Data Pipe In No. 1 720, Fifo 750,
Data Pipe In No. 2; signals Full, Empty, Data, Rd, Wr; reference
numeral 780)
`
`
`
Sheet 14 of 59 (drawing text not legible)
Sheet 15 of 59: Fig. 9A (legible labels: Data Cache Memory 140, DP 150)
Sheet 16 of 59 (drawing text not legible)
Sheet 17 of 59: Fig. 10 (legible labels: Data Cache, DP 150, FP)
Sheet 18 of 59 (drawing text not legible)
Sheet 19 of 59: Fig. 12 (legible labels: Internal Bus 1250,
Comparator 1230, Bit Reverse 1240; reference numeral 1270)
Sheet 20 of 59 (drawing text not legible)
Sheet 21 of 59 (legend: OE = Output Enable)
Sheet 22 of 59: Fig. 14B (table of output-enable settings; legible row
labels: No operation, Byte Extend, Byte zero Fill, Word Extend,
Word zero Fill; legend: Z = Hi Impedance)
Sheet 23 of 59 (drawing text not legible)
`
`
`
Sheet 24 of 59
Fig. 16 (legible labels: Cache bus 144, width 256; DCM I/F 1620;
Reg File 430 with 430A, 430B, 430C, 430D and 433, width 32;
FMPY 440 with 440A and 440B; FALU 450 with 450A and 450B)
`
`
`
Sheets 25 to 32 of 59 (drawing text not legible)
Sheet 33 of 59: Fig. 25 (legible labels: MUX 530, Memory 510, Enable,
FP Write Mask Logic 2510)
Sheets 34 and 35 of 59 (drawing text not legible)
Sheet 36 of 59 (legible labels: CP 1 EXT WCS, CP 2 EXT WCS)
Sheets 37 to 39 of 59 (drawing text not legible)
`
`
`
`
`
`
`
`
`
Sheet 40 of 59
Fig. 32 (flowchart; box text in order):
1. Read next pixel from register file
2. Add pixel to base address of the histogram table
3. Load address register with histogram address
4. Read pixel count into ALU input register
5. Increment pixel count by one
6. Write new pixel count into histogram table
7. More pixels?
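The sequence of boxes in Fig. 32 corresponds to an ordinary histogram update loop. The following is a minimal sketch of that loop in software, not the microcode of the patent; the function name `histogram`, the `num_bins` parameter, and the use of a Python list as the histogram table (with base address 0) are assumptions for illustration.

```python
def histogram(pixels, num_bins):
    """Count occurrences of each pixel value, one pixel per iteration."""
    base = 0                          # assumed base address of the table
    table = [0] * num_bins            # histogram table, initially all zero
    for pixel in pixels:              # read next pixel
        if not 0 <= pixel < num_bins:
            raise ValueError("pixel value outside the histogram table")
        address = base + pixel        # add pixel to base address
        count = table[address]        # read pixel count
        count += 1                    # increment pixel count by one
        table[address] = count        # write new pixel count back
    return table                      # loop ends when no more pixels remain
```

Each iteration reads a count, modifies it, and writes it back to an address that depends on the data, so consecutive iterations may touch the same table entry.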
`
`
`
Sheet 41 of 59
Fig. 33 (flowcharts for CP microcode and FP microcode; legible box text)
CP MICROCODE: Start; Load FP start address reg with microcode address
and start FP running; Transfer first 8 elements of array A to register
file; Transfer first 8 elements of array B to register file; Set CP
done and request register file swap; Transfer 8 result elements from
register file into array C; Set CP done and request register file
swap; Transfer last 8 result elements from register file into array C;
End.
FP MICROCODE: Wait loop from previous command; Request register file
swap; Do 8 calculations and leave result in register file.
Decision branches are labeled Yes and No; the decision text is not
legible.
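The box text of Fig. 33 suggests a block-wise handshake: the CP fills one half of the register file with 8 elements of each input array while the FP works on the other half, and the halves are exchanged when both sides have requested a swap. The sketch below simulates that flow sequentially. It is an illustration under stated assumptions, not the microcode itself: the function name `run_blocks`, the dictionary layout of a bank, and the default element-wise multiply are hypothetical.

```python
BLOCK = 8  # elements transferred per register file swap, as in Fig. 33


def run_blocks(a, b, op=lambda x, y: x * y):
    """Simulate CP transfers and FP calculations over a double-buffered file."""
    if len(a) != len(b) or len(a) % BLOCK:
        raise ValueError("arrays must have equal length, a multiple of 8")
    banks = [{"a": [], "b": [], "r": []}, {"a": [], "b": [], "r": []}]
    cp_bank, fp_bank = 0, 1           # assumed initial allocation
    c = []
    n_blocks = len(a) // BLOCK
    for step in range(n_blocks + 2):  # two extra steps drain the pipeline
        # CP side: collect finished results, then load the next operands.
        bank = banks[cp_bank]
        if bank["r"]:
            c.extend(bank["r"])
            bank["r"] = []
        if step < n_blocks:
            lo = step * BLOCK
            bank["a"], bank["b"] = a[lo:lo + BLOCK], b[lo:lo + BLOCK]
        # FP side: do 8 calculations and leave the result in its bank.
        work = banks[fp_bank]
        if work["a"]:
            work["r"] = [op(x, y) for x, y in zip(work["a"], work["b"])]
            work["a"], work["b"] = [], []
        # Both sides are done with this step: swap the banks.
        cp_bank, fp_bank = fp_bank, cp_bank
    return c
```

The two extra loop iterations correspond to filling and emptying the pipeline: the first block of results only reaches array C two swaps after its operands were loaded.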
`
`
`
Sheets 42 to 49 of 59 (drawing text not legible)
`
`
`
Sheet 50 of 59
Fig. 40B (legible labels: Interrupt service routine; Hold status flag
copy; Interrupt routine; INTERRUPT; Generate test condition and store
in sequencer flag; Conditional Jump; True path; Return from interrupt
and restore sequencer flag)
`
`
`
Sheet 51 of 59
Fig. 41 (legible labels: Host Computer 4100, Picture Proc 4140,
Numeric Accel 4150 (two instances), Mass Memory 4160, Mass Storage
4170, Interface 4180)
`
`
`
Sheet 52 of 59 (figure number not legible; legible entries, in the
order in which they survive):
Request register file swap
Read A[0] and B[0]
Read A[1] and B[1]
Calculate R[0] = A[0]*B[0]
Calculate R[1] = A[1]*B[1]
Write R[0]
Read A[3] and B[3]
Calculate R[2] = A[2]*B[2]
Set FP done and request swap
Read A[0]' and B[0]'
Calculate R[3] = A[3]*B[3]
Write R[2]
Read A[1]' and B[1]'
Calculate R[0]' = A[0]'*B[0]'
Write R[3]
Set FP done and request swap
Read A[0]' and B[0]'
Write R[...]
`
`
`
Sheet 53 of 59
Fig. 43 (block diagram; legible labels: DTP, Data Cache Memory 140,
GIP I/F 170, Host I/F 160, DP I/F 150, DCM Ext'n 4310)
`
`
`
Sheets 54 to 59 of 59 (drawing text not legible)
`
`This is a continuation of application Ser. No. 326,781,
`filed Mar. 21, 1989, now abandoned.
`PARTIAL WAIVER OF COPYRIGHT
`All of the material in this patent application is subject
`to copyright protection under the copyright laws of the
`United Kingdom,
`the United States, and of other
`countries. As of the first effective filing date of the
`present application, this material is protected as unpub-
`lished material.
`However, permission to copy this material is hereby
`granted to the extent that the copyright owner has no
`objection to the facsimile reproduction by anyone of the
`patent document or patent disclosure, as it appears in
`official patent file or records of the United Kingdom or
`any other country, but otherwise reserves all copyright
`rights whatsoever.
`BACKGROUND OF THE INVENTION
`
`The present invention relates to computer systems
`and subsystems, and to computer-based methods for
`data processing.
HIGH-SPEED MULTIPROCESSOR
`ARCHITECTURES
`
`It has long been realized that the use of multiple pro-
`cessors operating in parallel might in principle be a very
`convenient way to achieve very high net throughput.
Many such architectures have been proposed. How-
ever, the actual realization of such architectures is very
`difficult. In particular, it is difficult to design an archi-
`tecture of this kind which will be versatile enough to
`satisfy a range of users and adapt to advances in tech-
`nology.
`Fully asynchronous multiprocessor architectures
`have been proposed, but it is generally recognized in the
`art that the problems of programming support in a mul-
`tiprocessor architecture have not nearly been solved.
`A very recent overview of some of the issues in-
`volved in multiprocessor systems may be found in Du-
`bois et al., “Synchronization, Coherence, and Event
`Ordering in Multiprocessors,” Computer magazine,
`February 1988, page 9, which is hereby incorporated by
`reference. A recently proposed multiprocessor archi-
`tecture for digital signal processing is described in Lang
et al., “An Optimum Parallel Architecture for High-
Speed Real-Time Digital Signal Processing,” Computer
`magazine, February 1988, page 47, which is hereby
`incorporated by reference.
`INTER-PROCESSOR SYNCHRONIZATION
`
Synchronization between processors is a continuing
critical issue in a very wide variety of multiprocessor
systems. Often such inter-processor interfaces make use
of “processor-waiting” or “processor-ready” status
signals which can be set or cleared by either processor.
(Such signals are commonly known as “semaphores.”)
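A minimal software analogue of such status signals is sketched below, with two threads standing in for the two processors. It is only an illustration: the function name `handshake`, the single-slot `mailbox`, and the use of Python's `threading.Event` as the flag primitive are assumptions, and real inter-processor semaphores are hardware signals rather than library objects.

```python
import threading


def handshake(items):
    """Pass items one at a time from a producer to a consumer using two flags."""
    items = list(items)
    ready = threading.Event()   # "processor-ready": the mailbox holds new data
    taken = threading.Event()   # set by the consumer once the mailbox is empty
    taken.set()                 # the mailbox starts out empty
    mailbox = [None]
    received = []

    def producer():
        for item in items:
            taken.wait()        # wait until the previous item was consumed
            taken.clear()
            mailbox[0] = item
            ready.set()         # signal that data is available

    def consumer():
        for _ in items:
            ready.wait()        # wait for the producer's signal
            ready.clear()
            received.append(mailbox[0])
            taken.set()         # signal that the mailbox may be reused

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Each flag is set by one side and cleared by the other, so neither side overwrites or re-reads the shared slot before its partner is finished with it.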
`INTER-PROCESSOR DATA ROUTING
`
`Two general concepts of allocating work among
`processors are pipelining and parallelism. “Pipelining”
`is generally used to refer to data routings where a single
`data set is successively operated on by more than one
processor. Parallelism refers to data routings where
different operations are concurrently performed by
separate processors. Of course, some algorithms can
`profit by pipelining or parallelism to a much greater
`degree than others.
`The speed of a pipeline is limited by its slowest stage.
`Moreover, the average efficiency of a pipelined system
`will be diluted by two overhead requirements: the pipe-
`line must be filled at the start of the operation, and must
be emptied at the end of the operation. The impact of
these overheads depends on the ratio of the number of
elements which must be passed through the pipeline in
one run to the number of stages in the pipeline (referred
to as the length of the pipeline). Thus, these overheads
may be unimportant when the pipeline is short and the
number of elements per run is fairly large. However,
for a longer pipeline (or for shorter runs), these
overheads can be an important factor in throughput.
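The dependence on this ratio can be made concrete with a simple model. The sketch below assumes an idealized pipeline in which every stage takes exactly one cycle per element and no stalls occur; under that assumption a run of n elements through s stages takes s + n - 1 cycles. The function names are hypothetical and the model is not taken from the patent.

```python
def pipeline_cycles(num_elements, num_stages):
    """Cycles to push a run through an ideal pipeline, including fill and drain."""
    if num_elements < 1 or num_stages < 1:
        raise ValueError("need at least one element and one stage")
    return num_stages + num_elements - 1


def pipeline_efficiency(num_elements, num_stages):
    """Fraction of cycles in which a result emerges from the last stage."""
    return num_elements / pipeline_cycles(num_elements, num_stages)
```

Under this model, a run of 1000 elements through a 4-stage pipeline loses only 3 cycles in 1003 to filling and emptying, whereas a run of 8 elements through an 8-stage pipeline delivers results in only 8 of its 15 cycles.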
`
INTER-PROCESSOR DATA EXCHANGE

The interface between two processors in a multiprocessor system
often requires that data be passed back and forth rapidly. Double
buffering is a commonly used technique to permit data transfer
without hangups, loss of data synchronization, or data access
collisions. Normally the memory space to be shared is divided into
two physical memories, and the accesses are arbitrated in hardware so
that, on any one cycle, each processor can access only half the
memory space (i.e. one of the physical memories).
FIG. 18 shows one example of a prior arrangement for double
buffering. The port select logic 1810 provides select signals to data
buffers 1860, so that the two data busses 1850A and 1850B (from the
sides of the double buffer) are connected to either the first or
second memory 1820. The port select logic 1810 also provides select
signals to address multiplexers 1830, so that the two address busses
1840A and 1840B are connected to access either the first or second
memory 1820.
FIG. 19 shows another example of a prior arrangement for
software-controlled double buffering. The port select logic 1910
provides select signals directly to the most significant address bit
A6 of a dual port memory 1920. Thus, each port sees only half of the
physical address space, but the double buffering can be quite
transparent.

CACHE MEMORY ARCHITECTURES

Cache memory is a conventional way to increase the net throughput of
computing systems. If a large fraction of memory accesses are
expected to call on memory locations already in cache, then every
read from cache can save an amount of time equal to the difference
between the cache access time and the main memory access time.
Therefore, cache memory systems normally attempt to maximize the
bandwidth to the cache.

MICROCODED ARCHITECTURES

An extremely important tool for developing high-speed and/or
flexible computer architectures is microcoding. See J. Mick & J.
Brick, Bit-Slice Microprocessor Design (1980), which is hereby
incorporated by reference. Microcoded architectures are not only
extremely flexible, but also have the potential to provide extremely
high speed.
In microcoded architectures the individual instructions are fairly
long (e.g. 100 bits or so). Some fairly low-level logic decodes the
instructions, so that appropriate fields are sent to low-level
devices (such as register files, adders, etc.).
The total number of bits in the instruction field will typically be
very much larger than the log2 of the total number of instructions.
This permits the decode operation to be made very much simpler.
Microcoded architectures commonly use a sequencer to perform address
calculations and perform a first level of decode. (Alternatively, a
lower level of logic can be used to perform the program sequencing
function.) The sequencer accesses microinstructions from a control
store (memory), and various portions of the microinstructions are
provided to additional decode logic, and/or applied directly to
devices. Since a single instruction can contain many command fields
(all of which will be executed simultaneously), it is possible to
write surprisingly short microcode programs.
Since the individual instructions are quite low-level, and fairly
long, the total program storage required can be quite significant.
The data transfer requirements for loading a microcode routine can
be significant.

SUMMARY OF THE INVENTION

The present application provides a large number of innovative
teachings, which will be described in the general context of a system
like that shown in FIG. 1.
Among the innovative teachings set forth herein is a multiprocessor
numeric processing subsystem wherein an extremely wide local bus
connects the arithmetic calculation subunit to a large data cache
memory. This cache is multiported, so that newly retrieved data can
be written into the cache at essentially the same time that data
transfer is occurring between the numeric processing subunit and the
cache.
To get a very high memory bandwidth, there are only three basic
strategies:
1. Use very fast memory devices: The problem here is one of economics
and size. Very fast memory devices are very expensive, sometimes as
much as ten times the cost of the slower counterparts, and the number
of storage bits per device is more limited. The major advantage of
this technique is that the bandwidth improvement is independent of
the data layout in memory (assuming that the address generator is
fast enough).
2. Use interleaved memories: Interleaved memories have traditionally
been used with dynamic RAMs (DRAMs), where the cycle times have been
longer than the access times. In this context, a significant
advantage can be gained by interleaving two or more banks and
offsetting the timing between banks. The problem with this technique
occurs when successive accesses keep hitting the same bank, or
accesses through another port (in a multiport memory) disturb the
sequential accessing of banks. This technique can be used with static
memories (SRAMs), but the equal access and cycle times make it less
attractive than with DRAMs.
3. Use a wide memory structure: Normally the memory width would be
the same as the word width. For example, a system using 32-bit words
would typically use a 32-bit wide memory architecture. However,
several of the innovative teachings set forth herein show how a
system with a much wider local bus to cache memory can be very
advantageous.
A wide memory structure provides high bandwidth by accessing many
words in parallel. Such a structure has much simpler timing
requirements than an interleaved memory architecture would. (However,
a large percentage of non-sequential accesses will ultimately reduce
the bandwidth to that of a normal single-width architecture.)
This memory architecture also has advantages in a multi-port
situation where some or all of the ports have a much lower bandwidth
than the memory itself. In these cases there will be some
intermediate storage (normally registers) to capture the data for
later accessing over several cycles by the recipient. While such
time-multiplexed accesses are in progress, there is no demand on the
memory system for bandwidth.
In the preferred embodiment there are also some significant novelties
in the interface logic which controls the data interface to the cache
from the numeric processor. These features will be discussed in
greater detail below.
A feature which helps to maximize the throughput of the transfers in
the transitional clock domain is a double-word interface on only one
side of the fast register file. That is, the register file appears,
on the cache memory side, as if it were 64 bits wide. However, on the
FPU side it only appears to be 32 bits wide. This results in some
odd/even structure in the word addresses, but possible problems due
to this odd/even structure are avoided by several innovative
features. Since these problems can be avoided, the double-word
interface provides substantial advantages in the bandwidth of the
register file interface.
Some significant advantages are also derived from the preferred
scheme for arbitrating access of the control processor and
data-transfer processor to the cache memory. In the presently
preferred embodiment, the cache is physically dual-ported, but it is
used as if it were triported.
The data cache memory is triported between the control processor
module, the data-transfer processor module, and the numeric processor
module(s), so some form of arbitration is necessary to control
access. The control processor generates addresses and controls the
routing of data for itself and the floating-point processor(s) under
program control, so the control processor and floating-point
processor accesses are mutually exclusive. The data-transfer
processor, however, is totally autonomous and can compete for access
at any time.
In the presently preferred embodiment, the arbitration is such that
the control processor/floating-point processor has access whenever it
wishes, and the data-transfer processor makes use of any unused
access cycles. To make use of the unused cycles, the data-transfer
processor includes extra hardware which will allow it to use a single
free cycle amongst many busy ones.
The control processor and data-transfer processor are preferably
autonomous but synchronized. This is accomplished by letting them
share a common microcode clock. This synchrony simplifies the
arbitration. The control processor and data-transfer process granted
signal is available before the cycle in which the data-transfer
process. This signal therefore has enough time to propagate into the
sequencer, thus allowing the data-transfer process is not granted,
then the data-transfer process cycles so the data-transfer processor
will not have long to wait. However, if the data-transfer processor's
program requires an end to waiting, the data-transfer processor can
interrupt the control processor. On receiving this interrupt the
control processing the memory, and let the data-transfer processor in
for at least one cycle.
The data-transfer process therefore accesses the memory no more often
than once every 8 cycles. Its bandwidth demands are therefore very
low.
The innovative teachings of the present application also enable a
multiprocessor numeric processing system, which has a well-defined
modular expansion interface. This system can be used with one or
several numeric processing modules. The modular interface permits
multiple numeric processing modules (of different types if desired)
to be connected in parallel.
A control processor controls data transfers into and out of each of
the numeric processing modules. Control of these data transfers is
accomplished by an extension of the control processor's microcode.
Extensions of the control processor's writable control storage are
located on each of the numeric processing modules. Each of the
extensions includes its own decode logic, and stores its own
executable microinstructions. Since all of the control processor
extensions are clocked by the control processor's microcode clock,
coordination among m

Preferably double buffering is used in a register file at the
interface between a numeric processor and a large data cache memory
in a multiprocessor system. The partitioning of the register file
avoids data collisions in the cache memory.
In this sample embodiment, a 5-ported register file, configured as
two physically separate banks of high-speed memory, is used. However,
a wide variety of other implementations could be used instead.
This innovation provides much greater flexibility than conventional
systems which perform double buffering in hardware, at no loss in
speed.
The “preview” mode permits this double-buffering implementation to be
used as a versatile interface architecture in many pipelined
environments.

BRIEF DESCRIPTION OF THE DRAWING

The present invention will be described with reference to the
accompanying drawings, which show important sample embodiments of the
invention and which are incorporated in the specification hereof by
reference, wherein: