`Baldwin
`
`[54] SYSTEM AND METHOD USING
`DOUBLE-BUFFER PREVIEW MODE
`[75] Inventor:
`David R. Baldwin, Weybridge,
`United Kingdom
`[73] Assignee: DuPont Pixel Systems Limited,
`London, United Kingdom
`[21] Appl. No.: 925,238
`[22] Filed:
`Jul. 31, 1992
`
Related U.S. Application Data
[63] Continuation of Ser. No. 326,781, Mar. 21, 1989, abandoned.
[30] Foreign Application Priority Data
Mar. 23, 1988 [GB] United Kingdom ................. 8806850
Mar. 23, 1988 [GB] United Kingdom ................. 8806856
Mar. 23, 1988 [GB] United Kingdom ................. 8806864
Mar. 23, 1988 [GB] United Kingdom ................. 8806865
[51] Int. Cl.5 .............................................. G06F 12/06
`[52] U.S. Cl. .................................... 395/425; 395/800;
`364/DIG. 1; 364/228.1; 364/238.4; 364/242.6;
`364/242.91; 364/244; 364/244.8; 364/245.5;
`364/245.7; 364/284; 364/284.1
`[58] Field of Search ................ 395/425, 250, 325, 725
[56] References Cited
U.S. PATENT DOCUMENTS
3,623,017 11/1971 Lowell et al. ....................... 395/550
4,149,242 4/1979 Pirz ..................................... 395/325
4,172,287 10/1979 Kawabe et al. ..................... 364/736
4,396,978 8/1983 Hammer et al. ..................... 364/200
4,443,846 4/1984 Adcock ................................ 364/200
4,495,567 1/1985 Treen ................................... 364/200
4,633,434 12/1986 Scheuneman ........................ 395/550
4,722,049 1/1988 Lahti .................................... 395/375
4,870,572 9/1989 Hosono et al. ...................... 364/200
`FOREIGN PATENT DOCUMENTS
`0186150 2/1986 European Pat. Off. .
`0085435 10/1986 European Pat. Off. .
`0284751 2/1988 European Pat. Off. .
`2162406A 6/1985 United Kingdom .
`WO86/07174 4/1986 World Int. Prop. O. .
`
US005329630A
[11] Patent Number: 5,329,630
[45] Date of Patent: Jul. 12, 1994
`
`OTHER PUBLICATIONS
David C. Wyland, “Dual-Port RAMs Simplify Communication in Computer Systems,” Integrated Device Technology, Inc., 1986.
Bureaux D’Etudes Automatismes, No. 32, Mar. 1987, pp. 85–87; J. Gustafson: Un super-ordinateur vectoriel homogene, p. 85, figure; p. 85, left-hand column, line 35 to p. 87, middle column, line 9.
Conference Proceedings IEEE Southeastcon '87, Tampa, Fla., Apr. 1987, vol. 1, pp. 225–228; M. C. Ertem: A reconfigurable co-processor for microprocessor systems, FIGS. 2–4; p. 226, left-hand column, line 6 to p. 227, left-hand column, line 25.
`(List continued on next page.)
`Primary Examiner—Paul V. Kulik
`Attorney, Agent, or Firm—Robert Groover
[57] ABSTRACT
A novel double buffering subsystem, wherein a dual port memory is partitioned in software so that the top half of the memory is allocated to one processor, and the bottom half to the other. (This allocation is switched when both processors set respective flag bits indicating that they are ready to switch.) On accesses to this memory, additional bits tag the access as “physical,” “logical,” or “preview.” A physical access is interpreted as a literal address within the full memory, and the double buffering is ignored. A logical access is supplemented by an additional address bit, determined by the double buffering switch state. A preview access is used for read access only, and goes to the opposite bank of memory from that which would be accessed in a logical access. This double-buffer architecture is advantageously used, in a multiprocessor system, at the interface between a numeric processor and a cache bus. The preview access can help to avoid data flow inefficiencies at synchronization points in pipelined algorithms.
`
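The three access modes described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `DoubleBufferedMemory`, the side labels `"CP"` and `"FP"`, the `offset_bits` parameter, and the choice of which bank each side starts with are all hypothetical; only the mode codes (00 physical, 01 logical, 10 preview), the read-only nature of preview, and the both-flags-set swap rule come from the patent text.

```python
# Mode codes as labelled on the front-page figure.
PHYSICAL, LOGICAL, PREVIEW = 0b00, 0b01, 0b10


class DoubleBufferedMemory:
    """Toy model of one memory split into two banks shared by a CP side and an FP side."""

    def __init__(self, offset_bits=5):
        self.offset_bits = offset_bits          # assumed width of the in-bank address
        self.cells = [0] * (2 << offset_bits)   # two banks of 2**offset_bits words
        self.cp_bank = 0                        # bank the CP sees logically; FP gets the other
        self.swap_requested = {"CP": False, "FP": False}

    def request_swap(self, side):
        """Set one side's flag; the banks exchange only once both flags are set."""
        self.swap_requested[side] = True
        if all(self.swap_requested.values()):
            self.cp_bank ^= 1
            self.swap_requested = {"CP": False, "FP": False}

    def _resolve(self, side, addr, mode, is_write):
        offset_mask = (1 << self.offset_bits) - 1
        if mode == PHYSICAL:
            # Literal address within the full memory; double buffering is ignored.
            return addr & ((offset_mask << 1) | 1)
        own_bank = self.cp_bank if side == "CP" else self.cp_bank ^ 1
        if mode == LOGICAL:
            bank = own_bank                     # extra address bit from the switch state
        elif mode == PREVIEW:
            if is_write:
                raise ValueError("preview accesses are read-only")
            bank = own_bank ^ 1                 # opposite bank from a logical access
        else:
            raise ValueError(f"unknown access mode {mode!r}")
        return (bank << self.offset_bits) | (addr & offset_mask)

    def read(self, side, addr, mode=LOGICAL):
        return self.cells[self._resolve(side, addr, mode, is_write=False)]

    def write(self, side, addr, value, mode=LOGICAL):
        self.cells[self._resolve(side, addr, mode, is_write=True)] = value
```

In this model a value written logically by one side is visible to the other side through a preview read before the swap, and through a logical read only after both sides have requested the swap.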
`19 Claims, 59 Drawing Sheets
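[Front-page figure: address and mode logic for the double-buffered memory. Legible labels: FP Port and CP Port, each with Addr (A1:A5) and Mode inputs; SWAP from FP; SWAP from CP.]

Access Mode
00 - Physical
01 - Logical
10 - Preview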
`
`Ex.1052.001
`
`DELL
`
`
`
OTHER PUBLICATIONS (continued)

Proceedings of the Fourth Euromicro Symposium on Microprocessing and Microprogramming, Munich, Oct. 1978, pp. 358–365; F. B. Jorgensen et al.: A Bi-microprocessor implementation of a variable topology multiprocessor node, FIGS. 1–6, p. 358, right-hand column, line 13 to p. 362, right-hand column, line 21.
G. J. Myers: Digital system design with LSI bit-slice logic, 1980, pp. 230–239, John Wiley & Sons, Inc., US, p. 237, lines 1–4.
W. Lichtenstein, “The Architecture of the Culler”, Mar. 1986, IEEE Compcon Spring 86, pp. 467–470.
Proceedings of the IEEE, vol. 73, No. 5, May 1985, pp. 852–873, IEEE, New York; J. Allen: “Computer architecture for digital signal processing”.
Computer Design, vol. 16, No. 6, Jun. 1977, pp. 151–163; A. J. Weissberger: “Analysis of multiple-microprocessor system architectures”, FIGS. 7, 8, p. 161.
IEEE Electro, vol. 8, Apr. 1983, pp. 3/31–5, New York; B. J. New: “Address generation in signal/array processors”.
Proceedings ICASSP, Dallas, Apr. 6th–9th, 1987, vol. 1, pp. 531–534; D. M. Taylor, et al.: “A novel VLSI digital signal processor architecture for high-speed vector and transform operations”.
IBM Technical Disclosure Bulletin, vol. 27, No. 4A, Sep. 1984, pp. 2184–2186, New York; J. P. Beraud et al.: “Fast Fourier transform calculating circuit”.
`
`
`U.S. Patent
`
`July 12, 1994
`
`Sheet 1 of 59
`
`5,329,630
`
[Fig. 1: system block diagram. Legible labels: CP; 190; DTP 120; DTP MC I/F; Data Cache Memory 140; 256; FP 130; GIP I/F 170; Host I/F 160; DP I/F 150. Remaining labels are garbled in the scan.]
`
[Sheets 2 and 3 of 59: rotated drawings; text not recoverable from the scan.]
[Sheet 4 of 59: rotated drawing, mostly unrecoverable. Legible labels: Link Reg/buffer; extend 316.]
[Sheet 5 of 59, Fig. 3B. Legible labels: Sequencer 310; 311; 315; TD bus 122; From microbus; Constant Field.]
[Sheet 6 of 59: no drawing text recovered.]
[Sheet 7 of 59, Fig. 4A. Legible labels: Transfer clock (remainder of label garbled); Local transfer clocks; CP Extension; CD bus 112; CD Bus Transceivers 444; Local CP extension registers.]
[Sheet 8 of 59, Fig. 4B. Legible labels: Cache bus 144; reference numerals 420, 431, 432, 433, 434.]
[Sheets 9 to 12 of 59: rotated drawings; text not recoverable from the scan.]
[Sheet 13 of 59, Fig. 7. Legible labels: TD bus 122; Fifo full 770; Strobes 760; Data Pipe Out 730; Fifo 740; 780; Data Pipe In No. 1 720; Fifo 750; Data Pipe In No. 2 (reference numeral partly illegible); signal names Full, Empty, Rd, Wr, Data.]
`
[Sheet 14 of 59: text not recoverable from the scan.]
[Sheet 15 of 59, Fig. 9A. Legible labels: Data Cache Memory 140; DP 150.]
[Sheet 16 of 59: text not recoverable from the scan.]
[Sheet 17 of 59, Fig. 10. Legible labels: DP 150; Data Cache; FP.]
[Sheet 18 of 59: text not recoverable from the scan.]
[Sheet 19 of 59, Fig. 12. Legible labels: Internal Bus 1250; Comparator 1230; Bit Reverse 1240; 1270.]
[Sheets 20 and 21 of 59: rotated drawings; text not recoverable from the scan.]
[Sheet 22 of 59, Fig. 14B: table with garbled column headings. Row labels: No operation; Byte Extend; Byte Zero Fill; Word Extend; Word Zero Fill. Legible cell values: Z; Enable Bit 7; Enable Bit 15; Enable zero. Legend: Z = Hi Impedance. The cell layout is not recoverable.]
[Following sheet (sheet number missing in the scan): text not recoverable.]
[Sheet 24 of 59, Fig. 16. Legible labels: Cache bus 144; 256; DCM I/F 1620; Reg File 430 (430A, 430B, 430C, 430D, 433); 32; FMPY 440 (440A, 440B); FALU 450 (450A, 450B).]
`
[Sheets 25 to 32 of 59: rotated drawings; text not recoverable from the scan.]
[Sheet 33 of 59, Fig. 25. Legible labels: FP Write Mask Logic 2510; MUX 530; Memory 510; Enable.]
[Sheets 34 and 35 of 59: text not recoverable from the scan.]
[Sheet 36 of 59: rotated drawing. Legible labels: CP 1 EXT WCS; CP 2 EXT WCS.]
[Sheets 37 to 39 of 59: text not recoverable from the scan.]
`
[Sheet 40 of 59, Fig. 32: flowchart. Legible steps, in order:]
1. Read next pixel from register file.
2. Add pixel to base address of the histogram table.
3. Load address register with histogram address.
4. Read pixel count into ALU input register.
5. Increment pixel count by one.
6. Write new pixel count into histogram table.
7. More pixels? (decision box; branch labels not legible)
`
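The Fig. 32 flowchart steps map directly onto a short loop. The following is a minimal sketch, assuming a flat list stands in for the memory that holds the histogram table; the function name `build_histogram` and the parameters `memory` and `table_base` are illustrative names, not terms from the patent.

```python
def build_histogram(memory, pixels, table_base):
    """Count pixel values into a table stored at memory[table_base:], one pixel at a time."""
    for pixel in pixels:                  # read next pixel
        address = table_base + pixel      # add pixel to base address of the histogram table
        count = memory[address]           # read pixel count
        memory[address] = count + 1       # increment by one and write the new count back
    return memory                         # loop ends when there are no more pixels
```

Each iteration performs one read and one write at a data-dependent address, which is why the flowchart loads an address register before touching the table.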
[Sheet 41 of 59, Fig. 33: paired flowcharts headed CP MICROCODE and FP MICROCODE. Legible box texts: Start; Load FP start address reg with microcode address and start FP running; Wait loop from previous command; Transfer first 8 elements of array A to register file; Transfer first 8 elements of array B to register file; Request register file swap; Set CP done and request register file swap; Do 8 calculations and leave result in register file; Transfer 8 result elements from register file into array C; Transfer last 8 result elements from register file into array C; End. Decision boxes and Yes/No branch routing are garbled.]
[Sheets 42 to 49 of 59: rotated drawings; text not recoverable from the scan.]
`
[Sheet 50 of 59, Fig. 40B: flowchart. Legible box texts: Interrupt service routine; Hold status flag copy; Interrupt routine; INTERRUPT; Generate test condition and store in sequencer flag; Conditional Jump; True path; Return from interrupt and restore sequencer flag.]
[Sheet 51 of 59, Fig. 41. Legible labels: Host Computer 4100; Picture Proc 4140; Numeric Accel 4150 (two instances); Mass Memory 4160; Mass Storage 4170; Interface 4180.]
[Sheet 52 of 59, Fig. 42: chart of overlapped operations. Legible entries, in scan order: Request register file swap; Read A[0] and B[0]; Read A[1] and B[1]; Calculate R[0] = A[0]*B[0]; Calculate R[1] = A[1]*B[1]; Write R[0]; Read A[3] and B[3]; Calculate R[2] = A[2]*B[2]; Write R[1]; Set FP done and req swap; Read A[0]' and B[0]'; Write R[3]; Set FP done and req swap; Calculate R[0]' = A[0]'*B[0]'.]
[Sheet 53 of 59, Fig. 43. Legible labels: DTP; Data Cache Memory 140; GIP I/F 170; Host I/F 160; DCM Ext'n 4310; DP I/F 150.]
[Sheets 54 to 59 of 59: rotated drawings; text not recoverable from the scan.]
`
SYSTEM AND METHOD USING DOUBLE-BUFFER PREVIEW MODE

This is a continuation of application Ser. No. 326,781, filed Mar. 21, 1989, now abandoned.

PARTIAL WAIVER OF COPYRIGHT

All of the material in this patent application is subject to copyright protection under the copyright laws of the United Kingdom, the United States, and of other countries. As of the first effective filing date of the present application, this material is protected as unpublished material.

However, permission to copy this material is hereby granted to the extent that the copyright owner has no objection to the facsimile reproduction by anyone of the patent document or patent disclosure, as it appears in official patent file or records of the United Kingdom or any other country, but otherwise reserves all copyright rights whatsoever.

BACKGROUND OF THE INVENTION

The present invention relates to computer systems and subsystems, and to computer-based methods for data processing.

HIGH-SPEED MULTIPROCESSOR ARCHITECTURES

It has long been realized that the use of multiple processors operating in parallel might in principle be a very convenient way to achieve very high net throughput. Many such architectures have been proposed. However, the actual realization of such architectures is very difficult. In particular, it is difficult to design an architecture of this kind which will be versatile enough to satisfy a range of users and adapt to advances in technology.

Fully asynchronous multiprocessor architectures have been proposed, but it is generally recognized in the art that the problems of programming support in a multiprocessor architecture have not nearly been solved.

A very recent overview of some of the issues involved in multiprocessor systems may be found in Dubois et al., “Synchronization, Coherence, and Event Ordering in Multiprocessors,” Computer magazine, February 1988, page 9, which is hereby incorporated by reference. A recently proposed multiprocessor architecture for digital signal processing is described in Lang et al., “An Optimum Parallel Architecture for High-Speed Real-Time Digital Signal Processing,” Computer magazine, February 1988, page 47, which is hereby incorporated by reference.

INTER-PROCESSOR SYNCHRONIZATION

Synchronization between processors is a continuing critical issue in a very wide variety of multiprocessor systems. Often such inter-processor interfaces make use of “processor-waiting” or “processor-ready” status signals which can be set or cleared by either processor. (Such signals are commonly known as “semaphores.”)

INTER-PROCESSOR DATA ROUTING

Two general concepts of allocating work among processors are pipelining and parallelism. “Pipelining” is generally used to refer to data routings where a single data set is successively operated on by more than one processor. Parallelism refers to data routings where different operations are concurrently performed by separate processors. Of course, some algorithms can profit by pipelining or parallelism to a much greater degree than others.

The speed of a pipeline is limited by its slowest stage. Moreover, the average efficiency of a pipelined system will be diluted by two overhead requirements: the pipeline must be filled at the start of the operation, and must be emptied at the end of the operation. The impact of these overheads depends on the ratio of the number of elements which must be passed through the pipeline in one run to the number of stages in the pipeline (referred to as the length of the pipeline). Thus, these overheads may be unimportant when the length of the pipeline is short, and the number of elements per run is fairly long. However, for a longer pipeline (or for shorter runs), these overheads can be an important factor in throughput.
`
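The dependence of fill and empty overhead on the ratio of run length to pipeline length can be made concrete with a small calculation. This is a sketch under an idealized assumption that is not stated in the patent: every stage takes one cycle, so a run of N elements through L stages occupies N + L - 1 cycles. The function name `pipeline_efficiency` is illustrative.

```python
def pipeline_efficiency(num_elements: int, num_stages: int) -> float:
    """Fraction of cycles that deliver a result for one run, under the one-cycle-per-stage model."""
    if num_elements < 1 or num_stages < 1:
        raise ValueError("both arguments must be at least 1")
    # Filling and emptying the pipeline together cost (num_stages - 1) extra cycles.
    total_cycles = num_elements + num_stages - 1
    return num_elements / total_cycles
```

Under this model a run of 1000 elements through 4 stages is almost fully efficient, while a run of 4 elements through 8 stages spends most of its cycles filling and draining.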
INTER-PROCESSOR DATA EXCHANGE

The interface between two processors in a multiprocessor system often requires that data be passed back and forth rapidly. Double buffering is a commonly used technique to permit data transfer without hangups, loss of data synchronization, or data access collisions. Normally the memory space to be shared is divided into two physical memories, and the accesses are arbitrated in hardware so that, on any one cycle, each processor can access only half the memory space (i.e. one of the physical memories).

FIG. 18 shows one example of a prior arrangement for double buffering. The port select logic 1810 provides select signals to data buffers 1860, so that the two data busses 1850A and 1850B (from the sides of the double buffer) are connected to either the first or second memory 1820. The port select logic 1810 also provides select signals to address multiplexers 1830, so that the two address busses 1840A and 1840B are connected to access either the first or second memory 1820.

FIG. 19 shows another example of a prior arrangement for software-controlled double buffering. The port select logic 1910 provides select signals directly to the most significant address bit A6 of a dual port memory 1920. Thus, each port sees only half of the physical address space, but the double buffering can be quite transparent.
`
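The FIG. 19 style of arrangement, where a select bit supplies the top address bit so that each port sees one half of a single memory, can be sketched in software as follows. The names `PingPongMemory` and `sum_blocks`, the port labels `"A"` and `"B"`, and the serialized producer/consumer loop are assumptions made for illustration; in hardware the two ports would operate concurrently.

```python
class PingPongMemory:
    """One physical array; a select bit drives the top address bit seen by each port."""

    def __init__(self, half_size=64):
        self.half_size = half_size              # 64 words per half, so the top bit is A6
        self.cells = [0] * (2 * half_size)
        self.select = 0

    def _index(self, port, addr):
        if not 0 <= addr < self.half_size:
            raise IndexError("address outside the half visible to this port")
        # The two ports always receive complementary top address bits.
        top_bit = self.select if port == "A" else self.select ^ 1
        return top_bit * self.half_size + addr

    def write(self, port, addr, value):
        self.cells[self._index(port, addr)] = value

    def read(self, port, addr):
        return self.cells[self._index(port, addr)]

    def swap(self):
        self.select ^= 1


def sum_blocks(blocks, half_size=64):
    """Producer fills its half through port A; after each swap the consumer reads it through port B."""
    mem = PingPongMemory(half_size)
    totals = []
    for block in blocks:
        for addr, value in enumerate(block):
            mem.write("A", addr, value)
        mem.swap()
        totals.append(sum(mem.read("B", addr) for addr in range(len(block))))
    return totals
```

Neither side ever changes the addresses it uses, which is the sense in which the double buffering is transparent to each port.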
CACHE MEMORY ARCHITECTURES

Cache memory is a conventional way to increase the net throughput of computing systems. If a large fraction of memory accesses are expected to call on memory locations already in cache, then every read from cache can save an amount of time equal to the difference between the cache access time and the main memory access time. Therefore, cache memory systems normally attempt to maximize the bandwidth to the cache.
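The saving described above can be put into numbers with the usual weighted-average model. This is a simplified sketch, assuming a miss costs exactly one main memory access and a hit costs exactly one cache access; the function names and the example timings are illustrative and do not come from the patent.

```python
def average_access_time(hit_rate: float, t_cache: float, t_main: float) -> float:
    """Mean time per access when a fraction hit_rate of accesses is served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must lie between 0 and 1")
    return hit_rate * t_cache + (1.0 - hit_rate) * t_main


def time_saved_per_access(hit_rate: float, t_cache: float, t_main: float) -> float:
    """Average saving relative to always going to main memory."""
    return hit_rate * (t_main - t_cache)
```

With a 90% hit rate, a 10 ns cache and a 100 ns main memory, the model gives an average of about 19 ns per access, a saving of about 81 ns over uncached operation.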
MICROCODED ARCHITECTURES

An extremely important tool for developing high-speed and/or flexible computer architectures is microcoding. See J. Mick & J. Brick, Bit-Slice Microprocessor Design (1980), which is hereby incorporated by reference. Microcoded architectures are not only extremely flexible, but also have the potential to provide extremely high speed.

In microcoded architectures the individual instructions are fairly long (e.g. 100 bits or so). Some fairly low-level logic decodes the instructions, so that appropriate fields are sent to low-level devices (such as register files, adders, etc.).

The total number of bits in the instruction field will typically be very much larger than the log2 of the total number of instructions. This permits the decode operation to be made very much simpler. Microcoded architectures commonly use a sequencer to perform address calculations and perform a first level of decode. (Alternatively, a lower level of logic can be used to perform the program sequencing function.) The sequencer accesses microinstructions from a control store (memory), and various portions of the microinstructions are provided to additional decode logic, and/or applied directly to devices. Since a single instruction can contain many command fields (all of which will be executed simultaneously), it is possible to write surprisingly short microcode programs.

Since the individual instructions are quite low-level, and fairly long, the total program storage required can be quite significant. The data transfer requirements for loading a microcode routine can be significant.

SUMMARY OF THE INVENTION

The present application provides a large number of innovative teachings, which will be described in the general context of a system like that shown in FIG. 1. Among the innovative teachings set forth herein is a multiprocessor numeric processing subsystem wherein an extremely wide local bus connects the arithmetic calculation subunit to a large data cache memory. This cache is multiported, so that newly retrieved data can be written into the cache at essentially the same time that data transfer is occurring between the numeric processing subunit and the cache.

To get a very high memory bandwidth, there are only three basic strategies:

1. Use very fast memory devices: The problem here is one of economics and size. Very fast memory devices are very expensive, sometimes as much as ten times the cost of the slower counterparts, and the number of storage bits per device is more limited. The major advantage of this technique is that the bandwidth improvement is independent of the data layout in memory (assuming that the address generator is fast enough).

2. Use interleaved memories: Interleaved memories have traditionally been used with dynamic RAMs (DRAMs), where the cycle times have been longer than the access times. In this context, a significant advantage can be gained by interleaving two or more banks and offsetting the timing between banks. The problem with this technique occurs when successive accesses keep hitting the same bank, or accesses through another port (in a multi-port memory) disturb the sequential accessing of banks. This technique can be used with static memories (SRAMs), but the equal access and cycle times make it less attractive than with DRAMs.

3. Use a wide memory structure: Normally the memory width would be the same as the word width. For example, a system using 32-bit words would typically use a 32-bit wide memory architecture. However, several of the innovative teachings set forth herein show how a system with a much wider local bus to cache memory can be very advantageous.

A wide memory structure provides high bandwidth by accessing many words in parallel. Such a structure has much simpler timing requirements than an interleaved memory architecture would. (However, a large percentage of non-sequential accesses will ultimately reduce the bandwidth to that of a normal single-width architecture.)

This memory architecture also has advantages in a multi-port situation where some or all of the ports have a much lower bandwidth than the memory itself. In these cases there will be some intermediate storage (normally registers) to capture the data for later accessing over several cycles by the recipient. While such time-multiplexed accesses are in progress, there is no demand on the memory system for bandwidth.

In the preferred embodiment there are also some significant novelties in the interface logic which controls the data interface to the cache from the numeric processor. These features will be discussed in greater detail below.

A feature which helps to maximize the throughput of the transfers in the transitional clock domain is a double-word interface on only one side of the fast register file. That is, the register file appears, on the cache memory side, as if it were 64 bits wide. However, on the FPU side it only appears to be 32 bits wide. This results in some odd/even structure in the word addresses, but possible problems due to this odd/even structure are avoided by several innovative features. Since these problems can be avoided, the double-word interface provides substantial advantages in the bandwidth of the register file interface.

Some significant advantages are also derived from the preferred scheme for arbitrating access of the control processor and data-transfer processor to the cache memory. In the presently preferred embodiment, the cache is physically dual-ported, but it is used as if it were triported.

The data cache memory is triported between the control processor module, the data-transfer processor module, and the numeric processor module(s), so some form of arbitration is necessary to control access. The control processor generates addresses and controls the routing of data for itself and the floating-point processor(s) under program control, so the control processor and floating-point processor accesses are mutually exclusive. The data-transfer processor, however, is totally autonomous and can compete for access at any time.

In the presently preferred embodiment, the arbitration is such that the control processor/floating-point processor has access whenever it wishes, and the data-transfer processor makes use of any unused access cycles. To make use of the unused cycles, the data-transfer processor includes extra hardware which will allow it to use a single free cycle amongst many busy ones.

The control processor and data-transfer processor are preferably autonomous but synchronized. This is accomplished by letting them share a common microcode clock. This synchrony simplifies the arbitration. The control processor and data-transfer process […] granted signal is available before the cycle in which the data-transfer process […]. This signal therefore has enough time to propagate into the sequencer, thus allowing the data-transfer process […] is not granted, then the data-transfer process […] cycles so the data-transfer processor will not have long to wait. However, if the data-transfer processor's program requires an end to waiting, the data-transfer processor can interrupt the control processor. On receiving this interrupt the control process […]ing the memory, and let the data-transfer processor in for at least one cycle.

The data-transfer process therefore accesses the memory no more often than once every 8 cycles. Its bandwidth demands are therefore very low.

The innovative teachings of the present application also enable a multiprocessor numeric processing system, which has a well-defined modular expansion inter […]

Preferably double buffering is used in a register file at the interface between a numeric processor and a large data cache memory in a multiprocessor system. The partitioning of the register file avoids data collisions in the cache memory.

In this sample embodiment, a 5-ported register file, configured as two physically separate banks of high speed memory, is used. However, a wide variety of other implementations could b