`Baldwin
`
`[54] SYSTEM AND METHOD USING
`DOUBLE-BUFFER PREVIEW MODE
[75] Inventor: David R. Baldwin, Weybridge, United Kingdom
[73] Assignee: DuPont Pixel Systems Limited, London, United Kingdom
[21] Appl. No.: 925,238
[22] Filed: Jul. 31, 1992
`
`[63)
`
`Related U.S. Application Data
`Continuation of Ser. No. 326,781, Mar. 21, 1989, aban
`doned.
`Foreign Application Priority Data
`[30]
`Mar. 23, 1988 (GB) United Kingdom ................. 8806850
`Mar. 23, 1988 [GB] United Kingdom ................. 8806856
`Mar. 23, 1988 [GB} United Kingdom ....
`. 8806864
`Mar. 23, 1988 [GB] United Kingdom ................. 8806865
[51] Int. Cl.5 .............................................. G06F 12/06
`[52] U.S. Cl. .................................... 395/425; 395/800;
`364/DIG. 1; 364/228.1; 364/238.4; 364/242.6;
`364/242.91; 364/244; 364/244.8; 364/245.5;
`364/245.7; 364/284; 364/284.1
`[58] Field of Search ................ 395/425, 250, 325, 725
`[56]
`References Cited
`U.S. PATENT DOCUMENTS
`3,623,017 11/1971 Lowell et al. ....................... 395/550
`4,149,242 4/1979 Pirz ..................................... 395/325
`4,172,287 10/1979 Kawabe et al. ..................... 364/736
`4,396,978 8/1983 Hammer et al. ...
`... 364/200
`4,443,846 4/1984 Adcock ..............
`... 364/200
`4,495,567 1/1985 Treen ..........
`. 364/200
`... 395/550
`4,633,434 12/1986 Scheuneman ..
`... 395/375
`4,722,049 1/1988 Lahti ..............
`4,870,572 9/1989 Hosono et al. ...................... 364/200
`FOREIGN PATENT DOCUMENTS
`0186150 2/1986 European Pat. Off. .
`0085435 10/1986 European Pat. Off. .
`0284751 2/1988 European Pat. Off. .
`2162406A 6/1985 United Kingdom .
`WO86/07174 4/1986 World Int. Prop. O. .
`
US005329630A
[11] Patent Number: 5,329,630
[45] Date of Patent: Jul. 12, 1994
`
`Primary Examiner—Paul V. Kulik
`Attorney, Agent, or Firm—Robert Groover
`[57]
`ABSTRACT
`A novel double buffering subsystem, wherein a dual
`port memory is partitioned in software so that the top
`half of the memory is allocated to one processor, and
`the bottom half to the other. (This allocation is switched
`when both processors set respective flag bits indicating
`that they are ready to switch.) On accesses to this mem
`ory, additional bits tag the access as “physical,” “logi
`cal,” or “preview.” A physical access is interpreted as a
`literal address within the full memory, and the double
`buffering is ignored. A logical access is supplemented
`by an additional address bit, determined by the double
`buffering switch state. A preview access is used for read
`access only, and goes to the opposite bank of memory
`from that which would be accessed in a logical access.
`This double-buffer architecture is advantageously used,
`in a multiprocessor system, at the interface between a
`numeric processor and a cache bus. The preview access
`can help to avoid data flow inefficiencies at synchroni
`zation points in pipelined algorithms.
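
The three access modes can be illustrated with a short Python sketch. It is only a software model under stated assumptions, not the hardware of the preferred embodiment: the class name `DoubleBuffer`, the port names "FP" and "CP", the 5-bit bank-relative address, and the choice of which port starts on the top half are all hypothetical. The mode codes follow the Access Mode legend printed with the front-page drawing (00 physical, 01 logical, 10 preview).

```python
# Mode codes as listed in the drawing legend: 00 physical, 01 logical, 10 preview.
PHYSICAL, LOGICAL, PREVIEW = 0b00, 0b01, 0b10


class DoubleBuffer:
    """Toy model of a software-partitioned dual-port memory (illustrative only)."""

    def __init__(self, bank_bits=5):
        # bank_bits is an assumed width for the bank-relative address.
        self.bank_bits = bank_bits
        self.bank_size = 1 << bank_bits
        self.memory = [0] * (2 * self.bank_size)
        self.swapped = False                      # double-buffering switch state
        self.ready = {"FP": False, "CP": False}   # per-processor "ready to swap" flags

    def translate(self, port, mode, address):
        """Return the literal memory address for an access from the given port."""
        if port not in self.ready:
            raise ValueError("unknown port")
        if mode == PHYSICAL:
            # Literal address within the full memory; double buffering is ignored.
            if not 0 <= address < 2 * self.bank_size:
                raise ValueError("physical address out of range")
            return address
        if mode not in (LOGICAL, PREVIEW):
            raise ValueError("unknown access mode")
        if not 0 <= address < self.bank_size:
            raise ValueError("bank-relative address out of range")
        # Assumption: before any swap, FP owns the top half and CP the bottom half.
        bank = (1 if port == "FP" else 0) ^ int(self.swapped)
        if mode == PREVIEW:
            bank ^= 1  # preview goes to the opposite bank from a logical access
        # The bank select supplies the extra (most significant) address bit.
        return (bank << self.bank_bits) | address

    def read(self, port, mode, address):
        return self.memory[self.translate(port, mode, address)]

    def write(self, port, mode, address, value):
        if mode == PREVIEW:
            raise PermissionError("preview access is read-only")
        self.memory[self.translate(port, mode, address)] = value

    def request_swap(self, port):
        """Set this port's flag; the halves switch only once both flags are set."""
        if port not in self.ready:
            raise ValueError("unknown port")
        self.ready[port] = True
        if all(self.ready.values()):
            self.swapped = not self.swapped
            self.ready = {name: False for name in self.ready}
```

In this model a value written by one port with a logical access can be read early by the other port with a preview access, and becomes visible to that port's logical accesses only after both ports have requested the swap.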
`
`19 Claims, 59 Drawing Sheets
[Representative drawing. Legible labels: FP Port, CP Port, Addr (A1:A5), Mode, SWAP from FP, SWAP from CP. Access Mode legend: 00 - Physical, 01 - Logical, 10 - Preview.]
`
`
`
`5,329,630
`Page 2
`
`OTHER PUBLICATIONS
`
`Proceedings of the Fourth Euromicro Symposium on
`Microprocessing and Microprogramming, Munich,
`Oct. 1978, pp. 358-365; F. B. Jorgensen et al.: A Bi-mi
`croprocessor implementation of a variable topology multi
`processor node, FIGS. 1–6, p. 358, right-hand column,
`line 13—p. 362, right-hand column, line 21.
`G. J. Myers: Digital system design with LSI bit-slice logic,
`1980, pp. 230–239, John Wiley & Sons, Inc., US p. 237,
`lines 1–4.
`W. Lichtenstein, “The Architecture of the Culler”,
`Mar. 1986, IEEE Coupon Spring 86, pp. 467–470.
`Proceedings of the IEEE, vol. 73, No. 5, May 1985, pp.
`
`852–873, IEEE, New York; J. Allen: “Computer archi
`tecture for digital signal processing”.
`Computer Design, vol. 16, No. 6, Jun. 1977, pp. 151–163;
`A. J. Weissberger: “Analysis of multiple-microproces
`sor system architectures”, FIGS. 7,8, p. 161.
`IEEE Electro, vol. 8, Apr. 1983, pp. 3/31–5, New York;
`B. J. New: “Address generation in signal/array proces
`sors”.
`Proceedings ICASSP, Dallas, Apr. 6th–9th, 1987, vol. 1,
`pp. 531–534; D. M. Taylor, et al.; “A novel VLSI digi
`tal signal processor architecture for high-speed vector
`and transform operations”.
`IBM Technical Disclosure Bulletin, vol. 27, No. 4A, Sep.
`1984, pp. 2184—2186, New York; J. P. Beraud et al.:
`“Fast fourier transform calculating circuit”.
`
`
`
U.S. Patent          July 12, 1994          Sheet 1 of 59          5,329,630

[FIG. 1: block diagram. Legible labels: DTP 120, FP 130, Data Cache Memory 140, DP I/F 150, Host I/F 160, GIP I/F 170, 256.]
`
`
[Sheets 2-6 of 59: drawing text not legible in this copy. Partially legible labels on Sheets 4 and 5: Mux 312, Mux 313, Link Reg/buffer, extend 316, Sequencer 310, TD bus 122, Constant Field, From MICROBUS.]
[FIG. 4A (Sheet 7 of 59). Legible labels: Transfer clock, Local transfer clocks, CP Extension, CD bus 112, CD Bus Transceivers 444, Local CP extension registers.]

[FIG. 4B (Sheet 8 of 59). Legible labels: Cache bus 144, 420, 431, 432, 433, 434.]
[Sheets 9-12 of 59: drawing text not legible in this copy.]
[FIG. 7 (Sheet 13 of 59). Legible labels: TD bus 122, Fifo full 770, Strobes 760, Data Pipe Out 730, Fifo 740, Data Pipe In 1 720, Fifo 750, Data Pipe In 2, 780; FIFO signals Empty, Full, Data, Rd, Wr.]
[Sheets 14-18 of 59: mostly not legible in this copy. Legible captions and labels: Fig. 9A (Sheet 15): Data Cache Memory 140, DP 150. Fig. 10 (Sheet 17): DP 150, Data Cache, FP.]
[FIG. 12 (Sheet 19 of 59). Legible labels: Internal Bus 1250, Comparator 1230, Bit Reverse 1240, 1270.]
[Sheets 20-23 of 59: mostly not legible in this copy. Fig. 14B (Sheet 22) is a table listing the operations No operation, byte extend, byte zero fill, word extend and word zero fill, with the legend "Z = Hi Impedance"; its cell contents are not recoverable.]
[FIG. 16 (Sheet 24 of 59). Legible labels: Cache bus 144 (256), DCM I/F 1620, Reg File 430 (430A, 430B, 430C, 430D; 433), 32, FMPY 440 (440A, 440B), FALU 450 (450A, 450B).]
[Sheets 25-32 of 59: drawing text not legible in this copy.]
[FIG. 25 (Sheet 33 of 59). Legible labels: MUX 530, Memory 510, Enable, FP Write Mask Logic 2510.]
[Sheets 34-39 of 59: drawing text not legible in this copy. Partially legible labels on Sheet 36: CP 1 EXT WCS, CP 2 EXT WCS.]
[FIG. 32 (Sheet 40 of 59): flowchart. Legible steps, in order:]
1. Read next pixel from register file.
2. Add pixel to base address of the histogram table.
3. Load address register with histogram address.
4. Read pixel count into ALU input register.
5. Increment pixel count by one.
6. Write new pixel count into histogram table.
7. More pixels? (decision)
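
The steps legible in FIG. 32 can be sketched in Python as follows. This is a minimal illustration, not the microcode of the preferred embodiment: the function name `build_histogram`, the flat list standing in for memory, and the parameter `table_base` are assumptions introduced here, and the loop is assumed to repeat while pixels remain.

```python
def build_histogram(pixels, memory, table_base):
    """Accumulate a histogram in `memory`, starting at address `table_base`."""
    for pixel in pixels:                 # read next pixel; loop while more pixels remain
        address = table_base + pixel     # add pixel to base address of the histogram table
        count = memory[address]          # read pixel count at the histogram address
        count += 1                       # increment pixel count by one
        memory[address] = count          # write new pixel count into histogram table
    return memory
```

Each iteration performs one read-modify-write on the table, so a pixel value repeated in the input increments the same table entry on every pass.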
[FIG. 33 (Sheet 41 of 59): paired flowcharts headed CP MICROCODE and FP MICROCODE; the branch structure is not legible in this copy. Legible box texts:]
- Start
- Load FP start address reg with microcode address and start FP running.
- Wait loop from previous command.
- Transfer first 8 elements of array A to register file.
- Transfer first 8 elements of array B to register file.
- Request register file swap.
- Set CP done and request register file swap.
- Do 8 calculations and leave result in register file.
- Transfer 8 result elements from register file into array C.
- Transfer last 8 result elements from register file into array C.
- End
[Sheets 42-49 of 59: drawing text not legible in this copy. Sheet 49 carries the caption Fig. 40A.]
[FIG. 40B (Sheet 50 of 59): flowchart. Legible labels: interrupt service routine; Hold status flag copy; interrupt routine; INTERRUPT; Generate test condition and store in sequencer flag; Conditional Jump; True path; Return from interrupt and restore sequencer flag.]
[FIG. 41 (Sheet 51 of 59). Legible labels: Host Computer 4100, Picture Proc 4140, Numeric Accel 4150 (two instances), Mass Memory 4160, Mass Storage 4170, Interface 4180.]
[FIG. 42 (Sheet 52 of 59): flowchart; only part of the text is legible. Legible steps:]
- Calculate R[1] = A[1]*B[1]; Write R[0]
- Read A[3] and B[3]; Calculate R[2] = A[2]*B[2]; Write R[1]
- Set FP done and req swap
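
The overlapped pattern legible in FIG. 42 (read element i+1, calculate element i, write element i-1 in the same step) can be sketched in Python as a three-stage software pipeline. This is an illustration under assumptions: the function name `pipelined_multiply` and the two latch variables are hypothetical, and the three-stage structure is inferred from the legible steps only.

```python
def pipelined_multiply(a, b):
    """Compute R[i] = A[i] * B[i] with read, calculate and write overlapped.

    Returns the result list and the number of pipeline steps taken.
    """
    n = len(a)
    result = [None] * n
    read_latch = None      # operands read in the previous step
    product_latch = None   # product calculated in the previous step
    steps = 0
    for step in range(n + 2):            # n elements plus 2 steps to empty the pipeline
        # Write stage: store the product calculated one step earlier.
        if product_latch is not None:
            index, value = product_latch
            result[index] = value
        # Calculate stage: multiply the operands read one step earlier.
        if read_latch is not None:
            index, x, y = read_latch
            product_latch = (index, x * y)
        else:
            product_latch = None
        # Read stage: fetch the next pair of operands, if any remain.
        read_latch = (step, a[step], b[step]) if step < n else None
        steps += 1
    return result, steps
```

For n elements the loop takes n + 2 steps, which shows the fill and empty overhead of a pipeline: the two extra steps matter little for long runs but dominate for very short ones.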
[FIG. 43 (Sheet 53 of 59). Legible labels: DTP, Data Cache Memory 140, DP I/F 150, Host I/F 160, GIP I/F 170, DCM Ext'n 4310.]
[Sheets 54-59 of 59: drawing text not legible in this copy.]
`1
`
`SYSTEM AND METHOD USING
`DOUBLE-BUFFER PREVIEW MODE
`
`5,329,630
`
`2
`separate processors. Of course, some algorithms can
`profit by pipelining or parallelism to a much greater
`degree than others.
`The speed ofa pipelineis limited by its slowest stage.
`Moreover, the averageefficiency of a pipelined system
`will be diluted by two overhead requirements: the pipe-
`line must befilled at the start of the operation, and must
`be emptied at the end of the operation. The impact of
`these overheads depends on ratio of the number of
`elements which must be passed through the pipeline in
`one runto the numberofstagesin the pipeline (referred
`to as the length of the pipeline). Thus, these overheads
`may be unimportant whenthe length of the pipeline is
`short, and the numberof elementsper runis fairly long.
`However, for a longer pipeline (or for shorter runs),
`these overheads can be an importantfactor in through-
`put.
`
`20
`
`25
`
`40
`
`45
`
`INTER-PROCESSOR DATA EXCHANGE
`
`The interface between two processors in a multipro-
`cessor system often requires that data be passed back
`and forth rapidly. Double buffering is a commonly used
`technique to permit data transfer, without hangups,loss
`of data synchronization, or data access collisions. Nor-
`mally the memory spaceto besharedis divided into two
`physical memories, and the accesses are arbitrated in
`hardware so that, on any one cycle, each processor can
`access only half the memory space(i.e. one ofthe physi-
`cal memories).
`FIG. 18 shows one example of a prior arrangement
`for double buffering. The port select logic 1810 pro-
`vides select signals to data buffers 1860, so that the two
`data busses 1850A and 1850B (from the sides of the
`double buffer) are connected to either the first or sec-
`ond memory 1820. The port select logic 1810 also pro-
`vides select signals to address multiplexers 1830, so that
`the two address busses 1840A and 1840B are connected
`to access either the first or second memory 1820.
`FIG. 19 shows another example of a prior arrange-
`ment for software-controlled double buffering. The
`port select logic 1910 providesselect signals directly to
`the most significant address bit A6 of a dual port mem-
`ory 1920. Thus, each port sees only half of the physical
`address space, but the double buffering can be quite
`transparent.
`CACHE MEMORY ARCHITECTURES
`
`This is a continuation of application Ser. No. 326,781,
`filed Mar. 21, 1989, now abandoned.
`PARTIAL WAIVER OF COPYRIGHT
`All of the material in this patent application is subject
`to copyright protection underthe copyright laws of the
`United Kingdom,
`the United States, and of other
`countries. As of the first effective filing date of the
`presentapplication, this material is protected as unpub-
`lished material.
`However, permission to copy this material is hereby
`granted to the extent that the copyright owner has no
`objection to the facsimile reproduction by anyoneofthe
`patent documentor patent disclosure, as it appears in
`official patentfile or records of the United Kingdom or
`any other country, but otherwise reserves all copyright
`rights whatsoever.
`BACKGROUNDOF THE INVENTION
`
`The present invention relates to computer systems
`and subsystems, and to computer-based methods for
`data processing.
`HIGH-SPEED MULTIPROCESSOR
`ARCHITECTURES
`
`It has long been realized that the use of multiple pro-
`cessors operating in parallel might in principle be a very
`convenient way to achieve very high net throughput.
`Many such architectures have been proposed. How-
`ever, the actual realization of such architectures is very
`difficult. In particular,it is difficult to design an archi-
`tecture of this kind which will be versatile enough to
`satisfy a range of users and adapt to advancesin tech-
`nology.
`Fully asynchronous multiprocessor architectures
`have been proposed,butit is generally recognized in the
`art that the problems of programming support in a mul-
`tiprocessor architecture have not nearly been solved.
`A very recent overview of some of the issues in-
`volved in multiprocessor systems may be found in Du-
`bois et al., “Synchronization, Coherence, and Event
`Ordering in Multiprocessors,” Computer magazine,
`February 1988, page 9, which is hereby incorporated by
`reference. A recently proposed multiprocessor archi-
`tecture for digital signal processing is described in Lang
`et al., “An Optimum Parallel Architecture for High-
`Speed Real-Time Digital Signal Processing,” Computer
`magazine, February 1988, page 47, which is hereby
`incorporated by reference.
`INTER-PROCESSOR SYNCHRONIZATION
`
`Synchronization between processors is a continuing
`critical issue in a very wide variety of multiprocessor
`system. Often such inter-processor interfaces make use
`of “processor-waiting” or “processor-ready” status
`signals which can beset or cleared by either processor.
`(Such signals are commonly knownas ‘“‘semaphores.”)
`INTER-PROCESSOR DATA ROUTING
`
`60
`
`Two general concepts of allocating work among
`processors are pipelining and parallelism. “Pipelining”
`is generally used to refer to data routings where a single
`data set is successively operated on by more than one
`processor. Parallelism refers to data routings where
`different operations are concurrently performed by
`
Cache memory is a conventional way to increase the net throughput of computing systems. If a large fraction of memory accesses are expected to call on memory locations already in cache, then every read from cache can save an amount of time equal to the difference between the cache access time and the main memory access time. Therefore, cache memory systems normally attempt to maximize the bandwidth to the cache.
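
The saving described above can be made concrete with a small Python sketch. It uses a simple textbook model that is an assumption here, not a formula from this application: every access that hits costs the cache access time, and every miss costs the main memory access time. The function names and the example timings are illustrative only.

```python
def time_saved_per_hit(cache_time, main_time):
    """Time saved by one read served from cache instead of main memory."""
    return main_time - cache_time


def effective_access_time(hit_rate, cache_time, main_time):
    """Average access time when a fraction `hit_rate` of accesses hit the cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_time + (1.0 - hit_rate) * main_time
```

With assumed timings of 10 time units for cache and 100 for main memory, each hit saves 90 units, and a 90 percent hit rate brings the average access time down from 100 units to about 19.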
`MICROCODED ARCHITECTURES
`
An extremely important tool for developing high-speed and/or flexible computer architectures is microcoding. See J. Mick & J. Brick, Bit-Slice Microprocessor Design (1980), which is hereby incorporated by reference. Microcoded architectures are not only extremely flexible, but also have the potential to provide extremely high speed.
In microcoded architectures the individual instructions are fairly long (e.g. 100 bits or so). Some fairly low-level logic decodes the instructions, so that appropriate fields are sent to low-level devices (such as register files, adders, etc.).

The total number of bits in the instruction field will typically be very much larger than the log2 of the total number of instructions. This permits the decode operation to be made very much simpler. Microcoded architectures commonly use a sequencer to perform address calculations and perform a first level of decode. (Alternatively, a lower level of logic can be used to perform the program sequencing function.) The sequencer accesses microinstructions from a control store (memory), and various portions of the microinstructions are provided to additional decode logic, and/or applied directly to devices. Since a single instruction can contain many command fields (all of which will be executed simultaneously), it is possible to write surprisingly short microcode programs.

Since the individual instructions are quite low-level, and fairly long, the total program storage required can be quite significant. The data transfer requirements for loading a microcode routine can be significant.

SUMMARY OF THE INVENTION

The present application provides a large number of innovative teachings, which will be described in the general context of a system like that shown in FIG. 1.

Among the innovative teachings set forth herein is a multiprocessor numeric processing subsystem wherein an extremely wide local bus connects the arithmetic calculation subunit to a large data cache memory. This cache is multiported, so that newly retrieved data can be written into the cache at essentially the same time that data transfer is occurring between the numeric processing subunit and the cache.

To get a very high memory bandwidth, there are only three basic strategies:
1. Use very fast memory devices: The problem here is one of economics and size. Very fast memory devices are very expensive, sometimes as much as ten times the cost of the slower counterparts, and the number of storage bits per device is more limited. The major advantage of this technique is that the bandwidth improvement is independent of the data layout in memory (assuming that the address generator is fast enough).
2. Use interleaved memories: Interleaved memories have traditionally been used with dynamic RAMs (DRAMs), where the cycle times have been longer than the access times. In this context, a significant advantage can be gained by interleaving two or more banks and offsetting the timing between banks. The problem with this technique occurs when successive accesses keep hitting the same bank, or accesses through another port (in a multi-port memory) disturb the sequential accessing of banks. This technique can be used with static memories (SRAMs), but the equal access and cycle times make it less attractive than with DRAMs.
3. Use a wide memory structure: Normally the memory width would be the same as the word width. For example, a system using 32-bit words would typically use a 32-bit wide memory architecture. However, several of the innovative teachings set forth herein show how a system with a much wider local bus to cache memory can be very advantageous.

A wide memory structure provides high bandwidth by accessing many words in parallel. Such a structure has much simpler timing requirements than an interleaved memory architecture would. (However, a large percentage of non-sequential accesses will ultimately reduce the bandwidth to that of a normal single-width architecture.)

This memory architecture also has advantages in a multi-port situation where some or all of the ports have a much lower bandwidth than the memory itself. In these cases there will be some intermediate storage (normally registers) to capture the data for later accessing over several cycles by the recipient. While such time-multiplexed accesses are in progress, there is no demand on the memory system for bandwidth.

In the preferred embodiment there are also some significant novelties in the interface logic which controls the data interface to the cache from the numeric processor. These features will be discussed in greater detail below.

A feature which helps to maximize the throughput of the transfers in the transitional clock domain is a double-word interface on only one side of the fast register file. That is, the register file appears, on the cache memory side, as if it were 64 bits wide. However, on the FPU side it only appears to be 32 bits wide. This results in some odd/even structure in the word addresses, but possible problems due to this odd/even structure are avoided by several innovative features. Since these problems can be avoided, the double-word interface provides substantial advantages in the bandwidth of the register file interface.

Some significant advantages are also derived from the preferred scheme for arbitrating access of the control processor and data-transfer processor to the cache memory. In the presently preferred embodiment, the cache is physically dual-ported, but it is used as if it were triported.

The data cache memory is triported between the control processor module, the data-transfer processor module, and the numeric processor module(s), so some form of arbitration is necessary to control access. The control processor generates addresses and controls the routing of data for itself and the floating-point processor(s) under program control so the control processor and floating-point processor access are mutually exclusive. The data-transfer processor, however, is totally autonomous and can compete for access at any time.

In the presently preferred embodiment, the arbitration is such that the control processor/floating-point processor has access whenever it wishes, and the data-transfer processor makes use of any unused access cycles. To make use of the unused cycles, the data-transfer processor includes extra hardware which will allow it to use a single free cycle amongst many busy ones.

The control processor and data-transfer processor are preferably autonomous but synchronized. This is accomplished by letting them share a common microcode clock. This synchrony simplifies the arbitration. The control processor and data-transfer process [...] granted signal is available before the cycle in which the data transfer process [...]. This signal therefore has enough time to propagate into the sequencer, thus allowing the data transfer process [...] is not granted, then the data-transfer process [...] cycles so the data-transfer processor will not have long to wait. However, if the data-transfer processor's program requires an end to waiting, the data-transfer processor can interrupt the control processor. On receiving this interrupt the control processing [...] the memory, and let the data-transfer processor in for at least

Preferably double buffering is used in a register file at the interface between a numeric processor