United States Patent [19]
Baldwin

US005329630A
[11] Patent Number: 5,329,630
[45] Date of Patent: Jul. 12, 1994

[54] SYSTEM AND METHOD USING DOUBLE-BUFFER PREVIEW MODE
[75] Inventor: David R. Baldwin, Weybridge, United Kingdom
[73] Assignee: DuPont Pixel Systems Limited, London, United Kingdom
[21] Appl. No.: 925,238
[22] Filed: Jul. 31, 1992

Related U.S. Application Data
[63] Continuation of Ser. No. 326,781, Mar. 21, 1989, abandoned.

[30] Foreign Application Priority Data
Mar. 23, 1988 [GB] United Kingdom ................. 8806850
Mar. 23, 1988 [GB] United Kingdom ................. 8806856
Mar. 23, 1988 [GB] United Kingdom ................. 8806864
Mar. 23, 1988 [GB] United Kingdom ................. 8806865

[51] Int. Cl. ................. G06F 12/06
[52] U.S. Cl. ................. 395/425; 395/800; 364/DIG. 1; 364/228.1; 364/238.4; 364/242.6; 364/242.91; 364/244; 364/244.8; 364/245.5; 364/245.7; 364/284; 364/284.1
[58] Field of Search ................. 395/425, 250, 325, 725

[56] References Cited
U.S. PATENT DOCUMENTS
3,623,017 11/1971 Lowell et al. ................. 395/550
4,149,242 4/1979 Pirz ................. 395/325
4,172,287 10/1979 Kawabe et al. ................. 364/736
4,396,978 8/1983 Hammer et al. ................. 364/200
4,443,846 4/1984 Adcock ................. 364/200
4,495,567 1/1985 Treen ................. 364/200
4,633,434 12/1986 Scheuneman ................. 395/550
4,722,049 1/1988 Lahti ................. 395/375
4,870,572 9/1989 Hosono et al. ................. 364/200
FOREIGN PATENT DOCUMENTS
0186150 2/1986 European Pat. Off.
0085435 10/1986 European Pat. Off.
0284751 2/1988 European Pat. Off.
2162406A 6/1985 United Kingdom
WO86/07174 4/1986 World Int. Prop. O.
`
OTHER PUBLICATIONS
David C. Wyland, “Dual-Port RAMs Simplify Communication in Computer Systems,” Integrated Device Technology, Inc., 1986.
Bureaux d'Etudes Automatismes, No. 32, Mar. 1987, pp. 85–87; J. Gustafson: Un super-ordinateur vectoriel homogene, p. 85, figure; p. 85, left-hand column, line 35 to p. 87, middle column, line 9.
Conference Proceedings IEEE Southeastcon '87, Tampa, Fla., Apr. 1987, vol. 1, pp. 225–228; M. C. Ertem: A reconfigurable co-processor for microprocessor systems, FIGS. 2–4; p. 226, left-hand column, line 6 to p. 227, left-hand column, line 25.
(List continued on next page.)
`Primary Examiner—Paul V. Kulik
`Attorney, Agent, or Firm—Robert Groover
[57] ABSTRACT
A novel double buffering subsystem, wherein a dual port memory is partitioned in software so that the top half of the memory is allocated to one processor, and the bottom half to the other. (This allocation is switched when both processors set respective flag bits indicating that they are ready to switch.) On accesses to this memory, additional bits tag the access as “physical,” “logical,” or “preview.” A physical access is interpreted as a literal address within the full memory, and the double buffering is ignored. A logical access is supplemented by an additional address bit, determined by the double buffering switch state. A preview access is used for read access only, and goes to the opposite bank of memory from that which would be accessed in a logical access. This double-buffer architecture is advantageously used, in a multiprocessor system, at the interface between a numeric processor and a cache bus. The preview access can help to avoid data flow inefficiencies at synchronization points in pipelined algorithms.
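
By way of illustration only, the three access modes summarized in the abstract can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the disclosed circuit: the function name `translate`, the parameters `swap_state`, `port`, and `bank_bits`, and the choice of five in-bank address bits are hypothetical. The two-bit mode encoding (00, 01, 10) is the one printed on the front-page drawing.

```python
from enum import IntEnum


class AccessMode(IntEnum):
    """Two-bit tag on each access; encoding as printed on the front-page drawing."""
    PHYSICAL = 0b00
    LOGICAL = 0b01
    PREVIEW = 0b10


def translate(addr, mode, swap_state, port, bank_bits=5, is_write=False):
    """Return the address presented to the dual-port memory.

    swap_state (0 or 1) is the current double-buffering switch state, port
    (0 or 1) identifies the requesting side, and bank_bits is the number of
    address bits inside one half. All of these names are illustrative.
    """
    mode = AccessMode(mode)
    bank_size = 1 << bank_bits
    if mode is AccessMode.PHYSICAL:
        # Literal address within the full memory; double buffering is ignored.
        if not 0 <= addr < 2 * bank_size:
            raise ValueError("physical address out of range")
        return addr
    if not 0 <= addr < bank_size:
        raise ValueError("logical address out of range")
    # Logical: the extra (top) address bit comes from the switch state.
    # The two ports see opposite halves, so the port number flips the bit.
    bank = (swap_state ^ port) & 1
    if mode is AccessMode.PREVIEW:
        if is_write:
            raise PermissionError("preview accesses are read-only")
        bank ^= 1  # opposite bank from the one a logical access would use
    return (bank << bank_bits) | addr
```

In this model a swap is nothing more than toggling `swap_state` once both sides have raised their ready flags. No data is copied, and a preview read lets one side look at the half it will own after the next swap.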
`
`19 Claims, 59 Drawing Sheets
[Front-page drawing. Legible labels: FP Port, CP Port, Addr (A1:A5), Mode, SWAP from FP, SWAP from CP. Access Mode: 00 - Physical, 01 - Logical, 10 - Preview.]
`
`INTEL Ex.1052.001
`
`

`

OTHER PUBLICATIONS

Proceedings of the Fourth Euromicro Symposium on Microprocessing and Microprogramming, Munich, Oct. 1978, pp. 358–365; F. B. Jorgensen et al.: A Bi-microprocessor implementation of a variable topology multiprocessor node, FIGS. 1–6, p. 358, right-hand column, line 13 to p. 362, right-hand column, line 21.
G. J. Myers: Digital system design with LSI bit-slice logic, 1980, pp. 230–239, John Wiley & Sons, Inc., US; p. 237, lines 1–4.
W. Lichtenstein, “The Architecture of the Culler”, Mar. 1986, IEEE Compcon Spring 86, pp. 467–470.
Proceedings of the IEEE, vol. 73, No. 5, May 1985, pp. 852–873, IEEE, New York; J. Allen: “Computer architecture for digital signal processing”.
Computer Design, vol. 16, No. 6, Jun. 1977, pp. 151–163; A. J. Weissberger: “Analysis of multiple-microprocessor system architectures”, FIGS. 7, 8, p. 161.
IEEE Electro, vol. 8, Apr. 1983, pp. 3/31–5, New York; B. J. New: “Address generation in signal/array processors”.
Proceedings ICASSP, Dallas, Apr. 6th–9th, 1987, vol. 1, pp. 531–534; D. M. Taylor et al.: “A novel VLSI digital signal processor architecture for high-speed vector and transform operations”.
IBM Technical Disclosure Bulletin, vol. 27, No. 4A, Sep. 1984, pp. 2184–2186, New York; J. P. Beraud et al.: “Fast fourier transform calculating circuit”.
`
`

`

[Sheet 1 of 59, FIG. 1. Legible block labels: DTP, DTP MC I/F, Data Cache Memory 140, FP 130, GIP I/F 170, Host I/F 160, I/F 150. The numerals 120, 190 and 256 also appear.]
`
`

`

`
`
`
`
`
`
`
`
`

`

[Sheet 5 of 59, figure number not legible. Legible labels: Sequencer, Constant Field, TD bus 122; numerals 311 and 315.]

[Sheet 7 of 59, FIG. 4A. Legible labels: Transfer clock generator, Local transfer clocks, CP Extension, CD bus 112, CD Bus Transceivers 444, Local CP extension registers.]

[Sheet 8 of 59, FIG. 4B. Legible labels: Cache bus 144; numerals 420, 431, 432, 433, 434.]
`
`

`

`
`
`
`
`
`
`
`
`
`
`
`
`

`

[Sheet 13 of 59, FIG. 7. Legible labels: TD bus 122, Fifo full 770, Strobes 760, DATA PIPE OUT 730, Fifo 740, DATA PIPE IN 1 (720), Fifo 750, DATA PIPE IN 2; numeral 780; signal names Empty, Full, Data, Rd, Wr.]

[Sheet 15 of 59, FIG. 9A. Legible labels: Data Cache Memory 140, DP 150.]

[Sheet 17 of 59, FIG. 10. Legible labels: DP 150, Data Cache, FP.]

[Sheet 19 of 59, FIG. 12. Legible labels: Internal Bus 1250, Comparator 1230, Bit Reverse 1240; numeral 1270.]

[Sheet 22 of 59, FIG. 14B. Table of output-enable and select settings for the operations No operation, Byte Extend, Byte zero Fill, Word Extend, and Word zero Fill. Z = Hi Impedance. The individual cell entries are not recoverable.]

[Sheet 24 of 59, FIG. 16. Legible labels: Cache bus 144, DCM I/F 1620, Reg File 430 (430A, 430B, 430C, 430D), FMPY 440 (440A, 440B), FALU 450 (450A, 450B); numerals 256, 433 and 32.]
`
`

`

`
`

`

[Sheet 33 of 59, FIG. 25. Legible labels: FP Write Mask Logic 2510, MUX 530, Memory 510, Enable.]

[Sheet 36 of 59, figure number not legible. Legible labels: CP 1 EXT WCS, CP 2 EXT WCS.]
`
`

`

`
`

`

[Sheet 40 of 59, FIG. 32. Flowchart, in order: Read next pixel from register file; Add pixel to base address of the histogram table; Load address register with histogram address; Read pixel count into ALU input register; Increment pixel count by one; Write new pixel count into histogram table; More pixels? (loop back if so).]
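
By way of illustration only, the loop drawn in FIG. 32 corresponds to the familiar histogram update. The following Python sketch mirrors the flowchart steps one for one; the function name `histogram`, the list standing in for the histogram table, and the `base` parameter are assumptions made for the example and do not appear in the drawing.

```python
def histogram(pixels, table, base=0):
    """Count pixel values into table, following the steps of FIG. 32.

    pixels stands in for the values read from the register file, and
    table[base + value] holds the running count for each pixel value.
    """
    for pixel in pixels:                # read next pixel from register file
        address = base + pixel          # add pixel to base address of the table
        count = table[address]          # read pixel count into the ALU input
        count += 1                      # increment pixel count by one
        table[address] = count          # write new pixel count into the table
    return table                        # loop ends when there are no more pixels
```

For example, `histogram([0, 1, 1, 3], [0] * 4)` leaves the table holding one count for value 0, two for value 1, none for value 2, and one for value 3.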
`
`

`

[Sheet 41 of 59, FIG. 33. Flowchart with two columns.
CP MICROCODE: Load FP start address reg with microcode address and start FP running; Transfer first 8 elements of array A to register file; Transfer first 8 elements of array B to register file; Set CP done and request register file swap; Transfer 8 result elements from register file into array C; Transfer last 8 result elements from register file into array C; End.
FP MICROCODE: Wait loop from previous command; Start; Request register file swap; Do 8 calculations and leave result in register file.
The decision boxes and loop arrows are not recoverable.]
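
By way of illustration only, the division of labor shown in FIG. 33 can be sketched as a sequential Python simulation. Several things here are assumptions rather than disclosure: the name `vector_multiply`, the dictionary layout of the two banks, and the choice of an element-wise product as the "8 calculations". In the hardware the CP and FP phases of each step would run concurrently, whereas this sketch runs them one after the other.

```python
BLOCK = 8  # elements moved per register-file swap, as in FIG. 33


def vector_multiply(a, b):
    """Compute c[i] = a[i] * b[i] in blocks of 8 through two swapped banks."""
    if len(a) != len(b):
        raise ValueError("arrays must have equal length")
    banks = [{"a": [], "b": [], "r": []}, {"a": [], "b": [], "r": []}]
    cp_side = 0                      # bank currently owned by the CP
    starts = list(range(0, len(a), BLOCK))
    c = []
    # Two extra steps drain the last block through the FP and back to the CP.
    for step in range(len(starts) + 2):
        cp, fp = banks[cp_side], banks[cp_side ^ 1]
        # CP phase: unload finished results, then load the next operands.
        c.extend(cp["r"])
        cp["r"] = []
        if step < len(starts):
            s = starts[step]
            cp["a"], cp["b"] = a[s:s + BLOCK], b[s:s + BLOCK]
        else:
            cp["a"], cp["b"] = [], []
        # FP phase: do the calculations and leave the results in its bank.
        fp["r"] = [x * y for x, y in zip(fp["a"], fp["b"])]
        fp["a"], fp["b"] = [], []
        # Both sides are done and have requested a swap: exchange the banks.
        cp_side ^= 1
    return c
```

Each bank alternates between being filled and emptied by the CP and being computed on by the FP, so neither processor touches the half that the other is using.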
`
`

`

`
`

`

[Sheet 50 of 59, FIG. 40B. Legible labels: INTERRUPT; Generate test condition and store in sequencer flag; Conditional Jump; True path; interrupt service routine; interrupt routine; Hold status flag copy; Return from interrupt and restore sequencer flag.]

[Sheet 51 of 59, FIG. 41. Legible block labels: Host Computer 4100, Picture Proc 4140, Numeric Accel 4150 (two instances), Mass Memory 4160, Mass Storage 4170, Interface 4180.]
`
`

`

[Sheet 52 of 59, FIG. 42. Legible entries, in extraction order: Request register file swap; Read A[0] and B[0]; Read A[1] and B[1]; Calculate R[0] from A[0] and B[0]; Calculate R[1] from A[1] and B[1]; Read A[0]' and B[0]'; Calculate R[3] from A[3] and B[3]; Write R[2]; Read A[1]' and B[1]'; Calculate R[0]' from A[0]' and B[0]'; Write R[3]; Set FP done and request swap; Write R[0]; Read A[3] and B[3]; Calculate R[2] from A[2] and B[2]; Write R[1]. The operator in the Calculate entries and the column layout are not recoverable.]
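
By way of illustration only, the entries legible on the FIG. 42 sheet suggest a three-stage read, calculate, write sequence in which reads of the next block (the primed elements) appear before the last results of the current block are written. The Python sketch below works under stated assumptions: one step per stage, three stages, and the hypothetical function names shown. It compares the step count when the pipeline is emptied at every register-file swap with the count when the next block can be read early, as a preview access would allow.

```python
def steps_draining_each_block(blocks, elems_per_block, stages=3):
    """Pipeline is filled and emptied once per block (no look-ahead)."""
    if blocks <= 0 or elems_per_block <= 0:
        return 0
    return blocks * (elems_per_block + stages - 1)


def steps_with_preview(blocks, elems_per_block, stages=3):
    """Pipeline is filled once and emptied once; blocks run back to back."""
    if blocks <= 0 or elems_per_block <= 0:
        return 0
    return blocks * elems_per_block + stages - 1
```

Under these idealized assumptions, two blocks of four elements take 12 steps when the pipeline drains at each swap and 10 steps when the blocks overlap. The saving per additional block equals the pipeline length minus one, which is why it matters more for short runs.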
`
`
`
`
`
`
`
`
`
`

`

[Sheet 53 of 59, FIG. 43. Legible block labels: DTP, Data Cache Memory 140, GIP I/F 170, Host I/F 160, DCM Ext'n 4310, DP I/F 150.]
`
`

`

`
`

`

SYSTEM AND METHOD USING DOUBLE-BUFFER PREVIEW MODE

This is a continuation of application Ser. No. 326,781, filed Mar. 21, 1989, now abandoned.

PARTIAL WAIVER OF COPYRIGHT

All of the material in this patent application is subject to copyright protection under the copyright laws of the United Kingdom, the United States, and of other countries. As of the first effective filing date of the present application, this material is protected as unpublished material.

However, permission to copy this material is hereby granted to the extent that the copyright owner has no objection to the facsimile reproduction by anyone of the patent document or patent disclosure, as it appears in official patent file or records of the United Kingdom or any other country, but otherwise reserves all copyright rights whatsoever.

BACKGROUND OF THE INVENTION

The present invention relates to computer systems and subsystems, and to computer-based methods for data processing.

HIGH-SPEED MULTIPROCESSOR ARCHITECTURES

It has long been realized that the use of multiple processors operating in parallel might in principle be a very convenient way to achieve very high net throughput. Many such architectures have been proposed. However, the actual realization of such architectures is very difficult. In particular, it is difficult to design an architecture of this kind which will be versatile enough to satisfy a range of users and adapt to advances in technology.

Fully asynchronous multiprocessor architectures have been proposed, but it is generally recognized in the art that the problems of programming support in a multiprocessor architecture have not nearly been solved.

A very recent overview of some of the issues involved in multiprocessor systems may be found in Dubois et al., “Synchronization, Coherence, and Event Ordering in Multiprocessors,” Computer magazine, February 1988, page 9, which is hereby incorporated by reference. A recently proposed multiprocessor architecture for digital signal processing is described in Lang et al., “An Optimum Parallel Architecture for High-Speed Real-Time Digital Signal Processing,” Computer magazine, February 1988, page 47, which is hereby incorporated by reference.

INTER-PROCESSOR SYNCHRONIZATION

Synchronization between processors is a continuing critical issue in a very wide variety of multiprocessor systems. Often such inter-processor interfaces make use of “processor-waiting” or “processor-ready” status signals which can be set or cleared by either processor. (Such signals are commonly known as “semaphores.”)

INTER-PROCESSOR DATA ROUTING

Two general concepts of allocating work among processors are pipelining and parallelism. “Pipelining” is generally used to refer to data routings where a single data set is successively operated on by more than one processor. Parallelism refers to data routings where different operations are concurrently performed by separate processors. Of course, some algorithms can profit by pipelining or parallelism to a much greater degree than others.

The speed of a pipeline is limited by its slowest stage. Moreover, the average efficiency of a pipelined system will be diluted by two overhead requirements: the pipeline must be filled at the start of the operation, and must be emptied at the end of the operation. The impact of these overheads depends on the ratio of the number of elements which must be passed through the pipeline in one run to the number of stages in the pipeline (referred to as the length of the pipeline). Thus, these overheads may be unimportant when the length of the pipeline is short, and the number of elements per run is fairly long. However, for a longer pipeline (or for shorter runs), these overheads can be an important factor in throughput.

INTER-PROCESSOR DATA EXCHANGE

The interface between two processors in a multiprocessor system often requires that data be passed back and forth rapidly. Double buffering is a commonly used technique to permit data transfer, without hangups, loss of data synchronization, or data access collisions. Normally the memory space to be shared is divided into two physical memories, and the accesses are arbitrated in hardware so that, on any one cycle, each processor can access only half the memory space (i.e. one of the physical memories).

FIG. 18 shows one example of a prior arrangement for double buffering. The port select logic 1810 provides select signals to data buffers 1860, so that the two data busses 1850A and 1850B (from the sides of the double buffer) are connected to either the first or second memory 1820. The port select logic 1810 also provides select signals to address multiplexers 1830, so that the two address busses 1840A and 1840B are connected to access either the first or second memory 1820.

FIG. 19 shows another example of a prior arrangement for software-controlled double buffering. The port select logic 1910 provides select signals directly to the most significant address bit A6 of a dual port memory 1920. Thus, each port sees only half of the physical address space, but the double buffering can be quite transparent.

CACHE MEMORY ARCHITECTURES

Cache memory is a conventional way to increase the net throughput of computing systems. If a large fraction of memory accesses are expected to call on memory locations already in cache, then every read from cache can save an amount of time equal to the difference between the cache access time and the main memory access time. Therefore, cache memory systems normally attempt to maximize the bandwidth to the cache.

MICROCODED ARCHITECTURES

An extremely important tool for developing high-speed and/or flexible computer architectures is microcoding. See J. Mick & J. Brick, Bit-Slice Microprocessor Design (1980), which is hereby incorporated by reference. Microcoded architectures are not only extremely flexible, but also have the potential to provide extremely high speed.

In microcoded architectures the individual instructions are fairly long (e.g. 100 bits or so). Some fairly low-level logic decodes the instructions, so that appropriate fields are sent to low-level devices (such as register files, adders, etc.).

The total number of bits in the instruction field will typically be very much larger than the log2 of the total number of instructions. This permits the decode operation to be made very much simpler. Microcoded architectures commonly use a sequencer to perform address calculations and perform a first level of decode. (Alternatively, a lower level of logic can be used to perform the program sequencing function.) The sequencer accesses microinstructions from a control store (memory), and various portions of the microinstructions are provided to additional decode logic, and/or applied directly to devices. Since a single instruction can contain many command fields (all of which will be executed simultaneously), it is possible to write surprisingly short microcode programs.

Since the individual instructions are quite low-level, and fairly long, the total program storage required can be quite significant. The data transfer requirements for loading a microcode routine can be significant.

SUMMARY OF THE INVENTION

The present application provides a large number of innovative teachings, which will be described in the general context of a system like that shown in FIG. 1.

Among the innovative teachings set forth herein is a multiprocessor numeric processing subsystem wherein an extremely wide local bus connects the arithmetic calculation subunit to a large data cache memory. This cache is multiported, so that newly retrieved data can be written into the cache at essentially the same time that data transfer is occurring between the numeric processing subunit and the cache.

To get a very high memory bandwidth, there are only three basic strategies:

1. Use very fast memory devices: The problem here is one of economics and size. Very fast memory devices are very expensive, sometimes as much as ten times the cost of the slower counterparts, and the number of storage bits per device is more limited. The major advantage of this technique is that the bandwidth improvement is independent of the data layout in memory (assuming that the address generator is fast enough).

2. Use interleaved memories: Interleaved memories have traditionally been used with dynamic RAMs (DRAMs), where the cycle times have been longer than the access times. In this context, a significant advantage can be gained by interleaving two or more banks and offsetting the timing between banks. The problem with this technique occurs when successive accesses keep hitting the same bank, or accesses through another port (in a multi-port memory) disturb the sequential accessing of banks. This technique can be used with static memories (SRAMs), but the equal access and cycle times make it less attractive than with DRAMs.

3. Use a wide memory structure: Normally the memory width would be the same as the word width. For example, a system using 32-bit words would typically use a 32-bit wide memory architecture. However, several of the innovative teachings set forth herein show how a system with a much wider local bus to cache memory can be very advantageous.

A wide memory structure provides high bandwidth by accessing many words in parallel. Such a structure has much simpler timing requirements than an interleaved memory architecture would. (However, a large percentage of non-sequential accesses will ultimately reduce the bandwidth to that of a normal single-width architecture.)

This memory architecture also has advantages in a multi-port situation where some or all of the ports have a much lower bandwidth than the memory itself. In these cases there will be some intermediate storage (normally registers) to capture the data for later accessing over several cycles by the recipient. While such time-multiplexed accesses are in progress, there is no demand on the memory system for bandwidth.

In the preferred embodiment there are also some significant novelties in the interface logic which controls the data interface to the cache from the numeric processor. These features will be discussed in greater detail below.

A feature which helps to maximize the throughput of the transfers in the transitional clock domain is a double-word interface on only one side of the fast register file. That is, the register file appears, on the cache memory side, as if it were 64 bits wide. However, on the FPU side it only appears to be 32 bits wide. This results in some odd/even structure in the word addresses, but possible problems due to this odd/even structure are avoided by several innovative features. Since these problems can be avoided, the double-word interface provides substantial advantages in the bandwidth of the register file interface.

Some significant advantages are also derived from the preferred scheme for arbitrating access of the control processor and data-transfer processor to the cache memory. In the presently preferred embodiment, the cache is physically dual-ported, but it is used as if it were triported.

The data cache memory is triported between the control processor module, the data-transfer processor module, and the numeric processor module(s), so some form of arbitration is necessary to control access. The control processor generates addresses and controls the routing of data for itself and the floating-point processor(s) under program control so the control processor and floating-point processor access are mutually exclusive. The data-transfer processor, however, is totally autonomous and can compete for access at any time.

In the presently preferred embodiment, the arbitration is such that the control processor/floating-point processor has access whenever it wishes, and the data-transfer processor makes use of any unused access cycles. To make use of the unused cycles, the data-transfer processor includes extra hardware which will allow it to use a single free cycle amongst many busy ones.

The control processor and data-transfer processor are preferably autonomous but synchronized. This is accomplished by letting them share a common microcode clock. This synchrony simplifies the arbitration. The control processor and data-transfer process […] granted signal is available before the cycle in which the data-transfer process […]. This signal therefore has enough time to propagate into the sequencer, thus allowing the data-transfer process […] is not granted, then the data-transfer process […] cycles so the data-transfer processor will not have long to wait. However, if the data-transfer processor's program requires an end to waiting, the data-transfer processor can interrupt the control processor. On receiving this interrupt the control process[…]ing the memory, and let the data-transfer processor in for at least one cycle.

The data-transfer process therefore accesses the memory no more often than once every 8 cycles. Its bandwidth demands are therefore very low.

The innovative teachings of the present application also enable a multiprocessor numeric processing system, which has a well-defined modular expansion interface. This system can be used with one or several numeric processing modules. The modular interface permits multiple numeric processing modules (of different types if desired) to be connected in parallel.

Preferably double buffering is used in a register file at the interface between a numeric processor and a large data cache memory in a multiprocessor system. The partitioning of the register file avoids data collisions in the cache memory architecture.

In this sample embodiment, a 5-ported register file, configured as two physically separate banks of high-speed memory, is used. However, a wide variety of other implementations could be used instead.

This innovation provides much greater flexibility than conventional systems which perform double buffering in hardware, at no loss in speed.
`The “preview” mode permits this double-buffering
`A control processor controls data transfers into and
`implementation to be used as a versatile interface archi
`out of each of
