(12) United States Patent
     Tremblay et al.

(10) Patent No.: US 6,615,338 B1
(45) Date of Patent: Sep. 2, 2003

(54) CLUSTERED ARCHITECTURE IN A VLIW PROCESSOR

(75) Inventors: Marc Tremblay, Menlo Park, CA (US); William Joy, Aspen, CO (US)

(73) Assignee: Sun Microsystems, Inc., Palo Alto, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/204,584

(22) Filed: Dec. 3, 1998

(51) Int. Cl.7 ...
(52) U.S. Cl. ...
(58) Field of Search ... 712/24, 23, 217, ...

(56) References Cited

U.S. PATENT DOCUMENTS

[one entry illegible]
5,301,340 A *   4/1994   Cook ............... 395/800
5,467,476 A    11/1995   Kawasaki ........... 395/800
5,530,817 A     6/1996   Masubuchi .......... 395/375
5,542,059 A     7/1996   Blomgren ........... 395/375
5,657,291 A     8/1997   Podlesny et al.
5,721,868 A     2/1998   Yung et al. ........ 395/476
5,761,475 A     6/1998   Yung et al. ........ 395/394
5,764,943 A     6/1998   Wechsler ........... 395/394
[one entry illegible]
5,901,301 A *   5/1999   Matsuo et al. ...... 395/388
6,076,159 A *   6/2000   Fleck et al. ....... 712/241
6,170,051 B1 *  1/2001   Dowling ............ 712/225

FOREIGN PATENT DOCUMENTS

EP   0 730 223   9/1994 ............. G06F/9/38
EP   0 653 703   5/1995 ............. G06F/9/38

OTHER PUBLICATIONS

Findlay et al., “HARP: A VLIW RISC Processor”, IEEE, pp. 368-372, 1991.*

Keckler et al.: “Processor Coupling: Integrating Compile Time and Runtime Scheduling for Parallelism”, Proceedings of the Annual International Symposium on Computer Architecture, US, New York, IEEE, vol. Symp. 19, 1992, pp. 202-213, XP000325804, ISBN: 0-89791-510-6.

Steven et al.: “iHARP: a multiple instruction issue processor”, IEE Proceedings E. Computers & Digital Techniques, vol. 139, No. 5, Sep. 1992, pp. 439-449, XP000319892, Institution of Electrical Engineers, Stevenage, GB, ISSN: 1350-2387.

* cited by examiner

Primary Examiner: Emanuel Todd Voeltz
(74) Attorney, Agent, or Firm: Zagorin, O'Brien & Graham, LLP

(57) ABSTRACT

A Very Long Instruction Word (VLIW) processor has a clustered architecture including a plurality of independent functional units and a multi-ported register file that is divided into a plurality of separate register file segments, the register file segments being individually associated with the plurality of independent functional units. The functional units access the respective associated register file segments using read operations that are local to the functional unit/register file segment pairs. In contrast, the functional units access the register file segments using write operations that are broadcast to a plurality of register file segments. Independence between clusters is attained since the separate clustered functional unit/register file segment pairs have local (internal) bypassing that allows internal computations to proceed, but have only limited bypassing between different functional unit/register file segment pair clusters. Thus a particular functional unit/register segment pair does not bypass to all other functional unit/register segment pairs.

25 Claims, 18 Drawing Sheets
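The local-read, broadcast-write organization described in the abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class name `SegmentedRegisterFile`, the choice of four segments, and the 32 registers per segment are hypothetical values picked for illustration, and the model broadcasts every write to all segments, which is a simplification of "a plurality of register file segments".

```python
class SegmentedRegisterFile:
    """Toy model: one register file segment per functional unit."""

    def __init__(self, num_units=4, num_regs=32):
        # Hypothetical sizes, chosen for illustration only.
        self.segments = [[0] * num_regs for _ in range(num_units)]
        self.reads_per_segment = [0] * num_units

    def read(self, unit, reg):
        # Local read: a unit only uses the read ports of its own segment.
        self.reads_per_segment[unit] += 1
        return self.segments[unit][reg]

    def write(self, reg, value):
        # Broadcast write: the value is stored in every segment, so a later
        # local read by any unit returns the same data.
        for segment in self.segments:
            segment[reg] = value


rf = SegmentedRegisterFile()
rf.write(5, 42)      # result produced by some functional unit
a = rf.read(0, 5)    # unit 0 reads its own segment
b = rf.read(3, 5)    # unit 3 reads its own segment
print(a, b, rf.reads_per_segment)  # prints: 42 42 [1, 0, 0, 1]
```

In this model each segment needs read ports only for its own functional unit, while every segment must accept all writes, which is the trade-off the abstract describes.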

[Front-page representative drawing: two processors (MPU1, MPU2), each with an Instruction Cache, Instruction Aligner, Instruction Buffer, PCU, Register Files, and Load/Store Unit, above a Shared Data Cache and Synchronization Area. Reference numerals shown: 110, 112, 210, 212, 214, 216, 218, 220, 224, 226.]

U.S. Patent    Sep. 2, 2003    Sheet 1 of 18    US 6,615,338 B1

[FIG. 1: graph with axes labeled ILP and SIZE; plots 10 and 12.]

[Sheet 2 of 18: figure text not legible in the extracted source.]

[Sheet 3 of 18, FIG. 3: two processors (110, 112), each with an Instruction Cache 210, Instruction Aligner 212, Instruction Buffer 214, PCU, functional units MFU3, MFU2, MFU1 and GFU, Register Files, and Load/Store Unit; both share a Shared Data Cache and Synchronization Area. Other reference numerals shown: 218, 224.]

[Sheet 4 of 18, FIG. 4: register file 216 with Broadcast Writes (5) and four groups of 3 Read Ports. A second drawing on the sheet is labeled Global Registers, 12R/4W or 12R/5W.]

[Sheet 5 of 18, FIG. 6: Data In (up to 3 32-bit registers); 5 cycles (globally visible latency); hardware bypasses provide shortest internal latency (4, 2 and 1 cycle latency); Data Out (one 32-bit register).]

[Sheet 6 of 18: figure text not legible in the extracted source.]

[Sheet 7 of 18, FIG. 8A: labels include rs2, Pipe Ld/St, and "Located in the Pipe Control Unit"; reference numeral 226.]

[Sheet 8 of 18: figure text not legible in the extracted source.]

[Sheet 9 of 18, FIG. 8C and FIG. 9: reference numeral 800; other labels not legible.]

[Sheet 10 of 18, FIG. 10A: table with columns group, mfu3, mfu2, mfu1, gfu. FIG. 10B and FIG. 10C: pipeline timing diagrams by cycle, with stages D, E, E1, E2, X, E3, E4, A1, A2, A3, T, WB.]

[Sheet 11 of 18, FIG. 11A: pipeline timing diagram by cycle for mfu1, mfu2, gfu and helper instructions, with stages D, E1, E2, X1, X2, X3, E3, E4, A1, A2, A3, T, WB.]

[Sheet 12 of 18, FIG. 11B: figure text not legible in the extracted source.]

[Sheet 13 of 18, FIG. 11C: pipeline timing diagram for mfu2, mfu3, gfu and helper instructions, with stages D, E1, E2, X1, X2, X3, E3, E4, A1, A2, A3, T, WB.]

[Sheet 14 of 18, FIG. 12A: signal routing among RF0, PCU 226 and GFU 222. Blocks: PCU_GFUX_DP, PCU_GFUX_MC (reference numerals 1110, 1130). Signals: ifu_pcu_pc, mfu2x_data_t, mfu3x_data_t, spr_data_a2, mfu1_pcu_data_e, mfu1_pcu_data_e2, mfu1_pcu_data_e4, dcu_pcu_dc_data[31:0], dcu_pcu_dc_data[63:32], lsu_pcu_ndc_data[31:0], lsu_pcu_ndc_data[63:32], pcu_rs1_data, pcu_rs2_data, pcu_strd_data. Annotations: horizontal = 3 x 32; horizontal buses = 11 x 32; vertical routing = 13 x 32.]

[Sheet 15 of 18, FIG. 12B: signal routing among RF1 (decoding), PCU 226 and MFU1 220. Blocks: PCU_MFU1X_MC, PCU_MFU1X_DP (reference numerals 1120, 1132). Signals: spr_data_a2, mfu1x_data_t, mfu2x_data_t, mfu3x_data_t, rf1_pcu_rs1_data, rf1_pcu_rs2_data, rf1_pcu_rs3_data, gfux_data_a1, gfux_data_a4, dcu_pcu_dc_data[31:0], dcu_pcu_dc_data[63:32], lsu_pcu_ndc_data[31:0], lsu_pcu_ndc_data[63:32], ldx1_data, ldx1m_data, pcu_mfu1_rs1_data, pcu_mfu1_rs2_data, pcu_mfu1_rs3_data, mfu1_pcu_data_e, mfu1_pcu_data_e2, mfu1_pcu_data_e4, gfu_pcu_data_e, gfu_pcu_data_e6e34, pcu_strd_data. Annotations: horizontal = 4; horizontal buses = 13.]

[Sheet 16 of 18, FIG. 12C: signal routing among PCU 226 and MFU2 220. Blocks: PCU_MFU2X_MC, PCU_MFU2X_DP (reference numeral 1122). Signals: mfu1x_data_t, mfu2x_data_t, mfu3x_data_t, spr_data_a2, rf2_pcu_rs1_data, rf2_pcu_rs2_data, rf2_pcu_rs3_data, ldx1_data, ldx1m_data, gfux_data_a1, gfux_data_a4, mfu2_pcu_data_e, mfu2_pcu_data_e2, mfu2_pcu_data_e4, pcu_mfu2_rs1_data, pcu_mfu2_rs2_data, pcu_mfu2_rs3_data, pcu_strd_data. Annotation: horizontal buses = 4.]

[Sheet 17 of 18, FIG. 12D: signal routing for PCU 226 with blocks PCU_MFU1X_MC and PCU_MFU1X_DP (reference numerals 1122, 1132, 220); 22 outputs. Signals include gfux_data_t, ldx3_data, ldx3m_data, mfu2x_data_t, mfu3x_data_t, rf1_pcu_rs1_data, rf1_pcu_rs2_data, rf1_pcu_rs3_data, pcu_mfu1_rs1_data, pcu_mfu1_rs2_data, pcu_mfu1_rs3_data, dcu_pcu_dc_data[31:0], dcu_pcu_dc_data[63:32], lsu_pcu_ndc_data[31:0], lsu_pcu_ndc_data[63:32], mfu1_pcu_data_e, mfu1_pcu_data_e2, mfu1_pcu_data_e4, gfu_pcu_data_e, gfu_pcu_data_e6e34, gfux_data_a1, gfux_data_a2, gfux_data_a3, mfu1x_data_a1, mfu1x_data_a2, mfu1x_data_a3, ldx1_data. Annotations: horizontal buses in channel (counts only partly legible).]

[Sheet 18 of 18: a drawing of LSU 1200 with pcu_trap and load entries ldx1 through ldx4 (reference numeral 1210), each with fields rd[7:0], data[63:0], dsize, valid. FIG. 14A: pipeline timing diagram by cycle for a membar and following instructions. FIG. 14B: pipeline timing diagram by cycle for loads and a following instruction. Stages shown: D, E, A1, A2, A3, T, WB.]

CLUSTERED ARCHITECTURE IN A VLIW PROCESSOR

CROSS-REFERENCE TO RELATED APPLICATIONS

The present invention is related to subject matter disclosed in the following co-pending patent applications:

1. U.S. patent application Ser. No. 09/204,480, entitled, “A Multiple-Thread Processor for Threaded Software Applications”, naming Marc Tremblay and William Joy as inventors and filed on even date herewith;

2. U.S. patent application Ser. No. 09/204,481, now U.S. Pat. No. 6,343,348, entitled, “Apparatus and Method for Optimizing Die Utilization and Speed Performance by Register File Splitting”, naming Marc Tremblay and William Joy as inventors and filed on even date herewith;

3. U.S. patent application Ser. No. 09/204,536, entitled, “Variable Issue-Width VLIW Processor”, naming Marc Tremblay as inventor and filed on even date herewith;

4. U.S. patent application Ser. No. 09/204,586, now U.S. Pat. No. 6,205,543, entitled, “Efficient Handling of a Large Register File for Context Switching”, naming Marc Tremblay and William Joy as inventors and filed on even date herewith;

5. U.S. patent application Ser. No. 09/205,121, now U.S. Pat. No. 6,321,325, entitled, “Dual In-line Buffers for an Instruction Fetch Unit”, naming Marc Tremblay and Graham Murphy as inventors and filed on even date herewith;

6. U.S. patent application Ser. No. 09/204,781, now U.S. Pat. No. 6,249,861, entitled, “An Instruction Fetch Unit Aligner”, naming Marc Tremblay and Graham Murphy as inventors and filed on even date herewith;

7. U.S. patent application Ser. No. 09/204,535, now U.S. Pat. No. 6,279,100, entitled, “Local Stall Control Method and Structure in a Microprocessor”, naming Marc Tremblay and Sharada Yeluri as inventors and filed on even date herewith;

8. U.S. patent application Ser. No. 09/204,858, entitled, “Local and Global Register Partitioning in a VLIW Processor”, naming Marc Tremblay and William Joy as inventors and filed on even date herewith; and

9. U.S. patent application Ser. No. 09/204,479, entitled, “Implicitly Derived Register Specifiers in a Processor”, naming Marc Tremblay and William Joy as inventors and filed on even date herewith.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to processors. More specifically, the present invention relates to architectures for Very Long Instruction Word (VLIW) processors.

2. Description of the Related Art

One technique for improving the performance of processors is parallel execution of multiple instructions to allow the instruction execution rate to exceed the clock rate. Various types of parallel processors have been developed including Very Long Instruction Word (VLIW) processors that use multiple, independent functional units to execute multiple instructions in parallel. VLIW processors package multiple operations into one very long instruction, the multiple operations being determined by sub-instructions that are applied to the independent functional units. An instruction has a set of fields corresponding to each functional unit. Typical bit lengths of a subinstruction commonly range from 16 to 64 bits per functional unit to produce an instruction length often in a range from 64 to 512 bits for VLIW groups from four to eight subinstructions.

The multiple functional units are kept busy by maintaining a code sequence with sufficient operations to keep instructions scheduled. A VLIW processor often uses a technique called trace scheduling to maintain scheduling efficiency by unrolling loops and scheduling code across basic function blocks. Trace scheduling also improves efficiency by allowing instructions to move across branch points.

Limitations of VLIW processing include limited parallelism, limited hardware resources, and a vast increase in code size. A limited amount of parallelism is available in instruction sequences. Unless loops are unrolled a very large number of times, insufficient operations are available to fill the instruction capacity of the functional units. The operational capacity of a VLIW processor is not determined by the number of functional units alone. The capacity also depends on the depth of the operational pipeline of the operational units. Several operational units, such as the memory, branching controller, and floating point functional units, are pipelined and perform a much larger number of operations than can be executed in parallel. For example, a floating point pipeline with a depth of eight steps has two operations issued on a clock cycle that cannot depend on any of the operations already within the floating point pipeline. Accordingly, the actual number of independent operations is approximately equal to the average pipeline depth times the number of execution units. Consequently, the number of operations needed to maintain a maximum efficiency of operation for a VLIW processor with four functional units is twelve to sixteen.

Limited hardware resources are a problem, not only because of duplication of functional units but more importantly due to a large increase in memory and register file bandwidth. A large number of read and write ports are necessary for accessing the register file, imposing a bandwidth that is difficult to support without a large cost in the size of the register file and degradation in clock speed. As the number of ports increases, the complexity of the memory system further increases. To allow multiple memory accesses in parallel, the memory is divided into multiple banks having different addresses to reduce the likelihood that multiple operations in a single instruction have conflicting accesses that cause the processor to stall since synchrony must be maintained between the functional units.

Code size is a problem for several reasons. The generation of sufficient operations in a nonbranching code fragment requires substantial unrolling of loops, increasing the code size. Also, instructions that are not full include unused subinstructions that waste code space, increasing code size. Furthermore, the increase in the size of storages such as the register file increases the number of bits in the instruction for addressing registers in the register file.

A challenge in the design of VLIW processors is effective exploitation of instruction-level parallelism. Highly parallel computing applications that have few data dependencies and few branches are executed most efficiently using a wide VLIW processor with a greater number of subinstructions in a VLIW group. However, many computing applications are not highly parallel and include branches or data dependencies that waste space in instruction memory and cause stalling. Referring to FIG. 1, a graph illustrates a comparison of instruction issue efficiency and processor size as VLIW group width is varied. The left axis of the graph relates to an instruction-level parallelism plot 10 that depicts the number of instructions executed per cycle against VLIW issue width. The right axis of the graph relates to a relative processor size plot 12 that shows relative processor size in relation to VLIW issue width.

What are needed are a technique and processor architecture that increase the capacity for instru