`Case 3:14-cv-00757-REP-DJN Document 47-2 Filed 01/12/15 Page 1 of 50 Page|D# 1100
`
`
`
EXHIBIT B
[Document 47-2, Page 3 of 50, PageID# 1102 — drawing sheet: FIG. 1, a simplified
block diagram of the microprocessor architecture 100, showing the IFU 102,
IEU 104, CCU 106, VMU 108, MCU 110, and MAU 112, with the instruction cache,
data cache, and port switch; the drawing graphics are not reproducible as text.]
`
[Document 47-2, Page 7 of 50, PageID# 1106 — Patent Application Publication,
May 3, 2007, Sheet 5 of 14, US 2007/0101103 A1: FIG. 5, a block diagram of the
instruction execution unit 104, including the EDecode, carry checker,
dependency checker, register rename, instruction issuer, done control, retire
control, register array, bypass control, and functional units FU0 through FUn;
the drawing graphics are not reproducible as text.]
`
[Document 47-2, Page 9 of 50, PageID# 1108 — Patent Application Publication,
May 3, 2007, Sheet 7 of 14, US 2007/0101103 A1: FIG. 8, an illustration of a
boolean register set (rb[31:0], rc[31:0]); FIG. 9, the IFU and IEU interface
signals to the CCU 106 (IF PADDR, VM PADDR, IF DATA, Ex PADDR, Ex DATA, and
related request, busy, and ready control lines); and FIG. 17, a simplified
block diagram of the cache control unit, showing the ICACHE, DCACHE, and the
MCU address, control, instruction, and data buses; the drawing graphics are
not reproducible as text.]
`
`
[Document 47-2, Page 10 of 50, PageID# 1109 — Patent Application Publication,
May 3, 2007, Sheet 8 of 14, US 2007/0101103 A1: FIG. 10, a detailed block
diagram of the primary integer processing data path, including the alignment
units, multiplexers, temporary buffer, register array, bypass unit, router,
ALU0, ALU1, shifter, and the integer and floating point functional units; the
drawing graphics are not reproducible as text.]
`
`
[Document 47-2, Page 13 of 50, PageID# 1112 — drawing sheet: FIG. 13, a
detailed block diagram of the load/store unit 760, including the register
file 472, address latches, a 4x4 comparator 772, the CCU data, address, and
control interfaces, and the VMU 108 virtual and physical address connections;
the drawing graphics are not reproducible as text.]
`
[Document 47-2, Page 15 of 50, PageID# 1114 — drawing sheet, apparently
FIG. 15: the virtual memory control unit, including the VMU control logic 820,
CAM 802, space ID, virtual page, and physical page fields, and the IF/Ex VM
request, address, read/write, and control signals; the drawing graphics are
not reproducible as text.]
`
`
US 2007/0101103 A1
May 3, 2007
`
HIGH-PERFORMANCE SUPERSCALAR-BASED
COMPUTER SYSTEM WITH OUT-OF-ORDER
INSTRUCTION EXECUTION AND CONCURRENT
RESULTS DISTRIBUTION
`
CROSS-REFERENCE TO RELATED
APPLICATIONS

[0001] This application is a continuation of U.S. patent application
Ser. No. 09/393,662, filed Sep. 10, 1999, now allowed, entitled
High-Performance, Superscalar-Based Computer System with Out-of-Order
Instruction Execution and Concurrent Results Distribution, which is a
continuation of U.S. patent application Ser. No. 09/158,568, filed
Sep. 22, 1998, now U.S. Pat. No. 6,038,653, which is a continuation of
U.S. patent application Ser. No. 08/716,728, filed Sep. 23, 1996, now
U.S. Pat. No. 5,832,292, which is a continuation of U.S. patent
application Ser. No. 08/397,016, filed Mar. 1, 1995, now U.S. Pat. No.
5,560,032, which is a continuation of U.S. patent application Ser. No.
07/817,809, filed Jan. 8, 1992, now abandoned, which is a continuation
of U.S. patent application Ser. No. 07/727,058, filed Jul. 8, 1991, now
abandoned. Each of the above-referenced applications is incorporated by
reference in its entirety herein.

[0002] The present application is related to the following Applications,
all assigned to the Assignee of the present Application:

[0003] 1. High-Performance, Superscalar-Based Computer System with
Out-of-Order Instruction Execution, invented by Nguyen et al.,
application Ser. No. 08/602,021, filed Feb. 15, 1996, now allowed, which
is a continuation of application Ser. No. 07/817,810, filed Jan. 8,
1992, now U.S. Pat. No. 5,539,911, which is a continuation of Ser. No.
07/727,006, filed Jul. 8, 1991;

[0004] 2. RISC Microprocessor Architecture with Isolated Architectural
Dependencies, invented by Nguyen et al., application Ser. No.
08/292,177, filed Aug. 18, 1994, which is a continuation of Ser. No.
07/817,807, filed Jan. 8, 1992, which is a continuation of Ser. No.
07/726,744, filed Jul. 8, 1991;

[0005] 3. RISC Microprocessor Architecture Implementing Multiple Typed
Register Sets, invented by Garg et al., application Ser. No. 07/726,773,
filed Jul. 8, 1991, now U.S. Pat. No. 5,493,687;

[0006] 4. RISC Microprocessor Architecture Implementing Fast Trap and
Exception State, invented by Nguyen et al., application Ser. No.
08/345,333, filed Nov. 21, 1994, now U.S. Pat. No. 5,481,685, which is a
continuation of Ser. No. 08/171,968, filed Dec. 23, 1993, which is a
continuation of Ser. No. 07/817,811, filed Jan. 8, 1992, which is a
continuation of Ser. No. 07/726,942, filed Jul. 8, 1991;

[0007] 5. Page Printer Controller Including a Single Chip Superscalar
Microprocessor with Graphics Functional Units, invented by Lentz et al.,
application Ser. No. 08/267,646, filed Jun. 28, 1994, now U.S. Pat. No.
5,394,515, which is a continuation of Ser. No. 07/817,813, filed Jan. 8,
1992, which is a continuation of Ser. No. 07/726,929, filed Jul. 8,
1991; and

[0008] 6. Microprocessor Architecture with a Switch Network for Data
Transfer Between Cache, Memory Port, and IOU, invented by Lentz et al.,
application Ser. No. 07/726,893, filed Jul. 8, 1991, now U.S. Pat. No.
5,440,752.
`
`BACKGROUND OF THE INVENTION
`
`[0009] 1. Field of the Invention
`
[0010] The present invention is generally related to the design of RISC
type microprocessor architectures and, in particular, to a RISC
microprocessor architecture that may be readily expanded for increased
computational throughput through the addition of functional computing
elements, including those tailored for a particular computational
function, into the architecture.
`
`[0011] 2. Background
`
[0012] Recently, the design of microprocessor architectures has matured
from the use of Complex Instruction Set Computer (CISC) to simpler
Reduced Instruction Set Computer (RISC) architectures. The CISC
architectures are notable for the provision of substantial hardware to
implement and support an instruction execution pipeline. The typical
conventional pipeline structure includes, in fixed order, instruction
fetch, instruction decode, data load, instruction execute and data store
stages. A performance advantage is obtained by the concurrent execution
of different portions of a set of instructions through the respective
stages of the pipeline. The longer the pipeline, the greater the number
of execution stages available and the greater the number of instructions
that can be concurrently executed.
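The pipelining advantage described in [0012] can be sketched with a toy
timing model. The stage names come from the paragraph; the cycle
arithmetic is an illustrative idealization (no stalls), not a claim from
the specification:

```python
# Toy model of a classic five-stage pipeline: with N instructions and
# S stages, fully overlapped execution takes N + S - 1 cycles instead
# of the N * S cycles needed when instructions run one at a time.
STAGES = ["fetch", "decode", "load", "execute", "store"]

def serial_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    # One instruction must drain completely before the next begins.
    return n_instructions * n_stages

def pipelined_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    # One instruction enters the pipeline per cycle; the last one
    # still needs n_stages cycles to drain.
    return n_instructions + n_stages - 1

print(serial_cycles(100))     # 500 cycles without overlap
print(pipelined_cycles(100))  # 104 cycles with full overlap
```

The gap between the two counts is the concurrency the paragraph
describes; a longer pipeline widens it, at the cost of the hazards
discussed next.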
`
[0013] Two general problems limit the effectiveness of CISC pipeline
architectures. The first problem is that conditional branch instructions
may not be adequately evaluated until a prior condition code setting
instruction has substantially completed execution through the pipeline.
`
[0014] Thus, the subsequent execution of the conditional branch
instruction is delayed, or stalled, resulting in several pipeline stages
remaining inactive for multiple processor cycles. Typically, the
condition codes are written to a condition code register, also referred
to as a processor status register (PSR), only at completion of
processing an instruction through the execution stage. Thus, the
pipeline must be stalled with the conditional branch instruction in the
decode stage for multiple processor cycles pending determination of the
branch condition code. The stalling of the pipeline results in a
substantial loss of through-put. Further, the average through-put of the
computer will be substantially dependent on the mere frequency of
conditional branch instructions occurring closely after the condition
code setting instructions in the program instruction stream.
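The throughput dependence on branch frequency can be made concrete with
a small model. The stall count and branch fraction below are invented
numbers for illustration; the specification gives no such figures:

```python
# Toy model of paragraph [0014]: an ideal pipeline retires one
# instruction per cycle, but every conditional branch that closely
# follows a condition-code-setting instruction stalls the pipeline for
# some number of dead cycles while the PSR value is resolved.
def effective_throughput(n_instr: int, branch_frac: float,
                         stall_per_branch: int) -> float:
    # Total cycles = one per instruction plus the dead cycles
    # contributed by stalled branches.
    cycles = n_instr + n_instr * branch_frac * stall_per_branch
    return n_instr / cycles

print(effective_throughput(1000, 0.0, 4))  # 1.0 instr/cycle, no branches
print(effective_throughput(1000, 0.2, 4))  # ~0.56 instr/cycle
```

Even a modest branch density cuts sustained throughput well below one
instruction per cycle, which is the loss the paragraph attributes to
pipeline stalls.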
`
[0015] A second problem arises from the fact that instructions closely
occurring in the program instruction stream will tend to reference the
same registers of the processor register file. Data registers are often
used as the destination or source of data in the store and load stages
of successive instructions. In general, an instruction that stores data
to the register file must complete processing through at least the
execution stage before the load stage processing of a subsequent
instruction can be allowed to access the register file. Since the
execution of many instructions requires multiple processor cycles in the
single execution stage to produce store data, the entire pipeline is
typically stalled for the duration of an execution stage operation.
Consequently, the execution through-put of the computer is substantially
dependent on the internal order of the instruction stream being
executed.
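The register conflict in [0015] is a read-after-write hazard, and the
check a pipeline must perform can be sketched in a few lines. The
register names and the dictionary encoding of an instruction are
illustrative assumptions, not the patent's representation:

```python
# Toy read-after-write hazard check: a later instruction's load stage
# must wait (or take a bypass) if it reads a register that an earlier
# instruction has not yet written back to the register file.
def raw_hazard(earlier: dict, later: dict) -> bool:
    # earlier/later: {"dest": "r1", "srcs": ["r2", "r3"]}
    return earlier["dest"] in later["srcs"]

add_instr  = {"dest": "r1", "srcs": ["r2", "r3"]}  # r1 <- r2 + r3
load_instr = {"dest": "r4", "srcs": ["r1", "r5"]}  # reads r1 too early

print(raw_hazard(add_instr, load_instr))  # True: stall or bypass needed
print(raw_hazard(load_instr, add_instr))  # False: no conflict
```

When the check fires and no bypass path exists, the whole pipeline
stalls for the producer's remaining execution cycles, which is exactly
the order-dependent throughput loss the paragraph describes.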
`
[0016] A third problem arises not so much from the execution of the
instructions themselves, but from the maintenance of the hardware
supported instruction execution environment, or state-of-the-machine, of
the microprocessor itself. Contemporary CISC microprocessor hardware
subsystems can detect the occurrence of trap conditions during the
execution of instructions. Traps include hardware interrupts, software
traps and exceptions. Each trap requires execution of a corresponding
trap handling routine by the processor. On detection of the trap, the
execution pipeline must be cleared to allow the immediate execution of
the trap handling routine. Simultaneously, the state-of-the-machine must
be established as of the precise point of occurrence of the trap; the
precise point occurring at the conclusion of the first currently
executing instruction for interrupts and traps and immediately prior to
an instruction that fails due to an exception. Subsequently, the
state-of-the-machine and, again depending on the nature of the trap, the
executing instruction itself must be restored at the completion of the
handling routine. Consequently, with each trap or related event, a
latency is introduced by the clearing of the pipeline at both the
inception and conclusion of the handling routine and the storage and
return of the precise state-of-the-machine, with a corresponding
reduction in the through-put of the processor.
`
[0017] These problems have been variously addressed in an effort to
improve the potential through-put of CISC architectures. Assumptions can
be made about the proper execution of conditional branch instructions,
thereby allowing pipeline execution to tentatively proceed in advance of
the final determination of the branch condition code. Assumptions can
also be made as to whether a register will be modified, thereby allowing
subsequent instructions to also be tentatively executed. Finally,
substantial additional hardware can be provided to minimize the
occurrence of exceptions that require execution of handling routines and
thereby reduce the frequency of exceptions that interrupt the processing
of the program instruction stream.
`
[0018] These solutions, while obviously introducing substantial
additional hardware complexities, also introduce distinctive problems of
their own. The continued execution of instructions in advance of a final
resolution of either a branch condition or register file store access
requires that the state-of-the-machine be restorable to any of multiple
points in the program instruction stream, including the location of the
conditional branch, each modification of a register file, and for any
occurrence of an exception; potentially to a point prior to the fully
completed execution of the last several instructions. Consequently, even
more supporting hardware is required and, further, must be particularly
designed not to significantly increase the cycle time of any pipeline
stage.
`
[0019] RISC architectures have sought to avoid many of the foregoing
problems by drastically simplifying the hardware implementation of the
microprocessor architecture. In the extreme, each RISC instruction
executes in only three pipelined program cycles including a load cycle,
an execution cycle, and a store cycle. Through the use of load and store
data bypassing, conventional RISC architectures can essentially execute
a single instruction per cycle in the three stage pipeline.
`
[0020] Whenever possible, hardware support in RISC architectures is
minimized in favor of software routines for performing the required
functions. Consequently, the RISC architecture holds out the hope of
substantial flexibility and high speed through the use of a simple
load/store instruction set executed by an optimally matched pipeline.
And in practice, RISC architectures have been found to benefit from the
balance between a short, high-performance pipeline and the need to
execute substantially greater numbers of instructions to implement all
required functions.
`
[0021] The design of the RISC architecture generally avoids or minimizes
the problems encountered by CISC architectures with regard to branches,
register references and exceptions. The pipeline involved in a RISC
architecture is short and optimized for speed. The shortness of the
pipeline minimizes the consequences of a pipeline stall or clear as well
as minimizing the problems in restoring the state-of-the-machine to an
earlier execution point.
`
[0022] However, significant through-put performance gains over the
generally realized present levels cannot be readily achieved by the
conventional RISC architecture. Consequently, alternate, so-called
superscalar architectures have been variously proposed. These
architectures generally attempt to execute multiple instructions
concurrently and thereby proportionately increase the through-put of the
processor. Unfortunately, such architectures are, again, subject to
similar, if not the same, conditional branch, register referencing, and
exception handling problems as encountered by CISC architectures.
`
[0023] A particular problem encountered by conventional superscalar
architectures is that their inherent complexity generally precludes
modification of the architecture without substantial redesign of
foundational aspects of the architecture. The handling of multiple
concurrent instruction executions imposes substantial control
constraints on the architecture in order to maintain certainty of the
correctness of the execution of an instruction stream. Indeed, the
execution of some instructions may complete before that of instructions
earlier in the program instruction stream. Consequently, the control
logic that manages even the fundamental aspects of instruction execution
must often be redesigned to allow for architectural modifications that
affect the execution flow of any particular instruction.
`
`BRIEF SUMMARY OF THE INVENTION
`
[0024] Thus, a general purpose of the present invention is to provide a
high-performance, RISC based, superscalar processor architecture
suitable for ready architectural enhancement through the addition and
alteration of computation augmenting functional units.
`
[0025] This purpose is obtained in the present invention through the
provision of a microprocessor architecture that includes an instruction
fetch unit for fetching instruction sets from an instruction store and
an execution unit that implements the concurrent execution of a
plurality of instructions through a parallel array of functional units.
The fetch unit generally maintains a predetermined number of
instructions in an instruction buffer. The execution unit includes an
instruction selection unit, coupled to the instruction buffer, for
selecting instructions for execution, and a plurality of functional
units for performing instruction specified functional operations.
`
[0026] The instruction selection unit preferably includes an instruction
decoder and related logic, coupled to the instruction buffer, for
determining the availability of instructions for execution, and an
instruction scheduler, coupled to each of the functional units for
determining their respective execution status, for scheduling the
initiation of the processing of instructions through the functional
units. The instruction scheduler schedules instructions determined to be
available for execution and for which the instruction scheduler
determines at least one of the functional units implementing a necessary
computational function is available.
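The scheduling policy in [0026] — issue an available instruction only
when an idle functional unit implements its required operation — can be
sketched as follows. The data layout, unit names, and operation names
are invented for illustration; the patent does not prescribe this
representation:

```python
# Minimal sketch of the [0026] scheduling policy: walk the ready
# instructions in program order and issue each one to the first idle
# functional unit that implements its operation; otherwise it waits.
def schedule(ready_instructions, functional_units):
    # functional_units: list of {"ops": set_of_op_names, "busy": bool}
    issued = []
    for instr in ready_instructions:
        for i, fu in enumerate(functional_units):
            if not fu["busy"] and instr["op"] in fu["ops"]:
                fu["busy"] = True            # claim the unit this cycle
                issued.append((instr["id"], i))
                break
    return issued

fus = [{"ops": {"add", "sub"}, "busy": False},
       {"ops": {"fmul"}, "busy": False}]
insts = [{"id": 0, "op": "add"},
         {"id": 1, "op": "fmul"},
         {"id": 2, "op": "add"}]
print(schedule(insts, fus))  # [(0, 0), (1, 1)]; instr 2 waits for FU 0
```

Because the check is purely "is some unit with this capability free,"
adding or modifying a functional unit only changes the unit list handed
to the scheduler, which mirrors the extensibility argument made in
[0027].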
`
[0027] Consequently, an advantage of the present invention is that the
execution unit may be readily modified with respect to any desired
modification of the functions performed by any or all of the functional
units, including due to the modification of the function performed by a
predetermined one of said functional units and due to the provision of
additional functional units. The modification and addition of functional
units essentially requires only a corresponding modification of the
instruction scheduler to account for the difference in instructions that
may be executed by each modified or added functional unit.
`
[0028] Another advantage of the present invention is that the
architecture provides for multiple execution data paths through the
execution unit, where each execution data path is generally optimized
for the type of computational function that is to be performed on the
data: integer, floating point, and boolean.
`
[0029] A further advantage of the present invention is that the number,
type and computational specifics of the functional units provided in
each data path, and as between data paths, are mutually independent.
Alteration of function or an increase in the number of functional units
in a data path will have no architectural impact on the other data path
functional units.
`
[0030] Still another advantage of the present invention is that the
instruction scheduler is a unified unit in that it schedules
instructions for all of the functional units, regardless of the number
of data paths implemented in the execution unit and the number or
diversity of functions implemented by the data path most suited for
execution of a given instruction.
`
BRIEF DESCRIPTION OF THE
DRAWINGS/FIGURES

[0031] These and other advantages and features of the present invention
will become better understood upon consideration of the following
detailed description of the invention when considered in connection with
the accompanying drawings, in which like reference numerals designate
like parts throughout the figures thereof, and wherein:
`
[0032] FIG. 1 is a simplified block diagram of the preferred
microprocessor architecture implementing the present invention;

[0033] FIG. 2 is a detailed block diagram of the instruction fetch unit
constructed in accordance with the present invention;

[0034] FIG. 3 is a block diagram of the program counter logic unit
constructed in accordance with the present invention;

[0035] FIG. 4 is a further detailed block diagram of the program counter
data and control path logic;

[0036] FIG. 5 is a simplified block diagram of the instruction execution
unit of the present invention;

[0037] FIG. 6A is a simplified block diagram of the register file
architecture utilized in a preferred embodiment of the present
invention;

[0038] FIG. 6B is a graphic illustration of the storage register format
of the temporary buffer register file as utilized in a preferred
embodiment of the present invention;

[0039] FIG. 6C is a graphic illustration of the primary and secondary
instruction sets as present in the last two stages of the instruction
FIFO unit of the present invention;

[0040] FIGS. 7A, 7B and 7C provide a graphic illustration of the
reconfigurable states of the primary integer register set as provided in
accordance with a preferred embodiment of the present invention;

[0041] FIG. 8 is a graphic illustration of a reconfigurable floating
point and secondary integer register set as provided in accordance with
the preferred embodiment of the present invention;

[0042] FIG. 9 is a graphic illustration of a tertiary boolean register
set as provided in a preferred embodiment of the present invention;

[0043] FIG. 10 is a detailed block diagram of the primary integer
processing data path portion of the instruction execution unit
constructed in accordance with the preferred embodiment of the present
invention;

[0044] FIG. 11 is a detailed block diagram of the primary floating point
data path portion of the instruction execution unit constructed in
accordance with a preferred embodiment of the present invention;

[0045] FIG. 12 is a detailed block diagram of the boolean operation data
path portion of the instruction execution unit as constructed in
accordance with the preferred embodiment of the present invention;

[0046] FIG. 13 is a detailed block diagram of a load/store unit
constructed in accordance with the preferred embodiment of the present
invention;

[0047] FIG. 14 is a timing diagram illustrating the preferred sequence
of operation of a preferred embodiment of the present invention in
executing multiple instructions in accordance with the present
invention;

[0048] FIG. 15 is a simplified block diagram of the virtual memory
control unit as constructed in accordance with the preferred embodiment
of the present invention;

[0049] FIG. 16 is a graphic representation of the virtual memory control
algorithm as utilized in a preferred embodiment of the present
invention; and

[0050] FIG. 17 is a simplified block diagram of the cache control unit
as utilized in a preferred embodiment of the present invention.
`
`
`
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`
`Table of Contents
`
`I. Microprocessor Architectural Overview
`
`II. Instruction Fetch Unit
`
`[0051] A. IFU Data Path
`
`[0052] B. IFU Control Path
`
`[0053] C. IFU/IEU Control Interface
`
`[0054] D. PC Logic Unit Detail
`
`[0055] 1. PF and ExPC Control/Data Unit Detail
`
`[0056] 2. PC Control Algorithm Detail
`
`[0057] E. Interrupt and Exception Handling
`
`[0058] 1. Overview
`
`[0059] 2. Asynchronous Interrupts
`
`[0060] 3. Synchronous Exceptions
`
`[0061] 4. Handler Dispatch and Return
`
`[0062] 5. Nesting
`
`[0063] 6. List of Traps
`
`III. Instruction Execution Unit
`
[0064] A. IEU Data Path Detail
`
`[0065] 1. Register File Detail
`
`[0066] 2. Integer Data Path Detail
`
`[0067] 3. Floating Point Data Path Detail
`
`[0068] 4. Boolean Register Data Path Detail
`
`[0069] B. Load/Store Control Unit
`
[0070] C. IEU Control Path Detail
`
`[0071] 1. EDecode Unit Detail
`
`[0072] 2. Carry Checker Unit Detail
`
`[0073] 3. Data Dependency Checker Unit Detail
`
`[0074] 4. Register Rename Unit Detail
`
`[0075] 5. Instruction Issuer Unit Detail
`
`[0076] 6. Done Control Unit Detail
`
`[0077] 7. Retirement Control Unit Detail
`
`[0078] 8. Control Flow Control Unit Detail
`
`[0079] 9. Bypass Control Unit Detail
`
`IV. Virtual Memory Control Unit
`
`V. Cache Control Unit
`
`VI. Summary/Conclusion
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`
`I. Microprocessor Architectural Overview
`
[0080] The architecture 100 of the present invention is generally shown
in FIG. 1. An Instruction Fetch Unit (IFU) 102 and an Instruction
Execution Unit (IEU) 104 are the principal operative elements of the
architecture 100. A Virtual Memory Unit (VMU) 108, Cache Control Unit
(CCU) 106, and Memory Control Unit (MCU) 110 are provided to directly
support the function of the IFU 102 and IEU 104. A Memory Array Unit
(MAU) 112 is also provided as a generally essential element for the
operation of the architecture 100, though the MAU 112 does not directly
exist as an integral component of the architecture 100. That is, in the
preferred embodiments of the present invention, the IFU 102, IEU 104,
VMU 108, CCU 106, and MCU 110 are fabricated on a single silicon die
utilizing a conventional 0.8 micron design rule low-power CMOS process
and comprising some 1,200,000 transistors. The standard processor or
system clock speed of the architecture 100 is 40 MHz. However, in
accordance with a preferred embodiment of the present invention, the
internal processor clock speed is 160 MHz.
`
[0081] The IFU 102 is primarily responsible for the fetching of
instructions, the buffering of instructions pending execution by the IEU
104, and, generally, the calculation of the next virtual address to be
used for the fetching of next instructions.
`
[0082] In the preferred embodiments of the present invention,
instructions are each fixed at a length of 32 bits. Instruction sets, or
"buckets" of four instructions, are fetched by the IFU 102
simultaneously from an instruction cache 132 within the CCU 106 via a
128 bit wide instruction bus 114. The transfer of instruction sets is
coordinated between the IFU 102 and CCU 106 by control signals provided
via a control bus 116. The virtual address of an instruction set to be
fetched is provided by the IFU 102 via an IFU combined arbitration,
control and address bus 118 onto a shared arbitration, control and
address bus 120 further coupled between the IEU 104 and VMU 108.
Arbitration for access to the VMU 108 arises from the fact that both the
IFU 102 and IEU 104 utilize the VMU 108 as a common, shared resource. In
the preferred embodiment of the architecture 100, the low order bits
defining an address within a physical page of the virtual address are
transferred directly by the IFU 102 to the Cache Control Unit 106 via
the control lines 116. The virtualizing, high order bits of the virtual
address supplied by the IFU 102 are provided by the address portion of
the buses 118, 120 to the VMU 108 for translation into a corresponding
physical page address. For the IFU 102, this physical page address is
transferred directly from the VMU 108 to the Cache Control Unit 106 via
the address control lines 122 one-half internal processor cycle after
the translation request is placed with the VMU 108.
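The address split in [0082] — the low-order within-page bits go straight
to the cache controller while the high-order virtualizing bits are
translated by the VMU — can be sketched as follows. The 4 KB page size
and the toy page table are assumptions for illustration only; the
paragraph does not state these parameters:

```python
# Sketch of the [0082] address split: the page offset bypasses
# translation, while the virtual page number is looked up (here, in a
# plain dict standing in for the VMU) to form the physical address.
PAGE_BITS = 12  # hypothetical 4 KB pages

def split_vaddr(vaddr: int) -> tuple[int, int]:
    # Returns (virtual page number, offset within the page).
    return vaddr >> PAGE_BITS, vaddr & ((1 << PAGE_BITS) - 1)

def translate(vaddr: int, page_table: dict) -> int:
    vpage, offset = split_vaddr(vaddr)
    ppage = page_table[vpage]          # the VMU's page translation
    return (ppage << PAGE_BITS) | offset

print(hex(translate(0x12345678, {0x12345: 0xABCDE})))  # 0xabcde678
```

Only the upper bits change under translation, which is why the cache can
begin its lookup with the untranslated offset while the VMU resolves the
page in parallel.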
`
[0083] The instruction stream fetched by the IFU 102 is, in turn,
provided via an instruction stream bus 124 to the IEU 104. Control
signals are exchanged between the IFU 102 and the IEU 104 via control
lines 126. In addition, certain instruction fetch addresses, typically
those requiring access to the register file present within the IEU 104,
are provided back to the IFU via a target address return bus within the
control lines 126.
`
[0084] The IEU 104 stores and retrieves data with respect to a data
cache 134 provided within the CCU 106 via an 80-bit wide bi-directional
data bus 130. The entire physical address for IEU data accesses is
provided via an address portion of the control bus 128 to the CCU 106.
The control bus 128 also provides for the exchange of control signals
between the IEU 104 and CCU 106 for managing data transfers. The IEU 104
utilizes the VMU 108 as a resource for converting virtual data addresses
into physical data addresses suitable for submission to the CCU 106. The
virtualizing portion of the data address is provided via the
arbitration, control and address bus 120 to the VMU 108. Unlike
operation with respect to the IFU 102, the VMU 108 returns the
corresponding physical address via the bus 120 to the IEU 104. In the
preferred embodiments of the architecture 100, the IEU 104 requires the
physical address for use in ensuring that load/store operations occur in
proper program stream order.
`
`[0085] The CCU 106 performs the generally conventional
`high-level function of determining whether physical address
`defined requests for data can be satisfied from the instruction
`and data caches 132, 134, as appropriate. Where the access
`request can be properly fulfilled by access to the instruction
`or data caches 132, 134, the CCU 106 coordinates and
`performs the data transfer via the data buses 114, 128.
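The CCU decision described here — serve a physically addressed request
from the on-chip cache when the line is present, otherwise forward it to
the MCU — can be sketched with a toy lookup. The 16-byte line size and
the dictionary-of-tags layout are illustrative assumptions, not details
from the specification:

```python
# Sketch of the [0085]/[0086] hit/miss decision: a physical address is
# reduced to a line tag; a present tag is a cache hit handled by the
# CCU, and a missing tag means the request goes to the MCU for a MAU
# access.
LINE_BITS = 4  # hypothetical 16-byte cache lines

class ToyCache:
    def __init__(self):
        self.lines = {}                 # line tag -> line data

    def lookup(self, paddr: int):
        tag = paddr >> LINE_BITS
        return self.lines.get(tag)      # None signals a miss

dcache = ToyCache()
dcache.lines[0x100] = b"\x00" * 16      # pretend this line was filled

print(dcache.lookup(0x1000) is not None)  # True: hit, served by the CCU
print(dcache.lookup(0x2000) is not None)  # False: miss, goes to the MCU
```

Any address within the same 16-byte line maps to the same tag, so a
single fill satisfies all subsequent accesses to that line until it is
evicted.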
`
`[0086] Where a data access request cannot be satisfied
`from the instruction or data caches 132, 134, the CCU 106
`provides the corresponding physical address to the MCU
`110 along with sufficient control information to identify
`whether a read or write access of the MAU 112 is desired,
`the source or destination cache 132, 134 of the CCU 106 for
`each