`Case 3:14-cv-00757-REP-DJN Document 47-2 Filed 01/12/15 Page 1 of 50 Page|D# 1100
`
`
`
EXHIBIT B
[Document 47-2, Page 3 of 50, PageID# 1102 — drawing sheet: FIG. 1, a simplified
block diagram of the microprocessor architecture 100, showing the IFU 102,
IEU 104, CCU 106, VMU 108, MCU 110, and MAU 112, with the instruction cache,
data cache, and port switch; the drawing graphics are not reproducible as text.]
`
[Document 47-2, Page 7 of 50, PageID# 1106 — Patent Application Publication,
May 3, 2007, Sheet 5 of 14, US 2007/0101103 A1: FIG. 5, a block diagram of the
instruction execution unit 104, including the EDecode, carry checker,
dependency checker, register rename, instruction issuer, done control, retire
control, register array, bypass control, and functional units FU0 through FUn;
the drawing graphics are not reproducible as text.]
`
[Document 47-2, Page 9 of 50, PageID# 1108 — Patent Application Publication,
May 3, 2007, Sheet 7 of 14, US 2007/0101103 A1: FIG. 8, an illustration of a
boolean register set (rb[31:0], rc[31:0]); FIG. 9, the IFU and IEU interface
signals to the CCU 106 (IF PADDR, VM PADDR, IF DATA, Ex PADDR, Ex DATA, and
related request, busy, and ready control lines); and FIG. 17, a simplified
block diagram of the cache control unit, showing the ICACHE, DCACHE, and the
MCU address, control, instruction, and data buses; the drawing graphics are
not reproducible as text.]
`
`
[Document 47-2, Page 10 of 50, PageID# 1109 — Patent Application Publication,
May 3, 2007, Sheet 8 of 14, US 2007/0101103 A1: FIG. 10, a detailed block
diagram of the primary integer processing data path, including the alignment
units, multiplexers, temporary buffer, register array, bypass unit, router,
ALU0, ALU1, shifter, and the integer and floating point functional units; the
drawing graphics are not reproducible as text.]
`
`
[Document 47-2, Page 13 of 50, PageID# 1112 — drawing sheet: FIG. 13, a
detailed block diagram of the load/store unit 760, including the register
file 472, address latches, a 4x4 comparator 772, the CCU data, address, and
control interfaces, and the VMU 108 virtual and physical address connections;
the drawing graphics are not reproducible as text.]
`
[Document 47-2, Page 15 of 50, PageID# 1114 — drawing sheet, apparently
FIG. 15: the virtual memory control unit, including the VMU control logic 820,
CAM 802, space ID, virtual page, and physical page fields, and the IF/Ex VM
request, address, read/write, and control signals; the drawing graphics are
not reproducible as text.]
`
`
US 2007/0101103 A1
May 3, 2007
`
HIGH-PERFORMANCE SUPERSCALAR-BASED
COMPUTER SYSTEM WITH OUT-OF-ORDER
INSTRUCTION EXECUTION AND CONCURRENT
RESULTS DISTRIBUTION
`
CROSS-REFERENCE TO RELATED
APPLICATIONS

[0001] This application is a continuation of U.S. patent application
Ser. No. 09/393,662, filed Sep. 10, 1999, now allowed, entitled
High-Performance, Superscalar-Based Computer System with Out-of-Order
Instruction Execution and Concurrent Results Distribution, which is a
continuation of U.S. patent application Ser. No. 09/158,568, filed
Sep. 22, 1998, now U.S. Pat. No. 6,038,653, which is a continuation of
U.S. patent application Ser. No. 08/716,728, filed Sep. 23, 1996, now
U.S. Pat. No. 5,832,292, which is a continuation of U.S. patent
application Ser. No. 08/397,016, filed Mar. 1, 1995, now U.S. Pat. No.
5,560,032, which is a continuation of U.S. patent application Ser. No.
07/817,809, filed Jan. 8, 1992, now abandoned, which is a continuation
of U.S. patent application Ser. No. 07/727,058, filed Jul. 8, 1991, now
abandoned. Each of the above-referenced applications is incorporated by
reference in its entirety herein.

[0002] The present application is related to the following Applications,
all assigned to the Assignee of the present Application:

[0003] 1. High-Performance, Superscalar-Based Computer System with
Out-of-Order Instruction Execution, invented by Nguyen et al.,
application Ser. No. 08/602,021, filed Feb. 15, 1996, now allowed, which
is a continuation of application Ser. No. 07/817,810, filed Jan. 8,
1992, now U.S. Pat. No. 5,539,911, which is a continuation of Ser. No.
07/727,006, filed Jul. 8, 1991;

[0004] 2. RISC Microprocessor Architecture with Isolated Architectural
Dependencies, invented by Nguyen et al., application Ser. No.
08/292,177, filed Aug. 18, 1994, which is a continuation of Ser. No.
07/817,807, filed Jan. 8, 1992, which is a continuation of Ser. No.
07/726,744, filed Jul. 8, 1991;

[0005] 3. RISC Microprocessor Architecture Implementing Multiple Typed
Register Sets, invented by Garg et al., application Ser. No. 07/726,773,
filed Jul. 8, 1991, now U.S. Pat. No. 5,493,687;

[0006] 4. RISC Microprocessor Architecture Implementing Fast Trap and
Exception State, invented by Nguyen et al., application Ser. No.
08/345,333, filed Nov. 21, 1994, now U.S. Pat. No. 5,481,685, which is a
continuation of Ser. No. 08/171,968, filed Dec. 23, 1993, which is a
continuation of Ser. No. 07/817,811, filed Jan. 8, 1992, which is a
continuation of Ser. No. 07/726,942, filed Jul. 8, 1991;

[0007] 5. Page Printer Controller Including a Single Chip Superscalar
Microprocessor with Graphics Functional Units, invented by Lentz et al.,
application Ser. No. 08/267,646, filed Jun. 28, 1994, now U.S. Pat. No.
5,394,515, which is a continuation of Ser. No. 07/817,813, filed Jan. 8,
1992, which is a continuation of Ser. No. 07/726,929, filed Jul. 8,
1991; and

[0008] 6. Microprocessor Architecture with a Switch Network for Data
Transfer Between Cache, Memory Port, and IOU, invented by Lentz et al.,
application Ser. No. 07/726,893, filed Jul. 8, 1991, now U.S. Pat. No.
5,440,752.
`
`BACKGROUND OF THE INVENTION
`
`[0009] 1. Field of the Invention
`
[0010] The present invention is generally related to the design of RISC
type microprocessor architectures and, in particular, to a RISC
microprocessor architecture that may be readily expanded for increased
computational throughput through the addition of functional computing
elements, including those tailored for a particular computational
function, into the architecture.
`
`[0011] 2. Background
`
[0012] Recently, the design of microprocessor architectures has matured
from the use of Complex Instruction Set Computer (CISC) to simpler
Reduced Instruction Set Computer (RISC) architectures. The CISC
architectures are notable for the provision of substantial hardware to
implement and support an instruction execution pipeline. The typical
conventional pipeline structure includes, in fixed order, instruction
fetch, instruction decode, data load, instruction execute and data store
stages. A performance advantage is obtained by the concurrent execution
of different portions of a set of instructions through the respective
stages of the pipeline. The longer the pipeline, the greater the number
of execution stages available and the greater the number of instructions
that can be concurrently executed.
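The pipelining advantage described in [0012] can be sketched with a toy
timing model. The stage names come from the paragraph; the cycle
arithmetic is an illustrative idealization (no stalls), not a claim from
the specification:

```python
# Toy model of a classic five-stage pipeline: with N instructions and
# S stages, fully overlapped execution takes N + S - 1 cycles instead
# of the N * S cycles needed when instructions run one at a time.
STAGES = ["fetch", "decode", "load", "execute", "store"]

def serial_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    # One instruction must drain completely before the next begins.
    return n_instructions * n_stages

def pipelined_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    # One instruction enters the pipeline per cycle; the last one
    # still needs n_stages cycles to drain.
    return n_instructions + n_stages - 1

print(serial_cycles(100))     # 500 cycles without overlap
print(pipelined_cycles(100))  # 104 cycles with full overlap
```

The gap between the two counts is the concurrency the paragraph
describes; a longer pipeline widens it, at the cost of the hazards
discussed next.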
`
[0013] Two general problems limit the effectiveness of CISC pipeline
architectures. The first problem is that conditional branch instructions
may not be adequately evaluated until a prior condition code setting
instruction has substantially completed execution through the pipeline.
`
[0014] Thus, the subsequent execution of the conditional branch
instruction is delayed, or stalled, resulting in several pipeline stages
remaining inactive for multiple processor cycles. Typically, the
condition codes are written to a condition code register, also referred
to as a processor status register (PSR), only at completion of
processing an instruction through the execution stage. Thus, the
pipeline must be stalled with the conditional branch instruction in the
decode stage for multiple processor cycles pending determination of the
branch condition code. The stalling of the pipeline results in a
substantial loss of through-put. Further, the average through-put of the
computer will be substantially dependent on the mere frequency of
conditional branch instructions occurring closely after the condition
code setting instructions in the program instruction stream.
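The throughput dependence on branch frequency can be made concrete with
a small model. The stall count and branch fraction below are invented
numbers for illustration; the specification gives no such figures:

```python
# Toy model of paragraph [0014]: an ideal pipeline retires one
# instruction per cycle, but every conditional branch that closely
# follows a condition-code-setting instruction stalls the pipeline for
# some number of dead cycles while the PSR value is resolved.
def effective_throughput(n_instr: int, branch_frac: float,
                         stall_per_branch: int) -> float:
    # Total cycles = one per instruction plus the dead cycles
    # contributed by stalled branches.
    cycles = n_instr + n_instr * branch_frac * stall_per_branch
    return n_instr / cycles

print(effective_throughput(1000, 0.0, 4))  # 1.0 instr/cycle, no branches
print(effective_throughput(1000, 0.2, 4))  # ~0.56 instr/cycle
```

Even a modest branch density cuts sustained throughput well below one
instruction per cycle, which is the loss the paragraph attributes to
pipeline stalls.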
`
[0015] A second problem arises from the fact that instructions closely
occurring in the program instruction stream will tend to reference the
same registers of the processor register file. Data registers are often
used as the destination or source of data in the store and load stages
of successive instructions. In general, an instruction that stores data
to the register file must complete processing through at least the
execution stage before the load stage processing of a subsequent
instruction can be allowed to access the register file. Since the
execution of many instructions requires multiple processor cycles in the
single execution stage to produce store data, the entire pipeline is
typically stalled for the duration of an execution stage operation.
Consequently, the execution through-put of the computer is substantially
dependent on the internal order of the instruction stream being
executed.
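The register conflict in [0015] is a read-after-write hazard, and the
check a pipeline must perform can be sketched in a few lines. The
register names and the dictionary encoding of an instruction are
illustrative assumptions, not the patent's representation:

```python
# Toy read-after-write hazard check: a later instruction's load stage
# must wait (or take a bypass) if it reads a register that an earlier
# instruction has not yet written back to the register file.
def raw_hazard(earlier: dict, later: dict) -> bool:
    # earlier/later: {"dest": "r1", "srcs": ["r2", "r3"]}
    return earlier["dest"] in later["srcs"]

add_instr  = {"dest": "r1", "srcs": ["r2", "r3"]}  # r1 <- r2 + r3
load_instr = {"dest": "r4", "srcs": ["r1", "r5"]}  # reads r1 too early

print(raw_hazard(add_instr, load_instr))  # True: stall or bypass needed
print(raw_hazard(load_instr, add_instr))  # False: no conflict
```

When the check fires and no bypass path exists, the whole pipeline
stalls for the producer's remaining execution cycles, which is exactly
the order-dependent throughput loss the paragraph describes.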
`
[0016] A third problem arises not so much from the execution of the
instructions themselves, but from the maintenance of the hardware
supported instruction execution environment, or state-of-the-machine, of
the microprocessor itself. Contemporary CISC microprocessor hardware
subsystems can detect the occurrence of trap conditions during the
execution of instructions. Traps include hardware interrupts, software
traps and exceptions. Each trap requires execution of a corresponding
trap handling routine by the processor. On detection of the trap, the
execution pipeline must be cleared to allow the immediate execution of
the trap handling routine. Simultaneously, the state-of-the-machine must
be established as of the precise point of occurrence of the trap; the
precise point occurring at the conclusion of the first currently
executing instruction for interrupts and traps and immediately prior to
an instruction that fails due to an exception. Subsequently, the
state-of-the-machine and, again depending on the nature of the trap, the
executing instruction itself must be restored at the completion of the
handling routine. Consequently, with each trap or related event, a
latency is introduced by the clearing of the pipeline at both the
inception and conclusion of the handling routine and the storage and
return of the precise state-of-the-machine, with a corresponding
reduction in the through-put of the processor.
`
[0017] These problems have been variously addressed in an effort to
improve the potential through-put of CISC architectures. Assumptions can
be made about the proper execution of conditional branch instructions,
thereby allowing pipeline execution to tentatively proceed in advance of
the final determination of the branch condition code. Assumptions can
also be made as to whether a register will be modified, thereby allowing
subsequent instructions to also be tentatively executed. Finally,
substantial additional hardware can be provided to minimize the
occurrence of exceptions that require execution of handling routines and
thereby reduce the frequency of exceptions that interrupt the processing
of the program instruction stream.
`
[0018] These solutions, while obviously introducing substantial
additional hardware complexities, also introduce distinctive problems of
their own. The continued execution of instructions in advance of a final
resolution of either a branch condition or register file store access
requires that the state-of-the-machine be restorable to any of multiple
points in the program instruction stream, including the location of the
conditional branch, each modification of a register file, and for any
occurrence of an exception; potentially to a point prior to the fully
completed execution of the last several instructions. Consequently, even
more supporting hardware is required and, further, must be particularly
designed not to significantly increase the cycle time of any pipeline
stage.
`
[0019] RISC architectures have sought to avoid many of the foregoing
problems by drastically simplifying the hardware implementation of the
microprocessor architecture. In the extreme, each RISC instruction
executes in only three pipelined program cycles including a load cycle,
an execution cycle, and a store cycle. Through the use of load and store
data bypassing, conventional RISC architectures can essentially execute
a single instruction per cycle in the three stage pipeline.
`
[0020] Whenever possible, hardware support in RISC architectures is
minimized in favor of software routines for performing the required
functions. Consequently, the RISC architecture holds out the hope of
substantial flexibility and high speed through the use of a simple
load/store instruction set executed by an optimally matched pipeline.
And in practice, RISC architectures have been found to benefit from the
balance between a short, high-performance pipeline and the need to
execute substantially greater numbers of instructions to implement all
required functions.
`
[0021] The design of the RISC architecture generally avoids or minimizes
the problems encountered by CISC architectures with regard to branches,
register references and exceptions. The pipeline involved in a RISC
architecture is short and optimized for speed. The shortness of the
pipeline minimizes the consequences of a pipeline stall or clear as well
as minimizing the problems in restoring the state-of-the-machine to an
earlier execution point.
`
[0022] However, significant through-put performance gains over the
generally realized present levels cannot be readily achieved by the
conventional RISC architecture. Consequently, alternate, so-called
superscalar architectures have been variously proposed. These
architectures generally attempt to execute multiple instructions
concurrently and thereby proportionately increase the through-put of the
processor. Unfortunately, such architectures are, again, subject to
similar, if not the same, conditional branch, register referencing, and
exception handling problems as encountered by CISC architectures.
`
[0023] A particular problem encountered by conventional superscalar
architectures is that their inherent complexity generally precludes
modification of the architecture without substantial redesign of
foundational aspects of the architecture. The handling of multiple
concurrent instruction executions imposes substantial control
constraints on the architecture in order to maintain certainty of the
correctness of the execution of an instruction stream. Indeed, the
execution of some instructions may complete before that of instructions
earlier in the program instruction stream. Consequently, the control
logic that manages even the fundamental aspects of instruction execution
must often be redesigned to allow for architectural modifications that
affect the execution flow of any particular instruction.
`
`BRIEF SUMMARY OF THE INVENTION
`
[0024] Thus, a general purpose of the present invention is to provide a
high-performance, RISC based, superscalar processor architecture
suitable for ready architectural enhancement through the addition and
alteration of computation augmenting functional units.
`
[0025] This purpose is obtained in the present invention through the
provision of a microprocessor architecture that includes an instruction
fetch unit for fetching instruction sets from an instruction store and
an execution unit that implements the concurrent execution of a
plurality of instructions through a parallel array of functional units.
The fetch unit generally maintains a predetermined number of
instructions in an instruction buffer. The execution unit includes an
instruction selection unit, coupled to the instruction buffer, for
selecting instructions for execution, and a plurality of functional
units for performing instruction specified functional operations.
`
[0026] The instruction selection unit preferably includes an instruction
decoder and related logic, coupled to the instruction buffer, for
determining the availability of instructions for execution, and an
instruction scheduler, coupled to each of the functional units for
determining their respective execution status, for scheduling the
initiation of the processing of instructions through the functional
units. The instruction scheduler schedules instructions determined to be
available for execution and for which the instruction scheduler
determines at least one of the functional units implementing a necessary
computational function is available.
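The scheduling policy in [0026] — issue an available instruction only
when an idle functional unit implements its required operation — can be
sketched as follows. The data layout, unit names, and operation names
are invented for illustration; the patent does not prescribe this
representation:

```python
# Minimal sketch of the [0026] scheduling policy: walk the ready
# instructions in program order and issue each one to the first idle
# functional unit that implements its operation; otherwise it waits.
def schedule(ready_instructions, functional_units):
    # functional_units: list of {"ops": set_of_op_names, "busy": bool}
    issued = []
    for instr in ready_instructions:
        for i, fu in enumerate(functional_units):
            if not fu["busy"] and instr["op"] in fu["ops"]:
                fu["busy"] = True            # claim the unit this cycle
                issued.append((instr["id"], i))
                break
    return issued

fus = [{"ops": {"add", "sub"}, "busy": False},
       {"ops": {"fmul"}, "busy": False}]
insts = [{"id": 0, "op": "add"},
         {"id": 1, "op": "fmul"},
         {"id": 2, "op": "add"}]
print(schedule(insts, fus))  # [(0, 0), (1, 1)]; instr 2 waits for FU 0
```

Because the check is purely "is some unit with this capability free,"
adding or modifying a functional unit only changes the unit list handed
to the scheduler, which mirrors the extensibility argument made in
[0027].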
`
[0027] Consequently, an advantage of the present invention is that the
execution unit may be readily modified with respect to any desired
modification of the functions performed by any or all of the functional
units, including due to the modification of the function performed by a
predetermined one of said functional units and due to the provision of
additional functional units. The modification and addition of functional
units essentially requires only a corresponding modification of the
instruction scheduler to account for the difference in instructions that
may be executed by each modified or added functional unit.
`
[0028] Another advantage of the present invention is that the
architecture provides for multiple execution data paths through the
execution unit, where each execution data path is generally optimized
for the type of computational function that is to be performed on the
data: integer, floating point, and boolean.
`
[0029] A further advantage of the present invention is that the number,
type and computational specifics of the functional units provided in
each data path, and as between data paths, are mutually independent.
Alteration of function or an increase in the number of functional units
in a data path will have no architectural impact on the other data path
functional units.
`
[0030] Still another advantage of the present invention is that the
instruction scheduler is a unified unit in that it schedules
instructions for all of the functional units, regardless of the number
of data paths implemented in the execution unit and the number or
diversity of functions implemented by the data path most suited for
execution of a given instruction.
`
BRIEF DESCRIPTION OF THE
DRAWINGS/FIGURES

[0031] These and other advantages and features of the present invention
will become better understood upon consideration of the following
detailed description of the invention when considered in connection with
the accompanying drawings, in which like reference numerals designate
like parts throughout the figures thereof, and wherein:
`
[0032] FIG. 1 is a simplified block diagram of the preferred
microprocessor architecture implementing the present invention;

[0033] FIG. 2 is a detailed block diagram of the instruction fetch unit
constructed in accordance with the present invention;

[0034] FIG. 3 is a block diagram of the program counter logic unit
constructed in accordance with the present invention;

[0035] FIG. 4 is a further detailed block diagram of the program counter
data and control path logic;

[0036] FIG. 5 is a simplified block diagram of the instruction execution
unit of the present invention;

[0037] FIG. 6A is a simplified block diagram of the register file
architecture utilized in a preferred embodiment of the present
invention;

[0038] FIG. 6B is a graphic illustration of the storage register format
of the temporary buffer register file as utilized in a preferred
embodiment of the present invention;

[0039] FIG. 6C is a graphic illustration of the primary and secondary
instruction sets as present in the last two stages of the instruction
FIFO unit of the present invention;

[0040] FIGS. 7A, 7B and 7C provide a graphic illustration of the
reconfigurable states of the primary integer register set as provided in
accordance with a preferred embodiment of the present invention;

[0041] FIG. 8 is a graphic illustration of a reconfigurable floating
point and secondary integer register set as provided in accordance with
the preferred embodiment of the present invention;

[0042] FIG. 9 is a graphic illustration of a tertiary boolean register
set as provided in a preferred embodiment of the present invention;

[0043] FIG. 10 is a detailed block diagram of the primary integer
processing data path portion of the instruction execution unit
constructed in accordance with the preferred embodiment of the present
invention;

[0044] FIG. 11 is a detailed block diagram of the primary floating point
data path portion of the instruction execution unit constructed in
accordance with a preferred embodiment of the present invention;

[0045] FIG. 12 is a detailed block diagram of the boolean operation data
path portion of the instruction execution unit as constructed in
accordance with the preferred embodiment of the present invention;

[0046] FIG. 13 is a detailed block diagram of a load/store unit
constructed in accordance with the preferred embodiment of the present
invention;

[0047] FIG. 14 is a timing diagram illustrating the preferred sequence
of operation of a preferred embodiment of the present invention in
executing multiple instructions in accordance with the present
invention;

[0048] FIG. 15 is a simplified block diagram of the virtual memory
control unit as constructed in accordance with the preferred embodiment
of the present invention;

[0049] FIG. 16 is a graphic representation of the virtual memory control
algorithm as utilized in a preferred embodiment of the present
invention; and

[0050] FIG. 17 is a simplified block diagram of the cache control unit
as utilized in a preferred embodiment of the present invention.
`
`
`
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`
`Table of Contents
`
`I. Microprocessor Architectural Overview
`
`II. Instruction Fetch Unit
`
`[0051] A. IFU Data Path
`
`[0052] B. IFU Control Path
`
`[0053] C. IFU/IEU Control Interface
`
`[0054] D. PC Logic Unit Detail
`
`[0055] 1. PF and ExPC Control/Data Unit Detail
`
`[0056] 2. PC Control Algorithm Detail
`
`[0057] E. Interrupt and Exception Handling
`
`[0058] 1. Overview
`
`[0059] 2. Asynchronous Interrupts
`
`[0060] 3. Synchronous Exceptions
`
`[0061] 4. Handler Dispatch and Return
`
`[0062] 5. Nesting
`
`[0063] 6. List of Traps
`
`III. Instruction Execution Unit
`
[0064] A. IEU Data Path Detail
`
`[0065] 1. Register File Detail
`
`[0066] 2. Integer Data Path Detail
`
`[0067] 3. Floating Point Data Path Detail
`
`[0068] 4. Boolean Register Data Path Detail
`
`[0069] B. Load/Store Control Unit
`
[0070] C. IEU Control Path Detail
`
`[0071] 1. EDecode Unit Detail
`
`[0072] 2. Carry Checker Unit Detail
`
`[0073] 3. Data Dependency Checker Unit Detail
`
`[0074] 4. Register Rename Unit Detail
`
`[0075] 5. Instruction Issuer Unit Detail
`
`[0076] 6. Done Control Unit Detail
`
`[0077] 7. Retirement Control Unit Detail
`
`[0078] 8. Control Flow Control Unit Detail
`
`[0079] 9. Bypass Control Unit Detail
`
`IV. Virtual Memory Control Unit
`
`V. Cache Control Unit
`
`VI. Summary/Conclusion
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`
`I. Microprocessor Architectural Overview
`
[0080] The architecture 100 of the present invention is generally shown
in FIG. 1. An Instruction Fetch Unit (IFU) 102 and an Instruction
Execution Unit (IEU) 104 are the principal operative elements of the
architecture 100. A Virtual Memory Unit (VMU) 108, Cache Control Unit
(CCU) 106, and Memory Control Unit (MCU) 110 are provided to directly
support the function of the IFU 102 and IEU 104. A Memory Array Unit
(MAU) 112 is also provided as a generally essential element for the
operation of the architecture 100, though the MAU 112 does not directly
exist as an integral component of the architecture 100. That is, in the
preferred embodiments of the present invention, the IFU 102, IEU 104,
VMU 108, CCU 106, and MCU 110 are fabricated on a single silicon die
utilizing a conventional 0.8 micron design rule low-power CMOS process
and comprising some 1,200,000 transistors. The standard processor or
system clock speed of the architecture 100 is 40 MHz. However, in
accordance with a preferred embodiment of the present invention, the
internal processor clock speed is 160 MHz.
`
[0081] The IFU 102 is primarily responsible for the fetching of
instructions, the buffering of instructions pending execution by the IEU
104, and, generally, the calculation of the next virtual address to be
used for the fetching of next instructions.
`
[0082] In the preferred embodiments of the present invention,
instructions are each fixed at a length of 32 bits. Instruction sets, or
"buckets" of four instructions, are fetched by the IFU 102
simultaneously from an instruction cache 132 within the CCU 106 via a
128 bit wide instruction bus 114. The transfer of instruction sets is
coordinated between the IFU 102 and CCU 106 by control signals provided
via a control bus 116. The virtual address of an instruction set to be
fetched is provided by the IFU 102 via an IFU combined arbitration,
control and address bus 118 onto a shared arbitration, control and
address bus 120 further coupled between the IEU 104 and VMU 108.
Arbitration for access to the VMU 108 arises from the fact that both the
IFU 102 and IEU 104 utilize the VMU 108 as a common, shared resource. In
the preferred embodiment of the architecture 100, the low order bits
defining an address within a physical page of the virtual address are
transferred directly by the IFU 102 to the Cache Control Unit 106 via
the control lines 116. The virtualizing, high order bits of the virtual
address supplied by the IFU 102 are provided by the address portion of
the buses 118, 120 to the VMU 108 for translation into a corresponding
physical page address. For the IFU 102, this physical page address is
transferred directly from the VMU 108 to the Cache Control Unit 106 via
the address control lines 122 one-half internal processor cycle after
the translation request is placed with the VMU 108.
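The address split in [0082] — the low-order within-page bits go straight
to the cache controller while the high-order virtualizing bits are
translated by the VMU — can be sketched as follows. The 4 KB page size
and the toy page table are assumptions for illustration only; the
paragraph does not state these parameters:

```python
# Sketch of the [0082] address split: the page offset bypasses
# translation, while the virtual page number is looked up (here, in a
# plain dict standing in for the VMU) to form the physical address.
PAGE_BITS = 12  # hypothetical 4 KB pages

def split_vaddr(vaddr: int) -> tuple[int, int]:
    # Returns (virtual page number, offset within the page).
    return vaddr >> PAGE_BITS, vaddr & ((1 << PAGE_BITS) - 1)

def translate(vaddr: int, page_table: dict) -> int:
    vpage, offset = split_vaddr(vaddr)
    ppage = page_table[vpage]          # the VMU's page translation
    return (ppage << PAGE_BITS) | offset

print(hex(translate(0x12345678, {0x12345: 0xABCDE})))  # 0xabcde678
```

Only the upper bits change under translation, which is why the cache can
begin its lookup with the untranslated offset while the VMU resolves the
page in parallel.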
`
[0083] The instruction stream fetched by the IFU 102 is, in turn,
provided via an instruction stream bus 124 to the IEU 104. Control
signals are exchanged between the IFU 102 and the IEU 104 via control
lines 126. In addition, certain instruction fetch addresses, typically
those requiring access to the register file present within the IEU 104,
are provided back to the IFU via a target address return bus within the
control lines 126.
`
[0084] The IEU 104 stores and retrieves data with respect to a data
cache 134 provided within the CCU 106 via an 80-bit wide bi-directional
data bus 130. The entire physical address for IEU data accesses is
provided via an address portion of the control bus 128 to the CCU 106.
The control bus 128 also provides for the exchange of control signals
between the IEU 104 and CCU 106 for managing data transfers. The IEU 104
utilizes the VMU 108 as a resource for converting virtual data addresses
into physical data addresses suitable for submission to the CCU 106. The
virtualizing portion of the data address is provided via the
arbitration, control and address bus 120 to the VMU 108. Unlike
operation with respect to the IFU 102, the VMU 108 returns the
corresponding physical address via the bus 120 to the IEU 104. In the
preferred embodiments of the architecture 100, the IEU 104 requires the
physical address for use in ensuring that load/store operations occur in
proper program stream order.
`
`[0085] The CCU 106 performs the generally conventional
`high-level function of determining whether physical address
`defined requests for data can be satisfied from the instruction
`and data caches 132, 134, as appropriate. Where the access
`request can be properly fulfilled by access to the instruction
`or data caches 132, 134, the CCU 106 coordinates and
`performs the data transfer via the data buses 114, 128.
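The CCU decision described here — serve a physically addressed request
from the on-chip cache when the line is present, otherwise forward it to
the MCU — can be sketched with a toy lookup. The 16-byte line size and
the dictionary-of-tags layout are illustrative assumptions, not details
from the specification:

```python
# Sketch of the [0085]/[0086] hit/miss decision: a physical address is
# reduced to a line tag; a present tag is a cache hit handled by the
# CCU, and a missing tag means the request goes to the MCU for a MAU
# access.
LINE_BITS = 4  # hypothetical 16-byte cache lines

class ToyCache:
    def __init__(self):
        self.lines = {}                 # line tag -> line data

    def lookup(self, paddr: int):
        tag = paddr >> LINE_BITS
        return self.lines.get(tag)      # None signals a miss

dcache = ToyCache()
dcache.lines[0x100] = b"\x00" * 16      # pretend this line was filled

print(dcache.lookup(0x1000) is not None)  # True: hit, served by the CCU
print(dcache.lookup(0x2000) is not None)  # False: miss, goes to the MCU
```

Any address within the same 16-byte line maps to the same tag, so a
single fill satisfies all subsequent accesses to that line until it is
evicted.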
`
`[0086] Where a data access request cannot be satisfied
`from the instruction or data caches 132, 134, the CCU 106
`provides the corresponding physical address to the MCU
`110 along with sufficient control information to identify
`whether a read or write access of the MAU 112 is desired,
`the source or destination cache 132, 134 of the CCU 106 for
`each