throbber
United States Patent
`Papworth et al.
`
`[19]
`
`11111111111111 iii 1111111111111111111 iiiii 111111111111111111111111111111111
`US005584038A
`[u] Patent Number:
`[45] Date tif Patent:
`
`5,584,038
`Dec. 10, 1996
`
`[54] ENTRY ALLOCATION IN A CIRCULAR
`BUFFER USING WRAP BITS INDICATING
`WHETHER A QUEUE OF THE CIRCULAR
`BUFFER HAS BEEN TRAVERSED
`
`[75]
`
`Inventors: David B. Papworth, Beaverton;
`Andrew F. Glew; Michael A.
`Fetterman, both of Hillsboro; Glenn J.
`Hinton; Robert P. Colwell, both of
`Portland; Steven J. Griffith, Aloha;
`Shantanu R. Gupta; Narayan Hegde,
`both of Beaverton, all of Oreg.
`
`[73] Assignee: Intel Corporation, Santa Clara, Calif.
`
`[21] Appl. No.: 633,905
`
`[22] Filed:
`
`Apr. 17, 1996
`
`Related U.S. Application Data
`
`[62] Division of Ser. No. 204,760, Mar. 1, 1994.
`
`Int. C1.6
`[51]
`[52] U.S. Cl.
`
`[58]
`
`Field of Search
`
`GO6F 12/00
`3951800; 364/251; 364/244.4;
`364/DIG. 1; 395/437; 395/250
`395/800, 375,
`395/437, 250
`
`[56]
`
`References Cited
`U.S. PATENT DOCUMENTS
`
`4,616,338 10/1986 Helen
`5,226,126
`7/1993 McFarland
`1/1995 Brunelle
`5,381,528
`9/1995 Nguyen
`5,450,547
`
`395/250
`3951800
`395/250
`395/250
`
`5,450,560
`9/1995. Bridges
`5,471,598
`11/1995 Quattromani
`5,473,756 12/1995 Traylor
`
`395/410
`395/449
`395/250
`
`OTHER PUBLICATIONS
`
`A Parallel, High Speed Circular Queue Structure by A. E.
`Barbour et al., 1990 IEEE, Circuits and Systems.
`
`Primary Examiner -Eric Coleman
`Attorney, Agent, or Firm - Blakely, Sokoloff, Taylor & Zaf-
`man
`
`[57]
`
`ABSTRACT
`
`An allocator assigns entries for a circular buffer. The allo-
`cator receives requests for storing data in entries of the
`circular buffer, and generates a head pointer to identify a
`starting entry in the circular buffer for which circular buffer
`entries are not allocated. In addition to pointing to an entry
`in the circular buffer, the head pointer includes a wrap bit.
`The allocator toggles the wrap bit each time the allocator
`traverses the linear queue of the circular buffer. A tail pointer
`is generated, including the wrap bit, to identify an ending
`entry in the circular buffer for which circular buffer entries
`are allocated. In response to the request for entries, the
`allocator sequentially assigns entries for the requests located
`between the head pointer and the tail pointer. The allocator
`has application for use in a microprocessor performing
`out -of -order dispatch anti speculative execution. The allo-
`cator is coupled to a reorder buffer, configured as a circular
`buffer, to permit allocation of entries. The allocator utilizes
`an all or nothing allocation policy, such that either all or no
`incoming instructions are allocated during an allocation
`period.
`
`8 Claims, 8 Drawing Sheets
`
`Memory
`160
`
`External Bu,
`
`Memary &uisp,es
`155
`
`MOB
`164
`
`Brant Target
`Buffa Circuit
`197
`
`&upmaler
`Remotion Cluster
`130 I
`
`Branch
`Huffer Cache
`108
`Brenda
`Instruction
`Pointer Table
`162
`
`ranch Addrea
`Calculator
`164
`
`Ou[opOrder
`Cluster -
`110
`
`A
`
`e
`
`(ea
`
`SEC
`
`e5
`
`116
`
`t
`
`Reorder
`Buffer
`120
`
`tRRF
`
`Bus Interface
`102
`
`Inswrfim
`Fetch Unit
`105
`
`Instruction
`
`109
`
`Allocato
`112
`
`aff rder
`acne Cluster
`103
`
`Mieroproeessor100
`
`Retirement
`>140e
`140
`
`4
`EXHIBIT
`Witness:
`R. COLWELL, PhD
`Date: 2 -27 -17
`Sudan Magee CSR No 17661
`
`PUMA Exhibit 2011
`APPLE v. PUMA, IPR 2016-01114
`Page 1 of 18
`
`

`

`U.S. Patent
`
`Dec. 10, 1996
`
`Sheet 1 of 8
`
`5,584,038
`
`Memory
`160
`
`101
`
`External Bus
`
`Memory Subsystem
`155
`
`MOB
`154
`
`Superscalar
`Execution Cluster
`130 I
`
`-150
`-1AG
`IEÚ-14--g
`
`-IFEUR47
`
`- 145
`
`115
`
`f
`
`Reorder
`Buffer
`120
`
`t-
`
`RRF
`128
`
`Retirement
`Logic
`140
`
`Bus Interface
`102
`
`Instruction
`Fetch Unit
`105
`
`Instruction
`Decoder
`109
`
`Allocator
`112
`
`1
`
`In -Order
`Issue Cluster
`103
`
`Branch Target
`Buffer Circuit
`107
`
`I
`
`Branch Target
`Buffer Cache
`108
`
`Branch
`Instruction
`Pointer Table
`162
`
`Branch Address
`Calculator
`164
`
`RAT
`111
`
`Out -of -Order
`Cluster -
`110
`
`Microprocessor 100
`
`Figure 1
`
`PUMA Exhibit 2011
`APPLE v. PUMA, IPR 2016-01114
`Page 2 of 18
`
`

`

`RONuke
`ROClear
`JEClear
`
`cr,
`1/4t
`
`eDró
`
`i-+
`
`tailptr
`
`RONuke
`ROClear
`
`120
`ROB
`
`headptT
`
`ALstall
`
`Figure 2
`
`headptr
`Pdsts
`
`ALstall
`
`Allocation
`
`ROB
`
`220
`
`Control
`Allot
`
`225
`
`ALROBstall
`
`Uop Decode
`
`210
`
`Allocator 112
`
`JEClear
`
`Execution Unit
`
`Branch
`
`230
`
`Pdsts
`
`Instruction
`Uops from
`
`Decoder
`
`109
`
`PUMA Exhibit 2011
`APPLE v. PUMA, IPR 2016-01114
`Page 3 of 18
`
`

`

`U.S. Patent
`
`Dec. 10, 1996
`
`Sheet 3 of 8
`
`5,584,038
`
`Tail tr
`
`Head . tr
`
`Figure 3
`
`PUMA Exhibit 2011
`APPLE v. PUMA, IPR 2016-01114
`Page 4 of 18
`
`

`

`U.S. Patent
`
`Dec. 10, 1996
`
`Sheet 4 of 8
`
`5,584,038
`
`intheadptr
`
`1
`
`tailptr
`
`ER...eset
`
`Compare
`and latch
`415
`
`A1Robstall
`
`+1
`
`+3
`
`+2
`1
`Latches
`410
`Pdst+2
`
`Pdst+3
`y
`
`Pdst+1
`
`.
`Latches
`425
`
`Clk# & r.w>
`AiRobstall
`
`intheheadptr
`
`Clk# & >
`
`A1Robstall
`or A1Stall
`
`IDuops
`
`V
`
`V
`Mug
`440
`
`Pdst(3)
`Pdst(2) To
`120
`Pdst(1)
`
`new
`intheheadptr
`
`intheadptr'
`
`Stall
`
`/Clk
`
`450
`
`60
`
`P
`
`1
`
`headptr to
`ROB120
`
`224
`
`Figure 4
`
`PUMA Exhibit 2011
`APPLE v. PUMA, IPR 2016-01114
`Page 5 of 18
`
`

`

`U.S. Patent
`
`Dec. 10, 1996
`
`Sheet 5 of 8
`
`5,584,038
`
`1
`Branch 1
`i
`Branch 2
`
`(Executed first &
`correctly predicted)
`
`(Executed second &
`mispredicted,
`Assert JEClear/
`flush pipeline)
`
`Figure 5a
`
`1
`Branch 1
`
`ranch 2
`
`(Executed second &
`mispredicted,
`Assert JEClear/
`flush pipeline)
`
`(Executed first,
`predicted correctly)
`
`Figure 5b
`
`PUMA Exhibit 2011
`APPLE v. PUMA, IPR 2016-01114
`Page 6 of 18
`
`

`

`U.S. Patent
`
`Dec. 10, 1996
`
`Sheet 6 of 8
`
`5,584,038
`
`Branch 1
`
`(Executed second &
`mispredicted,
`Assert JEClear/
`flush pipeline)
`
`(Executed first &
`mispredicted,
`flush pipeline)
`
`Figure 5c
`
`1
`Branch 1
`
`ranch 2
`
`(Executed first &
`mispredicted,
`Assert JEClear/
`flush pipeline)
`
`(Executed second &
`mispredicted,
`do not flush
`pipeline)
`
`Figure 5d
`
`PUMA Exhibit 2011
`APPLE v. PUMA, IPR 2016-01114
`Page 7 of 18
`
`

`

`U.S. Patent
`
`Dec. 10, 1996
`
`Sheet 7 of 8
`Cstart)
`
`5,584,038
`
`610
`
`Mispredicted
`branch
`detected
`9
`
`Yes
`
`- 620
`
`Past
`mispredicted branches
`outstanding
`9
`
`Yes
`
`- 630
`Current
`mispredicted branch
`issued subsequent to
`past mispredicted
`branch
`9
`
`Yes
`
`Do not assert
`JE clear
`
`' 640
`
`nd)
`( t
`
`Figure 6
`
`T
`Assert
`JE clear
`
`625 -
`
`PUMA Exhibit 2011
`APPLE v. PUMA, IPR 2016-01114
`Page 8 of 18
`
`

`

`U.S. Patent
`
`Dec. 10, 1996
`
`Sheet 8 of 8
`
`5,584,038
`
`-720
`
`Yes
`
`- 730
`
`Result
`
`No
`
`Exclusive OR past Pdst and
`current Pdst wrap bits
`
`- 760
`
`Yes
`
`-770
`
`No
`
`790
`
`Past mispredicted
`branch older
`
`1
`
`- 785
`
`C End)
`
`Figure 7
`
`PUMA Exhibit 2011
`APPLE v. PUMA, IPR 2016-01114
`Page 9 of 18
`
`

`

`5,584,038
`
`1
`ENTRY ALLOCATION IN A CIRCULAR
`BUFFER USING WRAP BITS INDICATING
`WHETHER A QUEUE OF THE CIRCULAR
`BUFFER HAS BEEN TRAVERSED
`
`This is a divisional of application Ser. No. 08/204,760,
`filed Mar. 1, 1994.
`
`FIELD OF THE INVENTION
`
`The present invention relates to resource allocation in a
`buffer, and more specifically to methods and apparatus for
`allocating entries in a circular buffer for operation in a
`microprocessor.
`
`BACKGROUND OF THE INVENTION
`
`5
`
`to
`
`15
`
`2
`It is another object of the present invention to provide
`entry allocation for a reorder buffer in a microprocessor
`performing out -of -order execution.
`It is another object of the present invention to provide
`entry allocation for a reorder buffer in a microprocessor
`performing speculative execution.
`These and other objects of the present invention are
`realized in an arrangement that includes a circular buffer and
`an allocator for allocating entries in the circular buffer. The
`allocator receives requests for storing data in entries of the
`circular buffer. The allocator generates a head pointer to
`identify a starting entry in the circular buffer for which
`circular buffer entries are not allocated. In addition to
`pointing to an entry in the circular buffer, the head pointer
`includes a wrap bit. The allocator toggles the wrap bit each
`time the allocator traverses the linear queue of the circular
`buffer. A tail pointer is generated, including the wrap bit, to
`identify an ending entry in the circular buffer for which
`circular buffer entries are allocated. In response to the
`request for entries, the allocator sequentially assigns entries
`for the requests located between the head pointer and the tail
`pointer.
`In order to allocate entries in the circular buffer, the
`allocator generates speculative addresses for each of the
`requests by utilizing the head pointer as a base address. The
`speculative addresses identify potential allocation entries in
`the circular buffer. The allocator compares the speculative
`addresses to the tail pointer. If a match between one of the
`speculative addresses and the tall pointer occurs, then the
`circular buffer does not contain enough unallocated entries
`to support all of the requests. Alternatively, if no match
`occurs between one of the speculative addresses and the tail
`pointer, then the circular buffer contains resources to support
`all of the requests. In either case, the appropriate head
`pointer is stored for reference in a subsequent allocation
`period.
`The allocator of the present invention has application for
`use in a microprocessor performing out -of -order execution
`and speculative execution. The microprocessor contains an
`instruction fetch and decoder circuits for issuing and decod-
`ing instructions, respectively, in the program order. The
`microprocessor also includes a superscalar execution dus-
`ter, containing a plurality of execution units, and an out -of-
`order cluster for performing out -of -order dispatch and
`execution. The out -of -order cluster contains a reorder buffer
`configured as a circular buffer. The reorder buffer stores data
`for use in execution in the superscalar execution unit. The
`allocator is coupled to the reorder buffer to permit allocation
`of entries.
`In a preferred embodiment, the allocator utilizes an all or
`nothing allocation policy. The allocator receives a plurality
`of instructions, entitled micro -ops, for execution in the
`microprocessor. The allocator determines whether the reor-
`der buffer contains enough unallocated entries to support
`each of the incoming micro-ops. If the reorder buffer con-
`tains enough entries, the allocator sequentially allocates
`physical destinations in the reorder buffer, including a wrap
`bit, for each micro -op so as to retain the original program
`order. Altematively, if the reorder buffer does not contains
`enough entries, the allocator does not allocate any of the
`nticro -ops in the allocation period.
`In order to support speculative execution in the micro-
`processor, the microprocessor contains a branch execution
`unit. In part, the branch execution unit determines the
`outcome for a branch instruction previously predicted.
`Under certain circumstances, if the branch instruction is
`
`In general, data processing systems of all types employ
`registers or buffers to store data. The buffers are constructed
`in a variety of ways depending upon the particular applica- 20
`tion for the data processing system. For example, buffers
`utilized for data processing systems include random access
`memory (RAM) and first -in- first -out (FIFO) buffers. In
`addition to containing buffers for storing data, data process-
`ing systems typically employ apparatus for allocating entries 25
`in the buffer to store data. Because buffer resources are
`finite, data processing systems deallocate buffer entries
`when data stored in a corresponding buffer entry is no longer
`needed. Although FIFO memory devices provide easy entry
`allocation, FIFO memory devices are not suitable for all data 30
`processing applications.
`Buffers are utilized in data processing systems to store
`input data, intermediate result data and output data. For
`example, in a digital processing system for audio or video
`compression, random access memory (RAM) may be
`employed to store input uncompressed data, intermediate
`data and final compressed output data. In such a digital
`processing system, it is necessary to reallocate buffer entries
`no longer needed in order to provide a continual flow of
`resources required for subsequent data processing. In addi-
`tion, it is necessary to reallocate entries in the buffer so as to
`preserve data entries still required by the data processing
`system. Consequently, allocation of buffer entries requires
`both preservation of certain existing data entries, and
`resource assignment to new buffer entries for incoming data.
`Microprocessors utilize buffers to implement file registers
`for use in conjunction with executing instructions. Typically,
`file registers store data for access by an execution unit, such
`as an arithmetic logic unit (ALU), in the microprocessor.
`The complexity of the register file and allocation required
`for operation of the microprocessor is dependent upon the
`architecture of the microprocessor. For example, in a micro-
`processor performing parallel execution, some instructions
`are executed out of the original program order. Conse- 55
`quently, allocation and deallocation of entries in the register
`file becomes complex.
`
`45
`
`so
`
`35
`
`SUMMARY AND OBJECTS OF THE
`INVENTION
`Therefore, it is an object of the present invention to
`allocate entries in a circular buffer without the need for
`complex logic.
`It is a further object of the present invention to allocate 6s
`entries in a circular buffer without overwriting valid data
`currently stored in the buffer.
`
`60
`
`PUMA Exhibit 2011
`APPLE v. PUMA, IPR 2016-01114
`Page 10 of 18
`
`

`

`5,584,038
`
`3
`mispredicted, then the branch execution unit flushes the
`pipeline of incoming instructions. In order to determine
`whether to flush the pipeline, the branch execution unit
`determines the relative age of two branch instructions by
`comparing the physical destinations and wrap bits of a first
`and second branch instruction. If the wrap bits are equal,
`then the branch execution unit designates the instruction
`comprising the smallest physical destination value as the
`oldest branch instruction. If the wrap bits are not equal,
`indicating the linear queue has been traversed, then the to
`branch execution unit designates the instruction comprising
`the largest physical destination value as the oldest branch
`instruction.
`Other objects, features and advantages of the present
`invention will be apparent from the accompanying draw- 15
`ings, and from the detailed description that follows below.
`
`5
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`The objects, features, and advantages of the present 20
`invention will be apparent from the following detailed
`description of the preferred embodiment of the invention
`with references to the following drawings.
`FIG. 1 illustrates a high level block diagram of an
`out -of -order execution and speculative execution micropro- 25
`cessor incorporating the teachings of the present invention.
`FIG. 2 illustrates ti high level block diagram of an
`allocator and associated units configured in accordance with
`the present invention.
`FIG. 3 illustrates a logical diagram of the ROB 120 buffer 30
`configured in accordance with the present invention.
`FIG. 4 illustrates a ROB allocation unit configured in
`accordance with the present invention.
`FIG. 5a, 56, 5c, 5d illustrate examples of how a branch 35
`execution unit execute branches both in and out of order in
`accordance with the present invention.
`FIG. 6 illustrates pipeline flush logic implemented in the
`branch execution unit of the present invention.
`FIG. 7 illustrates a method for determining the relative 40
`branch age configured in accordance with the present inven-
`tion.
`
`45
`
`DETAILED DESCRIPTION
`Methods and apparatus for allocating reorder buffer
`resources in a microprocessor performing out -of -order
`execution are disclosed. In the following description, for
`purposes of explanation, specific nomenclature is set forth to
`provide a thorough understanding of the present invention. 50
`However, it will be apparent to one skilled in the art that
`these specific details are not required to practice the present
`invention. In other instances, well known circuits and
`devices are shown in block diagram form to avoid obscuring
`the present invention unnecessarily.
`The present invention has application for use in a micro-
`processor performing out -of -order execution and specula-
`tive execution. FIG. 1 illustrates a high level block diagram
`of an out -of -order execution and speculative execution
`microprocessor incorporating the teachings of the present 60
`invention. The high level block diagram of FIG. 1 illustrates
`functional blocks of a pipelined microprocessor incorporat-
`ing the teachings of present invention. The microprocessor
`100 contains an in -order issue cluster 103, an out -of -order
`cluster 110, and a superscalar execution cluster 130. In as
`addition, microprocessor 100 contains a bus interface 102,
`coupled to an external bus 101, and a memory subsystem
`
`55
`
`4
`155 for interfacing the microprocessor 100 to external
`memory.
`The bus interface 102 interfaces the microprocessor 100
`to peripheral components, including memory, via the exter-
`nal bus 101. The memory subsystem 155 is coupled to the
`bus interface 102 and provides a memory interface to cache
`memory and main memory. In one embodiment, the bus
`interface 101 attempts to load or store data from a high speed
`cache memory. Alternatively, the bus interface 101 accesses
`a main memory over the external bus 101. The bus interface
`102 and memory subsystem 155 are intended to represent a
`broad category of interface devices which are well known in
`the art and will not be described further.
`The bus interface 102 is coupled to an instruction fetch
`unit 105 located in the in -order issue cluster 103. The
`instruction fetch unit 105 retrieves microprocessor instruc-
`tions, known as macro instructions, and operands for execu-
`tion in the microprocessor 100. In a preferred embodiment,
`the microprocessor 100 is implemented as a pipelined pro-
`cessor overlapping instruction fetch, instruction decode and
`instruction execution functions. The instruction fetch unit
`105 continually fetches macro instructions for the pipeline in
`the microprocessor 100. However, simple unconditional
`branch instructions within the instruction stream prevent the
`instruction fetch unit 105 from retrieving instructions in a
`purely sequential path. Furthermore, conditional branch
`instructions, within the instruction stream, prevent the
`instruction fetch unit 105 from retrieving instructions along
`a predetermined path because the condition requires reso-
`lution to ascertain the direction of the path.
`In order to continually input macro instructions into the
`pipeline of microprocessor 100, the microprocessor 100
`includes a branch prediction unit 107. The branch prediction
`unit 107 predicts the execution path of an instruction stream.
`In general, the branch prediction unit 107 predicts the
`existence of branch instructions within the instruction
`steam, and predicts the outcome of the branch. Conse-
`quently, as the macro instructions input to the pipeline
`proceed down the pipeline stages, the macro instructions are
`"speculatively" executed because of the uncertainty that the
`branch was properly predicted. The reorder buffer allocation
`of the present invention supports the operation of specula-
`tive execution in the microprocessor 100 as is described
`more fully below.
`The in -order cluster 103 includes a branch target buffer
`circuit 107. The branch target buffer circuit 107 receives a
`current instruction pointer (IP) from the instruction fetch
`unit 105. The branch target buffer circuit 107 contains the
`branch target buffer cache 108. The branch target buffer
`cache 108 stores information pertaining to the history of
`branch instructions previously executed. By utilizing the
`current instruction pointer (IP) and the internal cache, the
`branch target buffer circuit 107 predicts both the existence of
`branch instructions within the instruction stream, and a
`target address for the branch instructions predicted. After a
`branch instruction is resolved by the microprocessor 100,
`the branch target buffer circuit 107 records the branch target
`address and the outcome of the branch instruction in the
`branch target buffer cache 108 for future reference.
`Each entry in the branch rrrget buffer cache 108 is
`indexed by the address of the last byte of the branch
`instruction to allow the branch target buffer circuit 107 to
`reference previous branch instructions utilizing the instruc-
`tion pointer. When the branch target buffer circuit 107 finds
`an upcoming branch instruction in the branch target buffer
`cache 108, the branch target buffer circuit 107 predicts a
`
`PUMA Exhibit 2011
`APPLE v. PUMA, IPR 2016-01114
`Page 11 of 18
`
`

`

`5,584,038
`
`5
`
`1s
`
`5
`taken or not -taken branch outcome for the branch instruc-
`tion. If the branch target buffer circuit 107 predicts a taken
`branch outcome, then the branch target buffer circuit 107
`also predicts a branch target address.
`The macro instructions retrieved are input to an instme-
`lion decoder 109. In general, the instruction decoder 109
`decodes an operation code and source data for the corre-
`sponding macro instructions. In a preferred embodiment, the
`instruction decoder 109 receives Intel® architecture com-
`patible macro instructions, and determines the type of 10
`instruction received. The instruction decoder 109 breaks
`down the macro instruction into one or more micro- opera-
`tions (micro -ops) with associated micro- operands. The one
`or more micro -ops corresponding to the decoded macro
`instruction specify the equivalent function.
`If the instruction decoder 109 determines that a received
`is microprocessor instruction is a branch instruction, then
`the instruction decoder 109 passes certain information about
`the branch instruction to a branch address calculator 164 for
`special treatment. The branch address calculator 164 further 20
`processes each branch instruction based on information
`provided in the instruction and the operand. For example, if
`the branch target buffer circuit 107 made a branch predic-
`tion, the branch address calculator 164 attempts to verify the
`branch prediction.
`The instruction decoder 109 is coupled to an allocator
`112, also located within the in -order issue cluster 103. The
`micro -ops generated in the instruction decoder 109 are input
`to the allocator 112. In general, the allocator 112 allocates
`resources necessary to execute each micro -op. In the pre- 30
`fared embodiment, the microprocessor 100 performs out -
`of -order execution, wherein micro -ops may be executed out
`of the original program order. During retirement of the
`micro -ops, the original program order is restored. The allo-
`cation of resources to the out -of -order cluster is described 35
`below.
`The out -of -order cluster 110 contains a reservation station
`(RS) 115, a reorder buffer (ROB) 120, a real register file
`(CRRF) 128, and retirement logic 140. The ROB 120
`includes a reorder buffer and corresponding reorder logic.
`The ROB 120 provides capabilities for speculative execu-
`tion, register renaming and out -of -order execution for the
`microprocessor 100. In a preferred embodiment of the
`present invention, the ROB 120 is implemented as a multi- 45
`port register file. The ROB 120 is managed as a first in first
`out (FIFO) register file. Both source reads and reorder buffer
`write -backs operate on the ROB 120 as a register file. The
`RRF 128 comprises the architectural registers of the micro-
`processor 100.
`The ROB 120 contains a circular buffer to store n entries,
`wherein each entry stores the results of executed micro -ops.
`Because each entry in the ROB 120 provides a destination
`for micro -op result, each ROB 120 entry is referred to as a
`physical destination (Pdst). The Pdsts, within the ROB 120,
`are numbered 0 through n -1. Each Pdst in the ROB 120
`contains a valid bit, that indicates whether or not the
`micro-op result is valid, a micro -op result, a set of flags and
`a corresponding mask, a code, and fault data.
`If the allocator 112 allocates an entry in the ROB 120 for 60
`a micro -op associated with a branch instruction, the branch
`target buffer circuit 107 allocates a matching entry in a
`branch instruction- pointer table (BIT) 162. The branch tar-
`get buffer circuit 107 stores a "fall - through" address for the
`branch instruction, and processor state information into the 65
`branch instruction- pointer table (BIT) 162. The fall -through
`address identifies the address of the instruction immediately
`
`50
`
`25
`
`55
`
`6
`following the branch instruction. The information stored in
`the branch instruction- pointer table (BIT) 162 is later uti-
`lized by the instruction fetch unit 105, the branch address
`calculator 164 and the branch target buffer circuit 107.
`The ROB 120 supports out -of -order execution by allow-
`ing the superscalar execution unit 130 to complete execution
`of instructions, and write -back the results without regard to
`other instructions that use the same logical register. There-
`fore, as far as the superscalar execution unit 130 is con-
`cerned, micro -ops complete out -of- order. Subsequently, the
`retirement logic 140, also contained within the out -of -order
`cluster 110, reorders the executed micro operations into the
`original sequence issued by the in -order issue cluster 103. To
`support register renaming and out -of -order execution, a
`register alias table (RAT) 111, located in the in -order issue
`cluster 103, maintains a mapping of logical registers, located
`in the real register file 128, to physical registers allocated in
`the ROB 120. In addition, the ROB 120 supports speculative
`execution by buffering the results of the superscalar execu-
`tion cluster 130 before committing the results to an archi-
`tecturally visible state in the RRF 128.
`The ROB 120 is utilized to support register renaming. In
`general, register renaming involves allocating a new physi-
`cal register from a logical register, as the destination for a
`predefined architectural register. In microprocessor 100,
`register renaming renames logical registers associated with
`the REF 128 and allocates physical registers in the ROB
`120. Consequently, by renaming the registers, the supersca-
`lar execution cluster 130 executes different instructions in
`overlapping dock cycles even though the instructions utilize
`the same architectural register because different physical
`registers are allocated in the ROB 120 for each micro -op.
`The allocator 112 allocates an catty in ROB 120 for each
`micro -op. The allocator 112 allocates and deallocates entries
`in the ROB 120 in a FIFO manner. Upon allocation of a
`micro -op to a reorder buffer entry, the allocator 112 provides
`the reorder unit 120 with physical destination addresses to
`identify the allocation. Each physical destination in the ROB
`120 contains micro-op result data, flags a code for the result
`data, fault data, and a valid bit, which indicates whether or
`not the corresponding micro -op is valid. During the high
`phase of the system clock, the allocator 112 provides the
`three physical destination addresses to the reorder unit 120.
`In a subsequent low phase of the clock cycle, the in -order
`fetch and issue cluster 103 provides information to write
`entries into the ROB 120. Also, on the low phase of the clock
`cycle, ROB 120 entries receive data. In a preferred embodi-
`ment, up to four micro -ops are allocated in the ROB 120 in
`any given clock.
`For each micro -op, the allocator 112 allocates an entry in
`the reservation station (RS) 115. Each entry in the RS 115
`stores a valid bit, indicating validity of the corresponding
`entry, the micro -op instruction code, two source data entries
`and corresponding source data valid bits. In addition, the RS
`115 stores two physical source fields identifying the location
`of the source data if the entry is not valid, and a physical
`destination for the result of the micro -op. Upon allocation of
`entries in the RS 115 and ROB 120, each micro -op waits in
`the RS 115 for both available resource data and an available
`execution unit in the superscalar execution cluster 130.
`When the resource data and the appropriate execution unit
`are ready, the RS 115 dispatches the micro -op to the appro-
`priate execution unit in the superscalar execution cluster
`130.
`The out -of -order cluster 110 is coupled to the superscalar
`execution cluster 130. The superscalar execution cluster 130
`
`PUMA Exhibit 2011
`APPLE v. PUMA, IPR 2016-01114
`Page 12 of 18
`
`

`

`5,584,038
`
`5
`
`to
`
`7
`executes instructions utilizing source data stored in the ROB
`120 and the RRF 128. For the present embodiment, the
`superscalar execution cluster 130 comprises four execution
`units (AGU 150, IEU 149, PEU 147 and MIU 145). Spe-
`cifically, the superscalar execution cluster 130 contains an
`address generation unit AGU (150), an integer execution
`unit (IEU) 149, a floating point execution unit (FEU) 147,
`and a memory interface unit (M1U) 145. Upon execution of
`the micro -op in the superscalar execution unit 130, the
`corresponding execution unit writes the result data, the
`architectural flags, and any fault information in the appro-
`priate physical destination entry in the ROB 120.
`The retirement logic 140 retires the write -back results
`stored in the ROB 120 for each executed micro-op. In
`general, the retirement logic 140 retires the ROB 120 entries
`by evaluating the physical destination entries in the ROB
`120 in the order allocated The retirement logic 140 retires
`the physical destination entries by transferring write -back
`data into a corresponding logical register in the RRF 128 so
`as to commit the write -back data to the current architectural
`state of the microprocessor 100. Because the allocator 112
`allocates the physical destination entries in the ROB 120 in
`the original program order, and the retirement logic 140
`retires the physical destination entries in the same order, the
`original program order is maintained.
`In order to properly retire a micro -op, the retirement logic
`140 tests a valid bit in the corresponding ROB 120 entry to
`ascertain whether the corresponding entry contains a valid
`executed micro -op result. If the micro -op result is valid, as
`indicated by the valid bit, then the retirement logic 140
`checks the fault field of the corresponding Pdst to ascertain
`whether special handing is required. If the ROB 120 Pdst
`entry contains a valid executed micro -op result and no fault
`exists, then the executed micro -op result is committed to
`permanent architectural state in the RRF 128. The tail
`pointer is incremented for each micro -op retired. When the
`retirement logic 140 attempts to retire a branch micro -op, the
`retirement logic 140 tests the fault field of the corresponding
`Pdst entry to determine whether the branch micro-op was
`mispredicted. If the retirement logic 140 detects that the
`micro -op was mispredicted, the retirement logic 140 flushes
`all additional micro -ops issued subsequent to the mispre-
`dicted branch.
`As described above, the allocator 112 is contained within
`the in -order issue cluster 103 on the microprocessor 100.
`The allocator 112 interacts with both the in -order and
`out -of -order sections of the microprocessor 100. Specifi-
`cally, the allocator 112 interacts with instruction decoder
`109, register alias table (RAT) 111, branch target buffer
`(BTB) 107, the integer execution unit 149, ROB 120, RS
`115, and MOB 154. During each clock cycle of micropro-
`cessor 100, the allocator 112 prepares to allocate up to three
`ROB 120, RS 115, and load buffer (LB) entries within MOB
`154. In addition, the allocator 112 prepares to allocate up to
`two store block (SB) entries within MOB 154.
`In order to allocate the appropriate resources, the allocator
`112 generates pointers to the appropriate resources by
`decoding the micro -ops input from the instruction decoder
`109. The decoded micro -ops permit the allocator 112 to
`ascertain specific resources required for the micro -ops. In
`addition, the decoded micro -ops indicate a specific RS 115
`dispatch port. The decoded micro -ops contain a micro-op
`valid bit that permits the allocator 112 to further qualify
`resources required. Based on the resources required and
`validity of the micro-ops, the allocator 112 ascertains the
`availability of resources for the micro -ops.
`FIG. 2 illustrates a high level block diagram incorporating
`the teachings of the present invention. In part, FIG. 2
`
`8
`illustrates a portion of the allocator 112 utilized to allocate
`resources to the ROB 120. In addition, FIG. 2 illustrates
`ROB 120 and a branch execution unit 230. In order to
`allocate resources for the ROB 120, the allocator 112
`contains a micro -op decoder 210 and ROB allocation unit
`220. The micro-op decoder 210 analyzes the incoming
`stream of micro -ops in order to determine whether a RS 115,
`LB or SB entries are required. In a preferred embodiment,
`each micro -op requires one ROB 120 buffer entry.
`The entries in the ROB 120 are entitled physical desti-
`nation addresses (Pdst), as opposed to logical addresses that
`are the source and destination fields of the m

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket