ORIGINATE DATE: 24 September, 2001
EDIT DATE: 4 September, 2015
DOCUMENT-REV. NUM.: GEN-CXXXXX-REVA
PAGE: 1 of 35
Author: Laurent Lefebvre
Issue To:          Copy No:

*** © ATI Confidential. Reference Copyright Notice on Cover Page © ***

R400 Sequencer Specification
SEQ
Version 1.24

Overview: This is an architectural specification for the R400 Sequencer block (SEQ). It provides an overview of the
required capabilities and expected uses of the block. It also describes the block interfaces, internal sub-blocks,
and provides internal state diagrams.

AUTOMATICALLY UPDATED FIELDS:
Document Location: C\perforcer400\archidoc\gik(RE\R400_Sequencer.dec
Current Intranet Search Title: R400 Sequencer Specification

APPROVALS
Name/Dept          Signature/Date

Remarks:

THIS DOCUMENT CONTAINS CONFIDENTIAL INFORMATION THAT COULD BE SUBSTANTIALLY DETRIMENTAL TO THE INTEREST OF ATI
TECHNOLOGIES INC. THROUGH UNAUTHORIZED USE OR DISCLOSURE.

"Copyright 2001, ATI Technologies Inc. All rights reserved. The material in this document constitutes an unpublished
work created in 2001. The use of this copyright notice is intended to provide notice that ATI owns a copyright in this
unpublished work. The copyright notice is not an admission that publication has occurred. This work contains
confidential, proprietary information and trade secrets of ATI. No part of this document may be used, reproduced, or
transmitted in any form or by any means without the prior written permission of ATI Technologies Inc."

ATI 2018
LG v. ATI
IPR2015-00325

AMD1044_0256912

ATI Ex. 2106
IPR2023-00922
Page 1 of 223

Table Of Contents

1. OVERVIEW
1.1 Top Level Block Diagram
1.2 Data Flow graph
1.3 Control Graph
2. INTERPOLATED DATA BUS
3. INSTRUCTION STORE
4. SEQUENCER INSTRUCTIONS
5. CONSTANT STORE
6. LOOPING AND BRANCHES
6.1 The controlling state
6.2 The Control Flow Program
6.3 Data dependant predicate instructions
6.4 HW Detection of PV, PS
6.5 Register file indexing
6.6 Predicated instruction support for Texture clauses
6.7 Debugging the Shaders
6.7.1 Method 1: Debugging registers
6.7.2 Method 2: Exporting the values in the GPRs (12)
6.7.3 Method 3: Selective export of a 32 bit Dword
7. PIXEL KILL MASK
8. HOS SURFACES
9. REGISTER FILE ALLOCATION
10. FETCH ARBITRATION
11. ALU ARBITRATION
12. HANDLING STALLS
13. CONTENT OF THE RESERVATION STATION FIFOS
14. THE OUTPUT FILE
15. [illegible] FORMAT
16. THE PARAMETER CACHE
17. VERTEX POSITION EXPORTING
18. EXPORTING ARBITRATION
19. REAL TIME COMMANDS
20. REGISTERS
20.1 Control
20.2 Context
21. DEBUG REGISTERS
21.1 Control
21.2 Context
22. INTERFACES
22.1 External Interfaces
22.1.1 PA/SC to SP0: IJ bus
22.1.2 PA/SC to SEQ: IJ Control bus
22.1.3 SEQ to SP0: Interpolator bus
22.1.4 SEQ to SP0: Parameter Cache bus
22.1.5 SEQ to SX0: Parameter Cache Mux control Bus
22.1.6 SX0 to SP0: Parameter Cache Return bus
22.1.7 VGT to SP0/SEQ: Vertex Bus
22.1.8 CP to SEQ: Constant store load
22.1.9 CP to SEQ: Fetch State store load
22.1.10 CP to SEQ: Control State store load
22.1.11 MH to SEQ: Instruction store Load
22.1.12 SP0 to SX0: Pixel read from RBs
22.1.13 SEQ to SX0: Control bus
22.1.14 SX0 to SEQ: Output file control
22.1.15 SP0 to SX0: Position return bus
22.1.16 Shader Engine to Fetch Unit Bus (Fast Bus)
22.1.17 Sequencer to Fetch Unit bus (Slow Bus)
23. EXAMPLES OF PROGRAM EXECUTIONS
23.1.1 Sequencer Control of a Vector of Vertices
23.1.2 Sequencer Control of a Vector of Pixels
23.1.3 [title illegible]
24. OPEN ISSUES

Revision Changes:

Rev 0.1 (Laurent Lefebvre)
Date: May 7, 2001
First draft.

Rev 0.2 (Laurent Lefebvre)
Date: July 9, 2001
Changed the interfaces to reflect the changes in the SP. Added some details in the arbitration section.

Rev 0.3 (Laurent Lefebvre)
Date: August 6, 2001
Reviewed the Sequencer spec after the meeting on August 3, 2001.

Rev 0.4 (Laurent Lefebvre)
Date: August 24, 2001
Added the dynamic allocation method for register file and an example (written in part by Vic) of the flow of
pixels/vertices in the sequencer.

Rev 0.5 (Laurent Lefebvre)
Date: September 7, 2001
Added timing diagrams (Vic).

Rev 0.6 (Laurent Lefebvre)
Date: September 24, 2001
Changed the spec to reflect the new R400 architecture. Added interfaces.

Rev 0.7 (Laurent Lefebvre)
Date: October 5, 2001
Added constant store management, instruction store management, control flow management and data dependant
predication.

Rev 0.8 (Laurent Lefebvre)
Date: October 8, 2001
Changed the control flow method to be more flexible. Also updated the external interfaces.

Rev 0.9 (Laurent Lefebvre)
Date: October 17, 2001
Incorporated changes made in the 10/18/01 control flow meeting. Added a NOP instruction, removed the
conditional_execute_or_jump. Added debug registers.

Rev 1.0 (Laurent Lefebvre)
Date: October 19, 2001
Refined interfaces to RB. Added state registers.

Rev 1.1 (Laurent Lefebvre)
Date: October 26, 2001
Added SEQ to SP0 interfaces. Changed delta precision. Changed VGT to SP0 interface. Debug Methods added.

Rev 1.2 (Laurent Lefebvre)
Date: November 16, 2001
Interfaces greatly refined. Cleaned up the spec.

1. Overview
The sequencer first arbitrates between vectors of 64 vertices that arrive directly from primitive assembly and vectors
of 16 quads (64 pixels) that are generated in the raster engine.

The vertex or pixel program specifies how many GPRs it needs to execute. The sequencer will not start the next
vector until the needed space is available.

The sequencer is based on the R300 design. It chooses two ALU clauses and a fetch clause to execute, and
executes all of the instructions in a clause before looking for a new clause of the same type. Two ALU clauses are
executed interleaved to hide the ALU latency. Each vector will have eight fetch and eight ALU clauses, but clauses do
not need to contain instructions. A vector of pixels or vertices ping-pongs along the sequencer FIFO, bouncing from
fetch reservation station to ALU reservation station. A FIFO exists between each reservation stage, holding up vectors
until the vector currently occupying a reservation station has left. A vector at a reservation station can be chosen to
execute. The sequencer looks at all eight ALU reservation stations to choose an ALU clause to execute and all eight
fetch stations to choose a fetch clause to execute. The arbitrator will give priority to clauses/reservation stations
closer to the bottom of the pipeline. It will not execute an ALU clause until the fetches initiated by the previous
fetch clause have completed. There are two separate sets of reservation stations, one for pixel vectors and one for
vertex vectors. This way a pixel can pass a vertex and a vertex can pass a pixel.
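
The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Station` class, its field names, and the convention that index 7 is the station closest to the bottom of the pipeline are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Station:
    """One ALU reservation station (hypothetical model, not from the spec)."""
    occupied: bool = False         # a vector is waiting at this station
    fetches_pending: bool = False  # fetches from the previous fetch clause are still outstanding


def pick_alu_station(stations: Sequence[Station]) -> Optional[int]:
    """Return the index of the station to execute next, or None if none is eligible.

    Assumption: stations are indexed from the top of the pipeline, so the
    highest eligible index is the one closest to the bottom and wins.
    """
    for index in range(len(stations) - 1, -1, -1):
        station = stations[index]
        if station.occupied and not station.fetches_pending:
            return index
    return None
```

With stations 2 and 5 occupied, station 5 is chosen unless its fetches are still outstanding, in which case station 2 runs instead.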

To support the shader pipe the raster engine also contains the shader instruction cache and constant store. There
is only one constant store for the whole chip and one instruction store. These will be shared among the four shader
pipes. The four shader pipes also execute the same instruction, thus there is only one sequencer for the
whole chip.
1.1 Top Level Block Diagram

[Figure: top level block diagram. A vertex/pixel vector arbitrator, followed by a possible delay for available GPRs,
feeds a chain of texture clause reservation stations 0 to 7 alternating with ALU clause reservation stations 0 to 7,
with a FIFO between consecutive stations. Arbitrator blocks select among the stations.]

There are two sets of the above figure, one for vertices and one for pixels.

Depending on the arbitration state, the sequencer will either choose a vertex or a pixel packet. The control packet
consists of 243 bits of state, 6-7 bits for the base address of the Shader program and some information on the
coverage to determine fetch LOD plus other various small state bits.
On receipt of a packet, the input state machine (not pictured but just before the first FIFO) allocates enough space in
the registers to store the interpolated values and temporaries. Following this, the input state machine stacks the
packet in the first FIFO.

On receipt of a command, the level 0 fetch machine issues a texture request and corresponding register
address for the fetch address (ta). A small command (tcmd) is passed to the fetch system identifying the current level
number (0) as well as the register write address for the fetch return data. One fetch request is sent every 4 clocks
causing the texturing of sixteen 2x2s worth of data (or 64 vertices). Once all the requests are sent the packet is put in
FIFO 1.

Upon receipt of the return data, the fetch unit writes the data to the register file using the write address that was
provided by the level 0 fetch machine and sends the clause number (0) to the level 0 fetch state machine to signify
that the write is done and thus the data is ready. Then, the level 0 fetch machine increments the counter of FIFO 1 to
signify to the ALU 1 that the data is ready to be processed.

On receipt of a command, the level 0 ALU machine first decrements the input FIFO counter and then issues a
complete set of level 0 shader instructions. For each instruction, the state machine generates 3 source addresses,
one destination address (3 cycles later) and an instruction. Once the last instruction has been issued, the packet is put
into FIFO 2.

There will always be two active ALU clauses at any given time (and two arbiters). One arbiter
will arbitrate over the odd instructions (4 clock cycles) and the other one will arbitrate over the even
instructions (4 clock cycles). The only constraint between the two arbiters is that they are not
allowed to pick the same clause number as the other one is currently working on if the packet is not of the
same type (render state).
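
As a rough illustration of the constraint between the two ALU arbiters, the check below treats a pick as legal when the clause numbers differ, or when they match and both packets share the same render state. The function names and the integer encoding of the render state are hypothetical; the alternation of 4-clock slots between the "even" and "odd" arbiter is likewise an assumed reading of the paragraph above.

```python
from typing import Optional


def arbiter_for_slot(slot: int) -> str:
    """Assumed slot ownership: 4-clock instruction slots alternate between the two arbiters."""
    return "even" if slot % 2 == 0 else "odd"


def may_pick(clause: int, render_state: int,
             other_clause: Optional[int], other_render_state: Optional[int]) -> bool:
    """Return True if an arbiter may pick (clause, render_state).

    other_clause / other_render_state describe what the other arbiter is
    currently working on, or None if it is idle.
    """
    if other_clause is None:
        return True   # nothing to conflict with
    if clause != other_clause:
        return True   # different clause numbers never conflict
    # Same clause number: only allowed when the packets are of the same type.
    return render_state == other_render_state
```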

If the packet is a vertex packet, upon reaching ALU clause 3, it can export the position if the position is ready. So the
arbiter must prevent ALU clause 3 from being selected if the positional buffer is full (or can't be accessed). Along
with the positional data, the location where the vertex data is to be put is also sent (parameter data pointers).

{ISSUE: How do we handle parameter cache pointers (computed, semi-computed or not computed)?}

A special case is for HOS surfaces which can export 12 parameters per last 6 clauses to the output buffer. If the output
buffer is full or doesn't have enough space the sequencer will prevent such a vertex group from entering an exporting
clause.

Regular pixel and vertex shaders can export 12 parameters to memory from the last clause only (7).

All other levels process in the same way until the packet finally reaches the last ALU machine (7). On completion of the
level 7 ALU clause, a valid bit is sent to the Render Backend which picks up the color data. This requires that the last
instruction writes to the output register, a condition that is almost always true. If the packet was a vertex packet,
instead of sending the valid bit to the RB, it is sent to the PA so it can know that the data present in the parameter
store is valid.

Only two ALU state machines may have access to the register file address bus or the instruction decode bus at one
time. Similarly, only one fetch state machine may have access to the register file address bus at one time. Arbitration
is performed by three arbiter blocks (two for the ALU state machines and one for the fetch state machines).

The arbiters always favor the higher number state machines, preventing a bunch of half finished jobs from
clogging up the register files.
1.2 Data Flow graph

[Figure: data flow graph. Recoverable labels: Register File, MAC, pipeline stage, scalar input/output, texture
request, texture address, "to Primitive Assembly Unit or Render Backend".]

The gray area represents blocks that are replicated 4 times per shader pipe (16 times on the overall chip).

1.3 Control Graph

[Figure: control graph between the SEQ, FETCH and SP0 blocks. Recoverable signal labels: Clause # + Rdy, WrAddr,
CMD, CST, WrVec, WrScal, RdAddr, Phase.]

In green is represented the Fetch control interface, in red the ALU control interface, in blue the Interpolated/Vector
control interface and in purple is the output file control interface.

2. Interpolated data bus
The interpolators contain an IJ buffer to pack the information as much as possible before writing it to the register file.

[Figure: interpolators. Recoverable labels: an IJs buffer (ping-pong buffer), 16 quads, double-buffered, 4096 bits,
organized as 32 x 128; a second ping-pong buffer of 24 bits * 16 quads * 2 = 768 bits, organized as 32 x 24; output
to RB.]

[Figure: example of a received tile and its packing into the IJ buffer; contents not recoverable.]

Above is an example of a tile we might receive. The IJ information is packed in the IJ buffer 2 quads at a time. The
sequencer allows at any given time as many as four quads to interpolate a parameter. They all have to come from the
same primitive. Then the sequencer controls the write mask to the register to write the valid data in.

3. Instruction Store
There is going to be only one instruction store for the whole chip. It is likely to be a 1 port memory; we use 1
clock to load the ALU instruction, 1 clock to load the Fetch instruction, 1 clock to load 2 control flow instructions and
1 clock to write instructions.

It will contain 4096 instructions of 96 bits each.
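
The figures above can be checked with a small worked example. The capacity arithmetic follows directly from the text; the order in which the four uses of the single port are scheduled within a 4-clock period is an assumption made for illustration, as are all names in the sketch.

```python
INSTRUCTION_COUNT = 4096   # from the text
INSTRUCTION_BITS = 96      # from the text

# Total capacity of the instruction store.
STORE_BITS = INSTRUCTION_COUNT * INSTRUCTION_BITS
STORE_BYTES = STORE_BITS // 8

# One 96-bit line holds two 48-bit control flow instructions,
# which is why a single clock can load two of them.
CONTROL_FLOW_PER_LINE = INSTRUCTION_BITS // 48

# Assumed ordering of the four single-port uses within a 4-clock period.
PORT_SCHEDULE = ("alu_read", "fetch_read", "control_flow_read", "write")


def port_user(clock: int) -> str:
    """Return which client owns the single memory port on a given clock."""
    return PORT_SCHEDULE[clock % len(PORT_SCHEDULE)]
```

Under these numbers the store holds 393216 bits, or 48 KiB.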

The instruction store is loaded by the CP thru the INSTRUCTION DATA, INSTRUCTION INDEX PORT control
registers. The INSTRUCTION INDEX PORT is auto-incremented on both reads and writes to the
INSTRUCTION DATA register.

The next picture shows the various modes the CP can load the memory. The Sequencer has to keep track of the
loading modes in order to wrap around the correct boundaries. The MSB of the INSTRUCTION INDEX PORT
register contains the packet type for the sequencer to know where it must wrap around. The wrap around points are
arbitrary and they are specified in the VERTEX SHADER BASE and PIXEL SHADER BASE registers.

[Figure: the CP's view of the instruction memory for the different loading modes, showing the vertex shader (VS) and
pixel shader (PS) code regions; details not recoverable.]

4. Sequencer Instructions
All control flow instructions and move instructions are handled by the sequencer only. The ALUs will perform NOPs
during this time (MOV PV,PV, PS,PS).

5. Constant Stores
The sequencer is aware of where the constants are using a remapping table. A likely size for the constant store is
1024x128 bits. The read BW from the constant store is 128 bits/clock and the write bandwidth is 32/4
bits/clock.

In order to do constant store indexing, the sequencer must be loaded first with the indexes (that come from the
GPRs). There are 144 wires from the exit of the SP to the sequencer (9 bit pointers x 16 vertices/clock). Since the
data must pass thru the Shader pipe for the float to fixed conversion, there is a latency of 4 clocks (1 instruction)
between the time the sequencer is loaded and the time one can index into the constant store. The assembly will look
like this:

MOVA R1.X,R2.X       // Loads the sequencer with the content of R2.X, also copies the content of R2.X into R1.X
NOP                  // latency of the float to fixed conversion
ADD  R3,R4,C0[R2.X]  // Uses the state from the sequencer to add R4 to C0[R2.X] into R3

Note that we don't really care about what is in the brackets because we use the state from the MOVA instruction.
R2.X is just written again for the sake of simplicity.

The storage needed in the sequencer in order to support this feature is 2*64*9 bits = 1152 bits.
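
The wire count and storage figures quoted in this section follow from simple arithmetic, reproduced below as a check. The constant names are illustrative only. Reading the factor of 2 as "two interleaved programs" is an assumption, and the observation that loading 64 indexes at 16 per clock takes 4 clocks is a derived remark rather than a statement from the specification.

```python
POINTER_BITS = 9          # width of one constant store index (from the text)
VERTICES_PER_CLOCK = 16   # indexes delivered by the SP each clock (from the text)
VECTOR_SIZE = 64          # pixels or vertices per vector (from the overview)
INDEX_SETS = 2            # factor of 2 in the text; assumed to be two interleaved programs

# Wires from the exit of the SP to the sequencer.
wires = POINTER_BITS * VERTICES_PER_CLOCK

# Storage needed in the sequencer to hold the loaded indexes.
storage_bits = INDEX_SETS * VECTOR_SIZE * POINTER_BITS

# Clocks needed to deliver one index per element of a full vector.
clocks_to_load = VECTOR_SIZE // VERTICES_PER_CLOCK

print(wires, storage_bits, clocks_to_load)
```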

The texture state is also kept in a similar memory. The size of this memory is 192x128, which lets us load a texture
state in ???

The control flow constant memory doesn't sit behind a renaming table. It is register mapped and thus the driver must
reload its content each time there is a state change.

6. Looping and Branches
Loops and branches are planned to be supported and will have to be dealt with at the sequencer level. We plan on
supporting constant loops and branches using a control program.

6.1 The controlling state
As per Dx the following state is available for control flow:

Boolean[15:0]
loop_count[7:0][7:0]
In addition:
loop_start[7:0][7:0]
loop_step[7:0][7:0]
exist to give more control to the controlling program.

We will extend that in the R400 to:
Boolean[255:0]
Loop_count[7:0][15:0]
Loop_Start[7:0][15:0] times 2-3 (one for constant, register1, register2)
Loop_Step[7:0][15:0] times 2-3 (one for constant, register1, register2)
Loop_End[7:0][15:0]

{ISSUE: How is the controlling state loaded and how many contexts do we have?}

We have a stack of 4 elements for calling subroutines and 4 loop counters to allow for nested loops.

6.2 The Control Flow Program
The R300 uses a match method for control flow: the shader is executed, and at every instruction its address is
compared with addresses (or address?) in a control table. The "event" in the control table can redirect operations in
the program.

The method chosen for the R400 is a "control program". The control program has ten basic instructions:

Execute
Conditional_execute
Conditional_Execute_Predicates
Conditional_jump
Call
Return
Loop_start
Loop_end
End_of_clause
NOP

Execute causes the specified number of instructions in instruction store to be executed.
Conditional_execute checks a condition first, and if true, causes the specified number of instructions in instruction
store to be executed.
Loop_start resets the corresponding loop counter to the start value on the first pass after it checks for the end
condition and if met jumps over to a specified address.
Loop_end increments (decrements?) the loop counter and jumps back the specified number of instructions.
Call jumps to an address and pushes the IP counter on the stack. On the return instruction, the IP is popped from the
stack.
Conditional_execute_or_Jump executes a block of instructions or jumps to an address if the condition is not met.
Conditional_execute_Predicates executes a block of instructions if all bits in the predicate vectors meet the condition.
End_of_clause marks the end of a clause.
Conditional_jump jumps to an address if the condition is met.
NOP is a regular NOP.

NOTE THAT ALL JUMPS MUST JUMP TO EVEN CFP ADDRESSES. Thus the compiler must insert NOPs where
needed to align the jumps on even CFP addresses.
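
One way a compiler could satisfy the even-address rule is to pad with a NOP just before any jump target that would otherwise land on an odd address. The sketch below is hypothetical: the representation of a control flow program as a list of mnemonics and the `targets` set of original indices are assumptions made for illustration, not the actual compiler's data structures.

```python
from typing import Dict, List, Set, Tuple


def align_jump_targets(program: List[str], targets: Set[int]) -> Tuple[List[str], Dict[int, int]]:
    """Insert NOPs so every jump target lands on an even CFP address.

    program: control flow instructions in order (mnemonics only).
    targets: indices into `program` that some jump, call or loop refers to.
    Returns the padded program and a map from original index to new address,
    which the caller would use to patch the jump address fields.
    """
    padded: List[str] = []
    new_address: Dict[int, int] = {}
    for index, instruction in enumerate(program):
        if index in targets and len(padded) % 2 == 1:
            padded.append("NOP")          # push the target to the next even address
        new_address[index] = len(padded)
        padded.append(instruction)
    return padded, new_address
```

For example, a target originally at address 3 is preceded by one NOP and moves to address 4, while targets that already sit on even addresses are left alone.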

Also if the jump is logically bigger than pshader_cntl_size (or vshader_cntl_size) we break the program (clause) and
set the debug registers. If an execute or conditional_execute is lower than cntl_size or bigger than size we also break
the program (clause) and set the debug registers.

We have to fit instructions into 48 bits in order to be able to put two control flow instructions per line in the
instruction store.

Execute
| 47         | 46...42 | 41...24  | 23...12           | 11...0       |
| Addressing | 00001   | RESERVED | Instruction count | Exec Address |
Execute up to 4k instructions at the specified address in the instruction memory.
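
To make the 48-bit format concrete, here is a minimal packing sketch for the Execute instruction. It assumes the field layout bit 47 = Addressing, bits 46...42 = opcode 00001, bits 41...24 reserved, bits 23...12 = instruction count and bits 11...0 = exec address; the function names and the dictionary returned by the decoder are illustrative, not part of the specification.

```python
OPCODE_EXECUTE = 0b00001


def encode_execute(addressing: int, instruction_count: int, exec_address: int) -> int:
    """Pack an Execute control flow instruction into a 48-bit integer."""
    if addressing not in (0, 1):
        raise ValueError("addressing is a single bit")
    if not 0 <= instruction_count < (1 << 12):
        raise ValueError("instruction count must fit in 12 bits")
    if not 0 <= exec_address < (1 << 12):
        raise ValueError("exec address must fit in 12 bits")
    return ((addressing << 47)
            | (OPCODE_EXECUTE << 42)
            | (instruction_count << 12)   # bits 23...12
            | exec_address)               # bits 11...0; bits 41...24 stay zero (reserved)


def decode(word: int) -> dict:
    """Split a 48-bit control flow word into the fields shared by Execute-style formats."""
    return {
        "addressing": (word >> 47) & 0x1,
        "opcode": (word >> 42) & 0x1F,
        "instruction_count": (word >> 12) & 0xFFF,
        "exec_address": word & 0xFFF,
    }
```

A 12-bit address field reaches all 4096 entries of the instruction store, which matches the "up to 4k instructions" wording.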
NOP
| 47         | 46...42 | 41...0   |
| Addressing | 00010   | RESERVED |
This is a regular NOP.

Conditional_Execute
| 47         | 46...42 | 41...34         | 33        | 32...24  | 23...12           | 11...0       |
| Addressing | 00011   | Boolean address | Condition | RESERVED | Instruction count | Exec Address |
If the specified boolean (8 bits can address 256 booleans) meets the specified condition then execute the specified
instructions (up to 4k instructions).

Conditional_Execute_Predicates
| 47         | 46...42 | 41...38          | 37        | 36...24  | 23...12           | 11...0       |
| Addressing | 00100   | Predicate vector | Condition | RESERVED | Instruction count | Exec Address |
Check the AND/OR of all current predicate bits. If AND/OR matches the condition execute the specified number of
instructions. We need to AND/OR this with the kill mask in order not to consider the pixels that aren't valid.
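
The interaction between the predicate reduction and the kill mask can be sketched as follows. This is one plausible reading, not the documented hardware behaviour: it assumes a 64-bit predicate vector, a `valid` mask in which a 1 bit marks a pixel that has not been killed, and that invalid pixels are neutralised before the reduction (forced to 1 for AND, forced to 0 for OR). All names are hypothetical.

```python
VECTOR_WIDTH = 64
ALL_ONES = (1 << VECTOR_WIDTH) - 1


def reduce_and(predicates: int, valid: int) -> bool:
    """AND of the predicate bits over valid pixels only."""
    # Invalid pixels are forced to 1 so they cannot make the AND fail.
    return ((predicates | ~valid) & ALL_ONES) == ALL_ONES


def reduce_or(predicates: int, valid: int) -> bool:
    """OR of the predicate bits over valid pixels only."""
    # Invalid pixels are forced to 0 so they cannot satisfy the OR.
    return (predicates & valid & ALL_ONES) != 0


def should_execute(predicates: int, valid: int, use_and: bool, condition: bool) -> bool:
    """Execute the block when the masked reduction matches the Condition bit."""
    reduced = reduce_and(predicates, valid) if use_and else reduce_or(predicates, valid)
    return reduced == condition
```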

Loop_Start
| 47         | 46...42 | 41...16  | 15...4       | 3...0   |
| Addressing | 00101   | RESERVED | Jump address | Loop ID |
Loop Start. Compares the loop count with the end value. If loop condition not met jump to the address. Forward jump
only. Also computes the index value.

Loop_End
| 47         | 46...42 | 41...16  | 15...4        | 3...0   |
| Addressing | 00111   | RESERVED | Start address | Loop ID |
Loop end. Increments the counter by one and jumps BACK only to the start of the loop.

The way this is described does not prevent nested loops, and the inclusion of the loop id makes this easy to do.

Call
| 47         | 46...42 | 41...12  | 11...0  |
| Addressing | 01000   | RESERVED | Address |
Jumps to the specified address and pushes the IP counter on the stack.

Return
| 47         | 46...42 | 41...0   |
| Addressing | 01001   | RESERVED |
Pops the topmost address from the stack and jumps to that address. If nothing is on the stack, the program will just
continue to the next instruction.
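
The Call and Return behaviour, together with the 4-element stack mentioned in the controlling state section, can be modelled with a short sketch. Two points are assumptions and not stated in the specification: that the pushed value is the address of the instruction after the Call, and that a fifth nested Call is treated as an error. The class and method names are illustrative.

```python
class ControlFlowStack:
    """Hypothetical model of the 4-entry subroutine stack."""

    DEPTH = 4

    def __init__(self) -> None:
        self._entries = []

    def call(self, ip: int, target: int) -> int:
        """Push the return address and return the next IP (the call target)."""
        if len(self._entries) >= self.DEPTH:
            # Overflow behaviour is not specified; raising is an assumption.
            raise OverflowError("call stack holds only 4 elements")
        self._entries.append(ip + 1)   # assumed: resume after the Call instruction
        return target

    def ret(self, ip: int) -> int:
        """Pop and return the saved address, or fall through if the stack is empty."""
        if self._entries:
            return self._entries.pop()
        return ip + 1                  # empty stack: continue to the next instruction
```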

Conditional_Jump
| 47         | 46...42 | 41...34         | 33        | 32...13  | 12      | 11...0  |
| Addressing | 01010   | Boolean address | Condition | RESERVED | FW only | Address |
If condition met, jumps to the address. FORWARD jump only allowed if bit 12 set. Bit 12 is only an optimization for the
compiler and should NOT be exposed to the API.
End_of_Clause
| 47         | 46...42 | 41...0   |
| Addressing | 01011   | RESERVED |
Marks the end of a clause.

To prevent infinite loops, we will keep 9 bit loop counters instead of 8 (we are only able to loop 256 times). If the
counter goes higher than 255 then the loop_end or the loop_start instruction is going to break the loop and set the
debug registers.
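
The reason for the extra counter bit is that an 8-bit counter silently wraps from 255 back to 0, so a runaway loop would be indistinguishable from a fresh one. A minimal sketch of the check, with hypothetical names, looks like this:

```python
COUNTER_BITS = 9
MAX_LEGAL_COUNT = 255   # largest value an 8-bit loop count can express


def advance(counter: int):
    """Increment a 9-bit loop counter.

    Returns (new_counter, must_break). must_break is True once the counter
    exceeds 255, the point at which loop_start or loop_end would break the
    loop and set the debug registers.
    """
    new_counter = (counter + 1) & ((1 << COUNTER_BITS) - 1)
    return new_counter, new_counter > MAX_LEGAL_COUNT
```

With only 8 bits the same increment would give `(255 + 1) & 0xFF == 0`, and the overflow would go unnoticed.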

The basic model is as follows:

The render state defines the clause boundaries:
Vertex_shader_fetch[7:0][7:0] // eight 8 bit pointers to the location where each clause's control program is located
Vertex_shader_alu[7:0][7:0]   // eight 8 bit pointers to the location where each clause's control program is located
Pixel_shader_fetch[7:0][7:0]  // eight 8 bit pointers to the location where each clause's control program is located
Pixel_shader_alu[7:0][7:0]    // eight 8 bit pointers to the location where each clause's control program is located

A pointer value of FF means that the clause doesn't contain any instructions.

The control program for a given clause is executed to completion before moving to another clause (with the
exception of the pick two nature of the ALU execution). The control program is the only program aware of the clause
boundaries.
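
As an illustration of how the clause pointer tables might be consumed, the sketch below filters out empty clauses using the FF sentinel. The function name and the use of a plain Python list for one of the four pointer arrays are assumptions made for the example.

```python
EMPTY_CLAUSE = 0xFF        # pointer value meaning "no instructions in this clause"
CLAUSES_PER_PROGRAM = 8    # eight fetch and eight ALU clauses per vector


def clauses_to_run(pointers):
    """Return (clause_number, control_program_address) for every non-empty clause.

    `pointers` models one of the four render state arrays, for example
    Vertex_shader_alu[7:0][7:0], as a list of eight 8-bit values.
    """
    if len(pointers) != CLAUSES_PER_PROGRAM:
        raise ValueError("expected eight clause pointers")
    return [(clause, address)
            for clause, address in enumerate(pointers)
            if address != EMPTY_CLAUSE]
```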

6.3 Data dependant predicate instructions
Data dependant conditionals will be supported in the R400. The only way we plan to support those is by supporting
three vector/scalar predicate operations of the form:

PRED_SETE_# - similar to SETE except that the result is 'exported' to the sequencer.
PRED_SETGT_# - similar to SETGT except that the result is 'exported' to the sequencer.
PRED_SETGTE_# - similar to SETGTE except that the result is 'exported' to the sequencer.

For the scalar operations only we will also support the two following instructions:
PRED_SETE0_# - SETE0
PRED_SETE1_# - SETE1

The export is a single bit - 1 or 0 that is sent using the same data path as the MOVA instruction. The sequencer will
maintain 4 sets of 64 bit predicate vectors (in fact 8 sets because we interleave two programs but only 4 will be
exposed) and use it to control the write masking. This predicate is not maintained