
ORIGINATE DATE: 24 September, 2001
EDIT DATE: 4 September, 2015
DOCUMENT-REV. NUM.: GEN-CXXXXX-REVA
PAGE: 1 of 35
Author: Laurent Lefebvre

63168 Bytes *** © ATI Confidential. Reference Copyright Notice on Cover Page © ***

Issue To: | Copy No:

R400 Sequencer Specification

SEQ

Version 1.24
`
Overview: This is an architectural specification for the R400 Sequencer block (SEQ). It provides an overview of the required capabilities and expected uses of the block. It also describes the block interfaces and internal sub-blocks, and provides internal state diagrams.
`
`AUTOMATICALLY UPDATED FIELDS:
`Document Location:
`C\perforcer400\archidoc\gik(RE\R400_Sequencer.dec
`
`Current Intranet Search Title:
`R400 Sequencer Specification
APPROVALS

Name/Dept | Signature/Date

Remarks:
`
`
`
THIS DOCUMENT CONTAINS CONFIDENTIAL INFORMATION THAT COULD BE SUBSTANTIALLY DETRIMENTAL TO THE INTEREST OF ATI TECHNOLOGIES INC. THROUGH UNAUTHORIZED USE OR DISCLOSURE.
`
“Copyright 2001, ATI Technologies Inc. All rights reserved. The material in this document constitutes an unpublished work created in 2001. The use of this copyright notice is intended to provide notice that ATI owns a copyright in this unpublished work. The copyright notice is not an admission that publication has occurred. This work contains confidential, proprietary information and trade secrets of ATI. No part of this document may be used, reproduced, or transmitted in any form or by any means without the prior written permission of ATI Technologies Inc.”
`
`
Exhibit 2018, R400_Sequencer.doc
`
`ATI 2018
`
LG v. ATI
`IPR2015-00325
`
`AMD1044_0256912
`
`ATI Ex. 2106
`IPR2023-00922
`Page 1 of 223
`
Table Of Contents

1. Overview
   1.1 Top Level Block Diagram
   1.2 Data Flow graph
   1.3 Control Graph
2. Interpolated data bus
3. Instruction Store
4. Sequencer Instructions
5. Constant Store
6. Looping and Branches
   6.1 The controlling state
   6.2 The Control Flow Program
   6.3 Data dependant predicate instructions
   6.4 HW Detection of PV|PS
   6.5 Register file indexing
   6.6 Predicated instruction support for Texture clauses
   6.7 Debugging the Shaders
      6.7.1 Method 1: Debugging registers
      6.7.2 Method 2: Exporting the values in the GPRs (12)
      6.7.3 Method 3: Selective export of a 32 bit Dword
7. Pixel Kill Mask
8. HOS surfaces
9. Register file allocation
10. Fetch arbitration
11. ALU arbitration
12. Handling stalls
13. Content of the reservation station FIFOs
14. The output file
15. IJ format
16. The parameter cache
17. Vertex position exporting
18. Exporting arbitration
19. Real time commands
20. Registers
   20.1 Control
   20.2 Context
21. Debug registers
   21.1 Control
   21.2 Context
22. Interfaces
   22.1 External Interfaces
      22.1.1 PA/SC to SP0: IJ bus
      22.1.2 PA/SC to SEQ: IJ Control bus
      22.1.3 SEQ to SP0: Interpolator bus
      22.1.4 SEQ to SP0: Parameter Cache bus
      22.1.5 SEQ to SX0: Parameter Cache Mux control Bus
      22.1.6 SX0 to SP0: Parameter Cache Return bus
      22.1.7 VGT to SP0/SEQ: Vertex Bus
      22.1.8 CP to SEQ: Constant store load
      22.1.9 CP to SEQ: Fetch State store load
      22.1.10 CP to SEQ: Control State store load
      22.1.11 MH to SEQ: Instruction store Load
      22.1.12 SP0 to SX0: Pixel read from RBs
      22.1.13 SEQ to SM0: Control bus
      22.1.14 SX0 to SEQ: Output file control
      22.1.15 SP0 to SX0: Position return bus
      22.1.16 Shader Engine to Fetch Unit Bus (Fast Bus)
      22.1.17 Sequencer to Fetch Unit bus (Slow Bus)
23. Examples of program executions
      23.1.1 Sequencer Control of a Vector of Vertices
      23.1.2 Sequencer Control of a Vector of Pixels
24. Open issues
`
Revision Changes:

Rev 0.1 (Laurent Lefebvre)
Date: May 7, 2001
First draft.

Rev 0.2 (Laurent Lefebvre)
Date: July 9, 2001
Changed the interfaces to reflect the changes in the SP. Added some details in the arbitration section.

Rev 0.3 (Laurent Lefebvre)
Date: August 6, 2001
Reviewed the Sequencer spec after the meeting on August 3, 2001.

Rev 0.4 (Laurent Lefebvre)
Date: August 24, 2001
Added the dynamic allocation method for register file and an example (written in part by Vic) of the flow of pixels/vertices in the sequencer.

Rev 0.5 (Laurent Lefebvre)
Date: September 7, 2001
Added timing diagrams (Vic).

Rev 0.6 (Laurent Lefebvre)
Date: September 24, 2001
Changed the spec to reflect the new R400 architecture. Added interfaces.

Rev 0.7 (Laurent Lefebvre)
Date: October 5, 2001
Added constant store management, instruction store management, control flow management and data dependant predication.

Rev 0.8 (Laurent Lefebvre)
Date: October 8, 2001
Changed the control flow method to be more flexible. Also updated the external interfaces.

Rev 0.9 (Laurent Lefebvre)
Date: October 17, 2001
Incorporated changes made in the 10/18/01 control flow meeting. Added a NOP instruction, removed the conditional_execute_or_jump. Added debug registers.

Rev 1.0 (Laurent Lefebvre)
Date: October 19, 2001
Refined interfaces to RB. Added state registers.

Rev 1.1 (Laurent Lefebvre)
Date: October 26, 2001
Added SEQ→SP0 interfaces. Changed delta precision. Changed VGT→SP0 interface. Debug Methods added.

Rev 1.2 (Laurent Lefebvre)
Date: November 16, 2001
Interfaces greatly refined. Cleaned up the spec.
1. Overview
The sequencer first arbitrates between vectors of 64 vertices that arrive directly from primitive assembly and vectors of 16 quads (64 pixels) that are generated in the raster engine.

The vertex or pixel program specifies how many GPRs it needs to execute. The sequencer will not start the next vector until the needed space is available.

The sequencer is based on the R300 design. It chooses two ALU clauses and a fetch clause to execute, and executes all of the instructions in a clause before looking for a new clause of the same type. Two ALU clauses are executed interleaved to hide the ALU latency. Each vector will have eight fetch and eight ALU clauses, but clauses do not need to contain instructions. A vector of pixels or vertices ping-pongs along the sequencer FIFO, bouncing from fetch reservation station to ALU reservation station. A FIFO exists between each reservation stage, holding up vectors until the vector currently occupying a reservation station has left. A vector at a reservation station can be chosen to execute. The sequencer looks at all eight ALU reservation stations to choose an ALU clause to execute and all eight fetch stations to choose a fetch clause to execute. The arbitrator will give priority to clauses/reservation stations closer to the bottom of the pipeline. It will not execute an ALU clause until the fetches initiated by the previous fetch clause have completed. There are two separate sets of reservation stations, one for pixel vectors and one for vertex vectors. This way a pixel can pass a vertex and a vertex can pass a pixel.

To support the shader pipe the raster engine also contains the shader instruction cache and constant store. There is only one constant store and one instruction store for the whole chip. These will be shared among the four shader pipes. The four shader pipes also execute the same instruction, thus there is only one sequencer for the whole chip.
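The GPR gating rule in the overview (a vector does not start until the GPRs its program asks for are free) can be modelled in a few lines. This is a minimal sketch under stated assumptions, not the hardware allocator: it assumes a single pool of free GPRs, in-order vector starts, and release when a vector leaves the last reservation station; `GprPool` and `start_vectors` are hypothetical names.

```python
from collections import deque


class GprPool:
    """Hypothetical model of the register file space shared by in-flight vectors."""

    def __init__(self, total_gprs: int) -> None:
        self.free = total_gprs

    def try_allocate(self, needed: int) -> bool:
        # A vector may start only if every GPR its program asks for is free.
        if needed <= self.free:
            self.free -= needed
            return True
        return False

    def release(self, count: int) -> None:
        # Assumed to be called when a vector leaves the last reservation station.
        self.free += count


def start_vectors(pool: GprPool, pending: deque) -> list:
    """Start waiting (name, gprs_needed) vectors in order; stall at the first that does not fit."""
    started = []
    while pending and pool.try_allocate(pending[0][1]):
        started.append(pending.popleft()[0])
    return started


pool = GprPool(total_gprs=64)
waiting = deque([("vtx0", 24), ("pix0", 32), ("vtx1", 16)])
print(start_vectors(pool, waiting))  # ['vtx0', 'pix0']
pool.release(24)
print(start_vectors(pool, waiting))  # ['vtx1']
```

Stalling at the first vector that does not fit (rather than skipping ahead) matches the wording "will not start the next vector".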
[Figure: sequencer overview diagram, printed rotated in the original; legible labels include CONSTANT STORE, TEX INST, CONTROL and POINTERS.]
`1.1 Top Level Block Diagram
[Figure: top-level block diagram. A vertex/pixel vector arbitrator (with a possible delay for available GPRs) feeds a chain of reservation stations (texture clause 0, ALU clause 0, texture clause 1, ALU clause 1, and so on through ALU clause 7) with a FIFO between consecutive stations; texture and ALU arbitrators select which reservation stations execute.]
`There are two sets of the above figure, one for vertices and one for pixels.
`
Depending on the arbitration state, the sequencer will either choose a vertex or a pixel packet. The control packet consists of 243 bits of state, 6-7 bits for the base address of the Shader program and some information on the coverage to determine fetch LOD plus other various small state bits.
On receipt of a packet, the input state machine (not pictured but just before the first FIFO) allocates enough space in the registers to store the interpolated values and temporaries. Following this, the input state machine stacks the packet in the first FIFO.
`
On receipt of a command, the level 0 fetch machine issues a texture request and corresponding register address for the fetch address (ta). A small command (tcmd) is passed to the fetch system identifying the current level number (0) as well as the register write address for the fetch return data. One fetch request is sent every 4 clocks, causing the texturing of sixteen 2x2s worth of data (or 64 vertices). Once all the requests are sent, the packet is put in FIFO 1.
`
Upon receipt of the return data, the fetch unit writes the data to the register file using the write address that was provided by the level 0 fetch machine and sends the clause number (0) to the level 0 fetch state machine to signify that the write is done and thus the data is ready. Then, the level 0 fetch machine increments the counter of FIFO 1 to signify to ALU 1 that the data is ready to be processed.
`
On receipt of a command, the level 0 ALU machine first decrements the input FIFO counter and then issues a complete set of level 0 shader instructions. For each instruction, the state machine generates 3 source addresses, one destination address (3 cycles later) and an instruction. Once the last instruction has been issued, the packet is put into FIFO 2.
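The fetch-to-ALU handshake described in the last two paragraphs behaves like a counting semaphore on each FIFO: the fetch side increments, the ALU side decrements before issuing. A minimal sketch, assuming one counter per FIFO; the class and method names are hypothetical.

```python
class ReadyCounter:
    """Hypothetical model of the counter on the FIFO between a fetch clause and the next ALU clause."""

    def __init__(self) -> None:
        self.count = 0

    def fetch_done(self) -> None:
        # Fetch state machine: return data for one vector is now in the register file.
        self.count += 1

    def alu_can_start(self) -> bool:
        return self.count > 0

    def alu_start(self) -> None:
        # ALU state machine: decrement before issuing the clause's instructions.
        if self.count == 0:
            raise RuntimeError("ALU clause started before its fetch data was ready")
        self.count -= 1


fifo1 = ReadyCounter()
print(fifo1.alu_can_start())  # False: no fetch data written yet
fifo1.fetch_done()
print(fifo1.alu_can_start())  # True
fifo1.alu_start()
print(fifo1.count)            # 0
```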
`
There will always be two active ALU clauses at any given time (and two arbiters). One arbiter will arbitrate over the odd instructions (4 clock cycles) and the other one will arbitrate over the even instructions (4 clock cycles). The only constraint between the two arbiters is that they are not allowed to pick the same clause number as the other one is currently working on if the packet is not of the same type (render state).
`
If the packet is a vertex packet, upon reaching ALU clause 3, it can export the position if the position is ready. So the arbiter must prevent ALU clause 3 from being selected if the positional buffer is full (or can't be accessed). Along with the positional data, the location where the vertex data is to be put is also sent (parameter data pointers).
`
`{ISSUE: How do we handle parameter cache pointers (computed, semi-computed or not computed)?}
`
A special case is HOS surfaces, which can export 12 parameters to the output buffer from each of the last 6 clauses. If the output buffer is full or doesn't have enough space, the sequencer will prevent such a vertex group from entering an exporting clause.

Regular pixel and vertex shaders can export 12 parameters to memory from the last clause only (7).
`
All other levels process in the same way until the packet finally reaches the last ALU machine (7). On completion of the level 7 ALU clause, a valid bit is sent to the Render Backend, which picks up the color data. This requires that the last instruction writes to the output register, a condition that is almost always true. If the packet was a vertex packet, instead of sending the valid bit to the RB, it is sent to the PA so it can know that the data present in the parameter store is valid.
`
Only two ALU state machines may have access to the register file address bus or the instruction decode bus at one time. Similarly, only one fetch state machine may have access to the register file address bus at one time. Arbitration is performed by three arbiter blocks (two for the ALU state machines and one for the fetch state machines).
`
The arbiters always favor the higher numbered state machines, preventing a bunch of half finished jobs from clogging up the register files.
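The ALU arbitration rules above (favor higher-numbered clauses; the two ALU arbiters may not pick the same clause number when the render states differ) can be sketched as a priority pick. This assumes each ALU reservation station is summarized as clause number mapped to the render state of the waiting vector; the function and argument names are hypothetical.

```python
from typing import Dict, Optional


def pick_alu_clause(ready: Dict[int, int],
                    other_clause: Optional[int],
                    other_state: Optional[int]) -> Optional[int]:
    """Choose the next ALU clause for one of the two ALU arbiters.

    ready maps an ALU clause number (0-7) to the render state of the vector
    waiting at that reservation station. other_clause / other_state describe
    what the other arbiter is currently executing (None when idle).
    """
    for clause in sorted(ready, reverse=True):  # favor higher-numbered clauses
        if clause == other_clause and ready[clause] != other_state:
            continue  # same clause number with a different render state is not allowed
        return clause
    return None


waiting = {2: 0, 5: 1, 7: 0}
print(pick_alu_clause(waiting, other_clause=7, other_state=1))  # 5
print(pick_alu_clause(waiting, other_clause=7, other_state=0))  # 7
```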
`1.2 Data Flow graph
[Figure: data flow graph. Labels include the register files, the MAC units, scalar input/output, pipeline stages, texture requests and addresses, and the output path to the Primitive Assembly Unit or Render Backend.]
`The gray area represents blocks that are replicated 4 times per shader pipe (16 times on the overall chip).
`
`1.3 Control Graph
[Figure: control graph. Labels include SEQ, FETCH, SP0, CST, Clause # + Rdy, CMD, WrAddr, RdAddr, Phase, WrVec and WrScal.]

In green is represented the Fetch control interface, in red the ALU control interface, in blue the Interpolated/Vector control interface and in purple the output file control interface.
`
`2. Interpolated data bus
`The interpolators contain an IJ buffer to pack the information as much as possible before writing it to the register file.
[Figure: interpolated data bus, showing the INTERPOLATORS and the path To RB. IJ buffer (ping-pong buffer): (26 bits × 2 (IJ) + 8 bits × 6 (delta IJs) + ...) × 16 (quads) × 2 (double-buffered); 4096 bits, organized as 32 × 128. Ys buffer (ping-pong buffer): 24 bits × 16 quads × 2 = 768 bits, organized as 32 × 24.]
[Figure: example of a tile received by the sequencer, printed rotated in the original.]
Above is an example of a tile we might receive. The IJ information is packed in the IJ buffer 2 quads at a time. The sequencer allows at any given time as many as four quads to interpolate a parameter. They all have to come from the same primitive. Then the sequencer controls the write mask to the register to write the valid data in.
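One way to picture this packing rule is as a grouping pass over the quads of a tile. This sketch assumes quads are grouped in arrival order and that the write mask has one bit per quad slot; both the function name and the mask layout are illustrative, not the hardware format.

```python
from typing import List, Tuple


def interpolation_batches(quad_prims: List[int], max_quads: int = 4) -> List[Tuple[List[int], int]]:
    """Group quads that interpolate a parameter together.

    quad_prims[i] is the primitive id of quad i in arrival order. A batch holds
    at most max_quads consecutive quads, all from the same primitive. Each batch
    is returned with a write mask in which bit k marks quad slot k as valid.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    for idx, prim in enumerate(quad_prims):
        full = len(current) == max_quads
        new_prim = bool(current) and quad_prims[current[0]] != prim
        if full or new_prim:
            batches.append(current)
            current = []
        current.append(idx)
    if current:
        batches.append(current)
    return [(batch, (1 << len(batch)) - 1) for batch in batches]


print(interpolation_batches([7, 7, 7, 9, 9, 9, 9, 9]))
# [([0, 1, 2], 7), ([3, 4, 5, 6], 15), ([7], 1)]
```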
`
`
`
`
3. Instruction Store
`There is going to be only one instruction store for the whole chip.
`
`
`
It is likely to be a 1 port memory; we use 1 clock to load the ALU instruction, 1 clock to load the Fetch instruction, 1 clock to load 2 control flow instructions and 1 clock to write instructions.
`
`It will contain 4096 instructions of 96 bits each.
`
The instruction store is loaded by the CP thru the INSTRUCTION DATA and INSTRUCTION INDEX PORT control registers. The INSTRUCTION INDEX PORT is auto-incremented on both reads and writes to the INSTRUCTION DATA register.
`
The next picture shows the various modes the CP can use to load the memory. The Sequencer has to keep track of the loading modes in order to wrap around the correct boundaries. The MSB of the INSTRUCTION INDEX PORT register contains the packet type for the sequencer to know where it must wrap around. The wrap around points are arbitrary and they are specified in the VERTEX SHADER BASE and PIXEL SHADER BASE registers.
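The index/data register pair behaves like a classic auto-incrementing port. A minimal sketch, assuming the index simply wraps at the 4096-entry store size; the packet-type MSB and the VERTEX/PIXEL SHADER BASE wrap points are not modelled, and the class is hypothetical. The last line also checks the raw capacity: 4096 × 96 bits = 393,216 bits = 48 KiB.

```python
class InstructionStore:
    """Hypothetical model of the CP-facing INSTRUCTION INDEX PORT / INSTRUCTION DATA pair."""

    DEPTH = 4096  # instructions
    WIDTH = 96    # bits per instruction

    def __init__(self) -> None:
        self.mem = [0] * self.DEPTH
        self.index = 0  # INSTRUCTION INDEX PORT

    def set_index(self, index: int) -> None:
        self.index = index % self.DEPTH

    def write_data(self, word: int) -> None:
        # A write to INSTRUCTION DATA stores the word and auto-increments the index.
        if not 0 <= word < (1 << self.WIDTH):
            raise ValueError("instruction must fit in 96 bits")
        self.mem[self.index] = word
        self.index = (self.index + 1) % self.DEPTH

    def read_data(self) -> int:
        # Reads auto-increment the index as well.
        word = self.mem[self.index]
        self.index = (self.index + 1) % self.DEPTH
        return word


store = InstructionStore()
store.set_index(10)
for w in (0xA, 0xB, 0xC):
    store.write_data(w)
store.set_index(10)
print([store.read_data() for _ in range(3)])                   # [10, 11, 12]
print(InstructionStore.DEPTH * InstructionStore.WIDTH // 8)    # 49152 bytes (48 KiB)
```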
[Figure: instruction store loading modes, printed rotated in the original; labels include VS code, PS code, VERTEX SHADER BASE and PIXEL SHADER BASE.]
`4. Sequencer Instructions
`All control flow instructions and move instructions are handled by the sequencer only. The ALUs will perform NOPs
`during this time (MOV PV,PV, PS,PS).
`
`5. Constant Stores
The sequencer is aware of where the constants are using a remapping table. A likely size for the constant store is 1024x128 bits. The read BW from the constant store is 128 bits/clock and the write bandwidth is 32/4 bits/clock.
`
In order to do constant store indexing, the sequencer must be loaded first with the indexes (that come from the GPRs). There are 144 wires from the exit of the SP to the sequencer (9 bit pointers x 16 vertexes/clock). Since the data must pass thru the Shader pipe for the float to fixed conversion, there is a latency of 4 clocks (1 instruction) between the time the sequencer is loaded and the time one can index into the constant store. The assembly will look like this:

    MOVA R1.X,R2.X          // Loads the sequencer with the content of R2.X, also copies the content of R2.X into R1.X
    NOP                     // Latency of the float to fixed conversion
    ADD  R3,R4,C0[R2.X]     // Uses the state from the sequencer to add R4 to C0[R2.X] into R3

Note that we don't really care about what is in the brackets because we use the state from the MOVA instruction. R2.X is just written again for the sake of simplicity.

The storage needed in the sequencer in order to support this feature is 2*64*9 bits = 1152 bits.
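The wire count and storage figures above follow directly from the pointer width and vector size. This sketch recomputes them and models MOVA loading one 9-bit pointer per vertex; the float-to-fixed rule used here (floor, then clamp to 0..511) is an assumption, since the text does not specify rounding or out-of-range handling.

```python
import math

INDEX_BITS = 9           # width of one constant-store pointer
POINTERS_PER_CLOCK = 16  # pointers delivered from the SP per clock
VECTOR_SIZE = 64         # vertices per vector
COPIES = 2               # the factor of 2 in the storage formula above


def float_to_index(value: float) -> int:
    """Assumed float-to-fixed conversion: floor, then clamp to the 9-bit range."""
    return max(0, min((1 << INDEX_BITS) - 1, math.floor(value)))


def mova(r2x_values):
    """Model MOVA loading the sequencer with one 9-bit pointer per vertex."""
    return [float_to_index(v) for v in r2x_values]


wires = INDEX_BITS * POINTERS_PER_CLOCK                # 144
storage_bits = COPIES * VECTOR_SIZE * INDEX_BITS       # 1152
clocks_per_vector = VECTOR_SIZE // POINTERS_PER_CLOCK  # 4 clocks, i.e. one instruction slot
print(wires, storage_bits, clocks_per_vector)  # 144 1152 4
print(mova([3.7, -1.0, 600.0]))                # [3, 0, 511]
```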
`
The texture state is also kept in a similar memory. The size of this memory is 192x128, which lets us load a texture state in ???

The control flow constant memory doesn't sit behind a renaming table. It is register mapped and thus the driver must reload its content each time there is a state change.
`
6. Looping and Branches
Loops and branches are planned to be supported and will have to be dealt with at the sequencer level. We plan on supporting constant loops and branches using a control program.

6.1 The controlling state
As per DX, the following state is available for control flow:

Boolean[15:0]
loop_count[7:0][7:0]

In addition,

loop_start[7:0][7:0]
loop_step[7:0][7:0]

exist to give more control to the controlling program.

We will extend that in the R400 to:

Boolean[255:0]
Loop_count[7:0][15:0]
Loop_Start[7:0][15:0] times 2-3 (one for constant, register1, register2)
Loop_Step[7:0][15:0] times 2-3 (one for constant, register1, register2)
Loop_End[7:0][15:0]
{ISSUE: How is the controlling state loaded and how many contexts do we have?}
We have a stack of 4 elements for calling subroutines and 4 loop counters to allow for nested loops.
`
`6.2 The Control Flow Program
The R300 uses a match method for control flow: the shader is executed, and at every instruction its address is compared with addresses (or address?) in a control table. The “event” in the control table can redirect operations in the program.

The method chosen for the R400 is a “control program”. The control program has ten basic instructions:
`
`Execute
`Conditional_execute
`Conditional_Execute_Predicates
`Conditional_jump
`Call
`Return
`Loop_start
`Loop_end
`End_of_clause
`NOP
`
Execute causes the specified number of instructions in instruction store to be executed.
Conditional_execute checks a condition first, and if true, causes the specified number of instructions in instruction store to be executed.
Loop_start resets the corresponding loop counter to the start value on the first pass after it checks for the end condition and if met jumps over to a specified address.
Loop_end increments (decrements?) the loop counter and jumps back the specified number of instructions.
Call jumps to an address and pushes the IP counter on the stack. On the return instruction, the IP is popped from the stack.
Conditional_execute_or_Jump executes a block of instructions or jumps to an address if the condition is not met.
Conditional_execute_Predicates executes a block of instructions if all bits in the predicate vectors meet the condition.
End_of_clause marks the end of a clause.
Conditional_jump jumps to an address if the condition is met.
NOP is a regular NOP.
`
NOTE THAT ALL JUMPS MUST JUMP TO EVEN CFP ADDRESSES. Thus the compiler must insert NOPs where needed to align the jumps on even CFP addresses.

Also, if the jump is logically bigger than pshader_cntl_size (or vshader_cntl_size) we break the program (clause) and set the debug registers. If an execute or conditional_execute is lower than cntl_size or bigger than size, we also break the program (clause) and set the debug registers.
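The two legality rules above can be expressed as checks that a compiler or simulator might run before issuing a control flow instruction. A sketch, assuming the control flow program occupies CFP addresses 0 to cntl_size - 1 and that an execute range must stay within [cntl_size, size); the function names are hypothetical and the returned strings stand in for whatever the debug registers record.

```python
from typing import Optional


def check_jump(target: int, cntl_size: int) -> Optional[str]:
    """Return a debug reason if a jump target is illegal, otherwise None."""
    if target % 2:
        return "jump target is not an even CFP address"
    if target >= cntl_size:
        return "jump target is beyond the control flow program"
    return None


def check_execute(exec_addr: int, count: int, cntl_size: int, size: int) -> Optional[str]:
    """Return a debug reason if a (conditional_)execute range is illegal, otherwise None."""
    if exec_addr < cntl_size:
        return "execute address is inside the control flow program"
    if exec_addr + count > size:
        return "execute range extends beyond the shader size"
    return None


print(check_jump(6, cntl_size=16))                  # None
print(check_jump(7, cntl_size=16))                  # jump target is not an even CFP address
print(check_execute(8, 4, cntl_size=16, size=64))   # execute address is inside the control flow program
```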
`
`
`
We have to fit instructions into 48 bits in order to be able to put two control flow instructions per line in the instruction store.
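Since each line of the instruction store is 96 bits wide, two 48-bit control flow instructions share a line. The sketch below assumes the instruction at the even CFP address occupies the low half of the line; the bit placement and helper names are hypothetical.

```python
from typing import Tuple

CF_BITS = 48
CF_MASK = (1 << CF_BITS) - 1


def pack_line(even_instr: int, odd_instr: int) -> int:
    """Pack two 48-bit control flow instructions into one 96-bit store line.

    Assumption: the instruction at the even CFP address goes in bits 47..0 and
    the one at the following odd address in bits 95..48.
    """
    for instr in (even_instr, odd_instr):
        if not 0 <= instr <= CF_MASK:
            raise ValueError("a control flow instruction must fit in 48 bits")
    return (odd_instr << CF_BITS) | even_instr


def unpack_line(line: int) -> Tuple[int, int]:
    return line & CF_MASK, line >> CF_BITS


def cfp_location(cfp_addr: int) -> Tuple[int, int]:
    """Map a CFP address to (store line, half), half 0 being the low 48 bits."""
    return cfp_addr // 2, cfp_addr % 2


line = pack_line(0x123, 0x456)
print(unpack_line(line))  # (291, 1110)
print(cfp_location(7))    # (3, 1)
```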
`
`
Execute

47         | 46...42 | 41...24  | 23...12           | 11...0
Addressing | 00001   | RESERVED | Instruction count | Exec Address

Execute up to 4k instructions at the specified address in the instruction memory.

NOP

47         | 46...42 | 41...0
Addressing | 00010   | RESERVED

This is a regular NOP.
`
Conditional_Execute

47         | 46...42 | 41...34         | 33        | 32...24  | 23...12           | 11...0
Addressing | 00011   | Boolean address | Condition | RESERVED | Instruction_count | Exec Address

If the specified boolean (8 bits can address 256 booleans) meets the specified condition, then execute the specified instructions (up to 4k instructions).
`
Conditional_Execute_Predicates

47         | 46...42 | 41...38          | 37        | 36...24  | 23...12           | 11...0
Addressing | 00100   | Predicate vector | Condition | RESERVED | Instruction count | Exec Address

Check the AND/OR of all current predicate bits. If the AND/OR matches the condition, execute the specified number of instructions. We need to AND/OR this with the kill mask in order not to consider the pixels that aren't valid.
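The AND/OR reduction with the kill mask can be sketched as follows. This assumes one predicate bit per pixel, a hypothetical `alive_mask` in which a set bit marks a still-valid pixel (the polarity of the real kill mask is not given here), and that killed pixels are ignored by counting as 1 for the AND and 0 for the OR.

```python
def predicates_pass(pred_bits: int, alive_mask: int, use_and: bool,
                    condition: bool, width: int = 64) -> bool:
    """Decide whether a Conditional_Execute_Predicates block executes.

    pred_bits:  one predicate bit per pixel (bit i = pixel i).
    alive_mask: bit i set means pixel i is still valid (assumed polarity).
    use_and:    True to AND all predicate bits, False to OR them.
    """
    full = (1 << width) - 1
    alive = alive_mask & full
    if use_and:
        # Killed pixels are forced to 1 so they cannot make the AND fail.
        reduced = ((pred_bits | ~alive) & full) == full
    else:
        # Killed pixels are forced to 0 so they cannot make the OR succeed.
        reduced = (pred_bits & alive) != 0
    return reduced == condition


print(predicates_pass(0b0111, 0b0111, use_and=True, condition=True, width=4))  # True
print(predicates_pass(0b0111, 0b1111, use_and=True, condition=True, width=4))  # False
```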
`
`
`
Loop_Start

47         | 46...42 | 41...16  | 15...4       | 3...0
Addressing | 00101   | RESERVED | Jump address | Loop ID

Loop Start. Compares the loop count with the end value. If the loop condition is not met, jump to the address. Forward jump only. Also computes the index value.
`
`
Loop_End

47         | 46...42 | 41...16  | 15...4        | 3...0
Addressing | 00111   | RESERVED | Start address | Loop ID

Loop end. Increments the counter by one and jumps BACK only to the start of the loop.

The way this is described does not prevent nested loops, and the inclusion of the loop id makes this easy to do.
Call

47         | 46...42 | 41...12  | 11...0
Addressing | 01000   | RESERVED | Address

Jumps to the specified address and pushes the IP counter on the stack.

Return

47         | 46...42 | 41...0
Addressing | 01001   | RESERVED

Pops the topmost address from the stack and jumps to that address. If nothing is on the stack, the program will just continue to the next instruction.
`
`
Conditional_Jump

47         | 46...42 | 41...34         | 33        | 32...13  | 12      | 11...0
Addressing | 01010   | Boolean address | Condition | RESERVED | FW only | Address

If the condition is met, jumps to the address. FORWARD jump only allowed if bit 12 is set. Bit 12 is only an optimization for the compiler and should NOT be exposed to the API.
End_of_Clause

47 | 46...42 | 41...0
Addressing | 01011 | RESERVED

Marks the end of a clause.
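As a summary of the encodings above, a small decoder can classify a control-flow word by its opcode field (bits 46...42). It covers only the opcodes legible in this section; the Loop_Start/Loop_End helpers assume the 15...4 address and 3...0 loop ID layout shown in their tables, and all names are hypothetical.

```c
#include <stdint.h>

/* Opcodes (bits 46...42) of the control-flow instructions in this section. */
enum CfOpcode {
    CF_LOOP_START    = 0x05, /* 00101 */
    CF_LOOP_END      = 0x07, /* 00111 */
    CF_CALL          = 0x08, /* 01000 */
    CF_RETURN        = 0x09, /* 01001 */
    CF_COND_JUMP     = 0x0A, /* 01010 */
    CF_END_OF_CLAUSE = 0x0B  /* 01011 */
};

static unsigned cf_addressing(uint64_t insn) { return (unsigned)(insn >> 47) & 0x1u; }
static unsigned cf_opcode(uint64_t insn)     { return (unsigned)(insn >> 42) & 0x1Fu; }

/* Loop_Start / Loop_End operands: address in 15...4, loop ID in 3...0. */
static unsigned cf_loop_addr(uint64_t insn)  { return (unsigned)(insn >> 4) & 0xFFFu; }
static unsigned cf_loop_id(uint64_t insn)    { return (unsigned)insn & 0xFu; }

static const char *cf_name(uint64_t insn)
{
    switch (cf_opcode(insn)) {
    case CF_LOOP_START:    return "Loop_Start";
    case CF_LOOP_END:      return "Loop_End";
    case CF_CALL:          return "Call";
    case CF_RETURN:        return "Return";
    case CF_COND_JUMP:     return "Conditionnal_Jump";
    case CF_END_OF_CLAUSE: return "End_of_Clause";
    default:               return "other";
    }
}
```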
`
To prevent infinite loops, we will keep 9-bit loop counters instead of 8-bit ones (we are only able to loop 256 times). If the counter goes higher than 255, the Loop_End or Loop_Start instruction breaks the loop and sets the debug registers.
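The reason for the ninth bit is that an 8-bit counter wraps from 255 back to 0 and can never report that it went past 255. A sketch of the guard, assuming the check happens on the current count at Loop_Start or Loop_End; `debug_flag` is a hypothetical stand-in for the debug registers.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOOP_COUNTER_MASK 0x1FFu  /* 9-bit counter */
#define MAX_LOOP_COUNT    255u    /* highest legal count (256 iterations) */

typedef struct {
    uint16_t count;       /* only the low 9 bits are used */
    bool     debug_flag;  /* hypothetical stand-in for the debug registers */
} LoopCounter;

/* Increment as the hardware counter would, wrapping at 9 bits. */
static void loop_increment(LoopCounter *c)
{
    c->count = (uint16_t)((c->count + 1u) & LOOP_COUNTER_MASK);
}

/* Check performed by Loop_Start or Loop_End: a count above 255 is only
 * visible thanks to the ninth bit; break the loop and flag it. */
static bool loop_must_break(LoopCounter *c)
{
    if (c->count > MAX_LOOP_COUNT) {
        c->debug_flag = true;
        return true;
    }
    return false;
}
```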
`
`
`
The basic model is as follows:

The render state defines the clause boundaries:
Vertex_shader_fetch[7:0][7:0]
// eight 8-bit pointers to the location where each clause's control program is located
Vertex_shader_alu[7:0][7:0]
// eight 8-bit pointers to the location where each clause's control program is located
Pixel_shader_fetch[7:0][7:0]
// eight 8-bit pointers to the location where each clause's control program is located
Pixel_shader_alu[7:0][7:0]
// eight 8-bit pointers to the location where each clause's control program is located
`
A pointer value of FF means that the clause doesn't contain any instructions.
The control program for a given clause is executed to completion before moving to another clause (with the exception of the pick-two nature of the ALU execution). The control program is the only program aware of the clause boundaries.
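The clause pointer arrays and the FF convention can be pictured as the structure below. This is a hypothetical C view of the render state, not its register layout, and it assumes clauses are visited in increasing index order, skipping any clause whose pointer is FF.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_CLAUSES  8
#define EMPTY_CLAUSE 0xFFu  /* pointer value: clause has no instructions */

/* Hypothetical C view of the render-state clause pointers. */
typedef struct {
    uint8_t vertex_shader_fetch[NUM_CLAUSES];
    uint8_t vertex_shader_alu[NUM_CLAUSES];
    uint8_t pixel_shader_fetch[NUM_CLAUSES];
    uint8_t pixel_shader_alu[NUM_CLAUSES];
} ClausePointers;

static bool clause_is_empty(const uint8_t ptrs[NUM_CLAUSES], unsigned clause)
{
    return ptrs[clause] == EMPTY_CLAUSE;
}

/* Next clause after `current` that has instructions, or -1 if none.
 * Assumption: clauses run in increasing index order; pass -1 to start. */
static int next_clause(const uint8_t ptrs[NUM_CLAUSES], int current)
{
    for (int i = current + 1; i < NUM_CLAUSES; i++)
        if (!clause_is_empty(ptrs, (unsigned)i))
            return i;
    return -1;
}
```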
`
6.3 Data dependent predicate instructions
Data dependent conditionals will be supported in the R400. The only way we plan to support those is by supporting three vector/scalar predicate operations of the form:
`
PRED_SETE_# - similar to SETE except that the result is ‘exported’ to the sequencer.
PRED_SETGT_# - similar to SETGT except that the result is ‘exported’ to the sequencer.
PRED_SETGTE_# - similar to SETGTE except that the result is ‘exported’ to the sequencer.
`
`
`
For the scalar operations only, we will also support the following two instructions:
PRED_SETE0_# - SETE0
PRED_SETE1_# - SETE1
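Each PRED_SET* operation produces one exported bit per pixel, and the sequencer collects those bits into a 64-bit predicate vector. The sketch below covers only the three comparison forms, assumes one scalar operand pair per pixel, and assumes bit i of the vector holds pixel i; SETE0/SETE1 are left out because their exact semantics are not spelled out here. The names are hypothetical.

```c
#include <stdint.h>

typedef enum { PRED_SETE, PRED_SETGT, PRED_SETGTE } PredOp;

/* The single bit (1 or 0) exported for one pixel. */
static unsigned pred_bit(PredOp op, float a, float b)
{
    switch (op) {
    case PRED_SETE:   return a == b;
    case PRED_SETGT:  return a > b;
    case PRED_SETGTE: return a >= b;
    }
    return 0;
}

/* Pack the exported bits of a 64-pixel vector into one predicate vector,
 * bit i holding pixel i (assumed bit ordering). */
static uint64_t pred_vector(PredOp op, const float a[64], const float b[64])
{
    uint64_t v = 0;
    for (unsigned i = 0; i < 64; i++)
        v |= (uint64_t)pred_bit(op, a[i], b[i]) << i;
    return v;
}
```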
`
The export is a single bit (1 or 0) that is sent using the same data path as the MOVA instruction. The sequencer will maintain 4 sets of 64-bit predicate vectors (in fact 8 sets, because we interleave two programs, but only 4 will be exposed) and use them to control the write masking. This predicate is not maintained
