ORIGINATE DATE: 7 May, 2001
EDIT DATE: 8 September, 2015
DOCUMENT-REV. NUM.: GEN-CXXXXX-REVA
PAGE: 1 of 16
Author: Laurent Lefebvre
`
` AUTOMATICALLY UPDATED FIELDS:
`
16178 Bytes *** © ATI Confidential. Reference Copyright Notice on Cover Page © ***
`
`
`
`Issue To: | Copy No:
`
`
`R400 Sequencer Specification
`
`SEQ
`
Version 0.3
`
Overview: This is an architectural specification for the R400 Sequencer block (SEQ). It provides an overview of the required capabilities and expected uses of the block. It also describes the block interfaces and internal sub-blocks, and provides internal state diagrams.
`
Document Location:
C:\perforce\r400\arch\doc\gfx\RE\R400_Sequencer.doc
`Current Intranet Search Title:
`R400 Sequencer Specification
`
`APPROVALS.
`
`Name/Dept
`
`Signature/Date
`
`
`
`Remarks:
`
THIS DOCUMENT CONTAINS CONFIDENTIAL INFORMATION THAT COULD BE SUBSTANTIALLY DETRIMENTAL TO THE INTEREST OF ATI TECHNOLOGIES INC. THROUGH UNAUTHORIZED USE OR DISCLOSURE.
`
`
`
“Copyright 2001, ATI Technologies Inc. All rights reserved. The material in this document constitutes an unpublished
work created in 2001. The use of this copyright notice is intended to provide notice that ATI owns a copyright in this
unpublished work. The copyright notice is not an admission that publication has occurred. This work contains
confidential, proprietary information and trade secrets of ATI. No part of this document may be used, reproduced, or
transmitted in any form or by any means without the prior written permission of ATI Technologies Inc.”
`
`
`
Exhibit 2009.doc R400_Sequencer.doc

ATI 2009
LG v. ATI
IPR2015-00325

AMD1044_0256673

ATI Ex. 2105
IPR2023-00922
Page 1 of 239
`
`
`
`
`
`
`
`
`
`
`
`1.
`OVERVIEW ooo ccccceeeecceceecseeeecrsseeeeeneees 3
`Li
`Top Level Block Diagram... 4
`
`
`12 Data Flowgraph.. Be
`
`13. Control Graph. 1146
`
`2.
`INTERPOLATED DATA BUS....... 1340
`3.
`INSTRUCTION STORE ................ 1140
`4.
`CONSTANT STORE ..................000 {2i4
`4.
`LOGPING AND BRANCHES........ 1244
`6.
`REGISTER FILE ALLOCATION... 1244
`7.
`LEXTURE ARBITRATION... 1342
`8.
`9,
`CONTENT OF THE RESERVATION
`1b.
`STATION FIFOS ..ww eeee 1443
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`(SIOW- BUS) reereerreeneerrrererrerrrrerniee: td
`
` Table Of Contents
`
`li.
`
`THE OUTPUT FILE (RB FIFO AND
`
`
`
`
`
`
`
`12.1
`External Interfaces... 1443
`12.1.)
`Sequencer to Shader
`EngineBUS.TALS
`12.1.2
`Shader Engine to Output
`File
`1442
`
`644 Sequencerto Texture Unit bus
`
`6-4-5-Shader EnginetoREIPABus44
`Shader Engine to Texture
`12.1.3
`Unit Bus (Fast Bus
`
`12.1.4
`Sequencerto Texture Unit bus
`(Slow Bus) 1574
`
Revision Changes:
Rev 0.1 (Laurent Lefebvre)
Date: May 7, 2001
First draft.

Rev 0.2 (Laurent Lefebvre)
Date: July 9, 2001
Changed the interfaces to reflect the changes in the SP. Added some details in the arbitration section.

Rev 0.3 (Laurent Lefebvre)
Date: August 6, 2001
Reviewed the Sequencer spec after the meeting on August 3, 2001.
`
`
`1. Overview
The sequencer first arbitrates between vectors of 16 (maybe 32) vertices that arrive directly from primitive assembly and vectors of 4 quads (16 pixels) that are generated in the raster engine.

The vertex or pixel program specifies how many GPRs it needs to execute. The sequencer will not start the next vector until the needed space is available.

The sequencer is based on the R300 design. It chooses two ALU clauses and a texture clause to execute, and executes all of the instructions in a clause before looking for a new clause of the same type. Two ALU clauses are executed interleaved to hide the ALU latency. Each vector will have eight texture and eight ALU clauses, but clauses do not need to contain instructions. A vector of pixels or vertices ping-pongs along the sequencer FIFO, bouncing from texture reservation station to ALU reservation station. A FIFO exists between each reservation stage, holding up vectors until the vector currently occupying a reservation station has left. A vector at a reservation station can be chosen to execute. The sequencer looks at all eight ALU reservation stations to choose an ALU clause to execute and at all eight texture stations to choose a texture clause to execute. The arbitrator will give priority to clauses/reservation stations closer to the bottom of the pipeline. It will not execute an ALU clause until the texture fetches initiated by the previous texture clause have completed. There are two separate sets of reservation stations, one for pixel vectors and one for vertex vectors. This way a pixel can pass a vertex and a vertex can pass a pixel.

To support the shader pipe, the raster engine also contains the shader instruction cache and constant store. There is only one constant store and one instruction store for the whole chip. These will be shared among the four shader pipes.
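The reservation-station chain described above can be sketched in Python as a simple model, not as the hardware design. The function names `station_order` and `may_run_alu` are hypothetical. The sketch assumes eight levels, each with a texture station followed by an ALU station, and assumes that an ALU clause waits only on the fetches of the texture clause at the same level.

```python
def station_order(num_levels=8):
    """Reservation stations a vector visits, from the top of the pipe down.

    Each level has a texture station followed by an ALU station, so a vector
    ping-pongs texture -> ALU -> texture -> ... -> ALU.
    """
    order = []
    for level in range(num_levels):
        order.append(("texture", level))
        order.append(("alu", level))
    return order


def may_run_alu(level, outstanding_fetches):
    """True when the ALU clause at `level` may start for this vector.

    `outstanding_fetches` is hypothetical bookkeeping: it maps a level to the
    number of fetches from that level's texture clause still in flight.
    """
    return outstanding_fetches.get(level, 0) == 0


stations = station_order()
print(len(stations), stations[:3])  # 16 [('texture', 0), ('alu', 0), ('texture', 1)]
print(may_run_alu(2, {2: 3}))       # False: 3 fetches from texture clause 2 pending
```

Empty clauses still appear in the order, matching the statement that clauses do not need to contain instructions.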
`
1.1 Top Level Block Diagram
`
[Figure: Top-level block diagram. A vertex/pixel vector arbitrator (with a possible delay for available GPRs) feeds a chain of FIFOs and reservation stations that alternate Texture clause 0, ALU clause 0, Texture clause 1, ALU clause 1, ... Texture clause 7, ALU clause 7. A texture arbitrator selects among the texture clause reservation stations.]
`
The rasterizer always checks the vertex FIFO first and, if allowed by the sequencer, sends the data to the shader. If the vertex FIFO is empty, the rasterizer takes the first entry of the pixel FIFO (a vector of 16 pixels) and sends it to the interpolators. The sequencer then takes control of the packet. The packet consists of 3 bits of state, 6-7 bits for the base address of the shader program, and some information on the coverage to determine texture LOD. All other information (2x2 addresses) is put in a FIFO (one for the pixels and one for the vertices) and retrieved when the packet finishes its last clause.
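The vertex-first input selection can be sketched as follows. This is an illustrative model only: `next_input_vector` and the `sequencer_allows_vertex` flag are hypothetical names. The section does not say what happens when a vertex vector is waiting but not allowed, so the sketch assumes that input simply stalls.

```python
from collections import deque


def next_input_vector(vertex_fifo, pixel_fifo, sequencer_allows_vertex=True):
    """Choose the next vector the rasterizer sends to the shader.

    The vertex FIFO is always checked first; a pixel vector is taken only
    when the vertex FIFO is empty. Returns (kind, vector), or None when
    nothing is sent this time.
    """
    if vertex_fifo:
        if sequencer_allows_vertex:
            return ("vertex", vertex_fifo.popleft())
        # Assumption: a pending vertex vector that the sequencer refuses
        # blocks input rather than letting a pixel vector go instead.
        return None
    if pixel_fifo:
        return ("pixel", pixel_fifo.popleft())
    return None


vertices = deque(["v0"])
pixels = deque(["p0", "p1"])
print(next_input_vector(vertices, pixels))  # ('vertex', 'v0')
print(next_input_vector(vertices, pixels))  # ('pixel', 'p0')
```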
`
`
On receipt of a packet, the input state machine (not pictured but just before the first FIFO) allocates enough space in the registers to store the interpolated values and temporaries. Following this, the input state machine stacks the packet in the first FIFO.

On receipt of a command, the level 0 texture machine issues a texture request and corresponding register address for the texture address (ta). A small command (tcmd) is passed to the texture system identifying the current level number (0) as well as the register write address for the texture return data. One texture request is sent every 4 clocks, causing the texturing of four 2x2s worth of data (or 16 vertices). Once all the requests are sent, the packet is put in FIFO 1.

Upon receipt of the return data (identified by the tcmd containing the level number 0), the level 0 texture machine issues a register address for the return value (td). Then, it increments the counter of FIFO 1 to signify to the ALU that the data is ready to be processed.

On receipt of a command, the level 0 ALU machine first decrements the input FIFO counter and then issues a complete set of level 0 shader instructions. For each instruction, the state machine generates 3 source addresses, one destination address (2-3 cycles later) and an instruction ID, which is used to index into the instruction store. Once the last instruction has been issued, the packet is put into FIFO 2.

There will always be two active ALU clauses at any given time (and two arbiters). One arbiter will arbitrate over the odd clock cycles and the other one will arbitrate over the even clock cycles. The only constraint between the two arbiters is that, if the packet is of the same type, neither may pick the clause number the other one is currently working on.

If the packet is a vertex packet, upon reaching ALU clause 4 it can export the position if the position is ready. Along with the positional data, the location where the vertex data is to be put is also sent (parameter data pointers).

All other levels process in the same way until the packet finally reaches the last ALU machine (8). On completion of the level 8 ALU clause, a valid bit is sent to the Render Backend, which picks up the color data. This requires that the last instruction writes to the output register, a condition that is almost always true. If the packet was a vertex packet, the valid bit is sent to the PA instead of the RB, so that the PA knows the data present in the parameter store is valid.

Only two ALU state machines may have access to the register file address bus or the instruction decode bus at one time. Similarly, only one texture state machine may have access to the register file address bus at one time. Arbitration is performed by three arbiter blocks (two for the ALU state machines and one for the texture state machines). The arbiters always favor the higher-numbered state machines, preventing a bunch of half-finished jobs from clogging up the register files.

Each state machine maintains an address pointer specifying where the 16- (or 32-) entry vector is located in the register file (the texture machine has two pointers, one for the read address and one for the write address). Upon completion of its job, the address pointer is incremented by a predefined amount equal to the total number of registers required by the shading code. A comparison of the address pointer for the first state machine in the chain (the input state machine) and the last machine in the chain (the level 8 ALU machine) gives an indication of how much unallocated register file memory is available.
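The free-space check obtained by comparing the first and last pointers can be sketched as modular arithmetic on a circular register file. The helper names are hypothetical. The sketch assumes the pointers wrap modulo the register file size and that the file is never filled completely, so that equal pointers mean "empty".

```python
def advance(ptr, regs_per_vector, size):
    """Move a state machine's pointer past one vector's registers (wraps around)."""
    return (ptr + regs_per_vector) % size


def unallocated_registers(input_ptr, last_alu_ptr, size):
    """Free entries between the input state machine's pointer (next allocation)
    and the last ALU machine's pointer (next release) in a circular file.

    Equal pointers are read as "empty"; to keep that unambiguous the file is
    never filled completely (an assumption of this sketch).
    """
    used = (input_ptr - last_alu_ptr) % size
    return size - used


def can_start_vector(input_ptr, last_alu_ptr, size, regs_needed):
    """The sequencer starts the next vector only if its GPRs fit."""
    return regs_needed < unallocated_registers(input_ptr, last_alu_ptr, size)


print(unallocated_registers(10, 250, 256))  # 240: 16 registers in use across the wrap
```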
`
`
`
1.2 Data Flowgraph

[Figure: Data flowgraph. Interpolated data from the RE is written into a 512x128 register file (built as 4 128x128 or 16 128x32 memories). The register file supplies 128-bit data to four 32-bit MAC units forming a 128-bit scalar/vector ALU, and sends addresses to the texture unit; vertex parameter data goes to the RE, and pixel data to the RB, through the texture block. Constants and control come from the RE.]
[Figure: Shader pipe detail. Interpolated data / vertex indexes enter the register file. Each replicated stage consists of a register file, a scalar unit with scalar input/output, a pipeline stage and a texture request path; an operand mux feeds the vector ALU, the scalar ALU and the texture request. The instruction store/cache and the constant store feed the pipe, and results go to the Primitive Assembly Unit or Render Backend.]
`
`The gray area represents blocks that are replicated 4 times per shader pipe (16 times on the overall chip).
`
`
`1.3 Control Graph
`
[Figure: Control graph. Control signals between the SEQ, the SP, the instruction store (IS), the constant store (CST) and the output file (OF), including Clause # + Rdy, WrAddr, RdAddr and Phase.]
`
The Texture control interface is shown in green, the ALU control interface in red, the Interpolated/Vector control interface in blue and the output file control interface in purple.
`
`2. Interpolated data bus
Since each register file is physically divided (one 32x128 per MAC) and we don't have [...] a maximum size vector of vertices in the parameter buffer, we need to interpolate on a parameter basis rather than on a quad basis. So the order to the register file will be (Qn = quad n, Pm = parameter m):

Q0P0 Q1P0 Q2P0 Q3P0 Q0P1 Q1P1 Q2P1 Q3P1 Q0P2 Q1P2 ...
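The parameter-major write order can be generated as follows. The sketch assumes 4 quads per vector, which is the 16-pixel vector from the overview. `interpolation_order` is a hypothetical helper used only to illustrate the ordering.

```python
def interpolation_order(num_params, quads_per_vector=4):
    """Write order for interpolated values: every quad of parameter 0,
    then every quad of parameter 1, and so on (parameter-major order)."""
    return [f"Q{q}P{p}" for p in range(num_params) for q in range(quads_per_vector)]


print(" ".join(interpolation_order(3)[:10]))
# Q0P0 Q1P0 Q2P0 Q3P0 Q0P1 Q1P1 Q2P1 Q3P1 Q0P2 Q1P2
```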
`
3. Instruction Store
There is going to be only one instruction store for the whole chip. It may contain up to 2000 instructions of 96 bits each. The instruction store is loaded by the sequencer using the memory hub. The read bandwidth from this store is 24 bits/clock/pipe. To achieve this, the instruction store is likely to be broken up into 4 blocks: an ALU instruction section (1R/1W) split in two and a texture section (1R/1W) also split in two. The bandwidth out of those memories is 96 bits/clock.
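The bandwidth figures follow from the instruction width. Below is a short worked check. It assumes, as section 12.1.1 states, that each pipe needs a new instruction only every 4 clocks, and that the single store feeds all four shader pipes.

```latex
\begin{aligned}
\text{read BW per pipe} &= \frac{96\ \text{bits}}{4\ \text{clocks}} = 24\ \text{bits/clock/pipe} \\
\text{total read BW} &= 4\ \text{pipes} \times 24\ \text{bits/clock/pipe} = 96\ \text{bits/clock} \\
\text{capacity} &= 2000 \times 96\ \text{bits} = 192{,}000\ \text{bits} = 24{,}000\ \text{bytes}
\end{aligned}
```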
`
`
`
`
`
`4. Constant Store
The constant store is managed by the CP. The sequencer is aware of where the constants are by using a remapping table, also managed by the CP. A likely size for the constant store is 512x128 bits. The constant store is also planned to be shared. The read bandwidth from the constant store is 512/4 bits/clock/pipe and the write bandwidth is 32/4 bits/clock.
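The quoted constant store figures evaluate as follows, assuming the likely 512x128 size:

```latex
\begin{aligned}
\text{capacity} &= 512 \times 128\ \text{bits} = 65{,}536\ \text{bits} = 8{,}192\ \text{bytes} = 8\ \text{KiB} \\
\text{read BW} &= 512/4 = 128\ \text{bits/clock/pipe} \\
\text{write BW} &= 32/4 = 8\ \text{bits/clock}
\end{aligned}
```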
`
`
`
`
`
`
`5. Looping and Branches
Loops and branches are planned to be supported and will have to be dealt with at the sequencer level. However, it is still unclear whether we will support data-dependent branches.
`
`
`
`6. Register file allocation
The register file allocation for vertices and pixels can either be static or dynamic. In both cases, the register file is managed using two round robins (one for pixels and one for vertices). In the dynamic case the boundary between pixels and vertices is allowed to move; in the static case it is fixed to VERTEXREGSIZE for vertices and 256-VERTEXREGSIZE for pixels.
`
Above is an example of how the algorithm works. Vertices come in from top to bottom; pixels come in from bottom to top. Vertices are in orange and pixels in green. The blue line is the tail of the vertices and the green line is the tail of the pixels. Thus anything between the two lines is shared. When pixels meet vertices, the line turns white and the boundary is static until both vertices and pixels share the same "unallocated bubble". Then the boundary is allowed to move again.
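The static and dynamic policies can be compared with a simplified, count-based model in Python. `RegisterFileAllocator` is a hypothetical name. The sketch uses the 256-entry size implied by the static split above and tracks only how many registers each type holds. It does not model the white-line rule that freezes the boundary when pixels meet vertices.

```python
class RegisterFileAllocator:
    """Count-based model of vertex/pixel register file allocation.

    Vertices fill the file from the top and pixels from the bottom. Only
    per-type register counts are tracked; the sketch assumes each side frees
    vectors in allocation order (as a round robin does), so contiguous
    placement is implied rather than modelled.
    """

    def __init__(self, size=256, vertex_reg_size=None):
        self.size = size
        self.vertex_reg_size = vertex_reg_size  # None selects dynamic mode
        self.used = {"vertex": 0, "pixel": 0}

    def capacity(self, kind):
        """Most registers `kind` may hold given the current mode and usage."""
        if self.vertex_reg_size is None:
            # Dynamic: the boundary moves, limited only by the other side.
            other = "pixel" if kind == "vertex" else "vertex"
            return self.size - self.used[other]
        # Static: VERTEXREGSIZE for vertices, the remainder for pixels.
        if kind == "vertex":
            return self.vertex_reg_size
        return self.size - self.vertex_reg_size

    def allocate(self, kind, regs):
        """Reserve registers for one vector; False means the vector must wait."""
        if self.used[kind] + regs > self.capacity(kind):
            return False
        self.used[kind] += regs
        return True

    def free(self, kind, regs):
        """Release the registers of the oldest vector of this kind."""
        self.used[kind] -= regs


static = RegisterFileAllocator(vertex_reg_size=64)
print(static.allocate("vertex", 64), static.allocate("vertex", 1))  # True False
```

In dynamic mode, the same 256 registers can go almost entirely to one type when the other is idle. This flexibility is what the open issue on testing static versus dynamic allocation weighs against the simpler static split.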
`
7. Texture Arbitration
`The texture arbitration logic chooses one of the 8 potentially pending texture clauses to be executed. The choice is
`made by looking at the fifos from 7 to 0 and picking the first one ready to execute. Once chosen, the clause state
`machine will send one 2x2 texture fetch per clock (or 4 fetches in one clock every 4 clocks) until all the texture fetch
`instructions of the clause are sent. This means that there cannot be any dependencies between two texture fetches
`of the same clause.
`
`The arbitrator will not wait for the texture fetches to return prior to selecting another clause for execution. The texture
`pipe will be able to handle up to +20X(’?) in flight texture fetches and thus there can be a fair number of active clauses
`waiting for their texture return data.
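The selection and issue rules above can be sketched as follows. Both function names are hypothetical. The sketch assumes 8 reservation stations indexed 0 to 7, with 7 closest to the bottom of the pipe, and a 16-pixel vector (four quads) per fetch instruction.

```python
def pick_texture_clause(ready):
    """Index of the clause to execute, scanning FIFOs 7 down to 0.

    `ready[i]` is True when reservation station i holds a vector whose
    texture clause can execute. Returns None when nothing is ready.
    """
    for level in range(len(ready) - 1, -1, -1):
        if ready[level]:
            return level
    return None


def fetch_schedule(num_fetch_instructions, start_clock=0, quads=4):
    """(clock, instruction, quad) for each 2x2 fetch of a chosen clause.

    One quad is fetched per clock, so each fetch instruction occupies four
    consecutive clocks for a 16-pixel vector (assumed quad ordering 0..3).
    """
    schedule = []
    clock = start_clock
    for inst in range(num_fetch_instructions):
        for quad in range(quads):
            schedule.append((clock, inst, quad))
            clock += 1
    return schedule


print(pick_texture_clause([True, False, True, False, False, False, False, False]))  # 2
print(fetch_schedule(2)[4])  # (4, 1, 0)
```

Because every fetch of a clause is issued before any result returns, a fetch can never consume the output of another fetch in the same clause. This is the no-dependency rule stated above.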
`
8. ALU Arbitration
ALU arbitration proceeds in almost the same way as texture arbitration. The ALU arbitration logic chooses one of the 8 potentially pending ALU clauses to be executed. The choice is made by looking at the fifos from 7 to 0 and picking the first one ready to execute. There are two ALU arbiters, one for the even clocks and one for the odd clocks. For example, here is the sequencing of two interleaved ALU clauses (E and O stand for Even and Odd):

EInst0 OInst0 EInst1 OInst1 EInst2 OInst2 EInst0 OInst3 EInst1 OInst4 EInst2 OInst5 ...
Proceeding this way hides the 8-clock latency of the ALUs.
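The even/odd interleaving and the same-clause restriction can be sketched as follows. The helpers are hypothetical. The sketch simply merges two instruction streams and reproduces the example sequence above, with the even arbiter running two 3-instruction clauses and the odd arbiter running one 6-instruction clause. The "idle" slot for an arbiter with no work is an assumption.

```python
def interleave(even_clauses, odd_clauses):
    """Merge the instructions issued by the even-clock and odd-clock arbiters.

    Each argument lists the instruction counts of the clauses that arbiter
    picked, in order. An arbiter with nothing left issues "idle".
    """
    even = [f"EInst{i}" for n in even_clauses for i in range(n)]
    odd = [f"OInst{i}" for n in odd_clauses for i in range(n)]
    slots = []
    for k in range(max(len(even), len(odd))):
        slots.append(even[k] if k < len(even) else "idle")  # even clock
        slots.append(odd[k] if k < len(odd) else "idle")    # odd clock
    return slots


def may_pick(clause, vector_type, other_arbiter_current):
    """The two arbiters may not work on the same clause number of the same
    type; `other_arbiter_current` is (clause, type) or None."""
    return other_arbiter_current != (clause, vector_type)


# Even arbiter runs two 3-instruction clauses, odd arbiter one 6-instruction clause.
print(" ".join(interleave([3, 3], [6])))
# EInst0 OInst0 EInst1 OInst1 EInst2 OInst2 EInst0 OInst3 EInst1 OInst4 EInst2 OInst5
```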
`
9. Handling Stalls
When the output file is full, the sequencer prevents the ALU arbitration logic from selecting the last clause (this way nothing can exit the shader pipe until there is room in the output file). If the packet is a vertex packet and the position buffer is full (POS_FULL), the sequencer also prevents a thread from entering the exporting clause (4?). The sequencer will set the OUT_FILE_FULL signal n clocks before the output file is actually full, so that the ALU arbiter can read this signal in time and act accordingly by preventing exporting clauses from proceeding.
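The two stall rules can be expressed as a selection predicate for the ALU arbiter. This is a sketch under several assumptions. The function names are hypothetical. Clauses are numbered 0 to 7, so the last clause is 7. The exporting clause is 4, which the spec itself marks with a question mark. OUT_FILE_FULL is derived from a free-entry count with a fixed number of writes per clock.

```python
def out_file_full(free_entries, n, writes_per_clock=1):
    """OUT_FILE_FULL, raised n clocks before the file is really full.

    Assumes at most `writes_per_clock` entries are written per clock, so the
    flag goes up once n clocks of writes would exhaust the free entries.
    """
    return free_entries <= n * writes_per_clock


def may_select(clause, is_vertex, free_entries, pos_full, n,
               last_clause=7, export_clause=4):
    """Whether the ALU arbiter may pick `clause` for a vector.

    last_clause=7 (clauses numbered 0-7) and export_clause=4 are assumed
    defaults for this sketch.
    """
    if clause == last_clause and out_file_full(free_entries, n):
        return False  # nothing may leave the pipe until the output file has room
    if is_vertex and clause == export_clause and pos_full:
        return False  # POS_FULL: hold vertices out of the position-exporting clause
    return True
```

Raising the flag early gives the arbiter n clocks of slack. Exports already in flight can still land in the output file after the last clause is blocked.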
`
10. Content of the reservation station FIFOs
Each entry holds 3 bits of Render State, 6-7 bits for the base address of the instruction store and some bits for LOD correction. All other information (such as the coverage mask, quad address, etc.) is put in a FIFO and is retrieved when the quad exits the shader pipe to enter the output file buffer. Since pixels and vertices are kept in order in the shader pipe, we only need two FIFOs (one for vertices and one for pixels) deep enough to cover the shader pipe latency. This size will be determined later, once we know the size of the small FIFOs between the reservation stations.
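One possible packing of a reservation station FIFO entry is sketched below. The layout is hypothetical. The spec fixes only the 3 bits of render state. It gives 6-7 bits for the base address (7 is assumed here) and says "some bits" for LOD correction (4 is assumed here).

```python
STATE_BITS = 3       # 3 bits of render state (from the spec)
BASE_ADDR_BITS = 7   # the spec says 6-7 bits; 7 assumed here
LOD_BITS = 4         # hypothetical width for "some bits" of LOD correction


def pack_entry(state, base_addr, lod):
    """Pack one FIFO entry: state in the low bits, then base address, then LOD."""
    fields = ((state, STATE_BITS, "state"),
              (base_addr, BASE_ADDR_BITS, "base_addr"),
              (lod, LOD_BITS, "lod"))
    for value, width, name in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return state | (base_addr << STATE_BITS) | (lod << (STATE_BITS + BASE_ADDR_BITS))


def unpack_entry(word):
    """Inverse of pack_entry."""
    state = word & ((1 << STATE_BITS) - 1)
    base_addr = (word >> STATE_BITS) & ((1 << BASE_ADDR_BITS) - 1)
    lod = (word >> (STATE_BITS + BASE_ADDR_BITS)) & ((1 << LOD_BITS) - 1)
    return state, base_addr, lod


print(pack_entry(5, 100, 9))  # 10021
```

Keeping the per-station entry this narrow is the point of the design. The bulky per-quad data (coverage mask, quad address) lives in the separate in-order FIFO instead of being copied through every reservation station.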
`
`
11. The Output File (RB FIFO and Parameter Cache)
The output file is where program results are exported when the pixel/vertex shader finishes. It consists of a 512x128 memory that is statically divided between pixels and vertices. The output file has 1 write port and 1 read port. The sequencer is responsible for managing the addresses of this output file and for stalling the shader pipe should this output file fill up. The management is done by keeping the head and tail pointers of each section (pixels and vertices) and incrementing them using a simple round-robin allocation policy. The sequencer must also arbitrate between the PA and the RB for the use of the read port. This arbitration will either be priority based or just interleaved evenly (1 read every 2 clocks for each of the blocks).
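The head/tail management of one output file section and the evenly interleaved read port can be sketched as follows. The class and function names are hypothetical. The 256/256 split in the usage lines is an assumption, since the spec only says the division is static. A separate count is kept to tell a full section from an empty one.

```python
class OutputFileSection:
    """One statically sized section (pixels or vertices) of the output file,
    allocated round robin with a head (next write) and tail (next read) pointer.

    A separate count distinguishes full from empty when head == tail.
    """

    def __init__(self, base, size):
        self.base = base
        self.size = size
        self.head = 0
        self.tail = 0
        self.count = 0

    def full(self):
        return self.count == self.size

    def allocate(self):
        """Address for the next export, or None when the pipe must stall."""
        if self.full():
            return None
        addr = self.base + self.head
        self.head = (self.head + 1) % self.size
        self.count += 1
        return addr

    def release(self):
        """Address of the oldest entry, freed once the PA or RB has read it."""
        if self.count == 0:
            raise RuntimeError("section is empty")
        addr = self.base + self.tail
        self.tail = (self.tail + 1) % self.size
        self.count -= 1
        return addr


def read_port_owner(clock):
    """Even interleaving of the single read port: 1 read every 2 clocks each."""
    return "PA" if clock % 2 == 0 else "RB"


# Hypothetical 256/256 split of the 512-entry output file.
vertices = OutputFileSection(base=0, size=256)
pixels = OutputFileSection(base=256, size=256)
print(pixels.allocate(), vertices.allocate(), read_port_owner(3))  # 256 0 RB
```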
`
`
12. Interfaces

12.1 External Interfaces

12.1.1 Sequencer to Shader Engine Bus
This is a bus that sends the instruction and constant data to all 4 Sub-Engines of the Shader. Because a new instruction is needed only every 4 clocks, the width of the bus is divided by 4 and both constants and instructions are sent over these 4 clocks.
`
`
Name | Direction | Bits | Description
Instruction Start | SEQ->SP | 1 | High on first cycle of transfer
Constant 0 | SEQ->SP | 32 | 128 bits transferred over 4 cycles, alpha first ... blue last
Constant 1 | SEQ->SP | 32 | 128 bits transferred over 4 cycles, alpha first ... blue last
Instruction | SEQ->SP | 30 | 120 bits transferred over 4 cycles (order TBD)
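The 4-cycle constant transfer can be sketched as follows. The function names are hypothetical. The spec only says that alpha is sent first and blue last, so two details are assumptions: the ARGB order, and placing alpha in the most significant bits on reassembly.

```python
ORDER = ("alpha", "red", "green", "blue")  # assumed ARGB order: alpha first, blue last


def transfer_constant(components):
    """Serialize one 128-bit constant as four 32-bit words over 4 cycles.

    Returns (cycle, instruction_start, word) tuples; instruction_start is
    high only on the first cycle of the transfer.
    """
    cycles = []
    for cycle, name in enumerate(ORDER):
        word = components[name]
        if not 0 <= word <= 0xFFFFFFFF:
            raise ValueError(f"{name} does not fit in 32 bits")
        cycles.append((cycle, cycle == 0, word))
    return cycles


def receive_constant(cycles):
    """Reassemble the 128-bit value, first word in the most significant bits."""
    value = 0
    for _cycle, _start, word in cycles:
        value = (value << 32) | word
    return value


beats = transfer_constant({"alpha": 1, "red": 2, "green": 3, "blue": 4})
print(beats)  # [(0, True, 1), (1, False, 2), (2, False, 3), (3, False, 4)]
```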
`
`
`
`
`
`
12.1.2 Shader Engine to Output File
Every clock, each Sub-Engine can output 128 bits of 'vector' data and 32 bits of 'scalar' data to an output file (?). This data will be compressed into 128 bits total prior to storage in the output file.
`
`
`
Name | Direction | Bits | Description
UL_Vector_Out | SP->OF | 128 | Vector Data out
UL_Scalar_Out | SP->OF | 32 | Scalar Data out
UR_Vector_Out | SP->OF | 128 | Vector Data out
UR_Scalar_Out | SP->OF | 32 | Scalar Data out
LL_Vector_Out | SP->OF | 128 | Vector Data out
LL_Scalar_Out | SP->OF | 32 | Scalar Data out
LR_Vector_Out | SP->OF | 128 | Vector Data out
LR_Scalar_Out | SP->OF | 32 | Scalar Data out
`
12.1.3 Shader Engine to Texture Unit Bus (Fast Bus)
One quad's worth of addresses is transferred to the Texture Unit every clock. These are sourced from a different pixel within each of the sub-engines, repeating every 4 clocks. The register file index to read must precede the data by 2 clocks. The read address associated with Quad 0 must be sent 1 clock after the Instruction Start signal is sent, so that data is read 3 clocks after the Instruction Start.

One quad's worth of Texture Data may be written to the register file every clock. These are directed to a different pixel of the sub-engines, repeating every 4 clocks. The register file index to write must accompany the data. Data and index associated with Quad 0 must be sent 3 clocks after the Instruction Start signal is sent.
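The timing rules above can be tabulated per quad. This sketch assumes that quads follow one per clock after Quad 0. `fast_bus_schedule` is a hypothetical helper, and clocks are relative to the Instruction Start signal.

```python
def fast_bus_schedule(start_clock=0, num_quads=4):
    """Clocks at which each quad's read index, read data and write
    data/index appear, given Instruction Start at `start_clock`."""
    rows = []
    for quad in range(num_quads):
        read_index = start_clock + 1 + quad  # Quad 0: 1 clock after Instruction Start
        read_data = read_index + 2           # read index precedes data by 2 clocks
        write = start_clock + 3 + quad       # Quad 0 write: 3 clocks after Instruction Start
        rows.append({"quad": quad, "read_index": read_index,
                     "read_data": read_data, "write_data_and_index": write})
    return rows


for row in fast_bus_schedule():
    print(row)
```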
`
Name | Direction | Bits | Description
Tex_Read_Register_Index | SEQ->SP | 8 | Index into register files for reading Texture Address
Tex_RegFile_Read_Data | SP->TEX | 512 | 4 Texture Addresses read from the register file
Tex_Write_Register_Index | SEQ->TEX | 8 | Index into register file for write of returned Texture Data
`
`
`
`
`
`:
`
`:
`
`:
`
`ee
`
`:
`
`|
`
`:
`ee
`
12.1.4 Sequencer to Texture Unit Bus (Slow Bus)

Once every four clocks, the texture unit sends the sequencer the number of the clause it is now working on and whether the data in the registers is ready. This way the sequencer can update the texture counters for the reservation station FIFOs. The sequencer also provides the instruction and constants for the texture fetch to execute, and the address in the register file where the texture return data is to be written.

Name | Direction | Bits | Description
Tex_Ready | TEX->SEQ | 4 | Data ready
Tex_Clause_Num | TEX->SEQ | 3 | Clause number
Tex_Cst | SEQ->TEX | ? | Texture constants, X bits sent over 4 clocks
Tex_Inst | SEQ->TEX | ? | Texture fetch instruction, X bits sent over 4 clocks
`
`
`
`
`
`
`
`-
`.
`__[- =
`Fommatted: Bullets and Numbering
12.1.5 Shader Engine to RE/PA Bus

Name | Direction | Bits | Description
Interpolator_Register_Index | SEQ->SP | 8 | Index into register files for write of Interpolator/index Data
Interpolator_Write_Mask | SEQ->SP | 1 | Write Mask. The same write mask is used for all 4 pixels
Interpolator_Write_Data | RE/PA->SP | 512 | 4 interpolated vectors or vectors of indices
12.1.6 PA to sequencer

Name | Direction | Bits | Description
Address | PA->SEQ | ? | Deallocation address sent by the PA telling the Sequencer that it is now possible to free this space in the parameter buffer. This token is a pointer in the parameter cache, plus 4 bits to tell the size which is to be freed up.
`
13. Open issues
`
`|
`
`7 May, 2001
`
`EDIT DATE
`8 September, 20153
`
`R400 Sequencer Specification
`
`PAGE
`16 of 16 — —— -
`
`-
`
There is currently an issue with constants. If the constants are not the same for the whole vector of vertices, we don't have the bandwidth from the texture store to feed the ALUs. Two solutions exist for this problem:
1) Let the compiler handle the case and put those instructions in a texture clause so we can use the bandwidth there to operate. This requires a significant amount of temporary storage in the register store.
2) Waterfall down the pipe, allowing only the vertices that have the same constants to operate in parallel at any given time. In the worst case this might slow us down by a factor of 16.
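The cost of the waterfall option can be estimated by grouping the vertices of a vector by the constant they use. Each distinct constant needs its own pass, which gives the factor of 16 when all 16 vertices differ. `waterfall_passes` is a hypothetical helper, and representing each vertex's constant by an index is an assumption.

```python
def waterfall_passes(constant_indices):
    """Group vertex slots of one vector by the constant each one addresses.

    Each group executes as a separate pass, so len(result) is the slowdown
    relative to a vector whose vertices all share one constant.
    """
    groups = {}
    for slot, index in enumerate(constant_indices):
        groups.setdefault(index, []).append(slot)
    return list(groups.values())  # dicts keep first-seen order (Python 3.7+)


print(len(waterfall_passes([5] * 16)))         # 1: no slowdown
print(len(waterfall_passes(list(range(16)))))  # 16: worst case
```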
`
Need to do some testing on the size of the register file as well as on the register file allocation method (dynamic vs static).

Ability to export at any clause?

Saving power?

Are we working on 32 vertices at a time or 16?

Size of the FIFO containing the information of a vector of pixels/vertices, and size of the FIFOs before the reservation stations.

Sequencer instruction memory and constant memory.

Arbitration policy for the output file.

Loops and branches.

The parameter cache may end up in the PA rather than in the RS. Parameter cache management thus may change.
`
`
ORIGINATE DATE: 14 August, 2001
EDIT DATE: 4 September, 2015
DOCUMENT-REV. NUM.: GEN-CXXXXX-REVA
PAGE: 1 of 20
Author: Laurent Lefebvre
`
`
`
`
Issue To: | Copy No:
`
`
`
`
`
`
`R400 Sequencer Specification
`
`SEQ
`
Version 0.4
`
Overview: This is an architectural specification for the R400 Sequencer block (SEQ). It provides an overview of the required capabilities and expected uses of the block. It also describes the block interfaces and internal sub-blocks, and provides internal state diagrams.
`
`AUTOMATICALLY UPDATED FIELDS:
`Document Location:
C:\perforce\r400\arch\doc\gfx\RE\R400_Sequencer.doc
Current Intranet Search Title:
`R400 Sequencer Specification
`
`
`:
`-
`oo
`:
`HOSE
`APRROVALS
`:
`:
`
`ES
`:
`eee ene “ Name/Dept ©
`ees
`Signature/Date
`
`
`
`
`
`Remarks:
`
`
`
`
`
THIS DOCUMENT CONTAINS CONFIDENTIAL INFORMATION THAT COULD BE SUBSTANTIALLY DETRIMENTAL TO THE INTEREST OF ATI TECHNOLOGIES INC. THROUGH UNAUTHORIZED USE OR DISCLOSURE.
`
`
`
`“Copyright 2001, ATI Technologies Inc. All rights reserved. The material in this document constitutes an unpublished
`work created in 2001. The use of this copyright notice is intended to provide notice that ATI owns a copyright in this
`unpublished work. The copyright notice is not an admission that publication has occurred. This work contains
`confidential, proprietary information and trade secrets of ATI. No part of this document may be used, reproduced, or
`transmitted in any form or by any means without the prior written permission of ATI Technologies Inc.”
`
`|:
`
`Exhibit 2010 docR400_Sequencerdos
`
`25504 Bytes*** © ATI Confidential. Reference Copyright Notice on Cover Page @ *** poping youd
`PMG843/01.0547 PMOTHI0245BM
`
`ATT 2010
`
`LGv. ATI
`TPR2015-00325
`
`AMD1044_0256689
`
`ATI Ex. 2105
`IPR2023-00922
`Page 17 of 239
`
`ATI Ex. 2105
`
`IPR2023-00922
`Page 17 of 239
`
`
`
`
`
`
`
`
`
`
`Table Of Conten