
ORIGINATE DATE: 7 May, 2001
EDIT DATE: 8 September, 2015
DOCUMENT-REV. NUM.: GEN-CXXXXX-REVA
PAGE: 1 of 16
Author: Laurent Lefebvre
`
` AUTOMATICALLY UPDATED FIELDS:
`
16178 Bytes*** © ATI Confidential. Reference Copyright Notice on Cover Page © ***
`
`
`
`Issue To: | Copy No:
`
`
`R400 Sequencer Specification
`
`SEQ
`
`Version 0.32
`
Overview: This is an architectural specification for the R400 Sequencer block (SEQ). It provides an overview of the required capabilities and expected uses of the block. It also describes the block interfaces, internal sub-blocks, and provides internal state diagrams.

Document Location:
`Cwerforcey400iarchidocigik\RE\R400Seauencer.doc
`Current Intranet Search Title:
`R400 Sequencer Specification
`
`APPROVALS.
`
`Name/Dept
`
`Signature/Date
`
`
`
`Remarks:
`
THIS DOCUMENT CONTAINS CONFIDENTIAL INFORMATION THAT COULD BE SUBSTANTIALLY DETRIMENTAL TO THE INTEREST OF ATI TECHNOLOGIES INC. THROUGH UNAUTHORIZED USE OR DISCLOSURE.
`
`
`
`“Copyright 2001, ATI Technologies Inc. All rights reserved. The material in this document constitutes an unpublished
`work created in 2001. The use of this copyright notice is intended to provide notice that ATI owns a copyright in this
`unpublished work. The copyright notice is not an admission that publication has occurred. This work contains
`
`confidential, proprietary information and trade secrets of ATI. No part of this document may be used, reproduced, or
`transmitted in any form or by any means without the prior written permission of ATI Technologies Inc.”
`
`
`
`Exhibit 2009 decR400_Sequencerdos
`
`
`ATI 2009
`
`LGv. ATI
`IPR2015-00325
`
`AMD1044_0256673
`
`ATI Ex. 2105
`IPR2023-00922
`Page 1 of 239
`
`
`

`

`
`
`
`
`
`
`
Table Of Contents

1. OVERVIEW
1.1 Top Level Block Diagram
1.2 Data Flow graph
1.3 Control Graph
2. INTERPOLATED DATA BUS
3. INSTRUCTION STORE
4. CONSTANT STORE
5. LOOPING AND BRANCHES
6. REGISTER FILE ALLOCATION
7. TEXTURE ARBITRATION
8. ALU ARBITRATION
9. HANDLING STALLS
10. CONTENT OF THE RESERVATION STATION FIFOS
11. THE OUTPUT FILE (RB FIFO AND PARAMETER CACHE)
12. INTERFACES
12.1 External Interfaces
12.1.1 Sequencer to Shader Engine Bus
12.1.2 Shader Engine to Output File
12.1.3 Shader Engine to Texture Unit Bus (Fast Bus)
12.1.4 Sequencer to Texture Unit bus (Slow Bus)
12.1.5 Shader Engine to RE/PA Bus
`
Revision Changes:

Rev 0.1 (Laurent Lefebvre)
Date: May 7, 2001
First draft.

Rev 0.2 (Laurent Lefebvre)
Date: July 9, 2001
Changed the interfaces to reflect the changes in the SP. Added some details in the arbitration section.

Rev 0.3 (Laurent Lefebvre)
Date: August 6, 2001
Reviewed the Sequencer spec after the meeting on August 3, 2001.
`
`
`

`

`
`
`
`1. Overview
The sequencer first arbitrates between vectors of 16 (maybe 32) vertices that arrive directly from primitive assembly and vectors of 4 quads (16 pixels) that are generated in the raster engine.

The vertex or pixel program specifies how many GPRs it needs to execute. The sequencer will not start the next vector until the needed space is available.

The sequencer is based on the R300 design. It chooses two ALU clauses and a texture clause to execute, and executes all of the instructions in a clause before looking for a new clause of the same type. Two ALU clauses are executed interleaved to hide the ALU latency. Each vector will have eight texture and eight ALU clauses, but clauses do not need to contain instructions. A vector of pixels or vertices ping-pongs along the sequencer FIFO, bouncing from texture reservation station to ALU reservation station. A FIFO exists between each reservation stage, holding up vectors until the vector currently occupying a reservation station has left. A vector at a reservation station can be chosen to execute. The sequencer looks at all eight ALU reservation stations to choose an ALU clause to execute and all eight texture stations to choose a texture clause to execute. The arbitrator will give priority to clauses/reservation stations closer to the bottom of the pipeline. It will not execute an ALU clause until the texture fetches initiated by the previous texture clause have completed. There are two separate sets of reservation stations, one for pixel vectors and one for vertex vectors. This way a pixel can pass a vertex and a vertex can pass a pixel.

To support the shader pipe the raster engine also contains the shader instruction cache and constant store. There is only one constant store for the whole chip and one instruction store. These will be shared among the four shader pipes.
`
`
`

`

`
`
1.1 Top Level Block Diagram

[Figure: top level block diagram. A vertex/pixel vector arbitrator (with a possible delay for available GPRs) feeds a chain of reservation stations, texture clause 0 through 7 alternating with ALU clause 0 through 7, with a FIFO between consecutive stations; a texture arbitrator selects among the texture clause reservation stations.]
`
The rasterizer always checks the vertices FIFO first and, if allowed by the sequencer, sends the data to the shader. If the vertex FIFO is empty, the rasterizer takes the first entry of the pixel FIFO (a vector of 16 pixels) and sends it to the interpolators. Then the sequencer takes control of the packet. The packet consists of 3 bits of state, 6-7 bits for the base address of the shader program and some information on the coverage to determine texture LOD. All other information (2x2 addresses) is put in a FIFO (one for the pixels and one for the vertices) and retrieved when the packet finishes its last clause.
`
`
`

`

`
`
On receipt of a packet, the input state machine (not pictured but just before the first FIFO) allocates enough space in the registers to store the interpolated values and temporaries. Following this, the input state machine stacks the packet in the first FIFO.

On receipt of a command, the level 0 texture machine issues a texture request and corresponding register address for the texture address (ta). A small command (tcmd) is passed to the texture system identifying the current level number (0) as well as the register write address for the texture return data. One texture request is sent every 4 clocks causing the texturing of four 2x2s worth of data (or 16 vertices). Once all the requests are sent the packet is put in FIFO 1.

Upon receipt of the return data (identified by the tcmd containing the level number 0), the level 0 texture machine issues a register address for the return value (td). Then, it increments the counter of FIFO 1 to signify to the ALU that the data is ready to be processed.

On receipt of a command, the level 0 ALU machine first decrements the input FIFO counter and then issues a complete set of level 0 shader instructions. For each instruction, the state machine generates 3 source addresses, one destination address (2-3 cycles later) and an instruction. Once the last instruction has been issued, the packet is put into FIFO 2.

There will always be two active ALU clauses at any given time (and two arbiters). One arbiter will arbitrate over the odd clock cycles and the other one will arbitrate over the even clock cycles. The only constraint between the two arbiters is that they are not allowed to pick the same clause number as the other one is currently working on if the packet is of the same type.

If the packet is a vertex packet, upon reaching ALU clause 4 it can export the position if the position is ready. So the [...] positional data, the location where the vertex data is to be put is also sent (parameter data pointers).

All other levels process in the same way until the packet finally reaches the last ALU machine (8). On completion of the level 8 ALU clause, a valid bit is sent to the Render Backend which picks up the color data. This requires that the last instruction writes to the output register, a condition that is almost always true. If the packet was a vertex packet, instead of sending the valid bit to the RB, it is sent to the PA, so it can know that the data present in the parameter store is valid.
Only two ALU state machines may have access to the register file address bus or the instruction decode bus at one time. Similarly, only one texture state machine may have access to the register file address bus at one time. Arbitration is performed by three arbiter blocks (two for the ALU state machines and one for the texture state machines). The arbiters always favor the higher number state machines, preventing a bunch of half finished jobs from clogging up the register files.

Each state machine maintains an address pointer specifying where the 16 (or 32) entries vector is located in the register file (the texture machine has two pointers, one for the read address and one for the write). Upon completion of its job, the address pointer is incremented by a predefined amount equal to the total number of registers required by the shading code. A comparison of the address pointer for the first state machine in the chain (the input state machine), and the last machine in the chain (the level 8 ALU machine), gives an indication of how much unallocated register file memory is available.
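The pointer comparison described above can be sketched as circular-buffer arithmetic. This is only an illustration: the function names, the wrap-around behaviour and the rule that equal pointers mean "completely free" are assumptions, not details taken from the specification.

```python
def free_registers(head, tail, size):
    """Unallocated entries in a circular register file.

    head: address pointer of the input state machine (next allocation).
    tail: address pointer of the last ALU machine (next deallocation).
    Assumption: head == tail means the whole file is free.
    """
    return size - ((head - tail) % size)


def try_allocate(head, tail, size, needed):
    """Return the new head pointer, or None if the vector must wait.

    One entry is always left unused so that a full file is never
    confused with an empty one (a common ring-buffer convention,
    assumed here).
    """
    if needed < free_registers(head, tail, size):
        return (head + needed) % size
    return None
```

With a 256-entry file, `try_allocate(0, 0, 256, 16)` advances the head to 16, while a request that would consume the last free entries returns `None`, which corresponds to the sequencer holding back the next vector.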
`
`
`
`
`

`

`
`
1.2 Data Flow graph

[Figure: data flow. Interpolated data from the RE enters a 512x128 register file; 128 bit data feeds 4 32 bit MAC units (128 bit scalar/vector ALU), with control and constants from the RE. Outputs are the address to texture, vertex parameter data to the RE through the texture block, or pixel data to the RB through the texture block.]
`
`
`

`

`
`
[Figure, continued: register files with MAC and scalar units, pipeline stages, scalar input/output and texture requests.]
[Figure, continued: further register file and pipeline stages with scalar input/output and texture requests; the final output goes to the Primitive Assembly Unit or Render Backend.]

[Figure: shader pipe block diagram with blocks labeled Interpolated data / Vertex indexes, REGISTER FILE, TEXTURE, OPERAND MUX, ALU, SCALAR ALU, INSTRUCTION STORE/CACHE and CONSTANT STORE.]
`
`The gray area represents blocks that are replicated 4 times per shader pipe (16 times on the overall chip).
`
`
`1.3 Control Graph
`
[Figure: control graph linking the SEQ, SP, CST, IS, PARB and OF blocks, with signals labeled Clause # + Rdy, WrAddr, RdAddr, CMD, Phase, WrVec and WrScal.]
`
In green is represented the Texture control interface, in red the ALU control interface, in blue the Interpolated/Vector control interface and in purple is the output file control interface.
`
2. Interpolated data bus
Since each of the register files is actually physically divided (one 32x128 per MAC) and we don't have the [...] a maximum size vector of vertices in the parameter buffer, we need to interpolate on a parameter basis rather than on a quad basis. So the order to the register file will be:

Q0P0 Q1P0 Q2P0 Q3P0 Q0P1 Q1P1 Q2P1 Q3P1 Q0P2 Q1P2 ...
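Interpolating "on a parameter basis rather than on a quad basis" means the parameter index changes slowest and the quad index fastest. A minimal sketch of that ordering follows; the function name and the Q/P label format are chosen for illustration only.

```python
def interpolation_order(num_quads, num_params):
    """List QxPy labels with the quad index varying fastest.

    All quads receive parameter 0 before any quad receives parameter 1,
    which is the parameter-major order described in the text.
    """
    return [
        f"Q{quad}P{param}"
        for param in range(num_params)
        for quad in range(num_quads)
    ]
```

For a vector of 4 quads, `interpolation_order(4, 2)` starts with the four quads of parameter 0 and then moves to parameter 1.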
`
3. Instruction Store
There is going to be only one instruction store for the whole chip. It may contain up to 2000 instructions of 96 bits each. The instruction store is loaded by the sequencer using the memory hub. The read bandwidth from this store is 24 bits/clock/pipe. To achieve this, the instruction store is likely to be broken up into 4 blocks: an ALU instruction section (?R/1W) split in two and a texture section (1R/1W) also split in two. The bandwidth out of those memories is 96 bits/clock.
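The figures in this section are mutually consistent, which a few lines of arithmetic make explicit. The constant and function names below are illustrative, and the count of four pipes is taken from the overview ("the four shader pipes").

```python
INSTRUCTION_BITS = 96              # width of one instruction
MAX_INSTRUCTIONS = 2000            # "up to 2000 instructions"
READ_BITS_PER_CLOCK_PER_PIPE = 24  # read bandwidth per pipe
PIPES = 4                          # four shader pipes share the store


def clocks_to_read_instruction():
    """Clocks needed for one pipe to receive a full instruction."""
    return INSTRUCTION_BITS // READ_BITS_PER_CLOCK_PER_PIPE


def total_read_bandwidth():
    """Aggregate bits per clock leaving the instruction store."""
    return READ_BITS_PER_CLOCK_PER_PIPE * PIPES


def store_capacity_bits():
    """Upper bound on the storage needed for the instruction store."""
    return MAX_INSTRUCTIONS * INSTRUCTION_BITS
```

Each pipe therefore receives one 96-bit instruction every 4 clocks, which matches the statement elsewhere in the document that a new instruction is needed only every 4 clocks.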
`
`
`
`
`
4. Constant Store
The constant store is managed by the CP. The sequencer is aware of where the constants are using a remap table also managed by the CP. A likely size for the constant store is 512x128 bits. The constant store is also planned to be shared. The read BW from the constant store is 512/4 bits/clock/pipe and the write bandwidth is 32/4 bits/clock.
`
`
`
`
`
`
5. Looping and Branches
Loops and branches are planned to be supported and will have to be dealt with at the sequencer level. However, it is still unclear if we plan on supporting data dependent branches or not.
`
`
`
6. Register file allocation
The register file allocation for vertices and pixels can either be static or dynamic. In both cases, the register file is managed using two round robins (one for pixels and one for vertices). In the dynamic case the boundary between pixels and vertices is allowed to move; in the static case it is fixed to VERTEXREGSIZE for vertices and 256-VERTEXREGSIZE for pixels.
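The static case can be sketched as two independent round-robin regions carved out of a 256-entry file. This is a simplified illustration: the class and function names are hypothetical, placing vertices at the low addresses is an assumption, and a real allocator would also have to handle a block that straddles the end of its region.

```python
class RoundRobinRegion:
    """Round-robin allocator over a fixed slice of the register file."""

    def __init__(self, base, size):
        self.base = base    # first register of the region
        self.size = size    # number of registers in the region
        self.head = 0       # offset of the next allocation
        self.used = 0       # registers currently allocated

    def allocate(self, count):
        """Return the start address, or None if the region is too full."""
        if count > self.size - self.used:
            return None
        address = self.base + self.head
        self.head = (self.head + count) % self.size
        self.used += count
        return address

    def release(self, count):
        """Free the oldest allocation of the given size."""
        self.used -= count


def static_partition(vertex_reg_size, total=256):
    """Fixed boundary: vertices get vertex_reg_size, pixels the rest."""
    vertices = RoundRobinRegion(0, vertex_reg_size)
    pixels = RoundRobinRegion(vertex_reg_size, total - vertex_reg_size)
    return vertices, pixels
```

In the dynamic case the two regions would share the boundary instead of fixing it at construction time; that variant is not sketched here.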
`
`
Above is an example of how the algorithm works. Vertices come in from top to bottom; pixels come in from bottom to top. Vertices are in orange and pixels in green. The blue line is the tail of the vertices and the green line is the tail of the pixels. Thus anything between the two lines is shared. When pixels meet vertices the line turns white and the boundary is static until both vertices and pixels share the same “unallocated bubble”. Then the boundary is allowed to move again.
`
7. Texture Arbitration
The texture arbitration logic chooses one of the 8 potentially pending texture clauses to be executed. The choice is made by looking at the fifos from 7 to 0 and picking the first one ready to execute. Once chosen, the clause state machine will send one 2x2 texture fetch per clock (or 4 fetches in one clock every 4 clocks) until all the texture fetch instructions of the clause are sent. This means that there cannot be any dependencies between two texture fetches of the same clause.

The arbitrator will not wait for the texture fetches to return prior to selecting another clause for execution. The texture pipe will be able to handle up to X(?) in-flight texture fetches and thus there can be a fair number of active clauses waiting for their texture return data.
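The selection rule ("looking at the fifos from 7 to 0 and picking the first one ready") is a fixed-priority scan. A minimal sketch, assuming the readiness of each clause is available as a list of booleans indexed by clause number (the function name is illustrative):

```python
def pick_clause(ready):
    """Return the highest-numbered ready clause, or None if none is ready.

    ready: sequence of booleans, ready[n] is True when clause n has a
    vector waiting that can execute.
    """
    for clause in range(len(ready) - 1, -1, -1):
        if ready[clause]:
            return clause
    return None
```

Favouring the highest clause number drains vectors that are closest to finishing first, which is the stated reason for the priority order.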
`
8. ALU Arbitration
ALU arbitration proceeds in almost the same way as texture arbitration. The ALU arbitration logic chooses one of the 8 potentially pending ALU clauses to be executed. The choice is made by looking at the fifos from 7 to 0 and picking the first one ready to execute. There are two ALU arbiters, one for the even clocks and one for the odd clocks. For example, here is the sequencing of two interleaved ALU clauses (E and O stand for Even and Odd):

Einst0 Oinst0 Einst1 Oinst1 Einst2 Oinst2 Einst0 Oinst3 Einst1 Oinst4 Einst2 Oinst0 ...

Proceeding this way hides the latency of 8 clocks of the ALUs.
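The interleaving of the two arbiters can be illustrated by alternating issue slots between an "even" and an "odd" instruction stream. This is a sketch under assumptions: each clause is reduced to an instruction count, the clauses are listed in the order each arbiter happens to pick them, and an arbiter with nothing to issue leaves its slot idle.

```python
def issue_order(even_clauses, odd_clauses, slots):
    """Interleave two arbiters' instruction streams slot by slot.

    even_clauses / odd_clauses: instruction counts of the clauses each
    arbiter executes, in order. Even slots belong to the even arbiter,
    odd slots to the odd arbiter.
    """
    def stream(tag, clauses):
        # Instruction numbering restarts at 0 for every new clause.
        for count in clauses:
            for index in range(count):
                yield f"{tag}inst{index}"

    even = stream("E", even_clauses)
    odd = stream("O", odd_clauses)
    order = []
    for slot in range(slots):
        source = even if slot % 2 == 0 else odd
        order.append(next(source, "idle"))
    return order
```

Calling `issue_order([3, 3], [5, 1], 12)` produces a sequence of the same shape as the example in this section: the even arbiter runs a 3-instruction clause and then starts another, while the odd arbiter is still working through a 5-instruction clause.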
`
9. Handling Stalls
When the output file is full, the sequencer prevents the ALU arbitration logic from selecting the last clause (this way nothing can exit the shader pipe until there is room in the output file). If the packet is a vertex packet and the position buffer is full (POS FULL) then the sequencer also prevents a thread from entering the exporting clause (4?). The sequencer will set the OUT_FILE_FULL signal n clocks before the output file is actually full and thus the ALU arbiter will be able to read this signal and act accordingly by preventing exporting clauses from proceeding.
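The two stall conditions amount to a filter applied before ALU arbitration. The sketch below is illustrative only: the function and parameter names are invented, the last clause is assumed to be number 7, and the exporting clause number defaults to 4 because the text itself marks that value as uncertain.

```python
LAST_CLAUSE = 7  # assumption: clauses are numbered 0 to 7


def selectable(clause, is_vertex, out_file_full, pos_full, export_clause=4):
    """Return True if the ALU arbiter may select this clause.

    out_file_full: the OUT_FILE_FULL signal (raised early by the sequencer).
    pos_full: the position buffer is full.
    export_clause: clause in which a vertex packet exports its position
    (assumed value, flagged as uncertain in the source).
    """
    if out_file_full and clause == LAST_CLAUSE:
        return False  # nothing may exit the shader pipe
    if is_vertex and pos_full and clause == export_clause:
        return False  # vertex thread may not enter the exporting clause
    return True
```

A fixed-priority arbiter would apply this check to each candidate before picking the highest-numbered ready clause.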
`
10. Content of the reservation station FIFOs
3 bits of Render State, 6-7 bits for the base address of the instruction store and some bits for LOD correction. All other information (such as the coverage mask, quad address, etc.) is put in a FIFO and is retrieved when the quad exits the shader pipe to enter the output file buffer. Since pixels and vertices are kept in order in the shader pipe, we only need two fifos (one for vertices and one for pixels) deep enough to cover the shader pipe latency. This size will be determined later when we know the size of the small fifos between the reservation stations.
`
`
11. The Output File (RB FIFO and Parameter Cache)
The output file is where program results are exported when the pixel/vertex shader finishes. It consists of a 512x128 memory cell that is statically divided between pixels and vertices. The output file has 1 write port and 1 read port. The sequencer is responsible for managing the addresses of this output file and for stalling the shader pipe should this output file fill up. The management is done by keeping the tail and head pointers of each section (pixels and vertices) and incrementing them using a simple round-robin allocation policy. The sequencer must also arbitrate between the PA and the RB for the use of the read port. This arbitration will either be priority based or just interleaved evenly (1 read every 2 clocks for each of the blocks).
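The two candidate policies for the single read port can be compared with a small sketch. The choice of the PA for even clocks and the choice of the PA as the higher-priority client are assumptions made purely for illustration; the text leaves both the policy and its details open.

```python
def read_port_owner(clock, pa_request, rb_request, policy="interleaved"):
    """Decide which block uses the output file read port this clock.

    Returns "PA", "RB", or None when the port stays idle.
    """
    if policy == "interleaved":
        # Each block owns every other clock, whether or not it needs it.
        owner = "PA" if clock % 2 == 0 else "RB"
        wants = pa_request if owner == "PA" else rb_request
        return owner if wants else None
    if policy == "priority":
        # Assumed order: PA first, RB only when the PA is not requesting.
        if pa_request:
            return "PA"
        if rb_request:
            return "RB"
        return None
    raise ValueError(f"unknown policy: {policy}")
```

The interleaved policy guarantees each block 1 read every 2 clocks but can leave the port idle; the priority policy keeps the port busy at the risk of starving the lower-priority block.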
`
`
12. Interfaces

12.1 External interfaces

12.1.1 Sequencer to Shader Engine Bus
This is a bus that sends the instruction and constant data to all 4 Sub-Engines of the Shader. Because a new instruction is needed only every 4 clocks, the width of the bus is divided by 4 and both constants and instruction are sent over these 4 clocks.
`
`
Name | Direction | Bits | Description
Instruction Start | SEQ->SP | 1 | High on first cycle of transfer
Constant 0 | SEQ->SP | 32 | 128 bits transferred over 4 cycles, alpha first... blue last
Constant 1 | SEQ->SP | 32 | 128 bits transferred over 4 cycles, alpha first... blue last
Instruction | SEQ->SP | 30 | 120 bits transferred over 4 cycles (order TBD)
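Dividing the bus width by 4 means every wide value is cut into four equal words sent on consecutive clocks. The sketch below shows the slicing for any width; sending the most significant word first is an assumption, since the table gives the channel order only partially for constants and marks the instruction order as TBD.

```python
def split_words(value, total_bits=128, cycles=4):
    """Slice a wide value into equal words, one per bus cycle.

    Assumption: the most significant word is sent on the first cycle.
    """
    width = total_bits // cycles
    mask = (1 << width) - 1
    return [
        (value >> (width * (cycles - 1 - cycle))) & mask
        for cycle in range(cycles)
    ]


def join_words(words, width):
    """Reassemble the value on the receiving side."""
    value = 0
    for word in words:
        value = (value << width) | word
    return value
```

A 128-bit constant becomes four 32-bit words and a 120-bit instruction four 30-bit words, matching the bus widths in the table.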
`
`
`
`
`
`
12.1.2 Shader Engine to Output File
Every clock each Sub-Engine can output 128 bits of 'vector' data and 32 bits of 'scalar' data to an output file (?). This data will be compressed into 128 bits total prior to storage in output file.

Name | Direction | Bits | Description
UL_Vector_Out | SP->OF | 128 | Vector Data out
UL_Scalar_Out | SP->OF | 32 | Vector Data out
UR_Vector_Out | SP->OF | 128 | Vector Data out
UR_Scalar_Out | SP->OF | 32 | Vector Data out
LL_Vector_Out | SP->OF | 128 | Vector Data out
LL_Scalar_Out | SP->OF | 32 | Vector Data out
LR_Vector_Out | SP->OF | 128 | Vector Data out
LR_Scalar_Out | SP->OF | 32 | Vector Data out
12.1.3 Shader Engine to Texture Unit Bus (Fast Bus)
One quad's worth of addresses is transferred to the Texture Unit every clock. These are sourced from a different pixel within each of the sub-engines, repeating every 4 clocks. The register file index to read must precede the data by 2 clocks. The Read address associated with Quad 0 must be sent 1 clock after the Instruction Start signal is sent, so that data is read 3 clocks after the Instruction Start.

One quad's worth of Texture Data may be written to the register file every clock. These are directed to a different pixel of the sub-engines, repeating every 4 clocks. The register file index to write must accompany the data. Data and Index associated with Quad 0 must be sent 3 clocks after the Instruction Start signal is sent.

Name | Direction | Bits | Description
Tex_Read_Register_Index | SEQ->SP | 8 | Index into register files for reading Texture Address
Tex_RegFile_Read_Data | SP->TEX | 512 | 4 Texture Addresses read from the register file
Tex_Write_Register_Index | SEQ->TEX | 8 | Index into register file for write of returned Texture Data
`
12.1.4 Sequencer to Texture Unit bus (Slow Bus)

Once every four clocks, the texture unit tells the sequencer which clause it is now working on and whether the data in the registers is ready. This way the sequencer can update the texture counters for the reservation station fifos. The sequencer also provides the instruction and constants for the texture fetch to execute and the address in the register file where the texture return data is to be written.

Name | Direction | Bits | Description
Tex_Ready | TEX->SEQ | 4 | Data ready
Tex_Clause_Num | TEX->SEQ | 3 | Clause number
Tex_cst | SEQ->TEX | ? | Texture constants, X bits sent over 4 clocks
Tex_inst | SEQ->TEX | ? | Texture fetch instruction, X bits sent over 4 clocks
`
`
`
`
`
`
12.1.5 Shader Engine to RE/PA Bus

Name | Direction | Bits | Description
Interpolator_Register_Index | SEQ->SP | 8 | Index into register files for write of Interpolator/Index Data
Interpolator_Write_Mask | SEQ->SP | 1 | Write Mask. The same write mask is used for all 4 pixels
Interpolator_Write_Data | RE/PA->SP | 512 | 4 interpolated vectors or vectors of indices
12.1.6 PA to sequencer

Name | Direction | Bits | Description
Address | PA->SEQ | ? | Deallocation address sent by the PA telling the Sequencer that it is now possible to free this space in the parameter buffer. This token is a pointer in the parameter cache and 4 bits to tell the size which is to be freed up.
`
`
`
`
13. Open issues
`
There is currently an issue with constants. If the constants are not the same for the whole vector of vertices, we don't have the bandwidth from the texture store to feed the ALUs. Two solutions exist for this problem:
1) Let the compiler handle the case and put those instructions in a texture clause so we can use the bandwidth there to operate. This requires a significant amount of temporary storage in the register store.
2) Waterfall down the pipe, allowing at a given time only the vertices having the same constants to operate in parallel. This might in the worst case slow us down by a factor of 16.
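The worst-case factor of 16 in the waterfall option follows from needing one pass per distinct constant in a 16-vertex vector. A minimal sketch of that grouping, with an invented function name and with each vertex's constant represented by an index:

```python
def waterfall_passes(constant_indices):
    """Group vertex lanes that share a constant; one pass per group.

    constant_indices: the constant index used by each vertex in the
    vector, in lane order. Returns the lanes executed in each pass.
    """
    passes = {}
    for lane, index in enumerate(constant_indices):
        passes.setdefault(index, []).append(lane)
    return list(passes.values())
```

When all 16 vertices use the same constant a single pass suffices; when every vertex uses a different one, 16 passes are needed, which is the slowdown quoted above.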
`
Need to do some testing on the size of the register file as well as on the register file allocation method (dynamic vs. static).

Saving power?

Are we working on 32 vertices at a time or 16?

Size of the fifo containing the information of a vector of pixels/vertices, and size of the fifos before the reservation stations.

Sequencer instruction memory, and constant memory.

Arbitration policy for the output file.

Loops and branches.

The parameter cache may end up in the PA rather than in the RS. Parameter cache management thus may change.
`
`
`

`

`
`ORIGINATE DATE: 14 August, 2001
`EDIT DATE: 4 September, 2015
`DOCUMENT-REV. NUM.: GEN-CXXXXX-REVA
`PAGE: 4 of 20
`Author: Laurent Lefebvre
`
`
`
`
`Issue To: | Copy No:
`
`
`
`
`
`
`R400 Sequencer Specification
`
`SEQ
`
`Version 0.42
`
`Overview: This is an architectural specification for the R400 Sequencer block (SEQ). It provides an overview of the
`required capabilities and expected uses of the block. It also describes the block interfaces, internal sub-blocks,
`and provides internal state diagrams.
`
`AUTOMATICALLY UPDATED FIELDS:
`Document Location:
`Ciiperforcer40Q\archidoc\whiRE\R400_Sequencerdec
`Current Intranet Search Title:
`R400 Sequencer Specification
`
`
`APPROVALS
`
`Name/Dept
`
`Signature/Date
`
`
`Remarks:
`
`
`
`
`
` THIS DOCUMENT CONTAINS CONFIDENTIAL INFORMATION THAT COULD BE
`
`SUBSTANTIALLY DETRIMENTAL TO THE INTEREST OF ATI TECHNOLOGIES
`INC. THROUGH UNAUTHORIZED USE OR DISCLOSURE.
`
`
`
`“Copyright 2001, ATI Technologies Inc. All rights reserved. The material in this document constitutes an unpublished
`work created in 2001. The use of this copyright notice is intended to provide notice that ATI owns a copyright in this
`unpublished work. The copyright notice is not an admission that publication has occurred. This work contains
`confidential, proprietary information and trade secrets of ATI. No part of this document may be used, reproduced, or
`transmitted in any form or by any means without the prior written permission of ATI Technologies Inc.”
`
`Exhibit 2010 docR400_Sequencerdos
`
`
`ATI 2010
`
`LG v. ATI
`IPR2015-00325
`
`
`Table Of Contents
