Author: Laurent Lefebvre
Originate Date: 14 August, 2001
Edit Date: 4 September, 2015
Document-Rev. Num.: GEN-CXXXXX-REVA

*** © ATI Reference Copyright Notice on Cover Page © ***

Issue To:
Copy No:
R400 Sequencer Specification

SEQ

Version 0.42

Overview: This is an architectural specification for the R400 Sequencer block (SEQ). It provides an overview of the required capabilities and expected uses of the block. It also describes the block interfaces and internal sub-blocks, and provides internal state diagrams.
`
`AUTOMATICALLY UPDATED FIELDS:
`Document Location:
`CAperforcer400\archidoclgh\REVWR400Sequencer.doc
`
Current Intranet Search Title: R400 Sequencer Specification

APPROVALS

Name/Dept        Signature/Date
Remarks:

THIS DOCUMENT CONTAINS INFORMATION THAT COULD BE SUBSTANTIALLY DETRIMENTAL TO THE INTEREST OF ATI TECHNOLOGIES INC. THROUGH UNAUTHORIZED USE OR DISCLOSURE.

“Copyright 2001, ATI Technologies Inc. All rights reserved. The material in this document constitutes an unpublished work created in 2001. The use of this copyright notice is intended to provide notice that ATI owns a copyright in this unpublished work. The copyright notice is not an admission that publication has occurred. This work contains proprietary information and trade secrets of ATI. No part of this document may be used, reproduced, or transmitted in any form or by any means without the prior written permission of ATI Technologies Inc.”
`
`
`ATI 2010
`
`LGv. ATI
`IPR2015-00325
`
`AMD1044_0016660
`
`ATI Ex. 2010
`IPR2023-00922
`Page 1 of 20
`
Table Of Contents

1. OVERVIEW
1.1 Top Level Block Diagram
1.2 Data Flow graph
1.3 Control Graph
2. INTERPOLATED DATA BUS
3. INSTRUCTION STORE
4. CONSTANT STORE
5. LOOPING AND BRANCHES
6. REGISTER FILE ALLOCATION
7. TEXTURE ARBITRATION
8. ALU ARBITRATION
9. HANDLING STALLS
10. CONTENT OF THE RESERVATION STATION FIFOS
11. THE OUTPUT FILE (RB FIFO AND PARAMETER CACHE)
12. INTERFACES
12.1 External Interfaces
12.1.1 Sequencer to Shader Engine Bus
12.1.2 Shader Engine to Output File
12.1.3 Shader Engine to Texture Unit Bus (Fast Bus)
12.1.4 Sequencer to Texture Unit bus (Slow Bus)
12.1.5 Shader Engine to RE/PA Bus
12.1.6 PA? to sequencer
13. EXAMPLES OF PROGRAM EXECUTIONS
13.1.1 Sequencer Control of a Vector of Vertices
13.1.2 Sequencer Control of a Vector of Pixels
13.1.3 Notes
14. OPEN ISSUES
`
`
`
`
Revision Changes:

Rev 0.1 (Laurent Lefebvre)
Date: May 7, 2001
First draft.

Rev 0.2 (Laurent Lefebvre)
Date: July 9, 2001
Changed the interfaces to reflect the changes in the SP. Added some details in the arbitration section.

Rev 0.3 (Laurent Lefebvre)
Date: August 6, 2001
Reviewed the Sequencer spec after the meeting on August 3, 2001.

Rev 0.4 (Laurent Lefebvre)
Date: August 24, 2001
Added the dynamic allocation method for register file and an example (written in part by Vic) of the
1. Overview
The sequencer first arbitrates between vectors of 16 vertices that arrive directly from primitive assembly and vectors of 4 quads (16 pixels) that are generated in the raster engine.

The vertex or pixel program specifies how many GPRs it needs to execute. The sequencer will not start the next vector until the needed space is available.

The sequencer is based on the R300 design. It chooses two ALU clauses and a texture clause to execute, and executes all of the instructions in a clause before looking for a new clause of the same type. Two ALU clauses are executed interleaved to hide the ALU latency. Each vector will have eight texture and eight ALU clauses, but clauses do not need to contain instructions. A vector of pixels or vertices ping-pongs along the sequencer FIFO, bouncing from texture reservation station to ALU reservation station. A FIFO exists between each reservation stage, holding up vectors until the vector currently occupying a reservation station has left. A vector at a reservation station can be chosen to execute. The sequencer looks at all eight ALU reservation stations to choose an ALU clause to execute and all eight texture stations to choose a texture clause to execute. The arbitrator will give priority to clauses/reservation stations closer to the bottom of the pipeline. It will not execute an ALU clause until the texture fetches initiated by the previous texture clause have completed. There are two separate sets of reservation stations, one for pixel vectors and one for vertex vectors. This way a pixel can pass a vertex and a vertex can pass a pixel.

To support the shader pipe the raster engine also contains the shader instruction cache and constant store. There is only one constant store for the whole chip and one instruction store. These will be shared among the four shader pipes.
`
`
`
1.1 Top Level Block Diagram

[Figure: top level block diagram. A vertex/pixel vector arbitrator feeds a chain of texture clause reservation stations 0 to 7 alternating with ALU clause reservation stations 0 to 7, with a FIFO between consecutive stations. A texture arbitrator selects among the texture stations. Annotation at the input: "Possible delay for available GPRs".]
There are two sets of the above figure, one for vertices and one for pixels.

The rasterizer always checks the vertices FIFO first and, if allowed by the sequencer, sends the data to the shader. If the vertex FIFO is empty, the rasterizer takes the first entry of the pixel FIFO (a vector of 16 pixels) and sends it to the interpolators. Then the sequencer takes control of the packet. The packet consists of 3 bits of state, 6-7 bits for the base address of the shader program and some information on the coverage to determine texture LOD. All other information (2x2 addresses) is put in a FIFO (one for the pixels and one for the vertices) and retrieved when the packet finishes its last clause.
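
The input selection described above can be sketched in a few lines. This is only an illustrative model, not the hardware design: the function name `pick_next_vector`, the `sequencer_allows` flag and the use of Python deques for the two FIFOs are assumptions made for the example, and it assumes the sequencer's permission gates pixels as well as vertices.

```python
from collections import deque


def pick_next_vector(vertex_fifo, pixel_fifo, sequencer_allows=True):
    """Return (kind, vector) for the next vector to launch, or None.

    Hypothetical model: the vertex FIFO is always checked first, and the
    pixel FIFO is only consulted when the vertex FIFO is empty.
    """
    if not sequencer_allows:
        # Assumption: nothing is launched while the sequencer holds off.
        return None
    if vertex_fifo:
        return ("vertex", vertex_fifo.popleft())
    if pixel_fifo:
        return ("pixel", pixel_fifo.popleft())
    return None


vertices = deque(["v0"])
pixels = deque(["p0", "p1"])
first = pick_next_vector(vertices, pixels)   # vertex FIFO wins
second = pick_next_vector(vertices, pixels)  # vertex FIFO now empty
```

With one vertex vector and two pixel vectors queued, the vertex vector is taken first and the pixel vectors follow in order.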
`
On receipt of a packet, the input state machine (not pictured but just before the first FIFO) allocates enough space in the registers to store the interpolated values and temporaries. Following this, the input state machine stacks the packet in the first FIFO.
On receipt of a command, the level 0 texture machine issues a texture request and corresponding register address for the texture address (ta). A small command (tcmd) is passed to the texture system identifying the current level number (0) as well as the register write address for the texture return data. One texture request is sent every 4 clocks causing the texturing of four 2x2s worth of data (or 16 vertices). Once all the requests are sent the packet is put in FIFO 1.

Upon receipt of the return data, the texture unit writes the data to the register file using the write address that was provided by the level 0 texture machine and sends the clause number (0) to the level 0 texture state machine to signify that the write is done and thus the data is ready. Then, the level 0 texture machine increments the counter of FIFO 1 to signify to the ALU 1 that the data is ready to be processed.
`
On receipt of a command, the level 0 ALU machine first decrements the input FIFO counter and then issues a complete set of level 0 shader instructions. For each instruction, the state machine generates 3 source addresses, one destination address (2-3 cycles later) and an instruction. Once the last instruction has been issued, the packet is put into FIFO 2.

Two ALU clauses are active at any given time (and two arbiters). One arbiter will arbitrate over the odd clock cycles and the other one will arbitrate over the even clock cycles. The only constraint between the two arbiters is that they are not allowed to pick the same clause number as the other one is currently working on if the packet is of the same type.
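
The constraint between the two ALU arbiters can be illustrated with a small model. It is a sketch under stated assumptions: the names `arbiter_for_cycle` and `pick_clause` are hypothetical, and the highest-numbered-first scan is borrowed from the priority rule stated elsewhere in this document rather than from this paragraph.

```python
def arbiter_for_cycle(cycle):
    """One arbiter owns the even clock cycles, the other the odd ones."""
    return "even" if cycle % 2 == 0 else "odd"


def pick_clause(ready, other_busy, packet_type):
    """Pick an ALU clause number for one arbiter, or None.

    ready       -- list of 8 booleans, ready[i] is True when clause i of
                   this packet type has a vector waiting
    other_busy  -- (packet_type, clause) the other arbiter is currently
                   working on, or None
    packet_type -- "pixel" or "vertex"

    Assumption: clauses are scanned from 7 down to 0.
    """
    for clause in range(7, -1, -1):
        if not ready[clause]:
            continue
        if other_busy == (packet_type, clause):
            # Same clause number and same packet type: not allowed.
            continue
        return clause
    return None
```

Note that the exclusion only applies when both the clause number and the packet type match, so a pixel clause 5 and a vertex clause 5 may be in flight together.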
`
If the packet is a vertex packet, upon reaching ALU clause 4, it can export the position if the position is ready. So the arbiter must prevent ALU clause 4 from being selected if the positional buffer is full (or can't be accessed). Along with the positional data, the location where the vertex data is to be put is also sent (parameter data pointers).
`
`
`
`
`
All other levels process in the same way until the packet finally reaches the last ALU machine (8). On completion of the level 8 ALU clause, a valid bit is sent to the Render Backend which picks up the color data. This requires that the last instruction writes to the output register, a condition that is almost always true. If the packet was a vertex packet, instead of sending the valid bit to the RB, it is sent to the PA so it can know that the data present in the parameter store is valid.
`
Only two ALU state machines may have access to the register file address bus or the instruction decode bus at one time. Similarly, only one texture state machine may have access to the register file address bus at one time. Arbitration is performed by three arbiter blocks (two for the ALU state machines and one for the texture state machines). The arbiters always favor the higher number state machines, preventing a bunch of half finished jobs from clogging up the register files.
`
Each state machine maintains an address pointer specifying where the 16 entries vector is located in the register file (the texture machine has two pointers, one for the read address and one for the write). Upon completion of its job, the address pointer is incremented by a predefined amount equal to the total number of registers required by the shading code. A comparison of the address pointer for the first state machine in the chain (the input state machine), and the last machine in the chain (the level 8 ALU machine), gives an indication of how
`
[Figure: shader pipe datapath. Labels: interpolated data from RE; register file, 512x128 (built as 4 128x128 or 16 128x32?); address to texture; vertex parameter data to RE through texture block or pixel data to RB through texture block; 128 bit data; constants from RE; operand mux; 4 32 bit MAC units; control from RE; 128 bit scalar/vector ALU.]
`
[Figure: shader pipe stages. Each pipeline stage shows a register file, a MAC, a scalar input/output unit and a texture request/texture address path; the output goes to the Primitive Assembly Unit or Render Backend.]
`1.2 Data Flow graph
`
`
`
[Figure: data flow graph. Each pipeline stage shows a register file, a MAC, a scalar input/output unit and a texture request path; the output goes to the Primitive Assembly Unit or Render Backend.]

[Figure: data flow between the constant store, register file, instruction store/cache, operand mux, interpolated data / vertex indexes input, texture unit, ALU and scalar ALU.]
`The gray area represents blocks that are replicated 4 times per shader pipe (16 times on the overall chip).
`
`1.3 Control Graph
`
[Figure: control graph. Blocks: SEQ, IS, CST, TX, SP, OF, PA/RB. Signals: Clause # + Rdy, cmd, cst, Phase, RdAddr, WrAddr, WrVec, WrScal.]

In green is represented the Texture control interface, in red the ALU control interface, in blue the Interpolated/Vector control interface and in purple the output file control interface.
`
2. Interpolated data bus
Since each of the register files is actually physically divided (one 128x128 per MAC) and we don't have the place to hold a maximum size vector of vertices in the parameter buffer, we need to interpolate on a parameter basis rather than on a quad basis. So the order to the register file will be:

Q0P0 Q1P0 Q2P0 Q3P0 Q0P1 Q1P1 Q2P1 Q3P1 Q0P2 Q1P2 ...
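
The parameter-major ordering above can be generated with a short sketch. The function name `interpolation_order` and its arguments are hypothetical; the only thing taken from the text is that all four quads of parameter 0 are written before any quad of parameter 1.

```python
def interpolation_order(num_quads=4, num_params=3):
    """List the write order as QxPy labels, parameter by parameter."""
    return [
        f"Q{quad}P{param}"
        for param in range(num_params)   # outer loop: parameter
        for quad in range(num_quads)     # inner loop: quad within the vector
    ]


order = interpolation_order(num_quads=4, num_params=3)
print(" ".join(order))
```

Swapping the two loops would give the quad-major order that the text rules out.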
`
3. Instruction Store
There is going to be only one instruction store for the whole chip. It may contain up to 2000 instructions of 96 bits each.

(ISSUE: The instruction store is loaded by the sequencer using the memory hub?)

The read bandwidth from this store is 24 bits/clock/pipe. To achieve this, the instruction store is likely to be broken up into 4 blocks: an ALU instruction section (1R/1W) split in two and a texture section (1R/1W) also split in two. The bandwidth out of those memories is 96 bits/clock.
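
The figures in this section can be cross-checked with simple arithmetic. The sketch below only restates numbers given in this document (96-bit instructions, up to 2000 of them, 24 bits/clock/pipe, four shader pipes); the variable names are made up for the example.

```python
INSTRUCTION_BITS = 96
MAX_INSTRUCTIONS = 2000
READ_BITS_PER_CLOCK_PER_PIPE = 24
NUM_SHADER_PIPES = 4  # "shared among the four shader pipes"

# Total capacity of the store, in bits.
store_bits = INSTRUCTION_BITS * MAX_INSTRUCTIONS

# Aggregate read bandwidth across all pipes, in bits per clock.
aggregate_read_bits = READ_BITS_PER_CLOCK_PER_PIPE * NUM_SHADER_PIPES

# Clocks needed to deliver one full instruction to one pipe.
clocks_per_instruction = INSTRUCTION_BITS // READ_BITS_PER_CLOCK_PER_PIPE

print(store_bits, aggregate_read_bits, clocks_per_instruction)
```

The 24 bits/clock/pipe rate delivers one 96-bit instruction per pipe every 4 clocks, and four pipes together account for the 96 bits/clock quoted for the memories.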
`
`
`
`
`
4. Constant Store
The constant store is managed by the CP. The sequencer is aware of where the constants are using a remapping table also managed by the CP. A likely size for the constant store is 512x128 bits. The constant store is also planned to be shared. The read BW from the constant store is 512/4 bits/clock/pipe and the write bandwidth is 32/4 bits/clock.
`
`
5. Looping and Branches
Loops and branches are planned to be supported and will have to be dealt with at the sequencer level. However, it is still unclear if we plan on supporting data dependent branches or not.
`
`
`
6. Register file allocation
The register file allocation for vertices and pixels can either be static or dynamic. In both cases, the register file is managed using two round robins (one for pixels and one for vertices). In the dynamic case the boundary between pixels and vertices is allowed to move. In the static case it is fixed: VERTEX_REG_SIZE for vertices and 296-VERTEX_REG_SIZE for pixels.
`
`
top. Vertices are in orange and pixels in green. The blue line is the tail of the vertices and the green line is the tail of the pixels. Thus anything between the two lines is shared. When pixels meet vertices the line turns white and the boundary is static until both vertices and pixels share the same “unallocated bubble”. Then the boundary is allowed to move again.
`
7. Texture Arbitration
The texture arbitration logic chooses one of the 8 potentially pending texture clauses to be executed. The choice is made by looking at the fifos from 7 to 0 and picking the first one ready to execute. Once chosen, the clause state machine will send one 2x2 texture fetch per clock (or 4 fetches in one clock every 4 clocks) until all the texture fetch instructions of the clause are sent. This means that there cannot be any dependencies between two texture fetches of the same clause.
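
A minimal model of this selection is shown below. It is a sketch only: `choose_texture_clause` and `fetch_issue_clocks` are hypothetical names, and the second function models just the "one 2x2 fetch per clock" variant mentioned in the text.

```python
def choose_texture_clause(fifo_ready):
    """Scan the 8 reservation-station FIFOs from 7 down to 0.

    fifo_ready[i] is True when clause i has a vector ready to execute.
    Returns the chosen clause number, or None when nothing is ready.
    """
    for clause in range(7, -1, -1):
        if fifo_ready[clause]:
            return clause
    return None


def fetch_issue_clocks(num_fetches, start_clock=0):
    """Clock at which each fetch of the chosen clause is sent.

    Fetches are issued back to back, one per clock, which is only
    possible because fetches in a clause do not depend on each other.
    """
    return [start_clock + i for i in range(num_fetches)]
```

Scanning from 7 to 0 means vectors that are furthest along are served first.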
`
`
`
`
`
`
`
`
`
`
`
The arbitrator will not wait for the texture fetches to return prior to selecting another clause for execution. The texture pipe will be able to handle up to X(?) in flight texture fetches and thus there can be a fair number of active clauses waiting for their texture return data.
`
`
8. ALU Arbitration
ALU arbitration proceeds in almost the same way as texture arbitration. The ALU arbitration logic chooses one of the 8 potentially pending ALU clauses to be executed. The choice is made by looking at the fifos from 7 to 0 and picking the first one ready to execute.
`
`
`
There are two ALU arbiters, one for the even clocks and one for the odd clocks. For example, here is the sequencing of two interleaved ALU clauses (E and O stand for Even and Odd):

Einst0 Oinst0 Einst1 Oinst1 Einst2 Oinst2 Einst0 Oinst3 Einst1 Oinst4 Einst2 Oinst0 ...

Proceeding this way hides the latency of 8 clocks of the ALUs.
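
The interleaving can be reproduced with a small generator. This is an illustrative sketch: the function names are hypothetical, and the clause lengths passed in (3-instruction even clauses, a 5-instruction odd clause followed by another odd clause) are an assumption chosen to mirror the pattern of the example, in which the instruction counter restarts at 0 whenever a new clause begins.

```python
def instruction_stream(prefix, clause_lengths):
    """Yield instruction labels for successive clauses on one arbiter."""
    for length in clause_lengths:
        for index in range(length):
            yield f"{prefix}inst{index}"


def interleave(even_lengths, odd_lengths):
    """Alternate even-slot and odd-slot instructions, one per slot."""
    even = instruction_stream("E", even_lengths)
    odd = instruction_stream("O", odd_lengths)
    sequence = []
    for even_instr, odd_instr in zip(even, odd):
        sequence.extend([even_instr, odd_instr])
    return sequence


print(" ".join(interleave([3, 3], [5, 1])))
```

Because the two streams alternate, two consecutive instructions of the same clause are always separated by an instruction of the other clause.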
`
9. Handling Stalls
When the output file is full, the sequencer prevents the ALU arbitration logic from selecting the last clause (this way nothing can exit the shader pipe until there is place in the output file). If the packet is a vertex packet and the position buffer is full (POS_FULL) then the sequencer also prevents a thread from entering the exporting clause (4?). The sequencer will set the OUT_FILE_FULL signal n clocks before the output file is actually full and thus the ALU arbiter will be able to read this signal and act accordingly by preventing exporting clauses from proceeding.
`
10. Content of the reservation station FIFOs
3 bits of Render State, 6-7 bits for the base address of the instruction store and some bits for LOD correction. Every other information (such as the coverage mask, quad address, etc.) is put in a FIFO and is retrieved when the quad exits the shader pipe to enter the output file buffer. Since pixels and vertices are kept in order in the shader pipe, we only need two fifos (one for vertices and one for pixels) deep enough to cover the shader pipe latency. This size will be determined later when we know the size of the small fifos between the reservation stations.
`
11. The Output File (RB FIFO and Parameter Cache)
The output file is where program results are exported when the pixel/vertex shader finishes. It consists of a 512x128 memory cell that is statically divided between pixels and vertices. The output file has 1 write port and 1 read port. The sequencer is responsible for managing the addresses of this output file and for stalling the shader pipe should this output file fill up. The management is done by keeping the tail and head pointers of each section (pixels and vertices) and incrementing them using a simple round-robin allocation policy. The sequencer must also arbitrate between the PA and the RB for the use of the read port. This arbitration will either be priority based or just interleaved evenly (1 read every 2 clocks for each of the blocks).
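
The head/tail bookkeeping for one section of the output file can be sketched as a ring allocator. Everything here is an assumption made for illustration: the class name `Section`, its methods, the use of an entry counter to detect full and empty, and any particular split of the 512 entries between pixels and vertices.

```python
class Section:
    """One statically sized region of the output file (pixels or vertices)."""

    def __init__(self, base, size):
        self.base = base    # first entry of this section
        self.size = size    # number of entries in this section
        self.head = 0       # offset of the oldest allocated entry
        self.tail = 0       # offset of the next entry to allocate
        self.count = 0      # entries currently in use

    def is_full(self):
        return self.count == self.size

    def allocate(self):
        """Return the address for the next export, or None to stall."""
        if self.is_full():
            return None
        address = self.base + self.tail
        self.tail = (self.tail + 1) % self.size   # round-robin advance
        self.count += 1
        return address

    def free(self):
        """Release the oldest entry once PA or RB has read it."""
        if self.count == 0:
            return None
        address = self.base + self.head
        self.head = (self.head + 1) % self.size
        self.count -= 1
        return address


# Hypothetical split of the 512 entries; the real boundary is not given.
vertex_section = Section(base=0, size=256)
pixel_section = Section(base=256, size=256)
```

A `None` result from `allocate` corresponds to the condition under which the sequencer would stall the shader pipe.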
`
12. Interfaces

12.1 External interfaces

12.1.1 Sequencer to Shader Engine Bus
This is a bus that sends the instruction and constant data to all 4 Sub-Engines of the Shader. Because a new instruction is needed only every 4 clocks, the width of the bus is divided by 4 and both constants and instruction are sent over these 4 clocks.
`
`
Name | Direction | Bits | Description
Instruction Start | SEQ->SP | 4 | High on first cycle of transfer
Constant 0 | SEQ->SP | 32 | 128 bits transferred over 4 cycles, alpha first... blue last
Constant 1 | SEQ->SP | 32 | 128 bits transferred over 4 cycles, alpha first... blue last
Instruction | SEQ->SP | 30 | 120 bits transferred over 4 cycles (order TBD)
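
To make the 4-cycle transfer concrete, the sketch below splits a constant and an instruction into the per-clock words carried on the bus. The function names are hypothetical. The channel order between alpha and blue (red, then green) and the least-significant-word-first order for the instruction are assumptions, since the text only fixes "alpha first... blue last" and marks the instruction order as TBD.

```python
WORD_MASK_32 = (1 << 32) - 1
WORD_MASK_30 = (1 << 30) - 1


def constant_beats(alpha, red, green, blue):
    """Four 32-bit words of one 128-bit constant, alpha first, blue last.

    Assumption: red and green are sent second and third.
    """
    return [channel & WORD_MASK_32 for channel in (alpha, red, green, blue)]


def instruction_beats(instruction):
    """Four 30-bit words of a 120-bit instruction.

    Assumption: least-significant 30 bits are sent first.
    """
    return [(instruction >> (30 * beat)) & WORD_MASK_30 for beat in range(4)]


def reassemble_instruction(beats):
    """Inverse of instruction_beats, as the receiver would rebuild it."""
    return sum(word << (30 * beat) for beat, word in enumerate(beats))
```

Each clock therefore carries 32 bits per constant and 30 bits of instruction, matching the bus widths in the table.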
`
12.1.2 Shader Engine to Output File

Every clock each Sub-Engine can output 128 bits of ‘vector’ data and 32 bits of ‘scalar’ data to an output file (?). This data will be compressed into 128 bits total prior to storage in the output file.
`
`
`
`
Name | Direction | Bits | Description
UL_Vector_Out | SP->OF | 128 | Vector Data out
UL_Scalar_Out | SP->OF | 32 | Vector Data out
UR_Vector_Out | SP->OF | 128 | Vector Data out
UR_Scalar_Out | SP->OF | 32 | Vector Data out
LL_Vector_Out | SP->OF | 128 | Vector Data out
LL_Scalar_Out | SP->OF | 32 | Vector Data out
LR_Vector_Out | SP->OF | 128 | Vector Data out
LR_Scalar_Out | SP->OF | 32 | Vector Data out
`
12.1.3 Shader Engine to Texture Unit Bus (Fast Bus)
One quad's worth of addresses is transferred to the Texture Unit every clock. These are sourced from a different pixel within each of the sub-engines repeating every 4 clocks. The register file index to read must precede the data by 2 clocks. The Read address associated with Quad 0 must be sent 1 clock after the Instruction Start signal is sent, so that data is read 3 clocks after the Instruction Start.

One quad's worth of Texture Data may be written to the register file every clock. These are directed to a different pixel of the sub-engines repeating every 4 clocks. The register file index to write must accompany the data. Data and Index associated with Quad 0 must be sent 3 clocks after the Instruction Start signal is sent.
`
Name | Direction | Bits | Description
Tex_Read_Register_Index | SEQ->SP | 8 | Index into Register files for reading Texture Address
Tex_RegFile_Read_Data | SP->TEX | 512 | 4 Texture Addresses read from the Register file
Tex_Write_Register_Index | SEQ->TEX | 8 | Index into Register file for write of returned Texture Data
`
12.1.4 Sequencer to Texture Unit bus (Slow Bus)

Once every four clocks, the texture unit tells the sequencer which clause it is now working on and whether the data in the registers is ready or not. This way the sequencer can update the texture counters for the reservation station fifos. The sequencer also provides the instruction and constants for the texture fetch to execute and the address in the register file where to write the texture return data.
`
`
Name | Direction | Bits | Description
Tex_Ready | TEX->SEQ | 4 | Data ready
TexClauseNum | TEX->SEQ | 3 | Clause number
 | | ? | Texture constants, X bits sent over 4 clocks
 | | ? | Texture fetch instruction, X bits sent over 4 clocks
`
`
`
`
`
`
12.1.5 Shader Engine to RE/PA Bus

Name | Direction | Bits | Description
Interpolator_Register_Index | SEQ->SP | 8 | Index into Register files for write of Interpolator/Index Data
Interpolator_Write_Mask | SEQ->SP | 1 | Write Mask. The same write mask is used for all 4 pixels
Interpolator_Write_Data | RE/PA->SP | 512 | 4 interpolated vectors or vectors of indices
`
`
`
`
`
`
`
`
`
`
`
`
12.1.6 PA? to sequencer

Name | Direction | Bits | Description
Address | PA->SEQ | | Deallocation address sent by the PA telling the Sequencer that it is now possible to free this space in the parameter buffer. This token is a pointer in the parameter cache and 4 bits to tell the size which is to be freed up.
`
`.
`
`13. Examples of program executions
`
13.1.1 Sequencer Control of a Vector of Vertices
1. PA sends a vector of 16 vertices (actually vertex indices, 32 bits/index for 512 bit total) to the RE's Vertex FIFO
   • State pointer as well as tag into position cache is sent along with vertices
   • space was allocated in the position cache for transformed position before the vector was sent
   • also before the vector is sent to the RE, the CP has loaded the global instruction store with the vertex shader program (using the MH?)
      o The vertex program is assumed to be loaded when we receive the vertex vector.
      o the SEQ then accesses the IS base for this shader using the local state pointer (provided to all sequencers by the RBBM when the CP is done loading the program)
2. SEQ arbitrates between the Pixel FIFO and the Vertex FIFO; basically the Vertex FIFO always has priority
   • at this point the vector is removed from the Vertex FIFO
   • the arbiter is not going to select a vector to be transformed if the parameter cache is full unless the pipe has nothing else to do (i.e. no pixels are in the pixel fifo).
`
`the
`
`
`
`®
`
`
`
`selected by the ASM arbiter and gets the instructions for ALU
`
`
`
3. SEQ allocates space in the SP register file for index data plus GPRs used by the program
   • the number of GPRs required by the program is stored in a local state register, which is accessed using the state pointer that came down with the vertices
   • SEQ will not send vertex data until space in the register file has been allocated
4. SEQ sends the vector to the SP register file over the RE_SP interface (which has a bandwidth of 512 bits/cycle)
   • the 16 vertex indices are sent to the 16 register files over 4 cycles
      o RF0 of SU0, SU1, SU2, and SU3 is written the first cycle
      o RF1 of SU0, SU1, SU2, and SU3 is written the second cycle
      o RF2 of SU0, SU1, SU2, and SU3 is written the third cycle
      o RF3 of SU0, SU1, SU2, and SU3 is written the fourth cycle
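
The four-cycle write pattern in step 4 can be tabulated with a short sketch. The function name `register_file_writes` is hypothetical, and so is the assumption that vertex i of the vector goes to sub-unit i mod 4 during cycle i div 4; the text only states which register files are written on each cycle.

```python
def register_file_writes(indices):
    """Return (cycle, sub_unit, register_file, index) for 16 vertex indices.

    Cycle c writes RFc of SU0, SU1, SU2 and SU3, so four indices are
    written per cycle and the whole vector takes four cycles.
    """
    if len(indices) != 16:
        raise ValueError("a vertex vector holds exactly 16 indices")
    writes = []
    for position, index in enumerate(indices):
        cycle = position // 4      # assumption: vertices are sent in order
        sub_unit = position % 4    # assumption: one vertex per SU per cycle
        writes.append((cycle, f"SU{sub_unit}", f"RF{cycle}", index))
    return writes


for write in register_file_writes(list(range(100, 116)))[:4]:
    print(write)
```

The first four entries printed are the writes of the first cycle, all of them to RF0.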
`
`
`
`
`ificant 32 bits (fleating paint format?) (what about compoundindices)
`
`SLING "2 ae theregist
`ainingdata
`
`bits. 2)
`
`
`
`
5. SEQ constructs a control packet for the vector and sends it to the first reservation station (the FIFO in front of texture state machine 0, or TSM0 FIFO)
   • the control packet contains the state pointer, the tag to the position cache and a register file base pointer.
`8. TSMacc