Author: Laurent Lefebvre

ORIGINATE DATE: 14 August, 2001
EDIT DATE: 4 September, 2001
DOCUMENT-REV. NUM.: GEN-CXXXXX-REVA

25584 Bytes *** © ATI Reference Copyright Notice on Cover Page © ***

Issue To:
Copy No:

R400 Sequencer Specification

SEQ

Version 0.42

Overview: This is an architectural specification for the R400 Sequencer block (SEQ). It provides an overview of the required capabilities and expected uses of the block. It also describes the block interfaces and internal sub-blocks, and provides internal state diagrams.
AUTOMATICALLY UPDATED FIELDS:
Document Location: C:\perforce\r400\arch\doc\gfx\RE\R400_Sequencer.doc
Current Intranet Search Title: R400 Sequencer Specification
APPROVALS
Name/Dept    Signature/Date

Remarks:
`
THIS DOCUMENT CONTAINS INFORMATION THAT COULD BE SUBSTANTIALLY DETRIMENTAL TO THE INTEREST OF ATI TECHNOLOGIES INC. THROUGH UNAUTHORIZED USE OR DISCLOSURE.

"Copyright 2001, ATI Technologies Inc. All rights reserved. The material in this document constitutes an unpublished work created in 2001. The use of this copyright notice is intended to provide notice that ATI owns a copyright in this unpublished work. The copyright notice is not an admission that publication has occurred. This work contains proprietary information and trade secrets of ATI. No part of this document may be used, reproduced, or transmitted in any form or by any means without the prior written permission of ATI Technologies Inc."
`ATI 2010
`
`LGv. ATI
`IPR2015-00325
`
`AMD1044_0016660
`
`ATI Ex. 2010
`IPR2023-00922
`Page 1 of 20
Table Of Contents

1. OVERVIEW
   1.1 Top Level Block Diagram
   1.2 Data Flow Graph
   1.3 Control Graph
2. INTERPOLATED DATA BUS
3. INSTRUCTION STORE
4. CONSTANT STORE
5. LOOPING AND BRANCHES
6. REGISTER FILE ALLOCATION
7. TEXTURE ARBITRATION
8. ALU ARBITRATION
9. HANDLING STALLS
10. CONTENT OF THE RESERVATION STATION FIFOS
11. THE OUTPUT FILE (RB FIFO AND PARAMETER CACHE)
12. INTERFACES
   12.1 External Interfaces
      12.1.1 Sequencer to Shader Engine Bus
      12.1.2 Shader Engine to Output File
      12.1.3 Shader Engine to Texture Unit Bus (Fast Bus)
      12.1.4 Sequencer to Texture Unit Bus (Slow Bus)
      12.1.5 Shader Engine to RE/PA Bus
      12.1.6 PA? to Sequencer
13. EXAMPLES OF PROGRAM EXECUTIONS
      13.1.1 Sequencer Control of a Vector of Vertices
      13.1.2 Sequencer Control of a Vector of Pixels
      13.1.3 Notes
14. OPEN ISSUES
Revision Changes:

Rev 0.1 (Laurent Lefebvre)
Date: May 7, 2001
First draft.

Rev 0.2 (Laurent Lefebvre)
Date: July 9, 2001
Changed the interfaces to reflect the changes in the SP. Added some details in the arbitration section.

Rev 0.3 (Laurent Lefebvre)
Date: August 6, 2001
Reviewed the Sequencer spec after the meeting on August 3, 2001.

Rev 0.4 (Laurent Lefebvre)
Date: August 24, 2001
Added the dynamic allocation method for register file and an example (written in part by Vic) of the
`1. Overview
The sequencer first arbitrates between vectors of 16 vertices that arrive directly from primitive assembly and vectors of 4 quads (16 pixels) that are generated in the raster engine.

The vertex or pixel program specifies how many GPRs it needs to execute. The sequencer will not start the next vector until the needed space is available.

The sequencer is based on the R300 design. It chooses two ALU clauses and a texture clause to execute, and executes all of the instructions in a clause before looking for a new clause of the same type. Two ALU clauses are executed interleaved to hide the ALU latency. Each vector will have eight texture and eight ALU clauses, but clauses do not need to contain instructions. A vector of pixels or vertices ping-pongs along the sequencer FIFO, bouncing from texture reservation station to ALU reservation station. A FIFO exists between each reservation stage, holding up vectors until the vector currently occupying a reservation station has left. A vector at a reservation station can be chosen to execute. The sequencer looks at all eight ALU reservation stations to choose an ALU clause to execute and all eight texture stations to choose a texture clause to execute. The arbitrator will give priority to clauses/reservation stations closer to the bottom of the pipeline. It will not execute an ALU clause until the texture fetches initiated by the previous texture clause have completed. There are two separate sets of reservation stations, one for pixel vectors and one for vertex vectors. This way a pixel can pass a vertex and a vertex can pass a pixel.
`
`
`
To support the shader pipe, the raster engine also contains the shader instruction cache and constant store. There is only one constant store and one instruction store for the whole chip. These will be shared among the four shader pipes.
1.1 Top Level Block Diagram

[Figure: Top level block diagram. A vertex/pixel vector arbitrator feeds a chain of alternating reservation stations (Texture clause 0, ALU clause 0, Texture clause 1, ALU clause 1, ... Texture clause 7, ALU clause 7), each pair separated by a FIFO, with a texture arbitrator selecting among the texture reservation stations. A possible delay for available GPRs is shown at the input.]
There are two sets of the above figure, one for vertices and one for pixels.

The rasterizer always checks the vertex FIFO first and, if allowed by the sequencer, sends the data to the shader. If the vertex FIFO is empty, the rasterizer takes the first entry of the pixel FIFO (a vector of 16 pixels) and sends it to the interpolators. Then the sequencer takes control of the packet. The packet consists of 3 bits of state, 6-7 bits for the base address of the shader program and some information on the coverage to determine texture LOD. All other information (2x2 addresses) is put in a FIFO (one for the pixels and one for the vertices) and retrieved when the packet finishes its last clause.
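To make the packet contents concrete, here is a minimal sketch that packs the sequencer-side fields into one word. It assumes 3 bits of state, 7 bits of shader base address (the upper end of the 6-7 bit range above) and a hypothetical 4-bit LOD field; the field order, the LOD width and the function names are illustrative assumptions, not the R400 encoding.

```python
# STATE_BITS and BASE_BITS follow the text (3 bits of state, 6-7 bits of base
# address, taken here as 7). LOD_BITS is a hypothetical width for the
# coverage/LOD information, which the text does not size.
STATE_BITS = 3
BASE_BITS = 7
LOD_BITS = 4


def pack_packet(state: int, base_addr: int, lod: int) -> int:
    """Pack the fields LSB-first as state | base_addr | lod (assumed order)."""
    if not 0 <= state < (1 << STATE_BITS):
        raise ValueError("state out of range")
    if not 0 <= base_addr < (1 << BASE_BITS):
        raise ValueError("base address out of range")
    if not 0 <= lod < (1 << LOD_BITS):
        raise ValueError("LOD field out of range")
    return state | (base_addr << STATE_BITS) | (lod << (STATE_BITS + BASE_BITS))


def unpack_packet(word: int) -> tuple[int, int, int]:
    """Inverse of pack_packet: return (state, base_addr, lod)."""
    state = word & ((1 << STATE_BITS) - 1)
    base_addr = (word >> STATE_BITS) & ((1 << BASE_BITS) - 1)
    lod = (word >> (STATE_BITS + BASE_BITS)) & ((1 << LOD_BITS) - 1)
    return state, base_addr, lod
```

Keeping the control word this small is what lets everything else (the 2x2 addresses) travel in a side FIFO instead of through every reservation station.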
On receipt of a packet, the input state machine (not pictured, but located just before the first FIFO) allocates enough space in the registers to store the interpolated values and temporaries. Following this, the input state machine stacks the packet in the first FIFO.
`
On receipt of a command, the level 0 texture machine issues a texture request and the corresponding register address for the texture address (ta). A small command (tcmd) is passed to the texture system identifying the current level number (0) as well as the register write address for the texture return data. One texture request is sent every 4 clocks, causing the texturing of four 2x2s worth of data (or 16 vertices). Once all the requests are sent, the packet is put in FIFO 1.
`
Upon receipt of the return data, the texture unit writes the data to the register file using the write address that was provided by the level 0 texture machine, and sends the clause number (0) to the level 0 texture state machine to signify that the write is done and thus the data is ready. Then, the level 0 texture machine increments the counter of FIFO 1 to signify to ALU_1 that the data is ready to be processed.
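The FIFO counter handshake above behaves like a counting semaphore: the texture machine increments the counter once a texture return has been written, and the ALU machine decrements it before issuing (as described for the level 0 ALU machine). A minimal Python model; the class and method names are hypothetical.

```python
class ReadyCounter:
    """Counts vectors in a FIFO whose texture data has landed in the register file.

    Hypothetical model: signal_ready() stands for the texture machine
    incrementing the counter after the texture unit reports that the write is
    done; try_consume() stands for the ALU machine decrementing it before it
    issues the clause.
    """

    def __init__(self) -> None:
        self.count = 0

    def signal_ready(self) -> None:
        self.count += 1

    def try_consume(self) -> bool:
        """Return True (and decrement) if a vector is ready for the ALU clause."""
        if self.count == 0:
            return False
        self.count -= 1
        return True


fifo1 = ReadyCounter()
assert not fifo1.try_consume()  # no texture data yet: the ALU clause waits
fifo1.signal_ready()            # texture return written for one vector
assert fifo1.try_consume()      # the ALU clause may now issue
```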
`
On receipt of a command, the level 0 ALU machine first decrements the input FIFO counter and then issues a complete set of level 0 shader instructions. For each instruction, the state machine generates 3 source addresses, one destination address (2-3 cycles later) and an instruction ID which is used to index into the instruction store. Once the last instruction has been issued, the packet is put into FIFO 2.

Two ALU clauses are executed at any given time (and there are two arbitrers). In this case, the instructions of a vector are interleaved with the instructions of the other vector. One arbitrer will arbitrate over the odd clock cycles and the other one will arbitrate over the even clock cycles. The only constraint between the two arbitrers is that they are not allowed to pick the same clause number as the one the other arbitrer is currently working on if the packet is of the same type.
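The even/odd constraint can be sketched as a selection function: each arbitrer picks the highest-numbered ready clause (the arbitrers favor higher-numbered state machines), skipping the clause number the other arbitrer is currently executing when the packets are of the same type. This is a sketch under those assumptions; the function name and the tuple representation of the other arbitrer's state are hypothetical.

```python
from typing import Optional, Sequence


def pick_clause(
    ready: Sequence[bool],
    packet_type: str,
    other_busy: Optional[tuple[int, str]],
) -> Optional[int]:
    """Pick an ALU clause for one of the two arbitrers.

    ready[i] is True when ALU clause i has a vector of packet_type ready
    (pixels and vertices have separate reservation stations).
    other_busy is (clause_number, packet_type) for the clause the other
    arbitrer is working on, or None if it is idle.
    """
    for clause in range(len(ready) - 1, -1, -1):  # favor higher numbers
        if not ready[clause]:
            continue
        if other_busy == (clause, packet_type):
            continue  # same clause number and same packet type: not allowed
        return clause
    return None
```

A vertex packet in clause 2 therefore does not block a pixel packet in clause 2, since the two belong to different sets of reservation stations.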
`
If the packet is a vertex packet, upon reaching ALU clause 4 it can export the position if the position is ready. So the arbitrer must prevent ALU clause 4 from being selected if the position buffer is full (or cannot be accessed). Along with the positional data, the location where the vertex data is to be put is also sent (parameter data pointers).
`
`
`
`
`
All other levels proceed in the same way until the packet finally reaches the last ALU machine (8). On completion of the level 8 ALU clause, a valid bit is sent to the Render Backend, which picks up the color data. This requires that the last instruction writes to the output register, a condition that is almost always true. If the packet was a vertex packet, instead of sending the valid bit to the RB, it is sent to the PA so it can know that the data present in the parameter store is valid.
`
Only two ALU state machines may have access to the register file address bus or the instruction decode bus at one time. Similarly, only one texture state machine may have access to the register file address bus at one time. Arbitration is performed by three arbitrer blocks (two for the ALU state machines and one for the texture state machines). The arbitrers always favor the higher-numbered state machines, preventing a bunch of half-finished jobs from clogging up the register files.
`
Each state machine maintains an address pointer specifying where the 16-entry vector is located in the register file (the texture machine has two pointers, one for the read address and one for the write address). Upon completion of its job, the address pointer is incremented by a predefined amount equal to the total number of registers required by the shading code. A comparison of the address pointer for the first state machine in the chain (the input state machine) and the last machine in the chain (the level 8 ALU machine) gives an indication of how
[Figure: Shader pipe datapath. Labels: interpolated data from RE; register file 512x128 (built as 4 128x128 or 16 128x32); address to texture; vertex parameter data to RE or pixel data to RB through texture block; 128-bit data; constants from RE; operand mux; 4 32-bit MAC units; control from RE; 128-bit scalar/vector ALU.]
[Figure: Pipeline detail for the sub-engines. Labels: register file; MAC; scalar input/output; pipeline stage; texture request; texture address; texture data; to Primitive Assembly Unit or Render Backend.]
1.2 Data Flow Graph

[Figure: Data flow graph. Labels: register file; MAC; scalar input/output; pipeline stage; texture request; to Primitive Assembly Unit or Render Backend; constant store; instruction store/cache; operand mux; texture; ALU; scalar ALU; interpolated data / vertex indexes.]
`The gray area represents blocks that are replicated 4 times per shader pipe (16 times on the overall chip).
`1.3 Control Graph
[Figure: Control graph. Blocks: SEQ, TX, SP, OF, PA/RB, IS, CST. Signals: Clause # + Rdy, Phase, RdAddr, WrAddr, WrVec, WrScal.]

The texture control interface is shown in green, the ALU control interface in red, the interpolated/vector control interface in blue and the output file control interface in purple.
`
2. Interpolated Data Bus
Since each register file is physically divided (one 128x128 per MAC) and we do not have room to hold 8 maximum-size vectors of vertices in the parameter buffer, we need to interpolate on a parameter basis rather than on a quad basis. So the write order to the register file (Qn = quad n, Pm = parameter m) will be:

Q0P0 Q1P0 Q2P0 Q3P0 Q0P1 Q1P1 Q2P1 Q3P1 Q0P2 Q1P2 ...
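The write order above is a parameter-major loop over the 4 quads of a 16-pixel vector. A small sketch that generates it; the function name is hypothetical.

```python
def register_file_write_order(num_params: int, quads_per_vector: int = 4) -> list[str]:
    """List the (quad, parameter) write order for one vector.

    Parameters are the outer loop and quads the inner loop, so all quads of
    parameter P are written before any quad of parameter P+1.
    """
    return [f"Q{q}P{p}" for p in range(num_params) for q in range(quads_per_vector)]
```

For example, `register_file_write_order(3)` starts with Q0P0, Q1P0, Q2P0, Q3P0 and then moves on to P1, matching the sequence shown.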
`
3. Instruction Store
There is going to be only one instruction store for the whole chip. It may contain up to 2000 instructions of 96 bits each.

(ISSUE: The instruction store is loaded by the sequencer using the memory hub?)

The read bandwidth from this store is 24 bits/clock/pipe. To achieve this, the instruction store is likely to be broken up into 4 blocks: an ALU instruction section (1R/1W) split in two and a texture section (1R/1W) also split in two. The bandwidth out of those memories is 96 bits/clock.
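As a consistency check on these figures, assuming (as stated for the Sequencer to Shader Engine Bus) that a new instruction is needed only every 4 clocks and that the store is shared by the four shader pipes:

```latex
\frac{96~\text{bits/instruction}}{4~\text{clocks/instruction}} = 24~\text{bits/clock/pipe},
\qquad
24~\text{bits/clock/pipe} \times 4~\text{pipes} = 96~\text{bits/clock}
```

Under these assumptions, the 96 bits/clock out of the memories corresponds to one full 96-bit instruction per clock for the whole chip.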
4. Constant Store
The constant store is managed by the CP. The sequencer is aware of where the constants are by using a remapping table also managed by the CP. A likely size for the constant store is 512x128 bits. The constant store is also planned to be shared. The read bandwidth from the constant store is 512/4 bits/clock/pipe and the write bandwidth is 32/4 bits/clock.
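The remapping step can be pictured as a lookup table that the CP fills and the sequencer consults to turn a shader's logical constant index into a physical row of the shared 512x128 constant store. The sketch below is illustrative only: the class, the dictionary representation and the error handling are assumptions, since the format of the remapping table is not specified here.

```python
CONSTANT_STORE_ROWS = 512  # 512 x 128-bit rows, the likely size given above


class ConstantRemapTable:
    """Hypothetical logical -> physical constant mapping managed by the CP."""

    def __init__(self) -> None:
        self._map: dict[int, int] = {}

    def cp_write(self, logical: int, physical: int) -> None:
        """CP side: record where a logical constant lives in the store."""
        if not 0 <= physical < CONSTANT_STORE_ROWS:
            raise ValueError("physical row outside the constant store")
        self._map[logical] = physical

    def lookup(self, logical: int) -> int:
        """Sequencer side: find the physical row for a logical constant."""
        try:
            return self._map[logical]
        except KeyError:
            raise LookupError(f"constant {logical} has not been mapped") from None
```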
`
`
5. Looping and Branches
Loops and branches are planned to be supported and will have to be dealt with at the sequencer level. However, it is still unclear whether we plan on supporting data-dependent branches or not.
`
`
`
6. Register File Allocation
The register file allocation for vertices and pixels can either be static or dynamic. In both cases, the register file is managed using two round robins (one for pixels and one for vertices). In the dynamic case the boundary between pixels and vertices is allowed to move; in the static case it is fixed to VERTEX_REG_SIZE for vertices and 296-VERTEX_REG_SIZE for pixels.

[Figure: dynamic register file allocation example.]

In the dynamic allocation example, vertices are shown in orange and pixels in green. The blue line is the tail of the vertices and the green line is the tail of the pixels, so anything between the two lines is shared. When pixels meet vertices, the line turns white and the boundary is static until both vertices and pixels share the same "unallocated bubble". Then the boundary is allowed to move again.
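In the static case, each of the two round robins can be modeled as a ring-buffer allocator over its own fixed region (VERTEX_REG_SIZE registers for vertices, the rest for pixels). This is a minimal sketch assuming vectors release their registers in the order they were allocated, which matches vectors leaving the shader pipe in order; the dynamic scheme with a moving boundary is not modeled, and the class and method names are hypothetical.

```python
from collections import deque
from typing import Optional


class RoundRobinRegisterAllocator:
    """Ring-buffer allocator for one region of the register file (sketch).

    Blocks are handed out at the head and released from the tail in the same
    order. Addresses wrap modulo the region size.
    """

    def __init__(self, size: int) -> None:
        self.size = size
        self.head = 0                        # next register to hand out
        self.used = 0                        # registers currently allocated
        self._blocks: deque[int] = deque()   # sizes of live blocks, oldest first

    def allocate(self, n: int) -> Optional[int]:
        """Return the base register for n registers, or None if the vector must wait."""
        if n > self.size - self.used:
            return None
        base = self.head
        self.head = (self.head + n) % self.size
        self.used += n
        self._blocks.append(n)
        return base

    def free_oldest(self) -> None:
        """Release the oldest block when its vector leaves the shader pipe."""
        self.used -= self._blocks.popleft()
```

Returning None rather than a partial block mirrors the rule that the sequencer will not start a vector until all of its GPRs are available.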
`
7. Texture Arbitration
The texture arbitration logic chooses one of the 8 potentially pending texture clauses to be executed. The choice is made by looking at the FIFOs from 7 to 0 and picking the first one ready to execute. Once chosen, the clause state machine will send one 2x2 texture fetch per clock (or 4 fetches in one clock every 4 clocks) until all the texture fetch instructions of the clause are sent. This means that there cannot be any dependencies between two texture fetches of the same clause.

The arbitrator will not wait for the texture fetches to return prior to selecting another clause for execution. The texture pipe will be able to handle up to X(?) in-flight texture fetches, and thus there can be a fair number of active clauses waiting for their texture return data.
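The 7-to-0 scan is a fixed-priority arbiter that favors clauses closer to the bottom of the pipeline. A minimal sketch, assuming one ready flag per reservation-station FIFO; the function name is hypothetical.

```python
from typing import Optional, Sequence


def select_texture_clause(ready: Sequence[bool]) -> Optional[int]:
    """Scan the FIFOs from the highest-numbered clause down to 0 and return
    the first clause that is ready to execute, or None if none is ready."""
    for clause in reversed(range(len(ready))):
        if ready[clause]:
            return clause
    return None
```

Because the arbitrator does not wait for returns, it can call this again on the next decision even while earlier clauses still have fetches in flight.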
8. ALU Arbitration
ALU arbitration proceeds in almost the same way as texture arbitration. The ALU arbitration logic chooses one of the 8 potentially pending ALU clauses to be executed. The choice is made by looking at the FIFOs from 7 to 0 and picking the first one ready to execute. There are two ALU arbitrers, one for the even clocks and one for the odd clocks. For example, here is the sequencing of two interleaved ALU clauses (E and O stand for Even and Odd):

Einst0 Oinst0 Einst1 Oinst1 Einst2 Oinst2 Einst0 Oinst3 Einst1 Oinst4 Einst2 Oinst5 ...

Proceeding this way hides the 8-clock latency of the ALUs.
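The interleaving pattern can be reproduced with a short generator. This is a sketch under the assumption that each arbitrer runs its clauses back to back and that instruction numbers restart at 0 for each new clause, which is how the example sequence above reads; empty slots are shown as "idle". The function name is hypothetical.

```python
def interleave(even_clauses: list[int], odd_clauses: list[int]) -> list[str]:
    """Alternate instruction slots between the even and odd ALU streams.

    Each argument lists the instruction counts of the clauses that the even
    (or odd) arbitrer runs back to back. Slots alternate E, O, E, O, ...;
    a stream that has run out of instructions leaves its slot idle.
    """
    even = [f"Einst{i}" for n in even_clauses for i in range(n)]
    odd = [f"Oinst{i}" for n in odd_clauses for i in range(n)]
    out: list[str] = []
    for k in range(max(len(even), len(odd))):
        out.append(even[k] if k < len(even) else "idle")
        out.append(odd[k] if k < len(odd) else "idle")
    return out
```

With two 3-instruction even clauses and one 6-instruction odd clause, `interleave([3, 3], [6])` reproduces the 12-slot example above.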
9. Handling Stalls
When the output file is full, the sequencer prevents the ALU arbitration logic from selecting the last clause (this way nothing can exit the shader pipe until there is room in the output file). If the packet is a vertex packet and the position buffer is full (POS_FULL), then the sequencer also prevents a thread from entering the exporting clause (4?). The sequencer will set the OUT_FILE_FULL signal n clocks before the output file is actually full, so that the ALU arbitrer is able to read this signal and act accordingly by preventing exporting clauses from proceeding.
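The two stall conditions can be expressed as a per-clause eligibility check that the ALU arbitrer applies before its priority scan. This is a sketch assuming ALU clauses numbered 0 to 7 (so the last clause is 7) and taking the exporting clause to be 4, as in the "(4?)" above; OUT_FILE_FULL is assumed to already include the n-clock early warning, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StallInputs:
    out_file_full: bool  # OUT_FILE_FULL, raised n clocks before the output file is full
    pos_full: bool       # POS_FULL, the position buffer is full


LAST_CLAUSE = 7      # assumed: ALU clauses are numbered 0..7
POSITION_CLAUSE = 4  # assumed: the position-exporting clause ("4?" in the text)


def clause_allowed(clause: int, is_vertex: bool, s: StallInputs) -> bool:
    """Return True if the ALU arbitrer may select this clause this cycle."""
    if clause == LAST_CLAUSE and s.out_file_full:
        return False  # nothing may leave the pipe until the output file drains
    if is_vertex and clause == POSITION_CLAUSE and s.pos_full:
        return False  # a vertex may not enter the position-exporting clause
    return True
```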
`
10. Content of the Reservation Station FIFOs
The reservation station FIFOs hold 3 bits of render state, 6-7 bits for the base address of the instruction store and some bits for LOD correction. All other information (such as the coverage mask, quad address, etc.) is put in a FIFO and is retrieved when the quad exits the shader pipe to enter the output file buffer. Since pixels and vertices are kept in order in the shader pipe, we only need two FIFOs (one for vertices and one for pixels) deep enough to cover the shader pipe latency. Their size will be determined later, when we know the size of the small FIFOs between the reservation stations.
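Because vectors of one type leave the shader pipe in the order they entered, a plain FIFO is enough to re-associate the side information with its vector, with no tags. A minimal sketch of one such FIFO; the class name and the dictionary keys used in the example are hypothetical, and the depth is a parameter since the real size is still to be determined.

```python
from collections import deque


class SideInfoFifo:
    """Holds per-vector side information (coverage mask, quad address, ...)
    while the vector is in the shader pipe."""

    def __init__(self, depth: int) -> None:
        self.depth = depth  # must cover the shader pipe latency (size TBD)
        self._q: deque = deque()

    def push(self, info: dict) -> bool:
        """Store info as the vector enters; False means the input must stall."""
        if len(self._q) >= self.depth:
            return False
        self._q.append(info)
        return True

    def pop_on_exit(self) -> dict:
        """Retrieve the oldest vector's info as it enters the output file."""
        return self._q.popleft()


pixels = SideInfoFifo(depth=2)
pixels.push({"coverage": 0xF, "quad_addr": 10})
pixels.push({"coverage": 0x3, "quad_addr": 11})
```

One instance would exist for pixels and one for vertices.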
11. The Output File (RB FIFO and Parameter Cache)
The output file is where program results are exported when the pixel/vertex shader finishes. It consists of a 512x128 memory that is statically divided between pixels and vertices. The output file has 1 write port and 1 read port. The sequencer is responsible for managing the addresses of this output file and for stalling the shader pipe should the output file fill up. The management is done by keeping the head and tail pointers of each section (pixels and vertices) and incrementing them using a simple round-robin allocation policy. The sequencer must also arbitrate between the PA and the RB for the use of the read port. This arbitration will either be priority based or just interleaved evenly (1 read every 2 clocks for each of the blocks).
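A minimal sketch of one output file section managed with head and tail pointers, plus the evenly interleaved read-port option. The pixel/vertex split is not specified, so base and size are parameters; which block gets the even clocks is an arbitrary assumption, and all names are hypothetical.

```python
class OutputFileSection:
    """One statically sized section (pixels or vertices) of the output file,
    managed as a ring with head and tail pointers."""

    def __init__(self, base: int, size: int) -> None:
        self.base, self.size = base, size
        self.head = 0   # next entry the shader pipe writes
        self.tail = 0   # next entry the RB or PA reads
        self.count = 0

    def full(self) -> bool:
        return self.count == self.size

    def write(self) -> int:
        """Reserve the next entry for shader output."""
        if self.full():
            raise RuntimeError("section full: the shader pipe must stall")
        addr = self.base + self.head
        self.head = (self.head + 1) % self.size
        self.count += 1
        return addr

    def read(self) -> int:
        """Consume the oldest entry."""
        if self.count == 0:
            raise RuntimeError("section empty")
        addr = self.base + self.tail
        self.tail = (self.tail + 1) % self.size
        self.count -= 1
        return addr


def read_port_owner(clock: int) -> str:
    """Even interleaving: PA and RB each get 1 read every 2 clocks."""
    return "PA" if clock % 2 == 0 else "RB"
```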
12. Interfaces

12.1 External Interfaces

12.1.1 Sequencer to Shader Engine Bus
This is a bus that sends the instruction and constant data to all 4 sub-engines of the shader. Because a new instruction is needed only every 4 clocks, the width of the bus is divided by 4 and both constants and instruction are sent over these 4 clocks.

| Name | Direction | Bits | Description |
|---|---|---|---|
| Instruction Start | SEQ->SP | 1 | High on first cycle of transfer |
| Constant 0 | SEQ->SP | 32 | 128 bits transferred over 4 cycles, alpha first ... blue last |
| Constant 1 | SEQ->SP | 32 | 128 bits transferred over 4 cycles, alpha first ... blue last |
| Instruction | SEQ->SP | 30 | 120 bits transferred over 4 cycles (order TBD) |

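The 4-cycle transfer in the table can be sketched as a serializer on the SEQ side and a reassembler on the SP side. This assumes the constant components are ordered alpha, red, green, blue (the table fixes only alpha first and blue last) and that the instruction goes out low 30 bits first, since its order is TBD; the names are hypothetical.

```python
from typing import NamedTuple, Sequence

MASK30 = (1 << 30) - 1


class Beat(NamedTuple):
    """What the SEQ -> SP bus carries on one of the 4 cycles of a transfer."""
    instruction_start: bool  # high only on the first cycle of the transfer
    constant0: int           # 32-bit slice of Constant 0
    constant1: int           # 32-bit slice of Constant 1
    instruction: int         # 30-bit slice of the 120-bit instruction


def to_beats(c0: Sequence[int], c1: Sequence[int], instr: int) -> list[Beat]:
    """Split two 128-bit constants and one 120-bit instruction into 4 beats.

    c0 and c1 are (alpha, red, green, blue) 32-bit components, so alpha goes
    out first and blue last; the instruction is sent low 30 bits first.
    """
    if len(c0) != 4 or len(c1) != 4:
        raise ValueError("constants must have 4 components")
    if not 0 <= instr < 1 << 120:
        raise ValueError("instruction must fit in 120 bits")
    return [
        Beat(cycle == 0, c0[cycle], c1[cycle], (instr >> (30 * cycle)) & MASK30)
        for cycle in range(4)
    ]


def from_beats(beats: Sequence[Beat]) -> tuple[list[int], list[int], int]:
    """SP side: reassemble the constants and the instruction from 4 beats."""
    if len(beats) != 4 or not beats[0].instruction_start:
        raise ValueError("a transfer is 4 beats starting with Instruction Start")
    if any(b.instruction_start for b in beats[1:]):
        raise ValueError("Instruction Start must be high on the first beat only")
    c0 = [b.constant0 for b in beats]
    c1 = [b.constant1 for b in beats]
    instr = sum(b.instruction << (30 * i) for i, b in enumerate(beats))
    return c0, c1, instr
```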
12.1.2 Shader Engine to Output File

Every clock, each sub-engine can output 128 bits of 'vector' data and 32 bits of 'scalar' data to an output file (?). This data will be compressed into 128 bits total prior to storage in the output file.

| Name | Direction | Bits | Description |
|---|---|---|---|
| UL_Vector_Out | SP->OF | 128 | Vector data out |
| UL_Scalar_Out | SP->OF | 32 | Scalar data out |
| UR_Vector_Out | SP->OF | 128 | Vector data out |
| UR_Scalar_Out | SP->OF | 32 | Scalar data out |
| LL_Vector_Out | SP->OF | 128 | Vector data out |
| LL_Scalar_Out | SP->OF | 32 | Scalar data out |
| LR_Vector_Out | SP->OF | 128 | Vector data out |
| LR_Scalar_Out | SP->OF | 32 | Scalar data out |

12.1.3 Shader Engine to Texture Unit Bus (Fast Bus)
One quad's worth of addresses is transferred to the texture unit every clock. These are sourced from a different pixel within each of the sub-engines, repeating every 4 clocks. The register file index to read must precede the data by 2 clocks. The read address associated with quad 0 must be sent 1 clock after the Instruction Start signal is sent, so that data is read 3 clocks after the Instruction Start.

One quad's worth of texture data may be written to the register file every clock. These are directed to a different pixel of the sub-engines, repeating every 4 clocks. The register file index to write must accompany the data. Data and index associated with quad 0 must be sent 3 clocks after the Instruction Start signal is sent.

| Name | Direction | Bits | Description |
|---|---|---|---|
| Tex_Read_Register_Index | SEQ->SP | 8 | Index into register files for reading texture address |
| Tex_RegFile_Read_Data | SP->TEX | 512 | 4 texture addresses read from the register file |
| Tex_Write_Register_Index | SEQ->TEX | 8 | Index into register file for write of returned texture data |

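The fast-bus timing rules above can be collected into one function that gives, for each quad, the clock of its read index, read data and write relative to Instruction Start. This is a sketch assuming quad q follows quad 0 by q clocks (one quad per clock over the 4-clock instruction period); the function name and the dictionary keys are hypothetical.

```python
def fast_bus_timing(start_clock: int, quad: int) -> dict[str, int]:
    """Clock numbers for one quad, given the Instruction Start clock.

    Assumes quad q is handled q clocks after quad 0.
    """
    read_index = start_clock + 1 + quad  # quad 0 index: 1 clock after Instruction Start
    read_data = read_index + 2           # data follows its read index by 2 clocks
    write = start_clock + 3 + quad       # write data and index travel together
    return {"read_index": read_index, "read_data": read_data, "write": write}
```

For quad 0 this gives the index at +1 and the data and write at +3, the values stated in the text.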
12.1.4 Sequencer to Texture Unit Bus (Slow Bus)
Once every four clocks, the texture unit tells the sequencer which clause it is now working on and whether the data in the registers is ready or not. This way the sequencer can update the texture counters for the reservation station FIFOs. The sequencer also provides the instruction and constants for the texture fetch to execute, and the address in the register file where the texture return data is to be written.

| Name | Direction | Bits | Description |
|---|---|---|---|
| Tex_Ready | TEX->SEQ | 1 | Data ready |
| Tex_Clause_Num | TEX->SEQ | 3 | Clause number |
| ? | SEQ->TEX | ? | Texture constants, X bits sent over 4 clocks |
| ? | SEQ->TEX | ? | Texture fetch instruction, X bits sent over 4 clocks |

12.1.5 Shader Engine to RE/PA Bus

| Name | Direction | Bits | Description |
|---|---|---|---|
| Interpolator_Register_Index | SEQ->SP | 8 | Index into register files for write of interpolator/index data |
| Interpolator_Write_Mask | SEQ->SP | 1 | Write mask. The same write mask is used for all 4 pixels |
| Interpolator_Write_Data | RE/PA->SP | 512 | 4 interpolated vectors or vectors of indices |

12.1.6 PA? to Sequencer

| Name | Direction | Bits | Description |
|---|---|---|---|
| Address | PA->SEQ | ? | Deallocation address sent by the PA telling the sequencer that it is now possible to free this space in the parameter buffer. This token is a pointer in the parameter cache plus 4 bits to tell the size which is to be freed up. |
13. Examples of program executions

13.1.1 Sequencer Control of a Vector of Vertices

1. PA sends a vector of 16 vertices (actually vertex indices: 32 bits/index for 512 bits total) to the RE's Vertex FIFO
   • the state pointer, as well as a tag into the position cache, is sent along with the vertices
   • space was allocated in the position cache for the transformed positions before the vector was sent
   • also before the vector is sent to the RE, the CP has loaded the global instruction store with the vertex shader program (using the MH?)
   • the vertex program is assumed to be loaded when we receive the vertex vector
   • the SEQ then accesses the IS base for this shader using the local state pointer (provided to all sequencers by the RBBM when the CP is done loading the program)
2. SEQ arbitrates between the Pixel FIFO and the Vertex FIFO; basically, the Vertex FIFO always has priority
   • at this point the vector is removed from the Vertex FIFO
   • the arbiter is not going to select a vector to be transformed if the parameter cache is full, unless the pipe has nothing else to do (i.e. no pixels are in the Pixel FIFO)
   • the … selected by the ASM arbiter and gets the instructions for ALU
3. SEQ allocates space in the SP register file for index data plus the GPRs used by the program
   • the number of GPRs required by the program is stored in a local state register, which is accessed using the state pointer that came down with the vertices
   • SEQ will not send vertex data until space in the register file has been allocated

4. SEQ sends the vector to the SP register file over the RE_SP interface (which has a bandwidth of 512 bits/cycle)
   • the 16 vertex indices are sent to the 16 register files over 4 cycles
      - RF0 of SU0, SU1, SU2, and SU3 is written the first cycle
      - RF1 of SU0, SU1, SU2, and SU3 is written the second cycle
      - RF2 of SU0, SU1, SU2, and SU3 is written the third cycle
      - RF3 of SU0, SU1, SU2, and SU3 is written the fourth cycle
   • … significant 32 bits (floating point format?) (what about compound indices?) … the register … remaining data bits …
5. SEQ constructs a control packet for the vector and sends it to the first reservation station (the FIFO in front of texture state machine 0, or TSM0 FIFO)
   • the control packet contains the state pointer, the tag to the position cache, and a register file base pointer
6. TSM acc…
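The bandwidth figures in steps 1 and 4 appear consistent if each register-file entry holds a 128-bit vector (4 × 32 bits) with the index in its low 32 bits: 16 indices × 128 bits = 2048 bits, which at 512 bits/cycle takes exactly 4 cycles, with 4 × 128 = 512 bits (one entry each for SU0 to SU3) per cycle. The sketch below builds that write schedule. The 128-bit entry width, the zero-filled upper bits, the raw integer (rather than floating-point) index format, and the ordering in which indices 4c to 4c+3 go out in cycle c are all assumptions.

```python
NUM_VERTICES = 16
INDEX_BITS = 32
NUM_SUS = 4        # SU0..SU3
BUS_BITS = 512     # interface bandwidth per cycle (step 4)
ENTRY_BITS = 128   # assumed: one 4 x 32-bit vector per register-file entry

CYCLES = NUM_VERTICES * ENTRY_BITS // BUS_BITS  # 16 * 128 / 512 = 4


def distribution_schedule(indices: list[int]) -> list[list[tuple[str, str, int]]]:
    """Return the register-file writes for each cycle as (SU, RF, entry) tuples."""
    if len(indices) != NUM_VERTICES:
        raise ValueError("expected a vector of 16 vertex indices")
    schedule = []
    for cycle in range(CYCLES):
        writes = []
        for su in range(NUM_SUS):
            index = indices[cycle * NUM_SUS + su]  # assumed ordering
            # Index in the least significant 32 bits; upper 96 bits left at zero (assumed).
            entry = index & ((1 << INDEX_BITS) - 1)
            writes.append((f"SU{su}", f"RF{cycle}", entry))
        assert len(writes) * ENTRY_BITS == BUS_BITS  # each cycle fills the 512-bit bus
        schedule.append(writes)
    return schedule


print(NUM_VERTICES * INDEX_BITS)  # 512 bits of raw indices from the PA (step 1)
for cycle, writes in enumerate(distribution_schedule(list(range(100, 116)))):
    print(cycle, writes)
```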

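Steps 2 and 3 describe two gating rules: the Vertex FIFO wins arbitration unless the parameter cache is full and pixels are waiting, and a vertex vector is not sent until the register file has room for its index data plus the GPR count read through the state pointer. The sketch below captures both rules; the dictionary standing in for the local state registers, the one-entry index allocation, the per-register-file entry accounting, and all names are hypothetical.

```python
from typing import Optional


def select_fifo(vertex_waiting: bool, pixel_waiting: bool, param_cache_full: bool) -> Optional[str]:
    """Pick which FIFO the arbiter serves (step 2)."""
    # Vertices normally win; with a full parameter cache they are only
    # selected when there are no pixels to process instead.
    if vertex_waiting and (not param_cache_full or not pixel_waiting):
        return "vertex"
    if pixel_waiting:
        return "pixel"
    return None


class RegisterFileAllocator:
    """Tracks free entries in one register file (hypothetical accounting)."""

    def __init__(self, entries: int) -> None:
        self.free = entries

    def try_allocate(self, state_regs: dict[int, int], state_ptr: int) -> bool:
        # Step 3: index data (1 entry, assumed) plus the program's GPRs,
        # whose count is looked up through the vector's state pointer.
        needed = 1 + state_regs[state_ptr]
        if needed > self.free:
            return False  # SEQ holds the vertex data until space frees up
        self.free -= needed
        return True


state_regs = {0: 3}  # hypothetical: state pointer 0 -> program needs 3 GPRs
rf = RegisterFileAllocator(entries=8)
print(select_fifo(True, True, False))  # vertex
print(select_fifo(True, True, True))   # pixel
print(select_fifo(True, False, True))  # vertex
print(rf.try_allocate(state_regs, 0), rf.try_allocate(state_regs, 0), rf.try_allocate(state_regs, 0))
# True True False
```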