`ORIGINATE DATE
`7 May, 2001
`
`EDIT DATE
`4 September, 2015
`
`DOCUMENT-REV. NUM.
`GEN-CXXXXX-REVA
`
`PAGE
`1 of 9
`
Author: Laurent Lefebvre
`
`R400 Sequencer Specification
`
`SEQ
`
`Version 0.1
`
Overview: This is an architectural specification for the R400 Sequencer block (SEQ). It provides an overview of the required capabilities and expected uses of the block. It also describes the block interfaces and internal sub-blocks, and provides internal state diagrams.
`
AUTOMATICALLY UPDATED FIELDS:
Document Location: D:\Perforce\r400\arch\doc\gfx\MC\R400 MemCtl.doc
Current Intranet Search Title: R400 Memory Controller Architectural Specification

APPROVALS

THIS DOCUMENT CONTAINS INFORMATION THAT COULD BE SUBSTANTIALLY DETRIMENTAL TO THE INTEREST OF ATI TECHNOLOGIES INC. THROUGH UNAUTHORIZED USE OR DISCLOSURE.

“Copyright 2001, ATI Technologies Inc. All rights reserved. The material in this document constitutes an unpublished work created in 2001. The use of this copyright notice is intended to provide notice that ATI owns a copyright in this unpublished work. The copyright notice is not an admission that publication has occurred. This work contains confidential, proprietary information and trade secrets of ATI. No part of this document may be used, reproduced, or transmitted in any form or by any means without the prior written permission of ATI Technologies Inc.”
`
`Exhibit 2007.doc
`
9252 Bytes *** © ATI Confidential. Reference Copyright Notice on Cover Page © *** 09/04/15 04:03 PM
LG v. ATI, IPR2015-00325
AMD1044_0256664
ATI Ex. 2104, IPR2023-00922, Page 1 of 9
`
`
`
Table Of Contents

1. OVERVIEW
1.1 Top Level Block Diagram
2. TEXTURE ARBITRATION
3. ALU ARBITRATION
4. INPUT INTERFACE
4.1 Rasterizer to Register File (interpolated data)
4.2 Texture Unit to Register File (texture return)
4.3 ALU Unit to Register File (ALU op result)
4.4 Scalar Unit to Register File (Scalar op result)
5. OUTPUT INTERFACE
5.1 Sequencer to Shader Engine Bus
5.2 Shader Engine to Texture Unit Bus
6. OPEN ISSUES
`
`Revision Changes:
`
`Rev 0.1 (Laurent Lefebvre)
`Date: May 7, 2001
`
`First draft.
`
`1. Overview
`
The sequencer first arbitrates between vectors of 16 vertices that arrive directly from primitive assembly and vectors of 8 quads (32 pixels) that are generated in the raster engine.

The vertex or pixel program specifies how many GPRs it needs to execute. The sequencer will not start the next vector until the needed space is available.

The sequencer is based on the R300 design. It chooses an ALU clause and a texture clause to execute, and executes all of the instructions in a clause before looking for a new clause of the same type. Each vector will have eight texture and eight ALU clauses, but clauses do not need to contain instructions. A vector of pixels or vertices ping-pongs along the sequencer FIFO, bouncing from texture reservation station to ALU reservation station. A FIFO exists between each reservation stage, holding up vectors until the vector currently occupying a reservation station has left. A vector at a reservation station can be chosen to execute. The sequencer looks at all eight ALU reservation stations to choose an ALU clause to execute and all eight texture stations to choose a texture clause to execute. The arbitrator will give priority to clauses/reservation stations closer to the top of the pipeline. It will not execute an ALU clause until the texture fetches initiated by the previous texture clause have completed.

To support the shader pipe, the raster engine also contains the shader instruction cache and constant store.
`
1.1 Top Level Block Diagram

[Figure: top-level block diagram. A vertex/pixel vector arbitrator (with a possible delay for available GPRs) feeds a chain of texture clause reservation stations (0 to 7) alternating with ALU clause reservation stations; arbitrator blocks select among the stations.]
`
The rasterizer always checks the vertex FIFO first and, if allowed by the sequencer, sends the data to the shader. If the vertex FIFO is empty, the rasterizer takes the first entry of the pixel FIFO (a vector of 32 pixels) and sends it to the interpolators. Then the sequencer takes control of the packet.

On receipt of a packet, the input state machine (not pictured, but just before the first FIFO) allocates enough space in the registers to store the interpolated values and temporaries. Following this, the input state machine stacks the packet in the first FIFO.

On receipt of a command, the level 0 texture machine issues a texture request and corresponding register address for the texture address (ta). A small command (tcmd) is passed to the texture system identifying the current level number
(0) as well as the register set being used. One texture request is sent every 4 clocks, causing the texturing of four 2x2s' worth of data.

Upon receipt of the return data (identified by the tcmd containing the level number 0), the level 0 texture machine issues a register address for the return value (td). Then it puts the finished packet in FIFO 1.
`
On receipt of a command, the level 0 ALU machine issues a complete set of level 0 shader instructions. For each instruction, the state machine generates 3 source addresses, one destination address (2 cycles later), and an instruction id which is used to index into the instruction store. Once the last instruction has been issued, the packet is put into FIFO 2. Note that in the case of a pixel packet, the two vectors of 16 pixels are interleaved in order to hide the latency of the ALUs (8 cycles).
`
All other levels are processed in the same way until the packet finally reaches the last ALU machine (8). On completion of the level 8 ALU clause, a valid bit is sent to the Render Backend (RB), which picks up the color data. This requires that the last instruction writes to the output register, a condition that is almost always true. If the packet was a vertex packet, instead of sending the valid bit to the RB, it is sent to the PA, which picks up the data and puts it into the vertex store.
`
Only one ALU state machine may have access to the SRAM address bus or the instruction decode bus at one time. Similarly, only one texture state machine may have access to the SRAM address bus at one time. Arbitration is performed by two arbiter blocks (one for the ALU state machines and one for the texture state machines). The arbiters always favor the higher-numbered state machines, preventing a bunch of half-finished jobs from clogging up the SRAMs.
`
Each state machine maintains an address pointer specifying where the 16- (or 32-) entry vector is located in the SRAM (the texture machine has two pointers, one for the read address and one for the write). Upon completion of its job, the address pointer is incremented by a predefined amount equal to the total number of registers required by the shading code. A comparison of the address pointer for the first state machine in the chain (the input state machine) and the last machine in the chain (the level 8 ALU machine) gives an indication of how much unallocated SRAM memory is available. When this number falls below a preset watermark, the input state machine will stall the rasterizer, preventing new data from entering the chain.
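
The watermark check described above can be sketched as follows. This is only an illustration under stated assumptions, not the hardware implementation: it assumes the register SRAM is addressed as a circular buffer, takes the depth of 512 from the register file figure, and uses hypothetical names (`free_entries`, `should_stall`) and a hypothetical `WATERMARK` value, since the specification gives none.

```python
SRAM_ENTRIES = 512   # assumed depth, taken from the 512x128 register file figure
WATERMARK = 64       # hypothetical preset threshold; the spec gives no value


def free_entries(input_ptr: int, last_alu_ptr: int, size: int = SRAM_ENTRIES) -> int:
    """Unallocated entries between the input state machine pointer (allocation)
    and the last ALU machine pointer (release), on a circular buffer.

    Equal pointers are treated as "empty". This is an assumption: a real
    design needs extra state to tell an empty buffer from a full one.
    """
    used = (input_ptr - last_alu_ptr) % size
    return size - used


def should_stall(input_ptr: int, last_alu_ptr: int, watermark: int = WATERMARK) -> bool:
    """True when the input state machine would stall the rasterizer."""
    return free_entries(input_ptr, last_alu_ptr) < watermark
```

With these assumed values, `should_stall(470, 0)` is true (42 free entries) while `should_stall(100, 20)` is false (432 free entries).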
`
[Figure: shader pipe data path. A Register File, 512x128 (built as 4 128x128 or 16 128x32), receives data, control and constants from the RE. An operand mux feeds the MAC units and a 128-bit scalar/vector ALU. Outputs are the address to texture, vertex parameter data to the RE through the texture block, or pixel data to the RB through the texture block.]
`
[Figure: sub-engine pipelines. Each pipeline contains a Register File, pipeline stages and a MAC, receives the instruction, constants and data from the RE plus the texture fetch return, and drives an address to texture. A Scalar Unit is attached through a scalar operand input / scalar result output.]
`
`2. Texture Arbitration
`
The texture arbitration logic chooses one of the 8 potentially pending texture clauses to be executed. The choice is made by looking at the FIFOs from 8 to 0 and picking the first one ready to execute. Once chosen, the clause state machine will send one 2x2 texture fetch per 4 clocks until all the texture fetch instructions of the clause are sent. This means that there cannot be any dependencies between two texture fetches of the same clause.
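
A minimal sketch of this selection rule follows. It assumes each station exposes a single ready flag and that scanning "from 8 to 0" means scanning from the highest-numbered station down; the function names are hypothetical and the code is illustrative only.

```python
from typing import List, Optional, Sequence


def pick_clause(ready: Sequence[bool]) -> Optional[int]:
    """Return the highest-numbered ready station, or None if none is ready."""
    for level in range(len(ready) - 1, -1, -1):
        if ready[level]:
            return level
    return None


def fetch_issue_clocks(num_fetches: int, start_clock: int = 0, period: int = 4) -> List[int]:
    """Clocks at which the chosen clause sends its 2x2 texture fetches,
    one fetch every `period` clocks (4 in the text)."""
    return [start_clock + i * period for i in range(num_fetches)]
```

For example, with stations 0 and 2 ready, `pick_clause` selects station 2, and a clause of three fetches is sent at clocks 0, 4 and 8.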
`
`3. ALU Arbitration
`
ALU arbitration proceeds in almost the same way as texture arbitration. The ALU arbitration logic chooses one of the 8 potentially pending ALU clauses to be executed. The choice is made by looking at the FIFOs from 8 to 0 and picking the first one ready to execute. If the packet chosen is a packet of vertices, the state machine issues one instruction every 4 clocks until the clause is finished. This means that the compiler has to insert nops between two dependent successive instructions. If the packet is a pixel packet, it is made up of two sub-vectors of 16. Thus the state machine issues the first instruction for the first sub-vector and then, 4 clocks later, the first instruction of the second sub-vector, and so on until the clause is finished. Proceeding this way hides the 8-clock latency of the ALUs.
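
The issue order can be illustrated with a short sketch. The 4-clock issue period and 8-clock ALU latency come from the text; the function names and the tuple layout are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

ISSUE_PERIOD = 4   # clocks between issues, from the text
ALU_LATENCY = 8    # clocks, from the text


def issue_schedule(num_instructions: int, is_pixel_packet: bool) -> List[Tuple[int, int, int]]:
    """Return (clock, instruction, sub_vector) tuples in issue order.

    A vertex packet has one sub-vector; a pixel packet has two sub-vectors
    of 16 that are interleaved instruction by instruction.
    """
    sub_vectors = 2 if is_pixel_packet else 1
    schedule = []
    clock = 0
    for instr in range(num_instructions):
        for sub in range(sub_vectors):
            schedule.append((clock, instr, sub))
            clock += ISSUE_PERIOD
    return schedule


def back_to_back_gap(is_pixel_packet: bool) -> int:
    """Clocks between consecutive instructions of the same sub-vector."""
    return ISSUE_PERIOD * (2 if is_pixel_packet else 1)
```

For a pixel packet the gap between successive instructions of one sub-vector is 8 clocks, which matches the ALU latency; for a vertex packet it is only 4 clocks, which is why dependent instructions need a nop between them.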
`
`4. Input Interface
`
`4.1 Rasterizer to Register File (interpolated data)
`
`
`
Name              | Direction | Bits | Description
SND               | SEQ->SP   | 1    | High when sending data
Interpolated data | SEQ->SP   | 512  | 512 bits transferred every 4 cycles
`
4.2 Texture Unit to Register File (texture return)

Name           | Direction | Bits | Description
SND            | SEQ->TU   | 1    | High when sending data
Texture colors | TU->SP    | 512  | 512 bits transferred every 4 cycles
`
`4.3 ALU Unit to Register File (ALU op result)
`
Name             | Direction | Bits | Description
SND              | SEQ->SP   | 1    | High when sending data
Blend result ALU | SP->SP    | 512  | 512 bits transferred every 4 cycles
Write Mask       | SP->SP    | 16   | The four write masks
`
4.4 Scalar Unit to Register File (Scalar op result)

Name          | Direction | Bits | Description
SND           | SEQ->SP   | 1    | High when sending data
Scalar result | SP->SP    | 512  | 512 bits transferred every 4 cycles
Write Mask    | SP->SP    | 16   | The four write masks
`
5. Output Interface
`
`5.1 Sequencer to Shader Engine Bus
`This is a bus that sends the instruction and constant data to all 4 Sub-Engines of the Shader. Because a new
`instruction is needed only every 4 clocks, the width of the bus is divided by 4 and both constants and instruction
`are sent over those 4 clocks.
`
`
`
`
`
Name              | Direction | Bits | Description
Instruction Start | SEQ->SP   | 1    | High on first cycle of transfer
Constant 0        | SEQ->SP   | 32   | 128 bits transferred over 4 cycles, alpha first...blue last
Constant 1        | SEQ->SP   | 32   | 128 bits transferred over 4 cycles, alpha first...blue last
Instruction       | SEQ->SP   | 40   | 160 bits transferred over 4 cycles
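
The following sketch shows how a wide word could be sliced into four bus beats, so that a 128-bit constant uses a 32-bit path and a 160-bit instruction uses a 40-bit path. It is an illustration only: the helper name `split_into_beats` is hypothetical, and packing alpha in the most significant bits with the most significant slice sent first is an assumption (the text states only that alpha goes first and blue last).

```python
from typing import List


def split_into_beats(value: int, total_bits: int, beats: int = 4) -> List[int]:
    """Split a total_bits-wide word into `beats` equal slices.

    The most significant slice is returned first; that ordering is an
    assumption made for this sketch.
    """
    if total_bits % beats:
        raise ValueError("total_bits must divide evenly across the beats")
    if value < 0 or value >> total_bits:
        raise ValueError("value does not fit in total_bits")
    width = total_bits // beats
    mask = (1 << width) - 1
    return [(value >> (width * (beats - 1 - i))) & mask for i in range(beats)]
```

Under that assumed packing, a 128-bit constant yields four 32-bit beats in channel order, and a 160-bit instruction yields four 40-bit beats.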
`
`
`
`
`
`
`5.2 Shader Engine to Texture Unit Bus
One quad's worth of addresses is transferred to the Texture Unit every clock. These are sourced from a different pixel within each of the sub-engines, repeating every 4 clocks. The register file index to read must precede the data by 2 clocks. The read address associated with Quad 0 must be sent 1 clock after the Instruction Start signal is sent, so that data is read 3 clocks after the Instruction Start.
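
The read timing above can be worked through in a short sketch. The offsets for Quad 0 come from the text; the assumption that quads 1 to 3 follow on consecutive clocks is inferred from "one quad's worth every clock" and is not stated explicitly. The function name is hypothetical.

```python
from typing import Dict

INDEX_OFFSET = 1     # read index for Quad 0 is sent 1 clock after Instruction Start
INDEX_TO_DATA = 2    # register file index precedes the data by 2 clocks
QUADS_PER_GROUP = 4  # the pattern repeats every 4 clocks


def texture_read_timing(start_clock: int) -> Dict[int, Dict[str, int]]:
    """Map each quad to the clocks of its read index and of its data."""
    timing = {}
    for quad in range(QUADS_PER_GROUP):
        # One quad per clock, in quad order (assumed).
        index_clock = start_clock + INDEX_OFFSET + quad
        timing[quad] = {"index": index_clock, "data": index_clock + INDEX_TO_DATA}
    return timing
```

With Instruction Start at clock 10, Quad 0 has its index at clock 11 and its data at clock 13, which is 3 clocks after the start signal.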
`
`
`One Quad’s worth of Texture Data may be written to the Register File every clock. These are directed to a
`different pixel of the sub-engines repeating every 4 clocks. The register file index to write must accompany the
`data. Data and Index associated with the Quad 0 must be sent 3 clocks after the Instruction Start signal is sent.

Name                     | Direction | Bits | Description
Tex_Read_Register_Index  | SEQ->SP   | 8    | Index into Register Files for reading Texture Address
Tex_RegFile_Read_Data    | SP->TEX   | 512  | 4 Texture Addresses read from the Register File
Tex_Write_Register_Index | SEQ->SP   | 8    | Index into Register File for write of returned Texture Data
`
`
`6. Open issues
There is currently an issue with constants. If the constants are not the same for the whole vector of vertices, we don't have the bandwidth from the texture store to feed the ALUs. Two solutions exist for this problem:
1) Let the compiler handle the case and put those instructions in a texture clause so we can use the bandwidth there to operate. This requires a significant amount of temporary storage in the register store.
2) Waterfall down the pipe, allowing only the vertices having the same constants to operate in parallel at a given time. This might in the worst case slow us down by a factor of 16.
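
The cost of the waterfall option can be illustrated by grouping the vertices of a vector by the constant they use, with one pass down the pipe per group. This is a sketch of the idea only, not a proposed implementation; `waterfall_passes` is a hypothetical name, and each vertex is assumed to reference a single hashable constant identifier.

```python
from typing import Dict, Hashable, List, Sequence


def waterfall_passes(constants_per_vertex: Sequence[Hashable]) -> List[List[int]]:
    """Group vertex indices by the constant they use.

    Each returned group represents one pass in which only vertices sharing
    the same constant operate in parallel.
    """
    groups: Dict[Hashable, List[int]] = {}
    for vertex, constant in enumerate(constants_per_vertex):
        groups.setdefault(constant, []).append(vertex)
    return list(groups.values())
```

A vector of 16 vertices that all share one constant needs a single pass, while 16 distinct constants need 16 passes, which is the worst-case factor of 16 mentioned above.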
`
`
`