ORIGINATE DATE: 24 September, 2001
EDIT DATE: 4 September, 2015
DOCUMENT-REV. NUM.: GEN-CXXXXX-REVA

`Author:
`Laurent Lefebvre
`
`
`Issue To:
`Copy No:
`
`
`R400 Sequencer Specification
`
`SQ
`
`Version 1.87
`
Overview: This is an architectural specification for the R400 Sequencer block (SEQ). It provides an overview of the
required capabilities and expected uses of the block. It also describes the block interfaces and internal sub-blocks,
and provides internal state diagrams.
`
`
`
`
`
`AUTOMATICALLY UPDATED FIELDS:
Document Location:
`C\perforce400\doc_llb\designiblocks'sq\R400Sequencer.dac
`Current Intranet Search Title:
`R400 Sequencer Specification
`
APPROVALS

Name/Dept        Signature/Date

Remarks:
`
`
`
`
`
THIS DOCUMENT CONTAINS CONFIDENTIAL INFORMATION THAT COULD BE SUBSTANTIALLY DETRIMENTAL TO THE INTEREST OF ATI
TECHNOLOGIES INC. THROUGH UNAUTHORIZED USE OR DISCLOSURE.
`
`
`“Copyright 2001, ATI Technologies Inc. All rights reserved. The material in this document constitutes an unpublished
work created in 2001. The use of this copyright notice is intended to provide notice that ATI owns a copyright in this
`unpublished work. The copyright notice is not an admission that publication has occurred. This work contains
`confidential, proprietary information and trade secrets of ATI. No part of this document may be used, reproduced, or
`transmitted in any form or by any means without the prior written permission of ATI Technologies Inc.”
`
Exhibit 2024.doc R400_Sequencer.doc 71269 Bytes *** © ATI Confidential. Reference Copyright Notice on Cover Page © ***

ATI 2024
LG v. ATI
IPR2015-00325
`
`AMD1044_0257135
`
`ATI Ex. 2107
`IPR2023-00922
`Page 1 of 260
`
`
`
`
`
`
`
`
`
Table of Contents

1. OVERVIEW
   1.1 Top Level Block Diagram
   1.2 Data Flow graph (SP)
   1.3 Control Graph
2. INTERPOLATED DATA BUS
3. INSTRUCTION STORE
4. SEQUENCER INSTRUCTIONS
5. CONSTANT STORES
   5.1 Memory organizations
   5.2 Management of the re-mapping tables
       5.2.1 Dirty bits
       5.2.2 Free List Block
       5.2.3 De-allocate Block
       5.2.4 Operation of Incremental Model
   5.3 Constant Store Indexing
   5.4 Real Time Commands
   5.5 Constant Waterfalling
6. LOOPING AND BRANCHES
   6.1 The Controlling State
   6.2 The Control Flow Program
   6.3 Data dependant predicate instructions
   6.4 HW Detection of PV,PS
   6.5 Register file indexing
   6.6 Predicated instruction support for Texture clauses
   6.7 Debugging the Shaders
       6.7.1 Method 1: Debugging registers
       6.7.2 Method 2: Exporting the values in the GPRs (12)
7. PIXEL KILL MASK
8. MULTIPASS VERTEX SHADERS (HOS)
9. REGISTER FILE ALLOCATION
10. FETCH ARBITRATION
11. ALU ARBITRATION
12. HANDLING STALLS
13. CONTENT OF THE RESERVATION STATION FIFOS
14. THE OUTPUT FILE
15. IJ FORMAT
    15.1 Interpolation of constant attributes
16. STAGING REGISTERS
17. THE PARAMETER CACHE
18. VERTEX POSITION EXPORTING
19. EXPORTING ARBITRATION
20. EXPORT TYPES
    20.1 Vertex Shading
`
`
    20.2 Pixel Shading
21. SPECIAL INTERPOLATION MODES
    21.1 Real time commands
    21.2 Sprites/ XY screen coordinates/ FB information
    21.3 Auto generated counters
         21.3.1 Vertex shaders
         21.3.2 Pixel shaders
22. STATE MANAGEMENT
    22.1 Parameter cache synchronization
23. XY ADDRESS IMPORTS
    23.1 Vertex indexes imports
24. REGISTERS
    24.1 Control
    24.2 Context
25. DEBUG REGISTERS
    25.1 Context
26. INTERFACES
    26.1 External Interfaces
         26.1.1 SC to SQ: IJ Control bus
         26.1.2 SQ to SP: Interpolator bus
         26.1.3 SQ to SP: Parameter Cache Read control bus
         26.1.4 SQ to SX: Parameter Cache Mux control bus
         26.1.5 SQ to SP: Staging Register Data
         26.1.6 PA to SQ: Vertex interface
         26.1.7 SQ to CP: State report
         26.1.8 SQ to SX: Control bus
         26.1.9 SX to SQ: Output file control
         26.1.10 SQ to TP: Control bus
         26.1.11 TP to SQ: Texture stall
         26.1.12 SQ to SP: Texture stall
         26.1.13 SQ to SP: GPR, Parameter cache control and auto counter
         26.1.14 SQ to SPx: Instructions
         26.1.15 SP to SQ: Constant address load
         26.1.16 SQ to SPx: constant broadcast
         26.1.17 SP0 to SQ: Kill vector load
         26.1.18 SQ to CP: RBBM bus
         26.1.19 CP to SQ: RBBM bus
27. EXAMPLES OF PROGRAM EXECUTIONS
         27.1.1 Sequencer Control of a Vector of Vertices
         27.1.2 Sequencer Control of a Vector of Pixels
         27.1.3 Notes
`
28. OPEN ISSUES
`
`
Revision Changes:

Rev 0.1 (Laurent Lefebvre)
Date: May 7, 2001
First draft.

Rev 0.2 (Laurent Lefebvre)
Date: July 9, 2001
Changed the interfaces to reflect the changes in the SP. Added some details in the arbitration section.

Rev 0.3 (Laurent Lefebvre)
Date: August 6, 2001
Reviewed the Sequencer spec after the meeting on August 3, 2001.

Rev 0.4 (Laurent Lefebvre)
Date: August 24, 2001
Added the dynamic allocation method for register file and an example (written in part by Vic) of the flow of
pixels/vertices in the sequencer.

Rev 0.5 (Laurent Lefebvre)
Date: September 7, 2001
Added timing diagrams (Vic).

Rev 0.6 (Laurent Lefebvre)
Date: September 24, 2001
Changed the spec to reflect the new R400 architecture. Added interfaces.

Rev 0.7 (Laurent Lefebvre)
Date: October 5, 2001
Added constant store management, instruction store management, control flow management and data dependant
predication.

Rev 0.8 (Laurent Lefebvre)
Date: October 8, 2001
Changed the control flow method to be more flexible. Also updated the external interfaces.

Rev 0.9 (Laurent Lefebvre)
Date: October 17, 2001
Incorporated changes made in the 10/18/01 control flow meeting. Added a NOP instruction, removed the
conditional_execute_or_jump. Added debug registers.

Rev 1.0 (Laurent Lefebvre)
Date: October 19, 2001
Refined interfaces to RB. Added state registers.

Rev 1.1 (Laurent Lefebvre)
Date: October 26, 2001
Added SEQ->SP0 interfaces. Changed delta precision. Changed VGT->SP0 interface. Debug Methods added.

Rev 1.2 (Laurent Lefebvre)
Date: November 16, 2001
Interfaces greatly refined. Cleaned up the spec.

Rev 1.3 (Laurent Lefebvre)
Date: November 26, 2001
Added the different interpolation modes.

Rev 1.4 (Laurent Lefebvre)
Date: December 6, 2001
Added the auto incrementing counters. Changed the VGT->SQ interface. Added content on constant management.
Updated GPRs.

Rev 1.5 (Laurent Lefebvre)
Date: December 11, 2001
Removed from the spec all interfaces that weren't directly tied to the SQ. Added explanations on constant
management. Added PA->SQ synchronization fields and explanation.

Rev 1.6 (Laurent Lefebvre)
Date: January 7, 2002
Added more details on the staging register. Added detail about the parameter caches. Changed the call instruction
to a Conditionnal_call instruction. Added details on constant management and updated the diagram.

Rev 1.7 (Laurent Lefebvre)
Date: February 4, 2002
Added Real Time parameter control in the SX interface. Updated the control flow section.

Rev 1.8 (Laurent Lefebvre)
Date: March 4, 2002
New Interfaces to the SX block. Added the end of clause modifier, removed the end of clause instructions.
`
`
1. Overview
The sequencer is based on the R300 design. It chooses two ALU clauses and a fetch clause to execute, and
executes all of the instructions in a clause before looking for a new clause of the same type. Two ALU clauses are
executed interleaved to hide the ALU latency. Each vector will have eight fetch and eight ALU clauses, but clauses do
not need to contain instructions. A vector of pixels or vertices ping-pongs along the sequencer FIFO, bouncing from
fetch reservation station to ALU reservation station. A FIFO exists between each reservation stage, holding up vectors
until the vector currently occupying a reservation station has left. A vector at a reservation station can be chosen to
execute. The sequencer looks at all eight ALU reservation stations to choose an ALU clause to execute and all eight
fetch stations to choose a fetch clause to execute. The arbitrator will give priority to clauses/reservation stations
closer to the bottom of the pipeline. It will not execute an ALU clause until the fetches initiated by the previous
fetch clause have completed. There are two separate sets of reservation stations, one for pixel vectors and one for
vertex vectors. This way a pixel can pass a vertex and a vertex can pass a pixel.
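
The selection rule described above can be sketched in a few lines of Python. This is an illustrative model only, not the hardware implementation: the `Station` record and its field names are hypothetical, and the sketch assumes that a higher clause index means "closer to the bottom of the pipeline".

```python
from dataclasses import dataclass
from typing import List, Optional

NUM_CLAUSES = 8  # eight fetch and eight ALU clauses per vector (from the text)


@dataclass
class Station:
    """Hypothetical model of one ALU reservation station."""
    occupied: bool = False         # a vector is waiting at this station
    fetches_pending: bool = False  # previous fetch clause still outstanding


def pick_alu_clause(stations: List[Station]) -> Optional[int]:
    """Return the index of the ALU clause to run, or None if none is ready.

    Assumption: a higher index is closer to the bottom of the pipeline,
    so the scan runs from clause 7 down to clause 0.
    """
    for clause in reversed(range(NUM_CLAUSES)):
        st = stations[clause]
        # An ALU clause may not start until the fetches initiated by the
        # previous fetch clause have completed.
        if st.occupied and not st.fetches_pending:
            return clause
    return None


def demo() -> Optional[int]:
    stations = [Station() for _ in range(NUM_CLAUSES)]
    stations[2].occupied = True
    stations[5].occupied = True
    stations[5].fetches_pending = True  # clause 5 must wait for its data
    return pick_alu_clause(stations)
```

In `demo`, clause 5 would normally win on priority, but it is skipped because its fetch data has not returned, so the lower clause is selected instead.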
`
To support the shader pipe the sequencer also contains the shader instruction cache, constant store, control flow
constants and texture state. The four shader pipes also execute the same instruction, thus there is only one
sequencer for the whole chip.

The sequencer first arbitrates between vectors of 64 vertices that arrive directly from primitive assembly and vectors
of 16 quads (64 pixels) that are generated in the scan converter.

The vertex or pixel program specifies how many GPRs it needs to execute. The sequencer will not start the next
vector until the needed space is available in the GPRs.
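
As a rough illustration of the GPR gating rule, the sketch below keeps a simple count of allocated registers. The class name, the capacity passed to it, and the release-on-retire behaviour are assumptions made for the example; the allocation scheme itself is not described in this section.

```python
class GprAllocator:
    """Toy model of GPR space accounting (capacity value is hypothetical)."""

    def __init__(self, total_gprs: int):
        self.total = total_gprs
        self.in_use = 0

    def try_start_vector(self, gprs_needed: int) -> bool:
        """Start a vector only if the GPR space its program asks for is free."""
        if self.in_use + gprs_needed > self.total:
            return False  # the sequencer holds the next vector back
        self.in_use += gprs_needed
        return True

    def retire_vector(self, gprs_needed: int) -> None:
        """Release the space once the vector has left the pipeline."""
        self.in_use -= gprs_needed
```

With a capacity of 128, a vector needing 100 GPRs starts immediately, while a following vector needing 40 is refused until the first one retires.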
`
[Figure 1: General Sequencer overview. The diagram is not recoverable from the scanned page.]

1.1 Top Level Block Diagram
`
[Figure 2 diagram. Recoverable labels: vertex/pixel vector arbitrator; possible delay for available GPRs; texture
clause 0 to 7 reservation stations; ALU clause 0 to 7 reservation stations; FIFO; texture arbitrator.]

`
`Figure 2: Reservation stations and arbiters
There are two sets of the above figure, one for vertices and one for pixels.

Depending on the arbitration state, the sequencer will either choose a vertex or a pixel packet. The control packet
consists of 3 bits of state, 7 bits for the base address of the Shader program and some information on the coverage to
determine fetch LOD plus other various small state bits.
`
On receipt of a packet, the input state machine (not pictured but just before the first FIFO) allocates enough space in
the GPRs to store the interpolated values and temporaries. Following this, the barycentric coordinates (and XY
screen position if needed) are sent to the interpolator, which will use them to interpolate the parameters and place the
results into the GPRs. Then, the input state machine stacks the packet in the first FIFO.

On receipt of a command, the level 0 fetch machine issues a fetch request to the TP and corresponding GPR
address for the fetch address (ta). A small command (tcmd) is passed to the fetch system identifying the current level
number (0) as well as the GPR write address for the fetch return data. One fetch request is sent every 4 clocks
causing the texturing of sixteen 2x2s worth of data (or 64 vertices). Once all the requests are sent the packet is put in
FIFO 1.

Upon receipt of the return data, the fetch unit writes the data to the register file using the write address that was
provided by the level 0 fetch machine and sends the clause number (0) to the level 0 fetch state machine to signify
that the write is done and thus the data is ready. Then, the level 0 fetch machine increments the counter of FIFO 1 to
signify to the ALU 0 that the data is ready to be processed.

On receipt of a command, the level 0 ALU machine first decrements the input FIFO 1 counter and then issues a
complete set of level 0 shader instructions. For each instruction, the ALU state machine generates 3 source
addresses, one destination address and an instruction. Once the last instruction has been issued, the packet is put
into FIFO 2.
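
The handshake between the level 0 fetch machine and the level 0 ALU machine amounts to a counter on FIFO 1. A minimal sketch, assuming a plain integer counter and hypothetical method names, might look like this:

```python
class ReadyCounter:
    """Hypothetical model of the FIFO 1 ready counter."""

    def __init__(self) -> None:
        self.count = 0

    def fetch_data_written(self) -> None:
        # Level 0 fetch machine: the return data is now in the register
        # file, so one more packet in FIFO 1 is ready for the ALU.
        self.count += 1

    def alu_try_start(self) -> bool:
        # Level 0 ALU machine: consume one ready packet if there is one,
        # decrementing the counter before issuing instructions.
        if self.count == 0:
            return False
        self.count -= 1
        return True
```

The ALU side cannot start on a packet whose fetch data has not been written yet, because the counter only moves up after the write is signalled.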
`
There will always be two active ALU clauses at any given time (and two arbiters). One arbiter will arbitrate
over the odd instructions (4 clock cycles) and the other one will arbitrate over the even instructions (4
clock cycles). The only constraint between the two arbiters is that they are not allowed to pick the same
clause number as the other one is currently working on if the packet is not of the same type (render state).
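
The single constraint between the two ALU arbiters can be written as a small predicate. This is a sketch under assumptions: the packet type is represented here as a string such as "vertex" or "pixel", which is one reading of "type (render state)", and the tuple layout is hypothetical.

```python
from typing import Optional, Tuple

# (clause number, packet type) currently held by an arbiter; None if idle.
Selection = Optional[Tuple[int, str]]


def may_select(candidate: Tuple[int, str], other: Selection) -> bool:
    """Check the cross-arbiter constraint described in the text."""
    if other is None:
        return True
    cand_clause, cand_type = candidate
    other_clause, other_type = other
    # Picking the same clause number is only a conflict when the two
    # packets are not of the same type.
    return not (cand_clause == other_clause and cand_type != other_type)
```

For example, `may_select((3, "pixel"), (3, "vertex"))` is rejected, while the same clause number with matching types, or a different clause number, is allowed.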
`
If the packet is a vertex packet, upon reaching ALU clause 3, it can export the position if the position is ready. So the
arbiter must prevent ALU clause 3 from being selected if the positional buffer is full (or can't be accessed). Along with
the positional data, if needed the sprite size and/or edge flags can also be sent.

A special case is for multipass vertex shaders, which can export 12 parameters per last 6 clauses to the output
buffer. If the output buffer is full or doesn't have enough space the sequencer will prevent such a vertex group from
entering an exporting clause.

Multipass pixel shaders can export 12 parameters to memory from the last clause only (7).

All other clauses process in the same way until the packet finally reaches the last ALU machine (7).

Only one pair of interleaved ALU state machines may have access to the register file address bus or the instruction
decode bus at one time. Similarly, only one fetch state machine may have access to the register file address bus at
one time. Arbitration is performed by three arbiter blocks (two for the ALU state machines and one for the fetch state
machines). The arbiters always favor the higher number state machines, preventing a bunch of half finished jobs from
clogging up the register files.
`
`
`
`1.2 Data Flow graph (SP)
`
`
`
`
`
[Figure 3 diagram. Recoverable labels: texture address; texture request; scalar input/output; pipeline stage;
Register File; instruction; Scalar Unit; to Primitive Assembly Unit or Render Backend.]

`
`Figure 3: The shader Pipe
`
`
`
`
The gray area represents blocks that are replicated 4 times per shader pipe (16 times on the overall chip).
`
`
`1.3 Control Graph
`
[Figure 4 diagram. Recoverable labels: SEQ; FETCH; SP; IS; OF; CMD; CST; Clause # + Rdy; Phase; WrAddr; RdAddr.]
`
`Figure 4: Sequencer Control interfaces
`
In green is represented the Fetch control interface, in red the ALU control interface, in blue the Interpolated/Vector
control interface and in purple the output file control interface.
`
2. Interpolated data bus
`The interpolators contain an IJ buffer to pack the information as much as possible before writing it to the register file.
`
[Interpolation buffers diagram. Recoverable labels: INTERPOLATORS; CROSSBAR (4x64 bits); IJ buffer (ping-pong
buffer), 16 quads, double-buffered, 4096 bits organized as 32x128; XY buffer (ping-pong buffer), 24 bits * 16 quads * 2
= 768 bits; FIX-FLOAT + EXPANSION; To RB.]

Figure 3: Interpolation buffers

`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
[Figure: example of a tile received from the SC, showing the write side and the read side of the XY and IJ buffers.
The diagram is not recoverable from the scanned page.]

Above is an example of a tile the sequencer might receive from the SC. The write side is how the data get stacked
into the XY and IJ buffers, the read side is how the data is passed to the GPRs. The IJ information is packed in the IJ
buffer 4 quads at a time or two clocks. The sequencer allows at any given time as many as four quads to interpolate a
parameter. They all have to come from the same primitive. Then the sequencer controls the write mask to the GPRs
to write the valid data in.
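
The "up to four quads, all from the same primitive" rule can be illustrated by grouping the quads of a tile into batches. The sketch below is an assumption-laden model: it only groups consecutive quads, identifies primitives by a hypothetical integer id, and does not model the GPR write mask.

```python
from typing import List

MAX_QUADS_PER_BATCH = 4  # at most four quads interpolate at once (from the text)


def batch_quads(primitive_ids: List[int]) -> List[List[int]]:
    """Split quad indices into consecutive batches that share one primitive.

    primitive_ids[i] is the (hypothetical) primitive id of quad i.
    """
    batches: List[List[int]] = []
    for quad, prim in enumerate(primitive_ids):
        current = batches[-1] if batches else None
        same_prim = current is not None and primitive_ids[current[0]] == prim
        if same_prim and len(current) < MAX_QUADS_PER_BATCH:
            current.append(quad)
        else:
            # Either the primitive changed or the batch is already full.
            batches.append([quad])
    return batches
```

A run of five quads from one primitive is split into a batch of four and a batch of one, and a change of primitive always starts a new batch.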
`
`
{ISSUE : Do we do the center + centroid approach using both IJ buffers?}
`
3. Instruction Store
There is going to be only one instruction store for the whole chip. It will contain 4096 instructions of 96 bits each.

It is likely to be a 1 port memory; we use 7 clocks to load the ALU instruction, 1 clock to load the Fetch instruction, 1
clock to load 2 control flow instructions and 1 clock to write instructions.

The instruction store is loaded by the CP through the register mapped registers.

The next picture shows the various modes in which the CP can load the memory. The Sequencer has to keep track of
the loading modes in order to wrap around the correct boundaries. The wrap-around points are arbitrary and they are
specified in the VS_BASE and PIX_BASE control registers. The VS_BASE and PS_BASE context registers are used
to specify for each context where its shader is in the instruction memory.
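
To make the sizes and the wrap-around idea concrete, here is a small sketch. The capacity figures come from the text; the `next_load_address` helper and its region bounds are hypothetical stand-ins for the wrap-around points, since how the hardware derives them from the base registers is not specified in this section.

```python
INSTRUCTION_COUNT = 4096  # instructions in the store (from the text)
INSTRUCTION_BITS = 96     # bits per instruction (from the text)


def store_size_bytes() -> int:
    """Total capacity of the instruction store in bytes."""
    return INSTRUCTION_COUNT * INSTRUCTION_BITS // 8


def next_load_address(addr: int, region_start: int, region_end: int) -> int:
    """Advance a load pointer, wrapping inside [region_start, region_end).

    The region bounds are plain parameters here; they stand in for the
    wrap-around points programmed through the base registers.
    """
    addr += 1
    if addr >= region_end:
        addr = region_start
    return addr
```

Under these assumptions the store holds 49152 bytes, and a loader working in the upper half of the memory wraps from address 4095 back to 2048 rather than to 0.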
`
`For the Real time commands the story is quite the same but for some small differences. There are no wrap-around
`points for real time so the driver must be careful not to overwrite regular shader data. The shared code (shared
`subroutines) uses the same path as real time.
`
[Figure: the CP's views of the instruction memory, with vertex shader base and pixel shader base regions holding VS
code, PS code and shared code. The diagram is not recoverable from the scanned page.]

4. Sequencer Instructions
All control flow instructions and move instructions are handled by the sequencer only. The ALUs will perform NOPs
during this time (MOV PV,PV, PS,PS) if they have nothing else to do.

5. Constant Stores
`
`5.1 Memory organizations
A likely size for the ALU constant store is 1024x128 bits. The read BW from the ALU constant store is 128 bits/clock
and the write bandwidth is 32 bits/clock (directed by the CP bus size not by memory ports).

The maximum logical size of the constant store for a given shader is 256 constants, or 512 for the pixel/vertex shader
pair. The size of the re-mapping table is 128 lines (each line addresses 4 constants). The write granularity is 4
constants or 512 bits. It takes 16 clocks to write the four constants. Real time requires 256 lines in the physical
memory (this is physically register mapped).
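
The figures in this paragraph are consistent with each other, which the following short calculation checks. The constant names are ours; the values are the ones quoted above.

```python
CONSTANT_BITS = 128        # one ALU constant (store is 1024 x 128 bits)
WRITE_BITS_PER_CLOCK = 32  # write bandwidth, set by the CP bus size
CONSTANTS_PER_LINE = 4     # each re-mapping table line addresses 4 constants
REMAP_LINES = 128          # lines in the re-mapping table

# Write granularity: 4 constants of 128 bits each.
write_granularity_bits = CONSTANTS_PER_LINE * CONSTANT_BITS

# Clocks needed to push one granule through the 32-bit write path.
clocks_per_write = write_granularity_bits // WRITE_BITS_PER_CLOCK

# Logical constants reachable through the re-mapping table
# (256 per shader, 512 for the pixel/vertex pair).
logical_constants = REMAP_LINES * CONSTANTS_PER_LINE
```

This gives a 512-bit write granule, 16 clocks per write, and 512 logical constants for the shader pair, matching the text.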
`
The texture state is also kept in a similar memory. The size of this memory is 128x192 bits. The memory thus holds
128 texture states (192 bits per state). The logical size exposes 32 different states total, which are going to be shared
between the pixel and the vertex shader. The size of the re-mapping table for the texture state memory is 32 lines
(each line addresses 1 texture state line in the real memory). The CP write granularity is 1 texture state line (or 192
bits). The driver sends 512 bits but the CP ignores the top 320 bits. It thus takes 6 clocks to write the texture state.
Real time requires 32 lines in the physical memory (this is physically register mapped).
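
The same kind of check works for the texture state memory, together with a minimal sketch of a lookup through its re-mapping table. The 32 bits/clock figure is the CP write bandwidth quoted for the ALU constant store and is assumed to apply here as well; the table contents and the `physical_line` helper are hypothetical.

```python
TEXTURE_STATE_BITS = 192   # bits per texture state line
PHYSICAL_LINES = 128       # lines in the texture state memory
LOGICAL_STATES = 32        # states exposed to the shaders
DRIVER_WRITE_BITS = 512    # what the driver sends per state
CP_BUS_BITS = 32           # assumed CP write bandwidth per clock

ignored_bits = DRIVER_WRITE_BITS - TEXTURE_STATE_BITS
clocks_per_state_write = TEXTURE_STATE_BITS // CP_BUS_BITS


def physical_line(remap_table, logical_state):
    """Translate a logical texture state index through the re-mapping table."""
    if not 0 <= logical_state < LOGICAL_STATES:
        raise IndexError("logical texture state out of range")
    line = remap_table[logical_state]
    if not 0 <= line < PHYSICAL_LINES:
        raise ValueError("re-mapping table entry outside physical memory")
    return line
```

The arithmetic reproduces the 320 ignored bits and the 6-clock write time; the lookup shows how 32 logical states can point anywhere in the 128 physical lines.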
`
The control flow constant memory doesn't sit behind a renaming table. It is register mapped and thus the driver must
hold 8 copies of the 32 dwords of control flow constants and the loop construct constants must be aligned.

The constant re-mapping tables for texture state and ALU constants are logically register mapped for regular mode
and physically register mapped for RT operation.
`
5.2 Management of the Control Flow Constants
The control flow constants are register mapped, thus the CP writes to the corresponding register to set the constant;
the SQ decodes the address and writes to the block pointed to by its current base pointer (CF_WR_BASE). On the read
side, one level of indirection is used. A register (SQ_CONTEXT_MISC.CF_RD_BASE) keeps the current base pointer
to the control flow block. This register is copied whenever there is a state change. Should the CP write to CF after the
state change, the base register is updated with (current pointer number + 1) % number of states. This way, if the
CP doesn't write to CF the state is going to use the previous CF constants.
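
The base pointer update can be modelled as follows. This is a sketch only: the class and method names are hypothetical, the count of 8 states is taken from the 8 copies mentioned in section 5.1, and advancing the pointer only on the first CP write after a state change is our reading of the text rather than something it states outright.

```python
class CfBasePointer:
    """Toy model of the control flow constant base pointer."""

    def __init__(self, num_states: int = 8):
        self.num_states = num_states
        self.base = 0          # current base pointer
        self.advanced = False  # already advanced for the current state?

    def state_change(self) -> None:
        # The pointer is carried over unchanged, so a state that never
        # writes CF keeps using the previous constants.
        self.advanced = False

    def cp_write(self) -> int:
        # Assumption: only the first CP write after a state change moves
        # the pointer to (current + 1) % number of states.
        if not self.advanced:
            self.base = (self.base + 1) % self.num_states
            self.advanced = True
        return self.base
```

A state change followed by no CF write leaves `base` untouched, while a CF write after the change moves it to the next block and wraps after the last one.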
`
`
`
`
`
5.3 Management of the re-mapping tables

5.3.1 R400 Constant management
The sequencer is responsible for managing two re-mapping tables (one for the constant store and one for the texture
state). On a state change (by the driver), the sequencer will broadside copy the contents of its re-mapping tables to a
new one. We have 8 different re-mapping tables we can use concurrently.

The constant memory update will be incremental; the driver only needs to update the constants that actually changed
between the two state changes.
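
The broadside copy and the incremental update can be sketched as below. The table count and line count come from the text; the round-robin choice of the next table, the identity mapping used at start-up, and the method names are assumptions, and free-list management of the physical memory is not modelled.

```python
NUM_TABLES = 8     # re-mapping tables usable concurrently (from the text)
TABLE_LINES = 128  # lines in the constant re-mapping table (from 5.1)


class RemapTables:
    """Toy model of the constant store re-mapping tables."""

    def __init__(self) -> None:
        # Start with an identity mapping in every table (an assumption).
        self.tables = [list(range(TABLE_LINES)) for _ in range(NUM_TABLES)]
        self.current = 0

    def state_change(self) -> int:
        """Broadside-copy the current table into the next one."""
        nxt = (self.current + 1) % NUM_TABLES
        self.tables[nxt] = list(self.tables[self.current])
        self.current = nxt
        return nxt

    def update_line(self, logical_line: int, physical_line: int) -> None:
        """Incremental update: only the changed line is re-pointed."""
        self.tables[self.current][logical_line] = physical_line
```

After a state change, updating one line affects only the new table; the previous state's table still maps that line to its old location, and every untouched line is shared between the two.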
`
`For this model to work in its simplest for