R400 Sequencer Specification
SQ

Version 2.019

ORIGINATE DATE: 24 September, 2001
EDIT DATE: 4 September, 2015
DOCUMENT-REV. NUM.: GEN-CXXXXX-REVA
PAGE: 1 of 54

Author: Laurent Lefebvre
Issue To:
Copy No:

Overview: This is an architectural specification for the R400 Sequencer block (SEQ). It provides an overview of the required capabilities and expected uses of the block. It also describes the block interfaces, internal sub-blocks, and provides internal state diagrams.

AUTOMATICALLY UPDATED FIELDS:
Document Location: C:\perforce\r400\doc_lib\design\blocks\sq\R400_Sequencer.doc
Current Intranet Search Title: R400 Sequencer Specification

APPROVALS
Name/Dept:
Signature/Date:
Remarks:

THIS DOCUMENT CONTAINS CONFIDENTIAL INFORMATION THAT COULD BE SUBSTANTIALLY DETRIMENTAL TO THE INTEREST OF ATI TECHNOLOGIES INC. THROUGH UNAUTHORIZED USE OR DISCLOSURE.

"Copyright 2001, ATI Technologies Inc. All rights reserved. The material in this document constitutes an unpublished work created in 2001. The use of this copyright notice is intended to provide notice that ATI owns a copyright in this unpublished work. The copyright notice is not an admission that publication has occurred. This work contains confidential, proprietary information and trade secrets of ATI. No part of this document may be used, reproduced, or transmitted in any form or by any means without the prior written permission of ATI Technologies Inc."

Exhibit 2029.docR400_Sequencer.doc  73711 Bytes  *** © ATI Confidential. Reference Copyright Notice on Cover Page © ***

ATI 2029
LG v. ATI
IPR2015-00325

AMD1044_0257395

ATI Ex. 2108
IPR2023-00922
Page 1 of 316
`
`
`
`
`
Table Of Contents

1. OVERVIEW
    1.1 Top Level Block Diagram
    1.2 Data Flow graph (SP)
    1.3 Control Graph
2. INTERPOLATED DATA BUS
3. INSTRUCTION STORE
4. SEQUENCER INSTRUCTIONS
5. CONSTANT STORES
    5.1 Memory organizations
    5.2 Management of the Control Flow Constants
    5.3 Management of the re-mapping tables
        5.3.1 R400 Constant management
        5.3.2 Proposal for R400LE constant management
        5.3.3 Dirty bits
        5.3.4 Free List Block
        5.3.5 De-allocate Block
        5.3.6 Operation of Incremental model
    5.4 Constant Store Indexing
    5.5 Real Time Commands
    5.6 Constant Waterfalling
6. LOOPING AND BRANCHES
    6.1 The controlling state
    6.2 The Control Flow Program
        6.2.1 Control flow instructions table
    6.3 Implementation
    6.4 Data dependant predicate instructions
    6.5 HW Detection of PV,PS
    6.6 Register file indexing
        Method 1: Debugging registers
        Method 2: Exporting the values in the GPRs
7. PIXEL KILL MASK
8. MULTIPASS VERTEX SHADERS (HOS)
9. REGISTER FILE ALLOCATION
10. FETCH ARBITRATION
11. ALU ARBITRATION
12. HANDLING STALLS
13. CONTENT OF THE RESERVATION STATION FIFOS
14. THE OUTPUT FILE
15. IJ FORMAT
`
`
`
`
`
17. THE PARAMETER CACHE
    17.1 Export restrictions
    17.2 Arbitration restrictions
18. EXPORT TYPES
    18.1 Vertex Shading
    18.2 Pixel Shading
19. SPECIAL INTERPOLATION MODES
    19.1 Real time commands
    19.2 Sprites/ XY screen coordinates/ FB information
    19.3 Auto generated counters
        19.3.1 Vertex shaders
        19.3.2 Pixel shaders
20. STATE MANAGEMENT
    20.1 Parameter cache synchronization
21. XY ADDRESS IMPORTS
    21.1 Vertex indexes imports
22. REGISTERS
    22.1 Control
    22.2 Context
23. DEBUG REGISTERS
    23.1 Context
    23.2 Control
24. INTERFACES
    24.1 External Interfaces
    24.2 SC to SP Interfaces
        24.2.1 SC_SP#
        24.2.2 SC...
        24.2.3 SQ to SX: Interpolator bus
        24.2.4 SQ to SP: Staging Register Data
        24.2.5 VGT to SQ: Vertex interface
        24.2.6 SQ to SX: Control bus
        24.2.7 SX to SQ: Output file control
        24.2.8 SQ to TP: Control bus
        24.2.9 TP to SQ: Texture stall
        24.2.10 SQ to SP: Texture stall
        24.2.11 SQ to SP: GPR and auto counter
        24.2.12 SQ to SPx: Instructions
        24.2.13 SP to SQ: Constant address load/ Predicate Set
        24.2.14 SQ to SPx: constant broadcast
`
`
`
        24.2.15 SP0 to SQ: Kill vector load
        24.2.16 SQ to CP: RBBM bus
        24.2.17 CP to SQ: RBBM bus
        24.2.18 SQ to CP: State report
`
`
`
`
`
`
Revision Changes:

Rev 0.1 (Laurent Lefebvre), Date: May 7, 2001
    First draft.

Rev 0.2 (Laurent Lefebvre), Date: July 9, 2001
    Changed the interfaces to reflect the changes in the SP. Added some details in the arbitration section.

Rev 0.3 (Laurent Lefebvre), Date: August 6, 2001
    Reviewed the Sequencer spec after the meeting on August 3, 2001.

Rev 0.4 (Laurent Lefebvre), Date: August 24, 2001
    Added the dynamic allocation method for register file and an example (written in part by Vic) of the flow of pixels/vertices in the sequencer.

Rev 0.5 (Laurent Lefebvre), Date: September 7, 2001
    Added timing diagrams (Vic).

Rev 0.6 (Laurent Lefebvre), Date: September 24, 2001
    Changed the spec to reflect the new R400 architecture. Added interfaces.

Rev 0.7 (Laurent Lefebvre), Date: October 5, 2001
    Added constant store management, instruction store management, control flow management and data dependant predication.

Rev 0.8 (Laurent Lefebvre), Date: October 8, 2001
    Changed the control flow method to be more flexible. Also updated the external interfaces.

Rev 0.9 (Laurent Lefebvre), Date: October 17, 2001
    Incorporated changes made in the 10/18/01 control flow meeting. Added a NOP instruction, removed the conditional_execute_or_jump. Added debug registers.

Rev 1.0 (Laurent Lefebvre), Date: October 19, 2001
    Refined interfaces to RB. Added state registers.

Rev 1.1 (Laurent Lefebvre), Date: October 26, 2001
    Added SEQ->SP0 interfaces. Changed delta precision. Changed VGT->SP0 interface. Debug Methods added.

Rev 1.2 (Laurent Lefebvre), Date: November 16, 2001
    Interfaces greatly refined. Cleaned up the spec.

Rev 1.3 (Laurent Lefebvre), Date: November 26, 2001
    Added the different interpolation modes.

Rev 1.4 (Laurent Lefebvre), Date: December 6, 2001
    Added the auto incrementing counters. Changed the VGT->SQ interface. Added content on constant management. Updated GPRs.

Rev 1.5 (Laurent Lefebvre), Date: December 11, 2001
    Removed from the spec all interfaces that weren't directly tied to the SQ. Added explanations on constant management. Added PA->SQ synchronization fields and explanation.

Rev 1.6 (Laurent Lefebvre), Date: January 7, 2002
    Added more details on the staging register. Added detail about the parameter caches. Changed the call instruction to a Conditionnal_call instruction. Added details on constant management and updated the diagram.

Rev 1.7 (Laurent Lefebvre), Date: February 4, 2002
    Added Real Time parameter control in the SX interface. Updated the control flow section.

Rev 1.8 (Laurent Lefebvre), Date: March 4, 2002
    New interfaces to the SX block. Added the end of clause modifier, removed the end of clause instructions.

Rev 1.9 (Laurent Lefebvre), Date: March 18, 2002
    Rearrangement of the CF instruction bits in order to ensure byte alignment.

Rev 1.10 (Laurent Lefebvre), Date: March 25, 2002
    Updated the interfaces and added a section on exporting rules.

Rev 1.11 (Laurent Lefebvre), Date: April 19, 2002
    Added CP state report interface. Last version of the spec with the old control flow scheme.

Rev 2.0 (Laurent Lefebvre), Date: April 19, 2002
    New control flow scheme.

Rev 2.01 (Laurent Lefebvre), Date: May 2, 2002
    Changed slightly the control flow instructions to allow force jumps and calls.
`
`
`
`
`
`
`
`1. Overview
The sequencer chooses two ALU threads and a fetch thread to execute, and executes all of the instructions in a block before looking for a new clause of the same type. Two ALU threads are executed interleaved to hide the ALU latency. The arbitrator gives priority to older threads. There are two separate reservation stations, one for pixel vectors and one for vertex vectors. This way a pixel can pass a vertex and a vertex can pass a pixel.

To support the shader pipe, the sequencer also contains the shader instruction cache, constant store, control flow constants and texture state. The four shader pipes all execute the same instruction, so there is only one sequencer for the whole chip.

The sequencer first arbitrates between vectors of 64 vertices that arrive directly from primitive assembly and vectors of 16 quads (64 pixels) that are generated in the scan converter.

The vertex or pixel program specifies how many GPRs it needs to execute. The sequencer will not start the next vector until the needed space is available in the GPRs.
`
[Figure 1: General Sequencer Overview]
`
`1.1 Top Level Block Diagram

[Diagram: Input Arbiter, PIX RS, VTX RS, Exec Arbiter, ALU, Texture]
Figure 2: Reservation stations and arbiters
`Under this new scheme, the sequencer (SQ) will only use one global state management machine per vector type
`(pixel, vertex) that we call the reservation station (RS).
`
`1.2 Data Flow graph (SP)
`
[Diagram: shader pipe data flow, showing the register files, the instruction inputs to the pipeline stages, the scalar unit with its scalar input/output pipeline, the texture address, request and reply paths, and the output to the Primitive Assembly Unit or Render Backend]
Figure 3: The shader Pipe
`
`The gray area represents blocks that are replicated 4 times per shader pipe (16 times on the overall chip).
`
`1.3 Control Graph
`
[Diagram: control interfaces between the SEQ, CST, FETCH, SP0 and OF blocks; signals shown include WrAddr, RdAddr, CMD, Clause # + Rdy, Phase, WrVec and WrScal]
Figure 4: Sequencer Control interfaces

The Fetch control interface is shown in green, the ALU control interface in red, the Interpolated/Vector control interface in blue, and the output file control interface in purple.
`
`2. Interpolated data bus
The interpolators contain an IJ buffer to pack the information as much as possible before writing it to the register file.
`
[Diagram: interpolation buffers feeding the INTERPOLATORS and the FIX-FLOAT + EXPANSION stage, with an output to the RB. XY buffer (ping-pong buffer): 24 bits * 16 quads * 2 = 768 bits. IJ buffer (double-buffered): 4096 bits, organized as 32 x 128.]
Figure 5: Interpolation buffers
`
[Figure 6: Interpolation timing diagram]
Above is an example of a tile the sequencer might receive from the SC. The write side shows how the data gets stacked into the XY and IJ buffers; the read side shows how the data is passed to the GPRs. The IJ information is packed in the IJ buffer 4 quads at a time, or two clocks. At any given time the sequencer allows as many as four quads to interpolate a parameter. They all have to come from the same primitive. The sequencer then controls the write mask to the GPRs to write the valid data in.
`
3. Instruction Store
There is going to be only one instruction store for the whole chip. It will contain 4096 instructions of 96 bits each.

It is likely to be a 1 port memory; we use 1 clock to load the ALU instruction, 1 clock to load the Fetch instruction, 1 clock to load 2 control flow instructions and 1 clock to write instructions.
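
The store size and the four-clock sharing of the single port can be checked with a short sketch. It is illustrative only: the text lists the four per-clock uses of the port but not their order, so the slot order and the names `PORT_SCHEDULE` and `port_owner` are assumptions.

```python
# Sizes taken from the text: 4096 instructions of 96 bits each.
INSTRUCTION_COUNT = 4096
INSTRUCTION_BITS = 96

store_bits = INSTRUCTION_COUNT * INSTRUCTION_BITS
store_bytes = store_bits // 8

# One access per clock on the single port, repeating every four clocks.
# The order of the slots is an assumption made for illustration.
PORT_SCHEDULE = ("alu_read", "fetch_read", "control_flow_read_x2", "write")


def port_owner(clock: int) -> str:
    """Return which client owns the single memory port on a given clock."""
    return PORT_SCHEDULE[clock % len(PORT_SCHEDULE)]


if __name__ == "__main__":
    print(store_bits, "bits =", store_bytes, "bytes")
    for clock in range(8):
        print(clock, port_owner(clock))
```

Under this rotation each client sees the port once every four clocks, which is why two control flow instructions are loaded in a single slot.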
`
The instruction store is loaded by the CP through the register-mapped registers.
`
`The VS_BASE and PS_BASE context registers are used to specify for each context where its shader is in the
`instruction memory.
`
Real time commands work in much the same way, with a few small differences. There are no wrap-around points for real time, so the driver must be careful not to overwrite regular shader data. The shared code (shared subroutines) uses the same path as real time.
`
4. Sequencer Instructions
`All control flow instructions and move instructions are handled by the sequencer only. The ALUs will perform NOPs
`during this time (MOV PV,PV, PS,PS) if they have nothing else to do.
`
`5. Constant Stores
`
`5.1 Memory organizations
A likely size for the ALU constant store is 1024x128 bits. The read bandwidth from the ALU constant store is 128 bits/clock and the write bandwidth is 32 bits/clock (dictated by the CP bus size, not by the memory ports).

The maximum logical size of the constant store for a given shader is 256 constants, or 512 for the pixel/vertex shader pair. The size of the re-mapping table is 128 lines (each line addresses 4 constants). The write granularity is 4 constants, or 512 bits. It takes 16 clocks to write the four constants. Real time requires 256 lines in the physical memory (this is physically register mapped).
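
The figures quoted for the ALU constant store follow from one another, as the short check below shows. The constant names are illustrative and do not come from the specification; the numeric values do.

```python
# Values from the text.
CONSTANT_BITS = 128           # width of one ALU constant
WRITE_BITS_PER_CLOCK = 32     # write bandwidth, set by the CP bus size
CONSTANTS_PER_LINE = 4        # each re-mapping table line addresses 4 constants
REMAP_LINES = 128             # number of lines in the re-mapping table

# Derived quantities.
write_granule_bits = CONSTANTS_PER_LINE * CONSTANT_BITS
clocks_per_granule = write_granule_bits // WRITE_BITS_PER_CLOCK
logical_constants = REMAP_LINES * CONSTANTS_PER_LINE

if __name__ == "__main__":
    print("write granularity:", write_granule_bits, "bits")
    print("clocks to write four constants:", clocks_per_granule)
    print("logical constants covered by the table:", logical_constants)
```

The 128-line table therefore covers exactly the 512 logical constants of a pixel/vertex shader pair.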
`
The texture state is also kept in a similar memory. The size of this memory is 320x96 bits (128 texture states for regular mode, 32 states for RT). The memory thus holds 128 texture states (192 bits per state) for regular mode. The logical size exposes 32 different states total, which are going to be shared between the pixel and the vertex shader. The size of the re-mapping table for the texture state memory is 32 lines (each line addresses 1 texture state line in the real memory). The CP write granularity is 1 texture state line (or 192 bits). The driver sends 512 bits but the CP ignores the top 320 bits. It thus takes 6 clocks to write the texture state. Real time requires 32 lines in the physical memory (this is physically register mapped).
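
The texture state numbers can be cross-checked the same way. This sketch assumes the texture state is written over the same 32 bits/clock CP path quoted for the ALU constant store; that assumption is what makes the stated 6 clocks come out, but the text does not say so explicitly. The names are illustrative.

```python
# Values from the text.
MEMORY_LINES = 320
MEMORY_LINE_BITS = 96
TEXTURE_STATE_BITS = 192
REGULAR_STATES = 128
REAL_TIME_STATES = 32
DRIVER_PACKET_BITS = 512

# Assumption: same 32 bits/clock write path as the ALU constant store.
WRITE_BITS_PER_CLOCK = 32

total_states = (MEMORY_LINES * MEMORY_LINE_BITS) // TEXTURE_STATE_BITS
ignored_bits = DRIVER_PACKET_BITS - TEXTURE_STATE_BITS
clocks_per_state = TEXTURE_STATE_BITS // WRITE_BITS_PER_CLOCK

if __name__ == "__main__":
    print("states held:", total_states)
    print("bits ignored by the CP:", ignored_bits)
    print("clocks per texture state write:", clocks_per_state)
```

The 320x96-bit memory holds 160 states of 192 bits, which matches 128 regular states plus 32 real time states.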
`
The control flow constant memory doesn't sit behind a renaming table. It is register mapped, and thus the driver must reload its content each time there is a change in the control flow constants. Its size is 320*32 because it must hold 8 copies of the 32 dwords of control flow constants and the loop construct constants must be aligned.
`
`The constant re-mapping tables for texture state and ALU constants are logically register mapped for regular mode
`and physically register mapped for RT operation.
`
5.2 Management of the Control Flow Constants
The control flow constants are register mapped: the CP writes to the corresponding register to set the constant, and the SQ decodes the address and writes to the block pointed to by its current base pointer (CF_WR_BASE). On the read side, one level of indirection is used. A register (SQ_CONTEXT_MISC.CF_RD_BASE) keeps the current base pointer to the control flow block. This register is copied whenever there is a state change. Should the CP write to CF after the state change, the base register is updated with (current pointer number + 1) % number of states. This way, if the CP doesn't write to CF, the state is going to use the previous CF constants.
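
The pointer handling described above can be modelled in a few lines. This is a sketch under stated assumptions, not the hardware design: it assumes 8 states (matching the 8 copies of 32 dwords mentioned earlier), and the class, method and attribute names are hypothetical. Only CF_WR_BASE, CF_RD_BASE and the modulo update come from the text.

```python
NUM_BLOCKS = 8          # assumed: one block per state, 8 copies of the constants
DWORDS_PER_BLOCK = 32   # 32 dwords of control flow constants per copy


class ControlFlowConstants:
    """Toy model of CF_WR_BASE and the per-state CF_RD_BASE pointers."""

    def __init__(self) -> None:
        self.blocks = [[0] * DWORDS_PER_BLOCK for _ in range(NUM_BLOCKS)]
        self.wr_base = 0                  # models CF_WR_BASE
        self.rd_base = [0] * NUM_BLOCKS   # models CF_RD_BASE, one per state
        self.state = 0                    # current state number
        self.base_advanced = True         # no state change is pending yet

    def state_change(self) -> None:
        """Start a new state that inherits the previous read base pointer."""
        previous = self.state
        self.state = (self.state + 1) % NUM_BLOCKS
        self.rd_base[self.state] = self.rd_base[previous]
        self.base_advanced = False

    def cp_write(self, index: int, value: int) -> None:
        """Model a CP write to a control flow constant register."""
        if not self.base_advanced:
            # First CF write after a state change: advance to the next block.
            self.wr_base = (self.wr_base + 1) % NUM_BLOCKS
            self.rd_base[self.state] = self.wr_base
            self.base_advanced = True
        self.blocks[self.wr_base][index] = value

    def read(self, state: int, index: int) -> int:
        """Read a constant through the state's base pointer."""
        return self.blocks[self.rd_base[state]][index]
```

In this model a new state keeps reading the previous block until the CP writes a control flow constant. Nothing is copied into the new block, which is consistent with the requirement that the driver reload the full content whenever the control flow constants change.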
`
`
5.3 Management of the re-mapping tables
`
`5.3.1 R400 Constant management
The sequencer is responsible for managing two re-mapping tables (one for the constant store and one for the texture state). On a state change (by the driver), the sequencer will broadside copy the contents of its re-mapping tables to a new one. We have 8 different re-mapping tables we can use concurrently.

The constant memory update will be incremental: the driver only needs to update the constants that actually changed between the two state changes.
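
A minimal sketch of the broadside copy followed by incremental updates is given below. The class and method names are hypothetical, and picking the next table in round-robin order is an assumption; the text only says that the contents are copied to a new table and that 8 tables are available.

```python
NUM_TABLES = 8      # 8 re-mapping tables usable concurrently
REMAP_LINES = 128   # lines per ALU constant re-mapping table


class RemapTables:
    """Toy model of the re-mapping tables across state changes."""

    def __init__(self) -> None:
        # Each entry maps a logical line to a physical line (None = unmapped).
        self.tables = [[None] * REMAP_LINES for _ in range(NUM_TABLES)]
        self.current = 0

    def state_change(self) -> None:
        """Broadside copy the current table into the next one (assumed order)."""
        nxt = (self.current + 1) % NUM_TABLES
        self.tables[nxt] = list(self.tables[self.current])
        self.current = nxt

    def update(self, logical_line: int, physical_line: int) -> None:
        """Incremental update: only the changed line is rewritten."""
        self.tables[self.current][logical_line] = physical_line

    def lookup(self, table: int, logical_line: int):
        """Translate a logical line through the given table."""
        return self.tables[table][logical_line]
```

Because the new table starts as a copy, lines the driver does not touch keep pointing at the same physical memory as in the previous state, while older states still see their own mapping.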
`
For this model to work in its simplest form, the requirement is that the physical memory MUST be at least twice as large as the logical address space plus the space allocated for Real Time. In our case, since the logical address space is 512 and the reserved RT space can be up to 256 entries, the memory must be of size 1280 or above. Similarly, the size of the texture store must be 32*2+32 = 96 entries or above.
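
The sizing rule is simple enough to state as a one-line function; the function name is illustrative and the two calls reproduce the numbers given in the text.

```python
def min_physical_entries(logical_entries: int, real_time_entries: int) -> int:
    """Smallest memory that satisfies the simplest form of the model:
    twice the logical address space plus the Real Time reservation."""
    return 2 * logical_entries + real_time_entries


if __name__ == "__main__":
    print("constant store:", min_physical_entries(512, 256))
    print("texture store:", min_physical_entries(32, 32))
```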
`
`5.3.2 Proposal for R400LE constant management
To make this scheme work with only 512+256 = 768 entries, upon reception of a CONTROL packet of state + 1, the sequencer would check for SQ_IDLE and PA_IDLE and, if both are idle, will erase the content of state to replace it with the new state (this is depicted in Figure 8: De-allocation mechanism for R400LE). Note that when a state is cleared, a value of 0 is written to the corresponding de-allocation counter location so that when the SQ is going to report a state change, nothing will be de-allocated upon the first report.

The second path sets all context dirty bits that were used in the current state to 1 (thus allowing the new state to reuse these physical addresses if needed).
`
[Diagram: constant management data path. Blocks shown: Global Register Data Bus, Staging Data Buffer, Staging Write Addr, Physical Memory, Free List, Context Dirty bits, Renaming Table for 1 Context (Current/Last physical address per logical address), and Renaming Tables for Context 0 => N (8 rows of 16 8-bit physical addresses => 128 entries, copied in eight clocks). Signals shown: constants location available, next physical address ready for allocate, physical address to schedule for dealloc, logical address on the GlbRegBus, and the Seq constant request (context and logical address in, physical address out). Note in the diagram: copy Last held above to Current Context on receipt of Set Constant for a new context (hide loading behind Set State load - 16 clocks); all other Set States just write one entry to current state.]
Figure 7: Constant management
`
[Diagram: de-allocation counters addressed by SQ_STATE#, with ADDR, DEALLOC and WRITE_ENABLE inputs, a CNT VALUE output to the Free List, PREVIOUS STATE / NEW STATE selection, a VALID flag, and SET CTX BITS driven by CP_NEW_STATE_CNTL when SQ_IDLE AND PA_IDLE]
Figure 8: De-allocation mechanism for R400LE.
`
`5.3.3 Dirty bits
Two sets of dirty bits will be maintained per logical address. The first one (the reset dirty bit) will be set to zero on reset and set when the logical address is addressed. The second one (the context dirty bit) will be set to zero whenever a new context is written and set for each address written while in this context. If the reset dirty bit is not set, then writing to that logical address will not require de-allocation of whatever address is stored in the renaming table. If it is set and the context dirty bit is not set, then the physical address stored needs to be de-allocated and a new physical address is necessary to store the incoming data. If they are both set, then the data will be written into the physical address held in the renaming table for the current logical address. No de-allocation or allocation takes place. This will happen when the driver does a set constant twice to the same logical address between context changes. NOTE: It is important to detect and prevent this; failure to do so will allow multiple writes to allocate all physical memory and thus hang, because a context will not fit for rendering to start and thus free up space.
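
The three cases can be written as a small decision function. This is a sketch, not the hardware logic: the enum and class names are hypothetical, and treating the "reset dirty not set" case as an allocation without de-allocation is an inference from the text, which only states that no de-allocation is required there.

```python
from enum import Enum


class WriteAction(Enum):
    ALLOCATE_ONLY = "allocate, nothing to de-allocate"
    DEALLOCATE_AND_ALLOCATE = "de-allocate old address, allocate new one"
    WRITE_IN_PLACE = "reuse the address already in the renaming table"


def classify_write(reset_dirty: bool, context_dirty: bool) -> WriteAction:
    """Decide what a constant write needs, given the two dirty bits."""
    if not reset_dirty:
        # Address never written since reset: no old mapping to release.
        return WriteAction.ALLOCATE_ONLY
    if not context_dirty:
        # First write in this context: old mapping belongs to an older context.
        return WriteAction.DEALLOCATE_AND_ALLOCATE
    # Second write to the same address within the same context.
    return WriteAction.WRITE_IN_PLACE


class DirtyBits:
    """Tracks both sets of dirty bits for a range of logical addresses."""

    def __init__(self, num_addresses: int) -> None:
        self.reset_dirty = [False] * num_addresses
        self.context_dirty = [False] * num_addresses

    def new_context(self) -> None:
        """A new context clears only the context dirty bits."""
        self.context_dirty = [False] * len(self.context_dirty)

    def write(self, logical_address: int) -> WriteAction:
        action = classify_write(self.reset_dirty[logical_address],
                                self.context_dirty[logical_address])
        self.reset_dirty[logical_address] = True
        self.context_dirty[logical_address] = True
        return action
```

Writing the same address twice in one context returns `WRITE_IN_PLACE` the second time, which is the case the NOTE asks the hardware to detect so that repeated writes do not consume physical memory.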
`
`5.3.4 Free List Block
The free list block would consist of a counter (called the IFC, or Initial Free Counter) that resets to zero and is incremented every time a chunk of physical memory is used, until all chunks have been used once. This counter would be checked each time a physical block is needed: if the original chunks have not all been used up, a new one is used; otherwise the free list is checked for an available physical block address. When a chunk is taken from the counter, the count itself is the physical address.
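
The allocation order (counter first, free list second) can be sketched as follows. The class and method names are hypothetical, and a `deque` stands in for the pointer-managed ring that the specification goes on to describe, so this shows only the allocation policy, not the pointer mechanics.

```python
from collections import deque


class FreeListBlock:
    """Toy allocator: Initial Free Counter first, then the free list."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self.ifc = 0          # Initial Free Counter, reset to zero
        self.free = deque()   # stand-in for the ring of freed addresses

    def allocate(self):
        """Return a physical block address, or None if none is available."""
        if self.ifc < self.num_blocks:
            address = self.ifc   # the count itself is the physical address
            self.ifc += 1
            return address
        if self.free:
            return self.free.popleft()
        return None              # the caller would have to wait

    def release(self, address: int) -> None:
        """Record a de-allocated block so that it can be reused."""
        if len(self.free) >= self.num_blocks:
            # We can never free more locations than physically exist.
            raise ValueError("free list overflow")
        self.free.append(address)
```

Until every block has been handed out once, the free list is never consulted; after that, allocation depends entirely on blocks being released.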
Storage of a free list big enough to store all physical block addresses.
Maintain three pointers for the free list that are reset to zero. The first one we will call write_ptr. This pointer will identify the next location to write the physical address of a block to be de-allocated. Note: we can never free more physical memory locations than we have. Once the address is recorded, the pointer will be incremented to walk the free list like a ring.
The second pointer will be called stop_ptr. The stop_ptr pointer will be advanced by the number of address chunks
`de-allocates when a context fini