`
`
`
`R400 Sequencer Specification
`
`SQ
`
`Version 4442.0
`
` AUTOMATICALLY UPDATED FIELDS:
`
Overview: This is an architectural specification for the R400 Sequencer block (SEQ). It provides an overview of the
required capabilities and expected uses of the block. It also describes the block interfaces, internal sub-blocks,
and provides internal state diagrams.
`
`
`ATT 2028
`
`LGv. ATI
`IPR2015-00325
`
`AMD1044_0017308
`
`ATI Ex. 2011
`IPR2023-00922
`Page 1 of 58
`
Document Location: perforcer400\doc_libwesigniblocksiegiR400Sequencer.dec
Current Intranet Search Title: R400 Sequencer Specification

APPROVALS
Name/Dept    Signature/Date
`
`
`
` Remarks:
`
THIS DOCUMENT CONTAINS INFORMATION THAT COULD BE SUBSTANTIALLY DETRIMENTAL TO THE INTEREST OF ATI TECHNOLOGIES
INC. THROUGH UNAUTHORIZED USE OR DISCLOSURE.
`
`
`
“Copyright 2001, ATI Technologies Inc. All rights reserved. The material in this document constitutes an unpublished
work created in 2001. The use of this copyright notice is intended to provide notice that ATI owns a copyright in this
unpublished work. The copyright notice is not an admission that publication has occurred. This work contains
proprietary information and trade secrets of ATI. No part of this document may be used, reproduced, or
transmitted in any form or by any means without the prior written permission of ATI Technologies Inc.”
`
`Exhibit 2028.docR400-Sequencerdes
`
79201 Bytes *** © ATI Reference Copyright Notice on Cover Page © ***
`
ORIGINATE DATE: 24 September, 2001
EDIT DATE: 4 September, 2015
DOCUMENT-REV. NUM.: GEN-CXXXXX-REVA
PAGE: 1 of 58
Author: Laurent Lefebvre

Issue To:
Copy No:
`
`
`
`
`
`
`
`
`
`
`
`
Table Of Contents

1. OVERVIEW
  1.1 Top Level Block Diagram
  1.2 Data Flow graph (SP)
  1.3 Control Graph
2. INTERPOLATED DATA BUS
3. INSTRUCTION STORE
4. SEQUENCER INSTRUCTIONS
5. CONSTANT STORES
  5.1 Memory organizations
  5.2 Management of the Control Flow Constants
  5.3 Management of the re-mapping tables
    5.3.1 R400 Constant management
    5.3.2 Proposal for R400LE constant management
    5.3.3 Dirty bits
    5.3.4 Free List Block
    5.3.5 De-allocate Block
    5.3.6 Operation of Incremental model
  5.4 Constant Store Indexing
  5.5 Real Time Commands
  5.6 Constant Waterfalling
6. LOOPING AND BRANCHES
  6.1 The controlling state
  6.2 The Control Flow Program
  6.3 Data dependant predicate instructions
  6.4 HW Detection of PV,PS
  6.5 Register file indexing
  6.6 Predicated Instruction support for Texture clauses
  6.7 Debugging the Shaders
    6.7.1 Method 1: Debugging registers
    6.7.2 Method 2: Exporting the values in the GPRs (12)
7. PIXEL KILL MASK
8. MULTIPASS VERTEX SHADERS (HOS)
9. REGISTER FILE ALLOCATION
10. FETCH ARBITRATION
11. ALU ARBITRATION
12. HANDLING STALLS
13. CONTENT OF THE RESERVATION STATION FIFOS
14. THE OUTPUT FILE
15. IJ FORMAT
  15.1 Interpolation of constant attributes
16. STAGING REGISTERS
17. THE PARAMETER CACHE
18. VERTEX POSITION EXPORTING
`
`
`
`
`
`
19. EXPORTING ARBITRATION
20. EXPORTING RULES
  20.1 Parameter caches exports
  20.2 Memory exports
  20.3 Position exports
21. EXPORT TYPES
  21.1 Vertex Shading
  21.2 Pixel Shading
22. SPECIAL INTERPOLATION MODES
  22.1 Real time commands
  22.2 Sprites/ XY screen coordinates/ FB information
  22.3 Auto generated counters
    22.3.1 Vertex shaders
    22.3.2 Pixel shaders
23. STATE MANAGEMENT
  23.1 Parameter cache synchronization
24. XY ADDRESS IMPORTS
  24.1 Vertex indexes imports
25. REGISTERS
  25.1 Control
  25.2 Context
26. DEBUG REGISTERS
  26.1 Context
  26.2 Control
27. INTERFACES
  27.1 External Interfaces
  27.2 SC to SP Interfaces
    27.2.1 SC_SP#
    27.2.2 SC_SQ
    27.2.3 SQ to SX: Interpolator bus
    27.2.4 SQ to SP: Staging Register Data
    27.2.5 VGT to SQ: Vertex interface
    27.2.6 SQ to SX: Control bus
    27.2.7 SX to SQ: Output file control
    27.2.8 SQ to TP: Control bus
    27.2.9 TP to SQ: Texture stall
    27.2.10 SQ to SP: Texture stall
    27.2.11 SQ to SP: GPR and auto counter
    27.2.12 SQ to SPx: Instructions
    27.2.13 SP to SQ: Constant address load/ Predicate Set
    27.2.14 SQ to SPx: constant broadcast
    27.2.15 SP0 to SQ: Kill vector load
    27.2.16 SQ to CP: RBBM bus
`
`
`
`
`
`
`
`
    27.2.17 CP to SQ: RBBM bus
    27.2.18 SQ to CP: State report
28. OPEN ISSUES
`
`
`
`
`
`
Revision Changes:

Rev 0.1 (Laurent Lefebvre)
Date: May 7, 2001
First draft.

Rev 0.2 (Laurent Lefebvre)
Date: July 9, 2001
Changed the interfaces to reflect the changes in the SP. Added some details in the arbitration section.

Rev 0.3 (Laurent Lefebvre)
Date: August 6, 2001
Reviewed the Sequencer spec after the meeting on August 3, 2001.

Rev 0.4 (Laurent Lefebvre)
Date: August 24, 2001
Added the dynamic allocation method for register file and an example (written in part by Vic) of the flow of
pixels/vertices in the sequencer.

Rev 0.5 (Laurent Lefebvre)
Date: September 7, 2001
Added timing diagrams (Vic).

Rev 0.6 (Laurent Lefebvre)
Date: September 24, 2001
Changed the spec to reflect the new R400 architecture. Added interfaces.

Rev 0.7 (Laurent Lefebvre)
Date: October 5, 2001
Added constant store management, instruction store management, control flow management and data dependant
predication.

Rev 0.8 (Laurent Lefebvre)
Date: October 8, 2001
Changed the control flow method to be more flexible. Also updated the external interfaces.

Rev 0.9 (Laurent Lefebvre)
Date: October 17, 2001
Incorporated changes made in the 10/18/01 control flow meeting. Added a NOP instruction, removed the
conditional_execute_or_jump. Added debug registers.

Rev 1.0 (Laurent Lefebvre)
Date: October 19, 2001
Refined interfaces to RB. Added state registers.

Rev 1.1 (Laurent Lefebvre)
Date: October 26, 2001
Added SEQ→SP0 interfaces. Changed delta precision. Changed VGT→SP0 interface. Debug Methods added.

Rev 1.2 (Laurent Lefebvre)
Date: November 16, 2001
Interfaces greatly refined. Cleaned up the spec.

Rev 1.3 (Laurent Lefebvre)
Date: November 26, 2001
Added the different interpolation modes.

Rev 1.4 (Laurent Lefebvre)
Date: December 6, 2001
Added the auto incrementing counters. Changed the VGT→SQ interface. Added content on constant management.
Updated GPRs.

Rev 1.5 (Laurent Lefebvre)
Date: December 11, 2001
Removed from the spec all interfaces that weren't directly tied to the SQ. Added explanations on constant
management. Added PA/SQ synchronization fields and explanation.

Rev 1.6 (Laurent Lefebvre)
Date: January 7, 2002
Added more details on the staging register. Added detail about the parameter caches. Changed the call
instruction to a Conditionnal_call instruction. Added details on constant management and updated the diagram.

Rev 1.7 (Laurent Lefebvre)
Date: February 4, 2002
Added Real Time parameter control in the SX interface. Updated the control flow section.

Rev 1.8 (Laurent Lefebvre)
Date: March 4, 2002
New interfaces to the SX block. Added the end of clause modifier, removed the end of clause instructions.

Rev 1.9 (Laurent Lefebvre)
Date: March 18, 2002
Rearrangement of the CF instruction bits in order to ensure byte alignment.

Rev 1.10 (Laurent Lefebvre)
Date: March 25, 2002
Updated the interfaces and added a section on exporting rules.

Rev 1.11 (Laurent Lefebvre)
Date: April 19, 2002
Added CP state report interface. Last version of the spec with the old control flow scheme.

Rev 2.0 (Laurent Lefebvre)
Date: April 19, 2002
New control flow scheme.
`
`
`
`
`
`
`1. Overview
The sequencer chooses two ALU threads and a fetch thread to execute, and executes all of the instructions in a
block before looking for a new clause of the same type. Two ALU threads are executed interleaved to hide the ALU
latency. The arbitrator will give priority to older threads. There are two separate reservation stations, one for
pixel vectors and one for vertex vectors. This way a pixel can pass a vertex and a vertex can pass a pixel.

To support the shader pipe the sequencer also contains the shader instruction cache, constant store, control flow
constants and texture state. The four shader pipes also execute the same instruction thus there is only one
sequencer for the whole chip.

The sequencer first arbitrates between vectors of 64 vertices that arrive directly from primitive assembly and vectors
of 16 quads (64 pixels) that are generated in the scan converter.

The vertex or pixel program specifies how many GPRs it needs to execute. The sequencer will not start the next
vector until the needed space is available in the GPRs.
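
The GPR admission rule above can be illustrated with a small model. This is only a sketch under stated assumptions: the class name GprGate, the total register count passed in, and the strictly in-order admission policy are hypothetical and are not taken from this specification.

```python
from collections import deque


class GprGate:
    """Hypothetical model: a vector starts only when its GPR request fits."""

    def __init__(self, total_gprs):
        self.free = total_gprs      # assumed pool size, not from the spec
        self.pending = deque()      # waiting vectors, oldest first

    def submit(self, name, gprs_needed):
        """Queue a vertex or pixel vector together with its GPR requirement."""
        self.pending.append((name, gprs_needed))

    def try_start(self):
        """Start the oldest pending vector, or return None if it must wait."""
        if not self.pending:
            return None
        name, need = self.pending[0]
        if need > self.free:
            return None             # stall until enough GPR space is released
        self.pending.popleft()
        self.free -= need
        return name

    def retire(self, gprs_released):
        """Return the GPRs of a finished vector to the free pool."""
        self.free += gprs_released
```

In this model a second vector that does not fit simply waits at the head of the queue until an earlier vector retires and releases its registers.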
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
[Figure 1: General Sequencer Overview. Diagram not recoverable from extraction.]
`
`
1.1 Top Level Block Diagram
`
`
`
`
`
`
`
`
`
[Top level block diagram not recoverable from extraction. Legible labels: Input Arbiter, VTX RS, PIX RS, Texture.]
`
`
`
`
[Figure 2 diagram not recoverable from extraction. Legible labels: vertex/pixel vector arbitrator; possible delay for
available GPRs; FIFO and reservation station pairs for ALU clauses and texture clauses (numbered 0 to 7); texture
arbitrator.]

Figure 2: Reservation stations and arbiters
There are two sets of the above figure, one for vertices and one for pixels.
`
`
`
`
If the output buffer is full or doesn't have enough space, the sequencer will prevent such a vertex group from
entering an exporting clause.

Similarly, only one fetch state machine may have access to the register file address bus at one time. Arbitration is
performed by three arbiter blocks (two for the ALU state machines and one for the fetch state machine).

Under this new scheme, the sequencer (SQ) will only use one global state management machine per vector type
(pixel, vertex) that we call the reservation station (RS).
`
`
`1.2 Data Flow graph (SP)
`
`
`
[Figure 3 diagram not recoverable from extraction. Legible labels: Register File, instruction, texture address,
texture request, pipeline stage, to Primitive Assembly Unit or Render Backend.]

Figure 3: The shader Pipe
`
`The gray area represents blocks that are replicated 4 times per shader pipe (16 times on the overall chip).
`
`1.3 Control Graph
`
[Figure 4 diagram not recoverable from extraction. Legible labels: SEQ, SP, FETCH, CST, Clause # + Rdy, WrAddr, CMD,
Phase.]

Figure 4: Sequencer Control interfaces
`
In green is represented the Fetch control interface, in red the ALU control interface, in blue the Interpolated/Vector
control interface and in purple is the output file control interface.
`
`2. Interpolated data bus
`The interpolators contain an IJ buffer to pack the information as much as possible before writing it to the register file.
`
`
`
`
`
`
`
[Figure 5 diagram not recoverable from extraction. Legible content:
IJs CROSSBAR (4x64 bits)
IJs buffer (ping-pong buffer): (28 bits * 2 (IJ) + ... bits * 6 (delta IJs) + 4 bits * 6) * 16 (quads) * 2
(double-buffered) = 4096 bits, organized as 32 x 128
XYs buffer (ping-pong buffer): 24 bits * 16 quads * 2 = 768 bits, organized as 32 x 24
INTERPOLATORS
FIX-FLOAT + EXPANSION]

Figure 5: Interpolation buffers
`
`
`
`
[Figure 6: Interpolation timing diagram. Diagram not recoverable from extraction.]
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
Above is an example of a tile the sequencer might receive from the SC. The write side is how the data get stacked
into the XY and IJ buffers, the read side is how the data is passed to the GPRs. The IJ information is packed in the IJ
buffer 4 quads at a time or two clocks. The sequencer allows at any given time as many as four quads to interpolate a
parameter. They all have to come from the same primitive. Then the sequencer controls the write mask to the GPRs
to write the valid data in.
`
3. Instruction Store
There is going to be only one instruction store for the whole chip. It will contain 4096 instructions of 96 bits each.

It is likely to be a 1 port memory; we use 1 clock to load the ALU instruction, 1 clock to load the Fetch instruction, 1
clock to load 2 control flow instructions and 1 clock to write instructions.
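
As a rough illustration of the figures above, the following sketch computes the store capacity and models the four-clock sharing of the single port. The order of the four slots within the rotation and all names (PORT_SCHEDULE, port_user, store_size_bytes) are assumptions made for the example; the text only states that each of the four uses takes one clock.

```python
INSTRUCTIONS = 4096          # from the text above
BITS_PER_INSTRUCTION = 96    # from the text above

# Assumed slot order; the specification does not say in which order
# the four accesses share the single port.
PORT_SCHEDULE = ("ALU", "FETCH", "CONTROL_FLOW", "WRITE")


def store_size_bytes(count=INSTRUCTIONS, bits=BITS_PER_INSTRUCTION):
    """Total capacity of the instruction store in bytes."""
    return count * bits // 8


def port_user(clock):
    """Which client owns the single memory port on a given clock."""
    return PORT_SCHEDULE[clock % len(PORT_SCHEDULE)]
```

With these numbers the store holds 4096 * 96 / 8 = 49152 bytes, and each client gets the port once every four clocks.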
`
`The instruction store is loaded by the CP thru the register mapped registers.
`
`The VS_BASE and PS_BASE context registers are used to specify for each context where its shader is in the
`instruction memory.
`
For the Real time commands the story is quite the same but for some small differences. There are no wrap-around
points for real time so the driver must be careful not to overwrite regular shader data. The shared code (shared
subroutines) uses the same path as real time.
`
`4. Sequencer Instructions
All control flow instructions and move instructions are handled by the sequencer only. The ALUs will perform NOPs
during this time (MOV PV,PV, PS,PS) if they have nothing else to do.
`
`5. Constant Stores
`
`5.1 Memory organizations
`A likely size for the ALU constant store is 1024x128 bits. The read BW from the ALU constant store is 128 bits/clock
`and the write bandwidth is 32 bits/clock (directed by the CP bus size not by memory ports).
`
The maximum logical size of the constant store for a given shader is 256 constants, or 512 for the pixel/vertex shader
pair. The size of the re-mapping table is 128 lines (each line addresses 4 constants). The write granularity is 4
constants or 512 bits. It takes 16 clocks to write the four constants. Real time requires 256 lines in the physical
memory (this is physically register mapped).
`
The texture state is also kept in a similar memory. The size of this memory is 320x96 bits (128 texture states for
regular mode, 32 states for RT). The memory thus holds 128 texture states (192 bits per state). The logical size
exposes 32 different states total, which are going to be shared between the pixel and the vertex shader. The size of
the re-mapping table for the texture state memory is 32 lines (each line addresses 1 texture state line in the real
memory). The CP write granularity is 1 texture state line (or 192 bits). The driver sends 512 bits but the CP ignores
the top 320 bits. It thus takes 6 clocks to write the texture state. Real time requires 32 lines in the physical memory
(this is physically register mapped).
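
The write latencies quoted in this section follow from dividing the payload by the 32 bits/clock write bandwidth. A minimal sketch of that arithmetic, where the function name write_clocks is hypothetical and the same 32-bit write path is assumed for the texture state memory:

```python
CP_WRITE_BITS_PER_CLOCK = 32   # write bandwidth stated for the constant store


def write_clocks(payload_bits, bus_bits=CP_WRITE_BITS_PER_CLOCK):
    """Clocks needed to push payload_bits through the write path (rounded up)."""
    return -(-payload_bits // bus_bits)


alu_constant_clocks = write_clocks(4 * 128)   # four 128-bit constants
texture_state_clocks = write_clocks(192)      # one 192-bit texture state
```

This reproduces the 16 clocks for four ALU constants and the 6 clocks for one texture state given in the text.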
`
The control flow constant memory doesn’t sit behind a renaming table. It is register mapped and thus the driver must
reload its content each time there is a change in the control flow constants. Its size is 320*32 because it must hold 8
copies of the 32 dwords of control flow constants and the loop construct constants must be aligned.
`
`
`
`The constant re-mapping tables for texture state and ALU constants are logically register mapped for regular mode
`and physically register mapped for RT operation.
`
`
`5.2 Management of the Control Flow Constants
The control flow constants are register mapped, thus the CP writes to the corresponding register to set the constant,
the SQ decodes the address and writes to the block pointed by its current base pointer (CF_WR_BASE). On the read
side, one level of indirection is used. A register (SQ_CONTEXT_MISC.CF_RD_BASE) keeps the current base pointer
to the control flow block. This register is copied whenever there is a state change. Should the CP write to CF after the
state change, the base register is updated with (current pointer number + 1) % number of states. This way, if the
CP doesn't write to CF the state is going to use the previous CF constants.
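
The pointer scheme described above can be sketched as follows. This is an illustrative model only: the class and method names are hypothetical, the eight blocks of 32 dwords come from section 5.1, and the choice to advance the pointer on the first CP write after a state change is my reading of the paragraph rather than a confirmed implementation detail.

```python
class ControlFlowConstants:
    """Hypothetical model of CF_WR_BASE / CF_RD_BASE handling."""

    def __init__(self, num_states=8, dwords_per_block=32):
        self.num_states = num_states
        self.blocks = [[0] * dwords_per_block for _ in range(num_states)]
        self.wr_base = 0                   # models CF_WR_BASE
        self.rd_base = 0                   # models CF_RD_BASE of the current state
        self.written_since_change = True   # block 0 belongs to the initial state

    def state_change(self):
        """A new state starts by copying the current read base pointer."""
        self.written_since_change = False
        return self.rd_base

    def cp_write(self, offset, value):
        """First write after a state change moves to the next block (mod N)."""
        if not self.written_since_change:
            self.wr_base = (self.wr_base + 1) % self.num_states
            self.rd_base = self.wr_base
            self.written_since_change = True
        self.blocks[self.wr_base][offset] = value

    def read(self, rd_base, offset):
        """Read a constant through the base pointer captured for a state."""
        return self.blocks[rd_base][offset]
```

A state that never triggers a CP write keeps reading through the copied pointer, so it sees the previous state's constants, which matches the last sentence of the paragraph.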
`
5.3 Management of the re-mapping tables
`
`5.3.1 R400 Constant management
The sequencer is responsible for managing two re-mapping tables (one for the constant store and one for the texture
state). On a state change (by the driver), the sequencer will broadside copy the contents of its re-mapping tables to a
new one. We have 8 different re-mapping tables we can use concurrently.

The constant memory update will be incremental; the driver only needs to update the constants that actually changed
between the two state changes.

For this model to work in its simplest form, the requirement is that the physical memory MUST be at least twice as
large as the logical address space + the space allocated for Real Time. In our case, since the logical address space
is 512 and the reserved RT space can be up to 256 entries, the memory must be of size 1280 and above. Similarly
the size of the texture store must be of 32*2+32 = 96 entries and above.
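
The sizing rule in the paragraph above is simple enough to check numerically. A minimal sketch, where the function name min_physical_entries is hypothetical:

```python
def min_physical_entries(logical_entries, real_time_entries):
    """Smallest physical memory for the simple incremental model:
    twice the logical address space plus the Real Time reservation."""
    return 2 * logical_entries + real_time_entries


constant_store_min = min_physical_entries(512, 256)   # ALU constants
texture_store_min = min_physical_entries(32, 32)      # texture states
```

These evaluate to the 1280 and 96 entries quoted in the text.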
`
`5.3.2 Proposal for R400LE constant management
To make this scheme work with only 512+256 = 768 entries, upon reception of a CONTROL packet of state + 1, the
sequencer would check for SQ_IDLE and PA_IDLE and if both are idle will erase the content of state to replace it with
the new state (this is depicted in Figure 8: De-allocation mechanism). Note that in
the case a state is cleared a value of 0 is written to the corresponding de-allocation counter location so that when the
SQ is going to report a state change, nothing will be de-allocated upon the first report.

The second path sets all context dirty bits that were used in the current state to 1 (thus allowing the new state to
reuse these physical addresses if needed).
`
`
[Figure 7 diagram not recoverable from extraction. Legible labels: Free List, Renaming Table (Context 0 to N),
Renaming Table for 1 Context, Logical Address, Physical Address, Current/Last Context, Staging Data Buffer, Global
Register Data Bus, Constants, Constant Request. Note in figure: Copy Last held above to Current Context on receipt
of Set Constant for a new context (hide loading behind Set State load, 16 clocks); all other Set States just write
one entry to current state.]

Figure 7: Constant management
`
`
[Figure 8 diagram not recoverable from extraction. Legible labels: ADDR, SQ_STATE#, WRITE_ENABLE, DEALLOC COUNTERS,
CNT VALUE, Free List, PREVIOUS STATE, NEW STATE, VALID, SQ_IDLE, PA_IDLE, CP_NEW_STATE_CNTL, REMAPPING TABLE,
SET CTX BITS.]

Figure 8: De-allocation mechanism for R400LE
`
5.3.3 Dirty bits
Two sets of dirty bits will be maintained per logical address. The first one will be set to zero on reset and set when
the logical address is addressed. The second one will be set to zero whenever a new context is written and set for
each address written while in this context. If the reset dirty is not set, then writing to that logical address will not
require de-allocation of whatever address is stored in the renaming table. If it is set and the context dirty is not set,
then the physical address stored needs to be de-allocated and a new physical address is necessary to store the incoming
data. If they are both set, then the data will be written into the physical address held in the renaming table for the
current logical address. No de-allocation or allocation takes place. This will happen when the driver does a set constant
twice to the same logical address between context changes. NOTE: It is important to detect and prevent this; failure
to do so will allow multiple writes to allocate all physical memory and thus hang because a context will not fit for
rendering to start and thus free up space.
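
The three cases described in this section can be summarized as a small decision function. This is a sketch under assumptions: the names WriteAction and classify_write are hypothetical, and the "allocate without de-allocating" outcome for an address whose reset dirty bit is clear is implied by the text rather than stated.

```python
from enum import Enum


class WriteAction(Enum):
    ALLOCATE_ONLY = "allocate a new physical address, nothing to de-allocate"
    DEALLOCATE_AND_ALLOCATE = "de-allocate the old physical address, allocate a new one"
    REUSE = "write to the physical address already in the renaming table"


def classify_write(reset_dirty, context_dirty):
    """Decide what a constant write to one logical address requires."""
    if not reset_dirty:
        # Never addressed since reset: the renaming entry holds nothing to free.
        return WriteAction.ALLOCATE_ONLY
    if not context_dirty:
        # Written in an earlier context only: older contexts still need the old copy.
        return WriteAction.DEALLOCATE_AND_ALLOCATE
    # Already written in this context: overwrite in place.
    return WriteAction.REUSE
```

The REUSE case is the one the NOTE refers to: without it, a driver that sets the same constant repeatedly within one context would consume a new physical block on every write.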
`
`5.3.4 Free List Block
A free list block that would consist of a counter (called the IFC or Initial Free Counter) that would reset to zero and
be incremented every time a chunk of physical memory is used until they have all been used once. This counter would
be checked each time a physical block is needed, and if the original ones have not been used up, use a new one, else
check the free list for an available physical block address. The coun