`Document Location:
`Cc\periorce'r400\doc_lib\designiblocks\sq\R400Sequencer.doc
`
`Current intranet Search Title:
`R400 Sequencer Specification
`
APPROVALS
Name/Dept.          Signature/Date

AUTHOR: Laurent Lefebvre
ORIGINATE DATE: 24 September, 2001
EDIT DATE: 4 September, 2015
DOCUMENT-REV. NUM.: GEN-CXXXXX-REVA
PAGE: 1 of 51

`R400 Sequencer Specification
`
`SQ
`
`Version 2.07
`
`
Overview: This is an architectural specification for the R400 Sequencer block (SEQ). It provides an overview of the required capabilities and expected uses of the block. It also describes the block interfaces, internal sub-blocks, and provides internal state diagrams.
`
Issue To:

Remarks:

THIS DOCUMENT CONTAINS CONFIDENTIAL INFORMATION THAT COULD BE SUBSTANTIALLY DETRIMENTAL TO THE INTEREST OF ATI TECHNOLOGIES INC. THROUGH UNAUTHORIZED USE OR DISCLOSURE.

“Copyright 2001, ATI Technologies Inc. All rights reserved. The material in this document constitutes an unpublished work created in 2001. The use of this copyright notice is intended to provide notice that ATI owns a copyright in this unpublished work. The copyright notice is not an admission that publication has occurred. This work contains confidential, proprietary information and trade secrets of ATI. No part of this document may be used, reproduced, or transmitted in any form or by any means without the prior written permission of ATI Technologies Inc.”

`
`Exhibit 2038.docRe00_Sequencerdec
`
`73669 Bytes** © ATI Confidential. Reference Copyright Notice on Cover Page © ***
`
`ATI 20353
`LGv. ATI
`IPR2015-00325
`
`AMD1044_0257711
`
`ATI Ex. 2109
`IPR2023-00922
`Page 1 of 326
`
Table Of Contents

1. OVERVIEW
    1.1 Top Level Block Diagram
    1.2 Data Flow graph (SP)
    1.3 Control Graph
2. INTERPOLATED DATA BUS
3. INSTRUCTION STORE
4. SEQUENCER INSTRUCTIONS
5. CONSTANT STORES
    5.1 Memory organizations
    5.2 Management of the Control Flow Constants
    5.3 Management of the re-mapping tables
        5.3.1 R400 Constant management
        5.3.2 Proposal for R400LE constant management
        5.3.3 Dirty bits
        5.3.4 Free List Block
        5.3.5 De-allocate Block
        5.3.6 Operation of Incremental model
    5.4 Constant Store Indexing
    5.5 Real Time Commands
    5.6 Constant Waterfalling
6. LOOPING AND BRANCHES
    6.1 The controlling state
    6.2 The Control Flow Program
        6.2.1 Control flow instructions table
    6.3 Implementation
    6.4 Data dependant predicate instructions
    6.5 HW Detection of PV,PS
    6.6 Register file indexing
    6.7 Debugging the Shaders
        6.7.1 Method 1: Debugging registers
        6.7.2 Method 2: Exporting the values in the GPRs
7. PIXEL KILL MASK
8. MULTIPASS VERTEX SHADERS (HOS)
9. REGISTER FILE ALLOCATION
10. FETCH ARBITRATION
11. ALU ARBITRATION
12. HANDLING STALLS
13. CONTENT OF THE RESERVATION STATION FIFOS
14. THE OUTPUT FILE
15. IJ FORMAT
    15.1 Interpolation of constant attributes
16. STAGING REGISTERS
17. THE PARAMETER CACHE
    17.1 Export restrictions
        17.1.1 Pixel exports
        17.1.2 Vertex exports
        17.1.3 Pass thru exports
    17.2 Arbitration restrictions
18. EXPORT TYPES
    18.1 Vertex Shading
    18.2 Pixel Shading
19. SPECIAL INTERPOLATION MODES
    19.1 Real time commands
    19.2 Sprites/ XY screen coordinates/ FB information
    19.3 Auto generated counters
        19.3.1 Vertex shaders
        19.3.2 Pixel shaders
20. STATE MANAGEMENT
    20.1 Parameter cache synchronization
21. XY ADDRESS IMPORTS
    21.1 Vertex indexes imports
22. REGISTERS
23. INTERFACES
    23.1 External Interfaces
    23.2 SC to SP Interfaces
        23.2.1 SC_SP#
        23.2.2 SC_SQ
        23.2.3 SQ to SX(SP): Interpolator bus
        23.2.4 SQ to SP: Staging Register Data
        23.2.5 VGT to SQ: Vertex interface
        23.2.6 SQ to SX: Control bus
        23.2.7 SX to SQ: Output file control
        23.2.8 SQ to TP: Control bus
        23.2.9 TP to SQ: Texture stall
        23.2.10 SQ to SP: Texture stall
        23.2.11 SQ to SP: GPR and auto counter
        23.2.12 SQ to SPx: Instructions
        23.2.13 SP to SQ: Constant address load/ Predicate Set/Kill set
        23.2.14 SQ to SPx: constant broadcast
        23.2.15 SQ to CP: RBBM bus
        23.2.16 CP to SQ: RBBM bus
        23.2.17 SQ to CP: State report
    23.3 Example of control flow program execution
24. OPEN ISSUES

Revision Changes:

Rev 0.1 (Laurent Lefebvre), Date: May 7, 2001
    First draft.

Rev 0.2 (Laurent Lefebvre), Date: July $, 2001
    Changed the interfaces to reflect the changes in the SP. Added some details in the arbitration section.

Rev 0.3 (Laurent Lefebvre), Date: August 6, 2001
    Reviewed the Sequencer spec after the meeting on August 3, 2001.

Rev 0.4 (Laurent Lefebvre), Date: August 24, 2001
    Added the dynamic allocation method for register file and an example (written in part by Vic) of the flow of pixels/vertices in the sequencer.

Rev 0.5 (Laurent Lefebvre), Date: September 7, 2001
    Added timing diagrams (Vic).

Rev 0.6 (Laurent Lefebvre), Date: September 24, 2001
    Changed the spec to reflect the new R400 architecture. Added interfaces.

Rev 0.7 (Laurent Lefebvre), Date: October 5, 2001
    Added constant store management, instruction store management, control flow management and data dependant predication.

Rev 0.8 (Laurent Lefebvre), Date: October 8, 2001
    Changed the control flow method to be more flexible. Also updated the external interfaces.

Rev 0.9 (Laurent Lefebvre), Date: October 17, 2001
    Incorporated changes made in the 10/18/01 control flow meeting. Added a NOP instruction, removed the conditional_execute_or_jump. Added debug registers.

Rev 1.0 (Laurent Lefebvre), Date: October 19, 2001
    Refined interfaces to RB. Added state registers.

Rev 1.1 (Laurent Lefebvre), Date: October 26, 2001
    Added SEQ→SP0 interfaces. Changed delta precision. Changed VGT→SP0 interface. Debug Methods added.

Rev 1.2 (Laurent Lefebvre), Date: November 16, 2001
    Interfaces greatly refined. Cleaned up the spec.

Rev 1.3 (Laurent Lefebvre), Date: November 26, 2001
    Added the different interpolation modes.

Rev 1.4 (Laurent Lefebvre), Date: December 6, 2001
    Added the auto incrementing counters. Changed the VGT→SQ interface. Added content on constant management. Updated GPRs.

Rev 1.5 (Laurent Lefebvre), Date: December 11, 2001
    Removed from the spec all interfaces that weren't directly tied to the SQ. Added explanations on constant management. Added PA-SQ synchronization fields and explanation.

Rev 1.6 (Laurent Lefebvre), Date: January 7, 2002
    Added more details on the staging register. Added detail about the parameter caches. Changed the call instruction to a Conditionnal_call instruction. Added details on constant management and updated the diagram.

Rev 1.7 (Laurent Lefebvre), Date: February 4, 2002
    Added Real Time parameter control in the SX interface. Updated the control flow section.

Rev 1.8 (Laurent Lefebvre), Date: March 4, 2002
    New interfaces to the SX block. Added the end of clause modifier, removed the end of clause instructions.

Rev 1.9 (Laurent Lefebvre), Date: March 18, 2002
    Rearrangement of the CF instruction bits in order to ensure byte alignment.

Rev 1.10 (Laurent Lefebvre), Date: March 25, 2002
    Updated the interfaces and added a section on exporting rules.

Rev 1.11 (Laurent Lefebvre), Date: April 19, 2002
    Added CP state report interface. Last version of the spec with the old control flow scheme.

Rev 2.0 (Laurent Lefebvre), Date: April 19, 2002
    New control flow scheme.

Rev 2.01 (Laurent Lefebvre), Date: May 2, 2002
    Changed slightly the control flow instructions to allow force jumps and calls.

Rev 2.02 (Laurent Lefebvre), Date: May 13, 2002
    Updated the Opcodes. Added type field to the constant/pred interface. Added Last field to the SQ→SP instruction load interface.

Rev 2.03 (Laurent Lefebvre), Date: July 15, 2002
    SP interface updated to include predication optimizations. Added the predicate no stall instructions.

Rev 2.04 (Laurent Lefebvre), Date: August 2, 2002
    Documented the new parameter generation scheme for XY coordinates points and lines STs.

Rev 2.05 (Laurent Lefebvre), Date: September 10, 2002
    Some interface changes and an architectural change to the auto-counter scheme.

Rev 2.06 (Laurent Lefebvre), Date: October 11, 2002
    Widened the event interface to 5 bits. Some other little typos corrected.

Rev 2.07 (Laurent Lefebvre), Date: October 14, 2002
    Loops, jumps and calls are now using a 13 bit address which allows to jump and call and loop around any control flow addresses (does not require to be even anymore).

`1. Overview
The sequencer chooses two ALU threads and a fetch thread to execute, and executes all of the instructions in a block before looking for a new clause of the same type. Two ALU threads are executed interleaved to hide the ALU latency. The arbitrator will give priority to older threads. There are two separate reservation stations, one for pixel vectors and one for vertex vectors. This way a pixel can pass a vertex and a vertex can pass a pixel.

To support the shader pipe the sequencer also contains the shader instruction cache, constant store, control flow constants and texture state. The four shader pipes also execute the same instruction, thus there is only one sequencer for the whole chip.

The sequencer first arbitrates between vectors of 64 vertices that arrive directly from primitive assembly and vectors of 16 quads (64 pixels) that are generated in the scan converter.

The vertex or pixel program specifies how many GPRs it needs to execute. The sequencer will not start the next vector until the needed space is available in the GPRs.
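
The GPR admission rule in the last paragraph can be pictured with a small software model. This is only a sketch: the class name `GprPool`, its methods and the pool size used in the example are hypothetical and are not taken from the R400 design, which implements this in hardware.

```python
class GprPool:
    """Tracks free general-purpose registers (GPRs); sizes are illustrative."""

    def __init__(self, total_gprs):
        self.free = total_gprs

    def try_start(self, gprs_needed):
        """Start a vector only if its GPR request fits; otherwise it must wait."""
        if gprs_needed > self.free:
            return False
        self.free -= gprs_needed
        return True

    def finish(self, gprs_held):
        """Return the GPRs of a finished vector to the pool."""
        self.free += gprs_held


pool = GprPool(total_gprs=10)      # 10 is a made-up capacity
started_first = pool.try_start(6)  # fits: 4 GPRs remain free
started_second = pool.try_start(6) # does not fit: the vector is held back
```

Once the first vector calls `finish(6)`, the second request would succeed, which mirrors the "will not start the next vector until the needed space is available" behaviour described above.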
`
Figure 1: General Sequencer overview

`1.1 Top Level Block Diagram
`
[Block diagram; legible labels: Input Arbiter, VTX RS, PIX RS, Texture]

Figure 2: Reservation stations and arbiters
`Under this new scheme, the sequencer (SQ) will only use one global state management machine per vector type
`(pixel, vertex) that we call the reservation station (RS).
`
`1.2 Data Flow graph (SP)
`
[Block diagram; legible labels: instruction, Register File, MAC, scalar input/output, texture request, texture address, pipeline stage, SX, to Primitive Assembly Unit or Render Backend]

Figure 3: The shader Pipe

`The gray area represents blocks that are replicated 4 times per shader pipe (16 times on the overall chip).
`
`1.3 Control Graph
`
[Block diagram; legible labels: SEQ, IS, FETCH, SP, OF, Clause # + Rdy, CMD, Phase, WrAddr, RdAddr, WrVec]

Figure 4: Sequencer Control interfaces

In green is represented the Fetch control interface, in red the ALU control interface, in blue the Interpolated/Vector control interface and in purple is the output file control interface.
`
`2. Interpolated data bus
The interpolators contain an IJ buffer to pack the information as much as possible before writing it to the register file.
`
[Block diagram; legible labels: To RB, CROSSBAR, IJs buffer (ping-pong buffer, 12800 bits), XYs buffer (ping-pong buffer, 24 bits * 16 quads * 2 = 768 bits), INTERPOLATORS]

Figure 5: Interpolation buffers
`
Figure 6: Interpolation timing diagram

Above is an example of a tile the sequencer might receive from the SC. The write side is how the data get stacked into the XY and IJ buffers; the read side is how the data is passed to the GPRs. The IJ information is packed in the IJ buffer 4 quads at a time or two clocks. The sequencer allows at any given time as many as four quads to interpolate a parameter. They all have to come from the same primitive. Then the sequencer controls the write mask to the GPRs to write the valid data in.
`
3. Instruction Store
There is going to be only one instruction store for the whole chip. It will contain 4096 instructions of 96 bits each.

It is likely to be a 1 port memory; we use 1 clock to load the ALU instruction, 1 clock to load the Fetch instruction, 1 clock to load 2 control flow instructions and 1 clock to write instructions.

The instruction store is loaded by the CP thru the register mapped registers.

The VS_BASE and PS_BASE context registers are used to specify for each context where its shader is in the instruction memory.

For the Real time commands the story is quite the same but for some small differences. There are no wrap-around points for real time so the driver must be careful not to overwrite regular shader data. The shared code (shared subroutines) uses the same path as real time.
`
4. Sequencer Instructions
`All control flow instructions and move instructions are handled by the sequencer only. The ALUs will perform NOPs
`during this time (MOV PV,PV, PS,PS) if they have nothing else to do.
`
`5. Constant Stores
`
`5.1 Memory organizations
A likely size for the ALU constant store is 1024x128 bits. The read BW from the ALU constant store is 128 bits/clock and the write bandwidth is 32 bits/clock (directed by the CP bus size not by memory ports).

The maximum logical size of the constant store for a given shader is 256 constants, or 512 for the pixel/vertex shader pair. The size of the re-mapping table is 128 lines (each line addresses 4 constants). The write granularity is 4 constants or 512 bits. It takes 16 clocks to write the four constants. Real time requires 256 lines in the physical memory (this is physically register mapped).

The texture state is also kept in a similar memory. The size of this memory is 320x96 bits (128 texture states for regular mode, 32 states for RT). The memory thus holds 128 texture states (192 bits per state). The logical size exposes 32 different states total, which are going to be shared between the pixel and the vertex shader. The size of the re-mapping table for the texture state memory is 32 lines (each line addresses 1 texture state line in the real memory). The CP write granularity is 1 texture state line (or 192 bits). The driver sends 512 bits but the CP ignores the top 320 bits. It thus takes 6 clocks to write the texture state. Real time requires 32 lines in the physical memory (this is physically register mapped).
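
The write timings quoted above follow directly from the 32 bits/clock write path. The short calculation below re-derives them; the constant and function names are hypothetical and exist only for this illustration.

```python
WRITE_BITS_PER_CLOCK = 32   # CP write bandwidth stated in the text
ALU_CONSTANT_BITS = 128     # width of one ALU constant
TEXTURE_STATE_BITS = 192    # width of one texture state
DRIVER_PACKET_BITS = 512    # what the driver sends per write


def clocks_to_write(bits):
    """Clocks needed to push `bits` through the write path (ceiling division)."""
    return -(-bits // WRITE_BITS_PER_CLOCK)


alu_granule_bits = 4 * ALU_CONSTANT_BITS                       # 4 constants per write
alu_write_clocks = clocks_to_write(alu_granule_bits)           # four constants
texture_write_clocks = clocks_to_write(TEXTURE_STATE_BITS)     # one texture state
ignored_texture_bits = DRIVER_PACKET_BITS - TEXTURE_STATE_BITS # dropped by the CP
remap_lines = 512 // 4   # 512 logical constants, 4 constants per table line
```

The results agree with the figures in the text: 512 bits per ALU write, 16 clocks for four constants, 6 clocks for a texture state, 320 ignored bits, and 128 re-mapping table lines.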
`
The control flow constant memory doesn't sit behind a renaming table. It is register mapped and thus the driver must reload its content each time there is a change in the control flow constants. Its size is 320*32 because it must hold 8 copies of the 32 dwords of control flow constants and the loop construct constants must be aligned.

The constant re-mapping tables for texture state and ALU constants are logically register mapped for regular mode and physically register mapped for RT operation.
`
5.2 Management of the Control Flow Constants
The control flow constants are register mapped, thus the CP writes to the corresponding register to set the constant, the SQ decodes the address and writes to the block pointed by its current base pointer (CF_WR_BASE). On the read side, one level of indirection is used. A register (SQ_CONTEXT_MISC.CF_RD_BASE) keeps the current base pointer to the control flow block. This register is copied whenever there is a state change. Should the CP write to CF after the state change, the base register is updated with (current pointer number + 1) % number of states. This way, if the CP doesn't write to CF the state is going to use the previous CF constants.
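
One possible reading of this base-pointer scheme is sketched below. It is an interpretation, not the hardware design: the class `CfConstantStore`, the `fresh_state` flag and the choice to advance the write pointer on the first CF write after a state change are assumptions made for illustration. Only `CF_WR_BASE`, `CF_RD_BASE` and the modulo update come from the text.

```python
class CfConstantStore:
    """Toy model of the control flow constant blocks and their base pointers."""

    def __init__(self, num_states=8, dwords_per_block=32):
        self.num_states = num_states
        self.blocks = [[0] * dwords_per_block for _ in range(num_states)]
        self.wr_base = 0          # stands in for CF_WR_BASE
        self.rd_base = 0          # stands in for CF_RD_BASE of the current state
        self.fresh_state = False  # True from a state change until the first CF write

    def state_change(self):
        """The new state inherits the read pointer (the register is copied)."""
        self.fresh_state = True

    def cp_write(self, index, value):
        """CP write: the first write after a state change moves to the next block."""
        if self.fresh_state:
            self.wr_base = (self.wr_base + 1) % self.num_states
            self.rd_base = self.wr_base
            self.fresh_state = False
        self.blocks[self.wr_base][index] = value

    def read(self, index):
        return self.blocks[self.rd_base][index]


store = CfConstantStore()
store.cp_write(0, 5)
store.state_change()
value_without_write = store.read(0)  # no CF write yet: previous constants are used
store.cp_write(0, 9)                 # first write after the change: next block
value_after_write = store.read(0)
```

In this model the old block keeps its contents after the pointer advances, so a state still in flight could keep reading it, and a state that never writes CF simply reuses the previous constants.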
`
5.3 Management of the re-mapping tables

5.3.1 R400 Constant management
The sequencer is responsible to manage two re-mapping tables (one for the constant store and one for the texture state). On a state change (by the driver), the sequencer will broadside copy the contents of its re-mapping tables to a new one. We have 8 different re-mapping tables we can use concurrently.

The constant memory update will be incremental; the driver only needs to update the constants that actually changed between the two state changes.

For this model to work in its simplest form, the requirement is that the physical memory MUST be at least twice as large as the logical address space + the space allocated for Real Time. In our case, since the logical address space is 512 and the reserved RT space can be up to 256 entries, the memory must be of sizes 1280 and above. Similarly the size of the texture store must be of 32*2+32 = 96 entries and above.
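
The sizing rule above (twice the logical space plus the Real Time reservation) can be checked with a few lines of arithmetic. The function name is hypothetical; the numbers are the ones given in the text.

```python
def min_physical_entries(logical_entries, real_time_entries):
    """Smallest physical memory for the simple incremental model:
    two full copies of the logical space plus the Real Time reservation."""
    return 2 * logical_entries + real_time_entries


constant_store_min = min_physical_entries(512, 256)  # ALU constants
texture_store_min = min_physical_entries(32, 32)     # texture states
```

These evaluate to 1280 and 96 entries, matching the minimum sizes stated for the constant store and the texture store.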
`
`5.3.2 Proposal for R400LE constant management
To make this scheme work with only 512+256 = 768 entries, upon reception of a CONTROL packet of state + 1, the sequencer would check for SQ_IDLE and PA_IDLE and if both are idle will erase the content of state to replace it with the new state (this is depicted in Figure 8: De-allocation mechanism). Note that in the case a state is cleared a value of 0 is written to the corresponding de-allocation counter location so that when the SQ is going to report a state change, nothing will be de-allocated upon the first report.

The second path sets all context dirty bits that were used in the current state to 1 (thus allowing the new state to reuse these physical addresses if needed).
`
[Block diagram; legible labels: Logical Address, Free List, Address to Allocate, Renaming Table (Context 0 to Context N), Current/Last Context, Logical Address & Context, Physical Address, Physical Memory, Staging Data, Staging Write Addr, Global Register Data Bus, Constants Buffer, Dealloc Counts, Dirty bits, Constant Logical address Request. Legible note: Copy Last held above to Current Context on receipt of Set Constant for a new context (hide loading behind Set State load, 16 clocks); all other Set States just write one entry to current state.]

Figure 7: Constant management
`
[Block diagram; legible labels: Free List, CNT VALUE, DEALLOC COUNTERS, WRITE_ENABLE, SQ_STATE#, PREVIOUS STATE, NEW STATE, VALUE, VALID, SQ_IDLE AND PA_IDLE, CP_NEW_STATE_CNTL, REMAPPING TABLE, SET CTX BITS]

Figure 8: De-allocation mechanism for R400LE.
`
`5.3.3 Dirty bits
Two sets of dirty bits will be maintained per logical address. The first one will be set to zero on reset and set when the logical address is addressed. The second one will be set to zero whenever a new context is written and set for each address written while in this context. If the reset dirty is not set, then writing to that logical address will not require de-allocation of whatever address is stored in the renaming table. If it is set and the context dirty is not set, then the physical address stored needs to be de-allocated and a new physical address is necessary to store the incoming data. If they are both set, then the data will be written into the physical address held in the renaming table for the current logical address. No de-allocation or allocation takes place. This will happen when the driver does a set constant twice to the same logical address between context changes. NOTE: It is important to detect and prevent this; failure to do it will allow multiple writes to allocate all physical memory and thus hang because a context will not fit for rendering to start and thus free up space.
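
The three cases described for the dirty bits can be summarised as a small decision function. This is a sketch under stated assumptions: the names `WriteAction` and `classify_write` are hypothetical, and the text only says that the first case needs no de-allocation; that a fresh physical address is allocated in that case is an assumption here.

```python
from typing import NamedTuple


class WriteAction(NamedTuple):
    deallocate_old: bool  # free the physical address currently in the renaming table
    allocate_new: bool    # obtain a new physical address for the incoming data


def classify_write(reset_dirty: bool, context_dirty: bool) -> WriteAction:
    """Decide what a constant write to one logical address requires."""
    if not reset_dirty:
        # Never written since reset: nothing to free (allocation is assumed).
        return WriteAction(deallocate_old=False, allocate_new=True)
    if not context_dirty:
        # First write in this context: retire the old address, take a new one.
        return WriteAction(deallocate_old=True, allocate_new=True)
    # Already written in this context: overwrite in place.
    return WriteAction(deallocate_old=False, allocate_new=False)
```

The last branch is the case the NOTE refers to: a repeated write to the same logical address within one context reuses the held physical address instead of consuming another one.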
`
`5.3.4 Free List Block
A free list block that would consist of a counter (called the IFC or Initial Free Counter) that would reset to zero and be incremented every time a chunk of physical memory is used until they have all been used once. This counter would be checked each time a physical block is needed, and if the original ones have not been used up, use a new one, else check the free list for an available physical block address. The count is the physical address when getting a chunk from the counter.
Storage of a free list big enough to store all physical block addresses.
Maintain three pointers for the free list that are reset to zero. The first one we will call write_ptr. This pointer will identify the next location to write the physical address of a block to be de-allocated. Note: we can never free more physical memory locations than we have. Once the address is recorded, the pointer will be incremented to walk the free list like a ring.
The second pointer will be called stop_ptr. The stop_ptr pointer will be advanced by the number of address chunks de-allocated when a context finishes. The addresses between the stop_ptr and write_ptr cannot be reused because they are still in use. But as soon as the context using them is dismissed the stop_ptr will be advanced.
The third pointer will be called read_ptr. This pointer will point to the next address that can be used for allocation as long as the read_ptr does not equal the stop_ptr and the IFC is at its maximum count.
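
The interplay of the IFC and the three ring pointers can be modelled in a few lines. This is a minimal software sketch, not the hardware block: the class and method names are hypothetical, and the ring is given one spare slot (an assumption of this sketch) so that equal read and stop pointers always mean that nothing is reusable.

```python
class FreeList:
    """Toy model of the free list: an initial counter plus a ring of freed blocks."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.ifc = 0                       # Initial Free Counter
        self.size = num_blocks + 1         # spare slot: sketch-only choice
        self.ring = [None] * self.size
        self.write_ptr = 0                 # next slot to record a freed block
        self.stop_ptr = 0                  # end of the entries that are safe to reuse
        self.read_ptr = 0                  # next reusable entry

    def allocate(self):
        """Return a physical block address, or None if none is available."""
        if self.ifc < self.num_blocks:
            addr = self.ifc                # the count itself is the address
            self.ifc += 1
            return addr
        if self.read_ptr != self.stop_ptr:
            addr = self.ring[self.read_ptr]
            self.read_ptr = (self.read_ptr + 1) % self.size
            return addr
        return None

    def record_dealloc(self, addr):
        """Queue a block for release; it stays unusable until its context ends."""
        self.ring[self.write_ptr] = addr
        self.write_ptr = (self.write_ptr + 1) % self.size

    def context_done(self, freed_count):
        """A context finished: its queued blocks become reusable."""
        self.stop_ptr = (self.stop_ptr + freed_count) % self.size
```

For example, with two blocks, both are first handed out from the counter; a block queued with `record_dealloc` only becomes allocatable again after `context_done` advances the stop pointer past it.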
`
5.3.5 De-allocate Block
This block will maintain a free physical address block count for each context. While in current context, a count shall be maintained specifying how many blocks were written into the free list at the write_ptr pointer. This count will be reset upon reset or when this context is active on the back and different than the previous context. It is actually a count of blocks in the previous context that