AUTOMATICALLY UPDATED FIELDS:
`Document Location:
`Cc\periorce'r400\doc_lib\designiblocks\sq\R400Sequencer.doc
`
`Current intranet Search Title:
`R400 Sequencer Specification
`
APPROVALS
Name/Dept.    Signature/Date:

ORIGINATE DATE: 24 September, 2001
EDIT DATE: 4 September, 2015
DOCUMENT-REV. NUM.: GEN-CXXXXX-REVA
PAGE: 1 of 51

Author: Laurent Lefebvre
User No:

R400 Sequencer Specification

SQ

Version 2.07

Overview: This is an architectural specification for the R400 Sequencer block (SEQ). It provides an overview of the
required capabilities and expected uses of the block. It also describes the block interfaces, internal sub-blocks,
and provides internal state diagrams.

Issue To:

Remarks:

THIS DOCUMENT CONTAINS CONFIDENTIAL INFORMATION THAT COULD BE
SUBSTANTIALLY DETRIMENTAL TO THE INTEREST OF ATI TECHNOLOGIES
INC. THROUGH UNAUTHORIZED USE OR DISCLOSURE.

"Copyright 2001, ATI Technologies Inc. All rights reserved. The material in this document constitutes an unpublished
work created in 2001. The use of this copyright notice is intended to provide notice that ATI owns a copyright in this
unpublished work. The copyright notice is not an admission that publication has occurred. This work contains
confidential, proprietary information and trade secrets of ATI. No part of this document may be used, reproduced, or
transmitted in any form or by any means without the prior written permission of ATI Technologies Inc."
`
`
`
`Exhibit 2038.docRe00_Sequencerdec
`
`73669 Bytes** © ATI Confidential. Reference Copyright Notice on Cover Page © ***
`
`ATI 20353
`LGv. ATI
`IPR2015-00325
`
`AMD1044_0257711
`
`ATI Ex. 2109
`IPR2023-00922
`Page 1 of 326
`
Table Of Contents

1. OVERVIEW
  1.1 Top Level Block Diagram
  1.2 Data Flow graph (SP)
  1.3 Control Graph
2. INTERPOLATED DATA BUS
3. INSTRUCTION STORE
4. SEQUENCER INSTRUCTIONS
5. CONSTANT STORES
  5.1 Memory organizations
  5.2 Management of the Control Flow Constants
  5.3 Management of the re-mapping tables
    5.3.1 R400 Constant management
    5.3.2 Proposal for R400LE constant management
    5.3.3 Dirty bits
    5.3.4 Free List Block
    5.3.5 De-allocate Block
    5.3.6 Operation of Incremental model
  5.4 Constant Store Indexing
  5.5 Real Time Commands
  5.6 Constant Waterfalling
6. LOOPING AND BRANCHES
  6.1 The controlling state
  6.2 The Control Flow Program
    6.2.1 Control flow instructions table
  6.3 Implementation
  6.4 Data dependant predicate instructions
  6.5 HW Detection of PV,PS
  6.6 Register file indexing
  6.7 Debugging the Shaders
    6.7.1 Method 1: Debugging registers
    6.7.2 Method 2: Exporting the values in the GPRs
7. PIXEL KILL MASK
8. MULTIPASS VERTEX SHADERS (HOS)
9. REGISTER FILE ALLOCATION
10. FETCH ARBITRATION
11. ALU ARBITRATION
12. HANDLING STALLS
13. CONTENT OF THE RESERVATION STATION FIFOS
14. THE OUTPUT FILE
15. IJ FORMAT
  15.1 Interpolation of constant attributes
16. STAGING REGISTERS
17. THE PARAMETER CACHE
  17.1 Export restrictions
    17.1.1 Pixel exports
    17.1.2 Vertex exports
    17.1.3 Pass thru exports
  17.2 Arbitration restrictions
18. EXPORT TYPES
  18.1 Vertex Shading
  18.2 Pixel Shading
19. SPECIAL INTERPOLATION MODES
  19.1 Real time commands
  19.2 Sprites/ XY screen coordinates/ FB information
  19.3 Auto generated counters
    19.3.1 Vertex shaders
    19.3.2 Pixel shaders
20. STATE MANAGEMENT
  20.1 Parameter cache synchronization
21. XY ADDRESS IMPORTS
  21.1 Vertex indexes imports
22. REGISTERS
23. INTERFACES
  23.1 External Interfaces
  23.2 SC to SP Interfaces
    23.2.1 SC_SP#
    23.2.2 SC_SQ
    23.2.3 SQ to SX(SP): Interpolator bus
    23.2.4 SQ to SP: Staging Register Data
    23.2.5 VGT to SQ: Vertex interface
    23.2.6 SQ to SX: Control bus
    23.2.7 SX to SQ: Output file control
    23.2.8 SQ to TP: Control bus
    23.2.9 TP to SQ: Texture stall
    23.2.10 SQ to SP: Texture stall
    23.2.11 SQ to SP: GPR and auto counter
    23.2.12 SQ to SPx: Instructions
    23.2.13 SP to SQ: Constant address load/ Predicate Set/Kill set
    23.2.14 SQ to SPx: constant broadcast
    23.2.15 SQ to CP: RBBM bus
    23.2.16 CP to SQ: RBBM bus
    23.2.17 SQ to CP: State report
  23.3 Example of control flow program execution
24. OPEN ISSUES
`
Revision Changes:

Rev 0.1 (Laurent Lefebvre)
Date: May 7, 2001
First draft.

Rev 0.2 (Laurent Lefebvre)
Date: July $, 2001
Changed the interfaces to reflect the changes in the SP. Added some details in the arbitration section.

Rev 0.3 (Laurent Lefebvre)
Date: August 6, 2001
Reviewed the Sequencer spec after the meeting on August 3, 2001.

Rev 0.4 (Laurent Lefebvre)
Date: August 24, 2001
Added the dynamic allocation method for register file and an example (written in part by Vic) of the flow of
pixels/vertices in the sequencer.

Rev 0.5 (Laurent Lefebvre)
Date: September 7, 2001
Added timing diagrams (Vic).

Rev 0.6 (Laurent Lefebvre)
Date: September 24, 2001
Changed the spec to reflect the new R400 architecture. Added interfaces.

Rev 0.7 (Laurent Lefebvre)
Date: October 5, 2001
Added constant store management, instruction store management, control flow management and data dependant
predication.

Rev 0.8 (Laurent Lefebvre)
Date: October 8, 2001
Changed the control flow method to be more flexible. Also updated the external interfaces.

Rev 0.9 (Laurent Lefebvre)
Date: October 17, 2001
Incorporated changes made in the 10/18/01 control flow meeting. Added a NOP instruction, removed the
conditional_execute_or_jump. Added debug registers.

Rev 1.0 (Laurent Lefebvre)
Date: October 19, 2001
Refined interfaces to RB. Added state registers.

Rev 1.1 (Laurent Lefebvre)
Date: October 26, 2001
Added SEQ-SP0 interfaces. Changed delta precision. Changed VGT-SP0 interface. Debug Methods added.

Rev 1.2 (Laurent Lefebvre)
Date: November 16, 2001
Interfaces greatly refined. Cleaned up the spec.

Rev 1.3 (Laurent Lefebvre)
Date: November 26, 2001
Added the different interpolation modes.

Rev 1.4 (Laurent Lefebvre)
Date: December 6, 2001
Added the auto incrementing counters. Changed the VGT-SQ interface. Added content on constant management.
Updated GPRs.

Rev 1.5 (Laurent Lefebvre)
Date: December 11, 2001
Removed from the spec all interfaces that weren't directly tied to the SQ. Added explanations on constant
management. Added PA-SQ synchronization fields and explanation.

Rev 1.6 (Laurent Lefebvre)
Date: January 7, 2002
Added more details on the staging register. Added detail about the parameter caches. Changed the call instruction
to a Conditionnal_call instruction. Added details on constant management and updated the diagram.

Rev 1.7 (Laurent Lefebvre)
Date: February 4, 2002
Added Real Time parameter control in the SX interface. Updated the control flow section.

Rev 1.8 (Laurent Lefebvre)
Date: March 4, 2002
New interfaces to the SX block. Added the end of clause modifier, removed the end of clause instructions.

Rev 1.9 (Laurent Lefebvre)
Date: March 18, 2002
Rearrangement of the CF instruction bits in order to ensure byte alignment.

Rev 1.10 (Laurent Lefebvre)
Date: March 25, 2002
Updated the interfaces and added a section on exporting rules.

Rev 1.11 (Laurent Lefebvre)
Date: April 19, 2002
Added CP state report interface. Last version of the spec with the old control flow scheme.

Rev 2.0 (Laurent Lefebvre)
Date: April 19, 2002
New control flow scheme.

Rev 2.01 (Laurent Lefebvre)
Date: May 2, 2002
Changed slightly the control flow instructions to allow force jumps and calls.

Rev 2.02 (Laurent Lefebvre)
Date: May 13, 2002
Updated the Opcodes. Added type field to the constant/pred interface. Added Last field to the SQ-SP instruction
load interface.

Rev 2.03 (Laurent Lefebvre)
Date: July 15, 2002
SP interface updated to include predication optimizations. Added the predicate no stall instructions.

Rev 2.04 (Laurent Lefebvre)
Date: August 2, 2002
Documented the new parameter generation scheme for XY coordinates points and lines STs.

Rev 2.05 (Laurent Lefebvre)
Date: September 10, 2002
Some interface changes and an architectural change to the auto-counter scheme.

Rev 2.06 (Laurent Lefebvre)
Date: October 11, 2002
Widened the event interface to 5 bits. Some other little typos corrected.

Rev 2.07 (Laurent Lefebvre)
Date: October 14, 2002
Loops, jumps and calls are now using a 13 bit address which allows to jump and call and loop around any control
flow addresses (does not need to be even anymore).
`
`1. Overview
The sequencer chooses two ALU threads and a fetch thread to execute, and executes all of the instructions in a block
before looking for a new clause of the same type. Two ALU threads are executed interleaved to hide the ALU latency.
The arbitrator will give priority to older threads. There are two separate reservation stations, one for pixel vectors and
one for vertex vectors. This way a pixel can pass a vertex and a vertex can pass a pixel.

To support the shader pipe the sequencer also contains the shader instruction cache, constant store, control flow
constants and texture state. The four shader pipes also execute the same instruction, thus there is only one
sequencer for the whole chip.

The sequencer first arbitrates between vectors of 64 vertices that arrive directly from primitive assembly and vectors
of 16 quads (64 pixels) that are generated in the scan converter.

The vertex or pixel program specifies how many GPRs it needs to execute. The sequencer will not start the next
vector until the needed space is available in the GPRs.
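
The GPR admission rule in the last paragraph can be sketched as a simple counter. This is only an illustration: the class name, the method names and the pool size used in the usage note are hypothetical, and the sketch tracks a free count only, without modelling where in the register file the space is placed.

```python
class GprPool:
    """Hypothetical model: counts free GPR space, nothing more."""

    def __init__(self, total_gprs: int) -> None:
        self.free = total_gprs

    def try_start_vector(self, gprs_needed: int) -> bool:
        # The next vector is held back until its declared GPR need fits.
        if gprs_needed > self.free:
            return False
        self.free -= gprs_needed
        return True

    def finish_vector(self, gprs_used: int) -> None:
        # Space comes back when a vector leaves the shader pipe.
        self.free += gprs_used
```

With a pool of 8 registers (an arbitrary number chosen for the example), a vector needing 6 starts, a following vector needing 4 has to wait, and it can start once the first vector has finished.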
`
Figure 1: General Sequencer overview
`
`1.1 Top Level Block Diagram
`
Figure 2: Reservation stations and arbiters (blocks shown: Input Arbiter, VTX RS, PIX RS, Texture)
`Under this new scheme, the sequencer (SQ) will only use one global state management machine per vector type
`(pixel, vertex) that we call the reservation station (RS).
`
`1.2 Data Flow graph (SP)
`
Figure 3: The shader Pipe (labels shown: instruction, Register File, MAC, scalar input/output, pipeline stage, output
to Primitive Assembly Unit or Render Backend)
`
`The gray area represents blocks that are replicated 4 times per shader pipe (16 times on the overall chip).
`
`1.3 Control Graph
`
Figure 4: Sequencer Control interfaces

In green is represented the Fetch control interface, in red the ALU control interface, in blue the Interpolated/Vector
control interface and in purple is the output file control interface.
`
`2. Interpolated data bus
The interpolators contain an IJ buffer to pack the information as much as possible before writing it to the register file.
`
Figure 5: Interpolation buffers (labels shown: crossbar, IJ buffer (ping-pong buffer), XY buffer (ping-pong buffer,
24 bits * 16 quads * 2 = 768 bits), interpolators)
`
Figure 6: Interpolation timing diagram

Above is an example of a tile the sequencer might receive from the SC. The write side is how the data get stacked
into the XY and IJ buffers, the read side is how the data is passed to the GPRs. The IJ information is packed in the IJ
buffer 4 quads at a time or two clocks. The sequencer allows at any given time as many as four quads to interpolate a
parameter. They all have to come from the same primitive. Then the sequencer controls the write mask to the GPRs
to write the valid data in.
`
3. Instruction Store
There is going to be only one instruction store for the whole chip. It will contain 4096 instructions of 96 bits each.
`
`It is likely to be a 1 port memory; we use 7 clock to load the ALU instruction, 1 clocks to load the Fetch instruction, 1
`clock to load 2 control flow instructions and 1 clock to write instructions.
`
`The instruction store is loaded by the CP thru the register mapped registers.
`
`The VS_BASE and PS_BASE context registers are used to specify for each context where its shader is in the
`instruction memory.
`
For the Real time commands the story is quite the same but for some small differences. There are no wrap-around
points for real time so the driver must be careful not to overwrite regular shader data. The shared code (shared
subroutines) uses the same path as real time.
`
4. Sequencer Instructions
`All control flow instructions and move instructions are handled by the sequencer only. The ALUs will perform NOPs
`during this time (MOV PV,PV, PS,PS) if they have nothing else to do.
`
`5. Constant Stores
`
`5.1 Memory organizations
`A likely size for the ALU constant store is 1024x128 bits. The read BW from the ALU constant store is 128 bits/clock
`and the write bandwidth is 32 bits/clock (directed by the CP bus size not by memory ports).
`
The maximum logical size of the constant store for a given shader is 256 constants, or 512 for the pixel/vertex shader
pair. The size of the re-mapping table is 128 lines (each line addresses 4 constants). The write granularity is 4
constants or 512 bits. It takes 16 clocks to write the four constants. Real time requires 256 lines in the physical
memory (this is physically register mapped).
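
The figures quoted for the ALU constant store follow from each other, which the short check below spells out. It is only a worked calculation; the constant and function names are chosen for this example and are not register or signal names from the design.

```python
CONSTANT_BITS = 128            # width of one ALU constant
CP_WRITE_BITS_PER_CLOCK = 32   # write bandwidth, set by the CP bus size
CONSTANTS_PER_LINE = 4         # one re-mapping table line addresses 4 constants
REMAP_TABLE_LINES = 128
MAX_CONSTANTS_PER_SHADER = 256


def write_granularity_bits() -> int:
    # 4 constants of 128 bits each are written as one unit.
    return CONSTANTS_PER_LINE * CONSTANT_BITS


def clocks_per_line_write() -> int:
    # The 32 bits/clock write path needs this many clocks per unit.
    return write_granularity_bits() // CP_WRITE_BITS_PER_CLOCK


def logical_constants() -> int:
    # Logical constants reachable through the re-mapping table.
    return REMAP_TABLE_LINES * CONSTANTS_PER_LINE


print(write_granularity_bits())  # 512
print(clocks_per_line_write())   # 16
print(logical_constants())       # 512, the pixel/vertex shader pair
```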
`
The texture state is also kept in a similar memory. The size of this memory is 320x96 bits (128 texture states for
regular mode, 32 states for RT). The memory thus holds 128 texture states (192 bits per state). The logical size
exposes 32 different states total, which are going to be shared between the pixel and the vertex shader. The size of
the re-mapping table for the texture state memory is 32 lines (each line addresses 1 texture state line in the real
memory). The CP write granularity is 1 texture state line (or 192 bits). The driver sends 512 bits but the CP ignores
the top 320 bits. It thus takes 6 clocks to write the texture state. Real time requires 32 lines in the physical memory
(this is physically register mapped).
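
The texture state numbers can be cross-checked the same way. The sketch assumes that the 320 figure counts 96-bit rows with two rows per 192-bit state, and that the CP write path is the same 32 bits/clock quoted for the ALU constant store; both are readings of the text above rather than stated facts, and the names are illustrative.

```python
ROW_BITS = 96                  # width of the texture state memory
STATE_BITS = 192               # one texture state
REGULAR_STATES = 128
RT_STATES = 32
DRIVER_PACKET_BITS = 512
CP_WRITE_BITS_PER_CLOCK = 32   # assumed equal to the ALU constant write path

rows_per_state = STATE_BITS // ROW_BITS                        # 2
memory_rows = (REGULAR_STATES + RT_STATES) * rows_per_state    # depth of the memory
ignored_bits = DRIVER_PACKET_BITS - STATE_BITS                 # bits the CP drops
write_clocks = STATE_BITS // CP_WRITE_BITS_PER_CLOCK           # clocks per state write

print(memory_rows, ignored_bits, write_clocks)  # 320 320 6
```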
`
The control flow constant memory doesn't sit behind a renaming table. It is register mapped and thus the driver must
reload its content each time there is a change in the control flow constants. Its size is 320*32 because it must hold 8
copies of the 32 dwords of control flow constants and the loop construct constants must be aligned.
`
`The constant re-mapping tables for texture state and ALU constants are logically register mapped for regular mode
`and physically register mapped for RT operation.
`
5.2 Management of the Control Flow Constants
The control flow constants are register mapped, thus the CP writes to the corresponding register to set the constant, the
SQ decodes the address and writes to the block pointed to by its current base pointer (CF_WR_BASE). On the read
side, one level of indirection is used. A register (SQ_CONTEXT_MISC.CF_RD_BASE) keeps the current base pointer
to the control flow block. This register is copied whenever there is a state change. Should the CP write to CF after the
state change, the base register is updated with (current pointer number + 1) % number of states. This way, if the
CP doesn't write to CF the state is going to use the previous CF constants.
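
A minimal behavioral sketch of this base-pointer scheme follows. It assumes 8 states of 32 dwords each (the figures from section 5.1), keeps one copy of the read base per state, and uses a flag to remember that a state change has happened since the last CF write. The class, the flag and the method names are hypothetical; only CF_WR_BASE and CF_RD_BASE come from the text.

```python
NUM_STATES = 8          # copies of the control flow constants (section 5.1)
DWORDS_PER_BLOCK = 32   # control flow constants per copy


class ControlFlowConstants:
    def __init__(self) -> None:
        self.blocks = [[0] * DWORDS_PER_BLOCK for _ in range(NUM_STATES)]
        self.wr_base = 0                  # models CF_WR_BASE
        self.rd_base = [0] * NUM_STATES   # models CF_RD_BASE, one copy per state
        self.state = 0
        self.pending_advance = False      # hypothetical: set by a state change

    def state_change(self) -> None:
        prev = self.state
        self.state = (self.state + 1) % NUM_STATES
        # The read base is copied, so the new state sees the old constants.
        self.rd_base[self.state] = self.rd_base[prev]
        self.pending_advance = True

    def cp_write(self, offset: int, value: int) -> None:
        if self.pending_advance:
            # First CF write after a state change moves to the next block.
            self.wr_base = (self.wr_base + 1) % NUM_STATES
            self.rd_base[self.state] = self.wr_base
            self.pending_advance = False
        self.blocks[self.wr_base][offset] = value

    def read(self, state: int, offset: int) -> int:
        return self.blocks[self.rd_base[state]][offset]
```

The sketch does not copy block contents when the base advances, in line with the statement in 5.1 that the driver must reload the constants whenever they change.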
`
5.3 Management of the re-mapping tables
`
`5.3.1 R400 Constant management
The sequencer is responsible for managing two re-mapping tables (one for the constant store and one for the texture
state). On a state change (by the driver), the sequencer will broadside copy the contents of its re-mapping tables to a
new one. We have 8 different re-mapping tables we can use concurrently.

The constant memory update will be incremental; the driver only needs to update the constants that actually changed
between the two state changes.

For this model to work in its simplest form, the requirement is that the physical memory MUST be at least twice as
large as the logical address space + the space allocated for Real Time. In our case, since the logical address space
is 512 and the reserved RT space can be up to 256 entries, the memory must be of sizes 1280 and above. Similarly
the size of the texture store must be of 32*2+32 = 96 entries and above.
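
The sizing rule can be written as a one-line function and checked against both stores. The function name is chosen for this example only.

```python
def min_physical_entries(logical_entries: int, rt_entries: int) -> int:
    # Twice the logical address space, plus the space reserved for Real Time.
    return 2 * logical_entries + rt_entries


print(min_physical_entries(512, 256))  # 1280, ALU constant store
print(min_physical_entries(32, 32))    # 96, texture state store
```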
`
`5.3.2 Proposal for R400LE constant management
To make this scheme work with only 512+256 = 768 entries, upon reception of a CONTROL packet of state + 1, the
sequencer would check for SQ_IDLE and PA_IDLE and if both are idle will erase the content of state to replace it with
the new state (this is depicted in Figure 8: De-allocation mechanism). Note that in the case a state is cleared a value
of 0 is written to the corresponding de-allocation counter location so that when the SQ is going to report a state
change, nothing will be de-allocated upon the first report.
`
`The second path sets all context dirty bits that were used in the current state to 1 (thus allowing the new state to
`reuse these physical addresses if needed).
`
Figure 7: Constant management (labels shown: free list, renaming table contexts 0 to N, dirty bits, de-allocation
counts, staging data and staging write address, physical memory, global register data bus)

Figure note: Copy Last held above to Current Context on receipt of Set Constant for a new context (hide loading
behind Set State load - 16 clocks); all other Set States just write one entry to current state.
`
Figure 8: De-allocation mechanism for R400LE (labels shown: Free List, DEALLOC COUNTERS, WRITE_ENABLE,
SQ_STATE#, IDLE AND PA_IDLE, CP_NEW_STATE_CNTL, REMAPPING TABLE, SET CTX BITS)
`
`5.3.3 Dirty bits
Two sets of dirty bits will be maintained per logical address. The first one will be set to zero on reset and set when
the logical address is addressed. The second one will be set to zero whenever a new context is written and set for
each address written while in this context. If the reset dirty is not set, then writing to that logical address will not
require de-allocation of whatever address is stored in the renaming table. If it is set and the context dirty is not set, then
the physical address stored needs to be de-allocated and a new physical address is necessary to store the incoming
data. If they are both set, then the data will be written into the physical address held in the renaming table for the current
logical address. No de-allocation or allocation takes place. This will happen when the driver does a set constant
twice to the same logical address between context changes. NOTE: It is important to detect and prevent this; failure
to do it will allow multiple writes to allocate all physical memory and thus hang because a context will not fit for
rendering to start and thus free up space.
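
The three cases above amount to a small decision table, sketched below. The enum and class names are illustrative. The first case is labelled ALLOCATE on the assumption that a write to a never-used logical address still needs a physical block; the text only says that no de-allocation is required.

```python
from enum import Enum


class Action(Enum):
    ALLOCATE = "allocate, nothing to de-allocate"      # assumption, see above
    DEALLOCATE_THEN_ALLOCATE = "free old block, take a new one"
    WRITE_IN_PLACE = "reuse the block held in the renaming table"


def classify(reset_dirty: bool, context_dirty: bool) -> Action:
    if not reset_dirty:
        return Action.ALLOCATE
    if not context_dirty:
        return Action.DEALLOCATE_THEN_ALLOCATE
    return Action.WRITE_IN_PLACE


class DirtyBits:
    def __init__(self, logical_addresses: int) -> None:
        self.reset_dirty = [False] * logical_addresses
        self.context_dirty = [False] * logical_addresses

    def new_context(self) -> None:
        # Context dirty bits go back to zero when a new context is written.
        self.context_dirty = [False] * len(self.context_dirty)

    def write(self, addr: int) -> Action:
        action = classify(self.reset_dirty[addr], self.context_dirty[addr])
        self.reset_dirty[addr] = True
        self.context_dirty[addr] = True
        return action
```

Writing the same logical address twice within one context returns WRITE_IN_PLACE the second time, which is the case the NOTE says must be detected so that repeated writes do not consume physical memory.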
`
`5.3.4 Free List Block
`A free list block would consist of a counter (called the IFC, or Initial Free Counter) that resets to zero and is
`incremented every time a chunk of physical memory is used, until all chunks have been used once. This counter would
`be checked each time a physical block is needed: if the original blocks have not all been used up, use a new one;
`otherwise check the free list for an available physical block address. When a chunk is taken from the counter, the
`count is its physical address.
`The block also needs storage for a free list big enough to store all physical block addresses.
`Three pointers, all reset to zero, are maintained for the free list. The first one we will call write_ptr. This pointer
`will identify the next location in which to write the physical address of a block to be de-allocated. Note: we can never
`free more physical memory locations than we have. Once the address is recorded, the pointer will be incremented to
`walk the free list like a ring.
`The second pointer will be called stop_ptr. The stop_ptr pointer will be advanced by the number of address chunks
`de-allocated when a context finishes. The addresses between the stop_ptr and write_ptr cannot be reused because
`they are still in use, but as soon as the context using them is dismissed the stop_ptr will be advanced.
`The third pointer will be called read_ptr. This pointer will point to the next address that can be used for
`allocation, as long as the read_ptr does not equal the stop_ptr and the IFC is at its maximum count.
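
The counter and the three pointers can be modelled in software as follows. This is a minimal sketch under stated assumptions, not the block's actual design: `NUM_BLOCKS`, the `free_list` structure and the `fl_*` function names are hypothetical, and the pointers are kept as free-running counters (indexed modulo the ring size) so that `read_ptr == stop_ptr` unambiguously means "nothing reusable". Hardware would more likely use wrapped pointers with some other full/empty indication.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_BLOCKS 8u /* hypothetical number of physical blocks (power of two) */

typedef struct {
    unsigned ifc;              /* Initial Free Counter: 0 .. NUM_BLOCKS          */
    unsigned ring[NUM_BLOCKS]; /* free list storage, one slot per physical block */
    unsigned write_ptr;        /* next slot to record a de-allocated block       */
    unsigned stop_ptr;         /* [stop_ptr, write_ptr) is still in use          */
    unsigned read_ptr;         /* next entry that may be re-allocated            */
} free_list;

static void fl_reset(free_list *f)
{
    f->ifc = 0;
    f->write_ptr = f->stop_ptr = f->read_ptr = 0;
}

/* Get a physical block; returns false if none is available yet. */
static bool fl_alloc(free_list *f, unsigned *addr)
{
    if (f->ifc < NUM_BLOCKS) {          /* original blocks not used up yet */
        *addr = f->ifc++;               /* the count is the address        */
        return true;
    }
    if (f->read_ptr != f->stop_ptr) {   /* a retired entry can be reused   */
        *addr = f->ring[f->read_ptr % NUM_BLOCKS];
        f->read_ptr++;
        return true;
    }
    return false;
}

/* Record a block to be de-allocated; it is not reusable until retired. */
static void fl_dealloc(free_list *f, unsigned addr)
{
    /* We can never free more physical locations than we have. */
    assert(f->write_ptr - f->read_ptr < NUM_BLOCKS);
    f->ring[f->write_ptr % NUM_BLOCKS] = addr;
    f->write_ptr++;
}

/* A context finished: release the `count` blocks it was still using. */
static void fl_context_done(free_list *f, unsigned count)
{
    assert(count <= f->write_ptr - f->stop_ptr);
    f->stop_ptr += count;
}
```

In this model, a block passed to `fl_dealloc` stays unavailable until `fl_context_done` advances `stop_ptr` past it, which mirrors the rule that addresses between stop_ptr and write_ptr cannot be reused.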
`
`5.3.5 De-allocate Block
`This block will maintain a free physical address block count for each context. While in the current context, a count
`shall be maintained specifying how many blocks were written into the free list at the write_ptr pointer. This count will
`be reset upon reset or when this context is active on the back and different from the previous context.
`It is actually a
`count of blocks in the previous context that
