ORIGINATE DATE
17 January, 2002

EDIT DATE
[date \@ "d MMMM,
Author: Andrew Gruber, Andi Skende, Tom Frisinger
`
DOCUMENT-REV. NUM.
GEN-CXXXXX-REVA

PAGE
1 of 49

Issue To:

Copy No:
`
`Shader Processor
`
`Rev 2.102
`
Overview: This document describes the overall architecture of the Shader, its interfaces, and its partitioning into
functional blocks, as well as the timing of the shader pipeline. It is intended for use by hardware designers.
`
AUTOMATICALLY UPDATED FIELDS:
Document Location
`: flrna_mndi_rnobih::J .•. Jdoc._ 'pa rbs J�P
Current Intranet Search Title: Shader Processor
`�am�pt
`
APPROVALS

Signature/Date

Remarks
`
Copyright 2001, ATI Technologies Inc. All rights reserved. The material in this document constitutes an unpublished work
created in 2001. The use of this copyright notice is intended to provide notice that ATI owns a copyright in this unpublished
work. The copyright notice is not an admission that publication has occurred. This work contains confidential, proprietary
information and trade secrets of ATI. No part of this document may be used, reproduced, or transmitted in any form or by any
means without the prior written permission of ATI Technologies Inc.
`
`ATI Ex. 2067
`IPR2023-00922
`Page 1 of 448
`
`
`
R400 Shader Processor Model
Table of Contents

1.1.3 Previous Instruction Result
1.2.2 Pixel Shader
2. Program Format
3. ALU
3.2 ALU instruction format
3.2.1 ALU Instruction Word Interpretation
3.2.1.2 Input and Output Modifiers
3.2.1.4 Export and Predicate related decoding
3.2.1.5 Export Types and Addresses
SIMD architecture
... Level Diagram of a Shader Pipeline
External Interfaces
Shader Engine to Texture Fetch Unit Bus
Scan Converter to Shader Pipe: I Bus
Sequencer to Shader Pipes broadcast: Parameter Cache Read Control Bus
Sequencer to Shader Pipe: GPR, Parameter Cache control and auto counter
Shader Pipe to Shader Export (SX): Parameter data out of Parameter Cache
Shader Export (SX) to Interpolators: Parameter Cache Return bus
Shader Pipe to Shader Export (SX): Pixel/Vertex write to SX
PARAMETER INTERPOLATION
HARDWARE IMPLEMENTATION SPECIFICS
General Information on the Shader Floating Point arithmetic
Parameter Selection Unit
Parameter Difference & ...
Cylindrical Wrap En...
... Mantissa Calculation
... High Precision ...
`
`
`
`
`
Revision Changes:
`
Rev 0.0 (Steve Morein)
Date: April, 2001
Initial revision.

Rev 0.1 (Andi Skende)
Date: May 09, 2001

Rev 0.2 (Andi Skende)
Date: May 21, 2001

Rev 0.3 (Andi Skende)
Date: June 19, 2001

Rev 0.4 (Andi Skende)
Date: June 20, 2001

Rev 0.5 (Andi Skende)
Date: July 31, 2001

Rev 0.6 (Andi Skende)
Date: August 17, 2001

Rev 0.7 (Andi Skende)
Date: November 6, 2001

Rev 0.8 (Andi Skende)
Date: November 27, 2001

Rev 0.9 (Andi Skende)
Date: December 10, 2001

Rev 1.0 (Andi Skende)
Date: January 15, 2002

Rev 1.1 (Andi Skende)
Date: January 21, 2002

Rev 1.2 (Andi Skende)
Date: January 22, 2002

Rev 1.3 (Andi Skende)
Date: January 20, 2002
`
Document started

Updated, added the instruction format, initial block diagrams and preliminary interface description

A more detailed description of the SP<>TEX, RE/Sequencer<->SP functional interfaces.

Added the paragraph related to shader limitations that the compiler needs to be aware of.

A new updated and compressed version of ALU instruction format.

Updated the Introduction of this document. A new Pipeline Timing Diagram was inserted.

Merged in the Shader Hardware Spec. A more detailed description of the interfaces with the other blocks was
added. Updated some of the diagrams to a more correct representation of the datapaths.

A more detailed description/definition of Shader interfaces with the other blocks.

A more detailed description of the instructions supported by Shader Processor and its relation to
instruction set exposed at API level.

Updated the ALU instruction word definition and the list of the ALU instruction opcodes supported by the shader
pipe ALU unit

Updated the definition of the External Interfaces

Updated the definition and naming of some of the external interfaces, rearranged the ALU instruction
word definition such that the fields are dword aligned. The instruction opcode definition was updated and
expanded.

Updated most of the diagrams. Updated the External Interface definitions. Added a description of the
Parameter Interpolation Units. Added a diagram description of the GPR write data paths.

Updated some of the external interface definitions. Specified the expected behavior of hardware
implementation of some shader opcode with some corner case values as input arguments. The MS
Reference Rasterizer shader was used as guideline

Updated some of the external interface definitions.

1. Changed the order of the swizzle bits per channel in the instruction word such that the
LSBs belong to the red channel ... MSBs for the alpha channel.
2. Extended the SQSP_Instruct bus width to 21 bits to account for the Scalar Opcode being 6
bits instead of 5.

1. Modified the instruction word format.
2. Added new opcodes (DST and MUL_PREV2) and redefined the opcode values for both
vector and scalar instructions.
3. Updated the SQSP Instruction interface definition.
`
`Rev 1.4 (Andi Skende)
`Date: August 12, 2002
`
`
`Introduction
`
Shader Pipe (SP) serves as the central Arithmetic and Logic Unit (ALU) for the R400 Graphics Processor. There are four
identical Shader pipelines in the R400 architecture. Unlike previous ATI architectures, the R400 Shader Pipe truly
represents a Unified Shader Architecture (USA). In R400, both vertex and pixel shading operations are implemented
through the shader units. The R400 Shader Pipe represents a Single Instruction Multiple Data (SIMD) architecture. All the
shader units of each and every pipe execute the same ALU instruction on different sets of vertex parameters/pixel values.
The building blocks of the R400 shader units execute operations on single precision IEEE floating-point values.
`
`
1. State
`
`1.1 Shader State
`
1.1.1 GPRs (General Purpose Registers)
The general-purpose registers are 128 bits wide, composed of four 32-bit values. Depending on the operation these
values can be interpreted, among others, as ABGR, WZYX, QRTS, QWVU, or AUYV respectively; to simplify matters,
only the alias WZYX will be used throughout this document.

To hide the latency of memory accesses the shader pipe will switch between different vectors. This is the same as the
idea of "microthreading" that some advanced CPUs are investigating. The large register file is split between the
vectors executing in the shader pipe. The management of the shader register file is automatic, and not visible to a
program executing on a vector, except that a program is required to declare the number of GPRs it needs to execute.
The hardware will not start a vector until the required number of registers is available. There is a direct tradeoff
between the number of registers each program/vector needs and the number of vectors that can be simultaneously
resident. If there are too few vectors resident, then the latency of memory accesses can no longer be hidden and
performance suffers.

There are a total of 128 general purpose registers. A given shader can request at most 64 GPRs. Requesting a very
large number of GPRs will make it difficult to hide memory latency, but the program will still execute and generate the
correct result.
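
The tradeoff between registers per vector and resident vectors can be sketched numerically. This is a minimal illustration that assumes the 128-register file is divided evenly among vectors with no other limit or allocation granularity; the function name `max_resident_vectors` is hypothetical and not part of the specification.

```python
TOTAL_GPRS = 128   # total general purpose registers, as stated in the text
MAX_REQUEST = 64   # most GPRs a single shader may request, as stated in the text


def max_resident_vectors(gprs_requested: int) -> int:
    """Upper bound on simultaneously resident vectors for one GPR request.

    Assumption: the register file is split evenly and nothing else limits
    residency. Real hardware may impose further constraints.
    """
    if not 1 <= gprs_requested <= MAX_REQUEST:
        raise ValueError("a shader must request between 1 and 64 GPRs")
    return TOTAL_GPRS // gprs_requested


for request in (4, 8, 16, 64):
    print(request, "GPRs ->", max_resident_vectors(request), "vectors")
```

Under these assumptions a shader that requests the maximum of 64 GPRs leaves room for only two resident vectors, which is why large requests make memory latency hard to hide.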
`
Most pixel programs are expected to need fewer than eight registers; vertex programs are expected to need fewer than
sixteen registers.

The number of registers a program needs is the maximum number of registers it needs at any instruction. If a program
needs only 3 general purpose registers nearly all of the time, except for a short period when it needs 8, it still needs to
allocate eight. A significant performance optimization is for the compiler to reorder the instructions to minimize the
number of needed registers.
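
The allocation rule above amounts to taking a peak over the program. The sketch below assumes a hypothetical per-instruction count of live registers (the list contents are invented for illustration) and shows why a short spike determines the whole allocation.

```python
def registers_to_allocate(live_per_instruction):
    """Return the GPR count a program must declare: its peak live-register count."""
    return max(live_per_instruction)


# Hypothetical program: 3 registers live almost everywhere, 8 for a short stretch.
original_order = [3, 3, 3, 8, 8, 3, 3]
# Hypothetical result of a compiler reordering that flattens the spike.
reordered = [3, 4, 5, 5, 4, 3, 3]

print(registers_to_allocate(original_order))
print(registers_to_allocate(reordered))
```

A lower peak after reordering means a smaller declared register count, and therefore more vectors can be resident at once.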
`
`
`
Notation: R0.W refers to bits 96 to 127 of register 0 (so does R0.A).
`
`1.1.2 Constants
`
There are also 512 constant registers available to vertex and pixel shaders in the primary command stream.

[Figure: layout of a constant register, with bit positions 127, 95, 63, 31, and 0 marked.]

Real-time has its own 256. Constants are physically part of the Sequencer unit. As will become clear from the rest
of this document, the content of the constants can be made available to the ALU units of the shader pipes in the form of one
of the possible ALU operation arguments. The ALU instruction word provides for that.

The constant file is shared between vertex shaders and pixel shaders. It is the driver's job to allocate one section to pixel
shaders and another to vertex shaders to match the D3D programming model; other APIs may allow more freedom.
`
To be able to support multiple textures easily, and to save hardware area, the texture state registers are stored in a separate
pool of constant registers. Each texture constant holds 192 bits of texture state. Rather than have four or six sets of texture
registers as we do in the R100, R200, and R300, by storing them in constant memory we can save area by reusing the logic
already needed to update the constant registers in order. Since any single texture instruction will only fetch from one texture,
we do not need the simultaneous access we would get with implementing this as "normal" registers.
`
`1.1.3 Previous Instruction Result
`
(This section is no longer accurate and needs updating)
Within an ALU clause the result of the previous operation is explicitly available, without requiring a register read
(due to an exposed pipeline delay, the result of the previous operation cannot be read from the register file without a
one-instruction delay slot). There are two distinct previous instructions, one scalar and one vector.
This register is not preserved between the end of one ALU clause and the beginning of another.
It can be used to avoid using another GPR if the result is not needed. Also, the output modifiers, which do affect the
result of an instruction written into GPRs, do not affect the Previous Result content.
`
`1.2 Initial state
`
`1.2.1 Vertex Shader
`
A vertex shader initially has the X value of R0 set to the vertex index. No other registers are filled. The vertex shader
must use the index to fetch the vertex data from the vertex array(s). The pointers to the vertex arrays should be placed
in texture constant registers by the driver.

1.2.2 Pixel Shader

The pixel shader has the interpolated values generated from the values exported by the vertex shader.

If the vertex shader exports to parameter 0, and the R400 is appropriately programmed, then GPR 0 in the pixel
shader contains the interpolated values for that parameter at that pixel.
`
`2. Program Format
`(This section is no longer accurate and needs updating)
`A pixel or vertex shader program consists of 16 clauses, eight texture clauses and eight alu clauses.
The instructions in a clause will be executed sequentially. If a given instruction is implementing, for example,
T*S+D (T = texture for SRC A, S = Specular for Source B, D = Diffuse for Source C), it's the Sequencer's task to
resolve the dependencies between the ALU clause and the respective texture clause. In other words, the sequencer
will not issue the ALU instruction using texture data as input to the shader pipe until the texture request has been
issued to and serviced by the texture pipe. In general, the Shader is not aware of the origin of the SRC A, SRC B and
SRC C data (texture, diffuse, specular, vertex parameters etc.). Three address pointers into the register files (one for
each operand) are all the shaders need to fetch these operands. In reality, as will become more evident later in this
document, there is no need for the pointer values to be passed to the shader units. This is related to the GPRs'
Read/Write Mechanism we have chosen to implement.
`
`3. ALU
`
`3.1 ALU structure
`
The ALU consists of two distinct units: the "Vector ALU" and the "Scalar ALU". The Vector ALU performs operations in
parallel across a 4-component vector, while the Scalar ALU performs operations on a single component of a vector
which is then replicated across all components. A single instruction will "co-issue" both a Vector and a Scalar
instruction. Almost all scalar instructions require SrcC as an operand. When the Vector operation is only using SrcA
and SrcB as operands (such as in a MUL (Multiply) instruction), the scalar pipe is free to use SrcC as it wishes. When
the Vector pipe is also consuming SrcC, such as in a three operand instruction like MULADD (Multiply and ADD), SrcC
is fixed for the scalar pipe. It's important to understand that the given scalar operation still occurs on SrcC. Under
most circumstances this will result in undesirable behavior unless the scalar operation is benign and has masked its
destination writes.

For more details on the overall structure of the Shader ALU, refer to the figures in Section 5 of this document.
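
The SrcC sharing rule for co-issue can be expressed as a small check. This is an illustrative sketch only: the operand counts cover just the two vector instructions named in the text (MUL and MULADD), and the helper name `scalar_may_choose_srcc` is hypothetical.

```python
# Source operand counts for the two vector opcodes the text names.
VECTOR_OPERAND_COUNT = {"MUL": 2, "MULADD": 3}


def scalar_may_choose_srcc(vector_opcode: str) -> bool:
    """Return True if the co-issued scalar op can pick SrcC freely.

    With a three-operand vector op, SrcC is already fixed by the vector pipe,
    and the scalar op still executes on that same SrcC.
    """
    return VECTOR_OPERAND_COUNT[vector_opcode] < 3


print(scalar_may_choose_srcc("MUL"))
print(scalar_may_choose_srcc("MULADD"))
```

When the check returns False, the text's advice applies: the scalar operation should be benign and should mask its destination writes.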
`
`
`
`
3.2 ALU instruction format

There are two opcodes present in the ALU instruction, one for the Vector operation and one for the Scalar operation. The
idea is that we can allow a 4-component vector operation (if the compiler permits) coissued with a Scalar operation.
The Scalar unit may use SRC C, depending on whether this source is being used by the vector operation. Please refer
to Section 8 of this document on the limitations of a Vector or Scalar instruction issuing.

Source select (three one-bit fields, each described identically)
    Select bit for selecting Constant vs. Register
    0: Constant
    1: Register

Vector Opcode (bits 92:88, 5 bits)
    Opcode for Vector instruction

SRC A Register/Constant Pointer (bits 87:80)
    Location of SRC A in the Register or Constant file
    If Register, Bits [87]-[80] denote:
    Bit [87]
        0: Do not execute ABS on input register
        1: Execute ABS on input register
    Bit [86]
        0: Logical register addressing
        1: Current Loop index relative register addressing
    Bits [85]-[80] location of SRC A in the register file
    If Constant, Bits [87]-[80] denote:
    Bits [87]-[80] location of SRC A in the constant file

SRC B Register/Constant Pointer
    Location of SRC B in the Register or Constant file
    Refer to SRC A Register/Constant Pointer

SRC C Register/Constant Pointer
    Location of SRC C in the Register or Constant file
    Refer to SRC A Register/Constant Pointer

Constant0 Logical/Relative
    The address pointer into the Constant file is relative to some base address
    register (works in conjunction with Relative Address Register Select)
    0: Logical constant addressing
    1: Relative constant addressing

Constant1 Logical/Relative
    The address pointer into the Constant file is relative to some base address
    register (works in conjunction with Relative Address Register Select)
    0: Logical constant addressing
    1: Relative constant addressing

Relative Address Register Select
    This bit determines the address register used as base register when indexing is
    relative (is used in conjunction with the Constant0 Logical/Relative and
    Constant1 Logical/Relative fields).
    0: Current Loop index relative
    1: Address Register relative

Predicate Select
    Bits [60:59]
    0X: No predication
    10: Predicated, 1 means skip, 0 means execute
    11: Predicated, 0 means skip, 1 means execute

Negate
    (row text illegible in the source)
`
`
`
SRC A Swizzle
    2 bits for each component
    Bits [52][51] - W channel swizzle
        00: leave W
        01: X
        10: Y
        11: Z
    Bits [50][49] - Z channel swizzle
        00: leave Z
        01: W
        10: X
        11: Y
    Bits [48][47] - Y channel swizzle
        00: leave Y
        01: Z
        10: W
        11: X
    Bits [46][45] - X channel swizzle
        00: leave X
        01: Y
        10: Z
        11: W

SRC B Swizzle
    2 bits for each component (refer to SRC A Swizzle)

SRC C Swizzle
    2 bits for each component (refer to SRC A Swizzle)

Scalar Opcode
    Opcode for the Scalar instruction

Scalar Clamp

Scalar Write Mask
    Defines which of the 32-bit words (four of them) in the scalar result is written
    back in the Register file. There's one bit per channel.
    Bit [23]
        0: Leave the current value
        1: Write Scalar W
    Bit [22]
        0: Leave the current value
        1: Write Scalar Z
    Bit [21]
        0: Leave the current value
        1: Write Scalar Y
    Bit [20]
        0: Leave the current value
        1: Write Scalar X

Vector Write Mask
    Defines which of the 32-bit words (four of them) in the vector result is written
    back in the Register file. There's one bit per channel.
    Bit [19]
        0: Leave the current value
        1: Write Vector W
    Bit [18]
        0: Leave the current value
        1: Write Vector Z
    Bit [17]
        0: Leave the current value
        1: Write Vector Y
    Bit [16]
        0: Leave the current value
        1: Write Vector X
`
`
... and scalar operations.

Scalar Destination Pointer (bits 15:8)
    Bit [15] denotes the destination of the ALU results
        0: Register file (Scalar and Vector)
        1: Export file (Scalar and Vector)
    If Register file destination, Bits [14]-[8] denote:
    Bit [14] determines whether the scalar destination address into Register file is
    logical or relative.
        0: Logical register addressing
        1: Current Loop index relative register addressing
    Bits [13]-[8] specify the address into the Register file for the result of scalar
    operation.
    If Export file destination, Bits [14]-[8] denote:
    Bit [14] determines parameter export masking behavior. See table in 3.2.1.4
    Bits [13]-[8] are unused, Must Be Zero

Vector Destination Pointer (bits 7:0)
    Bit [7] determines the ABS input modifier on all constants in instruction
        0: Do not execute ABS on all constants
        1: Execute ABS on all constants
    Bit [6] determines whether the vector destination address into Register or
    Export file is logical or relative.
        0: Logical register addressing (Must Be Zero for exports on R400)
        1: Current Loop Index relative register addressing
    If Register file destination, Bits [5]-[0] denote:
    Bits [5]-[0] specify the address into the Register file for the result of vector
    operation.
    If Export file destination, Bits [5]-[0] denote:
    Bits [5]-[0] specify the address into the Export file for the result of vector
    operation.

There's a total of 96 bits per instruction. The bit allocation and assignment for the different fields of the instruction
word was done under the limitation that they should be DWORD (32 bit) aligned.
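
As an illustration of how the fields above could be pulled out of an instruction word, here is a minimal sketch. It assumes the instruction is held as a single integer with bit 0 as the least significant bit, uses only the bit positions given in the field descriptions (vector opcode 92:88, SRC A pointer 87:80, predicate select 60:59, SRC A swizzle 52:45, write masks 23:20 and 19:16, destination pointers 15:8 and 7:0), and treats the top bit of the Scalar Destination Pointer field (bit 15) as the register/export selector. All function and key names are hypothetical.

```python
CHANNELS = "XYZW"  # index 0 = X ... index 3 = W


def bits(word: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) from the instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_write_mask(mask4: int) -> str:
    """Map a 4-bit write mask (bit 3 = W ... bit 0 = X) to channel names."""
    return "".join(c for i, c in enumerate(CHANNELS) if mask4 & (1 << i))


def decode_src_a_swizzle(swz8: int) -> str:
    """Decode the 8-bit SRC A swizzle, two bits per channel (X in the low bits).

    Per the swizzle table, each 2-bit code acts as an offset from the
    channel's own position: for X, 00=X 01=Y 10=Z 11=W; for W, 01=X, etc.
    """
    selected = []
    for ch in range(4):
        code = (swz8 >> (2 * ch)) & 0b11
        selected.append(CHANNELS[(ch + code) % 4])
    return "".join(selected)


def decode(word: int) -> dict:
    """Decode a subset of the ALU instruction fields."""
    return {
        "vector_opcode": bits(word, 92, 88),
        "src_a_pointer": bits(word, 87, 80),
        "predicate_select": bits(word, 60, 59),
        "src_a_swizzle": decode_src_a_swizzle(bits(word, 52, 45)),
        "scalar_write_mask": decode_write_mask(bits(word, 23, 20)),
        "vector_write_mask": decode_write_mask(bits(word, 19, 16)),
        "export": bool(bits(word, 15, 15)),
        "vector_dest_address": bits(word, 5, 0),
    }


# Hypothetical word: vector opcode 5, W channel swizzled to X,
# vector write mask W and X, export to address 7.
example = (5 << 88) | (0b01 << 51) | (0b1001 << 16) | (1 << 15) | 7
print(decode(example))
```

The swizzle string lists the source channel read for X, Y, Z, and W in that order, so an unswizzled operand decodes as XYZW.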
`
`3.2.1 ALU Instruction Word Interpretation
3.2.1.1 Argument Selection and Pointers
There can be a maximum of three sources (operands) required for an ALU operation of a vector type.
The R400 ALU instruction word definition provides location pointers into the Register file (GPRs) or Constant file for
each of the three sources (SRC A Register/Constant Pointer, SRC B Register/Constant Pointer, SRC C Register/Constant
Pointer).

3.2.1.1.1 Logical vs. Relative Registers
SrcA, SrcB and SrcC GPR locations denoted by the SRC A (B, C) Register/Constant Pointer fields of the ALU instruction
word can be logical as well as relative addresses. If relative, they are relative to the Current Loop Index present in the
Sequencer state.
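
A register-type source pointer could be decoded along these lines. The bit layout follows the SRC A Register/Constant Pointer description (top bit ABS, next bit logical/relative, low six bits the address). The arithmetic for relative addressing is an assumption: the text says only that the address is relative to the Current Loop Index, so this sketch simply adds the two and ignores any wrap-around. The name `decode_register_pointer` is hypothetical.

```python
def decode_register_pointer(pointer8: int, current_loop_index: int):
    """Decode an 8-bit register source pointer into (use_abs, gpr_address).

    Bit 7: ABS input modifier, bit 6: relative addressing, bits 5:0: address.
    Assumption: relative addressing adds the Current Loop Index to the address.
    """
    use_abs = bool(pointer8 & 0x80)
    relative = bool(pointer8 & 0x40)
    address = pointer8 & 0x3F
    if relative:
        address += current_loop_index  # assumed semantics, not stated in the text
    return use_abs, address


print(decode_register_pointer(0b01000101, current_loop_index=3))
print(decode_register_pointer(0b10000101, current_loop_index=3))
```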
`
3.2.1.1.2 Logical vs. Relative vs. Absolute Constants
Constants can also be addressed in logical and relative fashions along with an absolute mode. When relative, they
can be relative to either the Current Loop Index (CLI) or Address Register (AR) in the sequencer state. The truth table
below shows the instruction fields that are used to decode the nature of the constant values.

[Truth table: the Constant0 Logical/Relative and Constant1 Logical/Relative fields against the resulting addressing of
Constant0, Constant1 and Constant2. Recoverable cell values: Absolute; Logical; Relative, CLI; Relative, AR;
Same as Constant0; Same as Constant1.]

Note from the table that if both Constants are relative, they are relative to the same value, either the Current Loop
Index (CLI) or Address Register (AR).

Constant0 refers to the first constant in the instruction; Constant1 and Constant2 refer to the second and third
constants in the instruction respectively.
`
3.2.1.2 Input and Output Modifiers
The R400 ALU Instruction word definition provides for only three input modifiers for each of the three sources:
Negate, ABS and Swizzle. When the source is a Constant value, the input modifier ABS applies to all constants in the
instruction. In all situations, ABS is always applied before Negate. Input modifiers do not apply to PreviousScalar.

The R400 ALU Instruction word provides for two output (result) modifiers: Write Mask, which only affects the results
going into GPRs but not the PreviousScalar, and Clamp.
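
The ordering of the ABS and Negate input modifiers matters when both are enabled. A minimal sketch, with the hypothetical helper `apply_input_modifiers` operating on one component:

```python
def apply_input_modifiers(value: float, use_abs: bool, negate: bool) -> float:
    """Apply ABS first, then Negate, as the text specifies."""
    if use_abs:
        value = abs(value)
    if negate:
        value = -value
    return value


# With both modifiers set the result is always -|x|, whatever the input sign.
print(apply_input_modifiers(-2.0, use_abs=True, negate=True))
print(apply_input_modifiers(3.0, use_abs=True, negate=True))
```

Applying the modifiers in the opposite order would instead always yield a non-negative value, so the fixed ABS-then-Negate order is what makes -|x| expressible.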
`
`3.2.1.3 GPR write-backs
The table below describes the precedence order for "mixed" use of Scalar Write Mask and Vector Write Mask for
GPR write-backs when the Scalar Destination Pointer and Vector Destination Pointer in the instruction word
specify the same GPR. This is done per component (each MASK field is 4 bits wide, one bit per component/channel).

[Table: Scalar Write Mask and Vector Write Mask bit combinations against the result of the GPR write-back.
Recoverable entry: "Don't write (mask)".]
`
3.2.1.4 Export and Predicate related decoding

Exports are allowed from either the Scalar or Vector Pipe. Similar to the GPR write-backs, masking of export data is
permitted. The mask is present in the ALU instruction word. When exporting, the export address used is the Vector
Destination Pointer present in the instruction word. The Scalar Destination Pointer in this case is always ignored.
The table below describes the "mixed" use of Scalar Write Mask and Vector Write Mask per component when
exports are coissued. The ability to generate 0.0f or 1.0f during export provides one method for defaults.

[Table: Scalar Write Mask, Vector Write Mask and Bit[14] against the result of the export. Recoverable results:
Bit[14] 0: Don't write (mask); Bit[14] 1: Write 0.0f; Write 1.0f.]

A few other export related definitions and restrictions:
1) Exporting of 'Color/Fog' is a special case.
a) When exporting Fog, Color must be exported at the same time. Fog will be exported in the Scalar
pipe and Color in the Vector pipe.
b) The SP produces a final export Color by obeying the vector/scalar mask rules for exports. The
SP does not see bit[14], so when the vector and scalar masks for a given channel are 0 a 0.0f is
generated. The SP then merges Fog (always from the scalar pipe) into the final export color.
Finally, channel masking is applied in (SQ/SX) only when the scalar and vector masks for that
channel are 0 and bit[14] is 0.
c) Note for Fog to work correctly, SW should always output the same Fog factor from the scalar pipe
for all masked writes, and all channels of Color should be written before the shader exits.
`
3.2.1.5 Export Types and Addresses
The location where the data should be put in the event of an export is specified in the destination pointer field of the
ALU instruction word. Following is a list of the possible types of exports and the range of addresses.

Vertex Shading
0:15   - 16 parameter cache
16:31  - Empty (Reserved?)
32     - Export Address
33:37  - 5 vertex exports to the frame buffer and index
38:46  - Empty
47     - Debug Address
48:52  - 5 debug export (interpret as normal memory export)
53:59  - Empty
60     - export addressing mode
61     - Empty
62     - position
63     - Sprite size export that goes with position export
         (X = point size, Y = edge flag is bit 0, Z = VtxKill is bitwise OR of bits 30:0 (any bit other than
         sign means VtxKill))

Pixel Shading
0      - Color for buffer 0 (primary)
1      - Color for buffer 1
2      - Color for buffer 2
3      - Color for buffer 3
4:15   - Empty
16     - Buffer 0 Color/Fog (primary)
17     - Buffer 1 Color/Fog
18     - Buffer 2 Color/Fog
19     - Buffer 3 Color/Fog
20:31  - Empty
32     - Export Address
33:37  - 5 exports for multipass pixel shaders.
38:46  - Empty
47     - Debug Address
48:52  - 5 debug exports (interpret as normal memory export)
60     - export addressing mode
61     - Z for primary buffer (Z exported to "X" component)
62:63  - Empty
`
`
`
`3.3 ALU Opcodes
`The following table represents the ALU operations/opcodes supported by the Vector unit.
[Table: vector opcode, operation, notes. Opcode names and parts of the operation column are illegible in the source;
the recoverable rows follow in table order.]

Per component add. 2 operands: possible coissue.

Per component [...]. 2 operands: possible coissue.

Per component min. 2 operands: possible coissue.

[...] operands: possible coissue.

Per component set equal. 2 operands: possible coissue.

Per component set greater than. 2 operands: possible coissue.

Per component set greater than equal. 2 operands: possible coissue.

Per component set not equal. 2 operands: possible coissue.

Per component "fractional" part of SrcA. 1 operand: possible coissue.
    Result = SrcA - FLOOR(SrcA);

Per component [...] SrcA. 1 operand: possible coissue.
    Result = TRUNC(SrcA);

Per component floor. 1 operand: possible coissue.
    Result = TRUNC(SrcA);
    If ((SrcA < 0.0f) && (SrcA != Result))
        Result += -1.0f;

Per component multiply-add. 3 operands: no coissue.
    Result = SrcA * SrcB + SrcC;

Per component conditional move equal. 3 operands: no coissue.
    If (SrcA == 0.0f) Result = SrcB; Else Result = SrcC;

Per component conditional move greater than equal. 3 operands: no coissue.
    If (SrcA >= 0.0f) Result = SrcB; Else Result = SrcC;

Per component conditional move greater than. 3 operands: no coissue.
    If (SrcA > 0.0f) Result = SrcB; Else Result = SrcC;

4 component dot product. Result replicated in all channels. 2 operands: possible coissue.

3 component dot product. Result replicated in all channels. 2 operands: possible coissue.

2 component dot product with add. Result replicated in all channels. 3 operands: no coissue.

Cubemap instruction. 2 operands (SrcA = Rn.zzxy, SrcB = Rn.yxzz): possible coissue. [...] coordinates.

4 component maximum. Result replicated in all four channels.
    Result = max(SrcA.X, SrcA.Y, ...);

Predicate counter increment equals. Update predicate register. Result replicated in all four channels.
2 operands: possible coissue. See note below.
    If ((SrcA.W == 0.0f) && (SrcB.W == 0.0f)) { Result = 0.0f; SetPredicateReg(Execute); }
    Else { Result = SrcA.W + 1.0f; SetPredicateReg(Skip); }

Predicate counter increment not equals. Update predicate register. Result replicated in all four channels.
2 operands: possible coissue. See note below.
    If ((SrcA.W != 0.0f) && ...) { Result = 0.0f; SetPredicateReg(Execute); }
    Else { Result = SrcA.W + 1.0f; SetPredicateReg(Skip); }

Predicate counter increment greater than. Update predicate register. Result replicated in all four channels.
2 operands: possible coissue. See note below.
    If ((SrcA.W > 0.0f) && (SrcB.W == 0.0f)) { Result = 0.0f; SetPredicateReg(Execute); }
    Else { Result = SrcA.W + 1.0f; SetPredicateReg(Skip); }

Predicate counter increment greater than equals. Update predicate register. Result replicated in all four channels.
2 operands: possible coissue. See note below.
    If ((SrcA.W >= 0.0f) && ...) { Result = 0.0f; SetPredicateReg(Execute); }
    Else { Result = SrcA.W + 1.0f; SetPredicateReg(Skip); }

Per component pixel kill equal. Set KILL bit. 2 operands: possible coissue. See note below.

Per component pixel kill greater than. Set KILL bit. 2 operands: possible coissue. See note below.

Per component pixel kill greater than equal. Set KILL bit. 2 operands: possible coissue. See note below.

Per component pixel kill not equal. Set KILL bit. 2 operands: possible coissue. See note below.

Operation column fragments for the rows that follow:
    Result.W = SrcB.W;
    Result.Z = SrcA.Z;
    Result.Y = SrcA.Y * SrcB.Y;
    Result.X = 1.0f;

    Result = MAX(SrcA, SrcB);
    SQResultF = FLOOR(SrcA.W + 0.5f);
    If (SQResultF >= -256.0f) ...
`Computes dlatanc