`
ORIGINATE DATE: 23 January, 2001
EDIT DATE: 4 September, 2015
DOCUMENT-REV. NUM.: GEN-CXXXXX-REVA
PAGE: 1 of 12
`
`Author:
`Andrew Gruber, Andi Skende
`
`
`
`Issue To:
`
`
`
`Copy No:
`
`Shader Processor
`
`ver 0.1
`
`
`
Overview: This document describes the overall architecture of the shaders, their interfaces, and their partitioning into functional blocks, as well as the timing of the shader pipeline. It is intended for use by hardware designers.
`
`AUTOMATICALLY UPDATED FIELDS:
`Document Location
`SHD PC
`Current Intranet Search Title: Shader Processor
`
`
`
`APPROVALS
Name/Dept        Signature/Date
`
`
`
`
`
`
`
`Remarks:
`
`
`
`
`
`
THIS DOCUMENT CONTAINS INFORMATION THAT COULD BE
SUBSTANTIALLY DETRIMENTAL TO THE INTEREST OF ATI TECHNOLOGIES
INC. THROUGH UNAUTHORIZED USE OR DISCLOSURE.
`
`
`
`
`
`
`
`
“Copyright 2001, ATI Technologies Inc. All rights reserved. The material in this document constitutes an unpublished work created in 2001. The use of this copyright notice is intended to provide notice that ATI owns a copyright in this unpublished work. The copyright notice is not an admission that publication has occurred. This work contains proprietary information and trade secrets of ATI. No part of this document may be used, reproduced, or transmitted in any form or by any means without the prior written permission of ATI Technologies Inc.”

Exhibit 2042.doc
16774 Bytes *** © ATI Reference Copyright Notice on Cover Page © *** 09/04/15 04:44 PM
ATI 2042
LG v. ATI
IPR2015-00325
AMD1044_0017952
ATI Ex. 2012
IPR2023-00922
`
`
`
`
`
`
R400 Shader Processor Model
`
Table Of Contents

1. STATE
1.1 Shader State
1.1.1 GPR
1.1.2 Constant Registers
1.2 Texture/Memory State
1.3 Initial State
1.3.1 Vertex Shader
1.3.2 Pixel Shader
1.3.3 2D Shader
1.3.4 RealTime Shader
2. PROGRAM FORMAT
3. ALU
3.1 ALU instruction format
3.2 ALU Opcodes
3.3 Macro opcodes
4. TEXTURE/MEMORY
4.1 Instruction Format
4.2 Opcodes
`
`
`
`
`
`
`
`
Revision Changes:

Rev 0.0 (Andi Skende)
Date: May 09, 2001
Initial revision. Document started.

Rev 0.1 (Andi Skende)
Date: May 09, 2001
Initial revision. Updated; added the instruction format, initial block diagrams and preliminary interface description.

Rev 0.2 (Andi Skende)
Date: May 10, 2001
Initial revision. A more detailed description of the SP->RB interface as well as the RE/Sequencer->SP interface.
`
`
`Introduction
`
`
The shader pipeline processes elements (pixels or vertices) in groups ("vectors") of sixteen. The R400 operates on 2x2 tiles of pixels, called quads. There will be four sets of four shader pipes. For ease of reference, and to reflect the relative position within the quad of the pixel that each set of shader blocks operates on, we name these sets UL (upper left), UR (upper right), LL (lower left) and LR (lower right). Please refer to the R400 Shader Processor Model (architectural specification) for the overall functionality of the shaders from the programmer's viewpoint.
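
As a rough illustration of the grouping described above, the following Python sketch maps the sixteen elements of a vector onto four quads and the four pipe sets. The element ordering within a vector (quad by quad, then UL, UR, LL, LR inside each quad) and the function names are assumptions made for illustration only; this document does not specify them.

```python
# Assumed ordering (not from the specification): element i belongs to
# quad i // 4, and its position inside the 2x2 quad is i % 4.
QUAD_POSITIONS = ("UL", "UR", "LL", "LR")
VECTOR_SIZE = 16  # elements (pixels or vertices) per vector


def pipe_assignment(element):
    """Return (pipe_set, quad_index) for one element of a 16-wide vector."""
    if not 0 <= element < VECTOR_SIZE:
        raise ValueError("element index must be in 0..15")
    quad_index, position = divmod(element, len(QUAD_POSITIONS))
    return QUAD_POSITIONS[position], quad_index


def elements_for_set(pipe_set):
    """List the elements of a vector handled by one of the four pipe sets."""
    return [e for e in range(VECTOR_SIZE) if pipe_assignment(e)[0] == pipe_set]
```

Under this assumed ordering each pipe set handles four elements of every vector, one from each quad.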
`
`
`
`
`
`
`
`
`1. State
`
`
`1.1 Shader State
`
`1.1.1 GPR
`
The general purpose registers are 128 bits wide, composed of four 32 bit values. Depending on the operation these values are interpreted as RGBA, XYZW, STQW, UVQW, YUVA, and so on. To simplify matters, the only two aliases used here are XYZW and RGBA.
`
To hide the latency of memory accesses the shader pipe will switch between different vectors. This is the same as the idea of "microthreading" that some advanced CPUs are investigating. The large register file is split between the vectors executing in the shader pipe. The management of the shader register file is automatic, and not visible to a program executing on a vector, except that a program is required to declare the number of GPRs it needs to execute. The hardware will not start a vector until the required number of registers is available. There is a direct tradeoff between the number of registers each program/vector needs and the number of vectors that can be simultaneously resident. If there are too few vectors resident, then the latency of memory accesses can no longer be hidden and performance suffers.
`
There are a total of 128 registers. We do not yet know how many registers per vector is too many, that is, the point at which performance starts suffering.

It is possible for a single program/vector to request all 128 registers. This will make it impossible to hide memory latency, but the program will still execute and generate the correct result.
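
The tradeoff between registers per vector and resident vectors can be sketched with a little arithmetic. The snippet below assumes, purely for illustration, that the 128 registers are divided among vectors with no other overhead and that every resident vector runs the same program; the function name is hypothetical.

```python
TOTAL_GPRS = 128  # total general purpose registers, as stated in the text


def max_resident_vectors(gprs_per_vector):
    """Upper bound on simultaneously resident vectors.

    Assumes (not stated in the specification) that registers are the only
    limiting resource and that all vectors declare the same register count.
    """
    if not 1 <= gprs_per_vector <= TOTAL_GPRS:
        raise ValueError("a program must declare between 1 and 128 GPRs")
    return TOTAL_GPRS // gprs_per_vector
```

For example, a program declaring 8 registers would allow at most 16 resident vectors under these assumptions, while one declaring all 128 leaves a single vector and therefore nothing to switch to while a memory access is outstanding.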
`
Most pixel programs are expected to need fewer than eight registers; vertex programs are expected to need fewer than sixteen registers.
`
The number of registers a program needs is the maximum number of registers it needs at any instruction. If a program needs only three registers nearly all of the time, except for a short period when it needs eight, it still needs to allocate eight. A significant performance optimization is for the compiler to reorder the instructions to minimize the number of needed registers.
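
A minimal sketch of the "maximum at any instruction" rule follows. It models each value held in a GPR as a live range of instruction indices; the representation, the function name and the example ranges are illustrative assumptions, not part of this specification.

```python
def registers_needed(live_ranges):
    """Peak number of simultaneously live values.

    live_ranges: list of (first, last) instruction indices, inclusive,
    one entry per value the program keeps in a GPR (hypothetical model).
    """
    if not live_ranges:
        return 0
    last_instruction = max(last for _, last in live_ranges)
    peak = 0
    for i in range(last_instruction + 1):
        live = sum(1 for first, last in live_ranges if first <= i <= last)
        peak = max(peak, live)
    return peak


# Three values live for the whole program, plus five temporaries that are
# all live at the same time for a short period.
bunched = [(0, 9)] * 3 + [(4, 5)] * 5
# The same temporaries after a hypothetical reordering so they do not overlap.
staggered = [(0, 9)] * 3 + [(2, 2), (3, 3), (4, 4), (5, 5), (6, 6)]
```

Here `registers_needed(bunched)` is 8 even though only three values are live most of the time, while the reordered `staggered` program needs 4, which is the kind of saving a compiler reordering pass would aim for.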
`
An open issue is whether the pipeline will need GPR0 to store pixel related information (coverage mask, position, Z, W). If we choose to do this (to avoid having a separate memory for this data) then GPR0 is unavailable as a general register.
`
`
[Register file diagram: GPRs R0, R1, ..., R127, each 128 bits wide, with bit 0 at the right.]

Notation:

R0.A refers to bits 96 to 127 of register zero. So does R0.W.
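
The notation can be illustrated with a small sketch that maps a component name to its bit range in a 128-bit register. Only the A/W range (bits 96 to 127) is stated for GPRs; the placement of the other three components follows the constant register layout shown in section 1.1.2 and is assumed to apply to GPRs as well. The names used are hypothetical.

```python
# Component index within a 128-bit register. A/W = bits 96..127 is stated in
# the text; the other three follow the constant register diagram (assumed to
# apply to GPRs too).
COMPONENT_INDEX = {"r": 0, "x": 0, "g": 1, "y": 1, "b": 2, "z": 2, "a": 3, "w": 3}


def component_bits(name):
    """Return (low_bit, high_bit) of a component such as 'A' or 'w'."""
    index = COMPONENT_INDEX[name.lower()]
    return 32 * index, 32 * index + 31


def extract(register_value, name):
    """Extract one 32-bit component from a register held as a Python int."""
    low, _ = component_bits(name)
    return (register_value >> low) & 0xFFFFFFFF
```

With this layout `component_bits("A")` and `component_bits("W")` both give (96, 127), matching the notation above.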
`
`1.1.2 Constant Registers
There are also (192?) constant registers:

 127      95       63       31       0    Const
|  A/W   |  B/Z   |  G/Y   |  R/X   |    C0
|  ...   |  ...   |  ...   |  ...   |    ...
|        |        |        |        |    C191
`
These are ONLY available to vertex and pixel shader programs in the primary command stream. They should not be used for real time stream pixel shaders, or 2D shaders.
`
`
`
`
The constant registers are shared between vertex shaders and pixel shaders. It is the driver's job to allocate one section to pixel shaders and another to vertex shaders to match the D3D programming model; other APIs may allow more freedom.
`
NEW: Constant registers are also used to hold texture/memory fetch state.
To support multiple textures easily, and to save hardware area, the texture registers are stored in constant registers. A pair of constant registers holds 256 bits of texture state. Rather than having four or six sets of texture registers as we do in the R100, R200, and R300, by storing them in the constant memory we can save area by reusing the logic already needed to update the constant registers in order. Since any single texture instruction will only fetch from one texture, we do not need the simultaneous access we would get by implementing these as "normal" registers. The driver will probably decide to allocate a fixed number of the constant registers as texture registers.
`
The constant registers are backed up by control logic to ensure that a pixel sees the correct state, even when partial state updates are completed. I want to save area in this logic by having the updates occur on a 256 bit granularity. Here is how I expect the driver to work: The driver maintains a copy of the constant registers in cacheable system memory. When the driver needs to upload a change of the constants to the chip (just before drawing) it needs to copy two sequential aligned 128 bit words from the system memory version to the indirect buffer, even if only a 32 bit word within the two constant values has changed. Since the CPU will read at least the full 256 bits into its cache, the only performance penalty will be the second 128 bit write.
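
The expected driver behaviour can be sketched as follows. The sketch tracks which 32-bit words of the shadow copy have changed and returns the aligned register pairs that would have to be copied to the indirect buffer. The dirty-word tracking and all names are assumptions for illustration; the text only states the 256-bit granularity.

```python
WORDS_PER_REGISTER = 4     # four 32-bit words per 128-bit constant register
REGISTERS_PER_UPDATE = 2   # updates occur on a 256-bit (two register) granularity


def upload_ranges(dirty_words):
    """Return the aligned (first_register, last_register) pairs to upload.

    dirty_words: indices of changed 32-bit words in the driver's shadow copy
    (a hypothetical bookkeeping scheme, not described in the specification).
    """
    pairs = sorted({(word // WORDS_PER_REGISTER) // REGISTERS_PER_UPDATE
                    for word in dirty_words})
    return [(pair * REGISTERS_PER_UPDATE, pair * REGISTERS_PER_UPDATE + 1)
            for pair in pairs]
```

A change to a single word of C1, for instance, results in both C0 and C1 being copied, which is the "second 128 bit write" penalty mentioned above.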
`
`1.1.3 Previous Instruction
`
Within an ALU clause the result of the previous operation is explicitly available, without requiring a register read. (In fact, due to an exposed pipeline delay, the result of the previous operation cannot be read from the register file without a one instruction delay slot.)

This register is not preserved between the end of one ALU clause and the beginning of another.

It can be used to avoid using another GPR if the result is not needed afterwards.

 127      95       63       31       0
|  A/W   |  B/Z   |  G/Y   |  R/X   |    Prev
`
`1.1.4 Texture Temporaries
There are two texture temporary registers:

 63       47       31       15       0
|   A    |   B    |   G    |   R    |    Tt0
|        |        |        |        |    Tt1
`
`
`
They are used to implement higher order filters (tri-linear, tri-linear from a volume texture, bi-cubic, aniso, arbitrary filters).

Tt0 can be viewed as an accumulation buffer. The result of the bilinear blend can be written into Tt0, after being summed with the value that is already there.

A trilinear filter can be done with two instructions:

Tt0 = texture(address, and rest of state needed, but with mipcntl set to "lower mip level")
R = Tt0 + texture(address, and rest of state needed, but with mipcntl set to "upper mip level")
`
`Volume textures and mipmapped volume textures are implemented in the same way.
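
The two-instruction trilinear sequence can be modelled roughly as below. The sketch assumes that each fetch returns the bilinear result of one mip level already scaled by that level's trilinear weight, that the "lower" level is floor(lod), and that the bilinear fetch is supplied as a callable; none of these details are spelled out in this document.

```python
import math


def fetch(bilinear, lod, mip_mode):
    """Model of one texture instruction (assumed behaviour).

    bilinear: callable taking an integer mip level and returning the
    bilinear-filtered value from that level (hypothetical stand-in).
    """
    lower = math.floor(lod)
    frac = lod - lower
    if mip_mode == "lower":
        return (1.0 - frac) * bilinear(lower)
    if mip_mode == "upper":
        return frac * bilinear(lower + 1)
    raise ValueError("mip_mode must be 'lower' or 'upper'")


def trilinear(bilinear, lod):
    """Two fetches, the first accumulated in Tt0, the second summed into R."""
    tt0 = fetch(bilinear, lod, "lower")          # Tt0 = texture(..., lower mip level)
    return tt0 + fetch(bilinear, lod, "upper")   # R = Tt0 + texture(..., upper mip level)
```

The same accumulate-then-sum pattern would apply to the two z slices of a volume texture.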
`
Tt1 is used for implementing filters of arbitrary size. For every four samples in the filter two accesses are made: the first access fetches the filter weights, the second fetches the texture values and uses the contents of Tt1 as the weights instead of a bi-linear filter.

We will have explicit support for bi-cubic filters and separable filters, to avoid the doubled cycles of the previous method.
`
`1.2 Texture/Memory State
`
`
Texture/memory fetch state is stored in the constant registers; each texture uses two sequential 128 bit registers to hold its 256 bits of state.
`
`The contents of these constant registers were stored in normal registers in previous chips.
`
`A very early version of the data stored in a texture/memory constant register is:
`
Field            | Size | Description
Min_MIP_level    | 4    | Clamp mip map level to this level
Max_MIP_level    | 4    | Clamp mip map level to this level
First_MIP_level  | 4    | First mip map level in tree. (do we want/need this?)
Clamp_S          | 3    | Clamp/wrap/
Clamp_T          | 3    |
Clamp_W          | 3    | Clamp control for volume textures
Border_mode      | 1    |
Tx_format        | 5    |
Non_power2       | 1    | 0- texture is in range 0 to 1 (power2); after clamp/mirror, must be multiplied by txwidth/txheight
                 |      | 1- texture has been multiplied by the texture size in the pixel shader. (need to work out how to deal with clamp modes)
TXWIDTH          | 4    | Texture width (or face0 width)
TXHEIGHT         | 4    | Texture height (or face0 height)
TXDEPTH          | 4    | Texture depth (volume textures)
TXWIDTH_f1       | 4    |
TXHEIGHT_f1      | 4    |
TXWIDTH_f2       | 4    |
TXHEIGHT_f2      | 4    |
TXWIDTH_f3       | 4    |
TXHEIGHT_f3      | 4    |
TXWIDTH_f4       | 4    |
TXHEIGHT_f4      | 4    |
TXWIDTH_f5       | 4    |
TXHEIGHT_f5      | 4    |
Alpha_mask?      | 1    |
Chroma_key?      | 1    |
Tex_coord_typ    | 3    |
LOD_BIAS         | 14   |
TX_PITCH         | 14   | Number of bits will decrease, used with non power 2 textures
Offset           | 32   | Texture offset (includes endian and tile control)
Limit            | 32   | Any memory accesses > limit will be killed, and the pixel that made the request will also be killed. If the access was from a vertex shader, then the vertex shader will force the x value of the vertex to be NaN, which will kill all triangles that attempt to use the vertex.
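
The Limit behaviour can be sketched as a small check. The sketch assumes the address and the limit are expressed in the same units and models the outcome as a tuple; the function name and return shape are hypothetical.

```python
import math


def check_fetch(address, limit, from_vertex_shader, vertex_x=0.0):
    """Return (fetch_allowed, pixel_killed, vertex_x) for one memory access.

    Assumes address and limit use the same units (not stated in the text).
    """
    if address <= limit:
        return True, False, vertex_x
    if from_vertex_shader:
        # x is forced to NaN; triangles that use this vertex are killed later.
        return False, False, math.nan
    # Out-of-range access from a pixel: the access and the pixel are killed.
    return False, True, vertex_x
```

Only accesses strictly greater than the limit are rejected, matching the "> limit" wording in the table.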
`
`1.3 Initial state
`
`1.3.1 Vertex Shader
`
A vertex shader initially has the x value of R0 set to the vertex index. No other registers are filled. The vertex shader must use the index to fetch the vertex data from the vertex array(s). The pointers to the vertex arrays should be placed in constant registers by the driver.
`
`1.3.2 Pixel Shader
`
The pixel shader has the interpolated values generated from the values exported by the vertex shader. If the vertex shader executed EXPXY, and the appropriate control bit in the rasterizer is set, then register 0 contains the x,y,z,w of the pixel (screen space). If the pixel shader wants a world space x,y,z,w the vertex shader should output that.
`
`1.3.3 2D Shader
To be defined
`
`1.3.4 RealTime Shader
`To be defined
`
`2. Program Format
`
A pixel or vertex shader program consists of 16 clauses: eight texture clauses and eight ALU clauses. The instructions in a clause will be executed sequentially.
`
`3. ALU instruction format and other instruction related issues
`
`
`
`
`3.1 ALU instruction format
`
`
Field                              | Size | Description
Vector Opcode                      | 5    | Opcode
Scalar/Alpha Opcode                | 7    | Opcode for the Scalar or Alpha channel instruction
Scalar Source Select               | 1    | Selects the input for the scalar operation out of SRC B or SRC C when a scalar instruction is coissued with a vector operation. The vector operation has to be a 2 source instruction.
SRC A RGB Select                   | 2    |
SRC B RGB Select                   | 2    |
SRC C RGB Select                   | 2    |
SRC A Alpha Select                 | 2    |
SRC B Alpha Select                 | 2    |
SRC C Alpha Select                 | 2    |
SRC A RGB Reg/Constant Pointer     | 8    | Location of Source A in the register file
SRC B RGB Reg/Constant Pointer     | 8    | Location of Source B in the register file
SRC C RGB Reg/Constant Pointer     | 8    | Location of Source C in the register file
SRC A Alpha Reg/Constant Pointer   | 8    | Location of Source A in the register file
SRC B Alpha Reg/Constant Pointer   | 8    | Location of Source B in the register file
SRC C Alpha Reg/Constant Pointer   | 8    | Location of Source C in the register file
SRC A RGB Arg Mod                  | 2    | Argument A modifier
SRC B RGB Arg Mod                  | 2    | Argument B modifier
SRC C RGB Arg Mod                  | 2    | Argument C modifier
SRC A Alpha Arg Mod                | 2    | Argument A modifier on the alpha channel
SRC B Alpha Arg Mod                | 2    | Argument B modifier on the alpha channel
SRC C Alpha Arg Mod                | 2    | Argument C modifier on the alpha channel
SRC A swizzle                      | 12   | 3 bits for each component
SRC B swizzle                      | 12   | 3 bits for each component
SRC C swizzle                      | 12   | 3 bits for each component
Scalar/Alpha Output Mod            | 3    | Output modifier for the result of the Scalar or Alpha channel operation
Vector Export flag                 |      | TBD
Vector Output Modifier             | 3    | Output modifier on the vector result (RGB) when an alpha operation is being coissued, or ARGB when a scalar operation is being coissued
Scalar/Alpha Clamp                 | 1    |
Vector Clamp                       | 1    |
Scalar/Alpha Write Mask            | 4    | Defines which of the 32 bit words (four of them) in the result are written back to the register file
Vector Write Mask                  |      | Defines which of the 32 bit words (four of them) in the result are written back
Scalar/Alpha result pointer        |      | Specifies the address into the register files for the result of the scalar/alpha operation
Vector result pointer              |      |
Scalar Export flag                 |      | TBD
`
3.2 ALU Opcodes
The following opcodes are native, core opcodes:

Name | Function                                                | Notes
MACC | D = A*B + C                                             | (add is A*1.0 + C) (mul is A*B + 0.0) (nop/passthrough/move is A*1.0 + 0.0)
MSUB | D = A*B - C                                             | (sub is A*1.0 - C)
MRSB | D = C - A*B                                             |
DOT2 | D = (A.x * B.x) + (A.y * B.y)                           |
DOT3 | D = (A.x * B.x) + (A.y * B.y) + (A.z * B.z)             |
DOT4 | D = (A.x * B.x) + (A.y * B.y) + (A.z * B.z) + (A.w * B.w) |
RECP | D = 1/A.w                                               | Use broadcast to select something other than w
CMxx | D = A if C xx 0.0, else B                               | xx can be: gt, gte, eq (lt, lte, ne can be generated by swapping A and B)
CLMP | D = A if (B > A > C), else B if (A > B), else C         |
ABSV | D = A if A > 0 else B                                   |
CEIL | D = A; the smallest integer D such that D >= A          |
FLOR | D = A rounded to the nearest integer                    |
RndV | D = A truncated                                         |
FRAC | D = A - floor(A)                                        |
MINV | D = min(A,B)                                            |
MAXV | D = max(A,B)                                            |
Area | D = area(A)                                             | Possible opcode: Sets D to A.w,A.w,A.w,A.w where each A is from a different pixel in the quad. Can be used to calculate area for LOD calculations, useful for antialiasing procedural shaders.
Exp  | D = Pow(a)                                              | Possible opcode, otherwise texture lookup
RSQR | D = 1/sqrt(a)                                           | Possible opcode, otherwise texture lookup
Log2 | D = Log(A, base = 2)                                    | Possible opcode, otherwise table lookup
Log  | D = Log(A, base = B)                                    | Possible opcode, otherwise table lookup
Pow  | D = A to the B'th power                                 |
SCLP | D = ABC                                                 | Concatenate clip code test results into a single DWORD
CUBE | D = A                                                   | Find the largest of x,y,z, place the reciprocal of that value in w, set x and y to the two remaining values, and identify the face as the integer 0 to 5 in z. Used to set up for a cube map.
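
A few of the core opcodes can be written down as a small reference model, which also shows how add, mul and move fall out of MACC. The sketch treats a register as a list [x, y, z, w] of Python floats, applies CMxx per component, and returns the dot product as a single scalar; these modelling choices and the function names are assumptions, not statements about the hardware.

```python
import math

ONE = [1.0, 1.0, 1.0, 1.0]
ZERO = [0.0, 0.0, 0.0, 0.0]


def macc(a, b, c):
    """MACC: D = A*B + C, applied to each component."""
    return [x * y + z for x, y, z in zip(a, b, c)]


def add(a, c):
    return macc(a, ONE, c)       # add is A*1.0 + C


def mul(a, b):
    return macc(a, b, ZERO)      # mul is A*B + 0.0


def mov(a):
    return macc(a, ONE, ZERO)    # move is A*1.0 + 0.0


def dot3(a, b):
    """DOT3 on the x, y, z components; returned as one scalar (assumption)."""
    return sum(a[i] * b[i] for i in range(3))


def cmgt(a, b, c):
    """CMgt: D = A if C > 0.0, else B (per component, by assumption)."""
    return [x if z > 0.0 else y for x, y, z in zip(a, b, c)]


def frac(a):
    """FRAC: D = A - floor(A)."""
    return [x - math.floor(x) for x in a]
```

Expressing add, mul and move through `macc` mirrors the notes column of the MACC row.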
`
`
`
`
`
`
The export opcodes must be in the last ALU clause:

Name  | Function                                         | Notes
EXPP  | Export position: position = A, clipcodes = B     | Export position and clip codes of vertex, end vertex shader
EXP   | Export vertex component, {D} = A                 | Export vertex component to vertex cache, 16 possible destinations
EXPXY | Placeholder, {0} = 1,0,0,0                       | Placeholder for the rasterizer to put pixel position and Z into pixel shader.
EXPC  | Export color                                     | Export color of pixel, end pixel shader
EXPZ  | Export Z                                         | Export Z of pixel
`
`
`
`
`
`There are some extraction and data conversion opcodes to be added.
`There will also be a specialized opcode for cube maps.
`
The export opcodes will be removed in a future version, and replaced with new destinations for general operations. Exports must still be the last operations executed.
`3.3 Macro opcodes
These instructions are NOT implemented in the R400, but their functionality can be implemented with the opcodes shown.
`
4. Texture/Memory
`
`4.1 Instruction Format
`
Destination control:

Field          | Size | Description
Initialize     | 1    | If set, destination register is set to 1,0,0,0 before any writes are made (w,z,y,x) (a,b,g,r)
Wmask          | 4    | Write mask for result of lookup
Channels       | 2    | 0: 1 channel, duplicate across all four channels
               |      | 1: 2 channels, d.x,d.y = a, d.z,d.w = b
               |      | 2: 2 channels, d.x,d.z = a, d.y,d.w = b
               |      | 3: 4 channels
Data_Format    | 4    | 0: unsigned int
               |      | 1: none - just write to destination register
               |      | others (Z, apple YUV, mpeg, etc.)
Bias           | 1?   | Do we want a bias other than -128?
Scale          | >4   | Result value is (range(s)+bias)*scale
Range          | 1    | 0 to 1 or 0 to 255/256 (or 65535/65536 etc.)
Texture        | 7    | Which texture we want to fetch from
Texture_t      | 1    | Texture or linear memory array (if linear array, then only offset and limit are noticed, point sampling is forced, format is 32bpp)
Dest           | 7    | Destination address of texture/memory fetch
Src            | 7    | Source register for address
Swizzle        | 5    | Which part of source register contains texture coordinate
MAG_Filter     | 1    | 0- Nearest
               |      | 1- Linear
Min_Filter     | 1    | 0- Nearest
               |      | 1- Linear
Mip_filter     | 2    | 0- Disabled
               |      | 1- Enabled
Volume_filter  | 1    | Filtering between volume texture levels
               |      | 0- Nearest
               |      | 1- Linear
MIP_enable     | 1    | 0- no mipmapping
               |      | 1- mip-mapping
Filter_mode    | 2    | 0- treat input as four 8 bit values
               |      | 1- treat input as two 16 bit values
               |      | 2- treat input as one 32 bit value
               |      | 3- input is not filtered
Scale          | 4    | Multiply the x (and y?) components by this value, used for vertex fetching
Offset         | 4    | Add this integer to the x component, after the scale, used for vertex fetching.
Combine mode   | 2    | 0: R = R
               |      | 1: R = R + Tt0
               |      | 2: Use Tt1 values as blend factors
               |      | 3:
Output mode    | 2    | 0: Do not store result of fetch (NOP)
               |      | 1: store result in GPR
               |      | 2: store result in Tt0
               |      | 3: store result in Tt1
Mip mode       | 2    | 0: normal
               |      | 2: lower mip filter and bias
               |      | 3: upper mip filter and bias
Tex3D mode     | 2    | 0: normal (non volume)
               |      | 2: lower z fetch
               |      | 3: upper z fetch
Sample_bias_x  | 4    | 2's complement offset for bi-linear sample. 0 will generate a normal fetch, a positive or negative number will fetch the 2x2 samples that distance in 2x2s away. Used to implement bi-cubic and arbitrary sized filters
Sample_bias_y  | 4    | 2's complement offset for bi-linear sample. 0 will generate a normal fetch, a positive or negative number will fetch the 2x2 samples that distance in 2x2s away. Used to implement bi-cubic and arbitrary sized filters
`
We can move fields between here and the constant registers that hold the rest of the texture fetch state.
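
Two of the fields in the instruction format lend themselves to a short decoding sketch: the Channels field, which controls how fetched channels are spread over the destination, and the 4-bit two's complement sample bias. The ordering of values in the four-channel case and the function names are assumptions for illustration.

```python
def expand_channels(mode, values):
    """Spread fetched channel values over the destination [x, y, z, w]."""
    if mode == 0:
        (a,) = values
        return [a, a, a, a]          # 1 channel, duplicated across all four
    if mode == 1:
        a, b = values
        return [a, a, b, b]          # d.x,d.y = a; d.z,d.w = b
    if mode == 2:
        a, b = values
        return [a, b, a, b]          # d.x,d.z = a; d.y,d.w = b
    if mode == 3:
        x, y, z, w = values          # 4 channels (x, y, z, w order assumed)
        return [x, y, z, w]
    raise ValueError("Channels is a 2 bit field")


def decode_sample_bias(field):
    """Decode a 4-bit two's complement Sample_bias value into -8..7."""
    if not 0 <= field <= 0xF:
        raise ValueError("Sample_bias is a 4 bit field")
    return field - 16 if field & 0x8 else field
```

A decoded bias of 0 corresponds to a normal fetch; any other value selects the 2x2 sample block that many 2x2s away.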
4.2 Opcodes

Name | Function           | Notes
TF   | Texture fetch      |
CTF  | Cube texture fetch |
AF   | Array fetch        | Used for vertex array fetches, ignores most of the state in the texture constant registers, allows the driver to store only offset+limit values in the constant register
`
`
`
`
`
`
`
`4.3 Example
`
To do a vertex fetch the instruction might be as follows:

Field          | Size | Description
Initialize     | 1    | Set for first fetch to each GPR.
Wmask          | 4    | Set to write to correct element of GPR, i.e. if fetching x, would be set to 1000
Channels       | 2    | 0: 1 channel, duplicate across all four channels
Data_Format    | 4    | 1: none - just write to destination register
               |      | or 2? Color, unpack to rgba
Bias           | 1?   | 10
Scale          | >4   | 10
Range          | 1    | 0 to 255/256
Texture        | 7    | Which texture we want to fetch from
Texture_t      | 1    | linear memory array
Dest           | 7    | Destination address of texture/memory fetch
Src            | 7    | Source register for address
Swizzle        | 5    | Which part of source register contains texture coordinate
MAG_Filter     | 1    | 0- Nearest
Min_Filter     | 1    | 0- Nearest
Mip_filter     | 2    | 0- Disabled
Volume_filter  | 1    | Filtering between volume texture levels: 0- Nearest
MIP_enable     | 1    | 0- no mipmapping
Filter_mode    | 2    | 3- input is not filtered
Scale          | 4    | Multiply the x (and y?) components by this value, used for vertex fetching
Offset         | 4    | Add this integer to the x component, after the scale, used for vertex fetching.
`And the constant register pair that texture points to would contain:
`
`
Field            | Size | Description
Min_MIP_level    | 4    | NA
Max_MIP_level    | 4    | NA
First_MIP_level  | 4    | NA
Clamp_S          | 3    | NA
Clamp_T          | 3    | NA
Clamp_W          | 3    | NA
Border_mode      | 1    | NA
Tx_format        | 5    | 32bpp
Non_power2       | 1    | 1- texture has been multiplied by the texture size in the pixel shader. (need to work out how to deal with clamp modes)
TXWIDTH          | 4    | Texture width (or face0 width)
TXHEIGHT         | 4    |
TXDEPTH          | 4    | NA
TXWIDTH_f1       | 4    | NA
TXHEIGHT_f1      | 4    | NA
TXWIDTH_f2       | 4    | NA
TXHEIGHT_f2      | 4    | NA
TXWIDTH_f3       | 4    | NA
TXHEIGHT_f3      | 4    | NA
TXWIDTH_f4       | 4    | NA
TXHEIGHT_f4      | 4    | NA
TXWIDTH_f5       | 4    | NA
TXHEIGHT_f5      | 4    | NA
Alpha_mask?      | 1    | NA
Chroma_key?      | 1    | NA
Tex_coord_typ    | 3    | ?1D array
LOD_BIAS         | 14   | 0
TX_PITCH         | 14   | Number of bits will decrease, used with non power 2 textures
Offset           | 32   | Texture offset (includes endian and tile control)
Limit            | 32   | Any memory accesses > limit will be killed, and the pixel that made the request will also be killed. If the access was from a vertex shader, then the vertex shader will force the x value of the vertex to be NaN, which will kill all triangles that attempt to use the vertex.
`
`
`