
Shader Processor
ver 0.1

Originate date: 23 January, 2001
Edit date: 4 September, 2015
Document-rev. num.: GEN-CXXXXX-REVA
Author: Andrew Gruber, Andi Skende
Issue To:
Copy No:

Overview: This document describes the overall architecture of the shaders, their interfaces, and their partitioning into functional blocks, as well as the timing of the shader pipeline. It is intended for use by hardware designers.
`
`AUTOMATICALLY UPDATED FIELDS:
`Document Location
`SHD PC
`Current Intranet Search Title: Shader Processor
`
`
`
`APPROVALS
`Name/Dept S Signature/Date.
`
`
`
`
`
`
`
`Remarks:
`
`
`
`
`
`
THIS DOCUMENT CONTAINS INFORMATION THAT COULD BE SUBSTANTIALLY DETRIMENTAL TO THE INTEREST OF ATI TECHNOLOGIES INC. THROUGH UNAUTHORIZED USE OR DISCLOSURE.
`
`
`
`
`
`
`
`
“Copyright 2001, ATI Technologies Inc. All rights reserved. The material in this document constitutes an unpublished work created in 2001. The use of this copyright notice is intended to provide notice that ATI owns a copyright in this unpublished work. The copyright notice is not an admission that publication has occurred. This work contains proprietary information and trade secrets of ATI. No part of this document may be used, reproduced, or transmitted in any form or by any means without the prior written permission of ATI Technologies Inc.”

16774 Bytes *** © ATI Reference Copyright Notice on Cover Page © *** 09/04/15 04:44 PM
ATI 2042
LG v. ATI
IPR2015-00325

`
`
`
Exhibit 2042.doc
`
`AMD1044_0017952
`
`ATI Ex. 2012
`IPR2023-00922
`Page 1 of 12
`
`R400 Shader Processor Model
`
`
`Table Of Contents
`
1. State
1.1 Shader State
1.1.1 GPR
1.1.2 Constant Registers
1.2 Texture/Memory State
1.3 Initial State
1.3.1 Vertex Shader
1.3.2 Pixel Shader
1.3.3 2D Shader
1.3.4 RealTime Shader
2. Program Format
3. ALU
3.1 ALU instruction format
3.2 ALU Opcodes
3.3 Macro opcodes
4. Texture/Memory
4.1 Instruction Format
4.2 Opcodes
`
`Revision Changes:
`
`
`
`
Rev 0.0 (Andi Skende)
Date: May 09, 2001
Initial revision. Document started.

Rev 0.1 (Andi Skende)
Date: May 09, 2001
Initial revision. Updated, added the instruction format, initial block diagrams and preliminary interface description.

Rev 0.2 (Andi Skende)
Date: May 10, 2001
Initial revision. A more detailed description of the SP->RB interface as well as the RE/Sequencer->SP interface.
`
`Introduction
`
`
The shader pipeline processes elements (pixels or vertices) in groups, or “vectors”, of sixteen. R400 operates on tiles of 2x2 pixels, or quads of pixels. There will be four sets of four shader pipes. For ease of reference, and to reflect the relative position within the quad of the pixels that each set of shader blocks operates on, we name these sets UL (upper left), UR (upper right), LL (lower left) and LR (lower right). Please refer to the R400 Shader Processor Model (architectural specification) for the overall functionality of the shaders from the programmer's viewpoint.
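The grouping above can be illustrated with a small sketch. It assumes that the sixteen elements of a vector are numbered quad by quad, and within each quad in the order UL, UR, LL, LR; the text does not specify this ordering, and the function name is made up for the illustration.

```python
# Assumption: elements are numbered quad by quad, and within a quad in the
# order UL, UR, LL, LR. The specification text does not fix this ordering.
QUAD_POSITIONS = ("UL", "UR", "LL", "LR")
VECTOR_SIZE = 16  # elements per vector, from the text


def locate(element):
    """Return (quad index, pipe set name) for element 0..15 of a vector."""
    if not 0 <= element < VECTOR_SIZE:
        raise ValueError("element must be in 0..15")
    quad, pos = divmod(element, len(QUAD_POSITIONS))
    return quad, QUAD_POSITIONS[pos]
```

Under this numbering, each named set of pipes sees one pixel from each of the four quads in a vector.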
`
`
`1. State
`
`
`1.1 Shader State
`
`1.1.1 GPR
`
The general purpose registers are 128 bits wide, composed of four 32 bit values. Depending on the operation these values are interpreted as RGBA, or XYZW, or STQW, or UVQW, or YUVA, or ... To simplify matters the only two aliases used here are XYZW and RGBA.

To hide the latency of memory accesses the shader pipe will switch between different vectors. This is the same as the idea of “microthreading” that some advanced CPUs are investigating. The large register file is split between the vectors executing in the shader pipe. The management of the shader register file is automatic, and not visible to a program executing on a vector, except that a program is required to declare the number of GPRs it needs to execute. The hardware will not start a vector until the required number of registers is available. There is a direct tradeoff between the number of registers each program/vector needs and the number of vectors that can be simultaneously resident. If there are too few vectors resident, then the latency of memory accesses can no longer be hidden and performance suffers.

There are a total of 128 registers. We do not yet know how many registers per vector is too many, so that performance starts suffering.

It is possible for a single program/vector to request all 128 registers. This will make it impossible to hide memory latency, but the program will still execute and generate the correct result.

Most pixel programs are expected to have fewer than eight registers; vertex programs are expected to have fewer than sixteen registers.

The number of registers a program needs is the maximum number of registers it needs at any instruction. If a program needs only 3 registers nearly all of the time, except for a short period when it needs 8, it still needs to allocate eight. A significant performance optimization is for the compiler to reorder the instructions to minimize the number of needed registers.
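The tradeoff between registers per vector and resident vectors can be made concrete with a rough sketch. It assumes the 128 registers are divided evenly with no allocation overhead and that no register is reserved; both function names are hypothetical.

```python
TOTAL_GPRS = 128  # total registers, from the text


def registers_needed(live_registers_per_instruction):
    """A program must declare the peak register count over all instructions."""
    return max(live_registers_per_instruction)


def resident_vectors(gprs_per_vector):
    """Upper bound on simultaneously resident vectors (assumes an even split)."""
    if not 1 <= gprs_per_vector <= TOTAL_GPRS:
        raise ValueError("a vector needs between 1 and 128 registers")
    return TOTAL_GPRS // gprs_per_vector
```

For example, a program that needs 3 registers for most instructions but 8 at its peak declares 8, which under these assumptions leaves room for at most 16 resident vectors.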
`
An open issue is whether the pipeline will need GPR0 to store pixel related information (coverage mask, position, Z, W). If we choose to do this (to avoid having a separate memory for this data) then GPR0 is unavailable as a general register.
`
`
[Diagram: the GPR file, registers R0, R1, ... R127]
`
`Notation:
`
R0.A refers to bits 96 to 127 of register R0. So does R0.W.
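As an illustration of the notation, the sketch below extracts one 32 bit component from a 128 bit register value held as a Python integer. The A/W position (bits 96 to 127) is from the text; the positions of the other components follow the constant register diagram (R/X in bits 0 to 31, G/Y in 32 to 63, B/Z in 64 to 95) and are assumed to apply to GPRs as well.

```python
# Bit offset of each component inside a 128 bit register.
# A/W at bits 96..127 is stated in the text; the rest is assumed to match
# the constant register layout.
COMPONENT_SHIFT = {
    "r": 0, "x": 0,
    "g": 32, "y": 32,
    "b": 64, "z": 64,
    "a": 96, "w": 96,
}


def component(reg128, name):
    """Return the 32 bit field selected by a component letter such as 'A' or 'W'."""
    return (reg128 >> COMPONENT_SHIFT[name.lower()]) & 0xFFFFFFFF
```

With this layout `component(r0, "A")` and `component(r0, "W")` return the same bits, which is what the two aliases express.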
`
`1.1.2 Constant Registers
There are also (192?) constant registers:

Bits:  127..96 | 95..64 | 63..32 | 31..0
C0:    A/W     | B/Z    | G/Y    | R/X
...
C191:  A/W     | B/Z    | G/Y    | R/X

`
These are ONLY available to vertex and pixel shader programs in the primary command stream. They should not be used for real time stream pixel shaders, or 2D shaders.
`
`
The constant registers are shared between vertex shaders and pixel shaders. It is the driver's job to allocate one section to pixel shaders and another to vertex shaders to match the D3D programming model; other APIs may allow more freedom.

NEW: Constant registers are also used to hold texture/memory fetch state.
To be able to support multiple textures easily, and to save hardware area, the texture registers are stored in constant registers. A pair of constant registers holds 256 bits of texture state. Rather than have four or six sets of texture registers as we do in the R100, R200, and R300, by storing them in the constant memory we can save area by reusing the logic already needed to update the constant registers in order. Since any single texture instruction will only fetch from one texture we do not need the simultaneous access we would get by implementing these as “normal” registers. The driver will probably decide to allocate a fixed number of the constant registers as texture registers.

The constant registers are backed up by control logic to ensure that a pixel sees the correct state, even when partial state updates are completed. I want to save area in this logic by having the updates occur on a 256 bit granularity. Here is how I expect the driver to work: The driver maintains a copy of the constant registers in cacheable system memory. When the driver needs to upload a change of a constant to the chip (just before drawing) it needs to copy two sequential aligned 128 bit words from the system memory version to the indirect buffer, even if only a 32 bit word within the two constant values has changed. Since the CPU will read at least the full 256 bits into its cache, the only performance penalty will be the second 128 bit write.
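A minimal sketch of the driver side bookkeeping this implies follows. It assumes that "aligned" means each 256 bit pair starts at an even constant register index; the function names are illustrative and not part of any driver interface.

```python
def upload_pair(changed_register):
    """Return the two constant register indices that must be copied together."""
    first = changed_register & ~1  # assumption: pairs start at even indices
    return first, first + 1


def pairs_to_upload(changed_registers):
    """Collapse a set of modified registers into the distinct pairs to upload."""
    return sorted({upload_pair(r) for r in changed_registers})
```

Changing C4 and C5 in the same batch therefore costs one 256 bit upload, while changing C5 and C9 costs two.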
`
`1.1.3 Previous Instruction
`
Within an ALU clause the result of the previous operation is explicitly available, without requiring a register read. (In fact, due to an exposed pipeline delay, the result of the previous operation cannot be read from the register file without a one instruction delay slot.)

This register is not preserved between the end of one ALU clause and the beginning of another.

It can be used to avoid using another GPR if the result is not needed afterwards.

Bits:  127..96 | 95..64 | 63..32 | 31..0
Prev:  A/W     | B/Z    | G/Y    | R/X
`
`1.1.4 Texture Temporaries
`There are two texture temporary registers:
Bits:  63..48 | 47..32 | 31..16 | 15..0
Tt0:   A      | B      | G      | R
Tt1:   A      | B      | G      | R
`
`
`
They are used to implement higher order filters (tri-linear, tri-linear from a volume texture, bi-cubic, aniso, arbitrary filters).

Tt0 can be viewed as an accumulation buffer. The result of the bilinear blend can be written into Tt0, after being summed with the value that is already there.

A trilinear filter can be done with two instructions:

Tt0 = texture(address, and rest of state needed, but with mipcntl set to “lower mip level”)
R = Tt0 + texture(address, and rest of state needed, but with mipcntl set to “upper mip level”)
`
`Volume textures and mipmapped volume textures are implemented in the same way.
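The two instruction sequence can be modelled roughly as below. The sketch assumes that each fetch returns its bilinear result already weighted by the mip fraction (the lower level by 1 - f, the upper level by f), which is one reading of the "lower/upper mip filter" modes; the text does not spell out the weighting, and the helper names are hypothetical.

```python
def weighted_fetch(bilinear_rgba, weight):
    """Stand-in for a texture fetch whose bilinear result is pre-weighted."""
    return tuple(weight * c for c in bilinear_rgba)


def trilinear(lower_rgba, upper_rgba, mip_fraction):
    # Tt0 = texture(..., mipcntl = "lower mip level")
    tt0 = weighted_fetch(lower_rgba, 1.0 - mip_fraction)
    # R = Tt0 + texture(..., mipcntl = "upper mip level")
    upper = weighted_fetch(upper_rgba, mip_fraction)
    return tuple(a + b for a, b in zip(tt0, upper))
```

Only Tt0 is needed as scratch storage, so no GPR is consumed by the intermediate result.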
`
Tt1 is used for implementing filters of arbitrary size. For every four samples in the filter two accesses are made: the first access fetches the filter weights, the second fetches the texture values and uses the contents of Tt1 as the weights instead of a bi-linear filter.

We will have explicit support for bi-cubic filters and separable filters, to avoid the doubled cycles of the previous method.
`
`1.2 Texture/Memory State
`
`
Texture/memory fetch state is stored in the constant registers; each texture uses two sequential 128 bit registers to hold the 256 bits of state per texture.
`
`The contents of these constant registers were stored in normal registers in previous chips.
`
`A very early version of the data stored in a texture/memory constant register is:
`
Field | Size | Description
Min_MIP_level | 4 | Clamp mip map level to this level
Max_MIP_level | 4 | Clamp mip map level to this level
First_MIP_level | 4 | First mip map level in tree. (do we want/need this?)
Clamp_S | 3 | Clamp/wrap/
Clamp_T | 3 |
Clamp_W | 3 | Clamp control for volume textures
Border_mode | 1 |
Tx_format | 5 |
Non_power2 | 1 | 0 - texture is in range 1 to 0 after clamp mirror, must be multiplied by txwidth/txheight (power2); 1 - texture has been multiplied by the texture size in the pixel shader. (need to work out how to deal with clamp modes)
TXWIDTH | 4 | Texture width (or face0 width)
TXHEIGHT | 4 | Texture height (or face height)
TXDEPTH | 4 | Texture depth (volume textures)
TXWIDTH_f1 | 4 |
TXHEIGHT_f1 | 4 |
TXWIDTH_f2 | 4 |
TXHEIGHT_f2 | 4 |
TXWIDTH_f3 | 4 |
TXHEIGHT_f3 | 4 |
TXWIDTH_f4 | 4 |
TXHEIGHT_f4 | 4 |
TXWIDTH_f5 | 4 |
TXHEIGHT_f5 | 4 |
Alpha_mask? | 1 |
Chroma_key? | 1 |
Tex_coord_typ | 3 |
OD_BIAS | 14 |
TX_PITCH | 14 | Number of bits will decrease, used with non power 2 textures
Offset | 32 | Texture offset (includes endian and tile control)
Limit | 32 | Any memory accesses > limit will be killed, and the pixel that made the request will also be killed. If the access was from a vertex shader, then the vertex shader will force the x value of the vertex to be NAN, which will kill all triangles that attempt to use the vertex.
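The Limit behaviour can be sketched as follows. Memory is modelled as a plain Python list and addresses as list indices, which is an illustrative simplification; the function names are hypothetical.

```python
import math


def limited_fetch(memory, address, limit):
    """Return (value, killed). Accesses beyond the limit are killed."""
    if address > limit:
        return None, True
    return memory[address], False


def vertex_x_after_fetch(x, killed):
    """A killed vertex fetch forces x to NaN so dependent triangles are dropped."""
    return math.nan if killed else x
```

A killed pixel fetch simply discards the pixel; for a vertex, the NaN in x is what propagates the kill to every triangle that uses it.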
`
`1.3 Initial state
`
`1.3.1 Vertex Shader
`
A vertex shader initially has the x value of R0 set to the vertex index. No other registers are filled. The vertex shader must use the index to fetch the vertex data from the vertex array(s). The pointers to the vertex arrays should be placed in constant registers by the driver.
`
`1.3.2 Pixel Shader
`
The pixel shader has the interpolated values generated from the values exported by the vertex shader.
If the vertex shader did EXPXY, and the appropriate control bit in the rasterizer is set, then register 0 contains the x,y,z,w of the pixel (screen space). If the pixel shader wants a world space x,y,z,w, the vertex shader should output that.

1.3.3 2D Shader
To be defined

1.3.4 RealTime Shader
To be defined
`
`2. Program Format
`
A pixel or vertex shader program consists of 16 clauses, eight texture and eight ALU.
The instructions in a clause will be executed sequentially.
`
`3. ALU instruction format and other instruction related issues
`
`
`
`
`3.1 ALU instruction format
`
`
Field | Size | Description
Vector Opcode | 5 | Opcode
Scalar/Alpha Opcode | 7 | Opcode for the Scalar or Alpha channel instruction
Scalar Source Select | 1 | Selects the input for the scalar operation out of SRC B or SRC C when a scalar instruction is coissued with a vector operation. The vector operation has to be a 2 source instruction.
SRC A RGB Select | 2 |
SRC B RGB Select | 2 |
SRC C RGB Select | 2 |
SRC A Alpha Select | 2 |
SRC B Alpha Select | 2 |
SRC C Alpha Select | 2 |
SRC A RGB Reg/Constant Pointer | 8 | Location of Source A in the register file
SRC B RGB Reg/Constant Pointer | 8 | Location of Source B in the register file
SRC C RGB Reg/Constant Pointer | 8 | Location of Source C in the register file
SRC A Alpha Reg/Constant Pointer | 8 | Location of Source A in the register file
SRC B Alpha Reg/Constant Pointer | 8 | Location of Source B in the register file
SRC C Alpha Reg/Constant Pointer | 8 | Location of Source C in the register file
SRC A RGB Arg Mod | 2 | Argument A modifier
SRC B RGB Arg Mod | 2 | Argument B modifier
SRC C RGB Arg Mod | 2 | Argument C modifier
SRC A Alpha Arg Mod | 2 | Argument A modifier on the alpha channel
SRC B Alpha Arg Mod | 2 | Argument B modifier on the alpha channel
SRC C Alpha Arg Mod | 2 | Argument C modifier on the alpha channel
SRC A swizzle | 12 | 3 bits for each component
SRC B swizzle | 12 | 3 bits for each component
SRC C swizzle | 12 | 3 bits for each component
Scalar/Alpha Output Mod | 3 | Output modifier for the result of the Scalar or Alpha channel operation
Vector Export flag | | TBD
Vector Output Modifier | 3 | Output modifier on the vector result (RGB) when an alpha operation is being coissued, or ARGB when a scalar operation is being coissued
Scalar/Alpha Clamp | 1 |
Vector Clamp | 1 |
Scalar/Alpha Write Mask | 4 | Defines which of the 32 bit words (four of them) in the result are written back to the register file
Vector Write Mask | | Defines which of the 32 bit words (four of them) in the result are written back
Scalar/Alpha result pointer | | Specifies the address in the register file for the result of the scalar/alpha operation
Vector result pointer | |
Scalar Export flag | | TBD
`
`D = A truncated RndV
`
`3.2 ALU Opcodes
`The following opcodes are native, core opcodes:
Name | Function | Notes
MACC | D = A*B + C | (add is A*1.0 + C) (mul is A*B + 0.0) (nop/passthrough/move is A*1.0 + 0.0)
MSUB | D = A*B - C | (sub is A*1.0 - C)
MRSB | D = C - A*B |
DOT2 | D = (A.x * B.x) + (A.y * B.y) |
DOT3 | D = (A.x * B.x) + (A.y * B.y) + (A.z * B.z) |
DOT4 | D = (A.x * B.x) + (A.y * B.y) + (A.z * B.z) + (A.w * B.w) |
RECP | D = 1/A.w | Use broadcast to select something other than w
CMxx | D = A if C xx 0.0, else B | xx can be: gt, gte, eq (lt, lte, ne can be generated by swapping A and B)
CLMP | D = A if (B > A > C), else B if (A > B), else C |
ABSV | D = A if A > 0 else B |
CEIL | D = A; the smallest integer D such that D >= A |
FLOR | D = A rounded to the nearest integer |
FRAC | D = A - floor(A) |
MINV | D = min(A, B) |
MAXV | D = max(A, B) |
Area | D = area(A) | Possible opcode: Sets D to A.w,A.w,A.w,A.w where each A is from a different pixel in the quad. Can be used to calculate area for LOD calculations, useful for antialiasing procedural shaders.
Exp | D = Pow(a) | Possible opcode, otherwise texture lookup
RSQR | D = 1/sqrt(a) | Possible opcode, otherwise texture lookup
Log2 | D = Log(A, base = 2) | Possible opcode, otherwise table lookup
Log | D = Log(A, base = B) | Possible opcode, otherwise table lookup
Pow | D = A to the B'th power |
SCLP | D = ABC | Concatenate clip code test results into a single DWORD
CUBE | D = A | Find the largest of x,y,z, place the reciprocal of that value in w, set x and y to the two remaining values, and identify the face as the integer 0 to 5 in z. Used to set up for a cube map.
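A few of the core opcodes can be written out as a per component reference model, which also shows how add, mul, move and the remaining comparisons fall out of them. This is only a sketch of the arithmetic in the table using Python floats; it says nothing about precision, rounding or argument modifiers in the hardware, and the lower case function names are illustrative.

```python
import math


def macc(a, b, c):
    return a * b + c


def msub(a, b, c):
    return a * b - c


def mrsb(a, b, c):
    return c - a * b


def dot(a, b, n):
    """DOT2/DOT3/DOT4 over the first n components of two 4-vectors."""
    return sum(a[i] * b[i] for i in range(n))


def frac(a):
    return a - math.floor(a)


_TESTS = {
    "gt": lambda c: c > 0.0,
    "gte": lambda c: c >= 0.0,
    "eq": lambda c: c == 0.0,
}


def cmxx(xx, a, b, c):
    """D = A if C xx 0.0, else B."""
    return a if _TESTS[xx](c) else b


# Derived operations, as noted in the table.
def add(a, c):
    return macc(a, 1.0, c)


def mul(a, b):
    return macc(a, b, 0.0)


def mov(a):
    return macc(a, 1.0, 0.0)


def cmlt(a, b, c):
    """'lt' obtained from 'gte' by swapping A and B."""
    return cmxx("gte", b, a, c)
```

Swapping A and B inverts the condition, so gte yields lt, gt yields lte, and eq yields ne.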
`
`
`
`
`
`
The export opcodes must be in the last ALU clause:

Name | Function | Notes
EXPP | Export position: position = A, clipcodes = B | Export position and clip codes of vertex, end vertex shader
EXP | Export vertex component, {D} = A | Export vertex component to vertex cache, 16 possible destinations
EXPXY | Placeholder, {0} = 1,0,0,0 | Placeholder for the rasterizer to put pixel position and Z into pixel shader.
EXPC | Export color | Export color of pixel, end pixel shader
EXPZ | Export Z | Export Z of pixel
`
`
`
`
`
There are some extraction and data conversion opcodes to be added.
There will also be a specialized opcode for cube maps.

The export opcodes will be removed in a future version, and replaced with new destinations for general operations.
Exports must still be the last operations executed.
`3.3 Macro opcodes
`These instructions are NOT implemented in the R400. But their functionality can be implemented with the shown
`opcodes.
`
4. Texture/Memory
`
`4.1 Instruction Format
`
`Destination control:
`
`
`
`
Field | Size | Description
Initialize | 1 | If set, destination register is set to 1,0,0,0 before any writes are made (w,z,y,x) (a,b,g,r)
Wmask | 4 | Write mask for result of lookup
Channels | 2 | 0: 1 channel, duplicate across all four channels; 1: 2 channels, d.x,d.y = a, d.z,d.w = b; 2: 2 channels, d.x,d.z = a, d.y,d.w = b; 3: 4 channels
Data_Format | 4 | 0: unsigned int; 1: none - just write to destination register; others (Z, apple YUV, mpeg, etc.)
Bias | 1? | Do we want a bias other than -128?
Scale | >4 | Result value is (range(s)+bias)*scale
Range | 1 | 0 to 1 or 0 to 255/256 (or 65535/65536 etc.)
Texture | 7 | Which texture we want to fetch from
Texture_t | 1 | Texture or linear memory array (if linear array, then only offset and limit are noticed, point sampling is forced, format is 32bpp)
Dest | 7 | Destination address of texture/memory fetch
Src | 7 | Source register for address
Swizzel | 5 | Which part of source register contains texture coordinate
MAGFilter | 1 | 0 - Nearest; 1 - Linear
Min_Filter | 1 | 0 - Nearest; 1 - Linear
Mip_filter | 2 | 0 - Disabled; 1 - Enabled
Volume_filter | 1 | Filtering between volume texture levels: 0 - Nearest; 1 - Linear
MIP_enable | 1 | 0 - no mipmapping; 1 - mip-mapping
Filter_mode | 2 | 0 - treat input as four 8 bit values; 1 - treat input as two 16 bit values; 2 - treat input as one 32 bit value; 3 - input is not filtered
Scale | 4 | Multiply the x (and y?) components by this value, used for vertex fetching
Offset | 4 | Add this integer to the x component, after the scale, used for vertex fetching.
Combine mode | 2 | 0: R = R; 1: R = R + Tt0; 2: Use Tt1 values as blend factors; 3:
Output mode | 2 | 0: Do not store result of fetch (NOP); 1: store result in GPR; 2: store result in Tt0; 3: store result in Tt1
Mip mode | 2 | 0: normal; 2: lower mip filter and bias; 3: upper mip filter and bias
Tex3D mode | 2 | 0: normal (non volume); 2: lower z fetch; 3: upper z fetch
Sample_bias_x | 4 | 2's complement offset for bi-linear sample. 0 will generate a normal fetch, a positive or negative number will fetch the 2x2 samples that distance in 2x2s away. Used to implement bi-cubic and arbitrary sized filters
Sample_bias_y | 4 | 2's complement offset for bi-linear sample. 0 will generate a normal fetch, a positive or negative number will fetch the 2x2 samples that distance in 2x2s away. Used to implement bi-cubic and arbitrary sized filters

`
We can move fields between here and the constant registers that hold the rest of the texture fetch state.
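Two of the fields in the table lend themselves to a short sketch: the Channels field, which controls how fetched values are spread over the destination, and the 4 bit two's complement Sample_bias fields. The component order of the returned tuple is (x, y, z, w); for mode 3 the mapping of the four fetched channels onto x, y, z, w in that order is an assumption, and the function names are hypothetical.

```python
def expand_channels(mode, data):
    """Return the destination register as (x, y, z, w) for a Channels mode."""
    if mode == 0:            # 1 channel, duplicated across all four
        (a,) = data
        return (a, a, a, a)
    if mode == 1:            # d.x,d.y = a, d.z,d.w = b
        a, b = data
        return (a, a, b, b)
    if mode == 2:            # d.x,d.z = a, d.y,d.w = b
        a, b = data
        return (a, b, a, b)
    if mode == 3:            # 4 channels (order assumed to be x, y, z, w)
        x, y, z, w = data
        return (x, y, z, w)
    raise ValueError("Channels is a 2 bit field")


def decode_sample_bias(field):
    """Interpret a 4 bit two's complement field as an offset in -8..7."""
    field &= 0xF
    return field - 16 if field & 0x8 else field
```

A Sample_bias of 0 decodes to 0 and gives the normal fetch; 0xF decodes to -1, one 2x2 block in the negative direction.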
4.2 Opcodes

Name | Function | Notes
TF | Texture fetch |
CTF | Cube texture fetch |
AF | Array fetch | Used for vertex array fetches, ignores most of the state in the texture constant registers, allows the driver to only store offset+limit values in the constant register
`
`
`
`
`
`
`
`4.3 Example
`
To do a vertex fetch the instruction might be as follows:
`
Field | Size | Description
Initialize | 1 | Set for first fetch to each gpr.
Wmask | 4 | Set to write to correct element of gpr, i.e. if fetching x, would be set to 1000
Channels | 2 | 0: 1 channel, duplicate across all four channels
Data_Format | 4 | 1: none - just write to destination register; or 2? Color, unpack to rgba
Bias | 1? | 0
Scale | >4 | 0
Range | 1 | 0 to 255/256
Texture | 7 | Which texture we want to fetch from
Texture_t | 1 | linear memory array
Dest | 7 | Destination address of texture/memory fetch
Src | 7 | Source register for address
Swizzel | 5 | Which part of source register contains texture coordinate
MAGFilter | 1 | 0 - Nearest
Min_Filter | 1 | 0 - Nearest
Mip_filter | 2 | 0 - Disabled
Volume_filter | 1 | Filtering between volume texture levels: 0 - Nearest
MIP_enable | 1 | 0 - no mipmapping
Filter_mode | 2 | 3 - input is not filtered
Scale | 4 | Multiply the x (and y?) components by this value, used for vertex fetching
Offset | 4 | Add this integer to the x component, after the scale, used for vertex fetching.
`And the constant register pair that texture points to would contain:
`
`
Field | Size | Value
Min_MIP_level | 4 | NA
Max_MIP_level | 4 | NA
First_MIP_level | 4 | NA
Clamp_S | 3 | NA
Clamp_T | 3 | NA
Clamp_W | 3 | NA
Border_mode | 1 | NA
Tx_format | 5 | 32bpp
Non_power2 | 1 | 1 - texture has been multiplied by the texture size in the pixel shader. (need to work out how to deal with clamp modes)
TXWIDTH | 4 | Texture width (or face0 width)
TXHEIGHT | 4 |
TXDEPTH | 4 | NA
TXWIDTH_f1 | 4 | NA
TXHEIGHT_f1 | 4 | NA
TXWIDTH_f2 | 4 | NA
TXHEIGHT_f2 | 4 | NA
TXWIDTH_f3 | 4 | NA
TXHEIGHT_f3 | 4 | NA
TXWIDTH_f4 | 4 | NA
TXHEIGHT_f4 | 4 | NA
TXWIDTH_f5 | 4 | NA
TXHEIGHT_f5 | 4 | NA
Alpha_mask? | 1 | NA
Chroma_key? | 1 | NA
Tex_coord_typ | 3 | ?1D array
OD_BIAS | 14 | 0
TX_PITCH | 14 | Number of bits will decrease, used with non power 2 textures
Offset | 32 | Texture offset (includes endian and tile control)
Limit | 32 | Any memory accesses > limit will be killed, and the pixel that made the request will also be killed. If the access was from a vertex shader, then the vertex shader will force the x value of the vertex to be NAN, which will kill all triangles that attempt to use the vertex.
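The Scale and Offset instruction fields in the example combine as a multiply followed by an add on the x component. A minimal sketch, assuming both fields are plain unsigned integers (their encoding is not given in the text) and using a hypothetical function name:

```python
def vertex_fetch_coordinate(x, scale, offset):
    """Apply the instruction's Scale, then Offset, to the x component."""
    return x * scale + offset
```

For instance, with a vertex index of 5 in R0.x, a Scale of 3 and an Offset of 2, the fetch would use coordinate 17; this is the kind of arithmetic needed to step through an interleaved vertex array.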
`