Shader Processor
Rev 1.2

Author: Andrew Gruber, Andi Skende
Originate Date: 17 January, 2002
Edit Date: 9 October, 2015
Document-Rev. Num.: GEN-CXXXXX-REVA
Page: 1 of 43
Issue To:
Copy No:

Overview: This document describes the overall architecture of the Shaders, their interfaces, the partitioning into functional blocks, and the timing of the shader pipeline. It is intended for use by hardware designers.

AUTOMATICALLY UPDATED FIELDS:
Document Location: Hma_andi_mobile/..../doc_lib/parts/sp
Current Intranet Search Title: Shader Processor

APPROVALS

Name/Dept
Signature/Date:

Remarks

THIS DOCUMENT CONTAINS INFORMATION THAT COULD BE SUBSTANTIALLY DETRIMENTAL TO THE INTEREST OF ATI TECHNOLOGIES INC. THROUGH UNAUTHORIZED USE OR DISCLOSURE.

“Copyright 2001, ATI Technologies Inc. All rights reserved. The material in this document constitutes an unpublished work created in 2001. The use of this copyright notice is intended to provide notice that ATI owns a copyright in this unpublished work. The copyright notice is not an admission that publication has occurred. This work contains proprietary information and trade secrets of ATI. No part of this document may be used, reproduced, or transmitted in any form or by any means without the prior written permission of ATI Technologies Inc.”

Ex.2042 - r400-doc_lib-design-blocks-sp__Shaders.doc__file#9(2).doc
62641 Bytes *** © ATI Reference Copyright Notice on Cover Page © ***
10/09/15 12:06 PM

ATI 2042
LG v. ATI
IPR2015-00330
AMD1044_0013380

ATI Ex. 2008
IPR2023-00922
Page 1 of 43
Table Of Contents

1. STATE
1.1 Shader State
1.1.1 GPRs (General Purpose Registers)
1.1.2 Constant Registers
1.1.3 Previous Instruction Result
1.2 Initial State
1.2.1 Vertex Shader
1.2.2 Pixel Shader
2. PROGRAM FORMAT
3. ALU
3.1 ALU Structure
3.2 ALU Instruction Format
3.2.1 ALU Instruction Word Interpretation
3.2.1.1 Relative vs. Absolute Constants
3.2.1.2 Argument Selection and Pointers
3.2.1.3 Input and Output Modifiers
3.2.1.4 Export and Predicate related decoding
3.2.1.5 Export Types and Addresses
3.3 ALU Opcodes
3.4 DX9.0 Shader Instructions, related exceptions and corner cases
3.5 Macro Opcodes
4. SHADER BLOCK DIAGRAM
4.1 Shader as an SIMD architecture
4.2 Top-Level Diagram of a Shader Pipeline
5. INTERFACES
5.1 External Interfaces
5.1.1 Naming Convention
5.1.2 Shader Engine to Texture Fetch Unit Bus
5.1.3 Sequencer to Shader Pipe(s): Texture stall
5.1.4 Scan Converter to Shader Pipe: IJ bus
5.1.5 Sequencer to Shader Pipe(s) - broadcast: Interpolator bus
5.1.6 Sequencer to Shader Pipe(s) - broadcast: Parameter Cache Read control bus
5.1.7 Sequencer to Shader Pipe: GPR, Parameter Cache control and auto counter
5.1.8 Shader Pipe to Shader Export (SX): Parameter data out of Parameter Cache
5.1.9 Shader Export (SX) to Interpolators: Parameter Cache Return bus
5.1.10 Shader Pipe to Shader Export (SX): Pixel/Vertex write to SX
5.1.11 Sequencer to SPx: Instruction Interface
5.1.12 Shader Pipe to Sequencer: Constant address load
5.1.13 Sequencer to SPx: constant broadcast
6. PARAMETER INTERPOLATION
7. SHADER LIMITATIONS
8. HARDWARE IMPLEMENTATION SPECIFICS
8.1 General Information on the Shader Floating Point arithmetic
8.2 Interpolators and IJ/XY Buffers
8.2.1 Interpolators
8.2.1.1 Interpolation Units
8.2.1.2 Parameter Selection Unit
8.2.1.3 Parameter Difference & Cylindrical Wrap Engine
8.2.2 GPR Write Path
8.3 [illegible]
8.3.1 Vector Unit Pipeline
8.3.2 Argument Selection and Routing
8.3.3 Parameter Data Path
8.4 [illegible]
8.4.1 Scalar Engine Pipeline
8.4.1.1.1.1.1 High Precision Pipeline Exp. Preprocessing
8.4.1.1.1.1.2 High Precision Pipeline Mantissa Calculation
8.4.1.1.1.1.3 High Precision Pipeline Logs Post Processing
8.4.1.1.1.1.4 High Precision Pipeline Exponent
8.4.1.1.1.1.5 High Precision Special Outputs
8.4.1.1.1.1.6 Determination of High Precision Coefficients
9. OPEN ISSUES
Revision Changes:

Rev 0.0 (Steve Morein)   Date: April, 2001         Initial revision.
Rev 0.1 (Andi Skende)    Date: May 09, 2001
Rev 0.2 (Andi Skende)    Date: May 21, 2001
Rev 0.3 (Andi Skende)    Date: June 19, 2001
Rev 0.4 (Andi Skende)    Date: June 20, 2001
Rev 0.5 (Andi Skende)    Date: July 31, 2001
Rev 0.6 (Andi Skende)    Date: August 17, 2001
Rev 0.7 (Andi Skende)    Date: November 8, 2001
Rev 0.8 (Andi Skende)    Date: November 27, 2001
Rev 0.9 (Andi Skende)    Date: December 10, 2001
Rev 1.0 (Andi Skende)    Date: January 15, 2002
Rev 1.1 (Andi Skende)    Date: January 21, 2002
Rev 1.2 (Andi Skende)    Date: January 22, 2002

Document started.

Updated, added the instruction format, initial block diagrams and preliminary interface description.

A more detailed description of the SP<->TEX, RE/Sequencer<->SP interfaces.

Added the paragraph related to shader functional limitations that the compiler needs to be aware of.

A new updated and compressed version of ALU instruction format.

Updated the Introduction of this document. A new Pipeline Timing Diagram was inserted.

Merged in the Shader Hardware Spec. A more detailed description of the interfaces with the other blocks was added. Updated some of the diagrams to a more correct representation of the datapaths.

A more detailed description/definition of Shader interfaces with the other blocks.

A more detailed description of the instructions supported by the Shader Processor and their relation to the instruction set exposed at API level.

Updated the ALU instruction word definition and the list of the ALU instruction opcodes supported by the shader pipe ALU unit.

Updated the definition of the External Interfaces.

Updated the definition and naming of some of the external interfaces, rearranged the ALU instruction word definition such that the fields are dword aligned. The instruction opcode definition was updated and expanded.

Updated most of the diagrams. Updated the External Interface definitions. Added a description of the Parameter Interpolation Units. Added a diagram description of the GPR write data paths.

Updated some of the external interface definitions. Specified the expected behavior of the hardware implementation of some shader opcodes with some corner case values as input arguments. The MS Reference Rasterizer shader was used as guideline.

Updated some of the external interface definitions.
Introduction

The Shader Pipe (SP) serves as the central Arithmetic and Logic Unit (ALU) for the R400 Graphics Processor. There are four identical Shader pipelines in the R400 architecture. Unlike previous ATI architectures, the R400 Shader Pipe truly represents a Unified Shader Architecture. In R400, both vertex and pixel shading operations are implemented through the shader units. The R400 Shader Pipe represents an SIMD architecture. All the shader units of each and every pipe execute the same ALU instruction on different sets of vertex parameters/pixel values. The building blocks of the R400 shader units execute operations on single precision IEEE floating-point values.
1. State

1.1 Shader State

1.1.1 GPRs (General Purpose Registers)
The general-purpose registers are 128 bits wide, composed of four 32-bit values. Depending on the operation these values are interpreted as RGBA, XYZW, STQW, UVQW, YUVA, and so on; to simplify matters the only two aliases used here are XYZW and RGBA.

To hide the latency of memory accesses the shader pipe will switch between different vectors. This is the same as the idea of “microthreading” that some advanced CPUs are investigating. The large register file is split between the vectors executing in the shader pipe. The management of the shader register file is automatic, and not visible to a program executing on a vector, except that a program is required to declare the number of GPRs it needs to execute. The hardware will not start a vector until the required number of registers is available. There is a direct tradeoff between the number of registers each program/vector needs and the number of vectors that can be simultaneously resident. If there are too few vectors resident, then the latency of memory accesses can no longer be hidden and performance suffers.

There are a total of 128 registers. It is possible for a single program/vector to request all 128 registers. This will make it impossible to hide memory latency, but the program will still execute and generate the correct result. Most pixel programs are expected to need fewer than eight registers; vertex programs are expected to need fewer than sixteen registers.

The number of registers a program needs is the maximum number of registers it needs at any instruction. If a program needs only 3 general purpose registers nearly all of the time, except for a short period when it needs 8, it still needs to allocate eight. A significant performance optimization is for the compiler to reorder the instructions to minimize the number of needed registers.
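
The tradeoff between per-program register count and the number of resident vectors can be illustrated with a small sketch. It assumes the 128 registers form a single pool that is divided evenly among vectors running the same program; the function names and the even-split model are illustrative assumptions, not a description of the hardware allocator.

```python
TOTAL_GPRS = 128  # total register count stated in this section


def registers_required(live_per_instruction):
    """A program must allocate its peak live-register count (hypothetical helper)."""
    return max(live_per_instruction)


def resident_vectors(gprs_per_vector, total_gprs=TOTAL_GPRS):
    """Upper bound on resident vectors under an even-split assumption."""
    if not 1 <= gprs_per_vector <= total_gprs:
        raise ValueError("request must be between 1 and total_gprs registers")
    return total_gprs // gprs_per_vector


# A program that needs 3 registers most of the time but briefly needs 8
# still has to allocate 8.
live = [3, 3, 3, 8, 3, 3]
need = registers_required(live)
print(need, resident_vectors(need))
```

Under these assumptions, a program that requests all 128 registers leaves room for only one resident vector, which is the case the text describes as unable to hide memory latency.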
`
GPR layout (bit positions within each 128-bit register):

Bits    127:96   95:64   63:32   31:0
GPR     A/W      B/Z     G/Y     R/X
R0
R1
...
R127

Notation:

R0.A refers to bits 96 to 127 of register R0. So does R0.W.
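
The component notation can be sketched as a bit-range lookup. This is a minimal illustration that models a register as a 128-bit Python integer; the helper names are hypothetical, and the R/X-in-the-low-bits ordering is taken from the layout above.

```python
# Both alias sets map to the same four 32-bit slots.
COMPONENT_INDEX = {"x": 0, "r": 0, "y": 1, "g": 1, "z": 2, "b": 2, "w": 3, "a": 3}


def component_bits(name):
    """Return (high_bit, low_bit) for a component such as 'A' or 'W'."""
    i = COMPONENT_INDEX[name.lower()]
    return (32 * i + 31, 32 * i)


def read_component(reg128, name):
    """Extract the raw 32-bit field of a component from a 128-bit register value."""
    _, lo = component_bits(name)
    return (reg128 >> lo) & 0xFFFFFFFF


print(component_bits("A"), component_bits("W"))
```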
`
1.1.2 Constant Registers
There are also (192?) constant registers:

Bits    127:96   95:64   63:32   31:0
Const   A/W      B/Z     G/Y     R/X
C0
C1
...
C191

These are ONLY available to vertex and pixel shader programs in the primary command stream. They should not be used for real time stream pixel shaders, or 2D shaders. Constant Registers are physically part of the Sequencer unit. As will become clear from the rest of this document, the content of the constant registers can be made available to the ALU units of the shader pipes as one of the possible ALU operation arguments. The ALU instruction word provides for that.

The constant registers are shared between vertex shaders and pixel shaders. It is the driver's job to allocate one section to pixel shaders and another to vertex shaders to match the D3D programming model; other APIs may allow more freedom.

To be able to support multiple textures easily, and to save hardware area, the texture state registers are stored in constant registers. A pair of constant registers holds 256 bits of texture state. Rather than having four or six sets of texture registers as we do in the R100, R200, and R300, by storing them in the constant memory we can save area by reusing the logic already needed to update the constant registers in order. Since any single texture instruction will only fetch from one texture, we do not need the simultaneous access we would get by implementing this as “normal” registers. The driver will probably decide to allocate a fixed number of the constant registers as texture registers.
1.1.3 Previous Instruction Result

Within an ALU clause the result of the previous operation is explicitly available, without requiring a register read. (Due to an exposed pipeline delay, the result of the previous operation cannot be read from the register file without a one-instruction delay slot.) There are two distinct previous instruction results, one scalar and one vector.
This register is not preserved between the end of one ALU clause and the beginning of another.
It can be used to avoid using another GPR if the result is not needed. Also, the output modifiers, which do affect the result of an instruction written into GPRs, do not affect the Previous Result content.

[Figure: Previous Result register layout, bit positions 127, 95, 63, 31 and 0 marked]

1.2 Initial state

1.2.1 Vertex Shader

A vertex shader initially has the x value of R0 set to the vertex index. No other registers are filled. The vertex shader must use the index to fetch the vertex data from the vertex array(s). The pointers to the vertex arrays should be placed in constant registers by the driver.

1.2.2 Pixel Shader

The pixel shader has the interpolated values generated from the values exported by the vertex shader.
If the vertex shader did expxy, and the appropriate control bit in the rasterizer is set, then register 0 contains the x,y,z,w of the pixel (screen space). If the pixel shader wants a world space x,y,z,w the vertex shader should output that.
`
2. Program Format

A pixel or vertex shader program consists of 16 clauses, eight texture clauses and eight ALU clauses.
The instructions in a clause will be executed sequentially. If a given instruction is implementing, for example, T*S+D (T = texture for SRC A, S = Specular for SRC B, D = Diffuse for SRC C), it is the Sequencer's task to resolve the dependencies between the ALU clause and the respective texture clause. In other words, the sequencer will not issue the ALU instruction using texture data as input to the shader pipe until the texture request has been issued to and serviced by the texture pipe. In general, the Shader is not aware of the origin of the SRC A, SRC B and SRC C data (texture, diffuse, specular, vertex parameters, etc.). Three address pointers into the register files (one for each operand) are all the shaders need to fetch these operands. In reality, as will become more evident later in this document, there is no need for the pointer values to be passed to the shader units. This is related to the GPR read/write mechanism we have chosen to implement.
`
3. ALU

3.1 ALU structure

The ALU consists of two distinct units: the ‘Vector’ ALU and the ‘Scalar’ ALU. The Vector ALU performs operations in parallel across a 4-component vector, while the Scalar ALU performs operations on a single component of a vector, which is then replicated across all components. A single instruction may ‘co-issue’ both a Vector and a Scalar instruction, subject to the limitation that the vector instruction may only require 1 or 2 arguments. For example, a Vector MUL (Multiply) instruction can be coissued with a Scalar instruction, but a MULADD (Multiply and ADD) may not.
For more details on the overall structure of the Shader ALU, refer to the figures in Section 5 of this document.
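
The co-issue limitation can be expressed as a check on the number of arguments the vector instruction requires. The sketch below is illustrative only: the operand counts are taken from the opcode table in Section 3.3, and the dictionary and function names are hypothetical.

```python
# Operand counts for a few vector opcodes, as listed in Section 3.3.
VECTOR_OPERANDS = {"ADD": 2, "MUL": 2, "MULADD": 3, "CNDE": 3, "CNDGE": 3}


def can_coissue(vector_op):
    """A scalar instruction may be co-issued only if the vector op needs 1 or 2 arguments."""
    return VECTOR_OPERANDS[vector_op] <= 2


for op in ("MUL", "MULADD"):
    print(op, can_coissue(op))
```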
`
3.2 ALU instruction format

There are two opcodes present in the ALU instruction, one for the Vector operation and one for the Scalar operation. The idea is that we can allow a 4-component vector operation (if the compiler permits) to be coissued with a Scalar operation.
The Scalar unit may use SRC C, depending on whether this source is being used by the vector operation. Please refer to Section 8 of this document for the limitations on issuing a Vector or Scalar instruction.
`
Field                                 Bits    Size  Description

SRC A Select                          95      1     Selects Constant vs. Register/Vector/Scalar feedback.
                                                    0: Constant
                                                    1: Register/Previous Vector/Previous Scalar

SRC B Select                          94      1     Selects Constant vs. Register/Vector/Scalar feedback.
                                                    0: Constant
                                                    1: Register/Previous Vector/Previous Scalar

SRC C Select                          93      1     Selects Constant vs. Register/Vector/Scalar feedback.
                                                    0: Constant
                                                    1: Register/Previous Vector/Previous Scalar

Vector Opcode                         92:88   5     Opcode for the Vector instruction

SRC A Register/Constant Pointer       87:80   8     Location of Source A in the register file.
                                                    If not Constant, Bits [6],[7] denote:
                                                    00: absolute register
                                                    01: relative register
                                                    10: previous vector
                                                    11: previous scalar

SRC B Register/Constant Pointer       79:72   8     Refer to SRC A Register/Constant Pointer

SRC C Register/Constant Pointer       71:64   8     Refer to SRC A Register/Constant Pointer

Constant0 Relative/Absolute           63      1     The address pointer into the Constant Register File is
                                                    relative to some base address register (works in
                                                    conjunction with Address Register Select)

Constant1 Relative/Absolute           62      1     The address pointer into the Constant Register File is
                                                    relative to some base address register (works in
                                                    conjunction with Address Register Select)

Relative Address Register Select      61      1     This bit determines the address register used as base
                                                    register when Constant indexing is relative. It is used
                                                    in conjunction with the Constant0 Relative/Absolute and
                                                    Constant1 Relative/Absolute fields.
                                                    0: Loop index relative
                                                    1: Address Register relative

Predicate Select                      60:59   2     These bits are used in conjunction with bit 7 of the
                                                    Scalar Destination Pointer and Vector Destination Pointer

SRC A Arg Modifier                    58      1     0: No modification
                                                    1: negate

SRC B Arg Modifier                    57      1     0: No modification
                                                    1: negate

SRC C Arg Modifier                    56      1     0: No modification
                                                    1: negate

SRC A swizzle                         55:48   8     2 bits for each component
                                                    45:46 alpha channel
                                                      00: leave alpha
                                                      01: red
                                                      10: blue
                                                      11: green
                                                    47:48 red channel
                                                      00: leave red
                                                      01: green
                                                      10: blue
                                                      11: alpha
                                                    49:50 green channel
                                                      00: leave green
                                                      01: blue
                                                      10: alpha
                                                      11: red
                                                    51:52 blue channel
                                                      00: leave blue
                                                      01: alpha
                                                      10: red
                                                      11: green

SRC B swizzle                         47:40   8     2 bits for each component (refer to ‘SRC A swizzle’)

SRC C swizzle                         39:32   8     2 bits for each component (refer to ‘SRC A swizzle’)

Scalar Opcode                         31:26   6     Opcode for the Scalar instruction

Scalar Clamp                          25      1     0: No clamp
                                                    1: Clamp to [0.0, 1.0] range

Vector Clamp                          24      1     0: No clamp
                                                    1: Clamp to [0.0, 1.0] range

Scalar Write Mask                     23:20   4     Defines which of the four 32-bit words in the result are
                                                    written back to the register file. There is one bit per
                                                    channel.
                                                    0: leave the current value
                                                    1: write

Vector Write Mask                     19:16   4     Defines which of the four 32-bit words in the result are
                                                    written back to the register file. There is one bit per
                                                    channel.
                                                    0: leave the current value
                                                    1: write

Scalar result pointer                 15:8    8     Specifies the address into the register files for the
                                                    result of the scalar operation.
                                                    Bit[6] determines whether the destination address into
                                                    the GPRs is relative or absolute.
                                                    0: absolute
                                                    1: relative
                                                    Bit[7], in conjunction with Predicate Select, is used to
                                                    define different scenarios of export and predicate
                                                    functionality. For more on this, refer to Section 3.2.1.4.

Vector result pointer                 7:0     8     Specifies the address into the register files for the
                                                    result of the vector operation.
                                                    Bit[6] determines whether the destination address into
                                                    the GPRs is relative or absolute.
                                                    0: absolute
                                                    1: relative
`
There is a total of 96 bits per instruction.
The SrcA, SrcB and SrcC GPR locations denoted by the SRC A (B, C) Register/Constant Pointer fields of the ALU instruction word can be relative as well as absolute addresses. If relative, they are relative to a register (the Relative Address Register) present in the Sequencer as a render state. The above applies to Constant values as well.
The bit allocation and assignment for the different fields of the instruction word was done under the limitation that they should be dword (32-bit) aligned.
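
As a reading aid, the field layout above can be written as a table of bit ranges with a generic decoder. This is a sketch only: the Python field names are invented for illustration, the instruction is modelled as a single 96-bit integer with bit 95 as the most significant bit, and nothing here describes how the hardware or the driver actually stores the word.

```python
# (name, high_bit, low_bit), following the instruction word table.
FIELDS = [
    ("src_a_select", 95, 95), ("src_b_select", 94, 94), ("src_c_select", 93, 93),
    ("vector_opcode", 92, 88),
    ("src_a_ptr", 87, 80), ("src_b_ptr", 79, 72), ("src_c_ptr", 71, 64),
    ("const0_relative", 63, 63), ("const1_relative", 62, 62),
    ("rel_addr_reg_select", 61, 61), ("predicate_select", 60, 59),
    ("src_a_negate", 58, 58), ("src_b_negate", 57, 57), ("src_c_negate", 56, 56),
    ("src_a_swizzle", 55, 48), ("src_b_swizzle", 47, 40), ("src_c_swizzle", 39, 32),
    ("scalar_opcode", 31, 26), ("scalar_clamp", 25, 25), ("vector_clamp", 24, 24),
    ("scalar_write_mask", 23, 20), ("vector_write_mask", 19, 16),
    ("scalar_result_ptr", 15, 8), ("vector_result_ptr", 7, 0),
]


def decode(word):
    """Split a 96-bit instruction word into its named fields."""
    if not 0 <= word < (1 << 96):
        raise ValueError("instruction word must fit in 96 bits")
    return {name: (word >> lo) & ((1 << (hi - lo + 1)) - 1) for name, hi, lo in FIELDS}


def encode(values):
    """Pack named field values into a 96-bit word; unspecified fields are zero."""
    word = 0
    for name, hi, lo in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << (hi - lo + 1)):
            raise ValueError(f"{name} does not fit in {hi - lo + 1} bits")
        word |= value << lo
    return word


# Vector MULADD (opcode 0x0b) writing all four channels of GPR 5.
word = encode({"vector_opcode": 0x0B, "vector_write_mask": 0xF, "vector_result_ptr": 5})
print(hex(word), decode(word)["vector_opcode"])
```

Summing the field widths gives 96, which matches the total stated above.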
`
3.2.1 ALU Instruction Word Interpretation

3.2.1.1 Relative vs. Absolute Constants
The location of the Constant Values in the Constant Register File can be absolute or relative to an offset value. When relative, they can be relative to either a loop index or a given register content value. The truth table shows the instruction fields that are used to decode the nature of the constant values.
`
Constant0           Constant1           Address Register   Notes
Relative/Absolute   Relative/Absolute   Select

0                   0                   0                  Constant0 absolute, Constant1 absolute
0                   0                   1                  Constant0 absolute (9 bits), Constant1 absolute
1                   0                   0                  Constant0 loop index relative, Constant1 absolute
0                   1                   0                  Constant0 absolute, Constant1 loop index relative
1                   1                   0                  Constant0 loop index relative, Constant1 loop index relative
1                   0                   1                  Constant0 address relative, Constant1 absolute
0                   1                   1                  Constant0 absolute, Constant1 address relative
1                   1                   1                  Constant0 address relative, Constant1 address relative

Note from the table that if both Constants are relative, they are relative to the same value, be it a loop index or an address register.
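
The truth table reduces to a simple rule: the Address Register Select bit picks the base, and each Relative/Absolute bit decides whether that base applies to its constant. A minimal sketch follows; the function name and the returned strings are hypothetical, and the "9 bits" remark on the second row is not modelled.

```python
def constant_modes(const0_relative, const1_relative, addr_reg_select):
    """Return the addressing mode of Constant0 and Constant1 for the given bits."""
    base = "address relative" if addr_reg_select else "loop index relative"
    mode0 = base if const0_relative else "absolute"
    mode1 = base if const1_relative else "absolute"
    return mode0, mode1


for bits in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 1), (1, 1, 1)]:
    print(bits, constant_modes(*bits))
```

Because a single select bit chooses the base, two relative constants in one instruction necessarily share it, which is the point of the note under the table.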
`
3.2.1.2 Argument Selection and Pointers
There can be a maximum of three sources (operands) required for an ALU operation of a vector type.
The R400 ALU instruction word definition provides location pointers into GPRs or Constant Memory for each of the three sources (SRC A Register/Constant Pointer, SRC B Register/Constant Pointer, SRC C Register/Constant Pointer).
`
3.2.1.3 Input and Output Modifiers
The R400 ALU instruction word definition provides for only two input modifiers for each of the three sources: Negate and Swizzle.
The R400 ALU instruction word provides for two output (result) modifiers: Mask, which only affects the results going into GPRs but not the Previous Results, and Clamp.
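
The effect of the output modifiers on a GPR write-back can be sketched as follows. Several points are assumptions made for illustration: bit i of the write mask is taken to control component i (X/R first), the Previous Result is left unmodified by both modifiers as stated in Section 1.1.3, and Python floats stand in for the hardware's single-precision values.

```python
def clamp01(x):
    """Clamp a value to the [0.0, 1.0] range."""
    return min(1.0, max(0.0, x))


def write_back(result, old_gpr, write_mask, clamp):
    """Apply Clamp and Mask to a 4-component result.

    Returns (new_gpr, previous_result). The mask bit ordering is an assumption.
    """
    value = [clamp01(c) if clamp else c for c in result]
    new_gpr = [value[i] if (write_mask >> i) & 1 else old_gpr[i] for i in range(4)]
    previous_result = list(result)  # output modifiers do not touch the Previous Result
    return new_gpr, previous_result


gpr, prev = write_back([2.0, -1.0, 0.5, 3.0], [9.0, 9.0, 9.0, 9.0], 0b0101, True)
print(gpr, prev)
```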
3.2.1.4 Export and Predicate related decoding
The table below describes the encoding of the exports and predicate support in the instruction word.
Exports are allowed from either Scalar or Vector Pipe. Similar to the GPR write-backs, masking of export data is permitted. The mask is present in the ALU instruction word. In cases when exports are coissued from Scalar and Vector pipes, the export address used is the Vector Result Pointer present in the instruction word. The Scalar Result Pointer in this case is ignored. The table below describes the “mixed” use of Scalar and Vector Result Masks per component (each MASK field is 4 bits wide, one bit per component/channel) when exports are coissued.
`
Scalar Mask   Vector Mask   Result of Export

0             0             Don't write
1             0             Write Scalar Component
0             1             Write Vector Component
1             1             Write 1.0 (one way of generating defaults)

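
The per-component selection for coissued exports can be sketched directly from the mask table. The names below are hypothetical, bit i of each mask is assumed to control component i, and the scalar result is treated as a single value available to every component; the special cases for 'pixels', 'position' and Fog exports are not modelled.

```python
def coissued_export(scalar_value, vector_value, scalar_mask, vector_mask):
    """Pick, per component, what a coissued export writes (None = not written)."""
    out = []
    for i in range(4):
        s = (scalar_mask >> i) & 1
        v = (vector_mask >> i) & 1
        if s and v:
            out.append(1.0)              # both bits set: write the default 1.0
        elif s:
            out.append(scalar_value)     # scalar component
        elif v:
            out.append(vector_value[i])  # vector component
        else:
            out.append(None)             # don't write
    return out


print(coissued_export(7.0, [1.0, 2.0, 3.0, 4.0], 0b1010, 0b1100))
```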
A few other export related definitions:
1) When doing a Scalar export of 'pixels' or 'position', only the 'alpha' component will contain the scalar result. The other 3 components will be expanded to 0.0. When exporting to 'parameters' the scalar result is put into all 4 components.
2) When doing a Scalar export of 'parameters', non-export vector instructions may not be coissued.
3) Exporting of 'Fog' is a special case. When exporting Fog, color must be exported at the same time. Fog will be exported in the Scalar pipe and Color in the Vector pipe. Masking is ignored for Fog exports. Instead Color and Fog are mixed into a single ARGBF word and exported to the render back-end.
`
Columns in the original table: Predicate Select, Scalar Destination Pointer Bit[7], Vector Destination Pointer Bit[7], Notes. The per-row values of the two Bit[7] columns are not legible in this copy and are omitted below.

Predicate Select   Notes

0x                 Scalar Export, Vector to GPR
0x                 Scalar Export, Vector Export
0x                 Scalar to GPR, Vector to GPR
0x                 Scalar to GPR, Vector Export
10                 Scalar to GPR, Vector to GPR. Use Predicate Register 0 (1: skip, 0: execute)
10                 Scalar to GPR, Vector to GPR. Use Predicate Register 0 (1: execute, 0: skip)
10                 Scalar to GPR, Vector to GPR. Use Predicate Register 1 (1: skip, 0: execute)
10                 Scalar to GPR, Vector to GPR. Use Predicate Register 1 (1: execute, 0: skip)
11                 Scalar to GPR, Vector to GPR. Use Predicate Register 2 (1: skip, 0: execute)
11                 Scalar to GPR, Vector to GPR. Use Predicate Register 2 (1: execute, 0: skip)
11                 Scalar to GPR, Vector to GPR. Use Predicate Register 3 (1: skip, 0: execute)
11                 Scalar to GPR, Vector to GPR. Use Predicate Register 3 (1: execute, 0: skip)
`3.2.1.5 Export Types and Addresses
`The location where the data should be put in the event of an export is specified by in the destination addressfield of
`the ALU instruction word. Followingis a list of the possible types of exports and the range of addresses.
`
Vertex Shading
0:15    - 16 parameter cache
16:31   - Empty (Reserved?)
32:43   - 12 vertex exports to the frame buffer and index
44:47   - Empty
48:59   - 12 debug export (interpret as normal vertex export)
60      - export addressing mode
61      - Empty
62      - sprite size export that goes with position export
          (point_h, point_w, edgeflag, misc)
63      - position

Pixel Shading
0       - Color for buffer 0 (primary)
1       - Color for buffer 1
2       - Color for buffer 2
3       - Color for buffer 3
4:7     - Empty
8       - Buffer 0 Color/Fog (primary)
9       - Buffer 1 Color/Fog
10      - Buffer 2 Color/Fog
11      - Buffer 3 Color/Fog
12:15   - Empty
16:31   - Empty (Reserved?)
32:43   - 12 exports for multipass pixel shaders
44:47   - Empty
48:59   - 12 debug exports (interpret as normal pixel export)
60      - export addressing mode
61:62   - Empty
63      - Z for primary buffer (Z exported to 'alpha' component)
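The address ranges above lend themselves to a simple lookup. The following is a minimal sketch, assuming a destination address in the range 0 to 63; the names `VERTEX_EXPORTS`, `PIXEL_EXPORTS` and `classify_export` are hypothetical, and the entries are transcribed from the two lists as read in this copy, including the pairing of addresses 62 and 63 in the vertex list.

```python
from typing import List, Tuple

Range = Tuple[int, int, str]  # (first address, last address, meaning), inclusive

VERTEX_EXPORTS: List[Range] = [
    (0, 15, "parameter cache"),
    (16, 31, "empty (reserved?)"),
    (32, 43, "vertex export to frame buffer and index"),
    (44, 47, "empty"),
    (48, 59, "debug export"),
    (60, 60, "export addressing mode"),
    (61, 61, "empty"),
    (62, 62, "sprite size"),
    (63, 63, "position"),
]

PIXEL_EXPORTS: List[Range] = [
    (0, 0, "buffer 0 color"),
    (1, 1, "buffer 1 color"),
    (2, 2, "buffer 2 color"),
    (3, 3, "buffer 3 color"),
    (4, 7, "empty"),
    (8, 8, "buffer 0 color/fog"),
    (9, 9, "buffer 1 color/fog"),
    (10, 10, "buffer 2 color/fog"),
    (11, 11, "buffer 3 color/fog"),
    (12, 15, "empty"),
    (16, 31, "empty (reserved?)"),
    (32, 43, "multipass pixel shader export"),
    (44, 47, "empty"),
    (48, 59, "debug export"),
    (60, 60, "export addressing mode"),
    (61, 62, "empty"),
    (63, 63, "Z for primary buffer"),
]


def classify_export(shader_type: str, address: int) -> str:
    """Map an export destination address to its meaning for 'vertex' or 'pixel'."""
    if not 0 <= address <= 63:
        raise ValueError("export address must be in 0..63")
    table = {"vertex": VERTEX_EXPORTS, "pixel": PIXEL_EXPORTS}[shader_type]
    for first, last, meaning in table:
        if first <= address <= last:
            return meaning
    raise AssertionError("table does not cover this address")
```

The same address means different things in the two shader types: `classify_export("vertex", 9)` falls in the parameter cache, while `classify_export("pixel", 9)` is the Color/Fog export for buffer 1.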
`
`3.3 ALU Opcodes
`The following table represents the ALU operations/opcodes supported by the Vector unit.
`
`
`
Name     Opcode   Function                                                          Notes
ADD      0x00     Result = A + B                                                    2 operand instruction; possible coissue
MUL      0x01     Result = A * B                                                    2 operand instruction; possible coissue
MAX      0x02     If (A >= B) result = A; else result = B
MIN      0x03     If (A < B) result = A; else result = B
SETE     0x04     If (A == B) result = 1.0; else result = 0.0
SETGT    0x05     If (A > B) result = 1.0; else result = 0.0
SETGE    0x06     If (A >= B) result = 1.0; else result = 0.0
SETNE    0x07     If (A != B) result = 1.0; else result = 0.0
FRACT    0x08     Result = fractional part of A
TRUNC    0x09     Result = integer part of A
FLOOR    0x0a     Result = TRUNC(A) for positive A, (TRUNC(A) - 1) for negative A
MULADD   0x0b     Result = A * B + C                                                3 operand instruction; no coissue
CNDE     0x0c     If (A == 0.0) result = B; else result = C                         3 operand instruction; no coissue
CNDGE    0x0d     If (A >= 0.0) result = B; else result = C                         3 operand instruction; no coissue
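The opcode table can be read as a per-component reference model. The sketch below is an illustration only, not the hardware implementation: `VECTOR_OPS` and `execute_vector` are hypothetical names, the operand counts for FRACT, TRUNC and FLOOR are assumed to be one, FRACT is assumed to mean A - floor(A), and Python floats stand in for whatever number format the Vector unit uses.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# opcode -> (name, operand count, per-component function)
VECTOR_OPS: Dict[int, Tuple[str, int, Callable[..., float]]] = {
    0x00: ("ADD", 2, lambda a, b: a + b),
    0x01: ("MUL", 2, lambda a, b: a * b),
    0x02: ("MAX", 2, lambda a, b: a if a >= b else b),
    0x03: ("MIN", 2, lambda a, b: a if a < b else b),
    0x04: ("SETE", 2, lambda a, b: 1.0 if a == b else 0.0),
    0x05: ("SETGT", 2, lambda a, b: 1.0 if a > b else 0.0),
    0x06: ("SETGE", 2, lambda a, b: 1.0 if a >= b else 0.0),
    0x07: ("SETNE", 2, lambda a, b: 1.0 if a != b else 0.0),
    # Assumption: "fractional part" is taken as A - floor(A).
    0x08: ("FRACT", 1, lambda a: a - math.floor(a)),
    0x09: ("TRUNC", 1, lambda a: float(math.trunc(a))),
    # math.floor agrees with "TRUNC(A) - 1 for negative A" for negative
    # non-integers; for negative integers it returns A itself.
    0x0A: ("FLOOR", 1, lambda a: float(math.floor(a))),
    0x0B: ("MULADD", 3, lambda a, b, c: a * b + c),
    0x0C: ("CNDE", 3, lambda a, b, c: b if a == 0.0 else c),
    0x0D: ("CNDGE", 3, lambda a, b, c: b if a >= 0.0 else c),
}


def execute_vector(opcode: int, *operands: Sequence[float]) -> Tuple[float, ...]:
    """Apply one vector opcode component by component to its operands."""
    name, arity, fn = VECTOR_OPS[opcode]
    if len(operands) != arity:
        raise ValueError(f"{name} takes {arity} operand(s), got {len(operands)}")
    return tuple(fn(*components) for components in zip(*operands))
```

As a usage example, `execute_vector(0x0B, (1.0, 2.0, 3.0, 4.0), (2.0,) * 4, (0.5,) * 4)` evaluates MULADD on four components, and `execute_vector(0x0C, (0.0, 1.0), (5.0, 5.0), (7.0, 7.0))` selects B for the first component and C for the second.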
