`
`ORIGINATE DATE
`
`
`
`
`
`
`
`
`Author: Andrew Gruber, Andi Skende
`
`17 January, 2002
`
`EDIT DATE
`
`9 October, 2015
`
`DOCUMENT-REV. NUM.
`
`GEN-CXXXXX-REVA
`
`PAGE
`
`1 of 43
`
`Issue To:
`
`Copy No:
`
`Shader Processor
`
`Rev 1.2
`
`
`
`Overview: This document describes the overall architecture of the Shaders, interfaces, partitioning into functional blocks as
`well as the timing of the shader pipeline. It’s intended for use by hardware designers.
`
`AUTOMATICALLY UPDATED FIELDS:
`Document Location
`: Hma_andi_mobile/..../doc_lib/parts/sp
Current Intranet Search Title: Shader Processor
APPROVALS
`
`Name/Dept
`Signature/Date:
`
`
`
`
`Remarks
`
`
`
THIS DOCUMENT CONTAINS INFORMATION THAT COULD BE
SUBSTANTIALLY DETRIMENTAL TO THE INTEREST OF ATI TECHNOLOGIES INC.
THROUGH UNAUTHORIZED USE OR DISCLOSURE.

“Copyright 2001, ATI Technologies Inc. All rights reserved. The material in this document constitutes an unpublished work
created in 2001. The use of this copyright notice is intended to provide notice that ATI owns a copyright in this unpublished
work. The copyright notice is not an admission that publication has occurred. This work contains proprietary
information and trade secrets of ATI. No part of this document may be used, reproduced, or transmitted in any form or by any
means without the prior written permission of ATI Technologies Inc.”
`
`
`Ex.2042 - r400-doc_lib-design-blocks-sp__Shaders.doc__file#9(2).doc
`
62641 Bytes *** © ATI Reference Copyright Notice on Cover Page © ***
10/09/15 12:06 PM
`
`ATI 2042
`
`LG v. ATI
`
`IPR2015-00330
`
`AMD1044_0013380
`
`ATI Ex. 2008
`IPR2023-00922
`Page 1 of 43
`
`R400 Shader Processor Model
`Table Of Contents
1. STATE
1.1 Shader State
1.1.1 GPRs (General Purpose Registers)
1.1.2 Constant Registers
1.1.3 Previous Instruction Result
1.2 Initial State
1.2.1 Vertex Shader
1.2.2 Pixel Shader
2. PROGRAM FORMAT
3. ALU
3.1 ALU Structure
3.2 ALU Instruction Format
3.2.1 ALU Instruction Word Interpretation
3.2.1.1 Relative vs. Absolute Constants
3.2.1.2 Argument Selection and Pointers
3.2.1.3 Input and Output Modifiers
3.2.1.4 Export and Predicate related decoding
3.2.1.5 Export Types and Addresses
3.3 ALU Opcodes
3.4 DX9.0 Shader Instructions, related exceptions and corner cases
3.5 Macro Opcodes
4. SHADER BLOCK DIAGRAMS
4.1 Shader as an SIMD architecture
4.2 Top-Level Diagram of a Shader Pipeline
5. INTERFACES
5.1 External Interfaces
5.1.1 Naming Convention
5.1.2 Shader Engine to Texture Fetch Unit Bus
5.1.3 Sequencer to Shader Pipe(s): Texture stall
5.1.4 Scan Converter to Shader Pipe: IJ bus
5.1.5 Sequencer to Shader Pipe(s) - broadcast: interpolator bus
5.1.6 Sequencer to Shader Pipe(s) - broadcast: Parameter Cache Read control bus
5.1.7 Sequencer to Shader Pipe: GPR, Parameter Cache control and auto counter
5.1.8 Shader Pipe to Shader Export (SX): Parameter data out of Parameter Cache
5.1.9 Shader Export (SX) to Interpolators: Parameter Cache Return bus
5.1.10 Shader Pipe to Shader Export (SX): Pixel/Vertex write to SX
5.1.11 Sequencer to SPx: Instruction Interface
5.1.12 Shader Pipe to Sequencer: Constant address load
5.1.13 Sequencer to SPx: constant broadcast
6. PARAMETER INTERPOLATION
7. SHADER LIMITATIONS
8. HARDWARE IMPLEMENTATION SPECIFICS
8.1 General Information on the Shader Floating Point arithmetic
8.2 Interpolators and IJ/XY Buffers
8.2.1 Interpolators
8.2.1.1 Interpolation Units
8.2.1.2 Parameter Selection Unit
8.2.1.3 Parameter Difference & Cylindrical Wrap Engine
8.2.2 GPR Write Path
8.3 [title illegible]
8.3.1 Vector Unit Pipeline
8.3.2 Argument Selection and Routing
8.3.3 Parameter Data Path
8.4 [title illegible]
8.4.1 Scalar Engine Pipeline
8.4.1.1.1.1.1 High Precision Pipeline Exp, Preprocessing
8.4.1.1.1.1.2 High Precision Pipeline Mantissa Calculation
8.4.1.1.1.1.3 High Precision Pipeline Logs Post Processing
8.4.1.1.1.1.4 High Precision Pipeline Exponent
8.4.1.1.1.1.5 High Precision Special Outputs
8.4.1.1.1.1.6 Determination of High Precision Coefficients
9. OPEN ISSUES
`
`
`Revision Changes:
`
`Rev 0.0 (Steve Morein)
Date: April, 2001
`Initial revision.
`
`Rev 0.1 (Andi Skende)
`Date: May 09, 2001
`
`Rev 0.2 (Andi Skende)
`Date: May 21, 2001
`Rev 0.3 (Andi Skende)
`Date: June 19, 2001
`
`Rev 0.4 (Andi Skende)
`Date: June 20, 2001
`Rev 0.5 (Andi Skende)
`Date: July 31, 2001
`
`Rev 0.6 (Andi Skende)
Date: August 17, 2001
`
`Rev 0.7 (Andi Skende)
`Date: November 8, 2001
`
`Rev 0.8 (Andi Skende)
`Date: November 27, 2001
`Rev 0.9 (Andi Skende)
`Date: December 10, 2001
`
`Rev 1.0 (Andi Skende)
`Date: January 15, 2002
`
`Rev 1.1 (Andi Skende)
`Date: January 21, 2002
`
`Rev 1.2 (Andi Skende)
`Date: January 22, 2002
`
Document started.
Updated, added the instruction format, initial block diagrams and preliminary interface description.
A more detailed description of the SP<->TEX, RE/Sequencer<->SP interfaces.
Added the paragraph related to shader functional limitations that the compiler needs to be aware of.
A new updated and compressed version of ALU instruction format.
Updated the Introduction of this document. A new Pipeline Timing Diagram was inserted.
Merged in the Shader Hardware Spec. A more detailed description of the interfaces with the other blocks was added. Updated some of the diagrams to a more correct representation of the datapaths.
A more detailed description/definition of Shader interfaces with the other blocks.
A more detailed description of the instructions supported by the Shader Processor and their relation to the instruction set exposed at API level.
Updated the ALU instruction word definition and the list of the ALU instruction opcodes supported by the shader pipe ALU unit.
Updated the definition of the External Interfaces.
Updated the definition and naming of some of the external interfaces, rearranged the ALU instruction word definition such that the fields are dword aligned. The instruction opcode definition was updated and expanded.
Updated most of the diagrams. Updated the External Interface definitions. Added a description of the Parameter Interpolation Units. Added a diagram description of the GPR write data paths.
Updated some of the external interface definitions. Specified the expected behavior of the hardware implementation of some shader opcodes with some corner case values as input arguments. The MS Reference Rasterizer shader was used as guideline.
Updated some of the external interface definitions.
`
`
`
`Introduction
`
The Shader Pipe (SP) serves as the central Arithmetic and Logic Unit (ALU) for the R400 Graphics Processor. There are four
identical Shader pipelines in the R400 architecture. Unlike previous ATI architectures, the R400 Shader Pipe truly
represents a Unified Shader Architecture. In R400, both vertex and pixel shading operations are implemented through the
shader units. The R400 Shader Pipe represents a SIMD architecture. All the shader units of each and every pipe execute
the same ALU instruction on different sets of vertex parameters/pixel values. The building blocks of the R400 shader units
execute operations on single precision IEEE floating-point values.
`
`
1. State
`
`1.1 Shader State
`
`1.1.1 GPRs (General Purpose Registers)
The general-purpose registers are 128 bits wide, composed of four 32-bit values. Depending on the operation these
values are interpreted as RGBA, or XYZW, or STQW, or UVQW, or YUVA, and so on. To simplify matters, the only two aliases
used here are XYZW and RGBA.

To hide the latency of memory accesses the shader pipe will switch between different vectors. This is the same as
the idea of “microthreading” that some advanced CPUs are investigating. The large register file is split between the
vectors executing in the shader pipe. The management of the shader register file is automatic, and not visible to a
program executing on a vector, except that a program is required to declare the number of GPRs it needs to execute.
The hardware will not start a vector until the required number of registers is available. There is a direct tradeoff
between the number of registers each program/vector needs and the number of vectors that can be simultaneously
resident. If there are too few vectors resident, then the latency of memory accesses can no longer be hidden and
performance suffers.
There are a total of 128 registers. It is possible for a single program/vector to request all 128 registers. This will make
it impossible to hide memory latency, but the program will still execute and generate the correct result.
Most pixel programs are expected to need fewer than eight registers; vertex programs are expected to need fewer than
sixteen registers.
The number of registers a program needs is the maximum number of registers it needs at any instruction. If a
program needs only 3 general purpose registers nearly all of the time, except for a short period when it needs 8, it still
needs to allocate eight. A significant performance optimization is for the compiler to reorder the instructions to
minimize the number of needed registers.
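
The tradeoff between registers per vector and resident vectors can be illustrated with a little arithmetic. The sketch below is not taken from the specification: the function names are hypothetical, and it assumes the 128-entry register file is divided evenly among vectors with no allocation overhead.

```python
TOTAL_GPRS = 128  # size of the shader register file stated in the text


def gprs_to_declare(live_registers_per_instruction):
    """A program must declare its peak register need, not its typical one."""
    return max(live_registers_per_instruction)


def max_resident_vectors(gprs_per_vector):
    """Upper bound on resident vectors (assumption: even split, no overhead)."""
    if not 1 <= gprs_per_vector <= TOTAL_GPRS:
        raise ValueError("a vector needs between 1 and 128 GPRs")
    return TOTAL_GPRS // gprs_per_vector


# A program that needs 3 registers most of the time but briefly needs 8
# must still declare 8, which bounds the number of resident vectors.
need = gprs_to_declare([3, 3, 8, 3])
print(need, max_resident_vectors(need))
```

Under these assumptions, a compiler that lowers the peak from 8 to 4 registers doubles the number of vectors that can be resident to hide memory latency.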
`
GPR     127..96   95..64   63..32   31..0
        A/W       B/Z      G/Y      R/X
R0
R1
...
R127
`
`
`
`
`
`
`
`
`
`
`
`Notation:
`
R0.A refers to bits 96 to 127 of register zero. So does R0.W.
`
`1.1.2 Constant Registers
`There are also (192?) constant registers:
`
Const   127..96   95..64   63..32   31..0
        A/W       B/Z      G/Y      R/X
C0
C1
...
C191
`
These are ONLY available to vertex and pixel shader programs in the primary command stream. They should not be used
for real time stream pixel shaders, or 2D shaders. Constant Registers are physically part of the Sequencer unit. As will
become clear from the rest of this document, the content of the constant registers can be made available to the ALU
units of the shader pipes as one of the possible ALU operation arguments. The ALU instruction word provides for that.

The constant registers are shared between vertex shaders and pixel shaders. It is the driver's job to allocate one section to
pixel shaders and another to vertex shaders to match the D3D programming model; other APIs may allow more freedom.
To be able to support multiple textures easily, and to save hardware area, the texture state registers are stored in constant
registers. A pair of constant registers holds 256 bits of texture state. Rather than have four or six sets of texture registers as
we do in the R100, R200, and R300, by storing them in the constant memory we can save area by reusing the logic already
needed to update the constant registers in order. Since any single texture instruction will only fetch from one texture, we do
not need the simultaneous access we would get with implementing this as “normal” registers. The driver will probably decide
to allocate a fixed number of the constant registers as texture registers.
`
`1.1.3 Previous Instruction Result
`
Within an ALU clause the result of the previous operation is explicitly available, without requiring a register read
(due to an exposed pipeline delay, the result of the previous operation cannot be read from the register file without a
one-instruction delay slot). There are two distinct previous instruction results, one scalar and one vector.
This register is not preserved between the end of one ALU clause and the beginning of another.
It can be used to avoid using another GPR if the result is not needed. Also, the output modifiers, which do affect the
result of an instruction written into GPRs, do not affect the Previous Result content.
`
[Figure: Previous Result register layout, bits 127 to 0, with field boundaries at bits 95, 63 and 31]
`
`1.2 Initial state
`
`1.2.1 Vertex Shader
`
A vertex shader initially has the x value of R0 set to the vertex index. No other registers are filled. The vertex shader
must use the index to fetch the vertex data from the vertex array(s). The pointers to the vertex arrays should be
placed in constant registers by the driver.
`
`1.2.2 Pixel Shader
`
The pixel shader has the interpolated values generated from the values exported by the vertex shader.
If the vertex shader did expxy, and the appropriate control bit in the rasterizer is set, then register 0 contains the
x,y,z,w of the pixel (screen space). If the pixel shader wants a world space x,y,z,w the vertex shader should output
that.
`
`2. Program Format
`
A pixel or vertex shader program consists of 16 clauses: eight texture clauses and eight ALU clauses.
The instructions in a clause will be executed sequentially. If a given instruction is implementing, for example,
T*S+D (T = texture for SRC A, S = Specular for SRC B, D = Diffuse for SRC C), it’s the Sequencer’s task to
resolve the dependencies between the ALU clause and the respective texture clause. In other words, the sequencer
will not issue the ALU instruction using texture data as input to the shader pipe until the texture request has been
issued to and serviced by the texture pipe. In general, the Shader is not aware of the origin of the SRC A, SRC B
and SRC C data (texture, diffuse, specular, vertex parameters etc). Three address pointers into the register files (one
for each operand) are all the shaders need to fetch these operands. In reality, as will become more evident later in
this document, there is no need for the pointer values to be passed to the shader units. This is related to the GPR
read/write mechanism we have chosen to implement.
`
`3. ALU
`
`3.1 ALU structure
`
The ALU consists of two distinct units: the ‘Vector’ ALU and the ‘Scalar’ ALU. The Vector ALU performs operations in
parallel across a 4-component vector, while the Scalar ALU performs operations on a single component of a vector,
which is then replicated across all components. A single instruction may ‘co-issue’ both a Vector and a Scalar
instruction, subject to the limitation that the vector instruction may only require 1 or 2 arguments. For example, a
Vector MUL (Multiply) instruction can be coissued with a Scalar instruction, but a MULADD (Multiply and ADD) may
not.
For more details on the overall structure of the Shader ALU, refer to the figures in Section 5 of this document.
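
The co-issue restriction described above reduces to a check on the operand count of the vector instruction. The following is a minimal sketch, not the hardware's decode logic; the table and function names are hypothetical, and only the three opcodes mentioned in this section are listed.

```python
# Operand counts for the vector opcodes named in this section
# (ADD and MUL take two sources, MULADD takes three).
VECTOR_OPERAND_COUNT = {
    "ADD": 2,
    "MUL": 2,
    "MULADD": 3,
}


def can_coissue_with_scalar(vector_opcode):
    """A vector op may be co-issued with a scalar op only if it needs 1 or 2 arguments."""
    return VECTOR_OPERAND_COUNT[vector_opcode] <= 2


for name in ("MUL", "MULADD"):
    print(name, can_coissue_with_scalar(name))
```

A three-operand vector instruction occupies SRC A, SRC B and SRC C, which leaves no source free for the scalar unit.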
`
`3.2 ALU instruction format
`
There are two opcodes present in the ALU instruction, one for the Vector operation and one for the Scalar operation. The
idea is that we can allow a 4-component vector operation (if the compiler permits) coissued with a Scalar operation.
`
`
The Scalar unit may use SRC C, depending on whether this source is being used by the vector operation. Please refer
to Section 8 of this document on the limitations of a Vector or Scalar instruction issuing.
`
Field                              Bits    Size  Description
SRC A Select                       95      1     Select bit for selecting Constant vs Register/Vector/Scalar Feedback
                                                 0: Constant
                                                 1: Register/Previous Vector/Previous Scalar
SRC B Select                       94      1     Select bit for selecting Constant vs Register/Vector/Scalar Feedback
                                                 0: Constant
                                                 1: Register/Previous Vector/Previous Scalar
SRC C Select                       93      1     Select bit for selecting Constant vs Register/Vector/Scalar Feedback
                                                 0: Constant
                                                 1: Register/Previous Vector/Previous Scalar
Vector Opcode                      92:88   5     Opcode for Vector instruction
SRC A Register/Constant Pointer    87:80   8     Location of Source A in the register file
                                                 If not Constant, Bits [6],[7] denote:
                                                 00 - (absolute register)
                                                 01 - (relative register)
                                                 10 - (previous vector)
                                                 11 - (previous scalar)
SRC B Register/Constant Pointer    79:72   8     Refer to SRC A Register/Constant Ptr
SRC C Register/Constant Pointer    71:64   8     Refer to SRC A Register/Constant Ptr
Constant0 Relative/Absolute        63      1     The address pointer into the Constant Register File is relative to some base
                                                 address register (works in conjunction with Address Register Select)
Constant1 Relative/Absolute        62      1     The address pointer into the Constant Register File is relative to some base
                                                 address register (works in conjunction with Address Register Select)
Relative Address Register Select   61      1     This bit determines the address register used as base register when
                                                 Constant indexing is relative. It is used in conjunction with the Constant0
                                                 Relative/Absolute and Constant1 Relative/Absolute fields.
                                                 0: Loop index relative
                                                 1: Address Register relative
Predicate Select                   60:59   2     These bits are used in conjunction with bit 7 of Scalar Destination Pointer and
                                                 Vector Destination Pointer
SRC A Arg Modifier                 58      1     0: No modification
                                                 1: negate
SRC B Arg Modifier                 57      1     0: No modification
                                                 1: negate
SRC C Arg Modifier                 56      1     0: No modification
                                                 1: negate
SRC A swizzle                      55:48   8     2 bits for each component
                                                 45:46 alpha channel   00: leave alpha   01: red     10: blue    11: green
                                                 47:48 red channel     00: leave red     01: green   10: blue    11: alpha
                                                 49:50 green channel   00: leave green   01: blue    10: alpha   11: red
                                                 51:52 blue channel    00: leave blue    01: alpha   10: red     11: green
SRC B swizzle                      47:40   8     2 bits for each component (refer to ‘SRC A swizzle’)
SRC C swizzle                      39:32   8     2 bits for each component (refer to ‘SRC A swizzle’)
Scalar Opcode                      31:26   6     Opcode for the Scalar instruction
Scalar Clamp                       25      1     0: No clamp   1: Clamp to [0.0, 1.0] range
Vector Clamp                       24      1     0: No clamp   1: Clamp to [0.0, 1.0] range
Scalar Write Mask                  23:20   4     Defines which of the 32-bit words (four of them) in the result are written back to
                                                 the register file. There’s one bit per channel.
                                                 0: leave the current value
                                                 1: write
Vector Write Mask                  19:16   4     Defines which of the 32-bit words (four of them) in the result are written back to
                                                 the register file. There’s one bit per channel.
                                                 0: leave the current value
                                                 1: write
Scalar result pointer              15:8    8     Specifies the address into the register files for the result of the scalar operation.
                                                 Bit[6] determines whether the destination address into GPRs is relative or
                                                 absolute.
                                                 0: absolute
                                                 1: relative
                                                 Bit[7] in conjunction with Predicate Select is used to define different scenarios
                                                 of export and predicate functionality. For more on this, refer to Section 3.2.1.4
Vector result pointer              7:0     8     Specifies the address into the register files for the result of the vector operation.
                                                 Bit[6] determines whether the destination address into GPRs is relative or
                                                 absolute.
                                                 0: absolute
                                                 1: relative

`
There’s a total of 96 bits per instruction.
The SrcA, SrcB and SrcC GPR locations denoted by the SRC A (B, C) Register/Constant Pointer fields of the ALU instruction word
can be relative as well as absolute addresses. If relative, they are relative to a register (Relative Address Register)
present in the Sequencer as a render state. The above applies to Constant values as well.
The bit allocation and assignment for the different fields of the instruction word was done under the constraint
that they should be dword (32-bit) aligned.
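
To make the field layout concrete, the sketch below packs and unpacks a 96-bit ALU instruction word held as a Python integer. The bit ranges are transcribed from the field table in this section; the field identifiers and the two helper functions are hypothetical names chosen for illustration, not part of the specification.

```python
# (name, most significant bit, least significant bit), from the field table.
FIELDS = [
    ("src_a_select", 95, 95), ("src_b_select", 94, 94), ("src_c_select", 93, 93),
    ("vector_opcode", 92, 88),
    ("src_a_pointer", 87, 80), ("src_b_pointer", 79, 72), ("src_c_pointer", 71, 64),
    ("constant0_relative", 63, 63), ("constant1_relative", 62, 62),
    ("relative_address_register_select", 61, 61),
    ("predicate_select", 60, 59),
    ("src_a_negate", 58, 58), ("src_b_negate", 57, 57), ("src_c_negate", 56, 56),
    ("src_a_swizzle", 55, 48), ("src_b_swizzle", 47, 40), ("src_c_swizzle", 39, 32),
    ("scalar_opcode", 31, 26),
    ("scalar_clamp", 25, 25), ("vector_clamp", 24, 24),
    ("scalar_write_mask", 23, 20), ("vector_write_mask", 19, 16),
    ("scalar_result_pointer", 15, 8), ("vector_result_pointer", 7, 0),
]


def decode_alu_word(word):
    """Split a 96-bit instruction word into a dict of unsigned field values."""
    if not 0 <= word < (1 << 96):
        raise ValueError("instruction word must fit in 96 bits")
    return {name: (word >> lsb) & ((1 << (msb - lsb + 1)) - 1)
            for name, msb, lsb in FIELDS}


def encode_alu_word(**values):
    """Build a 96-bit word; fields that are not given default to zero."""
    unknown = set(values) - {name for name, _, _ in FIELDS}
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, msb, lsb in FIELDS:
        value = values.get(name, 0)
        if value < 0 or value >> (msb - lsb + 1):
            raise ValueError(f"{name} does not fit in {msb - lsb + 1} bits")
        word |= value << lsb
    return word


word = encode_alu_word(vector_opcode=0x0B, src_a_select=1, vector_write_mask=0xF)
print(hex(word), decode_alu_word(word)["vector_opcode"])
```

The field widths sum to 96, matching the stated instruction size, and no field straddles a 32-bit boundary.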
`
`3.2.1 ALU Instruction Word Interpretation
`
`3.2.1.1 Relative vs. Absolute Constants
The location of the Constant Values in the Constant Register File can be absolute or relative to an offset value. When
relative, they can be relative to either a loop index or a given register content value. The truth table shows the
instruction fields that are used to decode the nature of the constant values.
`
Constant0           Constant1           Address Register   Notes
Relative/Absolute   Relative/Absolute   Select
0                   0                   0                  Constant0 absolute, Constant1 absolute
0                   0                   1                  Constant0 absolute 9 bits, Constant1 absolute
1                   0                   0                  Constant0 loop index relative, Constant1 absolute
0                   1                   0                  Constant0 absolute, Constant1 loop index relative
1                   1                   0                  Constant0 loop index relative, Constant1 loop index relative
1                   0                   1                  Constant0 address relative, Constant1 absolute
0                   1                   1                  Constant0 absolute, Constant1 address relative
1                   1                   1                  Constant0 address relative, Constant1 address relative
`
`
`
`
`
`
`
`
`
`
`
Note from the table that if both Constants are relative, they are relative to the same value, be that a loop index or
an address register.
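
The truth table can be summarized by a small decoder: the two Relative/Absolute bits choose relative or absolute addressing per constant, and the single Address Register Select bit chooses the shared base. This is an illustrative sketch with hypothetical names; it only classifies the addressing mode and does not model how the effective address is formed.

```python
def constant_addressing(constant0_relative, constant1_relative, address_register_select):
    """Return the addressing mode of (Constant0, Constant1) for the three control bits."""
    base = "address relative" if address_register_select else "loop index relative"

    def mode(relative_bit):
        # Both constants share the same base whenever they are relative.
        return base if relative_bit else "absolute"

    return mode(constant0_relative), mode(constant1_relative)


print(constant_addressing(1, 0, 0))
print(constant_addressing(1, 1, 1))
```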
`
`3.2.1.2 Argument Selection and Pointers
There can be a maximum of three sources (operands) required for an ALU operation of a vector type.
The R400 ALU instruction word definition provides location pointers into GPRs or Constant Memory for each of the
three sources (SRC A Register/Constant Pointer, SRC B Register/Constant Pointer, SRC C Register/Constant Pointer).
`
`3.2.1.3 Input and Output Modifiers
The R400 ALU instruction word definition provides only two input modifiers for each of the three sources:
Negate and Swizzle.
The R400 ALU instruction word provides two output (result) modifiers: Mask, which only affects the results going
into GPRs but not the Previous Results, and Clamp.
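
The sketch below illustrates how these modifiers could act on a four-component value. It is a behavioral illustration only. The swizzle lookup is transcribed from the SRC A swizzle description exactly as printed in the instruction word table; the function names are hypothetical, the write mask is modeled as a set of channel names because the bit-to-channel order is not stated here, and leaving the Previous Result unclamped is an assumption based on Section 1.1.3.

```python
CHANNELS = ("red", "green", "blue", "alpha")

# Source channel selected by each 2-bit swizzle code (index 0..3), as printed.
SWIZZLE_SOURCE = {
    "red":   ("red", "green", "blue", "alpha"),
    "green": ("green", "blue", "alpha", "red"),
    "blue":  ("blue", "alpha", "red", "green"),
    "alpha": ("alpha", "red", "blue", "green"),
}


def apply_input_modifiers(source, swizzle_codes, negate):
    """Swizzle each channel, then negate the whole argument if requested."""
    result = {}
    for channel in CHANNELS:
        picked = source[SWIZZLE_SOURCE[channel][swizzle_codes[channel]]]
        result[channel] = -picked if negate else picked
    return result


def apply_output_modifiers(result, old_gpr, write_channels, clamp):
    """Return (new GPR contents, Previous Result).

    Assumption: output modifiers change only what is written to the GPR.
    """
    previous_result = dict(result)
    new_gpr = dict(old_gpr)
    for channel in write_channels:
        value = result[channel]
        if clamp:
            value = min(max(value, 0.0), 1.0)  # clamp to the [0.0, 1.0] range
        new_gpr[channel] = value
    return new_gpr, previous_result


src = {"red": 1.0, "green": 2.0, "blue": 3.0, "alpha": 4.0}
codes = {"red": 1, "green": 0, "blue": 2, "alpha": 0}
print(apply_input_modifiers(src, codes, negate=True))
```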
`
`
`3.2.1.4 Export and Predicate related decoding
`The table below describes the encoding of the exports and predicate support in the instruction word.
`Exports are allowed from either Scalar or Vector Pipe. Similar to the GPR write-backs, masking of export data is
`permitted. The mask is present in the ALU instruction word.
`In cases when exports are coissued from Scalar and
`Vector pipes, the export address used is the Vector Result Pointer present in the instruction word. The Scalar
`Result Pointer in this case is ignored. The table below describes the “mixed” use of Scalar and Vector Result Masks
`per component (each MASK field is 4 bits wide, one bit per component/channel) when exports are coissued.
`
Scalar Mask   Vector Mask   Result of Export
0             0             Don't write
1             0             Write Scalar Component
0             1             Write Vector Component
1             1             Write 1.0 (one way of generating defaults)
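
Applied per component, the mask table above behaves like the following sketch. The function name is hypothetical, and returning None to represent "don't write" is a modeling choice made for this example.

```python
def mix_coissued_export(scalar_mask_bit, vector_mask_bit, scalar_value, vector_value):
    """Value exported for one component when scalar and vector exports are coissued."""
    if scalar_mask_bit and vector_mask_bit:
        return 1.0           # both mask bits set: write the default 1.0
    if scalar_mask_bit:
        return scalar_value  # write the scalar component
    if vector_mask_bit:
        return vector_value  # write the vector component
    return None              # neither bit set: the component is not written


# One component for each of the four mask combinations.
for s_bit, v_bit in ((0, 0), (1, 0), (0, 1), (1, 1)):
    print(s_bit, v_bit, mix_coissued_export(s_bit, v_bit, 0.25, 0.75))
```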
`
`
`
`
`
`
`
`
`
A few other export related definitions:
1) When doing a Scalar export of 'pixels' or 'position', only the 'alpha' component will contain the scalar result.
The other 3 components will be expanded to 0.0. When exporting to 'parameters' the scalar result is put into
all 4 components.
2) When doing a Scalar export of 'parameters', non-export vector instructions may not be coissued.

3) Exporting of 'Fog' is a special case.
When exporting Fog, color must be exported at the same time. Fog will be exported in the Scalar
pipe and Color in the Vector pipe.
Masking is ignored for Fog exports. Instead Color and Fog are mixed into a single ARGBF word
and exported to the render back-end.
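
The expansion rule in item 1 can be sketched as a small function. The function name and the string labels for the export targets are hypothetical; they simply mirror the quoted terms used in the list above.

```python
def expand_scalar_export(scalar_result, target):
    """Expand a scalar result into the four exported components."""
    if target in ("pixels", "position"):
        # Only alpha carries the scalar result; the other components become 0.0.
        return {"red": 0.0, "green": 0.0, "blue": 0.0, "alpha": scalar_result}
    if target == "parameters":
        # The scalar result is replicated into all four components.
        return {channel: scalar_result for channel in ("red", "green", "blue", "alpha")}
    raise ValueError(f"unknown export target: {target}")


print(expand_scalar_export(0.5, "position"))
print(expand_scalar_export(0.5, "parameters"))
```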
`
Predicate Select   Notes (row selected by Scalar Destination Pointer Bit[7] and Vector Destination Pointer Bit[7])
0x                 Scalar Export, Vector to GPR
0x                 Scalar Export, Vector Export
0x                 Scalar to GPR, Vector to GPR
0x                 Scalar to GPR, Vector Export
10                 Scalar to GPR, Vector to GPR. Use Predicate Register 0 (1: skip, 0: execute)
10                 Scalar to GPR, Vector to GPR. Use Predicate Register 0 (1: execute, 0: skip)
10                 Scalar to GPR, Vector to GPR. Use Predicate Register 1 (1: skip, 0: execute)
10                 Scalar to GPR, Vector to GPR. Use Predicate Register 1 (1: execute, 0: skip)
11                 Scalar to GPR, Vector to GPR. Use Predicate Register 2 (1: skip, 0: execute)
11                 Scalar to GPR, Vector to GPR. Use Predicate Register 2 (1: execute, 0: skip)
11                 Scalar to GPR, Vector to GPR. Use Predicate Register 3 (1: skip, 0: execute)
11                 Scalar to GPR, Vector to GPR. Use Predicate Register 3 (1: execute, 0: skip)

[The per-row values of the two Bit[7] columns are illegible in this copy.]
`
`
`3.2.1.5 Export Types and Addresses
The location where the data should be put in the event of an export is specified in the destination address field of
the ALU instruction word. Following is a list of the possible types of exports and the range of addresses.
`
Vertex Shading
0:15    - 16 parameter cache
16:31   - Empty (Reserved?)
32:43   - 12 vertex exports to the frame buffer and index
44:47   - Empty
48:59   - 12 debug export (interpret as normal vertex export)
60      - export addressing mode
61      - Empty
62      - position
63      - sprite size export that goes with position export
          (point_h, point_w, edgeflag, misc)

Pixel Shading
0       - Color for buffer 0 (primary)
1       - Color for buffer 1
2       - Color for buffer 2
3       - Color for buffer 3
4:7     - Empty
8       - Buffer 0 Color/Fog (primary)
9       - Buffer 1 Color/Fog
10      - Buffer 2 Color/Fog
11      - Buffer 3 Color/Fog
12:15   - Empty
16:31   - Empty (Reserved?)
32:43   - 12 exports for multipass pixel shaders.
44:47   - Empty
48:59   - 12 debug exports (interpret as normal pixel export)
60      - export addressing mode
61:62   - Empty
63      - Z for primary buffer (Z exported to ‘alpha’ component)
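
The address ranges listed above can be treated as a lookup from export address to export type. The sketch below is illustrative only: the table and function names are hypothetical, the labels are shortened, and the assignment of vertex addresses 62 (position) and 63 (sprite size) is one reading of a listing whose layout is ambiguous in the source.

```python
# (first address, last address, export type)
VERTEX_EXPORTS = [
    (0, 15, "parameter cache"), (16, 31, "empty (reserved?)"),
    (32, 43, "vertex export to the frame buffer and index"),
    (44, 47, "empty"), (48, 59, "debug export"),
    (60, 60, "export addressing mode"), (61, 61, "empty"),
    (62, 62, "position"),     # assumption: see the note above
    (63, 63, "sprite size"),  # assumption: see the note above
]

PIXEL_EXPORTS = [
    (0, 0, "color for buffer 0 (primary)"), (1, 1, "color for buffer 1"),
    (2, 2, "color for buffer 2"), (3, 3, "color for buffer 3"),
    (4, 7, "empty"),
    (8, 8, "buffer 0 color/fog (primary)"), (9, 9, "buffer 1 color/fog"),
    (10, 10, "buffer 2 color/fog"), (11, 11, "buffer 3 color/fog"),
    (12, 15, "empty"), (16, 31, "empty (reserved?)"),
    (32, 43, "multipass pixel shader export"),
    (44, 47, "empty"), (48, 59, "debug export"),
    (60, 60, "export addressing mode"), (61, 62, "empty"),
    (63, 63, "Z for primary buffer"),
]


def classify_export(address, shader_type):
    """Return the export type for a 6-bit destination address."""
    table = {"vertex": VERTEX_EXPORTS, "pixel": PIXEL_EXPORTS}[shader_type]
    for first, last, kind in table:
        if first <= address <= last:
            return kind
    raise ValueError("export address must be in the range 0..63")


print(classify_export(9, "pixel"))
print(classify_export(40, "vertex"))
```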
`
`3.3 ALU Opcodes
`The following table represents the ALU operations/opcodes supported by the Vector unit.
`
`
`
Name
Opcode
Function
Notes
`
`ADD
`0x00
`Result = A+B
2 operand instruction; possible coissue
`
`MUL
`0x01
`Result = A*B
`2 operand instruction, possible coissue
`
`MAX
`0x02
`If (A >= B) result = A; else result = B
`
`MIN
`0x03
`If (A < B) result = A; else result = B
`
`SETE
`0x04
If (A == B) result = 1.0; else result = 0.0
`
`SETGT
`0x05
`If (A > B) result = 1.0; else result = 0.0
`
`SETGE
`0x06
`If (A >=B) result = 1.0; else result = 0.0
`
`SETNE
`0x07
`If (A != B) result = 1.0; else result = 0.0
`
`FRACT
`0x08
`Result = fractional part of A
`
`TRUNC
`0x09
`Result = integer part of A

FLOOR
0x0a
Result = TRUNC(A) for positive A, (TRUNC(A) - 1) for negative A
`
`MULADD
0x0b
Result = A*B+C
3 operand instruction; no coissue
`
`CNDE
0x0c
`If (A == 0.0) result = B; else result = C
`3 operand instruction; no coissue
`
`CNDGE
0x0d
`If (A >= 0.0 ) result = B; else result = C
`3 operand instruction; no coissu