ORIGINATE DATE: 17 January, 2002
EDIT DATE:
AUTHOR: Andrew Gruber, Andi Skende, Tom Frisinger
DOCUMENT-REV. NUM.: GEN-CXXXXX-REVA
PAGE: 1 of 49
Issue To:
Copy No:
`Shader Processor
`
`Rev 2.102
`
Overview: This document describes the overall architecture of the Shader, its interfaces, its partitioning into functional blocks, and the timing of the shader pipeline. It is intended for use by hardware designers.
`
AUTOMATICALLY UPDATED FIELDS:
Document Location:
Current Intranet Search Title: Shader Processor

APPROVALS

Signature/Date

Remarks
`
© Copyright 2001, ATI Technologies Inc. All rights reserved. The material in this document constitutes an unpublished work created in 2001. The use of this copyright notice is intended to provide notice that ATI owns a copyright in this unpublished work. The copyright notice is not an admission that publication has occurred. This work contains confidential, proprietary information and trade secrets of ATI. No part of this document may be used, reproduced, or transmitted in any form or by any means without the prior written permission of ATI Technologies Inc.
`
`ATI Ex. 2067
`IPR2023-00922
`Page 1 of 448
`
`

`

Table of Contents

1.1.3 Previous Instruction Result
Pixel Shader
Program Format
ALU
ALU instruction format
ALU Instruction Word Interpretation
Input and Output Modifiers
Export and Predicate related decoding
Export Types and Addresses
SIMD architecture
Level Diagram of a Shader Pipe
External Interfaces
Shader Engine to Texture Fetch Unit Bus
Scan Converter to Shader Pipe: I BUS
Sequencer to Shader Pipes broadcast: Parameter Cache Read Control Bus
Sequencer to Shader Pipe: GPR, Parameter Cache control and auto counter
Shader Pipe to Shader Export (SX): Parameter data out of Parameter Cache
Shader Export (SX) to Interpolators: Parameter Cache Return bus
Shader Pipe to Shader Export (SX): Pixel/Vertex write to SX
PARAMETER INTERPOLATION
HARDWARE IMPLEMENTATION SPECIFICS
General Information on the Shader Floating Point arithmetic
Parameter Selection Unit
Parameter Difference & Cylindrical Wrap Engine
High Precision
`
`

`

`
`
Revision Changes:

Rev 0.0 (Steve Morein)   Date: April, 2001   Initial revision.
Rev 0.1 (Andi Skende)    Date: May 09, 2001
Rev 0.2 (Andi Skende)    Date: May 21, 2001
Rev 0.3 (Andi Skende)    Date: June 19, 2001
Rev 0.4 (Andi Skende)    Date: June 20, 2001
Rev 0.5 (Andi Skende)    Date: July 31, 2001
Rev 0.6 (Andi Skende)    Date: August 17, 2001
Rev 0.7 (Andi Skende)    Date: November 6, 2001
Rev 0.8 (Andi Skende)    Date: November 27, 2001
Rev 0.9 (Andi Skende)    Date: December 10, 2001
Rev 1.0 (Andi Skende)    Date: January 15, 2002
Rev 1.1 (Andi Skende)    Date: January 21, 2002
Rev 1.2 (Andi Skende)    Date: January 22, 2002
Rev 1.3 (Andi Skende)    Date: January 20, 2002
Rev 1.4 (Andi Skende)    Date: August 12, 2002

Document started.

Updated, added the instruction format, initial block diagrams and preliminary interface description.

A more detailed description of the SP<>TEX, RE/Sequencer <-> SP interfaces.

Added the paragraph related to shader functional limitations that the compiler needs to be aware of.

A new updated and compressed version of ALU instruction format.

Updated the Introduction of this document. A new Pipeline Timing Diagram was inserted.

Merged in the Shader Hardware Spec. A more detailed description of the interfaces with the other blocks was added. Updated some of the diagrams to a more correct representation of the datapaths.

A more detailed description/definition of Shader interfaces with the other blocks.

A more detailed description of the instructions supported by the Shader Processor and their relation to the instruction set exposed at API level.

Updated the ALU instruction word definition and the list of the ALU instruction opcodes supported by the shader pipe ALU unit.

Updated the definition of the External Interfaces.

Updated the definition and naming of some of the external interfaces, rearranged the ALU instruction word definition such that the fields are dword aligned. The instruction opcode definition was updated and expanded.

Updated most of the diagrams. Updated the External Interface definitions. Added a description of the Parameter Interpolation Units. Added a diagram description of the GPR write data paths.

Updated some of the external interface definitions. Specified the expected behavior of hardware implementation of some shader opcodes with some corner case values as input arguments. The MS Reference Rasterizer shader was used as guideline.

Updated some of the external interface definitions.
1. Changed the order of the swizzle bits per channel in the instruction word such that the LSBs belong to the red channel ... MSBs for the alpha channel.
2. Extended the SQ_SP_Instruct bus width to 21 bits to account for the Scalar Opcode being 6 bits instead of 5.

1. Modified the instruction word format.
2. Added new opcodes (DST and MUL_PREV2) and redefined the opcode values for both vector and scalar instructions.
3. Updated the SQ_SP instruction interface definition.
`
`

`

`
`Introduction
`
The Shader Pipe (SP) serves as the central Arithmetic and Logic Unit (ALU) for the R400 Graphics Processor. There are four identical Shader pipelines in the R400 architecture. Unlike previous ATI architectures, the R400 Shader Pipe truly represents a Unified Shader Architecture (USA). In R400, both vertex and pixel shading operations are implemented through the shader units. The R400 Shader Pipe represents a Single Instruction Multiple Data (SIMD) architecture. All the shader units of each and every pipe execute the same ALU instruction on different sets of vertex parameters/pixel values. The building blocks of the R400 shader units execute operations on single precision IEEE floating-point values.
`
`
`

`

`
`State
`
`1.1 Shader State
`
1.1.1 GPRs (General Purpose Registers)
The general-purpose registers are 128 bits wide, composed of four 32-bit values. Depending on the operation these values can be interpreted, among others, as ABGR, WZYX, QRTS, QWVU, or AUYV respectively; to simplify matters only the alias WZYX will be used throughout this document.

To hide the latency of memory accesses the shader pipe will switch between different vectors. This is the same as the idea of "microthreading" that some advanced CPUs are investigating. The large register file is split between the vectors executing in the shader pipe. The management of the shader register file is automatic, and not visible to a program executing on a vector, except that a program is required to declare the number of GPRs it needs to execute. The hardware will not start a vector until the required number of registers is available. There is a direct tradeoff between the number of registers each program/vector needs and the number of vectors that can be simultaneously resident. If there are too few vectors resident, then the latency of memory accesses can no longer be hidden and performance suffers.

There are a total of 128 general purpose registers. A given shader can request at most 64 GPRs. Requesting a very large number of GPRs will make it difficult to hide memory latency, but the program will still execute and generate the correct result.

Most pixel programs are expected to have fewer than eight registers; vertex programs are expected to have fewer than sixteen registers.

The number of registers a program needs is the maximum number of registers it needs at any instruction. If a program needs only 3 general purpose registers nearly all of the time, except for a short period when it needs 8, it still needs to allocate eight. A significant performance optimization is for the compiler to reorder the instructions to minimize the number of needed registers.
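The tradeoff between registers per program and resident vectors can be illustrated with a small calculation. This is only a sketch: it assumes the 128-entry register file is divided evenly among resident vectors, which the text does not state, and the function names and the per-instruction live-register list are hypothetical.

```python
TOTAL_GPRS = 128           # total general purpose registers (from the text)
MAX_GPRS_PER_SHADER = 64   # largest request a shader may make (from the text)


def gprs_to_declare(live_registers_per_instruction):
    """A program declares its peak register need, not its typical need."""
    return max(live_registers_per_instruction)


def max_resident_vectors(gprs_declared):
    """Assumed model: every resident vector owns gprs_declared registers."""
    if not 1 <= gprs_declared <= MAX_GPRS_PER_SHADER:
        raise ValueError("a shader can request between 1 and 64 GPRs")
    return TOTAL_GPRS // gprs_declared


# A program that needs 3 registers almost always, but briefly needs 8.
need = gprs_to_declare([3, 3, 8, 3, 3])
print(need, max_resident_vectors(need))  # 8 16
print(max_resident_vectors(3))           # 42
```

Under this assumed model, a compiler that reorders instructions to bring the peak down from 8 to 3 registers would let roughly two and a half times as many vectors stay resident.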
`
`
`
Notation: R1.W refers to bits 96 to 127 of register one (so does R1.A).
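The notation can be made concrete with a short sketch that treats a GPR as a 128-bit integer. Only the position of W (bits 96 to 127) is stated in the text; placing X, Y and Z at bits 0, 32 and 64 is an assumption that follows the WZYX ordering, and the helper names are hypothetical.

```python
import struct

# Bit offset of each 32-bit component; W is from the text, X/Y/Z are assumed.
COMPONENT_SHIFT = {"X": 0, "Y": 32, "Z": 64, "W": 96}
MASK32 = 0xFFFFFFFF


def pack_gpr(x, y, z, w):
    """Build a 128-bit register value from four raw 32-bit patterns."""
    return (w << 96) | (z << 64) | (y << 32) | x


def component(gpr, name):
    """Return the raw 32-bit pattern of one component, e.g. R1.W."""
    return (gpr >> COMPONENT_SHIFT[name]) & MASK32


def as_float(bits32):
    """Reinterpret a 32-bit pattern as an IEEE single precision value."""
    return struct.unpack("<f", struct.pack("<I", bits32))[0]


r1 = pack_gpr(0x00000000, 0x40000000, 0x40400000, 0x3F800000)
print(hex(component(r1, "W")), as_float(component(r1, "W")))  # 0x3f800000 1.0
```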
`
`1.1.2 Constants
`
There are also 512 constant registers available to vertex and pixel shaders in the primary command stream.

[Figure: a 128-bit constant register, with bit boundaries marked at 127, 95, 63, 31 and 0]

Real-time has its own 256. Constants are physically part of the Sequencer unit. As will become clear from the rest of this document, the content of the constants can be made available to the ALU units of the shader pipes in the form of one of the possible ALU operation arguments. The ALU instruction word provides for that.

The constant file is shared between vertex shaders and pixel shaders; it is the driver's job to allocate one section to pixel shaders and another to vertex shaders to match the D3D programming model. Other APIs may allow more freedom.

To be able to support multiple textures easily, and to save hardware area, the texture state registers are stored in a separate pool of constant registers. Each texture constant holds 192 bits of texture state. Rather than have four or six sets of texture registers as we do in the R100, R200, and R300, by storing them in constant memory we can save area by reusing the logic already needed to update the constant registers in order. Since any single texture instruction will only fetch from one texture, we do not need the simultaneous access we would get with implementing this as "normal" registers.
`
`1.1.3 Previous Instruction Result
`
(This section is no longer accurate and needs updating)
Within an ALU clause the result of the previous operation is explicitly available, without requiring a register read (due to an exposed pipeline delay, the result of the previous operation cannot be read from the register file without a one-instruction delay slot). There are two distinct previous instructions, one scalar and one vector.
This register is not preserved between the end of one ALU clause and the beginning of another.
It can be used to avoid using another GPR if the result is not needed. Also, the output modifiers, which do affect the result of an instruction written into GPRs, do not affect the Previous Result content.

1.2 Initial state

1.2.1 Vertex Shader

A vertex shader initially has the X value of R0 set to the vertex index. No other registers are filled. The vertex shader must use the index to fetch the vertex data from the vertex array(s). The pointers to the vertex arrays should be placed in texture constant registers by the driver.

1.2.2 Pixel Shader

The pixel shader has the interpolated values generated from the values exported by the vertex shader.

If the vertex shader exports to parameter 0, and the R400 is appropriately programmed, then GPR 0 in the pixel shader contains the interpolated values for that parameter at that pixel.
`
`2. Program Format
`(This section is no longer accurate and needs updating)
A pixel or vertex shader program consists of 16 clauses: eight texture clauses and eight ALU clauses.
The instructions in a clause will be executed sequentially. If a given instruction is implementing, for example, T*S+D (T = texture for SRC A, S = Specular for Source B, D = Diffuse for Source C), it is the Sequencer's task to resolve the dependencies between the ALU clause and the respective texture clause. In other words, the sequencer will not issue the ALU instruction using texture data as input to the shader pipe until the texture request has been issued to and serviced by the texture pipe. In general, the Shader is not aware of the origin of the SRC A, SRC B and SRC C data (texture, diffuse, specular, vertex parameters etc.). Three address pointers into the register files (one for each operand) are all the shaders need to fetch these operands. In reality, as will become more evident later in this document, there is no need for the pointer values to be passed to the shader units. This is related to the GPR's Read/Write Mechanism we have chosen to implement.
`
`3. ALU
`
`3.1 ALU structure
`
The ALU consists of two distinct units: the "Vector ALU" and the "Scalar ALU". The Vector ALU performs operations in parallel across a 4-component vector, while the Scalar ALU performs operations on a single component of a vector which is then replicated across all components. A single instruction will "co-issue" both a Vector and a Scalar instruction. Almost all scalar instructions require SrcC as an operand. When the Vector operation is only using SrcA and SrcB as operands (such as in a MUL (Multiply) instruction), the scalar pipe is free to use SrcC as it wishes. When the Vector pipe is also consuming SrcC, such as in a three operand instruction like MULADD (Multiply and ADD), SrcC is fixed for the scalar pipe. It is important to understand that the given scalar operation still occurs on SrcC. Under most circumstances this will result in undesirable behavior unless the scalar operation is benign and has masked its destination writes.

For more details on the overall structure of the Shader ALU, refer to the figures in Section 5 of this document.
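The SrcC sharing rule lends itself to a small check that a compiler or assembler might perform. This is a simplified sketch under stated assumptions: the function names are hypothetical, and "benign" is modeled only as "the scalar write mask is zero", which is narrower than what the text allows.

```python
def scalar_src_c_is_free(vector_operand_count):
    """True when the vector operation leaves SrcC to the scalar pipe."""
    if vector_operand_count not in (1, 2, 3):
        raise ValueError("vector operations take one to three operands")
    return vector_operand_count < 3


def coissue_is_safe(vector_operand_count, scalar_write_mask):
    """Assumed policy: with a three-operand vector op the scalar op still
    runs on the vector's SrcC, so it is accepted only if it writes nothing."""
    return scalar_src_c_is_free(vector_operand_count) or scalar_write_mask == 0


print(coissue_is_safe(2, 0b1111))  # True: MUL-style op, SrcC is free
print(coissue_is_safe(3, 0b0001))  # False: MULADD-style op, scalar writes X
print(coissue_is_safe(3, 0b0000))  # True: scalar result is fully masked
```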
`
`
`

`

3.2 ALU instruction format

There are two opcodes present in the ALU instruction, one for the Vector operation and one for the Scalar operation. The idea is that we can allow a 4-component vector operation (if the compiler permits) coissued with a Scalar operation. The Scalar unit may use SRC C, depending on whether this source is being used by the vector operation. Please refer to Section 8 of this document on the limitations of a Vector or Scalar instruction issuing.

Source select (three one-bit fields, one per source)
    Select bit for selecting Constant vs. Register
    0: Constant
    1: Register

Vector Opcode (bits 92:88, 5 bits)
    Opcode for Vector instruction

SRC A Register/Constant Pointer (bits 87:80)
    Location of SRC A in the Register or Constant file
    If Register, Bits [87]-[80] denote:
        Bit [87]
            0: Do not execute ABS on input register
            1: Execute ABS on input register
        Bit [86]
            0: Logical register addressing
            1: Current Loop index relative register addressing
        Bits [85]-[80] location of SRC A in the register file
    If Constant, Bits [87]-[80] denote:
        Bits [87]-[80] location of SRC A in the constant file

SRC B Register/Constant Pointer
    Location of SRC B in the Register or Constant file
    Refer to SRC A Register/Constant Pointer

SRC C Register/Constant Pointer
    Location of SRC C in the Register or Constant file
    Refer to SRC A Register/Constant Pointer

Constant0 Logical/Relative
    The address pointer into the Constant file is relative to some base address register (works in conjunction with Relative Address Register Select)
    0: Logical constant addressing
    1: Relative constant addressing

Constant1 Logical/Relative
    The address pointer into the Constant file is relative to some base address register (works in conjunction with Relative Address Register Select)
    0: Logical constant addressing
    1: Relative constant addressing

Negate

Relative Address Register Select
    This bit determines the address register used as base register when indexing is relative (is used in conjunction with the Constant0 Logical/Relative and Constant1 Logical/Relative fields).
    0: Current Loop index relative
    1: Address Register relative

Predicate Select (bits 60:59)
    0X: No predication
    10: Predicated, 1 means skip, 0 means execute
    11: Predicated, 0 means skip, 1 means execute
`
`

`

SRC A Swizzle (bits 52:45)
    2 bits for each component
    Bits [52][51]: W channel swizzle
        00: leave W
        01: X
        10: Y
        11: Z
    Bits [50][49]: Z channel swizzle
        00: leave Z
        01: W
        10: X
        11: Y
    Bits [48][47]: Y channel swizzle
        00: leave Y
        01: Z
        10: W
        11: X
    Bits [46][45]: X channel swizzle
        00: leave X
        01: Y
        10: Z
        11: W

SRC B Swizzle
    2 bits for each component (refer to SRC A Swizzle)

SRC C Swizzle
    2 bits for each component (refer to SRC A Swizzle)

Scalar Opcode
    Opcode for the Scalar instruction

Scalar Clamp

Scalar Write Mask (bits 23:20)
    Defines which of the 32-bit words (four of them) in the scalar result are written back to the Register file. There is one bit per channel.
    Bit [23]
        0: Leave the current value
        1: Write Scalar W
    Bit [22]
        0: Leave the current value
        1: Write Scalar Z
    Bit [21]
        0: Leave the current value
        1: Write Scalar Y
    Bit [20]
        0: Leave the current value
        1: Write Scalar X

Vector Write Mask (bits 19:16)
    Defines which of the 32-bit words (four of them) in the vector result are written back to the Register file. There is one bit per channel.
    Bit [19]
        0: Leave the current value
        1: Write Vector W
    Bit [18]
        0: Leave the current value
        1: Write Vector Z
    Bit [17]
        0: Leave the current value
        1: Write Vector Y
    Bit [16]
        0: Leave the current value
        1: Write Vector X
`
`

`

Scalar Destination Pointer (bits 15:8)
    Bit [15] denotes the destination of the ALU results
        0: Register file (Scalar and Vector)
        1: Export file (Scalar and Vector)
    If Register file destination, Bits [14]-[8] denote:
        Bit [14] determines whether the scalar destination address into the Register file is logical or relative.
            0: Logical register addressing
            1: Current Loop index relative register addressing
        Bits [13]-[8] specify the address into the Register file for the result of the scalar operation.
    If Export file destination, Bits [14]-[8] denote:
        Bit [14] determines parameter export masking behavior. See table in 3.2.1.4
        Bits [13]-[8] are unused, Must Be Zero

Vector Destination Pointer (bits 7:0)
    Bit [7] determines the ABS input modifier on all constants in the instruction
        0: Do not execute ABS on all constants
        1: Execute ABS on all constants
    Bit [6] determines whether the vector destination address into the Register or Export file is logical or relative.
        0: Logical register addressing (Must Be Zero for exports on R400)
        1: Current Loop Index relative register addressing
    If Register file destination, Bits [5]-[0] denote:
        Bits [5]-[0] specify the address into the Register file for the result of the vector operation.
    If Export file destination, Bits [5]-[0] denote:
        Bits [5]-[0] specify the address into the Export file for the result of the vector operation.

There is a total of 96 bits per instruction. The bit allocation and assignment for the different fields of the instruction word was done under the limitation that they should be DWORD (32 bit) aligned.
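As an illustration of how the fields with legible bit positions could be pulled out of a 96-bit instruction word, here is a minimal sketch. It covers only the fields whose bit ranges can be read from the field table (Vector Opcode 92:88, SRC A pointer 87:80, Predicate Select 60:59, SRC A Swizzle 52:45, the write masks 23:16 and the destination pointers 15:0); the function and key names are hypothetical.

```python
def bits(word, hi, lo):
    """Extract bits hi..lo (inclusive) of an integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_alu_word(word):
    """Decode the fields whose positions are legible in the field table."""
    if not 0 <= word < (1 << 96):
        raise ValueError("ALU instruction words are 96 bits wide")
    scalar_dest = bits(word, 15, 8)
    vector_dest = bits(word, 7, 0)
    return {
        "vector_opcode": bits(word, 92, 88),
        "src_a_pointer": bits(word, 87, 80),
        "predicate_select": bits(word, 60, 59),
        "src_a_swizzle": bits(word, 52, 45),
        "scalar_write_mask": bits(word, 23, 20),
        "vector_write_mask": bits(word, 19, 16),
        "export": bool(scalar_dest >> 7),        # bit 15
        # Bit 14 means "relative" for register writes and selects the
        # export masking behavior for exports, so it is returned raw.
        "scalar_dest_bit14": (scalar_dest >> 6) & 1,
        "scalar_dest_address": scalar_dest & 0x3F,
        "abs_on_constants": bool(vector_dest >> 7),  # bit 7
        "vector_dest_relative": bool((vector_dest >> 6) & 1),
        "vector_dest_address": vector_dest & 0x3F,
    }


word = (5 << 88) | (0x85 << 80) | (0b10 << 59) | (0b1111 << 16) | (1 << 15) | 3
fields = decode_alu_word(word)
print(fields["vector_opcode"], fields["export"], fields["vector_dest_address"])
```

The remaining fields (source selects, SRC B and SRC C pointers and swizzles, scalar opcode, clamps) are left out because their bit positions are not legible in the source.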
`
`3.2.1 ALU Instruction Word Interpretation
3.2.1.1 Argument Selection and Pointers
There can be a maximum of three sources (operands) required for an ALU operation of a vector type.
The R400 ALU instruction word definition provides location pointers into the Register file (GPRs) or Constant file for each of the three sources (SRC A Register/Constant Pointer, SRC B Register/Constant Pointer, SRC C Register/Constant Pointer).

3.2.1.1.1 Logical vs. Relative Registers
SrcA, SrcB and SrcC GPR locations, denoted by the SRC A (B, C) Register/Constant Pointer fields of the ALU instruction word, can be logical as well as relative addresses. If relative, they are relative to the Current Loop Index present in the Sequencer state.

3.2.1.1.2 Logical vs. Relative vs. Absolute Constants
Constants can also be addressed in logical and relative fashions along with an absolute mode. When relative, they can be relative to either the Current Loop Index (CLI) or the Address Register (AR) in the sequencer state. The truth table below shows the instruction fields that are used to decode the nature of the constant values.

[Table: truth table decoding the addressing of Constant0, Constant1 and Constant2 from the Constant0 Logical/Relative, Constant1 Logical/Relative and Relative Address Register Select fields. Recoverable entries: Logical; Absolute; Relative, CLI; Relative, AR; Same as Constant1.]

Note from the table that if both Constants are relative, they are relative to the same value, either the Current Loop Index (CLI) or the Address Register (AR).
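The logical and relative addressing modes can be sketched as a pair of address-resolution helpers. This is an illustration only: the text says relative addresses are "relative to" the Current Loop Index or Address Register but does not spell out the arithmetic, so plain addition is an assumption here, the absolute constant mode is not modeled, and all names are hypothetical.

```python
def resolve_gpr_address(pointer, current_loop_index):
    """Resolve an 8-bit register pointer laid out like SRC A (bits 87:80).

    Bit 7 (ABS) is ignored here, bit 6 selects relative addressing and
    bits 5:0 hold the register address.
    """
    relative = (pointer >> 6) & 1
    address = pointer & 0x3F
    # Assumption: "relative" means the loop index is added to the address.
    return address + current_loop_index if relative else address


def resolve_constant_address(pointer, relative, use_address_register,
                             current_loop_index, address_register):
    """Resolve a constant pointer; both relative constants share one base."""
    if not relative:
        return pointer
    base = address_register if use_address_register else current_loop_index
    return pointer + base  # assumed arithmetic, as above


print(resolve_gpr_address(0b01000010, 5))            # 7
print(resolve_constant_address(10, True, True, 5, 100))  # 110
```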
`
`
Constant0 refers to the first constant in the instruction; Constant1 and Constant2 refer to the second and third constants in the instruction respectively.

3.2.1.2 Input and Output Modifiers
The R400 ALU Instruction word definition provides for only three input modifiers for each of the three sources: Negate, ABS and Swizzle. When the source is a Constant value, the input modifier ABS applies to all constants in the instruction. In all situations, ABS is always applied before Negate. Input modifiers do not apply to PreviousScalar.

The R400 ALU Instruction word provides for two output (result) modifiers: Write Mask, which only affects the results going into GPRs but not the PreviousScalar, and Clamp.
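The ordering of the input modifiers can be shown with a small sketch. It assumes the swizzle codes work as a rotation (code 0 leaves the channel, code n selects the channel n steps further along X, Y, Z, W, wrapping around), which is one reading of the SRC A Swizzle field description, and it applies the swizzle first; the text only fixes that ABS comes before Negate. The function names are hypothetical.

```python
def swizzle(vec, codes):
    """vec and codes are in X, Y, Z, W order; each code is a 2-bit value.

    Assumed encoding: code 0 keeps the channel, code n picks the channel
    n steps further along X -> Y -> Z -> W -> X.
    """
    return [vec[(i + codes[i]) % 4] for i in range(4)]


def apply_input_modifiers(vec, codes=(0, 0, 0, 0), use_abs=False, negate=False):
    """Apply Swizzle, then ABS, then Negate (ABS always precedes Negate)."""
    out = swizzle(vec, codes)
    if use_abs:
        out = [abs(v) for v in out]
    if negate:
        out = [-v for v in out]
    return out


src = [1.0, -2.0, 3.0, -4.0]
print(apply_input_modifiers(src, use_abs=True, negate=True))
# [-1.0, -2.0, -3.0, -4.0]
print(swizzle(src, (1, 1, 1, 1)))
# [-2.0, 3.0, -4.0, 1.0]
```

Applying Negate before ABS would always yield non-negative values, which is why the fixed order matters when both modifiers are set.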
`
`3.2.1.3 GPR write-backs
The table below describes the precedence order for "mixed" use of Scalar Write Mask and Vector Write Mask for GPR write-backs when the Scalar Destination Pointer and Vector Destination Pointer in the instruction word specify the same GPR. This is done per component (each MASK field is 4 bits wide, one bit per component/channel).

[Table: Scalar Write Mask, Vector Write Mask, Result of GPR write-back. Recoverable entry: Don't write (mask).]

3.2.1.4 Export and Predicate related decoding

Exports are allowed from either the Scalar or the Vector Pipe. Similar to the GPR write-backs, masking of export data is permitted. The mask is present in the ALU instruction word. When exporting, the export address used is the Vector Destination Pointer present in the instruction word. The Scalar Destination Pointer in this case is always ignored.
The table below describes the "mixed" use of Scalar Write Mask and Vector Write Mask per component when exports are coissued. The ability to generate 0.0f or 1.0f during export provides one method for defaults.

[Table: Scalar Write Mask, Vector Write Mask, Bit[14], Result of Export. Recoverable entries: Don't write (mask); Write 0.0f; Write 1.0f.]

A few other export related definitions and restrictions:
1) Exporting of 'Color+Fog' is a special case.
   a) When exporting Fog, Color must be exported at the same time. Fog will be exported in the Scalar pipe and Color in the Vector pipe.
   b) The SP produces a final export Color by obeying the vector/scalar mask rules for exports. The SP does not see bit[14], so when the vector and scalar masks for a given channel are 0 a 0.0f is generated. The SP then merges Fog (always from the scalar pipe) into the final export color. Finally, channel masking is applied in (SQ/SX) only when the scalar and vector masks for that channel are 0 and bit[14] is 0.
   c) Note that for Fog to work correctly, SW should always output the same Fog factor from the scalar pipe for all masked writes, and all channels of Color should be written before the shader exits.
`
3.2.1.5 Export Types and Addresses
The location where the data should be put in the event of an export is specified in the destination pointer field of the ALU instruction word. Following is a list of the possible types of exports and the range of addresses.

Vertex Shading
0:15    - 16 parameter cache
16:31   - Empty (Reserved?)
32      - Export Address
33:37   - 5 vertex exports to the frame buffer and index
38:46   - Empty
47      - Debug Address
48:52   - 5 debug export (interpret as normal memory export)
53:59   - Empty
60      - export addressing mode
61      - Empty
62      - position
63      - Sprite size export that goes with position export
          (X = point size, Y = edge flag is bit 0, Z = VtxKill is bitwise OR of bits 30:0 (any bit other than sign means VtxKill))

Pixel Shading
0       - Color for buffer 0 (primary)
1       - Color for buffer 1
2       - Color for buffer 2
3       - Color for buffer 3
4:15    - Empty
16      - Buffer 0 Color+Fog (primary)
17      - Buffer 1 Color+Fog
18      - Buffer 2 Color+Fog
19      - Buffer 3 Color+Fog
20:31   - Empty
32      - Export Address
33:37   - 5 exports for multipass pixel shaders
38:46   - Empty
47      - Debug Address
48:52   - 5 debug exports (interpret as normal memory export)
60      - export addressing mode
61      - Z for primary buffer (Z exported to "X" component)
62:63   - Empty
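The address ranges listed above can be captured as lookup tables, for example in a tool that validates export addresses. This is a sketch that simply transcribes the two lists; the table and function names are hypothetical, and addresses that do not appear in a list (such as 53:59 for pixel shading) are reported as not listed rather than guessed.

```python
VERTEX_EXPORTS = [
    (0, 15, "parameter cache"), (16, 31, "empty"), (32, 32, "export address"),
    (33, 37, "vertex export to frame buffer and index"), (38, 46, "empty"),
    (47, 47, "debug address"), (48, 52, "debug export"), (53, 59, "empty"),
    (60, 60, "export addressing mode"), (61, 61, "empty"),
    (62, 62, "position"), (63, 63, "sprite size"),
]

PIXEL_EXPORTS = [
    (0, 0, "color buffer 0"), (1, 1, "color buffer 1"),
    (2, 2, "color buffer 2"), (3, 3, "color buffer 3"), (4, 15, "empty"),
    (16, 16, "buffer 0 color+fog"), (17, 17, "buffer 1 color+fog"),
    (18, 18, "buffer 2 color+fog"), (19, 19, "buffer 3 color+fog"),
    (20, 31, "empty"), (32, 32, "export address"),
    (33, 37, "multipass pixel shader export"), (38, 46, "empty"),
    (47, 47, "debug address"), (48, 52, "debug export"),
    (60, 60, "export addressing mode"), (61, 61, "z for primary buffer"),
    (62, 63, "empty"),
]


def classify_export(address, table):
    """Return the export type for a 6-bit address, or None if not listed."""
    if not 0 <= address <= 63:
        raise ValueError("export addresses are 6 bits wide")
    for low, high, kind in table:
        if low <= address <= high:
            return kind
    return None


print(classify_export(62, VERTEX_EXPORTS))  # position
print(classify_export(17, PIXEL_EXPORTS))   # buffer 1 color+fog
print(classify_export(55, PIXEL_EXPORTS))   # None
```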
`
`
`

`

`
`3.3 ALU Opcodes
`The following table represents the ALU operations/opcodes supported by the Vector unit.
`Nokes
`component add.
`arands
`possible colesue.
`component
`om.
`orands possible codssue.
`Component mim.
`2 operand: posgible oplesue,
`
`l eperands possible colscue,
`
`Result
`Elee
`Reaolt
`
`+f
`
`Rasult
`Ela
`Result
`= Broa.
`
`+
`
`-FUGGR(Sroniy
`
`= TRUWC(Srcal y
`Ragulet
`Tf
`(
`(8rék = 0.00) a6 (8rek l= Reeult)
`Ragolt
`+= -Lsof
`Srok © Srol + Srey
`
`|
`
`Tf
`
`[Srca == 0.00)
`Raault
`= Seche
`Eloe
`Rasult = Srocy
`(Srck > O.0f)
`Result
`= Spehy
`Elee
`Rasult
`= Spocy
`[Sro0k > OD.)
`Retult = Brel:
`Rasult
`= Srecy
`
`Por component
`
`Pe Soh T
`7 operand: popalbla coisas,
`
`seat gragter than.
`Par fommonet
`2 operands: pogelble eoleaue,
`
`Far-component oof qroater than
`agian .
`2 oporands possible coissue,
`
`FOr component set no equa
`a
`i porslble mol aie.
`
`“fractlonal’ part of
`Fer componont
`Trek.
`1 coerands:
`possible. ooissue.
`er. Sopanent
`Srois
`1 operands:
`Ciotis
`Par component Floor
`1 operand: poztible eoiseue.
`
`oval tip] y= ack
`Per component
`3 oeprandy no colazsoe,
`Par component fomditional move
`Hjizh |] .
`1 operand: no colssue,
`
`Por component conditional move
`greater than sequal,
`i operands: no colssue.
`
`Per component comditional moahe
`greater
`thar.
`i operands: no ooissue,
`
`i component dok prodiect
`Aetult paplicated in all
`channels.
`ae
`possible cod
`J component dot produ
`Aesult
`feplléated in
`sranine Lite
`2 operands
`posstbloe ca
`2 Component dct prodipct with add.
`Assit Feplicated in all
`channels.
`3 operands no colszsue.
`Cubsimap instriction.
`2 operand (8fck = AR.Zaey, Seb =
`An-VREE}: potrible celserue,
`
`Ainate
`eoordinatay
`Remlt = man {scan HM, Sroke ky
`
`&
`
`I
`
`1 component maximum.
`Result
`replicated in all
`shannelz.
`
`foue
`
`ATI Ex. 2067
`IPR2023-00922
`Page 13 of 448
`
`ATI Ex. 2067
`IPR2023-00922
`Page 13 of 448
`
`€
`

`

`R400 Shader Processor Model
`[date Vg "d MMMM,
`if ((SrcB.W == 0.0f) && (SrcA.W == 0.0f)) {
`    Result = 0.0f;
`    SetPredicateReg(Execute);
`} else {
`    Result = SrcA.W + 1.0f;
`    SetPredicateReg(Skip);
`}
`
`if ((SrcB.W != 0.0f) && (SrcA.W == 0.0f)) {
`    Result = 0.0f;
`    SetPredicateReg(Execute);
`} else {
`    Result = SrcA.W + 1.0f;
`    SetPredicateReg(Skip);
`}
`
`// signed 9-bit integer
`
`if ((SrcB.W > 0.0f) && (SrcA.W == 0.0f)) {
`    Result = 0.0f;
`    SetPredicateReg(Execute);
`} else {
`    Result = SrcA.W + 1.0f;
`    SetPredicateReg(Skip);
`}
`
`if ((SrcB.W >= 0.0f) && (SrcA.W == 0.0f)) {
`    Result = 0.0f;
`    SetPredicateReg(Execute);
`} else {
`    Result = SrcA.W + 1.0f;
`    SetPredicateReg(Skip);
`}
`
`[orca
`fezult
`Killed
`
`lse
`
`]E
`
`Rasult
`Kliled
`
`1 E
`
`lse
`Ragult
`Result.W = SrcB.W;
`Result.Z = SrcA.Z;
`Result.Y = SrcA.Y * SrcB.Y;
`Result.X = 1.0f;
`
`Result = MAX(SrcA, SrcB);
`
`SQResultF = FLOOR(SrcA.W + 0.5f);
`it
`(SQResultF s= -250.0f5 [
`SOULE = SOResul thy
`
`Eraba ty
`
`PREDSETE PUSH
`
`ne PUSH
`
`FREDSEDGE - PUSH
`
`|
`
`Predicate counter increment equals.
`Update predicate register.
`Result replicated in all four channels.
`2 operands: possible coissue.
`See note below.
`
`Predicate counter increment not equals.
`Update predicate register.
`Result replicated in all four channels.
`2 operands: possible coissue.
`See note below.
`
`Predicate counter increment greater than.
`Update predicate register.
`Result replicated in all four channels.
`2 operands: possible coissue.
`See note below.
`
`Predicate counter increment greater than equals.
`Update predicate register.
`Result replicated in all four channels.
`2 operands: possible coissue.
`See note below.
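The four predicate counter entries above appear to share one shape: a comparison against zero decides whether the predicate is set to execute and the counter result is cleared, or the predicate is set to skip and the counter is incremented. The C sketch below shows that shape under stated assumptions. The source text is damaged, so the operand roles (the first argument is the counter, the second is the compared value), the enum names, and the result structure are all hypothetical.

```c
/* Hypothetical names for the two predicate states used in the pseudocode. */
typedef enum { PRED_SKIP = 0, PRED_EXECUTE = 1 } pred_state;

/* Hypothetical selector for the four comparison variants. */
typedef enum { CMP_E, CMP_NE, CMP_GT, CMP_GTE } pred_cmp;

typedef struct {
    float      result;  /* value that would be replicated into all channels */
    pred_state pred;    /* new predicate register state */
} pred_push_out;

/* Assumption: counter_w is the W channel of the counter operand and
   value_w is the W channel of the operand being compared against zero. */
pred_push_out pred_set_push_sketch(pred_cmp cmp, float counter_w, float value_w)
{
    int test = 0;
    switch (cmp) {
    case CMP_E:   test = (value_w == 0.0f); break;
    case CMP_NE:  test = (value_w != 0.0f); break;
    case CMP_GT:  test = (value_w >  0.0f); break;
    case CMP_GTE: test = (value_w >= 0.0f); break;
    }

    pred_push_out out;
    if (test && (counter_w == 0.0f)) {
        out.result = 0.0f;              /* counter stays at zero */
        out.pred = PRED_EXECUTE;
    } else {
        out.result = counter_w + 1.0f;  /* counter is incremented */
        out.pred = PRED_SKIP;
    }
    return out;
}
```

Under this reading, a nonzero counter always leads to skip and a further increment, which is what lets nested predicated regions be tracked with a single counter value.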
`
`Per component pixel kill equal. Set KILL bit.
`2 operands: possible coissue.
`See note below.
`
`Per component pixel kill greater than. Set KILL bit.
`2 operands: possible coissue.
`See note below.
`
`Per component pixel kill greater than equal. Set KILL bit.
`2 operands: possible coissue.
`See note below.
`
`Per component pixel kill not equal. Set KILL bit.
`2 operands: possible coissue.
`See note below.
`
`Computes dlatanc
