AMD

ORIGINATE: 29-Nov-11
EDIT DATE: 12-Feb-14
DOCUMENT-VER. NUM.: 1.0
PAGE: 1 of 35

`GFXIP_9x SX
`
`Micro-Architecture Specification
`
`Rev 1.0 — Last Edit: 12-Feb-14
`
THIS DOCUMENT CONTAINS INFORMATION THAT COULD BE SUBSTANTIALLY DETRIMENTAL TO THE INTEREST OF AMD THROUGH UNLICENSED USE OR UNAUTHORIZED DISCLOSURE.

1. Preserve this document's integrity:
   = Do not reproduce any portions of it.
   = Do not separate any pages from this cover.
2. This document is issued to you alone. Do not transfer it to or share it with another person, even within your organization.
3. Store this document in a locked cabinet accessible only by authorized users. Do not leave it unattended.
4. When you no longer need this document, return it to AMD. Please do not discard it.

"Copyright 2011, Advanced Micro Devices, Inc. ("AMD"). All rights reserved. This work contains confidential, proprietary information and trade secrets of AMD. No part of this document may be used, reproduced, or transmitted in any form or by any means without the prior written permission of AMD."

AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. PCIe is a registered trademark of PCI-SIG. HDMI is a trademark of HDMI Licensing, LLC.

AMD (NYSE: AMD) is a semiconductor design innovator leading the next era of vivid digital experiences with its ground-breaking AMD Fusion Accelerated Processing Units (APU). AMD's graphics and computing technologies power a variety of devices including PCs, game consoles and the powerful computers that drive the Internet and businesses. For more information, visit http://www.amd.com.
`AMD1044_0104744
`
`ATI Ex. 2025
`IPR2023-00922
`Page 1 of 35
`Revision History
`
Date | By | Revision | Description
[illegible] | [illegible] | [illegible] | Initial copy from gfx8
[illegible] | [illegible] | [illegible] | First edit for out-of-date contents
`
Table of Contents

PREFACE

1 INTRODUCTION
  1.1 Definitions / Glossary of Terms
  1.2 Top Level Diagram
      Figure 1: SX in chip context
  1.3 Required Features
      1.3.1 SX support for 2 PC/GDS redirect busses instead of 1 in 1 SE configurations (16 PIX PER SH)
      1.3.2 SX support for 4 RBs (16 PIX PER SH)
      1.3.3 SX support for deeper Position buffer, Color buffers and position alloc storage (16 PIX PER SH)
      1.3.4 Streaming of performance counters (STREAM PERF CNTRS (GPU.01 & GPU.02))

1 DETAILED CHANGE DESCRIPTION
  1.1 SX support for 2 PC/GDS redirect busses instead of 1 in 1 or 2 SE configurations (1 SH per SE only)
      1.1.1 Testing
  1.2 SX support for 4 RBs
      1.2.1 Testing
  1.3 SX support for deeper Position buffer, Color buffers and position alloc storage

2 PERFORMANCE

3 POWER

4 HARDWARE IMPLEMENTATION: TOP LEVEL
  4.1 Top Level Drawing
      Figure 2: SX Top Level Diagram
  4.2 Addressing the buffers
  4.3 Format of the data
  4.4 Controls of an export
  4.5 Color export
      4.5.1 The color scoreboard
      4.5.2 Export buffer address computation
      Figure 3: An example of addressing the export buffer
  4.6 Position exports
  4.7 Redirect exports

5 HARDWARE IMPLEMENTATION: INTERFACE
  5.1 Shader Core Interfaces (SPI/SQ/SP)
      5.1.1 SX_SPI_EXPREQ
      5.1.2 SPI_SX_EXPGRANT
      5.1.3 SQ_SX_EXPCMD
      5.1.4 SPI_SX_EXPADDR
      5.1.5 SX_SPI Free Signals
      5.1.6 SP_SX_VDATA
  5.2 SX to PC Interfaces
      SX_PC_EXPCMD
  5.3 SX to PA Interfaces
  5.4 SX to DB Interfaces
`
`
A micro-architecture (or block level) specification serves many purposes.

* The micro-architecture specification is used by design verification teams to build block and system level verification environments (test benches, test scenarios, test cases and testing methodologies/strategies). Although a document is clearly not the ONLY vehicle used by design verification teams, it is a critical piece of the verification planning process in that it provides valuable data to support the test plan meetings between design and verification teams.
* The micro-architecture specification is also a useful tool for peer and block design reviews. Interfacing blocks require explicit details of control and handling of data being transferred/transformed between blocks. The specification is the obvious resource that peer teams go to for this type of information.
* This specification is also used by post-silicon verification teams and in the creation of documentation (such as programming guidelines, etc.) that must be prepared for external customers.
* This documentation is also useful when designs are transferred to other teams. For example, derivatives of a graphics core are used in many other products that include an integrated core, hand held devices, etc.

Each new major architecture should contain a micro-architecture specification for each block in the subsystem. In the case where a design is derived from a previous project, that previous project specification would be updated and checked into the new project revision control documentation area. All the delta features would be described in the feature section of the micro-architecture specification and the document in general should be updated to match the new project updated block design.

A template is provided as a means of describing the detail required by all teams that use the micro-architecture specification and to drive consistency between the specifications from one block to another. This template was created by the Design Verification Workgroup, which is comprised of representatives across all AMD Business Units. This template has been and will be distributed to other Business Units and Design, Design Verification and Architecture teams for review and feedback. This template was created from a review of many of the existing micro-architecture specifications. Examples are pulled from these documents and presented here to illustrate the type of information that is required. To distinguish between the descriptions of content and examples, all examples appear in italics.
`
`
The SX (shader export) block is responsible for receiving and re-ordering color and position exports from the shader core and forwarding them to the correct client: the PA for position, and the correct DB for color. The SX is also a conduit for parameter and GDS exports and in this role forwards the data unchanged to the PC (parameter cache) block.

The main input for the SX block is the shader output bus, which is 2 busses each 16x32 bits wide. The SX output busses are:
1) 128 bits wide bus to the PA (primitive assembler) block supporting 1 position per clock.
2) 256 bits to each DB (supporting up to 4 DB per SX), thus supporting 4 "compressed" pixels per clock (64 bpp) or 2 uncompressed pixels per clock (128 bpp).
3) 16x32 bit bus to PC/GDS.
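
The bus figures above can be cross-checked with a few lines of arithmetic. This is only an illustrative sketch: the constant and function names are made up for the example, and only the widths themselves come from the text.

```python
# Bus widths quoted in the introduction (bits per clock).
# All names here are illustrative, not taken from the specification.
SHADER_INPUT_BUSSES = 2
SHADER_BUS_BITS = 16 * 32      # each shader output bus is 16 x 32 bits
DB_BUS_BITS = 256              # one export bus per DB
MAX_DB_PER_SX = 4


def pixels_per_clock(bus_bits, bits_per_pixel):
    """Whole pixels that fit on a bus in one clock."""
    return bus_bits // bits_per_pixel


total_input_bits = SHADER_INPUT_BUSSES * SHADER_BUS_BITS
total_db_output_bits = MAX_DB_PER_SX * DB_BUS_BITS

print(total_input_bits)                    # aggregate shader input width
print(pixels_per_clock(DB_BUS_BITS, 64))   # "compressed" 64 bpp pixels per DB
print(pixels_per_clock(DB_BUS_BITS, 128))  # uncompressed 128 bpp pixels per DB
```

With 4 DBs attached, the aggregate DB output width equals the aggregate shader input width, which is consistent with the later remark that a 4 RB per SX configuration is balanced.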
`
`1.1 Definitions / glossary of terms
`
Thread - one instance of a shader program being executed on a vector of pixels, vertices, or primitives. Each thread has its own state which is unique from any other thread.
Clause - a group of instructions all of the same type (all ALU, all texture-fetch, etc.) executed as a group; part of a thread.
Wave - one instruction operates on a wave of pixels/vertices/primitives over 4 clock cycles. This is the basic quantum of work. The size of a vector depends on the system configuration, but is always 4 clock cycles.
SIMD - Single Instruction, Multiple Data. Here, a SIMD refers to one slice of the SP machine which all receives the same instruction and operates on a vector of data. Most implementations will have multiple SIMDs. Each SIMD receives a separate instruction from the SQ.
`
`1.2 Top Level Diagram
`
`
Figure 1: SX in chip context

This is the SX block in full chip context. The SX sits at the bottom of the SH array and as such there will be as many SX blocks as there are SH blocks (there can be one SH block or 2 SH blocks per SE depending on the configuration chosen). Up to 4 DBs can be connected to a single SX block, and two 512-bit input busses are the sole data inputs to the block. The outputs of the SX, other than the DBs, are the PA for position and the PC/GDS for parameter/GDS transactions.
`
`
Requirements (Delta Requirements)

For more details, please see GFXIP_9x_SX_delta.doc

1.3 Required Features

1.3.1 Support RB+ feature for 2RB setting
a. Motivation: gfx9 9.87, Enable dual RB+ per SE (Scalable pixel rate per shader array)
b. Area: 35%
c. Schedule: 2 week EMU, 4 week RTL, 6 week DV
d. Impacted blocks: SPI, SX, SC, etc.
`
`
`Detailed Change Description
`
1.1 SX support for 2 PC/GDS redirect busses instead of 1 in 1 or 2 SE configurations (1 SH per SE only)

SX will decode the SQ_SX_expcmd bus and replicate the control data for 4 clocks on 2 separate SX_PC_expcmd busses. This is to make this change transparent to the PC blocks.

The 2 SX busses configuration is enabled when GPU__GC__DUAL_PC_EXPORT_BUS == 1. This feature should only be enabled if NUM_SE <= 2 and NUM_SH_PER_SE == 1. In this case only 1 of the vdata busses can carry GDS information at a time, as the GDS will only have 2 input ports to the actual logic (the other ports are just pass through for the PC).
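
The enable condition above is easy to capture as a configuration check. The sketch below is hypothetical: the function names are invented for illustration, and only the constraint (NUM_SE <= 2 and NUM_SH_PER_SE == 1) comes from the text.

```python
def dual_pc_export_bus_allowed(num_se, num_sh_per_se):
    """True when the chip configuration permits GPU__GC__DUAL_PC_EXPORT_BUS == 1."""
    return num_se <= 2 and num_sh_per_se == 1


def check_dual_pc_export_bus(dual_pc_export_bus, num_se, num_sh_per_se):
    """Raise if the feature is enabled on a configuration the text rules out."""
    if dual_pc_export_bus and not dual_pc_export_bus_allowed(num_se, num_sh_per_se):
        raise ValueError(
            "dual PC export bus requires NUM_SE <= 2 and NUM_SH_PER_SE == 1"
        )


# A 2 SE, 1 SH per SE part may enable the feature; this call does not raise.
check_dual_pc_export_bus(True, num_se=2, num_sh_per_se=1)
```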
`
`4.8.2 Testing
`
1. Test with lots of parameters and no pixels, and a HW coverage assert, to make sure SPI is able to use both busses at the same time for PC transfers.
2. Test with lots of parameters and GDS ops to make sure SPI doesn't allow 2 GDS transactions at the same time.
`
`1.2 SX support for 4 RBs.
`
`
This feature is controlled by:

gpu.gc.num_rb_per_sx=4 // gpu.gc.num_rb_per_se / gpu.gc.num_sh_per_se

Everything should already be ready in the SX to support this, but we need to test this and make sure everything works as expected, as the 4 RB per SX option was not tested much.

4.8.2 Testing

1) PERF: Fill tests with various formats to make sure we can peg the export busses. With 4 RB per SX we are balanced, so we need to fully utilize the export busses to enable all 4 RBs at once.
`
`1.3 SX support for deeper Position buffer, Color buffers and position alloc
`storage
`
Those features are all controlled by feature flags:

gpu.sx.color_scoreboard_slots=64 // number of color waves
gpu.sx.pos_scoreboard_slots=32 // number of vertex waves
gpu.sx.color_export_buffer_size=256 // this is the depth of the memory in the SX
gpu.sx.pos_export_buffer_size=512 // this is the depth of the memory in the SX
gpu.sx.color_export_reg_buffer_size=1024 // this is the logical size of the buffer, should be 4x color_export_buffer_size
gpu.sx.pos_export_reg_buffer_size=2048 // this is the logical size of the buffer, should be 4x pos_export_buffer_size

#case Thebes
The Kryptos project is using the above settings.
#endcase

We need to support values of 16, 32 and 64 for gpu.sx.pos_scoreboard_slots, 256 and 512 for gpu.sx.color_export_buffer_size, and 256 and 512 for gpu.sx.pos_export_buffer_size. Those affect memory depth, pointer widths and wrap points in the SX. The last 2 settings affect the SPI but must be changed to always be 4x the above values.

Right now, we are not planning to make the color buffer any deeper than it is, for area reasons, but we will make the position buffer deeper so as to enable the same number of VS waves per SE as Tahiti used to have.
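
The constraints on these flags can be expressed as a small consistency check. This is a sketch under stated assumptions: the flag spellings follow the list above (some underscores are reconstructed), and the checker itself is hypothetical rather than part of any AMD tool.

```python
# Supported values as stated in the text. Key spellings are assumed.
SUPPORTED = {
    "pos_scoreboard_slots": {16, 32, 64},
    "color_export_buffer_size": {256, 512},
    "pos_export_buffer_size": {256, 512},
}


def check_sx_flags(flags):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    for name, allowed in SUPPORTED.items():
        if flags[name] not in allowed:
            problems.append(f"{name}={flags[name]} not in {sorted(allowed)}")
    # The logical (SPI-side) sizes must always be 4x the memory depth.
    for kind in ("color", "pos"):
        depth = flags[f"{kind}_export_buffer_size"]
        logical = flags[f"{kind}_export_reg_buffer_size"]
        if logical != 4 * depth:
            problems.append(f"{kind}_export_reg_buffer_size={logical}, expected {4 * depth}")
    return problems


listed_settings = {
    "color_scoreboard_slots": 64,
    "pos_scoreboard_slots": 32,
    "color_export_buffer_size": 256,
    "pos_export_buffer_size": 512,
    "color_export_reg_buffer_size": 1024,
    "pos_export_reg_buffer_size": 2048,
}
print(check_sx_flags(listed_settings))
```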
`
`2 Performance
`
The SX has peak data input and output rate requirements:
* Sustain the receipt of shader data exports at the maximum rate of 1024 bits per clock.
* Sustain {2-4} pixels per clock peak output to each DB, depending on surface format.
* Sustain position exports to maintain 1 primitive per clock operations in the PA per SX.

3 Power

We have picked out 4 main busy signals to generate the CAC signal which will be used by the power team. They have the same weight and are added together; the MSB will be used as the CAC output.
`
`4 Hardware Implementation: Top Level
`
The SX works on 3 classes of exports from the shader: Color, Position and redirect exports. We will now explain all 3 classes in detail.
`
`4.1 Top Level Drawing
`
[Diagram not recoverable from the source. Legible notes from the drawing:]
1) Position can only be exported on 1 bus at a given time
2) GDS data can only be exported on 1 bus at a given time
3) [note on the 4 DB configuration, illegible]

Figure 2: SX Top Level Diagram

4.2 Addressing the buffers
`
Before we go into the details of how each export mode works, it is important to understand how the various export buffers are addressed. The SPI is the master controller for all the export buffers: it controls the allocation and supplies the base addresses for each wavefront, and does so on every export. The addresses supplied by the SPI are always per 128 bit chunk and as such (since the export buffers really consist of 4 128 bit memories side by side) it takes an increment of 4 in the buffer address before a new address is used in the memories. This is done so we can pack multiple exports on a single memory address. Here is the format of the address as supplied by the SPI:

Memory address (10:2) | Memory ID (1:0)
`
`
The 2 LSBs are used to address which memory you want to write to and the upper 9 bits are the memory address itself. As such the computed address needs to wrap at GPU_SX_{COL/POS}_EXPORT_BUFFER_DEPTH*4. The SPI will take care of wrapping the base addresses but the SX is responsible for wrapping any internal address it computes (within the wave).
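
As an illustration of the address format, the following sketch splits an SPI-supplied address into its two fields and applies the wrap. The function names are hypothetical; the bit positions and the wrap point (buffer depth times 4) are the ones given in the text.

```python
def split_export_address(addr):
    """Split an 11-bit export address into (memory address [10:2], memory ID [1:0])."""
    memory_address = (addr >> 2) & 0x1FF   # upper 9 bits
    memory_id = addr & 0x3                 # which of the 4 side-by-side memories
    return memory_address, memory_id


def wrap_export_address(addr, buffer_depth):
    """Wrap an internally computed address at buffer_depth * 4."""
    return addr % (buffer_depth * 4)


# Four consecutive addresses share one memory row but hit different memories.
for a in range(4, 8):
    print(a, split_export_address(a))
```

A depth of 512 gives a wrap point of 2048, which is exactly the range covered by an 11-bit address.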
`
4.3 Format of the data
`
The shader core, because of its scalar nature, changed a lot from the previous architectures. Because of this it will export the data differently than it used to. The native form of exports will now be per component instead of per pixel. So instead of receiving 4 XYZW pixels, the SX will receive 16 X components from 16 different pixels, followed by 16 Ys and so on (until all the required components, not necessarily all 4, are exported). The render backends still expect to see the data in the same 4 XYZW format, so the SX must now reformat the data that was sent from the shader.
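
Conceptually, this reformatting is a transpose from component-major to pixel-major order. The sketch below models only that data movement in software; it says nothing about how the hardware buffers or times it, and the function name is made up for the example.

```python
def to_pixel_major(component_rows):
    """Turn [16 Xs, 16 Ys, ...] into 16 per-pixel tuples (X, Y, ...)."""
    return [tuple(pixel) for pixel in zip(*component_rows)]


# Two exported components (X and Y) for 16 pixels; the values are placeholders.
xs = [i for i in range(16)]
ys = [100 + i for i in range(16)]

pixels = to_pixel_major([xs, ys])
# Group the 16 pixels into 4 groups of 4, matching the "4 pixels at a time" view.
groups_of_four = [pixels[i:i + 4] for i in range(0, 16, 4)]
print(groups_of_four[0])
```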
`
For color we are planning to support 3 native sizes across 6 formats (all specified in the SPI_SHADER_COL_FORMAT and SPI_SHADER_Z_FORMAT registers). Those formats are:

00 - SPI_SHADER_ZERO: No exports done (0C)
01 - SPI_SHADER_32_R: Can be FP32 or SINT32/UINT32 Red Component (1C)
02 - SPI_SHADER_32_GR: Can be FP32 or SINT32/UINT32 GR Components (2C)
03 - SPI_SHADER_32_AR: Can be FP32 or SINT32/UINT32 AR Components (2C)
04 - SPI_SHADER_FP16_ABGR: FP16 ABGR Components (2C)
05 - SPI_SHADER_UNORM16_ABGR: UNORM16 ABGR Components (2C)
06 - SPI_SHADER_SNORM16_ABGR: SNORM16 ABGR Components (2C)
07 - SPI_SHADER_UINT16_ABGR: UINT16 ABGR Components (2C)
08 - SPI_SHADER_SINT16_ABGR: SINT16 ABGR Components (2C)
09 - SPI_SHADER_32_ABGR: Can be FP32 or SINT32/UINT32 ABGR Components (4C)

For Position, the formats are a little simpler and are stored in the SPI_SHADER_POS_FORMAT register. They are:

00 - SPI_SHADER_NONE: SPI_SHADER_NONE (0C)
01 - SPI_SHADER_1COMP: SPI_SHADER_1COMP (1C)
02 - SPI_SHADER_2COMP: SPI_SHADER_2COMP (2C)
03 - SPI_SHADER_4COMPRESS: SPI_SHADER_4COMPRESS (2C)
04 - SPI_SHADER_4COMP: SPI_SHADER_4COMP (4C)
`
This format can be different per MRT. It is the responsibility of the shader to output the data in the right format as specified by the SPI_SHADER_*_FORMAT registers. While the register lists all those modes (for future compatibility), currently only modes 0 or 4 are supported by the HW for position exports.

There will be a maximum of 2 512 bit data busses supported, controlled by 1 SPI expaddr bus (phase 0, bus 0; phase 2, bus 1) and 1 SQ expcmd bus (same rules as the SPI bus).

4.4 Controls of an export

An export to the SX, whether it is color, position or redirect, always starts with an SPI_SX_expaddr transaction. This 1 clock transaction tells the SX about the state of the export and which data bus will carry the data; the simd field is used for this purpose (simd 0/1 tells the SX to look at vdata bus 0 and simd 2/3 is for vdata bus 1).

A fixed number of clocks later, the SQ_SX_expcmd bus will trigger. This is also a 1 clock bus and obeys the same simd rules as the SPI bus. This bus further qualifies the state for the export to happen.
`
`
An export transaction is concluded with a 4 clock transaction on either of the SP_SX_vdata{0,1} busses. This bus carries the data to be written and the write enable masks.
`
`4.5 Color export
`
Because the shader core now exports all of its data per component, there was an opportunity to compress the export buffers. Before, room for 4 components was always reserved even if we were using only 1 or 2 of them. Now that only the components that are being used are exported (and the shader is responsible for making sure the format of the shader export is consistent with the format specified in the SPI_SHADER_*_FORMAT registers), we can compress the buffers and only write the needed data (without holes). This has multiple advantages:

1) Since only the valid channels are written, the same buffer depth that we used to have in older architectures will hide more shader latency in the compressed format cases.
2) Only the used data is written/read, so we will save power and always read/write as fast as we can.
3) All converters and alpha test can be moved to shader code to save area and some more power.

In order to correctly pack the data into the color/position buffer, the SX will have to rely on some state in order to leave enough room in between components to write the next components of the same pixel. The same is true for MRTs: since the DB expects the MRTs of a given quad to be all sent before we move to the next quad, the data needs to be written in consecutive (or close to consecutive) memory locations. In order to achieve this, the SX will implement the following address equation (per DB).

4.5.1 The color scoreboard

Since waves come from the shader out of order, we have a scoreboard to maintain the order of the incoming waves. Only those waves which have been tagged in the scoreboard (the SPI expaddr bus will contain the scoreboard id information) will be sent out of the SX, in increasing scoreboard order. The scoreboard should be crawled one entry at a time. The scoreboard size differs from project to project.
`
`4.5.2 Export buffer address computation
`
`
`
4.5.2.1 Before gfx8.1

Address = base (from the SPI) + MRT_FULL_SIZE + quad# in phase (compressed) * MRT_CUR_SIZE + MRT_PREV_SIZE + COMP

Where:

MRT_FULL_SIZE = SUM_FORMATS * # quads in prev_phases
MRT_CUR_SIZE = size of current MRT per SPI_SHADER_*_FORMAT register
MRT_PREV_SIZE = # quads in phase * (sum of prev MRTs per SPI_SHADER_*_FORMAT register) (have to use DST to know)

if (MRT_FORMAT == 1C)
COMP = 0
if (MRT_FORMAT == 2C)
COMP = (component + quad#/2) % 2
if (MRT_FORMAT == 4C)
COMP = (component + quad#) % 4
`
`
For example, let's say that we examine DB0's export buffer (each DB has its own export buffer and is uniquely addressed). Let's say the SPI_SHADER_COL_FORMAT is programmed as 0x1291, so 4 MRTs are exported by the shader and their format is:

MRT0 1C
MRT1 4C
MRT2 2C
MRT3 1C

Let's further say that DB0 will get 1 quad in phase 0, 3 in phase 1, 4 in phase 2 and 1 in phase 3. Then, starting at address 0, the export buffer would look like:

Figure 3: An example of addressing the export buffer

The SPI always allocates # quads written to this DB * SUM of all MRT formats, so in this case 9 quads * 8C = 72 locations.
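
The worked example can be reproduced numerically. The sketch below assumes a 4-bit format field per MRT with MRT0 in the least significant nibble; that packing is inferred from the 0x1291 example rather than stated in the text, and the helper names are hypothetical. The component counts come from the color format list in section 4.3.

```python
# Components per color export format code, from the format list in section 4.3.
COMPONENTS_PER_FORMAT = {0: 0, 1: 1, 2: 2, 3: 2, 4: 2, 5: 2, 6: 2, 7: 2, 8: 2, 9: 4}


def mrt_component_counts(col_format, num_mrts=8):
    """Decode per-MRT component counts (assumed: 4 bits per MRT, MRT0 lowest)."""
    return [COMPONENTS_PER_FORMAT[(col_format >> (4 * mrt)) & 0xF]
            for mrt in range(num_mrts)]


def export_buffer_allocation(col_format, quads_per_phase):
    """Locations allocated: # quads written to this DB * sum of all MRT formats."""
    return sum(quads_per_phase) * sum(mrt_component_counts(col_format))


counts = mrt_component_counts(0x1291)
locations = export_buffer_allocation(0x1291, [1, 3, 4, 1])
print(counts[:4], locations)
```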
4.5.2.2 Gfx8.1 and later
In order to support RB+, we slightly modify the buffer management behavior. The address calculation is changed from Loop Phase -> Loop MRT -> Loop Quad (Figure 4) to Loop MRT -> Loop Phase -> Loop Quad (Figure 5).

[Diagram: the buffer is divided by phase (phase0 to phase3); each phase holds its MRTs (MRT0, MRT1, ...), and each MRT holds its quads (Quad0, Quad1, ...).]

Figure 4: Old memory address calculation

[Diagram: the buffer is divided by MRT (MRT0, MRT1, MRT2, ...); each MRT holds its phases (Phase1, Phase2, Phase3, ...), and each phase holds its quads (Quad0, Quad1, ...).]

Figure 5: New memory address calculation
`
With this new memory organization, we avoid some bank conflict issues that come from supporting the new RB+ feature. More detail can be found in the document PLSX LinkQuad_investigation.docx
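
The difference between the two loop orders can be seen by enumerating the (phase, MRT, quad) triples in the order they would be laid out. This is a simplified sketch: it ignores the per-MRT component sizes and only illustrates the nesting, using the quad distribution from the earlier example. Function names are invented for illustration.

```python
def old_order(num_mrts, quads_per_phase):
    """Before gfx8.1: Loop Phase -> Loop MRT -> Loop Quad."""
    return [(phase, mrt, quad)
            for phase in range(len(quads_per_phase))
            for mrt in range(num_mrts)
            for quad in range(quads_per_phase[phase])]


def new_order(num_mrts, quads_per_phase):
    """Gfx8.1 and later: Loop MRT -> Loop Phase -> Loop Quad."""
    return [(phase, mrt, quad)
            for mrt in range(num_mrts)
            for phase in range(len(quads_per_phase))
            for quad in range(quads_per_phase[phase])]


quads = [1, 3, 4, 1]          # quads for this DB in phases 0..3
old = old_order(4, quads)
new = new_order(4, quads)
print(old[:3])
print(new[:3])
```

Both orders visit the same set of entries; only the layout changes, so that all quads of one MRT sit next to each other across phases.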
`
`4.6 Position exports
`
Position exports work in a similar fashion to the old color exports, but since there is only 1 position buffer (4 memories, 128 bits wide each) and 1 client reading the buffer (the PA), its management is slightly easier.

In the current plan of record, the SPI will always allocate 64 positions and the SX will always free 64, no matter how many vertices are actually in the buffer. The deallocation will be done once a position export for the full vector is completed.
`
`4.7 Redirect exports
`
Since the parameter caches moved outside of the SX block and the SX is the datapath for the GDS writes, the SX needs to redirect some of the data it receives from the shader to the PC/GDS blocks. It knows what to redirect and when to do so by looking at the SPI_SX_expaddr passthrough bit and the SQ_SX_expcmd target field. When set to a location outside the SX (PC or GDS), the SX will write the input data on the SX_PC_expcmd/vdata busses instead of processing the transaction internally.
What is different from CI to SI is that we can have 2 redirect buses when the feature dual_pc_bus is on.
`
`4.8 RB+ feature
The data bus between the SX and each DB is 256 bits wide, and if one quad is 32bpp, 2 quads can be transferred at once to fully utilize the bus. That is what we want to improve with the RB+ feature.
The SX will receive pixel wave export information (one wave includes 16 quads) from the SPI as in the previous project. In order to support the RB+ feature, the SX will look at the relationship of adjacent quads. If they are tagged as paired, there is a potential chance to enter 8 pixel per clock mode. We then look at the source data detection result (inspect the export data and compare it according to some predefined rules) and check the down-convert table. If both operations pass, those 2 quads will be sent together in one 256bit transaction.
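
The pairing decision described above reduces to three conditions that must all hold. The sketch below is a hypothetical software model: the function and parameter names are invented, and the two checks are treated as opaque booleans because their details are defined elsewhere in the specification.

```python
PIXELS_PER_QUAD = 4
DB_BUS_BITS = 256


def quads_per_transaction(bits_per_pixel):
    """How many whole quads of the given pixel size fit in one DB bus transaction."""
    return DB_BUS_BITS // (PIXELS_PER_QUAD * bits_per_pixel)


def can_send_paired(tagged_as_paired, source_detect_pass, down_convert_pass):
    """Two adjacent quads share one 256-bit transaction only if every check passes."""
    return tagged_as_paired and source_detect_pass and down_convert_pass


# At 32bpp two quads fit, giving 8 pixels in a single transaction.
pixels_in_transaction = quads_per_transaction(32) * PIXELS_PER_QUAD
print(pixels_in_transaction)
```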
`
4.8.1 Source data detection
The first step of source data detection generates a check result between the export data and a reference value (0/1). For example, if the setting is to test for 0, the export data will be checked to see if it is 0. If it is, we regard the check result as a pass.
For different formats, a small value epsilon will be used to judge whether the data is close to 0 or 1 (and so is regarded as 0 or 1); this is also defined in a register.
`
Per Context Per MRT Register: Add the following registers per state set and per each of the 8 MRTs:

BLEND_OPT_EPSILON
POSSIBLE VALUES (entries whose text did not survive in the source are marked [illegible]):
00 - [illegible]
01 - Must be within [illegible] (set for 11 bit alpha formats)
02 - Must be within 1.0*2^-[illegible]
03 - Must be within [illegible] (set for 10 bit formats)
04 - Must be within 1.0*2^-[illegible]
05 to 07 - Must be within [illegible]
08 - Must be within 1.0*2^-8
09 to 10 - [illegible]
11 - Must be within [illegible] (set for 6 bit formats)
12 - [illegible]
13 - Must be within [illegible] (set for 5 bit formats)
14 - Must be within [illegible]
15 - Must be within [illegible] (set for 4 bit formats)

Open question: get rid of this and have a single bit that says if the source data [illegible] before or after the format down-conversion?

The detection of 0.0 or 1.0 can be done via a compare on the unbiased exponent only of the FP16 or FP32 channels and the MSB of the mantissa, or else the upper bits of norms. The intention is to match the precision of the zero and one detection to the precision of the destination format. For example, if a value of 2^-12 is being added into an 8 bit surface which can only represent 2^-8, there will be no change to the destination result even though the source data is non-zero.
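
The intent of the epsilon can be illustrated with a purely numerical model. This sketch compares floating point values against a 2^-N tolerance; it does not model the exponent and mantissa bit compare that the hardware uses, and the strict "less than" at the boundary is an assumption. The function name is hypothetical.

```python
def classify_zero_one(value, n):
    """Return 0 or 1 if value is within 2**-n of that reference, else None."""
    epsilon = 2.0 ** -n
    if abs(value) < epsilon:
        return 0
    if abs(value - 1.0) < epsilon:
        return 1
    return None


# 2^-12 is below what an 8 bit destination can represent, so with an
# 8 bit epsilon it is treated as zero; with a finer epsilon it is not.
print(classify_zero_one(2.0 ** -12, 8))
print(classify_zero_one(2.0 ** -12, 16))
```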
Source data detection is needed for these SPI_SHADER export formats:

SPI_SHADER export format | Match
FP16_ABGR | Exponent and MSB of mantissa
[illegible] | Exponent and MSB of mantissa
FP32_ABGR | [illegible]
[illegible] | Exponent and MSB of mantissa
SNORM16_ABGR | N+1 bits from the 2^-N Epsilon
The result is a set of masks of which pixels have "alpha" and/or "color" close enough to 0.0 and/or 1.0. If alpha does not exist in the shader export format, it can be considered 1.0. If all of the shader exported color channels are close to 0.0 or 1.0, then "color" is 1.0 or 0.0; otherwise it is neither.

Such a check is done on color and alpha independently. The second step is then to generate the bypass/don't_rd_dst check result by the rules below:
`
`Per Context Per MRT Registers; Add SX_MRT#BLEND_OPT per state set and per cach of the &
`MRTs which controls how to detect specific optimizations on source and destination data and how to combine those
`optimivalions independenty for alpha and color
`Field Name
`COLOR_SRC_OPT
`
`Desctiption
`Sot which values in color or alpha will preserve the source color data and which
`values will allow the color source data to ba ignored. Settings can be derived trom
`COLOR_SRCBLEND.
`
`20
`
`POSSIBLEVALUES.
`00-BLEND_OPT_PRESERVE_|
`IGNORE ALL: Setwth ZERO
`01 -BLENO_OPT.
`BONE: GetwitOn
`Set with SRC_COLOR
`02 -BLEND_GPT_PRESERVE_C1_|
`03 - BLENO_OPT_PRESERVE_CO_IGNORE_C1: Set with
`ONE_MINUS_SRC_COLOR
`04 - BLEND_OPT_PRESERVE_A1_IGNORE_AD: Set with SRC_ALPHA
`05 - BLEND_OPT_|
`}IGNORE_At: Set
`eyaor
`06 - BLENO_OPT_PRESERVE_NONE_IGNORE_AO: Setwith ALPHA_SATURATE
`07 - BLENO_OPT_PRESERVE_NONE_IGNORE_NONE:Setwith any other
`BLEND_* mode
`Set which values in color of alpha will preserve the dest colordata and which valves
`
`64
`
`We atomte cols deeclafeBeanced: Salinecan be’doentom
`
`POSSIBLEVALUES”
`REQBEte
`00 - BLENO_OPT_PRESERVE_NONE_IGNORE ALL: Set with ZERO
`C2”BLENO-OPT-PRESERVE-C1-TONORE:GOGetwinSAC.COLOR
`t
`SRaKHORAOSetatALPHA
`OPT_PRESERVE_CO_IGNORE_C1: Set with
`One.MINUSsmSRC_ALPHA apres
`o7--BLENO-OPRESERVENONE.\GHORE-NONGSetwithanyother
`10:8 pesdissebsbalboa nconty Sncanestspalepee merryalba
`COLOR_COMB_FCN
`bypass the blender, destination reads can be skipped,
`andor the
`overwrite can
`whole pixel can be discarded, Settings can be derived from COLOR_COMB_FCN
`
`OPT_PRESERVE_NONE_IGNORE_AO.
`
`Set with ALPHA_SATURATE
`
`POSSIBLEVALUES:
`00 - OPT_COMB_NONE: No optimizations are enabled.
`01 - OPT_COMB_ADD: Set with OST_PLUS_:
`02 - OPT_COMB_SUBTRACT: Setwith SRC_MINUS_DST
`03 - OPT_COMB_MIN: Set with MIN_DST_:
`04 -OPT_COMB_MAX: Set with
`_DST_SRC
`05 - OPT_COMB_REVSUBTRACT. Sat withDST_MINUS_SRC
`06 - OPT_COMB_BLEND_DISABLEO:Set this or*_OPT_DISABLE when blend is
`
`07 -OPT_COMB_SAFE_ADD: Same as legacy CB auto mode
`
`MP te SE MAS doce 150R Bytes
`
`LE/OLEE 11:19 AM
`
`AMD1044_0104759
`
`ATI Ex. 2025
`IPR2023-00922
`Page 16 of 35
`
`ATI Ex. 2025
`
`IPR2023-00922
`Page 16 of 35
`
`

`

`AMD
`
`17 of 35 ALPHA_SRC_OPT
`
`ORIGINATE
`EDIT DATE
`DOCUMENT-VER. NUM.
`PAGE
`29-Nov-11
`la-Feb-14
`1816 werapaaeyehoornuaEESSS
`
`same as ALPHA
`
`00 - BLEND_OPT.EALL: Set wth ZERO
`01 -BLEND_OPT_PRESERVE_ALL_KSNORE_NONE; Set with ONE
`02 -BLEND_OPT_PRESERVE_C1_IGNORE_CO: Set with SRC_COLOR
`
`03 -BLENDOPTFreoenveCrIGNORE_C1: Setwith
`ONE_MINUS_SRC_COLOR
`04 - BLENO_OPT_PRESERVE_At_IGNORE_A0O: Set with SRC_ALPHA
`05 - BLEND_OPT_PRESERVE_A0_IGNORE_A1: Set with
`06.BLENDGPTPRESERVE:NONE_IGNORE_AO: Setwith ALPHA_SATURATE
`ONE_MINUS_SRC_ALPHA,
`aeeetVE_NONE_IGNORE_NONE: Setwithanyrother
`22.20 Set which values in color or alpha will preserve the dest color data and which values
`Renee oa ee Settingscan be derived fromALPHA
`Note thatALPHA_SATURATE in alpha should be set as ONE and COLOR Is the
`same as ALPHA
`
`ALPHA_DST_OPT
`
`POSSIBLE VALUES:
`00 - BLEND_OPT_PRESERVE_NONE_IGNORE_ALL: Set with ZERO
`01 - BLEND_OPT_PRESERVE_ALL_IGNORE_NONE: Set with ONE
`02 - BLEND_OPT_PRESERVE_C1_IGNORE_C0: Set with SRC_COLOR
`03 - BLEND_OPT_PRESERVE_C0_IGNORE_C1: Set with ONE_MINUS_SRC_COLOR
`04 - BLEND_OPT_PRESERVE_A1_IGNORE_A0: Set with SRC_ALPHA
`05 - BLEND_OPT_PRESERVE_A0_IGNORE_A1: Set with ONE_MINUS_SRC_ALPHA
`06 - BLEND_OPT_PRESERVE_NONE_IGNORE_A0: Set with ALPHA_SATURATE
`07 - BLEND_OPT_PRESERVE_NONE_IGNORE_NONE: Set with any other
`
`ALPHA_COMB_FCN
`
`26:24 Set how to combine the source and destination optimizations to figure out when an
`overwrite can bypass the blender, destination reads can be skipped, and/or the
`whole pixel can be discarded. Settings can be derived from ALPHA_COMB_FCN.
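The three alpha fields occupy adjacent 3-bit ranges. The sketch below packs them into a register word, reading the bit ranges printed with the fields as 18:16 (ALPHA_SRC_OPT), 22:20 (ALPHA_DST_OPT), and 26:24 (ALPHA_COMB_FCN). That reading is an assumption, and the helper names `set_field` and `get_field` are hypothetical.

```python
# (high bit, low bit) of each alpha field. These positions are an assumed
# reading of the bit ranges printed next to the field descriptions.
ALPHA_FIELDS = {
    "ALPHA_SRC_OPT": (18, 16),
    "ALPHA_DST_OPT": (22, 20),
    "ALPHA_COMB_FCN": (26, 24),
}


def set_field(reg, name, value):
    """Return reg with the named field replaced by value."""
    hi, lo = ALPHA_FIELDS[name]
    mask = (1 << (hi - lo + 1)) - 1
    if not 0 <= value <= mask:
        raise ValueError(f"{name} value {value} does not fit in {hi}:{lo}")
    return (reg & ~(mask << lo)) | (value << lo)


def get_field(reg, name):
    """Extract the named field from reg."""
    hi, lo = ALPHA_FIELDS[name]
    return (reg >> lo) & ((1 << (hi - lo + 1)) - 1)


# Example: SRC_ALPHA source (04), ONE_MINUS_SRC_ALPHA destination (05),
# and the add combine function (01).
reg = 0
reg = set_field(reg, "ALPHA_SRC_OPT", 4)
reg = set_field(reg, "ALPHA_DST_OPT", 5)
reg = set_field(reg, "ALPHA_COMB_FCN", 1)
```

Writing a field clears its old bits first, so the helpers can be applied in any order without disturbing the neighbouring fields.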
`
`After 0.0 and 1.0 are detected for alpha and color, the above BLEND_OPT tests are executed to get "PreserveSrc",
`"IgnoreSrc", "PreserveDst", and "IgnoreDst" flags for each of color and alpha. In A0, C0, A1, C1, A means Alpha, C
`means Color, and 0 means ==0.0 while 1 means ==1. For the alpha blend opt tests, color is treated as alpha as well,
`aliasing 2 and 3 with 4 and 5. Then combine the Src and Dst of each of these with the COMB_FCNs and
`CB_COLOR#_INFO.NUMBER_TYPE to get "BlendBypass", "Don't_rd_dst", and "Discard_pixel" flags for each
`of color and alpha:
`COMB_FCN                  BlendBypass                               Don't_rd_dst                Discard Pixel
`OPT_COMB_NONE             Never                                     Never                       Never
`OPT_COMB_ADD              (Preserve Src || SRC==0) && Ignore Dst    Ignore Dst                  (Ignore Src || SRC==0) && Preserve Dst
`OPT_COMB_SUBTRACT         (Preserve Src || SRC==0) && Ignore Dst    [illegible]                 Never
`OPT_COMB_MIN              SRC==0 && unorm                           Ignore Dst || BlendBypass   (Preserve Src && SRC==1) && u/snorm
`OPT_COMB_MAX              (Preserve Src && SRC==1) && u/snorm       Ignore Dst || BlendBypass   SRC==0 && unorm
`OPT_COMB_REVSUBTRACT      SRC==0 && Ignore Dst                      Ignore Dst                  (Ignore Src || SRC==0) && Preserve Dst
`OPT_COMB_BLEND_DISABLED   Always                                    Always                      Never
`OPT_COMB_SAFE_ADD         Preserve Src && Ignore Dst                Ignore Dst                  Ignore Src && Preserve Dst
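The two steps described above, running the BLEND_OPT tests and then combining the resulting flags per COMB_FCN, can be sketched as follows. This is one reading of a table that is damaged in the source, not the hardware's definitive logic. The function names are hypothetical, a single scalar stands in for each detected color or alpha value, the Don't_rd_dst condition for OPT_COMB_SUBTRACT is assumed to be "Ignore Dst", and the column assignment for MIN and MAX is inferred.

```python
(OPT_COMB_NONE, OPT_COMB_ADD, OPT_COMB_SUBTRACT, OPT_COMB_MIN, OPT_COMB_MAX,
 OPT_COMB_REVSUBTRACT, OPT_COMB_BLEND_DISABLED, OPT_COMB_SAFE_ADD) = range(8)


def opt_flags(opt, color, alpha, for_alpha=False):
    """Return (preserve, ignore) for one BLEND_OPT encoding (0..7)."""
    if for_alpha and opt in (2, 3):
        opt += 2  # alpha tests treat color as alpha: 2 and 3 alias 4 and 5
    c0, c1 = color == 0.0, color == 1.0
    a0, a1 = alpha == 0.0, alpha == 1.0
    return {
        0: (False, True),    # PRESERVE_NONE_IGNORE_ALL
        1: (True, False),    # PRESERVE_ALL_IGNORE_NONE
        2: (c1, c0),         # PRESERVE_C1_IGNORE_C0
        3: (c0, c1),         # PRESERVE_C0_IGNORE_C1
        4: (a1, a0),         # PRESERVE_A1_IGNORE_A0
        5: (a0, a1),         # PRESERVE_A0_IGNORE_A1
        6: (False, a0),      # PRESERVE_NONE_IGNORE_A0
        7: (False, False),   # PRESERVE_NONE_IGNORE_NONE
    }[opt]


def combine(comb, preserve_src, ignore_src, preserve_dst, ignore_dst,
            src_is_0=False, src_is_1=False, number_type="unorm"):
    """Return (blend_bypass, dont_rd_dst, discard_pixel) for one channel group."""
    unorm = number_type == "unorm"
    u_or_snorm = number_type in ("unorm", "snorm")
    bypass = no_read = discard = False  # OPT_COMB_NONE: never optimize
    if comb == OPT_COMB_ADD:
        bypass = (preserve_src or src_is_0) and ignore_dst
        no_read = ignore_dst
        discard = (ignore_src or src_is_0) and preserve_dst
    elif comb == OPT_COMB_SUBTRACT:
        bypass = (preserve_src or src_is_0) and ignore_dst
        no_read = ignore_dst  # ASSUMPTION: this cell is illegible in the source
    elif comb == OPT_COMB_MIN:
        bypass = src_is_0 and unorm
        no_read = ignore_dst or bypass
        discard = preserve_src and src_is_1 and u_or_snorm
    elif comb == OPT_COMB_MAX:
        bypass = preserve_src and src_is_1 and u_or_snorm
        no_read = ignore_dst or bypass
        discard = src_is_0 and unorm
    elif comb == OPT_COMB_REVSUBTRACT:
        bypass = src_is_0 and ignore_dst
        no_read = ignore_dst
        discard = (ignore_src or src_is_0) and preserve_dst
    elif comb == OPT_COMB_BLEND_DISABLED:
        bypass, no_read = True, True
    elif comb == OPT_COMB_SAFE_ADD:
        bypass = preserve_src and ignore_dst
        no_read = ignore_dst
        discard = ignore_src and preserve_dst
    return bool(bypass), bool(no_read), bool(discard)
```

For a conventional alpha blend (source option 04, destination option 05, add), a detected alpha of 1.0 yields blend bypass with no destination read, while a detected alpha of 0.0 yields a discarded pixel. Those are the two cases the optimization is meant to catch.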
`
`Add SX_MRT#BLEND_OPT_CONTROL per state set, which controls when to pay attention to each of
`the color and alpha optimizations per MRT.
`DESCRIPTION: Enables/disables the BLEND_OPTs per RT. Program these derived from the TARGET_MASK,
`BLEND_ENABLE, and FORCE_DST_ALPHA_1
`SHADER_MASK register is used for mapping from shader export to MRT SX_MRT[1-7]_CONTR
