ORIGINATE
`
`EDIT DATE
`
`DOCUMENT-VER. NUM.
`
`PAGE
`
`| CREATEDATE
`
`1.0
`
`| PAGE | of|
`
`Author: Todd Martin, Mangesh Nijasure
`
`ISSUED TO:
`
`COPY NO.
`
`WD/IA/VGT
`
`Micro-Architecture Specification
`
Rev 1.0
`
THIS DOCUMENT CONTAINS INFORMATION THAT COULD BE SUBSTANTIALLY DETRIMENTAL TO THE INTEREST OF AMD THROUGH UNLICENSED USE OR UNAUTHORIZED DISCLOSURE.

Preserve this document's integrity:

- Do not reproduce any portions of it.
- Do not separate any pages from this cover.

This document is issued to you alone. Do not transfer it to or share it with another person, even within your organization.

Store this document in a locked cabinet accessible only by authorized users. Do not leave it unattended.

When you no longer need this document, return it to AMD. Please do not discard it.

“Copyright 2011, Advanced Micro Devices, Inc. ("AMD"). All rights reserved. This work contains confidential, proprietary information and trade secrets of AMD. No part of this document may be used, reproduced, or transmitted in any form or by any means without the prior written permission of AMD.”

AMD, the AMD Arrow Logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. PCIe is a registered trademark of PCI-SIG. HDMI is a trademark of HDMI Licensing, LLC.

AMD (NYSE: AMD) is a semiconductor design innovator leading the next era of vivid digital experiences with its ground-breaking AMD Fusion Accelerated Processing Units (APU). AMD's graphics and computing technologies power a variety of devices including PCs, game consoles and the powerful computers that drive the Internet and businesses. For more information, visit http://www.amd.com.
`
`
`AMD1044_0048455
`
`ATI Ex. 2026
`IPR2023-00922
`Page 1 of 110
`
`

`

`Revision History
`
1 Introduction

This document describes the features and hardware implementation of the WD, IA, and VGT blocks and how they fit into the overall graphics architecture.

1.1 Open Issues
There are no known open issues for the WD, IA, or VGT.

1.2 Scope
This document details the feature requirements and the hardware implementation for the WD, IA, and VGT blocks.

1.3 Reference
No external documents other than those explicitly linked in this specification are necessary to understand the material presented here. This micro-architecture specification is self-sufficient in describing the design and features of the WD, IA, and VGT blocks.
1.4 Definitions / glossary of terms

- WD - Work Distributor, receives all the draw commands and breaks them up into work groups which are sent to one or more IA units
- IA - Input Assembler, receives work groups and breaks them up into prim groups for the VGT. Fetches indices from memory.
- VGT - Vertex Geometry Tessellator, this is the main block responsible for supporting all DX and OGL draw packets
- THREAD - A thread is a single entity in a wavefront; this can be vertices, primitives, etc.
- WAVEFRONT - A group of threads that execute in SIMD fashion.
- SE - Shader Engine
- PA - Primitive Assembler
- LS - Local Data Shader
- HS - Hull Shader
- DS - Domain Shader
- ES - Export Shader
- GS - Geometry Shader
- VS - Vertex Shader
- CS - Compute Shader
- EOP - End Of Packet
- EOPG - End Of Primgroup
1.5 Top Level Diagram

This diagram shows a 4 shader engine configuration.

[Figure: top level diagram (embedded Visio drawing)]
`
`2 Delta Requirements
`
`All delta features have been folded into this spec.
`
`3 Features / Functionality
`
`3.1 Overview
`
The WD, IA, and VGT are responsible for creating thread data and wavefront assignment for nearly all the graphics shader stages in use by the graphics core. These are Vertex Shader (VS), Export Shader (ES), Geometry Shader (GS), Local Data Shader (LS), and Hull Shader (HS). The remaining shader type, Pixel Shader (PS), is not controlled directly by the VGT, but the VGT does provide primitive information to the Primitive Assembly (PA) block, which begins a process of clipping and sends clipped primitives to a Scan Converter (SC) that produces Pixel Shader (PS) data and wavefronts.

By providing these workloads in a deterministic, state-controlled order, the WD/IA/VGT enables and controls the various graphics pipelines (from a simple VS->PS pipeline, to a complicated pipeline such as LS->HS->TESS_FIXED_FUNCTION->ES (as DS)->GS->VS). Primarily the WD/IA/VGT accomplishes this by decomposing input packets which yield all of the shader types. In addition to controlling shader stages, the VGT in particular also provides a fixed function tessellation stage of the graphics pipeline. The register VGT_SHADER_STAGES_EN indicates which shader stages are enabled for a given Draw Initiator.

There is one WD block per chip and one VGT per Shader Engine. A single IA is paired with two VGT blocks, so a two SE chip will have one IA while a four SE chip has two IAs. A single VGT has a throughput of one primitive per cycle, so the number of VGTs present directly controls the maximum geometry throughput of the system.
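The block-count rules above can be sketched as a small calculation. This is only an illustration: the function name and the returned dictionary are hypothetical, and rounding the IA count up for an odd number of shader engines is an assumption that the text does not cover.

```python
import math


def front_end_config(num_shader_engines: int) -> dict:
    """Derive front-end block counts from the rules stated in the overview."""
    if num_shader_engines < 1:
        raise ValueError("at least one shader engine is required")
    num_vgt = num_shader_engines      # one VGT per shader engine
    num_ia = math.ceil(num_vgt / 2)   # one IA per two VGTs (rounding up is assumed)
    return {
        "wd": 1,                          # one WD per chip
        "ia": num_ia,
        "vgt": num_vgt,
        "peak_prims_per_clock": num_vgt,  # each VGT delivers one primitive per cycle
    }
```

For example, `front_end_config(4)` reports two IAs and a peak rate of four primitives per clock, matching the four SE configuration described above.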
`
The WD/IA/VGT is responsible for:

COMPUTE:

- Compute support has completely moved to the CP.

GRAPHICS:

- Receiving graphics state, draw requests, and synchronization events from the GRBM bus.
- Fetching, from memory, the individual vertex indices (16 or 32 bit pointers to vertex data) requested by a draw call.
- Grouping the indices into primitives such as lines, triangles, or patches.
- Determining index reuse within a fixed window of indices. This avoids redundant vertex shading.
- Providing primitive information to the Primitive Assembler (PA) block.
- Providing statistics and synchronizing events to the Command Processor (CP).
- Alternate graphics workloads to shader engines.
- Support legacy tessellation mode (does not use the LS/HS/DS shader stages, and has a different fixed function tessellation algorithm).
- Arbitrate (at a packet boundary) between high and normal priority draw calls.
- Independent pipeline reset such that work assigned to a given VMID is stopped as quickly as possible.
- Front end harvesting including deactivation of individual VGTs and the associated IA.
- Order data for streamout.
- If Geometry Shading is enabled
  o Generate ES wavefronts/vertices/GS primitives and send them to the SPI
  o Upon completion of ES, generate GS wavefronts and send them to the SPI
  o Receive Geometry information output from the GS
`
  o Upon completion of GS, generate VS wavefronts/vertices (including streamout data) and send them to the SPI. In this situation the VS is also known as the Copy Shader since it is copying data from the GSVS ring buffer to the position buffer and parameter cache.
  o Generate Primitive information and send it to the PA
  o ES/GS output data is either all off chip or all on chip
- If Tessellation is enabled
  o Generate LS wavefronts/vertices and send them to the SPI
  o Upon completion of LS, generate HS wavefronts and send them to the SPI
  o Upon completion of HS, retrieve tessellation factors from memory, and execute the fixed function tessellator stage.
  o Create DS wavefronts/data from output of tessellator. Send the DS wavefronts to the SPI as either ES wavefronts (Geometry Shader enabled) or as Vertex Shader wavefronts (Geometry Shader disabled)
  o Optionally (if Geometry Shader enabled) create GS wavefronts/data and send them to the SPI
  o Generate Primitive information and send it to the PA
`
3.2 Data Flow based on SHADER_STAGES_EN programming

Shader_en mode   VS    HS    DS    GS    Hardware data flow
F                on    on    on    on    LS->HS->TE->ES->GS->VS
E                on    on    on    off   API VS runs as LS. LS->HS->TE->VS
D                on    on    off   on    Not valid
C                on    on    off   off   Not valid
B                on    off   on    on    Not valid
A                on    off   on    off   Not valid
9                on    off   off   on    API VS runs as the ES. ES->GS->VS
8                on    off   off   off   VS->PS
7                off   on    on    on    Not possible, VS has to be on
6                off   on    on    off   Not possible, VS has to be on
5                off   on    off   on    Not possible, VS has to be on
4                off   on    off   off   Not possible, VS has to be on
3                off   off   on    on    Not possible, VS has to be on
2                off   off   on    off   Not possible, VS has to be on
1                off   off   off   on    Not possible, VS has to be on
0                off   off   off   off   Not possible, VS has to be on

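The Shader_en mode table can be read as a four-bit decode. The sketch below is a minimal illustration under the assumption that the hexadecimal mode value is simply the VS, HS, DS and GS columns read as bits 3 down to 0. That bit assignment is inferred from the table and is not the documented field layout of VGT_SHADER_STAGES_EN; the function name is hypothetical.

```python
# Bit positions inferred from the table's mode column (an assumption).
VS, HS, DS, GS = 0x8, 0x4, 0x2, 0x1


def data_flow(mode: int) -> str:
    """Return the hardware data flow for a 4-bit Shader_en mode value."""
    if not 0 <= mode <= 0xF:
        raise ValueError("mode must be a 4-bit value")
    if not mode & VS:
        return "Not possible, VS has to be on"
    tess_bits = mode & (HS | DS)
    if tess_bits not in (0, HS | DS):
        return "Not valid"  # HS and DS must be enabled together
    tessellation = tess_bits == (HS | DS)
    geometry = bool(mode & GS)
    if tessellation and geometry:
        return "LS->HS->TE->ES->GS->VS"
    if tessellation:
        return "LS->HS->TE->VS"
    if geometry:
        return "ES->GS->VS"
    return "VS->PS"
```

Decoding every value from 0x0 to 0xF with this function reproduces the sixteen rows of the table.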
`3.3 Distribution of work amongst shader engines.
`
If there is one IA, the WD passes draw calls through; however, if there are two IAs, the WD breaks draws into work groups which are twice the size of a prim group. The IA sends an entire prim group to a VGT before switching to the next VGT. An end of prim group (eopg) signal follows each prim group and the SC looks for these to know when to switch input FIFOs.

The WD will discard any draw call that contains 0 indices or a primitive type of DI_PT_NONE. Beginning in gfx8, the WD will also discard any draw call that sets the number of instances to 0. If there is one IA, the draw will be dropped and nothing will be sent down the pipe, but if there are two IAs, the WD will send a null eop down the pipe and toggle the IA that will receive work next.
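The discard rules can be sketched as a small decision function. This is an illustrative model only: the `Draw` record, the `PT_NONE` constant and the action strings are hypothetical names, and the distribution of non-discarded draws across IAs is deliberately not modelled.

```python
from dataclasses import dataclass

PT_NONE = "NONE"  # hypothetical stand-in for the "no primitive" type


@dataclass
class Draw:
    num_indices: int
    prim_type: str
    num_instances: int = 1


def is_discarded(draw: Draw, gfx8_or_later: bool = True) -> bool:
    """Apply the WD discard rules described above."""
    if draw.num_indices == 0 or draw.prim_type == PT_NONE:
        return True
    # The zero-instance check only exists from gfx8 onward.
    return gfx8_or_later and draw.num_instances == 0


def wd_handle(draw: Draw, num_ia: int, next_ia: int, gfx8_or_later: bool = True):
    """Return (action, next_ia) for one draw call."""
    if not is_discarded(draw, gfx8_or_later):
        return "forward", next_ia       # normal work distribution is not modelled here
    if num_ia == 1:
        return "drop", next_ia          # nothing is sent down the pipe
    return "null_eop", next_ia ^ 1      # send a null eop and toggle the next IA
```

With two IAs, an empty draw therefore still toggles which IA receives the next work, which keeps both sides of the pipe in step.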
`
`
Below is an example of processing primitive groups from IA to SCs.

The Input Assembler (IA) block processes a serial stream of primitives. The bolded P's represent primitives that will be sent to SE0.

P, P, P (eopg), P, P, P (eopg), P (eop/eopg), event, P, P (eop/eopg), event, eop, event, P, P (eopg), P, eop/eopg, event

Four packets processed by VGT modules are shown below with different colors. This represents data sent from the IA to VGTs 0 and 1. PA_0 will receive the same sequence as VGT_0 and PA_1 will receive the same sequence as VGT_1. After reset, processing starts with VGT module 0.

Eopg marks the end of the prim group and this is what tells the IA block to switch SEs. Eopg is only sent to the active SE.

Eop is sent to both SEs in unordered mode. Otherwise the PA drops the eop that isn't accompanied by eopg.

Each line represents a point in time, so items on the same line occur simultaneously. Read the data flow from top to bottom.
`
`VGT 1
`
`P P P
`
`/eopg
`eop
`event
`
`P
`
`P/eopg/eop
`event
`
`event
`
`P
`
`VGT 0
`
`P P P
`
`/eopg
`
`P/eopg/eop
`event
`
`yp
`event
`
`event
`
`P P
`
`/eopg
`
`eopg/eop
`eop
`event
`event
`
`
The table below shows how PA outputs are loaded into SCs. Some of the primitives, which can span scan windows of two SCs, are loaded into FIFOs of both SCs.

The FIFOs of each SC are processed independently for each SC. The diagram below shows the order of processing within FIFOs for a given SC.

The SC switches the FIFO it is reading from after reading out an eopg. The SC synchronizes FIFOs when processing an event or eop, though when ordered mode (default) is used the PA will only send eop if it is accompanied by an eopg.
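The FIFO switching rule can be illustrated with a toy model of one SC fed by two input FIFOs. It is a simplified sketch, not the hardware algorithm: tokens are plain tuples, only ordered mode is modelled (an eop is assumed to travel with its eopg, so it is not a separate token), and all function names are hypothetical.

```python
from collections import deque


def ia_distribute(stream):
    """Split a serial stream into two FIFOs, switching after every eopg.

    Tokens are ("prim", name, eopg_flag) or ("event", name).
    Events are broadcast to both FIFOs.
    """
    fifos = [deque(), deque()]
    active = 0  # after reset, processing starts with side 0
    for tok in stream:
        if tok[0] == "event":
            fifos[0].append(tok)
            fifos[1].append(tok)
        else:
            fifos[active].append(tok)
            if tok[2]:          # eopg: switch to the other side
                active ^= 1
    return fifos


def sc_read(fifos):
    """Read both FIFOs the way the SC does and return the merged order."""
    out = []
    current = 0
    while fifos[0] or fifos[1]:
        tok = fifos[current].popleft()
        if tok[0] == "event":
            # Synchronize: the same event must be at the head of the other FIFO.
            other = fifos[current ^ 1].popleft()
            assert other == tok
        elif tok[2]:
            current ^= 1        # switch FIFOs after reading out an eopg
        out.append(tok)
    return out
```

Because the reader switches on the same eopg markers the sender used, `sc_read(ia_distribute(stream))` returns the original serial order.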
`
`
Read the data flow from top to bottom. Each line shows what is being read out by the SC during a clock cycle.

SC_0

SC_1
`
`FIFO_1
`
`P P P
`
`/eopg
`eop
`event
`
`P’/eopg/eop
`event
`
`event
`
`FIFO_0
`
`eopg
`
`eopg/eop
`event
`
eop
`event
`
`event
`
`FIFO_1
`
`cops
`eop
`event
`
`P*/eopg/eop
`event
`
`event
`
`FIFO_0
`
`P
`
`P P
`
`/eopg
`
`P/eopg/eop
`event
`
`»p
`event
`
`event
`
`P
`P/eopg
`eopg/eop
`eop
`eopg/eop
`eop
`event
`event
`event
`event
`
`
`eopg
`
The following table shows what each SC sees after reading from the input FIFOs. Read the data flow from top to bottom. Packets are separated for clarity.
`
`SC 0
`P
`P
`P
`
`P/eop
`event
`
`P*/eop
`
`SC 1
`P
`P
`P
`
`eop
`event
`
`P
`
`event
`
`P
`
`P
`eop
`event
`
`event
`
`P
`
eop
`event
`
`3.4 Vertex Reuse
`
The intent of the vertex reuse determination is to use the Vertex Shader efficiently by preventing it from processing the same vertex multiple times if that vertex is used in multiple primitives that occur relatively close together in the input stream. The VGT must detect vertex reuse within the previous 30 (or fewer) vertex indices. In other words, vertex reuse is determined by the redundant occurrence of an external vertex index within a limited scope of the external vertex index list. If a “hit” is detected for a given vertex index, that vertex index is not resubmitted for vertex processing.
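A minimal sketch of windowed reuse detection follows. It assumes a simple first-in, first-out window that holds the most recent unique indices; the actual replacement policy and buffer organisation of the VGT reuse blocks are not described here, and the function name is hypothetical.

```python
from collections import deque


def detect_reuse(indices, depth=30):
    """Return (indices submitted for shading, number of reuse hits)."""
    window = deque(maxlen=depth)  # oldest entry falls out when the window is full
    to_shade = []
    hits = 0
    for idx in indices:
        if idx in window:
            hits += 1             # hit: vertex is not resubmitted
        else:
            window.append(idx)    # miss: vertex is shaded and enters the window
            to_shade.append(idx)
    return to_shade, hits
```

For two triangles sharing an edge, `detect_reuse([0, 1, 2, 2, 1, 3])` submits only four vertices and records two hits. Shrinking `depth` shows how a vertex that falls out of the window is shaded again.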
`
Reuse checks are performed in multiple sub-blocks in the VGT. Here is a list of shader stages and where vertex reuse occurs.

VS -> PS: Vertex Reuse Block performs the reuse. If Streamout is enabled, reuse is automatically disabled.

ES -> GS -> VS -> PS: The GS Reuse Check Module performs reuse checks to remove redundant vertices from ES wavefronts. GS prims output strips, but there is no reuse between the strips. If Streamout is enabled, reuse is automatically disabled prior to the VS and the strips are converted to lists.

LS -> HS -> VS -> PS: The tessellator performs reuse prior to the VS stage. There is no reuse between patches, only between primitives output by a single patch. If Streamout is enabled, reuse is automatically disabled.

LS -> HS -> ES -> GS -> VS -> PS: The tessellator performs reuse prior to the ES stage. GS prims output strips, but there is no reuse between the strips. If Streamout is enabled, reuse is automatically disabled prior to the VS and the strips are converted to lists.
`
`3.4.1 Bank Conflict Detection
`
With the increase in reuse depth from 16 to 30 in gfx8, it became possible for there to be bank conflicts in the parameter cache. In order to simplify logic in the SPI and the Parameter Cache, bank conflict prevention code will be added to some of the VGT reuse checkers. Any intra-prim relative indices that would cause bank conflicts in the parameter cache will not be allowed.
`
. 21 they are stored in the parameter cache as follows. Each of the columns shown is a bank; there are 16 banks in the PC.

                                        21  20  19  18  17  16
15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
`
If the new incoming primitive (a triangle) has the indices 22, 3 and 6, the indices 22 and 6 will be fetched from the same bank and will cause a conflict.

The new conflict detection code will eliminate this condition by replicating the index 6 as follows:

15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
`
This produces more unique vertices than ideal, but it eliminates the need for complex bank conflict detection in the SPI and the PC.

Indices from different primitives are allowed to request indices from the same bank; this conflict checking is handled downstream. The VGT will only be responsible for eliminating any intra-prim conflicts.
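The intra-prim check can be sketched as follows. The sketch assumes that a vertex stored in parameter cache slot s lives in bank s mod 16, which is consistent with the 16-bank layout in the example above but is still an assumption; the function names are hypothetical.

```python
from itertools import combinations

NUM_BANKS = 16  # the PC has 16 banks


def bank_of(slot: int) -> int:
    """Bank that holds a given parameter cache slot (assumed slot mod 16)."""
    return slot % NUM_BANKS


def intra_prim_conflicts(slots):
    """Return pairs of distinct slots within one primitive that share a bank."""
    return [
        (a, b)
        for a, b in combinations(slots, 2)
        if a != b and bank_of(a) == bank_of(b)
    ]
```

For the triangle in the example, `intra_prim_conflicts([22, 3, 6])` reports the pair (22, 6). If index 6 is replicated into the next free slot, 23, the primitive becomes `[22, 3, 23]` and the conflict list is empty.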
`
This conflict detection step is not necessary in the GS RCM or in the TE11 when the GS is enabled. This is because the primitives at these pipeline stages do not go to the parameter cache and will not cause bank conflicts when fetched.

The VGT_GS_VERTEX_REUSE register is now deprecated. The VGT_VERTEX_REUSE_BLOCK_CNTL register now controls reuse depth for all the reuse buffers (DX9, GS and TE11).

Setting VGT_REUSE_OFF.REUSE_OFF turns off reuse in all blocks. Previously this did not turn reuse off in the GS block.

To support this increased reuse depth, reuse is now turned off for any degenerate primitives (any primitive with repeated indices). This is an implementation level detail to save schedule.
`
`3.5 Dealloc Distance and Reuse Depth
`
The shader always writes 16 vertex parameters. Therefore dealloc_distance and reuse depth are always set to 16 (points) or 14 (triangles). This also shows that for legacy (non-DX11) tessellation we can set up HOS_REUSE_DEPTH to 16. The following changes were made to remove the dealloc_slot issue independent of quad_pipes.

1. The VGT submits 64 vertices per wave unless half pack is switched on.
       if (half_packed flag)
           create 32 verts per vector
       else
           create 64 verts per vector
2. De-allocate distance will always be 16 and reuse can always be 14 unless the driver wants to limit it to be less.
3. The VGT changes to create:
   a. 1 New Flag per vertex vector, attached to the first primitive containing the first vert of a new vertex vector (same as today).
   b. 1 De-allocate signal per Vertex Vector submitted instead of 4.
      i. ... of the previous vector
      ii. It will be variable based on the resulting number of vertices per vector set up in number 1.
          1. vert 80 and every 64 thereafter
   c. 1 last signal that is sent in the msb of the de-allocate signal.
      i. This will be attached to the primitive containing the first reference to a new specific vert for each vector depending on the vector size
          1. vert 61 and every 64 thereafter
      ii. This signal is only for SC usage
4. The largest actual de-allocate count the VGT will send now will be 3.
5. The scan converter changes its partial vector submit circuit to emit a partial vector when it has a nonzero de-allocate count and gets a last flag. This will prevent hang conditions when there is a lot of culling going on. The scan converter will also remove the previous partial submit for this reason and add asserts for the occurrence of the partial submit and a fail assert if de-allocate is ever larger than 2.
6. The PA changes to pack parameters into the parameter cache. It will use the bad pipe flags and number of parameters to determine next offsets.
7. SQ allocates and de-allocates based on num_quad_pipes and number of parameters. It will do a special case limiting when num_quad_pipes > 2 and num_parameters >= 16; it will actually act like two quad pipes. So
       if (num_quad_pipes > 2 && half_packed flag)
           alloc_amount = num_parameters * 2
       else
           alloc_amount = num_quad_pipes * num_parameters
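The two pseudocode fragments in this list translate directly into runnable form. The sketch below follows the pseudocode as written, so the allocation special case is keyed on the half-packed flag rather than on num_parameters >= 16 as the surrounding sentence suggests. The function names are hypothetical.

```python
def verts_per_vector(half_packed: bool) -> int:
    """Vertices the VGT packs into one vertex vector (item 1)."""
    return 32 if half_packed else 64


def alloc_amount(num_quad_pipes: int, num_parameters: int, half_packed: bool) -> int:
    """Parameter cache allocation amount used by the SQ (item 7)."""
    if num_quad_pipes > 2 and half_packed:
        # Special case: behave as if there were only two quad pipes.
        return num_parameters * 2
    return num_quad_pipes * num_parameters
```

For example, four quad pipes with 16 parameters allocate 64 entries normally but only 32 when half packing is enabled.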
`
`3.6 VertexID
`
VertexID is a 32 bit unsigned integer value created by the VGT and loaded into a VGPR for the API vertex shader. It is unique per vertex, though each VGT maintains its own count. This feature is not required by DX or OpenGL. The count is reset by a RESET_VTX_CNT event.
`
3.7 PrimitiveID

PrimitiveID is a 32 bit unsigned integer value created by the VGT and loaded into a VGPR. The register VGT_PRIMITIVEID_RESET specifies the reset value to be used at the beginning of each instance. Typically it's programmed to 0. For special GS modes like scenario A and B, the VGT_PRIMITIVEID_EN register specifies that the primitive ID value should be loaded into a VGPR at the expense of an instance step rate value.

PrimitiveID is automatically available to the HS, DS and GS. If it's needed in the PS it must be passed as a parameter. It is expected that the situations where primitiveID is used by the PS but there is no GS instantiated are rare. To avoid having the hardware pipe the full 32-bit primitiveID through hundreds of clocks of pipeline, the driver will be expected to change the VS into a GS_A, which is basically a VS which gets primitiveID on the input and outputs the primitiveID on the vector/component where the PS expects it. The only other unique processing associated with a GS_A is that the VGT must guarantee that the leading vertex is unique (i.e. does not hit in the vertex reuse cache). This is required so that unique data for the primitive (i.e. primitiveID) is available for constant interpolation for the primitive.
`
3.8 InstanceID

InstanceID is a 32 bit unsigned integer value created by the VGT and loaded into a VGPR. It starts at 0 for all of the vertices of the first instance, and increments thereafter for each instance. It should also be 0 for non-instanced draw calls.
`
`
The value will be supplied to the API VS and is available to the GS and PS. The path to the VS is the only hardware-dedicated path for instanceID, as the driver is expected to create a VS (which would pass it along if necessary) if there is no VS instantiated.

The VGT will also supply up to two step-rate divide values to assist the fetch shader for cases with a small number of unique step-rates. In case the required number of step rates exceeds what is supported by the VGT, the remaining instanceID/step-rate values will be calculated by the fetch shader. A vertex wavefront may consist of vertices with different instanceIDs.
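The split between hardware-supplied and shader-computed step-rate values can be sketched like this. The sketch assumes that a step-rate divide value is the integer quotient instanceID / step_rate and that the first two requested rates are the ones handled by the VGT; both points are assumptions, and the function name is hypothetical.

```python
MAX_HW_STEP_RATES = 2  # the VGT supplies up to two step-rate divide values


def step_rate_values(instance_id: int, step_rates):
    """Return (values supplied by the VGT, values left to the fetch shader)."""
    if any(rate < 1 for rate in step_rates):
        raise ValueError("step rates must be positive")
    quotients = [instance_id // rate for rate in step_rates]
    return quotients[:MAX_HW_STEP_RATES], quotients[MAX_HW_STEP_RATES:]
```

With instance 7 and step rates 1, 2 and 4, the first two quotients (7 and 3) would come from the VGT and the third (1) would be computed in the fetch shader.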
`
In a multiple VGT system, the end of instance flag needs to be propagated to all VGTs in order to correctly increment the instanceID and reset the reuse buffer. In variants with more than one IA, the WD sends a null_eoi to the ‘other’ IA, which propagates the flag to the VGTs connected to it. When the entire instance fits on one IA, the null_eois sent to the other side can add up as dead cycles and show up as performance glitches.

If the entire draw is smaller than a primgroup, the null_eois are suppressed and instance_id is still handled correctly. This is done by adding an interface bit from the WD to the IA indicating that the draw was a candidate for optimization. Any null_eois that are not eops will be discarded gracefully and will not exit the IA.
`
`3.9 Reset Index
`
A reset index is a special index value that signifies the end of a primitive. It's typically used to break strips, but may be enabled for lists. Reset index is not supported with patches. Reset index checking occurs in the IA and it's enabled by setting the VGT_MULTI_PRIM_IB_RESET_EN register. The index value to check for is specified in VGT_MULTI_PRIM_IB_RESET_INDX.

Enabling reset index limits performance for designs that have greater than 2x prim rate (2 or more IAs) as it requires WD_SWITCH_ON_EOP to be set. For this reason we recommend our developer relations personnel evangelize using lists instead of strips with reset indices.

Partial primitives that result from a reset index or at the end of a packet are silently discarded.

Prior to gfx8, drivers modified the value of the reset index check register VGT_MULTI_PRIM_IB_RESET_INDX based on the index type (8, 16 or 32 bit). In order to alleviate the software validation that is performed, in gfx8 and later projects the hardware masks out the register bits depending on the number of bits in the current index.

For 16 bit indices, earlier the driver needed to program 0x0000VVVV, where VVVV is the reset index.

Now the driver can use 0xXXXXVVVV, where XXXX are don't care.
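The gfx8 masking behaviour and the effect of a reset index on a strip can be sketched as below. This is an illustrative model: the function names are hypothetical, and the strip splitter assumes triangle strips, so any run shorter than three indices is treated as a partial primitive and dropped.

```python
def is_reset_index(index: int, reset_indx_reg: int, index_bits: int) -> bool:
    """Compare an index against the reset register, masked to the index width."""
    if index_bits not in (8, 16, 32):
        raise ValueError("index type must be 8, 16 or 32 bit")
    mask = (1 << index_bits) - 1
    return (index & mask) == (reset_indx_reg & mask)


def split_strips(indices, reset_indx_reg: int, index_bits: int):
    """Break an index stream into strips at each reset index."""
    strips, current = [], []
    for idx in indices:
        if is_reset_index(idx, reset_indx_reg, index_bits):
            strips.append(current)
            current = []
        else:
            current.append(idx)
    strips.append(current)
    # Runs too short to form a triangle are silently discarded.
    return [s for s in strips if len(s) >= 3]
```

With 16 bit indices, a register value of 0x1234FFFF matches index 0xFFFF because the upper bits are masked out, whereas the same register value with 32 bit indices does not.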
`
Besides the performance implications, other caveats to using reset index are:
1. Line stipple will not produce the correct visual result with this mode. The line stipple pattern will not reset between strips (which it would if the strips were sent with separate VGT_DRAW_INITIATOR commands).
2. Edge flags will not be correct for the prim order VGT_GRP_POLYGON. This will have a visual impact in OpenGL for this primitive order if POLY_MODE is set to LINES or POINTS. (This applies mostly to the OpenGL polygon primitive.)
`
`3.10 Provoking Vertex
`
If flat shading is enabled for a primitive, then the provoking vertex is the vertex whose color is used to shade the entire primitive. OpenGL and Direct3D differ (for most primitive types) in their respective selections of the provoking vertex. The VGT will be designed so that the OpenGL primitives will always program the provoking vertex select to “last vertex” and the Direct3D primitives will always program the provoking vertex select to “first vertex”.

OpenGL Specification
The following table is based directly on table 4-2 from OpenGL Programming Guide, Second Edition. (The version in the OpenGL spec counts vertices and primitives starting at 1, whereas this version counts vertices and primitives starting at 0.) After swapping for specified vertex order within the primitive, the provoking vertex is the last vertex in the primitive, with the exception of the polygon primitive, where the first vertex is the provoking vertex.
`
Table 1. OpenGL Provoking Vertex.

Type of Polygon     OpenGL: Vertex Used to Select      Direct3D: Vertex Used to Select
                    the Color for the ith Polygon      the Color for the ith Polygon
triangle strip      i+2 (last vtx in tri)              i (first vtx in tri)
triangle fan        i+2 (last vtx in tri)              i (first vtx in tri)
quad strip (1)      2i+3 (next-to-last vtx in quad)    N/A - 2i (first vtx in quad)
independent quad    4i+3 (last vtx in quad)            N/A - 4i (first vtx in quad)

(1) For OpenGL quad strips, the provoking vertex is the last vertex in the vertex buffer that forms the primitive; however, it is the next-to-last vertex of the primitive using the primitive-relative vertex order. For example, if the vertex buffer contains V0, V1, V2, and V3 in that order, then the first quad primitive from that strip will have the vertex order V0, V1, V3, V2. The provoking vertex for the quad is V3. See [ REF _Ref687713 \r \h ] for more detail.
`
3.10.1 Primitive Vertex Ordering and Provoking Vertex Summary

Table 2. Primitive Vertex Order and Provoking Vertex Summary

Primitive type            Direct3D vertex order                      OpenGL vertex order

Line List                 V0, V1                                     V0, V1
                          V2, V3                                     V2, V3

Line Strip                V0, V1                                     V0, V1
                          V1, V2                                     V1, V2

Line Loop                 V0, V1                                     V0, V1
                          V1, V2                                     V1, V2
                          V2, V0 <= created by VGT                   V2, V0 <= created by VGT

Tri List                  V0, V1, V2                                 V0, V1, V2
                          V3, V4, V5                                 V3, V4, V5

Tri Strip                 V0, V1, V2                                 V0, V1, V2
                          V1, V3, V2 <= VGT swaps last two           V2, V1, V3 <= VGT swaps first two

Tri Fan                   V1, V2, V0 <= VGT rotates first to last    V0, V1, V2
                          V2, V3, V0 <= VGT rotates first to last    V0, V2, V3

Quad List (Native)        Does not exist - assumed                   V0, V1, V2, V3
                          V0, V1, V2, V3                             V4, V5, V6, V7
                          V4, V5, V6, V7

Quad List (Decomposed)    Does not exist - assumed                   V0, V1, V3 and V1, V2, V3
                          V0, V1, V2 and V0, V2, V3                  V4, V5, V7 and V5, V6, V7
                          V4, V5, V6 and V4, V6, V7

Quad Strip (Native)       Does not exist - assumed                   V0, V1, V3, V2 <= VGT swaps last two
                          V0, V1, V3, V2 <= VGT swaps last two       V2, V3, V5, V4 <= VGT swaps last two
                          V2, V3, V5, V4 <= VGT swaps last two       V4, V5, V7, V6 <= VGT swaps last two
                          V4, V5, V7, V6 <= VGT swaps last two

Quad Strip (Decomposed)   Does not exist - assumed                   V0, V1, V3 and V1, V2, V3
                          V0, V1, V3 and V0, V3, V2                  V2, V3, V5 and V3, V4, V5
                          V2, V3, V5 and V2, V5, V4                  V4, V5, V7 and V5, V6, V7
                          V4, V5, V7 and V4, V7, V6

Polygon (Decomposed)      Does not exist - assumed                   V1, V2, V0 <= VGT rotates first to last
                          V0, V1, V2                                 V2, V3, V0 <= VGT rotates first to last
                          V0, V2, V3                                 V3, V4, V0 <= VGT rotates first to last
                          V0, V3, V4 etc...
`
DirectX Specification
The DirectX 8.0 documentation states “When flat shading is enabled, the system shades the triangle with the color from its first vertex.” There is no direct mention of flat shading lines, but the VGT design assumes that lines also use the first vertex in each line segment as the provoking vertex.
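The strip and fan orderings can be sketched in code to show how the reordering keeps the provoking vertex in a fixed position. The sketch assumes that the "swaps first two" and unrotated fan orderings belong to OpenGL (last vertex provoking) and the "swaps last two" and rotated fan orderings belong to Direct3D (first vertex provoking). That attribution is inferred from the provoking vertex rules, and the function names are hypothetical.

```python
def tri_strip(num_verts: int, api: str):
    """Decompose a triangle strip into vertex-index triples for the given API."""
    if api not in ("opengl", "d3d"):
        raise ValueError("api must be 'opengl' or 'd3d'")
    tris = []
    for i in range(num_verts - 2):
        if i % 2 == 0:
            tri = (i, i + 1, i + 2)
        elif api == "opengl":
            tri = (i + 1, i, i + 2)   # swap first two: last vertex stays i+2
        else:
            tri = (i, i + 2, i + 1)   # swap last two: first vertex stays i
        tris.append(tri)
    return tris


def tri_fan(num_verts: int, api: str):
    """Decompose a triangle fan into vertex-index triples for the given API."""
    if api not in ("opengl", "d3d"):
        raise ValueError("api must be 'opengl' or 'd3d'")
    if api == "opengl":
        return [(0, i + 1, i + 2) for i in range(num_verts - 2)]
    return [(i + 1, i + 2, 0) for i in range(num_verts - 2)]  # rotate first to last


def provoking_vertex(tri, api: str) -> int:
    """Last vertex for OpenGL, first vertex for Direct3D."""
    return tri[-1] if api == "opengl" else tri[0]
```

For a five-vertex strip, the provoking vertices come out as V2, V3, V4 under the OpenGL ordering and V0, V1, V2 under the Direct3D ordering, so the provoking vertex advances by one per triangle in both cases.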
`
`3.11 Primitive Types
`
`3.11.1 Triangle List
`
The first edge in each triangle is a bold line. For OpenGL, the last vertex (shown with a square box in the referenced figure) is used as the provoking vertex. For Direct3D, the first vertex (shown in a circle in the referenced figure) in each triangle in a triangle list is the provoking vertex.
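A minimal Python sketch of the triangle list rule follows. The function names and the `api` strings are hypothetical, and ignoring leftover vertices that do not complete a triangle is an assumption rather than something the text states.

```python
def triangle_list(vertices):
    """Group a vertex stream into independent triangles, three at a time.

    Assumption: trailing vertices that do not complete a triangle are ignored.
    """
    usable = len(vertices) - len(vertices) % 3
    return [tuple(vertices[i:i + 3]) for i in range(0, usable, 3)]


def provoking_vertex(triangle, api):
    """Pick the provoking vertex: last for OpenGL, first for Direct3D."""
    if api == "opengl":
        return triangle[-1]
    if api == "direct3d":
        return triangle[0]
    raise ValueError(f"unknown api: {api}")
```

For the stream V0 through V5 this gives the triangles (V0, V1, V2) and (V3, V4, V5). The OpenGL provoking vertices are V2 and V5, and the Direct3D ones are V0 and V3.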
`
`OpenGL and D3D order.
`
`
`3.11.2 Triangle Strip
`
The first edge in each triangle is a bold line. Note for OpenGL, only the last vertex (shown with a square box in the referenced figure) in each triangle progresses in a series (V2, V3, V4, etc.). For OpenGL, the last vertex is used as the provoking vertex.
`
Provoking Vertex - OpenGL

OpenGL order
`
`
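The first edge in each triangle is a bold line. Note for Direct3D, only the first vertex (shown in a circle in the referenced figure) in each triangle progresses in a series (V0, V1, V2, etc.). For Direct3D, the first vertex is used as the provoking vertex.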
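The two strip orderings can be sketched in Python as follows. This is an illustrative model of the reordering listed in the provoking vertex table (swap the first two vertices for OpenGL, swap the last two for Direct3D on every second triangle), not the VGT logic itself; the function name and the `convention` strings are hypothetical.

```python
def triangle_strip(vertices, convention):
    """Expand a triangle strip into triangles with consistent winding.

    Every second triangle is reordered so that the provoking vertex stays
    in its slot: "opengl" swaps the first two vertices (last vertex
    provokes), "d3d" swaps the last two (first vertex provokes).
    """
    if convention not in ("opengl", "d3d"):
        raise ValueError(f"unknown convention: {convention}")
    triangles = []
    for i in range(len(vertices) - 2):
        a, b, c = vertices[i:i + 3]
        if i % 2 == 1:
            if convention == "opengl":
                a, b = b, a  # swap first two, last vertex unchanged
            else:
                b, c = c, b  # swap last two, first vertex unchanged
        triangles.append((a, b, c))
    return triangles
```

With vertices V0 through V4, the OpenGL form ends its triangles in V2, V3, V4 and the Direct3D form starts its triangles with V0, V1, V2, matching the series described in the text.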
`
`
`
D3D order

3.11.3 Triangle Fan
`
The first edge in each triangle is a bold line. Note for OpenGL, the last vertex (shown with a square box in the referenced figure) is used as the provoking vertex.

The first edge in each triangle is a bold line. Note for Direct3D, the first vertex (shown in a circle in the referenced figure) is used as the provoking vertex.
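A hedged Python sketch of fan expansion is shown below. The provoking vertex table lists both an unrotated form (V0, V1, V2 then V0, V2, V3) and a form where the VGT rotates the first vertex to the last position (V1, V2, V0 then V2, V3, V0). Which API each form serves is not recoverable from the extracted table, so the sketch exposes the rotation as a flag instead of tying it to an API. The function and parameter names are hypothetical.

```python
def triangle_fan(vertices, rotate_first_to_last=False):
    """Expand a triangle fan around its hub vertex (the first vertex).

    When rotate_first_to_last is set, each triangle (hub, a, b) is emitted
    as (a, b, hub), which preserves the winding order.
    """
    if len(vertices) < 3:
        return []
    hub = vertices[0]
    triangles = []
    for i in range(1, len(vertices) - 1):
        tri = (hub, vertices[i], vertices[i + 1])
        if rotate_first_to_last:
            tri = tri[1:] + tri[:1]
        triangles.append(tri)
    return triangles
```

Rotation is a cyclic shift, so it changes which vertex sits in the first and last slots without flipping the facing of the triangle.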
`
OpenGL order
`
`
`D3D triangle fan order
`
`3.11.4 Quad List
`
The first edge in each quad is a bold line. Note for OpenGL, the last vertex (shown with a square box [
