R400 Top Level Specification
ver 0.2

Author: Steve Morein
ORIGINATE DATE: 11 March, 2001
EDIT DATE: 4 September, 2015
DOCUMENT-REV. NUM.: GEN-CXXXXX-REVA
Copy No:
Issue To:

Overview: This replaces the R400 architecture specification.

AUTOMATICALLY UPDATED FIELDS:
Document Location: Document
Current Intranet Search Title: R400 Top Level Spec

APPROVALS
Name/Dept          Signature/Date

Remarks:

THIS DOCUMENT CONTAINS INFORMATION THAT COULD BE
SUBSTANTIALLY DETRIMENTAL TO THE INTEREST OF ATI TECHNOLOGIES
INC. THROUGH UNAUTHORIZED USE OR DISCLOSURE.

“Copyright 2000, ATI Technologies Inc. All rights reserved. The material in this document constitutes an unpublished
work created in 2000. The use of this copyright notice is intended to provide notice that ATI owns a copyright in this
unpublished work. The copyright notice is not an admission that publication has occurred. This work contains
proprietary information and trade secrets of ATI. No part of this document may be used, reproduced, or
transmitted in any form or by any means without the prior written permission of ATI Technologies Inc.”
`
`
`Exhibit2044.D0C
`
`48154 Bytes*** © ATI BBReference Copyright Notice on Cover Page © ***goo445 12:48 PM ATI 2041
`
`LG v. ATI
`IPR2015-00325
`
`AMD1044_0017965
`
`ATI Ex. 2013
`IPR2023-00922
`Page 1 of 32
`
Table Of Contents

1. Features
1.1 AGP 8x
1.2 256 Bit Memory Interface
1.3 Unified Processing pipe
1.4 Front end scaling
1.5 Real-Time drawing command ability
1.6 3D Features
1.6.1 Noise Textures
1.6.2 Shadow buffers
1.6.3 Sort Independent Transparency
1.6.4 Anti-Aliasing
1.6.5 Texture compression
1.6.6 Z compression
1.6.7 Texture Filtering
1.6.8 Curved Surface Support
1.6.9 Displacement maps
1.7 High color depth
2. Performance
3. Schedule
4. Process
5. General Chip operation
5.1 Unified Shader
5.2 3D Rendering
5.3 Real Time Rendering
5.4 State Management
5.5 [illegible]
5.6 Display [illegible]
6. Block Diagram
7. Blocks
8. Block Descriptions
8.1 HBIU - host bus interface unit
8.1.1 Description
8.1.2 Major interfaces
8.1.3 Block diagram
8.2 CP - Control Processor
8.2.1 Description
8.2.2 Major interfaces
8.2.3 Block diagram
8.3 RBBM - register interface manager
8.3.1 Description
8.3.2 Major interfaces
8.3.3 Block diagram
8.3.4 RBBM operation
8.4 CLK - clock generator
8.4.1 Description
8.4.2 Major interfaces
8.4.3 Block diagram
8.5 TC - test control
8.5.1 Description
8.5.2 Major interfaces
8.5.3 Block diagram
8.6 VIP - Video input port
8.6.1 Description
8.6.2 Major interfaces
8.6.3 Block diagram
8.7 ROM - Boot ROM
8.7.1 Description
8.7.2 Major interfaces
8.7.3 Block diagram
8.8 [illegible]
8.8.1 Description
8.8.2 Major interfaces
8.8.3 Block diagram
8.9 DU - Display
8.9.1 Description
8.9.2 Major interfaces
8.9.3 Block diagram
8.10 MA - Memory [illegible]
8.10.1 Description
8.10.2 Major interfaces
8.10.3 Block diagram
8.11 HDP - Host Data Path
8.11.1 Description
8.11.2 Major interfaces
8.11.3 Block diagram
8.12 IDCT - Mpeg decoder
8.12.1 Description
8.12.2 Major interfaces
8.12.3 Block diagram
8.13 PA - Primitive Assembly
8.13.1 Description
8.13.2 Major interfaces
8.13.3 Block diagram
8.14 TD - Texture Decompression
8.14.1 Description
8.14.2 Major interfaces
8.14.3 Block diagram
8.15 RE - Raster Engine
8.15.1 Description
8.15.2 Major interfaces
8.15.3 Block diagram
8.16 SP - Shader Pipe
8.16.1 Description
8.16.2 Major interfaces
8.16.3 Block diagram
8.17 TP - Texture Pipe
8.17.1 Description
8.17.2 Major interfaces
8.17.3 Block diagram
8.18 RB - Render Backend
8.18.1 Description
8.18.2 Major interfaces
8.18.3 Block diagram
8.19 MC - Memory Controller
8.19.1 Description
8.19.2 Major interfaces
8.19.3 Block diagram
9. Common Foundations
9.1 Logic Design
9.1.1 Data formats
9.1.2 Register Bus
9.1.3 Block Communication protocol
9.2 Software
`
`Revision Changes:
`
`
`Rev 0.0 (Steve Morein)
`Date: March 11, 2001
`Initial revision.
`
`
`
Date: March 14, 2001

Document recreated from earlier documents.

Finally got back to editing it.
`
Introduction
`
The R400 will be the high end standalone graphics chip product when it is introduced.
It will be followed very rapidly by two variants:
The RV400, aimed at the volume PC space
The R450, aimed at a volume high end market.
The targets for the three chips are:
`
`
`
`
`
`
Part    Clock speed   pixels/clk   texture fetches/clk   alu ops/clk   Memory width   Memory speed   die size   Tapeout
R400    400 MHz       8            16                    32            256            400 MHz        11.5       July, 2002
RV400   500 MHz       4            8                     16            128            500 MHz        8.5        Nov 2002
R450    500 MHz       8            16                    32            256?           500 MHz        9.5        Feb 2003
`
`1. Features
`
`
`1.1 AGP 8x
`
The chip will support the 32 bit AGP interface at speeds up to 8x. I expect that we will need to support AGP 1x and 2x,
which require 3.3 Volt I/O (AGP 4x is 1.5 V and AGP 8x is 750 mV). AGP fast writes are supported for access to the
frame buffer.
`
`Open issue: 64 bit address space support.
`
1.2 256 Bit Memory Interface
The R400 and R450 support four memory channels, which can be 32 or 64 bits wide; the maximum memory bus
width is a total of 256 bits. The RV400 supports two memory channels and a maximum total width of 128 bits.
All channels need to be configured identically; 1, 2 or 4 channels can be configured.
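
As a rough illustration of the channel rules above, the sketch below computes the total bus width for a configuration and an estimated peak bandwidth. The function names are hypothetical, and the bandwidth formula assumes DDR signalling with two transfers per clock and treats the quoted memory speed as the clock rate; neither assumption is stated in this specification. The RV400 limit of two channels is not modeled.

```python
# Sketch only: channel rules come from the text, the bandwidth model is an assumption.
VALID_CHANNEL_COUNTS = (1, 2, 4)
VALID_CHANNEL_WIDTHS = (32, 64)  # bits


def total_bus_width(channels, channel_width_bits):
    """Total memory bus width in bits for identically configured channels."""
    if channels not in VALID_CHANNEL_COUNTS:
        raise ValueError("1, 2 or 4 channels can be configured")
    if channel_width_bits not in VALID_CHANNEL_WIDTHS:
        raise ValueError("channels are 32 or 64 bits wide")
    return channels * channel_width_bits


def peak_bandwidth_bytes_per_s(bus_width_bits, clock_hz, transfers_per_clock=2):
    """Assumed model: DDR moves two transfers per clock on every data pin."""
    return bus_width_bits / 8 * clock_hz * transfers_per_clock


width = total_bus_width(4, 64)
print(width, "bits,", peak_bandwidth_bytes_per_s(width, 400e6) / 1e9, "GB/s")
```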
`
`
`
`
`
`
`
Memory standards supported:

I/O        Voltage      Memory type       Speed
SSTL2.5    2.5          DDR               100 to 500 MHz
SSTL1.8    1.8          DDR/Infineon      300 to 500 MHz
Elpida     1.8 (1.5?)   Elpida            300 to 400 MHz
Infineon   1.2, 1.0V    Infineon e-dram   500 MHz
`
`No support for SSTL3.3, or SDRAM (LVTTL — 3.3V) is planned.
`
`1.3 Unified Processing pipe
The most ambitious feature in this design is the “truly unified pipe”: a single programmable pipeline is used for 2D,
video, 3D vertex, and 3D pixel operations. The unified pipeline does all of its calculations in 32 bit floating point, the
same as the existing vertex transform in previous chips, and the next step in the precision of the color/pixel
calculations, which have increased from 8 bits (R100), through 16 bits (R200), to the 20 bits in the R300.

There is an area cost to the unified pipeline since we are forced to go to 32 bit precision for color, when application
requirements may need less (22 to 24 bits). However, the unified pipeline results in a single math/register structure
compared to the separate structures in a more traditional design. It is hoped that by only needing to design the one
structure we can make the investment in design time and effort to really optimize the area.

Some of the benefits of merging the pipelines include allowing the vertex operations to do texture fetches (which we
could not afford to add logic to the transform pipe to do), a single programming model for both operations, more precision
on color than we would normally provide, and the ability to support significantly more registers and instructions in
pixel shaders.

One important benefit is load balancing. In the current pipeline, when the app is transform bound the pixel pipeline is
idle some significant portion of the time, and when the app is raster bound the transform hardware is idle. The unified
pipeline presented here dynamically allocates its processing power between transform and raster.
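
The load balancing argument can be illustrated with a toy model. This is only a sketch: the unit counts and workloads below are invented for illustration, and the model ignores scheduling overhead and memory limits.

```python
def frame_time_split(vertex_work, pixel_work, vertex_units, pixel_units):
    """Fixed partition: each kind of work can only use its own units,
    so the slower of the two stages sets the frame time."""
    return max(vertex_work / vertex_units, pixel_work / pixel_units)


def frame_time_unified(vertex_work, pixel_work, total_units):
    """Unified pool: any unit can run either kind of work."""
    return (vertex_work + pixel_work) / total_units


# Hypothetical workloads in arbitrary "unit-cycles".
for name, v, p in [("transform bound", 900, 100), ("raster bound", 100, 900)]:
    split = frame_time_split(v, p, vertex_units=4, pixel_units=12)
    unified = frame_time_unified(v, p, total_units=16)
    print(f"{name}: split={split:.1f} unified={unified:.1f}")
```

In this model the unified pool takes the same time for both workloads, while the fixed partition is only close to it when the workload happens to match the partition.
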
`1.4 Front end scaling
`We will remove the back end scaling capability from the display, and replace it with a non-scaling overlay. This will
`require us to be able to implement scaling using the unified pipeline. Key features that will need to be supported are
`large filter kernels, de-interlacing, frame rate conversion, and good support for YUV and color conversion.
`
`1.5 Real-Time drawing command ability
To allow for the emulation of backend scaling as well as to support new features, we need to be able to interrupt the 3D
pipe and execute high priority commands with low latency. The point of interruption is in the primitive
assembly; the maximum latency will be about the time it takes to render 4096 pixels. The real time commands are
inserted into the 3D pipeline after transform, clipping, and setup. Those functions need to be performed by the driver.
There are also limits on the number of constant registers available.
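
For a sense of scale, the 4096 pixel bound can be converted to time. The sketch below assumes the R400 targets from the introduction (8 pixels per clock at 400 MHz) and assumes the pipe runs at its peak rate, so it gives a best case for the worst-case wait rather than a guaranteed figure.

```python
PIXELS_PER_CLOCK = 8           # R400 target from the introduction
CLOCK_HZ = 400e6               # R400 target clock
MAX_PIXELS_BEFORE_SWITCH = 4096


def interrupt_latency(pixels, pixels_per_clock, clock_hz):
    """Return (clocks, seconds) needed to drain `pixels` at the peak fill rate."""
    clocks = pixels / pixels_per_clock
    return clocks, clocks / clock_hz


clocks, seconds = interrupt_latency(MAX_PIXELS_BEFORE_SWITCH, PIXELS_PER_CLOCK, CLOCK_HZ)
print(f"{clocks:.0f} clocks, {seconds * 1e6:.2f} microseconds")
```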
`
`1.6 3D Features
`
`There are a number of new 3D features we are considering for inclusion. Additional features may be added, and
`some of these may be cropped.
`
`1.6.1 Noise Textures
`
Perlin style noise is useful for a number of applications. It is generated on chip and consumes no external memory
bandwidth. It is also larger than any physical texture can be: 256x256x256 lattice points, and it still has detail when the
resolution is 4Kx4Kx4K. There is an opportunity to get this adopted as part of DX9.
`
`1.6.2 Shadow buffers
`
John Carmack is using shadow volumes to generate shadow effects in Doom 3. Shadow volumes are a very poor way
to use modern 3D pipelines (will add more detail here later). Shadow buffers have two key limitations: very high
resolutions are required to avoid aliasing, and traditional shadow buffers cannot be mip-mapped, so filtering is a real
problem. We are able to solve the first problem through a combination of our improved anti-aliasing Z compression,
and a new method of implementing the shadow map probe.
`
`1.6.3 Sort Independent Transparency
We are currently looking into how best to support sort independent transparency. The two plans are either the dual Z
buffer approach, or the approach described in <need to decide where the email should be placed so others can see>
`
`1.6.4 Anti-Aliasing
`The changes from the R300 include an increased number of samples per pixel, probably eight, and support for an
`allocated frame buffer size smaller than the worst case maximum.
`
`1.6.5 Texture compression
To further reduce bandwidth we need to improve texture compression. We need to achieve both better compression
than S3TC, and quality high enough that textures that would lose too much detail with S3TC can be
compressed. These two goals do not need to be achieved simultaneously on all textures. We also need to look at
compression of non-traditional surfaces such as normal maps. Advances here are dependent on the availability of
resources to work on this. If we are unable to find resources we will support the S3TC compression currently in D3D.
`
`1.6.6 Z compression
`<larry needs to give me a paragraph here>
`
`1.6.7 Texture Filtering
The texture pipes can fetch a 2x2 region from the texture map and filter it.
The data per pixel can be four eight bit values, two sixteen bit values, or one 32 bit value. All data needs to be
fixed point.
Linear filters are completely built in: it takes one cycle for bi-linear, two for tri-linear, and four for quadra-linear (filtered
mip-mapping of volume textures). Variable depth anisotropy is supported in hardware, with the texture pipe calculating the
number of samples needed. Optionally the pixel shader can calculate the number of samples and how to increment
the texture address, and provide this to the texture pipe.
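
The relation between the filter types and their cycle counts can be sketched as follows: each 2x2 bi-linear fetch is one unit of work, and tri-linear blends two of them. This is an illustrative model only. It uses Python floats rather than the fixed point data the hardware requires, reuses the same fractional position for both mip levels for brevity, and all function names are hypothetical.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t


def bilinear(texels, fx, fy):
    """Filter a 2x2 region ((t00, t10), (t01, t11)) at fractional position (fx, fy)."""
    (t00, t10), (t01, t11) = texels
    top = lerp(t00, t10, fx)
    bottom = lerp(t01, t11, fx)
    return lerp(top, bottom, fy)


def trilinear(texels_fine, texels_coarse, fx, fy, lod_frac):
    """Blend two bi-linear results from adjacent mip levels (simplified)."""
    return lerp(bilinear(texels_fine, fx, fy), bilinear(texels_coarse, fx, fy), lod_frac)


# Cycle counts quoted in the text: one 2x2 fetch per cycle.
FILTER_CYCLES = {"bi-linear": 1, "tri-linear": 2, "quadra-linear": 4}

print(bilinear(((0, 10), (20, 30)), 0.5, 0.5))
print(trilinear(((0, 10), (20, 30)), ((4, 4), (4, 4)), 0.5, 0.5, 0.25))
```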
`
`1.6.8 Curved Surface Support
We will support curved surfaces through a combination of vertex shader code and a tessellation engine to generate
new vertices.

The tessellation engine generates new vertex indices from an input vertex index array. The new indices contain both
the coordinate in parametric space of the vertex, and the indices to the surface, or to data from which the surface can
be derived. More information is available in the programming guide.
`
`1.6.9 Displacement maps
The tessellation engine for curved surfaces can dice triangles into micropolygons; the vertex shaders for the vertices
can then access a displacement map and change the location of the points.

1.7 High color depth
We will support a 64 bit color buffer (16:16:16:16) in two formats: sRGB64 and a floating point format.
<need to insert format details>
`
`2. Performance
`
The basic performance is:

Part     MHz   Fill rate        Bi-linear texture fetches   Peak tri/sec
R400     400   3.2 gigapixel    6.4 Billion                 400 Million
RV400    500   2.0              4.0                         800 Million
R450     500   4.0              8.0                         500 Million

Under normal conditions, and when not further limited by memory bandwidth, we expect to be > 75% efficient.
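
The fill rate and texture fetch figures follow from the per-clock targets in the introduction. The sketch below reproduces that arithmetic and applies the 75% efficiency estimate as a simple multiplier; treating efficiency as a flat scale factor is an assumption made here for illustration.

```python
# name: (clock in MHz, pixels per clock, texture fetches per clock), from the introduction
PARTS = {
    "R400": (400, 8, 16),
    "RV400": (500, 4, 8),
    "R450": (500, 8, 16),
}
EFFICIENCY = 0.75  # "> 75% efficient" taken as a lower bound


def peak_rates(clock_mhz, pixels_per_clk, fetches_per_clk):
    """Return (pixels per second, bi-linear fetches per second) at peak."""
    clock_hz = clock_mhz * 1e6
    return clock_hz * pixels_per_clk, clock_hz * fetches_per_clk


for name, (mhz, ppc, fpc) in PARTS.items():
    fill, fetch = peak_rates(mhz, ppc, fpc)
    print(f"{name}: fill {fill / 1e9:.1f} Gpixel/s, fetch {fetch / 1e9:.1f} G/s, "
          f"expected fill at least {fill * EFFICIENCY / 1e9:.1f} Gpixel/s")
```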
`
`3. Schedule
`
`
Part     Tapeout       Samples       Production
R400     July, 2002    Oct, 2002     Dec, 2002
RV400    Nov, 2002     Jan, 2003     March, 2003
R450     Jan, 2003     April 2003    May 2003
`
`
`
`
`
`
`
`
`
`
`4. Process
`
At the moment this looks like an easy choice: .13 will have been in production for over a year, and .10 does not show up until
the very end of 2002 according to the TSMC and UMC roadmaps.

We will probably want a flip chip packaging approach to meet power distribution goals. With the 256 bit bus
we will have at least 600 signal I/Os (404 in memory). We may be at as much as 10A at 1V for average power, which
will require very good power distribution; area bond flip chip is probably the only option.
`
`5. General Chip operation
`
`5.1 Unified Shader
`
The unified shader is a SIMD/vector engine that performs the same instructions on four sets of four (16 total)
elements. For pixel shader operations the elements are pixels, with the sets of four required to be 2x2 footprints. For
vertex shader operations the sixteen elements are sixteen vertices. The basic element is a 4 value vector, frequently
interpreted as x, y, z, w or r, g, b, a.
`
The user model for the unified shader is composed of a variable number of general purpose registers, a subset of
which are usually initialized with data. An ALU can do simple math, conditional moves, and permutations on the
registers, and can do a limited number of memory reads using the texture cache. The number of registers is
variable, and the number of registers required for an operation is specified when the task is submitted to the unified
shader. The unified shader will not start the task until there is enough free room for the task’s registers.
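
The rule that a task does not start until its registers fit can be modeled with a simple pool. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, the pool size is arbitrary, and tasks are assumed to start strictly in submission order, which the text does not specify.

```python
from collections import deque


class RegisterPool:
    """Toy model of register gated task start (in-order issue is an assumption)."""

    def __init__(self, total_registers):
        self.free = total_registers
        self.waiting = deque()   # FIFO of (task name, registers needed)
        self.running = {}        # task name -> registers held

    def submit(self, name, registers_needed):
        self.waiting.append((name, registers_needed))
        self._start_ready()

    def finish(self, name):
        # A finished task returns its registers, which may unblock waiting tasks.
        self.free += self.running.pop(name)
        self._start_ready()

    def _start_ready(self):
        # The task at the head of the queue blocks until enough registers are free.
        while self.waiting and self.waiting[0][1] <= self.free:
            name, needed = self.waiting.popleft()
            self.free -= needed
            self.running[name] = needed


pool = RegisterPool(64)          # hypothetical pool size
pool.submit("a", 32)
pool.submit("b", 24)
pool.submit("c", 16)             # only 8 registers free, so "c" waits
print(sorted(pool.running), pool.free)
pool.finish("a")                 # frees 32 registers, "c" can now start
print(sorted(pool.running), pool.free)
```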
`
`The unified shader is based on the R300 pixel shader.
`
`5.2 3D Rendering
`
For 3D rendering, data is passed twice through the unified shader: once to transform the vertices and a second time
to determine the color of the pixels.

The input to the 3D pipe is expected to be indexed vertex arrays. Linear vertex arrays can easily be supported by the
CP generating sequential indices. Inline vertex data is an open issue; I would prefer to write it to memory and then
fetch it as a vertex array rather than add a direct path.
`
The stream of indices is sent to the Primitive Assembly block by the CP. The front of the primitive assembly block
maintains the tag for the vertex cache; the vertex cache stores transformed vertices. As misses are detected in the
tag, the indices that miss are placed into 16 entry vectors. Each vector contains a state pointer, a pointer to the vertex
shader to be used, and the 16 indices to vertices that need to be transformed. When either a vector is filled with 16
entries or a state change happens (so that the next vertex does not share the state and vertex shader with the
previous vertex) the vector is issued to one of the “shader” pipelines for transformation. Which of the four shader
pipelines it is issued to is determined either by some effort of load balancing or by a simple round robin. All that is
submitted to the pixel pipeline is the state, the vertex program, and the indices. The shader pipeline will fetch the
vertex array data through the cache infrastructure that is also used for texture fetches. After the tag, the indices
(actually now the indices into the vertex cache) are placed into a latency FIFO to hide the latency of transforming the
vertices.
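
The batching of cache misses into 16 entry vectors can be sketched as below. Several details are assumptions made for illustration and are not taken from this specification: the tag is modeled as an unbounded set with no replacement, tags are keyed by (state, index), a partially filled vector is flushed at the end of the stream, and pipes are chosen by round robin. All names are hypothetical.

```python
VECTOR_SIZE = 16
NUM_PIPES = 4


def batch_misses(draws, cache_tags):
    """Group vertex cache misses into vectors of up to 16 indices.

    draws: iterable of (state_id, vertex_index) in submission order.
    cache_tags: set of (state_id, vertex_index) already in the vertex cache.
    Returns a list of (state_id, [indices]) vectors to issue to the shader pipes.
    """
    vectors = []
    current_state, pending = None, []
    for state, index in draws:
        if state != current_state and pending:
            # State change: issue the partial vector built under the old state.
            vectors.append((current_state, pending))
            pending = []
        current_state = state
        if (state, index) in cache_tags:
            continue  # hit: vertex already transformed (or already queued)
        cache_tags.add((state, index))
        pending.append(index)
        if len(pending) == VECTOR_SIZE:
            vectors.append((state, pending))
            pending = []
    if pending:
        vectors.append((current_state, pending))  # assumed end-of-stream flush
    return vectors


draws = [(0, i) for i in range(20)] + [(0, 3), (1, 0), (1, 1)]
vectors = batch_misses(draws, set())
for n, (state, indices) in enumerate(vectors):
    print(f"pipe {n % NUM_PIPES}: state {state}, {len(indices)} indices")
```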
`
The shader pipeline receives the vector of 16 indices from the primitive assembly block. The shader pipeline
operates, when rendering pixels, by processing a vector of four 2x2 pixel footprints, a total of 16 pixels. For vertex
processing each of the pixels is replaced with a vertex. The vertex program includes information on how many local
variables it will need. The rasterizer waits until that many local variables are free (as each executing thread in the
shader pipeline terminates it frees its local variables). With the proposed shader data path the maximum number of
local variables per vertex is 256. However, this leaves no ability to hide latency; 16 to 32 local variables will probably
maximize latency hiding and therefore performance. The vertex shader program can use all the capabilities of the
shader pipeline including texture fetches and dependent lookups. At the end of the vertex program, the transformed
coordinates must be output. One output will be the x, y, z, w position, which will be stored in the position cache of the
vertex cache. The vertex program may also output a number of parameter values (colors, texture coordinates, other
interpolated inputs into the pixel shader). The parameter values must be output as a multiple of four 128 bit words, as
the parameter cache is designed for this.
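
The trade-off between local variables per vertex and latency hiding can be made concrete with a small calculation. The sketch assumes that the 256 variable maximum equals the register capacity available to each of the 16 element slots, so that the number of vectors that can be in flight at once is that capacity divided by the per-vertex requirement. This interpretation is an assumption; the text only gives the maximum.

```python
REGISTERS_PER_ELEMENT = 256  # maximum local variables per vertex, from the text


def vectors_in_flight(locals_per_vertex):
    """Number of 16 element vectors that fit at once (assumed capacity model)."""
    if not 1 <= locals_per_vertex <= REGISTERS_PER_ELEMENT:
        raise ValueError("local variable count out of range")
    return REGISTERS_PER_ELEMENT // locals_per_vertex


for n in (16, 32, 64, 256):
    print(f"{n:3d} locals per vertex: {vectors_in_flight(n)} vectors in flight")
```

Under this model a program that uses all 256 variables runs alone, with nothing else available to cover its fetch latency.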
`
The primitive assembly block reads the indices back out of the latency FIFO and accesses the position cache portion
of the vertex cache. It assembles the vertices into primitives (lines, triangles, rectangles, quads?, points, ?).
Barycentric values are assigned to the vertices, and will be used later in the rasterizer to interpolate the parameters.
The parameters are not accessed by the primitive assembly logic, which only works from the position data. The
primitive is clipped against both the viewing volume and the user clip planes, with fractional barycentric coordinates
assigned to the clipped primitive sections. The primitive goes through the perspective divide and the viewport
transform. The resulting screen space primitive is set up (plane equations for 1/W, Z, and the barycentric coordinates).
The resulting primitive data, including the indices back into the parameter portion of the vertex cache, are broadcast to
the four pipes. The final time that an index is output that accesses the oldest vertex cache line, a token is also sent.
When all of the four pipelines return the token the primitive assembly block can free that cache line and allow it to be
used for a new vector of vertices. The performance goal in the primitive assembly block is a triangle every two clocks.
An alternative option is for the vertex shader to generate screen coordinates and clip codes. If a primitive needs to be
clipped, which cannot be determined until primitive assembly, then the vertices are reverse transformed back into clip
space by logic in the primitive assembly block, clipped, and then transformed back into screen space.
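
The use of barycentric values to interpolate parameters can be sketched with the standard edge function formulation. This is a generic screen-space illustration, not the plane equation setup described above: it omits the 1/W term needed for perspective correct results, and the function names are hypothetical.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area (times two) of the triangle a, b, p."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def barycentric(tri, px, py):
    """Barycentric weights of point (px, py) for triangle ((x0, y0), (x1, y1), (x2, y2))."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    w0 = edge(x1, y1, x2, y2, px, py) / area
    w1 = edge(x2, y2, x0, y0, px, py) / area
    w2 = edge(x0, y0, x1, y1, px, py) / area
    return w0, w1, w2


def interpolate(weights, vertex_values):
    """Blend one per-vertex parameter using the barycentric weights."""
    return sum(w * v for w, v in zip(weights, vertex_values))


tri = ((0, 0), (4, 0), (0, 4))
weights = barycentric(tri, 1, 1)
print(weights, interpolate(weights, (0, 8, 4)))
```

Because only the three weights are needed per pixel, the per-vertex parameters can stay in the parameter cache until the rasterizer requests them.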
`
To help meet marketing BS numbers we can look into doing backface culling at a rate of one triangle per clock. This
will boost us to a peak BS number of 500 million triangles per second.
`
Each pipe has a FIFO in front of the rasterizer to load balance. Each pipe will handle 16x16 tiles of the screen, which
are interleaved between the pipes. To maximize the effective size of the FIFO we will probably cull the triangle list
before the FIFO. The rasterizer will request the parameter data from the parameter cache for the primitives. A small
latency hiding FIFO will hide the latency of the access to the parameter cache. The parameter cache is 512 bits wide,
and the interfaces from the parameter cache to the rasterizer are 128 bits wide; this allows the parameter cache to
output one pipeline's request per clock, which is serialized over four clocks, keeping all four interfaces busy. The
`rasterizer keeps a small cache of three to four ver