Beyond3D - ATI Xenos: Xbox 360 Graphics Demystified
Published on 13th Jun 2005, written by Dave Baumann for Consumer Graphics - Last updated: 21st Mar 2007

Foreword

Those that have followed the development of 3D graphics over the past ten years or so will have seen a continual development of the capabilities of the processors, but fundamentally following the path of the OpenGL pipeline model. 3dfx really ignited the market with their "Voodoo Graphics" add-in boards, which were not much more than just a raster engine: it utilised one chip for texture sampling and another for pixel processing (a simple Render Output unit - ROP); 3dfx further evolved that by adding an extra texture unit, allowing for slightly more complex effects in the raster pipeline.

And so it was that this model was followed for a number of years, with the main developments being the number of pixel pipelines and textures supported per pipeline, until NVIDIA took the step of moving further forward on the OpenGL pipeline and giving accelerated support to the Transformation and Lighting process with GeForce 256. Whilst graphics processors had varying degrees of the geometry process, from clipping to setup, handled in hardware, adding a T&L engine was a significant step up the OpenGL pipeline, but didn't really fundamentally change our thinking of graphics processors.

At the same time as the graphics vendors started giving us T&L engines, the pixel processors gradually increased in flexibility as well, up until the point that "programmable shader architectures" were all anyone could talk about. The pixel pipelines became more flexible such that they had limited programmability, as did vertex processing, with vertex shaders operating in parallel with T&L engines. Nowadays the level of programmability of both vertex and pixel shaders has increased significantly, with vertex shaders enveloping the T&L processors entirely and pixel shaders consuming the texture processors. However, despite an increasingly important onus being placed on the arrangement and capabilities of the shader Arithmetic Logic Units (ALUs) in this programmable era, the designs of contemporary graphics processors still bear the fundamental similarities to their forebears: vertex processing up one end of the pipeline, pixel processing down the other, and still very much aligned with multiples of pixel pipelines.

Conceivably there is no reason why this development model couldn't continue to exist in the PC space, and it certainly seems like it will from all vendors for at least the next year. However, ATI have multiple design teams working on different architectures concurrently, so whilst their PC processors may follow a fairly familiar lineage, other parts of the company have been tackling this shader era with a completely fresh perspective in order to consider the needs of a "Programmable Graphics Processor" and extract as much of the potential of the ALUs as possible by trying to minimise the wasted cycles. In doing so they will force us to reconsider how we think of the overall pipeline and how we make initial performance assessments based upon "pipelines" alone.
`
Introduction

Ever since the announcement that ATI were working with Microsoft on "Future XBox technologies" the rumour mill has been working overtime as to the graphics behind it. Some of the messages since the announcement of the XBOX 360, the eventual console ATI's work will appear in, have not necessarily been reflective of the actual operation, and have even been a little contradictory coming from representatives directly from ATI. With strict NDAs and designs being built for two different competitive consoles, very tight controls on what could be talked about had to be implemented within ATI, and the XBOX group operated very much within their own silo; it wasn't until Microsoft lifted the NDAs that ATI could even speak of it on a wider internal basis, let alone externally, and even then there is a lot of information to gather.

Since XBOX 360's announcement and ATI's unleashing from the non-disclosure agreements we've had the chance to chat not just with Robert Feldstein, VP of Engineering, but also Joe Cox, Director of Engineering overseeing the XBOX graphics design team, and two lead architects of the graphics processor, Clay Taylor and Mark Fowler. Here we hope to accurately impart a slightly deeper understanding of the XBOX 360 graphics processor and how it sits within the system, to understand more about its operation, and to give some insights into the capabilities of the processor. Bear in mind that we are under NDA for some of the operational details of the graphics processor so that we could gain an understanding of how it differs from current platforms; however, some of the specifics won't be revealed in full detail in this article.

Throughout this article we'll attempt to piece together the operation of the graphics processor based on our conversations with ATI and some developers who have already had some knowledge of XBOX 360's capabilities; however, we'll also offer some opinions on certain elements. Sections typed in blue indicate Beyond3D's suppositions and have not been directly indicated to us by ATI.
`
Xbox 360 System Overview

The "XBOX 360" console was officially unveiled at a show on MTV the week prior to E3 2005, and at the unveiling Microsoft revealed a few technical details of the platform. The primary specifications for the system are:

* 3.2GHz Custom IBM Central Processor
  o Three CPU Cores
  o Two Threads Per Core
  o VMX Unit Per Core
  o 128 VMX Registers Per Thread
  o 1MB L2 Cache (Lockable by Graphics Processor)
* 500MHz Custom ATI Graphics Processor
  o Unified Shader Core
  o 48 ALUs for Vertex or Pixel Shader processing
  o 16 Filtered & 16 Unfiltered Texture samples per clock
  o 10MB eDRAM Framebuffer
* 512MB System RAM
  o Unified Memory Architecture (UMA)
  o 128-bit interface
  o 700MHz GDDR3 RAM

Of these core components, obviously we are going to be most concerned with the graphics processing element. Whilst the graphics processor is different from others seen before in the PC space, and is very different from even ATI's impending new PC graphics components, it will be interesting to take a look at it for the very reason that it doesn't directly correspond to any current graphics processor, but also because we feel that this will give hints as to the architectural direction ATI are likely to be taking in the future for PC and other applications.
`
ATI C1 / Xenos

A name that has long since been mentioned in relation to the graphics behind Xenon (the development name for XBOX 360) is R500. Although this name has appeared from various sources, the actual development name ATI uses for Xenon's graphics is "C1", whilst the more "PR friendly" codename that has surfaced is "Xenos". ATI are probably fairly keen not to use the R500 name as this draws parallels with their upcoming series of PC graphics processors starting with R520; however, R520 and Xenos are very distinct parts. R520 is obviously designed to meet the needs of the PC space and to have Shader Model 3.0 capabilities, as this is currently the highest DirectX API specification available on the PC, and as such these new parts still have their lineage derived from the R300 core, with discrete Vertex and Pixel Shaders; Xenos, on the other hand, is a custom design specifically built to address the needs and unique characteristics of the game console. ATI had a clean slate on which to design and no specified API to target. These factors have led to the Unified Shader design, something which ATI have prototyped and tested prior to its eventual implementation (with the rumoured R400 development?), with capabilities that don't fall within any corresponding API specification. Whilst ostensibly Xenos has been hailed as a Shader Model 3.0 part, its capabilities don't fall directly in line with it and exceed it in some areas, giving this more than a whiff of WGF2.0 (Windows Graphics Foundation 2.0 - the new name for DirectX Next / DirectX 10) about it.

The Xenos graphics processor is not a single element, but actually consists of two distinct elements: the graphics core (shader core) and the eDRAM module. The shader core is a 90nm chip manufactured by TSMC and is currently slated to run at 500MHz*, whilst the eDRAM module is another 90nm chip, manufactured by NEC, and runs at 500MHz* as well. These two chips sit side by side on a single package, ensuring a fast interlink between the two. The main graphics chip, the parent core, could be considered as a "shader core" as this is one of its primary tasks. The eDRAM module is a separate, daughter chip which contains the elements for reading and writing colour, Z and stencil, and performing all of the alpha blending and Z and stencil ops, including the FSAA logic. We'll explore the capabilities and operations of both these chips in greater detail throughout the article.

(*) Note: We understand the clock speeds for the shader core and daughter die are target clock speeds at present and there may be some room for small movement either way on both dies, dependent on yields. As Microsoft have now announced 500MHz speeds it is more likely that these will be the eventual release speeds.
`
One element that has been reported on is the figure of 150M transistors in relation to the graphics processing elements of Xenon; however, according to ATI this is not correct, as the shader core itself comprises in the order of 232M transistors. It may be that the 150M transistor figure pertains only to the eDRAM module: with 10MB of DRAM, requiring one transistor per bit, 80M transistors will be dedicated to just the memory; when we add the memory control logic, Render Output Controllers (ROPs) and FSAA logic on top of that, it may be conceivable to see an extra 70M transistors of logic in the eDRAM module.
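As a rough check on that arithmetic, the Python sketch below reproduces the 80M and 70M figures. The one-transistor-per-bit cell and the 150M total come from the text above; the function name and the choice between decimal and binary megabytes are our own illustrative assumptions, not anything ATI has stated.

```python
# Rough transistor budget for the eDRAM daughter die, following the reasoning above.
EDRAM_MEGABYTES = 10        # from the published specification list
TRANSISTORS_PER_BIT = 1     # one-transistor DRAM cell, as the text assumes
REPORTED_TOTAL_M = 150      # the widely reported figure, in millions


def edram_cell_transistors_m(megabytes, decimal_mb=True):
    """Millions of transistors needed for the memory cells alone.

    decimal_mb=True treats 1MB as 1,000,000 bytes (which gives the round 80M);
    decimal_mb=False treats it as 1,048,576 bytes. Both are assumptions.
    """
    bytes_per_mb = 1_000_000 if decimal_mb else 1024 * 1024
    bits = megabytes * bytes_per_mb * 8
    return bits * TRANSISTORS_PER_BIT / 1e6


memory_m = edram_cell_transistors_m(EDRAM_MEGABYTES)
memory_binary_m = edram_cell_transistors_m(EDRAM_MEGABYTES, decimal_mb=False)
logic_m = REPORTED_TOTAL_M - memory_m

print(f"memory cells: {memory_m:.1f}M (or {memory_binary_m:.1f}M with binary megabytes)")
print(f"left over for ROPs, FSAA and control logic: {logic_m:.1f}M")
```

With binary megabytes the memory figure rises slightly, so the 70M of logic should be read as an order-of-magnitude estimate only.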
`
Update: We've recently been given an image of the Xenos graphics chip package (above) that highlights the dual die nature, with the parent die quite clearly to the centre of the package and the daughter over to the left. While the 232M transistor figure for the parent was given to us by ATI, we are still trying to establish a more official figure for the daughter (even though these types of transistor counts are very much estimates anyway). We've speculated that the 150M figure that appeared when XBOX 360 was first announced may just relate to the daughter die; however, another figure that has arisen is 100M. Judging from the die sizes, the daughter die doesn't have more than half the area of the parent, which would point towards the 100M figure, although 80M of those transistors are DRAM, which may be more dense than the logic circuitry that will dominate the parent die. We are trying to get further clarification.

One of the mistakes that Microsoft made with the original XBox was to contract their component providers into supplying entire chips with, evidently, no development path - at least, this was the case with NVIDIA's NV2A graphics processor, which resulted in Microsoft and NVIDIA going through a legal arbitration process. Although the components in the XBOX 360 in its initial form are hardly low cost, the cost of the unit over the course of its lifetime is one that has quite obviously been addressed, with contracts that pay via royalties for chips sold and with Microsoft in charge of ordering the chips from the various fabs. However, the original semiconductor manufacturers are likely to still be in charge of further developments in terms of putting the cores on to smaller processes, and we believe that this is part of the contract that ATI has with Microsoft. An obvious area for cost reduction of the Xenos processor is merging the shader and daughter die on to a single core - we suspect that this will not happen until there is a process shrink available (that can also cater for both the complex logic and eDRAM), as two cores on 90nm mitigate some of the yield risks of a single, large die on 90nm.
`
Bandwidths and Interconnects

When creating a high performance computing platform, bandwidth between components and operations is highly important, especially when creating a system that has to last for 3-5 years before a new version comes about, such is the world of consoles. With the Xenos processor being both the high performance graphics processing element of the XBOX 360 and the "Northbridge" component of the system, which is essentially the communication hub for the other components of the system, it has many interconnects and bandwidths to deal with. Below is a diagram highlighting the connection bandwidths between Xenos and the most important elements it is connected to:
`
[Diagram: Xenos connection bandwidths; only the "Southbridge" label is recoverable]
`
As we discussed earlier, the XBOX 360 carries a unified memory architecture and Xenos's parent die is acting as the Northbridge controller as well as the graphics processing device. The system memory bandwidth is 22.4GB/s, courtesy of the 128-bit GDDR3 memory interface running at 700MHz. At 232M transistors the Xenos parent die isn't an enormous chip, so internal memory communication isn't going to be too latency bound; hence the memory interface only needs to be a standard crossbar, which is partitioned into two 64-bit blocks. Xenos's parent die also has a 32GB/s connection to the daughter eDRAM die. Connection to the Southbridge audio and I/O controller is achieved via two PCI Express lanes, which results in 500MB/s of both upstream and downstream bandwidth.
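The 22.4GB/s and 500MB/s figures follow from simple arithmetic, sketched below in Python. The bus width, clock and lane count are taken from the text; the two transfers per clock (GDDR3 is a double data rate memory) and the 250MB/s per lane per direction of first generation PCI Express are standard figures we are assuming apply here, and the function names are purely illustrative.

```python
def ddr_bandwidth_gb_s(bus_width_bits, clock_mhz, transfers_per_clock=2):
    """Peak bandwidth in GB/s (decimal) for a double data rate interface."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


def pcie_bandwidth_mb_s(lanes, mb_s_per_lane=250):
    """Peak bandwidth in MB/s per direction; 250MB/s per lane is assumed."""
    return lanes * mb_s_per_lane


system_memory = ddr_bandwidth_gb_s(128, 700)       # the full 128-bit interface
per_crossbar_block = ddr_bandwidth_gb_s(64, 700)   # each of the two 64-bit blocks
southbridge = pcie_bandwidth_mb_s(2)               # two lanes, each direction

print(f"system memory: {system_memory:.1f} GB/s")
print(f"per 64-bit block: {per_crossbar_block:.1f} GB/s")
print(f"Southbridge link: {southbridge} MB/s each way")
```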
`
As the CPU is going to be using Xenos to handle all its memory transfers, the connection between the two has 10.8GB/s of bandwidth both upstream and downstream simultaneously. Additionally, the Xenos graphics processor is able to directly lock the cache of the CPU in order to retrieve data directly from it without it having to go to system memory beforehand. The purpose of this is that one (or more, if wanted) of the three CPU cores could be generating very high levels of geometry that the developer doesn't want to, or can't, preserve in the memory footprint available on the system when in use. High-resolution dynamic geometry such as grass, leaves, hair, particles, water droplets and explosion effects are all examples of one type of scenario that the cache locking may be used in.
`
[Diagram label: Xenos Daughter Die]
`
The one key area of bandwidth that has caused a fair quantity of controversy in its inclusion in the specifications is that of the bandwidth available from the ROPs to the eDRAM, which stands at 256GB/s. The eDRAM is always going to be the primary location for any of the bandwidth intensive frame buffer operations, and so it is specifically designed to remove the frame buffer memory bandwidth bottleneck. Additionally, Z and colour access patterns tend not to be particularly optimal for traditional DRAM controllers, where there are frequent read/write penalties, so placing all of these operations in the eDRAM daughter die, aside from the system calls, leaves the system memory bus free for texture and vertex data fetches, which are both read only and are therefore highly efficient. Of course, 10MB of frame buffer space isn't sufficient to fit the entire frame buffer with 4x FSAA enabled at High Definition resolutions, and we'll cover how this is handled later in the article.
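To see why 10MB is not enough, the Python sketch below estimates the size of a Multi-Sampled frame buffer. It assumes a 1280x720 target and 4 bytes each of colour and Z/stencil per sample; neither figure is given in the text above, so treat both as illustrative. The count of "eDRAM-sized pieces" is simply a ceiling division and is not a description of how Xenos actually splits the frame.

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024  # 10MB of eDRAM, taken here as binary megabytes


def framebuffer_bytes(width, height, samples, colour_bytes=4, z_bytes=4):
    """Bytes of colour plus Z/stencil storage for a multi-sampled buffer."""
    return width * height * samples * (colour_bytes + z_bytes)


def edram_sized_pieces(width, height, samples):
    """How many eDRAM-sized pieces the buffer would have to be cut into."""
    return math.ceil(framebuffer_bytes(width, height, samples) / EDRAM_BYTES)


for samples in (1, 2, 4):
    size = framebuffer_bytes(1280, 720, samples)
    print(f"{samples}x: {size / 2**20:.1f} MiB, "
          f"fits in eDRAM: {size <= EDRAM_BYTES}, "
          f"pieces needed: {edram_sized_pieces(1280, 720, samples)}")
```

Under these assumptions only the buffer without FSAA fits in one go; the 4x case is close to three times the capacity of the eDRAM.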
`
Both XBOX 360 and Playstation 3 feature UMA and graphics busses, respectively, that have been announced to use fairly fast 700MHz GDDR3 memory, but both only have a 128-bit interface. This is less of a surprise for XBOX 360, as Xenos's use of eDRAM will move the vast majority of the frame buffer bandwidth to the eDRAM interface, leaving the system memory bandwidth available primarily for texturing. It does still seem odd, though, given that by the time the consoles are released the likelihood is that high end PC graphics will be using at least the same speed RAM but on double-width busses. The primary issue here is, again, one of cost: the lifetime of a console will be much greater than that of PC graphics, and process shrinks are used to reduce the costs of the internal components. 256-bit busses may actually prevent process shrinks beyond a certain level, as with the number of pins required to support busses of this width the chip could quickly become pad limited as the die size is reduced. 128-bit busses result in far fewer pins than 256-bit busses, thus allowing the chip to shrink to smaller die sizes before becoming pad limited - by this point it is also likely that Xenos's daughter die will have been integrated into the shader core, further reducing the number of pins that are required.
`
Pixel and eDRAM Operation

Despite references to 192 processing elements in the ROPs within the eDRAM, we can actually resolve that to equating to 8 pixel writes per cycle, as well as having the capability to double the Z rate when there are no colour operations. However, as the ROPs have been targeted to provide 4x Multi-Sampling FSAA at no penalty, this equates to a total capability of 32 colour samples or 64 Z and stencil operations per cycle.
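The per-cycle figures above can be turned into per-second rates with a few lines of Python. The 8 pixels, 4 samples and doubled Z rate come from the text; the 500MHz clock is the target speed quoted earlier, and the resulting rates are our own derived peak numbers rather than figures supplied by ATI.

```python
PIXELS_PER_CLOCK = 8     # pixel writes per cycle
SAMPLES_PER_PIXEL = 4    # 4x Multi-Sampling at no penalty
Z_ONLY_MULTIPLIER = 2    # Z rate doubles when there are no colour operations
CLOCK_MHZ = 500          # target clock of the daughter die

colour_samples_per_clock = PIXELS_PER_CLOCK * SAMPLES_PER_PIXEL
z_samples_per_clock = colour_samples_per_clock * Z_ONLY_MULTIPLIER


def rate_giga_per_second(per_clock, clock_mhz=CLOCK_MHZ):
    """Convert a per-clock count into billions of operations per second."""
    return per_clock * clock_mhz / 1000


pixel_rate = rate_giga_per_second(PIXELS_PER_CLOCK)
colour_sample_rate = rate_giga_per_second(colour_samples_per_clock)
z_sample_rate = rate_giga_per_second(z_samples_per_clock)

print(f"{colour_samples_per_clock} colour samples or {z_samples_per_clock} Z samples per clock")
print(f"peak: {pixel_rate} Gpixels/s, {colour_sample_rate} Gsamples/s colour, "
      f"{z_sample_rate} Gsamples/s Z-only")
```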
`
Most PC graphics processors have to balance their output with the available bandwidth, and as such their ROP units usually only cater for 2 Multi-Samples per pixel in a single cycle, and the Z output doesn't double with the number of Multi-Samples being produced either. Z and colour compression techniques are also employed in order to get close to the output capabilities with the bandwidth available. ATI's calculations lead to a colour and Z bandwidth demand of around 26-134GB/s at 8 pixels per clock with 4x Multi-Sampling AA enabled at High Definition TV resolutions. The lower end of that bandwidth figure is derived from having 4:1 colour and Z compression; however, the lossless compression techniques are only optimal when there are no triangle edges intersecting a pixel, and with the presumed high geometry detail within next generation console titles the opportunities for achieving this compression ratio across the entire frame will be reduced. So, with 256GB/s of bandwidth available in the eDRAM frame buffer there should always be sufficient bandwidth for achieving 8 pixels per clock with 4x Multi-Sampling FSAA enabled, and as such this also means that Xenos does not need any lossless compression routines for Z or colour when writing to the eDRAM frame buffer.
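One way to arrive at a figure of 256GB/s is sketched below in Python. It assumes that every sample carries 4 bytes of colour and 4 bytes of Z/stencil, and that each is both read and written every clock; these per-sample sizes and the read-plus-write factor are our assumptions and are not confirmed by the text, so the match with the quoted number should be taken as suggestive only.

```python
def edram_peak_gb_s(pixels_per_clock=8, samples_per_pixel=4,
                    colour_bytes=4, z_bytes=4,
                    accesses_per_sample=2,  # assumed: one read plus one write
                    clock_mhz=500):
    """Peak ROP-to-eDRAM traffic in GB/s under the stated assumptions."""
    bytes_per_clock = (pixels_per_clock * samples_per_pixel
                       * (colour_bytes + z_bytes) * accesses_per_sample)
    return bytes_per_clock * clock_mhz * 1e6 / 1e9


ATI_DEMAND_GB_S = (26, 134)  # the range of demand quoted in the text

peak = edram_peak_gb_s()
headroom = peak / ATI_DEMAND_GB_S[1]

print(f"peak eDRAM bandwidth: {peak:.0f} GB/s")
print(f"headroom over the uncompressed worst case: {headroom:.2f}x")
```

Whatever the exact derivation, the point of the comparison is that the available bandwidth sits comfortably above even the upper end of the demand range, which is why no compression is needed on the eDRAM side.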
`
So, as far as the operation is concerned, once pixel data has come through the shader array and is ready to be processed into colour values in memory, the Z data of the pixel is matched with the correct colour data coming out of the shaders. Xenos supports an "Alpha to Mask" feature, which allows for the use of Multi-Sampling for sort-independent translucency. All of this processing is performed on the parent die, and the pixels are then transferred to the daughter die in the form of source colour per pixel and lossless compressed Z per 2x2 pixel quad. The interconnect bandwidth between the parent and daughter die is only an eighth of the eDRAM bandwidth because the source colour data value is common to all samples of a pixel here, and the Z is compressed. Once on the daughter die the pixels are unpacked to their Multi-Sample level and each sample is driven through its Z and alpha computations, and the final data is stored in the eDRAM until either the entire frame or the current tile (we'll cover this in more detail later) being rendered is finished.
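The general idea behind an alpha to mask scheme can be illustrated with a short Python sketch: the alpha value decides how many of a pixel's samples the surface covers, and the later blend down of the samples produces the translucency. The rounding rule, the order in which samples are switched on and the single-channel colours are all our own simplifications; we do not know the mapping Xenos actually uses.

```python
SAMPLES_PER_PIXEL = 4


def alpha_to_mask(alpha, samples=SAMPLES_PER_PIXEL):
    """Turn an alpha value in [0, 1] into a per-sample coverage mask.

    Assumed rule: round alpha * samples to the nearest whole number of
    covered samples and switch on the first that many samples.
    """
    alpha = min(max(alpha, 0.0), 1.0)
    covered = int(alpha * samples + 0.5)
    return tuple(index < covered for index in range(samples))


def shade_pixel(background, surface, alpha):
    """Write the surface to the masked samples, then average the samples."""
    mask = alpha_to_mask(alpha)
    samples = [surface if hit else background for hit in mask]
    return sum(samples) / len(samples)


# A half-transparent white surface (1.0) over a black background (0.0).
print(alpha_to_mask(0.5))          # two of the four samples are covered
print(shade_pixel(0.0, 1.0, 0.5))  # the averaged result is mid grey
```

Because each sample is either written or not, no blending with the existing contents is needed when the surface is drawn, which is what removes the need to sort translucent surfaces. The price is that only a handful of translucency levels are available per pixel.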
`
When the frame or tile has finished rendering, the colour data will then be resolved on the daughter die, with the Multi-Samples being blended down to their pixel level. The resolved buffer information is then passed back from the daughter die to the parent, which then outputs to system RAM such that, when all the tiles are finished, this can then be output to the display device. Although the resolved colour data has to be stored in system RAM, which uses some bandwidth during the transfer, the efficiency of the write as the resolved data comes out of the daughter die to be written to system RAM is very high. This high efficiency is due to the fact that it is dealing with a significant quantity of non-fragmented data and the bus isn't as busy with lots of other bandwidth consuming, high frequency and inefficient frame buffer read / write / modify operations for the back buffer. This helps in alleviating the fact that the parent die is also handling system memory requests. Also note that data can be written to the eDRAM at the same time as it is being cleared of the previous data that resided there, meaning there should be little to no wait when removing the previous data from the eDRAM. (We've heard comments from developers familiar with both designs that this element of Xenos bears similarities to the "Flipper" design for Nintendo's Gamecube, a part that was originally designed by ArtX, who of course were subsequently purchased by ATI; however, ATI are keen to point out that while there may be apparent similarities the designs are entirely independent, as there are distinct virtual and physical barriers between the groups working on the various console developments, past and present, and no members of the Flipper architecture team were involved in Xenos's development.)
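The resolve step described above, in which the Multi-Samples are blended down to pixels before leaving the daughter die, can be sketched in Python as a plain average of each pixel's samples. The equal weighting and the flat list layout are assumptions made for illustration; the text only says that the samples are "blended down".

```python
def resolve_tile(samples, samples_per_pixel=4):
    """Average consecutive groups of (r, g, b) samples into one pixel each.

    `samples` is a flat list in which every `samples_per_pixel` consecutive
    entries belong to the same pixel (an assumed layout).
    """
    if len(samples) % samples_per_pixel:
        raise ValueError("sample count must be a multiple of samples_per_pixel")
    pixels = []
    for start in range(0, len(samples), samples_per_pixel):
        group = samples[start:start + samples_per_pixel]
        pixels.append(tuple(sum(sample[channel] for sample in group) / samples_per_pixel
                            for channel in range(3)))
    return pixels


# One pixel sitting on a triangle edge: two red samples and two blue samples.
edge_pixel = [(1.0, 0.0, 0.0)] * 2 + [(0.0, 0.0, 1.0)] * 2
# One pixel fully inside a green triangle.
inner_pixel = [(0.0, 1.0, 0.0)] * 4

resolved = resolve_tile(edge_pixel + inner_pixel)
print(resolved)
```

Note that eight samples go in and only two pixels come out, which is part of why the write back to system RAM costs so much less bandwidth than the sample level traffic inside the eDRAM.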
`
As all the sampling units for frame buffer operations are multiplied to work optimally with 4x FSAA, this is actually the maximum mode available. Although the developer can choose to use 2x or no FSAA, there are no FSAA levels available higher than 4x. The sampling pattern is not programmable but fixed, although it does use a sample pattern that doesn't have any of the sample points intersecting one another on either the vertical or horizontal axis. Although we don't know the exact sample pattern shape, we suspect it will be similar to that seen on other sparse sampled / jittered / rotated grid FSAA mechanisms we've seen over the past few years, such as this.
`
The ROPs can handle several different formats, including a special FP10 mode. FP10 is a floating point precision mode in the format of 10-10-10-2 (bits for Red, Green, Blue, Alpha). The 10 bit colour storage has a 3 bit exponent and 7 bit mantissa, with an available range of -32.0 to 32.0. Whilst this mode does have some limitations, it can offer HDR effects at the same cost in performance and size as standard 32-bit (8-8-8-8) integer formats, which will probably result in this format being used quite frequently on XBOX 360 titles. Other formats such as INT16 and FP16 are also available, but they obviously have space implications. Like the resolution of the MSAA samples, there is a conversion step to change the front buffer format to a displayable 8-8-8-8 format when moving the completed frame buffer portion from the eDRAM memory out to system RAM.
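The reason FP10 costs no more than an 8-8-8-8 format is simply that 10+10+10+2 bits also add up to 32. The Python sketch below packs and unpacks such a word. The placement of red in the lowest bits and of the exponent in the top 3 bits of each field are assumptions made for illustration, and the sketch deliberately stops short of converting a field to a floating point value, since the text does not give the bias or sign convention.

```python
FIELD_BITS = (10, 10, 10, 2)  # red, green, blue, alpha


def pack_10_10_10_2(red, green, blue, alpha):
    """Pack four raw bit fields into one 32-bit word (assumed bit order)."""
    for value, bits in zip((red, green, blue, alpha), FIELD_BITS):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return red | (green << 10) | (blue << 20) | (alpha << 30)


def unpack_10_10_10_2(word):
    """Recover the four raw bit fields from a packed 32-bit word."""
    return (word & 0x3FF, (word >> 10) & 0x3FF, (word >> 20) & 0x3FF, (word >> 30) & 0x3)


def split_colour_field(field):
    """Split a 10-bit colour field into (3-bit exponent, 7-bit mantissa)."""
    return field >> 7, field & 0x7F


# Bits per pixel for the formats mentioned in the text, assuming four channels.
BITS_PER_PIXEL = {
    "FP10 (10-10-10-2)": sum(FIELD_BITS),
    "8-8-8-8 integer": 4 * 8,
    "INT16": 4 * 16,
    "FP16": 4 * 16,
}

word = pack_10_10_10_2(1023, 0, 512, 3)
print(hex(word), unpack_10_10_10_2(word))
print(BITS_PER_PIXEL)
```

The table at the end shows the "space implications" mentioned above: the 16 bit per channel formats take twice the eDRAM per sample.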
`
The ROPs are fully orthogonal so Multi-Sampling can operate with all pixel formats supported.

Render to texture operations will also be rendered out to the eDRAM first and then read out to UMA memory, when complete, in order to be used as a texture surface for the final frame rendering. Render to texture operations can also have Multi-Sample FSAA applied, and the result can either be resolved on the way out to system memory or kept at the high resolution Multi-Sample level. As with standard pixel operations, the eDRAM memory can be written to with either another render to texture operation or pixel data whilst the data from the previous render to texture is being pushed out to UMA memory.
`
`ATI Xenos: Xbox 360 Graphics Demystified - Page 5
`Published on 13th Jun 2005, written by Dave Baumann for Consumer Graphics - Last updated: 21st Mar
`2007
`
`Z-Only Rendering Pass
`
`Some games these days make use of graphics chips' ability to fast reject workload
`based on Z information. Engines such as Doom 3 or Source have the capability to, on
`each frame, run a geometry only pass for the purpose of pre-filling the Z
`buffer with the final Z depths of that frame. When the full frame is ready to be
`rendered, pixel information that has a higher Z depth than the information in the Z
`buffer is rejected before any pixel operations are carried out on it, meaning that
`no pixel writes are wasted due to overdraw. This Z-only prepass is expected
`to be commonly used on Xenos as it has additional advantages for tiling, explained
`later.
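
The effect of the prepass described above can be sketched in software. This is a toy model, not engine or hardware code: fragments are hypothetical `(pixel_index, depth)` pairs, smaller depth means closer to the viewer, and the function names are invented for the example.

```python
import math


def render_naive(fragments, num_pixels):
    """Single pass with an ordinary depth test; counts shaded fragments."""
    zbuf = [math.inf] * num_pixels
    shaded = 0
    for pixel, depth in fragments:
        if depth < zbuf[pixel]:
            zbuf[pixel] = depth
            shaded += 1  # shaded now, possibly overwritten later (overdraw)
    return shaded


def render_with_prepass(fragments, num_pixels):
    """Z-only pass first, then shade only fragments at the final depth."""
    zbuf = [math.inf] * num_pixels
    # Pass 1: geometry only, depth writes, no pixel shading at all.
    for pixel, depth in fragments:
        if depth < zbuf[pixel]:
            zbuf[pixel] = depth
    # Pass 2: anything deeper than the stored Z is rejected before shading.
    shaded = 0
    for pixel, depth in fragments:
        if depth <= zbuf[pixel]:
            shaded += 1
    return shaded


if __name__ == "__main__":
    # Two pixels; pixel 0 is covered back-to-front, the worst case for overdraw.
    frags = [(0, 0.9), (0, 0.5), (0, 0.1), (1, 0.3), (1, 0.7)]
    print(render_naive(frags, 2), render_with_prepass(frags, 2))
```

In this toy scene the naive pass shades four fragments while the prepass version shades two, at the price of submitting the geometry twice.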
`
`A geometry pass to populate Z information is going to gain from a processor that has
`double the Z compare / write units in relation to its pure pixel fill-rate, which Xenos
`does. However, another factor is that this pass is actually going to require geometry
`processing over the vertex shaders. In a traditional shader capable graphics processor
`the number of vertex units can often be many times less than the pixel shader ALUs,
`however in the case of Xenos all of the shader units will be tasked purely with the
`geometry processing, which should also ensure a fast operation of this early Z pass.
`
`As with ATI's current desktop parts, Xenos features a Hierarchical Z buffer. Hierarchical
`Z buffers contain "coarser" Z information than the full resolution Z buffer - usually
`Hierar
