`
https://www.anandtech.com/print/1719/

AnandTech

Original Link: https://www.anandtech.com/show/1719
`
Microsoft's Xbox 360, Sony's PS3 - A Hardware Discussion
`
`
`by Anand Lal Shimpi & Derek Wilson on June 24, 2005 4:05 AM EST
Posted in GPUs
`
The point of a gaming console is to play games. The PC user in all of us wants to benchmark, overclock and
upgrade even the unreleased game consoles that were announced at E3, but we can’t. And these sorts of
limits are healthy, because they let us have a system that we don’t tinker with, that simply performs its
function, and that is to play games.
`
The game developers are the ones that have to worry about which system is faster, whose hardware is better
and what that means for the games they develop, but to us, the end users, whether the Xbox 360 has a faster
GPU or the PlayStation 3’s CPU is the best thing since sliced bread doesn’t really matter. At the end of the
day, it is the games and the overall experience that will sell both of these consoles. You can have the best
hardware in the world, but if the games and the experience aren’t there, it doesn't really matter.
`
Despite what we've just said, there is a desire to pick these new next-generation consoles apart. Of course if
the games are all that matter, why even bother comparing specs, claims or anything about these
next-generation consoles other than games? Unfortunately, the majority of that analysis seems to be done by
the manufacturers of the consoles, and fed to the users in an attempt to win early support, and quite a bit
of it is obviously tainted.
`
While we would've liked this to be an article on all three next-generation consoles, the Xbox 360, PlayStation
3 and Revolution, the fact of the matter is that Nintendo has not released any hardware details about their
next-gen console, meaning that there’s nothing to talk about at this point in time. That leaves us with two
contenders: Microsoft's Xbox 360, due out by the end of this year, and Sony’s PlayStation 3, due out in Spring
2006.
`
This article isn’t here to crown a winner or to even begin to claim which platform will have better games; it is
simply here to answer questions we all have had as well as discuss these new platforms in greater detail than
we have before.
`
Before proceeding with this article, there’s a bit of required reading to really get the most out of it. We
strongly suggest reading through our Cell processor article, as well as our launch coverage of the PlayStation
3. We would also suggest reading through our Xbox 360 articles for background on Microsoft's console, as
well as an earlier piece published on multi-threaded game development. Finally, be sure that you're fully up
to date on the latest GPUs, especially the recently announced NVIDIA GeForce 7800 GTX as it is very closely
related to the graphics processor in the PS3.
`
This article isn’t a successor to any of the aforementioned pieces; it just really helps to have an understanding
of everything we've covered before - and since we don’t want this article to be longer than it already is, we'll
just point you back there to fill in the blanks if you find that there are any.
`
`Now, on to the show...
`
`A Prelude on Balance
`
The most important goal of any platform is balance on all levels. We’ve seen numerous examples of what
architectural imbalances can do to performance: having too little cache or too narrow of a FSB can starve
high speed CPUs of data they need to perform. GPUs without enough memory bandwidth can’t perform
anywhere near their peak fillrates, regardless of what they may be. Achieving a balanced overall platform is a
very difficult thing on the PC, unless you have an unlimited budget and are able to purchase the fastest
components. Skimping on your CPU while buying the most expensive graphics card may leave you with
performance that’s marginally better, or worse, than someone else with a more balanced system with a faster
CPU and a somewhat slower GPU.
`
With consoles however, the entire platform is designed to be balanced out of the box, as best as the
manufacturer can get it to be, while still remaining within the realm of affordability. The manufacturer is
responsible for choosing bus widths, CPU architectures, memory bandwidths, GPUs, even down to the type
of media that will be used by the system - and most importantly, they make sure that all elements of the
system are as balanced as can be.
`
The reason this article starts with a prelude on balance is because you should not expect either console
maker to have put together a horribly imbalanced machine. A company that is already losing money on every
console sold will never put faster hardware in that console if it isn’t going to be utilized thanks to an
imbalance in the platform. So you won't see an overly powerful CPU paired with a fill-rate limited GPU, and
you definitely won't see a lack of bandwidth to inhibit performance. What you will see is a collection of tools
that Microsoft and Sony have each, independently, put together for the game developer. Each console has its
strengths and its weaknesses, but as a whole, each console is individually very well balanced. So it would be
wrong to say that the PlayStation 3’s GPU is more powerful than the Xbox 360’s GPU, because you can’t
isolate the two and compare them in a vacuum; how they interact with the CPU, with memory, etc... all
influences the overall performance of the platform.

ATI Ex. 2136
IPR2023-00922
Page 1 of 12
10/4/2022, 10:25 PM
AMD1318_0154194
`
`The Consoles and their CPUs
`
The CPUs at the heart of these two consoles are very different in architectural approach, despite sharing
some common parts. The Xbox 360’s CPU, codenamed Xenon, takes a general purpose approach to
microprocessor design and implements three general purpose PowerPC cores, meaning they can execute
any type of code and will do it relatively well.
`
The PlayStation 3’s CPU, the Cell processor, pairs a general purpose PowerPC Processing Element (PPE,
very similar to one core from Xenon) with 7 working Synergistic Processing Elements (SPEs) that are more
specialized hardware designed to execute certain types of code.
`
`So the comparison between Xenon and Cell really boils down to a comparison between a general purpose
`microprocessor, and a hybrid of general purpose and specialized hardware.
`
Despite what many have said, there is support for Sony’s approach with Cell. We have discussed, in great
detail, the architecture of the Cell processor already, but there is industry support for a general purpose +
specialized hardware CPU design. Take note of the following slide from Intel’s Platform 2015 vision for their
CPUs by the year 2015:
`
[Slide from Intel's Platform 2015 presentation. Legible bullets: Scalar cores; Capable of TFLOPS+; Full
System-on-Chip; CMP with ~10 cores; Symmetric multithreading]
`
The use of one or two large general purpose cores combined with specialized hardware and multiple other
smaller cores is in Intel’s roadmap for the future, despite their harsh criticism of the Cell processor. The
difference is that Cell appears to be far too early for its time. By 2015 CPUs may be manufactured on as
small as a 32nm process, significantly smaller than today’s 90nm process, meaning that a lot more hardware
can be packed into the same amount of space. In going with a very forward-looking design, the Cell
processor architects inevitably had to make sacrifices to deal with the fact that the chip they wanted to design
is years ahead of its time for use in general computation.
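
To put a rough number on "a lot more hardware", here is a back-of-the-envelope sketch. It assumes ideal
scaling, where the area of a circuit shrinks with the square of the process feature size; real processes only
approximate this, and the function name is ours, purely for illustration.

```python
def density_gain(old_nm: float, new_nm: float) -> float:
    """Ideal transistor-density gain when shrinking from old_nm to new_nm.

    Assumes area scales with the square of the feature size, which real
    manufacturing processes only approximate.
    """
    if old_nm <= 0 or new_nm <= 0:
        raise ValueError("feature sizes must be positive")
    return (old_nm / new_nm) ** 2


gain = density_gain(90, 32)
print(f"90nm -> 32nm: roughly {gain:.1f}x the hardware in the same area")
```

Under that idealized assumption, a 32nm process would fit close to eight times the logic into the die area
that a 90nm Cell occupies today.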
`
`Introducing the Xbox 360's Xenon CPU
`
The Xenon processor was designed from the ground up to be a 3-core CPU, so unlike Cell, there are no
disabled cores on the Xenon chip itself in order to improve yield. The reason for choosing 3 cores is because
it provides a good balance between thread execution power and die size. According to Microsoft's partners,
the sweet spot for this generation of consoles will be between 4 and 6 execution threads, which is where the
3-core CPU came from.
`
The chip is built on a 90nm process, much like Cell, and will run at 3.2GHz - also like Cell. All of the cores are
identical to one another, and they are very similar to the PPE used in the Cell microprocessor, with a few
modifications.
`
The focus of Microsoft's additions to the core has been in the expansion of the VMX instruction set. In
particular, Microsoft now includes a single cycle dot-product instruction as a part of the VMX-128 ISA that is
implemented on each core. Microsoft has stated that there is nothing stopping IBM from incorporating this
support into other chips, but as of yet we have not seen anyone from the Cell camp claim support for single
cycle dot-products on the PPE.
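
For readers who want to see what such an instruction replaces, the sketch below spells out a 4-component dot
product in plain Python. It is not VMX-128 code and says nothing about how the hardware implements it; it
only shows the four multiplies and three adds that would otherwise be separate operations. The lighting
example is our own illustration.

```python
def dot4(a, b):
    """4-component dot product: four multiplies and three adds."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("expected two 4-component vectors")
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


# Illustrative use: a diffuse lighting term, N . L, with w set to zero
normal = (0.0, 1.0, 0.0, 0.0)
light_dir = (0.0, 0.6, 0.8, 0.0)
print(dot4(normal, light_dir))
```

Dot products show up constantly in game code (lighting, transforms, plane tests), which is presumably why
collapsing the operation into one instruction is attractive.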
`
The three cores share a meager 1MB L2 cache, which [...]; as developers migrate more to multithreaded
engines, this small cache will definitely become a performance limiter. With each core being able to execute
two threads simultaneously, you effectively have a worst case scenario of 6 threads splitting a 1MB L2 cache.
As a comparison, the current dual core Pentium 4s have a 1MB L2 cache per core and that number is only
expected to rise in the future.
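
The worst case is easy to quantify. The sketch below assumes the cache is divided evenly between active
threads, which is a simplification: a real shared cache is not partitioned this way, so treat the numbers as
a rough guide only.

```python
L2_BYTES = 1 * 1024 * 1024    # 1MB shared L2, as described above
CORES = 3
THREADS_PER_CORE = 2          # each Xenon core can execute two threads


def l2_share_kb(active_threads: int) -> float:
    """L2 capacity per thread in KB if the cache were split evenly."""
    if not 1 <= active_threads <= CORES * THREADS_PER_CORE:
        raise ValueError("Xenon exposes between 1 and 6 hardware threads")
    return L2_BYTES / active_threads / 1024


for n in (1, 2, 6):
    print(f"{n} thread(s): {l2_share_kb(n):.0f} KB each")
```

With all six threads busy, each one gets about 171KB under this even-split assumption, compared to the 1MB
that a single Pentium 4 core has to itself.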
`
`
`
`
The most important selling point of the Xbox 360's Xenon core is the fact that all three cores are identical, and
they are all general purpose microprocessors. The developer does not have to worry about multi-threading
beyond the point of getting their code to be thread safe; once it is [...] of the cores. The other important thing
to keep in mind here is that porting [...] and the Xbox 360 will be fairly trivial. Anywhere any inline assembly
is used there will obviously have to be changes, but with relatively minor code changes and some time
optimizing, code portability between the PC and the Xbox 360 will [...] PowerPC platforms: there's an
architecture switch, but the programming model doesn't change much.
`
The same cannot however be said for Cell and the PlayStation 3. The easiest way to port code from the
Xbox 360 to the PS3 would be to run the code exclusively on the Cell's single PPE, which obviously wouldn't
offer very good performance for heavily multithreaded titles. But with some more effort, the PlayStation 3 does
have a lot of potential.
`
Xenon vs. Cell
`
The first public game demo on the PlayStation 3 was Epic Games’ Unreal Engine 3 at Sony's PS3 press
conference. Tim Sweeney, the founder and UE3 father of Epic, performed the demo and helped shed some
light on how multithreading can work on the PlayStation 3.
`
According to Tim, a lot of things aren't appropriate for [...] and scripting. But he adds that “Fortunately
these comprise [...] CPU time on a traditional single-threaded architecture, so dedicating the CPU to those
tasks is appropriate, while the SPEs and GPU do their thing.”
`
`
So what does Tim Sweeney see the SPEs being used for in UE3? “With UE3, our focus on SPE acceleration
is on physics, animation updates, particle systems, sound; a few other areas are possible but require more
experimentation.”
Tim's view on the PPE/SPE split in Cell is far more balanced than most we've encountered. There are many
who see the SPEs as utterly useless for executing anything (we'll get to why in a moment), while there are
others who have been talking about doing far too much on SPEs where the general purpose PPE would do
much better.
`
`
For the most part, the areas that UE3 uses the Cell's SPEs for are fairly [...]able. For example, [...]
processing makes a lot of sense for the SPEs given their rather specialized architecture [...]. But the [...]
accelerate physics calculations [...] given how branch heavy [...].
`
`
Collision detection is a big part of what is commonly referred to as “game physics.” As the name implies,
collision detection simply refers to the game engine determining when two objects collide. Without collision
detection, bullets would never [...] cars, etc... among other things.
`
One method of implementing collision detection [...] is through the use of a Binary Space Partitioning
(BSP) tree. BSP trees are created by organizing [...]. The structure of the tree itself doesn't matter to this
discussion, but the important thing to keep in mind is that to traverse a BSP tree in order to test for a
collision between some object and a polygon in the tree you have to perform a lot of comparisons. You first
traverse the tree to find the polygon you want to test for a collision against. Then you have to perform a
number of checks to see whether a collision has occurred between the object you're comparing and the
polygon itself. This process involves a lot of conditional branching, code which likes to be run on a high
performance OoO core with a very good branch predictor.
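
To make the branching concrete, here is a minimal sketch of a BSP point query in 2D. It is not taken from any
shipping engine; the `BSPNode` layout, the leaf flags and the branch counter are all our own assumptions,
chosen only to show that every step down the tree depends on the outcome of a comparison.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec2 = Tuple[float, float]


@dataclass
class BSPNode:
    normal: Vec2                      # splitting line: normal . p = offset
    offset: float
    front: Optional["BSPNode"] = None
    back: Optional["BSPNode"] = None
    front_solid: bool = False         # used when the front child is a leaf
    back_solid: bool = False          # used when the back child is a leaf


def point_in_solid(root: BSPNode, p: Vec2) -> Tuple[bool, int]:
    """Walk the tree to the leaf containing p.

    Returns (is_solid, branches), where branches counts the data-dependent
    decisions taken along the way.
    """
    node = root
    branches = 0
    while True:
        side = node.normal[0] * p[0] + node.normal[1] * p[1] - node.offset
        branches += 1                 # which side of the splitting line?
        if side >= 0:
            child, solid = node.front, node.front_solid
        else:
            child, solid = node.back, node.back_solid
        branches += 1                 # leaf or interior node?
        if child is None:
            return solid, branches
        node = child


# A wall that is solid where 0 <= x < 1
right_face = BSPNode(normal=(1.0, 0.0), offset=1.0, back_solid=True)
left_face = BSPNode(normal=(1.0, 0.0), offset=0.0, front=right_face)

print(point_in_solid(left_face, (0.5, 0.0)))   # inside the wall
print(point_in_solid(left_face, (2.0, 0.0)))   # past the wall
```

Even this two-node tree takes four data-dependent branches per query; a real level would have thousands of
nodes, and none of the branches can be resolved until the preceding comparison has finished.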
`
`
Unfortunately, the SPEs have no branch prediction, so BSP tree traversal will tie up an SPE for quite a bit of
time while not performing very well as each branch condition has to be evaluated before execution can
continue. However, it is possible to structure collision detection for execution on the SPEs, but it would
require a different approach to the collision detection algorithms than what would be normally implemented
on a PC or Xbox 360.
We're working on providing examples of how it is actually done, but it's tough getting access to detailed
information at this stage given that a number of NDAs are still in place involving Cell development for the
[...]. Regardless of how it is done, obviously the Epic team found the SPEs to be a good match for the [...]
just [...] others that go unused.
`
In fact, if properly structured and coded for SPE acceleration, physics code could very well run faster on the
PlayStation 3 than on the Xbox 360 thanks to the more specialized nature of the SPE hardware. Not to
mention that physics acceleration is particularly parallelizable, making it a perfect match for an array of 7
SPEs.
`
Microsoft has referred to the Cell’s array of SPEs as a bunch of DSPs useless to game developers. The fact
that the next installment of the Unreal engine will be using the Cell’s SPEs for physics, animation updates,
particle systems as well as audio processing means that Microsoft's definition is a bit off. While not all
developers will follow in Epic’s footsteps, those that wish to remain competitive and get good performance out
of the PS3 will have to.
`
The bottom line is that Sony would not foolishly spend over 75% of their CPU die budget on SPEs to use
them for nothing more than fancy DSPs. Architecting a game engine around Cell and optimizing for SPE
acceleration will take more effort than developing for the Xbox 360 or PC, but it can be done. The question
then becomes, will developers do it?
`
In Johan’s Quest for More Processing Power series he looked at the developmental limitations of
multi-threading, especially as they applied to games. The end result is that multi-threaded game development
takes between 2 and 3 times longer than conventional single-threaded game development; adding more
time in order to restructure elements of your engine to get better performance on the PS3 isn't going to make
the transition any easier on developers.
`
Why In-Order?
`
Ever since the Pentium Pro, desktop PC microprocessors have implemented Out of Order (OoO) execution
architectures in order to improve performance. We've explained the idea in great detail before, but the gist is
that an Out-of-Order microprocessor can reorganize its instruction stream in order to best utilize its execution
resources. Despite the simplicity of its explanation, implementing support for OoO dramatically increases the
complexity of a microprocessor, as well as drives up power consumption.
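
A toy model helps show what reorganizing the instruction stream buys. The sketch below is our own
simplification, not a description of any real core: it issues at most one instruction per cycle, the 10-cycle
load latency is a made-up number, and the out-of-order version has an unlimited instruction window.

```python
def in_order_cycles(program):
    """program: list of (dependency_indices, latency). Issue strictly in order."""
    ready = {}
    cycle = 0
    for i, (deps, latency) in enumerate(program):
        # An instruction cannot issue until its inputs are available,
        # and nothing behind it may overtake it.
        start = max([cycle] + [ready[d] for d in deps])
        ready[i] = start + latency
        cycle = start + 1
    return max(ready.values())


def out_of_order_cycles(program):
    """Each cycle, issue the oldest instruction whose inputs are available."""
    pending = list(range(len(program)))
    ready = {}
    cycle = 0
    while pending:
        for i in pending:
            deps, latency = program[i]
            if all(d in ready and ready[d] <= cycle for d in deps):
                ready[i] = cycle + latency
                pending.remove(i)
                break
        cycle += 1
    return max(ready.values())


# load a; b = a + 1; load c; d = c + 1   (loads take 10 cycles, adds take 1)
program = [([], 10), ([0], 1), ([], 10), ([2], 1)]
print(in_order_cycles(program))       # 22
print(out_of_order_cycles(program))   # 12

# The same work with both loads hoisted to the front by hand (or by a compiler)
scheduled = [([], 10), ([], 10), ([0], 1), ([1], 1)]
print(in_order_cycles(scheduled))     # 12
```

The in-order machine stalls behind the first load, while the out-of-order machine starts the second load in
the meantime. The last line shows that an in-order core can match it if the instruction order is fixed up
ahead of time, which is exactly the burden that shifts onto the compiler and the programmer.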
`
In a perfect world, you could group a bunch of OoO cores on a single die and offer both excellent single
threaded performance, as well as great multi-threaded performance. However, the world isn't so perfect, and
there are limitations to how big a processor’s die can be. Intel and AMD can only fit two of their OoO cores
on a 90nm die, yet the Xbox 360 and PlayStation 3 targeted 3 and 9 cores, respectively, on a 90nm die;
clearly something has to give, and that something happened to be the complexity of each individual core.
`
Given a game console’s 5 year expected lifespan, the decision was made (by both MS and Sony) to favor a
multi-core platform over a faster single-core CPU in order to remain competitive towards the latter half of the
consoles’ lifetime.
`
So with the Xbox 360 Microsoft used three fairly simple IBM PowerPC cores, while Sony has the much
publicized Cell processor in their PlayStation 3. Both will be much slower than even mainstream desktop
processors in single threaded game code, but the majority of games these days are far more GPU bound
than CPU bound, so the performance decrease isn’t a huge deal. In the long run, with a bit of optimization
and running multi-threaded game engines, these collections of simple in-order cores should be able to put
out some fairly good performance.
`
Does In-Order Matter?
`
As we discussed in our Cell article, in-order execution makes a lot of sense for the SPEs. With in-order
execution as well as a small amount of high speed local memory, memory access becomes quite predictable
and code is very easily scheduled by the compiler for the SPEs. However, for the PPE in Cell, and the
PowerPC cores in Xenon, the in-order approach doesn’t necessarily make a whole lot of sense. You don't
have the advantage of a cacheless architecture, even though you do have the ability to force certain items to
remain untouched by the cache. More than anything having an in-order general purpose core just works to
simplify the core, at the expense of depending quite a bit on the compiler, and the programmer, to optimize
performance.
`
Very little of modern day games is written in assembly; most of it is written in a high level language like C or
C++ and the compiler does the dirty work of optimizing the code and translating it into low level assembly.
Compilers are horrendously difficult to write; getting a compiler to work is a pretty difficult job in itself, but
getting one to work well, regardless of what the input code is, is nearly impossible.
`
However, with a properly designed ISA and a good compiler, having an in-order core to work on is not the end
of the world. The performance you lose by not being able to extract the last bit of instruction level parallelism
is made up by the fact that you can execute far more threads per clock thanks to the simplicity of the in-order
cores allowing more to be packed on a die. Unfortunately, as we've already discussed, on day one that’s not
going to be much of an advantage.
`
The Cell processor’s SPEs are even more of a challenge, as they are more specialized hardware only
suitable to executing certain types of code. Keeping in mind that the SPEs are not well suited to running
branch heavy code, loop unrolling will do a lot to improve performance as it can significantly reduce the
number of branches that must be executed. In order to squeeze the absolute maximum amount of
performance out of the SPEs, developers may be forced to hand code some routines, as initial performance
numbers for optimized, compiled SPE code appear to be far less than the peak throughput.

While the move to in-order architectures won't cause game developers too much pain with good compilers at
their disposal, the move to multi-threaded game development and optimizing for the Cell in general will be
much more challenging.
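
Loop unrolling is easy to illustrate. The sketch below is written in Python purely for readability (a real SPE
routine would be C or assembly, and a compiler would normally do the unrolling); the `branches` counter is
our own addition and simply counts how many times the loop condition sends execution around again.

```python
def sum_rolled(values):
    """One loop-back branch per element."""
    total, branches, i = 0, 0, 0
    n = len(values)
    while i < n:
        total += values[i]
        i += 1
        branches += 1          # one pass through the loop condition
    return total, branches


def sum_unrolled4(values):
    """Same sum, unrolled by four: one loop-back branch per four elements."""
    total, branches, i = 0, 0, 0
    n = len(values)
    while i + 4 <= n:
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
        branches += 1
    while i < n:               # remainder loop for the last 0-3 elements
        total += values[i]
        i += 1
        branches += 1
    return total, branches


data = list(range(1000))
print(sum_rolled(data))      # (499500, 1000)
print(sum_unrolled4(data))   # (499500, 250)
```

The result is identical, but the unrolled version goes through a quarter of the loop branches. On a core
with no branch prediction, every branch removed is a stall avoided.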
`
`How Many Threads?
`
`
`
Earlier this year we saw the beginning of a transition from very [...] multi-core designs on the PC desktop.
The full transition won't be complete for another [...]. Just as it has begun on the desktop PC side, it also
has begun in the next-generation of consoles.
`
Remember that consoles must have a lifespan of around 5 years, so even if the multithreaded transition isn’t
going to happen with games for another 2 years, it is necessary for these consoles to be built around
multi-core processors to support the ecosystem when that transition occurs.
`
The problem is that today, games are single threaded, meaning that in the case of the Xbox 360, only one
out of its three cores [...] when running [...]. The PlayStation 3 would fare no better, as the Cell [...]
execution core to one of the Xbox 360 cores.

The reason this is a problem is because these general purpose cores that make up the Xbox 360's Xenon
CPU or the single general purpose PPE in Cell are extremely weak cores, far slower than a Pentium 4 or
Athlon 64, even when those chips run at much lower clock speeds.
`
`
Looking at the Xbox 360 and PlayStation 3, [...] begin their transition to multithreaded engines with consoles
and eventually port them to PCs. While the majority of the PC installed base today still runs on single-core
processors, the install base for both the Xbox 360 and PS3 will be guaranteed to be multi-core, so what
better platform to introduce a multithreaded game engine than the consoles, where you can guarantee that
all of your users will be able to take advantage of the multithreading?
`
`
On the other hand, looking at all of the early demos we've seen of Xbox 360 and PS3 games, not a single
one appears to offer better physics or AI than the single threaded games on the PC today. At best, we've
seen examples of ragdoll physics similar to that of Half-Life 2, but nothing that is particularly amazing, earth
shattering or shocking. Definitely nothing that appears to be leveraging the power of a multicore processor.
`
`
In fact, all of the demos we've [...] generation of GPUs - not [...]. [...] fluffy answer about how all developers
would be exploiting the 6 hardware threads supported by Xenon; instead we got a much more down to earth
answer.
`
The majority [...]. [...] used for all game code, physics and AI and in some cases, developers have split out
physics into a separate thread, but for the most part you can expect all first generation and even some
second generation titles to debut as basically single threaded games. The move to two hardware execution
threads may in fact only be an attempt to bring performance up to par with what can be done on mid-range
or high-end PCs today. A single thread running on Xenon isn't going to be very competitive performance
wise, especially executing code that is particularly well suited to OoO desktop processors.
`
With Microsoft themselves telling us not to expect more than one or two threads of execution to be dedicated
to game code, will the remaining two cores of the Xenon go unused for the first year or two of the Xbox 360's
existence? While the remaining cores won't directly be used for game performance acceleration, they won't
remain idle - enter the Xbox 360's helper threads.
The first time we discussed helper threads on AnandTech was in reference to additional threads, generated at
runtime, that could use idle execution resources to go out and prefetch data that the CPU would eventually
need.
`
The Xbox 360 will use a few different types of helper threads to not only make the most out of the CPU's
[...] but also to help balance the overall platform. Keep in mind that with the 360, Microsoft has not
increased the size of the media that games will be stored on. The dual layer DVD-9 spec is still in effect,
meaning that game developers shipping titles for the Xbox 360 in 2006 will have the same amount of storage
space as they did back in 2001. Given that current Xbox titles generally use around 4.5GB of space, it's not a
big deal, but by 2010 9GB may feel a bit tight.
Thanks to idle execution power in the 3-core Xenon, developers can now perform real-time decompression of
game data in order to [...] maximize that 9GB of storage. Or, if space isn’t much of a concern, developers
are now able to use more sophisticated encoding algorithms to encode audio/video to use the same amount
of space as they are today, [...] a much higher quality audio and video. Microsoft has already [...] in game
video will essentially use the WMV HD codec. The real time decompression of audio/video will be another
use for the extra power of the system.
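
The general shape of a decompression helper thread can be sketched in a few lines. This is only an
illustration of the hand-off pattern, not Xbox 360 code: zlib stands in for whatever compression scheme a
game would actually use, and the asset name and contents are made up.

```python
import queue
import threading
import zlib


def decompress_worker(jobs: "queue.Queue", results: dict) -> None:
    """Helper thread: pull compressed assets off the queue and inflate them."""
    while True:
        job = jobs.get()
        if job is None:                 # sentinel: no more work
            break
        name, blob = job
        results[name] = zlib.decompress(blob)


# Stand-in for a compressed asset as stored on disc
level_data = b"grass,rock,grass,grass,water," * 1000
on_disc = {"level1": zlib.compress(level_data)}

jobs: "queue.Queue" = queue.Queue()
loaded: dict = {}
helper = threading.Thread(target=decompress_worker, args=(jobs, loaded))
helper.start()

for name, blob in on_disc.items():
    jobs.put((name, blob))              # the game thread hands off work and moves on
jobs.put(None)
helper.join()

print(len(on_disc["level1"]), "bytes on disc ->", len(loaded["level1"]), "bytes in memory")
```

The game thread only queues work and carries on; the inflating happens on a thread that would otherwise sit
idle, which is the trade the console is making between spare CPU time and limited disc space.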
`
Another interesting use will be digital audio encoding; in the original Xbox Microsoft used a relatively
expensive DSP featured in the nForce south bridge to perform real-time Dolby Digital Encoding. The feature
allowed Microsoft to offer a [...]. [...] Microsoft to purchase a [...] the South Bridge in the Xbox 360.
`
But for the most part, on day 1, you shouldn't expect Xbox 360 games to be much more than the same type of
single threaded titles we've had on the PC. In fact, the biggest draw to the new consoles will be the fact that
for the first time, we will have the ability to run games rendered internally at 1280 x 720 on a game console.
In other words, round one of the next generation of game consoles is going to be a GPU battle.
The importance of this fact is that Microsoft has been talking about the general purpose execution power of
the Xbox 360 and how it is 3 times that of the PS3’s Cell processor. With only 1 - 2 threads of execution
being dedicated for game code, the advantage is pretty much lost at the start of the console battle.
`
Sony doesn’t have the same constraints that Microsoft does, and thus there is less of a need to perform real
time decompression of game content. Keep in mind that the PS3 will ship with a Blu-ray drive, with Sony's
minimum disc spec being a hefty 23.3GB of storage for a single layer Blu-ray disc. The PS3 will make use of
H.264 encoding for all video content, the decoding of which is perfectly suited for the Cell's SPEs.

Audio encoding will also be done on the SPEs, once again as there is little need to use any extra hardware to
perform a task that is perfectly suited for the SPEs.
`
`
`
The Xbox 360 GPU: ATI's Xenos
`
On a purely hardware level, ATI's Xbox 360 GPU (codenamed Xenos) is quite interesting. The part itself is
made up of two physically distinct silicon ICs. One IC is the GPU itself, which houses all the shader hardware
and most of the processing power. The second IC (which ATI refers to as the “daughter die”) is a 10MB block
of embedded DRAM (eDRAM) combined with the hardware necessary for z and stencil operations, color and
alpha processing, and anti aliasing. This daughter die is connected to the GPU proper via a 32GB/sec
interconnect. Data sent over this bus will be compressed, so usable bandwidth will be higher than 32GB/sec.
Inside the daughter die, between the processing hardware and the eDRAM itself, bandwidth is 256GB/sec.
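
For a sense of scale, the sketch below estimates how much of that 10MB a single 1280 x 720 render target
would occupy. The buffer formats are our assumptions (4 bytes of color and 4 bytes of z/stencil per pixel, no
anti-aliasing); nothing here comes from ATI's documentation.

```python
EDRAM_BYTES = 10 * 1024 * 1024   # 10MB daughter die, as described above


def framebuffer_bytes(width, height, color_bytes=4, depth_bytes=4):
    """Bytes needed for one color buffer plus one z/stencil buffer.

    The 4-byte color and 4-byte z/stencil formats are assumptions.
    """
    return width * height * (color_bytes + depth_bytes)


need = framebuffer_bytes(1280, 720)
print(f"{need / 2**20:.2f} MB of {EDRAM_BYTES / 2**20:.0f} MB")
```

Under those assumptions a 720p color and z buffer pair comes to roughly 7MB, so it would fit in the eDRAM
with some room to spare.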
`
At this point in time, much of the bandwidth generated by graphics hardware is required to handle color and z
data moving to the framebuffer. ATI hopes to eliminate this as a bottleneck by moving this processing and the
back framebuffer off the main memory bus. The bus to main memory is 512MB of 128-bit 700MHz GDDR3
(which results in just over 22GB/sec of bandwidth). This is less bandwidth than current desktop graphics
cards have available, but by offloading work and bandwidth for color and z to the daughter die, ATI saves
themselves a good deal of bandwidth. The 22GB/sec is left for textures and the rest of the system (the Xbox
implements a single pool of unified memory).
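
The "just over 22GB/sec" figure follows directly from the bus width and clock. The helper below is a generic
peak-bandwidth calculation, with the function name being ours; it assumes two transfers per clock, which is
what the double data rate in GDDR3 means.

```python
def peak_bandwidth_gb_s(bus_bits: int, clock_mhz: float, transfers_per_clock: int = 2) -> float:
    """Peak bandwidth in GB/sec (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock / 1e9


print(peak_bandwidth_gb_s(128, 700))   # 22.4
```

A 128-bit bus moves 16 bytes per transfer, and 700MHz at two transfers per clock gives 1.4 billion transfers
per second, for a peak of 22.4GB/sec.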
`
The GPU essentially acts as the North Bridge [...] in the middle of everything. From the [...]. The rest of the
system is [...] 500MB/sec of bandwidth up and down. The high bandwidth to the CPU is quite useful as the
GPU is able to directly read from the L2 cache. In the console world, the [...] and the Xbox 360 stands to
continue that tradition.
`
`
`
[...] that of current desktop graphics hardware. For years, vertex and pixel shader hardware have been
implemented separately, but ATI has sought to combine their functionality in a unified shader architecture.
`
`What's A Unified Shader Architecture?
`
`
The GPU in the Xbox 360 uses a different architecture than we are used to seeing. To be sure, vertex and
pixel shader programs will run on the part, but not on separate segments of the hardware. Vertex and pixel
processing differ in purpose, but there is quite a bit of overlap in the type of hardware needed to do both. The
unified shader architecture that ATI chose to use in their Xbox 360 GPU allows them to pack more
functionality onto fewer transistors, as less hardware needs to be duplicated for use in different parts of the
chip, and will run both vertex and pixel shader programs on the same hardware.
`
There are 3 parallel groups of 16 shader units each. Each of the three groups can either operate on vertex or
pixel data. Each shader unit is able to perform one 4 wide vector operation and 1 scalar operation per clock
cycle. Current ATI hardware is able to perform two 3 wide vector and two scalar operations per cycle in the
pixel pipe alone. The vertex pipeline of R420 is 6 wide and can do one 4 wide vector and one scalar
operation per cycle. If we look at straight up processing power, this gives R420 the ability to crunch 158
components (30 of which are 32bit and 128 are limited to 24bit precision). The Xbox GPU is able to crunch
240 32bit components in its shader units per clock cycle. Where this is a 51% increase in the [...] that can
be [...] (as well as a general [...]), [...] being equal, [...] a 24 piped R420.
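
The component counts above can be checked with a few lines of arithmetic. The only number not stated
outright in the text is the count of 16 pixel pipes for R420; we assume it here because it is what makes the
quoted totals add up.

```python
# R420 (current ATI desktop part), using the per-pipe figures quoted above
R420_PIXEL_PIPES = 16          # assumption: the pipe count that makes the totals add up
R420_VERTEX_PIPES = 6

pixel_components = R420_PIXEL_PIPES * (2 * 3 + 2 * 1)   # two vec3 + two scalar ops, 24-bit
vertex_components = R420_VERTEX_PIPES * (4 + 1)         # one vec4 + one scalar op, 32-bit
r420_total = pixel_components + vertex_components

# Xenos: 3 groups of 16 unified shader units, each doing one vec4 + one scalar op
xenos_total = 3 * 16 * (4 + 1)

increase = xenos_total / r420_total - 1
print(pixel_components, vertex_components, r420_total)   # 128 30 158
print(xenos_total)                                       # 240
print(f"{increase:.1%}")                                 # 51.9%
```

The exact ratio works out to just under 52%, which matches the 51% quoted above once the fraction is
dropped.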
`
`
`
What will make or break the difference between something like a 24 piped R420 and the unified shaders of
the Xbox GPU is how well applications will lend themselves to the adaptive nature of the hardware. Current
configurations don't have nearly the same vertex processing power as pixel processing power. This is quite
logical when we consider the fact that games have more pixels than vertices. For each geometry primitive,
there are likely a good number of pixels [...]. Not all [...] will need the same ratio of geometry to pixel power.
[...] means that all the ops per clock could either be dedicated to geometry processing in truly polygon
intense scenes. On the flip side (and more likely), any given clock cycle could see all 240 ops being used for
pixel processing. If game designers [...] accordingly, we could [...] processing power dedicated to a single
[...] on current hardware.
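
A crude throughput model shows why the adaptive split matters. Everything here is idealized: the per-clock
rates are the component counts discussed earlier, the workloads are invented, and the unified case ignores
that Xenos assigns work in whole groups of 16 units rather than op by op.

```python
import math


def cycles_fixed(vertex_ops, pixel_ops, vertex_rate=30, pixel_rate=128):
    """Separate vertex and pixel hardware: the slower side sets the pace."""
    return max(math.ceil(vertex_ops / vertex_rate), math.ceil(pixel_ops / pixel_rate))


def cycles_unified(vertex_ops, pixel_ops, rate=240):
    """Unified hardware: any unit can take either kind of work."""
    return math.ceil((vertex_ops + pixel_ops) / rate)


workloads = {
    "pixel heavy": (3_000, 128_000),
    "geometry heavy": (60_000, 12_800),
}
for name, (v, p) in workloads.items():
    print(name, cycles_fixed(v, p), cycles_unified(v, p))
```

In this model the fixed design takes 1000 and 2000 cycles on the two workloads, while the unified design
takes 546 and 304. The geometry heavy case shows the larger gap because the fixed design leaves most of its
pixel hardware idle there, which is the situation a unified architecture is meant to avoid.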
`
`
`
ATI is predicting that developers will use lots of very small triangles in Xbox 360 games. As engines like
Unreal Engine 3 have shown incredible results using pixel shaders and normal maps to augment low
geometric detail, we can't tell if ATI is trying to provide the chicken or the egg. In other words, will we see
many [...] triangles on Xbox 360 because developers are moving in that direction, or because that is what
will run well on ATI's hardware?
Regardless of the paths that lead to the road, it is obvious that the Xbox 360 [...] a geometry power house.
Not only are all 3 blocks of 16 shaders able to become vertex shaders, but ATI's GPU will be able to handle
twice as many z operations if a z only pass is performed. The same is true of current ATI and NVIDIA
hardware, but the fact that a geometry only pass can now make [...] perform 48 vector and 48 scalar
operations in any given clock cycle while doing twice the z operations is quite intriguing. This could allow
some very geometrically complicated scenes.
`
Inside the Xenos GPU
As previously mentioned, the 48 shaders will be able to run either vertex or pixel shader programs in any
given clock cycle. To clarify, each block of 16 shader units [...]. [...] shader units will function on a level
slightly higher than DX9.0c, but in order to take advantage of the technology, ATI and Microsoft will have to
customize the API.
`
In order to get data into the shader units, textures are read from main memory. The eDRAM of the system is
unable to assist with texturing. There are 16 bilinear filtered texture samplers,