The point of a gaming console is to play games. The PC user in all of us wants to benchmark, overclock and
upgrade even the unreleased game consoles that were announced at E3, but we can’t. And these sorts of
limits are healthy, because they let us have a system that we don’t tinker with, that simply performs its function,
and that is to play games.
The game developers are the ones that have to worry about which system is faster, whose hardware is better
and what that means for the games they develop, but to us, the end users, whether the Xbox 360 has a faster
GPU or the PlayStation 3’s CPU is the best thing since sliced bread doesn’t really matter. At the end of the
day, it is the games and the overall experience that will sell both of these consoles. You can have the best
hardware in the world, but if the games and the experience aren’t there, it doesn't really matter.
Despite what we've just said, there is a desire to pick these new next-generation consoles apart. Of course, if
the games are all that matter, why even bother comparing specs, claims or anything about these
next-generation consoles other than games? Unfortunately, the majority of that analysis seems to be done by the
manufacturers of the consoles, and fed to the users in an attempt to win early support, and quite a bit of it is
obviously tainted.
While we would've liked this to be an article on all three next-generation consoles, the Xbox 360, PlayStation
3 and Revolution, the fact of the matter is that Nintendo has not released any hardware details about their
next-gen console, meaning that there’s nothing to talk about at this point in time. That leaves us with two
contenders: Microsoft's Xbox 360, due out by the end of this year, and Sony’s PlayStation 3, due out in Spring [...]
This article isn’t here to crown a winner or to even begin to claim which platform will have better games; it is
simply here to answer questions we all have had, as well as discuss these new platforms in greater detail than
we have before.
Before proceeding with this article, there’s a bit of required reading to really get the most out of it. We
strongly suggest reading through our Cell processor article, as well as our launch coverage of the PlayStation
3. We would also suggest reading through our Xbox 360 articles for background on Microsoft's console, as
well as an earlier piece published on multi-threaded game development. Finally, be sure that you're fully up
to date on the latest GPUs, especially the recently announced NVIDIA GeForce 7800 GTX, as it is very closely
related to the graphics processor in the PS3.
This article isn’t a successor to any of the aforementioned pieces; it just really helps to have an understanding
of everything we've covered before - and since we don’t want this article to be longer than it already is, we'll
just point you back there to fill in the blanks if you find that there are any.
Now, on to the show...
A Prelude on Balance
The most important goal of any platform is balance on all levels. We’ve seen numerous examples of what
architectural imbalances can do to performance: having too little cache or too narrow an FSB can starve
high speed CPUs of the data they need to perform. GPUs without enough memory bandwidth can’t perform
anywhere near their peak fill rates, regardless of what they may be. Achieving a balanced overall platform is a
very difficult thing on the PC, unless you have an unlimited budget and are able to purchase the fastest
components. Skimping on your CPU while buying the most expensive graphics card may leave you with
performance that’s marginally better, or worse, than someone else with a more balanced system with a faster
CPU and a somewhat slower GPU.
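To make the balance argument concrete, here is a minimal sketch of a toy model in Python. It assumes that CPU and GPU work for a frame overlap perfectly, so the slower of the two sets the frame time. The function name and the millisecond figures are hypothetical, chosen purely for illustration; they are not measurements of any real system.

```python
def frames_per_second(cpu_ms: float, gpu_ms: float) -> float:
    """Toy model: CPU and GPU work overlap, so the slower stage sets the frame time."""
    return 1000.0 / max(cpu_ms, gpu_ms)


# Hypothetical per-frame costs in milliseconds (illustrative numbers only).
balanced = frames_per_second(cpu_ms=12.0, gpu_ms=14.0)   # faster CPU, somewhat slower GPU
gpu_heavy = frames_per_second(cpu_ms=20.0, gpu_ms=8.0)   # slow CPU, most expensive GPU

print(f"balanced:  {balanced:.1f} fps")
print(f"gpu-heavy: {gpu_heavy:.1f} fps")
```

Under these assumed numbers the system with the expensive graphics card is held back by its CPU and ends up slower than the balanced one, which is the situation the paragraph above describes.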
With consoles, however, the entire platform is designed to be balanced out of the box, as best as the
manufacturer can get it to be, while still remaining within the realm of affordability. The manufacturer is
responsible for choosing bus widths, CPU architectures, memory bandwidths, GPUs, even down to the type
of media that will be used by the system - and most importantly, they make sure that all elements of the
system are as balanced as can be.
The reason this article starts with a prelude on balance is because you should not expect either console
maker to have put together a horribly imbalanced machine. A company who is already losing money on every
`Microsoft's Xbox 360, Sony's PS3 - A Hardware Discussion - Print View
console sold will never put faster hardware in that console if it isn’t going to be utilized thanks to an
imbalance in the platform. So you won't see an overly powerful CPU paired with a fill-rate limited GPU, and
you definitely won't see a lack of bandwidth to inhibit performance. What you will see is a collection of tools
that Microsoft and Sony have each, independently, put together for the game developer. Each console has its
strengths and its weaknesses, but as a whole, each console is individually very well balanced. So it would be
wrong to say that the PlayStation 3’s GPU is more powerful than the Xbox 360’s GPU, because you can’t
isolate the two and compare them in a vacuum; how they interact with the CPU, with memory, etc. all
influences the overall performance of the platform.
The Consoles and their CPUs
The CPUs at the heart of these two consoles are very different in architectural approach, despite sharing
some common parts. The Xbox 360’s CPU, codenamed Xenon, takes a general purpose approach to
microprocessor design and implements three general purpose PowerPC cores, meaning they can execute
any type of code and will do it relatively well.
The PlayStation 3’s CPU, the Cell processor, pairs a general purpose PowerPC Processing Element (PPE,
very similar to one core from Xenon) with 7 working Synergistic Processing Elements (SPEs) that are more
specialized hardware designed to execute certain types of code.
So the comparison between Xenon and Cell really boils down to a comparison between a general purpose
microprocessor, and a hybrid of general purpose and specialized hardware.
Despite what many have said, there is support for Sony’s approach with Cell. We have discussed, in great
detail, the architecture of the Cell processor already, but there is industry support for a general purpose +
specialized hardware CPU design. Take note of the following slide from Intel’s Platform 2015 vision for their
CPUs by the year 2015:
[...] others that go unused.
In fact, if properly structured and coded for SPE acceleration, physics code could very well run faster on the
PlayStation 3 than on the Xbox 360 thanks to the more specialized nature of the SPE hardware. Not to
mention that physics acceleration is particularly parallelizable, making it a perfect match for an array of 7 SPEs.
Microsoft has referred to the Cell’s array of SPEs as a bunch of DSPs useless to game developers. The fact
that the next installment of the Unreal engine will be using the Cell’s SPEs for physics, animation updates,
particle systems as well as audio processing means that Microsoft's definition is a bit off. While not all
developers will follow in Epic’s footsteps, those that wish to remain competitive and get good performance out
of the PS3 will have to.
The bottom line is that Sony would not foolishly spend over 75% of their CPU die budget on SPEs to use
them for nothing more than fancy DSPs. Architecting a game engine around Cell and optimizing for SPE
acceleration will take more effort than developing for the Xbox 360 or PC, but it can be done. The question
then becomes, will developers do it?
In Johan’s Quest for More Processing Power series he looked at the developmental limitations of
multi-threading, especially as they applied to games. The end result is that multi-threaded game development
takes between 2 and 3 times longer than conventional single-threaded game development; adding additional
time in order to restructure elements of your engine to get better performance on the PS3 isn't going to make
the transition any easier on developers.
Ever since the Pentium Pro, desktop PC microprocessors have implemented Out of Order (OoO) execution
architectures in order to improve performance. We've explained the idea in great detail before, but the gist is
that an Out-of-Order microprocessor can reorganize its instruction stream in order to best utilize its execution
resources. Despite the simplicity of its explanation, implementing support for OoO dramatically increases the
complexity of a microprocessor, as well as drives up power consumption.
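The reordering idea can be illustrated with a small sketch. The Python below is a deliberately simplified, hypothetical model, not a description of any real core: it assumes a single-issue pipeline, fixed instruction latencies, an unlimited reorder window and no branches. The four-instruction program and its latencies are made up for illustration.

```python
from typing import Dict, List, Tuple

# Each instruction: (name, indices of earlier instructions it depends on, latency in cycles).
Instr = Tuple[str, List[int], int]

# Hypothetical program: two independent load/add pairs.
PROGRAM: List[Instr] = [
    ("load a",    [],  4),
    ("add b=a+1", [0], 1),
    ("load c",    [],  4),
    ("add d=c+1", [2], 1),
]


def in_order_cycles(prog: List[Instr]) -> int:
    """Issue strictly in program order, at most one instruction per cycle."""
    done: List[int] = []   # cycle at which each instruction's result is available
    next_issue = 0         # earliest cycle at which the next instruction may issue
    for _name, deps, latency in prog:
        ready = max((done[d] for d in deps), default=0)
        issue = max(next_issue, ready)   # stall until the operands are ready
        done.append(issue + latency)
        next_issue = issue + 1
    return max(done)


def out_of_order_cycles(prog: List[Instr]) -> int:
    """Each cycle, issue the oldest instruction whose operands are ready."""
    done: Dict[int, int] = {}            # instruction index -> completion cycle
    pending = list(range(len(prog)))
    cycle = 0
    while pending:
        for i in pending:
            _name, deps, latency = prog[i]
            if all(d in done and done[d] <= cycle for d in deps):
                done[i] = cycle + latency
                pending.remove(i)
                break                    # single-issue: one instruction per cycle
        cycle += 1
    return max(done.values())


print("in-order cycles:    ", in_order_cycles(PROGRAM))
print("out-of-order cycles:", out_of_order_cycles(PROGRAM))
```

In this toy case the in-order model needs 10 cycles because the second load waits behind the stalled add, while the out-of-order model finishes in 6 by starting the second load early. A compiler targeting an in-order core would try to get the same effect by reordering the instructions ahead of time.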
In a perfect world, you could group a bunch of OoO cores on a single die and offer both excellent single
threaded performance, as well as great multi-threaded performance. However, the world isn't so perfect, and
there are limitations to how big a processor’s die can be. Intel and AMD can only fit two of their OoO cores
on a 90nm die, yet the Xbox 360 and PlayStation 3 targeted 3 and 9 cores, respectively, on a 90nm die;
clearly something has to give, and that something happened to be the complexity of each individual core.
Given a game console’s 5 year expected lifespan, the decision was made (by both MS and Sony) to favor a
multi-core platform over a faster single-core CPU in order to remain competitive towards the latter half of the [...]
So with the Xbox 360 Microsoft used three fairly simple IBM PowerPC cores, while Sony has the much
publicized Cell processor in their PlayStation 3. Both will perform much slower, in absolute terms, than even
mainstream desktop processors in single threaded game code, but the majority of games these days are far
more GPU bound than CPU bound, so the performance decrease isn’t a huge deal. In the long run, with a bit
of optimization and running multi-threaded game engines, these collections of simple in-order cores should be
able to put out some fairly good performance.
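How much a multi-threaded engine gains from extra cores depends on how much of its work can actually run in parallel. As a rough sketch, the Python below applies Amdahl's law, a standard textbook model that we are bringing in here for illustration. The parallel fractions are hypothetical, and the model ignores synchronization cost and the per-core speed differences discussed above.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over one core when only part of the work can be parallelized."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical engines on a three-core CPU: single-threaded, half parallel, mostly parallel.
for fraction in (0.0, 0.5, 0.9):
    print(f"{fraction:.0%} parallel on 3 cores: {amdahl_speedup(fraction, 3):.2f}x")
```

An engine that stays single-threaded sees no benefit at all from the extra cores in this model, which is why the payoff from these designs depends on how engines are restructured over time.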
Does In-Order Matter?
As we discussed in our Cell article, in-order execution makes a lot of sense for the SPEs. With in-order
execution as well as a small amount of high speed local memory, memory access becomes quite predictable
and code is very easily scheduled by the compiler for the SPEs. However, for the PPE in Cell, and the
PowerPC cores in Xenon, the in-order approach doesn’t necessarily make a whole lot of sense. You don't
have the advantage of a cacheless architecture, even though you do have the ability to force certain items to
remain untouched by the cache. More than anything, having an in-order general purpose core just works to
simplify the core, at the expense of depending quite a bit on the compiler, and the programmer, to optimize [...]
Very little of a modern day game is written in assembly; most of it is written in a high level language like C or
C++ and the compiler does the dirty work of optimizing the code and translating it into low level assembly.
Compilers are horrendously difficult to write; getting a compiler to work is a pretty difficult job in itself, but
getting one to work well, regardless of what the input code is, is nearly impossible.
However, with a properly designed ISA and a good compiler, having an in-order core to work on is not the end
of the world. The performance you lose by not being able to extract the last bit of instruction level parallelism
is made up by the fact that you can execute far more threads per clock thanks to the simplicity of the in-order
cores allowing more to be packed on a die. Unfortunately, as we've already discussed, on day one that’s not
going to be much of an advantage.
The Cell processor’s SPEs are even more of a challenge, as they are more specialized hardware only
suitable to executing certain types of code. Keeping in mind that the SPEs are not well suited to running [...]
[...] extra power of the system.
Another interesting use will be digital audio encoding; in the original Xbox Microsoft used a relatively [...]
DSP featured in the nForce south bridge to perform real-time Dolby Digital Encoding. The feature
allowed Microsoft to offer a [...] Microsoft to purchase a [...] the South Bridge in the Xbox 360 [...]
But for the most part, on day 1, you shouldn't expect Xbox 360 games to be much more than the same type of
single threaded titles we've had on the PC. In fact, the biggest draw to the new consoles will be the fact that
for the first time, we will have the ability to run games rendered internally at 1280 x 720 on a game console.
In other words, round one of the next generation of game consoles is going to be a GPU battle.
The importance of this fact is that Microsoft [...] about the general purpose [...] the Xbox 360 and how it is
3 times [...] of the PS3’s Cell processor. With only 1 - 2 threads of execution being dedicated for game code,
the advantage is pretty much lost at the start of the console battle.
Sony doesn’t have the same constraints that Microsoft does, and thus there is less of a need to perform real
time decompression of game content. Keep in mind that the PS3 will ship with a Blu-ray drive, with Sony's
minimum disc spec being a hefty 23.3GB of storage for a single layer Blu-ray disc [...]
The PS3 [...] use of H.264 encoding for all video content, the decoding of which is perfectly suited for the [...]
Audio encoding will also be done on the SPEs, once again as there is little need to use any extra hardware to
perform a task that is perfectly suited for the SPEs.
The Xbox 360 GPU: ATI's Xenos
On a purely hardware level, ATI's Xbox 360 GPU (codenamed Xenos) is quite interesting. The part itself is
made up of two physically distinct silicon ICs. One IC is the GPU itself, which houses all the shader hardware
and most of the processing power. The second IC (which ATI refers to as the “daughter die") is a 10MB block
of embedded DRAM (eDRAM) combined with the hardware necessary for z and stencil operations, color and
alpha processing, and anti-aliasing. This daughter die is connected to the GPU proper via a 32GB/sec
interconnect. Data sent over this bus will be compressed, so usable bandwidth will be higher than 32GB/sec.
Inside the daughter die, between the processing hardware and the eDRAM itself, bandwidth is 256GB/sec.
At this point in time, much of the bandwidth generated by graphics hardware is required to handle color and z
data moving to the framebuffer. ATI hopes to eliminate this as a bottleneck by moving this processing and the
back framebuffer off the main memory bus. The bus to main memory is 512MB of 128-bit 700MHz GDDR3
(which results in just over 22GB/sec of bandwidth). This is less bandwidth than current desktop graphics
cards have available, but by offloading work and bandwidth for color and z to the daughter die, ATI saves
themselves a good deal of bandwidth. The 22GB/sec is left for textures and the rest of the system (the Xbox
implements a single pool of unified memory).
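The "just over 22GB/sec" figure can be checked with a little arithmetic. The sketch below reads the main memory bus as 128 bits wide at 700MHz, and assumes that GDDR3 transfers data twice per clock (double data rate) and that 1GB/sec means 10^9 bytes per second. The function name is ours, for illustration only.

```python
def peak_bandwidth_gb_per_sec(bus_width_bits: int, clock_mhz: float,
                              transfers_per_clock: int = 2) -> float:
    """Theoretical peak bandwidth in GB/sec (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_sec = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_sec / 1e9


# 128-bit bus, 700MHz clock, two transfers per clock (DDR assumption).
main_memory = peak_bandwidth_gb_per_sec(bus_width_bits=128, clock_mhz=700)
print(f"main memory bus: {main_memory:.1f} GB/sec")
```

This is a theoretical peak; sustained bandwidth in a real system will be lower.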
The GPU essentially acts as the North Bridge [...] in the middle of everything. From the [...] The rest of the [...]
500MB/sec of bandwidth up and down. The high bandwidth to the CPU is quite useful as the
GPU is able to directly read from the L2 cache. In the console world, the [...]
and the Xbox 360 stands to continue that tradition.
[...] that of current desktop graphics hardware. For years, vertex and pixel shader hardware have been
implemented separately, but ATI has sought to combine their functionality in a unified shader architecture.
What's A Unified Shader Architecture?
The GPU in the Xbox 360 uses a different architecture than we are used to seeing. To be sure, vertex and
pixel shader programs will run on the part, but not on separate segments of the hardware. Vertex and pixel
processing differ in purpose, but there is quite a bit of overlap in the type of hardware needed to do both. The
unified shader architecture that ATI chose to use in their Xbox 360 GPU allows them to pack more [...]
onto fewer transistors [...] hardware [...] to be duplicated for use in different parts of the
chip, and will run both vertex and shader programs on the same hardware.
There are 3 parallel groups of 16 shader units each. Each of the three groups can either operate on vertex or
pixel data. Each shader unit is able to perform one 4 wide vector operation and 1 scalar operation per clock
cycle. Current ATI hardware is able to perform two 3 wide vector and two scalar operations per cycle [...]
of R420 [...] alone. The vertex pipeline [...] and one [...] If we look at straight up processing power, this
gives R420 the ability to crunch 158 components (30 of which are 32bit and 128 are limited to 24bit precision). The
Xbox GPU is able to crunch 240 32bit components [...] is a 51% increase [...] units per
clock cycle. Where th[...] that can b[...] all else being equal, [...]
a 24 piped R420.
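The component counts above can be reproduced with simple arithmetic. The sketch below assumes an R420 configuration of 16 pixel pipelines (two 3 wide vector and two scalar operations each) and 6 vertex pipelines (one 4 wide vector and one scalar operation each). Those pipeline counts are our assumption, chosen because they reproduce the 158 figure; the Xbox 360 numbers (48 units, one 4 wide vector plus one scalar operation each) come from the text.

```python
def components_per_clock(units: int, vector_ops: int, vector_width: int,
                         scalar_ops: int) -> int:
    """Components processed per clock: units x (vector lanes + scalar ops)."""
    return units * (vector_ops * vector_width + scalar_ops)


# Xbox 360 GPU: 3 groups of 16 unified shader units.
xenos = components_per_clock(units=48, vector_ops=1, vector_width=4, scalar_ops=1)

# R420 (assumed configuration): 16 pixel pipes (24bit) plus 6 vertex pipes (32bit).
r420_pixel = components_per_clock(units=16, vector_ops=2, vector_width=3, scalar_ops=2)
r420_vertex = components_per_clock(units=6, vector_ops=1, vector_width=4, scalar_ops=1)
r420 = r420_pixel + r420_vertex

increase = xenos / r420 - 1.0
print(f"Xbox 360 GPU: {xenos} components per clock")
print(f"R420: {r420} components per clock ({r420_vertex} 32bit, {r420_pixel} 24bit)")
print(f"increase: {increase:.1%}")
```

This counts raw components per clock only, so it lands between 51% and 52%, in line with the increase quoted above; it says nothing about clock speed or how well either part keeps its units busy.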
What will make or break the difference between something like a 24 piped R420 and the unified shaders of
the Xbox GPU is how well applications will lend themselves to the adaptive nature of the hardware. Current
configurations don't have nearly the same [...] power as [...] processing power. This is quite logical
when we consider the fact that [...] than vertices. For each [...], there are likely a good number [...]
will need the same ratio of geometry to pixel power [...] means that all the ops per [...] could either be
dedicated to [...] in truly polygon intense scenes. On the flip side (and more likely), any given clock cycle
could [...] 240 ops [...] If game designers [...] accordingly, we could [...] power dedicated to a single [...]
on current hardware.
[...] predicting that developers will use lots of very small triangles in Xbox 360 games. As engines [...]
have shown incredible results using pixel shaders and normal maps to augment low [...] geometric detail [...]
we can't tell if ATI is trying to provide the chicken or the egg. In other words, will we see [...]
on Xbox 360 because [...] are moving in that direction [...] what will run well on ATI's hardware?
Regardless of the paths that lead to th[...] road, it's obvious that the Xbox 360 [...] will be able [...]
to become [...] a geometry power house. Not only are all 3 blocks of 16 shaders [...] shaders, but [...]
to handle twice as many z operations [...] a z only pass is performed. The same is true of current ATI and NVIDIA
hardware, but the fact that a geometry only pass can now make [...] perform 48 vector and 48 [...]
operations in any given clock cycle while doing twice the z operations is quite intriguing. This
could allow some very geometrically complicated scenes.
Inside the Xenos GPU
As previously mentioned, the 48 shaders will be able to run either vertex or pixel shader programs in any
given clock cycle. To clarify, each block of 16 shader [...]
shader units will function on a level slightly higher than DX9.0c, but in order to take advantage of the technology,
ATI and Microsoft will have to customize the API.
In order to get data into the shader units, textures are read from main memory. The eDRAM of the system is
unable to assist with texturing. There are 16 [...]near filtered texture samplers,

