Two months away from the release of the next-generation consoles, many have already made up their minds about which machine offers more gaming power before a single game has been released. Compare basic graphics and memory bandwidth specs side-by-side and it looks like a rout - PlayStation 4 comprehensively bests Xbox One to such a degree that sensible discussion of the respective merits of both consoles seems impossible. They're using the same core AMD technologies, only Sony has faster memory and a much larger graphics chip. But is it really that simple?

In the wake of stories from unnamed sources suggesting that PS4 has a significant advantage over its Xbox counterpart, Microsoft wanted to set the record straight. Last Tuesday, Digital Foundry dialled into a conference call to talk with two key technical personnel behind the Xbox One project - passionate engineers who wanted the opportunity to put their story across in a deep-dive technical discussion where all the controversies could be addressed. Within moments of the conversation starting, it became clear that balance would be the theme.

"For designing a good, well-balanced console you really need to be considering all the aspects of software and hardware. It's really about combining the two to achieve a good balance in terms of performance," says Microsoft technical fellow Andrew Goossen.

"We're actually very pleased to have the opportunity to talk with you about the design. There's a lot of misinformation out there and a lot of people who don't get it - we're actually extremely proud of our design. We think we have very good balance, very good performance, we have a product which can handle things other than just raw ALU [GPU compute power]. There are also quite a number of other design aspects and requirements that we put in around things like latency, steady frame-rates and that the titles aren't interrupted by the system and other things like that. You'll see this very much as a pervasive ongoing theme in our system design."

Xbox One: additional processors and the audio block

Microsoft's recent Hot Chips 25 presentation on the Xbox One processor suggested that the chip had 15 processors on-board. We were curious as to how that broke down.

"On the SoC, there are many parallel engines - some of those are more like CPU cores or DSP cores. How we count to fifteen: [we have] eight inside the audio block, four move engines, one video encode, one video decode and one video compositor/resizer," says Nick Baker.

"The audio block is completely unique. That was designed by us in-house. It's based on four Tensilica DSP cores and several programmable processing engines. We break it up as one core running control, two cores running a lot of vector code for speech and one for general purpose DSP. We couple that with sample rate conversion, filtering, mixing, equalisation, dynamic range compensation then also the XMA audio block. The goal was to run 512 simultaneous voices for game audio as well as being able to do speech pre-processing for Kinect."

But to what extent will this hardware actually see utilisation, especially in cross-platform games?

"So a lot of what we've designed for the system and the system reservation is to offload a lot of the work from the title and onto the system. You have to keep in mind that this is doing a bunch of work that is actually on behalf of the title," says Andrew Goossen. "We're taking on the voice recognition mode in our system reservations whereas other platforms will have that as code that developers will have to link in and pay out of from their budget. Same thing with Kinect and most of our NUI [Natural User Interface] features are provided free for the games - also the Game DVR."

"Andrew said it pretty well: we really wanted to build a high performance, power-efficient box," adds hardware architecture team manager Nick Baker. "We really wanted to make it relevant to the modern living room. Talking about AV, we're the only ones to put in an AV in and out to make it media hardware that's the centre of your entertainment."

We've seen the Xbox One dash and the media functions are pretty cool, but first and foremost, it's all about the games. It's safe to say that there are two major areas of controversy surrounding the Xbox One design - specifically the areas in which it is considered weaker than the PlayStation 4: the memory set-up and the amount of GPU power on tap. Both systems have 8GB of RAM, but Sony chose 8GB of wide, fast GDDR5 with 176GB/s of peak throughput while Microsoft opted for DDR3, with a maximum rated bandwidth of just 68GB/s - significantly lower. However, this is supplemented by on-chip ESRAM, which tops out at 204GB/s. In theory then, while marshalling and dividing resources between the two memory pools will be a factor, Xbox One clearly has its own approach for ensuring adequate bandwidth across the system.
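
The headline figures for the two main memory pools can be sanity-checked with the usual "bus width times transfer rate" accounting. This is only a rough sketch: the Xbox One inputs follow from the 2133MHz DDR3 and the four 64-bit memory controllers mentioned elsewhere in this article, while the PS4 inputs (a 256-bit bus at 5500 MT/s effective) are an assumption taken from publicly quoted specs, not from Microsoft or Sony statements here.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_second: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_second / 1e9


# Assumed interface parameters (see the note above):
# Xbox One DDR3: 4 x 64-bit controllers = 256-bit bus at 2133 MT/s.
# PS4 GDDR5:     256-bit bus at 5500 MT/s effective (assumption).
XBOX_ONE_DDR3 = peak_bandwidth_gb_s(4 * 64, 2133e6)
PS4_GDDR5 = peak_bandwidth_gb_s(256, 5500e6)

print(f"Xbox One DDR3 peak: {XBOX_ONE_DDR3:.1f} GB/s")
print(f"PS4 GDDR5 peak:     {PS4_GDDR5:.1f} GB/s")
```

Under those assumptions the results land on the 68GB/s and 176GB/s figures quoted above. Both are paper peaks rather than sustained rates.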

Until we get our hands on the final hardware, Wired's internal photography of a pre-production Xbox One remains our only look inside the box. Rumour-mongers should note the lack of a discrete GPU - there's to be no last-minute addition of extra processing hardware. All of Xbox One's major systems are built into the single chip on the right, which is surrounded by the 2133MHz DDR3 modules.

Memory management is one of the most divisive points that separate the two systems. The question must surely be that if GDDR5 is the preferred set-up, why didn't Microsoft choose it? Still cash-rich to the extreme, clearly the firm could afford to pay the premium for GDDR5. We wondered whether it was fair to assume that this higher bandwidth RAM was ruled out very early on in the production process, and if so, why?

"Yeah, I think that's right. In terms of getting the best possible combination of performance, memory size, power, the GDDR5 takes you into a little bit of an uncomfortable place," says Nick Baker. "Having ESRAM costs very little power and has the opportunity to give you very high bandwidth. You can reduce the bandwidth on external memory - that saves a lot of power consumption and the commodity memory is cheaper as well so you can afford more. That's really a driving force behind that... if you want a high memory capacity, relatively low power and a lot of bandwidth there are not too many ways of solving that."

The combined system bandwidth controversy

Baker is keen to tackle the misconception that the team has created a design that cannot access its ESRAM and DDR3 memory pools simultaneously. Critics say that they're adding the available bandwidths together to inflate their figures and that this simply isn't possible in a real-life scenario.

"You can think of the ESRAM and the DDR3 as making up eight total memory controllers, so there are four external memory controllers (which are 64-bit) which go to the DDR3 and then there are four internal memory controllers that are 256-bit that go to the ESRAM. These are all connected via a crossbar and so in fact it will be true that you can go directly, simultaneously to DRAM and ESRAM," he explains.

The controversy surrounding ESRAM has taken the design team very much by surprise. The notion that Xbox One is difficult to work with is perhaps quite hard to swallow for the same team that produced Xbox 360 - by far and away the easier console to develop for, especially so in the early years of the current console generation.

"This controversy is rather surprising to me, especially when you view ESRAM as the evolution of eDRAM from the Xbox 360. No-one questions on the Xbox 360 whether we can get the eDRAM bandwidth concurrent with the bandwidth coming out of system memory. In fact, the system design required it," explains Andrew Goossen. "We had to pull over all of our vertex buffers and all of our textures out of system memory concurrent with going on with render targets, colour, depth, stencil buffers that were in eDRAM. Of course with Xbox One we're going with a design where ESRAM has the same natural extension that we had with eDRAM on Xbox 360, to have both going concurrently. It's a nice evolution of the Xbox 360 in that we could clean up a lot of the limitations that we had with the eDRAM.

"The Xbox 360 was the easiest console platform to develop for, it wasn't that hard for our developers to adapt to eDRAM, but there were a number of places where we said, 'gosh, it would sure be nice if an entire render target didn't have to live in eDRAM' and so we fixed that on Xbox One where we have the ability to overflow from ESRAM into DDR3, so the ESRAM is fully integrated into our page tables and so you can kind of mix and match the ESRAM and the DDR memory as you go... From my perspective it's very much an evolution and improvement - a big improvement - over the design we had with the Xbox 360. I'm kind of surprised by all this, quite frankly."

Gallery: The recent Hot Chips 25 presentation at Stanford University saw Microsoft giving a more in-depth presentation on the Xbox One processor and Kinect - you can see the processor-specific elements of the talk here.

Indeed, the level of coherence between the ESRAM and the DDR3 memory pools sounds much more flexible than many previously thought. Many believed that the 32MB of ESRAM is a hard limit for render targets - so can developers really "mix and match" as Goossen suggests?

"Oh, absolutely. And you can even make it so that portions of your render target that have very little overdraw... for example if you're doing a racing game and your sky has very little overdraw, you could stick those sub-sets of your resources into DDR to improve ESRAM utilisation," he says, while also explaining that custom formats have been implemented to get more out of that precious 32MB.

"On the GPU we added some compressed render target formats like our 6e4 [6-bit mantissa and 4-bit exponent per component] and 7e3 HDR float formats that were very, very popular on Xbox 360, which instead of doing a 16-bit float per component 64bpp render target, you can do the equivalent with us using 32 bits - so we did a lot of focus on really maximising efficiency and utilisation of that ESRAM."

How ESRAM bandwidth doubled in production hardware

Further scepticism surrounds the sudden leap in ESRAM's bandwidth from an initial 102GB/s to where it is now - 204GB/s. We ran the story first based on a developer leak of a blog post the Microsoft tech team wrote back in April, but sections of "the internet" were not convinced. Critics say that the numbers don't add up. So how did the massive increase in bandwidth come about?

"When we started, we wrote a spec," explains Nick Baker. "Before we really went into any implementation details, we had to give developers something to plan around before we had the silicon, before we even had it running in simulation before tape-out, and said that the minimum bandwidth we want from the ESRAM is 102GB/s. That became 109GB/s [with the GPU speed increase]. In the end, once you get into implementing this, the logic turned out that you could go much higher."

The big revelation was that ESRAM could actually read and write at the same time, a statement that seemingly came out of the blue. Some believed that based on the available information from the leaked whitepapers, this simply wasn't possible.

"There are four 8MB lanes, but it's not a contiguous 8MB chunk of memory within each of those lanes. Each lane, that 8MB is broken down into eight modules. This should address whether you can really have read and write bandwidth in memory simultaneously," says Baker.

"Yes you can - there are actually a lot more individual blocks that comprise the whole ESRAM so you can talk to those in parallel. Of course if you're hitting the same area over and over and over again, you don't get to spread out your bandwidth and so that's one of the reasons why in real testing you get 140-150GB/s rather than the peak 204GB/s... it's not just four chunks of 8MB memory. It's a lot more complicated than that and depending on how the pattern you get to use those simultaneously. That's what lets you do read and writes simultaneously. You do get to add the read and write bandwidth as well adding the read and write bandwidth on to the main memory. That's just one of the misconceptions we wanted to clean up."

How ESRAM bandwidth is calculated

Memory bandwidth for next-gen consoles is clearly a hot topic in tech discussions. GDDR5 is a known technology and its capabilities in terms of throughput are well-known. ESRAM is a different matter entirely, and especially after Microsoft massively revised Xbox One's bandwidth figures upwards, there have been demands for the tech team to show their calculations. Here, Nick Baker does just that:

"[ESRAM has four memory controllers and each lane] is 256-bit making up a total of 1024 bits and that in each direction. 1024 bits for write will give you a max of 109GB/s and then there's separate read paths again running at peak would give you 109GB/s.

"What is the equivalent bandwidth of the ESRAM if you were doing the same kind of accounting that you do for external memory? With DDR3 you pretty much take the number of bits on the interface, multiply by the speed and that's how you get 68GB/s. That equivalent on ESRAM would be 218GB/s. However just like main memory, it's rare to be able to achieve that over long periods of time so typically an external memory interface you run at 70-80 per cent efficiency.

"The same discussion with ESRAM as well - the 204GB/s number that was presented at Hot Chips is taking known limitations of the logic around the ESRAM into account. You can't sustain writes for absolutely every single cycle. The writes is known to insert a bubble [a dead cycle] occasionally... one out of every eight cycles is a bubble so that's how you get the combined 204GB/s as the raw peak that we can really achieve over the ESRAM. And then if you say what can you achieve out of an application - we've measured about 140-150GB/s for ESRAM.

"That's real code running. That's not some diagnostic or some simulation case or something like that. That is real code that is running at that bandwidth. You can add that to the external memory and say that that probably achieves in similar conditions 50-55GB/s and add those two together you're getting in the order of 200GB/s across the main memory and internally."

So 140-150GB/s is a realistic target and DDR3 bandwidth can really be added on top?

"Yes. That's been measured."

Goossen lays down the bottom line:

"If you're only doing a read you're capped at 109GB/s, if you're only doing a write you're capped at 109GB/s," he says. "To get over that you need to have a mix of the reads and the writes but when you are going to look at the things that are typically in the ESRAM, such as your render targets and your depth buffers, intrinsically they have a lot of read-modify-writes going on in the blends and the depth buffer updates. Those are the natural things to stick in the ESRAM and the natural things to take advantage of the concurrent read/writes."

Microsoft's argument seems pretty straightforward then. In theory, Xbox One's circa 200GB/s of "real-life" bandwidth trumps PS4's 176GB/s peak throughput. The question is just to what extent channelling resources through the relatively tiny 32MB of the much faster ESRAM is going to cause issues for developers. Microsoft's point is that game-makers have experience of this already owing to the eDRAM set-up on Xbox 360 - and ESRAM is the natural evolution of the same system.

Microsoft says that game performance doesn't scale linearly with the number of compute units you have. We put that theory to the test by comparing a 2GB Radeon 7850 with a 2GB Radeon 7870 XT, both downclocked to 600MHz (to more accurately reflect compute power of the two systems in the initial specs) and with identical memory bandwidth. Across ten tests we found that 50 per cent more compute power actually yielded an average of 24 per cent improvement in game frame-rates.

Memory bandwidth is one thing, but graphics capability is clearly another. PlayStation 4 enjoys a clear advantage in terms of on-board GPU compute units - a raw stat that is beyond doubt, and in turn offers a huge boost to PS4's enviable spec sheet. Andrew Goossen first confirms that the graphics tech in both Xbox One and PS4 is derived from the same AMD "Island" family before addressing the Microsoft console's apparent GPU deficiency in depth.

"Just like our friends we're based on the Sea Islands family. We've made quite a number of changes in different parts of the areas... The biggest thing in terms of the number of compute units, that's been something that's been very easy to focus on. It's like, hey, let's count up the number of CUs, count up the gigaflops and declare the winner based on that. My take on it is that when you buy a graphics card, do you go by the specs or do you actually run some benchmarks?" he says.

"Firstly though, we don't have any games out. You can't see the games. When you see the games you'll be saying, 'what is the performance difference between them'. The games are the benchmarks. We've had the opportunity with the Xbox One to go and check a lot of our balance. The balance is really key to making good performance on a games console. You don't want one of your bottlenecks being the main bottleneck that slows you down."
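
Baker's ESRAM accounting from earlier in this section can be reproduced in a few lines. This is a sketch of the arithmetic as described in the interview, not Microsoft's own calculation: the 800MHz and 853MHz clocks are assumptions taken from publicly reported specs (the interview itself only mentions a 6.6 per cent clock increase), and the write-bubble model is the simplest reading of "one out of every eight cycles".

```python
ESRAM_BUS_BITS = 4 * 256    # four 256-bit controllers in each direction, per Baker
ORIGINAL_CLOCK_HZ = 800e6   # assumed clock behind the original 102GB/s spec
FINAL_CLOCK_HZ = 853e6      # assumed final clock, about 6.6 per cent higher


def one_way_gb_s(clock_hz: float) -> float:
    """Peak bandwidth in one direction (read-only or write-only), in GB/s."""
    return ESRAM_BUS_BITS / 8 * clock_hz / 1e9


spec_minimum = one_way_gb_s(ORIGINAL_CLOCK_HZ)   # the early ~102GB/s figure
read_peak = one_way_gb_s(FINAL_CLOCK_HZ)         # the ~109GB/s cap per direction
naive_combined = 2 * read_peak                   # the ~218GB/s "DDR3-style" figure

# Writes lose one cycle in eight to a bubble, so only 7/8 of the write peak counts.
peak_with_bubbles = read_peak + read_peak * 7 / 8

# Measured ranges quoted by Baker: 140-150GB/s ESRAM plus 50-55GB/s DDR3.
combined_low, combined_high = 140 + 50, 150 + 55

print(f"Original spec:       {spec_minimum:.1f} GB/s")
print(f"Per-direction peak:  {read_peak:.1f} GB/s")
print(f"Naive read + write:  {naive_combined:.1f} GB/s")
print(f"Peak with bubbles:   {peak_with_bubbles:.1f} GB/s")
print(f"Measured, ESRAM + DDR3: {combined_low}-{combined_high} GB/s")
```

With those assumed clocks, the numbers line up with the 102GB/s, 109GB/s, 218GB/s and 204GB/s figures Baker quotes, and the measured ranges sum to "in the order of 200GB/s".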

Tweaking Xbox One balance and performance

Microsoft's approach was to go into production knowing that there'd be some headroom for increasing performance from the final silicon. Goossen describes it as "under-tweaking" the system. Actual in-production games were then used to determine how to make use of the available headroom.

"Balance is so key to real effective performance. It's been really nice on Xbox One with Nick and his team - the system design folks have built a system where we've had the opportunity to check our balances on the system and make tweaks accordingly," Goossen reveals. "Did we do a good job when we did all of our analysis and simulations a couple of years ago, and guessing where games would be in terms of utilisation. Did we make the right balance decisions back then? And so raising the GPU clock is the result of going in and tweaking our balance."

"We knew we had headroom. We didn't know what we wanted to do with it until we had real titles to test on. How much do you increase the GPU by? How much do you increase the CPU by?" asks Nick Baker.

"We had the headroom. It's a glorious thing to have on a console launch. Normally you're talking about having to downclock," says Goossen. "We had a once in a lifetime opportunity to go and pick the spots where we wanted to improve the performance and it was great to have the launch titles to use as the way to drive an informed decision on performance improvements we could get out of the headroom."

Goossen also reveals that the Xbox One silicon actually contains additional compute units - as we previously speculated. The presence of that redundant hardware (two CUs are disabled on retail consoles) allowed Microsoft to judge the importance of compute power versus clock-speed:

"Every one of the Xbox One dev kits actually has 14 CUs on the silicon. Two of those CUs are reserved for redundancy in manufacturing, but we could go and do the experiment - if we were actually at 14 CUs what kind of performance benefit would we get versus 12? And if we raised the GPU clock what sort of performance advantage would we get? And we actually saw on the launch titles - we looked at a lot of titles in a lot of depth - we found that going to 14 CUs wasn't as effective as the 6.6 per cent clock upgrade that we did."

Assuming linear scaling of compute power with the addition of two extra CUs, the maths may not sound right here, but as our recent analysis - not to mention PC benchmarks - reveals, AMD compute units don't scale in a linear fashion. There's a law of diminishing returns.

Gallery: Microsoft used actual launch titles to refine the balance of the Xbox One system - doubtless, Forza Motorsport 5 would have been one of them.

"Everybody knows from the internet that going to 14 CUs should have given us almost 17 per cent more performance," he says, "but in terms of actual measured games - what actually, ultimately counts - is that it was a better engineering decision to raise the clock. There are various bottlenecks you have in the pipeline that can cause you not to get the performance you want if your design is out of balance."

"Increasing the frequency impacts the whole of the GPU whereas adding CUs beefs up shaders and ALU," interjects Nick Baker.

"Right. By fixing the clock, not only do we increase our ALU performance, we also increase our vertex rate, we increase our pixel rate and ironically increase our ESRAM bandwidth," continues Goossen. "But we also increase the performance in areas surrounding bottlenecks like the drawcalls flowing through the pipeline, the performance of reading GPRs out of the GPR pool, etc. GPUs are giantly complex. There's gazillions of areas in the pipeline that can be your bottleneck in addition to just ALU and fetch performance."