So here we go - a complete transcript of Digital Foundry's discussions on the Xbox One architecture with two integral members of the team that helped create the hardware. We're looking at around an hour's worth of very dense tech talk here, much of which you will not have seen before.

But first, a little background. How did this opportunity come about? At Gamescom in August, it became clear that Microsoft was looking to adjust its stance on how it talked about its hardware from a technological perspective. Almost certainly this came about owing to an overall spec sheet that does not look too encouraging compared to the equivalent metrics being offered by Sony for the PlayStation 4, and it was clear that gamer interpretations of some of the specs didn't quite square with Microsoft's thinking over its design.

Over and above the upcoming console war though, it's clear that Xbox One has been designed with a very different philosophy in mind, with some ambitious tech powering elements such as concurrent apps and multiple virtual machines. There's a very different approach to GPU compute too - not to mention the whole balance argument. Coming out of the experience, it was clear that this was a story that the architects were passionate about and very much wanted to tell.

That said, Microsoft does have a history in sharing in-depth data on the make-up of its console architectures, and its presentation at Hot Chips 25 this year at Stanford University indicated that the design team were willing to talk in detail about the silicon to a degree beyond what Sony are willing to share - which is perhaps understandable on the PlayStation front when you have a spec sheet that essentially does most of the talking for you.

"For Microsoft, this was an opportunity to explain a design philosophy that core gamers aren't connecting with so easily."

So the question many of you are no doubt asking is, are we looking at a free-flowing technical discussion or a PR exercise? Well, let's not kid ourselves - every interview that reaches publication is some form of public relations for the interviewee and that applies equally whether we're talking to Microsoft, Sony or anybody else. Perhaps the lingering disappointment for us with our Mark Cerny interview was the fact that it quickly became evident he was not going to let us into much that he hadn't already covered elsewhere. It's also fair to say that the impressive specs, well-rounded line-up and a phenomenally well-managed PR strategy have left Sony in a very favourable position, with nothing to prove - for now, at least.

For Microsoft, things are clearly very different. It's a case of explaining a design philosophy that core gamers aren't connecting with so easily, while at the same time getting across the message that the technological prowess of a games console isn't limited just to the compute power of the GPU or the memory set-up - though ironically, in combination with the quality of the development environment, these are the very strengths that allowed Xbox 360 to dominate the early years of the current-gen console battle.

Onto the discussion then - perhaps Digital Foundry's most expansive hardware interview yet, kicking off with the requisite conference call introductions...

Andrew Goossen: My name is Andrew Goossen - I'm a technical fellow at Microsoft. I was one of the architects for the Xbox One. I'm primarily involved with the software side but I've worked a lot with Nick and his team to finalise the silicon. For designing a good, well-balanced console you really need to be considering all the aspects of software and hardware. It's really about combining the two to achieve a good balance in terms of performance. We're actually very pleased to have the opportunity to talk with you about the design. There's a lot of misinformation out there and a lot of people who don't get it. We're actually extremely proud of our design. We think we have very good balance, very good performance, we have a product which can handle things other than just raw ALU. There's also quite a number of other design aspects and requirements that we put in around things like latency, steady frame-rates and that the titles aren't interrupted by the system and other things like that. You'll see this very much as a pervasive ongoing theme in our system design.

Nick Baker: I'm Nick Baker, I manage the hardware architecture team. We've worked on pretty much all instances of the Xbox. My team is really responsible for looking at all the available technologies. We're constantly looking to see where graphics are going - we work a lot with Andrew and the DirectX team in terms of understanding that. We have a good relationship with a lot of other companies in the hardware industry and really the organisation looks to us to formulate the hardware, what technologies are going to be appropriate for any given point in time. When we start looking at what's the next console going to look like, we're always on top of the roadmap, understanding where that is and how appropriate to combine with game developers and software technology and get that all together. I manage the team. You may have seen John Sell who presented at Hot Chips, he's one of my organisation.
Going back even further I presented at Hot Chips with Jeff Andrews in 2005 on the architecture of the Xbox 360. We've been doing this for a little while - as has Andrew. Andrew said it pretty well: we really wanted to build a high-performance, power-efficient box. We really wanted to make it relevant to the modern living room. Talking about AV, we're the only ones to put in an AV in and out to make it media hardware that's the centre of your entertainment.

Xbox One comes from much the same team that designed Xbox 360, shown here in its classic launch format. The design team opted to match a state-of-the-art (for its time) graphics chip with a multi-core approach to the CPU at a time when a single, powerful processor was the vogue in PC design.

Digital Foundry: What were your takeaways from your Xbox 360 post-mortem and how did that shape what you wanted to achieve with the Xbox One architecture?

Nick Baker: It's hard to pick out a few aspects we can talk about here in a small amount of time. I think one of the key points... We took a few gambles last time around and one of them was to go with a multi-processor approach rather than go with a small number of high IPC [instructions per clock] power-hungry CPU cores. We took the approach of going more parallel with cores more optimised for power/performance area. That worked out pretty well... There are a few things we realised like off-loading audio, we had to tackle that, hence the investment in the audio block. We wanted to have a single chip from the start and get everything as close to memory as possible. Both the CPU and GPU - give everything low latency and high bandwidth - that was the key mantra. Some obvious things we had to deal with - a new configuration of memory, we couldn't really pass pointers from CPU to GPU so we really wanted to address that, heading towards GPGPU, compute shaders. Compression, we invested a lot in that so hence some of the Move Engines, which deal with a lot of the compression there... A lot of focus on GPU capabilities in terms of how that worked. And then really how do you allow the system services to grow over time without impacting title compatibility. The first title of the generation - how do you ensure that that works on the last console ever built while we value-enhance the system-side capabilities.

Digital Foundry: You're running multiple systems in a single box, in a single processor.
Was that one of the most significant challenges in designing the silicon?

Nick Baker: There was a lot of bitty stuff to do. We had to make sure that the whole system was capable of virtualisation, making sure everything had page tables, the IO had everything associated with them. Virtualised interrupts... It's a case of making sure the IP we integrated into the chip played well within the system. Andrew?

Andrew Goossen: I'll jump in on that one. Like Nick said there's a bunch of engineering that had to be done around the hardware but the software has also been a key aspect in the virtualisation. We had a number of requirements on the software side which go back to the hardware. To answer your question Richard, from the very beginning the virtualisation concept drove an awful lot of our design. We knew from the very beginning that we did want to have this notion of this rich environment that could be running concurrently with the title. It was very important for us based on what we learned with the Xbox 360 that we go and construct this system that would disturb the title - the game - in the least bit possible and so to give as varnished an experience on the game side as possible but also to innovate on either side of that virtual machine boundary. We can do things like update the operating system on the system side of things while retaining very good compatibility with the portion running on the titles, so we're not breaking back-compat with titles because titles have their own entire operating system that ships with the game. Conversely it also allows us to innovate to a great extent on the title side as well. With the architecture, from SDK to SDK release as an example we can completely rewrite our operating system memory manager for both the CPU and the GPU, which is not something you can do without virtualisation. It drove a number of key areas... Nick talked about the page tables.
Some of the new things we have done - the GPU does have two layers of page tables for virtualisation. I think this is actually the first big consumer application of a GPU that's running virtualised. We wanted virtualisation to have that isolation, that performance. But we could not go and impact performance on the title. We constructed virtualisation in such a way that it doesn't have any overhead cost for graphics other than for interrupts. We've contrived to do everything we can to avoid interrupts... We only do two per frame. We had to make significant changes in the hardware and the software to accomplish this. We have hardware overlays where we give two layers to the title and one layer to the system and the title can render completely asynchronously and have them presented completely asynchronously to what's going on system-side. System-side it's all integrated with the Windows desktop manager but the title can be updating even if there's a glitch - like the scheduler on the Windows system side going slower... We did an awful lot of work on the virtualisation aspect to drive that and you'll also find that running multiple systems drove a lot of our other systems. We knew we wanted to be 8GB and that drove a lot of the design around our memory system as well.
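
The "two layers of page tables" idea can be illustrated with a generic two-stage address translation. This is only a sketch of the general technique: the 4KB page size, the dictionary-based tables and the function names are assumptions made for illustration, not details of the Xbox One GPU.

```python
# Generic two-stage (nested) address translation, as used in virtualisation.
# Everything here is hypothetical: page size, table layout and names.

PAGE_SIZE = 4096  # assumed page size, for illustration only


def translate(page_table, address):
    """Map an address through one page table (page number -> page number)."""
    page, offset = divmod(address, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset


def translate_two_stage(guest_table, host_table, guest_virtual):
    """First layer is owned by the guest OS, second layer by the hypervisor."""
    guest_physical = translate(guest_table, guest_virtual)
    return translate(host_table, guest_physical)


# The guest OS can rearrange guest_table freely (for example, a rewritten
# memory manager) without the hypervisor's host_table changing at all.
guest_table = {0: 5, 1: 2}
host_table = {5: 40, 2: 17}
system_address = translate_two_stage(guest_table, host_table, 0x1010)
print(hex(system_address))
```

The point of the second layer is isolation: each side of the virtual machine boundary manages its own mapping without needing to know how the other side lays out memory.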

"With the architecture, from SDK to SDK release as an example we can completely rewrite our operating system memory manager for both the CPU and the GPU, which is not something you can do without virtualisation." The ability to run apps concurrently with the game with zero impact on performance required a significant amount of engineering, but the final result does work very well. What will make or break the system will be the quality of the apps themselves - certainly functions like party set-up and video editing work nicely. Digital Foundry: Were you always targeting 8GB right from the beginning? Andrew Goossen: Yeah I think that was a pretty early decision we made when we were looking at the kind of experiences that we wanted to run concurrently with the title. And how much memory we would need there. That would have been a really early decision for us. Digital Foundry: CPU-side, I'm curious. Why did you choose eight Jaguar cores rather than, say, four Piledriver cores? Is it all about performance per watt? Nick Baker: The extra power and area associated with getting that additional IPC boost going from Jaguar to Piledriver... It's not the right decision to make for a console. Being able to hit the sweet spot of power/performance per area and make it a more parallel problem. That's what it's all about. How we're partitioning cores between the title and the operating system works out as well in that respect. Digital Foundry: Is it essentially the Jaguar IP as is? Or did you customise it? Nick Baker: There had not been a two-cluster Jaguar configuration before Xbox One so there were things that had to be done in order to make that work. 
We wanted higher coherency between the GPU and the CPU so that was something that needed to be done, that touched a lot of the fabric around the CPU and then looking at how the Jaguar core implemented virtualisation, doing some tweaks there - but nothing fundamental to the ISA or adding instructions like that.

Digital Foundry: You talk about having 15 processors. Can you break that down?

Nick Baker: On the SoC, there are many parallel engines - some of those are more like CPU cores or DSP cores. How we count to 15: [we have] eight inside the audio block, four move engines, one video encode, one video decode and one video compositor/resizer. The audio block was completely unique. That was designed by us in-house. It's based on four Tensilica DSP cores and several programmable processing engines. We break it up as one core running control, two cores running a lot of vector code for speech and one for general purpose DSP. We couple with that sample rate conversion, filtering, mixing, equalisation, dynamic range compensation then also the XMA audio block. The goal was to run 512 simultaneous voices for game audio as well as being able to do speech pre-processing for Kinect.

Digital Foundry: There's concern that custom hardware may not be utilised in multi-platform games but I'm assuming that hardware-accelerated functions would be integrated into middlewares and would see wide utilisation.

Nick Baker: Yeah, Andrew can talk about the middleware point but some of these things are just reserved for the system to do things like Kinect processing. These are system services we provide. Part of that processing is dedicated to the Kinect.

Andrew Goossen: So a lot of what we've designed for the system and the system reservation is to offload a lot of the work from the title and onto the system. You have to keep in mind that this is doing a bunch of work that is actually on behalf of the title.
We're taking on the voice recognition mode in our system reservations whereas other platforms will have that as code that developers will have to link in and pay for out of their budget. Same thing with Kinect and most of our NUI [Natural User Interface] features are provided free for the games - also the Game DVR.

Digital Foundry: Perhaps the most misunderstood area of the processor is the ESRAM and what it means for game developers. Its inclusion sort of suggests that you ruled out GDDR5 pretty early on in favour of ESRAM in combination with DDR3. Is that a fair assumption?

Nick Baker: Yeah, I think that's right. In terms of getting the best possible combination of performance, memory size, power, the GDDR5 takes you into a little bit of an uncomfortable place. Having ESRAM costs very little power and has the opportunity to give you very high bandwidth. You can reduce the bandwidth on external memory - that saves a lot of power consumption as well and the commodity memory is cheaper as well so you can afford more. That's really a driving force behind that. You're right, if you want a high memory capacity, relatively low power and a lot of bandwidth there are not too many ways of solving that.

Some say that the Xbox One's architecture is complicated in comparison to PlayStation 4. Microsoft itself describes the split memory set-up as the natural evolution of Xbox 360's eDRAM/GDDR3 combination.

Digital Foundry: And there wasn't really any actual guarantee of availability of four-gigabit GDDR5 modules in time for launch.
That's the gamble that Sony made which seems to have paid off. Even up until very recently, the PS4 SDK docs still refer to 4GB of RAM. I guess Intel's Haswell with eDRAM is the closest equivalent to what you're doing. Why go for ESRAM rather than eDRAM? You had a lot of success with this on Xbox 360.

Nick Baker: It's just a matter of who has the technology available to do eDRAM on a single die.

Digital Foundry: So you didn't want to go for a daughter die as you did with Xbox 360?

Nick Baker: No, we wanted a single processor, like I said. If there'd been a different time frame or technology options we could maybe have had a different technology there but for the product in the timeframe, ESRAM was the best choice.

Digital Foundry: If we look at the ESRAM, the Hot Chips presentation revealed for the first time that you've got four blocks of 8MB areas. How does that work?

Nick Baker: First of all, there's been some question about whether we can use ESRAM and main RAM at the same time for GPU, and I'd point out that really you can think of the ESRAM and the DDR3 as making up eight total memory controllers, so there are four external memory controllers (which are 64-bit) which go to the DDR3 and then there are four internal memory controllers that are 256-bit that go to the ESRAM. These are all connected via a crossbar and so in fact it will be true that you can go directly, simultaneously to DRAM and ESRAM.

Digital Foundry: Simultaneously? Because there's been a lot of controversy that you're adding your bandwidth together and that you can't do this in a real-life scenario.

Nick Baker: Over that interface, each lane to ESRAM is 256-bit, making up a total of 1024 bits, and that's in each direction. 1024 bits for write will give you a max of 109GB/s and then there are separate read paths which, again running at peak, would give you 109GB/s. What is the equivalent bandwidth of the ESRAM if you were doing the same kind of accounting that you do for external memory...
With DDR3 you pretty much take the number of bits on the interface, multiply by the speed and that's how you get 68GB/s. That equivalent on ESRAM would be 218GB/s. However, just like main memory, it's rare to be able to achieve that over long periods of time so typically an external memory interface you run at 70-80 per cent efficiency. The same discussion with ESRAM as well - the 204GB/s number that was presented at Hot Chips is taking known limitations of the logic around the ESRAM into account. You can't sustain writes for absolutely every single cycle. The writes are known to insert a bubble [a dead cycle] occasionally... One out of every eight cycles is a bubble, so that's how you get the combined 204GB/s as the raw peak that we can really achieve over the ESRAM. And then if you say what can you achieve out of an application - we've measured about 140-150GB/s for ESRAM. That's real code running. That's not some diagnostic or some simulation case or something like that. That is real code that is running at that bandwidth. You can add that to the external memory and say that that probably achieves in similar conditions 50-55GB/s and add those two together you're getting in the order of 200GB/s across the main memory and internally.

One thing I should point out is that there are four 8MB lanes. But it's not a contiguous 8MB chunk of memory within each of those lanes. Each lane, that 8MB is broken down into eight modules. This should address whether you can really have read and write bandwidth in memory simultaneously. Yes you can; there are actually a lot more individual blocks that comprise the whole ESRAM so you can talk to those in parallel. Of course if you're hitting the same area over and over and over again, you don't get to spread out your bandwidth, and so that's one of the reasons why in real testing you get 140-150GB/s rather than the peak 204GB/s: it's not just four chunks of 8MB memory.
It's a lot more complicated than that and depending on the pattern you get to use those simultaneously. That's what lets you do reads and writes simultaneously. You do get to add the read and write bandwidth as well as adding the read and write bandwidth on to the main memory. That's just one of the misconceptions we wanted to clean up.

Andrew Goossen: If you're only doing a read you're capped at 109GB/s, if you're only doing a write you're capped at 109GB/s. To get over that you need to have a mix of the reads and the writes but when you are going to look at the things that are typically in the ESRAM, such as your render targets and your depth buffers, intrinsically they have a lot of read-modify-writes going on in the blends and the depth buffer updates. Those are the natural things to stick in the ESRAM and the natural things to take advantage of the concurrent read/writes.

Digital Foundry: So 140-150GB/s is a realistic target and you can integrate DDR3 bandwidth simultaneously?

Nick Baker: Yes. That's been measured.
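
The bandwidth accounting described in the answers above can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch, not an official calculation: the 853MHz ESRAM clock (800MHz before the GPU speed increase) and the 2133MT/s DDR3 data rate are assumptions that are not stated in the interview itself, and 1GB is taken as 10^9 bytes.

```python
# Rough check of the bandwidth figures quoted above (1 GB = 1e9 bytes).
# Assumed clocks, not stated in the interview: 853 MHz for the ESRAM
# (800 MHz before the GPU speed increase) and 2133 MT/s for the DDR3.


def peak_gb_per_s(bus_bits, transfers_per_second):
    """Peak bandwidth in GB/s: interface width times speed."""
    return bus_bits / 8 * transfers_per_second / 1e9


ESRAM_BITS = 4 * 256  # four internal controllers, 256 bits each, per direction
DDR3_BITS = 4 * 64    # four external controllers, 64 bits each

esram_spec = peak_gb_per_s(ESRAM_BITS, 800e6)     # the early minimum spec
esram_one_way = peak_gb_per_s(ESRAM_BITS, 853e6)  # read-only or write-only cap
esram_naive = 2 * esram_one_way                   # read + write, no bubbles

# One write cycle in eight is a bubble, so writes peak at 7/8 of the raw rate.
esram_combined = esram_one_way + esram_one_way * 7 / 8

ddr3_peak = peak_gb_per_s(DDR3_BITS, 2133e6)

# Measured ranges quoted in the interview, simply added together.
measured_total = (140 + 50, 150 + 55)

print(f"ESRAM spec minimum:  {esram_spec:.1f} GB/s")
print(f"ESRAM one direction: {esram_one_way:.1f} GB/s")
print(f"ESRAM naive peak:    {esram_naive:.1f} GB/s")
print(f"ESRAM with bubbles:  {esram_combined:.1f} GB/s")
print(f"DDR3 peak:           {ddr3_peak:.1f} GB/s")
print(f"Measured combined:   {measured_total[0]}-{measured_total[1]} GB/s")
```

Under those assumed clocks the results land on the quoted 102, 109, 218, 204 and 68GB/s figures to within rounding, and the measured ranges sum to 190-205GB/s, which is the "order of 200GB/s" mentioned above.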

"The Xbox One has a conservative 10 per cent time-sliced reservation on the GPU for system processing. This is used both for the GPGPU processing for Kinect and for the rendering of concurrent system content such as snap mode." The Kinect debug tools offer an intriguing look at how the camera views the world. The challenge Microsoft and game developers face is integrating the tech effectively into games. That said, what we've seen of how Kinect powers the new dash is really impressive stuff. Digital Foundry: On the leaked whitepapers, peak bandwidth was a lot smaller and then suddenly we ran a story [based on an internal Xbox One development blog] saying that your peak bandwidth doubled with production silicon. Was that expected? Were you being conservative? Or did you get hands-on time with your final processor and figured out that - wow - it can do this? Nick Baker: When we started, we wrote a spec. Before we really went into any implementation details, we had to give developers something to plan around before we had the silicon, before we even had it running in simulation before tape-out, and said that the minimum bandwidth we want from the ESRAM is 102GB/s. That became 109GB/s [with the GPU speed increase]. In the end, once you get into implementing this, the logic turned out that you could go much higher. Andrew Goossen: I just wanted to jump in from a software perspective. This controversy is rather surprising to me, especially when you view ESRAM as the evolution of eDRAM from the Xbox 360. No-one questions on the Xbox 360 whether we can get the eDRAM bandwidth concurrent with the bandwidth coming out of system memory. In fact, the system design required it. We had to pull over all of our vertex buffers and all of our textures out of system memory concurrent with going on with render targets, colour, depth, stencil buffers that were in eDRAM. 
Of course with Xbox One we're going with a design where ESRAM has the same natural extension that we had with eDRAM on Xbox 360, to have both going concurrently. It's a nice evolution of the Xbox 360 in that we could clean up a lot of the limitations that we had with the eDRAM. The Xbox 360 was the easiest console platform to develop for, it wasn't that hard for our developers to adapt to eDRAM, but there were a number of places where we said, "Gosh, it would sure be nice if an entire render target didn't have to live in eDRAM," and so we fixed that on Xbox One where we have the ability to overflow from ESRAM into DDR3 so the ESRAM is fully integrated into our page tables and so you can kind of mix and match the ESRAM and the DDR memory as you go. Sometimes you want to get the GPU texture out of memory and on Xbox 360 that required what's called a "resolve pass" where you had to do a copy into DDR to get the texture out - that was another limitation we removed in ESRAM, as you can now texture out of ESRAM if you want to. From my perspective it's very much an evolution and improvement - a big improvement - over the design we had with the Xbox 360. I'm kind of surprised by all this, quite frankly.

Digital Foundry: Obviously though, you are limited to just 32MB of ESRAM. Potentially you could be looking at say, four 1080p render targets, 32 bits per pixel, 32 bits of depth - that's 48MB straight away. So are you saying that you can effectively separate render targets so that some live in DDR3 and the crucial high-bandwidth ones reside in ESRAM?

Andrew Goossen: Oh, absolutely. And you can even make it so that portions of your render target that have very little overdraw... For example, if you're doing a racing game and your sky has very little overdraw, you could stick those subsets of your resources into DDR to improve ESRAM utilisation.
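
To put rough numbers on the render-target question above, the sketch below computes the size of a 1080p surface and shows one naive way of splitting surfaces between ESRAM and DDR3. The surface names, the priority order and the greedy `place` function are all hypothetical, and the exact total depends on how many colour and depth surfaces you count. It also places whole surfaces only, whereas the answer above describes moving portions of a render target as well.

```python
# Illustrative sizing of 1080p surfaces against a 32MB ESRAM budget.
# The surface list and the greedy placement policy are hypothetical.

ESRAM_MIB = 32


def surface_mib(width, height, bits_per_pixel):
    """Size of an uncompressed surface in MiB (2**20 bytes)."""
    return width * height * bits_per_pixel / 8 / 2**20


def place(surfaces, budget_mib=ESRAM_MIB):
    """Greedy split: surfaces listed first get ESRAM until it runs out."""
    esram, ddr3, used = [], [], 0.0
    for name, size in surfaces:
        if used + size <= budget_mib:
            esram.append(name)
            used += size
        else:
            ddr3.append(name)
    return esram, ddr3


colour = surface_mib(1920, 1080, 32)  # one 32-bit colour target
depth = surface_mib(1920, 1080, 32)   # one 32-bit depth buffer
four_plus_depth = 4 * colour + depth

# Highest-bandwidth surfaces first; the last colour target spills to DDR3.
surfaces = [("depth", depth)] + [(f"colour{i}", colour) for i in range(4)]
in_esram, in_ddr3 = place(surfaces)

print(f"one 1080p 32-bit surface: {colour:.2f} MiB")
print(f"four colour + one depth:  {four_plus_depth:.2f} MiB")
print("ESRAM:", in_esram, "DDR3:", in_ddr3)
```

Each 32-bit 1080p surface is just under 8MiB, so four of them fill the ESRAM almost exactly and anything beyond that has to overflow into DDR3.
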
On the GPU we added some compressed render target formats like our 6e4 [six bit mantissa and four bits exponent per component] and 7e3 HDR float formats that were very, very popular on Xbox 360, which instead of doing a 16-bit float per component 64bpp render target, you can do the equivalent with us using 32 bits - so we did a lot of focus on really maximizing efficiency and utilisation of that ESRAM.

Digital Foundry: And you have CPU read access to the ESRAM, right? This wasn't available on Xbox 360 eDRAM.

Nick Baker: We do but it's very slow.

Digital Foundry: There's been some discussion online about low-latency memory access on ESRAM. My understanding of graphics technology is that you forego latency and you go wide, you parallelise over however many compute units are available. Does low latency here materially affect GPU performance?

Nick Baker: You're right. GPUs are less latency sensitive. We've not really made any statements about latency.

Digital Foundry: DirectX as an API is very mature now. Developers have got a lot of experience with it. To what extent do you think this is an advantage for Xbox One? Bearing in mind how mature the API is, could you optimise the silicon around it?

Andrew Goossen: To a large extent we inherited a lot of DX11 design. When we went with AMD, that was a baseline requirement. When we started off the project, AMD already had a very nice DX11 design. The API on top, yeah I think we'll see a big benefit. We've been doing a lot of work to remove a lot of the overhead in terms of the implementation and for a console we can go and make it so that when you call a D3D API it writes directly to the command buffer to update the GPU registers right there in that API function without making any other function calls. There's not layers and layers of software. We did a lot of work in that respect. We also took the opportunity to go and highly customise the command processor on the GPU.
Again concentrating on CPU performance... The command processor block's interface is a very key component in making the CPU overhead of graphics quite efficient. We know the AMD architecture pretty well - we had AMD graphics on the Xbox 360 and there were a number of features we used there. We had features like pre-compiled command buffers where developers would go and pre-build a lot of their states at the object level where they would [simply] say, "run this". We implemented it on Xbox 360 and had a whole lot of ideas on how to make that more efficient [and with] a cleaner API, so we took that opportunity with Xbox One and with our customised command processor we've created extensions on top of D3D which fit very nicely into the D3D model and this is something that we'd like to integrate back into mainline D3D on the PC too - this small, very low-level, very efficient object-orientated submission of your draw [and state] commands.

"The biggest thing in terms of the number of compute units, that's been something that's been very easy to focus on. It's like, hey, let's count up the number of CUs, count up the gigaflops and declare the winner based on that."