Core and Interconnect

At IDF we finally learned some more about the Skylake core architecture powering the 6th generation processors from Intel.

The Skylake architecture is Intel’s first to get a full release on the desktop in more than two years. While that might not seem like a long time in the grand scheme of technology, for our readers and viewers it is a noticeable break from the cadence Intel established with its tick-tock model of releases. Yes, Broadwell was released last year and was a solid product, but Intel focused almost exclusively on the mobile platforms (notebooks and tablets) with it. Skylake will be far more ubiquitous, and will get there much more quickly, than even Haswell.

Skylake represents Intel’s most scalable architecture to date. I don’t mean only frequency scaling, though that is an important part of this design, but rather scaling across market segments. Thanks to brilliant engineering and design from Intel’s Israeli group, Intel will be launching Skylake designs ranging from 4.5 watt TDP Core M solutions all the way up to the 91 watt desktop processors that we have already reviewed in the Core i7-6700K. That’s a range that we really haven’t seen before; in the past Intel has depended on the Atom architecture to make up ground on the lowest power platforms. While I don’t know for sure if Atom is finally going the way of the dodo once Skylake’s rollout is complete, it does make me wonder how much life is left there.

Scalability also refers to the package size – something that ensures that the designs the engineers created can actually be built and run in the platform segments they are targeting. Starting with the desktop designs for LGA platforms (DIY market), which use a 1400 mm² package on the 91 watt TDP implementation, Intel is scaling all the way down to 330 mm² in a BGA1515 package for the 4.5 watt TDP designs. Only with a total product size like that can you hope to get Skylake into a form factor like the Compute Stick – which is exactly what Intel is doing. Note that the smaller packages also have to include the platform IO chip, something that H- and S-series CPUs can leave to the motherboard.
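To put that range in perspective, here is a quick back-of-the-envelope calculation using only the TDP and package-area figures quoted above. The variable and function names are mine and purely illustrative.

```python
# Figures quoted in the article; the dictionary and function names are illustrative.
TDP_WATTS = {"Core M (BGA1515)": 4.5, "Core i7-6700K (LGA)": 91.0}
PACKAGE_AREA_MM2 = {"Core M (BGA1515)": 330.0, "Core i7-6700K (LGA)": 1400.0}


def span_ratio(values):
    """Return the largest value divided by the smallest value."""
    return max(values.values()) / min(values.values())


tdp_ratio = span_ratio(TDP_WATTS)
area_ratio = span_ratio(PACKAGE_AREA_MM2)

print(f"TDP range: {tdp_ratio:.1f}x")
print(f"Package area range: {area_ratio:.1f}x")
```

In other words, the power budget spans roughly 20x from the smallest part to the largest, while the package area spans only a little over 4x.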

Finally, scalability will also include performance scaling. Clearly the 4.5 watt part will not offer the user the same performance with the same goals as the 91 watt Core i7-6700K. The screen resolution, attached accessories and target applications allow Intel to be selective about how much power they require for each series of Skylake CPUs.

Core Microarchitecture

The fundamental design theory in Skylake is very similar to what exists today in Broadwell and Haswell, with a handful of significant changes and hundreds of minor ones that make Skylake a large step ahead of previous designs.

This slide from Julius Mandelblat, Intel Senior Principal Engineer, shows a high-level overview of the entirety of the consumer integration of Skylake. You can see that Intel’s goals included a bigger and wider core design, higher frequency, improved ring architecture and fabric design, and more options for eDRAM integration. Readers of PC Perspective will already know that Skylake supports both DDR3L and DDR4 memory technologies, but the inclusion of the camera ISP is new information for us.

The Skylake core has had minor changes and nip-tucks done across the board that add up to a significant gen-on-gen IPC improvement. These include things you might normally expect – branch prediction improvements and buffer capacity increases, faster prefetch capability, and deeper out-of-order buffers for better instruction parallelism. The execution units themselves have also been improved with lower latencies, more units and better power efficiency when not in use. Load and store bandwidth has been increased in the core with deeper store and fill buffers, better page miss handling and L2 cache miss speed-ups. Even HyperThreading is slightly improved with a wider retirement protocol.

This table showcases the slight, iterative improvements from Sandy Bridge to Haswell and now to Skylake. Some of these changes are more substantial than you might have expected from previous steps. In-flight stores are increased by 33% and scheduler entries are upped by more than 60%. Individually these changes might not mean much, but combined they show improved parallelism for modern applications and operating systems.
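The percentages above can be checked with a couple of lines of arithmetic. The table itself is not reproduced in this text, so the buffer sizes below are the values as I recall them from Intel's IDF slide (42 and 56 in-flight stores, 60 and 97 scheduler entries for Haswell and Skylake respectively); treat them as an assumption and verify them against the slide.

```python
# Buffer sizes as recalled from Intel's IDF slide. Assumption: verify against
# the slide itself, since the table is not reproduced in this text.
HASWELL = {"in_flight_stores": 42, "scheduler_entries": 60}
SKYLAKE = {"in_flight_stores": 56, "scheduler_entries": 97}


def percent_increase(old, new):
    """Relative growth from old to new, expressed in percent."""
    return (new - old) / old * 100.0


for name in HASWELL:
    gain = percent_increase(HASWELL[name], SKYLAKE[name])
    print(f"{name}: {HASWELL[name]} -> {SKYLAKE[name]} (+{gain:.1f}%)")
```

With those inputs the in-flight store count grows by about a third and the scheduler by a little over 60%, which lines up with the figures quoted above.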

The core architecture also includes power optimizations, among them resource configuration that can gate off the power-hungry AVX2 hardware when it is not in use. More generally, resources that are not being used are scaled down somewhat. Scenario-based power optimizations, which should be useful for workloads like media playback, make for better mobile platforms; idle power is reduced, and the C1 state (for workloads with low performance requirements) improves dynamic capacitance.
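The reference to dynamic capacitance maps onto the textbook CMOS switching-power approximation, P = a * C * V^2 * f. The sketch below is a toy illustration of why switching less capacitance saves power at a fixed voltage and frequency. Every number in it is hypothetical and is not Skylake data, and the 20% reduction is simply a value I picked for the example.

```python
def dynamic_power(c_dyn_farads, voltage, freq_hz, activity=1.0):
    """Textbook CMOS switching-power approximation: P = a * C * V^2 * f."""
    return activity * c_dyn_farads * voltage ** 2 * freq_hz


# Hypothetical operating point, not measured Skylake data.
baseline = dynamic_power(c_dyn_farads=1.0e-9, voltage=1.0, freq_hz=3.0e9)

# Gating off an unused block is modelled here as removing 20% of the
# switched capacitance (an arbitrary figure chosen for illustration).
gated = dynamic_power(c_dyn_farads=0.8e-9, voltage=1.0, freq_hz=3.0e9)

print(f"baseline: {baseline:.2f} W, gated: {gated:.2f} W")
```

Because power is linear in capacitance in this model, the saving is proportional to the share of switched capacitance that is gated off.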

Interconnect and Memory Feature Improvements

Maybe the most impressive changes in Skylake come in the cache and memory architecture. The throughput on the LLC (last level cache) has been doubled when handling misses. The fabric, part of the ring bus design, has double the available internal bandwidth for moving data from agent to agent without increasing power and with only a 50% increase in transistor usage. Memory QoS has been improved to aid in the implementation of the new image signal processor (ISP) and higher resolution displays.

This fabric performance improvement should not be overlooked. With a move to DDR4 memory and changes to the eDRAM, this improvement is directly visible in synthetic testing and could be a way to gain some impressive performance in very specific workloads.

eDRAM performance and usability have been improved as well – it is observed by all memory accesses and is fully coherent. It can cache any data in the processor and there is no longer a need to flush it for maintenance purposes. It can be utilized by I/O devices and the display engine in order to take advantage of low power display refresh capabilities.
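As a rough mental model of a cache that sees every memory access regardless of which agent issued it, here is a toy sketch. It is not Intel's implementation: the class name, the LRU replacement policy and the agent labels are all assumptions made for illustration.

```python
from collections import OrderedDict


class MemorySideCache:
    """Toy LRU cache that sits in front of DRAM and sees every request."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()  # address -> data, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, agent, address, dram):
        # The agent name is only a label: CPU cores, I/O devices and the
        # display engine all take the same path through the cache.
        if address in self.lines:
            self.lines.move_to_end(address)
            self.hits += 1
            return self.lines[address]
        self.misses += 1
        data = dram[address]
        self.lines[address] = data
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        return data


dram = {addr: f"line-{addr}" for addr in range(8)}
edram = MemorySideCache(capacity_lines=4)

edram.read("cpu", 0, dram)      # miss: fills the cache from DRAM
edram.read("display", 0, dram)  # hit: the display reuses the line the CPU pulled in
edram.read("io", 1, dram)       # miss: first touch of this address

print(edram.hits, edram.misses)
```

The point of the sketch is the shared path: a line brought in by one agent can be served to another without a trip to DRAM.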

eDRAM Integration on Broadwell

In previous iterations of the eDRAM a portion of the LLC (25%) was used to hold the eDRAM access tags and the eDRAM wasn’t able to communicate with the rest of the system directly.

eDRAM Integration on Skylake

Skylake moves the eDRAM controller into the system agent, freeing up 512KB of LLC capacity while also giving other parts of the processor easier access to the data in the eDRAM. That memory can now interface with the main system DRAM directly and enable display refreshes without waking other portions of the processor that might be powered down during idle states.
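The two figures quoted for the tag storage (25% of the LLC on earlier parts, 512KB freed on Skylake) can be cross-checked with simple arithmetic. The LLC size that falls out of it is my inference from those two numbers, not something stated in the presentation.

```python
KIB = 1024

freed_bytes = 512 * KIB  # LLC capacity the article says Skylake frees up
tag_fraction = 0.25      # share of LLC the article says held eDRAM tags before

# If 512 KiB is 25% of some LLC region, that region must be this large.
implied_llc_bytes = freed_bytes / tag_fraction

print(f"Implied LLC size: {implied_llc_bytes / (KIB * KIB):.0f} MiB")
```

The numbers are consistent if the 25% figure refers to a 2 MiB portion of LLC.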

Unfortunately, even though we are told there are more SKUs coming with the eDRAM integration, Intel has no plans to offer a consumer LGA part that uses it for compute workloads. As I was with Haswell, I am disappointed by that decision.