Haswell – A New Architecture

It’s finally here, the Intel Haswell processor in the form of the Core i7-4770K for desktop users. Is it time to upgrade?

Thanks for stopping by our coverage of the Intel Haswell, 4th Generation Core processor and Z87 chipset release! We have a lot of different stories for you to check out and I wanted to be sure you knew about them all.

This spring has been unusually busy for us here at PC Perspective – with everything from new APU releases from AMD, new graphics cards from NVIDIA, and now new desktop and mobile processors from Intel. There has never been a better time to be a technology enthusiast, though some would argue that the days of the enthusiast PC builder are on the decline. Looking at the revived GPU wars and the launch of Intel's Haswell architecture, 4th Generation Core processors, we couldn't disagree more.

Built on the same 22nm process technology that Ivy Bridge brought to the world, Haswell is a new architecture from Intel that really changes the company's focus towards a single homogeneous design able to span wide-ranging markets. From tablets to performance workstations, Haswell will soon find its way into just about every crevice of your technology life.

Today we focus on the desktop though – the release of the new Intel Core i7-4770K, fully unlocked, LGA1150 processor built for the Z87 chipset and DIY builders everywhere. In this review we'll discuss the architectural changes Haswell brings, the overclocking capabilities and limitations of the new design, application performance, graphics performance and quite a bit more.

Haswell remains a quad-core processor built from 1.4 billion transistors on a die measuring 177 mm², with integrated processor graphics, a shared L3 cache, and a dual-channel DDR3 memory controller. But much has changed – let's dive in.

The Haswell Architecture

I have already done quite a bit of writing about the Haswell architecture itself, but much of it is going to be new to our readers or at the very least many will need a refresher. Let's dive into some of the details that were first revealed at the Intel Developer Forum last September.

While Sandy Bridge and Ivy Bridge were really derivatives of prior designs and thought processes, the Haswell design is something completely different for the company. Yes, the microarchitecture of Haswell is still very similar to Sandy Bridge (SNB), but the differences are philosophical rather than technological.

Intel's target is a converged core: a single design that is flexible enough to be utilized in mobility devices like tablets while also scaling to the performance levels required for workstations and servers. Intel retains the majority of the architecture design from Sandy Bridge and Ivy Bridge, including the core as well as the key features that make Intel's parts unique: Hyper-Threading, Turbo Boost, and the ring interconnect.

The three pillars that Intel wanted to address with Haswell were performance, modularity, and power innovations. Each of these has its own key goals, including improving the performance of existing (legacy) code and extracting greater parallelism with less coding work for developers.

The modularity of Haswell is what gives the processor design its extreme flexibility while providing a consistent optimization path for software developers. The ability for a designer to write an application that can run (though at different feature or performance levels) across the entire array of devices Haswell will find its way into is powerful.

Haswell (at least in this iteration) will be available in various configurations: 2-4 processing cores, three levels of graphics subsystem, and differing idle and active power levels, interconnects, and platforms. This greatly widens the power and performance range of Haswell compared to Ivy Bridge (and Sandy Bridge), and it is enabled by the system agent, which acts as the intermediary between all of the components on the SoC.

Intel also claims that Haswell will permit third-party IP integration, and thus will be capable of adding specific features and technologies as the OEMs demand.

Power Management

Changes to power management on Haswell address both active (in-use) and sleep states, and represent some of the biggest departures from previous architectures. The goal is to lower power consumption under CPU load while also decreasing the time it takes the entire system to enter and leave sleep states. Intel introduces a new S0ix state, borrowed from the ultra-mobile Atom designs, that delivers a claimed 20x improvement in low-power states and improves realizable battery life.

Just as important as the new states themselves is that Intel claims they are completely transparent to "well written" software.

Other power-related changes in the Haswell design include updates to Turbo Boost technology and more granular voltage and frequency "islands" for the CPU to enter. Also changed from SNB and IVB: the frequency of the cores is decoupled from the ring bus, allowing voltage to scale more gracefully to where the power is actually needed. For example, on Ivy Bridge and Sandy Bridge the CPU cores had to be powered up whenever the GPU needed more bandwidth on the ring interconnect for other purposes – a waste of valuable power.

While we talked about the idle power changes in the slide above, Intel also pointed out that it is currently the only CPU vendor with complete control over its own manufacturing. Intel can use that advantage by tweaking the process in very specific ways to meet whatever goals its engineers might have.

Because the majority of Haswell designs will be completely Intel-based platforms, it makes sense for Intel to address this as well. You will see new voltage regulators and better power-managed controllers (now embedded) in addition to new IO options like I2C, SDIO, and I2S that are traditionally found only in mobile devices. New link power states for traditional IO connections like USB and SATA are being introduced that can drop idle power draw to nearly zero watts.

Haswell Microarchitecture Changes

While the Haswell design is based mainly on the architecture introduced with Sandy Bridge, there are some changes that Intel made to improve performance in the more typical fashion with an eye towards IPC (instructions per clock).

There were no changes to the key pipelines of Haswell, but there were many areas Intel described as "typical improvement points" for the company. The branch predictor has been improved, as this is usually the best return on time invested from a CPU-design standpoint, and Intel enlarged the buffers of the out-of-order (OOO) structures to help the processor find parallelism and take advantage of it.

Throughput also sees a boost: the reservation station now has 8 ports in total, adding another ALU, another branch unit, and a store-address unit. This gives Haswell improved metrics like two branches per cycle and two floating point fused multiply-adds (FMAs) per cycle – both improvements over what we saw in Sandy Bridge and Ivy Bridge processors.
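Those dual FMA units are where Haswell's headline compute figures come from. A quick back-of-the-envelope calculation shows how the per-cycle numbers fall out, assuming 256-bit AVX vectors and two FMA-capable ports (our reading of the disclosed design, not measured figures):

```python
# Peak FLOPs per core per cycle on Haswell, assuming two 256-bit FMA units.
AVX_WIDTH_BITS = 256
SP_LANES = AVX_WIDTH_BITS // 32   # 8 single-precision floats per vector
DP_LANES = AVX_WIDTH_BITS // 64   # 4 double-precision floats per vector
FMA_UNITS = 2                     # two ports can each issue an FMA per cycle
OPS_PER_FMA = 2                   # a fused multiply-add counts as 2 FLOPs

sp_flops_per_cycle = SP_LANES * FMA_UNITS * OPS_PER_FMA
dp_flops_per_cycle = DP_LANES * FMA_UNITS * OPS_PER_FMA

print(sp_flops_per_cycle)  # 32 single-precision FLOPs/cycle
print(dp_flops_per_cycle)  # 16 double-precision FLOPs/cycle
```

Both figures are exactly double the multiply-plus-add peak of Sandy Bridge and Ivy Bridge, which matches the doubling Intel claims.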

New compute instructions expand on AVX (now AVX2), doubling both single-precision and double-precision FLOPs per core per cycle. Other new instructions accelerate very specific algorithms, with additions for bit-field extracts and deposits, bit manipulation, rotates, and more.
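To illustrate what one of the new bit-manipulation instructions does, here is a plain-Python model of the parallel-extract operation (PEXT, part of the BMI2 extensions). The function below is an illustrative software emulation of the semantics; on Haswell the hardware does this in a single instruction:

```python
def pext(value: int, mask: int) -> int:
    """Software model of BMI2 PEXT: gather the bits of `value` that sit
    under set bits of `mask` and pack them into the low bits of the result."""
    result = 0
    out_pos = 0
    bit = 0
    while mask >> bit:                 # stop once no mask bits remain above
        if (mask >> bit) & 1:
            result |= ((value >> bit) & 1) << out_pos
            out_pos += 1
        bit += 1
    return result

# Extract the three bits of 0b101101 selected by the mask 0b001110:
print(bin(pext(0b101101, 0b001110)))  # 0b110
```

Operations like this – packing scattered flag bits, decoding bitfields – otherwise take a long chain of shifts and masks, which is why Intel singled them out for acceleration.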

The cache implementation also sees interesting changes with Haswell, including a doubling of bandwidth – the load/store paths are now 32 bytes wide – and one full L2 cache read every cycle. Seeing both L1 and L2 cache bandwidth double in a single generation without changing the organization or size of those structures is impressive, though it needs more explanation as well.
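The widened paths translate directly into bytes per cycle. A rough sketch of the peak L1 numbers, assuming two 32-byte loads plus one 32-byte store per cycle (these per-port widths are our reading of Intel's disclosures, not measured results):

```python
# Rough peak L1 data cache bandwidth per core, assuming Haswell can issue
# two 32-byte loads and one 32-byte store every cycle.
LOAD_PORTS, STORE_PORTS = 2, 1
PORT_WIDTH_BYTES = 32              # doubled from 16 bytes on Sandy Bridge
CLOCK_HZ = 3.5e9                   # Core i7-4770K base clock

l1_bytes_per_cycle = (LOAD_PORTS + STORE_PORTS) * PORT_WIDTH_BYTES
l1_peak_gb_per_s = l1_bytes_per_cycle * CLOCK_HZ / 1e9

print(l1_bytes_per_cycle)          # 96 bytes/cycle
print(round(l1_peak_gb_per_s))     # ~336 GB/s per core at base clock
```

Real workloads won't sustain those peaks, of course, but the headroom matters for the AVX2 units, which can consume 64 bytes of operands per cycle on their own.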

Another big upcoming change is the introduction of Transactional Synchronization Extensions (TSX). TSX is a method to improve concurrency and multi-threaded scaling with as little work for the programmer as possible. Using these new ISA extensions, a developer can apply simple prefixes and suffixes to code blocks to mark them as transactions that can run in parallel. The hardware then manages the transactional updates, rolling back and restarting execution if a block cannot complete without conflict.
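The programming pattern TSX encourages looks roughly like the sketch below: attempt the critical section speculatively a few times, and fall back to a conventional lock if it keeps aborting. Real code would use the RTM intrinsics (`_xbegin`/`_xend`), which need TSX-capable hardware; this Python sketch stands in for the control flow only, with a simulated transaction that can abort:

```python
import threading

fallback_lock = threading.Lock()
MAX_ATTEMPTS = 3

def run_transactionally(critical_section, try_speculative):
    """Control-flow sketch of the TSX lock-elision pattern: try the block
    speculatively a few times, then fall back to a real lock.
    `try_speculative` stands in for an RTM transaction attempt and
    returns True if the transaction committed."""
    for _ in range(MAX_ATTEMPTS):
        if try_speculative(critical_section):
            return "committed speculatively"
    # Too many aborts: take the conventional lock and run non-speculatively.
    with fallback_lock:
        critical_section()
    return "fell back to lock"

# Simulated "hardware": the first two attempts abort, the third commits.
attempts = {"n": 0}
def flaky_transaction(body):
    attempts["n"] += 1
    if attempts["n"] < 3:
        return False   # conflict detected -> transaction aborts
    body()
    return True        # transaction commits

counter = {"v": 0}
def increment():
    counter["v"] += 1

result = run_transactionally(increment, flaky_transaction)
print(result)  # committed speculatively
```

The appeal is that the fallback path is ordinary lock-based code, so correctly written software runs unmodified while still benefiting when transactions succeed.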

While this might seem like an esoteric topic for our audience, the implications are impressive. Increasing the parallelization of software is one of the key issues holding back innovation on many levels. We have seen the GPU vendors fight this battle (think CUDA) for years, and Intel's continued push into the MIC (many integrated core) market will require it as well. If you are interested in this technology, you should check out David Kanter's detailed analysis of it.