First, two quotes from my review of the original iPhone:

“The issue is that the iPhone interface is just as responsive as a computer, so you inherently expect the sort of performance you'd see on a notebook and it's just impossible on a device like the iPhone.”

“I think overall we need a handful of upgrades to the iPhone alongside 3G; we need a faster processor, possibly more system memory, maybe even faster flash. The MLC flash in the iPhone has absolutely horrendous write speeds compared to SLC, which could be holding the iPhone back a bit. I can see Apple introducing a 3G version in about 12 months, addressing many of these issues at the same time.”

Indeed, 12 months after the launch of the first iPhone, Apple fixed the wireless performance issues with the iPhone 3G. Unfortunately, the core hardware remained untouched, and all of my other complaints in those two quotes remained open ticket items between Apple and me. In fact, things got worse. Here’s what I wrote at the end of my iPhone 3G review:

“Apple must be wary of the direction the iPhone is headed in. While the UI was absolutely perfect for the phone that launched a year ago, today’s iPhone is hardly the same. With easily over twice as many applications on an iPhone today vs. a year ago, performance and navigation have both suffered. The impact isn’t tremendous, but Apple will have to adjust the iPhone accordingly in order to avoid turning the platform into a bloated, complicated mess.”

Two days ago, Apple announced the iPhone 3GS, a phone designed to address one thing: performance. The other half of the complaint in my 3G review’s conclusion, navigation and UI on the newly expanded iPhone platform, isn’t tackled by the 3GS. I suspect that in another year we’ll see that. But today, it’s about hardware.

The Impetus

After yesterday's Pre vs. iPhone 3G battery life article, I got a few emails from people very close to the chips used in the iPhone 3GS. A couple of exchanges later, I realized it might be time to go a little deeper into the hardware behind the iPhone 3G, iPhone 3GS and Palm Pre.

The Original

The iPhone and iPhone 3G use a system on a chip (SoC) from Samsung. The SoC is a custom part and actually has Apple’s logo on the chip. The SoC houses the CPU, GPU and memory for the iPhone.

The CPU is based on the ARM11 core, specifically the ARM1176JZF-S. It runs at 412MHz to save power, although the core is capable of running at 667MHz. The ARM11 is a single-issue, in-order microprocessor with an 8-stage integer pipeline. It’s got a 32KB L1 cache (16KB for instructions, 16KB for data) and no L2 cache. The ARM11 CPU in the iPhone also has a vector floating point unit, but thankfully the SoC includes a separate GPU for 3D acceleration. You can think of this core as a very high clocked, very advanced 486 that also happens to be extremely low power. Under typical load, the CPU core should consume around 100mW; by comparison, the CPU in your laptop can require anywhere from 10W to 35W. The ARM11’s idle power is even lower.

Paired with this CPU is a PowerVR MBX-Lite GPU core. This GPU, like the CPU, is built on a 90nm process and is quite simple. The GPU does support hardware transform and lighting, but it’s fully fixed function; think of it as a DirectX 6/7 class GPU (Riva TNT2/GeForce 256). Here’s PowerVR’s block diagram of the MBX:

The MBX-Lite in the iPhone shares the same architecture as the MBX but is optimized, once more, for power efficiency and thus is significantly slower.

I don’t have exact clock speed information for the MBX-Lite in the iPhone but I’m guessing around 60MHz.

Coupled with the CPU and the GPU in the iPhone’s SoC is 128MB of DDR memory, all on the same chip. It’s a pretty impressive little package. You get a CPU, GPU and memory all in a package that’s physically smaller than Intel’s Atom.

Now the 486 came out in 1989 and the original 3dfx Voodoo graphics card came out in 1996. The iPhone’s SoC would be ridiculously powerful if it were running the sorts of applications we had back then, but it’s not. We’re asking a lot from this little core and although it has performed admirably thanks to some clever software engineering on Apple’s part, it’s time for an update.

Enter the ARM Cortex A8

This past weekend Palm introduced its highly anticipated Pre. While I’m still working on my review of the Pre, I can say that it’s the closest thing to an iPhone since Apple first unveiled the product two summers ago. The Pre falls short in some areas that the iPhone has honestly perfected, but in others it easily surpasses Apple’s best.

One such area is raw performance. While the iPhone and iPhone 3G both use the same old CPU/GPU, the Pre uses TI’s OMAP 3430 processor. The 3430, like the SoC Apple uses, has both a CPU and GPU on the same package. Instead of the ARM11 and the PowerVR MBX-Lite, however, the OMAP 3430 uses an ARM Cortex A8 core and a PowerVR SGX GPU. Both are significant improvements over what was in the original iPhone.

Thankfully, Apple fans don’t have to be outclassed for long: the newly announced iPhone 3GS uses a comparable CPU/GPU pair.

Although Apple hasn’t announced it, the iPhone 3GS again uses a Samsung SoC, but this time, instead of the ARM11 + MBX-Lite combo, it’s got a Cortex A8 and PowerVR SGX, just like the Pre.



[Image omitted; original caption: “A derivative of this is what you'll find in the iPhone 3GS”]

If the ARM11 is like a modern day 486 with a very high clock speed, the Cortex A8 is like a modern day Pentium. The A8 lengthens the integer pipeline to 13 stages, enabling its 600MHz clock speed (what I’m hearing the 3GS runs at). The Cortex A8 also widens the processor; the chip is now a two-issue in-order core, capable of fetching, decoding and executing two RISC instructions in parallel.
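A toy model helps show what going from single-issue to dual-issue buys, and what it doesn’t. The Python sketch below is not the Cortex A8’s real pairing logic: it assumes every instruction takes one cycle, that up to `issue_width` instructions issue per cycle strictly in program order, and that an instruction can’t issue in the same cycle as one it depends on. The register names and both programs are hypothetical.

```python
from typing import List, Tuple

# A hypothetical instruction: (destination register, source registers).
Instr = Tuple[str, List[str]]


def in_order_cycles(program: List[Instr], issue_width: int) -> int:
    """Count cycles on a simplified in-order core (toy model, not ARM's pipeline)."""
    if issue_width < 1:
        raise ValueError("issue_width must be at least 1")
    cycles = 0
    i = 0
    while i < len(program):
        cycles += 1
        written_this_cycle = set()
        issued = 0
        while i < len(program) and issued < issue_width:
            dest, srcs = program[i]
            if any(src in written_this_cycle for src in srcs):
                # In-order: a dependent instruction waits for the next cycle and
                # blocks everything behind it, even independent instructions.
                break
            written_this_cycle.add(dest)
            issued += 1
            i += 1
    return cycles


# Independent work pairs up nicely...
independent = [
    ("r1", ["r0"]),
    ("r2", ["r0"]),
    ("r3", ["r1", "r2"]),
    ("r4", ["r0"]),
]
# ...but a chain of dependent instructions gains nothing from a wider core.
chain = [("r1", ["r0"]), ("r2", ["r1"]), ("r3", ["r2"])]

for name, prog in (("independent", independent), ("chain", chain)):
    print(name, in_order_cycles(prog, 1), "cycles single-issue,",
          in_order_cycles(prog, 2), "cycles dual-issue")
```

Real code sits somewhere between these two extremes, which is why a dual-issue core raises throughput without simply doubling it.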

The ARM11 processor in the iPhone/iPhone 3G has a basic vector floating point unit, but the A8 adds a much more advanced SIMD engine called NEON. The A8 also has twice as many double precision FP registers as the ARM11. The addition of NEON and the improved vector FPU in the A8 make the processor much less like the original Pentium and much more like Intel’s Atom. Granted, Atom is significantly faster than the A8, but it also draws much more power.
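The core idea behind a SIMD engine like NEON is that one instruction operates on several values packed into a wide register. NEON’s quad-word registers are 128 bits wide, so one holds four 32-bit floats or sixteen 8-bit values. The Python below only illustrates that lane-packing idea: it counts instructions, not cycles, and ignores per-instruction latency and throughput, loads, stores and loop overhead.

```python
import math
from typing import List

NEON_Q_REGISTER_BITS = 128  # width of a NEON quad-word (Q) register


def lanes(element_bits: int) -> int:
    """Number of elements of the given width that fit in one Q register."""
    return NEON_Q_REGISTER_BITS // element_bits


def add_instruction_count(n: int, element_bits: int, simd: bool) -> int:
    """Add instructions needed to add two n-element arrays (instruction count only)."""
    return math.ceil(n / lanes(element_bits)) if simd else n


def lanewise_add(a: List[float], b: List[float], element_bits: int = 32) -> List[float]:
    """Add two arrays one register-sized chunk at a time, as a SIMD add would."""
    width = lanes(element_bits)
    out: List[float] = []
    for start in range(0, len(a), width):
        # One SIMD instruction: every lane in the chunk is added at once.
        out.extend(x + y for x, y in zip(a[start:start + width], b[start:start + width]))
    return out


print(lanes(32), lanes(8))  # lanes per register for 32-bit and 8-bit elements
print(add_instruction_count(1000, 32, simd=False),
      add_instruction_count(1000, 32, simd=True))  # scalar vs SIMD instruction count
print(lanewise_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
```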

Caches also get a significant improvement. I believe Apple will be using a derivative of Samsung’s S5PC100, which has a 32KB/32KB L1 cache (I/D; we may see a 16KB/16KB configuration instead) and a 256KB L2 cache. As you’ll remember from the first section, the ARM11 core didn’t have an L2, so the A8’s L2 cache is an entirely new addition.
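Why an L2 matters can be seen with the textbook average memory access time (AMAT) formula. Every latency and miss rate below is a made-up, illustrative number, not a measurement of the ARM11 or Cortex A8; the point is only that an L2 catches some L1 misses before they go all the way out to main memory.

```python
def amat_without_l2(l1_hit: float, l1_miss_rate: float, memory: float) -> float:
    """Average cycles per access when every L1 miss goes to main memory."""
    return l1_hit + l1_miss_rate * memory


def amat_with_l2(l1_hit: float, l1_miss_rate: float,
                 l2_hit: float, l2_miss_rate: float, memory: float) -> float:
    """Average cycles per access when an L2 sits between L1 and main memory.

    l2_miss_rate is the local miss rate: the fraction of L1 misses that also miss in L2.
    """
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * memory)


# Hypothetical numbers, chosen only to illustrate the formula.
L1_HIT, L1_MISS_RATE = 1, 0.05
L2_HIT, L2_MISS_RATE = 10, 0.30
MEMORY = 100

print(amat_without_l2(L1_HIT, L1_MISS_RATE, MEMORY))                     # about 6 cycles
print(amat_with_l2(L1_HIT, L1_MISS_RATE, L2_HIT, L2_MISS_RATE, MEMORY))  # about 3 cycles
```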

| | iPhone 3G (ARM11) | iPhone 3GS (ARM Cortex A8) |
|---|---|---|
| Manufacturing Process | 90nm | 65nm |
| Architecture | In-Order | In-Order |
| Issue Width | 1-issue | 2-issue |
| Pipeline Depth | 8-stage | 13-stage |
| Clock Speed | 412MHz | 600MHz |
| L1 Cache Size | 16KB I-Cache + 16KB D-Cache | 32KB I-Cache + 32KB D-Cache |
| L2 Cache Size | N/A | 256KB |

The combination of higher clock speeds, more cache and a dual-issue front end results in a much faster processor. Apple claims the real world performance of the iPhone 3GS can be up to 2x faster than the iPhone 3G, and I believe that’s quite feasible.
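A quick back-of-the-envelope check on that 2x claim, using the clock speeds and issue widths from the table above. Peak throughput here is simply clock speed times issue width, a theoretical ceiling that real code never sustains, since dependent instructions and cache misses keep the A8 from dual-issuing every cycle.

```python
# Figures from the comparison table; peak assumes every issue slot is filled every cycle.
ARM11 = {"clock_mhz": 412, "issue_width": 1}
CORTEX_A8 = {"clock_mhz": 600, "issue_width": 2}


def peak_mips(core: dict) -> int:
    """Theoretical peak instructions per second, in millions."""
    return core["clock_mhz"] * core["issue_width"]


clock_gain = CORTEX_A8["clock_mhz"] / ARM11["clock_mhz"]
peak_gain = peak_mips(CORTEX_A8) / peak_mips(ARM11)

print(f"clock alone: {clock_gain:.2f}x")           # 600 / 412
print(f"peak with dual issue: {peak_gain:.2f}x")   # 1200 / 412
```

Apple’s “up to 2x” sits comfortably between the roughly 1.46x from clock speed alone and the roughly 2.9x theoretical peak, before even counting the larger caches.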

The new SoC is built on a 65nm manufacturing process, down from 90nm in the original hardware. However, the new SoC should still draw more power than the old one. ARM’s own site lists ~0.25mW per MHz for the ARM11 core but < 0.59mW per MHz for the A8. That A8 figure is for a 650MHz low power A8 core; I’m expecting 600MHz for the 3GS, which works out to at most ~354mW for the A8 versus ~103mW for the ARM11 at 412MHz, or roughly 3.4x the power consumption of the CPU in the original iPhone. So how can Apple promise better battery life?
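Here is that worst-case arithmetic spelled out. It simply multiplies ARM’s per-MHz figures by the clock speeds discussed above; since the A8 number is an upper bound quoted for a 650MHz part, treat the result as a ceiling, not a prediction.

```python
# ARM's published per-MHz figures as quoted above (the A8 value is an upper bound).
ARM11_MW_PER_MHZ = 0.25
A8_MW_PER_MHZ_MAX = 0.59

ARM11_CLOCK_MHZ = 412  # original iPhone / iPhone 3G
A8_CLOCK_MHZ = 600     # expected iPhone 3GS clock


def core_power_mw(mw_per_mhz: float, clock_mhz: float) -> float:
    """Core power estimate: power per MHz times clock speed."""
    return mw_per_mhz * clock_mhz


arm11_mw = core_power_mw(ARM11_MW_PER_MHZ, ARM11_CLOCK_MHZ)
a8_mw = core_power_mw(A8_MW_PER_MHZ_MAX, A8_CLOCK_MHZ)

print(f"ARM11 @ {ARM11_CLOCK_MHZ}MHz: ~{arm11_mw:.0f}mW")
print(f"A8 @ {A8_CLOCK_MHZ}MHz: at most ~{a8_mw:.0f}mW")
print(f"worst-case ratio: ~{a8_mw / arm11_mw:.1f}x")
```

The ARM11 result lines up with the ~100mW typical-load figure from earlier in the article.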

The thing about these comparisons is that they don’t show the full picture. With the same battery capacity, running at full speed, the new iPhone 3GS would run out of juice faster than the existing iPhone 3G. But that’s rarely how people use their phones. Chances are that you’ll perform a few tasks before putting the phone back to sleep, and what matters then is how quickly you can complete those tasks.

Just under nine years ago Intel talked about a technology called Quick Start. Let me quote from our IDF 2000 Day 2 coverage (wow, that was a while ago):

“Intel has figured out that it is best to use full CPU power for a split second to finish a task and then put the CPU to idle as this conserves battery life the best. Although one may suspect that when running complex operations the CPU would not have time to go idle, this is not the case. To illustrate this point, Intel used an example of DVD playback. Very stressful on the system as a whole, Intel's quick start technology allows the CPU to ‘hurry up’ and perform the DVD decoding operations and then go idle until the frame is displayed to screen and the next scene needs to be calculated. This saves battery life because, although the system may require 3 watts or so to ‘hurry up’, the power consumption goes down near .25 watts when idle. By averaging these two numbers, one can quickly see how quick start can extend battery life.”

For this to work, the A8’s real power consumption has to come in well under the worst case I quoted above, and the iPhone 3GS needs to be more than just 2x faster at executing instructions; in short, the speedup has to outweigh the increase in active power. If that’s possible, then it’s quite feasible for the faster A8 to draw more instantaneous power but less power on average than the ARM11 core in the original iPhone.
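The trade-off can be modeled as energy over a fixed window of use: a burst of work at active power, then idle power for the rest of the window. All power levels and times below are hypothetical, picked only to show the two cases described above; the idle figure in particular is an assumption, not a measured iPhone number.

```python
def energy_mj(active_mw: float, idle_mw: float, busy_s: float, window_s: float) -> float:
    """Energy in millijoules over a window: busy at active_mw, then idle at idle_mw."""
    if busy_s > window_s:
        raise ValueError("task does not finish inside the window")
    return active_mw * busy_s + idle_mw * (window_s - busy_s)


WINDOW_S = 10.0  # hypothetical: a 10-second burst of phone use
IDLE_MW = 5.0    # hypothetical idle power, assumed equal for both chips

# Slow chip: ~103mW, finishes a hypothetical task in 4 seconds.
slow = energy_mj(103, IDLE_MW, busy_s=4.0, window_s=WINDOW_S)

# Fast chip at the worst-case power but only 2x faster: more energy overall.
fast_worst = energy_mj(354, IDLE_MW, busy_s=2.0, window_s=WINDOW_S)

# Fast chip drawing 2x the power but finishing 2.5x faster: less energy overall.
fast_better = energy_mj(206, IDLE_MW, busy_s=1.6, window_s=WINDOW_S)

print(f"slow: {slow:.0f} mJ, fast (worst case): {fast_worst:.0f} mJ, "
      f"fast (better case): {fast_better:.0f} mJ")
```

Because idle power is tiny next to active power, the faster chip comes out ahead roughly whenever its speedup exceeds its increase in active power.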

The iPhone Becomes a Gaming Platform: Enter the PowerVR SGX

Now that we’re familiar with the 3GS’ CPU, it’s time to talk about the GPU: the PowerVR SGX.

Those familiar with graphics evolution in the PC space may remember Imagination Technologies and its PowerVR brand from their most popular desktop graphics cards: STMicro’s Kyro and Kyro II. The Kyro series used PowerVR3 chips, and while STMicro ultimately failed to establish itself as an NVIDIA competitor on the desktop, the PowerVR technology lived on in ultra-mobile devices.

The SGX is the fifth generation of Imagination Technologies’ PowerVR architecture, and just like the Kyro cards we loved, it uses a tile based renderer. The idea behind a tile based, or deferred, renderer is to render only what the camera sees, without wasting clocks and memory bandwidth on determining the color of pixels hidden behind other objects in the scene. Tile based renderers get their name from dividing the screen up into smaller blocks, or tiles, and working on each one independently. The smaller the tile, the easier it is to keep all of the work for that tile on-chip without going to main memory. This approach is particularly important in the mobile space because there simply isn’t much available bandwidth or power. These chips consume milliwatts, so efficiency is key.
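A toy Python sketch of the two ideas in that paragraph: binning primitives into screen tiles, and resolving visibility per tile before shading. It uses axis-aligned rectangles instead of triangles, a hypothetical 4x4-pixel tile, and counts “shaded pixels” as a stand-in for the work and memory traffic a real GPU would spend; PowerVR’s actual tile sizes and hardware differ. A conventional immediate-mode path is included for comparison.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

TILE = 4  # hypothetical tile size in pixels


@dataclass
class Rect:
    """A stand-in for a triangle: an axis-aligned rectangle at a single depth."""
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive
    depth: float  # smaller means closer to the camera
    color: str


def covers(r: Rect, x: int, y: int) -> bool:
    return r.x0 <= x < r.x1 and r.y0 <= y < r.y1


def immediate_mode(rects: List[Rect], w: int, h: int) -> Tuple[list, int]:
    """Shade primitives in submission order with a per-pixel depth test."""
    depth = [[float("inf")] * w for _ in range(h)]
    frame: list = [[None] * w for _ in range(h)]
    shaded = 0
    for r in rects:
        for y in range(max(r.y0, 0), min(r.y1, h)):
            for x in range(max(r.x0, 0), min(r.x1, w)):
                if r.depth < depth[y][x]:
                    depth[y][x] = r.depth
                    frame[y][x] = r.color
                    shaded += 1  # work done even if a later primitive covers it
    return frame, shaded


def tile_deferred(rects: List[Rect], w: int, h: int) -> Tuple[list, int]:
    """Bin primitives per tile, find the visible one per pixel, then shade once."""
    frame: list = [[None] * w for _ in range(h)]
    shaded = 0
    for ty in range(0, h, TILE):
        for tx in range(0, w, TILE):
            # Binning: only primitives overlapping this tile are considered.
            binned = [r for r in rects
                      if r.x0 < tx + TILE and r.x1 > tx and r.y0 < ty + TILE and r.y1 > ty]
            # Hidden surface removal works on a small per-tile set of pixels.
            for y in range(ty, min(ty + TILE, h)):
                for x in range(tx, min(tx + TILE, w)):
                    visible: Optional[Rect] = None
                    for r in binned:
                        if covers(r, x, y) and (visible is None or r.depth < visible.depth):
                            visible = r
                    if visible is not None:
                        frame[y][x] = visible.color  # shade only the visible surface
                        shaded += 1
    return frame, shaded


# A full-screen background drawn first, then a closer box on top of it.
scene = [Rect(0, 0, 8, 8, depth=10.0, color="sky"),
         Rect(2, 2, 6, 6, depth=1.0, color="box")]

imr_frame, imr_shaded = immediate_mode(scene, 8, 8)
tbdr_frame, tbdr_shaded = tile_deferred(scene, 8, 8)
print(imr_frame == tbdr_frame, imr_shaded, tbdr_shaded)
```

Both paths produce the same image, but the immediate-mode path shades the 16 pixels behind the box twice, while the deferred path shades each of the 64 pixels exactly once.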

The MBX-Lite used in the original iPhone was also a tile based architecture; the SGX is just a better one.

Also built on a 65nm process, the PowerVR SGX is a fully programmable core, much like our desktop DX8/DX9 GPUs. While the MBX only supported OpenGL ES 1.1, you get OpenGL ES 2.0 support from the SGX. The architecture also looks much more like a modern GPU:

Pixel, vertex and geometry instructions are executed by a programmable shader engine, which Imagination calls its Universal Scalable Shader Engine (USSE). The “coprocessor” hardware at the end of the pipeline is most likely fixed-function or scalar hardware that aids the engine.

The SGX line ranges from the PowerVR SGX 520, which has only one USSE pipe, to the high end SGX 543MP16, which has 64 USSE2 pipes (4 USSE2 pipes per core x 16 cores). The iPhone 3GS, I believe, uses the 520, the lowest end of the new product offering.

A single USSE pipe can execute, in a single clock, a two-component vector operation or a 2- or 4-way SIMD operation on scalars. The upgraded USSE2 pipes handle 3- or 4-component vector operations in a single clock, have wider SIMD and can co-issue vector and scalar ops. The USSE2 pipes are definitely heavier and have some added benefits for OpenCL. For the 3GS, all we have to worry about is the single USSE configuration.
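A rough way to see what the wider USSE2 pipes buy: if a pipe handles a fixed number of vector components per clock, an n-component operation takes ceil(n / width) clocks. The sketch below applies that simplification to the widths described above (two components per clock for USSE, four for USSE2) and ignores scalar co-issue, latency and everything else a real shader compiler deals with; the shader itself is hypothetical.

```python
import math

# Vector components a pipe can process per clock, per the description above.
USSE_WIDTH = 2
USSE2_WIDTH = 4


def clocks_for_op(components: int, width: int) -> int:
    """Clocks to finish one vector op on a pipe handling `width` components per clock."""
    return math.ceil(components / width)


def clocks_for_shader(op_sizes, width: int) -> int:
    """Total clocks for a sequence of vector ops (no co-issue, no stalls: a simplification)."""
    return sum(clocks_for_op(n, width) for n in op_sizes)


# A hypothetical shader: a 4-component color multiply, a 3-component normal op
# and a 2-component texture-coordinate op.
shader = [4, 3, 2]
print(clocks_for_shader(shader, USSE_WIDTH), "clocks on USSE,",
      clocks_for_shader(shader, USSE2_WIDTH), "clocks on USSE2")
```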

| | iPhone 3G (PowerVR MBX-Lite) | PowerVR SGX @ 100MHz | PowerVR SGX @ 200MHz |
|---|---|---|---|
| Manufacturing Process | 90nm | 65nm | 65nm |
| Clock Speed | ~60MHz | 100MHz | 200MHz |
| Triangles/sec | 1M | 3.5M | 7M |
| Pixels/sec | 100M | 125M | 250M |

In its lowest end configuration with only one USSE pipe running at 200MHz, the SGX can push through 7M triangles per second and render 250M pixels per second. That’s 7x the geometry throughput of the iPhone 3G and 2.5x the fill rate. Even if the SGX ran at half that speed, we’d still be at 3.5x the geometry performance of the iPhone 3G and a 25% increase in fill rate. Given the 65nm manufacturing process, I’d expect higher clock speeds than what was possible on the MBX-Lite. Also note that these fill rates take into account the efficiency of the SGX’s tile based rendering engine.
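The ratios in that paragraph fall straight out of the table. The sketch below recomputes them and adds one more way of reading the numbers: a theoretical per-frame triangle budget at 30 frames per second. The 30fps target is an assumption for illustration, and peak rates are never fully reached in a real game.

```python
# Peak rates from the GPU comparison table.
GPUS = {
    "MBX-Lite (iPhone 3G)": {"tris_per_s": 1e6, "pixels_per_s": 100e6},
    "SGX @ 100MHz": {"tris_per_s": 3.5e6, "pixels_per_s": 125e6},
    "SGX @ 200MHz": {"tris_per_s": 7e6, "pixels_per_s": 250e6},
}
BASELINE = GPUS["MBX-Lite (iPhone 3G)"]
FPS = 30  # hypothetical frame-rate target


def ratio(gpu: dict, key: str) -> float:
    """How many times the iPhone 3G's peak rate this GPU achieves."""
    return gpu[key] / BASELINE[key]


for name, gpu in GPUS.items():
    print(f"{name}: {ratio(gpu, 'tris_per_s'):.2f}x geometry, "
          f"{ratio(gpu, 'pixels_per_s'):.2f}x fill rate, "
          f"~{gpu['tris_per_s'] / FPS:,.0f} triangles/frame at {FPS}fps")
```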

Final Words: Preparing for 3GS

As I mentioned earlier, the Palm Pre uses a similar combination of hardware to what I expect from the iPhone 3GS: TI’s OMAP 3430 combines a Cortex A8 CPU core with a PowerVR SGX 530 GPU. The difference is that the Pre uses its excess horsepower to enable user-level application multitasking, and Apple won’t. The Pre is most definitely faster than the iPhone, but it still has some rough edges. Combine the power of the Pre with the highly optimized software stack of the iPhone and you’ve got the recipe for an extremely fast iPhone. While I’ve yet to play with one, on paper the 3GS should be every bit as fast as the videos make it seem.

The iPhone 3GS' performance upgrades should make the phone feel a lot faster, but the real improvement will be what it enables application and game developers to do. Apple recently hired two former AMD/ATI CTOs, presumably to work on some very graphics-centric projects. The iPhone 3GS may be a mild upgrade from a consumer perspective, but what it's going to enable is far from it; watch out, Nintendo. Remember the performance gains we saw in the early days of 3D graphics on the PC? We're about to go through all of that once more in the mobile space. Awesome.

Looking toward the future, there’s always more around the corner. There’s the Cortex A9, which brings multiple cores to the table, and the PowerVR SGX engine can be scaled up simply by adding more USSE pipes to the architecture. Newer manufacturing processes will make it possible to bring these technologies to life without sacrificing battery life.

It’s curious to me how central ARM and Imagination Technologies are to these smartphones. On the PC side it’s all about Intel, AMD and NVIDIA but when we’re talking Pres and iPhones it’s all ARM and PowerVR. Intel wants to bring Atom down to ARM power consumption levels and NVIDIA desperately searches for treasure in the mobile market, but those two are the underdogs in this race. For the foreseeable future at least.

There you have my take on the iPhone 3GS’ hardware. If Apple would just get their pre-ordering system working right I might not even have to camp out this year...