Yesterday, Marvell announced a whopper of a processor: the ARMADA 628. While most of the coverage so far has focused on the three A9-class processor cores, the craziest feature of this chip is its on-die GPU. The 628's GPU can push 200 million triangles per second (MT/s); for some perspective, compare the PlayStation 3's GPU at 250MT/s. This GPU, plus the three Sheeva PJ4 cores, means that you can put console-caliber gaming performance, 1080p graphics and all, in a handheld.

Given the amount of hardware on this new chip, it's complete overkill for a smartphone. But in a tablet or handheld gaming device, you could do some really fantastic things with it. Let's take a look under the hood.

Core competency

When Marvell bought tiny ASICA in 2003, the company acquired an ARM architecture license along with it. This means that Marvell is one of the few companies that can either modify ARM's stock designs or make entirely new microarchitectures that implement one of ARM's ISA variants. In 2006, Marvell bought the XScale team from Intel and combined it with the ASICA team. The new group started work on an advanced ARMv7 core called Sheeva.

I haven't seen a block diagram of Sheeva, and there's not much in the way of publicly available information on it. What we do know is that it's a two-issue, superscalar design that can do some amount of instruction reordering. It's not clear if the design contains a full-blown instruction window, like the A9 and other out-of-order processors, or if it merely has some static scheduling logic that lets one instruction bypass another instruction that's stalled in the pipeline. The latter option isn't unheard of, and it's typically done only for floating-point instructions, since they're the most likely to stall. My guess is that Sheeva takes the latter route, and does some simple reordering of stalled floating-point instructions. This would be more power-efficient than a full-blown instruction window.

Whatever the details of Sheeva's pipeline and microarchitecture, Marvell has confirmed to me that the three cores in the ARMADA 628 are the latest PJ4 variant of Sheeva. However, Marvell is moving on from the Sheeva branding, and is opting to identify the cores as "A9-class," which is a nod to their two-issue, out-of-order(?) nature.

Sheeva was already capable of 1.2GHz, but the new design can go up to 1.5GHz. Only two of the 628's Sheeva cores run at the full 1.5GHz, though. The third one is down-clocked to 624MHz, an interesting design choice that saves on power while still adding some extra utility. In a sense, the 628 could be called a 2.5-core design.
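As a rough sanity check on that "2.5-core" label, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that a core's contribution scales linearly with its clock speed, which ignores memory bandwidth, workload mix, and everything else that matters in practice. The function name is made up for this example; only the clock speeds come from Marvell's numbers.

```python
# Clock speeds quoted above, in MHz.
FULL_SPEED_MHZ = 1500   # the two fast cores
SLOW_CORE_MHZ = 624     # the down-clocked third core


def full_speed_equivalents(clocks_mhz, reference_mhz=FULL_SPEED_MHZ):
    """Add up each core's clock as a fraction of the reference clock.

    Assumption (not from Marvell): throughput scales linearly with
    clock speed, so a 624MHz core counts as 624/1500 of a fast core.
    """
    return sum(clock / reference_mhz for clock in clocks_mhz)


core_clocks = [FULL_SPEED_MHZ, FULL_SPEED_MHZ, SLOW_CORE_MHZ]
equivalent = full_speed_equivalents(core_clocks)
print(f"{equivalent:.2f} full-speed core equivalents")
```

Under that simplistic assumption the three cores add up to a bit more than 2.4 full-speed cores, so "2.5" is a fair round number.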

It's quite common to see another, slower ARM core integrated onto an SoC for the purposes of baseband processing (this is often done with a simple ARM11 core), but this is the first time I've seen this particular down-clock approach taken for non-baseband purposes.

Graphics

As I mentioned above, though, the most mind-blowing aspect of the new design is the 200MT/s number for the on-die GPU. That's some serious graphical horsepower, and it puts this chip almost on par with a current-gen console.

For a non-console comparison, consider that NVIDIA's Tegra 2 debuted earlier this year with a GPU that could push 90MT/s, a number that was considered insanely large at the time. No official data is available on the iPhone's A4 processor, but the rumored number is 28MT/s.
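To put the quoted triangle rates side by side, here is a small Python sketch. The four rates are the ones cited in this article (the A4 figure is only a rumor), and they are vendor peak numbers, so treat the ratios as ballpark at best. The 60 frames-per-second figure and the helper names are assumptions added for illustration.

```python
# Peak triangle rates quoted in the article, in millions of triangles/second.
PEAK_MTRIS_PER_SEC = {
    "ARMADA 628": 200,
    "PlayStation 3": 250,
    "Tegra 2": 90,
    "Apple A4 (rumored)": 28,
}

ASSUMED_FPS = 60  # hypothetical target frame rate, not from the article


def relative_to(baseline, rates=PEAK_MTRIS_PER_SEC):
    """Express every chip's peak rate as a multiple of the baseline chip's."""
    return {name: rate / rates[baseline] for name, rate in rates.items()}


def triangles_per_frame(mtris_per_sec, fps=ASSUMED_FPS):
    """Peak triangle budget for a single frame at the given frame rate."""
    return mtris_per_sec * 1_000_000 / fps


ratios = relative_to("ARMADA 628")
for name, rate in PEAK_MTRIS_PER_SEC.items():
    print(
        f"{name}: {ratios[name]:.2f}x the 628, "
        f"{triangles_per_frame(rate):,.0f} triangles/frame at {ASSUMED_FPS} fps"
    )
```

On these numbers the PlayStation 3 peaks at 1.25 times the 628, while the Tegra 2 manages a bit less than half of it.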

What both of these smartphone SoC comparisons tell me is that the 628 really can't be looked at as a smartphone chip. You don't get something for nothing, and Marvell has obviously loaded this part up with a ton of hardware. I don't have a transistor count for the new chip, but it has to be quite high. Think about it: three A9-class cores, a near console-level GPU, a generous 1MB L2 cache, and a host of added blocks for HD video decoding, USB 3.0, and so on. This part could really go in a PC or laptop. I'm thinking a tablet is probably the smallest form factor we'll see it in, at least in the near term.