One of the canards that’s regularly trotted out in discussions of ARM vs. x86 processors is the idea that ARM chips are intrinsically more power efficient thanks to fundamental differences in the ISA (instruction set architecture). A new research paper examines these claims using a variety of ARM cores as well as a Loongson MIPS microprocessor, Intel’s Atom and Sandy Bridge microarchitectures, and AMD’s Bobcat.

This paper is an updated version of one I’ve referenced in previous stories, but its methods and claims are worth investigating in more detail. ISA investigations are inherently difficult, because it’s effectively impossible to separate the theoretical efficiency of an architecture from the proficiency of its design team or the technical expertise of its manufacturer. Even products that seem identical can have important differences: ARM revised the Cortex-A9 core four times and has released three updates to the Cortex-A15. Then there are the particulars of manufacturing. Intel, TSMC, Samsung, and GlobalFoundries aren’t carbon copies of each other, and the CPU inside a Tegra K1 isn’t 100% identical to the Cortex-A15 inside a Samsung Exynos SoC.

That’s just the hardware side of the equation. Toss in compiler optimizations and library support and it’s even harder to write a definitive apples-to-apples comparison of any two architectures.

With that said, the team from the University of Wisconsin has taken a pretty good whack at an incredibly complex problem, comparing the ARM, MIPS, and x86 cores named above.

Test setup and modeling

The chips in question were tested in desktop, mobile, and server workloads with a mixture of programs including CoreMark, WebKit, SPEC tests, and a variety of other benchmarks. Power consumption data was gathered at the SoC level, while performance information was gathered using a variety of profiling techniques.
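Measuring both power and performance matters because efficiency is ultimately about energy, and energy is average power multiplied by execution time. A chip that draws far less power can still use more energy if it runs long enough. The short Python sketch below illustrates that arithmetic. The `Run` record, the chip names, and every number in it are hypothetical and for illustration only; none of them are measurements from the paper, and this is not the paper's analysis code.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark run on one chip (hypothetical record format)."""
    chip: str
    avg_soc_power_w: float  # average power measured at the SoC level, in watts
    runtime_s: float        # wall-clock execution time, in seconds

    def energy_j(self) -> float:
        # Energy (joules) = average power (watts) x execution time (seconds)
        return self.avg_soc_power_w * self.runtime_s


def relative_energy(run: Run, baseline: Run) -> float:
    """Energy of `run` normalized to `baseline`; values below 1.0 mean less energy."""
    return run.energy_j() / baseline.energy_j()


# Illustrative numbers only, not results from the paper.
fast_hungry = Run("chip_a", avg_soc_power_w=20.0, runtime_s=1.0)
slow_frugal = Run("chip_b", avg_soc_power_w=2.0, runtime_s=8.0)

print(fast_hungry.energy_j())                     # 20.0
print(slow_frugal.energy_j())                     # 16.0
print(relative_energy(slow_frugal, fast_hungry))  # 0.8
```

In this made-up case, the chip drawing one-tenth the power still uses 80% as much energy, because it takes eight times as long to finish the same work.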

All of the systems except the Cortex-A15 were tested using Linux 2.6 LTS with minor patches; the A15 had to be tested with Linux 3.8 due to compatibility issues. All tests were compiled with GCC 4.4 with all target-independent optimizations enabled (-O3) and machine-specific tuning disabled. None of the tests included hand-written SIMD code, and while auto-vectorization was enabled, the compiler generated very few SIMD instructions for either ARM or x86. Every test was compiled in 32-bit mode on every architecture.
