This is a guest blog by Michael Meeks, General Manager of Collabora Productivity and leader of the LibreOffice team. The views expressed in this blog are his alone.

Changing how Calc uses silicon

LibreOffice is an Open Source office suite used all over the world, on Windows, MacOS and Linux. Between January and October 2011, it was downloaded approximately 7.5 million times, about 15 million times in 2012 and around 25 million times in 2013. Counting our user-base is hard, but each week, another million unique IP addresses check in to see if there is a new version to download.

One of the key productivity applications in LibreOffice is Calc, a spreadsheet application. We’ve been doing some very heavy retooling of Calc behind the scenes of late, working with the LibreOffice community and AMD to make Calc one of the most capable and up-to-date productivity applications out there.

This rework has involved changing how Calc uses the silicon in the devices it runs on. In short, spreadsheet apps normally rely on the CPU, but they should be smart about using the right processor, and for Calc and other spreadsheet programs, that’s the GPU. The results are stunning – a near seven times speed boost in benchmarking. Here’s the how and why.

Spreadsheets are bigger than ever

A significant portion of spreadsheet documents are created to help make business decisions, and they’re used for things the original creators of the format never imagined. Their performance has, for many years, been directly correlated to what you were doing with them, and how much CPU horsepower you had to hand. People’s understanding of what spreadsheets could do was limited to specific areas of the business, and access to large amounts of data to process was often limited, too. All of that is changing, and changing fast.

People are using spreadsheets differently now. First, large chunks of data are easier to store – it may have been said in jest, but the idea that Big Data is data that has become simpler and cheaper to store than to throw away is true. Second, people are more aware of how they can take large data sets and smash them together to get a third set of usable information. To do this, they need a more powerful spreadsheet application that can do what modern working environments demand of it.

Look back at an early spreadsheet from 3,000 BC – an obelisk – and the data it contains (“don't fight the big guy”) is tiny – only four columns for a start – compared to modern spreadsheets. Excel 2010, for example, can handle roughly 16,000 columns and 1,000,000 rows (16,384 by 1,048,576, to be exact). When you add in formulae, pivot tables and the like, that’s an awful lot of information to handle and change.

Parallel processing on a GPU

Applications like Microsoft Excel and Calc, the spreadsheet application in LibreOffice, can both trace elements of their lineage all the way back to VisiCalc. CPU benchmarking has long involved testing the CPU with large spreadsheet transformations. But actually, the best processor for handling, say, formulae scattered over 16,000 x 10^6 cells is not the CPU. It’s the GPU. Moving the work there both boosts performance and, somewhat counter-intuitively, saves power at the same time. Graphics processors are extremely good at handling parallel processing tasks – certainly compared to CPUs. Because GPUs can do jobs in parallel at an optimal clock frequency, there’s a power saving involved as well.
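To illustrate why this kind of workload suits parallel hardware, here is a minimal Python sketch. It is not Calc's code: the formula, the data and the names `row_formula`, `compute_chunk` and `compute_column` are all made up for illustration. The only point it makes is that each row's result depends on that row alone, so a column can be split into chunks and evaluated in any order with the same answer.

```python
from concurrent.futures import ThreadPoolExecutor


def row_formula(price, quantity):
    # Hypothetical per-row formula, like =A1*B1*1.2 filled down a column.
    return price * quantity * 1.2


def compute_chunk(chunk):
    # Evaluate the formula for one contiguous block of rows.
    return [row_formula(price, quantity) for price, quantity in chunk]


def compute_column(rows, workers=4):
    # Split the rows into roughly equal chunks, one per worker.
    size = max(1, -(-len(rows) // workers))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input chunks.
        results = list(pool.map(compute_chunk, chunks))
    return [value for chunk in results for value in chunk]


# Made-up data: 1,000 rows of (price, quantity).
rows = [(float(i), i % 7) for i in range(1000)]
sequential = [row_formula(price, quantity) for price, quantity in rows]
parallel = compute_column(rows)
print(parallel == sequential)
```

In standard CPython these four threads will not make the arithmetic itself faster; the sketch only shows that the work decomposes cleanly. A GPU exploits the same independence with thousands of lightweight work-items rather than a handful of threads.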

When you get to a point where you have more than a hundred rows of data in a column, the GPU can start to help. These sorts of documents are actually quite common. Finance is always the example trotted out, but everyday office applications can lead to documents that suffer from terrible performance. Take Human Resources keeping track of staff attendance. A spreadsheet is created, then more and more data is added over time, with formulae extended to crunch it. The same goes for sales: the value of a spreadsheet might initially be to help the sales team understand whether people buy more red cars on a Tuesday, but the complexity quickly grows. Are people more likely to buy red cars with the performance pack and alloys on the Friday after pay day, for example? Questions like these create documents with thousands of rows and quite a bit of complexity.

The right technologies for the job

Reworking Calc has created other benefits. The team wrote a converter to turn your standard formulae into OpenCL – the Open Computing Language. By doing this, we were able to make the move from CPU to GPU, but it also allows us to do something else: to take advantage of the next generation of processor designs using a Heterogeneous System Architecture (HSA), which allows a much more efficient OpenCL implementation in spreadsheets.
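To give a feel for what turning a formula into OpenCL can mean, here is a toy Python sketch. It is not the converter the team wrote, and every name in it (`formula_to_opencl`, the `_col` suffix, the kernel name `row_formula`) is hypothetical. It assumes a formula made only of column names, numbers and basic arithmetic, and it assumes double-precision support on the device (the `cl_khr_fp64` extension). The output is OpenCL C source text in which one work-item computes one row.

```python
import re

IDENTIFIER = r"\b[A-Za-z_]\w*\b"


def formula_to_opencl(kernel_name, expression, columns):
    """Return OpenCL C source that applies `expression` to every row."""
    # Toy restriction: only names, numbers, whitespace and + - * / ( ).
    if not re.fullmatch(r"[\w\s.+\-*/()]+", expression):
        raise ValueError("unsupported characters in formula")
    unknown = set(re.findall(IDENTIFIER, expression)) - set(columns)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")

    # Each input column becomes a read-only buffer in global memory.
    params = ", ".join(f"__global const double *{c}_col" for c in columns)
    # Replace each column name with an indexed read of the current row.
    body = re.sub(IDENTIFIER, lambda m: f"{m.group(0)}_col[row]", expression)

    return (
        "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n"
        f"__kernel void {kernel_name}({params}, __global double *out) {{\n"
        "    size_t row = get_global_id(0);  /* one work-item per row */\n"
        f"    out[row] = {body};\n"
        "}\n"
    )


# Hypothetical formula: column A times column B plus column C, per row.
kernel_source = formula_to_opencl("row_formula", "A * B + C", ["A", "B", "C"])
print(kernel_source)
```

The generated text would still need to be compiled and launched through an OpenCL runtime, which this sketch leaves out. A real converter also has to cope with ranges, functions such as SUM, cell references and type handling, none of which is attempted here.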

Anyway, thanks to OpenCL, these spreadsheet optimisations are now far more portable than before: there’s no need to write custom assembler for each CPU you are porting to. This benefit – and the other benefits I’ve described above – can be applied to many applications that do a lot of number crunching. The right processor for the job can be used, for starters, and porting that hard optimisation work to other platforms – tablets or smartphones, for example – is easier. Using OpenCL also improves the underlying code structure and boosts performance – whether a GPU is present or not.

What does this mean in the real world? Well, nearly seven times faster performance on AMD versus Intel in a benchmark test (see footnote 1) based on real-time analysis and visualisation of streaming stock quotes. Accomplishing a task in a seventh of the time sounds like the sort of performance benefit heavy spreadsheet users could appreciate.

All of this is why the contributors to Calc – including engineers from AMD and Collabora (and MulticoreWare) – helped rework LibreOffice Calc to make best use of the best bit of the processor in your PC to handle spreadsheets. We later worked out it was the biggest core refactoring of Calc code in over a decade, and as a result, Calc is faster and more powerful for everyone. There is also the lovely, cool feeling of the right compute unit in your computer doing the work not only more quickly, but without glowing quite so hot as well.

About Michael Meeks

Michael is an enthusiastic believer in Free Software. He is the General Manager of Collabora Productivity, leading our LibreOffice team, supporting customers, consulting on development alongside an extremely talented team. He serves as a member of the board of The Document Foundation, and the LibreOffice Engineering Steering Committee; in the past he served on ECMA/TC45 improving Microsoft's description of their format OOXML. Prior to this he was a Novell/SUSE Distinguished Engineer working on various pieces of Free Software infrastructure across the Linux desktop stack. Prior to that he worked on both hardware and software for real-time video editing at Quantel.

1 AMD A10-7850K APU gets up to 7X better OpenCL performance with LibreOffice Calc. AMD tests are performed on optimized AMD reference systems. PC manufacturers may vary their configuration, yielding different results. Test project used LibreOffice Calc V4.2.0 to perform 21,000 calculations, and plot 1000 points for 21 different stocks. A desktop PC with AMD A10-7850K APU with AMD Radeon™ R7 Series graphics, 2x4GB DDR3-2133 RAM, video driver 13.300.0.0 - 09-Dec-2013, took 120 milliseconds with OpenCL™ on. A desktop PC configured with an Intel Core i5-4670K with Intel HD 4600 graphics, 2x4GB DDR3-1600, video driver 10.18.10.3345 - 30-Oct-2013, took 950 milliseconds with OpenCL™ on. Both systems used the same SSD and Windows 8.1 build 9600. KVD-8