People have side projects. This one is mine.

What if you accelerate the popular sqlite database with OpenCL? This is one of the ideas that was floated as part of the GPGPU team to get a feel for what might be accomplished on ARM hardware with a mobile GPU.

In my case I’m using the Mali opencl drivers, running with ubuntu linux on a Samsung Chromebook which includes a dual core A15 and a Mali T604. You can replicate this same setup following these instructions.

At Linaro Connect Asia 2014 as part of the GPGPU session I gave an overview of the effort but I wasn’t able to give any initial performance numbers since my free time is highly variable and Connect arrived before I was quite ready. At the time I was about a week out from being able to run a microbenchmark or two since I was just getting to the step of writing some of the OpenCL.

Before I get to some initial numbers let me review a bit of what I talked about at Connect.

To accelerate sqlite I’ve initially added an api that sits next to the sqlite C api. My API in time should be able to blend right into the sqlite API so that no code changes would be needed by end user applications. With sqlite you usually have a call sequence something like :

sqlite3_open(databaseName, &db); c= sqlite3_prepare_v2(db, sql, -1, &selectAll_statement, NULL); while (sqlite3_step(selectAll_statement) == SQLITE_ROW) { sqlite3_column_TYPE(selectAll_statement,0); }

The prepare call takes sql and converts it to an expression tree that is translated into a bytecode which is run inside of a VM. The virtual machine is really nothing more than an big switch statement and each case handles an op code that the VM operates over. sqlite doesn’t do any sort of JIT to accelerate it’s operation. (I know what you’re thinking, hold that thought.)

The challenge to make a general purpose acceleration is to take the operation of the VM and move that onto the GPU. I see a few ways to accomplish this. In the past work that Peter Bakkum and Kevin Skadron had done they basically moved the implementation of the VM into the GPU using Cuda. This kind of approach really doesn’t work in my opinion for using OpenCL. Instead I’m currently of the opinion that the output of the sql expression tree ought to be a bit more than just VM bytecodes. I do wonder if utilizing llvm couldn’t offer interesting possibilities including SPIR (the Khronos intermediate representation standard for OpenCL) . Further research for sure.

The opencl accelerated API sequence looks like:

opencl_init(s, db); opencl_prepare_data(s, sql); opencl_transfer_data(s); opencl_select(s, sql, 0); opencl_transfer_results(s);

At this point, what I’ve managed to do is using a 100,000 row database with 7 columns run the same query using the sqlite c interface and my opencl accelerated interface.

With the sqlite c API the query took 420274 microseconds on my chromebook a dual core A15 cpu running at 1.7 Gz.

The OpenCL accelerated API running on the Mali T604 GPU at 533Mhz(?) from the same Chromebook yields 110289 microseconds. This measured time includes both the running of the OpenCL kernel and the data transfer from the result buffer.

These are early results. Many grains of salt should be applied but over all this seems like good results for a mobile GPU.