Given a piece of code, how do you know if it's lock-based or lock-free, and why do you care?

Sounds like a trivial question – it's lock-based if it says, "lock a mutex", which is usually an OS service. If it doesn't say "lock" or "unlock", then it's lock-free. And we care because OS services like mutexes are expensive and if we don't use them, code runs faster.

Which is all true, in a way, but it doesn't really answer the question. "The OS" isn't magic – it's code. You can have code implementing userspace locks outside the OS. Or you can have an OS where there's just one address space, and then that OS's locks are "userspace" code like any other. And so the question remains: which piece of code implements "locking" and which doesn't?

It's all code, right? A piece of code is a sorting routine if it sorts. So when is code locking? What is "locking", as opposed to "lock-free" code? We'll try to answer that question, and then the answer will tell us why (and when) the "locking"/"lock-free" distinction actually matters.

We'll start with two examples – atomic addition and spin locks. The code for these can be surprisingly similar, and this similarity will help highlight the one thing that really sets the two apart.

Consider an atomic addition – something like gcc's __sync_fetch_and_add(), which increments a number at some memory location and returns the old value "atomically".

"Atomic" (only) means that nobody will mess up the state midway through the operation. "Atomic" doesn't mean, for example, that a single instruction is used – in fact, often it's not the case. Nor does "atomic" imply "lock-free". An implementation of atomic addition could legitimately use:

- A single machine instruction
- A loop observing that someone messed up our state, and retrying
- A mutex that's locked while the incrementing is done

Let's say it's a loop, something along the lines of:

do {
    val = *addr;
} while(!compare_and_swap(addr, val, val+inc));

This is atomic, because if someone modifies addr before we manage to write val+inc, then compare_and_swap (CAS – a typical primitive) will observe that addr no longer holds val, fail, and make us retry. Note that we can get stuck at the loop for an indefinite period of time.

Now consider a spinlock – a loop doing something like:

while(!compare_and_swap(addr, UNLOCKED_VAL, LOCKED_VAL));

This will wait until addr holds UNLOCKED_VAL, and will then set it to LOCKED_VAL – unless someone else gets there first, in which case they write LOCKED_VAL before us – and we're back to calling our CAS primitive in a loop. Note that we can get stuck at the loop for an indefinite period of time.
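And here's the matching sketch of a spin lock built on the very same CAS primitive (again using gcc's builtin; the UNLOCKED_VAL/LOCKED_VAL constants are arbitrary choices for this example):

```c
#include <assert.h>

#define UNLOCKED_VAL 0
#define LOCKED_VAL   1

/* Acquire: keep trying to flip UNLOCKED -> LOCKED. The CAS fails
   (and we retry) as long as someone else holds the lock. */
void spin_lock(int* addr) {
    while(!__sync_bool_compare_and_swap(addr, UNLOCKED_VAL, LOCKED_VAL))
        ; /* spin */
}

/* Release: flip LOCKED -> UNLOCKED so the next spinner's CAS succeeds. */
void spin_unlock(int* addr) {
    __sync_bool_compare_and_swap(addr, LOCKED_VAL, UNLOCKED_VAL);
}
```

Acquisition and atomic addition look almost identical – a CAS in a loop – which is exactly why the "is there a loop with CAS in it?" test can't tell the two apart.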

So now you see the difference between "locking" and "lock-free": our atomic addition loop is lock-free, and our spin lock loop implements locking.

Wait, what?

They're both loops, and very similarly-looking ones. Moreover, we can get stuck at both loops for an indefinite period of time. How come they're at the opposite sides of the locking/lock-free distinction?! Where's the difference?

The difference is in whether we get stuck if another thread gets suspended.

The first loop – atomic addition – never gets stuck because of someone else being suspended. On the contrary, it will finish faster. It gets stuck if others modify addr all the time and it keeps retrying. If they're suspended, they can't modify addr, and it succeeds more quickly.

The second loop – the spin lock – will very much get stuck if another thread obtains the lock and then gets suspended before releasing it. And it will keep running until that thread gets the opportunity to finish whatever it did with the lock taken, and releases the lock.

That's why we care – and that's why, with locking, we tend to involve the OS in the first place! Having the OS involved could be a vast improvement over our spin lock loop, because the OS could notice that our thread is stuck because a suspended thread holds a lock. And if that thread is ready to run, the OS could let it run right now, knowing that otherwise, our thread won't make progress anyway.

Moreover, if our thread has a high priority, and the suspended "locker" thread has a low priority, the OS could raise the "locker's" priority until it releases the lock – because releasing the lock that a high-priority thread wants should itself have high priority. (This stuff is known as dealing with "priority inversion" – the situation where they are less important than we are, and yet they block us.)

And for all these good things to happen, the OS needs to know that a lock is taken – which it doesn't with our spin lock loop. To the OS, it's just some loop that could be doing anything. The OS only knows what's going on if we use its own locking functions. (BTW, there's another option to get the OS involved: explicitly fiddle with threads and priorities inside the spin lock loop if it takes too long. It can get ugly but it can be the right thing to do.)

Of course having the OS involved will cost us, and we don't want locks in our code because of that, but that's a consequence. The root cause is that threads that lock and get suspended block other threads that lock, and this is why spin locks aren't always a great way to get rid of the pesky OS overhead.

This also shows when spin locks or similar are fine – efficient and just as good as "lock-free" code. For example, if you're using hardware threads that are never suspended, such as the hardware threads in XMOS processors, then locks are fine: we'd certainly see problems if suspension were possible, but it isn't.

There are other, perhaps more common situations where locking is fine. For instance, the two highest-priority threads running on two distinct physical processors can communicate using spin locks very nicely, because they can't get suspended while holding a lock. (They could be suspended if interrupts had still higher priority, but we don't care as long as interrupt latency is small enough – and perhaps they themselves are interrupt handlers locking some common resource.) Such threads don't need any help from the OS.

Or, perhaps there's a high-priority thread that sets two variables, A and B, for a low-priority thread. A is written to and then B. After getting notified that A was written, the low-priority thread reads B in a loop – as long as B is NULL, it wasn't written to yet, so keep polling.

Effectively, this is locking – if the high-priority thread gets suspended, then the low-priority thread will get stuck in the loop. But, assuming the high-priority thread has the highest priority, it won't get suspended – and again we're fine without any help from the OS.

These examples show that "locking" isn't defined by "using OS locking functions", and it can take many different forms. You don't spot it in code by looking for certain library calls or for loops having some known look. You spot locking by asking, can I get stuck at this point if someone gets suspended? (And sometimes you answer, "yes, so it's locking; but they won't get suspended, so I don't care.")

Likewise, "lock-free" isn't defined by the use of CAS or LL/SC (load linked/store conditional – two commonly available instructions that can be used to implement compare-and-swap), or by a specific sort of loops.

For instance, our atomic addition loop could be modified to check whether the value exceeds a certain size, quitting the loop without modifying the value if it does:

do {
    val = *addr;
    if(val+inc >= size)
        break;
} while(!compare_and_swap(addr, val, val+inc));

This is still atomic addition – if someone modifies addr, it doesn't break our code. We just keep retrying until addr holds something that can be incremented without exceeding a given size, and we manage to actually increment it without someone beating us to it. This can be used to implement a sort of lock-free queue.
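A compilable sketch of this bounded variant, again using gcc's CAS builtin (the function name add_if_below and its return convention are made up for the example):

```c
#include <assert.h>

/* Atomically add inc to *addr, but only if the result stays below size.
   Returns 1 on success, 0 if the addition would exceed the bound. */
int add_if_below(int* addr, int inc, int size) {
    int val;
    do {
        val = *addr;
        if(val + inc >= size)
            return 0; /* would exceed the bound - give up, change nothing */
    } while(!__sync_bool_compare_and_swap(addr, val, val + inc));
    return 1; /* incremented before anyone interfered */
}
```

This is the shape of, say, reserving a slot in a fixed-size lock-free queue: either you atomically claim an index below the capacity, or you learn the queue is full.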

And this code isn't "lock-free" because CAS is used in a loop – that is true for our spin lock as well – but because we aren't blocked when threads are suspended (in fact, it helps us if they get suspended, since there's less interference).

And this is why we never need help from the OS with lock-free code.

Before concluding, it's worth noting a way in which lock-free code isn't better than lock-based code. We've already seen that "lock-free" isn't (necessarily) better in any way when suspension doesn't threaten us. But regardless of suspensions, "lock-free" is never better in a particular way – specifically, it isn't better if you're worried about correctness.

For instance, suppose you have a counter keeping an account balance. There's a bug where money is transferred between accounts, and the value of the source account is checked before decrementing it:

if(balance - transfer > 0) {
    balance -= transfer;
}

But several transfers can do the check before reaching the decrementing statement. Then they will all "succeed" – each will decrement the source account balance. As a result, the account balance may reach a negative value, and transfers will be authorized that shouldn't have been.

Here, it doesn't matter if the balance counter is locked or updated using any sort of lock-free addition. Both ways are better than nothing in the sense that all decrements will eventually happen and none will be lost. Both ways are still busted because the check and the decrement should be done atomically, and not just the decrement.
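For what it's worth, the correct version – check and decrement as one atomic step – is the same CAS retry loop yet again (a sketch; try_transfer is a made-up name, and the same fix could equally be written with a lock around the check and the decrement):

```c
#include <assert.h>

/* Atomically check the balance and decrement it. Returns 1 if the
   transfer was authorized, 0 if funds were insufficient. No window
   exists between the check and the decrement. */
int try_transfer(int* balance, int transfer) {
    int val;
    do {
        val = *balance;
        if(val - transfer <= 0)
            return 0; /* insufficient funds - reject, change nothing */
    } while(!__sync_bool_compare_and_swap(balance, val, val - transfer));
    return 1; /* the check and the decrement happened atomically */
}
```

The point stands either way: the bug was in what had to be atomic, not in how atomicity was achieved.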

A program using a lock-free counter, or container, or any sort of data structure is thus exactly as correct or buggy as a lock-based version of the same program (provided that the locking and the lock-free updates are both implemented correctly, of course). That is, you can make (or avoid) exactly the same mistakes having to do with what orders of events you allow and how you handle them.

In other words, "locking" and "lock-free" are two equivalent ways to order events, the difference being that with locking, you have to deal with suspension, while with lock-free updates, you have to deal with the effects of concurrent interferences.

(You could say that locking tends to be less buggy and potentially less efficient because a uniform approach to coding is possible: just use the OS's locks. But as we've seen, you don't always have to or want to – and on the other hand, a uniform approach to coding is possible with lock-free code as well: just use off-the-shelf lock-free data structure libraries. And just like it's not automatically clear that locks are easier to develop with, it's not automatically clear that lock-free code is more efficient; both questions depend on the details.)

So that's it; I find it interesting though it could be rather trivial for you – and if you're one of these people, you might have spotted mistakes, in which case I'll be grateful for pointing them out. (I'm not that good at this stuff; what I'm comparatively good at is getting things done without this stuff – that is, sweeping it under the rug and getting imperative, concurrent, bug-free programs. But that doesn't generalize to all types of programs and it's a subject for a separate discussion.)

P.S. according to Wikipedia, it's actually a bit subtler. Specifically, the property of not being blocked by suspended threads makes an algorithm "non-blocking", but not "lock-free". "Lock-freedom" is a stronger property – a non-blocking algorithm is lock-free if there's guaranteed progress (as opposed to a situation where threads are busy undoing the effects of each other's interferences, and make no "real" progress). And there's a still stronger property, "wait-freedom", where progress is guaranteed in a bounded number of steps.

I allowed myself to ignore the distinction between non-blocking and lock-free because it's not a key part of what distinguishes lock-based synchronization from all the other kinds, which are often collectively called "lock-free" in informal speech. Of course in practice, a non-blocking algorithm that can get stuck in an interference-undoing endless loop is useless unless there's some sort of workaround. Presumably that's why "non-blocking" used to be a synonym for "lock-free" until about 2003 – it's not easy to see how a non-blocking algorithm that can get stuck forever could be made usable enough for this sort of thing to deserve a name of its own.