Next we need to add two discreet pieces of information attached to each record. Called the created XID and expired XID. Both representing a transaction ID but not always required to have values. Explained in more detail for each scenario.

It’s best to store these directly with the record itself. However, that is not always possible so you can maintain them somewhere external as long as you are the only one modifying the data to maintain consistency.

Record Visibility and Locking

Ultimately this is what MVCC comes down to:

The visibility of a row depends on who is looking at it.

Every row has to be tested if it is visible from the point of view of the transaction looking at it:

#class Transaction:

def record_is_visible(self, record):

# The record was created in active transaction that is not

# our own.

if record['created_xid'] in active_xids and \

record['created_xid'] != self.xid:

return False # The record is expired or and no transaction holds it that

# is our own.

if record['expired_xid'] != 0 and \

(record['expired_xid'] not in active_xids or \

record['expired_xid'] == self.xid):

return False return True

Furthermore, we discussed before that simultaneous transactions cannot make a modification to the same record. If this happens there are two ways to handle this:

Abort and rollback the transaction that tried to make the most recent change(s). You may also want to propagate the error back to the original client. Wait (block) the second transaction until that record becomes available. This has some special challenges with performance and potential reread errors.

The safest and easiest one is the first choice — so that’s what I’m going to use in this tutorial. When we do need to modify a record we need to check if it’s locked by another transaction with this:

#class Transaction:

def row_is_locked(self, record):

return record['expired_xid'] != 0 and \

row['expired_xid'] in active_xids

Adding a Record

This is an easy one. We set the created_xid to the current transaction ID and expired_xid to 0 :

#class Transaction:

def add_record(self, record):

record['created_xid'] = self.xid

record['expired_xid'] = 0

self.rollback_actions.append(["delete", len(records)])

records.append(record)

The rollback_actions will be explained a bit later.

Deleting a Record

There are two possibilities:

The expired_xid is 0 meaning the record has never been deleted by anyone. So by setting expired_xid to the current transaction ID we are marking it as deleted. The expired_xid it not 0 and expired_xid is an active transaction. The record has been deleted by another active transaction and so we cannot touch it.

A third scenario where expired_xid is not an active transaction is not possible due to the normal visibility constraints.

#class Transaction:

def delete_record(self, id):

for i, record in enumerate(records):

if self.record_is_visible(record) and \

record['id'] == id:

if self.row_is_locked(record):

raise Error("Row locked by another transaction.")

else:

record['expired_xid'] = self.xid

self.rollback_actions.append(["add", i])

The rollback_actions will be explained a bit later.

Updating a Record

Updating is a simply combination of deleting the old record and adding a new record. This allows the existing record to still be viewed by other transactions. If the delete_record fails then the exception raised would cause the subsequent add_record not to happen which is what we want.

#class Transaction:

def update_record(self, id, name):

self.delete_record(id)

self.add_record({"id": id, "name": name})

Commit Changes

Once all the modifications have been made we need to commit all the changes so that future clients/transactions can see these new changes. Very easy, we simply remove our transaction ID from the active list of transactions:

#class Transaction:

def commit(self):

active_xids.discard(self.xid)

Notice this operation has a consistent time complexity, it is not based on how many changes we are commiting. This makes long running transactions that have a huge number of changes possible and sometimes more desirable.

Rollback Changes

Rolling back can be done several ways, one way is to replay the changes in reverse:

#class Transaction:

def rollback(self):

for action in reversed(self.rollback_actions):

if action[0] == 'add':

records[action[1]]['expired_xid'] = 0

elif action[0] == 'delete':

records[action[1]]['expired_xid'] = self.xid active_xids.discard(self.xid)

This is fine for an application that can guarantee that the all rollback actions will be replayed, and that when the application is shut down forcefully that all the active transactions will have rollback() invoked on them.

If you want higher durability that can recover from the application randomly crashing you will need to keep the transaction IDs stored somewhere more persistent. Then you can have extra code when the application starts up that checks to see if it was shutdown safely, and manually repairs the records if need be. I will not go into this as this article is already long enough, but it may be explained in a future article.

Vacuuming or Reclaiming Space

You have probably noticed that with this algorithm that it doesn’t ever actually delete data, only mark it as deleted. This makes it fantastic for keeping lots of records on disk by only appending/update-in-place to the file for modifications.

Overtime you will likely want to gain back all that dead space. If it’s in a memory database you could simply iterate through the records and permanently delete the records that are now fully dead.

If your using a medium like the disk this isn’t so easy. If your application isn’t too complicated you may be able to rewrite all of the non-dead rows out to a new file and switch it underneath your application. There are tons of solutions for this that aren’t specific to how MVCC works so I’ll leave that up to you.