Communicating through UUID conflicts

“This is definitely stupid, yet slightly applicable”

Part 1: Posing the problem

For one issue at work, we tossed around the idea of generating UUIDs client-side in order to ensure that we had a unique immutable id without having to go to the server to get a permanent id back. Obviously, if somebody’s maliciously creating ids, they can choose arbitrary values and create collisions. Additionally, one client could tell if another client had already chosen a certain value for an id by trying to commit a certain UUID to the DB. For example:

> Client 1 creates a "todo" with a certain UUID.

> Client 2 tries to create a “todo” with the same UUID.

> Client 2 now has some information about client 1’s actions.

I ended up posing the purely academic question of how we might be able to send arbitrary messages over that channel. As an example of the mechanism of communication:

> Client 1 creates a "todo" with the following UUIDS

- 000...-000000001

- 000...-000000003

- 000...-000000005

> Client 2 creates todos of the following UUIDs with responses:

- 000...-000000001 -> 409 Conflict

- 000...-000000002 -> 201 Created

- 000...-000000003 -> 409 Conflict

- 000...-000000004 -> 201 Created

- 000...-000000005 -> 409 Conflict

If we interpret the 409s as 1s and the 201s as 0s, we get the binary string 10101, or 21 in decimal. One client has now communicated some information to the other, merely through conflicts in IDs!
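The exchange above can be sketched end to end. Below is a minimal simulation of the channel, with the server's create-with-UUID endpoint modeled as an in-memory set; the function names and the fixed-width framing are illustrative assumptions, not part of any real API.

```python
used = set()  # UUIDs already present on the (simulated) server

def create(uuid: int) -> int:
    """Try to create a record with this UUID: 201 if new, 409 on conflict."""
    if uuid in used:
        return 409
    used.add(uuid)
    return 201

def send(bits: str, base: int = 0) -> None:
    """Sender: create a record at every position holding a 1 bit."""
    for i, b in enumerate(bits):
        if b == "1":
            create(base + i)

def recv(length: int, base: int = 0) -> str:
    """Receiver: probe every position; a 409 reads as 1, a 201 as 0."""
    return "".join("1" if create(base + i) == 409 else "0"
                   for i in range(length))

send("10101")                # client 1 creates todos at positions 0, 2, 4
print(recv(5))               # client 2 probes positions 0-4 -> "10101"
```

Note that the receiver's probes create records of their own, so every position that read as 0 becomes 1 afterwards; that destructive-read property drives most of the design that follows.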

Design Goals and Optimization Criteria

We’ve established that basic communication can occur between different clients, but designing a protocol requires having solid design guidelines. Here are a few possible goals for a system like this, where an asterisk denotes whether I’ll be addressing those goals in this design:

- The protocol should be relatively robust to existing records in the database*
- Two communicators must not know when the other is sending a message*
- Each client should be able to send multiple messages over a period of time*
- The protocol should mitigate concurrency issues*
- The protocol should mitigate the impact of malicious actors
- The protocol should support more than two clients over the shared medium*
- The protocol should reduce the number of requests to the server
- The protocol should support client discovery

With these constraints and criteria, we can begin to start crafting a communication protocol and have a general notion of why we might prefer one protocol over another. To be perfectly clear: This is a toy problem with interesting constraints.

Modeling the Problem Domain

Each client can create a record (like a todo) with a certain UUID in this space:

----

0: 00000000-0000-0000-0000-000000000000

1: 00000000-0000-0000-0000-000000000001

...

n: ffffffff-ffff-ffff-ffff-ffffffffffff

----

Assume that if an ID hasn’t been used already, you get a 201 Created back from the server. If an ID has been used, you get a 409 Conflict back. All GETs will 404 Not Found, so in order to tell whether an ID has been used, you have to try to create a “todo” with that ID.

The interesting bit is that by reading whether or not another client has created a record with a UUID, you inevitably end up creating one if it doesn’t already exist. In other words, by reading whether a bit is set, you inherently overwrite it to be set. That is, the “create a record” operation can be viewed as a joint “read and write true” operation:

[...]

> Client 2 creates todos of the following UUIDs with responses:

- ...001 -> 409 Conflict

- ...002 -> 201 Created

- ...003 -> 409 Conflict

> Client 2 tries to read the same UUIDs again:

- ...001 -> 409 Conflict

- ...002 -> 409 Conflict // Overwritten!

- ...003 -> 409 Conflict

To Shared Memory

We can imagine a table of sequential UUIDs and whether they're taken:

----

uuid:   | ...0-0000001 | ...0-0000002 | ...0-0000003 | ...

taken?: | no           | yes          | yes          |

----

We can model the space of taken vs. untaken UUIDs as shared memory, where the address is the sequential index of the UUID, and data is whether the record exists or not.

----

addr: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | ...

data: | 0 | 1 | 1 | 0 | 1 | 1 | 0 | ...

----

This address space will be the basis of the rest of the discussion.
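This shared-memory model can be captured in a few lines. The sketch below treats a "create" against a UUID as a destructive read: a hypothetical ConflictMemory class, assumed here purely for illustration, where reading a bit also sets it.

```python
class ConflictMemory:
    """Shared memory where reading a bit also writes it to 1,
    mirroring the joint "read and write true" create operation."""

    def __init__(self, size: int):
        self.bits = [0] * size

    def read(self, addr: int) -> int:
        prev = self.bits[addr]
        self.bits[addr] = 1  # the read inherently sets the bit
        return prev          # 1 ~ 409 Conflict, 0 ~ 201 Created

mem = ConflictMemory(8)
mem.bits[2] = 1          # another client already took this UUID
print(mem.read(2))       # 1: a conflict
print(mem.read(3))       # 0: created...
print(mem.read(3))       # 1: ...and a second read now conflicts
```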

Part 2: Tackling the Problem

From here on out, I’ll be trying to design a message protocol that conforms to the constraints above.

The first attempt I’ll construct consists of a sled of 1s at the bottom of the memory space, a data structure I’m calling a “shared message store” that’s just beyond the sled of 1s, and a data heap that starts at the top of the address space and grows downwards in address space.

The rough overview is that when a client polls for messages, it sequentially reads the sled of 1s to try to find the shared message store, reads the whole shared message store, performs manipulations to the shared message store, and rewrites the shared message store just beyond where the previous one ended (which is now filled with 1s due to the read.)

Prerequisite: The sled of ones

The purpose of the sled of ones is to enable a client to read and write values from the memory, even though all values are overwritten on read. I’ll use this mechanism extensively in the protocol. Following is a quick example to illustrate the mechanism:

Basic read of data from the sled of 1s:

----

> Client reads the first bit, observes it's a 1, then continues

| 1111111101011000000000 |
  ^

> Client reads the second bit, observes it's a 1, then continues

| 1111111101011000000000 |
   ^

[ until... ]

> Client reads a zero, and interprets it as the start bit to a value

| 1111111101011000000000 |
          ^ // overwrites the start bit

> Client reads the rest of the k-sized message and interprets it

| 1111111111011000000000 |
           ^^^^^^ // reads k=6 bits, 101100, overwrites them

> Client rewrites the value just beyond the newly written 1s

| 1111111111111110000000 |
                 ^ // the untouched 0 becomes the new start bit; "101100" is written just after it

The state now looks the same as before, except for having more 1s in the sled:

| 1111111111111110101100 |

----
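The whole read-then-rewrite cycle from the example can be sketched over that destructive memory. Everything here is an illustrative model: read_bit stands in for the create call, and k = 6 matches the example above.

```python
def read_bit(mem: list, addr: int) -> int:
    """Destructive read: the create call sets the bit it probes."""
    prev = mem[addr]
    mem[addr] = 1
    return prev

def read_value(mem: list, k: int) -> list:
    """Walk the sled of 1s, consume the start bit (a 0), read the
    k-bit value, then rewrite it just beyond the newly written 1s."""
    addr = 0
    while read_bit(mem, addr) == 1:  # still on the sled
        addr += 1
    # mem[addr] was the start bit; reading it overwrote it with a 1
    value = [read_bit(mem, addr + 1 + i) for i in range(k)]
    end = addr + 1 + k               # first untouched 0: the new start bit
    for i, b in enumerate(value):    # rewrite the value past the new start bit
        if b:
            mem[end + 1 + i] = 1
    return value

# 8 sled bits, a 0 start bit, the value 101100, then blank space
mem = [1]*8 + [0] + [1, 0, 1, 1, 0, 0] + [0]*8
print(read_value(mem, 6))            # [1, 0, 1, 1, 0, 0]
# The sled now runs through address 14, and the value sits after the
# new start bit at address 15, exactly as in the diagram above.
```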

Initial Memory Layout for the Protocol

To make things easier on ourselves, we’re going to start with the assumption that each client can perform an entire routine atomically. This is, of course, a horrible assumption, but I’ll try to address it later. Here is the memory layout I’m starting with:

(Screenshot of the initial memory layout, included as an image because it was too wide to render as text.)

Glossary of fields

start : The start bit, indicating the beginning of the message store

msg_count : The number of messages in the message store

head_ptr : The pointer to the top of the message data heap. Gets updated when a new message is added to the data heap.

dest_addr : The client id of the destination for that message.

msg_ptr : The pointer to the message data for that message record

msg_size : The length of the message data for that message record

To read from the message store, read down the field of 1s until you find a zero, then read the subsequent 2 bytes to get the message count and head pointer. Since messages are of a fixed size, we can now read the whole set of message records. Once we’ve read the message records, we can read the message data they point to.

Since after reading, the field of 1s now extends through where the shared message store was written, we rewrite the message store at the end of the new field (removing any messages addressed to that client.) As an example:

> Client reads the first 0

| 111111 0 00000010 11110110 00001111 10111010 00001010 000... |
         ^ (msg_cnt head_ptr dst_addr msg_ptr msg_size)

> Client reads subsequent 2 bytes to get message count and head ptr

| 111111 1 00000010 11110110 00001111 10111010 00001010 000... |
           ^^^^^^^^ ^^^^^^^^ // In this case 1 message in the store

> Client reads message off in struct

| 111111 1 11111111 11111111 00001111 10111010 00001010 000... |
                             ^^^^^^^^ ^^^^^^^^ ^^^^^^^^

> Client rewrites message store right of the overwritten values:

| 111111 1 11111111 11111111 11111111 11111111 11111111 0 00.. |
                                                        ^ write starts here

After reading, re-copy the message store just past the end of the newly invalidated block, exactly as described above.

Inserting a message into the Message Store

Inserting a message is equivalent to reading a message, but right before rewriting the message, perform these steps:

1. Write the message data to just before the head_ptr address.
2. Update head_ptr to the start of the newly written data blob.
3. Add a new message record to the client’s representation of the message store.
4. Update msg_count.
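Assuming the client holds a decoded copy of the store (msg_count, head_ptr, and the record list) after a read, the four steps can be sketched as follows. The dict layout and the helper name are assumptions for illustration; the field names follow the glossary above.

```python
def insert_message(store: dict, data: list, dest_addr: int, mem: list) -> None:
    # 1. Write the message data just below head_ptr (the heap grows downward).
    new_head = store["head_ptr"] - len(data)
    mem[new_head:new_head + len(data)] = data
    # 2. Update head_ptr to the start of the newly written data blob.
    store["head_ptr"] = new_head
    # 3. Add a new message record to the client's copy of the store.
    store["records"].append(
        {"dest_addr": dest_addr, "msg_ptr": new_head, "msg_size": len(data)})
    # 4. Update msg_count.
    store["msg_count"] += 1

mem = [0] * 32
store = {"msg_count": 0, "head_ptr": len(mem), "records": []}
insert_message(store, [1, 0, 1], dest_addr=2, mem=mem)
print(store["msg_count"], store["head_ptr"])  # 1 29
```

The rewritten store, including the new record, then goes back out to shared memory exactly as in the read path.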

Concurrency

I’m going to attempt to tackle this one by whittling down from strong guarantees to weak ones. The concurrency models, from strongest to weakest, are:

1. Clients can execute an entire routine atomically.
2. Clients can execute a readwrite of k addresses atomically (with some limit).
3. Clients can execute a readwrite of 1 address atomically.

The above protocol is robust under (1) but under neither of the weaker models, and thus will have to be amended.

Tackling k = O(log n)

Can we get atomic locks if we assume that, for an address space of size n, k = O(log n)? The answer seems to be yes, via the following mechanism. First, we split the shared message store into a static part and a dynamic part:

head_ptr_record

| 01 | 00100101 |

start head_ptr

We move the residual part of the message store to just beyond the head_ptr:

shared_message_store:

| message 1 | ... | message m-1 | message m | msg_count |

The message layout is unchanged:

| dest_addr | msg_ptr | msg_size |

Our whole memory space now looks like this:

[ ones | head_ptr_rec | zeros | message_store | message_data ]

With this memory layout, we can now lean heavily on k = O(log n) to construct a locking mechanism, as detailed below:

Detail of the Locking Mechanism

Let’s modify our original behavior, such that we read in chunks of k. In other words, we set our word size to be k.

Let’s say k = 8, for this exercise

| 11111111 11111111 00101011 00000000 |

| Word 1 Word 2 Word 3 Word 4 |

We then issue a read for word 3, reading that byte sequence in a single atomic operation (as per our assumed atomicity constraint). The client reads 00101011 from Word 3, leaving the memory space as follows:

| 11111111 11111111 11111111 00000000 |

The interesting thing is that if we choose the sled of ones to be written in chunks like this, we can define a locking mechanism on top of it. For a client trying to acquire a read/write lock:

- A word of all ones is part of the ones sled, and the client should read the next word.
- A word of all zeros means that someone else has a lock.
- A word that starts with 01 means that the word is a valid head ptr, and the client has gotten the lock.

Choosing a start sequence of 01 allows us to disambiguate between these 3 possibilities, because a valid message will never be all 1s or all 0s. The convenient thing about this series of choices is that a lock is guaranteed to belong to at most one client at a time:

Hitting a series of all 1s does not change the state, nor affect the lock condition.

State:

| 11111111 00000000 |
| word 1   word 2   |

> Client reads word 1 atomically

State:

| 11111111 00000000 | unaffected
| word 1   word 2   |

Hitting a series of all 0s does not affect the lock condition, per the following:

> State before client reads word 2

| 11111111 00000000 00000000 | locked because word 2 is blank
| word 1   word 2   word 3   |

> State after client reads word 2

| 11111111 11111111 00000000 | locked because word 3 is blank
| word 1   word 2   word 3   |

Hitting a valid head ptr atomically grabs a lock for the client reading:

> State before client reads word 2

| 11111111 01100100 00000000 | lock available
| word 1   word 2   word 3   |

> State after client reads word 2

| 11111111 11111111 00000000 | lock acquired, word 3 is all 0s
| word 1   word 2   word 3   |

This requires atomic manipulation of 2 + log2(n) bits where n is the size of the space of available UUIDs.
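The three-way word classification can be sketched directly. In this model a word read is assumed atomic and destructive (the k bits come back and are replaced with 1s); WORD, read_word, and try_acquire are names invented for illustration.

```python
WORD = 8  # k = 8, as in the example above

def read_word(mem: list, i: int) -> list:
    """Atomically read word i, overwriting it with 1s (destructive read)."""
    w = mem[i * WORD:(i + 1) * WORD]
    mem[i * WORD:(i + 1) * WORD] = [1] * WORD
    return w

def try_acquire(mem: list):
    """Scan words until the lock is resolved one way or the other."""
    for i in range(len(mem) // WORD):
        w = read_word(mem, i)
        if all(b == 1 for b in w):
            continue                 # part of the ones sled: keep reading
        if all(b == 0 for b in w):
            return None              # someone else holds the lock
        if w[0] == 0 and w[1] == 1:
            return w[2:]             # valid head ptr: lock acquired
        return None                  # unrecognized word: treat as locked
    return None

# A sled word, then a head_ptr word starting 01: the scan acquires the
# lock, and the destructive read leaves zeros behind for everyone else.
mem = [1]*8 + [0, 1, 1, 0, 0, 1, 0, 0] + [0]*8
print(try_acquire(mem))              # [1, 0, 0, 1, 0, 0]
```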

Initializing the Memory Space

This protocol does not (yet) handle the initial coordination of the clients. If the memory space were blank at the onset of clients joining, then all clients would loop indefinitely, each thinking that some other client must have pulled the lock.

To solve this, I’ll add in a quick asymmetry to the specification: a client that writes to the first word automatically gains the read/write lock and assumes the default head pointer state of the form [ 01 | top_of_address_space ]. Since a field of zeros is a valid shared message store, the client performs no additional setup beyond simply assuming the lock.

An extremely rough implementation of the protocol up to this point can be found in my scratchpad repo.

Tackling k=2

Really, the essence of the locking mechanism we need is the start sequence. We can replace the whole lock operation with just the sled of 1s, followed by the two-bit sequence 01. From a practical perspective, this makes it far less likely that two clients will conflict horrifically if atomicity can only be guaranteed to k=1, but that’s little consolation when the jitter between requests is by no means negligible.

To tackle k=2, I partition the memory into 3 sections and treat each section as a contiguous shared memory space. I arbitrarily assign:

First 1/3rd (M1): Locking mechanism (just the 01 sequence from the head_ptr data structure)

Second 1/3rd (M2): Shared Message Store (identical to the original construction)

Final 1/3rd (M3): Message Data

State Snapshot

M1: | 11 11 11 11 01 00 00 00 | no client lock on data
    | ones sled  | lk | zeros |

M2: | 1111111111 011010010000 | Shared data store after sled.
    | ones sled | msg store...| Sledded because lk has no ptr.

M3: | 110101111001001 0000000 | No sled, because msg store has
    | message data   | zeros  | pointers into this space.

Since the locking mechanism can no longer store a pointer, we must still put in a sled of 1s on both M1 and M2. This is the primary modification to get us to k=2, and is the best construction I’ve come up with so far.

Tackling k = 1?

I have no idea how to do this, and strongly suspect it’s impossible. I don’t know how to prove it impossible either, so if anybody has suggestions, I’m all ears.

Error correction

Obviously, we’re not the only ones using this system; there will be existing entries in the UUID space. We’ll have to work around them and mitigate the (extremely rare) errors they incur.

We have 2 possible system behaviors to watch out for:

1. The system allows records to be deleted and their IDs reused.
2. The system does not allow for the reuse of IDs.

For generality, we’ll assume the former model. For messages, pointers, counts, and other pieces of data, a simple garden-variety Hamming code should be sufficient to detect and correct single errors.

For forward-correction in the start sequence and locking mechanisms, the 3 word states we need to distinguish are all 1s (for the ones sled), all 0s (for resource locked), and a valid start sequence of our choosing. To ensure a Hamming distance of 3 between any two of these cases, we need 6 bits. We’ll therefore expand the start sequence from 2 bits to 6 bits, and choose [ 000111 | head_ptr ] as the memory layout. On observing a value, we’ll correct to whichever valid 6-bit sequence has the minimum Hamming distance to the observed value.
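A minimal sketch of that correction step, with the three valid words written as bit strings; the helper names are assumptions:

```python
CODEWORDS = {"111111": "ones sled", "000000": "locked", "000111": "start"}

def hamming(a: str, b: str) -> int:
    """Number of positions where the two bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def classify(word: str) -> str:
    """Correct to the valid 6-bit word at minimum Hamming distance."""
    return CODEWORDS[min(CODEWORDS, key=lambda c: hamming(word, c))]

# Any two codewords differ in at least 3 positions, so a single flipped
# bit (say, from a pre-existing record) still decodes unambiguously.
print(classify("110111"))  # ones sled (one bit flipped in 111111)
print(classify("000101"))  # start     (one bit flipped in 000111)
```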

And I think that solves just about all the problems I’m willing to tackle within this thought experiment.

Part 3: Further Explorations and Contextualization

Given that the constraints in this problem set are relatively new, there’s ample space for creative exploration. My proposed protocol is far from the only possible style of solution, and after a few conversations, I started seeing how different strategies might emerge.

For example, my solution does not utilize the time domain; I implicitly assume that we can never rely on time as a total ordering. However, on seeing this problem, my coworker proposed a concept we started calling a synchronization block: a region of memory shared asymmetrically by two clients, by which client 2 can notify client 1 of an arbitrary event. Client 1 sequentially reads position 1, position 2, position 3, until it reads a 1 (unless it reaches the end of the buffer).

Normal operation:

|1111110000000| Client 1 reads 0 from each position sequentially
       ^

|1111111111111| Client 2 writes to each position in the block
 ^^^^^^^^^^^^^

|1111111111111| Client 1 reads a 1, representing some event
       ^

Out of buffer operation:

|1111111111111| Client 1 reads the last 0 address in the buffer,
             ^    noticing that no 1 was written.

|1111111111111| Client 2 reads all positions, noticing that all
 ^^^^^^^^^^^^^    positions return 1.

Both clients now know that client 1 did not receive the event.

Under the assumption that time does not pass uniformly, this does not work. But under the constraints of real-world systems, we can often reasonably make these assumptions and get away with such a data block. The primary point of this illustration is how small variants yield interesting and novel constructs in the problem space.
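The synchronization block can be sketched by simulating both clients over the same destructive-read memory; the class and method names are invented for illustration.

```python
class SyncBlock:
    """One-way notification channel: client 1 polls, client 2 notifies."""

    def __init__(self, size: int):
        self.bits = [0] * size
        self.cursor = 0  # client 1's sequential read position

    def poll(self):
        """Client 1: destructively read the next position.
        True = event seen, False = no event yet, None = out of buffer."""
        if self.cursor >= len(self.bits):
            return None
        prev = self.bits[self.cursor]
        self.bits[self.cursor] = 1
        self.cursor += 1
        return prev == 1

    def notify(self) -> bool:
        """Client 2: write every position. Returns True if every write
        conflicted, i.e. client 1 had already exhausted the buffer."""
        missed = all(self.bits)
        self.bits = [1] * len(self.bits)
        return missed

blk = SyncBlock(4)
print(blk.poll())    # False: no event yet
print(blk.notify())  # False: the event was delivered in time
print(blk.poll())    # True: client 1 observes the event
```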

Why on earth did you spend time on this?

In my mind, the interesting part of this article is the problem more than the protocol. The protocol’s purpose is to show that the problem itself is worth investigating. Even though this is a toy problem, it has interesting consequences that yield nicely under some, but not a lot, of mental energy. I love those sorts of tasks.

As always, if you find anything wrong with this article, please reach out and tell me. Happy hacking, all.