In this post: efficiently detecting when cycles appear, as edges are added to a graph, by taking advantage of a restriction on the type of graph.

Motivation

I worry a lot about little details. About ways people using my code might end up shooting themselves in the foot. Case in point: the collapsing futures library for Objective-C. This library basically exists because I wanted to take a mistake, forgetting to flatten doubly-eventual futures, and make it impossible by design.

In the collapsing futures library, any future that ends up with a result that is itself a future will instead collapse to use that would-have-been-the-result’s result as the actual exposed result. Even if the result is a huge chain of futures containing futures containing futures on top of futures, the top-most future will collapse to contain the same result as the bottom-most future. Collapsing continues until we have a result that’s not a future.

This is great… until someone sets a future to be its own result. Then, depending on how you implemented collapsing, the program will either get stuck in an infinite loop or else the future will end up stuck in eternal limbo (and referencing itself, which is a memory leak in Objective-C).

A programmer probably won’t set a future to be its own result on purpose. But it’s not too far-fetched to imagine them doing it by accident, perhaps like this:

TOCFutureSource* res = [TOCFutureSource new]; if (...) { ... } else { TOCFutureSource* dst = someOtherSourceWeWantToBehaveLike(); [res trySetResult:res.future]; // whoops, brain-dead typo! } return res.future;

Granted that’s not very well written code (why not just return dst’s future directly?), but the point is that I want to help out the poor programmer who wrote it. I want to give some detectable indication that something is wrong. I’d like to explicitly mark the cycled future as immortal, so there’s at least something they can spot in the debugger and go “huh?” followed by googling “immortal future” followed by learning.

Which, finally, leads us to the problem I want to talk about in this post: incremental cycle detection.

Incremental Cycle Detection

Cycle detection is the process of determining if a graph has a cycle. Incremental cycle detection is the process of determining when a cycle is introduced into a graph, as you add edges to it. (If you’re removing edges it’s called decremental cycle detection, and if you’re both adding and removing them then it’s dynamic cycle detection.)

Determining when a collapsing future will end up stuck collapsing into itself is an incremental cycle detection problem. As we set futures to be the result of other futures, we’re building up a directed graph of futures pointing to the future they will reach through to get their final result. When a cycle appears in the graph, a future is collapsing into itself and needs to be saved from limbo.

The bad news is that the best known incremental cycle detection algorithms are not very efficient. For example, three years ago the state of the art was improved by Bender et al. to (where is the number of nodes, is the number of edges, and is ‘take the minimum between’).

In our case, where futures only have one result so there can’t be more than one out-edge per node, that corresponds to time. is too much time, at least for our application. The cost isn’t worth the benefit.

The good news is that we have an edge. We don’t need an algorithm for the fully general case. We know that our nodes have an out-degree of at most 1, because futures only ever get one result. We can take advantage of that to get a more efficient algorithm.

Simple Beginnings

The major benefit of having out-degree at most 1 is that our graphs are a bit like funnels. They always lead to a single destination, never forking or diverting along alternate paths. Everything merges together. This suggest a trivial, if inefficient, cycle detector: just head forward and see if you run into yourself. On a normal graph this wouldn’t work, because we’d need to make choices about which way to go when there were multiple ways forward, then retrace our steps when we picked wrong, and etc and etc… but we’re in a funnel with one way forward or no way forward! There’s no way to avoid running into ourselves, so it’s easy.

Fun fact: you can call these ‘funnelling’ graphs spaghetti stacks.

Here’s some naive code that will add edges while detecting if we break the single-out-edge no-cycle rules:

void SetForwardRemovingCycles(Node src, Node dst) { assert(src != null); assert(dst != null); assert(src != dst); assert(src.Forward == null); // funnel forward to check for cycle Node n = dst; do { n = n.Forward; } while (n != null && n != src); src.Forward = dst; // did we create a cycle? if (n == src) { RemoveNodesInCycleContainingNode(n); // left as an exercise for the reader } }

The above code is trivial, but the overall time it takes can be quite high. The worst case occurs when a chain builds up from front to back. Each successive node in the chain will have to scan across the entirety of what’s already been built, taking longer and longer each time. Overall that’s steps, which is quadratic time and worse than the general purpose algorithm. (Too bad we can’t have infinitely many items, use a slightly inappropriate form of summing, and finish steps before we started.)

We need to be a bit more clever, and shorten those long chains somehow.

Permanent Lookahead

What if we just made nodes point further ahead every time we passed over them? In the case of collapsing futures we don’t even need to remember the original node we were pointing at, because we only care about the final destination. Anytime we passed over a chain it would get shorter, preventing compounding inefficiencies.

Tweaking our algorithm to constantly point nodes further forward as it travels, shrinking paths every time they’re used, gives code that looks like this:

void SetForwardRemovingCycles(Node src, Node dst) { assert(src != null); assert(dst != null); assert(src != dst); assert(src.Forward == null); // funnel-frog forward to check for cycle Node n = dst; while (true) { Node p = n; n = n.Forward; if (n == null || n == src) break; n = n.Forward; if (n == null || n == src) break; // permanently shorten path we took by leap-frogging intermediate node p.Forward = n; } src.Forward = dst; // did we create a cycle? if (n == src) { RemoveNodesInCycleContainingNode(n); // left as an exercise for the reader } }

Although it looks very similar, the above algorithm is significantly more efficient overall. Any path that gets travelled down will have its size cut in half. Now our worst case is a chain built back to front, followed by repeatedly adding nodes to the back of the longest chain until we’re left with a very shallow graph.

I’m not quite sure what the worst case is here. I think it might be . The point of mentioning this idea is not the results, it’s that it eventually reminded me of this same optimization being applied to another algorithm: union-find.

Detecting Redundant Unions

In the union-find world, where the only operations are “what set is this item in?” and “merge these two items’ sets”, the optimization from above is a simple form of what is called path compression. It is one of two optimizations that allow union-find operations to be performed in time.

Mind-blowingly, is the inverse Ackermann function. Roughly speaking, it’s how many steps you must go into the sequence before you exceed . Hint: for all practical purposes is less than because has more digits than there are atoms in the observable universe. For all practical purposes, union-find operations take amortized constant time. They can also be made worst-case efficient, taking less than logarithmic time.

Why am I talking about how efficient union-find is? Because it solves our problem. Originally I was thinking of the problem in terms of cycles, but it also makes sense in terms of connected components.

Before we give a node its only out-edge, it must be at ‘the front’ of the connected component it is part of (because the graphs are shaped like funnels). It’s the root of the spaghetti stack. If we terminate the out-edge on a node from another connected component, the connected components are merged and there’s a new ‘front’ to the component. If we terminate the out-edge on a node already in the same connected component, then we know there’s a cycle because going forward will hit the front of the connected component and that’s the node we just added an out-edge to.

In other words, anytime we set one future to be another’s result, we union them into the same ‘result set’. But just before we union them, we check if they are already in the same result set. If they are, then we just created a cycle and the whole set must become immortal. So:

void SetForwardRemovingCycles(Node src, Node dst) { if (resultSetsUnionFind.Find(src) == resultSetsUnionFind.Find(dst)) { MarkElementsOfSetAsImmortal(src); } else { resultSetsUnionFind.Union(src, dst); } }

So does having out-degree bounded to 1 make the problem easier? Uh, YES. We went from an academic time bound to what is effectively a linear time bound, with existing implementations in practice. That’s a huge improvement.

I did not known union-find would work when I started writing this post. It’s one of the nice side benefits of writing: being forced to think through a problem in order to present it. I started with just the “hey this adjusting links to point further ahead thing is way more efficient!” idea, but now I have an even better optimization I can implement for the collapsing futures lib.

Summary

Constraining a problem can make it much simpler to solve.

When out-degree is bounded to 1, you can do incremental cycle detection with a union-find algorithm.

The amortized time bound on union-find is ridiculous.

—

—

—



Twisted Oak Studios offers consulting and development on high-tech interactive projects. Check out our portfolio, or Give us a shout if you have anything you think some really rad engineers should help you with.



Archive