About a year ago, I was using NetworkX (a Python package for studying graphs) for one of my projects. While I was working on it, I found that the topological sort implementation in NetworkX was very slow on large-scale instances. So I decided to take a look at it, hoping I can figure out what’s wrong and fix it. In the end, I sped up considerably the implementation. Here is how I did it.

Please note that you don’t need to understand how topological sort works to follow the rest of the article as this is just a pretext to realize that it’s important to choose the right data structures when implementing algorithms.

Below is the original implementation of topological_sort_recursive in NetworkX.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 def topological_sort_recursive (G,nbunch = None): if not G . is_directed(): raise nx . NetworkXError( "Topological sort not defined on undirected graphs." ) # function for recursive dfs def _dfs (G,seen,explored,v): seen . add(v) for w in G[v]: if w not in seen: if not _dfs(G,seen,explored,w): return False elif w in seen and w not in explored: # cycle Found--- no topological sort raise nx . NetworkXUnfeasible( "Graph contains a cycle." ) explored . insert( 0 ,v) # inverse order of when explored return True seen = set () explored = [] if nbunch is None: nbunch = G . nodes_iter() for v in nbunch: # process all nodes if v not in explored: if not _dfs(G,seen,explored,v): raise nx . NetworkXUnfeasible( "Graph contains a cycle." ) return explored

The main issue in this implementation is that two crucial operations on the variable explored , which is a list of nodes, take linear time instead of taking a constant time if proper data structures were chosen.

The operations w not in explored (lines 13 and 25) and explored.insert(0,v) (line 16) are inefficient on a list because a list in CPython is implemented as an array. This means that both a membership test ( in ) and inserting at an arbitrary position ( insert ) usually take linear time. In the first case, one must perform a check on each element and in the second case one must shift all elements to the right by one position to insert an element at the first position. This leads to lots of unnecessary memory copies.

To prevent such situations, one should keep in mind the run-time of the operations on the standard data structures. If this is not familiar to you, this page is a good place to start.

Enough said, let’s now look at how to speed up the topological sort implementation.

To fix the in operation, we simply need to change the type of explored to a set. This way, a membership test now takes a constant time because a set is implemented as a hash table in CPython.

Since we need to keep track of the order of the visited nodes (this is the purpose of the insert call at line 16) and a set is unordered, we must add another variable ( order ) of type list to which we insert the nodes. But instead of inserting a node at the beginning of the list, we insert it at the end. Appending to a list usually takes a constant time because we only need to insert a new element at the end. Finally, we need to reverse the list at the end of the algorithm to get the same order as in the original implementation.

Here is the new implementation:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 def topological_sort_recursive (G,nbunch = None): if not G . is_directed(): raise nx . NetworkXError( "Topological sort not defined on undirected graphs." ) def _dfs (v): ancestors . add(v) for w in G[v]: if w in ancestors: raise nx . NetworkXUnfeasible( "Graph contains a cycle." ) if w not in explored: _dfs(w) ancestors . remove(v) explored . add(v) order . append(v) ancestors = set () explored = set () order = [] if nbunch is None: nbunch = G . nodes_iter() for v in nbunch: if v not in explored: _dfs(v) return list ( reversed (order))

I did a similar refactoring for the non recursive implementation and benchmarked my new implementations on a graph with 199,496 nodes and 600,000 arcs. The results are quite impressive:

Implementation Original (s) New (s) Recursive 4027.5 0.5 Non-Recursive 15.6 0.6

Before, it took more than a hour to apply the recursive topological sort on this instance and 15.6 seconds for the non-recursive version. It now takes less than a second in both cases.

I pushed my changes to the git repository and with the recent release of version 1.8 they are now part of the official release.

Hacker News Discussion

Reddit Discussion