Data structures and algorithms - helping Santa Claus find his road to San Francisco

It's almost Christmas time and very soon Santa Claus will pull out his sleigh along with his herd of reindeer. But how is he going to find his way to all the places he is supposed to visit? Well, assuming he's not going to buy any GPS device, we will try to help him out by writing a shortest path algorithm from scratch, along with the underlying data structures. The goal of this article is to understand the nitty-gritty details of the algorithm and the underlying data structures, not how to write code in a functional style. That's why I'm not going to use functional data structures and style (most of the time), but bear with me: this is just to set everyone on the same page and give a basic understanding of the topic. I also want to give some intuition about the choices that have been made in terms of running time and space complexity.

I mentioned Santa Claus, but what about San Francisco? To check the correctness of my algorithm I wanted to run it against real datasets with millions of nodes and edges. I googled for some public datasets and found the "9th DIMACS Implementation Challenge - Shortest Paths" California and Nevada large road network datasets. Perfect for the task at hand. So let's say Santa Claus has to find his way to the Golden Gate Bridge in San Francisco from any point in California or Nevada. Of course, following the roads on his sleigh (yeah, this kills the whole magic).

The shortest path algorithm (part I)

The road network can be seen as the big graph of nodes connected by the edges. As many of you know, the most popular algorithm to find the shortest path between two nodes in a graph is Dijkstra algorithm. The algorithm exists in many variants; Dijkstra's original variant found the shortest path between two nodes, but a more common variant fixes a single node as the "source" node and finds shortest paths from the source to all other nodes in the graph. This is the approach I'm going to use in this article.

Let's look at the example from Dijkstra algorithm wikipedia page:

From the animated image below we see that the "source" node a (1) is fixed and distances are calculated to each node in the graph. The shortest path to the destination node b (5) is calculated and is equal to 20. It passes through the nodes 3 and 6, which is indeed the shortest path.

The pseudo code

The pseudo code of Dijkstra algorithm looks something like this:

initialize graph
initialize Q // some structure to store the graph nodes
Q.insertAll(graph.getVertices())
while (Q is not empty)
    node = Q.extract_minimum() // this determines the speed of the algorithm
    edges = node.getEdges()
    for all edges {
        destination = edge.getDestination()
        newDistance = edge.getLength() + node.getDistance()
        if (newDistance < destination.getDistance()) {
            destination.setShortestDistance(newDistance)
            Q.decrease_key(destination) // this also determines the speed of the algorithm
        }
    }

When inserting the graph into the structure Q, the starting node's shortest distance is set to 0.0 while the other nodes' shortest distances are set to infinity. We remove the min element from Q and explore all of its edges. We compare the shortest distances with all adjacent nodes, and if any distance is less than the one currently stored on the adjacent node, we update that node's shortest distance inside Q. We continue until Q is empty. Nodes that cannot be reached from the starting node will finish with a shortest distance of infinity, because it is not possible to "get to them" from the starting node. However, they will still be removed from Q.
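To make the relaxation step concrete, here is a minimal F# sketch of the pseudo code above, using a plain list as Q with a linear-scan extract_minimum. All node ids and edge lengths here are made up for illustration; this is a deliberately naive sketch, not the implementation we build later:

```fsharp
open System

// Toy directed graph (invented ids and lengths):
// adjacency list of (destination, edge length) per node.
let graph =
    Map.ofList
        [ 1, [ (2, 7.0); (3, 9.0) ]
          2, [ (4, 15.0) ]
          3, [ (4, 11.0) ]
          4, [] ]

// Dijkstra with a plain list as Q: extract_minimum is a linear scan
// and decrease_key is just rewriting the entry in the distance map.
let dijkstra source =
    let mutable dist =
        graph |> Map.map (fun id _ -> if id = source then 0.0 else Double.PositiveInfinity)
    let mutable q = graph |> Map.toList |> List.map fst
    while not (List.isEmpty q) do
        let node = q |> List.minBy (fun id -> dist.[id])    // extract_minimum, O(|V|)
        q <- q |> List.filter (fun id -> id <> node)
        for (destination, length) in graph.[node] do        // relax every outgoing edge
            let newDistance = dist.[node] + length
            if newDistance < dist.[destination] then
                dist <- dist.Add(destination, newDistance)  // "decrease_key"
    dist
```

On this toy graph, `dijkstra 1` finds the path 1 → 3 → 4 of length 20.0 rather than 1 → 2 → 4 of length 22.0.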

The running time

The original variant of the algorithm uses a linked list or an array to store all the nodes of the graph. The extract_minimum operation is simply a linear search through all nodes (vertices) in Q, and in that case the running time is \(O(|V|^2)\) where \(|V|\) is the number of nodes (vertices).

Depending on how Q is implemented the running time can be \(O(|E| \cdot T_{dk} + |V| \cdot T_{em})\) where \(T_{dk}\) and \(T_{em}\) are the complexities of the decrease_key and extract_minimum operations in Q respectively.

Now that we have a basic understanding of how the Dijkstra shortest path algorithm works, we need to implement our data structure Q. One way of speeding up the algorithm is to use a priority queue, and that is what I'm going to implement next.

Priority Queue

What is a queue? A queue is an abstract data type supporting two operations: Enqueue adds an element to the rear of the collection, and Dequeue removes an element from the front of the collection. This makes it a FIFO data structure: the first element added to the queue will be the first one to be removed.

What is a Priority Queue? A priority queue is a generalization of a queue where each element is assigned a priority and elements come out in order by priority.

Typical use cases are:

Scheduling jobs

Dijkstra’s algorithm: finding a shortest path in a graph

Prim's algorithm: constructing a minimum spanning tree of a graph

Huffman's algorithm: constructing an optimum prefix-free encoding of a string

Heap sort: sorting a given sequence

Supported operations:

Insert(p) adds a new element with a priority p.

Extract() extracts an element with the minimum or maximum priority (depending on whether it's a min or max priority queue).

For simplicity, I'm calling them Enqueue and Dequeue in my code, respectively.

Additionally, other operations like TryDequeue() , Delete(element) and ChangePriority(p) can be implemented.

Naive implementation considerations

If the priority queue is implemented with an unsorted array/list, what are the costs of the different operations?

Enqueue(e) adds e to the end. Running time \(O(1)\)

Dequeue() scans the array/list for the min/max element. Running time \(O(n)\)

If the priority queue is implemented with a sorted array, what are the costs of the different operations?

Enqueue(e) finds a position for e using binary search \(O(\log n)\), shifts all elements to the right of it by one \(O(n)\), and inserts e \(O(1)\). Running time \(O(n)\)

Dequeue() extracts the last element. Running time \(O(1)\)

If the priority queue is implemented with a sorted list, what are the costs of the different operations?

Enqueue(e) finds a position for e \(O(n)\) (binary search cannot be used on a linked list), and inserts e \(O(1)\). Running time \(O(n)\)

Dequeue() extracts the last element. Running time \(O(1)\)

Is there a better data structure? Yes: the binary heap.

Enqueue(e) Running time \(O(\log n)\)

Dequeue() Running time \(O(\log n)\)
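To illustrate the unsorted-list trade-off above, here is a throwaway max-priority-queue sketch of mine (the type and its name are invented for this illustration, not the implementation used later), where Enqueue is \(O(1)\) and Dequeue pays \(O(n)\):

```fsharp
// Naive max-priority queue over an unsorted list:
// Enqueue is O(1) (cons at the front), Dequeue is O(n) (linear scans).
type NaiveMaxQueue() =
    let mutable items : int list = []

    member this.Enqueue(x) =
        items <- x :: items                    // O(1)

    member this.Dequeue() =
        let m = List.max items                 // linear scan for the maximum, O(n)
        // remove a single occurrence of the maximum, another O(n) pass
        let rec removeFirst v xs =
            match xs with
            | [] -> []
            | h :: t when h = v -> t
            | h :: t -> h :: removeFirst v t
        items <- removeFirst m items
        m

let q = NaiveMaxQueue()
[ 3; 1; 4; 1; 5 ] |> List.iter q.Enqueue
```

Dequeuing now yields 5, then 4, then 3: correct, but every Dequeue walks the whole list.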

Priority Queue with Binary Heap

Binary max-heap is a binary tree (each node has zero, one, or two children) where the value of each node is at least the values of its children. In other words, for each edge of the tree, the value of the parent is at least the value of the child. Binary min-heap is the other way round.

My implementation should support the following operations:

TryDequeue() Returns the root element without removing the node. Running time \(O(1)\)

Enqueue(e) attaches a new node to any leaf. This, however, may violate the heap property. To fix it, we let the new element siftUp.

private siftUp(i) swaps the problematic node (indexed i) with its parent until the heap property is satisfied. Invariant: the heap property is violated on at most one edge, and this edge gets closer to the root while sifting up. Running time \(O(H)\) where \(H\) is the height of the tree.

Dequeue() replaces the root with any leaf, but again this may violate the heap property. To fix it, we let the problematic node siftDown.

private siftDown(i) swaps the problematic node with its larger child until the heap property is satisfied. Swapping with the larger child automatically fixes one of the two potentially bad edges. Running time \(O(H)\) where \(H\) is the height of the tree.

As most of the operations work in time \(O(H)\), where \(H\) is the height of the tree, we definitely want the tree to be shallow.

How do we keep a tree shallow? By making it a complete binary tree. A binary tree is complete if all its levels are filled, except possibly the last one, which is filled from left to right. The first advantage is that a complete binary tree with \(n\) nodes has height at most \(O(\log n)\). The second advantage is that it can easily be stored as an array. How do we find the parent, leftChild and rightChild of a node stored in the array? Here are the simple formulas for a 0-based indexed array:

let parent i = (i - 1) / 2
let leftChild i = 2 * i + 1
let rightChild i = 2 * i + 2
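A quick throwaway sanity check of these formulas (my own test snippet, not part of the queue itself) confirms that the parent of both children of any index is the index itself:

```fsharp
let parent i = (i - 1) / 2
let leftChild i = 2 * i + 1
let rightChild i = 2 * i + 2

// for every index, going down to a child and back up to the parent
// must return the original index (integer division does the trick)
let formulasOk =
    [ 0 .. 100 ]
    |> List.forall (fun i -> parent (leftChild i) = i && parent (rightChild i) = i)
```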

Here is the mutable implementation of the priority queue using System.Collections.Generic.List<'T> (which is backed by an array):

type PriorityQueue<'T when 'T : comparison>(values: seq<'T>, isDescending: bool) =
    let heap : System.Collections.Generic.List<'T> = System.Collections.Generic.List<'T>(values)

    let isGreater x y = if isDescending then x > y else x < y
    let isLower x y = not (isGreater x y)

    let mutable size = heap.Count

    let shrinkHeap () =
        let shouldShrink = size < heap.Count / 2
        if shouldShrink then heap.RemoveRange(size, heap.Count - size - 1)

    let parent i = (i - 1) / 2
    let leftChild i = 2 * i + 1
    let rightChild i = 2 * i + 2

    let swap i maxIndex =
        let temp = heap.[i]
        heap.[i] <- heap.[maxIndex]
        heap.[maxIndex] <- temp

    let siftUp i =
        let mutable indx = i
        while indx > 0 && isLower heap.[parent indx] heap.[indx] do
            swap (parent indx) indx
            indx <- parent indx

    let rec siftDown i =
        let l = leftChild i
        let r = rightChild i
        let maxIndexLeft = if l < size && isGreater heap.[l] heap.[i] then l else i
        let maxIndex = if r < size && isGreater heap.[r] heap.[maxIndexLeft] then r else maxIndexLeft
        if i <> maxIndex then
            swap i maxIndex
            siftDown maxIndex
        else ()

    let build () =
        for i = size / 2 downto 0 do
            siftDown i

    do build ()

    new (values) = PriorityQueue<'T>(values, true)
    new () = PriorityQueue<'T>([], true)

    member this.IsEmpty = size = 0

    member this.Count = size

    member this.Dequeue () =
        if this.IsEmpty then raise (new Exception("No more elements to dequeue"))
        let result = heap.[0]
        heap.[0] <- heap.[size - 1]
        // we limit the boundary but the last element stays in memory
        // we could use heap.Remove but it's an O(n) operation so too slow
        size <- size - 1
        shrinkHeap ()
        siftDown 0
        result

    member this.Enqueue p =
        if heap.Count = size then heap.Add(p)
        else heap.[size] <- p
        size <- size + 1
        siftUp (size - 1)

All the operations are described above. As you can see, it is very simple and straightforward. The resulting implementation is:

Fast: all operations work in time \(O(\log n)\)

Space efficient: elements are stored in a list. Parent-child connections are not stored but are computed on the fly.

Easy to implement: just a few lines of code.

Not functional friendly: mutable structures can sometimes be useful, but this implementation doesn't help with functional style and immutability.

The interesting bit is the build() function, which turns an array into a heap. We repair the heap property going from bottom to top. Initially, the heap property is satisfied in all the leaves (i.e., subtrees of depth 0); then we repair the heap property in all subtrees of depth 1, and so on. When we reach the root, the heap property is satisfied in the whole tree. A simple bound on the running time is \(O(n \log n)\), since we call siftDown for \(O(n)\) nodes; in practice it is faster than that, because a node that is already close to the leaves is cheap to sift down.
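To see the bottom-up repair at work in isolation, here is a standalone max-heapify sketch over an int array. It is a simplified rewrite of the same idea as build() and siftDown, with made-up sample data, not the class itself:

```fsharp
let heapify (a: int[]) =
    let n = a.Length
    let rec siftDown i =
        let l = 2 * i + 1
        let r = 2 * i + 2
        let mutable m = i
        if l < n && a.[l] > a.[m] then m <- l
        if r < n && a.[r] > a.[m] then m <- r
        if m <> i then
            // swap the violating node with its larger child and continue down
            let t = a.[i]
            a.[i] <- a.[m]
            a.[m] <- t
            siftDown m
    // repair the heap property bottom-up, exactly like build()
    for i = n / 2 downto 0 do
        siftDown i

let sample = [| 3; 9; 2; 1; 4; 5 |]
heapify sample
```

After the call, sample.[0] holds the maximum (9) and every parent is at least as large as its children.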

Benchmarking the running time

I used BenchmarkDotNet to measure the performance of my implementation, and I compared it to FSharpx.Collections.PriorityQueue, which is a purely functional data structure.

This is the test I used:

let rng = Random()
let max = pown 10 9

type PriorityQueue() =
    let mutable list : int list = []

    [<Params(2000, 20000, 200000, 2000000)>]
    member val public Length = 0 with get, set

    [<Setup>]
    member self.SetupData() =
        list <- [ for i in 2 .. self.Length -> rng.Next(1, max) ]

    [<Benchmark>]
    member self.MaxPriorityQueue() = PriorityQueue<int>(list)

    [<Benchmark>]
    member self.MaxPriorityQueueInsertingFromStart() =
        let pq = PriorityQueue<int>()
        list |> List.iter pq.Enqueue

    [<Benchmark>]
    member self.FSharpxPriorityQueueInsertingFromStart() =
        let pq = PriorityQueue.empty true
        let rec insertIntoPriorityQueue xs (q: IPriorityQueue<IComparable>) =
            match xs with
            | [] -> ()
            | h :: tail -> insertIntoPriorityQueue tail (q.Insert(h))
        insertIntoPriorityQueue list pq

[<EntryPoint>]
let main argv =
    (BenchmarkSwitcher [| typeof<PriorityQueue> |]).Run argv |> string |> printfn "%s"
    0

And here are the results:

BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7-3720QM CPU 2.60GHz, ProcessorCount=4
Frequency=2533316 Hz, Resolution=394.7395 ns, Timer=TSC
  [Host]     : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1586.0
  DefaultJob : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1586.0

| Method                                 | Length  | Mean             | StdDev         | Allocated |
|----------------------------------------|---------|------------------|----------------|-----------|
| MaxPriorityQueue                       | 2000    | 568.6416 us      | 10.9266 us     | 107.1 kB  |
| MaxPriorityQueueInsertingFromStart     | 2000    | 687.4757 us      | 1.7271 us      | 123.5 kB  |
| FSharpxPriorityQueueInsertingFromStart | 2000    | 423.1923 us      | 10.4438 us     | 167.92 kB |
| MaxPriorityQueue                       | 20000   | 5,646.1750 us    | 13.7735 us     | 1.16 MB   |
| MaxPriorityQueueInsertingFromStart     | 20000   | 7,006.6801 us    | 21.8178 us     | 1.35 MB   |
| FSharpxPriorityQueueInsertingFromStart | 20000   | 4,854.3166 us    | 19.2608 us     | 1.68 MB   |
| MaxPriorityQueue                       | 200000  | 57,873.1854 us   | 302.7054 us    | 11.14 MB  |
| MaxPriorityQueueInsertingFromStart     | 200000  | 72,939.0291 us   | 573.1936 us    | 13.06 MB  |
| FSharpxPriorityQueueInsertingFromStart | 200000  | 104,380.8573 us  | 1,036.8847 us  | 16.8 MB   |
| MaxPriorityQueue                       | 2000000 | 596,396.9013 us  | 3,905.1761 us  | 107.11 MB |
| MaxPriorityQueueInsertingFromStart     | 2000000 | 747,160.8772 us  | 26,935.2538 us | 126.29 MB |
| FSharpxPriorityQueueInsertingFromStart | 2000000 | 1,537,904.5511 us| 62,809.3363 us | 168 MB    |

My implementation is the fastest starting from 200,000 elements, and with 2 million elements it is twice as fast as FSharpx.Collections.PriorityQueue for enqueueing elements.

Now that I have a working implementation of the priority queue, I can finally implement Dijkstra's shortest path algorithm.

The shortest path algorithm (part II)

I need some basic data structures to represent my graph of nodes: an Edge, a node that I called Vertex, and the Graph.

Note: in my implementation I use the name 'vertex' instead of 'node', but they mean the same thing.

type Edge = { DestinationVertexId: int; Distance: double }

[<CustomComparison; StructuralEquality>]
type Vertex =
    { Id: int; ShortestDistance: double; Edges: Edge list; Path: int list }
    interface IComparable<Vertex> with
        member this.CompareTo other =
            compare this.ShortestDistance other.ShortestDistance
    interface IComparable with
        member this.CompareTo(obj: obj) =
            match obj with
            | :? Vertex -> compare this.ShortestDistance (unbox<Vertex> obj).ShortestDistance
            | _ -> invalidArg "obj" "Must be of type Vertex"

type Graph = { Vertices: Vertex list }

The Edge points to the destination Vertex and also has a Distance. The Vertex has a list of edges and the ShortestDistance label, which will be calculated and updated relative to the "source" Vertex. Note that the implementation of the IComparable interface is based on that label, which is logical: when dequeuing elements from the min priority queue, we want to explore the nodes with the shortest distance first. The Graph is just a list of nodes.
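To illustrate the ordering, here is a cut-down version of the Vertex record (only the fields needed for comparison; the type name and sample values are mine, made up for this demonstration) showing that vertices compare by ShortestDistance alone, regardless of their Id:

```fsharp
open System

// Simplified Vertex: comparison is driven by ShortestDistance only.
[<CustomComparison; StructuralEquality>]
type MiniVertex =
    { Id: int; ShortestDistance: double }
    interface IComparable<MiniVertex> with
        member this.CompareTo other =
            compare this.ShortestDistance other.ShortestDistance
    interface IComparable with
        member this.CompareTo(obj: obj) =
            match obj with
            | :? MiniVertex as v -> compare this.ShortestDistance v.ShortestDistance
            | _ -> invalidArg "obj" "Must be of type MiniVertex"

// despite the higher Id, 'near' compares as smaller because its distance is smaller
let near = { Id = 42; ShortestDistance = 2.0 }
let far  = { Id = 1;  ShortestDistance = 9.0 }
```

This is exactly what a min priority queue needs: the vertex with the smallest tentative distance always comes out first.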

Let's start with an implementation following the pseudo code shown before. It looks something like this in F# (I'm sorry for not following the functional style, but I care more about the steps of the algorithm):

let shortestPath graph destinationId =
    let pq = PriorityQueue<Vertex>(graph.GetVertices(), false)
    while not pq.IsEmpty do
        let vertex = pq.Dequeue()
        printfn "Visiting node %i - value %f" vertex.Id vertex.ShortestDistance
        for edge in vertex.Edges do
            let destinationId = edge.DestinationVertexId
            match pq.TryFind (fun e -> e.Id = destinationId) with
            | None -> ()
            | Some (indx, destination) ->
                let newDistance = edge.Distance + vertex.ShortestDistance
                if newDistance < destination.ShortestDistance then
                    let newDestination = { destination with ShortestDistance = newDistance }
                    pq.Update indx newDestination
                else ()

The interesting part is the pq.Update indx newDestination call, which updates the queue with the newly calculated ShortestDistance for the node. The update itself is an \(O(1)\) operation, because I already have the index indx returned by pq.TryFind. To find that index, however, pq.TryFind has to traverse the whole tree, which is an \(O(|V|)\) operation. Watch out: this may cause serious performance problems.

Test drive on wikipedia example

Before trying my algorithm on real world data, let's check that it is correct on the wikipedia example (see the image above in "The shortest path algorithm (part I)"). We need to find the path between node a (1) and node b (5). The value should be 20.

Running the algorithm produces the following output

Visiting node 1 - value 0.000000
Visiting node 2 - value 7.000000
Visiting node 3 - value 9.000000
Visiting node 6 - value 11.000000
Visiting node 5 - value 20.000000
Visiting node 4 - value 20.000000

The node 5 has a value of 20, which is correct. We can move on to a more serious step.

Real world example: Santa Claus way to San Francisco (FAIL)

The next step is to try it on real world data. After some googling I found interesting data used by the 9th DIMACS Implementation Challenge - Shortest Paths. I used the California and Nevada dataset with the following properties:

# of nodes: 1 890 815

# of edges: 4 657 742

This will be more challenging than the 6-node example from wikipedia. Before running the sample I made a small change to my algorithm to ease the data conversion from the dataset files: I'm not using my Graph type but a simple Dictionary<int, Vertex>.

The dataset file has the following format

a 1 1048577 456
a 1048577 1 456
a 2 1048578 2389
a 1048578 2 2389
a 3 1048579 358

We can drop the first column (the letter a). The second column is the starting nodeId, the next column is the destination nodeId, and the last column is the distance.
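Parsing a single edge line then looks like this (parseEdge is a small helper of mine for illustration, applied to the first sample line above):

```fsharp
// One line of the dataset has the shape "a <from> <to> <distance>".
// We ignore column 0 (the letter a) and convert the rest.
let parseEdge (line: string) =
    let a = line.Split()
    (int a.[1], int a.[2], float a.[3])

let (fromId, toId, distance) = parseEdge "a 1 1048577 456"
```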

Here is the script loading the data from the file and converting it to the Dictionary<int, Vertex> .

let readLines filePath = System.IO.File.ReadLines(filePath)

let lines = readLines "USA-road-d.CAL.gr"

let graphMap =
    lines
    |> Seq.map (fun l ->
        let a = l.Split()
        (int a.[1], int a.[2], float a.[3])) // we drop the first column
    |> Seq.groupBy (fun (f, t, d) -> f)
    |> Seq.map (fun (key, values) ->
        (key, values |> Seq.map (fun (k, v, z) -> z, v) |> Seq.toList))
    |> Map.ofSeq

let roadNetwork =
    graphMap
    |> Map.map makeVertex
    |> Map.fold (fun (graph: Dictionary<int, Vertex>) _ v -> graph.Add(v.Id, v); graph)
                (Dictionary<int, Vertex>())

Note that all of the nodes have their ShortestDistance set to positiveInfinity. Then we need to set the starting node by setting its shortest distance to 0.

let start = roadNetwork.[1215934]
roadNetwork.[1215934] <- { start with ShortestDistance = 0.0 }

Node number 1215934 corresponds to the address 512 Partridge Ave, Bakersfield, CA 93309, USA.

The destination is node number 1598609, at the address US-101, San Francisco, CA 94129, USA: the Golden Gate Bridge.

Let's run the algorithm.

Well, after maybe 30 minutes I killed the FSI because nothing was happening; it was stuck somewhere, and maybe the algorithm would never have finished.

Let's try to find out why. If you remember, the running time of the algorithm is \(O(|E| \cdot T_{dk} + |V| \cdot T_{em})\) where \(T_{dk}\) and \(T_{em}\) are the complexities of the decrease_key and extract_minimum operations in Q respectively. The Q is my priority queue pq; decrease_key is pq.TryFind and pq.Update indx newDestination taken together, and extract_minimum is pq.Dequeue(). Dequeue works in \(O(\log |V|)\), so it's not the problem. The pq.TryFind, however, uses a pre-order traversal of the tree, which is \(O(|V|)\). Too bad, because the running time of the algorithm then becomes something like \(O(|E| \cdot |V| + |V| \log |V|)\). With almost 5 million edges and 2 million nodes, I'm not surprised that the algorithm never finished.

Can we do better? Yes. It's time for the optimization

Real world example: Santa Claus way to San Francisco (SUCCESS)

What can be improved? After some thinking here are some ideas:

Do not visit already visited nodes. The graph is connected, so after a node has been explored we can mark it as visited, and when other nodes point to it we can simply ignore it.

Ignore unreachable nodes. There may be nodes we never want to explore because they are not reachable at all. Their ShortestDistance will still be positiveInfinity when they are dequeued, so we can simply skip them.

Get rid of the costly pq.TryFind. Why not store the nodes in a dictionary and in the priority queue at the same time? The dictionary allows fast lookup of a node, and the priority queue ensures we always get the node with the minimum ShortestDistance. We just pq.Enqueue the newDestination node and update it in the dictionary at the same time for later retrieval. This doubles the space complexity, but that's the tradeoff.

Track all the visited nodes, so we can draw a path for Santa Claus.

Here is the improved implementation:

let shortestPath (graph: Dictionary<int, Vertex>) destinationId =
    let pq = PriorityQueue<Vertex>(graph.Values, false)
    let mutable dest = Option<Vertex>.None
    let visited = Dictionary<int, Vertex>()

    while not pq.IsEmpty do
        let vertex = pq.Dequeue()
        if vertex.ShortestDistance <> Double.PositiveInfinity
           && not (visited.ContainsKey(vertex.Id)) then
            if vertex.Id = destinationId then dest <- Some(vertex)
            for edge in vertex.Edges do
                let destinationId = edge.DestinationVertexId
                if not (visited.ContainsKey(destinationId)) then
                    let newDistance = edge.Distance + vertex.ShortestDistance
                    let destination = graph.[destinationId]
                    if newDistance < destination.ShortestDistance then
                        let newDestination =
                            { destination with
                                ShortestDistance = newDistance
                                Path = destination.Id :: vertex.Path }
                        pq.Enqueue newDestination
                        graph.[destinationId] <- newDestination
                    else ()
                else ()
            visited.Add(vertex.Id, vertex)
    dest

As you can see, the graph is now represented as a Dictionary<int, Vertex>. This is the structure that allows finding nodes in \(O(1)\) time. Before exploring a vertex and each of its neighbours, we check that the node has not already been visited, which should save some time; we also ignore all the nodes whose distance is still positiveInfinity. Then, when a new shortest distance is calculated for a node, we enqueue the updated node in the priority queue, so it will be explored before nodes with positiveInfinity, and we update the dictionary of nodes at the same time, so it can be easily retrieved the next time. Finally, when creating the newDestination record, I prepend the current destination.Id to the Path list containing all the visited nodes. This is our tracking system for the shortest path.

Let's run the algorithm again with those improvements. We will hopefully calculate the path from Bakersfield, CA to the San Francisco Golden Gate Bridge!

The algorithm finished!

Real: 00:00:18.985, CPU: 00:00:18.968, GC gen0: 14, gen1: 13, gen2: 0

It's a very good improvement: from a run that never finished to 18 seconds. I know 18 seconds is maybe too much for a GPS device, but it's a very good starting point. The running time is now roughly \(O((|E| + |V|) \log |V|)\), which is way better than before.

How do we know if the algorithm is correct? Well, we have tracked all the intermediate nodes (1415 nodes) on the shortest path. I retrieved their lon/lat coordinates and drew them on Google Maps. Here is what I got:

It's maybe not the fastest way to get from Bakersfield to the San Francisco Golden Gate Bridge, but it does seem that this could indeed be the shortest path.

Let's look more closely:

And the destination point:

Let's try a longer way. The destination point will always be the Golden Gate Bridge in San Francisco, but the starting point will be 1385 US-93, Jackpot, NV 89825, USA.

The algorithm has finished!

Real: 00:00:18.658, CPU: 00:00:18.531, GC gen0: 14, gen1: 13, gen2: 0

This is a similar running time as before. Cool. It produces a shortest path of 2210 nodes. Let's check it on the map:

It seems like the algorithm is working. Let's look at the starting point:

and the arrival:

It may not follow the roads exactly, but this seems to come from the coordinates in the dataset.

Conclusion

The goal of this article was to show how, with a few lines of F# code, you can write a simple mutable data structure and an algorithm that operates on millions of nodes and edges to calculate the shortest path from one point on the map to another. The code doesn't follow the functional style and mutability is everywhere, but the goal was to understand the underlying implementation. Next time I'll focus on the same problem, but this time writing a fully functional data structure and algorithm. This is a toy example: real world algorithms have to take into account the weights on the edges, traffic data and so on, and should run much quicker than this one. The best implementations from the 9th DIMACS Implementation Challenge, which I took the datasets from, have query times expressed in microseconds. Note that these are not simply Dijkstra's algorithms, of course, as the whole point was to get results faster.

The script used for this blog is here. The priority queue implementation with performance tests is here.

References

Here are some references used in this article:
