Previously I posted about an optimisation for A* pathfinding that, thanks to the way it works, also lets you generate exploration paths in addition to normal paths. I will now present a benchmark comparing my method to the two “standard” optimisations. Before the results, I should explain how I performed the benchmark.

The Benchmark

Each version of the pathfinding method was an implementation of A* which was identical aside from how the closed list was handled, as well as one or two micro-optimisations. The five versions of the A* algorithm were:

1. A* with Booleans to mark whether nodes were explored, plus a list tracking which nodes had been visited. Optimisations: the visited nodes are kept in a collection with O(1) insertion; when the nodes need to be reset, the collection is copied into an array for O(1) node access.

2. A* with the explored flag stored in a bitmask, plus a list tracking which nodes had been visited. Optimisations: same as above.

3. A* with Booleans to mark whether nodes were explored, with every node in the world marked false after the path was found.

4. A* with the explored flag stored in a bitmask, with all bitmasks reset after the path was found. Optimisation: when checking a bitmask, the location of the relevant mask in the array of masks must be calculated (one division) and the bit to check must be found (one modulo). These values are needed several times for each node, so during the expansion of a node I stored them rather than recalculating them.

5. A* with a float value to represent whether a node was explored (my “number of searches” method from the previous post).
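To make the bitmask variants concrete, here is a minimal illustrative sketch in Python (not the author's C# code; all names and the 32-bit word size are assumptions). It shows the division and modulo needed to locate a flag, and how caching that pair during a node expansion avoids recomputing it:

```python
# Illustrative sketch of a bitmask "explored" flag store.
# WORD_BITS and all names are assumptions for illustration only.
WORD_BITS = 32  # explored flags packed per integer word


class BitmaskExplored:
    def __init__(self, node_count):
        # one integer word holds WORD_BITS explored flags
        self.words = [0] * ((node_count + WORD_BITS - 1) // WORD_BITS)

    def locate(self, node_index):
        # the two operations the article mentions: one division to find
        # the word, one modulo to find the bit within that word
        return node_index // WORD_BITS, node_index % WORD_BITS

    def is_explored(self, word, bit):
        return (self.words[word] >> bit) & 1 == 1

    def mark_explored(self, word, bit):
        self.words[word] |= 1 << bit

    def reset_all(self):
        # the "reset whole grid" variant: clear every word
        for i in range(len(self.words)):
            self.words[i] = 0


# During a node expansion, compute (word, bit) once and reuse it:
flags = BitmaskExplored(200 * 200)
word, bit = flags.locate(12345)
if not flags.is_explored(word, bit):
    flags.mark_explored(word, bit)
```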



I wanted to compare all the methods above against a proper closed list as well, but the closed-list-based method took too long to run (a single path search took as long as the entire experiment takes now). That’s fine – all the methods tested here beat closed lists. The first two methods seemed unlikely to be any faster than the others, but I included them because, for larger worlds, keeping a list would be the only practical way to reset nodes. For the open list I used a data structure I had coded specifically for that purpose: a collection that inserts items in sorted order.
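A collection that inserts items in sorted order can be sketched as follows; this is a hypothetical Python analogue of the idea, not the author's actual data structure, using binary search to find each insertion point:

```python
# Hypothetical sketch of an open list that keeps entries sorted on
# insert, so the best node is always at the front.
import bisect


class SortedOpenList:
    def __init__(self):
        self._keys = []   # f-costs, kept in ascending order
        self._items = []  # nodes, parallel to _keys

    def push(self, f_cost, node):
        # binary search for the insertion point keeps the list ordered
        i = bisect.bisect_right(self._keys, f_cost)
        self._keys.insert(i, f_cost)
        self._items.insert(i, node)

    def pop_best(self):
        # the lowest f-cost is always at the front
        self._keys.pop(0)
        return self._items.pop(0)

    def __len__(self):
        return len(self._keys)
```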

The game world was a 200×200 grid world with 8-direction movement, built by copy-pasting the same 100×100 world four times. There were 44 waypoints, 11 per quarter of the game world. For the tests, I took every pair of waypoints and calculated the path between them with every method. Even though I was building my methods on an A* implementation I had tested before, I had the test program draw the paths generated so I could check that all five methods produced the same correct path. Of course, this path-drawing time was not measured. Note that I have some experience with A* on the test map, which let me pick waypoints I knew would strain A* with many expansions.
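As a quick sanity check on the test count, enumerating every unordered pair of 44 waypoints gives C(44, 2) = 946 path queries per method, which matches the figures below:

```python
# Sketch: every unordered pair of the 44 waypoints is one path query.
from itertools import combinations

waypoints = list(range(44))            # stand-ins for the real waypoints
pairs = list(combinations(waypoints, 2))
assert len(pairs) == 44 * 43 // 2      # C(44, 2) = 946 queries per method
```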

I used the Stopwatch class from the C# standard library to time each pathfinding algorithm from the moment I called it to the moment it returned the path. If a path could not be found, an error was thrown and the experiment halted. Note that the Stopwatch’s results can be affected by things unrelated to the code being measured, like the JIT compiler. However, the Stopwatch class provides a flag indicating whether it performs high-resolution measurements, which avoid most of these issues, or just uses the system timer (https://msdn.microsoft.com/en-us/library/system.diagnostics.stopwatch.ishighresolution(v=vs.110).aspx). This flag returns true on my PC, so the Stopwatch class performs high-resolution measurements. Despite this, the initial results seemed to vary too much, so I performed the above test 10 times and put all the results into one CSV file for processing. The resulting test set comprised 946 tests repeated ten times, for a total of 9,460 test runs.
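The measurement loop can be sketched like this; the article used C#’s Stopwatch, and `time.perf_counter` is Python’s rough equivalent of a high-resolution stopwatch. Function names and the CSV layout are assumptions for illustration:

```python
# Sketch of the measurement harness: time each search, halt on failure,
# and append every repeat of every test case to one CSV for processing.
import csv
import time


def time_search(search_fn, start, goal):
    # high-resolution timer around a single pathfinding call
    t0 = time.perf_counter()
    path = search_fn(start, goal)
    elapsed = time.perf_counter() - t0
    if path is None:
        raise RuntimeError("no path found; halting the experiment")
    return elapsed


def run_experiment(methods, pairs, repeats=10, out_path="results.csv"):
    # methods: {name: search_fn}; pairs: waypoint pairs to test
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["run", "method", "start", "goal", "seconds"])
        for run in range(repeats):
            for name, fn in methods.items():
                for start, goal in pairs:
                    writer.writerow(
                        [run, name, start, goal,
                         time_search(fn, start, goal)])
```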

Results:

Average Time To Run:

Number of Searches Optimisation – 2.47E-03 seconds
Booleans (Keep List of Explored Nodes) – 2.57E-03 seconds
Bit Mask (Reset Whole Grid) – 2.59E-03 seconds
Booleans (Reset Whole Grid) – 2.63E-03 seconds
Bit Mask (Keep List of Explored Nodes) – 2.68E-03 seconds

% of individual test cases in which the Number of Searches Optimisation beat each other optimisation:

Booleans (Keep List of Explored Nodes) – 60.58%
Bit Mask (Reset Whole Grid) – 82.81%
Booleans (Reset Whole Grid) – 87.64%
Bit Mask (Keep List of Explored Nodes) – 91.58%



Analysis

On average, my method beats the next best method by about a ten-thousandth of a second, which against the ~2.5 ms the algorithm typically takes is only about a 2–3% average speedup. Across individual test runs my method beat the next best method only ~60% of the time. Looking closer, I found that in only around 17% of cases did the results of the two methods differ by more than one standard deviation.
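One way to compute that "differ by more than one standard deviation" figure from the repeated timings is sketched below. This is a hypothetical helper, since the article does not spell out the exact calculation; it treats two methods as clearly different on a test case when their mean timings differ by more than the larger of the two spreads:

```python
# Hypothetical sketch: count test cases where two methods' mean timings
# differ by more than one standard deviation of the repeated runs.
from statistics import mean, pstdev


def fraction_clearly_different(times_a, times_b):
    # times_a, times_b: {test_case: [timings over the 10 repeats]}
    clear = 0
    for case in times_a:
        a, b = times_a[case], times_b[case]
        spread = max(pstdev(a), pstdev(b))
        if abs(mean(a) - mean(b)) > spread:
            clear += 1
    return clear / len(times_a)
```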

If we compare how the other four methods did relative to each other, it may seem odd that a list of nodes plus Booleans won. However, a closer look at what happens in the code helps explain it. With a bit mask, you must perform divisions to locate which mask word and which bit within it you are checking, and this has to be done for every node you consider exploring. This makes comparisons slower than with Booleans. We see the benefit of quick world resets in the fact that the bitmask method that resets the entire world is faster than the Boolean-based method that resets the entire world. However, keeping a list of which Booleans had been visited limited the number of nodes to reset and, since its comparisons were faster too, the Booleans + list method beat both methods that reset the whole game world.

As for the worst-performing method: combining the slow bitmask comparisons with the list gives you the worst of both worlds, because each bitmask covers several nodes, so you can end up repeatedly adding the same entry to the list. The result: repeatedly resetting the same nodes, and more allocations. You could, of course, check the list before adding a node to it, but then insertions become O(n). That sounds fine in theory, but in my experience it drastically slows things down.
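The duplication pitfall can be demonstrated in a few lines; this is an illustrative sketch, not the author's code, and the 32-bit word size is an assumption:

```python
# Sketch of the bitmask + list pitfall: recording the touched word for
# every newly explored node appends duplicates, because one mask word
# covers many nodes. Names are illustrative only.
WORD_BITS = 32
touched_words = []  # reset list, appended to with no membership check


def mark_explored(words, node_index):
    word, bit = node_index // WORD_BITS, node_index % WORD_BITS
    words[word] |= 1 << bit
    touched_words.append(word)  # same word re-added for nearby nodes


words = [0] * 4
for node in (0, 1, 2, 33):  # three of these nodes share word 0
    mark_explored(words, node)
# touched_words now holds word 0 three times, so a reset pass would
# clear the same word repeatedly
```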

Conclusion

My conclusion is that the two best methods are more or less equivalent in performance, so I will stick with my method, which gives me the free exploration algorithm. If you would rather not use my method, avoid bit packing unless you somehow have more issues with space than with CPU usage.

If you have questions or constructive criticism about my test method, analysis or conclusion, feel free to offer them. Since I like improving code efficiency, I’d like feedback on how I measure my performance.