Python Patterns - Implementing Graphs

Warning This page stays here for historical reasons and it may contain outdated or incorrect information.

find_shortest_path

find_graph()

find_path()

Copyright (c) 1998, 2000, 2003, 2019 Python Software Foundation. All rights reserved. Licensed under the PSF license.

There's considerable literature on graph algorithms, which are an important part of discrete mathematics. Graphs also have much practical use in computer algorithms. Obvious examples can be found in the management of networks, but examples abound in many other areas. For instance, caller-callee relationships in a computer program can be seen as a graph (where cycles indicate recursion, and unreachable nodes represent dead code).

Few programming languages provide direct support for graphs as a data type, and Python is no exception. However, graphs are easily built out of lists and dictionaries. For instance, here's a simple graph (I can't use drawings in these columns, so I write down the graph's arcs):

A -> B A -> C B -> C B -> D C -> D D -> C E -> F F -> C

graph = {'A': ['B', 'C'], 'B': ['C', 'D'], 'C': ['D'], 'D': ['C'], 'E': ['F'], 'F': ['C']}

Let's write a simple function to determine a path between two nodes. It takes a graph and the start and end nodes as arguments. It will return a list of nodes (including the start and end nodes) comprising the path. When no path can be found, it returns None. The same node will not occur more than once on the path returned (i.e. it won't contain cycles). The algorithm uses an important technique called backtracking: it tries each possibility in turn until it finds a solution.

def find_path(graph, start, end, path=[]): path = path + [start] if start == end: return path if not graph.has_key(start): return None for node in graph[start]: if node not in path: newpath = find_path(graph, node, end, path) if newpath: return newpath return None

>>> find_path(graph, 'A', 'D') ['A', 'B', 'C', 'D'] >>>

Note that while the user calls find_path() with three arguments, it calls itself with a fourth argument: the path that has already been traversed. The default value for this argument is the empty list, '[]', meaning no nodes have been traversed yet. This argument is used to avoid cycles (the first 'if' inside the 'for' loop). The 'path' argument is not modified: the assignment "path = path + [start]" creates a new list. If we had written "path.append(start)" instead, we would have modified the variable 'path' in the caller, with disastrous results. (Using tuples, we could have been sure this would not happen, at the cost of having to write "path = path + (start,)" since "(start)" isn't a singleton tuple -- it is just a parenthesized expression.)

It is simple to change this function to return a list of all paths (without cycles) instead of the first path it finds:

def find_all_paths(graph, start, end, path=[]): path = path + [start] if start == end: return [path] if not graph.has_key(start): return [] paths = [] for node in graph[start]: if node not in path: newpaths = find_all_paths(graph, node, end, path) for newpath in newpaths: paths.append(newpath) return paths

>>> find_all_paths(graph, 'A', 'D') [['A', 'B', 'C', 'D'], ['A', 'B', 'D'], ['A', 'C', 'D']] >>>

def find_shortest_path(graph, start, end, path=[]): path = path + [start] if start == end: return path if not graph.has_key(start): return None shortest = None for node in graph[start]: if node not in path: newpath = find_shortest_path(graph, node, end, path) if newpath: if not shortest or len(newpath) < len(shortest): shortest = newpath return shortest

>>> find_shortest_path(graph, 'A', 'D') ['A', 'C', 'D'] >>>

These functions are about as simple as they get. Yet, they are nearly optimal (for code written in Python). In another Python Patterns column, I will try to analyze their running speed and improve their performance, at the cost of more code.

UPDATE: Eryk Kopczyński pointed out that these functions are not optimal. To the contrary, "this program runs in exponential time, while find_shortest_path can be done in linear time using BFS [Breadth First Search]. Furthermore a linear BFS is simpler:"

# Code by Eryk Kopczyński def find_shortest_path(graph, start, end): dist = {start: [start]} q = deque(start) while len(q): at = q.popleft() for next in graph[at]: if next not in dist: dist[next] = [dist[at], next] q.append(next) return dist.get(end)

[[['A'], 'B'], 'D']

len(find_shortest_path(graph, 'A', 'D'))

[dist[at], next]

dist[at]+[next]