Graphs 101

Implementation in Python

Graphs are collections of nodes and the connections between them. If we define some properties for these nodes and their connections, we can model a lot of real world problems. Graphs are especially useful for representing networks. Social networks, telephone lines, and road maps are a few examples of real world networks whose problems can be modeled and solved with graphs.

In graph terminology, a node and a connection are known as a Vertex and an Edge, respectively. Moving forward, I will use these two terms instead of nodes and connections. Below, I illustrate a simple graph: circles are vertices and lines are edges.

An example of a general graph

Graph types:

By defining rules for vertices and edges, we can create different types of graphs. For example, if you define a direction for edges, you create a new type of graph known as a Directed Graph. Another example is a cycle. In general, cycles can occur in a graph, but if you impose the rule that your graph cannot have a cycle, you create a new type of graph known as an Acyclic Graph. Combining the two, we get another type of graph known as the Directed Acyclic Graph (DAG). Besides direction, we can assign weights to edges and create Weighted Graphs.

These different graph types are useful for modeling different behaviors in real world networks. For example, in social networks, friendship is modeled with Undirected graphs, while following is modeled with Directed graphs.

Graph representation in memory:

There are two common memory representations of a graph. One is the Adjacency Matrix, and the other is the Adjacency List. I will explain both of them with the example graph shown in the picture below. Adjacency in this context means direct connection. For example, in the example graph, the two vertices V₁ and V₅ are adjacent vertices of V₀ because there is a direct edge from V₀ to each of them.

Example Graph

Adjacency Matrix:

This is the easiest way to represent a graph in memory, but it takes a lot of space. If you have V vertices, it takes O(V²) space, which is only efficient when the graph is close to fully connected. In real applications, most adjacency matrices are sparse (i.e. mostly empty), so this representation is not space efficient.

Adjacency Matrix

Adjacency List:

This is the space-efficient alternative to the adjacency matrix. It is also the preferred way of representing medium to large graphs.

As you can see in the figure below, in an adjacency list we keep a master list of all vertices in the graph object, and each item in that list points to a vertex object that stores the adjacent vertices and the weights of the connecting edges.

Adjacency List

Graph implementation in Python:

Below is a simple implementation of a graph with an adjacency list. This implementation can be customized for different graph problems.
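The embedded code is not reproduced here; a minimal sketch of such an adjacency-list graph, assuming an undirected weighted graph (the edges below are illustrative, not the article's figures), might look like this:

```python
from collections import defaultdict

class Graph:
    """A simple undirected, weighted graph backed by an adjacency list."""

    def __init__(self):
        # each vertex maps to a dict of {neighbor: edge weight}
        self.adj = defaultdict(dict)

    def add_edge(self, u, v, weight=1):
        # undirected: store the edge in both directions
        self.adj[u][v] = weight
        self.adj[v][u] = weight

    def neighbors(self, u):
        return self.adj[u]

g = Graph()
g.add_edge(0, 1, 4)
g.add_edge(0, 5, 2)
print(g.neighbors(0))  # {1: 4, 5: 2}
```

For a directed graph, `add_edge` would store the edge in one direction only; for an unweighted graph, the inner dict could be replaced by a set.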

Graph Traversal:

Graph traversal refers to the ways we can navigate the network and visit each vertex. There are two common ways to traverse a graph:

Breadth First Search (BFS): navigating a graph layer by layer (or level by level when there is a hierarchy). In other words, start from a vertex, visit all of its neighbors, then move on to the neighbors' neighbors, and continue this process until there is no unvisited vertex.

Animated BFS (source: Wikipedia)

This search method needs a queue data structure for implementation. Using a queue guarantees that we explore the whole breadth of a vertex before moving on to the next layer. In an unweighted graph, this behavior guarantees that the first path found between two vertices is the shortest path between them.

Depth First Search (DFS): navigating a graph by digging deep down into paths branching off the starting point until it reaches a point where either there is no more edge forward or it meets a previously visited vertex.

Animated DFS (source: Wikipedia)

This search method needs a stack data structure for implementation. You can either use a stack explicitly in an iterative implementation, or use recursion, which uses the call stack implicitly.

BFS implementation code:
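The embedded BFS code is not reproduced here; a minimal iterative sketch, using a plain adjacency dict as a stand-in for the article's toy example, might look like this:

```python
from collections import deque

def bfs(adj, start):
    """Return vertices in the order BFS visits them from `start`."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        u = queue.popleft()          # FIFO: finish this layer first
        order.append(u)
        for v in adj[u]:
            if v not in visited:
                visited.add(v)       # mark on enqueue to avoid duplicates
                queue.append(v)
    return order

# stand-in graph: a 6-vertex cycle 0-1-2-3-4-5-0
adj = {0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [0, 4]}
print(bfs(adj, 0))  # [0, 1, 5, 2, 4, 3]
```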

DFS implementation code:
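The embedded DFS code is not reproduced here; a sketch with an explicit stack (on the same stand-in graph as the BFS example) might look like this:

```python
def dfs(adj, start):
    """Iterative DFS using an explicit stack; returns the visit order."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        u = stack.pop()              # LIFO: keep digging down one path
        if u in visited:
            continue
        visited.add(u)
        order.append(u)
        # push neighbors in reverse so the first-listed one is visited first
        for v in reversed(adj[u]):
            if v not in visited:
                stack.append(v)
    return order

adj = {0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [0, 4]}
print(dfs(adj, 0))  # [0, 1, 2, 3, 4, 5]
```

A recursive version would simply call itself on each unvisited neighbor, letting the call stack play the role of the explicit one.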

Common problems in the Graph world:

Over time, computer scientists have found some interesting solutions for a small set of graph problems. Here I try to discuss some of them. In the real world, network problems are complicated and sound complex, but our job is to figure out how to reframe the problem to match one of the existing solutions. It is more art than science.

1. Find the shortest path between two vertices
2. Detect a cycle in a graph
3. Minimum Spanning Tree
4. Topological sort

1. Finding the shortest path between two vertices:

This problem is only challenging when we have a weighted graph. For unweighted graphs, BFS gives us the answer without any hassle. For a weighted graph with non-negative weights, we can use Dijkstra's algorithm to find the shortest path between two vertices. Dijkstra's is a smarter version of BFS. A good detailed introduction to Dijkstra's algorithm is here. Another source for visualization of Dijkstra's algorithm can be found here. At the end of this section, I provide the code for Dijkstra's algorithm tested on our toy example. The simple implementation of Dijkstra's has a time complexity of roughly O(V²).

Dijkstra Algorithm animation (source: Wikipedia)

There are two important things to keep in mind when working with Dijkstra's algorithm: 1. it cannot handle negative weights, and 2. it is a single-source shortest path algorithm.

When we have negative edge weights and the possibility of a negative cycle, the alternative is the Bellman-Ford algorithm. When there is a negative cycle in the graph, the shortest path is meaningless, and the only job of the algorithm is to detect the cycle and report it to the user. The advantage of Bellman-Ford is that it has exactly such an internal detection mechanism. This extra capability comes at the expense of a somewhat higher time complexity: Bellman-Ford runs in O(VE), which is higher than Dijkstra's.

The Bellman-Ford algorithm is not very intuitive, and if you want to gain an intuitive understanding of it, it is better to solve a simple problem step by step. A good short video for this purpose is here. I provide the code for Bellman-Ford on our toy example at the end of this section.

When we are interested in all-pairs shortest paths, one way is to repeat Dijkstra's algorithm for every vertex. The smarter way, however, is to use the Floyd-Warshall algorithm, which leverages the power and simplicity of the adjacency matrix.

The Floyd-Warshall algorithm is very simple and straightforward. It consists of three loops: two loops for scanning the adjacency matrix, and a third for checking whether some intermediate vertex can be put between two vertices to make their indirect path shorter. Because of these three loops, the time complexity of the algorithm is O(V³). I provide the code for Floyd-Warshall on our toy example at the end of this section. The input and output of the algorithm are shown below: the algorithm starts with a distance matrix as input, updates it within the three aforementioned loops, and produces an output distance matrix containing all-pairs shortest path distances.

Distance matrix as an input to Floyd-Warshall algorithm

Floyd-Warshall output, all pairs shortest path

Dijkstra code for our toy example:
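The embedded code is not reproduced here; a sketch of Dijkstra's algorithm follows. It uses a binary heap (a common refinement of the O(V²) array-scan version), and the weighted graph is a stand-in, since the article's toy example is not shown:

```python
import heapq

def dijkstra(adj, source):
    """Single-source shortest paths; adj maps u -> {v: weight >= 0}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry, skip it
        for v, w in adj[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w           # found a shorter path to v
                heapq.heappush(heap, (d + w, v))
    return dist

# stand-in weighted undirected graph on vertices 0..5
adj = {
    0: {1: 4, 5: 2},
    1: {0: 4, 2: 5, 5: 1},
    2: {1: 5, 3: 3},
    3: {2: 3, 4: 6},
    4: {3: 6, 5: 7},
    5: {0: 2, 1: 1, 4: 7},
}
print(dijkstra(adj, 0))
```

Note how vertex 1 is reached more cheaply through 5 (2 + 1 = 3) than by the direct edge of weight 4.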

Bellman-Ford code for our toy example:
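The embedded code is not reproduced here; a sketch of Bellman-Ford, on a stand-in directed graph with one negative edge, might look like this:

```python
def bellman_ford(edges, num_vertices, source):
    """Shortest paths from `source`; edges is a list of (u, v, weight).
    Raises ValueError if a negative cycle is reachable."""
    dist = [float("inf")] * num_vertices
    dist[source] = 0
    # relax every edge |V| - 1 times
    for _ in range(num_vertices - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # one extra pass: any further improvement means a negative cycle
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("graph contains a negative cycle")
    return dist

# stand-in directed graph; edge (2, 1) has a negative weight
edges = [(0, 1, 4), (0, 2, 2), (1, 3, 3), (2, 1, -1), (2, 3, 5)]
print(bellman_ford(edges, 4, 0))  # [0, 1, 2, 4]
```

The extra pass at the end is the internal mechanism mentioned above: after |V| - 1 rounds, distances can only still shrink if a negative cycle exists.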

Floyd-Warshall code for our toy example:
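The embedded code is not reproduced here; a sketch of Floyd-Warshall follows. The input distance matrix is a stand-in for the one shown in the article's figures (a directed 4-vertex cycle), not the original toy example:

```python
def floyd_warshall(dist):
    """All-pairs shortest paths. `dist` is an n x n matrix: dist[i][j] is
    the direct edge weight (inf if no edge, 0 on the diagonal).
    Updates the matrix in place and returns it."""
    n = len(dist)
    for k in range(n):              # candidate intermediate vertex
        for i in range(n):
            for j in range(n):
                # is the detour i -> k -> j shorter than the current i -> j?
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

INF = float("inf")
# stand-in input: directed cycle 0 -> 1 -> 2 -> 3 -> 0
d = [
    [0,   4,   INF, INF],
    [INF, 0,   3,   INF],
    [INF, INF, 0,   2],
    [1,   INF, INF, 0],
]
for row in floyd_warshall(d):
    print(row)
```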

2. Detect a cycle in a graph:

One simple way to detect a cycle is to use DFS: keep track of visited vertices, and if you reach an already-visited vertex again, there must be a cycle. The other way is to use the Union-Find algorithm. A good video tutorial for Union-Find is here (change the video playback speed to 1.5x). You can find the implementation of Union-Find later, when I explain Kruskal's algorithm.

Cycle detection by DFS code for our toy example:
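The embedded code is not reproduced here; a sketch for undirected graphs follows (the two small graphs are stand-ins for the toy example). In the undirected case, revisiting the vertex we just came from is not a cycle, so the parent is excluded:

```python
def has_cycle(adj):
    """Detect a cycle in an undirected graph with DFS: a cycle exists if
    we reach a visited vertex that is not the parent of the current one."""
    visited = set()

    def dfs(u, parent):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                if dfs(v, u):
                    return True
            elif v != parent:        # back edge -> cycle
                return True
        return False

    # run from every component, in case the graph is disconnected
    return any(u not in visited and dfs(u, None) for u in adj)

cyclic = {0: [1, 2], 1: [0, 2], 2: [0, 1]}   # a triangle
acyclic = {0: [1], 1: [0, 2], 2: [1]}        # a simple path
print(has_cycle(cyclic), has_cycle(acyclic))  # True False
```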

3. Minimum Spanning Tree:

A spanning tree is a subset of edges in a weighted undirected graph that connects all the vertices without any cycle. A minimum spanning tree (MST) is a spanning tree with minimum total edge weight. Keep in mind that it is called a tree because it doesn't have a cycle.

There are two main algorithms to solve this problem: Prim's and Kruskal's. I will explain them briefly and provide the code.

MST example (source: Wikipedia)

Prim’s algorithm:

It is a simple greedy algorithm. It starts from a random vertex and builds the MST by repeatedly attaching the lowest-weight edge that connects a new vertex to the tree, until all vertices are reached. The number of tree edges is |V| - 1.

MST using Prim’s algorithm for our toy example:
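The embedded code is not reproduced here; a heap-based sketch of Prim's algorithm follows, on the same stand-in weighted graph used in the Dijkstra example:

```python
import heapq

def prim_mst(adj, start=0):
    """Grow an MST from `start`, always attaching the cheapest edge that
    connects a vertex outside the tree; adj maps u -> {v: weight}."""
    in_tree = {start}
    mst_edges = []
    heap = [(w, start, v) for v, w in adj[start].items()]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue                     # edge leads back into the tree
        in_tree.add(v)
        mst_edges.append((u, v, w))
        for nxt, nw in adj[v].items():
            if nxt not in in_tree:
                heapq.heappush(heap, (nw, v, nxt))
    return mst_edges

adj = {
    0: {1: 4, 5: 2},
    1: {0: 4, 2: 5, 5: 1},
    2: {1: 5, 3: 3},
    3: {2: 3, 4: 6},
    4: {3: 6, 5: 7},
    5: {0: 2, 1: 1, 4: 7},
}
edges = prim_mst(adj)
print(edges, sum(w for _, _, w in edges))  # 5 edges, total weight 17
```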

Kruskal’s algorithm:

This algorithm is very easy in concept. You sort the edges by weight and check them one by one, from shortest to longest, to build the MST. If an edge doesn't form a cycle with the edges already in the tree, include it in the MST; otherwise, discard it. Continue until you have |V| - 1 edges in the MST. That simple.

A simple example of Kruskal’s algorithm (source: Wikipedia)

The most important step of this algorithm is detecting a cycle. There are several algorithms for that. The simplest is DFS, as I mentioned earlier. However, using DFS here would require building a graph for each intermediate MST, which is not that convenient. For that reason, the common alternative is the Union-Find algorithm, which does the job with fewer lines of code.

MST using Kruskal’s algorithm for our toy example:
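The embedded code is not reproduced here; a sketch of Kruskal's algorithm with a compact Union-Find follows. The edge list is the same stand-in graph as in the Prim example, so both sketches produce an MST of total weight 17:

```python
def kruskal_mst(edges, num_vertices):
    """Sort edges by weight and add each one that does not create a cycle,
    using Union-Find to test whether its endpoints are already connected."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps trees flat
            x = parent[x]
        return x

    mst = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:            # different components: edge adds no cycle
            parent[ru] = rv     # union the two components
            mst.append((u, v, w))
        if len(mst) == num_vertices - 1:
            break               # the tree is complete
    return mst

edges = [(0, 1, 4), (0, 5, 2), (1, 2, 5), (1, 5, 1),
         (2, 3, 3), (3, 4, 6), (4, 5, 7)]
mst = kruskal_mst(edges, 6)
print(mst, sum(w for _, _, w in mst))  # 5 edges, total weight 17
```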

4. Topological sort

It is one form of ordering and is only relevant to DAGs. In a nutshell, it is a linear ordering of the vertices such that for every directed edge (u, v), u comes before v in the ordering.

There are two common algorithms for topological sorting: one is a recursive method based on DFS; the other is an iterative method known as Kahn's algorithm.

In the recursive method, for each vertex, the algorithm searches in depth to find the leaf vertices and pushes them onto a stack; reversing the stack at the end gives the ordering.

Kahn's method is built on a known fact about DAGs: a DAG has at least one vertex with no incoming edge (in-degree = 0) and at least one vertex with no outgoing edge (out-degree = 0). The algorithm first counts the in-degree of every vertex and then repeatedly dequeues the vertices whose in-degree has dropped to zero. The time complexity is O(V + E).

I provide the code for both methods. Since our toy example is not a DAG, I test the code on the graph below, which has multiple valid topological orderings.

a DAG example

Topological sort using DFS for our example:
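The embedded code is not reproduced here; a recursive DFS sketch follows, on a small stand-in DAG (the article's example graph is not shown):

```python
def topological_sort(adj):
    """DFS-based topological sort; adj maps u -> list of successors.
    Each vertex is pushed only after all of its descendants, so
    reversing the stack yields a valid ordering."""
    visited = set()
    stack = []

    def dfs(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                dfs(v)
        stack.append(u)      # all descendants of u are already on the stack

    for u in adj:
        if u not in visited:
            dfs(u)
    return stack[::-1]

# stand-in DAG with several valid orderings
dag = {5: [2, 0], 4: [0, 1], 2: [3], 3: [1], 0: [], 1: []}
print(topological_sort(dag))
```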

Topological sort using Kahn’s algorithm for our example:
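The embedded code is not reproduced here; a sketch of Kahn's algorithm on the same stand-in DAG might look like this:

```python
from collections import deque

def kahn_topological_sort(adj):
    """Kahn's algorithm: repeatedly remove vertices with in-degree 0."""
    in_degree = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            in_degree[v] += 1
    # start from every vertex with no incoming edge
    queue = deque(u for u in adj if in_degree[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            in_degree[v] -= 1        # "remove" edge u -> v
            if in_degree[v] == 0:
                queue.append(v)
    if len(order) != len(adj):       # leftover vertices mean a cycle
        raise ValueError("graph has a cycle; topological sort impossible")
    return order

dag = {5: [2, 0], 4: [0, 1], 2: [3], 3: [1], 0: [], 1: []}
print(kahn_topological_sort(dag))
```

The final length check doubles as a cycle detector: in a cyclic graph, some vertices never reach in-degree 0 and are never dequeued.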

Visualization Sources

Graph problems are better learned with visualization, so here I direct you to two of the most popular resources. One is the University of San Francisco CS page (here), which provides many nice and intuitive visualizations of CS data structures and algorithms. The other is VisuAlgo (here), which is built to provide nice visualizations of complicated CS algorithms.