For the clustering problem, we will use the famous Zachary’s Karate Club dataset. For more detail on the study, see the linked paper.

In short, the karate club had an administrator, “John A”, and an instructor, “Mr. Hi”. A conflict arose between them that caused the students to split into two groups: one that followed John A and one that followed Mr. Hi.

The students are the nodes in our graph, and an edge (or link) connects two students who interacted socially outside of the club.
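To make the node-and-edge representation concrete, here is a minimal sketch that builds an undirected adjacency structure from an edge list. The edge list is a small made-up example, not the real karate club data, and the names `build_adjacency` and `toy_edges` are hypothetical. Graph libraries such as NetworkX also ship the real dataset (`karate_club_graph()`), which would normally be the more convenient route.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Return {node: set of neighbors} for an undirected edge list."""
    adjacency = defaultdict(set)
    for a, b in edges:
        # A social tie is mutual, so record the edge in both directions.
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


# Hypothetical toy data: five students (0-4) and five social ties.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]

adjacency = build_adjacency(toy_edges)

# The degree of a node is the number of students it interacted with.
degrees = {node: len(neighbors) for node, neighbors in adjacency.items()}
```

In this toy graph, student 2 has the most ties, which is the kind of structural signal a clustering algorithm works from.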

Because the dispute ended with the club splitting into two groups (clusters), and we know which group each student joined, we can treat those memberships as ground-truth labels and use them to compare how well different clustering algorithms perform.
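One simple way to score a clustering against the known split is plain accuracy, sketched below. It assumes both the true groups and the predicted clusters are encoded as 0 or 1. Because cluster IDs are arbitrary (an algorithm may call Mr. Hi's group "1" where the ground truth calls it "0"), the sketch scores both possible label assignments and keeps the better one. The function name and the example labels are hypothetical, not results from the dataset.

```python
def two_cluster_accuracy(true_labels, predicted_labels):
    """Fraction of nodes placed in the correct group, ignoring label naming.

    Assumes both sequences contain only 0s and 1s and have equal length.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("label sequences must not be empty")

    n = len(true_labels)
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    # Swapping the two predicted cluster IDs turns every match into a
    # mismatch and vice versa, so the swapped score is n - matches.
    return max(matches, n - matches) / n


# Hypothetical example: six students, one of them assigned to the wrong group.
true_groups = [0, 0, 0, 1, 1, 1]
predicted = [1, 1, 0, 0, 0, 0]

accuracy = two_cluster_accuracy(true_groups, predicted)
```

Here the predicted IDs are flipped relative to the ground truth, yet five of the six students are still grouped correctly, so the score reflects the partition rather than the naming.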