Proof: Both a and b belong to the component where the node lca(a,b) is the centroid. Suppose, by contradiction, that lca(a,b) doesn’t divide the path from a to b into two disjoint parts.

Then a and b would still be in the same component after lca(a,b) is removed from its component in the original tree. Consequently, the centroid of that smaller component would be a common ancestor of a and b lower than lca(a,b), which contradicts the definition of the lowest common ancestor. ∎

3. Each one of the n² paths of the original tree is the concatenation of two paths taken from a set of O(n lg(n)) paths: the paths from each node to each of its ancestors in the centroid decomposition.

That’s the most important and the hardest idea to understand! Let’s see an example first. Look at node 14 in figure 3. There are n paths that start at node 14 and end at some node a. We can represent all of those paths in four different ways, depending on a:

1. a ∈ {14} ➡ from 14 to 14, and then from 14 to a.

2. a ∈ {15} ➡ from 14 to 15, and then from 15 to a.

3. a ∈ {6, 9, 13} ➡ from 14 to 11, and then from 11 to a.

4. a ∈ {1, 2, 4, 5, 7, 8, 10, 12} ➡ from 14 to 3, and then from 3 to a.

Note that 14, 15, 11 and 3 are the ancestors of 14 in the centroid decomposition. What is the relation between a and them?

The idea is that instead of enumerating every possible endpoint a in the original tree, we enumerate every possible value of lca(14, a) in the centroid decomposition, that is, every ancestor of 14.

This technique allows us to represent the n paths starting at node 14 as concatenations of two paths, using only four possible middle nodes. If we generalize the idea to every node, we can represent all n² paths as concatenations of two paths taken from a set of O(n lg(n)) paths.
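To make the idea concrete, here is a minimal Python sketch that builds a centroid decomposition and checks that every path from a to b splits at lca(a,b). It does not use the tree from figure 3; the function names (`build_adj`, `centroid_parents`, `centroid_lca`, `paths_split_at_lca`) and the 0-based node labels are assumptions made only for this illustration.

```python
from collections import deque

def build_adj(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def centroid_parents(n, adj):
    """parent[v] = parent of v in the centroid decomposition (-1 for the root)."""
    removed = [False] * n
    parent = [-1] * n
    size = [0] * n

    def find_centroid(root):
        # Collect the current component in BFS order, then fill sizes bottom-up.
        order = [(root, -1)]
        for u, p in order:  # the list grows while we iterate over it
            order.extend((v, u) for v in adj[u] if v != p and not removed[v])
        for u, p in reversed(order):
            size[u] = 1 + sum(size[v] for v in adj[u] if v != p and not removed[v])
        total = size[root]
        # Walk towards the heavy child until no child holds more than half.
        u, p = root, -1
        while True:
            heavy = [v for v in adj[u]
                     if v != p and not removed[v] and 2 * size[v] > total]
            if not heavy:
                return u
            u, p = heavy[0], u

    stack = [(0, -1)]  # (any node of a component, centroid parent of that component)
    while stack:
        root, cpar = stack.pop()
        c = find_centroid(root)
        parent[c] = cpar
        removed[c] = True
        stack.extend((v, c) for v in adj[c] if not removed[v])
    return parent

def ancestors(parent, v):
    """v itself first, then its ancestors up to the root of the decomposition."""
    chain = []
    while v != -1:
        chain.append(v)
        v = parent[v]
    return chain

def centroid_lca(parent, a, b):
    anc_a = set(ancestors(parent, a))
    return next(x for x in ancestors(parent, b) if x in anc_a)

def bfs_dist(n, adj, src):
    dist = [-1] * n
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == -1:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def paths_split_at_lca(n, edges):
    """True if dist(a,b) == dist(a,c) + dist(c,b) with c = lca(a,b), for all a, b."""
    adj = build_adj(n, edges)
    parent = centroid_parents(n, adj)
    dist = [bfs_dist(n, adj, s) for s in range(n)]
    return all(
        dist[a][b] == dist[a][c] + dist[c][b]
        for a in range(n) for b in range(n)
        for c in [centroid_lca(parent, a, b)]
    )
```

In a tree, dist(a,b) = dist(a,c) + dist(c,b) holds exactly when c lies on the path from a to b, so the check is the second property in disguise. Calling `paths_split_at_lca(7, [(i, i + 1) for i in range(6)])` on a path of seven nodes is an easy first experiment.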

Proof: The previous property says that each path in the original tree can be represented as two paths that meet at an ancestor in the centroid decomposition (from a to lca(a,b) and from lca(a,b) to b). Now, we have to prove that there are only O(n lg(n)) paths from a node to one of its ancestors.

The height of the centroid decomposition is logarithmic. It means that each node has O(lg(n)) ancestors. There are n nodes, so the total number of (node, ancestor) pairs is O(n lg(n)). ∎

Another proof: The previous property says that each path in the original tree can be represented as two paths that meet at an ancestor in the centroid decomposition (from a to lca(a,b) and from lca(a,b) to b). Now, we have to prove that there are only O(n lg(n)) paths from a node to one of its ancestors.

Instead of counting the ancestors of each node, let’s count the descendants of each node. Both counts add up to the same total, because every (node, ancestor) pair is also an (ancestor, descendant) pair. We’ll analyze, again, each level of the centroid decomposition.

It’s easy to see that the root has n-1 descendants.

Again, we don’t know how many descendants each node of the second level has. We know, nonetheless, that the sum of descendants of all nodes in the second level is O(n). The same goes for other levels.

For each one of the O(lg(n)) levels, we have no more than n descendants in total. It means that the total number of (ancestor, descendant) pairs is O(n lg(n)). ∎
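Both proofs count the same set of pairs, and that is easy to check on small trees. The sketch below is only an illustration: `centroid_parents` and `count_pairs` are hypothetical names, nodes are labeled from 0, and each node counts as its own ancestor (as node 14 does in the example above). It counts the pairs once per node and once per level of the centroid decomposition.

```python
def centroid_parents(n, edges):
    """parent[v] = parent of v in the centroid decomposition (-1 for the root)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    removed = [False] * n
    parent = [-1] * n
    size = [0] * n

    def find_centroid(root):
        # Collect the current component in BFS order, then fill sizes bottom-up.
        order = [(root, -1)]
        for u, p in order:  # the list grows while we iterate over it
            order.extend((v, u) for v in adj[u] if v != p and not removed[v])
        for u, p in reversed(order):
            size[u] = 1 + sum(size[v] for v in adj[u] if v != p and not removed[v])
        total = size[root]
        # Walk towards the heavy child until no child holds more than half.
        u, p = root, -1
        while True:
            heavy = [v for v in adj[u]
                     if v != p and not removed[v] and 2 * size[v] > total]
            if not heavy:
                return u
            u, p = heavy[0], u

    stack = [(0, -1)]  # (any node of a component, centroid parent of that component)
    while stack:
        root, cpar = stack.pop()
        c = find_centroid(root)
        parent[c] = cpar
        removed[c] = True
        stack.extend((v, c) for v in adj[c] if not removed[v])
    return parent

def count_pairs(parent):
    """Count (node, ancestor) pairs per node and per level of the ancestor."""
    total_by_node = 0   # first proof: add up the ancestors of every node
    per_level = {}      # second proof: add up the descendants on every level
    for v in range(len(parent)):
        chain = []
        x = v
        while x != -1:
            chain.append(x)
            x = parent[x]
        total_by_node += len(chain)
        # chain[-1] is the root (level 0); chain[0] is v itself.
        for level in range(len(chain)):
            per_level[level] = per_level.get(level, 0) + 1
    return total_by_node, per_level

n = 16
total, per_level = count_pairs(centroid_parents(n, [(i, i + 1) for i in range(n - 1)]))
levels_bound = n.bit_length()  # floor(lg(n)) + 1 levels at most
print(total, per_level, total <= n * levels_bound)
```

No level contributes more than n pairs, and the number of levels never exceeds floor(lg(n)) + 1, so the total stays below n (floor(lg(n)) + 1) whichever way it is counted.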

Exercises

3. Given the centroid decomposition of figure 5, find the original tree. Is there more than one possible answer?

4. Show that every centroid decomposition is a centroid decomposition of itself or give a counterexample.

5. Think about the following statement: “Given any rooted tree, the path from a to b can be decomposed into the path from a to lca(a,b) and the path from lca(a,b) to b and we can apply the same strategy used in the third property.” If that’s true, why should we use centroid decomposition?