Adventures in Programming Interviews: Misleadingly Simple NP-Hard Problem


A directed graph denoting debts between five participants

It was late in the day (some would consider it evening), and I had already had five consecutive interviews. This company offered no real respite from it all; some interviewers asked if I needed a break, but those questions tend to have only one “correct” answer. I was really tired. All that was left was an algorithm interview with a pair of interviewers.

They asked this simply framed problem:

Given a list of debts between pairs of people, minimize the number of transactions needed to clear all debts.

I ended up receiving an offer, but I did terribly in this interview. My initial instinct was that this was a graph problem, perhaps NP-hard. My second instinct was that they would not ask an NP-hard question for a simple software engineering role. My mistake was assuming that my interviewers understood the problem they were asking.

Their Solution

I befriended one of the interviewers later and learned this was what they had in mind:

Observe that it only matters that people receive or pay the amount needed for them to be made whole, and not where that money came from. For example, if Alice owed Bob $20, and Bob owed Carl $20, then these debts can be cleared by Alice giving Carl $20, even though Alice never directly owed Carl any money. Their algorithm was:

1. Reduce each individual’s debts into a single number. For example, if Alice owes 3 people $20 each, and 2 people owe Alice $50 each, then Alice is owed $40 (-$20 x 3 + $50 x 2 = $40).
2. Remove people whose debt is $0.
3. Separate the people who owe money and the people who are owed money into two sorted lists or other sorted data structures.
4. Starting from the highest value in each list, pair them off: the person who owes the most pays the person who is owed the most the smaller of their two amounts, and whoever still has a remainder goes back into their list. Repeat until the lists are exhausted.
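The interviewers’ greedy procedure can be sketched in Python as below. This is a minimal sketch, not their code: debts are assumed to arrive as hypothetical `(debtor, creditor, amount)` triples, a positive balance means the person is owed money (matching Alice’s +$40 above), and heaps stand in for the sorted lists. The function names `net_balances` and `greedy_settle` are illustrative.

```python
import heapq
from collections import defaultdict


def net_balances(debts):
    """Collapse (debtor, creditor, amount) triples into one number per person.

    Positive means the person is owed money; negative means they owe money.
    People whose balance comes out to 0 are dropped.
    """
    balance = defaultdict(int)
    for debtor, creditor, amount in debts:
        balance[debtor] -= amount
        balance[creditor] += amount
    return {person: b for person, b in balance.items() if b != 0}


def greedy_settle(balances):
    """Repeatedly pair the largest debtor with the largest creditor.

    Returns a list of (payer, payee, amount) transactions. This is the
    greedy heuristic described above, which is not always optimal.
    """
    # heapq is a min-heap, so store negated amounts to pop the largest first.
    debtors = [(b, p) for p, b in balances.items() if b < 0]     # b is already negative
    creditors = [(-b, p) for p, b in balances.items() if b > 0]
    heapq.heapify(debtors)
    heapq.heapify(creditors)

    transactions = []
    while debtors and creditors:
        d_amt, debtor = heapq.heappop(debtors)      # d_amt = -(amount still owed)
        c_amt, creditor = heapq.heappop(creditors)  # c_amt = -(amount still due)
        pay = min(-d_amt, -c_amt)
        transactions.append((debtor, creditor, pay))
        # Whoever still has a remainder goes back into their heap.
        if -d_amt > pay:
            heapq.heappush(debtors, (d_amt + pay, debtor))
        if -c_amt > pay:
            heapq.heappush(creditors, (c_amt + pay, creditor))
    return transactions
```

Fed the counterexample that follows (people owing $10, $3, and $3; people owed $6, $5, and $5), this sketch emits 5 transactions.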

I went home that day frustrated and later analyzed the problem more deeply. It is decidedly NP-hard. It’s a fun problem, but inappropriate for a software engineering interview.

The observation and the reduction are correct. The greedy solution is not. Consider this counterexample, shown after each person’s debts have been reduced to a single number and zero balances removed:

The optimal perfect groupings are drawn in blue and red

There are three people who owe amounts of $10, $3, and $3, respectively.

There are three people who are owed amounts of $6, $5, and $5, respectively.

The optimal solution uses 4 transactions: the two people who owe $3 each pay the person owed $6, and the person who owes $10 pays each of the two people owed $5. The greedy approach produces 5 transactions.

Each transaction can settle either 1 or 2 participants, so the optimal solution maximizes the number of transactions that settle 2 participants at once. Let’s define a perfect grouping as a group of participants, some who owe debts and some who are owed debts, whose amounts can be paired off with no remainder, i.e. whose balances sum to 0. Each perfect grouping contributes one transaction that settles 2 participants.

The optimal solution finds the two perfect groupings ($10 | $5, $5) and ($3, $3 | $6). The greedy solution only ever finds a single perfect grouping containing everyone ($10, $3, $3 | $6, $5, $5).
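As a worked check of these numbers: a perfect grouping of m participants can be settled in m - 1 transactions, since every transaction settles at least one participant and the last one settles two. So if n participants are split into g perfect groupings of sizes m_1 through m_g, the number of transactions is:

$$T = \sum_{i=1}^{g} (m_i - 1) = n - g$$

With n = 6, the two groupings found by the optimal solution give 6 - 2 = 4 transactions, while the greedy solution’s single grouping gives 6 - 1 = 5.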

NP-Hard Proof

To prove that this problem is NP-hard, the well-known subset sum problem, which is NP-complete, can be reduced to it, proving that this problem is at least as hard as subset sum. Subset sum is a decision problem: given a set of integers S and a target integer s, is there a non-empty subset of S that sums to s? This reduction uses the positive variant of the subset sum problem, where all elements of S and the target integer s are positive.

The subset sum problem reduction visualized

This is the reduction:

Transform each integer in the set S into a participant who owes a debt of that value. Add two participants who are owed debts of s and t, where t is the sum of all integers in S minus s. This ensures that the total owed and the total due are balanced. (If s is at least the sum of S, the instance is trivial: the answer is yes exactly when s equals that sum, so assume s is smaller and t is therefore positive.) Compute the minimum number of transactions required to clear all debts.

Let n be the number of participants who owe money, i.e. the size of S, so there are n + 2 participants in the reduced problem. Every perfect grouping must contain at least one person who is owed money, and there are only two such people, so there can be at most 2 perfect groupings. If 2 are found, the minimum number of transactions is n, and the debtors grouped with the person owed s form a non-empty subset of S that sums to s. Conversely, if some subset sums to s, its complement sums to t, giving 2 perfect groupings. If the minimum is n + 1, there is only one perfect grouping and no such subset exists.
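The reduction is mechanical enough to sketch in Python. This is an illustration under stated assumptions: the function names are hypothetical, amounts owed by a participant are positive and amounts owed to a participant are negative (the convention used in the solution section), and a small exhaustive search stands in for “compute the minimum number of transactions”, so it is only practical for tiny inputs.

```python
from itertools import combinations


def subset_sum_to_debts(S, s):
    """Build the reduced instance: each x in S owes x; two people are owed s and t.

    Amounts owed *to* someone are negative. Assumes 0 < s < sum(S), so t > 0.
    """
    t = sum(S) - s
    return list(S) + [-s, -t]


def max_perfect_groupings(balances):
    """Most disjoint zero-sum groups `balances` splits into (exhaustive, tiny inputs only)."""
    if not balances:
        return 0
    first, rest = balances[0], balances[1:]
    best = 0
    # `first` must belong to some perfect grouping; try every group containing it.
    for size in range(len(rest) + 1):
        for idx in combinations(range(len(rest)), size):
            if first + sum(rest[i] for i in idx) == 0:
                chosen = set(idx)
                remaining = [b for i, b in enumerate(rest) if i not in chosen]
                best = max(best, 1 + max_perfect_groupings(remaining))
    return best


def decide_subset_sum(S, s):
    """Answer positive subset sum via the minimum-transactions reduction."""
    total = sum(S)
    if s >= total:                  # trivial case: only the whole set can reach s
        return s == total
    balances = subset_sum_to_debts(S, s)
    n = len(S)
    min_transactions = len(balances) - max_perfect_groupings(balances)
    return min_transactions == n    # n + 1 means no subset sums to s
```

For example, `decide_subset_sum([3, 34, 4, 12, 5, 2], 9)` is true because 4 + 5 = 9, while a target of 30 is unreachable and yields n + 1 transactions.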

The above reduction takes linear time, and through it any algorithm for this problem can decide the subset sum problem, so this problem is at least as hard as subset sum. This problem is NP-hard. QED.

The Solution

The solution is to maximize the number of perfect groupings in order to minimize the number of transactions, because each perfect grouping reduces the number of transactions needed to clear all debts by 1. When no proper subset forms a perfect grouping, all the participants together form a single perfect grouping. Rephrased, the solution is to partition the participants into as many disjoint subsets that each sum to 0 as possible.

Stepping through the above counterexample to the greedy solution. Invalid states are not drawn in this visualization (there are 60 undrawn states).

This is the solution I found:

Reduce each individual’s debts to a single number: the amount they owe to the collective, with the amounts owed to people represented as negative numbers. Remove the people whose reduced debt is 0, and let n be the number of remaining participants.

Use a bit mask of length n to represent the state. A 0 at the ith digit means that the ith participant has not had their debt settled, and a 1 means that they have. There are 2^n such states: all 0’s is the starting state with no settled debts, and all 1’s is the end state with all debts settled.

Create a method that takes in a state and finds every non-empty subset of the unsettled participants whose debts sum to 0, i.e. every perfect grouping still available. For each one, flip those bits to 1 to mark them as settled, and recursively call the same method with the new state. Have the method return the length of the longest chain of such calls from its state to all 1’s, which is the maximum number of perfect groupings the unsettled participants can still be split into; the value for all 1’s is 0. Cache (memoize) this value per state so that no state is computed twice. Calling the method on all 0’s gives the maximum number of perfect groupings overall, and the minimum number of transactions is the number of participants minus that number.
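Below is a minimal Python sketch of this bitmask approach, assuming the balances have already been reduced (amounts owed by a participant positive, amounts owed to them negative). For each state it caches the number of perfect groupings still achievable from that state up to all 1’s, so the longest chain from all 0’s is the value at state 0. It also builds the subset-sum lookup table described in the run time analysis. The names `min_transactions` and `most_groupings` are illustrative, not from the original.

```python
from functools import lru_cache


def min_transactions(balances):
    """Minimum transactions to settle `balances` (integers that sum to 0).

    Amounts owed to a participant are negative. Bit i of a state is 1 once
    participant i has been settled.
    """
    balances = [b for b in balances if b != 0]   # drop people already made whole
    n = len(balances)
    full = (1 << n) - 1

    # Lookup table: subset_sum[mask] = sum of the balances whose bit is set in mask.
    subset_sum = [0] * (1 << n)
    for mask in range(1, 1 << n):
        low = mask & -mask                        # lowest set bit of mask
        subset_sum[mask] = subset_sum[mask ^ low] + balances[low.bit_length() - 1]

    @lru_cache(maxsize=None)
    def most_groupings(settled):
        """Most perfect groupings the still-unsettled participants can form."""
        free = full & ~settled
        if free == 0:
            return 0
        best = 0
        sub = free
        while sub:                                # every non-empty subset of `free`
            if subset_sum[sub] == 0:              # a perfect grouping among the unsettled
                best = max(best, 1 + most_groupings(settled | sub))
            sub = (sub - 1) & free
        return best

    return n - most_groupings(0)
```

On the counterexample, `min_transactions([10, 3, 3, -6, -5, -5])` finds the two perfect groupings and returns 4, one fewer than the greedy approach.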

Run time analysis: a state with k unsettled participants has to consider all 2^k subsets of those participants, and there are (n choose k) states with exactly k unsettled participants. Precomputing the sum of every subset into a lookup table takes O(2^n) time, after which each subset is checked in O(1), so a state costs O(2^k). Summing over all states, the big O runtime of this algorithm is:

$$O\left(\sum_{k=0}^{n} \binom{n}{k} 2^k\right) = O(3^n)$$

The two sides are equal by the binomial theorem, since (1 + 2)^n expands to exactly this sum. A good explanation of why the two sides are equal can be found here. This algorithm will take O(2^n) space, as there are 2^n states and 2^n subset sums to store.
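To make the 3^n bound concrete, this Python snippet counts how many times a submask loop body runs across all 2^n states and compares the count with the binomial sum and with 3^n. It assumes the common `(sub - 1) & free` submask enumeration (an implementation choice, not something the analysis above prescribes) and counts the empty subset too, so a state with k unsettled participants contributes exactly 2^k iterations.

```python
from math import comb


def submask_iterations(n):
    """Count loop iterations when every state enumerates every subset
    (including the empty one) of its unsettled participants."""
    full = (1 << n) - 1
    total = 0
    for settled in range(1 << n):
        free = full & ~settled
        sub = free
        while True:
            total += 1                  # one unit of work per subset
            if sub == 0:
                break
            sub = (sub - 1) & free      # next smaller subset of `free`
    return total


# The count matches sum over k of (n choose k) * 2^k, which equals 3^n.
for n in range(11):
    binomial_sum = sum(comb(n, k) * 2**k for k in range(n + 1))
    assert submask_iterations(n) == binomial_sum == 3**n
```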

Afterwards

The most frustrating part of all this was that my interviewers thought the problem was far easier than it actually is, and reported that I was not able to solve such an easy problem in the span of 45 minutes. This specific interview significantly hurt the offer I received. Unfortunately, there’s a lot of luck in the interview process, as there is in life. The best things to do are to learn not to assume that interviewers fully understand their own questions, and to write a blog post about it.

Thank you Daniel Wasserman for helping me look over this article!
