After a major financial crisis, there is much discussion about how finance has become a casino: gambling with other people’s money, keeping the winnings, and walking away when the money is lost.

When thinking about financial reform, the many losers in this scenario are apt to conclude that the activity should be completely, or nearly completely, curtailed. A more thoughtful view is that there is sometimes a real sense in which decisions are right or wrong, and that we as a society would prefer that the people most likely to make the right decisions are the ones making them. A crucial question then is: “What is the difference between gambling and rewarding good prediction?”

We discussed this before the financial crisis. The cheat-sheet version is that the problem of online learning against an adversary, together with its algorithms and theorems, provides a good mathematical model for thinking about this question. What I would like to do here is map this model onto various types of financial transactions. The basic mapping is between “wealth” and “weight”, with the essential idea that you can think of wealth as either money or degree of control over decision making. The core algorithms start with “wealth” spread over many experts, each of which makes predictions and then has its wealth updated according to a soft exponential of the value of its prediction.
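To make the wealth/weight mapping concrete, here is a minimal sketch of one standard algorithm of this kind, the exponential-weights (Hedge) update. It is an illustration under assumptions, not the only algorithm the argument covers: the function name `hedge`, the loss scale of [0, 1] per round, and the learning rate `eta` are our choices. Each expert's weight plays the role of its wealth, and the learner acts in proportion to that wealth.

```python
import math

def hedge(loss_rounds, eta):
    """Exponential-weights (Hedge) sketch: each expert's 'wealth' is its weight.

    loss_rounds: list of rounds, each a list of per-expert losses in [0, 1].
    eta: learning rate controlling how sharply wealth moves each round.
    Returns (learner's total expected loss, final normalized wealth).
    """
    n = len(loss_rounds[0])
    wealth = [1.0 / n] * n  # start with wealth spread evenly over the experts
    total_loss = 0.0
    for losses in loss_rounds:
        z = sum(wealth)
        # The learner follows each expert in proportion to its current wealth.
        total_loss += sum(w * l for w, l in zip(wealth, losses)) / z
        # Soft exponential update: accurate experts keep their wealth,
        # inaccurate ones see it shrink multiplicatively.
        wealth = [w * math.exp(-eta * l) for w, l in zip(wealth, losses)]
    z = sum(wealth)
    return total_loss, [w / z for w in wealth]

# Usage: expert 0 is always right, experts 1 and 2 are always wrong.
rounds = [[0.0, 1.0, 1.0]] * 10
total, final = hedge(rounds, eta=0.5)
print(f"learner loss {total:.3f}, final wealth shares {[round(w, 3) for w in final]}")
```

Wealth concentrates on the reliable expert within a few rounds, while no single bad round can zero out anyone, which is the behavior the financial analogies below are measured against.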

Going Long. The basic strategy here is to buy low and sell high. This strategy is not inherently sound from a learning theory point of view, because a single purchased item can sometimes drop to zero value. Similarly, a single purchased item can sometimes grow radically in value. Neither of these properties is desirable from the viewpoint of a learning algorithm. In the zero-value case, a good decision maker can be wiped out by one decision, while in the large-value case, a lucky decision maker can randomly achieve overwhelming credit. Nevertheless, there is a sense in which this strategy is compatible. If each item purchased either doubles or halves in value, the fluctuation in the wealth of a decision maker is analogous to the fluctuation in the relative weight of an expert in the online learning framework.

… with diversification. Going long with diversification means purchasing several items and selling them later. Adding diversification to the long strategy helps it align substantially better with an optimal learning theory strategy: single points of failure are avoided, and random upward fluctuations in wealth are reduced.

Going Short. The short strategy is borrowing an item (typically a stock), selling it high, then buying it back low to cover the debt. It is a technique for making money when a stock decreases in value, and it was banned for a time during the crisis. From the perspective of learning theory, short selling is more dangerous than going long, because it’s possible to end up with negative wealth when a stock is sold short and then increases in value. To avoid this, it’s necessary to have sufficient collateral to cover the short at all times. If this collateral is at least twice the value when shorting occurs, it’s hard for participants to become wealthy by luck: a short can gain at most the original sale price (if the stock falls to zero), so wealth at most doubles. Diversification is also a potentially useful helper strategy.

Insurance.
Credit default swaps are effectively a form of insurance in which one party pays another small amounts unless something bad happens, in which case large amounts of money flow in the other direction. During the financial crisis, credit default swaps made the crisis viral as their “pay up” clauses triggered, most notably wiping out AIG. Insurance has the same general problem as short selling: it can result in negative wealth unless there is sufficient collateral. It also has the same solution.

Clawback. The basic idea of a “clawback” is that when someone fouls up really badly, the cost is extracted from their past paychecks. As far as I can tell, this sort of clause exists in almost no contracts, but it’s a popular proposal in retrospect, particularly for certain AIG employees who destroyed their company. The driving problem here is that the actual value of a decision is not known for some time and is misestimated in the short term. Learning theory suggests applying updates to estimated value as soon as possible to adjust wealth, which would correspond to a potential 100% clawback clause.
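The collateral argument for short selling (and, by the same arithmetic, for selling insurance, with the premium in place of the sale price and the payout in place of the buyback cost) can be sketched numerically. This is a simplified model under stated assumptions: we read “collateral at least twice the value” as the sale proceeds plus the trader’s own stake being at least twice the sale price, i.e. stake ≥ sale price, and we assume a hypothetical margin rule that closes the position exactly when the stake is used up. Real prices can gap past such a point, so the zero floor is an idealization.

```python
def short_wealth_multiplier(stake, sale_price, buyback_price, forced_close=True):
    """Wealth multiplier for one collateralized short sale (simplified sketch).

    stake: the trader's own collateral, i.e. their wealth at risk.
    sale_price: what the borrowed item sold for.
    buyback_price: what it costs to buy the item back and return it.
    forced_close: if True, assume a hypothetical margin rule that closes the
        position once losses consume the stake, so wealth never goes negative.
    """
    profit = sale_price - buyback_price  # at most sale_price (item falls to 0)
    multiplier = (stake + profit) / stake
    # With forced closure the trader can lose at most the stake.
    return max(0.0, multiplier) if forced_close else multiplier

# Usage: a well-collateralized short versus a thinly collateralized one.
for stake in (100.0, 10.0):
    best = short_wealth_multiplier(stake, 100.0, 0.0)
    worst = short_wealth_multiplier(stake, 100.0, 300.0, forced_close=False)
    print(f"stake {stake:>5}: best case x{best:.1f}, unsecured worst case x{worst:.1f}")
```

With a stake equal to the sale price, every outcome lands in [0, 2], just like a bounded multiplicative weight update. With a thin stake, a lucky collapse multiplies wealth elevenfold, and without forced closure a price spike drives wealth deeply negative.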

Two things strike me in considering the above.

The first is that, for normal people interacting with the financial system, a set of financial rules plus good sense has developed so that wealth tends to grow and shrink in a manner similar to what learning theory suggests is near optimal. For example, most people use the going long strategy by default, and most diversify. Most don’t use the short strategy, and those who do must have sufficient collateral. Normal people don’t have access to credit default swaps, and normal insurance has real collateral requirements. Clawbacks are automatic, as normal people bet with their own money and take their own losses.

The second is that larger actors have become quite skillful at avoiding the rules, with unsecured credit default swaps, unsecured shorts, and no clawback rules. But learning theory is math, so it can’t really be avoided; what happens instead is inefficient decision making via inefficient learning algorithms on a societal scale.

My belief is that effective financial reform will impose limits on agents just as learning theory implies. This is also the answer to the title question: it’s gambling if the corresponding learning algorithm has high regret, and it’s rewarding good prediction if the corresponding learning algorithm has low regret. Since this is already done effectively for normal people, shifting all agents toward the limits already imposed on normal people works. This means lower bounds on collateral (or equivalently, upper bounds on leverage) and standardized markets where all agents can interact on an equal basis. Adding automatic clawback provisions to all performance-based pay would also probably be very effective.
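One way to see the regret criterion in action is to compare two exponential-weights learners that differ only in how violently they move wealth. Treating the learning rate `eta` as a stand-in for leverage is our analogy, not a claim from the post, and the helper names `hedge_regret` and `alternating_losses` are hypothetical. The loss sequence is a standard hard case for betting everything on the current leader: the leader changes every round.

```python
import math

def hedge_regret(loss_rounds, eta):
    """Regret of exponential weights (Hedge) against the best single expert.

    loss_rounds: list of rounds, each a list of per-expert losses in [0, 1].
    eta: learning rate; used here as an analogy for leverage, since a larger
        eta moves wealth more violently after each outcome.
    """
    n = len(loss_rounds[0])
    cumulative = [0.0] * n
    learner_loss = 0.0
    for losses in loss_rounds:
        # Wealth is proportional to exp(-eta * cumulative loss); subtracting the
        # minimum keeps exp() from underflowing when eta is large.
        best = min(cumulative)
        wealth = [math.exp(-eta * (c - best)) for c in cumulative]
        z = sum(wealth)
        learner_loss += sum(w * l for w, l in zip(wealth, losses)) / z
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return learner_loss - min(cumulative)

def alternating_losses(T):
    """Two experts whose leadership flips every round after the first."""
    rounds = [[0.5, 0.0]]
    for t in range(1, T):
        rounds.append([0.0, 1.0] if t % 2 == 1 else [1.0, 0.0])
    return rounds

T = 100
rounds = alternating_losses(T)
tuned_eta = math.sqrt(8 * math.log(2) / T)  # standard Hedge tuning for known T
print(f"highly leveraged (eta=50): regret {hedge_regret(rounds, 50.0):.2f}")
print(f"tuned (eta={tuned_eta:.3f}): regret {hedge_regret(rounds, tuned_eta):.2f}")
```

On this sequence the heavily leveraged learner’s regret grows roughly like T/2, so its outcome is mostly luck in the gambling sense, while the tuned learner stays under the standard Hedge bound of sqrt(T ln N / 2), about 5.9 here.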

A full dose of this medicine may upset many people directly affected by such legislation, as it limits their actions and imposes downsides. But this needn’t be so, because the math is straightforward, very robust, and designed precisely to pick out the good decision makers and give them wealth, as rapidly as is responsibly possible, to make and control bigger decisions. If you are a good decision maker, then you should want this.

On the research front, there are substantial improvements we could hope for. Some basic questions are: How can we better structure marketplaces to allocate wealth according to the dynamics of an online learning algorithm? What are the holes in the mapping between online learning and markets that need repair? How do you repair them? And how do the repairs affect learning algorithms when backported? Good answers to these questions could be radically valuable. Yiling and Jenn have a paper at EC this year mapping out connections between prediction markets and online learning, which is relevant to this direction of research.