Here is my semi-long promised post on the theory of Linear Optimization or Linear Programming. About three thousand words under the cut, but there are pictures and I think it’s readable. Please ask if you have questions!

Summary: Optimization problems are a way of maximizing some output function (say your profit) subject to some constraints. If the output function and all the constraints are linear, solving this is surprisingly easy; I explain some of the method with pictures. We can view the process in a few different ways, and introduce the concept of duality. I talk about generalizations, including convex optimization (which, surprisingly, is almost as easy as linear optimization) and integer programming (which, surprisingly, is not).

Also, there are pictures of cats.

From a mathematical perspective, optimization is the question of maximizing some “objective function” subject to “constraints.” The objective function is just a function that describes how much of something we get–it can measure profit or utility or fuzzy bunnies, whatever our goal is. (It’s an “objective” function in the sense of “mission objective,” not in contrast to a subjective function). The constraints are restrictions that tell us what we’re allowed to do. They can describe prices of inputs, or the breeding rate of rabbits, or the rule that we aren’t allowed to restrict speech, or any other limitations we want or need to put on things.

An optimization algorithm is a technique for finding this maximum. It’s not too hard to state a problem where you can tell that a maximum has to exist, but it’s not at all obvious what you need to do to get that best result. (If your boss comes to you and asks what decision will maximize your firm’s profit, the answer “some decision, definitely” is not terribly helpful). So we’re talking about ways of not just showing that some optimization problem has a solution, but of actually finding the solution.

In this post we’re specifically discussing linear optimization, which means that the objective function is a linear function and the constraints are all given by linear inequalities (that is, they look like $f(\mathbf{x}) \leq 0$ where $f$ is a linear function). (We actually can relax this a bit and ask about convex functions and regions; I’ll come back to this thought at the end).

These techniques were originally developed by Leonid Kantorovich, a Soviet mathematician who was trying to solve the central planning problem. (You can read more about the history in slatestarscratchpad’s mainblog post on Red Plenty, and even more if you go read the actual book).

Kantorovich’s constraints were the inputs it takes to make each good (at each factory etc); and his objective function was…Uh, well, one of the fundamental problems of central planning is that it’s not clear what you should be optimizing for. Kantorovich sidestepped this by actually starting with a fixed output and trying to minimize the amount of inputs necessary. This turns out to be equivalent, and we’ll come back to this equivalence later.

(You might worry about whether the actual constraints in the real world are linear. The answer is probably “no, not really.” Fortunately, we can generally get pretty good results by approximating non-linear functions by linear functions–this is in fact the main point of derivatives, see my post on linear functions for more. It is true that this only works as long as we stay away from extreme edge cases, but under the quite reasonable assumption that we don’t want to devote our entire economy to making men’s shoes, size nine, gray, this probably isn’t too big a problem).

These techniques are currently used by almost every big logistical organization: FedEx, for instance, has a massive army of people using these and related techniques to optimize their shipping organization. So while the original goal of “saving the Soviet Union’s economic planning” didn’t really work out, the techniques do have tremendous practical applications. After all, a corporation is just a planned economy with a potentially well-defined objective function.

So how does linear optimization work? Let’s start by drawing some simple pictures. Suppose we have an economy that can produce either guns or butter–

Actually, no, wait, that’s boring. Suppose you run a company that makes two products: board game versions of Tetris and Rubik’s cubes that deliver electric shocks to their users. (No, I don’t know why you would make and sell either of those things. Or who’s buying them from you. It’s your company, you tell me). And suppose you make \$5 of profit for each pointless board game and \$3 of profit for every dangerous math toy.

Your objective, presumably, is to maximize profit. We can write $T$ for the number of tetris games you make, and $R$ for the number of Rubik’s Cubes you make. So your objective function is $f(T,R) = 5T + 3R$, and we can check that this is a linear function. (Note: the function stops being linear if you assume the amount you make has an effect on the price, so these techniques are most useful when no one manufacturer is a monopolist or anything).

But with just this information, you obviously maximize your profit by making “as many toys as possible.” In order for the question to be interesting, we need some constraints. First of all, you (sadly for everyone else) can’t make negative toys, so we have $T \geq 0$ and $R \geq 0$. Then let’s say that you can make at most a hundred stupid tchotchkes a month. We can render this with the constraint inequality $T + R \leq 100$. At this point the problem is still easy–the Tetris game is more profitable than the Rubik’s cube, so we should make 100 of the Tetris games for a total profit of \$500.

So let’s add another constraint. Maybe both toys require a certain amount of plastic, and it’s hard to find sufficiently low-quality plastic so you can only get 300 ounces a month. The Tetris game takes 4 ounces of plastic to make, and the Rubik’s cube only takes two ounces. We have a second constraint, now, that $4 T + 2 R \leq 300$. At this point you can probably still solve the problem in your head. We want to make as many Tetris games as possible, so what’s the most we can make? A little experimentation or algebra will show that you can make fifty Tetris board games and fifty electrified Rubik’s cubes, and we get a total profit of $f(50,50) = 5\cdot 50 + 3 \cdot 50 = 400$ dollars.

Let’s add one more constraint. You probably don’t want to work more than, say, 55 hours a month, and it takes forty minutes to make a Tetris game and only thirty to make a Rubik’s cube. This gives us a third constraint, which is that $40T + 30 R \leq 3300$. And now this is hard to think about in our heads. (I don’t know what the answer is yet). So we need some tools.

The first tool, when you can use it, is to draw a picture. (This is why we’re still working in two variables). First let’s draw a picture of what solutions the constraints allow, ignoring the objective function for now. We can draw a line for each constraint, and shade in the set of solutions it allows:

This allows us to sketch a picture of the precise feasible region. This is a picture of all the outputs you could choose to make. So under our constraints you could make zero of either thing, or you could make ten Tetris games and twenty Rubik’s cubes. But you can’t make seventy of each.
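The constraints translate directly into code, too. Here’s a quick Python sketch (the function name is mine) that tests whether a production plan lands inside the feasible region:

```python
def feasible(T, R):
    """Check a production plan against the three constraints from the text."""
    return (T >= 0 and R >= 0            # can't make negative toys
            and T + R <= 100             # at most 100 toys a month
            and 4*T + 2*R <= 300         # only 300 ounces of plastic
            and 40*T + 30*R <= 3300)     # at most 55 hours = 3300 minutes

print(feasible(10, 20))  # True: ten Tetris games and twenty cubes is fine
print(feasible(70, 70))  # False: violates T + R <= 100
```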



Everything we’ve done so far works with any functions, not just linear functions. But now we can introduce two facts that make our job much easier. First, that the maximum is always on the boundary. This should be intuitive from the example: if you can make more Rubik’s Cubes without making fewer Tetris games, that will obviously give you more money. This is a general theorem about linear functions on regions defined by linear inequalities.

Second, and even better, the maximum is always on a vertex. And furthermore, if two vertices yield the same output, the entire line between them will also yield that same output. So at a first pass we can just check all the vertices. Looking at the picture above, there are five vertices; checking them all we get the following table: $$ \begin{array}{cc} (T,R) & f(T,R) \\ (0,0) & 0 \\ (0,100) & 300 \\ (75,0) & 375 \\ (30,70) & 360 \\ (60,30) & 390 \end{array} $$ So we maximize profit by making sixty terrible Tetris games and thirty wretched Rubik’s cubes, for a profit of \$390. The new constraint cost us ten dollars.
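“Check all the vertices” is easy to mechanize: intersect each pair of constraint boundary lines, throw out any intersection that violates some constraint, and keep the feasible point with the best objective value. A brute-force sketch in Python (the names are mine):

```python
from itertools import combinations

# Each constraint is a*T + b*R <= c, including the nonnegativity bounds.
constraints = [(-1, 0, 0),      # -T <= 0, i.e. T >= 0
               (0, -1, 0),      # R >= 0
               (1, 1, 100),     # total toys per month
               (4, 2, 300),     # ounces of plastic
               (40, 30, 3300)]  # minutes of labor

def profit(T, R):
    return 5*T + 3*R

best = None
# Candidate vertices: intersections of each pair of boundary lines.
for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
    det = a1*b2 - a2*b1
    if abs(det) < 1e-9:
        continue  # parallel lines: no intersection
    T = (c1*b2 - c2*b1) / det
    R = (a1*c2 - a2*c1) / det
    # Keep only intersections satisfying every constraint.
    if all(a*T + b*R <= c + 1e-9 for a, b, c in constraints):
        if best is None or profit(T, R) > profit(*best):
            best = (T, R)

print(best, profit(*best))  # (60.0, 30.0) 390.0
```

This matches the table: the winning vertex is $(60, 30)$ with a profit of 390.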

In that problem we could just check all the vertices; in interesting problems there can be millions of vertices, so that’s not terribly efficient. But we can think about the problem in a couple of other ways. The picture I like to have in my head is a picture of moving level sets. For any output of the objective function, we can draw a line that represents “all the solutions that give that output.” So for instance this is the line of all solutions that have an output of 300:

Just from looking at the picture we can see that we can do better. Here’s the line for 400:

And we see that we can’t quite do that well. But we almost can! Conceptually we can think about slowly moving the line further and further out until it just barely hits one vertex:

This is the line for 390, and thus it is our solution.

This isn’t really a general purpose algorithm, although it’s quite useful for seeing and really grasping what’s going on. So keep those pictures in mind.

But also think back to how we intuitively solved the easy version of the problem. We could make at most 100 toys. So at any given point we could trade out a Tetris game for a Rubik’s cube or vice versa. And it’s clear that trading one Rubik’s cube for one Tetris game will gain two dollars, so you should keep making that trade until you can’t any more.

This is the basic idea behind the simplex algorithm which was invented by George Dantzig in 1947, when he was working for the US Air Force. (If you’ve ever heard the story about the math PhD student who solved a major open problem because he accidentally thought it was course homework–that was this guy and related to this problem).

The algorithm: start at a vertex of the feasible region, doesn’t really matter which one. Compute the gradient (derivative) of the objective function there (this is really easy since it’s linear). Figure out which adjacent vertex (that is, a vertex directly connected by a line segment) is in the direction of most rapid increase, and go to that vertex. (This is called a “pivot”). Repeat.

Sometimes there are no adjacent vertices that give better results. In that case you move to one of the vertices that is “just as good”. This is called “stalling” and it’s fairly normal for it to happen. But there’s an obvious risk of winding up in a “cycle” where you get back to a vertex that you already hit earlier–in which case the program will never terminate. But fortunately there are pivoting rules (such as Bland’s rule) that ensure this never happens and the program always terminates.

Importantly, the entire problem can be encoded in a giant matrix, and then this entire algorithm can be encoded as simple matrix operations. This is great, because matrix operations are fast and efficient and easy to execute.
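To make that concrete, here is a minimal sketch of the tableau form of the simplex algorithm, assuming the standard setup of maximizing $c \cdot x$ subject to $Ax \leq b$, $x \geq 0$ with $b \geq 0$. The function name and structure are mine, and it omits the anti-cycling safeguards mentioned above, so treat it as a toy:

```python
def simplex(c, A, b):
    """Maximize c.x subject to A x <= b, x >= 0, assuming b >= 0.
    Dense tableau method; returns (optimal value, solution vector).
    No anti-cycling rule, so only suitable for small examples."""
    m, n = len(A), len(c)
    # Tableau: [A | I | b] with the objective row [-c | 0 | 0] on the bottom.
    tab = [row[:] + [1.0 if i == j else 0.0 for j in range(m)] + [b[i]]
           for i, row in enumerate(A)]
    tab.append([-ci for ci in c] + [0.0] * (m + 1))
    basis = [n + i for i in range(m)]   # the slack variables start in the basis
    while True:
        # Entering variable: most negative coefficient in the objective row.
        col = min(range(n + m), key=lambda j: tab[-1][j])
        if tab[-1][col] >= -1e-9:
            break                       # no improving direction: optimal
        # Ratio test picks the leaving row (keeps all b-entries nonnegative).
        ratios = [(tab[i][-1] / tab[i][col], i)
                  for i in range(m) if tab[i][col] > 1e-9]
        if not ratios:
            raise ValueError("problem is unbounded")
        _, row = min(ratios)
        basis[row] = col
        # Pivot: normalize the pivot row, then clear the column elsewhere.
        piv = tab[row][col]
        tab[row] = [v / piv for v in tab[row]]
        for i in range(m + 1):
            if i != row and abs(tab[i][col]) > 1e-12:
                f = tab[i][col]
                tab[i] = [v - f * p for v, p in zip(tab[i], tab[row])]
    x = [0.0] * n
    for i, bi in enumerate(basis):
        if bi < n:
            x[bi] = tab[i][-1]
    return tab[-1][-1], x

value, plan = simplex([5, 3], [[1, 1], [4, 2], [40, 30]], [100, 300, 3300])
print(value, plan)  # 390.0 [60.0, 30.0]
```

Run on the toy-factory problem, it pivots twice and lands on the same vertex we found by hand.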

Consequently, the simplex algorithm is “usually” very efficient. In practice it works on most problems. Under any reasonable method of generating random problems, the simplex method works very well on average. However, there are problems where the simplex method is very slow; with a carefully designed problem (such as the Klee-Minty cube) you can force it to check every vertex.

There is a whole family of so-called “vertex” or “basis-exchange” algorithms that stay on the boundaries of the feasible region. Many of them are efficient on average but all of them are vulnerable to these sorts of pathological problems where they take exponential time.

More recently (starting around 1980) Khachiyan, Karmarkar, and others developed “interior-point” methods that use information from the interior of the feasible region, and have worst-case polynomial time performance. Both types of algorithms are in common use in industry today, in almost any industry that uses large-scale logistics operations.

Another perspective we can use to look at this is the perspective of the dual problem. Duality shows up in a lot of areas of math–the ability to, essentially, take a problem and look at it from the other direction. Or a slightly different problem that gives information about the original problem.

In this case we stated our original, or “primal” problem, as trying to maximize our profit subject to some constraints, which you can think of as resource costs. The number we can set is the “number of units produced.” The dual version of the problem is to try to minimize the total cost of the resources used, subject to having certain minimum prices for finished goods. The number we can set in the dual problem is the price of resources.

So in our problem, we would minimize the function $f^\dagger (x,y,z) = 100x + 300 y + 3300 z$ (the total cost of resources) subject to the constraints $x + 4y +40 z \geq 5$ and $x + 2y +30z \geq 3$ (which say that you have to get at least \$5 for every Tetris game and at least \$3 for every Rubik’s cube). That is, our dual objective function is the “cost of all the resources used,” and the dual constraints reflect the observation that if resource prices are “too low” relative to their value then you’ll have resource shortages.
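We can check this numerically. The specific resource prices below are a dual-feasible point I worked out by hand for this example (an illustration, not something from the post itself):

```python
# Primal objective: profit from a production plan.
def primal_profit(T, R):
    return 5*T + 3*R

# Candidate shadow prices for (toy capacity, plastic, labor minutes).
# These exact numbers are my own hand-computed guess for illustration.
x, y, z = 0.0, 0.75, 0.05
dual_cost = 100*x + 300*y + 3300*z

# Dual feasibility: the resource prices cover each product's profit.
assert x + 4*y + 40*z >= 5   # Tetris game
assert x + 2*y + 30*z >= 3   # Rubik's cube

# Weak duality: any feasible plan's profit is at most any dual-feasible cost.
for T, R in [(0, 0), (10, 20), (60, 30)]:
    assert primal_profit(T, R) <= dual_cost

print(dual_cost)  # 390.0 -- equal to the primal optimum, so the gap is zero
```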

Here’s a picture of the feasible region for the dual problem. (You can see why we prefer drawing pictures of two-variable problems; the dual here has three.)

The dual problem isn’t the same as the primal, but rather a mirror. A feasible solution to the dual problem provides an upper bound for the primal problem–any resource prices satisfying the dual constraints cost at least as much as the maximum profit available in the primal. Conversely, every feasible solution to the primal gives a lower bound for the dual–its profit is at most the minimum possible resource cost.

This technique actually generalizes to all sorts of optimization problems. But in general, there is a “duality gap”: the minimum of the dual problem can be strictly greater than the maximum of the primal problem. (Recall they are mirrors but not identical). But in the special case we’re in, the linear case, that can’t happen.

If there is an optimal solution to the primal problem, then there is an optimal solution to the dual problem with the same objective value. (If the dual problem is completely infeasible–there’s no way to satisfy the constraints–then the primal problem is either unbounded, which means you can get infinite profit, or itself infeasible. And vice versa).

This is highly related to the idea of duality and equilibria in game theory. Supposedly, Dantzig mentioned his work to von Neumann and von Neumann immediately conjectured that duality would hold because linear programming worked just like the game theory he was developing.

In fact, there’s a sense in which you can think of a linear optimization problem as a two-player game played between a producer and a planning board–the producer wants to maximize profit, the planning board wants to minimize profit, and the optimum reflects the Nash equilibrium strategy. You can actually prove Nash’s theorem using these linear programming techniques; the proof isn’t difficult but is long and boring, and essentially reduces to “turn every game into a linear optimization problem, and then show that it has a solution.”

I want to finish up by talking about two different ways we can make the problem harder.

One thing we can do is relax the linearity condition. The general problem of “optimize a function subject to constraints” is really hard; it’s not that difficult to write down optimization problems that are strictly harder than NP-complete ones (and provably require exponential time), although most interesting ones are in NP in some form or another.

To keep things manageable, people often only search for approximate solutions–it’s much easier to find a solution that’s “probably close to optimal” than it is to prove that you definitely have the optimal solution. There’s usually a tradeoff that you can get a more exact answer in exchange for a longer computational time.

But we can restrict the functions a bit and still get a manageable problem; this is the domain of convex optimization. A region is “convex” if you can pick any two points in the region, and the line segment connecting them is entirely contained in the region. So for example, a triangle or a circle is convex, but a star or a chevron is not.

It’s not too hard to see that the regions defined by linear inequalities that we’ve been discussing are convex.

We say that a function is convex if the region above its graph is convex. Again, it’s not difficult to see that linear functions are convex.
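The defining inequality behind both statements, $f(\lambda a + (1-\lambda)b) \leq \lambda f(a) + (1-\lambda) f(b)$, is easy to spot-check numerically. A one-variable Python sketch (the helper name is mine; passing samples doesn’t prove convexity, but a single failure disproves it):

```python
import random

def is_convex_on_samples(f, lo=-10.0, hi=10.0, trials=1000, tol=1e-9):
    """Spot-check f(lam*a + (1-lam)*b) <= lam*f(a) + (1-lam)*f(b)
    at random sample points. Sampling can't certify convexity,
    but any violation found is a definite counterexample."""
    rng = random.Random(0)  # fixed seed for reproducibility
    for _ in range(trials):
        a, b = rng.uniform(lo, hi), rng.uniform(lo, hi)
        lam = rng.random()
        mid = lam*a + (1 - lam)*b
        if f(mid) > lam*f(a) + (1 - lam)*f(b) + tol:
            return False
    return True

print(is_convex_on_samples(lambda x: 5*x + 3))  # True: linear functions are convex
print(is_convex_on_samples(lambda x: x*x))      # True: the parabola is convex
print(is_convex_on_samples(lambda x: -x*x))     # False: region above the graph isn't convex
```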

The simplex method does not work on general convex regions–it needs to move among vertices, and convex regions (like circles) may not have any vertices, and when they do the vertices aren’t necessarily special. But we still know that the optimum of a linear objective has to be somewhere on the boundary, and a simple modification of the interior point methods will still work. In particular, convex optimization problems still have zero duality gap under mild conditions (such as Slater’s condition), so the optimal values of the primal and dual problems coincide.

Cosma Shalizi has an excellent post on Red Plenty where he discusses some of the limits of convex optimization. Sadly the entire economy probably is not convex (which would essentially imply, among other things, that there were never increasing returns to scale). But we’re fortunate enough that a number of important problems are convex and are tractable to these types of algorithms.

The other way of making this problem harder is more interesting to me personally, and actually why I originally brought this topic up. (Don’t worry, you’re near the end of this post!) When we solved the problem earlier, the answer called for 60 Tetris games. If it had asked for 58, we still would have been happy. But what if it had asked for 58.5? We can’t make half a Tetris game. What if it had told us we needed $19 \pi$ Tetris games?

The tools that allow us to find solutions will find the best real-number solution, but we might need to find a solution that’s entirely made up of integers. For a lot of problems this isn’t a big deal–if the algorithm calls for 1023.7 tons of concrete, you can round that off to 1024 tons without having huge problems.

But it turns out that just rounding off the best fractional solution is not a good way to find the best integer solution! In fact, finding integer solutions is an NP-complete problem that’s very important to computer science, and related to problems like the Knapsack Problem. I will talk about this more in a future post.
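Here’s a tiny made-up example of the rounding pitfall, using a knapsack with numbers I invented for illustration. The LP relaxation lets you take fractions of items (where greedy-by-value-density is optimal), but rounding that fractional answer down does worse than the true integer optimum:

```python
from itertools import combinations

# A tiny knapsack: capacity 10; items are (weight, value) pairs.
# The numbers are invented purely to illustrate the rounding pitfall.
capacity = 10
items = [(6, 30), (5, 20), (5, 20)]

# LP relaxation: fractional items allowed, so greedy by value density is optimal.
frac_value, remaining = 0.0, capacity
for w, v in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
    take = min(1.0, remaining / w)
    frac_value += take * v
    remaining -= take * w

# "Round down" the fractional solution: keep only the fully-taken items.
# Here the greedy takes all of item (6, 30) and 4/5 of an item (5, 20),
# so rounding down keeps just the first item.
rounded_value = 30

# True integer optimum: brute force over every subset of items.
int_value = max(
    sum(v for _, v in combo)
    for r in range(len(items) + 1)
    for combo in combinations(items, r)
    if sum(w for w, _ in combo) <= capacity
)

print(frac_value, rounded_value, int_value)  # 46.0 30 40
```

The integer optimum takes the two lighter items, which the rounded solution never considers: rounding moves you to a nearby integer point, but the best integer point can be at a completely different corner of the region.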