A simple explanation of the P vs NP problem

You may have seen a lot of buzz about “P vs NP” floating around on the Internet, and possibly the media. If you have no idea what that’s all about, and you want to find out, this blog post is for you. As one redditor put it: “Polynomial time? fast algorithm? Huh? Let’s start off simply: What is a P, what is an NP, and why are they fighting?”

P and NP are complexity classes. You know how in boxing, you have “weight classes” such that boxers of a similar weight are categorized together? In computer science, algorithms, and problems of a similar complexity level are categorized together. [1]

P stands for “polynomial time” and NP stands for “Non-deterministic polynomial time”. The “polynomial time” part isn’t really important for understanding the “P vs NP” problem, so I’ll skip that, and focus instead on the “Non-deterministic” part.

There are some problems which, as you increase the “size” of the problem, only take a little bit longer to solve. For example, if I asked you to add 2 and 8 together, within 10 seconds, you’d probably be able to do it. And if I asked you to add 23482039 and 89374272 together, and you’ve got 80 seconds to do it, you could probably do it too. I’ve made the problem 8 times bigger (I’m asking you to add two 8-digit numbers together, instead of two 1-digit numbers together), but I gave you 8 times more time to work with.

There are other problems which, when I increase the size of the problem, become take much, much longer to solve. If I told you to find all the prime factors of 36, you could probably do it within a minute (2 x 2 x 3 x 3). But if I asked you to solve a problem 8 times bigger (16 digits instead of 2 digits), you probably couldn’t solve it with only 8 times more time (8 minutes instead of 1 minute).

Another interesting question is to ask how long does it take to check the solution to a problem. If I told you that someone told me that 2 + 8 is 11, and I asked you to check the answer for me, how would you go about it? Probably the easiest thing for you to do is to just try and solve the original problem (2 + 8) yourself, and see if you get the same answer. But if I told you “Someone told me that the prime factors of 420420 are 2, 2, 3, 5, 7, 7, 11, 13. Can you check it?”, then trying to solve the original problem (“What are the prime factors of 420420?”) is way too much effort, when instead, you could just multiply the prime factors together to check if it totals the original figure.

Problems that are of complexity class “P” are generally considered to grow slowly: they only take a little bit of extra time to solve as you increase their size. Since they can be solved to quickly, usually when you want to verify the solution to these problems, you just re-solve the problem from scratch. Problems in “NP” are quick to verify. Note that I haven’t said anything about whether or not problems in NP are quick to solve, only that they are quick to verify!

In fact, whether or not NP problems are quick to solve is an open question. Maybe there’s a secret trick to finding the prime factors of big numbers, but mathematicians just haven’t figured out the technique yet. Or maybe there is no trick to solving the problem quickly, and all we can hope to do is verify the solutions quickly.

This turns out to be a fundamental concept in computer science, and it’s practically the basis upon which all computer security systems in the world are based upon. Why is that? Because the characteristics “Difficult to figure out, but easy to verify” are the exact characteristics you want in a password: You want it to be difficult for anyone else to be able to figure out your password, but you want it to be easy for your computer to verify whether or not your password is correct (can you imagine if every time you wanted to log onto Facebook or Gmail, you had to wait through a “please wait, verifying password…” screen for a couple of hours?)

Most computer scientists today believe that P is not equal to NP (usually written shorthand as “P != NP”); that is to say, they believe that the problems in NP are impossible to solve quickly. However, there hasn’t been any proof that this may be the case. It’s a bit unnerving that all of computer security (including encryption used by the government security agencies, by financial institutions like your bank and credit card company, etc.) is founded upon this “gut feeling” that passwords are “probably” difficult to solve quickly.

If someone had managed to prove that P = NP, they would have been able to defeat virtually any and all computer security system in existence; having finally found a way to quickly “solve” the “password problem”, and not merely quickly verify a password, given the solution. So you can see why there’s a lot of excitement surrounding this question, at least amongst computer scientists and cryptographers.

Fortunately, the proof that’s circulating the internet these days claims to prove that P != NP. If this proof is correct, then we’ve determined once and for all that the foundational theory of cryptography truly are secure… until someone comes along and invents a Quantum Computer, at least… but that’s for another blog post.

1: One interesting thing to note about boxing weight classes is that the weight class only places an upper bound on the boxer, but no lowerbound. So if you were a boxer who weighted 124 pounds, you could enter the “Featherweight” weight class (which has an upper limit of 126 pounds), but you could also, if you were exceedingly good, enter the lightweight (135 pounds), or middleweight (160 pounds), or even higher. In other words, every boxer who is a lightweight, is also a boxer who is a middleweight, and heavyweight, and every weight class above. However, a boxer who 160 pounds could not enter a “lightweight” fight, as you can’t enter in classes below you. It’s the same thing with complexity classes: Problems which are in P, are also in NP, and also in higher classes. But higher class-problems cannot be categorized into the lower ones. This is all pedantic, though, because usually when someone says “Bob is a heavyweight boxer”, they mean “Bob is a heavyweight boxer, and is not in any weight class below heavyweight”. Technically, bob could weight a mere 110 pounds and still be in the “heavyweight” class, but nobody ever uses the terms that way. Again, same thing with complexity classes: Usually when someone says a problem is in class P, they mean “P and not in any simpler class”. That’s how I’ve been using the terms in this blog post.