— by Larry Wasserman

What is the biggest open problem in statistics or machine learning?

This question has come up a number of times in discussions I have had with various people. (Most recently, it came up when I met with students and post-docs at CSML, where I had a very pleasant recent visit.)

In many fields it’s easy to come up with candidates. Some examples are:

Computer Science: Prove that P is not equal to NP.

Pure Math: Prove the Riemann hypothesis.

Applied Math: Solve the Navier-Stokes existence and smoothness problem.

Physics: Unify general relativity and quantum mechanics.

Or better yet … explain why there is something instead of nothing.

You can probably think of other choices, but my point is that there ARE some obvious choices.

There are plenty of unsolved problems in statistics and machine learning. Indeed, Wikipedia has a page called Unsolved problems in Statistics.

I admire the author of this Wikipedia page for even attempting this. It is not his or her fault that the list is pathetic. Some examples on the list include: the admissibility of the Graybill-Deal estimator (boring), how to detect and correct for systematic errors (important but vague), and the sunrise problem: What is the probability that the sun will rise tomorrow? (you’ve got to be kidding). I’m sure we could come up with more interesting problems of a more technical nature. But I can’t think of one problem that satisfies the following criteria:

1. It can be stated succinctly.

2. You can quickly explain it to a non-specialist and they will get the gist of the problem.

3. It is sexy. In other words, when you explain it to someone they think it is cool, even if they don’t know what you are talking about.

Having sexy open problems is more than just an amusement; it helps raise the profile of the field.

Can anyone suggest a good open problem that satisfies the three criteria?