SAT Solvers: Is SAT Hard or Easy?



Are practical SAT problems easy or hard?



Ed Clarke, a Turing Award winner, was at the recent NSF workshop on design automation and theory. Ed has, for approximately three decades, worked successfully on advancing the state of the art of verification, especially that of hardware.

Today I want to talk about a discussion that came up at the workshop. Ed and others stated that in “practice” all the instances of SAT that arise naturally are easily handled by their SAT solvers. The questions are simple: why is this true? Does P=NP? What is going on here?



I have know Ed since he presented his first paper at the 1977 POPL, a conference that I used to attend regularly. He proved a wonderful theorem about an approach to program correctness that was created by Steve Cook. I worked on extending Steve’s and Ed’s work, my work appeared later that year at the FOCS.

Cook had a clever–not surprising it was clever–idea on how to circumvent the inherent undecidability of determining anything interesting about programs. I will try to motivate his idea by the following program:

Search all natural numbers x,y,z ordered by the value of x+y+z, and if f(x,y,z)=0, then print x,y,z and stop.

Here is some polynomial. Steve’s insight was the reason this program is hard to understand has nothing to do with the complexity of the program. Rather, it is all due to the domain complexity of the natural numbers.

Cook’s brilliant idea was to figure out a way to “factor” out the domain complexity. He did this by adding an oracle that would answer any question about the domain. In the above program, this ability means that telling if the program halts is easy: just ask the oracle “is there an so that ?”

Sounds like a trivial idea. But, wait. If you have access to an oracle for the natural numbers, then anything is easy to determine about any program. This would lead to a trivial theory. Cook had to rule out the possibility that the oracle would be asked “unnatural questions.” By this I mean that you might be able to use the fact the arithmetic is very powerful and encode the analysis of your program into an arithmetic formula. Cook did not want to allow this.

The way he did this is cool: the analysis of the program has access to the domain oracle, but the analysis must be correct for all oracles. Thus, if you try to encode information in some “funny way”, then for some domain(s) your analysis is wrong. This was the clever insight of Cook.

Steve showed that certain kind types of programs could be analyzed with such an oracle and any domain, while Ed proved that certain more powerful type programs could not. I proved a necessary and sufficient condition on the type of programs for which Steve’s method works. My proof is quite complex–I try not to say that about my own work–but it really is tricky. Look at the papers if you are interested, or perhaps I will do a full post on this work in the future.

Thanks Paul

Before we turn to SAT solvers, I want to thank Paul Beame who has helped tremendously with this post. I thank him profusely for his time and effort. Think of this a guest post, but all the errors and mistakes are mine. Anything deep is due to Paul.

SAT is hard, SAT is easy, SAT is hard,

There is something fundamentally strange going on. The theory community is sure that P NP, and so SAT is hard. The solver community is sure that they can solve most, if not all, of the problems that they get from real applications, and so SAT is easy. What is going on here?

In my post on P=NP is Ill-posed, I raised some of the many ways that the P=NP question could play out. There are plenty of ways that both the theorists and the solvers could be right. As you know I like to question conventional wisdom, and in particular I am not convinced that P=NP is impossible. But, I also question the conventional wisdom of the solvers that practical SAT problems are all easy. So I feel a bit like we are living in Wonderland with Alice: (with apologies to Lutwidge Dodgson aka Lewis Carroll.)

“There is no use trying; one can’t believe impossible things, like P=NP.” Alice says. “I dare say you haven’t had much practice. When I was your age, I always did it for half an hour a day. Why, sometimes I’ve believed as many as six impossible things before breakfast, like L=P, P=NP, NP=QBP, and ” The Queen replies.

SAT Solvers

In order to start to understand the situation let me first discuss the state of the art in SAT solvers. I have used many comments from Paul, and also have relied on the pretty survey paper of Carla Gomes, Henry Kautz, Ashish Sabharwal, and Bart Selman. This is definitely not my area of expertise, but I hope I can get the main points right. As always all the errors are mine, and all the correct stuff is due to Paul or my ability to transfer the facts correctly from the survey. Please let me know if I made any mistakes.

My understanding is SAT solvers divide roughly into two classes: complete and incomplete. Complete means that the algorithm always finds an assignment if the problem is satisfiable; incomplete means the algorithm may not. The complete ones all use Martin Davis, Hilary Putnam, George Logemann, and Donald Loveland (DPLL) as a basis, but usually add sophisticated methods to make the algorithm go faster. The incomplete methods tend to be different variations on random walk methods. I known the D and second L well in DPLL: I known Davis for a long time and Loveland taught me logic while I was a graduate student at CMU.

The DPLL method is based on a backtracking method. It assigns literal values and makes recursive calls to the simplified set of clauses. If it gets stuck, then it backtracks and continues searching. This method was initially called the DP method, until Logemann and Loveland who added two additional insights: take advantage of unit clauses and “pure” literals. If a literal occurs, for example, always as and never as , then all clauses with can be removed. Such a literal is called pure. Paul points out the history and relationship from DP to DPLL is more complex, but I will leave that out.

Paul says that there are three keys to fast solvers today.

Branching heuristics that take into account where the action is based on the learned clauses. One method is called VSADS and uses a weighted majority algorithm. Clause learning which is a way of adding new clauses to the formula based on failed branching. This can potentially help avoid large segments of the search. Watched literals that allow the modified formula to be kept implicit. This makes the propagation of the updates fast and tends to keep everything in the cache.

Like all backtracking methods, the selection (1) of which literal to set next is extremely important. The same is true, for many other search algorithms. For instance, linear programming with a good pivot rule seems to take polynomial time in practice. Another example, is searching game trees with search. If the “best” move is always used first, then the search tree is dramatically decreased in size.

There are many additional ideas that are covered in the survey. One key idea is to make random decisions when to restart the backtracking. The method is restarted even if it is not stuck. The methods usually keeps the clauses already set to guide the next restart (2) of the backtrack algorithm. Finally, note that idea (3) is an implementation one, which is of great importance on modern computers. No doubt similar methods will soon be needed due to the rise of many-core processors.

The second class of SAT solvers is based on random walk technology. These algorithms start at a random assignment and then use various local moves to change the assignment, hopefully toward the correct one. Some algorithms make local improvements, some allow plateau type moves, others use more complex types of random walks. There is evidence that these methods can work well on random SAT problems as well as some of those from real applications.

There also are methods that are random, but are fundamentally different. A relatively new one is called Survey Propagation (SP), which was discovered in 2002 by Marc Mézard, Giorgio Parisi, and Riccardo Zecchina. The solvers based on SP are like belief propagation methods that are used for decoding error-correcting codes. In each iteration about of the literals are set. When the formula density gets too low, SP switches to a random local search algorithm. Paul points out that the survey seems to not say anything about this part of SP.

Why are Real SAT Problems Easy?

The solvers know well that any DPLL type method cannot be fast on all SAT problems, based on known lower bounds for resolution. However, that still does not explain why these methods seem to work so well in practice. Even the contests–see this site–which are held regularly on solving SAT divide the problems given to the solvers into three classes:

randomly generated sets of clauses;

hand crafted sets of clauses;

examples from “real” applications.

Paul points out that he would take Ed Clarke’s contention that “SAT is easy in practice” with a grain of salt. First, before SAT the verification community was trying to solve PSPACE-hard problems, so SAT is a great advantage since is only NP-complete.

Second, the SAT solvers work except when they fail–reminds me of an Amazing (James) Randi story, for another time. The real difficulty is that no one understands what causes a SAT solver to fail. Tools that have this type of unpredictable behavior tend to create problems for users. Many studies of human behavior show that it’s the variance that makes tools hard to use.

Paul goes on to say that one of the research directions is the discovery of why SAT solvers fail. One problem is symmetry in the problem, since large amounts of symmetry tend to mess up backtracking algorithms. This is why the Pigeon Hole Principle is a “worst case” for many SAT methods.

Even if solvers cannot do as well as Ed claims, I still think that one of the most interesting open questions for a theorist is why do the algorithms do so well on many real problems? There seem to be a number of possible reasons why this behavior is true. I would love to hear about other potential reasons.

The real problems may have sufficient entropy to behave like actual random SAT problems. Mike Mitzenmacher suggested this idea to me. For example the SP method can handle over one million clauses of randomly generated sparse clauses, according to Selman.

The real problems may have small tree width. More precisely the graph that describes the interaction of the literals has small tree width. If you do not know this concept please check here. Suffice it for now that many search problems with tree width and size can be solved in time order . See this paper for results that relate to this.

The real problems may have some modular structure. Clearly, if a problem comes from modeling a hardware circuit, the SAT clauses may have special structure. They may not have small tree width, but they should have some additional structure. What this structure is and how it interacts with the SAT solvers is unclear. Luca Trevisan suggested this idea to me. One problem that I pointed out is that many crypto-systems, such as AES, have a very structured hardware circuit. Yet the folklore, and published papers, suggest that SAT solvers do not break these systems. So the notion of modular SAT problems is perhaps yet to be found.

The real problems may have been selected to be easy to solve. A hardware designer at Intel, for example, may iteratively refine her design so that the SAT solver can succeed in checking that it satisfies its specification. This is a kind of “natural selection”: perhaps designs that cannot be checked by the solvers are modified. The closest thing that theorists study that has a bit of this flavor is famous smooth analysis technology of Dan Spielman and Shang-Hua Teng. Maybe for every practical design there is one “nearby” that can be checked efficiently. Suresh Venkatasubramanian suggested this idea to me. I think it raises some nice questions of how to model smooth analysis in a discrete domain.

The real problems may have More ideas needed. Please feel free to suggest some other ideas.

Open Problems

I keep quoting Alan Perlis about the power of exponential algorithms. I will soon start to quote Paul Beame who has made some great comments on this issue. Somehow we teach our theory students that exponential algorithms are “bad.” This seems quite narrow and theorists should look at the design and analysis of more clever algorithms that run in exponential time.

I also think another issue raised here is the reliance we place on the worst case model of complexity. As I stated in an earlier post there is something fundamentally too narrow about worst case complexity. The fact that SAT solvers seem to do so well on real problems suggests to me a problem with the worst case model. How can we close this gap between theory and practice?

Perhaps the main open problem is to get theorists and solvers more engaged so that we can begin to build up an understanding of what makes a SAT problem easy, and what makes it hard. And what are the best algorithms for “real” SAT problems.