$\begingroup$

The decidability of conjunctive query containment under bag (multiset) semantics, where answers are counted with multiplicity, has been open for over twenty years. Resolving it would be a breakthrough in database theory.

Query containment takes as input two queries $Q_1$ and $Q_2$ and asks whether, for every database $I$, $Q_1$ applied to $I$ yields at least as many answers as $Q_2$ applied to the same $I$.

In conjunctive queries one uses AND to link together existentially quantified predicates. In SQL terms, conjunctive queries are the SELECT-FROM-WHERE queries using "=" and "AND" but no subqueries or aggregation. This is perhaps the most common kind of database query, and includes most search engine queries.

What makes query containment potentially undecidable is the quantification over infinitely many possible databases $I$. Algorithms that do exist tend to rely on turning this infinite quantification into a syntactic question: whether there is a homomorphism of some kind between $Q_1$ and $Q_2$.
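To make the homomorphism idea concrete, here is a minimal Python sketch of a backtracking search for a homomorphism between two conjunctive queries. The representation (a query as a `(head, body)` pair of variable tuples and atoms) and the name `find_homomorphism` are my own assumptions for illustration, not taken from any library. Under set semantics, a homomorphism from `src` to `dst` means every answer of `dst` is also an answer of `src` on every database. The search is exponential in the worst case, and constants are not handled.

```python
def find_homomorphism(src, dst):
    """Return a variable mapping from src to dst, or None if none exists.

    A query is (head, body): head is a tuple of variables and body is a
    list of atoms (relation_name, tuple_of_variables). Illustrative only.
    """
    src_head, src_body = src
    dst_head, dst_body = dst
    if len(src_head) != len(dst_head):
        return None

    # The head of src must map position by position onto the head of dst.
    start = {}
    for s, d in zip(src_head, dst_head):
        if start.setdefault(s, d) != d:
            return None

    def extend(i, h):
        # Try to map the i-th atom of src onto some atom of dst.
        if i == len(src_body):
            return h
        rel, args = src_body[i]
        for dst_rel, dst_args in dst_body:
            if dst_rel != rel or len(dst_args) != len(args):
                continue
            h2 = dict(h)  # copy, so a failed attempt leaves h untouched
            if all(h2.setdefault(a, b) == b for a, b in zip(args, dst_args)):
                found = extend(i + 1, h2)
                if found is not None:
                    return found
        return None

    return extend(0, start)


# Hypothetical example queries over one binary relation E:
#   Q(x)  :- E(x, y)
#   QR(x) :- E(x, y), E(y, z)
Q = (("x",), [("E", ("x", "y"))])
QR = (("x",), [("E", ("x", "y")), ("E", ("y", "z"))])

print(find_homomorphism(Q, QR))  # a mapping exists
print(find_homomorphism(QR, Q))  # no mapping exists
```

Here the longer query `QR` is a homomorphic image target for `Q`, but not the other way round, so the check is purely syntactic and never looks at a database.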

For slightly more powerful (i.e. "advanced") queries that allow OR or $\ne$, query containment is known to be undecidable.

To compare queries by counting how many answers they generate, one uses the semiring $(N,+,\times)$ of natural numbers with addition and multiplication. Query containment can also be generalized to other ordered semirings. For all positive semirings, conjunctive query containment is NP-hard. However, for most semirings other than $(N,+,\times)$ that people care about, conjunctive query containment is decidable. Unfortunately, the counting case falls into the zone of semirings where decidability of conjunctive query containment is still open.
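As a rough illustration of what counting in $(N,+,\times)$ means, the sketch below evaluates a conjunctive query over a database whose tuples carry multiplicities: each satisfying assignment contributes the product of the multiplicities of the tuples it uses, and contributions to the same answer are summed. The function name `bag_answers`, the query encoding, and the toy database are all assumptions made for this example. It enumerates assignments by brute force, so it is only suitable for tiny inputs.

```python
from collections import Counter
from itertools import product


def bag_answers(query, db):
    """Evaluate a conjunctive query under bag semantics (illustrative only).

    query is (head, body) with body a list of (relation_name, variables);
    db maps each relation name to a dict from tuples to multiplicities.
    """
    head, body = query
    variables = sorted({v for _, args in body for v in args})
    domain = sorted({c for rel in db.values() for t in rel for c in t})

    answers = Counter()
    # Brute force: try every assignment of domain values to variables.
    for values in product(domain, repeat=len(variables)):
        val = dict(zip(variables, values))
        mult = 1
        for rel, args in body:
            # Multiply tuple multiplicities (the "times" of the semiring).
            mult *= db.get(rel, {}).get(tuple(val[a] for a in args), 0)
            if mult == 0:
                break
        if mult:
            # Sum over assignments with the same head (the "plus").
            answers[tuple(val[v] for v in head)] += mult
    return answers


# Hypothetical toy data: E contains (1, 2) twice and (2, 3) once.
db = {"E": {(1, 2): 2, (2, 3): 1}}
Q = (("x",), [("E", ("x", "y"))])
QR = (("x",), [("E", ("x", "y")), ("E", ("y", "z"))])

print(bag_answers(Q, db))
print(bag_answers(QR, db))
```

On this particular database `Q` returns the answer `(1,)` twice and `(2,)` once, while `QR` returns `(1,)` twice. A single database like this says nothing about containment, which must hold for every database.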

For pointers to the extensive literature and a rigorous treatment, see a ToDS paper (in press) by some people.

One could turn this into a compelling question for a non-technical audience by demonstrating Googlefight, then asking how one can tell which query gives more answers than the other, without first peeking at the data. If $Q$ and $R$ are conjunctive queries, then $Q$ always gives at least as many answers as $Q \text{ AND } R$, because the latter is somehow syntactically "larger" (there is a homomorphism from $Q$ to it), but things get rather tricky from there on.