Editor’s note: Guest contributor Eric Ries is a consultant and the author of The Lean Startup, which he will be launching at Disrupt SF on Tuesday. Follow him @ericries.

I was recently asked to spend some time with an early-stage startup that has a revolutionary new product. I asked them if they thought they were making their product better. As with every other startup I’ve asked, they said yes.

Then I asked them, “How do you know?” Their answer was also pretty standard: they explained that they were adding new features, improving quality, and generally executing against the product roadmap. The features are a combination of requests from their early customers and vision-inspired guesses from the founders. Each month, their gross numbers—the total number of customers, total revenue, and total usage—move up and to the right. So, they said, they must be on the right track.

Then I asked them this question: what would happen to the company if the entire product development team took a month off and went on vacation? The sales staff would keep signing up new customers. The website would continue to get new traffic from word of mouth. Could they be sure that they wouldn’t—as a business—be making just as much “progress” as they claim to be making now?

In one scenario, they’ve been working overtime, putting in crazy hours, and in the other, they’d be on vacation. If both scenarios lead to the same result, how can the product development team claim to be making progress? To be doing effective work?

Most product teams don’t know if they are making their product better or worse; that’s why customers feel a twinge of fear every time they have to update or upgrade. Despite this, those same companies may be growing extremely fast because, even though the product is getting worse, other things are going right: network effects are kicking in, the company is being lauded in the press, or they are surfing on a general wave of growth in their industry.

In one of the startups I founded, we had an extended period where we were really focused on the conversion rate of new customers into paying customers (it was a freemium business model). Our vanity metrics were looking good—up and to the right. The graphs even had the shape of the classic hockey stick. But cohort analysis revealed that something was wrong.

We divided our customers into cohorts, looking at the new customers who joined each day as a distinct group. Then we could ask, “How did today’s customers compare to yesterday’s?” And, much to our frustration, the conversion rates were almost exactly the same. It was easy to get conspiratorial. It felt like each group had set up a conference call with the previous group. “How many of you bought the product? One out of a hundred? OK, good, we’ll do that, too.”
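The cohort comparison described above can be sketched in a few lines of Python. This is only an illustration, not the tooling we actually used: the function name `conversion_by_cohort`, the `(signup_date, converted)` record shape, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def conversion_by_cohort(customers):
    """Return {signup_date: conversion_rate} for each daily cohort.

    `customers` is an iterable of (signup_date, converted) pairs, where
    `converted` is True if that customer eventually paid. This record
    shape is a hypothetical one chosen for the sketch.
    """
    signed_up = defaultdict(int)
    paid = defaultdict(int)
    for signup_date, converted in customers:
        signed_up[signup_date] += 1
        if converted:
            paid[signup_date] += 1
    # One rate per day, so each cohort is judged on its own behavior
    # rather than being folded into a cumulative total.
    return {day: paid[day] / signed_up[day] for day in sorted(signed_up)}


# Made-up data: 100 signups per day, one paying customer in each cohort.
customers = (
    [(date(2011, 9, 1), i < 1) for i in range(100)]
    + [(date(2011, 9, 2), i < 1) for i in range(100)]
)

rates = conversion_by_cohort(customers)
for day, rate in rates.items():
    print(day, f"{rate:.1%}")
```

Total paying customers keeps climbing in data like this, yet the per-cohort rate is flat. That flat line is the signal a gross total hides.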

The frustrating part was that this pattern stayed constant for months, even though we were making the product “better” almost every day. The fact that customer behavior wasn’t changing revealed that we were wrong. We weren’t making the product better; we were making it worse.

The antidote to this problem is to stop using vanity metrics and start measuring progress more rigorously.

Most of us think of A/B testing (sometimes called split-testing) as a technique out of direct marketing, where it was pioneered. But it’s even more powerful when used directly in product development.

In my new book, I tell many stories of companies that have made the switch away from vanity metrics. One such company is Grockit, the online education company that rocked TC50. When they switched to routinely split-testing new features, they made a shocking discovery: most of their new features did not change customer behavior at all.
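To make “did not change customer behavior” concrete, here is one way a split-test could be scored. It is a minimal sketch using a standard two-proportion z-test, which is my choice for illustration and not a description of Grockit’s actual method. The function name `split_test` and all of the numbers are hypothetical.

```python
import math


def split_test(control_conversions, control_total,
               variant_conversions, variant_total):
    """Compare conversion rates with a two-proportion z-test.

    Returns (lift, z, p_value), where lift is the variant rate minus
    the control rate and p_value is two-sided.
    """
    p_control = control_conversions / control_total
    p_variant = variant_conversions / variant_total
    lift = p_variant - p_control

    # Pooled rate under the hypothesis that the feature changes nothing.
    pooled = (control_conversions + variant_conversions) / (
        control_total + variant_total
    )
    std_err = math.sqrt(
        pooled * (1 - pooled) * (1 / control_total + 1 / variant_total)
    )
    if std_err == 0:
        # Nobody (or everybody) converted in both groups: no evidence.
        return lift, 0.0, 1.0

    z = lift / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return lift, z, p_value


# Hypothetical experiment: half of new customers see the new feature.
lift, z, p_value = split_test(100, 10_000, 104, 10_000)
print(f"lift={lift:.4f} z={z:.2f} p={p_value:.3f}")
```

With made-up numbers like these, a lift of four extra conversions in ten thousand is indistinguishable from noise. The honest reading is that the feature did not move behavior.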

Because new features add overhead to products (generally making them more complicated), a new feature has to provide so much benefit to customers that it’s worth incurring this overhead. There is no such thing as a “neutral” new feature. “The same” means worse.

When our product changes fail to actually improve business metrics (customer retention, usage, or sales), we should have the courage to admit it. In fact, failing to make the product better is an extremely powerful moment, one in which we have the opportunity to learn something important about ourselves and our customers. If we think a feature makes the product better but our customers disagree (not by what they say but by how they behave), then something about our mental model is flawed. It’s time for a new experiment to figure out what.

If this sounds to you “just like” the scientific method, you’re right: it is. Most of us are currently doing product development astrology, not science. But it doesn’t have to be that way. We can do much, much better.

Products are really experiments

Visionaries are right. Customers don’t know what they want. There’s plenty of good psychology research that shows that people are not able to accurately predict how they would behave in the future. So asking them, “Would you buy my product if it had these three features?” or “How would you react if we changed our product this way?” is a waste of time. They don’t know.

But imagine a physicist who told you science was impossible because you can’t ask electrons what they want. You’d laugh in his or her face. Science works by conducting experiments that reveal how the world actually works. A scientific approach to product development works the same way.

Experimentation does not mean shipping something to see what happens. If we do that, we’re guaranteed to succeed—at seeing what happens. Something will always happen. You can always make up a good story about something you think you’ve learned, and no matter how bad things are going, you can always find at least one chart in Google Analytics that is up and to the right. And, as I mentioned before, it’s possible for your vanity metrics to be going up even while you’re ruining your product.

Science requires having a prediction, a hypothesis, about what will happen—based on theory, a set of assumptions. We need that prediction so that we can compare it to the actual results of our experiment. That’s one of the reasons why vision is so critical to startups. We need to be able to predict what customers are supposed to do when they encounter our product.

We can even use that vision to make quantitative predictions. Do we believe that the product will go viral? Then product/market fit means something very specific: that the viral coefficient will be greater than one. Do we believe our product is extremely sticky, either because of strong network effects, addictive gameplay, or other forms of lock-in? Then product/market fit means something different: customer retention and engagement should be very high. A similar pattern holds if we think we can grow through paid advertising. Product/market fit means that the cost of acquiring a new customer is less than the marginal profit we make from that customer. Each of these is a hypothesis we need to test our early product against. If product improvements aren’t moving the numbers in the direction of that hypothesis, they are waste.
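Two of these predictions reduce to simple arithmetic, sketched below. The function names and sample numbers are hypothetical. The formula for the viral coefficient (invites sent per customer times the fraction of invites that convert) is a common definition that I am assuming here, since the text above only states the threshold.

```python
def viral_fit(invites_per_customer, invite_conversion_rate):
    """Viral hypothesis: each customer brings in more than one new customer.

    Assumes viral coefficient = invites per customer * invite conversion rate.
    Returns (coefficient, hypothesis_holds).
    """
    coefficient = invites_per_customer * invite_conversion_rate
    return coefficient, coefficient > 1.0


def paid_fit(ad_spend, customers_acquired, marginal_profit_per_customer):
    """Paid hypothesis: acquiring a customer costs less than they earn us.

    Returns (cost_per_acquisition, hypothesis_holds).
    """
    cost_per_acquisition = ad_spend / customers_acquired
    return cost_per_acquisition, cost_per_acquisition < marginal_profit_per_customer


# Made-up numbers for illustration only.
coefficient, is_viral = viral_fit(invites_per_customer=5,
                                  invite_conversion_rate=0.1)
cost, ads_pay_off = paid_fit(ad_spend=1000.0, customers_acquired=50,
                             marginal_profit_per_customer=15.0)
print(f"viral coefficient={coefficient:.2f} fit={is_viral}")
print(f"cost per acquisition={cost:.2f} fit={ads_pay_off}")
```

The point of writing the hypothesis down this way is that each product change can be judged by whether it moves the number toward the threshold. The sticky hypothesis is left out because “very high” retention has no fixed cutoff; a team would have to choose its own target before testing against it.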

Science and vision are not opposites or even at odds. They need each other. I sometimes hear other startup folks say something along the lines of: “If entrepreneurship was a science, then anyone could do it.” I’d like to point out that even science is a science, and still very few people can do it, let alone do it well. Science requires vision, just as startups require vision. Building the right product requires systematically and relentlessly testing that vision to discover which elements of it are brilliant, and which are crazy.