The misuse of data science by managers, marketers and other untrained professionals is the hallmark of my generation. Cherry picking, confirmation bias, ridiculously small sample sizes: those are just the obvious mistakes. As digital marketers, we need to start treating the data of the internet era itself as suspect, not just look for errors in how we process that data.

Old School Data

A generation ago, companies turned to third parties such as Nielsen to get data on how many eyeballs were potentially seeing an ad. A company like Nielsen has no incentive to lie. On the other hand, networks that sell ads, whether they be traditional broadcasters or Facebook, have every reason to pad the numbers in their favor.

Even if we compare a traditional print publication to Google or Facebook, it’s much easier to verify how many papers are sold. You can’t ever know exactly how many people saw your ad tucked away on page B3, but focus groups can give you an idea if your target audience tends to glance at page B3.

Measuring things used to be expensive. You didn’t hire a market research firm to cook up a bunch of useless numbers. This provided a strong disincentive towards data-ism.

There’s no need to romanticize the Mad Men era, but we can still reflect on whether older approaches to data hold merit for marketers today.

Don’t take self-reported data at face value

I have no reason to accuse Facebook or Google of outright lies, at least in the sense that their analytics report one thing but they show prospective customers a different number. There’s no reason to lie so blatantly when you control the narrative.

My CV doesn’t mention how many hours I spend on Reddit at work, projects I’ve screwed up or blog posts that not even my Mom would read. Nothing on my CV is false; it’s just not the full picture. Any time you get to choose what data to provide, you do this. Nobody takes five pictures and then selects the worst one to share. There’s nothing wrong with this, but it’s why neither social media nor a CV gives a full picture.

This is why data needs to go through a disinterested third party that publishes everything, not just the highlights. In medicine this has become a serious issue as drug companies fund studies. If you run enough studies, one of them will eventually show something that your marketing department can use. When studies don’t paint your product in a positive light: don’t publish them.
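To put a rough number on "run enough studies", here is a minimal sketch. It assumes each study is independent and uses the conventional 5% significance threshold; the function name and the `alpha` default are illustrative choices, not something taken from any particular study.

```python
def chance_of_spurious_win(num_studies: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_studies` independent studies
    of a product with no real effect still looks 'significant' by chance.

    Assumption: every study has the same false-positive rate `alpha`.
    """
    return 1.0 - (1.0 - alpha) ** num_studies


if __name__ == "__main__":
    for n in (1, 5, 10, 20, 50):
        print(f"{n:>2} studies: {chance_of_spurious_win(n):.0%} "
              "chance of at least one 'positive' result")
```

Under those assumptions, twenty studies of a useless product already give you roughly a 64% chance of at least one result worth a press release.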

The issues raised in the corruption of evidence-based medicine are valid in any field. For instance:

Negative trials (those that show no benefit for the drugs) are likely to be suppressed. For example, in the case of antidepressants, 36/37 studies that were favourable to drugs were published. But of the studies not favorable to drugs, a paltry 3/36 were published. Selective publication of positive (for the drug company) results means that a review of the literature would suggest that 94% of studies favor drugs where in truth, only 51% were actually positive.
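The gap between the published picture and the full picture can be recomputed from the counts in that excerpt. This is a sketch using only the four numbers quoted above; the function and variable names are my own. Note that these counts alone give about 92% rather than the quoted 94%, so the quoted figure presumably rests on details of the underlying study that the excerpt doesn't reproduce.

```python
def positive_shares(fav_total: int, fav_published: int,
                    unfav_total: int, unfav_published: int) -> tuple[float, float]:
    """Return (share of published studies that are favorable,
               share of all studies that are favorable)."""
    published = fav_published + unfav_published
    everything = fav_total + unfav_total
    return fav_published / published, fav_total / everything


if __name__ == "__main__":
    # Counts as quoted: 36 of 37 favorable studies published,
    # 3 of 36 unfavorable studies published.
    published_share, true_share = positive_shares(37, 36, 36, 3)
    print(f"Favorable share of what was published: {published_share:.0%}")
    print(f"Favorable share of everything that was run: {true_share:.0%}")
```

Nobody had to falsify a single result to turn a coin flip into a near-unanimous verdict. Choosing what to publish was enough.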

Even if you take Google or Facebook at their word, they’re never going to show you data that makes them look bad. Every single SaaS company that’s trying to bamboozle you with self-reported data falls into the same trap. The prevalence of unmarked promoted content, native advertising and affiliate marketing makes me skeptical of any data published online, even if it’s from a ‘neutral’ review site.

When companies don’t make money, suddenly profit isn’t an important metric. Look at engagement and monthly active users! You don’t have to lie to obscure the truth. That’s controlling the narrative.

Real data is a mess

Meaningless statistics are cheap to get. It’s trivial to count how many times a server has generated a webpage. That number isn’t going to tell you how many humans in your target audience have read your content. Even with endless CAPTCHAs, it’s still not clear if a visitor is a regular human, bot or click farm worker.
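As a small illustration of how little a raw hit count says, here is a sketch that splits hits by whether the user agent openly identifies itself as automated. The marker list and the sample user agents are hypothetical, and the filter only catches bots that announce themselves; everything left over is unverified, not confirmed human.

```python
# Hypothetical substrings that self-identifying automated clients tend to include.
BOT_MARKERS = ("bot", "crawler", "spider", "headless")


def looks_automated(user_agent: str) -> bool:
    """True if the user agent string contains one of the marker substrings."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def summarize_hits(user_agents: list[str]) -> dict[str, int]:
    """Split a list of hits into self-identified bots and everything else."""
    total = len(user_agents)
    automated = sum(looks_automated(ua) for ua in user_agents)
    return {
        "raw_hits": total,
        "self_identified_bots": automated,
        # Could be humans, disguised bots or click farm workers.
        "unverified_remainder": total - automated,
    }


if __name__ == "__main__":
    sample = [  # made-up examples for illustration only
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0",
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/5.0 (X11; Linux x86_64) HeadlessChrome/119.0",
        "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Safari/604.1",
    ]
    print(summarize_hits(sample))
```

Even the "unverified remainder" is still a long way from "humans in your target audience who read the content", which is the number you actually wanted.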

Once a computer spits out a number, it’s impossible to have a reasoned discussion about that number with managerial types. You’ll immediately be accused of being a luddite or having no idea what you’re talking about.

This is in stark contrast to the reality at big tech companies. Google, Amazon and FB are throwing the world’s best data scientists at getting clean data from online metrics and still running into problems.

Getting closer to outright lies

When you dig deeper into how much of the internet is fake, the numbers are jarring:

Studies generally suggest that, year after year, less than 60 percent of web traffic is human; some years, according to some researchers, a healthy majority of it is bot.

Even more absurd:

My favorite statistic this year was Facebook’s claim that 75 million people watched at least a minute of Facebook Watch videos every day — though, as Facebook admitted, the 60 seconds in that one minute didn’t need to be watched consecutively. Real videos, real people, fake minutes.

In other words, FB counts videos autoplaying in the background as you scroll through your feed as being watched. Broadcast TV networks simply can’t manipulate numbers this way.

VC metrics are irrelevant for most businesses

The internet and the VC system have unearthed a slew of meaningless metrics. You really shouldn’t care about engagement, social media likes or other metrics that are easily gamed. Does it matter how many Twitter followers a company has if nobody is buying their product?

It’s all too easy to get stuck chasing metrics that don’t bring you paying customers. In bigger companies this is understandable, as there are still business-critical positions that are remote from the actual product.

In smaller companies, ego and delusion warp reality. It’s hardwired into our brains to like attention and being popular. Hence it’s easy to get sidetracked. Evolution hasn’t selected for preferring profit over vanity metrics yet.

When startups are chasing VC money, the problems with data only compound. It’s in nobody’s interest to purge inactive users, ignore bot traffic and otherwise filter out noise. Profitability doesn’t matter when you’re spending someone else’s money and your data will never see a third-party audit. What could go wrong?

The data you can actually trust

Take a step back and look at metrics that aren’t likely to be fudged. If someone buys something using your link, that’s probably real. If a client tells you that they found your company because of a blog post, that’s honest feedback. Conversions and sales should be the only metrics that matter at the end of the day.

Social media shares and subscribers might correlate with conversions for a company, or they might not — I’ve written blog posts that have gone viral on social media yet brought in zero customers. A good corporate blog is expensive: developers, content writers and designers aren’t cheap. You have to be pulling in a lot of sales to justify those sorts of costs.

Be skeptical of what you can measure and how well you can measure it. Value personal and qualitative feedback. Use data to identify issues that require more attention instead of using metrics as ends in and of themselves. Ignoring vanity metrics and focusing on the bottom line will make you a better marketer.

I’m not a data nihilist. There’s plenty of meaningful data for marketers to use once they get through the minefield of bad data science and fake data.