Traditionally, advertisers have accepted fraud as a cost of doing business on the Web because the volume of fake traffic generated by bots and malware was too great to do anything about. But recent advances in big data analytics are giving ad buyers and other companies new tools to detect “garbage traffic” and other unethical schemes, such as cookie stuffing, and to reclaim ad revenue previously lost to fraud.

One of the Internet’s dirty little secrets is that more than one-third of all Web traffic is fake, having been generated by bots, plug-ins, and assorted pieces of malware created by cybercriminals, shady online outfits, and others who make money by impersonating people on the Web. For every two humans making their way across the Internet, where they view display advertising and occasionally make a purchase, there is one browser session being directed from afar.

This garbage traffic hurts advertisers. For starters, it artificially pumps up the page-view numbers, making ad buys more expensive than they should be. L’Oréal, General Motors, and Verizon Communications recently discovered their online ad purchases were influenced by garbage traffic, and some are asking for additional ads to compensate, according to a recent Wall Street Journal story.

Downstream from the biggest companies, advertisers have done little to deal with the situation. But the losses are becoming too big to ignore. While online advertising is expected to be up 17 percent this year, to about $50 billion, fraudsters will collect about $6 billion of it, or 12 percent, according to White Ops, an advertising fraud detection firm.

Now, thanks to advances in big data analytics, it’s becoming possible to inspect traffic on a much more granular level, and to weed out the garbage traffic and identify the fraudulent schemes, such as cookie stuffers.

Tim Weaver, the CIO of yogurt giant Dannon, realized something wasn’t quite right with his Web log data after running it through a big data transformation tool from Paxata. The data was related to a coupon program that Dannon’s Stonyfield Farm brand was running on its website.

“We were working with [the] Stonyfield marketing team to create a report around the data so they could see the utilization of that offering,” Weaver tells Datanami. “And when we brought it into Paxata, the first thing that the Paxata console revealed to us is that there was some kind [of] automated bot process that was requesting coupons about once every half a second or so.”

Without the ability to visualize a large amount of data, the bot might have continued to grab those coupons. But armed with that information, Weaver’s team immediately closed that account, and then put logic checks in place to make sure bots couldn’t impersonate a human again.
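The kind of pattern Weaver describes, one account requesting coupons roughly every half second, can be caught with a simple timing check. The following is a minimal sketch, not Dannon's or Paxata's actual logic: the function name `flag_rapid_requesters`, the `(account_id, timestamp)` event shape, and the thresholds are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def flag_rapid_requesters(events, max_median_gap=1.0, min_requests=5):
    """Return account IDs whose requests arrive at a machine-like pace.

    events: iterable of (account_id, unix_timestamp_seconds) tuples.
    An account is flagged when it has at least `min_requests` requests
    and the median gap between consecutive requests is at most
    `max_median_gap` seconds. Both thresholds are illustrative guesses.
    """
    by_account = defaultdict(list)
    for account_id, ts in events:
        by_account[account_id].append(ts)

    flagged = []
    for account_id, stamps in by_account.items():
        if len(stamps) < min_requests:
            continue  # too few requests to judge
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        if median(gaps) <= max_median_gap:
            flagged.append(account_id)
    return sorted(flagged)


# Hypothetical log: one account hits the coupon page every 0.5 seconds,
# another makes a handful of requests spread over more than an hour.
sample_events = [("acct-bot", i * 0.5) for i in range(10)]
sample_events += [("acct-human", t) for t in (0, 60, 400, 1000, 5000)]

suspects = flag_rapid_requesters(sample_events)
```

Using the median gap rather than the mean keeps one long pause from hiding an otherwise steady half-second rhythm.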

Another firm that’s leveraging big data to fight online fraud is Convertro, a provider of software that helps companies optimize their spending on advertising. The Santa Monica, California, company built a big data application on Hadoop to collect massive amounts of clickstream data from a combination of ethical cookies, log-in data, and other methods. Once it’s all sifted and sorted, Convertro gives its media-buying customers dashboards that show, at a very fine-grained level, who’s viewing their ads across multiple channels (online, print, TV, radio); how much they spent to get their ads in front of people; and whether they should continue with that advertising mix.

Convertro didn’t build its big data machine with the goal of detecting fraud. At some level, the company doesn’t even care about garbage traffic, for the simple reason that bots don’t have credit cards and therefore can’t affect the “conversion rate,” or the frequency with which traffic generated from some form of advertising turns into e-commerce spending. “It’s just not going to have any influence on the outcome,” says David Perez, Convertro’s chief marketing officer. “Whether the bots are present or missing, it doesn’t affect the conversion rate.”

But when it comes to schemes like cookie stuffing, Convertro takes a more active role in sniffing out fraud. Cookie stuffing is often conducted in conjunction with affiliate networks, which are online outfits that promise to drive traffic to a participant’s website in exchange for a commission on any sales. It usually starts by convincing regular Web users to install a cookie-stuffing toolbar into their Web browser. Then, when the user is about to make a purchase on a legitimate e-commerce website, the toolbar will stuff a cookie into the session at the last second before the transaction, to make it appear that it was the source of the referral and therefore collect the commission, usually between two and 15 percent.

Convertro uses its big data analysis engine (a combination of Hadoop, HP Vertica data visualization tools, and the Python-based pandas data analysis library, all running on Amazon Web Services) to spot garbage traffic and instances of cookie stuffing at a very detailed level. It does so by inspecting other cookies, header data, and affiliate network ID numbers to determine where its customers’ traffic originates. Convertro then applies an algorithm that analyzes all this behavior to determine the likelihood that the traffic is part of a cookie-stuffing scheme.
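Convertro's algorithm is not described in detail, so the following is only a sketch of one plausible signal, based on the "last second" behavior described above: the share of an affiliate's credited orders in which the affiliate cookie appeared within a few seconds of the purchase. The record fields (`affiliate_id`, `cookie_set_ts`, `purchase_ts`), the function names, and the thresholds are assumptions for illustration, and the sketch uses only the Python standard library rather than the stack named in the article.

```python
from collections import Counter


def last_second_share(orders, window_seconds=10.0):
    """Per affiliate, the fraction of credited orders whose affiliate
    cookie was set within `window_seconds` before the purchase."""
    total = Counter()
    late = Counter()
    for order in orders:
        affiliate = order["affiliate_id"]
        gap = order["purchase_ts"] - order["cookie_set_ts"]
        total[affiliate] += 1
        if 0 <= gap <= window_seconds:
            late[affiliate] += 1
    return {affiliate: late[affiliate] / total[affiliate] for affiliate in total}


def suspected_stuffers(orders, window_seconds=10.0, min_orders=3, threshold=0.8):
    """Affiliates with enough orders and a high last-second share.
    The cutoffs are illustrative, not tuned on real data."""
    counts = Counter(order["affiliate_id"] for order in orders)
    shares = last_second_share(orders, window_seconds)
    return sorted(
        affiliate
        for affiliate, share in shares.items()
        if counts[affiliate] >= min_orders and share >= threshold
    )


def make_order(affiliate_id, gap_seconds, purchase_ts=1_000_000.0):
    """Build a hypothetical order record for the example below."""
    return {
        "affiliate_id": affiliate_id,
        "cookie_set_ts": purchase_ts - gap_seconds,
        "purchase_ts": purchase_ts,
    }


# Hypothetical data: a toolbar affiliate whose cookie always appears
# seconds before checkout, and a content site with mostly earlier referrals.
sample_orders = [make_order("toolbar-1", gap) for gap in (2, 1, 3, 5)]
sample_orders += [make_order("blog-7", gap) for gap in (3600, 86400, 5)]

shares = last_second_share(sample_orders)
suspects = suspected_stuffers(sample_orders)
```

A real system would combine a timing signal like this with the other evidence the article mentions, such as header data and the presence of competing cookies, before labeling an affiliate a cookie stuffer.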

“When the customer logs into their dashboard, they’re able to see a list of cookie stuffers, a summary report as well as a detail report that shows individual stuffings that are happening,” Perez tells Datanami. “It will show this particular affiliate ID had this number of cookie stuffs and show you the timestamps for individual ones that are happening. When an affiliate network says that’s not happening, you can actually show them the underlying file that explains it.”

Armed with this information, Convertro customers are changing how they allocate their marketing and advertising spending. In many cases, that means ending a relationship with big affiliate networks, such as ShopAtHome or CJ (formerly Commission Junction), where cookie stuffing is a common occurrence. “We had one client where they ended their relationship with ShopAtHome, and they saved 94 percent of what they were paying them,” Perez says. “They killed the toolbar and they saved 94 percent and didn’t see a decrease in conversions.”

Convertro got so good at detecting the deceitful practice that it decided to publish a series of reports identifying the 10 biggest cookie stuffers. However, it recently stopped the series because the cookie stuffers tweaked their techniques to avoid detection. “Unfortunately, because we told all our customers about the toolbars, it biased the data,” Perez says. One large cookie stuffer went so far as to program its toolbars to automatically opt out a million of its users from Convertro’s legitimate data collection campaign, which Convertro ignored.

While marketers can overcome bot-driven traffic by paying for leads on a per-impression basis, many don’t have a good way to deal with cookie stuffing, which defrauds marketers who pay on a cost-per-acquisition basis or on an affiliate commission basis.

“I don’t begrudge these people who are doing it. They’re making good money,” Perez says. “There’s just a lot of naive marketers out there who either don’t know what’s happening or they just don’t have the time to deal with it or the tools that can detect it.”

Fighting fraud has long been one of the popular uses for Hadoop and other big data technologies. The work Convertro is doing provides one more example of how Hadoop can be used to make the world a little less comfortable for those who choose to steal for a living.
