Shows like 'Empire' could be under-counted if set-top box data were the sole source of audience measurement.

A common complaint about TV ratings in the era of big data goes something like, "Why does everyone rely on a relatively small and antiquated Nielsen sample when set-top boxes can collect so much more information?"

It's true: There are some 60 million set-top boxes connected to cable and satellite systems, and many of them, though by no means all, can send data on what viewers are watching back to the providers.

The users of those set-top boxes, however, are not representative of the country as a whole, a new Nielsen study found. Not only has the number of households using broadband only or over-the-air signals risen sharply in recent years, but relying only on set-top box data would also significantly undercount African-American, Hispanic and young-adult viewers.

Yes, this is a study by the dominant ratings provider, which obviously has a vested interest in remaining a leader in the TV measurement business. But Nielsen's analysis of who uses set-top boxes makes a case that more data doesn't necessarily mean better data.

As a case study, Nielsen looked at Empire. The Fox drama ranks 16th among adults 25-54 this season in Nielsen's national panel (which consists of 40,000 households and about 100,000 people). The company says its data shows about three-fourths of the show's audience is made up of people of color.

Among only those households whose set-top boxes allow "return path data," Empire would rank 38th in adults 25-54.

Users of set-top boxes that allow for return path data tend to be older and are more likely to be white than the Nielsen sample as a whole. Set-top box users also make up a smaller share of the TV audience than they did just a few years ago.

Nielsen says that 28 million households — about 23 percent of the 119.9 million TV homes in the U.S. — get their TV either over the air or via broadband only (using smart TVs or Roku, Apple TV, Chromecast and other similar devices). That's double the number of homes that did so just four years ago.

More than 40 percent of those over-the-air and broadband-only homes are African-American, Asian or Hispanic, and 10 percent are adults under 25. All of these groups would be under-counted if only set-top box data were used.

The broader adults 18-34 demographic, whose members make up a significant share of cord-cutters, is 17 percent less likely to have a set-top box capable of returning data than Nielsen's representative panel. Advertisers and marketers pay a premium to reach those viewers; failing to count them on the (relatively infrequent) occasions they do watch ad-supported TV would be bad for business.

Nielsen's system is hardly perfect: For one thing, it has struggled to keep up with the proliferation of platforms and devices people use to watch TV programming. That's what led to a brief standoff with CBS in January.

But Nielsen does strive to put together a national sample that looks like the country as a whole. As TV outlets look for ways to build more inclusive shows, ensuring underrepresented groups are counted is important.

Follow THR.com/Ratings for more Long View columns and ratings news.