
You weren’t alone. You fired up your Netflix device a couple Fridays ago, happened across Orange is the New Black in your Netflix recommendations, started watching the first episode and then wondered why you’d never heard of it. Netflix’s other original programming — House of Cards and Arrested Development — received huge pre-release marketing, and they weren’t even this good.

The answer to your question, like the answer to so many other questions these days, is data. Netflix didn’t have to spend millions of dollars advertising the new show hoping you would tune in — it knew you’d see it in the recommendations, it knew you’d give it a try and it knew you’d like it. According to the company during its earnings call on Monday, “Orange is the New Black” actually had more viewers watching more hours during its first week than its predecessors had.

The success of a show like Orange isn’t entirely because of data, of course — it still has to be well written and acted, and the show is based on a memoir rather than the result of an algorithm that creates TV concepts — but the data definitely helps Netflix figure out what viewers want to watch and how they want it presented. As more television is delivered digitally, the industry itself almost has to become more like the web, where visitor behavior is analyzed ad nauseam and data helps inform even seemingly trivial changes in page layout or user experience. Content is king, but every little thing matters when it’s coming at your users from every direction.

Netflix executives are kind of vague when discussing just what the company knows about what viewers want, but you can get a pretty good idea by looking at the data science behind its vaunted recommendation system. Here is a list of things Netflix tracks, according to one of the company’s former data scientists presenting at last year’s Hadoop Summit:

More than 25 million users

About 30 million plays per day (and it tracks every time you rewind, fast forward and pause a movie)

More than 2 billion hours of streaming video watched during the last three months of 2011 alone

About 4 million ratings per day

About 3 million searches per day

Geo-location data

Device information

Time of day and week (it now can verify that users watch more TV shows during the week and more movies during the weekend)

Metadata from third parties such as Nielsen

Social media data from Facebook and Twitter

What’s more:

“Netflix’s most-interesting use of data might be its attempts to actually analyze what’s going on in movies themselves. … [I]t already captures JPEGs and notes the exact time that credits start rolling, and it’s looking to take into account other characteristics. It could make a lot of sense to consider things such as volume, colors and scenery that might give valuable signals about what viewers like.”

I have no idea how much it’s used in the realm of content creation, but I think it’s the latter type of data collection, as well as some of the stuff around fast-forwarding and rewinding, that could really set Netflix apart. That’s because it represents so much more than just the traditional metadata that comes into play when deciding what programs to produce, a topic New York Times media reporter David Carr covered pretty thoroughly back in April (citing the above data points as proof of Netflix’s focus on data). It’s different, too, from the work of Epagogix, which uses a mix of human analysts and algorithms to predict the box office revenue of movie scripts.

Figuring out what audiences want to see seems almost too easy. In the case of Netflix and Orange is the New Black, for example, I can almost see the results of predictive models that have analyzed the high viewership and high ratings for shows like Oz, The Wire, The Sopranos, Nurse Betty, Dexter, Weeds, you name it. Serial, prison, crime, dark comedy, female lead — let’s go find something that hits these marks.

There’s also a fair debate to be had about whether relying too much on data can result in content that’s too formulaic. Repetition is great if you’re talking about predicting when jet engines will fail or whether an email message is spam, not so much when you’re trying to be original. Yeah, a dark comedy based in prison might be a guaranteed modest hit, but maybe it’s only possible to find the huge hit by throwing a lot of pilots against the wall and seeing what sticks.

So maybe it’s the little things that might make the biggest difference in the end. Think about how we interact with content on the web or on our mobile devices. I’ll click on a headline because I’m interested in the topic and the post sounds interesting, but if the author wastes 400 words blowing hot air before getting to the point, or if the page design sucks or I’m bombarded by takeover ads, I’m out. I might leave and not come back if I’m offended by a banner ad. There are plenty of other places to find something I want to engage with.

It’s the same with TV, whether that’s Netflix (especially using a device like Roku), DirecTV or cable. At some point, I assume directors have to start considering things like how to ensure people get past the opening credits. Are there common elements at play when people decide to turn off certain shows halfway through or stop watching altogether? Should we avoid certain elements that people fast-forward through in shows like this? Should we have more of the things they rewind? (Orange is the New Black is a really good show, but it’s also pretty dense in the sex scene and topless women departments. Just sayin’ …)
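
For the curious, here is roughly what asking those questions of playback data could look like. This is only a toy sketch in Python, not anything Netflix has described: the event format, the action names ("stop", "fast_forward", "rewind") and the sample data are all made up for illustration.

```python
from collections import Counter

# Hypothetical playback events: (viewer_id, action, minute_of_episode).
# The action names and the sample data are illustrative assumptions only.
EVENTS = [
    ("a", "stop", 3),
    ("b", "fast_forward", 12),
    ("c", "fast_forward", 12),
    ("c", "rewind", 27),
    ("d", "rewind", 27),
    ("d", "stop", 41),
    ("e", "rewind", 27),
    ("e", "fast_forward", 12),
]


def hotspots(events, action, top=1):
    """Return the minutes where `action` happens most often, with counts."""
    counts = Counter(minute for _, act, minute in events if act == action)
    return counts.most_common(top)


def early_exit_rate(events, viewers, cutoff_minute):
    """Share of viewers who stopped before `cutoff_minute`,
    for example the end of the opening credits."""
    quitters = {
        viewer
        for viewer, act, minute in events
        if act == "stop" and minute < cutoff_minute
    }
    return len(quitters) / len(viewers)


viewers = {viewer for viewer, _, _ in EVENTS}

print(hotspots(EVENTS, "fast_forward"))  # minutes people tend to skip
print(hotspots(EVENTS, "rewind"))        # minutes people tend to rewatch
print(early_exit_rate(EVENTS, viewers, cutoff_minute=5))
```

Counting where the skips, rewinds and early exits cluster is the easy part. Working out what was actually on screen at those minutes, and whether a director should care, is where the judgment comes in.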

We’re still talking about big data, to be sure; it’s just helping to ensure my show about a yuppie soccer dad with a secret life as an eco-terrorist is a little more watchable than yours.