Since we finally unveiled our Steam Gauge project late last night, we've been overwhelmed by positive responses to the data. They've come from all over: comment threads, Twitter, e-mail, and links from other sites. It's much appreciated.

We've also received some questions and concerns about our data, our methodology, and what we plan to do with this project going forward. Here are some responses to the most common issues that have been brought up in the last 24 hours or so.

Isn't your data off? Steam didn't always track gameplay hours in the past

Indeed. Before posting our analysis last night, I was not aware that Steam only started tracking the "number of hours played" statistics on SteamCommunity.com in March of 2009. This isn't a small oversight: games played solely before this date would show up erroneously as "unplayed" in our data, and games released before that time might show fewer total hours than they should. This helps explain why older games like Ricochet and Deathmatch Classic seem so unpopular among people who own them—because most players probably put in their hours before March of 2009.

To be clear, this issue should not affect the "ownership" data in the original piece—games bought at any time appear in the scraped Steam data correctly. For some of the other charts, it simply means that games from the pre-2009 period can't be compared completely accurately to those released after March of 2009. I've noted this in the original piece and updated a number of charts to reflect this.

The biggest change in the actual data comes in the aggregated distribution of hours played. Restricting these charts to games with a release date after March of 2009 (i.e. the ones we have a "complete" gameplay picture of) shows that only 26.1 percent of registered copies are sitting unplayed, as you can see above, rather than the 36.9 percent cited in the original article. This is probably closer to the true number across all Steam games, though it's hard to say how the actual play data looks for the 800 or so Steam games released before gameplay tracking was activated.

While this is an important limitation in the data, it doesn't completely change the general conclusions we've reached so far. The most played games are still the games that have seen the most play in the last five years, which is all we have information about. That time span covers the entire release history of over two-thirds of the games ever released on Steam. It's a pretty robust data set to study even if it isn't perfect.
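For readers who want to see how that release-date restriction changes the "unplayed" figure, here is a minimal Python sketch. It is not our actual analysis code; the function name, the sample records, and the exact cutoff day (we only know tracking began in March of 2009) are assumptions made up for illustration.

```python
from datetime import date

# Assumed cutoff: the article only says "March of 2009", so the exact day
# used here is a placeholder.
TRACKING_START = date(2009, 3, 1)


def unplayed_share(copies, released_after=None):
    """Return the fraction of registered copies with zero recorded hours.

    `copies` is a list of (release_date, hours_played) tuples, one per
    registered copy. When `released_after` is given, only copies of games
    released after that date are counted.
    """
    if released_after is not None:
        copies = [c for c in copies if c[0] > released_after]
    if not copies:
        return 0.0
    unplayed = sum(1 for _, hours in copies if hours == 0)
    return unplayed / len(copies)


# Made-up sample: two copies of a pre-2009 game, three of a newer one.
sample = [
    (date(2000, 11, 1), 0.0),   # played before tracking, so it shows as unplayed
    (date(2000, 11, 1), 0.0),
    (date(2011, 4, 19), 12.5),
    (date(2011, 4, 19), 0.0),
    (date(2011, 4, 19), 3.0),
]

all_share = unplayed_share(sample)                     # every copy counted
recent_share = unplayed_share(sample, TRACKING_START)  # post-tracking games only
```

In this toy sample the unplayed share falls once the older game is excluded, which is the same direction as the shift from 36.9 percent to 26.1 percent in the real data.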

Steam doesn't measure my gameplay hours correctly, even since 2009. Is this common?

Since launching our piece last night, we've received a lot of anecdotal reports from people saying the gameplay hour numbers listed for their games on SteamCommunity.com are inaccurate. Most say the site failed to register sometimes huge chunks of gameplay, while some say the site is showing more hours than they've actually spent with the game.

It's hard to tell exactly how widespread or impactful this problem is, unless there's someone out there with a treasure trove of independent verification for exactly how many hours they've put into their games separate from Steam's reporting.

Absent that, there's not much we can do to account for these kinds of reported discrepancies. I will say that, in my personal experience, the hourly reporting from Steam seems to match up with real world gameplay time in most, if not all, cases. In any case, we're at the mercy of Valve's reporting here. If you really think there's a problem with their numbers, we encourage you to take it up with them.

What about offline gameplay and time spent idling on menus?

Both of these are also potential issues with the accuracy of Steam's "hours played" reporting. Players who are using Steam in offline mode don't seem to be counted in the service's aggregate statistics, and those who leave a game window open and idle for hours or days can inflate numbers substantially (though stats like "median hours played" are more resistant to these problems). In the case of a game like Dota 2, time spent spectating other matches is also shown as "gameplay," which can be considered a skew in the data as well.

Again, we're at the mercy of Steam's reporting here. There's no real way to accurately separate out or re-add these numbers into the gameplay data. It would be nice to think these issues cancel each other out to some extent, but in reality they probably slightly inflate the data for a lot of games that users are liable to leave running in the background for one reason or another. All we can suggest is that you take this into account when considering the data.
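To illustrate why a median holds up better than an average when a few players leave a game idling, here is a small Python sketch. The hour totals are invented for the example and are not taken from our data.

```python
from statistics import mean, median

# Invented hour totals for five owners of one game. The last owner left
# the game window open for weeks, so their total is wildly inflated.
hours_played = [2, 3, 4, 5, 500]

average_hours = mean(hours_played)   # dragged far upward by the idle player
median_hours = median(hours_played)  # still reflects the typical owner
```

Here the single idle player pushes the average above 100 hours while the median stays at 4, which is why we lean on median figures when idle time is a concern.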

How accurate are your numbers?

Since publishing, we've had a few more developers reach out either privately or publicly to offer their own Steam sales data for comparison to our estimates. With over a dozen "real world" spot checks in hand now, we have yet to see an instance where our numbers are more than 10 percent off from the actual numbers developers have access to. Sometimes our error is much less than that, of course, and the error can go in either direction (though so far our numbers seem to overestimate slightly more often than they underestimate).

While 10 percent isn't a small functional margin of error, it's also much better than a simple shot in the dark guess. If we're reporting sales of two million units for a game, you can be pretty confident the actual sales number is somewhere between 1.8 and 2.2 million.
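The arithmetic behind that range is simple enough to sketch in a few lines of Python. The function name is hypothetical, and the 10 percent margin is just the worst case we've seen in our spot checks so far, not a formal confidence interval.

```python
def sales_range(estimate, margin=0.10):
    """Return (low, high) bounds around a sales estimate.

    `margin` is the assumed worst-case relative error, 10 percent by default.
    """
    low = round(estimate * (1 - margin))
    high = round(estimate * (1 + margin))
    return low, high


# A reported estimate of two million units.
low, high = sales_range(2_000_000)
```

For a two-million-unit estimate this gives bounds of 1.8 million and 2.2 million, matching the range quoted above.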

How are free-to-play games handled? Doesn't everyone on Steam "own" them?

In a way, everyone "owns" every free-to-play game, but not in our stats. If you go to the store page for a free-to-play game like Dota 2, Steam will indeed tell you that you "already own the game." However, the SteamCommunity.com profile pages don't seem to show the game on your account until you actually download and start playing it. Since we're taking our data from those Steam Community pages, the only people who show up as "owners" of free-to-play games in our reports are the ones who have downloaded and played those games at least once.

What about total revenue? How many of these games are selling during deep discounts?

No doubt a lot of sales are coming through bundling or discounted sales. The problem is, we really have no idea when a particular Steam game was purchased before we started running our analysis in February. Thus, we don't know what the going price was for that copy of the game when it was acquired.

Even then, it's hard to tell if a game was purchased at full Steam asking price or through some sort of pay-what-you-want bundle. And that doesn't even begin to take into account in-game purchases in free-to-play games. We could look at how much the game library would be worth at current Steam asking prices, but that would probably be more misleading than illuminating.

That said, we are looking at sales rates since we started running our analysis and plan to examine how things like discounts and bundling promotions affect that sales trajectory in the future.

That long tail graph is pretty hard to read. Can you improve it at all?

Sure. Here's a version with the vertical axis in logarithmic format to better discern the vast middle ground.

Is anyone else doing anything like this?

There are a few other notable projects that take a similar tack to aggregating public Steam data, though none that we think are quite the same as our project:

Steam Charts has been tracking data based on Steam's real time, daily "most played" numbers. They archive and aggregate the results for posterity.

GaugePowered lets you view hourly play data for your own account, and it even gives aggregated estimates of how many dollars per hour you'll spend on a particular game.

The Global Stats Project looked at achievement data for 80 popular Steam games to get some interesting data on completion rates.

Steam Database gathers a bevy of basic information about every game and app on the service through Valve's API and serves it up in an easy to use format.

SteamPrices.com is one of a number of sites that keeps track of when games are discounted. Steam Alerts will even send you a notice when the price drops below a certain level.

Finally, the TF2 Backpack Examiner is an incredibly thorough database of which hats and items are owned by players of that game.

I'm sure there are plenty I am missing; please leave links to good ones in the comments.

Can we get your raw data? How about your code?

I'm a bit hesitant to simply give away the results of what has been nearly a year of on and off work on this concept before using it for at least a few more "exclusive" reports and analyses myself. And rest assured, there are a lot more reports on this data from a number of angles coming down the pike.

As for my code, it's a bit of a disorganized, uncommented mess at the moment. I have laid out my methodology in quite a bit of detail, though, so anyone with the server capacity and coding skill could probably replicate my work pretty easily (and do whatever they want with the numbers).

That said, I know there are a lot of people clamoring for a deeper dive into this treasure trove, so I've included an expanded list of the top 100 played games on the next page, complete with "ownership" data as well. Hopefully that will give all you data-heads something to chew over while we work on our next report.

If there's some particular analysis you'd like to see in the future which we might not have thought of, leave a comment and we'll take it under consideration. If you have a specific request for data about a particular game that you simply must know about, drop me a note and I'll see what I can do.

Is Dota 2 really that popular?

Yes, it is.

Click through to the next page for an expanded list of the top 100 most-played Steam games, according to our estimates.

[Update - April 17, 2014: Added an answer addressing how free-to-play games are handled in our stats.]