With Valve's content distribution/social/DRM platform, Steam, having supported all three major desktop platforms for nearly a year now, I've been intending to write something about the Steam Hardware & Software Survey for some time. In the coming months, the impact of Valve's recent SteamOS and Steam Machines announcements is likely to begin shifting some attitudes, perceptions and ultimately (dramatically or subtly) changing the landscape that the survey results depict.

In this article I will be looking at what value the survey has for gamers and the gaming industry, some things to look out for when interpreting the results, and some thoughts on how the survey could be improved to increase its credibility.

What Is The Survey And Why Should We Care?

From sometime prior to 2004, Valve have been collecting and publicly publishing what is most likely the largest survey of consumer gaming hardware and system configuration ever conducted. What presumably started as a tool for Valve to help understand their audiences has grown into something that gamers, journalists and even other developers get excited about and attempt to draw upon as a resource.

Traditionally, most understanding of computer hardware usage has come from sales statistics, which can be a good measure of the uptake of new technologies, but for better or worse, vendors are usually unlikely to reveal this sort of information unless it can be used to enhance their image.

Some software developers have collected details and statistics from their own users (much as Valve did when they originally launched the Half-Life 2 Hardware Survey which would eventually evolve into the Steam Hardware & Software Survey), but whilst these sorts of datasets seem more likely to be presented and discussed, demographic biases within userbases make it difficult to draw conclusions about the industry.

Technical groups and journalists also run surveys, but it's difficult to imagine that a magazine or tech blog's readership would be representative of the broader gaming community either.

One key underrepresented aspect in particular is the longevity of hardware, which as anybody who has had a hand-me-down computer or graphics card knows, plays a large role in the consumer hardware ecosystem.

Being tied to a title-agnostic distribution platform that spans the breadth of gaming genres, the Steam Hardware & Software Survey is positioned to inherently sidestep most of the issues that face the types of data collection mentioned above. By presenting a moderately unobtrusive opt-in prompt to randomly selected users when Steam is launched, the survey is able to target gamers specifically, cross demographic bounds and achieve a higher likelihood of response.

The frequency and regularity of the survey also provides what is perhaps the most detailed picture of gaming system configuration over time that has ever been compiled.

Clear understanding of industry state and trends can become a resource to hardware vendors, allowing them to better understand what legacy devices are in use, potentially leading to better support and drivers, and better timed/targeted introductions of new models.

For game developers, knowing what sort of configurations are out there can help provide sensible indicators for minimum system requirements to test against and driver/API features to target.

For gamers, the survey results can become an indicator for when upgrades are appropriate, whether second hand parts are likely to be of use/value to others, what hardware might be worth considering for future purchases, and (for better or worse) when bragging can be justified.

There are less tangible values at play here as well. Gaming culture often tends to live and breathe on who can and can't play. Having reliable indicators that help lead developers towards broader compatibility and users towards common capabilities seems more likely to result in a healthy community than one that's driven entirely by sales of the latest hardware.

For the future, having an understanding of the ways in which platforms have changed and evolved over time may provide otherwise impossible-to-discern insights into the heritage behind whatever legacies we leave. In ways I cannot put into words[1], this feels meaningful.

What Should We Consider When Interpreting The Survey Results?

So we've ascertained that there's some value in the figures here, but for anybody who isn't used to interpreting statistics, there are still some hurdles in the way of pulling meaningful interpretations out of the results.

Percentages as Ratios

First up, it can be easy to miss that percentages are ratios. When looking at changes over time, an increase or decrease in percentages doesn't necessarily mean that there is a corresponding increase or decrease in the population represented by that percentage.

Let's take a look at some hypothetical numbers. Percentages on the left are pulled from the May 2013[2] results and multiplied by 54 million to give an estimate of the number of active accounts represented by them. In the values on the right, the total number of users has been increased to 56,263,476, with Linux going up by 2% of the estimated May 2013 figures and the other platforms being given an arbitrary increase on top of that.

Example change in stats where an increase in Linux usership is represented by a lower Linux percentage.

What we see is that even though the Linux population has increased by 20,000, it has become proportionally smaller, highlighting that a change in proportion doesn't necessarily mean a corresponding increase or decrease in population size. This means that the percentage point variance indicators from previous results that are shown in the survey can only indicate changes in dominance, and not growth or decline of population sizes.
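The arithmetic behind this effect can be sketched in a few lines of Python. The figures here are illustrative stand-ins (a hypothetical 2.00% Linux share and round totals), not actual survey values:

```python
# A platform's user count can grow while its share of the total
# shrinks, so long as the total grows faster than the platform does.
before_total = 54_000_000
before_linux = 1_080_000             # a hypothetical 2.00% share

after_total = 56_000_000             # the overall userbase grows...
after_linux = before_linux + 20_000  # ...and so does Linux, by 20,000

before_share = before_linux / before_total * 100
after_share = after_linux / after_total * 100

print(f"Linux users: {before_linux:,} -> {after_linux:,}")
print(f"Linux share: {before_share:.2f}% -> {after_share:.2f}%")
# The absolute count goes up while the percentage goes down.
```

This is exactly why a negative "Change" value in the survey cannot, on its own, be read as a shrinking population.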

Dominance is something to consider when looking at things like safe target hardware specs, but shouldn't be taken as an indication of market size. Whilst Steam's vast userbase might make a lowest-common-denominator approach seem worthwhile, a common hardware configuration or operating system isn't a reliable indicator of common tastes in games - especially when hardware specs are constantly shifting as users upgrade and Valve are pushing SteamPlay as a selling point, encouraging users to feel comfortable migrating between or using multiple operating systems.

Platform Bias

On the topic of dominance, the current presentation of survey results doesn't highlight or normalise for bias caused by the selections of titles available on Steam. For example, a comparison of DirectX 10/11 compatible systems versus older systems is highlighted without any indication of how many DirectX 10/11 exclusive or non-supporting titles are published on Steam.

Nowhere is this more problematic for interpretation of the survey results than when looking at operating systems. At the time of writing (mid October 2013), there were 2,205 games published on Steam. 2,204 of those supported Windows, 527 supported Mac OS, 204 supported Linux[3], and 526 of those were listed as providing SteamPlay.

It seems undeniable that the software distributed via Steam produces a significant bias, the sort of which you would expect when a restaurant with a tiny selection of non-meat dishes claims that they see little custom from vegetarians.

On the left we have total titles available for each platform, and on the right, a breakdown of how many titles support each platform.

Taking results from mid October 2013 (which, sadly, has less granularity than earlier data[2]), let's look at estimated platform users per title.

( platformPercentage × totalUsers ) ÷ platformTitles = usersPerTitle

Assuming that one half of the "Other" category represents Linux users, and the remainder is equally split between Mac OS and Windows, we end up with 1.39% for Linux users, 3.87% for Mac OS and 94.74% for Windows.

On the left, we have estimated total users per platform, whilst on the right, we have users adjusted for title availability - a more familiar ratio.
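The per-title estimate described above can be checked in Python, using the article's own rough figures (54 million active accounts, the mid October 2013 title counts, and the estimated platform shares after redistributing the "Other" category):

```python
total_users = 54_000_000  # rough estimate of active Steam accounts

# Estimated share of users (%) and number of supported titles per
# platform, as given in the text above.
platforms = {
    "Windows": (94.74, 2_204),
    "Mac OS":  (3.87,    527),
    "Linux":   (1.39,    204),
}

# ( platformPercentage x totalUsers ) / platformTitles = usersPerTitle
for name, (share_pct, titles) in platforms.items():
    users_per_title = (share_pct / 100 * total_users) / titles
    print(f"{name:8} ~{users_per_title:,.0f} users per title")
```

Run this and the gap between platforms narrows dramatically compared to the raw percentages, which is the whole point of adjusting for title availability.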

Adjusted for title availability, operating system dominance in the Steam Hardware & Software Survey comes very close to matching the ratios seen when aggregating per platform revenue from all major Humble Bundle promotions together (even including some non-cross platform ones), leaving the results telling a much different story when interpreted with title bias in mind.

In addition to the potentially massive bias caused by title compatibility, operating system dominance is also potentially confounded by Linux (and sometimes Mac OS) gamers making use of technologies such as Wine to play Windows exclusive games both prior to and after Steam launching natively on their platforms. Players making use of Wine are sometimes prompted to participate in the survey, introducing anomalous data (it has been speculated that unusual MSAA values may be an indicator of Wine usage). Dual booting also results in users being prompted to participate in the survey from their non-preferred platform. Whether these classes of survey respondents should be considered to be on one platform or another is definitely debatable, but either way, awareness of this is important to take into consideration when interpreting the survey results.

Insignificant Fractions

Valve provide a degree of resolution in the survey results which causes percentage point variances to be of a very high resolution - so high, in fact, that we often see people trying to analyse variances which are too small to be considered meaningful.

Let's take a look at how big 0.01% is within our estimated number of active accounts.

0.01% of 54 million.

You may find yourself thinking, "That might be 'insignificant' out of 54 million, but five and a half thousand extra customers isn't something I'd say no to!" And that would be perfectly valid. The problem with running off with that interpretation is that 0.01% may not be 0.01%. Let's start to look at what that means within the scope of a sample.

To do this, we need to make some assumptions. Let's pretend, for the sake of having something to calculate against, that 1/12th of the active Steam userbase is prompted to participate in the survey.

54,000,000 ÷ 12 = 4,500,000

Let's also assume that half of those people are interested in or bother to get around to actually participating.

4,500,000 ÷ 2 = 2,250,000

How many people does it take to make up 0.01%? Let's take a look. I'll make this one bigger so that you can see more clearly.

To make 0.01% 3 pixels wide at the circumference, this chart would need to be 6,500 pixels in diameter.

Margins Of Error

So it takes just 225 individuals to represent 0.01% in a 2,250,000 sample. "Alrighty, what's wrong with that?" I hear you ask impatiently.

Statisticians, being lovely people, have some tools (or concepts if you prefer) for helping understand the accuracy of samples. One of these is the Margin of Error. It's based around the idea of using an expected confidence level to determine how much of a standard deviation can be reliably expected to truly represent the broader population, giving you a measurement of how much your results could reasonably be off by (this is a simplified definition - head over here for a deeper exploration).

If we assume a confidence level of 95%, which corresponds to a value of 0.98 (the 1.96 z-score for 95% confidence multiplied by 0.5, the maximum standard deviation of a proportion), and divide that by the square root of the sample size, we can calculate our Maximum Margin of Error (the most common way of expressing a margin of error, perhaps because it requires less calculation).

0.98 ÷ ( √2,250,000 ) = 0.000653333

If all of our assumptions are correct, the margin of error for any given value in the survey results is going to be about 0.065 percentage points, which renders our 0.01% or even 0.05% meaningless.
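The whole chain of arithmetic, from assumed sample size to margin of error, is short enough to check in Python. As above, the 1/12 prompt rate and 50% response rate are pure assumptions made for the sake of having something to calculate against:

```python
import math

total_users = 54_000_000
sampled = total_users / 12    # assumed fraction of users prompted
responding = sampled / 2      # assumed response rate -> 2,250,000

# 0.98 = 1.96 (z-score for 95% confidence) x 0.5 (the maximum
# standard deviation of a proportion), giving the maximum margin
# of error for any single value in the results.
moe = 0.98 / math.sqrt(responding)

print(f"Sample size:         {responding:,.0f}")
print(f"0.01% of the sample: {responding * 0.0001:.0f} respondents")
print(f"Max margin of error: {moe * 100:.3f} percentage points")
```

Swap in a different prompt rate or response rate and the margin of error shifts accordingly, which is precisely why transparency about the sampling method matters (see below).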

How Could The Survey Be Improved?

You may have noticed that there's been a bit of "let's assume this" or "let's pretend that" in this article. This is because some aspects of the survey's method and presentation of results are unclear to an extent that makes interpretation unduly cumbersome and hurts the credibility of the survey's results.

Whilst there is value and return for the industry and community in the survey, its credibility and reliability are undermined by several missing pieces of information (which, sadly, seems a common practice these days).

To finish up, I'm going to offer some suggestions for changes and enhancements that I feel would dramatically improve the survey's robustness and reliability.

Changes After Publishing Need A Notice

After a new batch of results go live, it's common for the figures to fluctuate for several days. Whether this is caused by new data coming in or mistakes during publishing is unclear. A notice explaining why the changes were necessary would help avoid confusion surrounding conflicting citations and encourage those using out of date information to update.

Sampling Method Needs Some Level Of Transparency

The sampling method used by the survey is horrendously unclear. An understanding of sample size and the means through which users are randomly selected could be given without undermining the anonymity of the survey. This would allow for accurate calculation of margin of error and better estimates of confidence.

It's currently also not clear what happens when a user declines to participate in the survey. Is another user prompted in their place, does the sample size shrink, or does something less intuitive happen to maintain sample size?

Survey Results Need Less Ambiguity

Currently, the survey provides no indication of population size or fluctuation. Both of these are critical in understanding what the changes seen within the survey over time actually represent. There is also a huge potential for misinterpretation of the "Change" value depending upon whether the values indicated are displayed as percentage points or percentages - that is, whether the value represents an increase/decrease in the value in the "Percentage" column, or whether it represents a percentage of the previous (or current) sample's results for that item.

The survey results also give no indication of what the rules governing the aggregation of the "Other" category are. With the "Other" category often representing a much greater amount than the lowest visibly ranked items, it seems important to understand what thresholds are involved. Understanding rounding methodology would also help provide clarity to smaller values.

The survey results have from time to time included a number of items (such as the "Unknown" category seen during 2012) which seem to indicate invalid or unrecognised values. It would be worthwhile to have some indication of what these may represent and notices when they have been resolved or otherwise identified. It would also be nice to know whether gamers making use of Wine are in any way identified as such.

The Survey Deserves A F.A.Q. And Channels For Communication

In June 2011, I set out to gather as much official information as I could regarding the survey and compiled what little I could find in a thread on the Steam Powered User Forums.

Later that month, I attempted to report or confirm whether a bug was preventing me from participating in the survey. After some persistence with Steam Support (ticket number 1747-WTPA-0758), I was effectively told that the survey was so anonymous that it would not be possible to know if there were a technical issue. Almost a year later, the following notice appeared above the survey results:

...specifically, only systems that had run the survey prior to the introduction of the bug would be asked to run the survey again. This caused brand new systems to never run the survey. In March 2012, we caught the bug, causing the survey to be run on a large number of new computers...- Steam Hardware & Software Survey, April 2012

Whilst I'm very thankful for Support Tech Seth's patience in responding to my queries, I feel that my experience highlighted the need for channels of communication to people involved with or at least knowledgeable about the survey. At the very least, a F.A.Q. would have helped me confirm and understand what the intended behaviour of the survey was and allowed me to make a more useful bug report and potentially earlier identification of the issue.

Some Possible Enhancements

As mentioned previously, compatibility bias in the titles published on Steam is likely to be playing a big role in shaping the survey results. Whilst data collection is likely to be more difficult, statistics on system requirements and compatibility of titles could provide interesting insight into which titles support legacy hardware and the uptake of new driver/API features which would be of value to hardware vendors and gamers alike.

The value to posterity touched on in the first section of this article is undermined by historical data from the survey not remaining published. Whilst this still exists to a degree in web caching projects like Archive.org's Wayback Machine and in any editorial coverage the survey receives, an official archive of historical survey data would be a worthwhile aid to analysts.

When a new month's survey results are added, the survey page suffers downtime (sometimes lasting days). An "Update in progress" notice would definitely help reduce confusion during this period.

Some kind of notification system or feed for new results would also be a nice addition.