2011-02-17 : Ben Lehman: Playtesting: Stop

a guest post by Ben Lehman

Before I start, I just want to thank Vincent for giving me some space on his blog to discuss this. Being able to guest-post on anyway is pretty swell. Oh, and also, understand that the "you" in this essay is addressed from me, an amateur game designer, to other amateur game designers. If you're not a game designer, have fun reading some inside baseball. Okay, now on to the essay.

You need to stop playtesting. It's not just that you're doing it wrong, it's that you're doing it wrong in ways that hurt your games. Furthermore, by promoting a culture of design in which playtest is held up as the be-all and end-all of the design process, to which all other elements of design must bow down, you at best cause other, newer designers to feel inferior and inadequate, and at worst cause them to mutilate their own designs in ways similar to those in which you have mutilated your own.

Playtesting is fucking dangerous, and you need to stop doing it, stop talking about it, and stop using it as a substitute for the hard work of game design.

(I'm not saying don't playtest, btw. There is a very specific place for playtesting in the design process, which we'll get to at the end. I am saying that you need to stop playtesting because it is a terrible tool. If you were to see someone pushing a power drill towards their eyeball, you'd say "STOP DRILLING!" and not take the time to comment that the drill does have some appropriate uses.)

Here is a short sampling of the things that I regularly see people use playtesting for, but which are terrible, no good, horrible ideas: identifying rules and textual errors, mathematics and probability analysis, marketing and advertising, developing or finishing your game.

Let's take each in turn, shall we?

(A brief aside to define playtesting: it means an appropriately-sized group of people sitting down to test an unfinished game, putatively to assist in its design process.)

Textual Errors:

I hear people talk, sometimes, about "textual playtesting," where they use playtesters to test the coherency and completeness of their game text. This is a terrible idea, and results in mangled and incomplete texts and overconfident authors.

No one actually reads RPG texts. No, seriously, they're just not part of play. Whether or not your rules cohere in play has much more to do with your game's similarity to other games that the group has played, and very little to do with the contents of the text. Your text could be totally complete and clear, and many RPG groups will muck it up anyway. Contrariwise, your text could be riddled with procedural and textual holes, and most groups that would playtest for you could make it work correctly.

Furthermore, since the effectiveness of a draft text is largely rooted in the group's prejudices, "textual playtesting" by groups who know who you are (and thus are willing to playtest your game) is not only likely to result in a chopped up, incoherent text, but is going to highly prejudice your text to be just like other games that you and your group of playtesting connections are familiar with. If you, like me, are interested in innovation, you can see the problem.

The right thing to do about textual errors is twofold: First, follow an effective and coherent didactic strategy; second, hire an editor who will provide you with structural feedback. Once your rules are already firmed up, take a look at good textbooks and board game rules and Jack Chick tracts, look at the different ways that teaching in text can be made to work effectively, pick a strategy, and pursue it with gusto and consistency. Then, turn it over to an editor who understands both good style and your goals and take almost every one of their suggestions for re-ordering and rewrites.

After this, I guess it couldn't hurt to turn it over to players for a round or two, as long as you don't compromise your didactic goals and run it through editing again, from scratch, if you make any changes. Personally, I'd just skip it: I think a competent editor knows better than any random group of gamers.

Rules Problems:

Okay, so sure, your game text shouldn't be playtested. But you should definitely test to look for rules holes and rules problems, right?

In a word, no.

In general, role-playing game rules are pretty simple. It's possible that your game is as complicated as Magic: the Gathering or Power Grid, in which case, yeah, exhaustive testing is the only possible way to discover that one particular subrule interferes in a bad way with some other particular subrule. Even then, it's not exactly a good tool: it's the best of a lot of terrible options.

But if your game is simpler, like say 3rd edition D&D (and most of my audience of game designers is producing games which are way simpler than 3rd edition D&D), you should have determined and fixed all rules holes well before they reach playtesting, not only because it is insulting to your testers and wastes their time, but because playtesting is simply a bad way to find rules holes.

(Let's define what I mean by "rules hole." I mean a gap where the procedure of the game falls through: there's simply no way to proceed in the game. Also, I'm discussing "rules problems," which are rules that have detrimental effects on play: most commonly infinite loops or insufficiently thought-out mathematics, but also other things, like division of play responsibilities in such a way that it violates the Czege principle.)

Let's unpack that. Why is playtesting a terrible way to find rules holes? Simply put, unless you test exhaustively, you're not going to find all the rules holes and problems that are present in your game. This is almost tautological, but playing can only test the combinations and interactions which come up during play. All other possible combinations and interactions and rules uses are going to remain untested. Any rules holes or rules problems in that set are going to remain, often with detrimental effects on your game once it gets out in the world and all those unexamined holes are revealed through play.

For finding these problems and holes, though, playtesting isn't just inefficient and incomplete, it's also ineffective. Even considering the long odds that a particular rules hole will reveal itself during a playtest, most role-playing groups are not willing to simply leave a rules hole as a rules hole. They will patch it with their preconceptions, prejudices, and group social contract. Furthermore (and there's some interesting social context stuff here which I don't want to get into in this essay), they are not likely to even remember that the rules hole came up: once they've papered over it, it may as well not exist. The chances of actually getting them to report it to you, the designer, are nearly nil. Even if you, the designer, are playing, the chances of you remembering it as a hole are low.

The right thing to do about rules holes is to take your rules set, once it's finished being written and modified and so on, and critically evaluate every possible rules interaction, looking for holes. Use a pencil and paper, if you're like me, or a blank text file, or even just your own head and the shower (although this can be dangerous: see above about memory holes.) Yes, this is a huge pain in the butt and it takes a long-ass time. It's also some of the most important work you can do as a game designer, and if you're not willing to do it you should take up some other creative activity. The procedure for finding rules problems is similar, but requires even more critical faculty on your part.

Once you get good at this, you can start taking short cuts: "okay, so let's look at all abilities that work like this: can any of them cause unexpected interactions?" But ultimately you're going to have to do this a lot.
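To make the pencil-and-paper pass a bit more systematic, here is a minimal sketch that generates the checklist of pairwise interactions to review, plus the "abilities that work like this" shortcut. The rule names and tags are hypothetical placeholders, not taken from any real game, and the code only lists pairs: deciding whether a pair actually hides a hole is still your job.

```python
from itertools import combinations

# Hypothetical rules, each tagged with the parts of play it touches.
RULES = {
    "initiative": {"timing"},
    "interrupt": {"timing", "reroll"},
    "lucky reroll": {"reroll", "dice"},
    "wound track": {"harm"},
}


def interaction_checklist(rules):
    """Return every unordered pair of rule names, in a stable order."""
    return list(combinations(sorted(rules), 2))


def shortcut_checklist(rules, tag):
    """Return only the pairs in which both rules carry the given tag."""
    tagged = sorted(name for name, tags in rules.items() if tag in tags)
    return list(combinations(tagged, 2))


if __name__ == "__main__":
    # Print a tick-box list to work through by hand.
    for first, second in interaction_checklist(RULES):
        print(f"[ ] {first} x {second}")
```

With n rules the full list has n(n-1)/2 pairs, which is why the shortcut matters once the game grows. Interactions among three or more rules need the same treatment with a larger group size.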

It's possible that your game has a lot of complicated interactions and nested cycles, in which case you may not actually be able to handle this. In this case, there are a couple of strategies. First, you could simply run through the most common of the interactions, confirm that they basically work, and hope that no exotic ones come up in later play. Second, you can really get into your game, understanding its internal logic and appropriateness, which lets you review much faster, and does involve a fair amount of play (even solitaire, see below.) Third, you can change the mechanics of your game to include a cybernetic control system with the human players, which I will get into more in the next section.

A great tool for all of this is the imaginary play session. Just sit down and take on the role of several different players, with different personalities and goals, playing your game. Figuring out how it works, from the inside, is a huge step towards the "logic and appropriateness" above.

Testing Mathematics and Probabilities:

This is, in many ways, a subset of the above, but I see people getting this wrong in such blatant ways that I think it deserves a particular call out. Playtesting is a terrible way to test the mathematics and probabilities of your game, which includes things like whether the resolution system "feels whiffy" or whatever. Playtesting is absolutely not the place to determine this stuff, because no amount of playtesting will produce statistically significant results, and most games' probabilities are easily calculated anyway.

You know, I could write several paragraphs trying to explain that, but I don't know if I can do it in a clear way. Let me say it again in all caps: NO AMOUNT OF PLAYTESTING WILL PRODUCE STATISTICALLY SIGNIFICANT RESULTS.

If you need me to clear that up for you, just ask in the comments.
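One rough way to put numbers on this: the sketch below uses the standard normal approximation for a binomial proportion to estimate how uncertain an observed success rate is after a given number of rolls. The roll counts are assumptions picked for illustration (about 30 rolls for one session), not measurements from any real playtest.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% half-width for an observed success rate p after n rolls."""
    return z * math.sqrt(p * (1 - p) / n)


def rolls_needed(p, margin, z=1.96):
    """Approximate number of rolls needed to pin the rate down to +/- margin."""
    return math.ceil((z / margin) ** 2 * p * (1 - p))


if __name__ == "__main__":
    # Assumed roll counts: one session, ten sessions, a hundred sessions.
    for rolls in (30, 300, 3000):
        print(rolls, "rolls: +/-", round(100 * margin_of_error(0.5, rolls), 1), "points")
    print("rolls needed for +/- 2 points:", rolls_needed(0.5, 0.02))
```

Under these assumptions, a single session leaves the observed rate uncertain by well over ten percentage points either way, and narrowing it to a couple of points takes thousands of rolls of the same mechanic under the same conditions. A few minutes of calculation gets you the exact figure instead.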

The right practice, here, is to determine the results distribution of the game through probability calculation. I just do this with a pencil and paper, counting up probabilities on my fingers and applying my college probability class. If you're more spreadsheet / programmer oriented, you could do it with the Monte Carlo method: have a computer run millions of tests of your mechanics and determine the probability distribution that way. If you're not into math or programming, just get a friend to help you: a lot of folks, including me, will do it pretty gladly.
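Here is a minimal sketch of both approaches side by side. It assumes a made-up resolution mechanic chosen only for illustration: roll two six-sided dice, add a modifier, and read the total as a miss (6 or less), a partial hit (7 to 9), or a full hit (10 or more). The function names and outcome bands are hypothetical; swap in your own mechanic.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import product

BANDS = ("miss", "partial", "full")


def band(total):
    """Map a roll total onto one of the assumed outcome bands."""
    if total >= 10:
        return "full"
    if total >= 7:
        return "partial"
    return "miss"


def exact_distribution(modifier=0):
    """Count all 36 equally likely results of two dice: the pencil-and-paper way."""
    counts = Counter(
        band(a + b + modifier) for a, b in product(range(1, 7), repeat=2)
    )
    return {name: Fraction(counts[name], 36) for name in BANDS}


def monte_carlo_distribution(modifier=0, trials=200_000, seed=1):
    """Estimate the same distribution by simulating many rolls."""
    rng = random.Random(seed)
    counts = Counter(
        band(rng.randint(1, 6) + rng.randint(1, 6) + modifier)
        for _ in range(trials)
    )
    return {name: counts[name] / trials for name in BANDS}


if __name__ == "__main__":
    for modifier in (-1, 0, 1, 2):
        print(modifier, exact_distribution(modifier))
    print(monte_carlo_distribution(0))
```

For a mechanic this small the exact count is both faster and more trustworthy than the simulation. The Monte Carlo version earns its keep only when rerolls, exploding dice, or nested choices make the enumeration awkward.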

It is possible that there are enough complicated loopy bits in your game mechanics that they elude rigorous calculation. Bliss Stage had this problem. The solution I implemented is to give a human player (in this case the GM) a throttle which they can use to accelerate or decelerate the mathematically unstable parts of the system (in this case, number of interlude scenes and, to a lesser degree, trauma spending strategies.) An additional advantage of this method is that it is called, I shit you not, a "cybernetic control system" which makes it 10 million times cooler than any other rules subsystem. (Dogs in the Vineyard also does this, with the Give rules.)

The absolute best practice in this case is to use simple enough mathematics that you don't need a lot of calculation to determine the probability distributions. Apocalypse World does this quite admirably, as do many other games.

Marketing:

You know all the stuff Vincent has been saying lately about the social context of game design, about your target audience, and how your game speaks to them or fails to speak to them? You know what won't help you find, develop for, speak to, or interact with your target audience? Playtesting.

If you don't have a target audience in mind, or if your target audience is something inaccessible or lame, your problem begins well before playtesting, and I really can't help you. Nor can anyone else. But if you know your target audience, and so on, and so forth, playtesting won't really help you unless your playtesters are composed of your target audience. Furthermore, even in that case, playtesting isn't the beginning of a conversation with your target audience. The beginning of the conversation is publication, and the continuation of the conversation is play.

Again, let's unpack. As a designer, your goals operate at both social and personal levels. At the social level, you're looking at the place you want your game to occupy in society and, in particular, in your subcultures. At the personal level, there's the aesthetics and premises you want to convey to each player of your game.

Playtesting fails at the social level simply because you can't "test" your game's social impact: its presence at that scale is basically a one-time thing. There are no parallel societies in which to release your game once you've tested it in one.

(There's a possible exception here for people who write in languages other than English: you can use your native market as a test-bed for the global market. But if you publish in English, your first publication is going into the global market, with no takebacks.)

Playtesting fails at the personal communication level because a game is either not yet communicating what you want it to, in which case it needs further development but definitely hasn't succeeded in your artistic goals; or it does effectively communicate your goals and vision, in which case it's done and needs to be published.

In either case, playtesting is a means to prepare for an audience, not a means of finding or engaging with one.

Furthermore, in the land of sales and moneys, playtesting is pretty terrible advertising. While public playtesting can be a good way to generate "buzz" for your game, it doesn't often generate the sort of buzz you want (it's mostly buzz in a small group of insiders). In general, by the time that playtesting reports are good publicity for your game, playtesting itself is largely superfluous. Either playtesting is revealing problems with your game, in which case it's not good publicity, or it's time that you should write your rules text, edit it, and publish it, in which case, stop playtesting.

There's a sidenote here, which is that there's a culture where endless playtest is, itself, a social entrance and a means of gaining social cred. To some degree, our over-playtesting has created this culture: you're not really "in" unless you're playing some unfinished game. For reasons both above and below, this is a terrible, terrible thing for actual game design. (I'm hesitating on elaborating here: I can probably unpack this more in the comments if anyone wants.)

Development:

Playtesting is an awful means of revising or developing your game rules. A lot of people seem to think that the process of playtesting is about revision, but in fact most rules revisions should come well before playtesting (see above), and the few remaining rules revisions should come well after playtesting. Never, under any circumstances, should a playtest group be revising the rules of the game. Neither is it a good practice to revise the rules of a game during a playtest.

The exact wrong people to tell you how to change your game's rules are the players of the game, who are in the midst of the game's emotional and social manipulations, and cannot clearly see your design goals nor your own style. Allowing playtest groups to dictate rules changes violates your game's direction and focus, and results in confused, chopped up game texts full of a hodgepodge of different techniques from various people's favorite games. It drives against coherency and against enjoyment.

Furthermore, the amnesia of the game group (which was mentioned above) means that any rules changes that emerge from play are likely to be negotiated through the social consensus and prejudices of the group. It is highly unlikely that these rules will emerge in a state where they make any damn sense to anyone who wasn't a member of said group. So in addition to the scattering effect of letting playtesters develop your game, whatever rules changes that are made are likely to be superstitious garbage.

The best practice here is to design all rules changes yourself, in the context of your own understanding and testing of the design. No one knows the creative vision and focus of your game better than you. If you are unable to work out a solution to your rules issues, remember that playtesting is not a solution. Probably the best idea is to either let the game lie fallow for a bit, or to discuss the issue with trusted fellow designers, members of the target audience, or good friends (ideally: all of the above.) Generally speaking, in discussion, your own ideas are going to come to the surface and they'll fit much better.

("But what about playstorming!?" Go ahead. Ask me that. I dare you.)

Finishing Your Game:

I've saved the best (as in: the worst) for last. This here's the thing I've read which aggravates me most: "I've taken a lot of time to carefully playtest the game. No rush to publication here! I took seven years testing it." This is absolutely bullshit. And it's not just, like, horrible self-delusion that sitting around with your finger up your ass is making your game better. Nor is it just that you're absolutely wasting your own time, as well as the time of all your playtesters. This sort of thing is corrosive to the culture of play and design and you all need to stop this bullshit right now.

Rule of thumb: playtesting is like being engaged to be married: after a year you're just wasting your time. At full press of playtesting, a year is the absolute longest it should take to get a game into shape (it's okay to take longer than that when you add in writing the text, editing, layout, printing, marketing). If it takes longer than that there are a few possibilities: your game is done and you should stop dawdling and just publish already; you are not a good enough designer to finish this game right now and you should work on other projects and come back to it; you've been slacking off on your design process, which is fine but slacking off is not a fucking virtue; your game will never be good and you should stop wasting your time with this project and move on to others; you are artificially extending your playtest because you think it gets you more social cred to have games in development than to actually ever finish a game; your game really is quite complicated and it just needs a little more testing to be great.

The reason that this is so harmful and corrosive is that it serves to make game design seem remote and inaccessible to novice and amateur designers, which is exactly the wrong thing for them to learn. "Oh, someone like me doesn't have the resources / time / commitment for the seven years of playtesting that real pros have to do. I guess I can never be a real game designer." Or, worse, they get caught in a cycle of endless fruitless playtesting, rather than finishing projects and learning and growing. Even worse, they let a playtest group mangle their game as described above. This is what happens when we let procrastination be turned into a virtue.

The best practice here is to shit or get off the pot. Failing that, stop acting so sanctimonious about it.

When to Playtest

Despite all of the above, playtesting is actually an important part of game design, it's just that it is a limited tool and, when misapplied, it is detrimental to both games and texts. So what is playtesting good for, anyway?

Playtesting is necessary for revealing problems with the parts of your game where the mechanics and processes of your game interface with the players at the table, and most particularly with their imaginations and social interactions (this is a broad definition of "mechanics and processes:" including such things as who speaks when, the game's setting, character-player relationships, and so on.) It is only useful for revealing problems, not resolving them, for the reasons noted above. It is useful for the parts of the game that rely on imagination and social interaction because these are the two things which you can't account for procedurally, and so problems there are invisible to your individual testing and calculation.

I should add a word about fun here. Included in this is whether or not the game is any fun (or, as I prefer to think of it, satisfying). For role-playing games, satisfaction is an emergent social property: we like what the game is doing to our imaginations and social interactions. There is, of course, a taste element to this, and because of this you should keep in mind the social context of your design and your target audience, just as you should throughout the whole design process. But there's also a lot that isn't a taste thing. Many games, particularly those in development, simply do not have that spark of satisfaction, and playtesting is required to reveal that.

Even then, playtesting is not a particularly satisfactory tool for determining these problems. Even with a great deal of playtesting, done appropriately, it is possible you will miss certain creative or social interactions which will cause problems with your game in play. But the problem is that there's simply no other way of knowing. Playtesting isn't miraculous, it's just the best of a bad job.

Of course, fixing it involves a lot of design work and creativity. The process is still a little misty to me, although I know it involves looking at how the moment-to-moment process of play is interacting with the goals for play, as well as a healthy dose of critically examining said goals altogether. Ultimately, I and most other designers rely on bolt-out-of-the-blue insights to resolve issues at this level.

The Takeaway

More playtest does not magically make your game better. Neither does it fix most problems that your game might have. It is not a replacement for skilled writing, a creative and technical vision, editing, inspiration, or game design work. It is a very specific tool for finding very specific problems, and attempts to use it beyond that are detrimental both to your own game design and to design culture in general.

It is also important when we embark on a playtest that we respect our playtesters' time and attention. Relying on playtest to point out textual, procedural, and mathematical flaws which could be more easily spotted with simple analysis is degrading to our playtesters and also simply bad game design practice. Stop it.
