The following blog post, unless otherwise noted, was written by a member of Gamasutra's community.

The thoughts and opinions expressed are those of the writer and not Gamasutra or its parent company.

Foreword

Design is something that is highly personal - I have no problem with this and I have seen many unique and diverse approaches. What used to bother me was the mysticism that seemed to surround effective design practices. Having come from a music composition background where rules and heuristics reigned supreme, "zen" design was not something that sat well with me - especially when you consider the financial risk associated with "shooting from the hip". Authors like Dan Cook, Raph Koster, Joris Dormans, Steve Swink, Ernest Adams & Jesse Schell (to name but a few!) have made significant inroads in turning alchemy into science. One of the branches of this new approach to understanding games is Rational Design or Rational Level Design [RLD]. I have previously written a few articles about the concept here, here and here, but I have never written anything publicly available which starts from the beginning.

I had originally intended to put all of this into a co-authored, free, open-source textbook called "The Rational Designers Handbook." I have since changed my mind and decided to turn my work-in-progress handbook into a series of blog posts right here on Gamasutra. This way, I can leave the co-authoring at the discretion of other game developers, scholars and educators. So here it is, the first part of the Rational Designers Handbook - An Introduction.

What is Rational Level Design / RLD?

Rational level design (RLD) is a way of objectively quantifying elements of user experience in order to create a consistent gameplay experience. RLD is most commonly used to understand how various game elements affect difficulty. As difficulty plays such a significant role in determining user experience (a precarious balance between rage and boredom!), we can use the objective, number-driven system of RLD to craft user experience. Although RLD is now used to create much more than just game levels, the RLD tag has stuck and is now used to describe design activities using this data-driven approach.

Why do we use RLD?

RLD is most often used on projects with significant financial accountability or where we have large teams of people working on the same project. Although RLD is about crafting user experience based on a set of objective metrics, its widespread use in large production environments is largely a consequence of risk mitigation.

In RLD, game elements are created and modified based on observation of user data and informed approximations produced by mathematical regression methods. The key benefit of RLD is that it unites teams of designers by quantifying production processes which are usually considered too intangible or abstract to be represented by traditional means.

RLD is also a useful tool for students and those learning design. Unlike Zen design, which favors experience and intuition over metrics, the RLD approach provides a framework and a process which can help to alleviate some of the mystery of effective design. Many designers who adopt the RLD approach eventually integrate its concepts into their own design style and create a type of Zen / RLD hybrid which is informed by experience and enhanced by logic.

Tools of the Trade & Requisite Knowledge

RLD benefits from the fact that many of the tools required to implement it are readily available, sometimes free and generally well documented. Although some large publishers and developers will have developed their own in-house, proprietary tools, you will be able to implement your RLD pipeline just as well with the following:

Grid Paper (yes I said paper!)

Rulers, pens, protractors, set squares, scientific calculators etc.

Some form of spreadsheet application

Beyond the tools, a good grasp of mathematics will help greatly.

The Purpose of this Article & Subsequent Updates to the RLD Handbook

This article and those that follow are intended to be a resource for those wishing to use RLD as part of their own design processes. This series covers a number of examples of RLD and how they have been implemented as both development and analytical tools. It is important to note that this series is part of the broad body of knowledge which informs game design. Throughout the series you will see links and references to other sources. I highly recommend that you take the time to read the work of these authors. Most importantly, acknowledge that the process of design is usually very personal. You ultimately need to select the models and structures that resonate most with you.

Process & “Unsexy” Stuff: How do we create difficulty numbers?

To understand how we create difficulty numbers in the RLD process, there are two main issues that we need to understand:

1. Mathematically speaking, linear increases in complexity tend to create exponential increases in game difficulty. Think of this as the link between your RLD tables and the actual psychology of how players interact with the final products of these tables.

2. RLD is always a starting point and there will always be elements of your design which cannot be expressed objectively with numbers. When numbers fail, look towards chaos to create interesting game level experiences.

Linear Increases & Exponential Difficulty

In order to understand where our RLD numbers come from, we need to closely examine point number one: linear increases in complexity create exponential increases in difficulty. One of the key terms used as part of this process is dimensionality. Mathematically speaking, dimensionality is the number of spatial dimensions that we would need in order to enumerate every possible outcome of a set of modifiers.

As an example, let’s say that we are creating a game based around the outcome of a six-sided die. This six-sided die would be considered our one modifier. Statistically, we would say that our six-sided die is one probability space, consisting of six outcomes. The player could roll the die and have a one-in-six chance of rolling any given number. This type of ‘event’ has only one dimension, because we only need one axis to enumerate all of the possible outcomes. Although it is a little mundane, our enumeration would look something like this… (Figure 1)

Figure 1

If we expand our game to include the outcome of two dice (modifiers) then we have a set of events which needs to be enumerated using two axes… (Figure 2)

Figure 2

In the two-dice event, we can see how one axis modifies the other – one plus one equals two, and so on. We can also see how every element added to our design multiplies the number of potential outcomes that the player has to sort through, so the outcome space grows exponentially with the number of modifiers. By using two six-sided dice, we have gone from six outcomes to thirty-six. Not only this, but by adding one extra modifier, the probability of certain outcomes also becomes much more complex and interesting: events like rolling a sum total of 2 or 12 become one-in-thirty-six events, whilst rolling a sum total of 7 becomes a one-in-six event. Although linear increases in modification lead to an exponentially larger probability space, humans are exceptionally capable of dealing with these huge probability spaces. According to many, this is one of the reasons why we find such pleasure in identifying patterns.
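
The dice example can be checked by brute-force enumeration. The sketch below is only an illustration of the probability space described above; the function name `sum_distribution` is made up for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sum_distribution(num_dice, sides=6):
    """Return {total: probability} for the sum of `num_dice` fair dice."""
    # Every die adds one axis (one dimension) to the outcome space.
    outcomes = list(product(range(1, sides + 1), repeat=num_dice))
    counts = Counter(sum(roll) for roll in outcomes)
    return {total: Fraction(n, len(outcomes)) for total, n in sorted(counts.items())}


one_die = sum_distribution(1)    # six outcomes on one axis
two_dice = sum_distribution(2)   # thirty-six outcomes on two axes

for total, probability in two_dice.items():
    print(f"sum {total:2d}: {probability}")
```

Adding a third die with `sum_distribution(3)` enumerates 216 outcomes, which shows how quickly the space grows with each extra modifier.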

We can visualize this concept in a different way by considering the number of options a player might have at any one point as a branching decision tree. In Figure 3, we can see how having three options at any one stage (left) is exponentially more complex than only having two (right).

Figure 3

Figure 3 frames the RLD problem – whenever we modify or add a single element to a game, the growth rate of difficulty is not linear, but rather exponential to the point where a game will become impossible.
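
As a rough illustration of the branching tree, the number of distinct paths is the number of options per stage raised to the number of stages. The helper name `path_count` is invented for this sketch.

```python
def path_count(options_per_stage, stages):
    """Number of distinct paths through a tree with a fixed branching factor."""
    return options_per_stage ** stages


# Compare two options per stage against three as the tree gets deeper.
for stages in range(1, 6):
    print(stages, path_count(2, stages), path_count(3, stages))
```

One extra option per stage looks like a small, linear change, but the gap between the two columns widens with every stage.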

Creating our RLD Numbers

Now that we understand how linear increases in complexity cause exponential increases in difficulty, we can now put this into the context of a level design. Let’s say that you have a jumping challenge with linear increments in the gaps between platforms. This example is created in Unreal Development Kit (UDK) using the default physics and control type. Here I have created a test level to get a feel for my spatial metrics – i.e. to understand how the actual spatial measurements relate to my experience as a player. (Figure 4)

Figure 4

Here I have four platforms which I want to jump between. Each successive platform uses a linear increase in the gap. In this example, I have spaced each successive platform 16UU (Unreal Units / approximately 32 cm in real world measurements)[1] further apart each time. Using this spatial test, I ask a set of new testers to jump between the platforms without any prior training. Once each tester is done, I log the number of failed attempts at each platform. I also personally observe these tests so that I can relate the numbers to player experience. What is really interesting about these types of experiments is that you will invariably find that the fail rate increases exponentially, even with these linear increases in the separation between platforms.

Difficulty always reaches a point of exponential growth. This is best seen when you graph your data. Based on this exact experiment, I found that one in ten testers failed the easy jump on their first attempt, two in ten failed the medium jump on their first attempt, and six in ten failed the hard jump on their first attempt. This then led me to create a table (Figure 5) with three levels of difficulty: easy, medium and hard. Each level of difficulty then gets assigned a number based on the fail rate of the test data. The "RLD Difficulty Value" is the number of failed attempts per ten total attempts. My hardest jump is now assigned a difficulty of 'six', because six out of ten attempts were unsuccessful, and this pattern continues throughout my table.
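
The difficulty value described above is simple arithmetic: failed attempts per ten total attempts. A minimal sketch using the three fail rates quoted in the text (the names `rld_difficulty` and `difficulty_table` are illustrative only):

```python
def rld_difficulty(failed, attempts, per=10):
    """Failed attempts per `per` total attempts."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return per * failed / attempts


# (failed first attempts, total testers) as reported for each jump.
first_attempt_results = {"easy": (1, 10), "medium": (2, 10), "hard": (6, 10)}

difficulty_table = {
    jump: rld_difficulty(failed, attempts)
    for jump, (failed, attempts) in first_attempt_results.items()
}
print(difficulty_table)
```

Scaling to "per ten attempts" means a test with, say, twenty testers still lands on the same scale.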

Figure 5

One important thing to note is that easy tasks will always need to have some element of risk AND that difficulty has a finite limit before it becomes impossible. (Figure 6)

Figure 6

Modifiers & Dimensionality

So we now have a simple set of numbers which we can use to design our jumping challenges. But these numbers make two assumptions: (1) the player will always jump to platforms which are at the same level, and (2) a challenge with eight successive easy jumps is as difficult as a challenge with one hard jump. We know from experience, though, that this is simply not the case. To correct these two issues, we need to look at what our modifiers are, and what their values should be.

Figure 7

In the context of this example, the first modifier to consider is gravity (Figure 7). We know that there is going to be a finite limit to how high a platform can be before the player will simply be unable to reach it. We also know that there will be a point at which the platform is so low that the player will incur gravitationally induced death when they land on it. By performing a spatial test we can quantify these limits and use some simple mathematics to work backwards from them.

Figure 8 is an example of the type of sandbox environment that should be created when working with any new game engine. These environments will change depending on the game, but in the context of a first person game (which UDK is best suited to) you will need a way to test things like run speed, jump distance etc. Most importantly though, these types of test environments are invaluable for understanding the “feel” of your game. Based on this interaction, we then attempt to transcribe feel into a system of objective metrics.

Figure 8

Figure 8 is a series of jump platforms created in UDK – the engine I will be working with. Each set consists of nine different platforms of incrementally increasing height and gap width. The aim of this particular sandbox is to find the limits of what the player can do and then use these as our maximum / impossible values to figure out our RLD metrics.

Figure 9

We know from our initial testing data what our RLD numbers are for a player jumping between level platforms. We know that one in ten testers failed the 128UU jump, two in ten failed the 160UU jump and six in ten failed the 192UU jump (highlighted in red in the table above). Based on this data, we now know that we should never exceed a value of six for any single permutation of width x height in the probability space.

This is the point at which you will need to get your hands dirty with some math – specifically we need to use the data we have and find the best way to interpolate the missing data. We know that difficulty tends to be exponential, so it stands to reason that we should probably consider an exponential regression as our starting point. A suite of regression tools can be found here.

Figure 10

Using this tool (Figure 10), I entered the data that I had at each point. Sometimes you will find that you need to try a number of regression methods before you get to one with an acceptable margin of error. In this case an exponential regression did not work, but a logarithmic regression did. (Figure 11)
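
Trying several regression methods can also be done in a few lines of code rather than an online tool. The sketch below fits an exponential and a logarithmic model to the three level-jump measurements quoted earlier and reports each model's squared error. It is an assumption about how such a comparison might be set up, not the tool used in this article; the function names are invented, and a different tool may fit or score the models differently. With only three points any fit is rough, so treat the error as a guide rather than a verdict.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by a straight-line fit on ln(y)."""
    b, ln_a = linear_fit(xs, [math.log(y) for y in ys])
    a = math.exp(ln_a)
    return lambda x: a * math.exp(b * x)


def fit_logarithmic(xs, ys):
    """Fit y = a + b * ln(x) by a straight-line fit on ln(x)."""
    b, a = linear_fit([math.log(x) for x in xs], ys)
    return lambda x: a + b * math.log(x)


def sum_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys))


gaps = [128, 160, 192]   # gap width in UU, from the level-jump test
fail_rates = [1, 2, 6]   # failed attempts per ten

candidates = {
    "exponential": fit_exponential(gaps, fail_rates),
    "logarithmic": fit_logarithmic(gaps, fail_rates),
}
errors = {name: sum_squared_error(model, gaps, fail_rates)
          for name, model in candidates.items()}

for name, error in errors.items():
    print(f"{name}: squared error {error:.3f}")
```

Whichever model you keep, graph it against the measured points before trusting it for interpolation.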

Figure 11

Now that I have a formula, I can use it to interpolate all of my missing data for one series. When I graph the result, I can see my exponential curve start to form. What we have now is an approximate fail rate for each permutation along one axis. Our task now is to go back into the spatial test that we have created and try to find the limits of each series. For the purposes of this example, six will be the number that we use to indicate our maximum value.
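
Filling in one series from a formula, with six as the cap, might look like the sketch below. The formula here is NOT the regression result from Figure 11, which is not reproduced in the text; it is a stand-in exponential curve drawn through the measured easy (128UU, 1) and hard (192UU, 6) points. The list of gap widths is likewise a hypothetical range chosen for illustration.

```python
import math

CAP = 6.0  # maximum difficulty value used in this example


def curve_through(easy_point, hard_point):
    """Stand-in formula: exponential curve through two measured points."""
    (x0, y0), (x1, y1) = easy_point, hard_point
    b = math.log(y1 / y0) / (x1 - x0)
    return lambda x: y0 * math.exp(b * (x - x0))


def fill_series(formula, gaps, cap=CAP):
    """Map each gap width to a difficulty value, or None when it exceeds the cap."""
    series = {}
    for gap in gaps:
        value = round(formula(gap), 2)
        series[gap] = value if value <= cap else None  # None marks "off limits"
    return series


formula = curve_through((128, 1), (192, 6))
series = fill_series(formula, range(112, 209, 16))  # hypothetical gap widths in UU

for gap, value in series.items():
    print(gap, "off limits" if value is None else value)
```

Values below one fall into the "not really a risk" band discussed later, and anything above the cap is treated as off limits.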

Figure 12[2]

At this point you have two options and I will list these in order of preference:

1. Interpolate the remaining values based on your perception of the problem BUT only use the numbers one, two and six to represent easy through to difficult.

2. Use the regression tools to mathematically figure these values out.

Although approach two can be mathematically correct, it often does not accurately represent the psychology of the problem. For example, as you run up to one of these ledges, internal states of doubt, tension, ease etc. will skew the actual result. It is for this reason that I personally like to interpolate the values manually. Again, this reinforces the point that RLD is a great starting point for some elements of the design. As you can see from the table above, I found that small increases in height didn’t actually make the challenge noticeably harder until we approached a jumping challenge which was 64UU high. Once we have established these maximum points, we can then define anything that is off limits (highlighted in black) and anything that really shouldn’t be considered a risk (i.e. values less than one).

For situations similar to this, you will often find that you end up with a set of RLD numbers which mimic this pattern; (Figure 13)

Figure 13[3]

You can use graphing tools to fine-tune each one of these curves via a system of trial and error, or you could use the regression tools which were employed to create the original series. The trial and error method involves graphing the data in real time on a per-series basis and then adjusting the numbers in each series until you achieve a graphed slope similar to the one created using the proper logarithmic regression method.

Now if you are a designer and you are freaking out about the amount of math used to interpolate these values, then you need to hear my dirty secret for figuring these values out. Believe it or not, aftermarket car ECU tuning software often has table systems which will automatically perform logarithmic, exponential and polynomial regressions based on limited data sets! They tend to work the same way that Microsoft Excel’s linear regression tool works, but with modified formulas. The best part is that if you normalize your RLD numbers and punch them into the tuning software of your choice, it will give you a pretty complete set of data points ready for tweaking!

Arbitrary Modifiers

In the context of this example, the last modifier which we need to consider is the number of consecutive jumps required before the player can reach a check point. The more consecutive steps, the greater the difficulty. Sometimes, though, it is very difficult to find an objective source to help us define a number for modifiers like this. Unlike the gravity modifier, this could potentially be a near infinite number before we consider it to be “impossible.” It is in situations like this that you will need to employ arbitrary modifier numbers. Even though these numbers are arbitrary, you need to base them on some type of logic.

Figure 14

In Figure 14 we have a jumping challenge with three distinct sections. Each section is divided by a ‘safe zone’ – an area in which the player can rest and opt in to the next section of the challenge. As designers, we need some way to define how difficulty is impacted by the number of steps in each section. As the number of steps per safe zone can be quite large, it is important to select a number sequence which, when graphed, maintains a fairly consistent slope. A good rule of thumb can actually be derived from the field of audio engineering. If you have one violin player who is too quiet, how many extra violin players will you need to double the volume? As odd as it sounds, you would need about ten violin players to create double the volume! The same principle could be applied to this example and the perception of difficulty. For this reason, your sequence of numbers should be a near linear progression.

Figure 15

In my example above, I have created a series of modifiers which I can then use to modify the overall difficulty of each section of the jumping challenge. These numbers are based on the violin example above – when we multiply something by ten, we roughly double the perceived difficulty. As a lead designer, you will always need to specify a cap for systems like this so that others in your team will share a consistent set of rules for the development of game spaces.
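
One way to turn the violin rule of thumb into numbers is sketched below. This is an assumption, not the table from Figure 15 (whose values are not reproduced in the text): it simply doubles the modifier every time the step count is multiplied by ten, and clamps the step count at a cap chosen by the lead designer. The names `step_modifier` and `max_steps`, and the cap of 20, are hypothetical.

```python
import math


def step_modifier(steps, max_steps=20):
    """Perceived-difficulty multiplier for `steps` consecutive jumps.

    Assumed rule: ten times the steps roughly doubles perceived difficulty,
    i.e. modifier = 2 ** log10(steps). Steps beyond `max_steps` are clamped.
    """
    if steps < 1:
        raise ValueError("a section needs at least one step")
    capped = min(steps, max_steps)
    return 2 ** math.log10(capped)


for steps in (1, 2, 5, 10, 20, 50):
    print(steps, round(step_modifier(steps), 2))
```

The resulting sequence rises gently and never jumps, which is the property the text asks for; the exact values should still be tuned against play testing.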

The Psychology of the Numbers

Some argue that RLD is an overly deconstructionist, mathematical and sterile way of designing game experiences. When used in isolation, without any type of testing, this may seem to be the case; however, when used correctly and consistently, RLD is a powerful tool when it comes to shaping user experience. Difficulty plays a significant role in the play experience. A game should always be designed with peaks and troughs of anxiety in order to keep the player engaged; however, unexpected breaks in this “natural” progression can be highly detrimental. (Figure 16)

Figure 16[4]

Crafting Experience with RLD

Now that we have a clear idea of the impact of our modifiers, we can look at crafting an element of our level – specifically a jumping challenge. Although RLD may seem clinical and sterile, there is a very measurable psychological element to getting it right (and wrong!). The ideal game experience is one in which anxiety rises over time and where there are no flow- or immersion-breaking elements in the design. Difficulty has the largest impact on this sense of immersion and flow. When things are too easy we tend to become bored; when things are too hard we become frustrated. By using RLD to craft our game experience, we ensure that difficulty follows this tried and true formula.

To demonstrate the link between the game experience and careful level construction via RLD, let us continue the jumping challenge example. We want to craft a jumping challenge with three distinct sections (Figure 17). Each section is punctuated by a safe zone which acts as a check point. We have defined our RLD numbers for each jump type first based on user testing and then via mathematical regression (and more testing). We have also expanded the dimensionality of our challenge via the inclusion of two more modifiers: gravity and the number of steps. Now it’s time to pull out the graph paper and draw a concept.



Figure 17

If we were to graph this example without the modifiers then we come up with something like the example below (Figure 18). We have a challenge which starts off easily, yet has a massive jump in difficulty.

Figure 18

By graphing only one modifier, our graph suggests that we have created an experience-breaking challenge. When the anxiety / boredom labels are applied to this graph you can see how we step out of the flow channel and into dangerous territory (Figure 19). But we are missing two important modifiers of this jumping challenge.

Figure 19

Once we add in our extra modifiers, our graph starts to become more indicative of the actual player’s perception of the challenge (Figure 20). The extreme spike that we saw in the second part of the challenge is somewhat mitigated. However the second part of the challenge is still problematic.
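
If you keep your per-section difficulty totals in a spreadsheet or script, a spike like the one in the middle section can be flagged automatically. The sketch below is a hypothetical check: the section values and the `max_step_up` threshold are made up for illustration and are not the numbers behind Figures 18 to 20.

```python
def find_spikes(section_difficulty, max_step_up=2.0):
    """Return indexes of sections that rise more than `max_step_up` over the previous one."""
    return [
        index
        for index in range(1, len(section_difficulty))
        if section_difficulty[index] - section_difficulty[index - 1] > max_step_up
    ]


# Hypothetical totals for three sections: the middle one spikes.
before_revision = [2.0, 9.0, 5.0]
# Hypothetical totals after revising the middle section.
after_revision = [2.0, 3.5, 5.0]

print(find_spikes(before_revision))
print(find_spikes(after_revision))
```

A check like this only catches sudden jumps; whether the overall curve sits inside the flow channel is still a judgment made from the graph and from testing.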

Figure 20

RLD is always a process of revision and refinement. The most obvious thing that stands out about this challenge is that the middle section is far too difficult in its current form. We need to look back to our modifiers and see what we can do to address this. We have a few options which we can use based on our metrics:

Make the gaps smaller

Leave the gaps the same and adjust verticality

Reduce the number of steps

One of the most important elements of RLD is to discern the priority of your system of metrics. You will want at least three metrics – primary, secondary and tertiary. The relationship between your metrics will generally be transitive. That is, adjusting the primary metric will have the greatest impact and will have a cascade effect on the secondary and tertiary metrics, and so on.

In the case of this challenge, if I simply use smaller jumps (i.e. the easy jumps) then I know that this will more than likely change the challenge from being too hard, to being too easy. I then defer to my secondary metric – vertical modification / gravity modification. Logic dictates that asking the player to jump to a lower platform will be slightly easier. We still keep the same gap, but simply apply one of the modifiers to tweak the numbers.

Based on this logic, you can use the table you created earlier in order to create a combination of jumps which results in a slight lowering of difficulty. In the case of this jumping challenge, simply lowering each consecutive step by 16UU should give me the desired result (Figure 20). It is important to note that I referenced the table BEFORE drawing and calculating the actual example. This is the benefit of doing RLD work as part of your pre-production – iterations become quicker and solving design issues becomes more process orientated rather than a matter of arbitrary guesstimation.

Figure 21

When we graph the end result, we come up with a much improved distribution of difficulty (Figure 21). Remember that difficulty is always relative - as the player's skill increases, so too will their perception of what is (and is not) difficult. Although this graph shows the third part of the jumping challenge moving into anxiety territory, this is where your distributions should roughly be.

Chaos & the Primary Metrics of RLD

Chaos is what happens to a player’s perception of difficulty when your jumping challenge goes from this… (Figure 22)

Figure 22

To this…. (Figure 23)

Figure 23

In both instances, our jump platforms are distributed using the same height and distance metrics, yet the representation of the challenge is different – maybe even chaotic.

Now when I say chaos, I don’t mean disorganization or haphazard, random & uninformed design decisions. In the context of RLD, chaos is all of the modifiers (both of the game and of the player's mind) which are evident in the gameplay experience but cannot easily be expressed in a quantifiable and objective manner. Although we do not quantify these phenomena, we nevertheless use them and consider them in the process of RLD. Put another way, chaos is the part of RLD which we do not represent numerically, but is nevertheless an integral element of player experience.

A good example of the chaos element would be the role that aesthetics play in the perception of difficulty in our jumping challenge. We could implement the same jumping challenge in numerous ways aesthetically. Depending on how the puzzle or challenge is implemented artistically, the numerical difficulty and the perceived difficulty may be very different. For instance, one implementation of the puzzle or challenge might have deafening, atonal audio overwhelming the player and another version may be identical, bar the audio. Which one is more difficult? The answer is that both of them are equally difficult from an RLD perspective; it is just that one would be perceived to be more difficult based on aesthetic alone.

In my next post, I will explore the notion of global metrics and how we can account for these 'other' elements which impact on difficulty metrics.

[1] The conversion between UU and real world measurements depends largely on how the underlying physics system has been written. Although many different games use Unreal Engine, the translation of UU to real world measurements varies slightly.

[2] The black sections indicate permutations that are off limits. In the case of Figure 12, the blacked-out sections are impossible under normal circumstances.

[3] Just like the example given in Figure 12, more sections have been blacked out in this table to represent permutations which are too simple to be considered a puzzle.

[4] This explanation of the “Flow Channel” is derived from Jesse Schell, The Art of Game Design: A Book of Lenses. Elsevier/Morgan Kaufmann, 1st edition, August 2008.