The multi-armed bandit problem grew out of work on sequential decision making by Herbert Robbins in the 1950s. It is usually explained with a gambler who is presented with a row of slot machines. The gambler’s objective is, of course, to win as much money as possible from these machines. Put another way, the goal is to minimize “regret”, where “regret” is the difference between the money an optimal gambling strategy would have won and the money actually won. Let’s first go through an example that demonstrates the power of this theory. Then we’ll look at how we can apply multi-armed bandit theory to A/B testing and website conversion rate optimization to create a more efficient way to A/B test.

Let’s say we have three slot machines, displayed in the diagram below.

Knowing the payout rates of the slot machines makes it clear that Slot Machine C has the highest returns.

Without knowing the payouts, you would want to test all three machines to see which one gives you the best payout. However, while you’re testing, you want to minimize the difference between how much money you make during the test and how much you would have made if you had known from the start that Slot Machine C had the highest returns.
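To make “regret” concrete, here is a minimal sketch in Python. The payout figures are hypothetical values chosen for illustration (they are not taken from the diagram), and the function name expected_regret is our own.

```python
# Hypothetical expected payout per pull for each machine (assumed for illustration).
expected_payout = {"A": 0.10, "B": 0.25, "C": 0.40}

def expected_regret(pulls):
    """Expected winnings from always pulling the best machine, minus the
    expected winnings from the sequence of pulls actually made."""
    best = max(expected_payout.values())
    return sum(best - expected_payout[machine] for machine in pulls)

# Testing each machine 10 times costs regret on every pull that is not machine C.
uniform_test = ["A"] * 10 + ["B"] * 10 + ["C"] * 10
print(round(expected_regret(uniform_test), 2))
```

Every pull on machine A or B adds to the regret, while pulls on machine C add nothing. That is why a good strategy tries to shift pulls toward the best machine as early as the evidence allows.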

The multi-armed bandit approach begins with the gambler pulling all the machines’ levers an equal number of times. This is called the “exploration” phase. As specific machines begin to pay out rewards, their levers are pulled more and more often. This is known as the “exploitation” phase, and it is used to keep returns as high as possible throughout the testing period.
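One simple way to balance exploration and exploitation is the epsilon-greedy strategy, sketched below. This is only one of several bandit strategies, and it mixes the two phases rather than running them strictly one after the other. The win probabilities, the 10% exploration rate and the function name are assumptions made for the example.

```python
import random

def epsilon_greedy(win_probs, n_pulls, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy play and return how often each machine was pulled."""
    rng = random.Random(seed)
    pulls = [0] * len(win_probs)
    wins = [0] * len(win_probs)
    for t in range(n_pulls):
        if t < len(win_probs):
            arm = t  # try every machine once before comparing them
        elif rng.random() < epsilon:
            arm = rng.randrange(len(win_probs))  # explore: pick a machine at random
        else:
            # exploit: pick the machine with the best observed win rate so far
            arm = max(range(len(win_probs)), key=lambda i: wins[i] / pulls[i])
        pulls[arm] += 1
        if rng.random() < win_probs[arm]:  # simulate whether this pull pays out
            wins[arm] += 1
    return pulls

# Hypothetical win probabilities for machines A, B and C (assumed for illustration).
pulls = epsilon_greedy([0.10, 0.25, 0.40], n_pulls=10_000)
print(pulls)
```

With a gap this wide between the machines, most pulls should end up on the third machine, while the small random share keeps checking the other two in case the early results were misleading.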

This same concept can be applied to landing page optimization, A/B split testing and any other marketing optimization. Imagine that you have a website where the goal is to get users to register for a $10/month membership. Currently the website sees a 2% conversion rate for registrations, meaning that if 1000 people visit the site you will receive 20 registrations, equating to $200/month. To improve your registrations, you could create a new variation of your landing page and run a 50/50 A/B split test where you send 50% of traffic to the control (i.e. your current page) and 50% of traffic to your new variation. Let’s say that the variation has a red “Register now” button instead of the original green button. In this example you are testing whether the red button results in more registrations than the green one. With a traditional A/B test you would continue to run the test until the difference between the two conversion rates is statistically significant and a winner can be declared.

This is traditionally what many marketers have done to improve website goals, whether those are newsletter registrations, click-throughs to certain pages or sales. However, there is one major flaw with this. The problem is that during the test you are sending 50% of your traffic to a page that doesn’t convert as well as the other one. If it turned out that your original control converted better than your variation, then half of your traffic during the test went to the weaker page, and every conversion that page missed is lost. This method is costly and inefficient, and it risks losing conversions and large sums of revenue.

A more efficient way to A/B test is to take a multi-armed bandit approach, just like with the slot machine example above. This helps maximize the conversion rate during the test period, mitigating the risk of losing revenue. Using the same example we discussed earlier, you have a total of 2000 visitors to distribute between the control and variation landing pages. In the initial exploration phase an equal amount of traffic is sent to both landing pages. As soon as one page begins to convert better, slightly more traffic is sent to that page. Over time, the multi-armed bandit algorithm moves into an exploitation phase where the higher-converting page gets more and more traffic until the losing page is completely phased out. This allocation of traffic based on historical conversions is automatic and keeps running even if you add more landing page variations to test other hypotheses.
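The article does not tie this traffic allocation to a specific algorithm. As an illustration only, the sketch below uses Thompson sampling, a common bandit algorithm, with uniform Beta(1, 1) priors. The function name and the simulated visitors are assumptions, and a different random seed will give a different split.

```python
import random

def thompson_split(true_rates, n_visitors, seed=0):
    """Route visitors one at a time using Thompson sampling with Beta(1, 1) priors."""
    rng = random.Random(seed)
    visits = [0] * len(true_rates)
    conversions = [0] * len(true_rates)
    for _ in range(n_visitors):
        # Draw a plausible conversion rate for each page from its Beta posterior.
        draws = [
            rng.betavariate(1 + conversions[i], 1 + visits[i] - conversions[i])
            for i in range(len(true_rates))
        ]
        page = draws.index(max(draws))  # send this visitor to the page with the highest draw
        visits[page] += 1
        if rng.random() < true_rates[page]:  # simulate whether the visitor registers
            conversions[page] += 1
    return visits, conversions

# Control converts at 2% and the variation at 4%; the algorithm only sees the outcomes.
visits, conversions = thompson_split([0.02, 0.04], n_visitors=2000)
print(visits, conversions)
```

Early on, both pages have wide posteriors, so traffic is split roughly evenly. As conversions accumulate, the page with the stronger record wins the draw more often and receives a growing share of visitors.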

Suppose 2000 total visits are to be distributed across 2 landing pages. The control converts at a 2% conversion rate, as mentioned above, and the variation converts at 4%, although at this stage we don’t know this. We run a multi-armed bandit experiment and begin by pushing equal traffic to each page. This is what could happen (the figures are cumulative totals and are only meant to illustrate the concept):

100 to control with 1 conversion (1% conversion rate)

100 to variation with 3 conversions (3% conversion rate)

200 to control with 4 conversions (2% conversion rate)

400 to variation with 14 conversions (3.5% conversion rate)

300 to control with 7 conversions (2.3% conversion rate)

800 to variation with 29 conversions (3.62% conversion rate)

500 to control with 10 conversions (2% conversion rate)

1200 to variation with 46 conversions (3.83% conversion rate)

550 to control with 11 conversions (2% conversion rate)

1450 to variation with 58 conversions (4% conversion rate)

As you can see, over time, more and more traffic is distributed to the higher-converting variation page. After 2000 visits, 550 had been sent to the control and 1450 to the variation page. The total number of conversions was 69, which equates to an average conversion rate of 3.45%.
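The totals can be checked directly from the list of figures above. The short script below copies the cumulative visits and conversions at each checkpoint and prints the variation’s share of traffic along with both conversion rates. The variable names are our own.

```python
# Cumulative (visits, conversions) at each checkpoint, copied from the list above.
control = [(100, 1), (200, 4), (300, 7), (500, 10), (550, 11)]
variation = [(100, 3), (400, 14), (800, 29), (1200, 46), (1450, 58)]

for (c_visits, c_conv), (v_visits, v_conv) in zip(control, variation):
    share = v_visits / (c_visits + v_visits)  # fraction of all traffic sent to the variation
    print(f"variation share: {share:.1%}, "
          f"control rate: {c_conv / c_visits:.2%}, "
          f"variation rate: {v_conv / v_visits:.2%}")

# The last checkpoint gives the totals for the whole test.
total_visits = control[-1][0] + variation[-1][0]
total_conversions = control[-1][1] + variation[-1][1]
overall_rate = total_conversions / total_visits
print(total_visits, total_conversions, f"{overall_rate:.2%}")
```

The final checkpoint accounts for all 2000 visits (550 to the control and 1450 to the variation) and 69 conversions.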

In a traditional A/B test, 1000 visitors would have been sent to the control and 1000 visitors to the variation. There would have been a total of 60 conversions (20 from the control and 40 from the variation, a difference that is statistically significant at the 99% level), equating to a 3% average conversion rate.
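The article does not say which test the 99% figure comes from. As a rough check, the sketch below assumes a pooled two-proportion z-test, which is one common choice for comparing conversion rates. The function name is our own.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return the z statistic and two-sided p-value for a difference in conversion rates."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / std_err
    # Two-sided p-value from the standard normal CDF, written with math.erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# 20 conversions from 1000 control visits versus 40 from 1000 variation visits.
z, p_value = two_proportion_z_test(20, 1000, 40, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Under this assumption the p-value comes out a little below 0.01, which is consistent with the 99% figure quoted above.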

As you can see, during the test the multi-armed bandit model would produce an extra 9 conversions, which at $10 each is an extra $90/month in revenue, a 15% increase over the traditional split test. This increase comes from one small test. On larger traffic volumes and with continuous testing, it could add up to a large increase in revenue.
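The comparison is simple arithmetic, shown here as a short check. It assumes each conversion is worth the $10/month membership price from the earlier example.

```python
price_per_month = 10      # dollars per registration, from the membership example
bandit_conversions = 69   # total conversions in the bandit run
split_conversions = 60    # total conversions in the 50/50 split (20 + 40)

extra = bandit_conversions - split_conversions
extra_revenue = extra * price_per_month  # extra monthly revenue in dollars
uplift = extra / split_conversions       # relative increase over the split test
print(extra, extra_revenue, f"{uplift:.0%}")
```

Nine extra registrations at $10 each come to $90/month, and 9 more than 60 is a 15% increase.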



The multi-armed bandit approach to split testing allows marketers to maximize conversions, and so reduce their regret, during the split testing period. It allows you to test different variations and theories on an ongoing basis while keeping the traffic spent on poorly performing page variations to a minimum.

We all know that we should be constantly testing different ideas to increase conversions on our websites, which is why we have developed Growth Giant. Growth Giant is a tool that uses a multi-armed bandit algorithm to continuously test landing pages. It automatically directs traffic to your better-performing pages to help you get the highest possible conversion rate.

If you are interested in a more efficient way to split test and maximize your conversions and revenue, enter your email below and stay tuned for the launch of Growth Giant.