Some of us have had an embarrassing moment at a club: raising our voice to shout over the loud music to the person next to us, only for the music to stop at the precise moment we speak. We only needed to shout because of the noise; we had to raise our voice to cut through it. Sometimes the environment is just too noisy for you to be heard at all, despite your shouting. The same can be true for A/B tests, so how do we remove some of that noise to find what's really working?

Evan Miller has already written a great blog post about Lazy Assignment; this article can loosely be considered an extension of that thought process, though with nowhere near as much mathematical rigour.

Consider the following: you have an eCommerce product detail page and you want to test a component at the bottom of it. You have two versions of the page, one with the control component and one with the test component.

The element you want to test is below the fold, so people need to scroll to see it. Naturally, some people won't scroll; they might bounce or simply buy the item, so not everyone will actually see the component we want to test.

Now, consider that only 25% of the people visiting this page scroll down and see the component, in both the control and test scenarios.

You might say this is no problem: both control and test are equally exposed, so we've got nothing to worry about. The challenge is the amount of noise added by the 75% of people who never scroll down, which dilutes the effect of the test. If it only reaches 25% of your audience, is it even worth testing? Maybe not for a small site, but say you run a $100M business and you can make a 2% difference on 25% of your conversions. That's still a $500k opportunity.
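As a quick back-of-envelope check, using the figures above:

```ts
// Opportunity sizing with the example figures from the text.
const annualRevenue = 100_000_000; // a $100M business
const exposedShare = 0.25;         // only 25% of visitors see the component
const upliftOnExposed = 0.02;      // a 2% improvement for those who do

const opportunity = annualRevenue * exposedShare * upliftOnExposed;
console.log(opportunity.toLocaleString()); // 500,000
```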

If we were testing a checkout change, we wouldn't assign a person to either the control or test group at the point they view our homepage, but only when they see the change in the checkout. Until then, they haven't seen the test, so it can have no influence on them.

Equally, if someone doesn't scroll down far enough to see the component we're testing, why include them in the control or test group at all? They were never actually exposed to the element under test.

For the purpose of this example, let's assume that people who scroll have the same probability of converting as those who don't. So if we have 200k people viewing the page at a 2% conversion rate, we will see 4,000 conversions. If only 25% of people scroll to see the component we're testing, then those 50k people will account for 25% of conversions (1,000 in this case).
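In code, that arithmetic looks like this:

```ts
const visitors = 200_000;
const conversionRate = 0.02;
const scrollRate = 0.25;

const totalConversions = visitors * conversionRate;          // 4,000
const exposedVisitors = visitors * scrollRate;               // 50,000 see the component
const exposedConversions = exposedVisitors * conversionRate; // 1,000 (25% of all conversions)
```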

At a significance threshold of p = 0.05, the test without viewability needs roughly 105 incremental conversions to reach significance. However, when we only assign people who actually view the component to control/test, only around 53 incremental conversions are needed.
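As a rough sanity check on where numbers like these come from, here is a sketch assuming a one-sided two-proportion z-test at α = 0.05 and a 50/50 traffic split; the exact figures depend on the test and variance formula used, so it lands near (not exactly on) the 105 and 53 above:

```ts
// Assumes a one-sided two-proportion z-test at alpha = 0.05 (z ≈ 1.645)
// and a 50/50 split of traffic between control and test.
const Z = 1.645;

function incrementalConversionsForSignificance(perArm: number, baseRate: number): number {
  // Standard error of the difference between two proportions, equal arms.
  const se = Math.sqrt((2 * baseRate * (1 - baseRate)) / perArm);
  // Smallest detectable rate difference, scaled to conversions in the test arm.
  return Math.ceil(Z * se * perArm);
}

console.log(incrementalConversionsForSignificance(100_000, 0.02)); // ≈ 103, all visitors assigned
console.log(incrementalConversionsForSignificance(25_000, 0.02));  // ≈ 52, viewers only
```

Quartering the sample size doubles the smallest detectable difference in rates, but because that difference now applies to only a quarter of the traffic, the incremental conversions needed are roughly halved.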

Now how do we actually go about implementing this in the real world?

We probably don't want to generate scroll heatmaps every time we want to run a test, and even if we could, they might not be consistent over time or across browsers. We actually need to measure viewability. Luckily, this problem has already been addressed by the advertising industry, which needs to ensure clients only pay for ads that are actually seen. I won't document an end-to-end test here, but if you know how to code you will easily be able to follow along with Siebe Hiemstra's tutorial on How to write a basic viewability tracker. (It's elegant and properly debounced with requestAnimationFrame.)
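I won't reproduce Siebe's code, but a minimal sketch of the same idea, assuming we fire a callback once the first time at least half of the element is in the viewport, might look like this:

```ts
// Minimal viewability sketch: fires `callback` once, the first time at least
// half of `el` is visible in the viewport. Scroll handling is throttled with
// requestAnimationFrame so we never do layout reads more than once per frame.
function onViewable(el: HTMLElement, callback: () => void): void {
  let ticking = false;
  let fired = false;

  function isViewable(): boolean {
    const rect = el.getBoundingClientRect();
    const viewportHeight = window.innerHeight || document.documentElement.clientHeight;
    const visiblePx = Math.min(rect.bottom, viewportHeight) - Math.max(rect.top, 0);
    return rect.height > 0 && visiblePx >= rect.height * 0.5;
  }

  function check(): void {
    ticking = false;
    if (!fired && isViewable()) {
      fired = true;
      window.removeEventListener('scroll', onScroll);
      callback();
    }
  }

  function onScroll(): void {
    if (!ticking) {
      ticking = true;
      window.requestAnimationFrame(check);
    }
  }

  window.addEventListener('scroll', onScroll, { passive: true });
  check(); // handle the case where the element is already in view on load
}
```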

The trick, and where your implementation differs from an advertising tracker, is that we want to use viewability for control/test group assignment. The viewability function triggers when the component being tested comes into view and then assigns the user to the relevant group.
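Building on the onViewable sketch above, assignment-on-exposure might look something like this; trackAssignment and the element id are hypothetical placeholders for your own testing platform's calls:

```ts
// Placeholder for however your testing platform records group membership.
declare function trackAssignment(experiment: string, variant: 'control' | 'test'): void;

const component = document.getElementById('below-fold-component'); // hypothetical id
const variant: 'control' | 'test' = Math.random() < 0.5 ? 'control' : 'test';

if (component) {
  // The variant is decided (and rendered) up front to avoid flicker, but the
  // visitor is only counted in the experiment once the component is seen.
  onViewable(component, () => trackAssignment('below-fold-test', variant));
}
```

Note that the variant still has to be decided, and rendered, before the visitor scrolls; it's only the assignment event, the thing your significance calculation counts, that waits for viewability.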

To my knowledge, no testing platform enables viewability tracking out-of-the-box, so you will likely need to add it to each of your tests manually. If you know of a platform that does consider viewability out-of-the-box, let me know.

Hopefully this helps you rethink some tests you've run in the past that showed feeble results; you might even be able to re-run those experiments with viewability-based assignment and uncover some hidden value.