by ALEXANDER WAKIM



Ramp-up and multi-armed bandits (MAB) are common strategies in online controlled experiments (OCE). Both involve changing assignment weights during an experiment. However, if assignment weights change while time-based confounders are present, ignoring that interaction can bias inference in an OCE. For MABs, it can also lead to poor total reward, which defeats the purpose of using a bandit in the first place. In this post we discuss the problem, a solution, and practical considerations.
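To make the bias concrete, here is a minimal sketch with made-up numbers (the reward rates, traffic volume, and weights are all assumptions chosen for illustration, not figures from any real experiment). Both arms have the same true reward rate on each day, so the true effect is zero, but the share of traffic sent to arm B is ramped up on a day when rewards happen to be higher for everyone.

```python
# Hypothetical numbers: both arms share the same true reward rate on each day,
# so the true effect of arm B relative to arm A is exactly zero.
true_rate = {"day1": 0.10, "day2": 0.20}  # time-based confounder: day 2 is better for everyone
users_per_day = 1000

# Share of traffic assigned to arm B on each day (the weight is ramped up on day 2).
weight_b = {"day1": 0.10, "day2": 0.50}


def pooled_rate(arm):
    """Expected reward rate for one arm when both days are naively pooled."""
    users = 0.0
    rewards = 0.0
    for day, rate in true_rate.items():
        share = weight_b[day] if arm == "B" else 1 - weight_b[day]
        n = users_per_day * share
        users += n
        rewards += n * rate
    return rewards / users


naive_a = pooled_rate("A")
naive_b = pooled_rate("B")
print(f"pooled rate A: {naive_a:.3f}, pooled rate B: {naive_b:.3f}")
```

Under these assumed numbers the pooled rates come out to roughly 13.6% for arm A and 18.3% for arm B, even though the arms are identical within each day. The gap arises only because arm B received a larger share of its traffic on the better day.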







Background

Online controlled experiments

An online controlled experiment (OCE) randomly assigns different versions of a website or app to different users in order to see which version causes more of some desired action. In this post, these “versions” are called arms and the desired action is called the reward (arms are often called “treatments” and reward is often called the “dependent variable” in other contexts). Examples of rew…