Engineers, designers, and product managers make development decisions every day with the implicit goal of improving some key metric. We want to redesign a page because we believe that it will increase retention by improving the user experience. We want to add a brand new feature or expand an existing one because we believe this will increase product usage by making the product more valuable to customers.

Each of these decisions ultimately comes down to answering one question: does this change actually have the intended effect? That is a question of causal inference: does making decision X cause Y to happen? In the context of web applications, we can use the gold standard for answering causal questions: randomized experimentation.

In larger organizations where marketing, sales, and support can all influence user behavior, experimentation becomes an especially important tool for making product decisions, because it isolates you, to some degree, from these external factors. As a result, experimentation is at the heart of product development on the Sidekick team. We are continually iterating on ways that our users can more easily understand and get value out of Sidekick. This experimentation process helps us move faster and with more confidence at all stages of a feature's lifecycle*. It also enables us to learn more about our user base and apply those learnings to other parts of the product. Most importantly, it helps us craft the future of our product.



Finding an Experimentation Framework and Solution

Enabling this process, scaling it across an organization, and ensuring the statistical validity of our tests requires tools that make it easy to set up, implement, and analyze experiments. In our view, the ideal experimentation framework has the following properties:

- Users always receive the same parameter values for a particular experiment. Having different sessions potentially assign different parameter values results in a poor user experience, and it also makes it difficult to take away any learnings from the experiment.
- Exposure logging (recording which users were exposed to an experiment and what treatments they received) is handled for you by the framework, so that experiment analysis is painless, predictable, and bug-free.
- Experiments scale, both in the number of experiments you run and in performance as your user base and product grow.
- Experimental design, regardless of the complexity of an experiment, is doable and understandable by anyone in the organization.
- The framework helps ensure that experimentation does not slowly accumulate cruft in your codebase.

With these criteria in mind, we began using Facebook's PlanOut experimentation language and framework in its Python implementation. We found PlanOut incredibly helpful, but as we gradually transitioned our infrastructure from server-side rendering to client-side rendering, passing experiment information, parameters, and exposure logs from the server to the client became cumbersome, slow, and a source of many avoidable bugs.

To accommodate this shift, we decided to move from a Python-based implementation to a JavaScript-based one. We ported and open-sourced a JavaScript implementation of PlanOut that lets us define and manage experiments client-side, and also allows experiments to be defined through a UI and served in a serialized representation. This implementation has several benefits over our server-side implementation. The first is that running experiments incurs almost no performance overhead, since we no longer have to pass experiment parameters between the server and the client in order to run client-side experiments. This matters because we care deeply about application performance, and performance degradations can influence some of our most important metrics.

The second advantage is that defining and implementing user interface experiments, which make up most of the experiments we run, is much easier and less bug-prone in a single codebase. Integrating our implementation of PlanOut into a standard single-page app architecture is simple, because it was built to fit that workflow. Each experiment requires a set of inputs in order to determine its randomized assignment; in the server-side implementations of PlanOut, these inputs had to be supplied when the experiment was initialized. Our implementation instead allows inputs to be registered during the standard bootstrapping of a single-page application, so initializing an experiment class and registering its inputs are separate steps. This makes it easy to interact with multiple external services and ensures that we minimize duplicated exposure logging.
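A simplified sketch of that idea (this is not the real PlanOut.js API; the class and method names are assumptions): the experiment object is constructed once, inputs arrive later during app bootstrap, and exposure is logged at most once, on first parameter access.

```javascript
// Toy experiment class illustrating late input registration and
// once-only exposure logging. Not PlanOut.js's actual API.
class Experiment {
  constructor(name) {
    this.name = name;
    this.inputs = {};
    this.exposureLogged = false;
    this.logs = [];
  }

  // Inputs can be registered whenever they become available, e.g. after
  // the app has fetched the current user from an external service.
  addInput(key, value) {
    this.inputs[key] = value;
  }

  get(param) {
    // Log exposure exactly once, on first parameter access.
    if (!this.exposureLogged) {
      this.logs.push({ event: 'exposure', name: this.name, inputs: this.inputs });
      this.exposureLogged = true;
    }
    // Toy deterministic assignment based on userId parity.
    const even = this.inputs.userId % 2 === 0;
    return param === 'numSuggestions' ? (even ? 3 : 5) : undefined;
  }
}

const exp = new Experiment('invite_suggestions');
exp.addInput('userId', 42); // registered during bootstrap, not at construction
exp.get('numSuggestions');  // first access logs exposure
exp.get('numSuggestions');  // subsequent accesses do not log again
// exp.logs.length === 1
```

Separating construction from input registration is what lets the experiment object be created early while user data arrives asynchronously.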

Using PlanOut to Experiment with Virality

One example of how we've used experiments on Sidekick is in trying to influence the number and quality of viral invitations sent. Free users of Sidekick get a month of unlimited notifications if they send a friend an invite to Sidekick and the friend accepts it (the friend also gets a free month). One big driver of invitations is an invite suggestions component that allows people to send invites with one click. In this experiment, we want to see whether we can get users to send more invites by tweaking certain parameters of the component. We don't want to enroll paying users, so we conditionally un-enroll them from the experiment (which also takes care of not logging exposure for them). For the remaining free users we implement a 2x3 factorial experiment where the two factors are the number of invite suggestions shown and the wording of the invite CTA. The first parameter, the number of invite suggestions, lets us see whether the number of invites sent increases monotonically with the number of suggestions, or whether there's a point at which additional suggestions have no effect or a negative effect. The second parameter chooses between a button geared toward a selfish incentive (Invite) and one geared toward an altruistic incentive (Gift):

Running an experiment requires both an experiment definition and a corresponding implementation, so we wanted a way to move naturally from one to the other. At HubSpot, most of our client-side JavaScript applications use React for the view layer, and since many of the experiments we run are user interface experiments, we looked for a way to implement experiments seamlessly in React. The result is a small library called react-experiments.

Introducing react-experiments

The core idea behind the library is that it forms a one-to-one mapping from PlanOut experiment parameters to the props of React components, using a user's randomized experiment assignments as props for the corresponding component(s). This integrates nicely with PlanOut's focus on parameters instead of variations. The result makes the connection between definition and implementation clearer, makes subtle implementation bugs that could invalidate the experiment less likely, and makes the process of continually implementing experiments less susceptible to accumulating cruft. Let's look at how simple it becomes to implement the preceding experiment using React.

Suppose this is the implementation of our code before the experiment:
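As a concrete stand-in, a pre-experiment invite suggestions component might look like the following. Every name here is an assumption, and a tiny createElement stub stands in for React so the sketch stays dependency-free; with JSX this would read as ordinary markup.

```javascript
// Minimal createElement stub in place of React, so the sketch is runnable
// on its own. It just records the element tree as plain objects.
const createElement = (type, props, ...children) => ({ type, props, children });

// Hypothetical pre-experiment component with hard-coded values baked in:
// a fixed list of suggestions and a fixed "Invite" call to action.
function InviteSuggestions({ suggestions, ctaText }) {
  return createElement(
    'ul',
    null,
    ...suggestions.map((s) =>
      createElement('li', { key: s.email }, s.email, createElement('button', null, ctaText))
    )
  );
}

// Before the experiment, the component is rendered with constants:
const tree = InviteSuggestions({
  suggestions: [{ email: 'friend@example.com' }],
  ctaText: 'Invite',
});
```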

Now, to implement our experiment all we have to do is replace the last line with:
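One way that swap might look, using a simplified stand-in for a parametrize-style higher-order component. This is not react-experiments' actual API; the experiment object, the parameter names, and the component are all assumptions made for illustration.

```javascript
const createElement = (type, props, ...children) => ({ type, props, children });

// Toy experiment object exposing get(param) for assigned values.
const experiment = {
  get: (param) => ({ numSuggestions: 3, ctaText: 'Gift' }[param]),
};

// Stand-in "parametrize" higher-order component: fills any listed prop the
// caller did not supply with the experiment's assignment of the same name.
function parametrize(exp, params, Component) {
  return (props) => {
    const filled = { ...props };
    for (const p of params) {
      if (!(p in filled)) filled[p] = exp.get(p);
    }
    return Component(filled);
  };
}

function InviteSuggestions({ numSuggestions, ctaText }) {
  return createElement('div', null, `${numSuggestions} suggestions`, ctaText);
}

// Instead of exporting InviteSuggestions directly, export the wrapped version.
// The component itself is untouched; only the export line changes.
const ParametrizedInviteSuggestions = parametrize(
  experiment,
  ['numSuggestions', 'ctaText'],
  InviteSuggestions
);
```

Because experiment parameters map one-to-one onto props, the component stays oblivious to the experiment, which is the property that keeps application logic unchanged.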

We didn't have to touch a single line of application logic to implement this experiment. Not one line.

The base of the library is a Parametrize component, which powers the rest of the library's capabilities. For instance, the parametrize higher-order component is a convenience wrapper for when all the relevant props live in the same component. When you want to implement an experiment in a component whose props of interest may live in child components, you can use the base Parametrize component directly, coupled with the withExperimentParams higher-order component around those child components, to provide the necessary parametrization of props. Here is an example of utilizing both the Parametrize component and the withExperimentParams higher-order component.
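A hedged sketch of how that pairing might look, using simplified stand-ins rather than react-experiments' real API (which uses React's tree and context; all names below are assumptions): Parametrize puts the experiment's parameters "in scope", and withExperimentParams injects them into a nested child.

```javascript
// Module-level scope standing in for React context in this sketch.
let scope = null;

// Stand-in Parametrize: resolves the listed experiment params, makes them
// available while the subtree renders, then cleans up.
function Parametrize({ experiment, params, children }) {
  scope = Object.fromEntries(params.map((p) => [p, experiment.get(p)]));
  const result = children();
  scope = null;
  return result;
}

// Stand-in withExperimentParams: merges the in-scope params into the
// wrapped component's props, with explicit props taking precedence.
const withExperimentParams = (Component) => (props) =>
  Component({ ...scope, ...props });

// A child deep in the tree that needs the experiment's parameters:
const InviteButton = withExperimentParams(({ ctaText }) => ({
  type: 'button',
  label: ctaText,
}));

const experiment = { get: (p) => (p === 'ctaText' ? 'Gift' : undefined) };
const rendered = Parametrize({
  experiment,
  params: ['ctaText'],
  children: () => InviteButton({}),
});
// rendered.label === 'Gift'
```

The point of the split is that only the leaf component that consumes a parameter needs wrapping, so intermediate components don't have to thread experiment props through.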

Both the withExperimentParams and parametrize higher-order components provide powerful abstractions for experimentation. In some cases, however, the experiments we want to run involve complete redesigns, where it simply isn't feasible to specify the range of parameter values that differ between designs. For these experiments, react-experiments offers an ABTest component that provides a declarative way to define different variations within your components. The declarative API makes it obvious to readers of the code that an experiment is running across a number of variations and what the behavior of each variation is. After the experiment is done, it's therefore easy to clean up the "losing" components and remove cruft. This component, like the higher-order components, is simply a convenience wrapper around the core Parametrize component. We think the ability to both design and implement experiments in this manner is powerful compared to other methods.
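The declarative idea can be illustrated with a stand-in (again, not the library's real API; the variation names and experiment object are assumptions): the experiment assigns a variation name, and only the matching child renders.

```javascript
// Toy experiment that assigns a variation name as a parameter.
const experiment = {
  get: (param) => (param === 'variation' ? 'redesign' : undefined),
};

// Stand-in ABTest: all variations are declared side by side, so a reader
// can see every branch of the experiment; only the assigned one renders.
function ABTest({ experiment, param, variations }) {
  const active = experiment.get(param);
  return variations[active] ? variations[active]() : null;
}

const rendered = ABTest({
  experiment,
  param: 'variation',
  variations: {
    control: () => ({ type: 'OldDesign' }),
    redesign: () => ({ type: 'NewDesign' }),
  },
});
// rendered.type === 'NewDesign'
```

When the experiment ends, removing the losing variation is a one-line deletion in the variations map, which is what keeps cleanup cheap.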

Thinking about product design and development with an experimentation mindset is an asset for any product team: it keeps the team focused on the right things and able to iterate much more quickly. We've found that applying this mindset, together with PlanOut.js and react-experiments to implement our experiments, has had many benefits. We hope you find the tools we've open sourced helpful for doing the same!

* http://onstartups.com/insider-look-at-hubspot-sidekick-growth-approach talks more about how we use experimentation in our growth process.

** Thanks to our PaaS at HubSpot, we didn’t have to worry about not being able to start and stop these experiments exactly when we needed to do so.