Editor's Note: This post originally appeared on the Firefox Test Pilot site in November 2016. We're migrating all of our graduation reports to Medium to make them more discoverable.

Universal Search launched with Test Pilot in May 2016 and is the first experiment we’ve formally retired. We designed it to get you to the best things on the Web faster, by adding search suggestions and site recommendations into Awesome Bar results. We think it did that. We also learned a lot in the process. That’s why we’re calling Universal Search a success.

Our basic question was “Would Firefox users find enough value in the recommended sites to click on them in the Awesome Bar?” and we tested this and gathered feedback in a few ways during the course of the experiment.

Here’s how it went down

At launch, we showed participants two types of recommendations based on their queries: top-level domains (such as Facebook.com, Amazon.com, and YouTube.com) and Wikipedia article pages.

Result displaying top-level domain.

Result displaying Wikipedia article.

For the first two months, we showed recommendations when we thought they would be relevant, about 20 percent of the time. During this period, we saw that participants clicked on a displayed result about 10 percent of the time (we call this the click-through rate, or CTR).
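For readers unfamiliar with the metric, CTR is just clicks divided by impressions. A minimal sketch (the function name and numbers are illustrative, not the experiment's actual instrumentation):

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return CTR as a percentage; 0.0 when nothing was displayed."""
    if impressions == 0:
        return 0.0
    return 100.0 * clicks / impressions

# Hypothetical counts: a recommendation shown 1,000 times and clicked
# 100 times gives the roughly 10 percent CTR described above.
print(click_through_rate(100, 1000))  # 10.0
```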

In July, we changed the experiment so that we didn't display a recommendation if the site was already in a participant's browsing history. We expected this to dramatically reduce the frequency of results shown, and it did. Results displayed fell from 20 percent to 7 percent. We were surprised to see that our CTR fell, too. From this, we hypothesized that the duplicates we removed included many sites that users visit habitually, so our initial CTR included many clicks "stolen" from history navigation.
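The de-duping step amounts to filtering the candidate recommendations against the set of sites already in history. A rough sketch, with hypothetical names and data (not Universal Search's actual code):

```python
def dedupe_recommendations(recommendations, history_sites):
    """Drop any recommended site already present in the user's browsing history."""
    history = set(history_sites)  # set lookup keeps the filter O(1) per candidate
    return [rec for rec in recommendations if rec not in history]

recs = ["facebook.com", "example-news.com", "wikipedia.org"]
visited = {"facebook.com", "wikipedia.org"}
print(dedupe_recommendations(recs, visited))  # ['example-news.com']
```

Note how sites the user visits habitually (and would have clicked anyway) are exactly the ones removed, which is consistent with the CTR drop described above.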

In October 2016, we added a third result type, rich movie cards. We predicted these would have a higher CTR than the other types, but discovered that while rich movie cards outperformed Wikipedia, they did not outperform top-level domains.

Here’s what we learned

When we evaluate an experiment we look at usage data as well as what participants tell us in post-experiment surveys. By both measures, Universal Search was a success. In fact, it outperformed our expectations. We saw a surprisingly high CTR for showing only three result types, and overall user sentiment was quite positive.

Results from telemetry data

The data showed a final average CTR of 7 percent and indicated a clear preference for top-level domain results. A breakdown of clicks by result type showed that Wikipedia pages were shown most and clicked least. Top-level domains were the most consistent performer: our better relevancy on top-level domains (even after de-duping) probably indicated we were shortcutting users to sites they already knew they wanted to visit. Movie cards had the least test data, but outperformed Wikipedia pages.

Search Results Displayed and Clicked

User data showing percentage of time recommendations were displayed and percentage of time participants clicked on recommendations.


Key events to note in the graph

May to July — Top-level domain and Wikipedia results displayed

Mid-July to Mid-August — No data collected during telemetry system upgrade

Mid-August to September — Top-level domain (de-duped) and Wikipedia results displayed

October to November — Rich movie card, top-level domain (de-duped) and Wikipedia results displayed

Weekly Performance by Result Type

Percentage of results displayed and clicked by type. Chart series: top-level domains (displayed, clicked), Wikipedia (displayed, clicked), and movie cards (displayed, clicked).

Results from participant feedback

The great thing about collecting feedback from participants is that we can learn as much from our failures as our successes, and even a successful experiment is not successful for everyone.

Some participants who left the experiment in the first weeks mentioned performance issues. Others told us they preferred separate search and Awesome Bars: their learned habits were hard to overcome. Many who left also cited poor result quality: search isn’t easy to get right, especially with a limited data set.

However, some of the most telling, representative survey feedback came from users who stayed. Universal Search had a high retention rate, and of the users who remained:

65% agreed that Universal Search results were meaningful, and

60% agreed that Universal Search helped them complete tasks faster.

Participant feedback survey results.

Here’s what happens next

What’s next for Universal Search? Now that we’ve announced retirement plans for this add-on, we will no longer maintain it. If you have Universal Search enabled, you don’t need to do anything. We’ll automatically uninstall it in the coming weeks.

The experiment we built was not a complete product experience, but it was enough to meet our learning goals. If we tried a similar experiment again, we’d pursue better result relevancy and more diverse result types. We would also likely test alternate UX treatments to see if we could affect interaction with results, or improve user acceptance of the combined Awesome Bar for those users who preferred separate URL and search bars. These are questions we may examine in future experiments.

Thank you to all the Test Pilots who installed Universal Search, used it, and told us what they thought! We are grateful for your participation.

Read additional Universal Search analysis from Chuck Harmston.

Want to try a new experiment? Visit https://testpilot.firefox.com.

Javaun Moradi, Product Manager