Engaging with Actual Humans

At this point we could have moved straight to a build phase, and in the past we might have. Instead, we chose to pursue further human interaction by gathering in-person feedback. The purpose of this human-centered research was both to understand which ideas resonated with people and to narrow down which design concepts we should move forward with. Our test audience consisted of the people we ultimately hoped to engage in our data collection efforts: everyday internet citizens. We tested concepts by taking to the streets of Taipei and using guerilla research methods. The concepts varied widely, ranging from a voice-only dating app to a simple sentence read-back mechanism.

Guerilla research with people passing by on the streets of Taipei.

We went into this research phase fully expecting the more robust app concepts to win out. Our strongly held belief was that people wanted to be entertained, or needed an ulterior motive, in order to take part in this level of voice data collection. The result was surprising (and heartening): it was the experience of voice donation itself that resonated most with people. Rather than using a shiny app that collects data as a side effect of its main features, people were more interested in the voice data problem itself and wanted to help. They wanted to understand why we were doing this type of voice collection at all. This research showed us that our initial assumptions about the need to build an app were wrong. Our team had to let go of its first ideas to make way for something more human-centered, resonant, and effective.

This is why we built Common Voice: to tell the story of voice data and how it relates to the need for diversity and inclusivity in speech technology. To better enable this storytelling, we created a robot that users on our website “teach” to understand human speech by reading sentences aloud to it. This interaction model has proved effective and has already evolved significantly. The robot is still a mainstay, but the focus has shifted. True to experience design practices, we are continually iterating, currently with a focus on building the largest multi-language voice dataset to date.