Google published the Guidelines for Evaluation of Search Speech (version 1) today. Enrique Alfonseca, a Staff Research Scientist working on Google Assistant in Zurich, said that the guidelines were published in response to “requests from academics who are researching improvements in voice interactions, question answering, and voice-guided exploration.”

Alfonseca also says that the Search Speech Guidelines are similar to the Search Quality Rating Guidelines, but that Google Assistant needed to have “its own guidelines in place, as many of its interactions utilize what is called ‘eyes-free technology,’ when there is no screen as part of the experience.”

The dimensions used for rating Search Speech (voice) answers:

Information Satisfaction: the content of the answer should meet the information needs of the user.

Length: when a displayed answer is too long, users can quickly scan it visually and locate the relevant information. For voice answers, that is not possible. It is much more important to ensure that we provide a helpful amount of information, hopefully not too much or too little. Some of our previous work is currently in use for identifying the most relevant fragments of answers.

Formulation: it is much easier to understand a badly formulated written answer than an ungrammatical spoken answer, so more care has to be placed in ensuring grammatical correctness.

Elocution: spoken answers must have proper pronunciation and prosody. Improvements in text-to-speech generation, such as WaveNet and Tacotron 2, are quickly reducing the gap with human performance.

According to Voicebot, these guidelines are very important for brands and media, because “following these guidelines will be increasingly important in order for your content to surface in voice searches.”

Example: