As people increasingly communicate via asynchronous non-spoken modes on mobile devices, particularly text messaging (e.g., SMS), longstanding assumptions and practices of social measurement via telephone survey interviewing are being challenged. In the study reported here, 634 people who had agreed to participate in an interview on their iPhone were randomly assigned to answer 32 questions from US social surveys via text messaging or speech, administered either by a human interviewer or by an automated interviewing system. Ten interviewers from the University of Michigan Survey Research Center administered the voice and text interviews; automated systems launched parallel text and voice interviews at the same time as the human interviews. The key question was how the interview mode affected the quality of the response data, in particular the precision of numerical answers (how many were not rounded), variation in answers to multiple questions with the same response scale (differentiation), and disclosure of socially undesirable information. Texting led to higher-quality data—fewer rounded numerical answers, more differentiated answers to a battery of questions, and more disclosure of sensitive information—than voice interviews, with both human and automated interviewers. Text respondents also reported a strong preference for future interviews by text. The findings suggest that people interviewed on mobile devices at a time and place that is convenient for them, even when they are multitasking, can give more trustworthy and accurate answers than those in more traditional spoken interviews. The findings also suggest that answers from text interviews, when aggregated across a sample, can tell a different story about a population than answers from voice interviews, potentially altering the policy implications drawn from a survey.

Funding: This work was supported by a US National Science Foundation collaborative grant (SES-1026225 and SES-1025645) to MFS and FGC. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. AT&T provided support in the form of salaries for authors PE and MJ, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.

Introduction

The growing use of smartphones is transforming how people communicate. It is now ordinary for people to interact while they are mobile and multitasking, using whatever mode—voice, text messaging, email, video calling, social media—best suits their current purposes. People can no longer be assumed to be at home or in a single place when they are talking on the phone, if they are willing to talk on the phone at all rather than texting or using another asynchronous mode of communication [1]. And they may well be doing other things while communicating, more than they would have even a few years ago.

This transformation is challenging the basis of how we gather essential information about society: how we measure our health, employment, consumer confidence, crime, education, and many other human activities. Modern social measurement depends on face-to-face (FTF) and landline telephone surveys, and more recently on self-administered web surveys on personal computers. As FTF and landline telephone communications change, it is possible that current methods will not be sustainable [2]. But the critical need for accurate data about the population persists; effective public policy and private sector strategy depend on credible measurement of people’s opinions and behaviors. For example, world economies and US electoral politics can be significantly affected by the US unemployment rate reported each month from the Current Population Survey, a government-sponsored survey with a sample of 60,000 households per month. As another example, policies on disease prevention, health insurance, and risk-related behaviors depend on surveys such as the Behavioral Risk Factor Surveillance System (BRFSS), in which a consortium of US states and the Centers for Disease Control and Prevention interview more than 400,000 US households per year to track health and disease trends. Any challenges to the accuracy of such data threaten our ability to understand ourselves collectively and create effective policy.

In the study reported here, we explored how members of the public report information about themselves in a survey when they are randomly assigned to respond in one of the new communication modes they now use every day, but which have not yet been used in social science and government surveys on a large scale. Our experimental design crosses two factors that reflect the diversity of communication modes available on a single mobile device (in our case the iPhone): the medium of communication, voice vs. text messaging, and the interviewing agent, a human interviewer vs. an automated interviewing system. Crossing these factors yields four modes of interviewing: Human Voice (telephone), Human Text (text message interview administered by an interviewer), Automated Voice (telephone interview administered by an automated system), and Automated Text (text message interview administered by an automated system). (The Automated Voice system is a version of what is known as Interactive Voice Response [IVR] in polls, market research, and other application areas, most often with touchtone response; see [3] on speech IVR systems). Each respondent was randomly assigned to one of these modes and answered on their own iPhone.

Our primary question is how these factors, and the four modes of interviewing they comprise, affected the quality of survey data, as well as respondents’ subjective experience. We also examine what else respondents did while answering questions in these modes—whether they were multitasking and/or mobile—and how this affected the quality of their answers in the different modes. Because we measure survey responding on the same device for all respondents (as opposed to including other platforms such as Android or Windows), we can make fair experimental comparisons, even if the same modes could be deployed on other devices. Because all respondents used the uniform iPhone interface, any differences in responding across the modes cannot be attributed to platform differences. And because respondents used native apps on the iPhone—the Phone app or the Messages app—which they knew well and used for daily communication (as opposed to answering a web survey in a browser on the iPhone, or a specially designed survey app that they would need to download), any differences in responding across the modes are unlikely to have resulted from differential experience with the modes.

We examine data quality in these four modes by measuring the extent to which respondents’ answers were careful and conscientious (i.e., the extent to which respondents were not taking mental shortcuts or “satisficing”, see [4]–[5]), and the extent to which respondents were willing to disclose sensitive information. We measure thoughtfulness in answering questions that require numerical responses by looking at the percentage of answers that were precise (that is, not "heaped" or rounded by ending in a zero or a five); unrounded answers are more likely to result from deliberate, memory-based thought processes than estimation (see [6]–[7]), and they are more likely to be accurate in answers to objective factual questions [8]. We measure care in answering multiple questions that use the same response scale—from “strongly favor” to “strongly oppose”—by looking at the percentage of responses that were different from each other; the general view is that some variation across the responses (as opposed to “straightlining,” where the same response is given again and again) is likely to reflect more conscientious or thoughtful responding [9]. We use increased disclosure of sensitive information (e.g., more reported lifetime sexual partners, more reported alcohol use) as evidence of improved data quality, consistent with the evidence in survey research that more embarrassing answers are more likely to be true [10]–[12].
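To make these two indicators concrete, here is a minimal sketch in Python. It is ours, not the authors’ code: the function names and data are hypothetical, and treating answers as nonnegative integers while counting distinct responses is just one straightforward operationalization; the exact indices used in the study may differ.

```python
# Illustrative sketch (not from the study): two simple data-quality
# indicators for survey answers, as described above.

def percent_precise(numeric_answers):
    """Percent of numerical answers that are not rounded ("heaped"),
    i.e., that do not end in 0 or 5. Assumes nonnegative integers."""
    precise = [a for a in numeric_answers if a % 5 != 0]
    return 100.0 * len(precise) / len(numeric_answers)

def percent_differentiated(scale_answers):
    """Percent of distinct responses across a battery of questions
    sharing one response scale; low values indicate "straightlining"
    (repeating the same response again and again)."""
    return 100.0 * len(set(scale_answers)) / len(scale_answers)

# Hypothetical data: a count question (e.g., number of movies watched
# last month) and a favor/oppose battery coded 1-5.
counts = [12, 30, 7, 25, 3]         # 30 and 25 end in 0 or 5 (heaped)
battery = [2, 2, 4, 1, 2, 5, 3, 2]  # varied, not straightlined

print(percent_precise(counts))          # 60.0
print(percent_differentiated(battery))  # 62.5
```

Under this reading, higher values on both indicators are taken as signs of more careful, less satisficed responding.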

How might texting affect survey data quality? Little is yet known about how survey responding differs between mobile voice and text messaging interviews. Will people respond less thoughtfully in text because it is often used for casual communication, or more thoughtfully because there is less time pressure to respond? Will they respond less honestly because they aren’t hearing or speaking directly to a human interviewer, or more honestly because they feel less inhibited without spoken contact? Will the lasting visual record of text messages, which others might see, make people answer more honestly because they feel accountable, or less honestly because they feel embarrassed? Texting and speaking differ in fundamental ways that could affect both precision and disclosure in surveys, as Table 1 outlines. In addition to leaving a persistent visual record of what has been communicated, texting is less synchronous than speaking; the delay between successive text messages can be hours or even days. Even when the back-and-forth of texting is quick, it still does not allow simultaneous production and reception in the way that speaking does, nor does it allow precisely timed requests for clarification or feedback during the partner’s speech, because utterances arrive fully formed. In general, the rhythm of texting is quite different from the rhythm of speech: speakers in conversation are typically expected to respond immediately or to account for any delays (e.g., [13]–[14]), while the same expectations do not necessarily hold in text. In text, delay or complete lack of response doesn’t necessarily signal a problem in the way it does in speech.


Table 1. Voice vs. text on smartphones. (https://doi.org/10.1371/journal.pone.0128337.t001)

The effort required to communicate—to produce, comprehend, and repair utterances—in voice and text also differs. In general, talking and typing require different mental and social processes [15], and the style of communication that has evolved in texting can be abbreviated (stemming in part from earlier character limits on the length of text messages) and less formal than other written modes [16]. In most cases people find it easy to talk and harder to type, although this may vary across people, cohorts, and mobile environments; for example, it may be easier to text than talk when it is noisy, and particularly hard to text when there is visual glare. Because of the lag between text messages, people texting may be more likely to shift their attention to other tasks, which means that to continue a text thread they must return their gaze and attention to the smartphone screen. In other words, the effort of multitasking (whether that means alternating between tasks or performing them simultaneously) can differ significantly between text and voice—texting while walking is harder (and less safe) than talking while walking. Additional effort in dialogue management can be required in texting if messages are received out of sequence because of network performance problems.

More difficult to quantify is how different the social presence of the communicative partner is in texting. Texting feels different; there is no continuous (auditory) evidence about the presence of one’s partner—less of a sense that the partner is there (see, e.g., [17]–[18])—and less chance of interruption or feedback during an utterance. Text also lacks the rich palette of intonation cues that can convey a communication partner’s mental and emotional states, evaluative judgments, or attentiveness (though see [16] for discussion of the “paralinguistic restitution” that texters use to communicate nonverbal content).

These differences could easily affect both precision and disclosure in surveys. For precision, texting reduces the demand to respond immediately, which may enable respondents to take more time thinking about their answers and to respond more precisely. Alternatively, respondents might engage in more task-switching while texting, leading them to answer less precisely because the mental resources required for task switching diminish their processing ability. And the reduced social presence of the interviewer in text could lead respondents to feel less accountable for their answers, which could reduce precision.

For disclosure, texting offers a combination of features that could lead respondents to report more embarrassing information than when speaking: the asynchrony and reduced social presence of the interviewer give less immediate evidence of an interviewer’s reaction to answers, and possibly more time to grow comfortable with the potential consequences of disclosure. Also, texting an answer to a sensitive question might feel more “private” than speaking it out loud. On the other hand, respondents might disclose less in text if they worry that others could eventually discover their answers on their phone, on synced computers, or in service providers’ databases. The asynchrony of texting could also give respondents more time to devise answers that are evasive or less truthful.