Discussion

Our goal for this research was not to answer the binary question of "should researchers be using public tweets or not?" but rather to explore users' perceptions of research on Twitter and how they view the contextual factors involved in this practice. Below, we first reflect on the major themes of our findings, then lay out potential implications for both practice and design, and finally propose important future work suggested by this study.

As previously noted, the decades-old Belmont Report presents guiding principles for human subjects research and is relied upon by many researchers (Vitak et al., 2016). In analyzing our findings, we uncovered themes that tracked well to the Belmont Report (despite our participants most likely having little or no knowledge of these principles), suggesting that they are still highly relevant.

Three of the strongest and most interesting themes from our data are tied to the Belmont principle of respect for persons, or the recognition that people should have the right to exercise autonomy. Respect for persons commonly manifests through practices such as informed consent. The other two themes we saw related to this principle were the idea of choice and the dissemination of findings.

As noted previously, many online data researchers do not perceive informed consent as relevant for collecting public data such as tweets (Bruckman, 2014; Vitak et al., 2016). However, based on open-text responses to our survey, we saw that the idea of consent or permission came from the underlying importance of respect for our respondents. Many respondents' attitudes relied heavily on whether or not permission was sought. For example, one respondent wrote,

Many open-text responses we received emphasized respondents' desire to understand contextual factors, often remarking that their level of comfort, and whether they would be willing to grant permission to researchers, "depends." Factors included what and how many tweets are used, what the study is about, who is conducting it, and what methods researchers use. Many participants do not want to give a blanket answer about how they feel about Twitter research but want the control to consider research case by case.

Finally, a number of respondents specifically framed their desire both to know about the research and to see it when it is finished as an issue of respect. Informed consent can be seen as both informing and consenting, and for many respondents, the former would be sufficient.

The common ethical principle of "do no harm" is also part of the Belmont Report, under beneficence. Minimizing risk and maximizing benefit to participants is a large part of the ethical calculus that researchers often use. Our findings about dissemination could also be seen as part of this theme; how much our respondents cared not just about being informed but about the opportunity to read published papers suggests that they may want the benefit of learning about the study, whether for the sake of knowledge or out of curiosity.

Respondents also wrote about forms of harm such as being embarrassed by something published about them. One reason to be wary of research that came up frequently was that single tweets lack context, and that quoting and further disseminating tweets makes them more public than they were intended to be.

Finally, the idea of minimizing risk and maximizing benefit is also an argument against concluding that the solution is simply to stop doing Twitter-based research. After all, many respondents were positive about research, with comments such as "if it is for science why not" and "well research is a noble pursuit." This suggests that they too are doing ethical calculus about whether a potential invasion of their privacy is worth it for the benefit of research and science.

The Belmont principle of justice involves the assurance of reasonable, non-exploitative research methods that are administered fairly and equally to participants (and potential participants). Part of this, as explained in the Belmont Report, has to do with participant selection. However, it also involves fair (or at least equal) compensation of research participants. We are unaware of any studies of public Twitter data in which the Twitter users were monetarily compensated. Some participants stated that their willingness to give permission would depend on compensation, or that commercialization of the research is a problem; they essentially saw this as a sort of exploitation. However, tying back to beneficence as well, it could be that providing a benefit to users (even if only the benefit conferred by knowledge of the study and its findings) would make these research practices seem less exploitative.

Implications for Practice and Design

The themes above suggest potential best practices, or factors that researchers should consider in making ethical determinations about their research design. First, consider asking for permission if there is any reasonable way to do so. Since the study of public data may not fall under IRB or other regulatory purview, obtaining consent need not involve a formal consent form; this would, as Brown et al. (2016) suggest, separate the legal from the ethical. Even simply offering the opportunity to "opt out" would be good practice: for example, tweeting at those whose tweets are included in the research and offering to remove their content from the dataset if they prefer. Alternately, if a researcher does not seek permission beforehand, they could still consider informing the Twitter users afterward.

With respect to privacy, our findings point to some clear best practices. First, consider anonymizing identifying information when quoting tweets: only a minority of our respondents stated that they would prefer tweets to be attributed to them. Moreover, although these questions did not determine whether participants understand that verbatim tweets can be re-identified through Twitter's search mechanisms even when usernames are not disclosed, prior work suggests that many Twitter users are unaware of how widely available Twitter data are (Proferes, 2017). This suggests that participants who are comfortable with anonymous quotes but not with attributed ones might also be uncomfortable with their tweets being re-identified outside the context of a publication. We therefore recommend not quoting tweets verbatim without good reason, and more generally, considering Bruckman's (2002) levels of disguise when directly using content in a published work. This suggestion tracks with the conclusions of M. L. Williams et al. (2017) on publishing tweets verbatim: particularly if any personal information is involved, researchers should consider obtaining consent.

However, attribution is also a decision for which the subject matter of the study and the population of participants should be considerations, as there may be contexts in which attribution is appropriate and perhaps the most ethical course (Brown et al., 2016; Bruckman, Luther, & Fiesler, 2015). That said, the respondents concerned about their privacy and the potential harm of data being traced back to them were the most vocal in our data, and in a harm/benefit analysis it would make sense to focus on minimizing this harm. Therefore, we suggest that publication of user identity should occur only when the benefits of doing so clearly outweigh the potential harms, or with user permission.

Also regarding privacy, respondents were highly uncomfortable with their profile information being analyzed in tandem with their tweets. We also recommend not using deleted content (which is prohibited by Twitter's Developer Agreement [see Note 3]), due to the high level of discomfort expressed by our respondents, unless the benefit of doing so would justify violating users' expectations.

In a general sense, our findings suggest users believe strongly in privacy by obscurity (Hartzog & Stutzman, 2013) and that research has the potential to disrupt it. Respondents often felt more comfortable with research using larger data sets (with the exception of larger collections of a single user's tweets, as respondents did not want researchers using their entire Twitter history). Furthermore, they felt more comfortable with the idea of tweets being analyzed by a computer rather than read by humans. However, both of these beliefs may be misguided: re-identification is possible in large data sets (Zimmer, 2010a), and computer algorithms can be biased in both design and application (Friedman & Nissenbaum, 1996). Although we are certainly not advising against qualitative methods in Twitter research, it is a situation where research design should be considered carefully, particularly with respect to the subject matter of the study and the content of the tweets.

Our primary proposal for best practices is for researchers to understand and reflect carefully on these contextual factors during study design. We suggest that researchers should most carefully think through ethics when research involves the more problematic factors listed above. For example, a study about sensitive topics such as medical conditions or drug use could be less appropriate for quoting tweets than a study about television habits. Within human-computer interaction (HCI) research, it is already common practice to take special precautions when working with vulnerable populations (Brown et al., 2016), and we suggest that this should extend to Twitter users as well. In sum, this work suggests that in making decisions about ethical use of tweets, researchers should pay close attention to the content of the tweets, the level of analysis with respect to making the content more public, and reasonable expectations of privacy (e.g., deleted or protected content). They should also consider taking steps toward informing users about the research and providing them with opt-out options, if it would not compromise the research or researchers.

In addition to these suggestions for best practices, and in line with work by Bravo-Lillo, Egelman, Herley, Schechter, and Tsai (2013), our findings point to ways in which automated tools could contribute to ethical practices. First, for Twitter itself or anyone designing a new social computing system: consider providing a way for users to opt in to or out of particular forms of research. This could be, for example, a flag set in the user profile or a black/white list included as part of the API. Another potential design would be a system that provides public notices when data collection begins from a specific hashtag, informs users when their tweets are included in a dataset, and/or links those whose tweets have been used back to a published paper based on the results. Both of these designs would benefit Twitter (or other social media) users as well as support best ethical practices. Similar to a system like Turkopticon (Irani & Silberman, 2013), we encourage others to think about symbiotic systems that help empower users and research participants.
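As one illustration of how an opt-out mechanism like this might be honored on the researcher's side, the sketch below filters a collection of tweets against a researcher-maintained opt-out list before any analysis. Everything here is hypothetical: the `filter_opted_out` function, the opt-out set, and the tweet dictionaries are our own illustrative assumptions, not part of Twitter's actual API or any existing tool.

```python
# Hypothetical sketch: honoring a research opt-out list when building a dataset.
# In a real pipeline, the opt-out set might be populated from a profile flag or
# an API-provided blocklist, as proposed above; here it is simply a set of names.

def filter_opted_out(tweets, opted_out_users):
    """Return only tweets whose authors have not opted out of research use."""
    opted_out = set(opted_out_users)
    return [t for t in tweets if t["user"] not in opted_out]

# Toy collected data (illustrative only).
tweets = [
    {"user": "alice", "text": "excited about the new season"},
    {"user": "bob", "text": "please do not study my tweets"},
    {"user": "carol", "text": "coffee time"},
]

# Apply the opt-out filter before any analysis or archiving.
dataset = filter_opted_out(tweets, opted_out_users={"bob"})
# dataset retains only alice's and carol's tweets
```

The key design choice is that filtering happens at collection time, before tweets enter a stored dataset, so opted-out content never needs to be retroactively purged.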

It is important to note, however, that we do not recommend that platforms solve this problem by making it impossible for researchers to collect public data. As many of our respondents expressed, science and research are important. Over half indicated that if asked for permission, they would allow their tweets to be used without any conditions, and even more would give permission if they knew it was for scientific research. Therefore, we posit that disallowing the use of public data in research altogether would be as poor an outcome as using it indiscriminately, without any consideration of ethics.